Many sources on testing assume that the developer, or someone close by, knows exactly how the system works and behaves, and that writing tests naturally means supplying known inputs to some routine and checking the results against the expected outputs. But the truth is that for some code, what it does and how it works is effectively unknown … not only do we not know what it does, we don’t even know whether it does it correctly!
A ‘Characterization Test’ is a test you write without knowing what the code under test does or what correct operation looks like. The tests are written in conjunction with examining the code, with test expectations recorded in response to the actual results; this turns the normal approach to writing tests on its head.
Michael Feathers writes (in ‘Characterization Testing – Writing tests to describe and fix existing code’):
“Most types of testing have the quality of being about correctness. We test our code to see that it is doing what we want it to do. This assumes that we know what it is supposed to do – that we have some sort of specification in our heads or elsewhere that tells us what we’re aiming for.
What if we don’t?
Unfortunately, this is exactly where we are with a lot of software. We need to make changes to it but we don’t know enough about what it does.”
The Value of Writing the Tests
There are several ways in which writing tests is useful, even when you cannot confirm whether the results are correct:
- Testing frameworks often allow you to run some code more easily than writing any other kind of testbed on which to run experiments; I have often found that an NUnit test is a good way to understand some library function or other without creating a new experimental solution (I don’t propose that you start writing unit tests of the .Net Framework, but you may still occasionally want to clarify some particular detail of its operation; the first sketch after this list shows the idea);
- Adding tests as you progressively identify key parts of the method under test will help you learn step by step. You might start with simple tests of the null-checking guard clauses at the start of a method (or you might choose to skip those)… or, if any guard clauses apply some sort of validity check (e.g. byte b must be >= 0 and <= 99), you might create tests that specifically check for an exception when the input is out of range;
- Having created tests for the guard clauses, you may want to try some random input. You don’t yet know what the expected outputs are, but I suggest you still write your Assert statement with a test value that will ‘guarantee’ the test fails on its first run… (if anyone else is likely to see this code, or you are likely to be delayed in your next steps, make it clear that this expected result is a placeholder!) e.g.
    string result = MethodUnderTest(63, "Meatballs");
    Assert.AreEqual("**Update this expected result to match actual, Characterization Test!**", result);
- As you provide values and get results, you can update the test name to be meaningful and, of course, update the test itself – say you happen to pick an input value at random and it turns out that the method throws an exception; then rewrite the test to check that the method throws an exception when that value is supplied (the second sketch after this list shows this progression)! At this point you may not understand why the exception was thrown, and you may not be able to be sure that it is ‘correct’ (or ‘intended’) operation, but it is nevertheless the current behaviour.
- As you build up this approach, not only are you learning how the code operates – perhaps the exception that was thrown actually explains a problem with the value (opening your eyes to a whole class of problem you were not aware of) – but you are also building some familiarity with the wider code.
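As an aside on the first point above, here is a minimal sketch of the sort of throwaway NUnit test I have in mind; the fixture and test names are my own, and the detail being probed (whether string.Split keeps empty entries) is just an illustrative example:

    using NUnit.Framework;

    [TestFixture]
    public class FrameworkExplorationTests
    {
        [Test]
        public void Split_OnAdjacentDelimiters_KeepsEmptyEntries()
        {
            // Quick experiment: does Split collapse adjacent delimiters?
            string[] parts = "a,,b".Split(',');

            // Run once to observe the actual behaviour, then assert on it.
            Assert.AreEqual(3, parts.Length);
            Assert.AreEqual(string.Empty, parts[1]);
        }
    }

The point is not the particular answer; it is that the test runner gives you a one-click way to ask the question.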
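And here is a sketch of the progression described in the remaining points; MethodUnderTest, its signature, and the stand-in implementation at the bottom are all hypothetical placeholders for a real legacy method:

    using System;
    using NUnit.Framework;

    [TestFixture]
    public class MethodUnderTestCharacterizationTests
    {
        // Step 1: a guard-clause test. The expected exception type is whatever
        // the code actually throws, not what we think it 'should' throw.
        [Test]
        public void MethodUnderTest_ValueOver99_ThrowsArgumentOutOfRange()
        {
            Assert.Throws<ArgumentOutOfRangeException>(
                () => MethodUnderTest(100, "Meatballs"));
        }

        // Step 2: a first probe with a placeholder expectation, guaranteed to
        // fail on its first run so it cannot be mistaken for a verified result.
        // In practice this test is then rewritten as Step 3.
        [Test]
        public void MethodUnderTest_63_Meatballs_Characterization()
        {
            string result = MethodUnderTest(63, "Meatballs");
            Assert.AreEqual("**Update this expected result to match actual, Characterization Test!**", result);
        }

        // Step 3: the placeholder has been replaced with the output observed on
        // the first run, and the test name updated to describe that behaviour.
        [Test]
        public void MethodUnderTest_63_Meatballs_ReturnsNameColonValue()
        {
            string result = MethodUnderTest(63, "Meatballs");
            Assert.AreEqual("Meatballs:63", result);
        }

        // Hypothetical stand-in for the real legacy method being characterized.
        private static string MethodUnderTest(int value, string name) =>
            value <= 99 ? $"{name}:{value}"
                        : throw new ArgumentOutOfRangeException(nameof(value));
    }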
This approach may take you step by step through to understanding the code completely… perhaps even to understanding why it does what it does and how it achieves it. Or perhaps it will only take you part of the way… and it is conceivable that at the end of the process you will be no nearer to understanding the code; if so, you will at least have a number of tests that characterise the current behaviour of the system. With these tests in hand you are in a far better position to modify the code in future; perhaps you want to modernise it, and (with the tests passing appropriately) you will have some increased confidence that the change has been sound. Additionally, if you have any mystery results, you will have a better understanding of the ‘normal behaviour’ when discussing the unexpected results with a colleague. Naturally such discussions may lead to a decision to classify aspects of the operation as an error in need of a fix, or to identify things that should be enhanced.
Problems with Characterization Tests
I think the biggest risk with Characterization Tests is communicating the idea that these ‘expected results’ are far less informed and meaningful than the expected results of tests written for a new method as it was being created. One hopes that in that case the author of the new method knows how they expect it to perform and creates appropriate tests as they work. But two months later, looking at a suite of tests, how will another developer tell the difference between those ‘TDD’ tests and a suite of Characterization Tests?
My thinking here is that if you are not writing ‘Unit Tests’ you may want to make that clear, whether by applying a CategoryAttribute to the test (e.g. in C# with NUnit, you could use [Category("Characterization")] before the tests), by choosing a telling name, or by endeavouring to make it clear in comments that these tests are to be considered slightly differently.
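For example (a minimal sketch; the fixture, test, and method names are invented):

    using NUnit.Framework;

    [TestFixture]
    public class LegacyPricingCharacterizationTests
    {
        [Test]
        [Category("Characterization")]
        public void CalculateLabel_63_Meatballs_CurrentBehaviourOnly()
        {
            // The expected value records observed behaviour; it has not been
            // verified against any specification.
            Assert.AreEqual("Meatballs:63", CalculateLabel(63, "Meatballs"));
        }

        // Hypothetical stand-in for the legacy method under test.
        private static string CalculateLabel(int value, string name) =>
            $"{name}:{value}";
    }

A side benefit of the attribute is that runners can then include or exclude these tests as a group; e.g. the NUnit 3 console runner accepts --where "cat == Characterization".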
‘Effectively Unknown’ System Operation?
I suggested earlier that system operation may be effectively unknown. How can a system’s operation be described as unknown when we believe it is working and basically everything is going OK? There may even be one or more experts on hand who can tell you a lot about how the system works in general, yet not recall the precise details of how something works. Other circumstances in which I think it is fair to say the system operation is unknown to all practical extents are:
- There are no specification documents and / or no reliable change history (e.g. tickets on a change control system such as Jira) — how can you know the code is behaving as specified if there is no specification? Obviously, pre-existing tests might act as a specification of sorts but they don’t always exist;
- There are no central error logs, and / or any error-logging is not regularly monitored — how can you know the system is working if you are not taking the time to properly monitor errors that may tell you something is not working? In a general sense, people often assume everything is working as long as no-one is shouting that it isn’t; but people rarely go looking for problems!
- There are no unit tests or integration tests, or only incomplete coverage from those that do exist — if the system is not tested thoroughly, how can you know it is working?
- Even with very good high-level test coverage, I would argue that we cannot be certain that the private / internal code properly reflects the intention (e.g. imagine some sort of graphics processing that correctly processes an image in some way or other… but then imagine that some internal method ignores the convention of using 0-based pixel indexes and uses 1-based indexes instead. Overall the system may work, but it would surely surprise anyone expecting the routine to use 0-based indexes for all the points, wouldn’t it? See the sketch below.)
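To make that last point concrete, here is a sketch (the method and image representation are invented for illustration) of the kind of quiet convention mismatch I mean:

    // An internal method that uses 1-based pixel indexes against a codebase
    // convention of 0-based indexes: the output may still look plausible,
    // but the first column of the row is silently never touched.
    public static class ImageProcessing
    {
        public static void InvertRow(byte[,] pixels, int row)
        {
            int width = pixels.GetLength(1);

            // 1-based loop: skips pixels[row, 0] entirely.
            for (int x = 1; x <= width - 1; x++)
            {
                pixels[row, x] = (byte)(255 - pixels[row, x]);
            }
        }
    }

A characterization test asserting on what happens at pixel 0 would capture this surprise as current behaviour, whatever is later decided about fixing it.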
Conclusion
In an ideal world, Characterization Tests would allow you to develop your knowledge of the code until you have a complete understanding of what ‘correct’ operation for that code is (at which point the tests would effectively become Unit or Integration Tests). While that is not guaranteed, a reasonable interim goal may be a suite of tests that check that the operation of the code does not change unintentionally over time.
Whatever the exact outcome, I have certainly found the process very useful, and I recommend it if you have legacy (meaning ‘untested’) code in your codebase.