Write Tests That Matter: Tackle The Most Complex Code First
There are a lot of discussions, articles, and blogs around the topic of code quality. People say: use test-driven techniques! Tests are a must-have to start any refactoring! That’s all cool, but it’s 2016 and there is a massive volume of products and code bases that were built long before these practices took hold, and that still have little or no test coverage.
While I’d like to always be at the leading, or even bleeding, edge of the technology world – engaged with cool new projects and technologies – unfortunately it’s not always possible, and I often have to deal with old systems. I like to say that when you develop from scratch, you act as a creator, mastering new matter. But when you’re working on legacy code, you’re more like a surgeon – you know how the system works in general, but you never know for sure whether the patient will survive your operation. And since it’s legacy code, there are not many up-to-date tests for you to rely on. This means that very frequently one of the very first steps is to cover it with tests. More precisely, not merely to provide coverage, but to develop a test coverage strategy.
Forget 100% coverage. Test smarter by identifying classes that are more likely to break.
Basically, what I needed to determine was which parts (classes/packages) of the system we needed to cover with tests in the first place, where we needed unit tests, where integration tests would be more helpful, and so on. There are admittedly many ways to approach this type of analysis, and the one that I’ve used may not be the best, but it is largely automated: once implemented, the analysis itself takes minimal time and, more importantly, it brings some fun into legacy code analysis.
The main idea here is to rank classes using two metrics: coupling (more precisely, afferent coupling, or CA) and cyclomatic complexity.
The first one measures how many classes use our class, so it basically tells us how close a particular class is to the heart of the system; the more classes there are that use our class, the more important it is to cover it with tests.
On the other hand, if a class is very simple (e.g. contains only constants), then even if it’s used by many other parts of the system, it’s not nearly as important to create a test for. Here is where the second metric can help. If a class contains a lot of logic, the Cyclomatic complexity will be high.
The same logic can also be applied in reverse; i.e., even if a class is not used by many classes and represents just one particular use case, it still makes sense to cover it with tests if its internal logic is complex.
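To make the second metric concrete, here is a rough sketch of how cyclomatic complexity can be estimated for Python source. This is a deliberate simplification of what tools like CyVis compute for Java bytecode, counting only a few kinds of decision points:

```python
import ast

def cyclomatic_complexity(source):
    """Rough cyclomatic complexity: 1 + the number of decision points.
    A simplification -- real tools also count boolean operators,
    case labels, and so on."""
    tree = ast.parse(source)
    decision_nodes = (ast.If, ast.For, ast.While, ast.ExceptHandler)
    return 1 + sum(isinstance(node, decision_nodes) for node in ast.walk(tree))

flat = "def constant():\n    return 42\n"
branchy = "def f(x):\n    if x > 0:\n        for i in range(x):\n            x -= i\n    return x\n"
print(cyclomatic_complexity(flat))     # 1
print(cyclomatic_complexity(branchy))  # 3
```

A class full of constants scores the minimum of 1, while a class dense with branches and loops scores much higher, which is exactly the signal we want.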
There is one caveat, though: neither metric tells the whole story on its own, so the two need to be looked at together.
To summarize: we need to identify classes with high CA and Cyclomatic complexity. In mathematical terms, a fitness function is needed that can be used as a rating - f(CA,Complexity) - whose values increase along with CA and Complexity.
Generally speaking, the classes that score high on both metrics at once should be given the highest priority for test coverage.
Finding tools to calculate CA and Complexity for the whole code base, and provide a simple way to extract this information in CSV format, proved to be a challenge. During my search, I came across two tools that are free so it would be unfair not to mention them:
- Coupling metrics: www.spinellis.gr/sw/ckjm/
- Complexity: cyvis.sourceforge.net/
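Once both tools have produced their CSV exports, the two data sets need to be joined by fully qualified class name. The column names below are assumptions – adjust them to whatever your exports actually contain – but the join itself might look like this:

```python
import csv
import io

# Hypothetical CSV exports -- the real ckjm / CyVis output layout differs,
# so map the columns accordingly.
ca_csv = "class,ca\ncom.app.OrderService,12\ncom.app.Constants,40\n"
cc_csv = "class,complexity\ncom.app.OrderService,27\ncom.app.Constants,1\n"

def merge_metrics(ca_text, cc_text):
    """Join the two metric exports on the class name."""
    ca = {r["class"]: int(r["ca"]) for r in csv.DictReader(io.StringIO(ca_text))}
    cc = {r["class"]: int(r["complexity"]) for r in csv.DictReader(io.StringIO(cc_text))}
    # Keep only classes present in both exports.
    return {name: (ca[name], cc[name]) for name in ca.keys() & cc.keys()}

metrics = merge_metrics(ca_csv, cc_csv)
print(metrics["com.app.OrderService"])  # (12, 27)
```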
A Bit Of Math
The main problem here is that we have two criteria – CA and Cyclomatic complexity – so we need to combine them and convert into one scalar value. If we had a slightly different task – e.g., to find a class with the worst combination of our criteria – we would have a classical multi-objective optimization problem:
We would need to find a point on the so-called Pareto front – the set of points for which neither criterion can be improved without making the other one worse.
There are a lot of techniques that we can use here. Each has its own pros and cons. However, the most popular ones are linear scalarizing and the one based on a reference point. Linear scalarizing looks like this:
f(CA, Complexity) = A×CA + B×Complexity
where A and B are some coefficients.
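As a sketch, linear scalarizing is a one-liner; the weights A and B are free parameters you would have to tune:

```python
def linear_rank(ca, complexity, a=1.0, b=1.0):
    # Weighted sum of the two criteria; a and b set the trade-off.
    return a * ca + b * complexity

# With equal weights, a pure "hub" class (high CA, trivial logic) and a
# pure "logic" class (complex but rarely used) get exactly the same rank.
print(linear_rank(40, 0))  # 40.0
print(linear_rank(0, 40))  # 40.0
```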
The point which represents a solution to our optimization problem will lie on the line (blue in the picture below). More precisely, it will be at the intersection of the blue line and red Pareto front. Our original problem is not exactly an optimization problem. Rather, we need to create a ranking function. Let’s consider two values of our ranking function, basically two values in our Rank column:
R1 = A×CA1 + B×Complexity1 and R2 = A×CA2 + B×Complexity2
Both of the formulas written above are equations of parallel lines; every Rank value defines its own line, and all classes whose points fall on the same line receive the same rank.
Unfortunately, there is an issue with this approach. For any line (Rank value), we’ll have points with very small CA and very big Complexity (and vice versa) sitting on that same line, and they will therefore receive the same rank – even though such one-sided extremes are exactly the cases the ranking should treat differently.
The other way to do the scalarizing is based on a reference point. The reference point is the point with the maximum values of both criteria:
(max(CA), max(Complexity))
The fitness function will be the distance between the Reference point and the data points:
f(CA, Complexity) = √((CA − max(CA))² + (Complexity − max(Complexity))²)
We can think about this fitness function as a circle with its center at the reference point. The radius in this case is the value of the fitness function: the smaller the distance to the reference point, the more important the class.
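A minimal sketch of this reference-point fitness, assuming the metrics arrive as (CA, Complexity) pairs:

```python
from math import sqrt

def reference_point_fitness(points):
    """Distance of each (CA, Complexity) point from the reference point
    (max(CA), max(Complexity)); a smaller distance means a higher
    priority for test coverage."""
    ref_ca = max(ca for ca, _ in points)
    ref_cc = max(cc for _, cc in points)
    return [sqrt((ca - ref_ca) ** 2 + (cc - ref_cc) ** 2) for ca, cc in points]

points = [(12, 27), (12, 2), (3, 27)]
print(reference_point_fitness(points))  # [0.0, 25.0, 9.0]
```

The class that is simultaneously the most coupled and the most complex sits exactly on the reference point, at distance 0.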
This approach deals better with extreme values, but there are still two issues. First, I’d like to have more points near the reference point, to better overcome the problem that we faced with linear scalarizing. Second, CA and Cyclomatic complexity have different scales, so we need to normalize them before measuring distances.
Here is a small trick that we can apply to solve the first issue: instead of looking at the raw CA and Cyclomatic complexity values, we can look at their inverted values. The reference point then becomes (0, 0), and since inversion squeezes large raw values toward zero, many more points end up near it.
Inverted and normalized complexity – NormComplexity:
NormComplexity = (1 + min(Complexity)) / (1 + Complexity) × 100
Inverted and normalized CA – NormCA:
NormCA = (1 + min(CA)) / (1 + CA) × 100
Note: I added 1 to make sure that there is no division by 0.
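The inversion and normalization above can be sketched as one small helper (the two `1 +` terms mirror the division-by-zero guard just mentioned):

```python
def invert_and_normalize(values):
    """Map raw metric values into (0, 100]: the smallest raw value maps
    to 100, and large raw values approach 0."""
    lowest = min(values)
    return [(1 + lowest) / (1 + v) * 100 for v in values]

print(invert_and_normalize([0, 1, 3]))  # [100.0, 50.0, 25.0]
```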
The following picture shows a plot with the inverted values:
Final Ranking
We are now coming to the last step: calculating the rank. As mentioned, I’m using the reference point method, so the only thing we need to do is calculate the length of the vector, normalize it, and invert it so that it grows with the importance of creating a unit test for a class. Here is the final formula:
Rank(NormComplexity, NormCA) = 100 − √(NormComplexity² + NormCA²) / √2
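Putting the formula into code, together with a sketch of sorting classes by the resulting rank (the class names and metric values are made up):

```python
from math import sqrt

def rank(norm_complexity, norm_ca):
    # Distance from the reference point (the origin in inverted space),
    # rescaled to [0, 100] and inverted so that higher = more test-worthy.
    return 100 - sqrt(norm_complexity ** 2 + norm_ca ** 2) / sqrt(2)

# Hypothetical, already inverted-and-normalized metrics:
# class name -> (NormComplexity, NormCA)
normalized = {"OrderService": (5.0, 10.0), "Constants": (100.0, 40.0)}
by_priority = sorted(normalized, key=lambda c: rank(*normalized[c]), reverse=True)
print(by_priority)  # ['OrderService', 'Constants']
```

The division by √2 caps the distance term at 100, so the rank always stays within [0, 100]: a class at the reference point gets 100, and the simplest, least-coupled class gets 0.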
More Statistics
There is one more thought that I’d like to add, but let’s first have a look at some statistics. Here is a histogram of the Coupling metrics:
What is interesting about this picture is the number of classes with low CA (0–2). Classes with a CA of 0 are either not used at all or are top-level services. These represent API endpoints, so it’s fine that we have a lot of them. But classes with a CA of 1 are the ones that are directly used by the endpoints, and we have more of these classes than endpoints. What does this mean from an architecture/design perspective?
In general, it means that we have a kind of script-oriented approach: we script every business case separately (and we can’t really reuse the code, as the business cases are too diverse). If that is the case, then it’s definitely a code smell and we need to do refactoring. Otherwise, it means the cohesion of our system is low, in which case we also need refactoring, but architectural refactoring this time.
Additional useful information we can get from the histogram above is that we can filter the classes with the lowest coupling out of the analysis entirely, which considerably reduces its scope.
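As one possible cut, dropping the least-coupled classes before ranking might look like this (the metrics dict and the CA > 1 threshold are illustrative, not prescriptive):

```python
# Hypothetical data: class name -> (CA, cyclomatic complexity).
metrics = {"Endpoint": (0, 4), "Helper": (1, 9), "Core": (7, 31)}

# Keep only classes with CA above 1 to shrink the analysis scope.
filtered = {name: m for name, m in metrics.items() if m[0] > 1}
print(filtered)  # {'Core': (7, 31)}
```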
You can find all the scripts and resources that I have used in this GitHub repository:
Does It Always Work?
Not necessarily. First of