Net Objectives

Thursday, December 8, 2011

Lies, Damned Lies, and Code Coverage

As unit testing has gained a strong foothold in many development organizations, many teams are now laboring under a code coverage requirement: typically 75%-80% of the code must be covered by unit tests.  Most popular Integrated Development Environments (IDEs) include tools for measuring this percentage, often as part of their testing framework.

Let’s ask a couple of questions, however:
  1. "What does code coverage actually measure?"
  2. "What does mandating a code coverage percentage get you?"
These two will yield a third:
  3. "Is code coverage actually useful for anything?"


What does code coverage actually measure?

Test-related code coverage measures the percentage of code[1] that is executed when the suite of unit tests runs.  By demanding a high percentage of coverage, management is attempting to ensure quality; the premise being that if the code is invoked during the suite's execution, it is therefore guaranteed to be correct.

But, consider this:

// pseudocode
class Foo {
    public ret someAlgorithm(par parameter){
        // some complex algorithm that should be tested
    }
    public ret someOtherAlgorithm(par parameter) {
        // some other complex algorithm that should be tested
    }
}

class FooTest {
    public void testOfFooBehavior() {
        Foo testFoo = new Foo();
        testFoo.someAlgorithm(Any.par());
        testFoo.someOtherAlgorithm(Any.par());
        assertTrue(true);
    }
}

Anyone want to run the code coverage on this?  It is going to clock in at 100%, assuming the algorithms comprise single code execution paths.  You might need to do a bit more if the paths branch (calling the methods with different parameters), or more still depending on the type of coverage you're aiming for, but the test will always pass.  It’s a test of nothing (true always being, you know, true).

Code coverage does not measure how much code is tested; it measures how many lines of code are executed. Now, I can hear you saying “yeah, but that’s a completely contrived example! Why would anyone do that?”

Even if the developers would not dare to do something so brazen, they might be tempted to write the simplest tests they could, perhaps using a tool that automatically generates a test per method to save time. These tests would simply reflect the current code's behavior, not the correct behavior of the system; what the system does, not what it needs to do.
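
As a sketch of the problem (the ShippingCalculator class and its numbers are invented here, not taken from any real code base), such an auto-generated test simply pins whatever the method happens to return today:

import org.junit.Test;
import static org.junit.Assert.assertEquals;

// Hypothetical production class, invented purely for illustration.
class ShippingCalculator {
    // Suppose this quietly rounds the weight down -- a bug nobody has noticed.
    public double cost(double weightKg) {
        return Math.floor(weightKg) * 1.25;
    }
}

public class ShippingCalculatorGeneratedTest {
    // The kind of test a generate-a-test-per-method tool (or a hurried developer)
    // produces: the expected value is copied from what the code currently returns.
    @Test
    public void testCost() {
        // 11.25 is simply today's output for 9.8 kg. The requirement may well say
        // the answer should be 12.25, but this test now defends the existing behavior.
        assertEquals(11.25, new ShippingCalculator().cost(9.8), 0.001);
    }
}

Coverage of ShippingCalculator: 100%.  Confidence that it is correct: none.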

Why would they do that?

Why indeed.  That gets us to question number two.

What does mandating a code coverage percentage get you?

There is an old adage in project management: “You get what you measure”[2].  Woe betide the organization that decides to pay its developers based on the number of bugs they fix per quarter; there will be a lot of bugs to fix in that code!  Or, more realistically, many teams have been compensated for the number of lines of code they generate.  Not surprisingly, they have been writing lots and lots of unnecessary code.  This is just human nature.

If developers are writing unit tests because “the boss says so” then they have no real professional or personal motivation driving the activity.  They’re doing it because they have to, not because they want to.  Thus, they will put in whatever effort they have to in order to increase their code coverage to the required level, and not one bit more.  It becomes a “tedious thing I have to do before I can check in my code, period.”

At a recent Net Objectives conference a member of the audience[3] came up after a talk we gave on TDD and shared a piece of code he had found in a code base he had inherited.  It was a class with a single method that did something legitimate (but was, apparently, difficult to write a unit test for -- maybe it had a void return).  But the developer had added a second method.  This second method created an integer i, incremented the integer 700 times (not in a loop, but literally 700 “i++;” increments), and returned the result.  His unit test then called this second method and asserted that the return was 700.  Because this bogus method was so lengthy, he got his 75% code coverage without calling the legitimate method at all.  How had he arrived at 700?  He probably started with a smaller number and kept copying-and-pasting the “i++;” until the coverage hit 75%.
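
We never saw the original, but a reconstruction of the trick might look something like this (the names are ours, and we have shortened the padding considerably):

// A reconstruction, invented for illustration; only the shape of the trick
// comes from the story above.
class LegacyWidget {
    // The legitimate method -- awkward to test, so it never was.
    public void doSomethingUseful() {
        // ... real work ...
    }

    // The padding: lines that exist only to be "covered."
    public int padCoverage() {
        int i = 0;
        i++; i++; i++; i++; i++;
        i++; i++; i++; i++; i++;
        // ... in the real story, 700 of these ...
        return i;
    }
}

class LegacyWidgetTest {
    @org.junit.Test
    public void testPadCoverage() {
        // Passes, inflates the percentage, verifies nothing that matters.
        org.junit.Assert.assertEquals(10, new LegacyWidget().padCoverage());
    }
}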

Here again, this is a rather extreme case.  What’s not so extreme is leaving code in place that is actually never used (“dead code”) simply because it has tests, and since removing the code would mean removing the tests, this would lower the code coverage.  Should we keep dead code just so we can keep the tests? 

You get what you measure.

The only way to get developers to write the tests we really want them to write (and the only way to reliably get anyone to do anything, frankly) is to point out to them why they should care, what benefit will accrue to them if they spend time, energy, and passion to create them.  Most of the other topics we will write about in this work will, in one way or another, provide this motivation[4]. 

But... then is code coverage actually useful for anything? 

Yes, and here we will see an example of something that occurs repeatedly in TDD: using a tool for something other than what it was intended for.  Something better.

Often in TDD, especially in the initial stage of developing some particular behavior of the system, we find ourselves less than certain about how to proceed, what exactly a requirement means, or just what the system’s code should do.  When we’re “in the weeds” we might choose to investigate the issue by writing a lot of small tests to work out the edges, boundaries, and permutations of a behavior in order to improve our understanding of it.  These “triangulation tests” [5] can be very useful, but they are often largely redundant.  Once we have the understanding we need, can write the proper test, and can create the proper behavior in the system to get it to pass, we will want to remove some, most, or all of the triangulation tests.

But... is it some, most, or all?  Here is where the code coverage measurement will help.  Before removing a test that you believe to be redundant, run the coverage percentage and note it.  It should, in TDD, be 100% or very close to it.  Now remove the theoretically redundant test.  Finally, re-run the coverage percentage.  If it has slipped, even a little, then one of two things must be true: either the test you removed was not entirely redundant, or you have dead code somewhere in the system.  Either way, now is the time to figure it out and fix it.
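
A minimal sketch of that bookkeeping, using an invented Discount class (none of this comes from a real code base): three triangulation tests cover both branches, and the coverage number tells us which ones are safe to drop.

import org.junit.Test;
import static org.junit.Assert.assertEquals;

// Hypothetical class, invented to illustrate triangulation tests.
class Discount {
    public double apply(double price) {
        if (price >= 100.0) {
            return price * 0.9;   // 10% off large orders
        }
        return price;             // small orders pass through unchanged
    }
}

public class DiscountTest {
    // Triangulation tests written while we were "in the weeds."
    @Test public void noDiscountBelowThreshold() { assertEquals(50.0,  new Discount().apply(50.0),  0.001); }
    @Test public void discountAtThreshold()      { assertEquals(90.0,  new Discount().apply(100.0), 0.001); }
    @Test public void discountAboveThreshold()   { assertEquals(180.0, new Discount().apply(200.0), 0.001); }
}

Delete discountAboveThreshold() and re-run the coverage: if it holds at 100%, that test really was redundant.  Delete noDiscountBelowThreshold() instead and watch the number slip: it was the only test exercising the pass-through branch, so either keep it or ask whether that branch is dead code.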

Here’s another use:  In TDD we usually find that the test suite, once we’re done developing the system, will serve other purposes.  One such purpose is this: if you come back to the system six months later, the suite of tests might be the best thing to read in order to get re-familiarized with the system.  If they all compile and pass, then they are accurate to the system [6].  However, can we be certain that no one has added to the system without adding to the test suite?  Sure.  Run the code coverage.  If it’s not 100% then someone has enhanced the system without doing TDD, and you know it.

Developers who run code coverage for these purposes love their coverage tool.  And, as we’ll see, the kind of tests we’ll be learning about in this work will be the tests that developers love, care for, and always keep current to the system.  Because they help us to succeed.

----

[1] We know there are different types and levels of code coverage; this blog is relevant for all of them. See http://en.wikipedia.org/wiki/Code_coverage for more on the subject.
[2] This is often attributed to Lord Kelvin, but he actually said “If you cannot measure it, your knowledge is meager and unsatisfactory.”  Tom Peters’s paraphrase is more to our point: “What gets measured, gets done.”  Or we can go to Albert Einstein, who wrote on his wall: "Not everything that counts can be counted, and not everything that can be counted counts."
[3] Paddy Healey is the gentleman.  
[4] This should not be read as a slam on developers, btw.  We’re often given bureaucratic tasks to complete in life, and it’s understandable that we put little energy into them.  We simply need to make sure our tests are not in that category!
[5] Much more on this in another blog.
[6] Much much more on this in another blog!

7 comments:

  1. I agree. I also have a suspicion that mutation testing fulfills the original intent of code coverage better than code coverage ever could. When mutation testing fails it says "this line of code isn't justified with a test" as opposed to code coverage saying "sure... the instruction pointer passes through this line of code when you run some tests."

  2. Little typo:
    "guaranteed to by correct"

  3. Max: For mutation testing, are you referring to something like Jester?

  4. Hi, guys!

    Really looking forward to hearing from you, since I enjoy reading your posts very much!

    By the way, are you planning to take on the 'how to start' topic? There are really different approaches on how to start from the blank sheet - either write a test and assume one object does everything for you, then implement the simplest thing that could work (even if naive) and refactor a more complex structure as you go, and the second approach I came across is to define some (simple) collaboration model upfront and use mocks for roles in that model to cater for what you don't have. So I'm kind of confused on the topic and I'd like to hear your take on it!

    Best regards and happy new year!

  5. We'll add it to the list of topics, Astral. Thanks for the suggestion!
