Testing and Code Coverage


Introduction

Hopefully, most of us are aware of what testing is and how it relates to the software development process. It is not my intention to provide an overview of how to test Perl code, but rather to focus on how using code coverage can help in that process.


What is code coverage?

Simply put, code coverage is a way of ensuring that your tests are actually testing your code. When you run your tests you are presumably checking that you are getting the expected results. Code coverage will tell you how much of your code you exercised by running the test. Your tests may all pass with flying colours, but if you've only tested 50% of your code, how much confidence can you have in it?

There are a number of criteria that can be used to determine how well your tests exercise your code. The simplest is statement coverage, which tells you whether you exercised the statements in your code. We will examine statement coverage along with some other coverage criteria a little later.

When working with code coverage, and when testing in general, it is wise to remember the following quote from Dijkstra, who said:

  Testing never proves the absence of faults, it only shows their presence.

Using code coverage is a way to try to cover more of the testing problem space so that we come closer to proving the absence of faults, or at least the absence of a certain class of faults.

Bear in mind, though, that code coverage is just one weapon in the software engineer's testing arsenal.


Where does code coverage fit into the testing process?

Black box testing

Black box testing treats the object being tested as a black box; that is, the test should have no knowledge of the object's internals. The test will simply use the interface provided by the object and will ensure that the outputs are as expected. Should the internals of the object change slightly, or even radically, the test should still pass provided that the interface remains constant.

White box testing

White box testing looks inside the object being tested and uses that knowledge as part of the testing process. So, for example, if it is known that a certain variable is designed to take values within a specific range, a test might check what happens when that variable is given values around its extremes.

Code coverage

Code coverage is a white box testing methodology; that is, it requires knowledge of and access to the code itself rather than simply using the interface provided. Code coverage is probably most useful during the module testing phase, though it also has benefit during integration testing and probably at other times, depending on how and what you are testing.

Regression tests are usually black box tests and as such may be unsuitable for use with code coverage. But often, especially in the Perl world, module, integration, regression and any other tests you might perform all use the same test code, just at different times.


How do people normally write tests?

The flippant answer to that question is ``they don't''. Unfortunately, that answer is the correct one in far too many cases. I'm not going to evangelise about the importance of writing tests; people who don't write tests have no use for code coverage and, in fact, are unable to use code coverage until they do write a test suite.

Sometimes people quickly write small tests to check the major functionality of their code. This is a sanity check and, whilst it has its place, it cannot replace a proper test suite.

``How's the project coming along?'' ``It's 99% done, I just need to test it.''

If tests are written after development is completed there is no way to gauge the quality of the code while it is in development. What happens if the tests show major implementation or design flaws? Or even minor ones? It is much better to have tests available to be run as the code is developed. Some people propose that tests should be written before the implementation. This has a lot of merit in certain situations. For the purposes of code coverage, it is often better to write a test while the code it is testing is being developed, or at least to write more tests at that time.

No matter how well you write your code, you will get a bug report sooner or later, even if it's only from yourself. That's a good time to write a test. You can use it to reproduce the bug, make sure you've fixed it and, when the test is added to the test suite, make sure it doesn't come back again.


Code coverage metrics

A number of different metrics are used to determine how well exercised the code is. I'll describe some of the most common metrics here. Most of the metrics have slight variations and synonyms which can make things a little more confusing than they need to be. While I'm describing each metric I'll also show what class of errors it can be used to detect.

Statement coverage

Statement coverage is the most basic form of code coverage. A statement is covered if it is executed. Note that a statement does not necessarily correspond to a line of code. Multiple statements on a single line can confuse issues - the reporting if nothing else.
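To make the idea concrete, here is a minimal sketch of counting statement execution by hand. The %hits hash and the labels s1 to s3 are invented for illustration; a real coverage tool gathers this information for you without any changes to your source. Note that two statements share each instrumented line.

```perl
use strict;
use warnings;

my %hits;    # hypothetical hit counter, keyed by an invented statement label

sub classify {
    my ($n) = @_;
    $hits{s1}++; my $label = "small";
    if ($n > 10) {
        $hits{s2}++; $label = "large";    # only reached for large inputs
    }
    $hits{s3}++; return $label;
}

# A single test with a small input never reaches the statement labelled s2.
classify(3);

my @all     = qw(s1 s2 s3);
my @covered = grep { $hits{$_} } @all;
printf "%d of %d statements covered\n", scalar @covered, scalar @all;
```

Adding a second call such as classify(11) would bring the labelled statements to full coverage.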

Where there are sequences of statements without branches it is not necessary to count the execution of every statement; just one will suffice. Even so, people often like the count of every line to be reported anyway, especially in summary statistics.

This type of coverage is relatively weak in that even with 100% statement coverage there may still be serious problems in a program which could be discovered through the use of other metrics. Even so, the first time that statement coverage is used in any reasonably sized development effort it is very likely to show up some bugs.

It can be quite difficult to achieve 100% statement coverage. There may be sections of code designed to deal with error conditions, or rarely occurring events such as a signal received during a certain section of code. There may also be code that should never be executed:

  if ($param > 20)
  {
      die "This should never happen!";
  }

It can be useful to mark such code in some way and flag an error if it is executed.

Statement coverage, or something very similar, can also be called statement execution, line, block, basic block or segment coverage.

Branch coverage

The goal of branch coverage is to ensure that whenever a program can jump, it jumps to all possible destinations. The simplest example is a complete if statement:

  if ($x)
  {
      print "a";
  }
  else
  {
      print "b";
  }

Full coverage is achieved here only if $x is true on one occasion and false on another.

Achieving full branch coverage will protect against errors in which some requirements are not met in a certain branch. For example:

  if ($x)
  {
      $h = { a => 1 };
  }
  else
  {
      $h = 0;
  }
  print $h->{a};

This code will fail if $x is false (and you are using strict refs).
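The failure is easy to reproduce. The following sketch wraps the example in a subroutine (the name lookup is mine, not part of any library) and uses eval to catch the error so that both branches can be exercised in one run.

```perl
use strict;
use warnings;

# Runs the example for a given $x and reports either the value found
# or the error raised under strict refs.
sub lookup {
    my ($x) = @_;
    my $h;
    if ($x) {
        $h = { a => 1 };
    }
    else {
        $h = 0;
    }
    my $value = eval { $h->{a} };    # dies when $h is not a reference
    return defined $value ? "ok: $value" : "died: $@";
}

print lookup(1), "\n";
print lookup(0), "\n";
```

A test suite that only ever calls lookup(1) passes happily; it takes a test for the else branch to expose the problem.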

In such a simple example statement coverage is just as powerful. But branch coverage should also allow for the case where the else part is missing (and, in languages which support the construct, switch statements should be catered for too):

  $h = 0;
  if ($x)
  {
      $h = { a => 1 };
  }
  print $h->{a};

100% branch coverage implies 100% statement coverage.

Branch coverage is also called decision, arc or all edges coverage.

Path coverage

There are classes of errors which branch coverage cannot detect, such as:

  $h = 0;
  if ($x)
  {
      $h = { a => 1 };
  }
  if ($y)
  {
      print $h->{a};
  }

100% branch coverage can be achieved by setting ($x, $y) to (1, 1) and then to (0, 0). But if we have (0, 1) then things go bang.

The purpose of path coverage is to ensure that all paths through the program are taken. In any reasonably sized program there will be an enormous number of paths through the program and so in practice the paths can be limited to those within a single subroutine, if the subroutine is not too big, or simply to two consecutive branches.

In the above example there are four paths which correspond to the truth table for $x and $y. To achieve 100% path coverage they must all be taken. Note that missing elses count as paths.
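As a sketch, the four paths can be walked by brute force. The subroutine name run_path is invented for the purpose, and the print has been replaced by a return value so that the output stays tidy.

```perl
use strict;
use warnings;

# The example from above, wrapped in a subroutine.
sub run_path {
    my ($x, $y) = @_;
    my $h = 0;
    if ($x) {
        $h = { a => 1 };
    }
    my $out = "";
    if ($y) {
        $out = $h->{a};    # dies under strict refs if $h is still 0
    }
    return $out;
}

# Try every row of the truth table and note which ones go bang.
my @failures;
for my $x (0, 1) {
    for my $y (0, 1) {
        my $ok = eval { run_path($x, $y); 1 };
        push @failures, "($x, $y)" unless $ok;
        printf "(%d, %d): %s\n", $x, $y, $ok ? "ok" : "dies";
    }
}
```

Only the (0, 1) path fails, and that is exactly the path which a test suite satisfied with 100% branch coverage need never take.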

In some cases it may be impossible to achieve 100% path coverage:

  a if $x;
  b;
  c if $x;

50% path coverage is the best you can get here. Ideally, the code coverage tool you are using will recognise this and not complain about it, but unfortunately we do not live in an ideal world. And anyway, solving this problem in the general case requires a solution to the halting problem, and I couldn't find a module on CPAN for that.
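To see why, here is a small sketch which records the paths actually taken, assuming that b does not change $x. The statements a, b and c are stood in for by strings pushed onto a list.

```perl
use strict;
use warnings;

my %seen;    # paths actually taken, keyed by the statements executed
for my $x (0, 1) {
    my @path;
    push @path, "a" if $x;
    push @path, "b";
    push @path, "c" if $x;
    $seen{ join "", @path }++;
}

# The four paths a naive tool would expect to see.
my @all_paths = qw(b ab bc abc);
my @feasible  = grep { $seen{$_} } @all_paths;
print scalar(@feasible), " of ", scalar(@all_paths), " paths reachable: @feasible\n";
```

The paths ab and bc would need $x to be both true and false within a single run, so no test can ever take them.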

Loops also contribute to paths, and pose their own problems which I'll ignore for now.

100% path coverage implies 100% branch coverage.

Path coverage and some of its close cousins are also known as predicate, basis path and LCSAJ (Linear Code Sequence And Jump) coverage.

Condition coverage

When a boolean expression is evaluated it can be useful to ensure that all the terms in the expression are exercised. For example:

  a if $x || $y;

To achieve full condition coverage, this expression should be evaluated with $x and $y set to each of the four combinations of values they can take.

In Perl, as is common in many programming languages, most boolean operators are short-circuiting operators. This means that the second term will not be evaluated if the value of the first term has already determined the value of the whole expression. For example, when using the || operator the second term is never evaluated if the first evaluates to true. This means that for full condition coverage there are only three combinations of values to cover instead of four.
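The effect of short-circuiting can be observed directly. In this sketch the helper term (a name invented here) records each term as it is evaluated, so we can count how many distinguishable combinations the || operator really has.

```perl
use strict;
use warnings;

my @evaluated;    # names of the terms evaluated for the current combination

# Records that a term was evaluated and passes its value through.
sub term {
    my ($name, $value) = @_;
    push @evaluated, $name;
    return $value;
}

my %distinct;     # combinations that can actually be told apart
for my $x (0, 1) {
    for my $y (0, 1) {
        @evaluated = ();
        my $result = term("x", $x) || term("y", $y);
        my $key = "x=$x, y=" . (@evaluated > 1 ? $y : "not evaluated");
        $distinct{$key}++;
    }
}

print "$_\n" for sort keys %distinct;
print scalar(keys %distinct), " combinations to cover\n";
```

When $x is true the value of $y makes no difference, so the two rows of the truth table with $x true collapse into one.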

Condition coverage gets complicated, and difficult to achieve, as the expression gets complicated. For this reason there are a number of different ways of reporting condition coverage which try to ensure that the most important combinations are covered without worrying about less important combinations.

Expressions which are not part of a branching construct should also be covered:

  $z = $x || $y;

Condition coverage is also known as expression, condition-decision and multiple decision coverage.

Time coverage

OK, this isn't really code coverage at all, it's profiling of a sort, but while we're seeing what code gets exercised, why not just see how long it takes for it to be exercised? Maybe it will show up some problems with the algorithm being used, or something.

Documentation coverage

No, this isn't really code coverage either, but documentation is important, right? So let's try and remind people to write some, at least something about each function in the public API, anyway.


How to use code coverage

So we are all enthused about using code coverage to help ensure that our software is well tested. How do things work in practice?

Let's assume that you have successfully run your test suite with your code coverage tool. (I know that is a very big assumption, but anything else is beyond the scope ...) You examine the output from your tool and it tells you that you have achieved 86% statement coverage. Not bad, but hardly stellar. You look at a more detailed report and notice that a couple of methods have never been called. You have two options: write tests for the methods, or decide they are not needed and delete them. Well, I suppose you've always got the option to bury your head in the sand, but let's assume that's not an option we'll take. You decide to write some tests which test these methods, your code coverage increases, your test suite is strengthened, and the world is a happy place.

Unfortunately, it's not always quite this easy. For example, you might notice that a certain missing else clause is never exercised, so you decide to write a test to take that branch, but when you run the test the program performs incorrectly. It may be an error in the implementation, or even in the specification. Maybe no one had even considered what should happen for the test you just wrote.

Sometimes it is impossible to write a test to exercise some code. This can also show errors or omissions in the implementation or specification. For example, the following code hasn't been fully thought through:

  $x = 0;
  print "a" if $x;

Given that the use of code coverage can find errors even in the specification, it makes sense to use it as early as possible in the development process. It also makes sense to run your tests regularly with code coverage turned on. It is probably a good idea to have a cover target in your makefile, or to do something similar so that it is easy to get code coverage data.

Here's a suggested way to use code coverage as part of the development process.

  Decide you need a new program / module / method / subroutine / hack.
  Write tests and documentation.
  Write the code.
  Run the tests.
  Check the coverage.
  Add tests or refine the design and code until coverage is satisfactory.
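As an illustration of the "write tests" and "write the code" steps, here is a minimal sketch using Test::More. The subroutine clamp is a made-up example and would normally live in its own module rather than in the test file.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical subroutine under test.
sub clamp {
    my ($value, $min, $max) = @_;
    return $min if $value < $min;
    return $max if $value > $max;
    return $value;
}

# One test for each way out of the subroutine.
is clamp( 5, 0, 10),  5, "value inside the range is returned unchanged";
is clamp(-3, 0, 10),  0, "value below the range is raised to the minimum";
is clamp(42, 0, 10), 10, "value above the range is lowered to the maximum";

done_testing;
```

Remove any one of the three tests and a coverage report should show you which return statement has gone unexercised.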

This may or may not be more fun than your current method, depending on your current method, your definition of fun and how much you enjoy bug reports and maintenance. It's also something of an ideal, and we have already established that we don't live in an ideal world. However, simply striving for the ideal, even though we may not actually reach it, will bring its own reward.


What coverage metrics to use and what percentages to aim for

It's usually a good idea to start with the simplest metrics and move on to the more powerful ones later. In practice this means starting with statement coverage. Then, when you are happy with the coverage you have achieved there, move on to branch coverage, then to condition and path coverage.

Whilst it would be nice to be able to achieve 100% coverage for all the criteria, that is probably not a sensible goal for any but the smallest of projects. So what is a sensible goal? Well, it depends. It depends on the goals of the project you are working on. It depends on the cost of failure. It depends on what the software or hardware will be used for and by how many people. It depends on how late into the project you start using code coverage. It depends on the priorities of the people you work with or for. It depends on the way the software was designed and written. It depends on the code coverage tool you are able to use. It just depends.

But, in general, you'll want statement coverage to be way up there. For each statement that is uncovered, you'll want to understand why. If it is a statement that should never be executed then your coverage tool may provide a way to flag that, and even to report if it is ever executed. That way, statement coverage can approach 100%, but in any case you'll probably want to be aiming for 95% at least.

Branch coverage is unlikely to be as high as statement coverage. You might be satisfied with 90% branch coverage, especially if you understand what happened to the other 10%. Path coverage will probably be lower still, but this figure will almost certainly be at the mercy of what, if anything, your coverage tool defines a path to be. The value of condition coverage you achieve will also be dependent on how this metric is defined by your coverage tool. It will also, to a large extent, depend on the complexity of the expressions you are using.

When using code coverage, as with every other part of the software engineering process, it is important to be pragmatic. Rightly or wrongly, when I write software at home I have different aims to when I program at work. I even have different aims in the different projects I work on. This applies to the use of code coverage as much as it applies to anything else.


Danger, Will Robinson! Danger!

So, what's the downside? Code coverage isn't all that new, why isn't everybody routinely using it everywhere? Well, some of the reasons are the same as for why people aren't using a host of good ideas, tools and methodologies in the software development process. But let's look at problems specific to code coverage.

To start with, a lot of people just don't know about code coverage. Even people who really should know about it may not. And even if people do know about it, they may not fully understand it, or may not see the benefits it provides. So the first hurdle may be to get agreement, if not enthusiasm, from all concerned for the use of code coverage.

Then you have to find a code coverage tool. Unfortunately, this is not always as easy as it sounds, even if you have a corporate wad to wave around. Good code coverage tools seem to be few and far between, even for relatively common programming languages and platforms.

Once you have your coverage tool, you need to learn how to use it. You need to understand its capabilities and limitations. Of course, this is no different to the use of any other relatively complex tool. That done, you will need to run your test suite using the coverage tool. This might be where things start to get interesting. If you are lucky everything will just work, and you'll get a nice report at the end. If you are unlucky you may hit a host of problems.

Because code coverage is a white box testing technique, it is necessary for the coverage tool to look at and understand your source code. It is possible that some of your code may not be recognised by, or may not work correctly with the coverage tool. Different tools will use different methods for calculating the code coverage. One method is to instrument your source code by adding extra code to it, and then executing the program containing this extra code. It is possible that this will produce different results to your original code. Of course, these are bugs or at least deficiencies in the coverage tool, but did I mention that this world is not always ideal?

If the coverage tool does understand and work with your code, you will have to accept an overhead for using code coverage. This is primarily a time overhead, but the calculation and storage of code coverage data will obviously require a certain amount of resources too. The overhead will vary between tools, based on what coverage criteria they are checking, the way they calculate the coverage and how efficient they are. The time overhead will probably be anywhere from 1.2 times for an efficient tool checking only statement coverage to 20 times or more for a less efficient tool checking many coverage criteria. This might not be too important if it only takes a minute to run your test suite, but if it takes a week then there's a chance that the coverage data won't be available until after the release.

So now you have run your test suite with your coverage tool and you have a nice report telling you which parts of your code weren't exercised. If ignorance is bliss, what does that mean for your coverage report? Improving code coverage can be a major undertaking, but then so can any serious drive to improve quality.

When you have improved your tests and are satisfied with the level of code coverage you are achieving, remember that the quality of your product is not guaranteed, merely improved. Do not rely exclusively on code coverage as a measure of product quality.


What code coverage won't do

Your code coverage tool will have various limitations in the coverage it performs. One would hope that statement and branch coverage would be almost universally available and consistent, but a number of coverage tools handle only statement coverage. Condition and path coverage, where available, will almost certainly not provide complete information, especially in the case of path coverage. Other coverage criteria may or may not be catered for.

Data coverage

Consider the following program:

  my $input = int shift;
  my @squares = (0, 1, 5, 9);
  if ($input < 4 && $input > -4)
  {
      print $squares[abs $input];
  }

I can get full code coverage by running three tests, with the input values -4, 0 and 4 (for example). Of course, that won't help a bit when someone wants to know what two squared is.

Some form of data coverage might come in handy here, checking that you have accessed each of the values in @squares. But then again, maybe you are already doing that, and the values you are checking for come from the same source that was used during implementation. No form of coverage can help you there.
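Here is a rough sketch of what such data coverage might look like. The %accessed hash and the subroutine square are inventions of mine and not a feature of any coverage tool I know of; the wrong entry in @squares is kept deliberately.

```perl
use strict;
use warnings;

my @squares = (0, 1, 5, 9);    # the entry for 2 is wrong, as in the program above
my %accessed;                  # hypothetical record of which indices were read

# The program above as a subroutine, noting each index it looks up.
sub square {
    my ($input) = @_;
    $input = int $input;
    if ($input < 4 && $input > -4) {
        my $index = abs $input;
        $accessed{$index}++;
        return $squares[$index];
    }
    return;
}

# The three tests which give full code coverage.
square($_) for -4, 0, 4;

my @untested = grep { !$accessed{$_} } 0 .. $#squares;
print "indices never read by the tests: @untested\n";
print "square(2) returns ", square(2), "\n";
```

The three tests read only the first element of @squares, which is a strong hint that the suite is thinner than the coverage figures suggest.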

Regular expressions

Or whatever they're called now. They might not be regular and they might not be expressions, but they just might be little languages all of their own that deserve to be tested as much as anything else. Regular expressions have their own version of statements, branches, paths and conditions, and that's before we even start thinking about embedded Perl.
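For example, an alternation within a regular expression is a branch in all but name. In this sketch the pattern and the test inputs are invented; every line of Perl is executed, yet one of the two alternatives is never taken.

```perl
use strict;
use warnings;

# Matches either a decimal number or a hexadecimal number with a 0x prefix.
my $number = qr/^(?:(\d+)|0x([0-9a-f]+))$/i;

my %taken = (decimal => 0, hex => 0);    # which alternative matched

# A test suite which only ever supplies decimal numbers.
for my $input ("42", "7", "1000") {
    next unless $input =~ $number;
    $taken{ defined $1 ? "decimal" : "hex" }++;
}

print "$_ alternative taken $taken{$_} time(s)\n" for sort keys %taken;
```

Statement coverage of the Perl code is complete here, but the hexadecimal half of the pattern could be entirely wrong and nobody would know.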


Some code coverage tools

On to the specifics. What can we use to get code coverage information for hacking Perl and perl? And XS, since that sort of falls in the middle.

Devel::Coverage

Devel::Coverage was written by Randy J Ray. Version 0.1 was released on 1st September 1997 and version 0.2 was released on 17th July 2000. It is described as alpha code, but seems to work reasonably well. I was going to mention something about the code being stable and reference Jarkko's .sig, but Randy popped up on p5p recently promising a new release soon.

Devel::Coverage interfaces with the Perl debugger in order to calculate the code coverage. This means that it is restricted to statement coverage only, and it also makes it fairly slow, but it does mean that there are no dependencies outside of the perl core.

It is simple to use:

  perl -d:Coverage script_name arg1 arg2 ...

and coverage data can be seen with:

  coverperl script_name.cvp

Devel::Coverage requires perl 5.005, but unfortunately it doesn't work with 5.8.0.

Devel::Cover

Devel::Cover was written by me. Version 0.01 was released on 9th April 2001 and version 0.14 was released on 5th March 2002, but there should be a new version available by the time you read this. It is also described as alpha code.

Why did I write Devel::Cover, when Devel::Coverage was already there? Primarily, because I wanted to be able to check other types of coverage than just statement coverage. That required a fundamentally different approach to collecting the coverage data, and the infrastructure for that approach was not available until perl 5.6.1 and perl 5.7.1.

Devel::Cover does not use the Perl debugger, but instead it uses a pluggable runops function to gather the coverage data. Perl's runops function is a small function that does little more than loop through all the opcodes that make up your Perl program, running the appropriate functions and moving between the ops as the program dictates. It is possible for a module to replace this function with one of its own, and that is what Devel::Cover does. This allows for the tracking of each op as it is executed, and it is here that the coverage data are gathered.
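The idea can be illustrated with a toy model written in Perl itself. To be clear, this is only an analogy: perl's real runops function is written in C and works on perl's internal op structures, and none of the names below (the @ops list, the runops subroutine, the hook) correspond to anything in the perl source or in Devel::Cover.

```perl
use strict;
use warnings;

# A toy "program": each op does some work and returns the index of the
# next op to run, or undef when the program is finished.
my @ops = (
    { name => "init",  run => sub { my $s = shift; $s->{n} = 3; 1 } },
    { name => "test",  run => sub { my $s = shift; $s->{n} > 0 ? 2 : 4 } },
    { name => "decr",  run => sub { my $s = shift; $s->{n}--; 1 } },
    { name => "never", run => sub { 4 } },    # no op ever jumps here
    { name => "end",   run => sub { undef } },
);

# A toy runops loop: run each op in turn, calling an optional hook first.
sub runops {
    my ($ops, $hook) = @_;
    my %state;
    my $pc = 0;
    while (defined $pc) {
        my $op = $ops->[$pc];
        $hook->($pc, $op) if $hook;
        $pc = $op->{run}->(\%state);
    }
    return;
}

# The replacement behaviour: count every op as it is executed.
my %count;
runops(\@ops, sub { my ($pc, $op) = @_; $count{ $op->{name} }++ });

printf "%-5s %d\n", $_->{name}, $count{ $_->{name} } // 0 for @ops;
```

The op named never shows up with a count of zero, and that is the raw material from which a coverage report is made.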

But users don't care, in general, about the ops that perl is using to run their program. And so in a post processing phase, information about the ops is mapped back to reality using the rather wonderful B modules.

This approach may allow for a fairly low overhead, at least with the more basic criteria. It also means that it is necessary to have a compiler available to be able to compile the XS code in the module.

As we go to press, statement, time and documentation coverage are supported. Condition coverage is mostly there and branch and path coverage still need to be implemented. Documentation coverage is just a front end to Pod::Coverage by Richard Clamp and Michael Stevens, and is unavailable unless you have that module installed.

Devel::Cover is also simple to use:

  perl -MDevel::Cover script_name arg1 arg2 ...

and coverage data can be seen with:

  cover

The module is still a work in progress, but I think that most of the hardest problems have been solved. Well, except for finding a name which won't cause confusion, anyway.

C code

Perl is written in C. XS code is basically C. It would be nice to be able to get code coverage information for our XS code. It would also be nice to see how well the Perl test suite tests perl. Since perl is written in C, just use your C code coverage tool to build perl and run the test suite. If you then use this version of perl to build a module containing XS code you might, if you are lucky, automatically get coverage information for the XS code. Otherwise you will have to see how to integrate your coverage tool with the module build process.

If you compile perl with gcc version 3 this process is simplified. There is a make target perl.gcov which will build a version of perl able to use the code coverage abilities of gcc.

A perennial problem with coverage tools is the back end, used to display the results. For Devel::Cover I decided to try to build a generic back end that could be used not only by Devel::Cover but also by any other coverage tool. To that end I have created a generic database format to store the coverage information, and reporting programs which read the database and display the output appropriately. Devel::Cover comes with a small program which can translate the gcov output into this database format.

The output from the reporting programs is not brilliant. In particular the HTML output is rather clunky. I've been hoping that someone would come along and write a really nice back end, but so far there's no one to blame but me.


Conclusions

Code coverage can help you lose weight as part of a calorie controlled diet.


Author

Paul Johnson - paul@pjcj.net http://www.pjcj.net

I am a professional software engineer who has built commercial code coverage tools for Hardware Description Languages. I have been hacking Perl almost since the beginning, both professionally and for fun, and if I'm lucky, both at the same time. I am interested in the production of high quality software and the processes which facilitate that. I am currently enjoying living and working in Zürich.