December 26, 2005

Strong vs Weak Mutation Testing

On the XPDeveloper page on mutation testing tools someone has asked "What is the difference between 'strong' and 'weak' mutation testing?". I take an interest in this because I wrote Jester, probably the best known strong mutation testing tool for Java.

The point of both strong and weak mutation testing tools is to tell you something about the quality of your tests, inasmuch as the tests ensure that the code does what the tests say it should do. (You can, of course, write tests that test for the wrong things, but I don't know of any tool that can tell you about that.)

Weak Mutation Testing

Consider this example of weak mutation testing, taken from one of Brian Marick's papers with minor edits:

In weak mutation coverage we suppose that a program contains a particular simple kind of fault. One such fault might be using <= instead of the correct < in an expression like this:

if (A <= B)

Given this program, a weak mutation coverage system would produce a message like

"gcc.c", line 488: operator <= might be <

This message would be produced until the program was executed over a test case such that (A <= B) has a different value than (A < B).

That is, a weak mutation testing tool tells you whether your tests only ever make the code run such that (A <= B) has the same value as (A < B). If so, a test might be missing: you should probably have a test in which the value of (A <= B) is different from the value of (A < B), i.e. a test which causes the code to run with A = B.
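To make the weak mutation check concrete, here is a minimal Java sketch. It is not how Marick's tool (or any real tool) works: it assumes the (A, B) values your tests executed have simply been collected into a list, and the class and method names are hypothetical. It keeps reporting the warning until some executed input makes (A <= B) and (A < B) differ. Note that it never looks at whether any test passed or failed, only at what was executed.

```java
import java.util.List;

// Hypothetical sketch of a weak mutation check for one relational operator.
public class WeakMutationCheck {

    interface Comparison { boolean test(int a, int b); }

    static final Comparison ORIGINAL = (a, b) -> a <= b; // if (A <= B)
    static final Comparison MUTANT = (a, b) -> a < b;    // the suspected fault: (A < B)

    // True if at least one executed (A, B) pair makes the two expressions differ.
    static boolean distinguishes(List<int[]> executedInputs) {
        for (int[] input : executedInputs) {
            if (ORIGINAL.test(input[0], input[1]) != MUTANT.test(input[0], input[1])) {
                return true;
            }
        }
        return false;
    }

    // Returns the warning, or null once some executed input has distinguished the mutant.
    static String report(List<int[]> executedInputs) {
        return distinguishes(executedInputs) ? null : "operator <= might be <";
    }

    public static void main(String[] args) {
        List<int[]> withoutEqual = List.of(new int[] {1, 2}, new int[] {3, 2});
        List<int[]> withEqual = List.of(new int[] {1, 2}, new int[] {2, 2});
        System.out.println(report(withoutEqual)); // operator <= might be <
        System.out.println(report(withEqual));    // null: A = B was executed
    }
}
```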

Strong Mutation Testing as done by Jester

Jester (a strong mutation testing tool) tells you something subtly different. Jester indicates whether the tests still pass when the expression (A <= B) is replaced by (A < B). If they do, a test might be missing in which it makes a difference that it's "<=" rather than "<". Note that with Jester, it matters whether the tests pass or not. If the tests run the code with A = B, but that has no effect on whether the tests pass, then weak mutation is satisfied, but strong mutation is not.
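The difference can be sketched in Java by loosely simulating what Jester does: run the tests against the code with "<=" replaced by "<" and see whether they still pass. Jester itself edits the source and reruns your real test suite; here the code under test and the tests are just lambdas, and all the names are hypothetical. The first test runs the code with A = B but never checks that result, so it satisfies weak mutation while the mutant survives; the second asserts on the A = B case and so detects the mutant.

```java
import java.util.List;

// Hypothetical simulation of strong mutation testing for a single mutation.
public class StrongMutationCheck {

    // The code under test, abstracted so the mutant can be swapped in.
    interface Comparison { boolean test(int a, int b); }

    // A unit test returns true if it passes against the given implementation.
    interface UnitTest { boolean passes(Comparison impl); }

    static final Comparison ORIGINAL = (a, b) -> a <= b; // if (A <= B)
    static final Comparison MUTANT = (a, b) -> a < b;    // mutated to (A < B)

    // Executes the code with A = B but ignores that result: weak mutation is
    // satisfied, yet nothing the test checks depends on "<=" versus "<".
    static final UnitTest RUNS_A_EQUALS_B = impl -> {
        impl.test(5, 5);        // executed, result never checked
        return impl.test(1, 5); // only the A < B case is asserted
    };

    // Asserts on the A = B case, so it fails if "<=" becomes "<".
    static final UnitTest ASSERTS_A_EQUALS_B = impl -> impl.test(5, 5);

    static boolean suitePasses(List<UnitTest> suite, Comparison impl) {
        return suite.stream().allMatch(t -> t.passes(impl));
    }

    // The mutant survives (and would be reported) if the suite passes on the
    // original code and still passes after the mutation.
    static boolean mutantSurvives(List<UnitTest> suite) {
        return suitePasses(suite, ORIGINAL) && suitePasses(suite, MUTANT);
    }

    public static void main(String[] args) {
        System.out.println(mutantSurvives(List.of(RUNS_A_EQUALS_B)));    // true
        System.out.println(mutantSurvives(List.of(ASSERTS_A_EQUALS_B))); // false
    }
}
```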

So what's the difference?

Weak mutation is a coverage measure - i.e. it measures something about the code that is run as a result of running your tests. Strong mutation testing measures whether your tests actually depend on your code being the way it is.

Having your tests run your code is necessary for your tests to stand any chance of being any good, but it's not enough. You could have tests with no assertions that still achieve 100% coverage by any measure of coverage, including weak mutation testing. Coverage tools don't measure whether your tests actually test anything; they just measure that your tests run the code in whatever way the coverage is measured.

Strong mutation testing tells you whether the code has to be like it is for the tests to pass.

So which is better?

One very significant benefit of a coverage tool (including a weak mutation testing tool) compared to a strong mutation testing tool is that a coverage tool will run much, much faster. Most coverage tools add very little to the time it takes to run your test suite. Jester requires your test suite to be run for every mutation, which could be thousands of times. Even with a really fast test suite and small code base, Jester can take a really long time to run.
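Why the run time adds up can be seen in a small sketch: each mutant needs its own full run of the test suite, plus one run to check that the suite passes on the unmutated code. The set of replacement operators below is an assumption for illustration, not Jester's actual list of mutations, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: every mutant costs one complete run of the test suite.
public class MutationRunCost {

    interface Comparison { boolean test(int a, int b); }
    interface UnitTest { boolean passes(Comparison impl); }

    static int suiteRuns = 0;

    static boolean runSuite(List<UnitTest> suite, Comparison impl) {
        suiteRuns++; // one complete (simulated) test-suite run
        for (UnitTest t : suite) {
            if (!t.passes(impl)) return false;
        }
        return true;
    }

    // Assumed, simplified set of replacements for "<=".
    static Map<String, Comparison> mutantsOfLessOrEqual() {
        Map<String, Comparison> m = new LinkedHashMap<>();
        m.put("<", (a, b) -> a < b);
        m.put(">=", (a, b) -> a >= b);
        m.put(">", (a, b) -> a > b);
        m.put("==", (a, b) -> a == b);
        m.put("!=", (a, b) -> a != b);
        return m;
    }

    // Returns the operators whose mutants the suite fails to detect.
    static List<String> survivingMutants(List<UnitTest> suite) {
        Comparison original = (a, b) -> a <= b;
        if (!runSuite(suite, original)) {
            throw new IllegalStateException("tests must pass before mutating");
        }
        List<String> survivors = new ArrayList<>();
        for (Map.Entry<String, Comparison> e : mutantsOfLessOrEqual().entrySet()) {
            if (runSuite(suite, e.getValue())) {
                survivors.add(e.getKey());
            }
        }
        return survivors;
    }

    public static void main(String[] args) {
        // No test uses A = B, so the "<" mutant survives.
        List<UnitTest> suite = List.of(
                impl -> impl.test(1, 5),    // expects true
                impl -> !impl.test(7, 5));  // expects false
        System.out.println(survivingMutants(suite)); // [<]
        System.out.println(suiteRuns);               // 6: original + 5 mutants
    }
}
```

Five mutants of a single operator already mean six suite runs; with thousands of mutation points across a code base, the cost of the whole suite is multiplied accordingly.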

If your code isn't executed by your tests, then that can be indicated in a very user-friendly way by a simple code coverage tool. If Jester is used on code that isn't executed by your tests, it will simply report every mutation it can make (and take a long time to do that), and it won't be as obvious that the reason is that the code simply isn't executed by the tests.

Coverage tools don't tell you about the quality of your tests directly; you can fool a coverage tool (deliberately or accidentally) by having tests that execute your code but have no assertions. You can't fool Jester in the same way.

Jester is best run on code that has "high coverage" as measured by a coverage tool - Jester will then tell you more about the quality of your tests in addition to what the coverage tool tells you, with less "noise" than running it on "low coverage" code.

In practice, when doing Test Driven Development, coverage tools or Jester might be telling you about redundant code, or code that could be simplified, rather than just telling you about the quality of your tests.

Most "enterprise" code bases have a level of automated unit testing that makes the simplest code coverage tools the most useful - i.e. weak mutation testing is over the top for most code bases that I see. In most cases, simple line coverage is probably the most useful measure, as it gives a high-level indication of where no automated tests have been written, in an easy-to-understand format.

Summary

Weak mutation testing is a coverage measure - i.e. it tells you about the code that is run by your tests.

Strong mutation testing measures whether your code needs to be like it is to pass the tests.

Line coverage or similar simple coverage measures are probably most appropriate for "enterprise" code bases.

Posted by ivan at December 26, 2005 9:17 PM
Copyright (c) 2004-2008 Ivan Moore