Several mutation testing systems for Java exist. This page is an attempt to objectively categorise and compare the way in which they are implemented and their relative merits.
In most cases the information assembled below has been obtained from the web without any direct experience of the tools, so it is incomplete and potentially inaccurate. Corrections, additions and clarifications are welcomed.
WebSite : http://cs.gmu.edu/offutt/mujava/
µJava is the granddaddy of mutation testing systems for Java, predating even JUnit.
It is the result of a collaboration between two universities, Korea Advanced Institute of Science and Technology (KAIST) in S. Korea and George Mason University in the USA.
Source code is available on a limited basis to researchers in mutation analysis.
According to the website :-
“We offer µJava on an ‘as-is’ basis as a service to the community. We welcome comments and feedback, but do not guarantee support in the form of question answering, bug fixing, or improvements (in other words, we don’t have money for support, just good intentions).”
WebSite : http://jester.sourceforge.net/
Jester was the first open source mutation system for Java, but does not appear to be actively developed or supported.
WebSite : http://jester.sourceforge.net/
Simple Jester is a variant of the original Jester by the same author, who describes it as easier to use but slower.
It appears to no longer be actively developed or supported.
WebSite : http://jumble.sourceforge.net/
Jumble was developed in 2003-2006 by a commercial company in New Zealand before being open sourced under the GPL licence. The licence was later changed to Apache2 to enable the mutator to be used in earlier versions of PIT.
Although no releases have been made since 2009, some development activity still seems to be taking place and the authors respond to queries on the mailing list.
Javalanche was developed by a team based at Saarland University in Germany.
According to their website:
“Javalanche is built for efficiency from the ground up, manipulating byte code directly and allowing mutation testing of programs that are several orders of magnitude larger than earlier research subjects. Javalanche addresses the problem of equivalent mutants by assessing the impact of mutations on dynamic invariants: The more invariants impacted by a mutation, the more likely it is to be useful for improving test suites.”
Source code is available but no public issue tracking or mailing list is provided.
WebSite : http://pitest.org
PIT was developed just for fun, originally as a parallel and distributed testing framework, before being gradually diverted into being a mutation testing tool.
Early versions of PIT used the Jumble mutation engine, but made improvements over Jumble in terms of performance, usability and compatibility with third party libraries (e.g. mocking frameworks). Recent versions of PIT use its own mutation engine.
PIT’s aim is to provide a high performance, scalable, user friendly tool that makes mutation testing practical for real world codebases. As of the 0.29 release PIT also became the first generally available incremental mutation testing system, with the option to amortize the cost of analysis by storing a history of results.
Source code is provided and support is available via mailing list and public issue tracking.
Mutation testing can be conceptually split into four phases: mutant generation, test selection, mutant insertion and mutant detection.
In practice some of these phases may be concurrent or hard to distinguish from each other. In some systems only the generation phase is automated.
There are various strategies by which these phases might be implemented. Some possibilities for each stage are discussed in the next section.
For bytecode based systems the mutant detection phase will usually be the most computationally expensive, but its speed is likely to be largely determined by how well the test selection stage was performed.
For source code mutation systems the generation phase may also represent a significant proportion of the computational cost if a naive approach is followed.
Source mutators create mutations by making changes to the Java source files and recompiling them.
Byte code mutators create mutations by manipulating the compiled byte code. This will usually be done using a third party library such as ASM, javassist or BCEL.
Of these libraries, BCEL is no longer actively developed or maintained.
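As a rough illustration of the source mutation approach, the sketch below produces one mutated copy of a piece of source text per operator occurrence. The class name `ToySourceMutator` and its operator table are hypothetical and not taken from any of the tools discussed here. A real source mutator would parse the code properly and then recompile each mutant, which is where the extra cost of this approach comes from.

```java
import java.util.ArrayList;
import java.util.List;

final class ToySourceMutator {
    // Hypothetical operator table: each pair is {original, replacement}.
    private static final String[][] REPLACEMENTS = {
        {" < ", " <= "},
        {" + ", " - "},
        {" == ", " != "},
    };

    /** Returns one mutated copy of the source for every operator occurrence found. */
    static List<String> mutate(String source) {
        List<String> mutants = new ArrayList<>();
        for (String[] pair : REPLACEMENTS) {
            int index = source.indexOf(pair[0]);
            while (index >= 0) {
                // Each mutant differs from the original in exactly one place.
                mutants.add(source.substring(0, index) + pair[1]
                        + source.substring(index + pair[0].length()));
                index = source.indexOf(pair[0], index + 1);
            }
        }
        return mutants;
    }
}
```

For the input `return a < b ? a + 1 : b;` this yields two mutants, one with `<=` in place of `<` and one with `-` in place of `+`. Plain text replacement is used only to keep the sketch short. It would also mutate operators inside string literals and comments.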
There are a great many possible ways in which test selection might be performed and optimised. The decisions made at this stage will largely dictate the performance of the detection phase.
Very broadly we can categorize the strategies as follows, but there may be significant differences between systems that have been placed within the same category.
It is left to the user to select the tests to be run against a mutant. These systems are not designed to be integrated into a build, but instead provide an interface by which a user can select an individual class to mutate and the tests to run.
Test selection is automatic, but little or no attempt is made to pick relevant tests for each mutant. Potentially the entire suite, or a large portion of the test suite, is run against each mutant.
Test selection is automatic, with tests selected from the suite based on a naming convention or other simple scheme such as annotations. Optimisations may also be implemented to choose an optimal running order for the tests.
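A minimal sketch of convention based selection is shown below. It assumes the convention that the tests for a class `Foo` live in a class named `FooTest`. The class and method names are hypothetical, and real tools may use different or configurable conventions.

```java
import java.util.ArrayList;
import java.util.List;

final class ConventionBasedSelector {
    /** Picks the test classes whose simple name is the mutated class name plus "Test". */
    static List<String> testsFor(String mutatedClass, List<String> allTestClasses) {
        String expected = simpleName(mutatedClass) + "Test";
        List<String> selected = new ArrayList<>();
        for (String testClass : allTestClasses) {
            if (simpleName(testClass).equals(expected)) {
                selected.add(testClass);
            }
        }
        return selected;
    }

    // Strips the package prefix from a fully qualified class name.
    private static String simpleName(String className) {
        return className.substring(className.lastIndexOf('.') + 1);
    }
}
```

The scheme is cheap because it needs no information about what the tests actually execute, but it misses any test that exercises the mutated class indirectly.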
Test selection is automatic. Tests are selected by first measuring their line, block or instruction coverage. Only those tests that exercise the line, block or instruction that contains the mutant will be run against it.
Optimisations may also be implemented to choose an optimal running order for the tests.
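Coverage based selection can be sketched as a lookup from a mutated line to the tests that executed it. The example below assumes line coverage has already been collected into a map from test name to covered line numbers. The class name, the sample data and the test names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class CoverageBasedSelector {
    /** Returns the tests whose recorded coverage includes the mutated line. */
    static List<String> select(Map<String, Set<Integer>> coverage, int mutatedLine) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, Set<Integer>> entry : coverage.entrySet()) {
            if (entry.getValue().contains(mutatedLine)) {
                selected.add(entry.getKey());
            }
        }
        return selected;
    }

    /** Hypothetical coverage data: test name -> line numbers it executed. */
    static Map<String, Set<Integer>> sampleCoverage() {
        Map<String, Set<Integer>> coverage = new LinkedHashMap<>();
        coverage.put("testAdd", new HashSet<>(Arrays.asList(10, 11, 12)));
        coverage.put("testSubtract", new HashSet<>(Arrays.asList(20, 21)));
        coverage.put("testBoundary", new HashSet<>(Arrays.asList(12, 13)));
        return coverage;
    }
}
```

With the sample data, a mutant on line 12 would be run against `testAdd` and `testBoundary` only, and a mutant on a line that no test covers can be reported as surviving without running anything.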
Mutants are generated, the class files written to disk, and a new JVM launched with the mutant on the classpath.
A single class is generated that contains all mutants. Each mutant is then enabled programmatically. Mutant schemata could be used as part of any scheme for mutant insertion, but make most sense as a variant of a scheme in which class files are written to disk.
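The idea can be sketched by hand as follows. The class below stands in for what a schemata generator might emit for a method whose original body is `return age >= 18;`. The class name, the `activeMutant` switch and the mutant ids are assumptions made for illustration only.

```java
final class SchemataExample {
    // 0 means no mutant is active; the hypothetical ids 1 and 2 each select one mutant.
    static int activeMutant = 0;

    static boolean isAdult(int age) {
        switch (activeMutant) {
            case 1:  return age > 18;   // mutant 1: >= replaced by >
            case 2:  return age < 18;   // mutant 2: condition negated
            default: return age >= 18;  // original behaviour
        }
    }

    /** Enables one mutant, runs a single boundary test, then restores the original. */
    static boolean killedByBoundaryTest(int mutantId) {
        activeMutant = mutantId;
        try {
            boolean testPassed = isAdult(18); // the test expects 18 to count as adult
            return !testPassed;
        } finally {
            activeMutant = 0;
        }
    }
}
```

All mutants are compiled and loaded once, so switching between them costs only a field write. The trade-off is that the generated class is larger and every call pays for the extra branch.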
Mutants are held in memory and inserted into the JVM by creating a new classloader that does not delegate to its parent when loading the mutant class.
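A minimal sketch of such a classloader is shown below. It defines the named class from an in-memory byte array itself and delegates every other class to its parent as usual. The names `MutantClassLoader` and `MutationTarget` are hypothetical, and to stay self-contained the demonstration reloads the unmodified bytes of `MutationTarget`. A real system would pass in bytes produced by its mutation engine.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Stand-in for a class under test; a real tool would hold mutated bytes for it.
final class MutationTarget {
    static String describe() { return "original"; }
}

final class MutantClassLoader extends ClassLoader {
    private final String mutantName;
    private final byte[] mutantBytes;

    MutantClassLoader(ClassLoader parent, String mutantName, byte[] mutantBytes) {
        super(parent);
        this.mutantName = mutantName;
        this.mutantBytes = mutantBytes;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        if (!name.equals(mutantName)) {
            return super.loadClass(name, resolve); // normal parent-first delegation
        }
        synchronized (getClassLoadingLock(name)) {
            // Define the mutant here instead of asking the parent for the original.
            Class<?> loaded = findLoadedClass(name);
            if (loaded == null) {
                loaded = defineClass(name, mutantBytes, 0, mutantBytes.length);
            }
            if (resolve) {
                resolveClass(loaded);
            }
            return loaded;
        }
    }

    /** Reads the class file of an already loaded class from its classloader. */
    static byte[] readClassBytes(Class<?> type) throws IOException {
        String resource = type.getName().replace('.', '/') + ".class";
        try (InputStream in = type.getClassLoader().getResourceAsStream(resource)) {
            if (in == null) {
                throw new IOException("Class file not found: " + resource);
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            for (int n = in.read(buffer); n != -1; n = in.read(buffer)) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        }
    }

    /** Demonstration: the class defined by the new loader is distinct from the original. */
    static boolean loadsSeparateCopy() {
        try {
            String name = MutationTarget.class.getName();
            ClassLoader parent = MutationTarget.class.getClassLoader();
            Class<?> copy = new MutantClassLoader(parent, name, readClassBytes(MutationTarget.class))
                    .loadClass(name);
            return copy != MutationTarget.class && copy.getClassLoader() != parent;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Tests only see the mutant if they reach the class through the new loader, so in practice the test classes themselves would also need to be loaded by it or by one of its children.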
Mutants are held in memory and inserted into the JVM by creating a new classloader which has the boot classloader as a parent.
Mutants are held in memory and the debugger API is used to insert them into a running JVM.
Note that the expected performance of this approach is unclear. The debugger can degrade the overall performance of a JVM significantly, but this approach does avoid having to launch a new JVM for each mutant.
Mutants are held in memory and inserted into a JVM using the instrumentation API.
All selected tests are run.
The selected test classes are run until one of them kills a mutant.
Test classes are split into individual test cases which are then run until one of them kills a mutant.
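The early exit strategies can be sketched as a loop that stops at the first failing test. In the example below each test is modelled as a `BooleanSupplier` that returns `true` when it passes. This representation and the class name `MutantDetector` are assumptions made for illustration, not the API of any tool described here.

```java
import java.util.List;
import java.util.function.BooleanSupplier;

final class MutantDetector {
    /**
     * Runs the tests in order against the currently loaded mutant.
     * Returns the index of the first failing test, which is the one that
     * kills the mutant, or -1 if every test passes and the mutant survives.
     */
    static int firstKillingTest(List<BooleanSupplier> tests) {
        for (int i = 0; i < tests.size(); i++) {
            if (!tests.get(i).getAsBoolean()) {
                return i; // mutant killed: the remaining tests are not run
            }
        }
        return -1;
    }
}
```

Because the loop stops at the first kill, the order of the tests matters. Running the fastest tests, or those most likely to fail, first reduces the average cost per mutant.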
The output of the tools can be broadly categorized as follows
These formats are largely for human consumption. The tools may also produce results in a structured format (SF) suitable to be read and manipulated by other tools, e.g. XML or a relational database.
Note that, with the exception of PIT, little information is available on the compatibility of the various mutation testing systems and mocking frameworks. Most mocking systems are implemented with dynamic proxies or custom class loaders, and will probably work across all the mutation testing systems. The exceptions are Powermock and JMockit, where issues might be encountered.
Of the systems listed, the only three that seem suitable for any serious use in real projects are PIT, Jumble and Javalanche.
Jumble is the most mature of the three projects, and is the only one to offer support for versions of Java prior to 1.5. It is however slower and less sophisticated than the two coverage based systems and is unable to provide a view on the effectiveness of a whole test suite. Documentation is limited, but support is available via a mailing list.
Javalanche is less mature than Jumble, and currently does not integrate with any of the main build tools. It is unclear what support is available, and documentation is limited. It does however provide the unique feature of equivalent mutation detection. This comes with a high computational cost, but would be the least time consuming approach if you have a large number of surviving mutants and a requirement to categorize each one.
PIT is the least mature of the three projects, but is actively developed and supported. All known defects to date have been quickly addressed.
Support is available via a Google group and documentation is the strongest of the three. PIT integrates with both the major build systems - it is the only project to provide a Maven plugin. PIT is the only option for TestNG users, and also the only system to have no known issues with any of the major mocking frameworks. It also offers the unique feature of incremental analysis.
In practice the operators may however end up mutating mainly the compiler generated plumbing code for these languages.
In most circumstances this will be insignificant compared to the cost of mutation analysis, but could be significant if only a small number of mutants are being generated within a large project with a slow test suite. PIT’s dependency analysis feature addresses this scenario.