Ivor Rankin is Senior Technical Specialist, Symantec Middle East & Africa

Antivirus software tests can be an important factor when selecting antivirus software. However, there are many different tests available, and interpreting the results can be challenging. Additionally, the needs of the corporate customer and home user differ, and it's important to understand these differences in order to critically evaluate antivirus software tests.

What makes a Good Test?
A Good Test is one that is both Scientific and Meaningful
Scientific tests possess several qualities. First, they have validity - they measure the thing they purport to measure. Second, they are repeatable - not only do they measure the thing they purport to measure, they do so consistently, reliably and in ways that can be peer-reviewed. The processes are documented and stand up to scientific scrutiny.

In addition to being scientific, a good test is meaningful. This is a bit trickier, as what is meaningful to the home user might not be meaningful to the corporate client; what is meaningful to a person from one region might not be as meaningful to a person from another. Thus, while "meaning" is sometimes difficult to interpret, the critical question is this: does the test measure something that is important to the reader? Keep in mind that it does not matter how "in depth" a test appears to be if it isn't scientific and meaningful as well. To be useful to the user, a good test must be both scientific and meaningful.
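To make "repeatable" concrete, here is a minimal sketch, in Python, of what a reproducible detection test might look like. Everything in it is illustrative: the scan_file helper, the exit-code convention, and the sample paths are assumptions made for the sake of the example, not any particular lab's method.

    import hashlib
    import json
    import subprocess

    def scan_file(scanner_cmd, path):
        # Assumption: the scanner signals a detection with a nonzero
        # exit code. Real products differ; confirm this per product.
        result = subprocess.run([scanner_cmd, path], capture_output=True)
        return result.returncode != 0

    def run_detection_test(scanner_cmd, sample_paths):
        """Scan a fixed, documented sample set and return a record
        that a peer reviewer can verify and re-run."""
        records = []
        for path in sorted(sample_paths):        # fixed order: repeatable
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            records.append({"sample": path,
                            "sha256": digest,    # pins the exact input used
                            "detected": scan_file(scanner_cmd, path)})
        rate = sum(r["detected"] for r in records) / len(records)
        return {"scanner": scanner_cmd, "records": records,
                "detection_rate": rate}

    # Publishing the full record, not just the headline number, is
    # what lets a second lab reproduce the run and check the claim:
    # print(json.dumps(run_detection_test("scan", ["sample1.bin"]), indent=2))

The point is not the code itself but the property it illustrates: a second tester, given the same samples and the same documented process, should arrive at the same numbers.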
Testing Organisations: University, Commercial, Independent Specialists and Magazine

Not all Testers are Created Equal
There are several types of tests. Here I will describe each of them, focusing mainly on what makes a Good Magazine Test.

University Tests

University tests offer an excellent opportunity for students to learn about testing processes, methodology, and criteria by testing a wide variety of products. These test results are of limited use to the Corporate Client or Home User, as they produce so much data that interpreting it correctly can be extremely time consuming. They tend to measure things "because they can be measured"; while these measurements are usually scientifically valid and reliable, not all of them will meet the "is it meaningful to the user" criterion. Additionally, the tests are performed by students in the University environment; there is little if any applicability of their experience, or of the testing environment, to the corporate world.

Commercial Testers

Commercial testers offer vendors the opportunity to certify their products against criteria selected by the vendor, using a methodology approved by the vendor community and viruses supplied by the vendor community (either directly or via the WildList). The strengths of these tests are that they are peer-reviewed and well documented. They provide a baseline for both the corporate and home user - certifying that products detect (and, when possible, repair) at a minimum the viruses that are spreading in the Wild. Additionally, such tests can be reviewed to show performance over time, when reports are available online from the commercial testing labs.
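As a rough illustration of what such a certification baseline amounts to, consider the sketch below. The result records are hypothetical - the WildList itself only names viruses, and a real lab would generate these entries from actual scan and repair runs - but the pass criterion is the one described above: detect everything In the Wild, and report repair separately because repair is not always possible.

    # Hypothetical per-virus results from a certification run.
    wildlist_results = [
        {"virus": "W32/ExampleWorm.A",   "detected": True, "repaired": True},
        {"virus": "W97M/ExampleMacro.B", "detected": True, "repaired": False},
    ]

    def meets_baseline(results):
        # Certification requires detecting every In-the-Wild virus;
        # repair is tracked as a separate rate, not a pass/fail gate.
        detects_all = all(r["detected"] for r in results)
        repair_rate = sum(r["repaired"] for r in results) / len(results)
        return detects_all, repair_rate

    certified, repair_rate = meets_baseline(wildlist_results)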

Independent Specialists

Independent Specialists may be of value to corporate customers, but finding them can be challenging. They must possess intimate knowledge not only of antivirus software, but also of the Internet, viruses and malware, and the corporate environment. They offer individualised tests and do special projects for corporate customers; thus, the output is not generally publicly available*. These testers frequently go beyond the detection tests of the Commercial Tester and measure selected products' ability to mitigate risks specific to a particular corporation. There are a limited number of competent independent specialists.

Magazine Testers

Magazine testers are the most visible and therefore, in some ways, the most influential. This leads us to our third rule of testing:

A Good Magazine Test is subject to the same criteria as all other tests - scientific validity and meaningfulness.
Some magazine tests make use of in-house expertise to perform and interpret the tests; others hire outside contractors to do so. In either case, there are some things to consider when evaluating the usefulness of magazine tests to a given environment.
First, the expertise of the tester should be considered. If a tester writes about modems one week, printers the next, and antivirus the next, it is unlikely he has the expertise to competently and safely test using real viruses. Due to this lack of expertise, Magazine Tests sometimes rely on the output from Commercial Testers or Academic Testers, focusing their own in-house expertise on non-viral aspects of testing. This can be a useful way to approach the tests - if the test criteria and methodology meet the requirements of "scientific and meaningful".
Next, the test criteria should be evaluated for meaningfulness, and the methodology used should be assessed. For example, both Corporate and Home Users need to know how a product performs against a virus they are likely to encounter. However, the "viruses" used in some magazine tests are not viruses at all - they are non-replicating or damaged samples, and measuring a product's ability to detect them has little relevance to anyone (other than the tester).
Additionally, some testers use non-meaningful zoo samples, obscure or little-used archivers or packers, virus simulators, or viruses created especially for a test - these all detract from the validity and reliability of the tests overall. How the criteria are chosen is extremely important.
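A hedged sketch of the corpus hygiene this implies: before a sample earns a place in the test set, the lab verifies that it actually replicates. The metadata fields below are invented for illustration; in practice, verification means observing the sample infect fresh "goat" files under controlled lab conditions.

    # Candidate samples with lab-verification metadata (the field
    # names are hypothetical). "replicates" is set only after the
    # sample has been observed infecting fresh goat files in the lab.
    candidates = [
        {"name": "sample-001", "replicates": True,  "source": "WildList"},
        {"name": "sample-002", "replicates": False, "source": "zoo"},  # damaged
        {"name": "sample-003", "replicates": True,  "source": "zoo"},
    ]

    # Only verified, working viruses belong in the test set; damaged
    # or non-replicating files measure nothing meaningful.
    test_set = [s for s in candidates if s["replicates"]]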

A good test knows its own limits. It does not measure things just for the sake of having things to measure.
More isn't necessarily better. In fact, it's usually worse. Tests that overwhelm the reader with lots of information "just because we can measure it" don't help anyone.

What's Left Out?

Good intentions aren't enough
There are some things that, due to the complexity of user needs and the expertise and resources required, are not well considered by any testers. Attempting to measure these things often results in data that is not only flawed, but misleading to both the Home and Corporate User. When examining test results, ensure that you pay as much attention to what is not there as to what is.

Weighting

All things are not created equal
Equally important to consider is how the test data is weighted in the interpretation. A product's ability to update itself automatically is important; so is its ability to detect all of the viruses in circulation. A product's ability to detect an obscure zoo virus sample is much less important, and its ability to detect a virus inside an archive is not nearly as important as its ability to detect a destructive worm coming into the network. When evaluating a test, always consider the relative weights given to different parts of the test; all things are not equally important. It is also difficult to test and model all environments - things like the system's impact on detection, or synergistic and holistic effects. Would a non-antivirus solution have stopped a particular threat? Is the right response reconfiguration, a firewall, or even user action?
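To see how much the weighting can dominate the headline score, consider this small sketch. Both the per-category scores and the weights are invented for illustration; the point is that the same raw results can produce very different verdicts under different weightings.

    # Fraction of samples detected per category (illustrative numbers).
    scores  = {"in_the_wild": 0.99, "incoming_worms": 0.97,
               "archived":    0.85, "obscure_zoo":    0.60}

    # Weights a reader would want disclosed: here In-the-Wild
    # detection dominates, and obscure zoo samples count for little.
    weights = {"in_the_wild": 0.50, "incoming_worms": 0.35,
               "archived":    0.10, "obscure_zoo":    0.05}

    overall = sum(scores[c] * weights[c] for c in scores)  # about 0.95

    # Shift the weight toward the zoo samples and the ranking between
    # two products can invert - which is why undisclosed weighting
    # makes a headline score close to meaningless.

When a review publishes only the final score, the reader has no way to apply the weights that match his own environment.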

Response and Deployment

Don't compare apples to oranges
Magazine Tests that measure response need to consider reader needs, as well as the different types of response. Corporate and home users have different needs, and it is a fallacy to compare response in one category and make claims concerning the other. Reviewers must refrain from apples-to-oranges comparisons and avoid confusing the needs of disparate groups.

Conclusion

In summary: A Good Test results in the presentation of a clear, easy-to-understand picture of what is being measured, and how it is being measured - and how those measurements apply to the readership's own requirements. These seven simple rules won't tell you "how" to test, or even "what" to test. However, they will help you critically evaluate existing tests of antivirus software.

Rule #1: A Good Test is one that is both Scientific and Meaningful
Does the test measure something that is meaningful, and is the test process - from start to finish - scientifically valid?

Rule #2: Not all testers are created equal
Does the tester have the requisite experience and knowledge to correctly evaluate the aspects of the antivirus software he is attempting to measure?

Rule #3: A Good Magazine Test is subject to the same criteria as all other tests - scientific validity and meaningfulness - and assessing the meaningfulness is highly contextual
Does the test accurately measure something that is meaningful to the target reader?

Rule #4: A good test knows its own limits. It does not measure things just for the sake of having things to measure.
Does the test correctly interpret the data gathered?

Rule #5: Good intentions aren't enough
Does the test measure the right parts of the problem, and does it measure them completely? Are the intentions good but the follow-through lacking?

Rule #6: All things are not created equal
Does the test correctly weight the relative importance of the different results?

Rule #7: Don't mix apples and oranges
Does the test consistently - and accurately - differentiate between home and corporate users' needs - and products - and avoid confusion by presenting test results in light of the user's needs and the product's design?

* Tests done by Independent Specialists are extremely complex. They are geared solely for Corporate and Government environments and are beyond the scope of the other three types of testers.