Test procedures and test strategies
Nowadays it is impossible to keep track of all the different methods and procedures that are available for testing software. However, only a few well-proven methods are necessary for testing smart card programs. It is possible to draw on decades of experience and a large number of publications on the subject of testing. Incidentally, software testing always means attempting to discover errors in the program, not demonstrating that the program is correct.

All test procedures can be divided into static and dynamic types. In a static procedure, the program code is analyzed and evaluated using various methods, either manually or automatically. The two most commonly used static testing methods are program assessment and review, which are briefly described and explained below.

Static program assessment using software tools
This consists of analyzing various properties of the program code using static techniques. The properties that can be analyzed include the following:
–number of lines of code (LOC)
–number of lines of comments
–ratio of the amount of comments to the amount of program code
–structure of the program code
–number of functions
–nesting depth
–‘dead’ code
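
Several of the properties listed above can be gathered with very little tooling. The following is a minimal sketch, not a production analyzer: it assumes C-style source in which comments start with `//` and blocks are delimited by braces, and the function name `static_metrics` and the sample routine are purely illustrative.

```python
def static_metrics(source: str, comment_prefix: str = "//") -> dict:
    """Count code lines, comment lines and the maximum brace nesting depth."""
    code_lines = 0
    comment_lines = 0
    depth = 0
    max_depth = 0
    for raw in source.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines count as neither code nor comment
        if line.startswith(comment_prefix):
            comment_lines += 1
            continue
        code_lines += 1
        # Ignore a trailing comment when tracking braces.
        code_part = line.split(comment_prefix, 1)[0]
        for ch in code_part:
            if ch == "{":
                depth += 1
                max_depth = max(max_depth, depth)
            elif ch == "}":
                depth -= 1
    ratio = comment_lines / code_lines if code_lines else 0.0
    return {
        "loc": code_lines,
        "comment_lines": comment_lines,
        "comment_ratio": ratio,
        "max_nesting_depth": max_depth,
    }


# Illustrative input only; braces inside string literals are not handled.
SAMPLE = """\
// check the P1 parameter
int check_p1(int p1) {
    if (p1 >= 20) {
        if (p1 <= 50) {
            return 1;
        }
    }
    // out of range
    return 0;
}
"""

metrics = static_metrics(SAMPLE)
print(metrics)
```

Detecting dead code or analyzing program structure requires a real parser and control-flow analysis, which this sketch deliberately leaves out.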

Review
Review consists of the formal analysis and evaluation of program modules by a team of assessors. This is sometimes referred to as a ‘code walkthrough’ or a ‘code inspection’.

In contrast to static methods, dynamic program analysis methods test the program while it is in operation, either manually or with the aid of computers. There are two fundamentally different approaches (blackbox and whitebox testing), plus a third, hybrid approach (graybox testing).

Blackbox test
A blackbox test is based on the idea that the tester knows nothing about the internal processes, functions and mechanisms of the program to be tested. This means that all that can be done is to examine the input and output data with regard to their relationship to each other, as defined in the specifications. Blackbox tests are the standard for smart card operating systems. They are also used for security modules for terminals and computer systems.

However, it is often incorrectly assumed that these tests can discover Trojan horses or similar items that may be present, in addition to errors in the software. This assumption is used as an argument for dispensing with relatively time-consuming and expensive program code analysis. Although a blackbox test may allow the tester to detect simple, unsophisticated trapdoors that have been programmed into the system or inadvertently generated, an experienced programmer can easily create access possibilities that can never be detected by a blackbox test.

This can be illustrated using the following simple example. It is not meant to serve as a model for a Trojan horse, since the technique has been known for a long time, but rather to raise awareness of the need for code inspections in security analyses. Almost all smart card operating systems contain a command for generating and issuing random numbers (GET CHALLENGE). This command could be modified such that only the first 8-byte number that it issues is actually generated by the pseudorandom number generator. Each of the subsequent ‘random’ numbers would then consist of an 8-byte value taken from the EEPROM and XORed with the first random number. A simple external program could then be used to read out the entire memory contents, including all the keys. Incidentally, this is a very good example of applied steganography in smart cards.
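
The mechanism can be modelled in a short simulation, which also makes clear why the leaked values look random from outside. This is only a toy sketch under stated assumptions: `ManipulatedCard`, `read_out` and the 16-byte `SECRET` are hypothetical names and data, the EEPROM is a plain byte string, and `os.urandom` stands in for the card's pseudorandom number generator.

```python
import os


def xor8(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings of equal length."""
    return bytes(x ^ y for x, y in zip(a, b))


class ManipulatedCard:
    """Toy model of a card whose GET CHALLENGE has been manipulated."""

    def __init__(self, eeprom: bytes):
        self._eeprom = eeprom      # assumed non-empty
        self._first = None         # first, genuinely random challenge
        self._offset = 0           # next EEPROM position to leak

    def get_challenge(self) -> bytes:
        if self._first is None:
            # Only the first challenge comes from the random source.
            self._first = os.urandom(8)
            return self._first
        chunk = self._eeprom[self._offset:self._offset + 8].ljust(8, b"\x00")
        self._offset = (self._offset + 8) % len(self._eeprom)
        # Every later "challenge" is EEPROM data masked with the first one.
        return xor8(chunk, self._first)


def read_out(card: ManipulatedCard, length: int) -> bytes:
    """External program: undo the mask using the first challenge."""
    first = card.get_challenge()
    data = bytearray()
    while len(data) < length:
        data += xor8(card.get_challenge(), first)
    return bytes(data[:length])


SECRET = bytes(range(16))          # stands in for keys stored in EEPROM
recovered = read_out(ManipulatedCard(SECRET), len(SECRET))
print(recovered == SECRET)
```

From the outside, each response is simply an 8-byte value. Nothing in the command interface distinguishes the manipulated card from a correct one, which is the reason the text gives for insisting on code inspection.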

With a blackbox test, there is no way to determine whether a Trojan horse is concealed behind this command. Even a statistical analysis of the random numbers obtained would not detect any significant deviation from the normal pseudorandom numbers. The only way to recognize such a manipulated program is to inspect the entire code of the operating system. This example illustrates only one of many possible ways in which a normal command can be modified in order to obtain the contents of the memory. Since only a few lines of program code are needed for this modification, the only effective way to combat such a possibility is to completely reveal and analyze the source code.

Abort tests are used to test the functional viability of atomic operations. Such tests are also called ‘recovery tests’. In such a test, a suitable command is sent to the smart card to cause an atomic operation to be initiated in the card. While the atomic operation is being executed, power to the card is interrupted at a specific time. Following this, a check is made to see whether the atomic operation has left the processed data in a consistent state. In such tests, power is not just interrupted at one particular time. Instead, the test is run many times, with the power being interrupted at various times distributed over the entire duration of the atomic operation. In order to obtain valid test results, the abort time is shifted in steps, each of which is approximately equal to half of the EEPROM write time for a single page. The functionality of atomic operations can be tested very well in this manner. However, the number of tests required is fairly large. If a typical time increment of 1 ms is used, thoroughly testing command processing in a smart card over an interval of 100 ms would require 100 tests.
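
The structure of such an abort test series can be sketched as follows. The schedule reproduces the figures from the text (a 1 ms increment over 100 ms gives 100 tests). Everything else is an assumption made for illustration: `ToyCard` is a hypothetical model rather than a real card interface, and it simply treats the atomic write as committing 60 ms into the operation.

```python
def abort_schedule(duration_ms: int, step_ms: int) -> list:
    """Abort times in ms, measured from the start of the atomic operation."""
    return list(range(0, duration_ms, step_ms))


class ToyCard:
    """Hypothetical card model, not a real card API.

    The atomic write is modelled as committing at COMMIT_MS: a power cut
    before that point leaves the old record, a later one leaves the new one.
    """

    COMMIT_MS = 60  # assumed commit point

    def __init__(self) -> None:
        self.record = b"OLD"

    def atomic_write(self, new_value: bytes, power_cut_at_ms: int) -> None:
        if power_cut_at_ms >= self.COMMIT_MS:
            self.record = new_value


def run_abort_tests(schedule, old=b"OLD", new=b"NEW") -> list:
    """Return the abort times after which the record was inconsistent."""
    failures = []
    for t in schedule:
        card = ToyCard()             # fresh card state for every run
        card.atomic_write(new, t)    # power is cut t ms into the operation
        # Consistent means: entirely the old value or entirely the new one.
        if card.record not in (old, new):
            failures.append(t)
    return failures


# A 1 ms step corresponds to half of an assumed 2 ms page write time.
schedule = abort_schedule(duration_ms=100, step_ms=1)
failures = run_abort_tests(schedule)
print(len(schedule), "abort tests,", len(failures), "inconsistent states")
```

In a real test bench the call to `atomic_write` would be replaced by sending the command to the card and switching off the supply voltage after the scheduled delay, followed by a reset and a read of the affected data.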

Whitebox test
A whitebox test is often called a ‘glassbox test’, which clearly describes the concept. With this type of testing, all internal data structures and processes are known to the tester and can be completely understood. The relevant program documentation is used to design and generate the tests, but the specification is always the sole authority. For decades, program flowcharts and Nassi–Shneiderman diagrams (structograms) have commonly been used to document programs, and they also form the basis for evaluating the internal functions of the software in a whitebox test. With object-oriented languages such as Java, the unified modeling language (UML) has become the prevalent form of representation. The various description variants of UML are also very well suited to the architectural description of smart card software.

Since the exact program sequences are known, it is natural for the tester to want to test all possible execution paths through the software. There are several ways to do this. One of them is statement coverage, in which every instruction in the program is executed at least once. This makes it very easy to discover whether the program contains dead code, which is code that is never used, but it is not capable of ensuring that the desired functionality is present. A better method for this is decision coverage, in which every decision node in the program code is traversed at least once with each of its possible outcomes.

In order to be able to recognize internal program processes during dynamic testing, it is necessary to have a sophisticated emulator for the smart card microcontroller in question or to ‘instrument’ the program being tested. An instrumented program has special program code inserted just before every jump instruction, branch instruction and function call. This code collects location and parameter information when the program is run. An analysis program can be used to statistically and graphically evaluate this information. Unfortunately, the additional program code alters the timing relationships of the program, and in the worst case, it can even cause the behavior of the program to change. This must be borne in mind whenever this technique is used.

An extension of the decision coverage criterion is to traverse all program decisions in all possible combinations, once for each combination. This covers all possible execution paths. However, the limited amount of time available for testing means that this is possible only with very small programs consisting of a few hundred bytes of code. Even with programs on the order of 1000 bytes, it is not possible to test all possible combinations in a reasonable length of time.
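
The difference between statement coverage, decision coverage and full path coverage can be made concrete with a small instrumented routine. The sketch below is illustrative only: `probe`, `check_params` and the two parameter checks are hypothetical, and the probe call plays the role of the inserted instrumentation code.

```python
from collections import defaultdict

outcomes = defaultdict(set)   # decision label -> outcomes seen so far
paths = set()                 # distinct outcome sequences seen so far
_trace = []                   # outcomes recorded during the current call


def probe(label: str, condition: bool) -> bool:
    """Instrumentation: record a decision outcome and pass it through."""
    outcome = bool(condition)
    outcomes[label].add(outcome)
    _trace.append(outcome)
    return outcome


def check_params(p1: int, p2: int) -> str:
    """Toy routine under test with two independent decisions."""
    _trace.clear()
    status = "ok"
    if probe("p1_too_large", p1 > 50):
        status = "bad P1"
    if probe("p2_nonzero", p2 != 0):
        status = "bad P2"
    paths.add(tuple(_trace))
    return status


def decision_coverage() -> float:
    """Fraction of (decision, outcome) pairs exercised so far."""
    seen = sum(len(s) for s in outcomes.values())
    return seen / (2 * len(outcomes))


check_params(60, 1)               # executes every statement once
after_one = decision_coverage()   # only the 'true' outcomes were seen
check_params(10, 0)               # adds the 'false' outcomes
after_two = decision_coverage()
path_coverage = len(paths) / 2 ** len(outcomes)
print(after_one, after_two, path_coverage)
```

A single test case already executes every statement, yet covers only half of the decision outcomes. Two test cases reach full decision coverage but still exercise only two of the four combinations. For n independent two-way decisions the number of combinations is at most 2^n, which is why exhaustive path testing quickly becomes impractical.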

Consider a typical smart card command interpreter as an example. The function of this program module is to identify a command located in the card’s input buffer by means of the class and instruction bytes, and then to check the P1, P2, Lc and Le parameters. The size of the program code for this routine is around 200 bytes, and it contains 18 branches. The possible output values consist of five return codes and calls to 26 different command procedures.

Two other path coverage criteria are used in particular for testing smart card operating systems: input coverage and output coverage. The objective is to generate all possible input and output values. The output values are often restricted to the available return codes, since otherwise the number of variations would be too large. Since the number of possible input values can also quickly reach a magnitude that makes testing impractical, due to the multitude of input values or the amount of time required, equivalence classes are usually employed. This reduces the large number of possible input values to a relatively small number that can be tested in a reasonable length of time.

Equivalence classes are formed by selecting boundary cases on either side of the decision range, together with a value in the middle of the range. For example, if the smart card command interpreter allows a range of 20 through 50 for the value of the P1 byte, the equivalence class would be formed using the values 19, 20, 50 and 51 for the boundary values and 35 (for example) as the midrange value. This set of values verifies the essential query conditions of the program. After this test, it can be assumed with a relatively high level of confidence that parameter range checking has been correctly implemented.

Particularly with assembly language programming, it is unfortunately necessary to take the properties of the target hardware into account when defining the equivalence classes. For instance, all arithmetic operations that can cause an overflow or underflow in the processor due to the architecture of the arithmetic unit (8-, 16- or 32-bit width) must be taken into account when forming the equivalence classes. Only then is it possible to be sure that underflows and overflows are correctly handled in the program.

Whitebox tests are often used for module testing during smart card development. In such tests, finished software modules located in smart cards are fed data from outside using special test commands, following which the results of the actions of the software modules are determined from outside using test commands. The actual results are then compared with the expected results outside the card.
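
The selection of test values for the P1 range described above, and the additional values needed where arithmetic can overflow, can be written down in a few lines. This is a sketch for illustration: the function names are hypothetical, and the offset of 10 in the overflow example is an assumed value, not one taken from a real operating system.

```python
def boundary_values(low: int, high: int) -> list:
    """Boundary cases on either side of the range plus a midrange value."""
    return [low - 1, low, (low + high) // 2, high, high + 1]


def overflow_boundaries(offset: int, width_bits: int = 8) -> list:
    """Inputs just below and at the point where input + offset wraps around."""
    limit = 2 ** width_bits
    return [limit - offset - 1, limit - offset]


def p1_in_range(p1: int) -> bool:
    """Range check under test: P1 must be 20 through 50."""
    return 20 <= p1 <= 50


p1_tests = boundary_values(20, 50)
results = [p1_in_range(v) for v in p1_tests]
print(p1_tests, results)

# With an 8-bit arithmetic unit and an assumed offset of 10, the sum still
# fits for an input of 245 but wraps around to 0 for an input of 246.
wrap_tests = overflow_boundaries(10)
wrapped = [(v + 10) & 0xFF for v in wrap_tests]
print(wrap_tests, wrapped)
```

The first list reproduces the values 19, 20, 35, 50 and 51 from the text. The second shows the kind of hardware-dependent values that have to be added to the equivalence classes when the code performs arithmetic on the parameter.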

Graybox test
A graybox test is a hybrid of blackbox and whitebox tests. With such a test, only some parts of the software are known, such as internal program processes. Graybox tests are primarily used in the integration phase with smart cards, since they allow errors in the interaction of the individual components to be detected and corrected very quickly and effectively. Naturally, appropriate test keys (which are public) are needed from the key management facility. Once this part of the integration tests has been successfully concluded, the results can be checked using the real keys (live keys).