EVALUATING AND TESTING SOFTWARE
Physical components, such as the bodies and modules of smart cards, can largely be tested using conventional methods, and electrical characteristics can be measured satisfactorily using automated test equipment. The situation with the microcontroller software, however, is somewhat different. Although the methods used to test software for errors have been steadily refined during the past 40 years, since the appearance of the first programs, and although there are many recognized methods for producing programs with few errors, software errors still show up relatively frequently in everyday practice. In most applications this is not a serious problem, since a revised version of the software can quickly be issued to correct the errors. With smart cards this cannot be done as easily, since most of the software is located in the ROM of the microcontroller. A new version of the software necessitates a completely new production run by the semiconductor manufacturer, which takes around 8–12 weeks, and if the smart cards have already been put into service, it is practically impossible to modify the existing software. It follows from these very strict constraints that software for smart card microcontrollers must contain an extremely low number of errors. Software that is truly ‘error-free’ would be even better, but given the present state of software development, this remains a distant goal.

As is well known, the subject of software testing is extremely extensive. It is described in many books in all of its variations and orientations, and has by now become almost an independent branch of information technology, so we can only present a short sketch of it here. Consequently, in the following sections we discuss only certain special aspects of testing software for smart card microcontrollers. Glenford J. Myers’ book [Myers 95] can be considered representative of the literature on this subject. We would also like to point out that military standards, in particular, contain many good and well-proven methods for producing and testing software.

Evaluation
Due to their ability to store data securely, smart cards are primarily employed in security-sensitive areas. However, smart cards can be used to advantage not only for the secure storage of data, but equally well for the secure execution of cryptographic algorithms. The field of electronic payments, in particular, is an expanding market for smart cards. Since enormous amounts of money flow in a widely distributed system, the application provider or card issuer must have a high degree of confidence in the semiconductor manufacturer, the producer of the operating system and the smart card personalizer. The application provider must be able to be absolutely certain that the software in the smart card performs the required financial transactions without any errors and that it is free of security leaks, not to mention trapdoors deliberately introduced into the software. For example, suppose a secret command could be sent to the smart card to read out the PIN and all secret keys. In the case of a GSM or Eurocheque card, an attacker would then be able to clone any number of cards and sell them in perfect working order.

These security requirements relate not only to manufacturing the smart cards, but equally to their initialization and personalization, since the secret keys and PIN are loaded into the cards in these stages. The card issuer must therefore place a high degree of trust in the card provider with regard to security. This also applies to the fundamental security of the software in the smart card. Problems can arise even if a ‘trap door’ has not been intentionally included in the software to allow data to be spied out of the card. Faulty operation of the software could very well make it possible to read data from the card, or write data to it, using a combination of commands that is not used in normal processes. Although the likelihood of such a coincidence is extremely low, it is nevertheless well known that, given the current state of software technology, it is impossible to guarantee that programs are free of errors under all conditions. It is certain that in the future, companies that produce software for smart cards will no longer be able to deny all responsibility for such errors by means of the usual legalistic disclaimers.

There are only two ways in which the application provider can test the trustworthiness of a product: he can either test all possible variations of the smart card software himself, or he can have the software tested by a trustworthy third party. The first option is frequently possible only to a limited degree, since the provider usually does not have all the necessary technical expertise and capabilities. The second option, assigning the tests to another party, is currently regarded by all concerned as an acceptable solution. This same problem has existed for many years with software and systems developed for military use, so it is not something new in the smart card world.

In order to establish metrics for the trustworthiness of software products, which means making trustworthiness objectively measurable, the US National Computer Security Center (NCSC), founded in 1981 by the American Department of Defense (DoD), issued a catalog of criteria for evaluating the trustworthiness of information technology systems in 1983. The publication of the ‘Trusted Computer System Evaluation Criteria’ (TCSEC) followed in 1985. This book had an orange binding, so it has come to be generally known as the ‘Orange Book’.
These criteria serve as guidelines to the NCSC for the certification of information technology systems. The TCSEC has become an international model for practically all criteria catalogs in the information technology field. In Europe, specifically European criteria have been defined, although they are based on the TCSEC. They were first published in 1990 as the ‘Information Technology Security Evaluation Criteria’ (ITSEC), and a revised version was issued in 1991.

The Common Criteria (CC) were created in order to provide a uniform international standard for evaluating the trustworthiness of software. They can be regarded as representing the essential elements of the TCSEC and the ITSEC, and they are also better organized for the evaluation of software than either of those catalogs. Although the first version of the Common Criteria was published as early as 1996, it has not yet supplanted the TCSEC or the ITSEC. The Common Criteria have also been published as an international standard (ISO 15408). In contrast to the ITSEC, which has six levels, the Common Criteria have seven levels of trustworthiness. It is relatively easy to make the transition from an evaluation based on the TCSEC or the ITSEC to one based on the Common Criteria, since all of these catalogs have many features in common. However, since in the smart card field in particular the ITSEC is still used as the essential basis for software evaluation, we refer only to this catalog in the following description.

Occasionally, the requirements of the FIPS 140-2 standard are taken into account in performing evaluations, in addition to the ITSEC and the CC. This standard specifies four possible security levels for security modules, which can be considered to include smart cards, and provides detailed descriptions of seven requirement areas related to security. The contents of this standard are very practically oriented and also deal with details of technical implementation, such as criteria for the quality of random-number generators; a simple illustration of such a statistical check is sketched at the end of this section.

Regardless of the method used, an evaluation process has four characteristics. First, it must be unbiased, which means that the evaluator must not have any preconceived ideas regarding the item to be evaluated or its producer. The second characteristic is that the evaluation process must be objective and structured so that the significance of personal opinions is minimized. The third characteristic is repeatability: the same result must be obtained if the evaluation process is repeated. The final characteristic is reproducibility: a different tester or testing agency must reach the same conclusions.

One of the most important considerations in any evaluation is defining the security targets for the target of evaluation (TOE). The target of evaluation is the object to be tested, and the security targets describe the mechanisms to be tested. Incidentally, an evaluation can be dramatically simplified by carefully selecting the security targets, since elements that are critical with regard to security can thereby be excluded. This is just a trick that can be used to achieve a high evaluation level in the quickest and least costly manner possible. Naturally, the actual security can only suffer as a result.
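To illustrate the kind of practically oriented requirement mentioned above in connection with FIPS 140-2, the following sketch shows a simple statistical check of random-number quality in the style of a monobit test. It is only an illustration: the sample size of 20,000 bits, the acceptance bounds and all names used here are assumptions chosen for the example, and any real implementation would have to follow the exact figures and complete test suite given in the standard itself.

/*
 * Illustrative sketch only: a monobit-style quality check for a
 * random-number generator. Sample size and acceptance bounds are
 * assumed values for this example, not authoritative figures.
 */
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

#define RNG_SAMPLE_BITS   20000                    /* sample size in bits      */
#define RNG_SAMPLE_BYTES  (RNG_SAMPLE_BITS / 8)
#define MONOBIT_LOWER     9725                     /* illustrative lower bound */
#define MONOBIT_UPPER     10275                    /* illustrative upper bound */

/* Count the '1' bits in the sample and check that the count lies strictly
 * inside the acceptance interval. Returns 1 for pass, 0 for fail. */
static int monobit_test(const uint8_t *sample, size_t len)
{
    unsigned ones = 0;

    for (size_t i = 0; i < len; i++) {
        uint8_t byte = sample[i];
        while (byte) {                 /* count the set bits in each byte */
            ones += byte & 1u;
            byte >>= 1;
        }
    }
    return (ones > MONOBIT_LOWER) && (ones < MONOBIT_UPPER);
}

int main(void)
{
    /* In a real security module the sample would come from the hardware
     * random-number generator; a constant buffer is used here only to
     * demonstrate the interface (and therefore fails the test). */
    uint8_t sample[RNG_SAMPLE_BYTES] = { 0 };

    printf("monobit test: %s\n",
           monobit_test(sample, sizeof sample) ? "pass" : "fail");
    return 0;
}

A check of this kind is typically performed as a power-up self-test inside the module: if the sample drawn from the generator falls outside the acceptance interval, the module refuses to deliver random numbers rather than handing out output of doubtful quality.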