The alphanumeric data in the files and data objects of smart cards can be stored in a wide variety of formats. In part, this is a result of intensive memory space optimization measures and a lack of general agreement among the various applications and specifications, and in part it is due to the triumphant progress of smart cards in countries outside of Western Europe, which have their own alphabets. In such situations, the original 7- and 8-bit character sets must be replaced by more powerful coding schemes for alphanumeric data.

7-bit code
A total of 128 (2^7) characters can be represented using a 7-bit code. The most widely used international 7-bit code, which is commonly known as the ASCII (American Standard Code for Information Interchange) code, is specified in ISO/IEC 646. The importance of ASCII has been steadily decreasing for many years, since the number of characters it can represent is much too small.
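The limitation of a 7-bit code can be illustrated with a short Python sketch. The helper name `fits_in_7_bits` and the sample strings are assumptions chosen for illustration; they do not come from any smart card specification.

```python
def fits_in_7_bits(data: bytes) -> bool:
    """Return True if every byte lies in the 7-bit range 0x00..0x7F."""
    return all(value < 0x80 for value in data)


# A plain English string uses only characters from the 7-bit table.
plain = "Smart Card".encode("ascii")

# "Zürich" contains u-umlaut (U+00FC), which has no 7-bit ASCII code.
# Here it is coded with the 8-bit Latin 1 table instead.
umlaut = "Z\u00fcrich".encode("iso8859_1")

plain_ok = fits_in_7_bits(plain)
umlaut_ok = fits_in_7_bits(umlaut)

# Trying to force the second string into ASCII fails outright.
try:
    "Z\u00fcrich".encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
```

A check of this kind could be used to decide whether a text field can be stored in a 7-bit format or needs one of the wider codes described below.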

8-bit code
The most commonly used 8-bit code (2^8 = 256 characters) is derived from the 7-bit ASCII code and is standardized in ISO/IEC 8859. It consists of two 7-bit code tables specifying control characters and printable characters. The lower-order table is identical to the 7-bit ASCII table and is always the same. The higher-order table can vary to accommodate a country-specific character set. Probably the best-known higher-order code table is Latin 1, which contains the characters specific to the countries of Western Europe. Latin 2, by contrast, contains the special characters for the East European countries. ISO/IEC 8859 consists of 16 parts in total, which define a series of higher-order code tables for the character sets of various languages. The characters of the Latin 1 code table in ISO/IEC 8859 are also found, in slightly modified codings, in DOS as ‘Code Table 850’ according to the IBM register, in the form of ‘PC ASCII’, and as ‘ANSI code’ under Windows. EBCDIC (‘extended binary coded decimal interchange code’), which is widely used in mainframe computers, is not used with smart cards.
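The effect of exchanging the higher-order table can be sketched in a few lines of Python. The codec names (`iso8859_1`, `iso8859_2`, `cp850`) are those of the Python standard library, not terms from ISO/IEC 8859, and the byte values were chosen only as examples.

```python
# One byte from the higher-order table, interpreted with two different tables.
raw = bytes([0xA3])
as_latin1 = raw.decode("iso8859_1")   # Latin 1: pound sign (U+00A3)
as_latin2 = raw.decode("iso8859_2")   # Latin 2: L with stroke (U+0141)

# The lower-order table (printable range shown here) is the same everywhere.
lower = bytes(range(0x20, 0x7F))
same_lower = (
    lower.decode("ascii")
    == lower.decode("iso8859_1")
    == lower.decode("iso8859_2")
)

# The same character has different codes in Latin 1 and in DOS code table 850.
a_umlaut = "\u00e4"                        # a-umlaut
in_latin1 = a_umlaut.encode("iso8859_1")   # one byte, value 0xE4
in_cp850 = a_umlaut.encode("cp850")        # one byte, value 0x84
```

This is why a terminal has to know which code table a card application uses before it displays stored text: the byte value alone does not identify the character.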

16-bit code (Unicode)
Codes with a width of 16 bits allow 65,536 (2^16) characters to be represented. The only example of such a code is Unicode, which was developed as a private initiative by the Unicode Consortium [Unicode] as an industry standard. The first 256 Unicode characters are identical to ISO/IEC 8859 Latin 1, so there is at least upward compatibility in this part of the character coding. Although the number of characters that can be represented with a 16-bit code is sufficient to represent the characters of the most important living languages, it is unfortunately not sufficient to represent all existing characters. To compensate for this, a sort of escape sequence (‘surrogate pairs’) has been incorporated into the 16-bit character code in the current version of Unicode (3.0). A character outside the 16-bit range is coded as a pair of 16-bit values taken from two reserved ranges, so that approximately one million additional characters can be represented.
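The surrogate mechanism can be sketched as follows. The function name `to_surrogate_pair` is hypothetical; the range constants (0xD800, 0xDC00, 0x10000) are those of the Unicode standard and are not given in the text above.

```python
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a code point above the 16-bit range into two 16-bit values."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above 0xFFFF need a surrogate pair")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # upper 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # lower 10 bits
    return high, low


# Example: U+1D11E (musical symbol G clef) lies outside the 16-bit range.
high, low = to_surrogate_pair(0x1D11E)
pair_bytes = high.to_bytes(2, "big") + low.to_bytes(2, "big")

# Cross-check against Python's own UTF-16 encoder.
reference = chr(0x1D11E).encode("utf-16-be")

# Each half of the pair carries 10 bits, giving 1024 * 1024 combinations.
additional_characters = 1024 * 1024
```

Since each of the two values contributes 10 bits, the pair addresses 1,048,576 additional characters, which is the 'one million' mentioned above.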

32-bit code (UCS)
Unicode was originally limited to 65,536 characters. Although this limitation does not cause problems in everyday use, it can be avoided by using an extended character coding scheme. ISO/IEC 10646 specifies a 32-bit code called the ‘Universal Character Set’ (UCS), which allows 4,294,967,296 (2^32) characters to be represented, although only half of the available codes (2,147,483,648) are actually used. The four bytes of the UCS are called (in decreasing order of significance) group, plane, row and cell. UCS thus consists of 256 groups of 256 planes, each of which has 256 rows of 256 cells. A plane thus specifies 65,536 characters. The lowest plane, which is Group 0, Plane 0, is called the ‘basic multilingual plane’ (BMP) and is identical to Unicode. The lowest row, which is Group 0, Plane 0, Row 0, thus automatically corresponds to the character set of ISO/IEC 8859 Latin 1, and the first 128 characters are thus identical to the ASCII code.

This can be illustrated using a brief example. The letter ‘A’ is coded as ‘41’ in 7-bit ASCII and 8-bit ISO/IEC 8859 Latin 1. Since the first 256 characters of Unicode are identical to ISO/IEC 8859 Latin 1, the letter ‘A’ is coded as ‘00 41’ in 16-bit Unicode and as ‘00 00 00 41’ in 32-bit UCS. UCS is the only character coding scheme that allows all characters of all living and dead languages to be coded using unique numerical values. Consequently, UCS is the most important coding scheme for future use, despite its memory requirement of four bytes per character.

There are three commonly used schemes, called ‘UCS transformation formats’ (UTFs), for translating the codes of 32-bit UCS and 16-bit Unicode characters. UTF-8 translates characters into variable-length byte strings in which the 128 ASCII characters keep their single-byte ASCII codes. UTF-16 uses 16 bits for coding and thus corresponds to the BMP of UCS, which also uses two bytes for coding. The two-byte coding of the BMP is also referred to as ‘UCS-2’. UTF-32 corresponds to the usual four-byte representation of UCS, for which reason it is also referred to as ‘UCS-4’.
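The relationship between the codes can be checked with a small Python sketch. It splits a UCS value into its four bytes (group, plane or level, row and cell) and codes the letter ‘A’, whose code value is 41 hexadecimal, in each format. The helper names `split_ucs4` and `hex_bytes` are assumptions made for this illustration, and the codec names are those of the Python standard library.

```python
def split_ucs4(code_point: int) -> dict:
    """Split a 32-bit UCS value into its four bytes, most significant first."""
    b = code_point.to_bytes(4, "big")
    return {"group": b[0], "plane": b[1], "row": b[2], "cell": b[3]}


def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated two-digit hexadecimal values."""
    return " ".join(f"{value:02X}" for value in data)


letter = "A"
a_ascii = hex_bytes(letter.encode("ascii"))       # 7-bit code
a_latin1 = hex_bytes(letter.encode("iso8859_1"))  # 8-bit code
a_utf8 = hex_bytes(letter.encode("utf-8"))        # one byte, same as ASCII
a_utf16 = hex_bytes(letter.encode("utf-16-be"))   # two bytes
a_utf32 = hex_bytes(letter.encode("utf-32-be"))   # four bytes
a_fields = split_ucs4(ord(letter))                # Group 0, Plane 0, Row 0

# A character outside Latin 1: the euro sign, U+20AC, in row 0x20 of the BMP.
euro = "\u20ac"
euro_fields = split_ucs4(ord(euro))
euro_utf8 = hex_bytes(euro.encode("utf-8"))       # three bytes in UTF-8
euro_utf16 = hex_bytes(euro.encode("utf-16-be"))  # two bytes
```

The comparison shows the memory trade-off that matters on a smart card: UTF-8 needs one byte for ‘A’ but three for the euro sign, whereas UTF-32 always needs four.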

This book uses SDL notation to describe states and state transitions. For some years, this approach has been used ever more frequently in the smart card domain to describe state-oriented mechanisms, such as those used for communication protocols. ‘SDL’ stands for ‘Specification and Description Language’, and it is described in detail in CCITT Recommendation Z.100.

SDL notation is similar to the notation used in standard flowcharts. However, it does not describe program flows, but instead states and state transitions. SDL diagrams are constructed using standardized individual symbols interconnected by lines. The flow is always from top left to bottom right, so the lines connecting individual symbols do not need arrowheads to identify their start and end points.

In simplified form, the notation can be regarded as a description of a system consisting of a certain number of processes, where each process is a state machine. If a state machine is in a stable state, it can receive a signal from outside. Depending on the data it receives, the machine may then attain a specific new state. Additional actions may occur between the initial and final states, such as receiving and transmitting data or computing a value.

Figure 4.6 shows the 10 symbols used in this book. They are a selection from a much larger set defined in Z.100, but they suffice as a basic set for use with smart cards. The Start symbol (1) denotes the beginning of a process. Most SDL diagrams begin with this symbol. The Task symbol (2) indicates a specific activity, which is described by text within the box. With this symbol, there is no additional detailed description in the form of a subroutine. The Decision symbol (3) allows a query during a state transition, to which the answer may be ‘yes’ or ‘no’. The Label symbol (4) marks a link to another SDL diagram and is primarily used to divide large diagrams into several smaller diagrams.

The Input (5) and Output (6) symbols represent interfaces to the outside world. The exact input and output parameters are described inside the symbol. The State symbol (7) is used to describe a state. The state attained at each stage is indicated by this symbol. The final three symbols describe subroutines. The Subroutine symbol (8) indicates that the content of this box is described in more detail elsewhere. The Subroutine start (9) and Subroutine end (10) symbols delimit the detailed description of a subroutine.
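The idea of a process as a state machine can be sketched in Python. Everything in this example is hypothetical and chosen only for illustration: the class `Process`, the states `IDLE` and `VERIFIED`, the signals `VERIFY` and `RESET`, and the reference value `1234`. Comments indicate which SDL symbol each part would roughly correspond to.

```python
REFERENCE_PIN = "1234"  # hypothetical reference value, for illustration only


class Process:
    """A process modelled as a state machine driven by external signals."""

    def __init__(self, start_state, transitions):
        self.state = start_state        # Start symbol: initial stable state
        self.transitions = transitions  # (state, signal) -> handler
        self.outputs = []               # everything sent to the outside world

    def receive(self, signal, data=None):
        """Input symbol: accept a signal while in a stable state."""
        handler = self.transitions.get((self.state, signal))
        if handler is None:
            return self.state           # signal not expected in this state
        self.state = handler(self, data)  # State symbol: new stable state
        return self.state


def on_verify(process, data):
    # Decision symbol: does the received value match the reference value?
    if data == REFERENCE_PIN:
        process.outputs.append("OK")         # Output symbol
        return "VERIFIED"
    process.outputs.append("REJECTED")       # Output symbol
    return "IDLE"


def on_reset(process, data):
    process.outputs.append("RESET DONE")     # Output symbol
    return "IDLE"


card = Process(
    "IDLE",
    {("IDLE", "VERIFY"): on_verify, ("VERIFIED", "RESET"): on_reset},
)

after_wrong = card.receive("VERIFY", "0000")   # stays in IDLE
after_right = card.receive("VERIFY", "1234")   # moves to VERIFIED
after_reset = card.receive("RESET")            # returns to IDLE
```

Keeping the transitions in a table keyed by state and signal mirrors the way an SDL diagram is read: each stable state lists the inputs it accepts, and each input leads through decisions and outputs to the next state.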