STRUCTURING DATA
Storing or transmitting data unavoidably requires an exact definition of the data in question and their structure. Only then is it possible to subsequently recognize and interpret the data elements. Fixed-length data structures with non-modifiable sequences regularly cause systems to ‘collapse’. The best example of this is the conversion of the many different European currencies to the euro. All systems and data structures with fixed currency definitions had to be upgraded at considerable cost. The same difficulties manifest themselves in many smart card applications. Fixed data structures that need to be extended or shortened sooner or later give rise to considerable effort and expense. However, the problem of structuring data has been around for a long time, and there is an adequate choice of methods that can be used to solve it. One method that is very popular in the world of smart cards, and which is coming into more general use in informatics, comes from the field of data transmission. It is called Abstract Syntax Notation One, or ASN.1 for short. This is a coding-independent notation for describing data objects, originally developed for transmitting data between different computer systems. An alternative to ASN.1 would be to use the Extensible Markup Language (XML) to structure data, but up to now this method has not gained a foothold in real applications in the smart card world.

In principle, ASN.1 is a sort of artificial language that is suitable for describing data and data structures, rather than programs. The syntax is standardized in ISO/IEC 8824, and the coding rules are defined by ISO/IEC 8825. Both of these standards were developed from Recommendation X.409 of the CCITT. Describing ASN.1 in detail would require a book of its own, so here we only address a few essential aspects in order to give a general idea of how it works. For further information, we suggest you consult the relevant literature, such as Walter Gora [Gora 98]. ASN.1 has a number of elementary (‘primitive’) data types and composite (‘constructed’) data types. It is also possible to extend the syntax of ASN.1 using macros in order to obtain any desired enhancements to ASN.1. Listings 4.1 through 4.3 show some simple examples of how ASN.1 can be used, including defining and coding data. The basic idea of coding data using ASN.1 is to prefix each data object with a unique label and information about its length. The rather complex syntax of the description language also allows users to define their own data types and nest data objects. The original idea, which was to create a generally valid syntax that could form the basis for data exchange between fundamentally different computer systems, is scarcely used in smart cards. Currently, only a very small part of the available syntax is used in this area, mainly due to the very limited memory capacity of smart cards.

The Basic Encoding Rules (BER) for ASN.1 are defined in the ISO/IEC 8825 standard. Data objects created according to these rules are called BER-TLV-coded data objects. A BER-coded data object has a label (called a ‘tag’), a length field and the actual data part, with an optional end marker. Certain bits in the tag are predefined by the coding rules. The actual structure is shown in Figure 4.1. The Distinguished Encoding Rules (DER) form a subset of the BER. These coding rules specify, among other things, the coding of the length information, which may be one, two or three bytes long. A basic summary of the BER and DER can be found in Burton Kaliski [Kaliski 93]. ASN.1 objects are coded using the classic TLV structure, in which ‘T’ (tag) denotes the object’s label, ‘L’ (length) refers to its length and ‘V’ (value) is the actual data. The first field
of a TLV structure is the tag for the data object in the following V field. To avoid the need for each user to define his or her own tags, which would open the door to incompatibility, there are standards that define tags for various, frequently used data structures. ISO/IEC 7816-6, for example, defines tags for objects used in general industrial applications, ISO/IEC 7816-4 defines tags for secure messaging, and EMV also defines several other tags. It is by no means the case that a given tag is universally used for the same type of data element, but a process of standardization is essentially taking place.
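The one-, two- or three-byte length field mentioned above can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard BER definite-length form (a single byte for lengths up to 127, otherwise a prefix byte 0x81 or 0x82 followed by one or two length bytes). The function names encode_length and decode_length are hypothetical, and the indefinite form with an end marker is deliberately not handled.

```python
def encode_length(n: int) -> bytes:
    """Encode a length in the shortest definite form (1, 2 or 3 bytes)."""
    if n < 0 or n > 0xFFFF:
        raise ValueError("length must be in the range 0..65535 for this sketch")
    if n <= 0x7F:
        return bytes([n])                      # short form: one byte
    if n <= 0xFF:
        return bytes([0x81, n])                # long form: prefix + 1 byte
    return bytes([0x82, n >> 8, n & 0xFF])     # long form: prefix + 2 bytes


def decode_length(buf: bytes, offset: int = 0) -> tuple:
    """Return (length, offset of the first value byte)."""
    first = buf[offset]
    if first <= 0x7F:
        return first, offset + 1
    count = first & 0x7F                       # number of length bytes that follow
    if count not in (1, 2):
        # 0x80 (indefinite form) and longer fields are outside this sketch
        raise ValueError("unsupported length form")
    raw = buf[offset + 1 : offset + 1 + count]
    if len(raw) != count:
        raise ValueError("truncated length field")
    return int.from_bytes(raw, "big"), offset + 1 + count
```

Always choosing the shortest form, as encode_length does, is what distinguishes a DER-style length from the more permissive BER encoding.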

The two most significant bits of the tag encode the class of the following data object. The class indicates the general type of the data object. The universal class indicates general data objects, such as integers and character strings. The application class indicates that the data object belongs to a particular application or standard (e.g. ISO/IEC 7816-6). The other two classes, context-specific and private, fall under the heading of non-standardized applications. The bit following the two class bits indicates whether the tagged object is constructed from other data objects. The five least-significant bits are the actual label. Since five bits can represent only the tag numbers 0 through 30 in this scheme, setting all five bits to 1 indicates that the tag number is given in the following byte. All values from 31 to 127 are allowed in the second byte. Bit 8
of the second byte is a pointer that is reserved for future use, so it cannot presently be used. The required number of length bytes is shown in Table 4.3. The standard also defines the term ‘template’. A template is a data object that serves as a container for other data objects. ISO/IEC 7816-6 defines the tags for possible data objects in the domain of industry-wide applications of smart cards. ISO 9992-2 covers the domain of smart card financial transactions. This method of data encoding has several characteristics that are particularly useful in the field of smart cards. Since the available memory space is almost always scarce, using data objects based on ASN.1 can produce considerable space savings. TLV encoding makes it possible to transfer and store variable-length data without complications, which allows memory to be used very economically. This is illustrated in Figure 4.2, which shows the TLV encoding of a name.
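The bit layout of the tag described above can be made concrete with a short Python sketch. It assumes tags of at most two bytes, as in the text, and the name decode_tag and the tuple it returns are illustrative choices rather than part of any standard.

```python
CLASS_NAMES = ("universal", "application", "context-specific", "private")


def decode_tag(buf: bytes, offset: int = 0) -> tuple:
    """Return (class name, constructed flag, tag number, offset after the tag)."""
    first = buf[offset]
    tag_class = CLASS_NAMES[first >> 6]        # bits 8 and 7: class
    constructed = bool(first & 0x20)           # bit 6: constructed or primitive
    number = first & 0x1F                      # bits 5..1: tag number
    offset += 1
    if number == 0x1F:                         # all five bits set: see next byte
        if offset >= len(buf):
            raise ValueError("truncated tag")
        second = buf[offset]
        if second & 0x80:
            raise ValueError("bit 8 of the second tag byte is reserved")
        if second < 31:
            raise ValueError("second tag byte must be in the range 31..127")
        number = second
        offset += 1
    return tag_class, constructed, number, offset
```

For instance, the byte 0x30 decodes as a constructed universal object with tag number 16, while the byte pair 0x5F 0x20, used here purely as a sample two-byte tag, decodes as a primitive application-class object with tag number 32.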

Subsequent extensions to data structures can be undertaken very easily with ASN.1, since all that is necessary is to insert additional TLV-coded data objects into the existing data structure. Full compatibility with the previous version is retained as long as the previous TLV objects are not deleted. The same is true of new versions of data structures in which changes have been made with respect to the previous coding. This is a straightforward process that only requires modifications to the tags. It is equally simple to represent the same data using different codings. Collectively, these advantages explain why the ASN.1 syntax, based on TLV coding, is particularly popular in the smart card industry. The main disadvantage of ASN.1 data objects is that the administrative data overhead is rather high if the volume of user data is small. For example, if the user data is only one byte, two additional bytes (tag and length) are still needed for its administrative data. However, the larger the volume of the user data, the more favorable the ratio becomes. The ASN.1 structured data in the German health insurance card form a good example of this. There are between 70 and 212 bytes of user data. The administrative data amount to 36 bytes, which means that the administrative overhead, measured relative to the user data, ranges from 17 to 51%.
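The overhead figures quoted above can be checked with a few lines of Python. The helper name overhead_percent is hypothetical. The sketch assumes that the overhead is expressed as administrative bytes divided by user bytes, which is the reading that reproduces the stated range.

```python
def overhead_percent(admin_bytes: int, user_bytes: int) -> float:
    """Administrative data as a percentage of the user data."""
    if user_bytes <= 0:
        raise ValueError("user_bytes must be positive")
    return 100.0 * admin_bytes / user_bytes


# German health insurance card: 36 administrative bytes, 70 to 212 user bytes
best_case = overhead_percent(36, 212)    # largest amount of user data
worst_case = overhead_percent(36, 70)    # smallest amount of user data

# A single byte of user data with a one-byte tag and a one-byte length
single_byte = overhead_percent(2, 1)
```

With these inputs, best_case rounds to 17 and worst_case to 51, while the single-byte object carries an overhead of 200 percent.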

We can recapitulate all the above with a further example. Suppose we wish to store surnames, given names and titles in a file with a transparent data structure. Irrespective of the proper ASN.1 description, the TLV-coded data will have the structure shown in Figure 4.4. The tags used in this example have been freely chosen and thus do not correspond to any relevant standard. When evaluating this data structure, the computer compares the first tag with all tags known to it. If it finds a match, then it recognizes the first object as a given name. It reads the length of this object from the next byte. The subsequent bytes are then the actual object, i.e. the given name. This is followed by the next TLV object, whose first byte is the tag for a surname. The computer recognizes this using exactly the same process as for the first object. If it becomes necessary to extend the data structure, e.g. by adding a title, a new type of data object can simply be inserted into the existing structure. The insertion point is unimportant. The extended structure remains fully compatible with the previous version, since the new type of data object receives its own tag and is thus unambiguously identified. Programs that only know the old tags will not be upset by the new one, since they do not recognize it and thus automatically skip it. Other programs that do know the new tag can evaluate it, but even if the old structure is used, they will not experience any problems.
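The evaluation procedure described above, including the skipping of unknown tags, can be sketched as follows. This is an illustration under stated assumptions, not the coding of Figure 4.4: the tag values 0x81, 0x82 and 0x83 are freely chosen, only single-byte tags and single-byte lengths are handled, and the names build_tlv and parse_known are hypothetical.

```python
GIVEN_NAME, SURNAME, TITLE = 0x81, 0x82, 0x83   # freely chosen example tags


def build_tlv(tag: int, value: bytes) -> bytes:
    """Prefix a value with a one-byte tag and a one-byte length."""
    if len(value) > 0x7F:
        raise ValueError("this sketch supports values of at most 127 bytes")
    return bytes([tag, len(value)]) + value


def parse_known(data: bytes, known_tags: dict) -> dict:
    """Walk a sequence of TLV objects, keeping known tags and skipping the rest."""
    found = {}
    pos = 0
    while pos < len(data):
        if pos + 2 > len(data):
            raise ValueError("truncated TLV header")
        tag, length = data[pos], data[pos + 1]
        if length > 0x7F:
            raise ValueError("multi-byte lengths are not handled in this sketch")
        end = pos + 2 + length
        if end > len(data):
            raise ValueError("truncated TLV value")
        if tag in known_tags:                   # recognized: evaluate the value
            found[known_tags[tag]] = data[pos + 2 : end].decode("ascii")
        pos = end                               # unknown tags are simply skipped
    return found


OLD_PROGRAM = {GIVEN_NAME: "given_name", SURNAME: "surname"}
NEW_PROGRAM = {**OLD_PROGRAM, TITLE: "title"}

old_record = build_tlv(GIVEN_NAME, b"Anna") + build_tlv(SURNAME, b"Meier")
new_record = build_tlv(TITLE, b"Dr") + old_record   # insertion point is arbitrary
```

Parsing new_record with OLD_PROGRAM yields only the given name and surname, because the title object is stepped over by its length. Parsing old_record with NEW_PROGRAM also succeeds, since the title is simply absent from the result.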