Supplementary hardware
Some requirements specific to smart cards cannot be fully satisfied in software, nor by the hardware of conventional microcontrollers, and must therefore be met by supplementary hardware. Consequently, the various manufacturers of smart card microcontrollers offer a wide range of supplementary functions in the form of on-chip hardware. The most commonly used components for supplementary functions are described below. They do not all have to be present in any particular microcontroller; which ones are present depends strongly on the target application, among other things. For example, it would be economically unreasonable to integrate an RSA coprocessor into a microcontroller whose target application uses only symmetric cryptographic algorithms. Nevertheless, there are a few commercially available microcontrollers that include nearly all of the components described below.

Another aspect of supplementary functionality in smart card microcontrollers relates to the general subject of security. Chapter 8, ‘Security Techniques’, contains extensive descriptions of supplementary functions implemented in hardware that are primarily intended to counter possible attacks. Consequently, here we describe only those components whose primary purpose is not enhancing security against attacks.

Hardware-based data transmission (UARTs)
The only communications between a smart card and the outside world take place via a bidirectional serial interface. Originally, data transmission and reception via this interface were controlled exclusively by operating system software, without any hardware support. This requires very complex software, and it creates additional potential sources of software errors. However, the main problem is that the speed of software-based data transmission is limited, since the speed of the processor is limited. With current processors, the upper limit is represented by a divider value (clock rate conversion factor) of around 30, which yields a data transmission rate of approximately 115 kbit/s with a 3.5-MHz clock. If higher communication speeds are desired or required, it is necessary to use either internal clock multiplication or a UART (universal asynchronous receiver–transmitter) component. As the name suggests, such a component is a general-purpose component for transmitting and receiving data independent of the processor. It is not limited by the speed of the processor, nor does it need software for communicating at the byte level. Of course, the upper layers of the data transmission protocol still must be present in the smart card in the form of software, but the lowest layer is implemented in hardware in the UART.
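The relationship between clock, divider value and data transmission rate quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch; the function name `bit_rate` and the exact clock of 3.5712 MHz (a commonly used value that the text rounds to 3.5 MHz) are assumptions made for illustration.

```python
def bit_rate(clock_hz: float, divider: int) -> float:
    """Return the serial bit rate in bit/s for a given clock and divider value."""
    if divider <= 0:
        raise ValueError("divider must be a positive integer")
    return clock_hz / divider


# Assumed clock of 3.5712 MHz (rounded to 3.5 MHz in the text).
CLOCK_HZ = 3_571_200

print(bit_rate(CLOCK_HZ, 372))  # divider value 372
print(bit_rate(CLOCK_HZ, 31))   # divider value of around 30
```

With these assumed values, a divider of 372 gives 9600 bit/s and a divider of 31 gives 115 200 bit/s, which matches the figure of approximately 115 kbit/s.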

Current UARTs can generally work with divider values smaller than 372, in line with ISO/IEC 7816-3, and some of them can transmit and receive data with divider values as small as 1. There is a wide range of implementations of this function. Some UARTs can transmit and receive only single bytes, and only support byte retransmission according to the T=0 protocol in the event of a transmission error. With such UARTs, all the processor has to do is supply the necessary data to the UART on time and read it from the UART on time. Reception of a complete byte can be signaled to the processor by a flag or interrupt. All more advanced UARTs can transmit or receive multiple bytes in succession. The highest level of technical capability is presently provided by UARTs that can directly transmit data from the RAM or store received data in the RAM using direct memory access (DMA), without the intervention of the processor and in parallel with the other activities of the processor.

It has been technically feasible to implement UARTs in smart card microcontrollers since the origin of smart cards, but until the end of the 1990s, transmission and reception routines implemented in ROM software required less physical area on the silicon than a functionally equivalent UART component. Since the total surface area is a decisive cost factor for smart card microcontrollers, for a long time nearly all semiconductor manufacturers rejected the hardware approach. However, conditions have changed with increasing integration density. UARTs with a wide range of capabilities are now standard components of all smart card microcontrollers.

Many new types of microcontrollers also allow a USB interface to be placed on the chip as an optional component, in addition to a UART. With such an interface, it would be possible to exchange data with a terminal using the USB protocol with hardware support. Unfortunately, up to now it is effectively not possible to use this USB hardware extension, since USB on smart cards is not yet covered by any standard, which means that it is impossible to guarantee the mutual compatibility of smart cards and terminals.

Timers and watchdogs
Timers in smart card microcontrollers are connected to the internal processor clock or UART clock (which counts etus) via a configurable divider. They usually have a counting range of 16 or (more rarely) 32 bits. Using a timer, the number of clock pulses from the Start command to the End command can be measured without involving the processor. Most timers can also be used in the reloadable mode, in which they count down from a predefined value and trigger an interrupt when the count reaches zero, after which the counter is automatically reset to the initial value and continues counting.

A watchdog is also often present in the microcontroller. In principle, a watchdog is a timer that must be regularly reset by an explicit processor instruction, in order to prevent it from timing out after a preset interval and triggering a reset. A watchdog allows the processor to be reset to a defined state after a definable maximum interval if it becomes trapped in an endless loop. The primary application for watchdogs is in the autonomous controller environment, where they are highly useful. However, they are not of much use for smart cards, partly because the software is (hopefully) extremely reliable and partly because the terminal can always interrupt the processor if it lands in an endless loop. Consequently, watchdogs are generally not used in smart card microcontrollers.
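The behavior of a reloadable timer and of a watchdog can be modeled in a few lines. This is a behavioral sketch only: the class names, the callback standing in for the interrupt, and the `kick` method standing in for the explicit processor instruction are all hypothetical.

```python
from typing import Callable


class ReloadTimer:
    """Counts down from a reload value, fires a callback at zero, then reloads."""

    def __init__(self, reload_value: int, on_timeout: Callable[[], None]) -> None:
        if reload_value <= 0:
            raise ValueError("reload value must be positive")
        self.reload_value = reload_value
        self.on_timeout = on_timeout  # stands in for the timer interrupt
        self.count = reload_value

    def tick(self, pulses: int = 1) -> None:
        for _ in range(pulses):
            self.count -= 1
            if self.count == 0:
                self.on_timeout()
                self.count = self.reload_value  # automatic reload


class Watchdog:
    """Requests a reset unless it is restarted before the interval elapses."""

    def __init__(self, timeout: int) -> None:
        self.timeout = timeout
        self.count = timeout
        self.reset_requested = False

    def kick(self) -> None:
        self.count = self.timeout  # the explicit 'reset watchdog' instruction

    def tick(self, pulses: int = 1) -> None:
        for _ in range(pulses):
            if self.reset_requested:
                return
            self.count -= 1
            if self.count == 0:
                self.reset_requested = True


interrupts = []
timer = ReloadTimer(4, lambda: interrupts.append("timeout"))
timer.tick(10)
print(len(interrupts), timer.count)  # two timeouts, two pulses into the third period
```

The only difference between the two models is what happens at zero: the timer signals an event and carries on, whereas the watchdog forces the processor back to a defined state.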

Internal clock multiplication and generation
The demands on the processing power of smart cards are constantly increasing. This applies to the processor as well as to the components that support cryptographic algorithms. One way to meet these demands is to simply increase the frequency of the clock applied to the microcontroller, since processing power is proportional to the clock rate. Doubling the clock rate thus doubles the performance of the processor. However, for reasons of compatibility with applicable standards, it is generally not possible to increase the external clock rate above 5 MHz. To get around this restriction, the internal clock frequency of the microcontroller can be increased by using a clock multiplier. This is technically realized using a phase-locked loop (PLL) circuit, which is a well-proven technique, or an RC oscillator. For instance, a smart card connected to an external 5-MHz clock can be operated internally at 20 MHz, which provides significant benefits with regard to computation times for complex cryptographic algorithms or running a Java virtual machine. Nevertheless, when clock multiplication or an internal clock generator is used, it must be remembered that a higher clock rate causes a proportional increase in current consumption. As a rule, the relationship between clock frequency and current consumption is linear, which means that quadrupling the clock frequency (for example) also quadruples the current consumption. Particularly with battery-operated terminals, increased current consumption is not desirable.

An elegant solution to this problem is provided by ‘intelligent power management’ in the microcontroller, which involves communicating the maximum allowable current consumption to the control logic of the PLL. This logic then adjusts the PLL to operate in a frequency range that avoids exceeding the prescribed maximum current consumption, without any involvement by the processor. For instance, if the power-hungry NPU is switched into the processing loop, the internal clock frequency will be automatically reduced to prevent the current consumption from rising above the permissible value.

Unfortunately, there is a small difficulty with this solution, which is that the specifications for smart cards for GSM and UMTS telecommunications (presently) prohibit the use of free-running oscillators in smart card microcontrollers. This prohibition is based on fear of possible interference with the other circuitry in the mobile telephone. As long as these portions of the specifications continue to exist, it is not possible to use either a continuously adjustable internal clock frequency or an oscillator that is completely independent of the applied clock signal. However, these specifications do allow clock rate multipliers to be used, as long as the internal and external clock rates have a fixed relationship governed by predefined multiplication factors.

Processor speed is not the only bottleneck in smart cards. Data transmission rates, which are specified in the standards, and EEPROM write and erase times do not benefit from increased clock rates. This somewhat limits the advantage of increasing the clock rate. Nevertheless, it can be highly beneficial to use a smart card with an elevated internal clock rate for certain applications, particularly considering that the amount of additional circuitry (and thus area) required on the chip is small. For this reason, nearly all new types of smart card microcontrollers have internal clock multiplication capability.
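The selection of a clock multiplication factor under a current limit can be sketched as follows, assuming the linear relationship between clock frequency and current consumption described above and a fixed set of predefined factors. All names and numerical values here (the current per MHz, the limit of 10 mA, the factor set) are hypothetical and chosen only to illustrate the principle; real control logic is implemented in hardware.

```python
def select_multiplier(
    external_hz: float,
    ma_per_mhz: float,
    max_current_ma: float,
    factors: tuple[int, ...] = (1, 2, 4, 8),
) -> int:
    """Return the largest predefined factor that keeps the estimated
    current (assumed proportional to the internal clock) within the limit."""
    allowed = [
        f for f in sorted(factors)
        if external_hz / 1e6 * f * ma_per_mhz <= max_current_ma
    ]
    if not allowed:
        raise ValueError("no multiplication factor satisfies the current limit")
    return allowed[-1]


# Hypothetical figures: 0.5 mA/MHz for the processor alone,
# 1.0 mA/MHz once the NPU is switched into the processing loop.
print(select_multiplier(5_000_000, 0.5, 10.0))  # processor only
print(select_multiplier(5_000_000, 1.0, 10.0))  # NPU active: factor is reduced
```

Because only predefined factors are used, the internal and external clock rates keep a fixed relationship, which is the variant that remains permissible under the telecommunications specifications mentioned above.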

Figure 3.59 Block diagram of a possible internal clock multiplication circuit using a PLL oscillator followed by a divider to supply clock signals to the various microcontroller components. The clock for the NPU is often taken directly from the PLL without any division. If there are several timers in the microcontroller, each one usually has its own individually configurable predivider

Direct memory access (DMA)
DMA components have been used for a long time in the PC realm. DMA makes it possible to copy or exchange data between two or more memory regions at high speed, independent of the processor and in part in parallel with the other activities of the processor. It is often also possible to independently fill a certain memory region with a predefined value. The main effect of a DMA unit is to offload the processor and thus allow certain routines to be fashioned more simply. Up to now, high-performance DMA components have been only sporadically available in smart card microcontrollers.

Hardware-based memory management, firewalls and memory management units (MMUs)
The latest smart card operating systems allow executable machine code to be downloaded directly to the card. This code, which can then be run using a special command, can be used for purposes such as executing a cryptographic function only known to the card issuer. However, it is in principle not possible to prevent such downloaded code from including a function for reading out secret data from the memory. Operating system manufacturers have been very careful to maintain the confidentiality of their system architectures and program code. The same is also true of secret keys and algorithms in various applications in the card. The public availability of such confidential information would have fatal consequences for an application provider. One administrative solution is to have every new program tested by an independent organization. However, even this cannot guarantee complete security, since a program that is not the same as the one that was tested could later be substituted for the certified program, or the program might be so secret that nobody other than the application provider is allowed to know about it.

One acceptable solution to this impasse is to equip the smart card microcontroller with a memory management unit (MMU). Such a unit monitors the memory boundaries of the current application program while it is running. The permitted memory region is defined by an operating system routine before the application is called, and it cannot be altered by the application program while it is running. This ensures that the application is completely encapsulated and cannot access memory areas forbidden to it. The barriers formed in this manner are called ‘firewalls’, in analogy to walls used for fire protection in buildings. If an application attempts to access another memory region from within a region demarcated by firewalls, it will fundamentally be prevented from doing so, and in addition any such attempt usually triggers an interrupt so the violation can be immediately detected. Presently, very few smart card microcontrollers have MMUs, although they have been used for years in many other areas. Nonetheless, the importance of this supplementary hardware will greatly increase in the future, since it is the only practical way to securely isolate several applications sharing a single smart card.
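The boundary monitoring performed by an MMU can be modeled in software as follows. This is a behavioral sketch with hypothetical names: the exception stands in for the interrupt triggered by a violation, and `set_window` stands in for the operating system routine that defines the permitted region. In real hardware the application cannot reach that routine; Python does not enforce this.

```python
class MemoryViolation(Exception):
    """Stands in for the interrupt raised on a forbidden memory access."""


class BoundsCheckedMemory:
    def __init__(self, size: int) -> None:
        self._cells = bytearray(size)
        self._start = 0
        self._end = size  # exclusive upper boundary

    def set_window(self, start: int, end: int) -> None:
        """Define the permitted region; called before the application runs."""
        if not 0 <= start < end <= len(self._cells):
            raise ValueError("invalid memory window")
        self._start, self._end = start, end

    def _check(self, address: int) -> None:
        if not self._start <= address < self._end:
            raise MemoryViolation(f"access to {address:#06x} outside firewall")

    def read(self, address: int) -> int:
        self._check(address)
        return self._cells[address]

    def write(self, address: int, value: int) -> None:
        self._check(address)
        self._cells[address] = value & 0xFF


memory = BoundsCheckedMemory(256)
memory.set_window(64, 128)        # region granted to the application
memory.write(64, 0xAB)            # permitted
print(hex(memory.read(64)))
try:
    memory.read(200)              # outside the firewall
except MemoryViolation as violation:
    print("violation detected:", violation)
```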

Another aspect of MMUs is their ability to relocate physically addressable memory regions to any desired location within the logical memory space of the processor. This considerably simplifies the memory management function of the smart card operating system, as well as making it possible to enforce strict isolation of applications with regard to memory space. Furthermore, if downloadable native code is used, the MMU can be used to relocate it to a suitable memory area, thus eliminating the need to use the operating system to manually relocate the executable code.

There is a critical factor that must be considered when using an MMU in a smart card operating system. With the current state of the technology, all MMUs used in various types of microcontrollers are specifically designed for the microcontroller in question. Although this allows the operation and space requirements of the MMU to be optimized, it comes at the price of portability of the object code. In practice, the particular type of MMU that is used has significantly greater consequences for the operating system than the type of processor that is used. Consequently, MMUs are used only very reluctantly in combination with smart card operating systems for large-scale applications, which must of necessity support several different hardware platforms.
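The relocation function can be illustrated with a simple base-and-size translation, in which each software component sees a logical address space starting at zero. The segment layout, sizes and names below are hypothetical; actual MMUs differ from one microcontroller to the next, as noted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    physical_base: int  # where the region starts in physical memory
    size: int           # length of the region in bytes


def translate(segment: Segment, logical_address: int) -> int:
    """Map a logical address (starting at 0) to its physical address."""
    if not 0 <= logical_address < segment.size:
        raise IndexError("logical address outside the segment")
    return segment.physical_base + logical_address


# Hypothetical layout: an operating system and two applications.
SEGMENTS = {
    "os": Segment(physical_base=0x0000, size=0x4000),
    "app1": Segment(physical_base=0x4000, size=0x1000),
    "app2": Segment(physical_base=0x5000, size=0x0800),
}

print(hex(translate(SEGMENTS["app1"], 0x0000)))
print(hex(translate(SEGMENTS["app2"], 0x07FF)))
```

Since every component is addressed from zero, downloaded code does not need to be patched for its physical location, and the size check doubles as the isolation boundary.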

Figure 3.60 Schematic representation of the operating principles of a hardware-based memory management unit (MMU) in a smart card microcontroller. Process ‘A’ shows a call to an operating-system function that is channeled via the MMU and controlled by a task dispatcher. ‘B’ is an example of a write–read access to an application memory area demarcated by the MMU

Figure 3.61 Schematic representation of the operating principles of hardware-based memory management (MMU) in a smart card microcontroller with regard to the arrangement of logical and physical address spaces. This example shows an operating system and two applications that share the physically available memory via the MMU. For each of these software components, the MMU translates its physical address space into a logical address space starting at ‘0000’

CRC (cyclic redundancy check) calculation unit
CRC codes are still frequently used to secure data or programs by means of an error detection code. Calculating a CRC in software is relatively slow, due to the large number of bit manipulations required, and the calculation can be readily implemented in hardware on the silicon of the microcontroller. For this reason, there are microcontrollers for smart cards that have hardware-based CRC calculation units. Naturally, with such units it must be possible to select the usual generator polynomials and seed values.
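The number of bit manipulations involved in a software CRC becomes apparent from a bitwise implementation. The sketch below computes a 16-bit CRC with a selectable generator polynomial and seed value, processing the most-significant bit first. The default parameters (polynomial 0x1021, seed 0xFFFF) are one widely used combination and are chosen here as an assumption; the text does not prescribe particular values, and variants with reflected bit order are not covered.

```python
def crc16(data: bytes, polynomial: int = 0x1021, seed: int = 0xFFFF) -> int:
    """Bitwise 16-bit CRC, most-significant bit first, no final XOR."""
    crc = seed
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):  # eight shift-and-conditional-XOR steps per byte
            if crc & 0x8000:
                crc = ((crc << 1) ^ polynomial) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


print(hex(crc16(b"123456789")))               # seed 0xFFFF
print(hex(crc16(b"123456789", seed=0x0000)))  # same polynomial, seed 0x0000
```

Every data byte costs eight shift and conditional XOR steps in software, whereas a hardware unit performs the equivalent operation with a shift register and a few XOR gates.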

Random number generator (RNG)
Random numbers are frequently needed in smart cards for generating keys and authenticating smart cards and terminals. For reasons of security, they should be genuine random numbers rather than pseudo-random numbers, as are commonly produced by typical software-based random number generators. All new smart card microcontrollers have hardware random number generators that produce true random numbers. However, the quality of the numbers produced by such generators must be immune to being adversely affected by external influences, such as temperature or supply voltage. The hardware may use such external influences to assist in generating random numbers, but it must not be possible to predict the generated random numbers by purposefully manipulating one or more of these parameters. This is very difficult to implement in silicon, so a different approach is taken. The random number generator takes various logic states of the processor, such as the clock signal and the contents of the memory, and applies them to a linear feedback shift register (LFSR) clocked by a signal that is also generated using several different parameters. In some cases, this clock can have a frequency several times that of the processor. If the CPU reads the content of this random number generator, it obtains a relatively good random number that cannot be ascertained from outside in a deterministic manner. The quality of the random number so obtained can be improved by supplementary procedures and algorithms. However, what is important here is that the hardware-based random number generator must basically provide good random numbers that can withstand the usual tests (e.g., FIPS 140–2; the quality of random numbers is treated in overview in Section 4.10.2, ‘Testing random numbers’).
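The principle of feeding varying logic states into a linear feedback shift register can be sketched as follows. The register width (16 bits), the tap positions (16, 14, 13 and 11, a commonly cited maximal-length configuration) and the function names are assumptions for illustration; the text does not specify an actual circuit. Note that without the injected bit the register is a purely deterministic pseudo-random generator, so the unpredictability comes entirely from the quality of the injected states.

```python
def lfsr_step(state: int, entropy_bit: int = 0) -> int:
    """Advance a 16-bit Fibonacci LFSR by one clock.

    The feedback is the XOR of taps 16, 14, 13 and 11, additionally mixed
    with one externally supplied bit (for example a sampled logic state).
    """
    feedback = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5) ^ entropy_bit) & 1
    return (state >> 1) | (feedback << 15)


def lfsr_period(start: int) -> int:
    """Count the steps until the register returns to its start value
    when no entropy is injected."""
    state = lfsr_step(start)
    steps = 1
    while state != start:
        state = lfsr_step(state)
        steps += 1
    return steps


print(lfsr_period(0xACE1))  # sequence length without entropy
print(hex(lfsr_step(0xACE1, 0)), hex(lfsr_step(0xACE1, 1)))  # effect of one injected bit
```

Without injected bits the sequence repeats after 65 535 steps, which is why the LFSR by itself cannot be regarded as a source of genuine random numbers.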

Figure 3.62 Example of a random number generator whose outputs are constantly written to a ring buffer, from which they can be requested as necessary. The gray rectangles mark random numbers that have already been read once and cannot be requested again. This sort of buffer arrangement is used in some types of low-speed random number generators

Java accelerator
Within only two years, Java Card has established itself as an industry standard for executable program code in smart cards. However, since the Java VM must interpret the bytecode rather than directly execute it, there is an unavoidable loss of execution speed compared with native machine instructions, which can be directly executed by the processor. The widespread use of Java in smart cards nevertheless makes it attractive for semiconductor manufacturers to devise remedies for this processing speed problem. Presently, two different approaches are being pursued.

In the first approach, large portions of the Java VM are incorporated into the smart card microcontroller as dedicated hardware components that supplement the actual processor. This technique thus goes in the direction of picoJava, which means in the direction of a real IC that can directly process Java bytecode. This solution has two drawbacks, which are that the Java processor takes up additional space, besides that occupied by the regular processor, and that a full implementation of a Java VM is relatively costly. The advantage of this solution is its high execution speed.

In the second approach, the instruction set of the processor is extended to include typical Java machine instructions. This allows bytecodes supplied by the software VM to be immediately processed by the extended processor. This variant is implemented using a processor lookup table containing CPU microinstruction sequences corresponding to the bytecodes to be emulated. The advantage of this solution relative to the first one is that it requires less additional space on the chip, although its execution speed is somewhat lower.
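The source of the speed loss, and the idea behind a lookup table, can be seen in a toy stack-machine interpreter. The opcodes below are hypothetical and are not Java Card bytecodes; the dictionary plays the role of the lookup table that maps each bytecode to the sequence of operations emulating it.

```python
# Hypothetical toy opcodes (not actual Java Card bytecodes).
PUSH, ADD, MUL, HALT = 0x01, 0x02, 0x03, 0xFF


def _add(stack: list[int]) -> None:
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)


def _mul(stack: list[int]) -> None:
    b, a = stack.pop(), stack.pop()
    stack.append(a * b)


# Lookup table: bytecode -> routine that emulates it.
DISPATCH = {ADD: _add, MUL: _mul}


def run(code: bytes) -> list[int]:
    """Interpret the bytecode and return the final operand stack."""
    stack: list[int] = []
    pc = 0
    while pc < len(code):
        opcode = code[pc]   # fetch
        pc += 1
        if opcode == HALT:
            break
        if opcode == PUSH:  # one-byte immediate operand follows
            stack.append(code[pc])
            pc += 1
        else:
            DISPATCH[opcode](stack)  # decode and execute via the table
    return stack


# (2 + 3) * 4
print(run(bytes([PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT])))
```

In a software VM, the fetch, decode and table lookup shown in the loop are executed as ordinary machine instructions for every single bytecode. Moving this lookup into the processor removes that overhead without requiring a complete Java processor on the chip.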

Coprocessors for symmetric cryptographic algorithms
Up to now, DES has been used as the standard cryptographic algorithm for financial transaction systems and telecommunications applications. This large market potential made it worthwhile for semiconductor manufacturers to fit smart card microcontrollers with their own DES calculation units. In principle, this is not particularly difficult, since DES was originally designed to be primarily implemented in hardware. The largest problem in marketing DES calculation units in microcontrollers is not technical, but instead relates to export restrictions, since in many countries components with fast, hardware-based DES encryption are subject to a variety of export regulations. The advantages of DES calculation units for smart card microcontrollers can be clearly seen by examining their performance figures. At 3.5 MHz, they can achieve times on the order of 75 μs for a simple DES operation and 150 μs for a triple-DES operation with two keys. The calculation time falls in inverse proportion to the clock rate. Besides this, a DES calculation unit does not require significantly more chip area than that occupied by the ROM code for a software DES implementation, so it does not increase the size of the die. In the future, besides DES coprocessors there will also be special coprocessors for AES in smart card microcontrollers, usually supporting all three possible key lengths (128, 192 and 256 bits). This is technically just as feasible as a DES coprocessor, since the AES algorithm is also relatively easy to implement in hardware.
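The dependence of the calculation time on the clock rate can be estimated from the reference figures given above. The following sketch assumes that the time scales exactly in inverse proportion to the clock rate, which neglects any fixed overhead for loading keys and data; the function name is hypothetical.

```python
def des_time_us(
    clock_mhz: float,
    reference_time_us: float = 75.0,  # simple DES at the reference clock
    reference_clock_mhz: float = 3.5,
) -> float:
    """Estimate the calculation time at another clock rate, assuming the
    time is inversely proportional to the clock rate."""
    if clock_mhz <= 0:
        raise ValueError("clock rate must be positive")
    return reference_time_us * reference_clock_mhz / clock_mhz


print(des_time_us(7.0))                             # simple DES at 7 MHz
print(des_time_us(14.0, reference_time_us=150.0))   # triple DES at 14 MHz
```

Under this assumption, doubling the clock to 7 MHz halves the time for a simple DES operation to about 37.5 μs.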

Coprocessors for asymmetric cryptographic algorithms
For calculations in the realm of public-key algorithms, such as RSA and elliptic-curve algorithms, there are specially developed arithmetic units that are placed on the silicon along with the usual functional components of a smart card microcontroller. These arithmetic units are only capable of performing several basic calculations that are necessary for these types of algorithms, namely exponentiation and modulo calculations using large numbers. The speed of these components, which are optimized for these two arithmetic operations, is due to their very broad architectures (up to 140 bits). In their particular application area, some of them can even outperform a powerful PC. The arithmetic unit is called by the processor, which either passes the data directly or passes a pointer to the data and then issues an instruction to start the processing. After the task has been completed and the result has been stored in RAM, control of the chip is returned to the processor. In general, these coprocessors can process all key lengths up to 1024 bits for the RSA algorithm, and in the medium term this will increase to 2048 bits. For elliptic curves, the usual capacity is up to 160 bits, with 210 bits to come in the future.
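The core operation that such an arithmetic unit performs for RSA is modular exponentiation. In software it is usually built from repeated squaring and multiplication, as in the following sketch. The function name is hypothetical, the key values are the well-known textbook toy example (n = 3233, e = 17, d = 2753) and far too small for real use, and an actual coprocessor operates on operands of 1024 bits or more with dedicated multiplier hardware.

```python
def mod_exp(base: int, exponent: int, modulus: int) -> int:
    """Compute (base ** exponent) % modulus by square-and-multiply."""
    if modulus <= 0 or exponent < 0:
        raise ValueError("modulus must be positive and exponent non-negative")
    result = 1 % modulus
    base %= modulus
    while exponent:
        if exponent & 1:                 # current exponent bit is set
            result = result * base % modulus
        base = base * base % modulus     # square for the next bit
        exponent >>= 1
    return result


# Toy RSA key pair, for illustration only.
n, e, d = 3233, 17, 2753
ciphertext = mod_exp(65, e, n)
print(ciphertext)
print(mod_exp(ciphertext, d, n))  # recovers the original value 65
```

Each exponent bit costs one or two multiplications of numbers as long as the modulus, which is why wide arithmetic units bring such a large gain compared with an 8-bit processor.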

Error detection and correction in EEPROM
The essential limitation on the useful life of a smart card is imposed by the EEPROM, with its technically limited number of possible write/erase cycles. One way to relax this limitation is to use software to calculate error correction codes for certain heavily used regions of EEPROM, so that errors can be corrected. It is also possible to implement error correction codes using hardware circuitry on the chip. In this way, EEPROM errors can be detected and corrected (as long as they are not too extensive) in a manner that is transparent to the software. Naturally, additional EEPROM is necessary to store the codes. Since good error correction codes take up a relatively large amount of memory, the designer is confronted with a strategic decision: good error detection demands extra memory – up to 50% of the memory to be protected. What’s more, the memory for the error correction mechanism can only be used for this purpose. Although lower performance error correction requires less additional memory, its usefulness is highly questionable. There are a few microcontrollers on the market that have EEPROM error detection and correction implemented in hardware, but they may require extra memory amounting to as much as half the volume of the memory to be protected to be used for the protection codes. As a result, the amount of EEPROM available to the user may not be particularly large. However, the useful life of an EEPROM secured using this mechanism is several times the usual value.
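The memory overhead of error correction can be made concrete with a simple single-error-correcting Hamming code that protects each data byte with four check bits, which corresponds to 50% additional memory. This particular code is an assumption chosen for illustration; the text does not state which codes are used in actual microcontrollers, and the function names are hypothetical.

```python
DATA_POSITIONS = (3, 5, 6, 7, 9, 10, 11, 12)  # 1-based positions of data bits
PARITY_POSITIONS = (1, 2, 4, 8)               # powers of two hold check bits


def hamming_encode(byte: int) -> list[int]:
    """Encode one byte into a 12-bit Hamming codeword (list of bits)."""
    code = [0] * 13  # index 0 unused so that indices match bit positions
    for i, position in enumerate(DATA_POSITIONS):
        code[position] = (byte >> i) & 1
    for p in PARITY_POSITIONS:
        parity = 0
        for position in range(1, 13):
            if position & p and position != p:
                parity ^= code[position]
        code[p] = parity
    return code[1:]


def hamming_decode(bits: list[int]) -> tuple[int, int]:
    """Return (byte, corrected_position); position 0 means no error found."""
    code = [0] + list(bits)
    syndrome = 0
    for position in range(1, 13):
        if code[position]:
            syndrome ^= position
    if syndrome > 12:
        raise ValueError("uncorrectable error pattern")
    if syndrome:
        code[syndrome] ^= 1  # flip the single faulty bit back
    byte = sum(code[position] << i for i, position in enumerate(DATA_POSITIONS))
    return byte, syndrome


stored = hamming_encode(0xA5)
stored[6] ^= 1                      # simulate one worn-out EEPROM cell
print(hamming_decode(stored))       # original byte and the corrected position
```

A code of this kind corrects only one faulty bit per protected byte. Stronger codes tolerate more errors but need even more check bits, which is the trade-off described above.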

Chip hardware extensions
If the chip hardware must be extended for a particular reason, considerable expenditures of development effort and costs are required on the part of the manufacturer. There are only two ways to implement customer-specific hardware: it can be built in silicon on the basis of an existing chip family, or it can be built as a two-chip system, with all of the associated drawbacks. There is an acceptable solution to this problem in the form of a compromise that incorporates elements of both of these options. A chip with the new hardware unit can be glued directly to the existing chip and electrically connected to it by bonding wires. This solution benefits from the fact that most smart card microcontrollers have several I/O ports, and one of these ports can be used to communicate with the additional chip. The thickness of the resulting sandwich construction is not significantly greater than that of a normal chip, since the silicon substrates can be ground away more than usual to make them thinner. A sandwich chip can thus be built into a standard module without additional effort or costs. This technique is ideal for satisfying customer-specific needs for additional hardware without expensive redesign. An existing chip can be combined with a new unit, which may for instance have a special serial interface for testing the security features of other chips. It is also possible to fit a special ASIC containing a secret cryptographic algorithm into the card. This method is not cost-effective for large production quantities (in the range of millions of pieces), since in such cases it is worthwhile to develop special chips. However, for small to medium piece counts, sandwich chips are a very effective solution for prototype series or special applications, such as security modules for terminals or smart cards for pay-TV decoders.

Figure 3.63 Cross section of a chip module containing two different chips electrically interconnected via bonding wires

Vertical system integration (VSI) and face-to-face
Another technique used to extend chip hardware by combining semiconductor technologies that are incompatible on a single chip is vertical system integration (VSI), in which two or more dies that have been ground thin are bonded together mechanically to form a stack, with the individual dies also being electrically interconnected by through contacts (‘vias’) formed using semiconductor fabrication processes. Using VSI, the available chip area can be increased in units of the original area in a very elegant manner. With two stacked dies, twice the original area is available, and with four dies there is four times as much area. It is possible to achieve not only a significant increase in the amount of available memory, but also a considerable improvement in chip security. This is because it is presently effectively impossible to access a chip sandwiched between two other dies using analytical equipment of the type used in the semiconductor industry without destroying the surrounding chips. A simpler variant of VSI, which in principle can be scaled up as much as desired, is the face-to-face arrangement of two chips. Here the electrical connections are made by extremely precise positioning of the two chips, with their upper surfaces (faces) touching. VSI and face-to-face chip bonding both allow significantly better extensions of chip hardware to be realized than what can be achieved by interconnecting two chips using wire bonds.

Figure 3.64 Cross-sectional photograph of a VSI stack. The two through contacts between the two stacked dies can be easily recognized (Source: Giesecke & Devrient)