1-2 Effects of 16-bit instructions

The SuperH microcomputer uses 16-bit fixed-length instructions. This keeps code size small, so a single-chip microcomputer can be used without concern for memory resources and can operate efficiently even with a small cache. This section explains how 16-bit instructions achieve that efficiency.

<RISC and fixed length instruction>
RISC microcomputers aim for higher processing speed by executing one instruction per clock cycle. For that purpose, instruction functions are simplified so that each can be executed in a single clock cycle. This is based on research showing that even when complicated instructions are provided, they are seldom used. Complicated operations can be carried out by combining multiple simple instructions, so the chip can be smaller, development costs can be reduced, and faster clocks can be used.

For a different instruction to be executed in each clock cycle, one instruction must be fetched from memory in each clock cycle. For that reason, RISC microcomputers are designed with an instruction length no greater than the bus width, so that an instruction can be fetched in every clock cycle. Most 32-bit microcomputers today have buses at least 32 bits wide, and most use 32-bit instructions.

In this way the RISC microcomputer fetches one instruction per clock cycle, then decodes and executes it in the pipeline. The circuits that handle each step are completely separate, so different instructions occupy different circuits at the same time.
The number of such circuit partitions (pipeline stages) varies between products; the SuperH is configured with 5, giving a 5-stage pipeline.
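
The overlap described above can be sketched as a small occupancy table. This is an idealized model with no stalls, and the stage names IF, ID, EX, MA and WB, as well as the function names, are assumptions for illustration; the text states only that there are five stages.

```python
# Assumed stage names for illustration; the text only says there are five stages.
STAGES = ["IF", "ID", "EX", "MA", "WB"]

def pipeline_occupancy(num_instructions, stages=STAGES):
    """Return one dict per clock cycle mapping stage name -> instruction index.

    Idealized model: one instruction enters the pipeline per cycle, no stalls.
    """
    total_cycles = num_instructions + len(stages) - 1
    table = []
    for cycle in range(total_cycles):
        row = {}
        for depth, stage in enumerate(stages):
            instr = cycle - depth  # instruction that entered 'depth' cycles ago
            if 0 <= instr < num_instructions:
                row[stage] = instr
        table.append(row)
    return table

def format_row(row, stages=STAGES):
    """Render one cycle as text, with '--' for an empty stage."""
    cells = []
    for s in stages:
        cells.append(f"{s}:i{row[s]}" if s in row else f"{s}:--")
    return " ".join(cells)

if __name__ == "__main__":
    for cycle, row in enumerate(pipeline_occupancy(4)):
        print(f"cycle {cycle}: {format_row(row)}")
```

Once the pipeline has filled, every stage holds a different instruction in the same cycle, which is what allows one instruction to complete per clock cycle.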

However, a RISC that uses 32-bit instructions has two problems.
The first is low code density. Operations between registers make a good example. When an operation is performed on two pieces of data, a 5-bit field is needed to select one register out of 32, so 15 bits are sufficient for 3 operands (two sources and a destination). The remaining 17 bits are more than is needed to specify the operation, so many of them go unused. In other words, the program code contains many unused bits. That is not a problem on personal computers or workstations equipped with a lot of memory, but it is a problem for a single-chip microcomputer. Resolving it requires a shorter instruction length and therefore higher code density.
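
The bit budget above can be checked with a few lines of arithmetic. This is a minimal sketch; the function names are hypothetical, and the 16-bit case assumes a two-operand instruction format, which the text does not state.

```python
def register_field_bits(num_registers):
    """Bits needed to select one register out of num_registers."""
    return (num_registers - 1).bit_length()

def bits_left_for_operation(instruction_bits, num_registers, num_operands):
    """Bits remaining to specify the operation after the register fields."""
    used = register_field_bits(num_registers) * num_operands
    return instruction_bits - used

if __name__ == "__main__":
    # 32-bit instruction, 32 registers, 3 operands: the case in the text.
    print(bits_left_for_operation(32, 32, 3))
    # 16-bit instruction, 16 registers, 2 operands: an assumed format.
    print(bits_left_for_operation(16, 16, 2))
```

With 32 registers and 3 operands, 15 bits go to register fields and 17 remain. With 16 registers and an assumed 2 operands, 8 bits go to register fields and 8 remain.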

The second problem is the bus bottleneck. The bus is used both to fetch instructions and to access data. In a configuration with a single common bus, it can be used for only one of these at a time. If data access is given priority, the next instruction cannot be fetched in that cycle, and execution takes 2 clock cycles.
To solve both problems, the SuperH adopts 16-bit fixed-length instructions. First, a 32-bit bus fetches 2 instructions at once, so with single-clock-cycle execution there are cycles in which no instruction is being fetched. Data access can be performed during those cycles, which preserves single-clock-cycle execution. The SuperH is thus faster than a CISC microcomputer operating at the same frequency.
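
The bus argument can be put into a rough back-of-the-envelope model. This is an idealized sketch, not a description of the actual bus controller: it assumes one bus transfer per clock cycle, perfectly scheduled fetches and data accesses, and no other stalls. The function name and the example figures (100 instructions, 30 data accesses) are made up for illustration.

```python
def estimated_cycles(num_instructions, num_data_accesses,
                     instruction_bits, bus_bits=32):
    """Idealized cycle estimate on a single shared bus.

    Each cycle the bus performs either one instruction fetch (bringing in
    bus_bits // instruction_bits instructions) or one data access.
    Execution itself needs at least one cycle per instruction.
    """
    instrs_per_fetch = bus_bits // instruction_bits
    fetches = -(-num_instructions // instrs_per_fetch)  # ceiling division
    bus_cycles = fetches + num_data_accesses
    return max(num_instructions, bus_cycles)

if __name__ == "__main__":
    # Hypothetical workload: 100 instructions, 30 of which access data.
    print(estimated_cycles(100, 30, instruction_bits=32))
    print(estimated_cycles(100, 30, instruction_bits=16))
```

In this model, 32-bit instructions need the bus in every cycle for fetching, so each data access adds a cycle. With 16-bit instructions only every other cycle is needed for fetching, and the data accesses fit into the free cycles as long as they do not exceed half the instruction count.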

In addition, 16-bit instructions give higher code density than 32-bit instructions, which suits the on-chip ROM of a single-chip microcomputer. Code size is smaller even in comparison with other RISC microcomputers.

<Delayed branching>
When microcomputers are sped up, branches can become a problem. In both RISC and CISC designs, instructions are read ahead of execution. When a branch is taken, there is a penalty for the time needed to discard the instructions already read and to fetch instructions from the branch destination.
The SuperH addresses this problem with delayed branching.
In delayed branching, the branch takes effect after the instruction that follows the branch instruction has executed. A branch that would otherwise require 3 clock cycles can be executed in 2. The time spent on branching during instruction execution is said to be reduced by about 30%, so delayed branching has a considerable effect.
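
The execution order under delayed branching can be illustrated with a toy interpreter. This is a sketch under simplifying assumptions: the program format, the "bra" and "op" names and the function names are hypothetical, only unconditional branches are modeled, and the instruction after a branch is assumed not to be a branch itself.

```python
def run_delayed(program, max_steps=100):
    """Return the order in which instruction indices execute.

    program is a list of tuples: ("op",) for an ordinary instruction or
    ("bra", target_index) for an unconditional delayed branch.
    """
    trace = []
    pc = 0
    while pc < len(program) and len(trace) < max_steps:
        instr = program[pc]
        trace.append(pc)
        if instr[0] == "bra":
            slot = pc + 1
            if slot < len(program):
                # The instruction after the branch runs before the jump
                # takes effect, so the fetched instruction is not wasted.
                trace.append(slot)
            pc = instr[1]
        else:
            pc += 1
    return trace

def branch_cycle_reduction(cycles_without_delay=3, cycles_with_delay=2):
    """Fraction of branch cycles saved, using the counts quoted in the text."""
    return (cycles_without_delay - cycles_with_delay) / cycles_without_delay

if __name__ == "__main__":
    program = [("op",), ("bra", 4), ("op",), ("op",), ("op",)]
    print(run_delayed(program))
    print(round(branch_cycle_reduction(), 2))
```

In the example program, instruction 2 sits directly after the branch and still executes, instruction 3 is skipped, and execution continues at instruction 4. Going from 3 cycles to 2 per branch saves one third of the branch cycles.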

<General purpose registers>
With 16-bit fixed-length instructions giving greater efficiency, the SuperH is a high-performance microcomputer even on a small chip. However, a 16-bit instruction cannot specify many general-purpose registers. The SuperH provides 16 general-purpose registers, which is a fully sufficient number. For example, an analysis of the number of registers used per task in motor control programs shows that the vast majority of tasks use 7 or fewer, and 97% use 16 or fewer.
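
With 16 registers, each register field fits in 4 bits, so two register fields occupy half of a 16-bit instruction word. The sketch below packs and unpacks such a word. The field layout (4-bit opcode, two 4-bit register fields, 4-bit function code) and all names are assumptions chosen for illustration and are not taken from the text.

```python
def encode(opcode, rn, rm, func):
    """Pack four 4-bit fields into one 16-bit word (assumed layout).

    Bits 15-12: opcode, 11-8: register Rn, 7-4: register Rm, 3-0: function.
    """
    fields = (("opcode", opcode), ("rn", rn), ("rm", rm), ("func", func))
    for name, value in fields:
        if not 0 <= value <= 0xF:
            raise ValueError(f"{name}={value} does not fit in 4 bits")
    return (opcode << 12) | (rn << 8) | (rm << 4) | func

def decode(word):
    """Unpack a 16-bit word into (opcode, rn, rm, func)."""
    return ((word >> 12) & 0xF, (word >> 8) & 0xF,
            (word >> 4) & 0xF, word & 0xF)

if __name__ == "__main__":
    word = encode(0x3, rn=1, rm=2, func=0xC)
    print(hex(word))
    print(decode(word))
```

A register number of 16 or more is rejected because it cannot be expressed in a 4-bit field, which is the constraint that limits the register count.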

<Multiplication and accumulation>
The SuperH has an on-chip fixed-point multiply-and-accumulate unit for DSP functions. This unit completes a multiply-and-accumulate operation in 2 to 3 clock cycles, so digital filtering, vector operations, discrete cosine transforms and fast Fourier transforms can be performed faster.
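
The operations listed above all reduce to repeated multiply-and-accumulate steps. The sketch below shows this for a dot product and a simple FIR digital filter using integer (fixed-point) values. The function names are hypothetical, and Python integers do not overflow, so the accumulator width and any saturation behavior of real hardware are not modeled.

```python
def mac(acc, a, b):
    """One multiply-and-accumulate step: acc + a * b."""
    return acc + a * b

def dot_product(xs, ys):
    """Vector dot product built from MAC steps."""
    if len(xs) != len(ys):
        raise ValueError("vectors must have the same length")
    acc = 0
    for a, b in zip(xs, ys):
        acc = mac(acc, a, b)
    return acc

def fir_filter(samples, coeffs):
    """FIR filter: each output is a MAC loop over the most recent samples.

    Samples before the start of the input are treated as zero.
    """
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc = mac(acc, samples[n - k], c)
        out.append(acc)
    return out

if __name__ == "__main__":
    print(dot_product([1, 2, 3], [4, 5, 6]))
    print(fir_filter([1, 2, 3, 4], [1, 1]))
```

Because the inner loop is one MAC step per coefficient, the cycle count of the multiply-and-accumulate operation directly sets the throughput of such a filter.
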
With 16-bit fixed-length instructions and delayed branching for greater speed, and the multiply-and-accumulate unit for multimedia, this microcomputer has the performance to satisfy a variety of requirements.