I’ve returned to the topic of optimized instruction encoding repeatedly in recent posts because it’s a big part of the code-density advantage that the Renesas RX microcontroller (MCU) architecture offers over competing MCUs. I’ll share more details of that advantage in a future post, just as I recently did for performance benchmarking.
Today, I’ll focus on the details of instruction encoding, specifically the instructions used to call subroutines. The analysis that the RX team did when working on instruction encoding revealed that subroutine branch instructions account for 8% of all the instructions in actual application code. That made them a target for optimization.
Most processors and MCUs, including the RX, have two types of subroutine branch instructions. In the RX, the JSR (Jump Subroutine) instruction transfers execution to a subroutine located at the address held in a register specified by the instruction. The BSR (Branch Subroutine) instruction, by contrast, uses relative addressing: it locates the subroutine by adding a displacement to the program counter.
Ironically, the practical advantages of each are now almost the opposite of what they were originally. Let’s go back briefly to the days of 8-bit processors, before compilers and high-level languages were commonplace. The JSR instruction was simple to use because programmers working in assembly language knew the exact address at which their code would be loaded, so a directly addressed JSR instruction was foolproof. Many implementations of the JSR instruction, however, accepted the target address as an immediate value and therefore took more bits to encode. Since many subroutine calls were to addresses relatively close to the current program counter, microprocessor architects conceived the BSR call with relative addressing to boost code density: the BSR was more efficient than JSR whenever the displacement between source and destination was small.
Fast-forward to the era of the compiler, and the situation is reversed. The RX JSR instruction can always be encoded in two bytes because it doesn’t support immediate values. In fact, immediate values are pretty useless in the compiler world, where the programmer generally doesn’t control the address at which the code will load.
Compilers instead rely on the BSR instruction, which these days can require more bytes to encode given the huge memory arrays present even in relatively simple systems. The displacement value is determined during the compilation process. The RX design team optimized the BSR instruction so that compilers or assembly-language programmers can use the smallest instruction length possible, as dictated by the relative-addressing displacement between the program counter and the target address of the subroutine.
The RX BSR instruction comes in three forms, and the smallest of the three can be encoded in two bytes. Moreover, that smallest form supports the largest range of destinations: a full 32-bit displacement. The RX architecture accomplishes that feat by taking the displacement value from a register, which must be loaded beforehand, rather than from the instruction itself. The three-byte version covers a 16-bit displacement range, reaching targets from -32768 to +32767 bytes relative to the program counter. The first byte stores the opcode and the next two bytes store the displacement. That three-byte version will suffice in the vast majority of cases and saves one byte compared with the more typical four-byte call found in most 32-bit MCUs. The RX also supports a four-byte version that covers a 24-bit displacement range (-8388608 to +8388607 bytes).
These instruction optimizations may seem inconsequential at first glance, but one byte here and two bytes there add up. Moreover, the RX CISC architecture is able to maximize code density without resorting to the tricks that RISC vendors such as ARM have used, specifically running a 16-bit instruction set on a 32-bit processor. The RX can take full advantage of the 32-bit architecture at all times while offering best-in-class code density.
- Maury Wright