Multiple Precision Arithmetic

What do we do if we want to work with operands that are wider than our word size? There are at least two answers: either use a bigger computer, or break the problem down into smaller pieces. The first of these options is preferable, provided we're in an environment where going to a bigger computer is a viable option, and provided the word size we want isn't so big that the bigger computer doesn't exist.

Every architecture I've ever come across has had provisions for extended-precision arithmetic in this sense. (Note that there is a terminology snare here: while I'm using "extended precision" in its general sense, the term also has a narrower sense in which it refers to extended-precision floating-point values. Don't get confused!) The universal feature of these schemes is that the Carry bit is used as the "glue" to communicate between the lower-order words and the higher-order words in the operation.

Let's look at some examples of how the HC11 handles extended precision arithmetic.

Addition

Suppose we want to perform a 24-bit addition. Our operands will be in RAM at locations src and dst; we'll add src to dst and leave the result in dst. Note that if we only wanted to perform a 16-bit addition, we could just do it in the D accumulator. Unfortunately, one of the instructions we're going to need (add with carry) doesn't exist for D; it only exists for A and B. The upshot is that I could write a more efficient version of the code that follows, in which the low-order 16 bits are handled in one shot, but it would be less clear than the code I am providing, which just works 8 bits at a time.

        ldaa   dst+2   * low order 8 bits
        adda   src+2
        staa   dst+2

        ldaa   dst+1   * middle order 8 bits
        adca   src+1   * note the add with carry!
        staa   dst+1

        ldaa   dst     * high order 8 bits
        adca   src     * add with carry
        staa   dst

Notice that this depends on the HC11 leaving the carry bit unchanged on a load or a store, so that the carry produced by one add is still there when the next adca runs. Which it is!
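
To make the role of the carry explicit, here is a minimal C sketch of the same byte-at-a-time addition. It is a model, not HC11 code: the function name add24 and the carry variable are my own inventions for illustration, and the bytes are laid out as in the listing above, with the high-order byte at index 0.

```c
#include <stdint.h>

/* Add the 3-byte value at src into dst, one byte at a time.
   Index 0 is the high-order byte, index 2 the low-order byte.
   Returns the carry out of the high-order byte (0 or 1). */
unsigned add24(uint8_t dst[3], const uint8_t src[3])
{
    unsigned carry = 0;              /* plain adda: nothing carried into the low byte */
    for (int i = 2; i >= 0; i--) {   /* low-order byte first, as in the HC11 code */
        unsigned sum = (unsigned)dst[i] + src[i] + carry;   /* what adca computes */
        dst[i] = (uint8_t)sum;       /* keep the low 8 bits of the sum */
        carry = sum >> 8;            /* bit 8 of the sum is the carry out */
    }
    return carry;
}
```

Adding 00 00 01 to 01 FF FF this way carries out of both lower bytes and leaves 02 00 00 in dst.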

Subtraction

Just as there is an "add with carry" instruction to allow us to do extended-precision addition, there is a "subtract with carry" instruction to let us do extended-precision subtraction. Here's a 24-bit subtraction that subtracts src from dst and leaves the result in dst.

        ldaa   dst+2   * low order 8 bits
        suba   src+2
        staa   dst+2

        ldaa   dst+1   * middle order 8 bits
        sbca   src+1   * note the subtract with carry!
        staa   dst+1

        ldaa   dst     * high order 8 bits
        sbca   src     * subtract with carry
        staa   dst

Crucial to the success of this code snippet is the HC11's "reverse definition" of the C bit on a subtract: instead of being a carry-out, it's a borrow-out, and sbca subtracts that borrow along with its operand.
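
The borrow chain can be modeled in C the same way. Again this is only a sketch under my own naming (sub24 and borrow are hypothetical, not anything the HC11 defines), with the high-order byte at index 0.

```c
#include <stdint.h>

/* Subtract the 3-byte value at src from dst, one byte at a time.
   Index 0 is the high-order byte, index 2 the low-order byte.
   Returns the borrow out of the high-order byte (0 or 1). */
unsigned sub24(uint8_t dst[3], const uint8_t src[3])
{
    unsigned borrow = 0;             /* plain suba: nothing borrowed from the low byte */
    for (int i = 2; i >= 0; i--) {   /* low-order byte first */
        /* Unsigned arithmetic wraps around, so a negative result
           shows up with bit 8 (and every bit above it) set. */
        unsigned diff = (unsigned)dst[i] - src[i] - borrow;  /* what sbca computes */
        dst[i] = (uint8_t)diff;      /* keep the low 8 bits */
        borrow = (diff >> 8) & 1;    /* 1 if this byte had to borrow */
    }
    return borrow;
}
```

Subtracting 00 00 01 from 01 00 00 borrows through both lower bytes and leaves 00 FF FF in dst.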

Shifts

Again, we can perform a shift by using the carry bit to communicate between the bytes. We'll shift the 24 bits starting at dst one bit to the left. Since there are in-memory versions of the shift and rotate instructions, we'll use them.

        lsl    dst+2   * low order byte

        rol    dst+1   * middle order

        rol    dst     * high order

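If we'd started with a rol instead of a lsl, whatever was in the carry bit beforehand would have been shifted into the low-order bit, and we would have performed a rotate rather than a logical shift. Note that the HC11's rotate instructions go through the carry, so this is really a 25-bit rotate of the 24 data bits plus C.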
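
Here is a small C model of the left shift, offered as a sketch rather than a definitive translation. The name shl24 and the carry_in parameter are assumptions of mine: passing 0 models starting with lsl, and passing the old value of C models starting with rol.

```c
#include <stdint.h>

/* Shift the 3-byte value in v left by one bit.
   Index 0 is the high-order byte, index 2 the low-order byte.
   carry_in is the bit moved into bit 0 of the low-order byte.
   Returns the bit shifted out of the high-order byte. */
unsigned shl24(uint8_t v[3], unsigned carry_in)
{
    unsigned carry = carry_in & 1;
    for (int i = 2; i >= 0; i--) {   /* low-order byte first */
        unsigned out = v[i] >> 7;    /* top bit of this byte goes to the carry */
        v[i] = (uint8_t)((v[i] << 1) | carry);   /* old carry enters at bit 0 */
        carry = out;
    }
    return carry;
}
```

Shifting 80 80 81 with carry_in of 0 gives 01 01 02, with a 1 left over in the carry.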

If we want to do a right shift, we have to start with the high-order byte and work down toward the low-order byte, instead of starting with the low-order byte and working up. Like this:

        lsr    dst     * high order
        ror    dst+1   * middle order
        ror    dst+2   * low order
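
The right shift can be modeled the same way, this time walking from the high-order byte down. As before, shr24 is a hypothetical name of my own and this is only a sketch of the idea.

```c
#include <stdint.h>

/* Shift the 3-byte value in v right by one bit.
   Index 0 is the high-order byte, index 2 the low-order byte.
   Returns the bit shifted out of the low-order byte. */
unsigned shr24(uint8_t v[3])
{
    unsigned carry = 0;              /* lsr moves a 0 into the top bit */
    for (int i = 0; i < 3; i++) {    /* high-order byte first */
        unsigned out = v[i] & 1;     /* bottom bit of this byte goes to the carry */
        v[i] = (uint8_t)((v[i] >> 1) | (carry << 7));  /* old carry enters at bit 7 */
        carry = out;
    }
    return carry;
}
```

Shifting 01 01 01 this way gives 00 80 80, with a 1 left over in the carry.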

Endianness

Let's finish this off with a brief mention of "endianness." When a multi-byte value is stored in memory in an HC11, the high-order byte comes first, at the lowest address. There's no particular reason for this; it's just a decision that has been made. A processor which puts the high-order byte first in memory is called a "big-endian" processor; one which puts the low-order byte first is called "little-endian." Motorola's processors, the HC11 among them, are big-endian; the best-known little-endian family is Intel's x86.

One place this causes major problems is when communicating between different processor architectures: if you send a four-byte integer from a big-endian processor by sending its bytes in memory order, the most-significant byte goes first. If you receive those bytes on a little-endian processor and store them in the order they arrive, it will treat the first byte as the least-significant; this means it will see a completely different value. The same problem exists sending from a little-endian processor to a big-endian one.
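
To see how badly the value gets mangled, here is a minimal C sketch that interprets the same four bytes both ways. The function names read_be32 and read_le32 are made up for this illustration.

```c
#include <stdint.h>

/* Interpret four bytes with the most-significant byte first (big-endian). */
uint32_t read_be32(const uint8_t b[4])
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Interpret the same four bytes with the least-significant byte first
   (little-endian). */
uint32_t read_le32(const uint8_t b[4])
{
    return ((uint32_t)b[3] << 24) | ((uint32_t)b[2] << 16) |
           ((uint32_t)b[1] << 8)  |  (uint32_t)b[0];
}
```

Given the bytes 12 34 56 78, read_be32 yields 0x12345678 while read_le32 yields 0x78563412.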

Consequently, there are some functions defined as part of the C networking libraries called htonl() and ntohl(), which convert a 32-bit piece of data from your processor's order to the "network standard" byte order, which is big-endian (the processors that implemented the original ARPAnet happened to be big-endian...). If you compile a program for a big-endian machine, these functions do nothing (and in fact are optimized away completely). If you compile it for a little-endian machine, they perform the appropriate shoving around of the data.
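
As a rough sketch of how these get used, the following C fragment writes a 32-bit value into a buffer in network order and reads it back. It assumes a POSIX system, where htonl() and ntohl() are declared in <arpa/inet.h>; the wrapper names put_net32 and get_net32 are my own and not part of any library.

```c
#include <arpa/inet.h>   /* htonl, ntohl (POSIX) */
#include <stdint.h>
#include <string.h>      /* memcpy */

/* Store value in buf in network (big-endian) byte order. */
void put_net32(uint8_t buf[4], uint32_t value)
{
    uint32_t net = htonl(value);      /* host order -> network order */
    memcpy(buf, &net, sizeof net);    /* copy the bytes out in memory order */
}

/* Read a value that was stored in network byte order. */
uint32_t get_net32(const uint8_t buf[4])
{
    uint32_t net;
    memcpy(&net, buf, sizeof net);
    return ntohl(net);                /* network order -> host order */
}
```

Whatever the host's byte order, put_net32(buf, 0x12345678) leaves the bytes 12 34 56 78 in buf, and get_net32(buf) returns 0x12345678 again.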

Yes, the terms big-endian and little-endian are from Gulliver's Travels.


Last modified: Wed Apr 21 10:28:14 MDT 2004