What do we do if we want to work with operands that are wider than our word size? There are at least two answers: either use a bigger computer, or break the problem down into smaller pieces. The first of these options is preferable, provided that going to a bigger computer is a viable option and that we don't want a word size so big the bigger computer doesn't exist.
Every architecture I've ever come across has had provisions for extended-precision arithmetic in this sense. (Note that there is a terminology snare here: while I'm using extended-precision in its general sense, the term also has a narrower sense in which it refers to extended-precision floating point values. Don't get confused!) What these schemes all have in common is that the Carry bit is used as the "glue" to communicate between the lower-order words and the upper-order words in the operation.
Let's look at some examples of how the HC11 handles extended precision arithmetic.
Suppose we want to perform a 24-bit addition. Our operands will be in RAM at locations src and dst; we'll add src to dst and leave the result in dst. Note that if we only wanted to perform a 16-bit addition, we could just do it in the D accumulator. Unfortunately, one of the instructions we're going to need (add with carry) doesn't exist for D; it only exists for A and B. The upshot is that I could write a more efficient version of the code that follows, in which the low-order 16 bits are handled in one shot, but it would be less clear than the code I am providing, which just works 8 bits at a time.
        ldaa    dst+2       * low order 8 bits
        adda    src+2
        staa    dst+2
        ldaa    dst+1       * middle order 8 bits
        adca    src+1       * note the add with carry!
        staa    dst+1
        ldaa    dst         * high order 8 bits
        adca    src         * add with carry
        staa    dst
Notice that this depends on the HC11's leaving the carry bit unchanged on a load or a store. Which it does!
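To watch the carry chain without an HC11 in front of you, here is a minimal sketch in C that models the same byte-at-a-time addition. The function name add_bytes and its signature are my own invention for illustration, not anything the HC11 or its tools define; the carry variable stands in for the C bit, and the array is laid out high-order byte first, like the operands above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the HC11 toolchain): add the n-byte value
 * at src into dst. Both are stored high-order byte first, as on the HC11.
 * Returns the carry out of the high-order byte (0 or 1). */
unsigned add_bytes(uint8_t *dst, const uint8_t *src, size_t n)
{
    unsigned carry = 0;   /* the first add (adda) takes no carry in */

    assert(n > 0);
    for (size_t i = n; i-- > 0; ) {   /* start at the low-order byte */
        unsigned sum = (unsigned)dst[i] + src[i] + carry;   /* like adca */
        dst[i] = (uint8_t)sum;        /* keep the low 8 bits, like staa */
        carry = sum >> 8;             /* bit 8 of the sum is the new C bit */
    }
    return carry;
}
```

For example, with dst holding the bytes 00 FF FF and src holding 00 00 01, calling add_bytes(dst, src, 3) leaves 01 00 00 in dst and returns 0: the carry ripples out of the low byte, through the middle byte, and into the high byte.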
Just as there is an "add with carry" instruction to allow us to do extended-precision addition, there is a "subtract with carry" instruction to let us do extended-precision subtraction. Here's a 24-bit subtraction that subtracts src from dst and leaves the result in dst.
        ldaa    dst+2       * low order 8 bits
        suba    src+2
        staa    dst+2
        ldaa    dst+1       * middle order 8 bits
        sbca    src+1       * note the subtract with carry!
        staa    dst+1
        ldaa    dst         * high order 8 bits
        sbca    src         * subtract with carry
        staa    dst
Crucial to the success of this code snippet is the HC11's "reverse definition" of the C bit on a subtract: instead of being a carry-out, it's a borrow-out.
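The borrow chain can be modeled the same way. What follows is a sketch under my own assumptions: sub_bytes is a made-up name, the borrow variable plays the role of the C bit after a subtract, and the bytes are stored high-order byte first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: subtract the n-byte value at src from dst.
 * Both are stored high-order byte first. Returns the final borrow
 * (1 if src was larger than dst as an unsigned value, else 0). */
unsigned sub_bytes(uint8_t *dst, const uint8_t *src, size_t n)
{
    unsigned borrow = 0;   /* the first subtract (suba) takes no borrow in */

    assert(n > 0);
    for (size_t i = n; i-- > 0; ) {   /* start at the low-order byte */
        unsigned subtrahend = (unsigned)src[i] + borrow;   /* like sbca */
        unsigned borrow_out = dst[i] < subtrahend;         /* need to borrow? */
        dst[i] = (uint8_t)(dst[i] - subtrahend);           /* low 8 bits only */
        borrow = borrow_out;          /* the C bit for the next byte up */
    }
    return borrow;
}
```

With dst holding 01 00 00 and src holding 00 00 01, sub_bytes(dst, src, 3) leaves 00 FF FF in dst and returns 0; the low byte borrows from the middle byte, which in turn borrows from the high byte.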
Again, we can perform a shift by using the carry bit to communicate between the bytes. We'll shift the 24 bits starting at dst one bit to the left. Since there are in-memory versions of the shift and rotate instructions, we'll use them.
        lsl     dst+2       * low order byte
        rol     dst+1       * middle order
        rol     dst         * high order
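If we'd started with a rol instead of a lsl, we would have performed a 24-bit rotate through the carry bit (the old value of C would have been shifted into the bottom of the low-order byte), not a 24-bit logical shift.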
If we want to do a right shift, we have to start with the high-order byte and work right, instead of starting with the low-order byte and working left, like this:
        lsr     dst         * high order
        ror     dst+1       * middle order
        ror     dst+2       * low order
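Both shift directions can be sketched in C as well. The names shl_bytes and shr_bytes are hypothetical, and the carry variable again stands in for the C bit: each byte pushes one bit out into carry, and the next byte pulls that bit in at its other end, which is what the rol and ror instructions do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: shift an n-byte value (high-order byte first)
 * left by one bit. Returns the bit shifted out of the top. */
unsigned shl_bytes(uint8_t *v, size_t n)
{
    unsigned carry = 0;               /* lsl shifts a 0 into the low byte */

    assert(n > 0);
    for (size_t i = n; i-- > 0; ) {   /* low-order byte first, work left */
        unsigned out = v[i] >> 7;                   /* bit 7 goes to C */
        v[i] = (uint8_t)((v[i] << 1) | carry);      /* old C enters bit 0 */
        carry = out;
    }
    return carry;
}

/* Hypothetical helper: shift an n-byte value (high-order byte first)
 * right by one bit. Returns the bit shifted out of the bottom. */
unsigned shr_bytes(uint8_t *v, size_t n)
{
    unsigned carry = 0;               /* lsr shifts a 0 into the high byte */

    assert(n > 0);
    for (size_t i = 0; i < n; i++) {  /* high-order byte first, work right */
        unsigned out = v[i] & 1u;                       /* bit 0 goes to C */
        v[i] = (uint8_t)((v[i] >> 1) | (carry << 7));   /* old C enters bit 7 */
        carry = out;
    }
    return carry;
}
```

Starting from the bytes 80 80 81, shl_bytes(v, 3) produces 01 01 02 and returns 1. Calling shr_bytes(v, 3) on that result produces 00 80 81 and returns 0; the top bit that fell off during the left shift is gone for good.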
Let's finish this off with a brief mention of "endianness." When a multi-byte value is stored in memory in an HC11, the high-order byte comes first (at the lowest address). There's no particular reason for this; it's just a decision that has been made. A processor which puts the high-order byte first in memory is called a "big-endian" processor; one which puts the low-order byte first is called "little-endian." Motorola's processors, the HC11 among them, have traditionally been big-endian, while Intel's x86 family is little-endian, as are most of the processors in common use today.
One place this causes major problems is when communicating between different processor architectures: if you send a four-byte integer from a big-endian processor just as it sits in memory, the most-significant byte goes out first. If you receive those bytes on a little-endian processor, it will expect the least-significant byte first, so it will read the value with its bytes in reverse order. The same problem exists when sending from a little-endian processor to a big-endian one.
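Here is a small sketch of that mix-up in C. The three helper names are hypothetical; they build and read the four bytes explicitly with shifts, so the code behaves the same no matter which kind of machine runs it.

```c
#include <stdint.h>

/* Hypothetical helper: write v into out[] most-significant byte first,
 * the way a big-endian sender would put it on the wire. */
void store_be32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Hypothetical helper: read four bytes as a big-endian receiver would. */
uint32_t load_be32(const uint8_t in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}

/* Hypothetical helper: read the same four bytes as a little-endian
 * receiver would, treating the first byte as the least significant. */
uint32_t load_le32(const uint8_t in[4])
{
    return ((uint32_t)in[3] << 24) | ((uint32_t)in[2] << 16) |
           ((uint32_t)in[1] << 8)  |  (uint32_t)in[0];
}
```

If the sender stores 0x12345678 with store_be32, a receiver using load_be32 gets 0x12345678 back, but a receiver using load_le32 sees 0x78563412.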
Consequently, the C networking libraries define functions called htonl() and ntohl(), which convert a 32-bit piece of data between your processor's byte order and the "network standard" byte order: htonl() goes from host order to network order, and ntohl() goes back the other way. Network order is big-endian (the processors that implemented the original ARPAnet happened to be big-endian...). If you compile a program for a big-endian machine, these functions do nothing (and in fact are optimized away completely). If you compile it for a little-endian machine, they perform the appropriate shoving around of the data.
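A minimal sketch of how these two functions are typically used, assuming a POSIX system where they are declared in <arpa/inet.h>. The wrappers put_u32 and get_u32 are hypothetical names of my own; only htonl(), ntohl(), and memcpy() are real library calls.

```c
#include <arpa/inet.h>   /* htonl, ntohl (POSIX; an assumption about your system) */
#include <stdint.h>
#include <string.h>      /* memcpy */

/* Hypothetical helper: place a 32-bit host value into a 4-byte buffer
 * in network (big-endian) byte order, ready to send. */
void put_u32(unsigned char out[4], uint32_t host_value)
{
    uint32_t net = htonl(host_value);   /* host order -> network order */
    memcpy(out, &net, sizeof net);      /* copy the bytes as they sit in memory */
}

/* Hypothetical helper: recover a 32-bit host value from 4 received bytes
 * that are in network byte order. */
uint32_t get_u32(const unsigned char in[4])
{
    uint32_t net;
    memcpy(&net, in, sizeof net);
    return ntohl(net);                  /* network order -> host order */
}
```

After put_u32(buf, 0x12345678), the buffer holds 12 34 56 78 on either kind of machine, and get_u32(buf) returns 0x12345678.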
Yes, the terms big-endian and little-endian are from Gulliver's Travels.