ldx $0 * clear x
clra * clear d
clrb
ldy #16 * loop counter
loop
xgdx * make room in X for next result bit
lsld
xgdx
lsl numerator+1 * 32 bit shift! Gets one bit into D
rol numerator
rold
subd denominator * find out if we can subtract on this pass
bcs addback
inx
bra else
addback addd denominator * subtract and put bit into x
else dey
bra loop
In hardware, this can run as fast as the other (though in assembly
language it is slower).
We can run a little faster in hardware by making the observation that 10-5=5, or more generally 2n - 2n-1 = 2n-1.
What's the relevance of this? It says that we can just do the subtraction, and if we get it wrong patch it on the next iteration. This one will be a little tricky.
ldx #0 * clear X
clra * clear C
clrb
ldy #16 * loop counter
* subtract side: first time, and later if no errors
sloop xgdx * shift 0 into x
lsld
xgdx
subd denom * assume we can subtract...
bcs aend * find out if we were wrong
send inx * if we were right, bit is a 1
dey * next iteration
bne sloop
bra alldone
aloop xgdx * shift 0 into x
lsld
xgdx
addd denom * patching an old mistake
bcs send * result went positive; subtract next pass
bra aend
aend dey * we couldn't subtract this pass
bne aloop
blt alldone * we may have still been negative
add denom
alldone
(this code does make an assumption that the remained is signed, even
though the numerator and denominator are unsigned. Handling an
unsigned remainder would have required us to have two versions of the
aend
, so that if we fell through the end of the loop we'd
know whether the carry bit meant we just landed in aend on the last
pass, or meant we didn't find our way out on the last pass).
A couple of last notes: division is inherently harder than
multiplication, but there are some special cases that are easy. In
particular, 1/n
can be calculated using a different
algorithm that can be implemented quickly and cheaply. A few
computers (the Cray-1 being most prominent) have not even had a divide
instruction, due to this - the Cray had a reciprocal instruction, and
was able to computer m * 1/n
faster than it would have
been able to compute m/n
.