Code Optimization
The text talks a little about compiler optimizations that directly
take advantage of the pipeline and avoid stalls. I'd like to
provide some context first, and discuss some more general
optimization techniques. What we're going to do is take a
running example, naively compile it for the DLX, and then apply
optimizations to it. The running example adds the contents
of two arrays together to create a third array. In C, it looks like
this:
for (i = 0; i < arrsize; i++)
c[i] = a[i] + b[i];
We assume that i and arrsize are global
integer variables, and that a, b and
c are global integer arrays.
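To experiment with the transformations that follow, the running example can be wrapped into a small C fragment with the globals just described. This is a sketch only: the length ARRSIZE, the test data, and the helper names init_arrays and check_result are hypothetical, added so the loop has something to run on.

```c
#include <assert.h>

/* Hypothetical length, chosen only so the example can run. */
#define ARRSIZE 8

/* Global variables, matching the assumptions in the text. */
int i;
int arrsize = ARRSIZE;
int a[ARRSIZE], b[ARRSIZE], c[ARRSIZE];

/* Hypothetical helper: fill a and b with easily checked values. */
void init_arrays(void)
{
    for (int k = 0; k < ARRSIZE; k++) {
        a[k] = k;
        b[k] = 10 * k;
    }
}

/* The running example itself. */
void add_arrays(void)
{
    for (i = 0; i < arrsize; i++)
        c[i] = a[i] + b[i];
}

/* Hypothetical checker: every element of c must be the sum of a and b. */
void check_result(void)
{
    for (int k = 0; k < arrsize; k++)
        assert(c[k] == a[k] + b[k]);
}
```

Calling init_arrays() and then add_arrays() leaves c[k] equal to 11 * k, and leaves i equal to arrsize, because the loop variable is a global that outlives the loop.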
Naive Code Generation
We simply produce assembly code that evaluates the statement. We
keep all variables in memory, load them whenever they are needed, and store them
whenever their values change. We assume an unlimited number of
registers, planning to pare the number used down later (this is a
standard approach to register allocation).
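Before reading the assembly, it may help to see the naive strategy written out in C: every use of i or arrsize reloads it from memory, every change to i is stored back at once, and each intermediate value gets its own temporary, standing in for the unlimited supply of registers. The function name, the temporaries t1 through t11, and the array contents are hypothetical. Any real C compiler will optimize this away, so it only illustrates the shape of naive code, not what a compiler actually emits.

```c
/* Hypothetical data, for illustration only. */
#define N 4

int i;
int arrsize = N;
int a[N] = {1, 2, 3, 4};
int b[N] = {10, 20, 30, 40};
int c[N];

/* Hand-lowered version of: for (i = 0; i < arrsize; i++) c[i] = a[i] + b[i];
   Each tN is a fresh temporary, i.e. a new "register". */
void add_arrays_naive(void)
{
    int t1, t2, t3, t4, t5, t6, t7, t8, t9, t10, t11;

    t1 = 0;
    i = t1;              /* store the initial value of i */
loop:
    t2 = i;              /* load i */
    t3 = arrsize;        /* load arrsize */
    if (!(t2 < t3))      /* leave the loop once i >= arrsize */
        goto endloop;
    t4 = i;              /* reload i to index a */
    t5 = a[t4];
    t6 = i;              /* reload i to index b */
    t7 = b[t6];
    t8 = t5 + t7;        /* a[i] + b[i] */
    t9 = i;              /* reload i to index c */
    c[t9] = t8;
    t10 = i;             /* reload i ... */
    t11 = t10 + 1;       /* ... add 1 ... */
    i = t11;             /* ... and store it straight back */
    goto loop;
endloop:
    return;
}
```

The temporaries map roughly onto the numbered registers in the assembly that follows; the assembly uses even more of them because it also materializes every address explicitly.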
xor r1, r1, r1 ; clear r1
lhi r2, i >> 16
sw i & 0xffff(r2), r1 ; store r1 to i
loop:
lhi r3, i >> 16 ; fetch i into r4
lw r4, i & 0xffff(r3)
lhi r5, arrsize >> 16 ; fetch arrsize into r6
lw r6, arrsize & 0xffff(r5)
slt r7, r4, r6 ; r7 = 1 if i < arrsize, else 0
bnez r7, body ; enter loop body if i < arrsize
beqz r0, endloop ; r0 is always 0, so this always jumps past the loop body
body: ; get a[i]
lhi r8, a>>16 ; first get address of a into r9
addi r9, r8, a & 0xffff
lhi r10, i >> 16 ; now get index into r12
lw r11, i & 0xffff(r10)
slli r12, r11, 2
add r13, r9, r12 ; load a[i] into r14
lw r14, 0(r13)
; get b[i]
lhi r15, b>>16 ; first get address of b into r16
addi r16, r15, b & 0xffff
lhi r17, i >> 16 ; get index into r19
lw r18, i & 0xffff(r17)
slli r19, r18, 2
add r20, r16, r19 ; load b[i] into r21
lw r21, 0(r20)
add r22, r14, r21 ; add a[i] + b[i]
; store c[i]
lhi r23, c>>16 ; get address of c into r24
addi r24, r23, c & 0xffff
lhi r25, i >> 16 ; get index into r27
lw r26, i & 0xffff(r25)
slli r27, r26, 2
add r28, r24, r27 ; store r22 to c[i]
sw 0(r28), r22
; increment i
lhi r29, i >> 16 ; load i
lw r30, i & 0xffff(r29)
addi r31, r30, 1 ; add 1
lhi r32, i >> 16 ; store i
sw i & 0xffff(r32), r31
beqz r0, loop ; r0 is always 0: jump back to the loop test
endloop:
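Every address in the naive code is built in two steps: lhi puts the upper 16 bits of a symbol into a register, and the lower 16 bits come either from an addi immediate or from a load/store displacement. An element address then adds i shifted left by 2 (slli), i.e. i times 4 for 4-byte integers. The C sketch below emulates that arithmetic on 32-bit values; the function names and sample addresses are hypothetical. It assumes, as in MIPS-style ISAs, that the 16-bit immediate and displacement are sign-extended. Under that assumption the split as written reproduces the address only when bit 15 of the low half is clear, and a common remedy is to add 0x8000 before taking the high half, as build_address_adjusted does.

```c
#include <stdint.h>

/* Sign-extend a 16-bit immediate to 32 bits.
   ASSUMPTION: this is how addi and load/store displacements treat it. */
uint32_t sext16(uint32_t imm)
{
    imm &= 0xffffu;
    return (imm & 0x8000u) ? (imm | 0xffff0000u) : imm;
}

/* lhi rX, addr >> 16 ; addi rY, rX, addr & 0xffff   (as in the naive code) */
uint32_t build_address(uint32_t addr)
{
    uint32_t hi = (addr >> 16) << 16;     /* what lhi leaves in the register */
    return hi + sext16(addr & 0xffffu);   /* what addi adds (wraps mod 2^32) */
}

/* Variant that pre-compensates the high half for a "negative" low half. */
uint32_t build_address_adjusted(uint32_t addr)
{
    uint32_t hi = ((addr + 0x8000u) >> 16) << 16;
    return hi + sext16(addr & 0xffffu);
}

/* slli rK, rI, 2 ; add rE, rBase, rK  ->  address of element index
   of an array of 4-byte integers starting at base. */
uint32_t element_address(uint32_t base, uint32_t index)
{
    return base + (index << 2);
}
```

Under these assumptions, build_address(0x10002040) returns 0x10002040, build_address(0x1000A3F0) comes out 0x10000 too low (0x0FFFA3F0), and build_address_adjusted(0x1000A3F0) recovers 0x1000A3F0.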