Code Optimization

The text talks briefly about compiler optimizations that take direct advantage of the pipeline to avoid stalls. Before getting to those, I'd like to provide some context and discuss more general optimization techniques. We'll take a running example, compile it naively for the DLX, and then apply optimizations to it. The running example adds the contents of two arrays element by element to create a third array. In C, it looks like this:

for (i = 0; i < arrsize; i++)
    c[i] = a[i] + b[i];

We assume that i and arrsize are global integer variables, and that a, b and c are global integer arrays.

Naive Code Generation

We simply produce assembly code that evaluates the expression. We keep all variables in memory, load them as required, and store them whenever their values change. We also assume an unlimited number of registers and plan to pare the number used down later (this is a standard approach to register allocation). That is why the listing below uses registers beyond r31, even though the DLX has only 32 general-purpose registers.


      xor   r1, r1, r1             ; clear r1
      lhi   r2, i >> 16
      sw    i & 0xffff(r2), r1  ; store r1 to i


loop:
      lhi   r3, i >> 16      ; fetch i into r4
      lw    r4, i & 0xffff(r3)

      lhi   r5, arrsize >> 16 ; fetch arrsize into r6
      lw    r6, arrsize & 0xffff(r5)

      slt   r7, r4, r6          ; r7 = 1 if i < arrsize, else 0
      bnez  r7, body            ; enter loop body if i < arrsize

      beqz  r0, endloop         ; otherwise skip the body (r0 is always 0, so this always branches)


body: ; get a[i]
      lhi   r8, a>>16         ; first get address of a into r9
      addi  r9, r8, a & 0xffff

      lhi   r10, i >> 16      ; now get byte offset i*4 into r12
      lw    r11, i & 0xffff(r10)
      slli  r12, r11, 2

      add   r13, r9, r12            ; load a[i] into r14
      lw    r14, 0(r13)

      ; get b[i]
      lhi   r15, b>>16        ; first get address of b into r16
      addi  r16, r15, b & 0xffff

      lhi   r17, i >> 16       ; get byte offset i*4 into r19
      lw    r18, i & 0xffff(r17)
      slli  r19, r18, 2

      add   r20, r16, r19            ; load b[i] into r21
      lw    r21, 0(r20)

      add   r22, r14, r21            ; add a[i] + b[i]

      ; store c[i]
      lhi   r23, c>>16        ; get address of c into r24
      addi  r24, r23, c & 0xffff

      lhi   r25, i >> 16      ; get byte offset i*4 into r27
      lw    r26, i & 0xffff(r25)
      slli  r27, r26, 2

      add   r28, r24, r27           ; store r22 to c[i]
      sw    0(r28), r22

      ; increment i
      lhi   r29, i >> 16      ; load i
      lw    r30, i & 0xffff(r29)

      addi  r31, r30, 1             ; add 1

      lhi   r32, i >> 16      ; store i
      sw    i & 0xffff(r32), r31

      beqz  r0, loop            ; branch back (r0 is always 0, so always taken)

endloop: