CS473 - HW5

Out of Order Execution

Due Monday, October 14, 2002

  1. (20 points) Draw a Gantt (timing) chart, using the same conventions as those on my CDC notes page, showing how long it would take to execute the following code on a CDC:
    
    X1 <- X2 / X3
    X4 <- X1 * X0
    X0 <- X3 + X5
    X6 <- X2 * X3
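
Before drawing the chart, it may help to list the register dependencies among the four instructions. The following is a minimal Python sketch, not part of the assignment; the `hazards` function and the tuple encoding are hypothetical names introduced here for illustration. It compares instruction pairs naively and says nothing about functional-unit conflicts or unit latencies, which come from the CDC notes.

```python
# Each instruction is (destination, (source1, source2)), in program order.
instrs = [
    ("X1", ("X2", "X3")),  # X1 <- X2 / X3
    ("X4", ("X1", "X0")),  # X4 <- X1 * X0
    ("X0", ("X3", "X5")),  # X0 <- X3 + X5
    ("X6", ("X2", "X3")),  # X6 <- X2 * X3
]


def hazards(instrs):
    """Return (earlier, later, kind, register) for every pair of instructions
    that share a register in a way that constrains their ordering."""
    found = []
    for j, (dst_j, srcs_j) in enumerate(instrs):
        for i in range(j):
            dst_i, srcs_i = instrs[i]
            if dst_i in srcs_j:      # later reads what earlier writes
                found.append((i + 1, j + 1, "RAW", dst_i))
            if dst_j in srcs_i:      # later writes what earlier reads
                found.append((i + 1, j + 1, "WAR", dst_j))
            if dst_j == dst_i:       # both write the same register
                found.append((i + 1, j + 1, "WAW", dst_j))
    return found


for earlier, later, kind, reg in hazards(instrs):
    print(f"instruction {later} has a {kind} dependence on instruction {earlier} via {reg}")
```

The pairs it reports are only the register constraints; how many cycles each one costs depends on the conventions in the CDC notes.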
    
    
  2. (50 points)

    Here's some code that purports to be MIPS code for the C program in HW4, except that it goes through the loop body only twice. (Since it only "purports to be," you don't need to worry about whether I've got a bug in it!)

    	
          lui     $2, hi(awry)
          ori     $2, $2, lo(awry) ; pointer to awry
          addi    $3, $2, 8        ; end of awry
          lui     $4, hi(histo)
          ori     $4, $4, lo(histo); pointer to histo
        
    loop: lw      $5, 0($2)        ; read awry[i]
          sll     $5, $5, 2        ; convert to index
          add     $5, $4, $5       ; get pointer to location in histo
          lw      $6, 0($5)        ; get old value of histo
          addiu   $6, $6, 1        ; increment histo
          sw      $6, 0($5)        ; store back
          addiu   $2, $2, 4        ; increment awry pointer
          bne     $2, $3, loop     ; if not done, do it again
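
For orientation, here is a rough Python sketch of what the loop appears to compute, judging only from the comments in the assembly. HW4's C program is not reproduced on this page, so the function name and the sample data below are assumptions made for illustration.

```python
def histogram_two_entries(awry, histo):
    """Mirror the loop body: for each of the first two words of awry,
    increment the histo entry that word selects."""
    # $3 = $2 + 8 bytes, i.e. two 4-byte words, so the body runs twice.
    for i in range(2):
        index = awry[i]      # lw $5, 0($2); the sll/add turn it into an address
        histo[index] += 1    # lw $6 / addiu $6 / sw $6 on that histo entry
    return histo


# Hypothetical data: both awry entries select the same histo slot.
print(histogram_two_entries([3, 3], [0] * 5))  # prints [0, 0, 0, 2, 0]
```

The sample input is chosen so that both iterations update the same histo entry, which is the case where the second iteration's load reads what the first iteration's store wrote.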
    	
    	

    Now, consider a CPU executing MIPS code under the following assumptions:

    In a moment, I'll be asking you some questions about this machine. But first, here's an example showing the execution of the first few instructions.

    [Figure: execution of the first six instructions]

    1. On the first cycle, instructions 1-4 are fetched into IF/ID.

    2. Instructions 1 and 4 have no dependencies on other instructions, so they start down the pipe. This frees space for two more instructions, so 5 and 6 are fetched.

    3. Instruction 2 (which was held for a cycle) will have had its dependency satisfied by the time it reaches the execute stage, so it starts down the pipe on cycle 3. Instruction 5 can also start down the pipe (it didn't actually have to wait, since it was just fetched last cycle).

      Since a second pair of instructions has started, there is room to fetch another pair. I'm not going to show that; it's where the homework starts for you.

    4. Instructions 3 and 6 have now had their dependencies satisfied, and can proceed. Again, there's room for two more instructions in IF/ID.

    As the instructions move on to retirement, notice that 1-5 all take four cycles (though they may be delayed by stalls), and 6 takes five.

    One little thing: it would be easy to conclude from this example that you can always fetch two new instructions per cycle. You can't; the number you can fetch is determined by the number that start down the pipes on that cycle. So if only one went down a pipe, you could fetch only one; if four went down together, you could fetch four.
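
The fetch rule above can be sketched in a few lines of Python. This is only an illustration: the four-slot size of IF/ID is inferred from instructions 1-4 being fetched together in the example, and `IFID_SLOTS` and `refill` are hypothetical names, not part of the problem statement.

```python
IFID_SLOTS = 4  # assumption: inferred from instructions 1-4 being fetched together


def refill(occupied, issued, remaining):
    """Return (new_occupied, fetched) for one cycle.

    occupied  -- instructions sitting in IF/ID at the start of the cycle
    issued    -- how many of them start down a pipe this cycle
    remaining -- instructions in the program not yet fetched
    """
    waiting = occupied - issued          # instructions still held in IF/ID
    free = IFID_SLOTS - waiting          # slots opened up (or never filled)
    fetched = min(free, remaining)       # cannot fetch past the end of the code
    return waiting + fetched, fetched


# Cycle 1 of the example: empty buffer, nothing issues, four are fetched.
print(refill(0, 0, 13))  # prints (4, 4)
# Cycle 2: two instructions start down the pipe, so two more are fetched.
print(refill(4, 2, 9))   # prints (4, 2)
# Only one issues: only one can be fetched.
print(refill(4, 1, 7))   # prints (4, 1)
```

When the buffer starts a cycle full, the number fetched equals the number issued, which is the point of the paragraph above.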

    Now then, on to the questions:

    1. In the assumptions at the start of this problem, I mentioned infinite pipelines, register porting, and so forth. Given the restriction on the size of the IF/ID register, how ``infinite'' do the other resources actually have to be? In particular, what is the maximum number of arithmetic and memory pipelines you can actually need? How many register reads and writes actually have to be possible on a cycle? How many simultaneous memory reads and writes are actually needed?
    2. Draw a Gantt (timing) diagram showing how the code I gave you is executed on this machine.
    3. How many arithmetic and memory pipelines actually turn out to be needed for this code on this machine?
    4. Evaluate the extent to which the IF/ID register is a bottleneck in the execution of the code, and the extent to which all available parallelism has been exploited.


Last modified: Tue Oct 8 10:41:01 MDT 2002