CS 473 - HW4

More MIPS

Due Wednesday, February 26, 2003

Consider the following block of C code, which looks up 100 items from an array in a table and computes a checksum based on the items found there (this is actually a very practical problem: I'm having to do it for some barcode scanning software I'm writing for Science Fair).


checksum = 0;
for (i = 0; i < 100; i++)
    checksum = (checksum + table[string[i]]) % 256;
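
To check a hand translation against a reference, the loop can be wrapped in a small C function and run on test data. This is only a sketch: the names checksum_mod and checksum_mask are made up here, string is taken as unsigned char so the index is never negative, and the masking variant assumes every table entry is non-negative. Under that assumption the running sum is never negative, so x % 256 and x & 0xFF agree.

```c
#include <stddef.h>

/* Hypothetical helper: the loop from the assignment, wrapped in a function
   so it can be run on test data. n is 100 in the assignment.
   ASSUMPTION: string is treated as unsigned char so indices are 0..255. */
int checksum_mod(const unsigned char *string, const int *table, size_t n)
{
    int checksum = 0;
    for (size_t i = 0; i < n; i++)
        checksum = (checksum + table[string[i]]) % 256;
    return checksum;
}

/* Same loop with the remainder replaced by a bit mask.
   ASSUMPTION: all table entries are non-negative. For a negative sum,
   C's % yields a negative result while & 0xFF does not, so the two
   versions would differ. */
int checksum_mask(const unsigned char *string, const int *table, size_t n)
{
    int checksum = 0;
    for (size_t i = 0; i < n; i++)
        checksum = (checksum + table[string[i]]) & 0xFF;
    return checksum;
}
```

Calling both functions on the same string and table and comparing the results is a quick way to confirm the assumption holds for a given data set.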

Make the following assumptions:

  1. i and checksum are local variables declared as int, stored in registers $t0 and $t1, respectively.
  2. string and table are both global variables. string is declared as char[], while table is declared as int[]. Following standard C semantics, you don't need to do bounds checking on the arrays. Unfortunately, the addresses of both string and table are too large to fit in 16 bits, so you'll have to construct their addresses by hand. To simplify things a bit, assume you can say high(symbol) to mean the high-order 16 bits of a symbol, and low(symbol) to mean the low-order 16 bits of the symbol. You can do simple arithmetic involving constants only, and use that where a symbol would go (so you could say something like low(sym+1), but not low(sym+i), as i is a variable).
  3. Assume the "standard" MIPS pipeline (as shown on page 499), additionally assuming any extra data paths needed for instructions that this pipeline can't handle. Also, assume no delayed loads and branches. The main points here are that
    1. you have a five-stage pipeline
    2. a taken branch requires a 1-cycle stall
    3. a lw, in which the loaded value is used for arithmetic in the immediately following instruction, also requires a one-cycle stall.
  4. Use only "real" MIPS instructions, no pseudo-instructions: so, for instance, you can't use the rem (remainder) pseudo-instruction.
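
The high(symbol) and low(symbol) notation from assumption 2 can be modeled in C as bit operations on a 32-bit address. The sketch below is illustrative only: the function names are invented here, and it does not say which way the assignment intends the halves to be recombined. It shows that joining the halves is direct when the low half is zero-extended (as the ori immediate is), but that when the low half is sign-extended (as the addi immediate and the lw/lb offset are), the high half must be bumped by one whenever bit 15 of the low half is set.

```c
#include <stdint.h>

/* Hypothetical models of high(symbol) and low(symbol). */
uint32_t high16(uint32_t addr) { return addr >> 16; }
uint32_t low16(uint32_t addr)  { return addr & 0xFFFFu; }

/* Models lui followed by ori: the low half is zero-extended. */
uint32_t join_zero_ext(uint32_t hi, uint32_t lo)
{
    return (hi << 16) | lo;
}

/* Models lui followed by an instruction whose 16-bit immediate is
   sign-extended (addi, or the offset field of lw/lb). */
uint32_t join_sign_ext(uint32_t hi, uint32_t lo)
{
    int32_t offset = (lo & 0x8000u) ? (int32_t)lo - 0x10000 : (int32_t)lo;
    return (hi << 16) + (uint32_t)offset;
}

/* High half adjusted so that join_sign_ext() recovers the address. */
uint32_t high16_adjusted(uint32_t addr)
{
    return (addr + 0x8000u) >> 16;
}
```

For example, with the made-up address 0x1000ABCD, join_zero_ext(high16(a), low16(a)) gives back a, while the sign-extended form needs high16_adjusted(a) (0x1001 rather than 0x1000) to land on the same address.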

On to the problems. Note that I want exact answers to all the questions. When I ask for code size, that means only the code; you don't need to count space for variables.
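
The bookkeeping behind the size and cycle questions can be sketched in C. Code size follows from the fact that every MIPS instruction is 4 bytes. The cycle model is an assumption made here, not something the assignment spells out: it counts one cycle per executed instruction, 4 extra cycles to fill the five-stage pipeline, and one stall cycle per taken branch and per load-use hazard. The function names are hypothetical.

```c
/* Every MIPS instruction occupies 4 bytes. */
unsigned long code_size_bytes(unsigned long static_instructions)
{
    return 4ul * static_instructions;
}

/* ASSUMED cycle model: the first instruction takes 5 cycles to complete,
   each later one adds 1, and each taken branch or load-use hazard adds
   a one-cycle stall. Drop the "+ 4" if pipeline fill is not counted. */
unsigned long total_cycles(unsigned long executed_instructions,
                           unsigned long taken_branches,
                           unsigned long load_use_stalls)
{
    return executed_instructions + 4ul + taken_branches + load_use_stalls;
}
```

Note that the first argument of total_cycles is the dynamic count (instructions actually executed, including every loop iteration), while code_size_bytes takes the static count of instructions in the listing.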

  1. (20 points) "Naively" compile the code sequence shown into MIPS assembly code. By a "naive" compilation, I mean a straightforward translation, without thinking about optimizing the code. How large (measured in bytes) is your resulting code? How many cycles does it take to execute, given the assumptions?
  2. (20 points) Optimize your code from the previous question for minimum execution time: do absolutely anything you can think of to make the code run as quickly as possible; this typically involves eliminating the loop. Now how large is it? How many cycles does it take?
  3. (20 points) Now optimize your code for minimum size. This time, do absolutely anything you can think of to make the code as small as possible. How large is it? How fast is it? The resulting optimized code is typically very similar to a naive compilation.
  4. (30 points) Finally, optimize your code for a compromise between speed and size on a dual-pipeline superscalar MIPS, as shown on pages 511 through 514 (use the book version, not my lecture version, since the book version is easier to look up details on). Also as on page 513, you should assume a delayed branch this time. How big is your code? How fast? Note that this last one won't have a unique correct answer; as you found in the previous parts of the problem, you can trade off space vs. time, and I've deliberately not told you exactly how to trade (all the same, a minimum-size answer will likely lose points for being slow, and a minimum-time answer will likely lose points for being big).

Last modified: Wed Feb 19 10:36:28 MST 2003