Synchronous Dynamic RAM

It seems that the most effective way to discuss the tradeoff among the narrow, wide, and pipelined memory organizations described in the book is to look at the memory most commonly used in PCs today: SDRAM.

Most discussions of SDRAM focus on the fact that it is synchronized with the system clock. For our purposes, what matters is how it is accessed, so we will set that detail aside. The information here comes mostly from Texas Instruments' SDRAM Technical Reference.

Row and Column Addresses

DRAM (including SDRAM) is always arranged as an array. An address is broken into two parts, a row and a column, before being presented to the array. The row address is presented first, activating that row, and then the column address is presented. Naturally, reading successive columns from a single row is faster than reading data from several rows, since there is no need to send a new row address and wait for another row to be activated.
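A minimal Python sketch of the row/column split, assuming a hypothetical array of 4096 rows by 256 columns with the column taken from the low-order address bits (real parts differ in geometry and address mapping). The names `split_address` and `row_activations` are illustrative only. The sketch counts how often a new row must be opened for consecutive versus scattered word addresses.

```python
# Hypothetical array geometry: 12 row bits (4096 rows) x 8 column bits (256 columns).
ROW_BITS = 12
COL_BITS = 8


def split_address(addr: int) -> tuple[int, int]:
    """Split a word address into (row, column); the column is the low-order bits."""
    if not 0 <= addr < 1 << (ROW_BITS + COL_BITS):
        raise ValueError("address out of range for this array")
    col = addr & ((1 << COL_BITS) - 1)
    row = addr >> COL_BITS
    return row, col


def row_activations(addresses) -> int:
    """Count how many times a new row address must be sent for a sequence of accesses."""
    activations = 0
    open_row = None
    for a in addresses:
        row, _ = split_address(a)
        if row != open_row:  # different row: must send a new row address
            activations += 1
            open_row = row
    return activations


sequential = list(range(0x1200, 0x1208))           # 8 consecutive words, all in one row
scattered = [0x1200 + 256 * i for i in range(8)]    # 8 words, each in a different row

print(split_address(0x1234))                                   # (18, 52)
print(row_activations(sequential), row_activations(scattered))  # 1 8
```

The consecutive run needs one row activation for all eight words, while the scattered run pays for a new row every time.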

SDRAM

The three most important features of SDRAM, from our perspective, are

  1. There is an onboard buffer. When we present a row address, up to 256 bytes can be transferred from the memory array into the buffer. Later accesses to that data can then be served from the buffer, which is faster than going back to the memory array for each one.
  2. There is a burst transfer mode. Once the data is in the buffer, a single column address can trigger a burst of 1, 2, 4, 8, or 256 transfers. This is much faster than requiring a new column address for each transfer.
  3. There are at least two banks of memory, so we can be activating a row in one bank while transferring data from the other.
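The three features can be put into rough numbers with a simplified timing model. The parameter names below (`T_RCD`, `CAS`, `T_RP`) follow common SDRAM timing terminology, but the values are hypothetical and chosen only to show the shape of the tradeoff; the model ignores refresh, command-bus conflicts, and early precharge.

```python
# Hypothetical SDRAM timing, in bus clock cycles (illustrative, not from a datasheet).
T_RCD = 3  # row activate -> column address may be sent
CAS = 3    # column address -> first data word on the bus
T_RP = 3   # precharge: close the open row before activating another row in the same bank


def cycles_no_burst(words: int) -> int:
    """One row activation, then a separate column address (and full CAS wait)
    for every word, plus one cycle to transfer that word."""
    return T_RCD + words * (CAS + 1)


def cycles_burst(words: int) -> int:
    """One row activation and one column address, then one word per cycle
    streamed out of the row buffer."""
    return T_RCD + CAS + words


def cycles_two_blocks(words: int, banks: int) -> int:
    """Read two blocks that live in different rows."""
    first = cycles_burst(words)
    if banks == 1:
        # Same bank (simplified): close the first row after its burst,
        # then activate the second row and burst again.
        return first + T_RP + cycles_burst(words)
    if banks == 2:
        # Different banks: the second bank is activated one cycle after the
        # first, in the background. Its data follows the first burst as soon
        # as the bus is free (or as soon as its own latency has elapsed).
        second_ready = 1 + T_RCD + CAS
        return max(first, second_ready) + words
    raise ValueError("this sketch only models one or two banks")


print(cycles_no_burst(8), cycles_burst(8))                            # 35 14
print(cycles_two_blocks(8, banks=1), cycles_two_blocks(8, banks=2))  # 31 22
```

Under these assumed numbers, bursting eight words out of the buffer takes 14 cycles instead of 35, and a second bank hides the activation latency of the next row entirely.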

What does this mean for our attempts to balance the cost of the memory bus against the performance we want? Consider the two extremes. With plain DRAM on a narrow bus, we incur the full access latency on every eight bytes of a cache block, and the bus spends most of its time waiting on that latency. With a wide bus, the whole block arrives after a single latency, but the bus is more expensive and still spends most of its time sitting idle, waiting for data. SDRAM lets us balance the width of the bus against the latency of the memory instead, so we get better bus utilization than the wide version and higher speed than the narrow version.
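To make the comparison concrete, the sketch below computes the cost of fetching one cache block under the three organizations. The timing values (1 cycle to send an address, 15 cycles of latency, 1 cycle per bus transfer) and the 32-byte block are assumptions for illustration. "Pipelined" here means one address followed by back-to-back transfers on the narrow bus, as an SDRAM burst would provide.

```python
from dataclasses import dataclass

# Hypothetical timing, in bus cycles; chosen for illustration only.
ADDR_CYCLES = 1   # send the address
LATENCY = 15      # wait for the DRAM array
XFER_CYCLES = 1   # move one bus-width of data
BLOCK_BYTES = 32  # hypothetical cache block size


@dataclass
class Result:
    name: str
    bus_bytes: int
    penalty: int  # cycles to fetch one cache block
    busy: int     # cycles the bus spends actually carrying data

    @property
    def utilization(self) -> float:
        return self.busy / self.penalty


def narrow(bus_bytes: int = 8) -> Result:
    # Every bus-width chunk pays the full address + latency + transfer cost.
    chunks = BLOCK_BYTES // bus_bytes
    return Result("narrow", bus_bytes,
                  chunks * (ADDR_CYCLES + LATENCY + XFER_CYCLES), chunks * XFER_CYCLES)


def wide() -> Result:
    # The bus is as wide as the block: one latency, one transfer.
    return Result("wide", BLOCK_BYTES, ADDR_CYCLES + LATENCY + XFER_CYCLES, XFER_CYCLES)


def pipelined(bus_bytes: int = 8) -> Result:
    # One address and one latency, then the chunks stream back to back.
    chunks = BLOCK_BYTES // bus_bytes
    return Result("pipelined", bus_bytes,
                  ADDR_CYCLES + LATENCY + chunks * XFER_CYCLES, chunks * XFER_CYCLES)


for r in (narrow(), wide(), pipelined()):
    print(f"{r.name:9s} bus={r.bus_bytes:2d}B penalty={r.penalty:2d} cycles "
          f"utilization={r.utilization:.0%}")
```

With these assumed numbers the narrow and wide buses both sit idle about 94% of the time (68 and 17 cycles per block), while the pipelined version fetches the block in 20 cycles on an 8-byte bus and keeps the bus busy 20% of the time.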

SDRAM and the P4 L2 Cache

The P4's L2 cache has the following specifications:

While we're at it, the P4 also has two L1 caches: an 8 KB, 4-way set-associative data cache (the specifications I found didn't mention the block size), and an instruction cache (Intel calls it the execution trace cache) that holds up to 12K decoded micro-ops, so the instruction cache doesn't actually cache instructions! Its structure is completely unspecified.
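Because the block size of the 8 KB, 4-way data cache was not given, the number of sets cannot be pinned down. The sketch below only shows how the geometry would follow for a few candidate block sizes; 32, 64, and 128 bytes are hypothetical choices, not the P4's actual value.

```python
CACHE_BYTES = 8 * 1024  # 8 KB L1 data cache (from the text)
WAYS = 4                # 4-way set associative (from the text)


def geometry(block_bytes: int) -> tuple[int, int, int]:
    """Return (sets, index_bits, offset_bits) for a given block size.
    Assumes power-of-two sizes, as in real caches."""
    if block_bytes <= 0 or block_bytes & (block_bytes - 1):
        raise ValueError("block size must be a power of two")
    sets = CACHE_BYTES // (WAYS * block_bytes)
    # For powers of two, bit_length() - 1 == log2.
    return sets, sets.bit_length() - 1, block_bytes.bit_length() - 1


# Candidate block sizes only: the specification did not give the real one.
for block in (32, 64, 128):
    sets, index_bits, offset_bits = geometry(block)
    print(f"{block:3d}-byte blocks: {sets:2d} sets, "
          f"{index_bits} index bits, {offset_bits} offset bits")
```

Whatever the block size, index bits plus offset bits always come to 11, because each way holds 8 KB / 4 = 2 KB = 2^11 bytes; only the split between the two fields changes.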


Last modified: Wed Oct 31 11:30:46 MST 2001