Hard cold facts of life:
- Bigger memory is slower (for a given technology)
- Faster memory is more expensive (there's a word for slow,
expensive memory: obsolete).
These two facts lead to the concept of a memory hierarchy: we want to
use several types of memory of varying speeds and sizes, and arrange
them in a hierarchy.
The basic idea of the memory hierarchy is that we want to
simultaneously increase the average speed (per access) and decrease
the average cost (per byte) of memory.
The average cost per byte is given by the weighted average of the cost
per byte of memory at each level (weighted by the size of each level),
while the average speed per access is given by the weighted average of
the speed of access at each level (weighted by the fraction of
accesses satisfied at that level). "Speed" here can mean either
latency (the time to access the first piece of a transfer) or
bandwidth (the rate at which data can be transferred).
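The two weighted averages can be sketched in a few lines of Python.
This is only an illustration: the two-level hierarchy, its sizes,
costs, access times, and hit fractions are made-up numbers, and the
model assumes every access is satisfied by exactly one level at that
level's full access time.

```python
# Hypothetical two-level hierarchy; every number here is illustrative,
# not a measurement of any real system.
# Fields: (name, size_bytes, cost_per_byte, access_time_ns, fraction_of_accesses)
levels = [
    ("cache",       64 * 1024,        1e-4, 10.0,  0.95),
    ("main memory", 64 * 1024 * 1024, 1e-6, 100.0, 0.05),
]


def average_cost_per_byte(levels):
    """Cost per byte, weighted by how many bytes each level holds."""
    total_bytes = sum(size for _, size, _, _, _ in levels)
    total_cost = sum(size * cost for _, size, cost, _, _ in levels)
    return total_cost / total_bytes


def average_access_time(levels):
    """Access time, weighted by the fraction of accesses each level serves."""
    return sum(frac * time for _, _, _, time, frac in levels)


print("average cost per byte:", average_cost_per_byte(levels))
print("average access time (ns):", average_access_time(levels))
```

With these assumed numbers the average cost per byte lands close to
that of the cheap level, while the average access time lands close to
that of the fast level, which is the whole point of the hierarchy.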
The basic idea at all levels is that we take an address and apply a
mapping function to it to get a new address at some level of the
hierarchy. The implementation of the mapping function depends on the
level of the hierarchy. There are two important levels at which the
characteristics of the mapping function will tend to change: between
memory and disk, and between on-chip and off-chip.
The importance of the memory/disk distinction is that disk has very
different access characteristics than memory.
- Disk has horrible latency. Memory latency in modern systems is
something like 10-100 nsec; disk latency is something like 10 msec.
That's a factor of between 100,000 and a million!
- Disk has a much more extreme difference between its latency and
its bandwidth than memory does. Even very sophisticated pipelined
memory schemes (like Cray used) won't show a difference between
latency and bandwidth of more than a factor of around 10; disk shows a
latency of around 10 msec and a bandwidth of around 10 MB/sec.
These will end up implying that (1) we need to avoid having to go to
disk for a memory access at nearly any cost, and (2) when we do have
to go to disk, we want to bring in a big chunk. The answer will turn
out to be a lookup table, filled by software but read by hardware.
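The case for bringing in a big chunk can be checked with a rough
calculation. The sketch below uses the approximate disk figures quoted
above (10 msec latency, 10 MB/sec bandwidth) and assumes the simplest
possible cost model, in which a transfer takes the latency plus the
size divided by the bandwidth. The function names and the chunk sizes
tried are hypothetical choices for illustration.

```python
DISK_LATENCY_S = 10e-3      # about 10 msec, from the text
DISK_BANDWIDTH_BPS = 10e6   # about 10 MB/sec, taking 1 MB = 10**6 bytes


def transfer_time(nbytes):
    """Seconds to fetch nbytes under the assumed latency + size/bandwidth model."""
    return DISK_LATENCY_S + nbytes / DISK_BANDWIDTH_BPS


def effective_bandwidth(nbytes):
    """Bytes per second actually delivered, once latency is included."""
    return nbytes / transfer_time(nbytes)


# Compare a single word, a 4 KB chunk, and a 1 MB chunk.
for nbytes in (4, 4096, 1024 * 1024):
    print(nbytes, "bytes:",
          transfer_time(nbytes), "sec,",
          effective_bandwidth(nbytes), "bytes/sec")
```

Under this model a 4 KB transfer costs only a few percent more time
than a 4-byte one, because the latency dominates both, so a small
transfer wastes nearly all of the disk's bandwidth.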
The importance of the on-chip/off-chip distinction is that the memory
cost function is different between on-chip and off-chip.
- Off-chip, the primary determinant of memory cost is number of
packages and number of pins. If you can increase the amount of memory
in a package without markedly increasing the pin count, you can
increase the amount of memory at a sub-linear cost. (If you check
memory prices in a catalog, you will typically find that the cost of a
module doesn't go up anywhere near as fast as the size of the module,
until you get to the ones that were just released very recently, at
which point there is a jump.)
- On-chip, the primary determinant is area on the chip. If you can
provide a more flexible mapping without markedly increasing the area,
you can do a better mapping.
The result of this is that on-chip and off-chip cache will use
markedly similar hardware algorithms to produce the mapping, but with
different parameterization.
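One way to picture "same algorithm, different parameterization" is an
address-splitting function that takes the cache geometry as an
argument. The sketch below is an assumption for illustration, not a
description of any particular machine: it uses the common
tag/index/offset split, and the names CacheParams and split_address as
well as both sets of parameters are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheParams:
    line_bytes: int  # bytes per cache line (assumed to be a power of two)
    num_sets: int    # number of sets in the cache (assumed power of two)


def split_address(addr, params):
    """Map an address to (tag, set index, byte offset) for a given geometry."""
    offset = addr % params.line_bytes        # position within the line
    line_number = addr // params.line_bytes  # which line of memory this is
    index = line_number % params.num_sets    # which set the line maps to
    tag = line_number // params.num_sets     # identifies the line within its set
    return tag, index, offset


# Hypothetical geometries: a small on-chip cache and a larger off-chip one.
on_chip = CacheParams(line_bytes=32, num_sets=128)
off_chip = CacheParams(line_bytes=64, num_sets=4096)

addr = 0x12345
print("on-chip: ", split_address(addr, on_chip))
print("off-chip:", split_address(addr, off_chip))
```

The same function serves both caches; only the CacheParams values
change, which is the sense in which the two levels share an algorithm
but differ in parameterization.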