Inverted Page Tables

First, recall the basic TLB idea: it is a cache of virtual memory translations, and it is absolutely vital for performance! The majority of TLB implementations are completely in hardware; they are managed just like a very small cache. However, even a modestly sized TLB has a much higher hit rate than an ordinary cache. Misses are therefore rare enough that the TLB can be managed in software instead; Alpha does this.

As address spaces get bigger, page tables get more unwieldy. A 64-bit address might imply something like 5 levels of table (a 10-bit index for all but the last). As virtual address spaces grow so much bigger than physical address spaces, it becomes possible to save space and time by inverting the page table: mapping physical to virtual, instead of the other way 'round.
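To make the "5 levels" estimate concrete, here is a rough sketch of the arithmetic. The 4 KiB page size (a 12-bit offset) is an assumption, not something these notes fix, and index_widths is a made-up helper name.

```python
def index_widths(address_bits, offset_bits, upper_index_bits, levels):
    """Split the virtual page number into per-level index widths.

    Every level except the last uses upper_index_bits; the last level
    takes whatever bits remain.
    """
    vpn_bits = address_bits - offset_bits
    last = vpn_bits - upper_index_bits * (levels - 1)
    if last <= 0:
        raise ValueError("too many levels for this address size")
    return [upper_index_bits] * (levels - 1) + [last]


# Assumed: 64-bit virtual address, 4 KiB pages, 10-bit upper-level indexes.
widths = index_widths(address_bits=64, offset_bits=12, upper_index_bits=10, levels=5)
print(widths)       # [10, 10, 10, 10, 12]
print(len(widths))  # 5: one memory access per level on a full table walk
```

With a different page size the last index changes width, but the point stands: every translation that misses the TLB costs one memory access per level.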

So, instead of each process having a page table, and the page table entries mapping virtual addresses to physical addresses, we can have a single inverted page table for the system, with the entries mapping physical frames to (process ID, virtual page number) pairs. But, this requires very efficient search strategies! It can be done by hashing, and using a hash anchor table (HAT) to map hashed virtual addresses to entries in the inverted page table. Then, each inverted PTE has to contain at least the process ID and virtual page number it maps, plus a pointer to the next entry in its hash chain; the translation steps below use all three.

Here's a picture of the scheme.

The steps in a translation are:

  1. Hash the process ID and virtual page number to get an index into the HAT.
  2. Look up a Physical Frame Number in the HAT.
  3. Look at the inverted page table entry, to see if it is the right process ID and virtual page number. If it is, you're done.
  4. If the PID or VPN does not match, follow the pointer to the next link in the hash chain and compare again. Repeat until you either get a match or find a pointer that is marked invalid. A match gives you the translation; an invalid pointer means you have a miss.
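The steps above can be sketched in a few lines of code. This is only an illustration under stated assumptions: the class and field names, the INVALID marker, and the toy hash function are all invented here, and a real implementation would live in hardware or in the kernel's miss handler.

```python
from dataclasses import dataclass

INVALID = -1  # assumed marker for an empty HAT slot or the end of a chain


@dataclass
class InvertedPTE:
    pid: int             # owning process ID
    vpn: int             # virtual page number mapped to this frame
    next: int = INVALID  # frame number of the next entry in the hash chain


class InvertedPageTable:
    def __init__(self, num_frames, hat_size):
        self.frames = [None] * num_frames  # one entry per physical frame
        self.hat = [INVALID] * hat_size    # hash anchor table

    def _hash(self, pid, vpn):
        # Placeholder hash, chosen only so the example is deterministic.
        return (pid * 31 + vpn) % len(self.hat)

    def map(self, pid, vpn, frame):
        """Record that `frame` holds (pid, vpn), pushing it onto its chain."""
        slot = self._hash(pid, vpn)
        self.frames[frame] = InvertedPTE(pid, vpn, self.hat[slot])
        self.hat[slot] = frame

    def translate(self, pid, vpn):
        """Return the physical frame number, or None on a miss."""
        frame = self.hat[self._hash(pid, vpn)]         # steps 1 and 2
        while frame != INVALID:
            entry = self.frames[frame]
            if entry.pid == pid and entry.vpn == vpn:  # step 3
                return frame
            frame = entry.next                         # step 4
        return None                                    # invalid pointer: miss


ipt = InvertedPageTable(num_frames=8, hat_size=1)  # hat_size=1 forces one chain
ipt.map(pid=1, vpn=0x10, frame=3)
ipt.map(pid=2, vpn=0x10, frame=5)
print(ipt.translate(1, 0x10))  # 3
print(ipt.translate(2, 0x10))  # 5
print(ipt.translate(3, 0x10))  # None
```

Note that the index into the table is itself the physical frame number, so a successful search needs no separate frame field in the entry.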

The surprising thing is that this can be reasonably efficient. If a successful search takes only a couple of probes on average, then it needs fewer memory accesses than a lookup in a page table that is more than a couple of levels deep.
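A back-of-the-envelope comparison, as a sketch only. It assumes the hash spreads entries uniformly, uses the standard chained-hashing estimate of about 1 + load/2 probes for a successful search, and counts the HAT read as one extra access; none of these figures come from the notes themselves.

```python
def average_probes(load_factor):
    """Approximate probes per successful search in a chained hash table."""
    return 1 + load_factor / 2


def inverted_lookup_accesses(load_factor):
    """One HAT read plus the inverted page table probes."""
    return 1 + average_probes(load_factor)


def multilevel_lookup_accesses(levels):
    """One memory access per level of a conventional page table."""
    return levels


# Assumed: the HAT has as many slots as there are frames, and every frame is in use.
print(inverted_lookup_accesses(1.0))  # 2.5
print(multilevel_lookup_accesses(5))  # 5
```

Under these assumptions the inverted table wins against anything deeper than two levels, and a larger HAT (a lower load factor) shortens the chains further.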

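Sharing memory between processes is problematic, to say the least: each physical frame has exactly one entry, and that entry names only one (process ID, virtual page number) pair.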