Inverted and Soft Page Tables

It turns out that for machines with really large address spaces, a multi-level paging structure like the one we've been describing so far isn't such a good idea. A newer idea, which has appeared in some 64-bit architectures, is that of "inverted page tables"; closely related is the notion of implementing page tables completely in software.

Translation Lookaside Buffers

Before talking about inverted page tables, we need to describe an implementation detail regarding virtual memory: translation lookaside buffers (TLBs). TLBs were created to avoid a horrible performance hit caused by virtual memory translations, and have become more and more important as a means of increasing the flexibility of the virtual memory scheme.

So, let's consider the process of performing a virtual memory translation, as we've described it so far for Intel.

  1. Use the first ten bits of the virtual address to perform a lookup in the process's page directory, and obtain the address of one of the process's page tables (that's a memory access).

  2. Use the second ten bits of the virtual address to perform a lookup in a page table, and obtain the address of one of the process's pages (that's another memory access).

  3. Use the last twelve bits of the virtual address as an offset into that page to actually get to the memory location (that's a third memory access, or more if the data is misaligned).
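To make the three accesses concrete, here is a toy C model of the walk. It is a sketch only: simulated physical memory is a word array, and the names phys_mem, read_word, translate, and load are hypothetical. Following the Intel layout, each directory or table entry keeps the next level's base address in its upper 20 bits. The low 12 flag bits are masked off, and (an assumption to keep it short) the present bit is never checked.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the 32-bit Intel two-level walk (a sketch, not real hardware). */
#define MEM_WORDS (1u << 16)         /* 256 KB of simulated physical memory */

static uint32_t phys_mem[MEM_WORDS]; /* hypothetical simulated memory */
static unsigned mem_accesses;        /* counts every simulated memory read */

/* One memory access: read the aligned 32-bit word at a physical address. */
static uint32_t read_word(uint32_t paddr) {
    assert(paddr / 4 < MEM_WORDS);
    mem_accesses++;
    return phys_mem[paddr / 4];
}

/* Walk the tables for vaddr, given the page directory's physical address.
 * Entries keep a base address in bits 31..12; flag bits are masked off and,
 * to keep the sketch short, the present bit is never checked. */
static uint32_t translate(uint32_t page_dir, uint32_t vaddr) {
    uint32_t dir_index   = vaddr >> 22;           /* first ten bits   */
    uint32_t table_index = (vaddr >> 12) & 0x3FF; /* second ten bits  */
    uint32_t offset      = vaddr & 0xFFF;         /* last twelve bits */

    uint32_t page_table = read_word(page_dir + 4 * dir_index) & ~0xFFFu;    /* access 1 */
    uint32_t page_base  = read_word(page_table + 4 * table_index) & ~0xFFFu; /* access 2 */
    return page_base + offset;
}

/* A load: two accesses for the walk, plus a third for the data itself. */
static uint32_t load(uint32_t page_dir, uint32_t vaddr) {
    return read_word(translate(page_dir, vaddr)); /* access 3 */
}
```

For example, suppose the directory is at 0x0000, its entry 1 points to a table at 0x1000, and that table's entry 2 points to a page at 0x2000. Then load(0x0000, 0x00402034) reads physical address 0x2034 and leaves mem_accesses at 3.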

So every time we want to read or write memory, we need three memory accesses. That's unacceptable. If it's all in cache, it's still three cache accesses; that's not as bad as if it were out in memory, but it's still something we want to try to get rid of.

The idea of a TLB is that we can have a special-purpose cache, separate from the "real" cache, used only to save virtual memory translations. I don't want to go into the details here (they're in 473), but the idea is that a limited number of translations (maybe eight to 32 or so) are stored in the TLB. When we want to translate an address whose translation is in the TLB, we can do it with a single access that takes no longer than a cache access. If we are sufficiently clever about it, we can even do the TLB translation at the same time as the cache lookup, and avoid having any time penalty at all ("I have things up my sleeve," as Tom says in Tennessee Williams' The Glass Menagerie).
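Here is a sketch of the idea in C. Its assumptions are mine and don't describe any particular processor: a 16-entry fully associative TLB with round-robin replacement, 4 KB pages, and a flat page_table array standing in for the multi-access tree walk. The names (tlb_lookup, tlb_insert, walk, translate_tlb) are hypothetical. Real hardware compares all TLB entries in parallel, and the loop here only models the result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 16  /* assumed size, within the "eight to 32" range */

struct tlb_entry {
    bool     valid;
    uint32_t vpn;  /* virtual page number: upper 20 bits of the address */
    uint32_t pfn;  /* physical frame number */
};

static struct tlb_entry tlb[TLB_ENTRIES];
static unsigned next_victim;  /* round-robin replacement pointer */

/* Stand-in for the expensive tree walk: a flat table indexed by VPN. */
static uint32_t page_table[1u << 20];
static unsigned walks;        /* how many walks we actually had to do */

static uint32_t walk(uint32_t vpn) {
    walks++;
    return page_table[vpn];
}

/* Hardware checks every entry at once; this loop just models the result. */
static bool tlb_lookup(uint32_t vpn, uint32_t *pfn) {
    for (unsigned i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].vpn == vpn) {
            *pfn = tlb[i].pfn;
            return true;
        }
    }
    return false;
}

static void tlb_insert(uint32_t vpn, uint32_t pfn) {
    tlb[next_victim] = (struct tlb_entry){ true, vpn, pfn };
    next_victim = (next_victim + 1) % TLB_ENTRIES;
}

/* Translate a 32-bit address, walking the tables only on a TLB miss. */
static uint32_t translate_tlb(uint32_t vaddr) {
    uint32_t vpn = vaddr >> 12, offset = vaddr & 0xFFF, pfn;
    if (!tlb_lookup(vpn, &pfn)) {
        pfn = walk(vpn);
        assert(pfn < (1u << 20));  /* a frame number fits in 20 bits */
        tlb_insert(vpn, pfn);
    }
    return (pfn << 12) | offset;
}
```

Translating 0x5123 and then 0x5FFF costs only one walk in total. The second address is on the same page, so it hits in the TLB.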

For more details on how TLBs look, I've got some notes on them at http://www.cs.nmsu.edu/~pfeiffer/classes/473/notes/tlb.html and on cache/TLB interactions at http://www.cs.nmsu.edu/~pfeiffer/classes/473/notes/cache-vm.html

What we need to take away from this for this class is that the tree-walking required for a virtual memory translation can really be avoided almost all the time.

Large Address Spaces Mean Deep Trees

The next point to bring up is what's going to happen to page tables as our virtual memory space gets bigger (as, indeed, has happened with 64-bit processors). Basically, pages will get bigger, but they won't get big enough for even a three-level page table to cover the whole address space. So either we need a deeper tree structure (four levels? five?) or we need to try something radically different.
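The arithmetic behind "four levels? five?" can be sketched as follows. The sketch assumes (as on Intel) that every table fills exactly one page, so each level resolves log2(page size / entry size) bits of the virtual page number. The function name levels_needed is hypothetical.

```c
#include <assert.h>

/* How many levels does a tree-structured page table need?
 *   va_bits:    width of a virtual address
 *   page_shift: log2(page size)
 *   pte_bytes:  size of one page-table entry
 * Assumes each table occupies exactly one page. */
static unsigned levels_needed(unsigned va_bits, unsigned page_shift, unsigned pte_bytes) {
    unsigned entries = (1u << page_shift) / pte_bytes;  /* entries per table */
    unsigned bits_per_level = 0;
    while ((1u << bits_per_level) < entries)
        bits_per_level++;                               /* log2(entries) */
    assert(bits_per_level > 0);
    unsigned vpn_bits = va_bits - page_shift;           /* bits left after the offset */
    return (vpn_bits + bits_per_level - 1) / bits_per_level;  /* round up */
}
```

With 4-byte entries and 4 KB pages, 32-bit addresses need 2 levels, which matches the Intel scheme above. With 8-byte entries, 48-bit addresses need 4 levels (which is what x86-64 uses for its 48-bit virtual addresses), a full 64 bits needs 6, and even 64 KB pages still need 4.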

What's wrong with just having a deeper tree structure? A couple of things. First, we end up using more space for our page tables; that's unpleasant since we want to be efficient, but probably isn't a major concern. Second, the size of our page tables grows with the virtual address space and the number of processes, which isn't as reasonable as growing with the size of the resource being managed: the physical memory. Again, that isn't really a major concern. A bigger problem is that TLB misses get more and more expensive, since every extra level in the tree adds another memory access to each walk. So let's try something radically different that (1) has the size of the paging data structures grow with the amount of physical memory, not with the virtual memory space, and (2) doesn't have its access time increase with the size of the virtual memory space.

Inverted Page Tables

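The basic idea is that instead of mapping virtual addresses to physical addresses, we'll map physical addresses to virtual addresses: the table has one entry for each physical page frame, recording which virtual page is currently stored there. Then we'll use a hashing function to do our table lookups.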

So here's the idea. We have a table of inverted page table (IPT) entries someplace. We also have a table of pointers at a fixed location in memory. When we want to do a translation, we hash our virtual page number to give us an index into the pointer table. Now we follow the pointer to an IPT entry and compare that entry's virtual page number to the one we're translating. If they match, we've succeeded, and the position of that entry in the IPT is the physical page frame number. If they don't match, we follow the collision chain, and if no entry on it matches, we have a page fault. Notice that if we get a hit on our first hash, a translation with an IPT takes the same number of memory accesses as one on Intel: two. As we go to larger address spaces, it still takes us just two accesses for the translation.

We use standard chaining techniques to resolve collisions in the IPT.
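Here's a sketch of an IPT lookup with chaining in C. Several details are assumptions that the notes above don't specify. Each entry also records a process ID, because a single IPT is shared by every process and a virtual page number alone is ambiguous. Collision chains are linked through a next field in the IPT entries themselves. The hash function and the table sizes are arbitrary. The pointer table is called anchor here, and all names are hypothetical. Note that the table has NFRAMES entries no matter how many processes exist or how large their address spaces are.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NFRAMES   1024        /* assumed: physical memory of 1024 frames */
#define HASH_SIZE 2048        /* assumed size of the pointer table */
#define NIL       UINT32_MAX  /* marks an empty bucket or end of chain */

/* One entry per physical frame: the entry's index IS the frame number. */
struct ipt_entry {
    bool     used;
    uint32_t pid;   /* assumed: owner, since one IPT serves all processes */
    uint64_t vpn;   /* virtual page currently held in this frame */
    uint32_t next;  /* next frame on the same collision chain, or NIL */
};

static struct ipt_entry ipt[NFRAMES];
static uint32_t anchor[HASH_SIZE];  /* pointer table: bucket -> first frame */

/* An arbitrary, assumed hash of (pid, vpn) into the pointer table. */
static uint32_t ipt_hash(uint32_t pid, uint64_t vpn) {
    return (uint32_t)((vpn * 31u + pid) % HASH_SIZE);
}

static void ipt_init(void) {
    for (uint32_t b = 0; b < HASH_SIZE; b++) anchor[b] = NIL;
    for (uint32_t f = 0; f < NFRAMES; f++) ipt[f].used = false;
}

/* Record that virtual page vpn of process pid now lives in frame. */
static void ipt_map(uint32_t frame, uint32_t pid, uint64_t vpn) {
    assert(frame < NFRAMES && !ipt[frame].used);
    uint32_t b = ipt_hash(pid, vpn);
    ipt[frame] = (struct ipt_entry){ true, pid, vpn, anchor[b] };
    anchor[b] = frame;  /* push onto the front of the bucket's chain */
}

/* Access 1: the pointer table. Access 2 (and more on a collision): the IPT.
 * Returns true and sets *frame on a hit; false means a page fault. */
static bool ipt_lookup(uint32_t pid, uint64_t vpn, uint32_t *frame) {
    for (uint32_t f = anchor[ipt_hash(pid, vpn)]; f != NIL; f = ipt[f].next) {
        if (ipt[f].pid == pid && ipt[f].vpn == vpn) {
            *frame = f;
            return true;
        }
    }
    return false;
}
```

On a first-probe hit, ipt_lookup reads anchor once and ipt once. That is the same two accesses as the Intel walk, however wide the virtual address is. With this hash, page 0 of process 0 and page 0 of process 2048 land in the same bucket, so looking up the one mapped first means stepping past the other on the chain.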

Software TLB Management

Here's one last clever idea. Remember I commented that the time to do a lookup really isn't that big a deal, even as the tables get deeper, since almost all our lookups are resolved by the TLB. Let's explore that idea one step further...


Last modified: Wed Oct 19 11:25:33 MDT 2005