The multilevel table scheme used by Intel works very well -- but as address spaces get larger, it gets impractical, for a couple of reasons. First, memory management on large address space machines tends to result in quite sparse memory maps, so you'll tend to go through several levels of page table with only one or two valid entries at each level. Also, on a TLB miss, you have to go through a memory access for each level of page table, until you finally get either a hit or a miss. So, let's try to find a more efficient approach.
Consider the IBM RS/6000. It uses a 4K page, like Intel -- but it generates a 52 bit virtual address! To use the Intel multilevel scheme would require four levels of page table -- if IBM used a 4-byte PTE like Intel does. But they don't; it's sixteen bytes, which would make things worse. They could only decode eight bits of an address per page, so they'd need five levels of page table.
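The level counts above are easy to check with a few lines of arithmetic. This is only a sketch: the function name `page_table_levels` is made up for illustration, and it assumes each page table at every level occupies exactly one page.

```python
import math

def page_table_levels(va_bits, page_bytes, pte_bytes):
    """Number of page-table levels needed if each table fills one page."""
    offset_bits = int(math.log2(page_bytes))                  # bits used by the byte offset
    bits_per_level = int(math.log2(page_bytes // pte_bytes))  # address bits decoded per table
    vpn_bits = va_bits - offset_bits                          # bits left to translate
    return math.ceil(vpn_bits / bits_per_level)

print(page_table_levels(52, 4096, 4))   # 4-byte PTE:  40 bits / 10 per level -> 4
print(page_table_levels(52, 4096, 16))  # 16-byte PTE: 40 bits / 8 per level  -> 5
```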
A second problem is that if the TLB works well (i.e., the vast majority of PTE lookups are TLB hits), we seldom do a lookup in the page table entries anyway. Instead, the majority of accesses to the page tables are to perform bookkeeping operations, such as searching for a page to eject, or updating the Accessed bits. The multilevel structure really isn't well suited to this; generally, for these functions, we're only interested in the pages actually in memory at the moment.
The solution is to invert the page tables: only have a single page table in the system, which maps physical pages to virtual pages. Now, when you need to do a translation, you do a search in this table: when you get a match, you know you've found the entry you're looking for. Sounds like a disaster, unless we have a good search strategy; well, hashing gives us an O(1) expected time, so that's what we'll use. IBM has used inverted page tables in quite a few systems, including the 801 (a research machine that some call the first RISC), the PC/RT, the RS/6000 POWER, the AS/400, and the PowerPC. The particular variant we'll talk about will be the POWER architecture's IPT, due to its clever support for database locking. I haven't seen a clear description of how closely this matches the implementations in the other systems listed above. First we'll talk about how the RS/6000 generates addresses, then we'll show how the inverted page table structure works.
32 bit ``Effective Address,'' divided as (bits are numbered left to right)
| Segment Number | Page Number | Offset |
|---|---|---|
| 0-3 | 4-19 | 20-31 |
This is an odd sort of ``segmented'' virtual memory system. Ordinarily, a segmented system keeps the address divided into two parts in a programmer-visible way; in this case, it acts just like a linear address space, except that the programmer has to be aware that the high order address bits act funny.
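To make the field boundaries concrete, here is a small sketch that pulls the three fields out of an effective address. The function name is hypothetical; the only thing taken from the notes is the bit layout (bit 0 is the leftmost, most significant bit).

```python
def split_effective_address(ea):
    """Split a 32-bit effective address into (segment, page, offset)."""
    ea &= 0xFFFFFFFF              # keep only 32 bits
    segment = ea >> 28            # bits 0-3: selects a segment register
    page = (ea >> 12) & 0xFFFF    # bits 4-19: page number within the segment
    offset = ea & 0xFFF           # bits 20-31: byte offset within the 4K page
    return segment, page, offset

seg, page, off = split_effective_address(0x3ABCD123)
print(seg, hex(page), hex(off))   # 3 0xabcd 0x123
```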
So, the high-order four bits of the effective address select one of 16 segment registers. Each segment register contains a 24-bit segment ID, plus a bunch of other bits. I've seen references to:
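The segment ID, virtual page index, and offset are combined to form the 52-bit virtual address like this:
| Segment ID | Virtual Page Index | Offset |
|---|---|---|
| 24 bits | 16 bits | 12 bits |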
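As a rough sketch of that combination, the code below replaces the 4-bit segment number with the 24-bit segment ID from the selected segment register. The names `virtual_address` and `segment_ids` are made up for illustration, the segment registers are modeled as a plain list holding only the segment ID, and placing the segment ID in the high-order bits is an assumption consistent with the order given in the text.

```python
def virtual_address(ea, segment_ids):
    """Form a 52-bit virtual address from a 32-bit effective address.

    segment_ids is a list of 16 values, one 24-bit segment ID per
    segment register (the registers' other bits are ignored here).
    """
    segment = (ea >> 28) & 0xF          # high-order 4 bits pick the register
    page = (ea >> 12) & 0xFFFF          # 16-bit virtual page index
    offset = ea & 0xFFF                 # 12-bit byte offset
    sid = segment_ids[segment] & 0xFFFFFF
    # Assumed layout: 24-bit segment ID | 16-bit page index | 12-bit offset
    return (sid << 28) | (page << 12) | offset

regs = [0] * 16
regs[3] = 0x123456
print(hex(virtual_address(0x3ABCD123, regs)))   # 0x123456abcd123
```

The upper 40 bits of the result (segment ID plus page index) are the VPN that gets hashed in the next step.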
We search by using yet another table: the Hash Anchor Table, or HAT. The 40 bit VPN is hashed to get an index into a hash table with the same number of entries as the number of pages of physical memory. This provides a pointer into a Page Frame Table, which has one entry for each page of physical memory. The PFT entry is a four word (16 byte) entry, containing (going from left to right):
| Field | Meaning | Width (bits) |
|---|---|---|
| VPN | Virtual Page Number | 27 |
| V | Valid VPN | 1 |
| F | Referenced | 1 |
| C | Changed (dirty!) | 1 |
| PP | Page Protect | 2 |
| I | Invalid pointer | 1 |
| | unused | 11 |
| | Next PFT ptr | 20 |
| | Lock bits | 32 |
| L | Lock type | 1 |
| W | Grant write locks | 1 |
| R | Grant read locks | 1 |
| A | Allow read | 1 |
| | unused | 12 |
| TID | Transaction ID | 16 |
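Page translation is easy here. But wait a minute: the VPN is 40 bits, and the PFT entry only holds 27 of them. Aren't we 13 bits short? Yes, but the hash index is guaranteed to be unique in the low order 13 bits, so those bits don't need to be stored. All of this is done in hardware.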
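The lookup path (hash the VPN, index the HAT, then follow the PFT chain) can be sketched in software as below. This is a simplified model, not the hardware algorithm: the class and method names are invented, the hash is a placeholder modulo, each entry stores the full VPN rather than the 27-bit field, and there is no support for unmapping a frame.

```python
class InvertedPageTable:
    """Toy inverted page table: one PFT entry per physical page frame."""

    def __init__(self, num_frames):
        self.num_frames = num_frames
        self.hat = [None] * num_frames         # hash anchor table: bucket -> first frame
        self.vpn = [None] * num_frames         # PFT field: VPN held in each frame
        self.next_frame = [None] * num_frames  # PFT field: next frame in the hash chain

    def _hash(self, vpn):
        # Placeholder hash (an assumption); the real hash function is not described here.
        return vpn % self.num_frames

    def map(self, vpn, frame):
        """Record that physical frame `frame` holds virtual page `vpn`."""
        bucket = self._hash(vpn)
        self.vpn[frame] = vpn
        self.next_frame[frame] = self.hat[bucket]  # push onto the front of the chain
        self.hat[bucket] = frame

    def translate(self, vpn):
        """Return the frame holding `vpn`, or None (a page fault)."""
        frame = self.hat[self._hash(vpn)]
        while frame is not None:
            if self.vpn[frame] == vpn:
                return frame
            frame = self.next_frame[frame]
        return None

ipt = InvertedPageTable(8)
ipt.map(5, 2)
ipt.map(13, 7)            # 13 and 5 collide in this toy hash
print(ipt.translate(5))   # 2
print(ipt.translate(13))  # 7
print(ipt.translate(21))  # None
```

Note that the index of the matching PFT entry is itself the physical page number, which is why the table needs exactly one entry per page of physical memory.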
Locks? Remember, this is IBM, so support for databases is important. The RS/6000 has hardware support for database locking at the cache line level; locks can be granted automatically by the hardware simply by having a process try to read or write the cache line. When complicated cases arise, or a lock is denied, it traps to the OS and lets software deal with it.
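Notice that the number of lock bits works out to one per cache line: a 4K page divided among 32 lock bits gives 128 bytes per bit. All the locks on a page will be the same type.
The lock type bit (L) says whether those locks are read locks or write locks.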
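The one-lock-bit-per-line arithmetic can be written out as follows. The function name is hypothetical, and the assumption that lock bit 0 covers the first 128 bytes of the page (rather than some other ordering) is mine, not something the notes state.

```python
PAGE_BYTES = 4096
LOCK_BITS = 32
LINE_BYTES = PAGE_BYTES // LOCK_BITS   # 128 bytes covered by each lock bit

def lock_bit_index(offset):
    """Which of the 32 lock bits covers this byte offset within the page.

    Assumes lock bit 0 covers bytes 0-127, bit 1 covers 128-255, and so on.
    """
    return (offset & (PAGE_BYTES - 1)) // LINE_BYTES

print(LINE_BYTES)             # 128
print(lock_bit_index(0x123))  # 2
```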
Finally, there is a logic table which determines circumstances under which accesses are allowed and locks are granted. First, the table for read accesses:
| B | L | A | R | TID Match | Access Permitted | Notes |
|---|---|---|---|---|---|---|
| - | - | 0 | - | no | no | |
| 0 | 0 | 0 | 0 | yes | no | |
| 0 | 1 | 0 | 0 | yes | yes | |
| 1 | 0 | 0 | 0 | yes | yes | |
| 1 | 1 | 0 | 0 | yes | yes | |
| 0 | 0 | 0 | 1 | yes | yes | Set lock bit to 1 |
| 1 | 0 | 0 | 1 | yes | yes | |
| - | 1 | 0 | 1 | yes | yes | |
| - | - | 1 | - | - | yes | |
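The read table collapses into a few conditionals. The sketch below is just a transcription of the rows above into code; the function name and the `(permitted, set_lock_bit)` return shape are invented for illustration, and B is taken to be the lock bit for the cache line being accessed.

```python
def read_access(b, l, a, r, tid_match):
    """Return (permitted, set_lock_bit) for a read, following the table.

    b: lock bit for the line, l: lock type, a: allow read,
    r: grant read locks, tid_match: does the transaction ID match?
    """
    if a:
        return True, False            # A=1: reads always allowed
    if not tid_match:
        return False, False           # A=0 and wrong transaction: denied
    if not r:
        # No automatic read-lock grants: allowed unless B=0 and L=0.
        return bool(b or l), False
    # R=1: always allowed; the lock bit is newly set only when B=0 and L=0.
    return True, (not b and not l)

print(read_access(0, 0, 0, 1, True))   # (True, True)
print(read_access(0, 0, 0, 0, True))   # (False, False)
```

A denied access is where the hardware would trap to the OS, as described earlier.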
And now the table for write accesses:
| B | L | W | TID Match | Access Permitted | Notes |
|---|---|---|---|---|---|
| - | - | - | no | no | |
| 0 | 0 | 0 | yes | no | |
| 0 | 1 | 0 | yes | yes | |
| 1 | 0 | 0 | yes | yes | |
| 1 | 1 | 0 | yes | yes | |
| 0 | 0 | 1 | yes | yes | Set lock bit and lock type to 1; all other lock bits to 0 |
| 0 | 1 | 1 | yes | yes | Set lock bit to 1 |
| 1 | 1 | 1 | yes | yes | |
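The write table can be transcribed the same way. Again, the function name and return shape are made up for illustration. One combination (B=1, L=0, W=1 with a matching TID) does not appear in the table above, so this sketch returns `None` for it rather than guessing.

```python
def write_access(b, l, w, tid_match):
    """Return (permitted, action) for a write, following the table.

    Returns None for the one combination the table does not list.
    """
    if not tid_match:
        return False, None            # wrong transaction: denied
    if not w:
        # No automatic write-lock grants: allowed unless B=0 and L=0.
        return bool(b or l), None
    if not b and not l:
        return True, "set lock bit and lock type to 1; all other lock bits to 0"
    if not b and l:
        return True, "set lock bit to 1"
    if b and l:
        return True, None
    return None                       # B=1, L=0, W=1: not listed in the table

print(write_access(0, 1, 1, True))    # (True, 'set lock bit to 1')
print(write_access(1, 0, 1, True))    # None
```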
Advantages of inverted page table:
Separate instruction and data caches
Instruction cache: 8K, 2-way set associative, real addresses. But notice that the cache is the right size (8K over 2 ways is 4K per way, the same as a page) to use the byte offset from the effective address to find a cache set, while the rest of the virtual address is being constructed! Another feature is that two cache lines can be read simultaneously, if both are on the same page.
64K data cache, 4 way set associative, 128 byte line, real address, partial writeback (two change bits per line). Each way is 16K, so finding a cache set takes 14 address bits, two more than the 12-bit page offset supplies. Notice that this means you would have to complete a virtual address translation before you could start a cache access -- but they don't! Instead, there is a software requirement that the low order two bits of the virtual page number can't be different from the low order two bits of the physical page number (and, we'll see that they actually have more stringent requirements than this!).
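The difference between the two caches comes down to how many bytes one way of the cache holds, compared with the page size. A small sketch of that arithmetic (the function name is hypothetical, and it assumes all sizes are powers of two):

```python
def index_bits_above_page_offset(cache_bytes, ways, page_bytes=4096):
    """How many cache-indexing bits fall outside the page offset.

    Zero means the set can be selected from the untranslated byte
    offset alone; anything more must come from the page number.
    """
    way_bytes = cache_bytes // ways
    way_bits = way_bytes.bit_length() - 1          # log2 of bytes per way
    page_offset_bits = page_bytes.bit_length() - 1
    return max(0, way_bits - page_offset_bits)

print(index_bits_above_page_offset(8 * 1024, 2))    # instruction cache -> 0
print(index_bits_above_page_offset(64 * 1024, 4))   # data cache -> 2
```

The result of 2 for the data cache matches the two low-order page-number bits that software has to keep identical between virtual and physical addresses.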
Cache miss requires 8 memory accesses to satisfy. Gets the first word first.
Misaligned accesses OK, IF the data all comes from a single cache line. Otherwise, fault handled by software.