Inverted Page Tables

The multilevel table scheme used by Intel works very well -- but as address spaces get larger, it gets impractical, for a couple of reasons. First, memory management on large address space machines tends to result in quite sparse memory maps, so you'll tend to go through several levels of page table with only one or two valid entries at each level. Also, on a TLB miss, you have to go through a memory access for each level of page table, until you finally get either a hit or a miss. So, let's try to find a more efficient approach.

Consider the IBM RS/6000. It uses a 4K page, like Intel -- but it generates a 52 bit virtual address! That leaves a 40 bit virtual page number to translate. Using the Intel multilevel scheme would require four levels of page table (10 bits per level) -- if IBM used a 4-byte PTE like Intel does. But they don't; it's sixteen bytes, which would make things worse. A 4K page of page table would hold only 256 entries, so they could only decode eight bits of an address per level, and they'd need five levels of page table.
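The level counts above are just arithmetic, and a small Python sketch makes it easy to check. The function name and parameters are made up for illustration; the inputs are the figures quoted above, plus the assumption that the table at each level fills exactly one page.

```python
import math

def page_table_levels(va_bits, page_bytes, pte_bytes):
    """Levels needed if the table at every level fills exactly one page."""
    offset_bits = int(math.log2(page_bytes))            # 12 for a 4K page
    vpn_bits = va_bits - offset_bits                     # 40 for a 52 bit address
    bits_per_level = int(math.log2(page_bytes // pte_bytes))
    return math.ceil(vpn_bits / bits_per_level)

print(page_table_levels(52, 4096, 4))    # 4-byte PTE: 10 bits decoded per level
print(page_table_levels(52, 4096, 16))   # 16-byte PTE: 8 bits decoded per level
```

The first call gives four levels and the second gives five, matching the counts in the text.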

A second problem is that if the TLB works well (i.e., the vast majority of PTE lookups are TLB hits), we seldom do a lookup in the page tables anyway. Instead, the majority of accesses to the page tables are to perform bookkeeping operations, such as searching for a page to eject, or updating the Accessed bits. The multilevel structure really isn't well suited to this; generally, for these functions, we're only interested in the pages actually in memory at the moment.

The solution is to invert the page tables: have only a single page table in the system, with one entry per physical page, recording which virtual page is mapped there. Now, when you need to do a translation, you search this table for the virtual page number: when you get a match, you know you've found the entry you're looking for. Sounds like a disaster, unless we have a good search strategy; well, hashing gives us O(1) expected time, so that's what we'll use. IBM has used inverted page tables in quite a few systems, including the 801 (a research machine that some call the first RISC), the PC/RT, the RS/6000 POWER, the AS/400, and the PowerPC. The particular variant we'll talk about is the POWER architecture's IPT, chosen for its clever support for database locking. I haven't seen a clear description of how closely this matches the implementation details of the other systems listed above. First we'll talk about how the RS/6000 generates addresses, then we'll show how the inverted page table structure works.

IBM RS/6000 Addressing

32 bit ``Effective Address,'' divided as (bits are numbered left to right)

Field           Bits
Segment Number  0-3
Page Number     4-19
Offset          20-31

This is an odd sort of ``segmented'' virtual memory system. Ordinarily, a segmented system keeps the address divided into two parts in a programmer-visible way; in this case, it acts just like a linear address space, except that the programmer has to be aware that the high order address bits act funny.

So, the high order four bits select one of 16 segment registers. Each segment register contains 24 segment ID bits, plus a bunch of others. I've seen references to:

Special bit
    enables hardware locking mechanisms
IO
    selects IO vs. memory space

The segment ID, virtual page index, and offset are combined to form a 52 bit virtual address like this:

[Figure: RS/6000 Effective Address to Virtual Address Translation]
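Here is a minimal Python sketch of that combination. It assumes the 24 bit segment ID simply replaces the 4 bit segment number as the high order part of the address (24 + 16 + 12 = 52 bits); the function name and the segment register contents are invented for illustration.

```python
SEGMENT_ID_BITS = 24
PAGE_BITS = 16
OFFSET_BITS = 12

def effective_to_virtual(ea, segment_ids):
    """Build a 52 bit virtual address from a 32 bit effective address.

    segment_ids is a list of 16 values standing in for the segment ID
    fields of the segment registers (hypothetical contents).
    """
    seg_num = (ea >> 28) & 0xF              # bits 0-3 (numbered from the left)
    page = (ea >> OFFSET_BITS) & 0xFFFF     # bits 4-19
    offset = ea & 0xFFF                     # bits 20-31
    seg_id = segment_ids[seg_num] & 0xFFFFFF
    # Assumed layout: segment ID on top, then page number, then offset.
    return (seg_id << (PAGE_BITS + OFFSET_BITS)) | (page << OFFSET_BITS) | offset

segment_ids = [0] * 16
segment_ids[3] = 0xABCDEF                   # made-up segment ID
va = effective_to_virtual(0x30012345, segment_ids)
print(hex(va))
```

The top 40 bits of the result (segment ID plus page number) are the virtual page number that gets translated; the low 12 bits pass through unchanged.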

Virtual Memory Translation

We search by using yet another table: the Hash Anchor Table, or HAT. The 40 bit VPN is hashed to get an index into a hash table with the same number of entries as the number of pages of physical memory. This provides a pointer into a Page Frame Table, which has one entry for each page of physical memory. The PFT entry is a four word (16 byte) entry, containing (going from left to right):

Field  Meaning              Width (bits)
VPN    Virtual Page Number  27
V      Valid VPN            1
F      Referenced           1
C      Changed (dirty!)     1
PP     Page Protect         2
I      Invalid pointer      1
       unused               11
       Next PFT ptr         20
       Lock bits            32
L      Lock type            1
W      Grant write locks    1
R      Grant read locks     1
A      Allow read           1
       unused               12
TID    Transaction ID       16
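As a quick sanity check, the field widths in the table should add up to the four words (16 bytes) claimed for a PFT entry. The list below just transcribes the table; the variable names are made up.

```python
# (field, width in bits), transcribed from the table above
PFT_FIELDS = [
    ("VPN", 27), ("V", 1), ("F", 1), ("C", 1), ("PP", 2), ("I", 1),
    ("unused", 11), ("Next PFT ptr", 20), ("Lock bits", 32),
    ("L", 1), ("W", 1), ("R", 1), ("A", 1), ("unused", 12), ("TID", 16),
]

total_bits = sum(width for _, width in PFT_FIELDS)
print(total_bits, "bits =", total_bits // 8, "bytes")
```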

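Page translation is easy here. But wait a minute: aren't we 13 bits short? The VPN is 40 bits, and the PFT entry only holds 27 of them. Yes, but the hash index is guaranteed to be unique in the low order 13 bits, so those bits don't need to be stored. All of this is done in hardware.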
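To make the HAT and PFT search concrete, here is a rough software sketch in Python. It is not the RS/6000 mechanism: the hash function is a placeholder (modulo the number of frames), the full VPN is stored in each entry rather than 27 bits of it, and all of the class and method names are invented. It does show the shape of the lookup: hash the VPN, follow the anchor into the PFT, then chase next pointers until the VPN matches. The index of the matching PFT entry is the physical page number.

```python
class PFTEntry:
    def __init__(self):
        self.valid = False
        self.vpn = None
        self.next = None    # index of the next PFT entry on the same hash chain

class InvertedPageTable:
    def __init__(self, num_frames):
        self.num_frames = num_frames
        self.hat = [None] * num_frames                  # Hash Anchor Table
        self.pft = [PFTEntry() for _ in range(num_frames)]

    def _hash(self, vpn):
        # Placeholder hash (an assumption, not the hardware's function).
        return vpn % self.num_frames

    def insert(self, vpn, frame):
        """Record that virtual page vpn lives in physical frame `frame`."""
        entry = self.pft[frame]
        entry.valid, entry.vpn = True, vpn
        slot = self._hash(vpn)
        entry.next = self.hat[slot]                     # push on front of chain
        self.hat[slot] = frame

    def translate(self, vpn):
        """Return the frame holding vpn, or None (a page fault)."""
        frame = self.hat[self._hash(vpn)]
        while frame is not None:
            entry = self.pft[frame]
            if entry.valid and entry.vpn == vpn:
                return frame
            frame = entry.next
        return None

ipt = InvertedPageTable(num_frames=8)
ipt.insert(0x12345, frame=3)
ipt.insert(0x1234D, frame=5)    # hashes to the same anchor as 0x12345
print(ipt.translate(0x12345), ipt.translate(0x1234D), ipt.translate(0x99))
```

Note that the table size depends only on the amount of physical memory, not on the size of the virtual address space.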

Locks? Remember, this is IBM, so support for databases is important. The RS/6000 has hardware support for database locking at the cache line level; locks can be granted automatically by the hardware simply by having a process try to read or write the cache line. When complicated cases arise, or a lock is denied, it traps to the OS and lets software deal with it.

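Notice that the number of lock bits works out to 1 per cache line: a 4K page holds 32 lines of 128 bytes each, and there are 32 lock bits. All locks on a page will be of the same type.

The lock type bit says whether they are read locks or write locks.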

Finally, there is a logic table which determines circumstances under which accesses are allowed and locks are granted. First, the table for read accesses:
B  L  A  R  TID Match  Access Permitted  Notes
-  -  0  -  no         no
0  0  0  0  yes        no
0  1  0  0  yes        yes
1  0  0  0  yes        yes
1  1  0  0  yes        yes
0  0  0  1  yes        yes               Set lock bit to 1
1  0  0  1  yes        yes
-  1  0  1  yes        yes
-  -  1  -  -          yes

And now the table for write accesses:
B  L  W  TID Match  Access Permitted  Notes
-  -  -  no         no
0  0  0  yes        no
0  1  0  yes        yes
1  0  0  yes        yes
1  1  0  yes        yes
0  0  1  yes        yes               Set lock bit and lock type to 1; all other lock bits to 0
0  1  1  yes        yes               Set lock bit to 1
1  1  1  yes        yes
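The two tables can be transcribed directly into a small table-driven lookup, sketched below in Python. The rows are copied from the tables as given, with "-" as a don't-care; the function name and the return convention are invented for illustration. I'm reading the B column as the lock bit for the cache line being accessed, which is an assumption. Combinations that do not appear in a table return None rather than a guess.

```python
# Each row: (bit pattern, TID match, access permitted, note). "-" is a don't-care.
READ_TABLE = [   # pattern columns: B, L, A, R
    ("--0-", "no",  False, ""),
    ("0000", "yes", False, ""),
    ("0100", "yes", True,  ""),
    ("1000", "yes", True,  ""),
    ("1100", "yes", True,  ""),
    ("0001", "yes", True,  "Set lock bit to 1"),
    ("1001", "yes", True,  ""),
    ("-101", "yes", True,  ""),
    ("--1-", "-",   True,  ""),
]

WRITE_TABLE = [  # pattern columns: B, L, W
    ("---", "no",  False, ""),
    ("000", "yes", False, ""),
    ("010", "yes", True,  ""),
    ("100", "yes", True,  ""),
    ("110", "yes", True,  ""),
    ("001", "yes", True,
     "Set lock bit and lock type to 1; all other lock bits to 0"),
    ("011", "yes", True,  "Set lock bit to 1"),
    ("111", "yes", True,  ""),
]

def check_access(table, bits, tid_match):
    """Return (permitted, note) for the first matching row, or None if
    the combination is not listed in the table."""
    tid = "yes" if tid_match else "no"
    for pattern, row_tid, permitted, note in table:
        bits_match = all(p in ("-", b) for p, b in zip(pattern, bits))
        if bits_match and row_tid in ("-", tid):
            return permitted, note
    return None

print(check_access(READ_TABLE, "0001", tid_match=True))
print(check_access(WRITE_TABLE, "000", tid_match=True))
```

For example, a read with A set is permitted whatever the other bits and the TID say, while a write without a TID match is always refused.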

Advantages of inverted page table:

Disadvantages:

RS/6000 Caches

Separate instruction and data caches

Instruction cache: 8K, 2-way set associative, real addresses. But notice that the cache is the right size (8K over 2 ways is 4K per way, exactly one page) to use the byte offset from the effective address to find a cache set, while the rest of the virtual address is being constructed! Another feature is that two cache lines can be read simultaneously, if both are on the same page.

Data cache: 64K, 4-way set associative, 128 byte line, real addresses, partial writeback (two change bits per line). Notice that real addressing would seem to mean you have to complete a virtual address translation before you can start a cache access -- but they don't! Each way is 16K, so finding a set takes two address bits beyond the 12 bit page offset. Instead of waiting for them, there is a software requirement that the low order two bits of the virtual page number can't be different from the low order two bits of the physical page number (and, we'll see that they actually have more stringent requirements than this!).
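The two-bit figure comes straight from the cache geometry, and the same arithmetic shows why the instruction cache needs no such restriction. A small Python sketch, with an invented function name and assuming 4K pages throughout:

```python
import math

PAGE_OFFSET_BITS = 12    # 4K pages

def index_bits_beyond_page_offset(cache_bytes, ways):
    """Address bits needed to pick a set and a byte within its line,
    minus the bits already available from the untranslated page offset."""
    bytes_per_way = cache_bytes // ways
    needed = int(math.log2(bytes_per_way))
    return max(0, needed - PAGE_OFFSET_BITS)

print(index_bits_beyond_page_offset(8 * 1024, 2))     # instruction cache
print(index_bits_beyond_page_offset(64 * 1024, 4))    # data cache
```

The instruction cache needs no bits beyond the page offset, while the data cache needs two, which are exactly the bits that software must keep identical between the virtual and physical page numbers.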

Cache miss requires 8 memory accesses to satisfy. Gets the first word first.

Misaligned accesses OK, IF the data all comes from a single cache line. Otherwise, fault handled by software.


Last modified: Wed Apr 2 09:22:21 MST 2003