Why Use Speculation
Proper instruction scheduling reorders instructions to maximize the efficiency of a processor's resources.
A local scheduler divides a given program into segments called basic blocks. A basic block is a contiguous sequence of instructions that has a single entry point and a single exit point. Branches do not exist within a basic block. A local scheduler can only reorder the instructions of a basic block within the bounds of that block.
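As a rough illustration of how a program is cut into basic blocks, the sketch below uses the usual "leader" rule: the first instruction, every branch target, and every instruction following a branch each start a new block. The instruction format (a tuple of opcode and optional target index) and the function name `split_basic_blocks` are assumptions made for this example, not part of any IA-64 toolchain.

```python
def split_basic_blocks(program):
    """Return basic blocks as (start, end) index pairs, with end exclusive.

    Assumed toy format: each instruction is (opcode, target), where target is
    the index of a branch destination or None for non-branch instructions.
    """
    if not program:
        return []
    leaders = {0}  # the first instruction always starts a block
    for i, (_opcode, target) in enumerate(program):
        if target is not None:
            leaders.add(target)        # a branch target starts a block
            if i + 1 < len(program):
                leaders.add(i + 1)     # so does the instruction after a branch
    starts = sorted(leaders)
    ends = starts[1:] + [len(program)]
    return list(zip(starts, ends))


program = [
    ("ld", None),   # 0
    ("add", None),  # 1
    ("br", 5),      # 2: branch to instruction 5
    ("mul", None),  # 3
    ("st", None),   # 4
    ("sub", None),  # 5
    ("add", None),  # 6
]
print(split_basic_blocks(program))  # [(0, 3), (3, 5), (5, 7)]
```

A local scheduler would reorder instructions only inside each returned range, which is why short blocks leave it so little freedom.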
However, a branch instruction typically occurs about once in every eight instructions. Because basic blocks are therefore small, and because data dependencies within them are common, a local scheduler has few instructions that it can legally reorder and typically performs poorly.
Additionally, latencies to secondary caches, tertiary caches, and main memory continue to increase when expressed as a number of clock cycles.
Another critical issue related to loads is that they typically start a chain of computation, so a delayed load stalls every instruction that depends on its result.
IA-64 provides mechanisms for scheduling loads ahead of the stores and branches that logically precede them.
When a compiler moves a load instruction ahead of the basic block to which it belongs, two conditions arise:
IA-64 explicitly provides a speculation check instruction, chk.s. This instruction takes two arguments: a register and the address of recovery code. When the speculation check is encountered, the NaT (Not a Thing) token of the register is checked; a speculative load sets this token instead of raising an exception when the load fails. If the token is not set, no operation takes place. Otherwise, execution is transferred to the location specified by the second argument. The check does not consume a clock cycle, and subsequent instructions that use the value of the register can execute in the same clock cycle as the check.
The second argument refers to the starting location of the recovery code. The compiler is responsible for generating recovery code that re-executes the load non-speculatively, along with any instructions that already consumed the result of the failed speculative load. The recovery code thus restores the correctness of the program.
The following example from  shows the effect of speculative control code:
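Separately from the cited listing, the behaviour of a speculative load and its check can be modelled in a few lines of Python. This is only a sketch under stated assumptions: memory is a dictionary, the sentinel `NAT` stands in for a register whose NaT token is set, and the helper names `ld`, `ld_s`, `original`, and `speculated` are hypothetical. It models a guarded dereference (return `*ptr + 1` if `ptr` is non-null, else 0) with the load hoisted above the branch.

```python
NAT = object()  # stand-in for "this register has its NaT token set"


def ld(memory, address):
    """Ordinary load: an invalid address faults immediately."""
    if address not in memory:
        raise MemoryError(f"invalid address {address:#x}")
    return memory[address]


def ld_s(memory, address):
    """Speculative load: a fault is deferred by returning NaT instead."""
    return memory.get(address, NAT)


def original(memory, ptr):
    # Load stays inside the guarded block, after the branch.
    if ptr != 0:
        return ld(memory, ptr) + 1
    return 0


def speculated(memory, ptr):
    r1 = ld_s(memory, ptr)             # load hoisted above the branch
    r2 = NAT if r1 is NAT else r1 + 1  # NaT propagates to dependent results
    if ptr != 0:
        if r2 is NAT:                  # chk.s: token set, run recovery code
            r1 = ld(memory, ptr)       # recovery: non-speculative reload
            r2 = r1 + 1                # recovery: replay the dependent work
        return r2
    return 0                           # speculation was never needed


memory = {0x1000: 41}
print(speculated(memory, 0x1000))  # 42, same as original(memory, 0x1000)
print(speculated(memory, 0))       # 0, the deferred fault is never raised
```

If the pointer is non-null but invalid, the recovery path re-executes the load and faults there, which is the same point at which the unspeculated version would have faulted.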
The compiler may also need to move a load instruction ahead of a store instruction that could conflict with it. Sometimes the compiler can prove that no conflict exists, but, as is often the case with pointers, the dependency frequently remains ambiguous.
The following is an example from :
It is unlikely that b will point to the global variable flag. However, the compiler must honor the potential dependency and schedule the load of *b after the store of the incremented value of flag. IA-64 provides an advanced load instruction, ld.a, that allows this data dependency to be broken so that the load of *b can be scheduled before the store to flag. The following is the compiled code:
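Independently of the IA-64 listing, the hazard itself can be shown with a small Python model. The sketch assumes memory is a dictionary keyed by address; the addresses `FLAG` and `OTHER` and the function names are hypothetical. It compares the in-order sequence with a version in which the load of *b has been hoisted above the store without any check.

```python
FLAG, OTHER = 0x10, 0x20  # assumed addresses of flag and of an unrelated word


def fresh_memory():
    return {FLAG: 0, OTHER: 7}


def in_order(memory, b):
    memory[FLAG] += 1   # store of the incremented flag
    return memory[b]    # load of *b, after the store


def hoisted_unchecked(memory, b):
    value = memory[b]   # load of *b moved above the store
    memory[FLAG] += 1   # store of the incremented flag
    return value        # stale if b happens to point at flag


# No aliasing: both orders agree.
print(in_order(fresh_memory(), OTHER), hoisted_unchecked(fresh_memory(), OTHER))  # 7 7
# Aliasing: the hoisted load returns the old value of flag.
print(in_order(fresh_memory(), FLAG), hoisted_unchecked(fresh_memory(), FLAG))    # 1 0
```

The second case is the one an advanced load must be able to detect and repair; the first, far more common case is where the hoisting pays off.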
The Advanced Load Address Table (ALAT) detects collisions between advanced loads and stores. The following is an example ALAT entry:
This entry stores the target register of the advanced load, a portion of the physical address of the load target, and the size of the loaded quantity in bytes.
When an advanced load is executed, it checks for existing entries for the same target register and overwrites them if necessary.
Store instructions pass their physical address to the ALAT, which removes any matching entries from the table.
Finally, when the check is performed, the ALAT is searched for the entry of the target register. If the entry exists, the advanced load succeeded: no intervening store wrote to the loaded location. If there is no entry, it is assumed to have been removed by a store instruction, indicating a collision. The check then reloads the value into the target register, and any subsequent instructions with data dependencies must wait until the load is completed.
Additionally, the ALAT requires no state to be saved on a context switch. On a context switch, the ALAT is cleared. On return from the context switch, any outstanding advanced loads must execute recovery code, but the recovery penalty is considered much smaller than the cold-cache effects of the context switch.
The compiler is also provided with instructions such as ld.c.clr and chk.a.clr for specifying when ALAT entries are no longer needed.
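The ALAT protocol described above can be summarised in a simplified software model. This is a sketch, not a description of the hardware: the class and method names are hypothetical, the table is an unbounded dictionary rather than a small fixed-size structure, and full addresses are compared where real hardware keeps only a portion of the physical address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlatEntry:
    register: str  # target register of the advanced load
    address: int   # address that was loaded (full address in this model)
    size: int      # number of bytes loaded


class Alat:
    def __init__(self):
        self.entries = {}  # keyed by target register

    def advanced_load(self, register, address, size):
        # ld.a: allocate an entry, replacing any older one for this register
        self.entries[register] = AlatEntry(register, address, size)

    def store(self, address, size):
        # st: keep only entries whose byte range does not overlap the store
        self.entries = {
            reg: e for reg, e in self.entries.items()
            if address + size <= e.address or e.address + e.size <= address
        }

    def check(self, register, clear=False):
        # ld.c / chk.a: True means the advanced load is still valid.
        # clear=True models the .clr forms, which release the entry.
        valid = register in self.entries
        if clear:
            self.entries.pop(register, None)
        return valid

    def flush(self):
        # context switch: drop everything, so later checks fail and recover
        self.entries.clear()


alat = Alat()
alat.advanced_load("r8", 0x2000, 4)
alat.store(0x1000, 4)
print(alat.check("r8"))  # True: the store touched a different location
alat.store(0x2002, 2)
print(alat.check("r8"))  # False: the store overlapped bytes 0x2002-0x2003
```

A failed check corresponds to the reload or recovery path; a successful one lets the value loaded early be used as is.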