The MIT Alewife Project

Goals

In recent years, large-scale multiprocessors have been developed that are capable of truly astounding feats of computing power. While the best case always attracts the most attention, it is important to realize that these results come not only from pouring a great deal of effort into the design of the machine. They also generally require agonizing months of algorithm development, programming, debugging, and relentless tuning. Worse yet, a parallel architecture is often designed around a specific problem and is then effective only for that type of problem. That is fine for weather forecasters, but what about the rest of us?

The MIT Alewife machine was designed with programmability in mind. The hardware, compiler, and operating system work together to solve the problems that have traditionally burdened parallel programmers: namely, scheduling computation and moving data between processing elements. Features of the Alewife system include:
The Programming Model

While the programmer sees a shared-memory programming model, the actual implementation uses message passing to share data between nodes. Message passing makes the architecture more efficient and scalable as the number of processing nodes in the system grows. Alewife features that help improve the performance of message passing include:
Alewife has compilers for a parallel version of ANSI C and for Mul-T, a parallel version of LISP. For parallel C, Alewife supports the p4 library from Argonne National Laboratory as well as parallel loops and distributed arrays.

Latency Tolerance

Because there is no way to avoid all cache misses, Alewife provides mechanisms for reducing the delay caused by fetching data from a remote node. The Alewife compiler supports prefetching, so that latency can be hidden by requesting data before it is actually needed. Block multithreading allows the processor to switch to a different thread of execution when the current thread is stalled by a cache miss; the Sparcle processor's fast context switching makes this practical.

Debugging and Tuning

A version of the GNU Debugger (GDB) has been developed for Alewife to support program debugging. The debugger allows the user to set breakpoints, examine data and registers on individual nodes, and inspect both active and blocked threads.

It is also useful to inspect the execution of parallel programs in order to tune them for maximum performance. Alewife's LimitLESS cache coherence system can be configured to collect information about which memory locations are being shared and accessed, and how that sharing affects performance. The Communications and Memory Management Unit, which handles memory accesses for its node, also provides extensive facilities for performance monitoring. It can generate histograms of a variety of hardware events, including cache hits and misses, instruction counts, and network throughput statistics. A graphical user interface for this service lets the user access both static and dynamic views of performance data.
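To illustrate why the block multithreading described under Latency Tolerance hides remote-miss latency, the following Python sketch simulates a single processor running several threads. It is a toy model under stated assumptions, not Alewife's or Sparcle's actual scheduler: each thread is a hypothetical list of compute bursts, every burst ends in a remote cache miss with a fixed latency, and every miss triggers a context switch with a fixed cost. The function names and all cycle counts are illustrative assumptions, not measured Alewife figures.

```python
import heapq
from collections import deque


def run_stall_on_miss(threads, latency):
    """Run threads one after another; the processor idles during every miss."""
    cycles = 0
    for bursts in threads:
        for work in bursts:
            cycles += work + latency  # compute, then wait for the remote data
    return cycles


def run_block_multithreaded(threads, latency, switch_cost):
    """Switch to another ready thread whenever the running thread misses.

    Assumption (hypothetical model): every miss costs `switch_cost` cycles
    of context switching, and a thread finishes when its last miss returns.
    """
    ready = deque((tid, 0) for tid in range(len(threads)))  # (thread, next burst)
    pending = []  # min-heap of (data_arrival_time, thread, next burst)
    now = 0
    done_time = 0
    while ready or pending:
        # Threads whose remote data has arrived become runnable again.
        while pending and pending[0][0] <= now:
            _, tid, idx = heapq.heappop(pending)
            ready.append((tid, idx))
        if not ready:
            # Every thread is waiting on the network: idle until the next reply.
            now = pending[0][0]
            continue
        tid, idx = ready.popleft()
        now += threads[tid][idx]  # run this thread until its next miss
        arrival = now + latency
        if idx + 1 < len(threads[tid]):
            heapq.heappush(pending, (arrival, tid, idx + 1))
        else:
            done_time = max(done_time, arrival)  # last miss returns: thread done
        now += switch_cost  # save this thread's state, load another's
    return max(now, done_time)


if __name__ == "__main__":
    threads = [[10] * 5 for _ in range(4)]  # 4 threads, 5 bursts of 10 cycles
    print(run_stall_on_miss(threads, latency=30))  # 800
    print(run_block_multithreaded(threads, latency=30, switch_cost=2))  # 268
```

With these illustrative numbers, the stall-on-miss run takes 800 cycles, while the multithreaded run takes 268, because the other threads' bursts fill the time during which each miss is outstanding. With a single thread there is nothing to switch to, and both models take 200 cycles, which is why multithreading pays off only when enough threads are available and switching is cheap relative to the miss latency.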