Brief History of High Performance Computing
Parallelism has always been present, particularly in I/O-intensive
areas. By the late 1950s, IBM had dedicated, ``smart'' I/O controllers
handling a large part of business tasks.
In the 1960s and 1970s, the primary focus of high-performance
computing was on building single high-performance processors (CDC
6600, IBM 360/91, Cray-1). Exotic hardware was the most effective
means to performance, as seen in innovations like the 6600
being the first computer to use silicon (rather than germanium)
transistors, and many high-end mainframes (including the Cray-1) using
non-saturating logic families like ECL, combined with really impressive
cooling technologies.
Nobody realized it at the time, but the right road to performance
changed in 1971, when the Intel 4004 came out. The key advance here
was the amount of functionality that could be crammed on a chip, and
the fact that the market would grow to the point that enormous
resources could be devoted to the development of processors, with
sales high enough that the per-processor development cost would be
very low. As a result, single-chip processor performance increased at
a rate of 50% per year from the mid 1980s until the mid 2000s, while
``supercomputers'' were only seeing the same 20% per year improvement
they'd had since the beginning.
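
To get a feel for how quickly those two growth rates diverge, here is a minimal sketch that compounds each one. The 20-year span is my assumption (roughly mid 1980s to mid 2000s), and the function name is just for illustration.

```python
def compound_growth(annual_rate: float, years: int) -> float:
    """Return the total speedup after compounding annual_rate for `years` years."""
    return (1.0 + annual_rate) ** years


YEARS = 20  # assumed span, roughly mid 1980s to mid 2000s

microprocessor = compound_growth(0.50, YEARS)  # 50% per year
supercomputer = compound_growth(0.20, YEARS)   # 20% per year

print(f"single-chip processors: about {microprocessor:,.0f}x")
print(f"supercomputers:         about {supercomputer:,.0f}x")
print(f"gap between the two:    about {microprocessor / supercomputer:,.0f}x")
```

Under those assumptions the single-chip processors improve by a factor in the thousands while the supercomputers improve by a factor in the tens, which is why the commodity parts eventually caught up.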
The question becomes, how do we make effective use of silicon with a
lot of functionality? Some early answers have included machines like
ZMOB (a ``mob'' of Z-80s), the Cosmic Cube (a hypercube of 8086s), and
custom designs like MPP or the Connection Machine (both SIMD
processors).
The current answer seems to be to use as much COTS (commercial
off-the-shelf) hardware as possible, augmented with special-purpose
hardware where necessary.
Note on DEC's view of CPI: when the Alpha was introduced, DEC
considered the VAX's lifespan (~25 years), and saw a factor of 1000
improvement in that time. Assumption: there will be another factor
of 1000 in performance in the next 25 years. How?
- Clock rate: only see a factor of ten there
- Multiprocessors: see another factor of ten there
- Instruction-level parallelism: last factor of ten. They want to
execute 10 instructions/cycle!
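
The arithmetic behind that breakdown can be sketched in a few lines. The three factors of ten come from the list above; the equivalent annual growth rate is my own derived figure, not something DEC stated, and the variable names are illustrative.

```python
import math

TARGET = 1000  # overall improvement expected over the period
YEARS = 25     # length of the period

# The three contributions from the list above, each a factor of ten.
factors = {
    "clock rate": 10,
    "multiprocessors": 10,
    "instruction-level parallelism": 10,
}

# Independent speedups multiply, so three factors of ten give 1000.
total = math.prod(factors.values())

# Equivalent steady annual growth rate needed to reach TARGET in YEARS.
annual_rate = TARGET ** (1 / YEARS) - 1

print(f"combined factor: {total}")
print(f"equivalent annual improvement: {annual_rate:.1%}")
```

The point of the sketch is that the factors multiply rather than add, so each of the three sources has to deliver its full factor of ten for the total to reach 1000.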
Landmark Computers
Here is my completely idiosyncratic view of a few computers which have
had a huge impact:
- ABC (1942) and ENIAC (1946)
- The first two "computers" were the Atanasoff-Berry Computer
and ENIAC. ABC was designed to perform Gaussian elimination
to solve systems of linear equations; the ENIAC was capable of
executing programs that were entered by setting switches and
changing cable connections.
The developers of ENIAC were originally granted a patent on the
digital computer; this was overturned in 1973 and ABC is legally
recognized as the first digital computer. You can see some
highly partisan information on which was really first at http://www.cs.iastate.edu/jva/jva-archive.shtml (the ABC side) and
http://ftp.arl.mil/~mike/comphist/eniac-story.html (the ENIAC side).
- COLOSSUS
- Another early computing engine that deserves mention here is
COLOSSUS, built during World War II to break the Lorenz cipher
used by the Germans during the war (Lorenz was a more advanced
cipher than the better-known Enigma). It was designed by Tommy
Flowers, building on cryptanalytic work at Bletchley Park to which
Alan Turing contributed; a total of ten of these
machines were built during the war.
Due to wartime secrecy, and a long, long delay in relaxing
this security, the developers of COLOSSUS never got the
recognition they deserved at the time. Like ABC and ENIAC,
COLOSSUS has a claim to being the "first computer"; also like
those two, it was not a stored-program computer.
- EDVAC and EDSAC (1949)
- From a modern perspective, none of ABC, ENIAC, or COLOSSUS had
what I would regard as the key to a ``real'' computer: the ability to
maintain a stored program. ABC was a hard-wired calculator, and ENIAC's
programming amounted to rewiring the computer.
John von Neumann receives the credit
for the idea of a stored-program computer (the 1945 paper on
EDVAC had only his name as author), though it's very likely
Eckert and Mauchly (the ENIAC
developers) were as responsible for it as he was. Maurice
Wilkes began development of a stored-program computer called
EDSAC (Electronic Delay Storage Automatic Calculator) in 1946;
it was operational in 1949.
EDSAC used mercury delay lines for memory. It also had a
subroutine call technique; this was called a ``Wheeler jump'' after
David Wheeler (the grad student who invented it).
- Manchester/Ferranti Mark I (1951)
- This computer was operational in 1951; it used Williams
electrostatic storage tubes and a magnetic drum for memory. It
was this project (though before the Mark I itself) that first
executed a small stored program. The Ferranti Mark I was the first
commercially available computer.
- Whirlwind (1952)
- Originally intended as an analog controller for a flight
simulator during WWII, this machine wound up being a digital
computer, with no flight simulator, first operational in 1952.
Core memory was invented for this computer.
Whirlwind was arguably the world's first supercomputer: a
computer developed using the most advanced technology available,
with a very cost-is-no-object philosophy, for high performance.
It was able to execute a whopping 50,000 operations per second!
- IBM 7030 (STRETCH) (1961)
- By the mid 1950s, there were quite a few companies building
computers. At that time, there was a much clearer distinction
between a ``commercial'' processor and a ``scientific''
processor than there is now: a scientific processor would
likely not have instructions aimed at doing packed decimal
arithmetic, while a commercial processor would not have floating
point, for instance. IBM's scientific and commercial computers
at the time were the 704 and 705. The STRETCH project aimed at
developing a machine using germanium transistor technology
instead of tubes, which was to be 100 times faster than the 704
and 705, and would be useable as both a commercial and
scientific computer. Its circuitry was 10 times faster than
that of the 704 and 705 due to transistors, and improvements in
core memory made memory accesses six times faster. It also
introduced pipelining.
It didn't fully meet its performance goals, especially on
commercial codes, but it was a huge step forward. Another
candidate for ``first supercomputer,'' based on degree of
improvement in state of the art.
- CDC 6600 (1964)
- First use of silicon transistors.
Advanced packaging technology, with cross-shaped computer. Yet
another candidate for first supercomputer, for all the same
reasons as Whirlwind and STRETCH (in the CDC's case, it
introduced silicon transistors to computers).
- CDC 7600 (1969)
- Fully pipelined functional units, another step forward in
packaging technology (and shaped like the letter C! Not that
its designer, Seymour Cray, had an ego or anything. One
of CDC's later computers -- I think it was the CDC STAR 100, but
haven't been able to verify it -- was shaped like the letter
"t". Not that Jim Thornton had an ego or anything. He probably
should have steered clear of the temptation; the STAR was a
major disappointment to everybody and wound up costing Thornton
his job). Four CPUs in the box.
- ILLIAC IV (1974)
- 64 processors all executing in lockstep; first SIMD computer I
know of. For a while computers with this approach were a
popular way of performing image processing.
- Cray-1 (1976)
- First commercially successful vector computer, 1976.
Non-saturated ECL technology gave balanced circuit load (and
dissipated huge amounts of power). 12.5 nanosecond cycle
time -- IBM didn't catch up until 15 years later. 65 sold.
Yes, another candidate for first
supercomputer, based on parallel instructions in instruction
set. Hot-rodded by Steve Chen as X-MP.
- Goodyear MPP (1979)
- 128x128 grid of one-bit processors (another SIMD design),
specifically intended for computer vision applications.
Capable of 250 MFlops.
- Cray-2 (1985)
- Wanted to use GaAs, but technology wasn't mature enough.
Fluorocarbon direct immersion cooling. First Cray-2 sold had more
memory than all Cray-1s deployed. No, I haven't come across
anyone who thought of this as the first supercomputer :)
- IBM 801 (1975), RISC (1982), and MIPS (1982)
- Three processors, all with instruction sets designed around the
twin goals of making the common case fast and making a pipelined
single-chip implementation easy to create. The RISC project
coined the term RISC; Hennessy and Patterson (the MIPS and RISC
developers, respectively) regard the 801 as the first RISC processor.
An argument could be made that the CDC 6600 was really the first RISC
machine.
- Caltech Cosmic Cube (1985)
- Six-dimensional hypercube of 8086s.
- Connection Machine (1985)
- Up to 65,536 one-bit processors arranged in a hypercube,
specifically intended for artificial intelligence applications.
- Intel 80486 (1989)
- Pipelined implementation of the Intel IA-32 instruction set. IA-32
had widely been regarded as impossible to pipeline effectively,
due to extreme variability in sizes of instructions; this
machine showed it could be done, and that IA-32 could be an
effective processor for high-performance applications.
- Intel Pentium Pro (P6 processor core) (1995)
- Introduced out-of-order execution to the IA-32 processor line.
- Modern High Performance
- A dominant factor in high performance computing today has
been the increased die size and decreased feature size of
integrated circuits. Systems have become more and more
integrated, with first CPUs, then floating point units and
cache, and now even multiple processor cores and levels of
cache on a single chip.
The current highest-speed computers in the world amount to
massive arrays of commodity processors, with very high-speed
interconnects -- so-called Pile o' PC
or Beowulf clusters (the best-known early project to build a
Pile o' PCs was named Beowulf).
Another recent development has been the recognition that
ever-higher clock rates and ever-greater instruction-level
parallelism were reaching a limit, and that putting multiple
processor cores on a chip has become a better road to high
performance at present.
As of this writing (January 2009) the fastest computer
in the world is Roadrunner, at Los Alamos National Laboratory. It's got
129,600 processor cores, split between PowerXCell 8i and AMD Opteron
chips. Wow.
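
Two of the machines in the list above (the Cosmic Cube and the Connection Machine) wired their processors together as a hypercube. As a rough sketch of what that means, assume the usual convention that each node gets a binary address and is linked to every node whose address differs in exactly one bit. That numbering convention and the function names are my assumptions, not details taken from either machine's documentation.

```python
def hypercube_neighbors(node: int, dimensions: int) -> list[int]:
    """Return the nodes whose binary address differs from `node` in exactly one bit."""
    return [node ^ (1 << bit) for bit in range(dimensions)]


def hop_count(a: int, b: int) -> int:
    """Return the minimum number of links between two nodes (the Hamming distance)."""
    return bin(a ^ b).count("1")


DIMENSIONS = 6                 # a six-dimensional hypercube, as in the Cosmic Cube
node_count = 2 ** DIMENSIONS   # number of nodes in the cube

print(f"nodes in the cube: {node_count}")
print(f"neighbors of node 0: {hypercube_neighbors(0, DIMENSIONS)}")
print(f"worst-case hops (node 0 to node {node_count - 1}): "
      f"{hop_count(0, node_count - 1)}")
```

The appeal of the layout is that each node needs only as many links as there are dimensions, and no message ever needs more hops than that, even as the node count doubles with every added dimension.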
Last modified: Fri Jan 16 13:01:42 MST 2009