Cray-2

Manufacturer	Cray Research
Type	Supercomputer
Release date	1985
Discontinued	1990
Units sold	25
CPU	Custom Vector Processors
Predecessor	Cray X-MP

NERSC

Front view of 1985 Supercomputer Cray-2, Musée des Arts et Métiers, Paris

Side view of 1985 Supercomputer Cray-2, Musée des Arts et Métiers, Paris

The Cray-2 is a supercomputer with four vector processors made by Cray Research starting in 1985. At 1.9 GFLOPS peak performance, it was the fastest machine in the world when it was released, replacing the Cray X-MP in that spot. It was, in turn, replaced in that spot by the Cray Y-MP in 1988.

The Cray-2 was the first of Seymour Cray's designs to successfully use multiple CPUs. This had been attempted in the CDC 8600 in the early 1970s, but the emitter-coupled logic (ECL) transistors of the era were too difficult to package into a working machine. The Cray-2 addressed this through the use of ECL integrated circuits, packing them in a novel 3D wiring that greatly increased circuit density.

The dense packaging and resulting heat loads were a major problem for the Cray-2. This was solved in a unique fashion by forcing the electrically inert Fluorinert liquid through the circuitry under pressure and then cooling it outside the processor box. The distinctive "waterfall" cooler system came to represent high-performance computing in the public eye and appeared in many informational films and as a movie prop for some time.

Unlike the original Cray-1, the Cray-2 had difficulties delivering peak performance. Other machines from the company, like the X-MP and Y-MP, outsold the Cray-2 by a wide margin. When Cray began development of the Cray-3, the company chose to develop the Cray C90 series instead. This is the same sequence of events that occurred when the 8600 was being developed, and as in that case, Cray left the company.

Initial design

With the successful launch of his famed Cray-1, Seymour Cray turned to the design of its successor. As with his earlier move away from Control Data HQ in Minneapolis, Minnesota, Cray management understood his needs and supported his move to a new lab in Boulder, Colorado. Working as an independent consultant at these new Cray Labs, starting in 1980 he put together a team and started on a completely new design. This lab would later close, and a decade later a new facility in Colorado Springs would open.

Cray had previously attacked the problem of increased speed with three simultaneous advances: more functional units to give the system higher parallelism, tighter packaging to decrease signal delays, and faster components to allow for a higher clock speed. The classic example of this design is the CDC 8600, which packed four CDC 7600-like machines based on ECL logic into a 1 × 1 meter cylinder and ran them at an 8 ns cycle speed (125 MHz). Unfortunately, the density needed to achieve this cycle time led to the machine's downfall. The circuit boards inside were densely packed, and since even a single malfunctioning transistor would cause an entire module to fail, packing more of them onto the cards greatly increased the chance of failure. Cooling the closely packed individual components also represented a major challenge.

One solution to this problem, one that most computer vendors had already moved to, was to use integrated circuits (ICs) instead of individual components. Each IC included a selection of components from a module pre-wired into a circuit by the automated construction process. If an IC did not work, another one would be tried. At the time the 8600 was being designed the simple MOSFET-based technology did not offer the speed Cray needed. Relentless improvements changed things by the mid-1970s, however, and the Cray-1 had been able to use newer ICs and still run at a respectable 12.5 ns (80 MHz). In fact, the Cray-1 was actually somewhat faster than the 8600 because it packed considerably more logic into the system due to the ICs' small size.

Although IC design continued to improve, the physical size of the ICs was constrained largely by mechanical limits; the resulting component had to be large enough to solder into a system. Dramatic improvements in density were possible, as the rapid improvement in microprocessor design was showing, but for the type of ICs used by Cray, ones representing a very small part of a complete circuit, the design had plateaued. In order to gain another 10-fold increase in performance over the Cray-1, the goal Cray aimed for, the machine would have to grow more complex. So once again he turned to an 8600-like solution, doubling the clock speed through increased density, adding more of these smaller processors into the basic system, and then attempting to deal with the problem of getting heat out of the machine.

Another design problem was the increasing performance gap between the processor and main memory. In the era of the CDC 6600, memory ran at the same speed as the processor, and the main problem was feeding data into it. Cray solved this by adding ten smaller computers to the system, allowing them to deal with the slower external storage (disks and tapes) and "squirt" data into memory when the main processor was busy. This solution no longer offered any advantages; memory was large enough that entire data sets could be read into it, but the processors ran so much faster than memory that they would often spend long times waiting for data to arrive. Adding four processors simply made this problem worse.

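To avoid this problem, the new design banked memory, and two sets of registers (the B- and T-registers) were replaced with a 16 KWord block of very fast memory called Local Memory, which was not a cache. A dedicated foreground processor ran the computer, handling storage and making efficient use of the channels into main memory. It drove the four background processors by passing in the instructions they should run via eight 16-word buffers, instead of tying up the existing cache pipes to the background processors. Modern CPUs use a variation of this design as well, although the foreground processor is now referred to as the load/store unit and is not a complete machine unto its own.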

Main memory banks were arranged in quadrants to be accessed at the same time, allowing programmers to scatter their data across memory to gain higher parallelism. The downside to this approach is that the cost of setting up the scatter/gather unit in the foreground processor was fairly high. Access strides that were a multiple of the number of memory banks suffered a performance penalty (latency), because successive accesses landed in the same bank; this occasionally happened in power-of-2 FFT-based algorithms. As the Cray-2 had a much larger memory than Cray-1s or X-MPs, this problem was easily rectified by adding an extra unused element to an array to spread the work out.
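The effect of padding an array can be illustrated with a minimal sketch. It assumes a simple interleaving in which the bank of a word is its address modulo the number of banks. The bank count of 128 and the row length of 1024 words are illustrative values chosen for the example, not figures taken from the text, and the helper name banks_touched is hypothetical.

```python
def banks_touched(stride, n_accesses, n_banks):
    """Return the set of bank indices hit by a strided run of word accesses.

    Assumes simple interleaving: bank = word address mod n_banks.
    """
    return {(i * stride) % n_banks for i in range(n_accesses)}


N_BANKS = 128  # assumed bank count, for illustration only
ROW = 1024     # power-of-2 row length, as in an FFT-sized array

# Walking down a column of a ROW-wide array uses a stride of ROW words.
conflicting = banks_touched(ROW, N_BANKS, N_BANKS)

# Padding each row with one unused element changes the stride to ROW + 1.
padded = banks_touched(ROW + 1, N_BANKS, N_BANKS)

print(len(conflicting))  # 1: every access lands in the same bank
print(len(padded))       # 128: accesses are spread over all banks
```

With the power-of-2 stride every access queues on a single bank, while the padded stride visits each bank once before returning to the first, which is why one wasted element per row was an acceptable price on a machine with ample memory.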

Packed circuit boards and new design ideas

Early Cray-2 models soon settled on a design using large circuit boards packed with ICs. This made them extremely difficult to solder together, and the density was still not enough to reach their performance goals. Teams worked on the design for about two years before even Cray himself "gave up" and decided it would be best if they simply canceled the project and fired everyone working on it. Les Davis, Cray's former design collaborator who had remained at Cray headquarters, decided it should be continued at low priority. After some minor personnel movements, the team continued on much as before.

Six months later Cray had his "eureka" moment. He called the main engineers together for a meeting and presented a new solution to the problem. Instead of making one larger circuit board, each "card" would instead consist of a 3-D stack of eight, connected together in the middle of the boards using pins sticking up from the surface (known as "pogos" or "z-pins"). The cards were packed right on top of each other, so the resulting stack was only about 30 mm high.

With this sort of density there was no way any conventional air-cooled system would work; there was too little room for air to flow between the ICs. Instead the system would be immersed in a tank of a new inert liquid from 3M, Fluorinert. The cooling liquid was forced sideways through the modules under pressure, and the flow rate was roughly one inch per second. The heated liquid was cooled using chilled water heat exchangers and returned to the main tank. Work on the new design started in earnest in 1982, several years after the original start date.

While this was going on, the Cray X-MP was being developed under the direction of Steve Chen at Cray headquarters, and looked like it would give the Cray-2 a serious run for its money. In order to address this internal threat, as well as a series of newer Japanese Cray-1-like machines, the Cray-2 memory system was dramatically improved, both in size and in the number of "pipes" into the processors. When the machine was eventually delivered in 1985, the delays had been so long that much of its performance benefit was due to the faster memory. Purchasing the machine really made sense only for users with huge data sets to process.

The first Cray-2 delivered possessed more physical memory (256 MWord) than all previously delivered Cray machines combined. Simulation moved from a 2-D realm or coarse 3-D to a finer 3-D realm because computation did not have to rely on slow virtual memory.
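To give a sense of scale, the following sketch converts the 256 MWord figure into bytes and estimates the largest cubic grid that fits entirely in physical memory. It assumes 64-bit words, that "M" means 2^20, and that each grid point stores a single word; the function names are hypothetical and chosen only for this illustration.

```python
WORD_BITS = 64   # assumed word size
MWORD = 2 ** 20  # assumed meaning of "M" in MWord


def memory_bytes(mwords, word_bits=WORD_BITS):
    """Convert a memory size in MWords to bytes."""
    return mwords * MWORD * word_bits // 8


def largest_cube_side(words):
    """Largest n such that an n x n x n grid of one-word cells fits in `words`."""
    n = 0
    while (n + 1) ** 3 <= words:
        n += 1
    return n


words = 256 * MWORD
total = memory_bytes(256)

print(total)                     # 2147483648 bytes
print(total // 2 ** 30)          # 2 (GiB)
print(largest_cube_side(words))  # 645 points per side
```

Under these assumptions a grid of roughly 645 points per side could be held in memory at once, which is consistent with the move from coarse to finer 3-D simulation described above.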

Uses and successors

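The Cray-2 was predominantly developed for the United States Departments of Defense and Energy. It also found its way into industry; automobile manufacturers, for example, used the Cray-2 for processing complex finite element analysis models of car bodyshells, and for performing virtual crash testing of bodyshell components prior to production.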

The Cray-2 would have been superseded by the Cray-3, but due to development problems only a single Cray-3 was built and it was never paid for. The spiritual descendant of the Cray-2 is the Cray X1, offered by Cray.

Comparison to later computers

In 2012, Piotr Luszczek (a former doctoral student of Jack Dongarra) presented results showing that an iPad 2 matched the historical performance of the Cray-2 on an embedded LINPACK benchmark.^[1]

Trivia

Due to the use of liquid cooling, the Cray-2 was given the nickname "Bubbles", and common jokes around the computer made reference to this unique system. Gags included "No Fishing" signs and cardboard depictions of the Loch Ness Monster rising out of the heat exchanger tank.

Research in the early 1990s indicated that the cooling liquid could break down to form the extremely toxic gas perfluoroisobutylene.^[2] At the time, Cray had created a poster showing the transparent "bubble chamber" that the cooling fluid was run through for visual effect, with a spill of the same material glistening on the floor. The joke was that if this actually occurred, the facility would have to be evacuated.^[3]

The manufacturer of the liquid developed a scrubber that could be placed in line with the pump that would catalytically degrade this toxic breakdown product.

Each vertical stack of logic modules sat above a stack of power modules which powered 5 volt busbars, each of which delivered about 2200 amps. The Cray-2 was powered by two motor-generators, which took in 480 V three-phase.

References

[1] Larabel, Michael (16 September 2012). "Apple iPad 2 As Fast As The Cray-2 Super Computer". Archived from the original on 20 February 2015. Retrieved 19 February 2015.
[2] Kwan, J.; Kelly, R.; Miller, G. Presentation at the American Industrial Hygiene Conference, Salt Lake City, UT, May 1991.
[3] Kelly, R. J. Personal experience.

Records
Preceded by Cray X-MP/4 713 megaflops	World's most powerful supercomputer 1985–1987	Succeeded by Cray Y-MP/832 2.144 gigaflops

