Supercomputer operating system
A supercomputer operating system is an operating system intended for supercomputers. Since the end of the 20th century, supercomputer operating systems have undergone major transformations, as fundamental changes have occurred in supercomputer architecture.[1] While early operating systems were custom tailored to each supercomputer to gain speed, the trend has been away from in-house operating systems and toward some form of Linux,[2] which ran on every supercomputer on the TOP500 list in November 2017. In 2021, the top 10 computers ran, for instance, Red Hat Enterprise Linux (RHEL), a variant of it, or another Linux distribution such as Ubuntu.
Because modern massively parallel supercomputers typically separate computation from other services by using multiple types of nodes, they usually run different operating systems on different nodes: a small, efficient lightweight kernel such as Compute Node Kernel (CNK) or Compute Node Linux (CNL) on compute nodes, and a larger system such as a Linux derivative on server and input/output (I/O) nodes.[3][4]
While in a traditional multi-user computer system
Although most modern supercomputers use the Linux operating system,[6] each manufacturer has made its own specific changes to the Linux derivative it uses, and no industry standard exists, partly because differences in hardware architectures require changes to optimize the operating system for each hardware design.[1][7]
Context and overview
In the early days of supercomputing, the basic architectural concepts were evolving rapidly, and system software had to follow hardware innovations that often changed direction quickly.[1] In the early systems, operating systems were custom tailored to each supercomputer to gain speed, yet in the rush to develop them, serious software quality challenges surfaced, and in many cases the cost and complexity of system software development became as much of an issue as those of hardware.[1]
In the 1980s, the cost of software development at Cray came to equal what the company spent on hardware, and that trend was partly responsible for a move away from in-house operating systems toward the adaptation of generic software.[2] The first wave of operating system changes came in the mid-1980s, as vendor-specific operating systems were abandoned in favor of Unix. Despite early skepticism, this transition proved successful.[1][2]
By the early 1990s, major changes were occurring in supercomputing system software.
Thus, as general-purpose operating systems became stable, supercomputers began to borrow and adapt their critical system code and to rely on the rich set of secondary functions that came with them, rather than reinventing the wheel.
Dividing the operating system into separate components became necessary as supercomputers developed different types of nodes, e.g., compute nodes versus I/O nodes. Thus modern supercomputers usually run different operating systems on different nodes: a small, efficient lightweight kernel such as CNK or CNL on compute nodes, and a larger system such as a Linux derivative on server and I/O nodes.[3][4]
Early systems
The
The first Cray-1 was delivered to the Los Alamos Lab with no operating system or any other software.[11] Los Alamos developed both the application software and the operating system for it.[11] The main timesharing system for the Cray-1, the Cray Time Sharing System (CTSS), was then developed at the Livermore Labs as a direct descendant of the Livermore Time Sharing System (LTSS), the CDC 6600 operating system from twenty years earlier.[11]
In developing supercomputers, rising software costs soon became dominant, as evidenced by the 1980s cost of software development at Cray growing to equal the cost of hardware.[2] That trend was partly responsible for a move away from the in-house Cray Operating System to the Unix-based UNICOS system.[2] In 1985, the Cray-2 was the first system to ship with the UNICOS operating system.[12]
Around the same time, the
By the middle 1990s, despite the extant investment in older operating systems, the trend was toward the use of Unix-based systems, which also facilitated the use of interactive
Modern approaches
The IBM
While in traditional multi-user computer systems and early supercomputers,
Some, but not all, supercomputer schedulers attempt to maintain locality of job execution.
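As a rough illustration of what maintaining locality of job execution can mean, the sketch below contrasts two allocation policies on a simplified machine whose nodes are laid out in a single line. The one-dimensional layout, the function names, and the "span" measure are all assumptions made for illustration; this is not the algorithm of any particular scheduler, and real systems work with far richer network topologies.

```python
from typing import List, Optional


def allocate_contiguous(free: List[bool], count: int) -> Optional[List[int]]:
    """Locality-preserving policy (hypothetical): return the first run of
    `count` adjacent free nodes, or None if no such run exists."""
    if count <= 0:
        raise ValueError("count must be positive")
    run_start = 0
    run_len = 0
    for i, is_free in enumerate(free):
        if is_free:
            if run_len == 0:
                run_start = i  # a new run of free nodes begins here
            run_len += 1
            if run_len == count:
                return list(range(run_start, run_start + count))
        else:
            run_len = 0  # a busy node breaks the run
    return None


def allocate_scattered(free: List[bool], count: int) -> Optional[List[int]]:
    """Locality-agnostic policy (hypothetical): take the first `count`
    free nodes wherever they are, or None if too few are free."""
    if count <= 0:
        raise ValueError("count must be positive")
    picked = [i for i, is_free in enumerate(free) if is_free][:count]
    return picked if len(picked) == count else None


def span(nodes: List[int]) -> int:
    """Distance covered by an allocation; smaller means tighter locality."""
    return max(nodes) - min(nodes) + 1


if __name__ == "__main__":
    # True marks a free node, False a busy one.
    free_nodes = [True, False, True, True, True, False, True, True]
    print(allocate_contiguous(free_nodes, 3))  # [2, 3, 4]
    print(allocate_scattered(free_nodes, 3))   # [0, 2, 3]
```

In this example the contiguous policy places the job on nodes 2 to 4 (span 3), while the scattered policy uses nodes 0, 2 and 3 (span 4). The trade-off is that a contiguous request can fail even when enough free nodes exist in total, which is one reason a scheduler might choose not to enforce locality.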
See also
- Distributed operating system
- Supercomputer architecture
- Usage share of supercomputer operating systems
References
- ISBN 0-387-09765-1, pages 426–429.
- ISBN 0-262-63188-1, pages 149–151.
- ISBN 3-540-22924-8, page 835.
- An Evaluation of the Oak Ridge National Laboratory Cray XT3 by Sadaf R. Alam, et al., International Journal of High Performance Computing Applications, February 2008, vol. 22, no. 1, pages 52–80.
- ISBN 978-3-540-31024-2, pages 95–101.
- ZDNet. Retrieved June 20, 2013.
- "Top500 OS chart". Top500.org. Archived from the original on 2012-03-05. Retrieved 2010-10-31.
- ISBN 0-8157-2851-4, page 82.
- ISBN 0-262-22064-4, page 258.
- Design of a computer: the Control Data 6600 by James E. Thornton, Scott, Foresman Press, 1970, page 163.
- ISBN 0-8157-2851-4, pages 81–83.
- ISBN 0-444-82163-5, page 126.
- Lloyd M. Thorndyke, "The Demise of the ETA Systems", in Frontiers of Supercomputing II by Karyn R. Ames, Alan Brenner, 1994, ISBN 0-520-08401-2, pages 489–497.
- ISBN 3-540-19664-1, page 326.
- ISBN 0-520-08401-2, page 356.
- Brightwell, Ron; Riesen, Rolf; Maccabe, Arthur. "On the Appropriateness of Commodity Operating Systems for Large-Scale, Balanced Computing Systems" (PDF). Retrieved January 29, 2013.
- ISBN 0-309-09502-6, page 136.
- Forbes magazine, 03.15.05: Linux Rules Supercomputers
- ISBN 3-540-37783-2.
- ISBN 3-642-04632-0, pages 138–144.
- SLURM at SchedMD
- Jette, M. and M. Grondona, SLURM: Simple Linux Utility for Resource Management, in the Proceedings of ClusterWorld Conference, San Jose, California, June 2003.