Lightweight kernel operating system
A lightweight kernel (LWK) operating system is one used in a large computer with many processor cores, termed a parallel computer.
A
Examples
Custom lightweight kernel operating systems, used on some of the fastest computers in the world, help alleviate this problem. The
The Cray XT4 and Cray XT5 supercomputers run Compute Node Linux,[2] while the earlier XT3 ran the lightweight kernel Catamount, which was based on SUNMOS. Sandia National Laboratories has an almost two-decade commitment to lightweight kernels on its high-end HPC systems.[3] Sandia and University of New Mexico researchers began work on SUNMOS for the Intel Paragon in the early 1990s. This operating system evolved into Puma, Cougar (which achieved the first teraflop on ASCI Red), and Catamount on Red Storm. Sandia continues its work on LWKs with a new R&D effort called Kitten.[4]
Characteristics
Although it is surprisingly difficult to exactly define what a lightweight kernel is,[5] there are some common design goals:
- Targeted at massively parallel environments composed of thousands of processors with distributed memory and a tightly coupled network.
- Provide necessary support for scalable, performance-oriented scientific applications.
- Offer a suitable development environment for parallel applications and libraries.
- Emphasize efficiency over functionality.
- Maximize the amount of resources (e.g., CPU, memory, and network bandwidth) allocated to the application.
- Seek to minimize time to completion for the application.[6]
Implementation
LWK implementations vary, but all strive to give applications predictable and maximal access to the central processing unit (CPU) and other system resources. To achieve this, they usually include simplified algorithms for scheduling and memory management. System services (e.g., daemons) are limited to the absolute minimum. The services that are available, such as job launch, are constructed hierarchically to ensure scalability to thousands of nodes. Networking protocols for communication between nodes are also carefully selected and implemented with scalability in mind; one such example is the Portals network programming application programming interface (API).
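To illustrate why a hierarchical job launch scales, the following minimal sketch assumes a simple k-ary fan-out tree in which each node forwards the job to a fixed number of children. The tree layout and the names `build_launch_tree` and `launch_rounds` are hypothetical and do not describe the launch protocol of any particular LWK.

```python
from collections import deque


def build_launch_tree(num_nodes: int, fanout: int) -> dict[int, list[int]]:
    """Hypothetical k-ary launch tree in heap layout: node i forwards the
    job to nodes fanout*i + 1 through fanout*i + fanout (if they exist)."""
    if num_nodes < 1 or fanout < 1:
        raise ValueError("num_nodes and fanout must both be at least 1")
    tree: dict[int, list[int]] = {}
    for parent in range(num_nodes):
        first = fanout * parent + 1
        tree[parent] = list(range(first, min(first + fanout, num_nodes)))
    return tree


def launch_rounds(tree: dict[int, list[int]], root: int = 0) -> dict[int, int]:
    """Breadth-first walk from the root; returns the forwarding round in
    which each node receives the job (the root holds it at round 0)."""
    rounds = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in tree[node]:
            rounds[child] = rounds[node] + 1
            queue.append(child)
    return rounds


if __name__ == "__main__":
    tree = build_launch_tree(10_000, fanout=8)
    rounds = launch_rounds(tree)
    print("nodes reached:", len(rounds))
    print("forwarding rounds:", max(rounds.values()))
    print("max messages sent by any node:", max(len(c) for c in tree.values()))
```

In this toy model, 10,000 nodes with a fan-out of 8 are all reached in 5 forwarding rounds and no node sends more than 8 messages, whereas a flat launch would require the root alone to send 9,999.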
Lightweight kernel operating systems assume access to a small set of nodes running full-service operating systems, to which they offload necessary services such as login access, compiling environments, batch job submission, and file I/O.
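The offloading of file I/O can be pictured as "function shipping": the compute node packages each I/O call as a request and forwards it to a service node that owns the file system. The minimal sketch below assumes this model; `IORequest`, `ServiceNode`, and `ComputeNode` are hypothetical names, and a direct method call stands in for the network transport (such as Portals) that a real system would use.

```python
from dataclasses import dataclass, field


@dataclass
class IORequest:
    op: str           # "write" or "read" in this sketch
    path: str
    data: bytes = b""


@dataclass
class ServiceNode:
    """Stands in for a node running a full-service OS that owns the files.
    An in-memory dict replaces a real file system here."""
    files: dict[str, bytes] = field(default_factory=dict)

    def handle(self, req: IORequest) -> bytes:
        if req.op == "write":
            # Append, so writes from several compute nodes accumulate.
            self.files[req.path] = self.files.get(req.path, b"") + req.data
            return b""
        if req.op == "read":
            return self.files.get(req.path, b"")
        raise ValueError(f"unsupported operation: {req.op}")


@dataclass
class ComputeNode:
    """Stands in for an LWK node without a local file system: every I/O
    call is shipped to the service node instead of being handled locally."""
    service: ServiceNode

    def write(self, path: str, data: bytes) -> None:
        self.service.handle(IORequest("write", path, data))

    def read(self, path: str) -> bytes:
        return self.service.handle(IORequest("read", path))


if __name__ == "__main__":
    service = ServiceNode()
    nodes = [ComputeNode(service) for _ in range(3)]
    for rank, node in enumerate(nodes):
        node.write("/scratch/results.txt", f"rank {rank} done\n".encode())
    print(nodes[0].read("/scratch/results.txt").decode(), end="")
```

Keeping the file-system logic on the service node means the compute-node side stays small, which is the property the paragraph above attributes to LWKs.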
Restricting services to only those that are absolutely necessary, and streamlining those that are provided, minimizes the overhead (sometimes called noise) of the lightweight operating system. This gives a significant and predictable share of processor cycles to the parallel application. Because the application makes consistent progress on every processor, its processes reach their synchronization points sooner and, ideally, at the same time, so less time is lost waiting.
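The effect of noise on synchronization can be shown with a toy model, assuming each computation step ends at a barrier so that a step lasts as long as its slowest processor. The function `bsp_run` and the noise matrices are hypothetical and the numbers are illustrative, not measurements.

```python
def bsp_run(work: float, noise: list[list[float]]) -> tuple[float, float]:
    """Toy bulk-synchronous model: every step ends at a barrier, so a step
    lasts as long as its slowest processor.

    noise[p][s] is the hypothetical OS delay suffered by processor p in
    step s. Returns (finish_time, idle_time), where idle_time sums the time
    processors spend waiting at barriers for the slowest peer.
    """
    num_procs, num_steps = len(noise), len(noise[0])
    finish = 0.0
    idle = 0.0
    for s in range(num_steps):
        busy = [work + noise[p][s] for p in range(num_procs)]
        step_time = max(busy)  # the barrier waits for the slowest processor
        finish += step_time
        idle += sum(step_time - b for b in busy)
    return finish, idle


# Four processors, three steps of 10 time units each.
QUIET = [[0, 0, 0] for _ in range(4)]
# Three processors each lose 2 units once, each in a different step.
SCATTERED = [[2, 0, 0], [0, 2, 0], [0, 0, 2], [0, 0, 0]]
# The same per-processor delays, but all in the same step.
ALIGNED = [[2, 0, 0], [2, 0, 0], [2, 0, 0], [0, 0, 0]]

if __name__ == "__main__":
    for name, noise in [("quiet", QUIET), ("scattered", SCATTERED), ("aligned", ALIGNED)]:
        finish, idle = bsp_run(10, noise)
        print(f"{name:9s} finish={finish:5.1f} idle={idle:5.1f}")
```

In this model the scattered case finishes 6 units late even though no processor loses more than 2, because every barrier waits for whichever processor was delayed in that step; when the same delays coincide, the run finishes only 2 units late.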
Future
The last supercomputers running lightweight kernels are the remaining IBM
References
- Moreira, Jose; et al. (November 2006). "Designing a Highly-Scalable Operating System: The Blue Gene/L Story". Proceedings of the 2006 ACM/IEEE International Conference for High-Performance Computing, Networking, Storage, and Analysis (SC’06).
- Wallace, D. (May 2007). "Compute Node Linux: Overview, progress to date, and roadmap". Proceedings of the 2007 Cray User Group Annual Technical Conference.
- Riesen, Rolf; et al. (April 2009). "Designing and Implementing Lightweight Kernels for Capability Computing". Concurrency and Computation: Practice and Experience.
- "Kitten Lightweight Kernel".
- Riesen, Rolf; et al. (June 2015). "What is a Lightweight Kernel?". Proceedings of the 5th International Workshop on Runtime and Operating Systems for Supercomputers. pp. 1–8. S2CID 11698915. Retrieved 19 October 2019.
- Kelly, S.; Brightwell, R. (May 2005). "Software Architecture of the Light Weight Kernel, Catamount". Proceedings of the 2005 Cray User Group Annual Technical Conference.