TDT09 Topic 7


Massively parallel, completely different

Massively parallel computers for high-performance computing (HPC) applications place significantly different requirements on the operating system than typical server and desktop computers do. These systems typically have a very large number of processors (tens to hundreds of thousands), a different I/O architecture (disks and network connections to the outside world are centralized in special nodes), and individual processors usually do not have to be time-shared between different applications (although the complete system can be partitioned to support multiple users at the same time).
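To make this programming model concrete, below is a minimal MPI sketch in C (illustrative only; it is not taken from any of the systems discussed here). Each MPI rank typically runs on its own dedicated core, and output is funneled through a single rank, loosely mirroring the way I/O is centralized in special nodes:

    /* Minimal MPI sketch: one process per dedicated core. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's index */
        MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes */

        /* Let only one rank print: compute nodes usually do not perform
           I/O themselves but forward it to dedicated I/O nodes. */
        if (rank == 0)
            printf("application running on %d dedicated processes\n", size);

        MPI_Finalize();
        return 0;
    }

Compiled with mpicc and launched with, e.g., mpirun -np 4, this prints the process count exactly once, no matter how many ranks run.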

An operating system should not stand in the user's way by claiming substantial resources for itself. This is especially relevant in HPC systems, which should deliver as much of the hardware's performance as possible to the application. Accordingly, specialized lightweight OS kernels provide only thin hardware abstractions while still enabling applications to operate these complex systems.
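One way to see why a thin kernel matters is to measure OS interference ("noise") directly. The following sketch is in the spirit of common OS-noise microbenchmarks, not any specific tool from the references; the work quantum and sample counts are arbitrary assumptions. It times the same fixed amount of work repeatedly; on a general-purpose kernel the spread between the fastest and slowest sample is typically much larger than on a lightweight kernel:

    /* Crude jitter probe: time a fixed work quantum many times and
       report the spread, a rough proxy for OS interference. */
    #include <stdio.h>
    #include <time.h>

    static double now_sec(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec * 1e-9;
    }

    int main(void) {
        volatile double x = 0.0;           /* volatile: keep the loop alive */
        double min = 1e9, max = 0.0;

        for (int sample = 0; sample < 1000; sample++) {
            double t0 = now_sec();
            for (int i = 0; i < 100000; i++)   /* fixed work quantum */
                x += i * 0.5;
            double dt = now_sec() - t0;
            if (dt < min) min = dt;
            if (dt > max) max = dt;
        }
        printf("fixed-work time: min %.6f s, max %.6f s\n", min, max);
        return 0;
    }

The min-to-max spread is only a crude proxy; real noise benchmarks record full histograms, but the principle is the same.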

Accordingly, a variety of operating systems have been developed to support these massively parallel machines. Early many-processor systems such as the Connection Machine [1] had only very primitive system software, but later systems, such as the nCUBE [2] and the German SUPRENUM [3] project, already used sophisticated operating systems.

More recent massively parallel systems, such as IBM's BlueGene, have been a fertile basis for OS research [4]. A version of the Plan 9 OS [5] was ported to BlueGene because Plan 9's excellent support for transparently distributed, networked systems maps well onto BlueGene's specialized structure of separate compute, networking, and I/O nodes. Recent developments in massively parallel systems include Steve Furber's SpiNNaker system [6], a research project in neuromorphic computing intended to model the functions of biological brains, which requires special system software [7]. On the more conventional side, the current number 1 system in the TOP500 supercomputer list, Fujitsu's Fugaku, is based on specialized ARM CPUs and runs McKernel [8], a bespoke lightweight kernel built to support HPC applications.

References

  1. L. W. Tucker and G. G. Robertson, "Architecture and applications of the Connection Machine," Computer, vol. 21, no. 8, pp. 26-38, Aug. 1988
  2. B. Duzett and R. Buck, "An overview of the nCUBE 3 supercomputer," The Fourth Symposium on the Frontiers of Massively Parallel Computation, 1992
  3. W. Schröder, "PEACE: The distributed SUPRENUM operating system," Parallel Computing, vol. 7, no. 3, 1988
  4. R. G. Minnich et al., "Right-weight kernels: an off-the-shelf alternative to custom light-weight kernels," ACM SIGOPS Operating Systems Review, Apr. 2006
  5. R. G. Minnich and J. McKie, "Experiences porting the Plan 9 research operating system to the IBM Blue Gene supercomputers," Computer Science - Research and Development, vol. 23, pp. 117-124, 2009
  6. S. B. Furber et al., "The SpiNNaker Project," Proceedings of the IEEE, vol. 102, no. 5, pp. 652-665, 2014
  7. S. Furber et al., "Overview of the SpiNNaker System Architecture," IEEE Transactions on Computers, vol. 62, no. 12, pp. 2454-2467, 2013
  8. B. Gerofi, M. Takagi, A. Hori, G. Nakamura, T. Shirasawa and Y. Ishikawa, "On the Scalability, Performance Isolation and Device Driver Transparency of the IHK/McKernel Hybrid Lightweight Kernel," 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2016