TDT09 Topic 3


Everything is a file, except when not

"There are two hard problems in computer science: cache invalidation, naming things, and off-by-one errors."
This infamous quote holds some truth: many problems in computer science have one of these issues hidden deep inside (especially the first two ;-)). Here, we will take a closer look at naming the resources that users of an operating system can access. Resources include many different things, such as files on a local or remote disk, devices attached to the system, servers on the network, kernel data structures, windows in a GUI, etc.

This heterogeneous collection of resources led to very different approaches for accessing them. Early Unix systems were among the first to provide a uniform access method for different resources, an idea often summarized as "everything is a file". For example, even very early versions of Unix made devices accessible through normal file operations (e.g., open/read/write) on special files in the /dev directory. Later versions introduced virtual file hierarchies that represent data structures internal to the kernel, e.g. process information in the /proc directory [1]. This enabled the development of portable system utilities such as ps, which previously read kernel memory directly (e.g. via /dev/kmem) and had to be adapted and recompiled for every new kernel version.
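As a small illustration of this idea on a present-day system, the following Python sketch reads random bytes from the device node /dev/urandom and process information from /proc/self/status using nothing but ordinary open/read calls. It assumes a Linux host: the text-based /proc/<pid>/status file is a Linux convention, whereas the original /proc described in [1] presented each process as a single file. The helper names read_device_bytes and process_status are hypothetical.

```python
import os

def read_device_bytes(path: str, n: int) -> bytes:
    """Read n bytes from a device node using the same calls as for a regular file."""
    fd = os.open(path, os.O_RDONLY)
    try:
        data = b""
        while len(data) < n:
            chunk = os.read(fd, n - len(data))
            if not chunk:  # end of file: some devices deliver fewer bytes
                break
            data += chunk
        return data
    finally:
        os.close(fd)

def process_status(pid: str = "self") -> dict:
    """Parse /proc/<pid>/status into a dict.

    Assumption: Linux-style /proc, where this file holds 'Key:<tab>value' lines.
    """
    fields = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            key, _, value = line.partition(":")
            fields[key] = value.strip()
    return fields

if __name__ == "__main__":
    print(read_device_bytes("/dev/urandom", 8).hex())
    status = process_status()
    print(status["Name"], status["Pid"], status["State"])
```

Neither function needs to know anything about kernel data structures or device drivers; the kernel exposes both resources through file names, which is what made tools like ps portable.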

However, Unix did not apply this approach consistently. One example is the handling of network connections through the socket interface [2]. In most Unix-like systems, resources on the network cannot be accessed via the file system. Plan 9, the follow-up project to Unix, corrected this deficiency by providing a consistent name space for local as well as remote resources [3].
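The split can be seen in a short Python sketch, assuming a Unix-like host with loopback networking: a TCP endpoint is named with an address and port through socket(), bind() and connect() rather than with a path, yet the resulting socket is still a file descriptor that the generic os.read and os.write calls accept. The helper read_exactly is hypothetical.

```python
import os
import socket

def read_exactly(fd: int, n: int) -> bytes:
    """Call read(2) on a descriptor until n bytes have arrived or the peer closes."""
    data = b""
    while len(data) < n:
        chunk = os.read(fd, n - len(data))
        if not chunk:
            break
        data += chunk
    return data

# Naming the endpoint requires the socket API: an (address, port) pair,
# not a path in the file system. Port 0 lets the kernel pick a free port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)

client = socket.create_connection(listener.getsockname())
server_side, _ = listener.accept()

# Once created, each socket is a file descriptor, so the generic write/read
# calls work on it (assumes a Unix-like host; this is not true on Windows).
os.write(client.fileno(), b"hello")
received = read_exactly(server_side.fileno(), 5)
print(received)

for s in (client, server_side, listener):
    s.close()
```

In Plan 9, by contrast, a connection is set up by opening and writing to files under the /net hierarchy, so the naming step itself goes through the file system [3].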

There are also completely different approaches to handling resources. Large address spaces (e.g. on 64-bit processors) make it possible to use pointers to identify objects. This led to the development of single address space operating systems that map objects into this large, sparsely populated virtual address space, such as the Mungi OS [4]. Pointers can serve as the basis for further abstractions, e.g. representing every resource in a system as an object in memory, as in Smalltalk [5]. The Multics operating system used a hybrid approach: while persistent storage in Multics was file-based, the usual method of accessing a file was to map it into memory, enabling access via pointers [6]. A similar approach was later introduced in BSD Unix with the mmap system call [7].
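The mapped-file style of access can be sketched with Python's mmap module, which wraps the mmap system call on Unix-like systems: the file is mapped into the process's address space and then read and modified through memory indexing instead of read and write calls on the file. The temporary file and its contents are arbitrary choices for this example.

```python
import mmap
import os
import tempfile

# Create a small file to map; tempfile chooses the name.
fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open(path, "wb") as f:
        f.write(b"hello, file")

    with open(path, "r+b") as f:
        # Length 0 maps the whole file. Assumes a Unix-like host, where
        # mmap.mmap defaults to a shared, read-write mapping, so stores
        # to the mapping reach the file.
        with mmap.mmap(f.fileno(), 0) as mem:
            # Access the file like memory: slicing and slice assignment,
            # with no read()/write() calls on the file itself.
            first = mem[0:5]
            mem[0:5] = b"HELLO"
            mem.flush()  # ask the OS to write dirty pages back to the file

    # Reading the file normally shows the change made through memory.
    with open(path, "rb") as f:
        contents = f.read()
    print(first, contents)
finally:
    os.remove(path)
```

Once the mapping exists, any code holding a reference into it works with the file's contents as if they were ordinary memory, which is the property the Multics design relied on.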

References

  1. T. J. Killian. Processes as Files. USENIX Summer Conference, 1984.
  2. Stuart Sechrest. An Introductory 4.4BSD Interprocess Communication Tutorial. University of California, Berkeley, 1986.
  3. Rob Pike, Dave Presotto, Ken Thompson, Howard Trickey, and Phil Winterbottom. The Use of Name Spaces in Plan 9. In Proceedings of the 5th ACM SIGOPS European Workshop.
  4. G. Heiser. Implementation and Performance of the Mungi Single-Address-Space Operating System. Software: Practice and Experience, 1997.
  5. Ted Kaehler. Virtual Memory for an Object-Oriented Language. Byte (the Smalltalk issue), 1981, p. 378ff.
  6. A. Bensoussan, C. T. Clingen, and R. C. Daley. The Multics Virtual Memory: Concepts and Design. Communications of the ACM, May 1972.
  7. Avadis Tevanian et al. A UNIX Interface for Shared Memory and Memory Mapped Files Under Mach. USENIX Summer Conference, 1987.