Understanding UNIX
I wanted to understand why UNIX is so standardized and developer-friendly.
Before the engineers at Bell Labs even conceived of UNIX, the computer industry suffered from low productivity in large-scale projects: too many people, too many inefficiencies, delays, and improper planning. UNIX made computers more productive.
Unlike hardware, software faces constant demands from its users for new or modified features. This invited building software in a change-tolerant way.
The paper, “The UNIX Time-Sharing System” by Dennis M. Ritchie and Ken Thompson of Bell Laboratories, is available here.
UNIX is an OS designed to be general-purpose, multi-user, and interactive (that is, you converse with the system directly rather than submitting batch jobs).
Major features of UNIX (in 1974):
(1) a hierarchical file system incorporating demountable volumes
(2) compatible file, device, and inter-process I/O
(3) the ability to initiate asynchronous processes
(4) system command language selectable on a per-user basis
(5) over 100 subsystems, including a dozen languages.
Introduction
Engineers at Bell Labs wrote this paper in 1974. Even at that time, spending $40,000 on hardware and under two man-years on the main system software was cheap. UNIX turned out to be a powerful interactive OS, and it impacted the industry precisely because it was so inexpensive to develop and use.
The UNIX software was developed and maintained under UNIX itself, making it a self-supporting system.
The three most important characteristics were:
- simplicity,
- elegance,
- ease of use.
Hardware and Software Environment
UNIX was written almost entirely in C. Some parts were not in C initially but were rewritten in C later on.
The File System
There are three types of files:
- Ordinary Files
- Directories
- Special Files
Ordinary Files
Normal files created by users, directly or indirectly (via programs). A text file is a string of characters, with the newline character separating lines. A binary program file holds exactly the data that would appear in core memory when the corresponding program executes. The UNIX system plays no role in structuring ordinary files; it leaves that entirely to programs and users.
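To make the “no structure” point concrete, here is a minimal C sketch that treats a text file as what it really is, a stream of bytes, and counts lines simply by counting newline characters (the file name is hypothetical):

```c
#include <stdio.h>

/* UNIX imposes no structure on ordinary files: a "text file" is just
 * a stream of bytes in which programs treat '\n' as a line separator. */
int main(void)
{
    FILE *f = fopen("notes.txt", "rb");   /* hypothetical file name */
    if (f == NULL) {
        perror("fopen");
        return 1;
    }

    long lines = 0;
    int c;
    while ((c = fgetc(f)) != EOF)         /* read byte by byte */
        if (c == '\n')                    /* the only "line" marker */
            lines++;

    fclose(f);
    printf("%ld lines\n", lines);
    return 0;
}
```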
Directories
Directories are just like ordinary files, except that they contain a mapping from names to the files and subdirectories beneath them. Each directory always has at least two entries in this mapping: “.” and “..”, the former referring to the directory itself and the latter to its immediate parent.
Why do we need a mapping to the parent directory?
To maintain a “previous” pointer while traversing paths. If arbitrary linking of directories were allowed in UNIX, navigation would become too hard and complicated.
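You can see these two entries directly by reading a directory's contents. The sketch below uses the POSIX opendir/readdir interface, which postdates the paper, but the entries it reveals are the same:

```c
#include <stdio.h>
#include <dirent.h>

/* List every entry in the current directory. Among the entries you
 * will always find "." (this directory) and ".." (its parent). */
int main(void)
{
    DIR *d = opendir(".");
    if (d == NULL) {
        perror("opendir");
        return 1;
    }

    struct dirent *e;
    while ((e = readdir(d)) != NULL)
        printf("%s\n", e->d_name);

    closedir(d);
    return 0;
}
```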
UNIX also prevents directories from being written by unprivileged programs, so that no unprivileged user or program can modify a directory's contents. In this way, the system preserves the structure of the file hierarchy.
Besides the root, the system has several directories of its own, including a system directory that holds all the commands. Commands are simply programs provided for general use.
Linking: A single non-directory file can be present in multiple directories. If you recall from above, a directory contains a mapping of the files under it, so all UNIX does is map the same file into multiple directories. A directory therefore holds not the file itself but only its name and a pointer to it; a file exists independently of its parent directory (or directories). UNIX deletes a file only when the last link to it is removed.
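A minimal sketch of this behavior with the link and unlink system calls (the file names are hypothetical): after the link, the data is reachable under either name, and it disappears only when the last name is removed.

```c
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

int main(void)
{
    /* Create a file under one name... */
    int fd = open("original.txt", O_WRONLY | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return 1; }
    write(fd, "hello\n", 6);
    close(fd);

    /* ...give the same file a second name in another mapping... */
    if (link("original.txt", "alias.txt") < 0) { perror("link"); return 1; }

    /* ...remove the first name: the data survives, still reachable as
     * alias.txt, because one link to it remains. */
    unlink("original.txt");

    /* Removing the last link is what actually deletes the file. */
    unlink("alias.txt");
    return 0;
}
```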
Special Files
Just as ordinary files carry I/O to and from the user, special files carry I/O to and from their associated devices. I think of them as drivers (though the paper itself never refers to such files as drivers).
UNIX consciously chose to treat I/O devices as files when interacting with them:
- file and device I/O are as similar as possible: doing device I/O is just reading and writing a file
- file and device names have the same syntax and meaning, so programs you write that take file names and/or paths as arguments work on devices too
- special files can be protected by the same mechanism as regular files: the same protection bits we all know
It’s noteworthy that these three points, which seem obvious today (“if not this, then what else?”), were conscious choices by the UNIX engineers back then.
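The payoff is that the ordinary file calls work unchanged on devices. Here is a minimal sketch in C, assuming a system with the conventional /dev/tty special file (the terminal of the calling process):

```c
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

/* Writing to a device uses exactly the same calls as writing to a file:
 * open a name, write bytes, close. Only the name differs. */
int main(void)
{
    int fd = open("/dev/tty", O_WRONLY);  /* the process's terminal */
    if (fd < 0) {
        perror("open");
        return 1;
    }
    write(fd, "hello, terminal\n", 16);   /* same write() as for any file */
    close(fd);
    return 0;
}
```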
Removable File Systems
Objective: more storage than is present in the machine, and/or a file hierarchy separate from the root.
UNIX makes this easy by letting you attach removable volumes to the system. Thus the system's storage may comprise: i) the system's own volume (carrying the root) and ii) removable volumes that you mount.
After mounting, there is virtually no distinction between files on permanent and on removable storage.
Recall the hierarchy tree mentioned above: mounting replaces a leaf of that tree with the whole subtree stored on the removable volume.
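The same idea survives today. The sketch below uses the Linux-specific mount(2) call rather than the 1974 interface, and the device path, mount point, and file-system type are all hypothetical:

```c
#include <stdio.h>
#include <sys/mount.h>   /* Linux-specific mount(2) */

/* Attach the hierarchy stored on a removable volume at a leaf of the
 * existing tree. Afterwards, /mnt/usb/... names files on the volume
 * exactly as if they lived on the system's own storage.
 * (Requires root; device, mount point, and fs type are hypothetical.) */
int main(void)
{
    if (mount("/dev/sdb1", "/mnt/usb", "ext4", MS_RDONLY, NULL) < 0) {
        perror("mount");
        return 1;
    }
    return 0;
}
```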
Recall the linking concept: creating pointers to files so that directories can be linked to the files stored under them, with files existing independently of their directories. UNIX enforces one rule for linking: no link may exist between one file-system hierarchy and another.
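This rule is still observable: the link call fails with EXDEV when the two names would live on different file systems. A sketch, assuming /mnt/usb is a separately mounted volume (paths are hypothetical):

```c
#include <stdio.h>
#include <errno.h>
#include <unistd.h>

/* Try to link a file on the root file system into a mounted volume.
 * UNIX refuses: no link may cross file-system boundaries. */
int main(void)
{
    if (link("/etc/hostname", "/mnt/usb/hostname") < 0) {
        if (errno == EXDEV)
            printf("link refused: names are on different file systems\n");
        else
            perror("link");
    }
    return 0;
}
```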
Protection
Each user has a UID. When a user creates a file, that user becomes its owner. Each file carries seven protection bits: six of them independently grant read, write, and execute permission to the file's owner and to all other users. The seventh is the set-user-ID bit. When a file with this bit set is executed, the process temporarily assumes the user ID of the file's owner rather than that of the user running it; without the bit, the process keeps the invoking user's ID. (Executing a file means running it as a program.)
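One way to observe the set-user-ID mechanism is to compare the real and effective user IDs inside a running program; a minimal sketch using the POSIX getuid/geteuid calls:

```c
#include <stdio.h>
#include <unistd.h>

/* getuid()  = the user who ran the program (real UID).
 * geteuid() = the identity the process acts as (effective UID).
 * For a file with the set-user-ID bit, the effective UID becomes the
 * file owner's, not the invoking user's. */
int main(void)
{
    printf("real UID:      %d\n", (int)getuid());
    printf("effective UID: %d\n", (int)geteuid());
    return 0;
}
```

If you compile this, mark the binary with chmod u+s, and run it as a different user, the two IDs diverge.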
I/O Calls
The system calls responsible for I/O are designed so that the user need not worry about the difference between sequential and random access. There are no user-visible locks in the UNIX file system, and no restriction on the number of users reading or writing the same file at once. The authors acknowledge that files can get corrupted if two or more users write to them simultaneously, but they dismiss the concern by stating: “this situation does not arise in practice.” The paper gives two reasons for omitting locks:
Locks are not sufficient.
- Consider the normal case where one user should be denied access to a file while another user is editing it. Locks can enforce this. However, confusion is still possible when two users work on different copies of the same file.
- Copies? An editor makes a working copy of the file being edited, so if two users edit the same file, there are two copies in their respective editors. Hence, in practice, locks cannot prevent the confusion that arises when the two editors' versions need to be merged.
Locks are not necessary.
- The people at Bell Labs (where the system was developed) did not have large files in a single database being operated on by independent processes at the same time. Locking simply was not among their requirements.
The system does have internal locks to prevent certain major inconsistencies, such as deleting a file that is currently open, or creating two files in the same directory under the same name.
This question covers the use cases UNIX discusses for reading and writing files in a stream fashion.
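To make the sequential-versus-random point concrete: the same read call serves both modes, and random access is just a seek (lseek in the modern interface, seek in the paper) followed by the same read. A sketch with a hypothetical data file:

```c
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

int main(void)
{
    char buf[16];
    int fd = open("data.bin", O_RDONLY);   /* hypothetical file */
    if (fd < 0) { perror("open"); return 1; }

    /* Sequential access: each read picks up where the last one stopped. */
    read(fd, buf, sizeof buf);
    read(fd, buf, sizeof buf);

    /* "Random" access is the same read, preceded by a seek that moves
     * the read/write pointer. There is no separate random-access call. */
    lseek(fd, 100, SEEK_SET);
    read(fd, buf, sizeof buf);

    close(fd);
    return 0;
}
```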
Processes and Images
An image is the state of the computer's execution environment: the core image, register values, open files, current directory, and so on. A process is the execution of an image. A process remains active until a higher-priority process replaces it, it crashes, or, in the best case, it finishes executing on its own. While it is executing, the image resides in core; otherwise, it is swapped out.
Concept of Fork
fork() is the UNIX primitive by which a program splits into two processes. The initially running image is the parent process, while the newly forked copy of the same image, a separate process altogether, is the child process. Files that were open before the fork are shared by both processes. When the child finishes executing, control in the parent resumes past the point of the fork; typically the parent waits for the child to die.
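A minimal sketch of fork in its modern C form (the 1974 paper's fork took a label argument, but the idea, one process splitting into two that share open files, is the same):

```c
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    pid_t pid = fork();          /* one process in, two processes out */
    if (pid < 0) {
        perror("fork");
        return 1;
    }

    if (pid == 0) {
        /* Child: a fresh process running a copy of the same image. */
        printf("child:  pid %d\n", (int)getpid());
        return 0;
    }

    /* Parent: fork returned the child's process ID; wait for it to die. */
    printf("parent: pid %d, child %d\n", (int)getpid(), (int)pid);
    wait(NULL);
    return 0;
}
```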
Pipelining
This was new to the world of software when UNIX introduced it. Data flowing from left to right, from one program to the next, made programmers' lives a lot easier: it removed the effort of staging data in intermediate files and writing glue programs for each step. UNIX built the pipeline mechanism out of its basic primitives.
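And indeed the pipeline is built from primitives we have already seen: a pipe is just a pair of file descriptors, wired up to child processes with fork and dup2. A sketch that assembles the equivalent of `ls | wc -l`:

```c
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

/* Build "ls | wc -l" from primitives: pipe(), fork(), dup2(), exec. */
int main(void)
{
    int fds[2];
    if (pipe(fds) < 0) { perror("pipe"); return 1; }

    if (fork() == 0) {
        /* Left side of the pipeline: ls writes into the pipe. */
        dup2(fds[1], STDOUT_FILENO);
        close(fds[0]);
        close(fds[1]);
        execlp("ls", "ls", (char *)NULL);
        perror("execlp");
        return 1;
    }

    if (fork() == 0) {
        /* Right side: wc reads from the pipe. */
        dup2(fds[0], STDIN_FILENO);
        close(fds[0]);
        close(fds[1]);
        execlp("wc", "wc", "-l", (char *)NULL);
        perror("execlp");
        return 1;
    }

    /* Parent closes its copies so wc sees end-of-file, then waits. */
    close(fds[0]);
    close(fds[1]);
    wait(NULL);
    wait(NULL);
    return 0;
}
```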
Considerations that made UNIX successful
Built by programmers
The UNIX developers were themselves programmers, so they designed the system to make their own lives easier.
Salvation through Suffering
That is the exact phrase the authors use in the paper: they had to build the system cheaply and with scarce resources, and the constraint encouraged not just economy but an elegant design.
System maintaining itself
Perhaps the most important constraint was that UNIX had to maintain itself: all the functional and non-functional requirements of the system were handled by UNIX itself. In other words, you use UNIX to build UNIX, which is like eating your own dog food.