Diffstat (limited to 'en_US.ISO8859-1/books/design-44bsd/book.sgml')
-rw-r--r-- en_US.ISO8859-1/books/design-44bsd/book.sgml 2858
1 files changed, 0 insertions, 2858 deletions
diff --git a/en_US.ISO8859-1/books/design-44bsd/book.sgml b/en_US.ISO8859-1/books/design-44bsd/book.sgml
deleted file mode 100644
index 6c44b5a92a..0000000000
--- a/en_US.ISO8859-1/books/design-44bsd/book.sgml
+++ /dev/null
@@ -1,2858 +0,0 @@
-<!-- $FreeBSD: doc/en_US.ISO_8859-1/books/design-44bsd/book.sgml,v 1.2 2001/03/07 07:21:23 sheldonh Exp $ -->
-<!-- FreeBSD Documentation Project -->
-
-<!DOCTYPE book PUBLIC "-//FreeBSD//DTD DocBook V4.1-Based Extension//EN" [
-<!ENTITY % man PUBLIC "-//FreeBSD//ENTITIES DocBook Manual Page Entities//EN">
-%man;
-]>
-
-<book>
- <bookinfo>
- <title>The Design and Implementation of the 4.4BSD Operating System</title>
-
- <authorgroup>
- <author>
- <firstname>Marshall</firstname>
- <othername>Kirk</othername>
- <surname>McKusick</surname>
- </author>
-
- <author>
- <firstname>Keith</firstname>
- <surname>Bostic</surname>
- </author>
-
- <author>
- <firstname>Michael</firstname>
- <othername>J.</othername>
- <surname>Karels</surname>
- </author>
-
- <author>
- <firstname>John</firstname>
- <othername>S.</othername>
- <surname>Quarterman</surname>
- </author>
- </authorgroup>
-
- <copyright>
- <year>1996</year>
- <holder>Addison-Wesley Longman, Inc</holder>
- </copyright>
-
-<!-- I seem to recall the editor wanting this notice to be bold. In html, I'd
- use the _strong_ tag. What should I use instead? -->
-
- <legalnotice>
- <para>The second chapter of the book, <citetitle>The Design and
- Implementation of the 4.4BSD Operating System</citetitle>, is
- excerpted here with the permission of the publisher. No part of it
- may be further reproduced or distributed without the publisher's
- express written
- <ulink url="mailto:peter.gordon@awl.com">permission</ulink>. The
- rest of
- <ulink url="http://cseng.aw.com/book/0,3828,0201549794,00.html">the
- book</ulink> explores the concepts introduced in this chapter in
- incredible detail and is an excellent reference for anyone with an
- interest in BSD UNIX. More information about this book is available
- from the publisher, with whom you can also sign up to receive news
- of <ulink url="mailto:curt.johnson@awl.com">related titles</ulink>.
- Information about <ulink url="http://www.mckusick.com/courses/">BSD
- courses</ulink> is available from Kirk McKusick.</para>
- </legalnotice>
- </bookinfo>
-
- <chapter label="2">
- <title>Design Overview of 4.4BSD</title>
-
- <sect1>
- <title>4.4BSD Facilities and the Kernel</title>
-
- <para>The 4.4BSD kernel provides four basic facilities:
- processes,
- a filesystem,
- communications, and
- system startup.
- This section outlines where each of these four basic services
- is described in this book.</para>
-
- <orderedlist>
- <listitem>
- <para>Processes constitute a thread of control in an address space.
- Mechanisms for creating, terminating, and otherwise
- controlling processes are described in
- Chapter 4.
- The system multiplexes separate virtual-address spaces
- for each process;
- this memory management is discussed in
- Chapter 5.</para>
- </listitem>
-
- <listitem>
- <para>The user interface to the filesystem and devices is similar;
- common aspects are discussed in
- Chapter 6.
- The filesystem is a set of named files, organized in a tree-structured
- hierarchy of directories, and of operations to manipulate them,
- as presented in
- Chapter 7.
- Files reside on physical media such as disks.
- 4.4BSD supports several organizations of data on the disk,
- as set forth in
- Chapter 8.
- Access to files on remote machines is the subject of
- Chapter 9.
- Terminals are used to access the system; their operation is
- the subject of
- Chapter 10.</para>
- </listitem>
-
- <listitem>
- <para>Communication mechanisms provided by traditional UNIX systems include
- simplex reliable byte streams between related processes (see pipes,
- Section 11.1),
- and notification of exceptional events (see signals,
- Section 4.7).
- 4.4BSD also has a general interprocess-communication facility.
- This facility, described in
- Chapter 11,
- uses access mechanisms distinct from those of the filesystem,
- but, once a connection is set up, a process can access it
- as though it were a pipe.
- There is a general networking framework,
- discussed in
- Chapter 12,
- that is normally used as a layer underlying the
- IPC
- facility.
- Chapter 13
- describes a particular networking implementation in detail.</para>
- </listitem>
-
- <listitem>
- <para>Any real operating system has operational issues, such as how to
- start it running.
- Startup and operational issues are described in
- Chapter 14.</para>
- </listitem>
- </orderedlist>
-
- <para>Sections 2.3 through 2.14 present introductory
- material related to Chapters 3 through 14.
- We shall define terms, mention basic system calls,
- and explore historical developments.
- Finally, we shall give the reasons for many major design decisions.</para>
-
- <sect2>
- <title>The Kernel</title>
-
- <para>The
- <emphasis>kernel</emphasis>
- is the part of the system that runs in protected mode and mediates
- access by all user programs to the underlying hardware (e.g.,
- CPU,
- disks, terminals, network links)
- and software constructs
- (e.g., filesystem, network protocols).
- The kernel provides the basic system facilities;
- it creates and manages processes,
- and provides functions to access the filesystem
- and communication facilities.
- These functions, called
- <emphasis>system calls</emphasis>,
- appear to user processes as library subroutines.
- These system calls are the only interface
- that processes have to these facilities.
- Details of the system-call mechanism are given in
- Chapter 3,
- as are descriptions of several kernel mechanisms that do not execute
- as the direct result of a process doing a system call.</para>
-
- <para>A
- <emphasis>kernel</emphasis>,
- in traditional operating-system terminology,
- is a small nucleus of software that
- provides only the minimal facilities necessary for implementing
- additional operating-system services.
- In contemporary research operating systems -- such as
- Chorus
- <xref linkend="biblio-rozier">,
- Mach
- <xref linkend="biblio-accetta">,
- Tunis
- <xref linkend="biblio-ewens">,
- and the
- V Kernel
- <xref linkend="biblio-cheriton"> --
- this division of functionality is more than just a logical one.
- Services such as filesystems and networking protocols are
- implemented as client application processes of the nucleus or kernel.</para>
-
- <para>The
- 4.4BSD kernel is not partitioned into multiple processes.
- This basic design decision was made in the earliest versions of UNIX.
- The first two implementations by
- Ken Thompson had no memory mapping,
- and thus made no hardware-enforced distinction
- between user and kernel space
- <xref linkend="biblio-ritchie">.
- A message-passing system could have been implemented as readily
- as the actually implemented model of kernel and user processes.
- The monolithic kernel was chosen for simplicity and performance.
- And the early kernels were small;
- the inclusion of facilities such as networking
- into the kernel has increased its size.
- The current trend in operating-systems research
- is to reduce the kernel size by placing
- such services in user space.</para>
-
- <para>Users ordinarily interact with the system through a command-language
- interpreter, called a
- <emphasis>shell</emphasis>,
- and perhaps through additional user application programs.
- Such programs and the shell are implemented with processes.
- Details of such programs are beyond the scope of this book,
- which instead concentrates almost exclusively on the kernel.</para>
-
- <para>Sections 2.3 and 2.4
- describe the services provided by the 4.4BSD kernel,
- and give an overview of the latter's design.
- Later chapters describe the detailed design and implementation of these
- services as they appear in 4.4BSD.</para>
- </sect2>
- </sect1>
-
- <sect1>
- <title>Kernel Organization</title>
-
- <para>In this section, we view the organization of the 4.4BSD
- kernel in two ways:</para>
-
- <orderedlist>
- <listitem>
- <para>As a static body of software,
- categorized by the functionality offered by the modules
- that make up the kernel</para>
- </listitem>
-
- <listitem>
- <para>By its dynamic operation,
- categorized according to the services provided to users</para>
- </listitem>
- </orderedlist>
-
- <para>The largest part of the kernel implements
- the system services that applications access through system calls.
- In 4.4BSD, this software has been organized according to the following:</para>
-
- <itemizedlist>
- <listitem>
- <para>Basic kernel facilities:
- timer and system-clock handling,
- descriptor management, and process management</para>
- </listitem>
-
- <listitem>
- <para>Memory-management support:
- paging and swapping</para>
- </listitem>
-
- <listitem>
- <para>Generic system interfaces:
- the I/O,
- control, and multiplexing operations performed on descriptors</para>
- </listitem>
-
- <listitem>
- <para>The filesystem:
- files, directories, pathname translation, file locking,
- and I/O buffer management</para>
- </listitem>
-
- <listitem>
- <para>Terminal-handling support:
- the terminal-interface driver and terminal
- line disciplines</para>
- </listitem>
-
- <listitem>
- <para>Interprocess-communication facilities:
- sockets</para>
- </listitem>
-
- <listitem>
- <para>Support for network communication:
- communication protocols and
- generic network facilities, such as routing</para>
- </listitem>
- </itemizedlist>
-
- <table frame="none" id="table-mach-indep">
- <title>Machine-independent software in the 4.4BSD kernel</title>
- <tgroup cols="3">
- <thead>
- <row>
- <entry>Category</entry>
- <entry>Lines of code</entry>
- <entry>Percentage of kernel</entry>
- </row>
- </thead>
-
- <tfoot>
- <row>
- <entry>total machine independent</entry>
- <entry>162,617</entry>
- <entry>80.4</entry>
- </row>
- </tfoot>
-
- <tbody>
- <row>
- <entry>headers</entry>
- <entry>9,393</entry>
- <entry>4.6</entry>
- </row>
-
- <row>
- <entry>initialization</entry>
- <entry>1,107</entry>
- <entry>0.6</entry>
- </row>
-
- <row>
- <entry>kernel facilities</entry>
- <entry>8,793</entry>
- <entry>4.4</entry>
- </row>
-
- <row>
- <entry>generic interfaces</entry>
- <entry>4,782</entry>
- <entry>2.4</entry>
- </row>
-
- <row>
- <entry>interprocess communication</entry>
- <entry>4,540</entry>
- <entry>2.2</entry>
- </row>
-
- <row>
- <entry>terminal handling</entry>
- <entry>3,911</entry>
- <entry>1.9</entry>
- </row>
-
- <row>
- <entry>virtual memory</entry>
- <entry>11,813</entry>
- <entry>5.8</entry>
- </row>
-
- <row>
- <entry>vnode management</entry>
- <entry>7,954</entry>
- <entry>3.9</entry>
- </row>
-
- <row>
- <entry>filesystem naming</entry>
- <entry>6,550</entry>
- <entry>3.2</entry>
- </row>
-
- <row>
- <entry>fast filestore</entry>
- <entry>4,365</entry>
- <entry>2.2</entry>
- </row>
-
- <row>
- <entry>log-structured filestore</entry>
- <entry>4,337</entry>
- <entry>2.1</entry>
- </row>
-
- <row>
- <entry>memory-based filestore</entry>
- <entry>645</entry>
- <entry>0.3</entry>
- </row>
-
- <row>
- <entry>cd9660 filesystem</entry>
- <entry>4,177</entry>
- <entry>2.1</entry>
- </row>
-
- <row>
- <entry>miscellaneous filesystems (10)</entry>
- <entry>12,695</entry>
- <entry>6.3</entry>
- </row>
-
- <row>
- <entry>network filesystem</entry>
- <entry>17,199</entry>
- <entry>8.5</entry>
- </row>
-
- <row>
- <entry>network communication</entry>
- <entry>8,630</entry>
- <entry>4.3</entry>
- </row>
-
- <row>
- <entry>internet protocols</entry>
- <entry>11,984</entry>
- <entry>5.9</entry>
- </row>
-
- <row>
- <entry>ISO protocols</entry>
- <entry>23,924</entry>
- <entry>11.8</entry>
- </row>
-
- <row>
- <entry>X.25 protocols</entry>
- <entry>10,626</entry>
- <entry>5.3</entry>
- </row>
-
- <row>
- <entry>XNS protocols</entry>
- <entry>5,192</entry>
- <entry>2.6</entry>
- </row>
- </tbody>
- </tgroup>
- </table>
-
- <para>Most of the software in these categories is machine independent
- and is portable across different hardware architectures.</para>
-
- <para>The machine-dependent aspects of the kernel
- are isolated from the mainstream code.
- In particular, none of the machine-independent code contains
- conditional code for specific architectures.
- When an architecture-dependent action is needed,
- the machine-independent code calls an architecture-dependent
- function that is located in the machine-dependent code.
- The software that is machine dependent includes</para>
-
- <itemizedlist>
- <listitem>
- <para>Low-level system-startup actions</para>
- </listitem>
-
- <listitem>
- <para>Trap and fault handling</para>
- </listitem>
-
- <listitem>
- <para>Low-level manipulation of the run-time context of a
- process</para>
- </listitem>
-
- <listitem>
- <para>Configuration and initialization of hardware devices</para>
- </listitem>
-
- <listitem>
- <para>Run-time support for I/O devices</para>
- </listitem>
- </itemizedlist>
-
- <table frame="none" id="table-mach-dep">
- <title>Machine-dependent software for the HP300 in the 4.4BSD
- kernel</title>
-
- <tgroup cols="3">
- <thead>
- <row>
- <entry>Category</entry>
- <entry>Lines of code</entry>
- <entry>Percentage of kernel</entry>
- </row>
- </thead>
-
- <tfoot>
- <row>
- <entry>total machine dependent</entry>
- <entry>39,634</entry>
- <entry>19.6</entry>
- </row>
- </tfoot>
-
- <tbody>
- <row>
- <entry>machine dependent headers</entry>
- <entry>1,562</entry>
- <entry>0.8</entry>
- </row>
-
- <row>
- <entry>device driver headers</entry>
- <entry>3,495</entry>
- <entry>1.7</entry>
- </row>
-
- <row>
- <entry>device driver source</entry>
- <entry>17,506</entry>
- <entry>8.7</entry>
- </row>
-
- <row>
- <entry>virtual memory</entry>
- <entry>3,087</entry>
- <entry>1.5</entry>
- </row>
-
- <row>
- <entry>other machine dependent</entry>
- <entry>6,287</entry>
- <entry>3.1</entry>
- </row>
-
- <row>
- <entry>routines in assembly language</entry>
- <entry>3,014</entry>
- <entry>1.5</entry>
- </row>
-
- <row>
- <entry>HP/UX compatibility</entry>
- <entry>4,683</entry>
- <entry>2.3</entry>
- </row>
- </tbody>
- </tgroup>
- </table>
-
- <para><xref linkend="table-mach-indep"> summarizes the machine-independent software that constitutes the
- 4.4BSD kernel for the HP300.
- The numbers in column 2 are for lines of C source code,
- header files, and assembly language.
- Virtually all the software in the kernel is written in the C
- programming language;
- less than 2 percent is written in
- assembly language.
- As the statistics in <xref linkend="table-mach-dep"> show,
- the machine-dependent software, excluding
- HP/UX
- and device support,
- accounts for a minuscule 6.9 percent of the kernel.</para>
-
- <para>Only a small part of the kernel is devoted to
- initializing the system.
- This code is used when the system is
- <emphasis>bootstrapped</emphasis>
- into operation and is responsible for setting up the kernel hardware
- and software environment
- (see
- Chapter 14).
- Some operating systems (especially those with limited physical memory)
- discard or
- <emphasis>overlay</emphasis>
- the software that performs these functions after that software has
- been executed.
- The 4.4BSD kernel does not reclaim the memory used by the
- startup code because that memory space is barely 0.5 percent
- of the kernel resources used on a typical machine.
- Also, the startup code does not appear in one place in the kernel -- it is
- scattered throughout, and it usually appears
- in places logically associated with what is being initialized.</para>
- </sect1>
-
- <sect1>
- <title>Kernel Services</title>
-
- <para>The boundary between the kernel- and user-level code is enforced by
- hardware-protection facilities provided by the underlying hardware.
- The kernel operates in a separate address space that is inaccessible to
- user processes.
- Privileged operations -- such as starting I/O
- and halting the central processing unit
- (CPU) --
- are available to only the kernel.
- Applications request services from the kernel with
- <emphasis>system calls</emphasis>.
- System calls are used to cause the kernel to execute complicated
- operations, such as writing data to secondary storage,
- and simple operations, such as returning the current time of day.
- All system calls appear
- <emphasis>synchronous</emphasis>
- to applications:
- The application does not run while the kernel does the actions associated
- with a system call.
- The kernel may finish some operations associated with a system call
- after it has returned.
- For example, a
- <emphasis>write</emphasis>
- system call will copy the data to be written
- from the user process to a kernel buffer while the process waits,
- but will usually return from the system call
- before the kernel buffer is written to the disk.</para>
-
- <para>A system call usually is implemented as a hardware trap that changes the
- CPU's
- execution mode and the current address-space mapping.
- Parameters supplied by users in system calls are validated by the kernel
- before being used.
- Such checking ensures the integrity of the system.
- All parameters passed into the kernel are copied into the
- kernel's address space,
- to ensure that validated parameters are not changed
- as a side effect of the system call.
- System-call results are returned by the kernel,
- either in hardware registers or by their values
- being copied to user-specified memory addresses.
- Like parameters passed into the kernel,
- addresses used for
- the return of results must be validated to ensure that they are
- part of an application's address space.
- If the kernel encounters an error while processing a system call,
- it returns an error code to the user.
- For the
- C programming language, this error code
- is stored in the global variable
- <emphasis>errno</emphasis>,
- and the function that executed the system call returns the value -1.</para>
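The error-return convention just described can be sketched in C as follows. This is a minimal illustration written against the POSIX interface rather than code from the book; the helper name `open_errno` is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to open `path` read-only. On success, close
 * the descriptor and return 0; on failure, return the errno value that
 * the failed system call left behind.
 */
int open_errno(const char *path)
{
    int fd = open(path, O_RDONLY);

    if (fd == -1)
        return errno;   /* the call returned -1 and set errno */

    close(fd);
    return 0;
}
```

A caller should inspect `errno` only after a call has reported failure, because successful calls are not required to reset it.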
-
- <para>User applications and the kernel operate
- independently of each other.
- 4.4BSD does not store I/O control blocks or other
- operating-system-related
- data structures in the application's address space.
- Each user-level application is provided an independent address space in
- which it executes.
- The kernel makes most state changes,
- such as suspending a process while another is running,
- invisible to the processes involved.</para>
- </sect1>
-
- <sect1>
- <title>Process Management</title>
-
- <para>4.4BSD supports a multitasking environment.
- Each task or thread of execution is termed a
- <emphasis>process</emphasis>.
- The
- <emphasis>context</emphasis>
- of a 4.4BSD process consists of user-level state,
- including the contents of its address space
- and the run-time environment, and kernel-level state,
- which includes
- scheduling parameters,
- resource controls,
- and identification information.
- The context includes everything
- used by the kernel in providing services for the process.
- Users can create processes, control the processes' execution,
- and receive notification when the processes' execution status changes.
- Every process is assigned a unique value, termed a
- <emphasis>process identifier</emphasis>
- (PID).
- This value is used by the kernel to identify a process when reporting
- status changes to a user, and by a user when referencing a process
- in a system call.</para>
-
- <para>The kernel creates a process by duplicating the context of another process.
- The new process is termed a
- <emphasis>child process</emphasis>
- of the original
- <emphasis>parent process</emphasis>
- The context duplicated in process creation includes
- both the user-level execution state of the process and
- the process's system state managed by the kernel.
- Important components of the kernel state are described in
- Chapter 4.</para>
-
- <figure id="fig-process-lifecycle">
- <title>Process lifecycle</title>
-
- <mediaobject>
- <imageobject>
- <imagedata fileref="fig1" format="EPS">
- </imageobject>
-
- <textobject>
- <literallayout class="monospaced">+----------------+  wait                            +----------------+
-| parent process |--------------------------------->| parent process |--->
-+----------------+                                  +----------------+
-   |                                                        ^
-   | fork                                                   |
-   V                                                        |
-+----------------+ execve +----------------+  exit  +----------------+
-| child process  |------->| child process  |------->| zombie process |
-+----------------+        +----------------+        +----------------+</literallayout>
- </textobject>
-
- <textobject>
- <phrase>Process-management system calls</phrase>
- </textobject>
- </mediaobject>
- </figure>
-
- <para>The process lifecycle is depicted in <xref linkend="fig-process-lifecycle">.
- A process may create a new process that is a copy of the original
- by using the
- <emphasis>fork</emphasis>
- system call.
- The
- <emphasis>fork</emphasis>
- call returns twice: once in the parent process, where
- the return value is the process identifier of the child,
- and once in the child process, where the return value is 0.
- The parent-child relationship induces a hierarchical structure on
- the set of processes in the system.
- The new process shares all its parent's resources, such as
- file descriptors, signal-handling status, and memory layout.</para>
-
- <para>Although there are occasions when the new process is intended
- to be a copy of the parent,
- the loading and execution of a different program is
- a more useful and typical action.
- A process can overlay itself with the memory image of another program,
- passing to the newly created image a set of parameters,
- using the system call
- <emphasis>execve</emphasis>.
- One parameter is the name of a file whose contents are
- in a format recognized by the system -- either a binary-executable file
- or a file that causes
- the execution of a specified interpreter program to process its contents.</para>
-
- <para>A process may terminate by executing an
- <emphasis>exit</emphasis>
- system call, sending 8 bits of
- exit status to its parent.
- If a process wants to communicate more than a single byte of
- information with its parent,
- it must either set up an interprocess-communication channel
- using pipes or sockets,
- or use an intermediate file.
- Interprocess communication is discussed extensively in
- Chapter 11.</para>
-
- <para>A process can suspend execution until any of its child processes terminates
- using the
- <emphasis>wait</emphasis>
- system call, which returns the
- PID
- and
- exit status of the terminated child process.
- A parent process can arrange to be notified by a signal when
- a child process exits or terminates abnormally.
- Using the
- <emphasis>wait4</emphasis>
- system call, the parent can retrieve information about
- the event that caused termination of the child process
- and about resources consumed by the process during its lifetime.
- If a process is orphaned because its parent exits before it is finished,
- then the kernel arranges for the child's exit status to be passed back
- to a special system process
- <!-- FIXME, the emphasis is wrong -->
- (<emphasis>init</emphasis>:
- see Sections 3.1 and 14.6).</para>
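The fork, exit, and wait steps of the lifecycle can be sketched as one small C function. This is an illustrative sketch using the POSIX calls `fork`, `_exit`, and `waitpid` (a close relative of the `wait` and `wait4` calls named in the text); the function name `child_exit_status` is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that exits immediately with `code`,
 * then wait for it. Returns the exit status seen by the parent, or -1
 * if any step fails.
 */
int child_exit_status(int code)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;              /* fork failed; no child was created */
    if (pid == 0)
        _exit(code);            /* child: fork returned 0 */

    /* parent: fork returned the child's PID */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (!WIFEXITED(status))
        return -1;              /* child did not terminate via exit */
    return WEXITSTATUS(status); /* only the low 8 bits survive */
}
```

Calling `child_exit_status(259)` yields 3 rather than 259, which reflects the 8-bit limit on exit status mentioned above.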
-
- <para>The details of how the kernel creates and destroys processes are given in
- Chapter 5.</para>
-
- <para>Processes are scheduled for execution according to a
- <emphasis>process-priority</emphasis>
- parameter.
- This priority is managed by a kernel-based scheduling algorithm.
- Users can influence the scheduling of a process by specifying
- a parameter
- (<emphasis>nice</emphasis>)
- that weights the overall scheduling priority,
- but are still obligated to share the underlying
- CPU
- resources according to the kernel's scheduling policy.</para>
-
- <sect2>
- <title>Signals</title>
-
- <para>The system defines a set of
- <emphasis>signals</emphasis>
- that may be delivered to a process.
- Signals in 4.4BSD are modeled after hardware interrupts.
- A process may specify a user-level subroutine to be a
- <emphasis>handler</emphasis>
- to which a signal should be delivered.
- When a signal is generated,
- it is blocked from further occurrence while it is being
- <emphasis>caught</emphasis>
- by the handler.
- Catching a signal involves saving the current process context
- and building a new one in which to run the handler.
- The signal is then delivered to the handler, which can either abort
- the process or return to the executing process
- (perhaps after setting a global variable).
- If the handler returns, the signal is unblocked
- and can be generated (and caught) again.</para>
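Catching a signal with a user-level handler might look like the following sketch. It assumes the POSIX `sigaction` interface and uses `SIGUSR1` purely as an example; the names `on_usr1`, `caught`, and `catch_sigusr1` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t caught;   /* global flag set by the handler */

static void on_usr1(int signo)
{
    (void)signo;
    caught = 1;
}

/*
 * Hypothetical helper: install a SIGUSR1 handler, send the signal to
 * ourselves, and report whether the handler ran. The previous
 * disposition is restored before returning.
 */
int catch_sigusr1(void)
{
    struct sigaction sa, old;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);           /* block nothing extra in the handler */
    if (sigaction(SIGUSR1, &sa, &old) == -1)
        return -1;

    caught = 0;
    raise(SIGUSR1);                     /* handler runs before raise() returns */
    sigaction(SIGUSR1, &old, NULL);
    return caught;
}
```

The handler only sets a flag and returns, which is the "return to the executing process after setting a global variable" pattern described above.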
-
- <para>Alternatively, a process may specify that a signal is to be
- <emphasis>ignored</emphasis>,
- or that a default action, as determined by the kernel, is to be taken.
- The default action of certain signals is to terminate the process.
- This termination may be accompanied by creation of a
- <emphasis>core file</emphasis>
- that contains the current memory image of the process for use
- in postmortem debugging.</para>
-
- <para>Some signals cannot be caught or ignored.
- These signals include
- <emphasis>SIGKILL</emphasis>,
- which kills runaway processes,
- and the
- job-control signal
- <emphasis>SIGSTOP</emphasis>.</para>
-
- <para>A process may choose to have signals delivered on a
- special stack so that sophisticated software stack manipulations
- are possible.
- For example, a language supporting
- coroutines needs to provide a stack for each coroutine.
- The language run-time system can allocate these stacks
- by dividing up the single stack provided by 4.4BSD.
- If the kernel does not support a separate signal stack,
- the space allocated for each coroutine must be expanded by the
- amount of space required to catch a signal.</para>
-
- <para>All signals have the same <emphasis>priority</emphasis>.
- If multiple signals are pending simultaneously, the order in which
- signals are delivered to a process is implementation specific.
- Signal handlers execute with the signal that caused their
- invocation to be blocked, but other signals may yet occur.
- Mechanisms are provided so that processes can protect critical sections
- of code against the occurrence of specified signals.</para>
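One way a process can protect a critical section is to block a signal around it, as in the sketch below. It assumes the POSIX calls `sigprocmask` and `sigpending`, uses `SIGUSR2` as an arbitrary example, and the names `on_usr2`, `delivered`, and `deferred_delivery` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t delivered;    /* set when the handler runs */

static void on_usr2(int signo)
{
    (void)signo;
    delivered = 1;
}

/*
 * Hypothetical helper: returns 1 if SIGUSR2 stayed pending while it was
 * blocked and was delivered only after being unblocked, 0 if not, and
 * -1 on error.
 */
int deferred_delivery(void)
{
    struct sigaction sa, old_sa;
    sigset_t block, old_mask, pending;
    int held;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr2;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR2, &sa, &old_sa) == -1)
        return -1;

    sigemptyset(&block);
    sigaddset(&block, SIGUSR2);
    if (sigprocmask(SIG_BLOCK, &block, &old_mask) == -1)
        return -1;

    /* critical section: SIGUSR2 cannot interrupt the code here */
    delivered = 0;
    raise(SIGUSR2);
    sigpending(&pending);
    held = sigismember(&pending, SIGUSR2) == 1 && delivered == 0;

    /* leaving the critical section lets the pending signal through */
    sigprocmask(SIG_UNBLOCK, &block, NULL);
    sigprocmask(SIG_SETMASK, &old_mask, NULL);

    sigaction(SIGUSR2, &old_sa, NULL);
    return held && delivered == 1;
}
```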
-
- <para>The detailed design and implementation of signals is described in
- Section 4.7.</para>
- </sect2>
-
- <sect2>
- <title>Process Groups and Sessions</title>
-
- <para>Processes are organized into
- <emphasis>process groups</emphasis>.
- Process groups are used to control access to terminals
- and to provide a means of distributing signals to collections of
- related processes.
- A process inherits its process group from its parent process.
- Mechanisms are provided by the kernel to allow a process to
- alter its process group or the process group of its descendants.
- Creating a new process group is easy;
- the value of a new process group is ordinarily the
- process identifier of the creating process.</para>
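Creating a process group can be sketched with the POSIX `setpgid` call, where passing 0 for both arguments asks for a new group named after the caller's own PID. The function name `new_group_id_is_pid` is hypothetical, and the sketch runs the experiment in a child so the caller's own group is left alone.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that makes itself the leader of a
 * new process group, and report whether the new group identifier equals
 * the child's PID. Returns 1 if so, 0 if not, -1 on failure.
 */
int new_group_id_is_pid(void)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        if (setpgid(0, 0) == -1)        /* 0, 0: new group, ID = own PID */
            _exit(2);
        _exit(getpgrp() == getpid() ? 1 : 0);
    }

    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    if (WEXITSTATUS(status) == 2)
        return -1;                      /* setpgid failed in the child */
    return WEXITSTATUS(status);
}
```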
-
- <para>The group of processes in a process group is sometimes
- referred to as a
- <emphasis>job</emphasis>
- and is manipulated by high-level system software, such as the shell.
- A common kind of job created by a shell is a
- <emphasis>pipeline</emphasis>
- of several processes connected by pipes, such that the output of the first
- process is the input of the second, the output of the second is the
- input of the third, and so forth.
- The shell creates such a job by forking a
- process for each stage of the pipeline,
- then putting all those processes into a separate process group.</para>
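A heavily reduced version of what a shell does for one pipeline stage is sketched below: it creates a pipe, forks a child, places the child in its own process group, and connects the child's standard output to the pipe. This is an assumption-laden illustration, not shell source code; the function name `read_from_child` is hypothetical, and a real shell would exec a program in the child instead of writing a fixed message.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: a child writes `msg` to its standard output,
 * which is connected to a pipe; the caller reads the other end into
 * `buf`. Returns the number of bytes read, or -1 on failure.
 */
ssize_t read_from_child(const char *msg, char *buf, size_t buflen)
{
    int fds[2];
    pid_t pid;
    ssize_t n, total = 0;

    if (pipe(fds) == -1)
        return -1;

    pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        setpgid(0, 0);                  /* the job gets its own process group */
        close(fds[0]);
        if (fds[1] != STDOUT_FILENO) {
            dup2(fds[1], STDOUT_FILENO);
            close(fds[1]);
        }
        if (write(STDOUT_FILENO, msg, strlen(msg)) < 0)
            _exit(1);
        _exit(0);
    }

    close(fds[1]);                      /* parent keeps only the read end */
    while ((size_t)total < buflen &&
           (n = read(fds[0], buf + total, buflen - (size_t)total)) > 0)
        total += n;
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return total;
}
```

With more stages, the shell would repeat the fork for each stage and pass the first child's PID to `setpgid` so that every stage joins the same group.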
-
- <para>A user process can send a signal to each process in
- a process group, as well as to a single process.
- A process in a specific process group may receive
- software interrupts affecting the group, causing the group to
- suspend or resume execution, or to be interrupted or terminated.</para>
-
- <para>A terminal has a process-group identifier assigned to it.
- This identifier is normally set to the identifier of a process group
- associated with the terminal.
- A job-control shell may create a number of process groups
- associated with the same terminal; the terminal is the
- <emphasis>controlling terminal</emphasis>
- for each process in these groups.
- A process may read from a descriptor for its controlling terminal
- only if the terminal's process-group identifier
- matches that of the process.
- If the identifiers do not match,
- the process will be blocked if it attempts to read from the terminal.
- By changing the process-group identifier of the terminal,
- a shell can arbitrate a terminal among several different jobs.
- This arbitration is called
- <emphasis>job control</emphasis>
- and is described, with process groups, in
- Section 4.8.</para>
-
- <para>Just as a set of related processes can be collected into a process group,
- a set of process groups can be collected into a
- <emphasis>session</emphasis>.
- The main uses for sessions are to create an isolated environment for a
- daemon process and its children,
- and to collect together a user's login shell
- and the jobs that that shell spawns.</para>
- </sect2>
- </sect1>
-
- <sect1>
- <title>Memory Management</title>
-
- <para>Each process has its own private address space.
- The address space is initially divided into three logical segments:
- <emphasis>text</emphasis>,
- <emphasis>data</emphasis>,
- and
- <emphasis>stack</emphasis>.
- The text segment is read-only and contains the machine
- instructions of a program.
- The data and stack segments are both readable and writable.
- The data segment contains the
- initialized and uninitialized data portions of a program, whereas
- the stack segment holds the application's run-time stack.
- On most machines, the stack segment is extended automatically
- by the kernel as the process executes.
- A process can expand or contract its data segment by making a system call,
- whereas a process can change the size of its text segment
- only when the segment's contents are overlaid with data from the
- filesystem, or when debugging takes place.
- The initial contents of the segments of a child process
- are duplicates of the segments of a parent process.</para>
-
- <para>The entire contents of a process address space do not need to be resident
- for a process to execute.
- If a process references a part of its address space that is not
- resident in main memory, the system
- <emphasis>pages</emphasis>
- the necessary information into memory.
- When system resources are scarce, the system uses a two-level
- approach to maintain available resources.
- If a modest amount of memory is available, the system will take
- memory resources away from processes if these resources have not been
- used recently.
- Should there be a severe resource shortage, the system will resort to
- <emphasis>swapping</emphasis>
- the entire context of a process to secondary storage.
- The
- <emphasis>demand paging</emphasis>
- and
- <emphasis>swapping</emphasis>
- done by the system are effectively transparent to processes.
- A process may, however, advise the system
- about expected future memory utilization as a performance aid.</para>
-
- <sect2>
- <title>BSD Memory-Management Design Decisions</title>
-
- <para>The support of large sparse address spaces, mapped files,
- and shared memory was a requirement for 4.2BSD.
- An interface was specified, called
- <emphasis>mmap</emphasis>,
- that allowed unrelated processes to request a shared
- mapping of a file into their address spaces.
- If multiple processes mapped the same file into their address spaces,
- changes to the file's portion of an address space
- by one process would be reflected
- in the area mapped by the other processes, as well as in the file itself.
- Ultimately, 4.2BSD was shipped without the
- <emphasis>mmap</emphasis>
- interface, because of pressure to make other features, such as
- networking, available.</para>
-
- <para>Further development of the
- <emphasis>mmap</emphasis>
- interface continued during the work on 4.3BSD.
- Over 40 companies and research groups participated
- in the discussions leading to the revised architecture
- that was described in the Berkeley Software Architecture Manual
- <xref linkend="biblio-mckusick-1">.
- Several of the companies have implemented the revised interface
- <xref linkend="biblio-gingell">.</para>
-
- <para>Once again, time pressure prevented 4.3BSD from providing an
- implementation of the interface.
- Although the interface could have been built into the existing
- 4.3BSD virtual-memory system,
- the developers decided not to put it in because
- that implementation was nearly 10 years old.
- Furthermore, the original virtual-memory design was based
- on the assumption that computer
- memories were small and expensive, whereas disks were
- locally connected, fast, large, and inexpensive.
- Thus, the virtual-memory system was designed to be frugal
- with its use of memory at the expense of generating extra disk traffic.
- In addition, the
- 4.3BSD implementation was riddled with
- VAX
- memory-management hardware dependencies that impeded its portability
- to other computer architectures.
- Finally, the virtual-memory system was not designed
- to support the tightly coupled
- multiprocessors that are becoming
- increasingly common and important today.</para>
-
- <para>Attempts to improve the old implementation incrementally
- seemed doomed to failure.
- A completely new design,
- on the other hand,
- could take advantage of large memories,
- conserve disk transfers,
- and have the potential to run on multiprocessors.
- Consequently, the virtual-memory system was completely replaced in 4.4BSD.
- The 4.4BSD virtual-memory system
- is based on the Mach 2.0 VM system
- <xref linkend="biblio-tevanian">,
- with updates from Mach 2.5 and Mach 3.0.
- It features
- efficient support for sharing,
- a clean separation of machine-independent and machine-dependent features,
- as well as (currently unused) multiprocessor support.
- Processes can map files anywhere in their address space.
- They can share parts of their address space by
- doing a shared mapping of the same file.
- Changes made by one process are visible in the address space of
- the other process, and also are written back to the file itself.
- Processes can also request private mappings of a file, which prevents
- any changes that they make from being visible to other processes
- mapping the file or being written back to the file itself.</para>
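-
- <para>The difference between shared and private mappings can be
- illustrated with a minimal sketch in C.
- It assumes a POSIX-style
- <emphasis>mmap</emphasis>
- call as found on later BSD-derived systems;
- the function name
- <emphasis>mapping_demo</emphasis>
- and the temporary-file path are hypothetical names chosen for this example.
- The same file is mapped twice, one byte is stored through the first
- mapping, and the byte seen through the second (shared) mapping is returned.</para>
-
```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map one temporary file twice. The "writer" mapping uses the caller's
 * flags (MAP_SHARED or MAP_PRIVATE); the "reader" mapping is always shared.
 * Returns the first byte seen through the reader after the writer stores
 * 'Z' there, or -1 on error.
 */
int mapping_demo(int flags)
{
    char path[] = "/tmp/mmap_demo_XXXXXX";   /* hypothetical location */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                 /* file persists while fd/mappings exist */

    if (write(fd, "aaaa", 4) != 4) {
        close(fd);
        return -1;
    }

    char *writer = mmap(NULL, 4, PROT_READ | PROT_WRITE, flags, fd, 0);
    char *reader = mmap(NULL, 4, PROT_READ, MAP_SHARED, fd, 0);
    int result = -1;

    if (writer != MAP_FAILED && reader != MAP_FAILED) {
        writer[0] = 'Z';          /* shared: changes the file's page;
                                     private: changes a copy only */
        result = reader[0];
    }

    if (writer != MAP_FAILED)
        munmap(writer, 4);
    if (reader != MAP_FAILED)
        munmap(reader, 4);
    close(fd);
    return result;
}
```
-
- <para>Called with MAP_SHARED, the sketch should report the new byte;
- called with MAP_PRIVATE, it should report the original one, because the
- store went to a private copy of the page.</para>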
-
- <para>Another issue with the virtual-memory system is the way that
- information is passed into the kernel when a system call is made.
- 4.4BSD always copies data from the process address space
- into a buffer in the kernel.
- For read or write operations
- that are transferring large quantities of data,
- doing the copy can be time consuming.
- An alternative to doing the copying is to remap the
- process memory into the kernel.
- The 4.4BSD kernel always copies the data for several reasons:</para>
-
- <itemizedlist>
- <listitem>
- <para>Often, the user data are not page aligned and are not a multiple of
- the hardware page length.</para>
- </listitem>
-
- <listitem>
- <para>If the page is taken away from the process,
- it will no longer be able to reference that page.
- Some programs depend on the data remaining in the
- buffer even after those data have been written.</para>
- </listitem>
-
- <listitem>
- <para>If the process is allowed to keep a copy of the page
- (as it is in current 4.4BSD semantics),
- the page must be made
- <emphasis>copy-on-write</emphasis>.
- A copy-on-write page is one that is protected against being written
- by being made read-only.
- If the process attempts to modify the page,
- the kernel gets a write fault.
- The kernel then makes a copy of the page that the process can modify.
- Unfortunately, the typical process will immediately
- try to write new data to its output buffer,
- forcing the data to be copied anyway.</para>
- </listitem>
-
- <listitem>
- <para>When pages are remapped to new virtual-memory addresses,
- most memory-management hardware requires that the hardware
- address-translation cache be purged selectively.
- The cache purges are often slow.
- The net effect is that remapping is slower than
- copying for blocks of data smaller than 4 to 8 Kbyte.</para>
- </listitem>
- </itemizedlist>
-
- <para>The biggest incentives for memory mapping are the needs for
- accessing big files and for passing large quantities of data
- between processes.
- The
- <emphasis>mmap</emphasis>
- interface provides a way for both of these tasks
- to be done without copying.</para>
- </sect2>
-
- <sect2>
- <title>Memory Management Inside the Kernel</title>
-
- <para>The kernel often does allocations of memory that are
- needed for only the duration of a single system call.
- In a user process, such short-term
- memory would be allocated on the run-time stack.
- Because the kernel has a limited run-time stack,
- it is not feasible to allocate even moderate-sized blocks of memory on it.
- Consequently, such memory must be allocated
- through a more dynamic mechanism.
- For example,
- when the system must translate a pathname,
- it must allocate a 1-Kbyte buffer to hold the name.
- Other blocks of memory must be more persistent than a single system call,
- and thus could not be allocated on the stack even if there were space.
- An example is protocol-control blocks that remain throughout
- the duration of a network connection.</para>
-
- <para>Demands for dynamic memory allocation in the kernel have increased
- as more services have been added.
- A generalized memory allocator reduces the complexity
- of writing code inside the kernel.
- Thus, the 4.4BSD kernel has a single memory allocator that can be
- used by any part of the system.
- It has an interface similar to the C library routines
- <emphasis>malloc</emphasis>
- and
- <emphasis>free</emphasis>
- that provide memory allocation to application programs
- <xref linkend="biblio-mckusick-2">.
- Like the C library interface,
- the allocation routine takes a parameter specifying the
- size of memory that is needed.
- The range of sizes for memory requests is not constrained;
- however, physical memory is allocated and is not paged.
- The free routine takes a pointer to the storage being freed,
- but does not require the size
- of the piece of memory being freed.</para>
- </sect2>
- </sect1>
-
- <sect1>
- <title>I/O System</title>
-
- <para>The basic model of the UNIX
- I/O system is a sequence of bytes
- that can be accessed either randomly or sequentially.
- There are no
- <emphasis>access methods</emphasis>
- and no
- <emphasis>control blocks</emphasis>
- in a typical UNIX user process.</para>
-
- <para>Different programs expect various levels of structure,
- but the kernel does not impose structure on I/O.
- For instance, the convention for text files is lines of
- ASCII
- characters separated by a single newline character
- (the
- ASCII
- line-feed character),
- but the kernel knows nothing about this convention.
- For the purposes of most programs,
- the model is further simplified to being a stream of data bytes,
- or an
- <emphasis>I/O stream</emphasis>.
- It is this single common data form that makes the
- characteristic UNIX tool-based approach work
- <xref linkend="biblio-kernighan">.
- An I/O stream from one program can be fed as input
- to almost any other program.
- (This kind of traditional UNIX
- I/O stream should not be confused with the
- Eighth Edition stream I/O system or with the
- System V, Release 3
- STREAMS,
- both of which can be accessed as traditional I/O streams.)</para>
-
- <sect2>
- <title>Descriptors and I/O</title>
-
- <para>UNIX processes use
- <emphasis>descriptors</emphasis>
- to reference I/O streams.
- Descriptors are small unsigned integers obtained from the
- <emphasis>open</emphasis>
- and
- <emphasis>socket</emphasis>
- system calls.
- The
- <emphasis>open</emphasis>
- system call takes as arguments the name of a file and
- a permission mode to
- specify whether the file should be open for reading or for writing,
- or for both.
- This system call also can be used to create a new, empty file.
- A
- <emphasis>read</emphasis>
- or
- <emphasis>write</emphasis>
- system call can be applied to a descriptor to transfer data.
- The
- <emphasis>close</emphasis>
- system call can be used to deallocate any descriptor.</para>
-
- <para>Descriptors represent underlying objects supported by the kernel,
- and are created by system calls specific to the type of object.
- In 4.4BSD, three kinds of objects can be represented by descriptors:
- files, pipes, and sockets.</para>
-
- <itemizedlist>
- <listitem>
- <para>A
- <emphasis>file</emphasis>
- is a linear array of bytes with at least one name.
- A file exists until all its names are deleted explicitly
- and no process holds a descriptor for it.
- A process acquires a descriptor for a file
- by opening that file's name with the
- <emphasis>open</emphasis>
- system call.
- I/O devices are accessed as files.</para>
- </listitem>
-
- <listitem>
- <para>A
- <emphasis>pipe</emphasis>
- is a linear array of bytes, as is a file, but it is used solely
- as an I/O stream, and it is unidirectional.
- It also has no name,
- and thus cannot be opened with
- <emphasis>open</emphasis>.
- Instead, it is created by the
- <emphasis>pipe</emphasis>
- system call, which returns two descriptors,
- one of which accepts input that is sent to the other descriptor reliably,
- without duplication, and in order.
- The system also supports a named pipe or
- FIFO.
- A
- FIFO
- has properties identical to a pipe, except that it appears
- in the filesystem;
- thus, it can be opened using the
- <emphasis>open</emphasis>
- system call.
- Two processes that wish to communicate each open the
- FIFO:
- One opens it for reading, the other for writing.</para>
- </listitem>
-
- <listitem>
- <para>A
- <emphasis>socket</emphasis>
- is a transient object that is used for
- interprocess communication;
- it exists only as long as some process holds a descriptor
- referring to it.
- A socket is created by the
- <emphasis>socket</emphasis>
- system call, which returns a descriptor for it.
- There are different kinds of sockets that support various communication
- semantics, such as reliable delivery of data, preservation of
- message ordering, and preservation of message boundaries.</para>
- </listitem>
- </itemizedlist>
-
- <para>In systems before 4.2BSD, pipes were implemented using the filesystem;
- when sockets were introduced in 4.2BSD,
- pipes were reimplemented as sockets.</para>
-
- <para>The kernel keeps for each process a
- <emphasis>descriptor table</emphasis>,
- which is a table that the kernel uses
- to translate the external representation
- of a descriptor into an internal representation.
- (The descriptor is merely an index into this table.)
- The descriptor table of a process is inherited from that process's parent,
- and thus access to the objects
- to which the descriptors refer also is inherited.
- The main ways that a process can obtain a descriptor are by
- opening or creation of an object,
- and by inheritance from the parent process.
- In addition, socket
- IPC
- allows passing of descriptors in messages between unrelated processes
- on the same machine.</para>
-
- <para>Every valid descriptor has an associated
- <emphasis>file offset</emphasis>
- in bytes from the beginning of the object.
- Read and write operations start at this offset, which is
- updated after each data transfer.
- For objects that permit random access,
- the file offset also may be set with the
- <emphasis>lseek</emphasis>
- system call.
- Ordinary files permit random access, and some devices do, as well.
- Pipes and sockets do not.</para>
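-
- <para>The behavior of the file offset can be sketched in C as follows.
- This is an illustrative example only, assuming the POSIX declarations of
- <emphasis>lseek</emphasis>,
- <emphasis>read</emphasis>,
- and
- <emphasis>write</emphasis>;
- the function names
- <emphasis>offset_demo</emphasis>
- and
- <emphasis>pipe_seek_fails</emphasis>
- are hypothetical.</para>
-
```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write six bytes to a temporary file, seek back to byte 2, and read
 * three bytes into out (NUL-terminated). Returns the file offset after
 * the read, or -1 on error.
 */
long offset_demo(char out[4])
{
    char path[] = "/tmp/lseek_demo_XXXXXX";  /* hypothetical location */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);

    long result = -1;
    if (write(fd, "abcdef", 6) == 6 &&       /* offset advances 0 -> 6 */
        lseek(fd, 0, SEEK_CUR) == 6 &&       /* query offset, no movement */
        lseek(fd, 2, SEEK_SET) == 2 &&       /* reposition to byte 2 */
        read(fd, out, 3) == 3) {             /* offset advances 2 -> 5 */
        out[3] = '\0';
        result = (long)lseek(fd, 0, SEEK_CUR);
    }
    close(fd);
    return result;
}

/* Returns 1 if lseek on a pipe fails with ESPIPE, 0 if not, -1 on error. */
int pipe_seek_fails(void)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;
    int failed = (lseek(fds[0], 0, SEEK_SET) == (off_t)-1 &&
                  errno == ESPIPE);
    close(fds[0]);
    close(fds[1]);
    return failed;
}
```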
-
- <para>When a process terminates, the kernel
- reclaims all the descriptors that were in use by that process.
- If the process was holding the final reference to an object,
- the object's manager is notified so that it can do any
- necessary cleanup actions, such as final deletion of a file
- or deallocation of a socket.</para>
- </sect2>
-
- <sect2>
- <title>Descriptor Management</title>
-
- <para>Most processes expect three descriptors to be open already
- when they start running.
- These descriptors are 0, 1, and 2, more commonly known as
- <emphasis>standard input</emphasis>,
- <emphasis>standard output</emphasis>,
- and
- <emphasis>standard error</emphasis>,
- respectively.
- Usually, all three are associated with the user's terminal
- by the login process
- (see
- Section 14.6)
- and are inherited through
- <emphasis>fork</emphasis>
- and
- <emphasis>exec</emphasis>
- by processes run by the user.
- Thus, a program can read what the user types by reading standard
- input, and the program can send output to the user's screen by
- writing to standard output.
- The standard error descriptor also is open for writing and is
- used for error output, whereas standard output is used for ordinary output.</para>
-
- <para>These (and other) descriptors can be mapped to objects other than
- the terminal;
- such mapping is called
- <emphasis>I/O redirection</emphasis>,
- and all the standard shells permit users to do it.
- The shell can direct the output of a program to a file
- by closing descriptor 1 (standard output) and opening
- the desired output file to produce a new descriptor 1.
- It can similarly redirect standard input to come from a file
- by closing descriptor 0 and opening the file.</para>
-
- <para>Pipes allow the output of one program to be input to another program
- without rewriting or even relinking of either program.
- Instead of descriptor 1 (standard output)
- of the source program being set up to write to the terminal,
- it is set up to be the input descriptor of a pipe.
- Similarly, descriptor 0 (standard input)
- of the sink program is set up to reference the output of the pipe,
- instead of the terminal keyboard.
- The resulting set of two processes and the connecting pipe is known as a
- <emphasis>pipeline</emphasis>.
- Pipelines can be arbitrarily long series of processes connected by pipes.</para>
-
- <para>The
- <emphasis>open</emphasis>,
- <emphasis>pipe</emphasis>,
- and
- <emphasis>socket</emphasis>
- system calls produce new descriptors with the lowest unused number
- usable for a descriptor.
- For pipelines to work,
- some mechanism must be provided to map such descriptors into 0 and 1.
- The
- <emphasis>dup</emphasis>
- system call creates a copy of a descriptor that
- points to the same file-table entry.
- The new descriptor is also the lowest unused one,
- but if the desired descriptor is closed first,
- <emphasis>dup</emphasis>
- can be used to do the desired mapping.
- Care is required, however: If descriptor 1 is desired,
- and descriptor 0 happens also to have been closed, descriptor 0
- will be the result.
- To avoid this problem, the system provides the
- <emphasis>dup2</emphasis>
- system call;
- it is like
- <emphasis>dup</emphasis>,
- but it takes an additional argument specifying
- the number of the desired descriptor
- (if the desired descriptor was already open,
- <emphasis>dup2</emphasis>
- closes it before reusing it).</para>
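-
- <para>The use of
- <emphasis>pipe</emphasis>
- and
- <emphasis>dup2</emphasis>
- to redirect descriptor 1 can be sketched as follows.
- This is a minimal illustration, not the code used by any particular shell;
- the function name
- <emphasis>capture_child_stdout</emphasis>
- is hypothetical.
- A child process has its standard output replaced by the write end of a
- pipe, and the parent reads what the child wrote.</para>
-
```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child whose descriptor 1 refers to the write end of a pipe.
 * The child writes msg to descriptor 1; the parent collects the bytes
 * in buf. Returns the number of bytes read, or -1 on error.
 */
ssize_t capture_child_stdout(const char *msg, char *buf, size_t buflen)
{
    int fds[2];                     /* fds[0]: read end, fds[1]: write end */
    if (pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {                 /* child: the source program */
        close(fds[0]);
        if (fds[1] != STDOUT_FILENO) {
            /* dup2 closes descriptor 1 if open, then reuses that number */
            if (dup2(fds[1], STDOUT_FILENO) < 0)
                _exit(1);
            close(fds[1]);
        }
        ssize_t w = write(STDOUT_FILENO, msg, strlen(msg));
        _exit(w < 0);
    }

    close(fds[1]);                  /* parent must close it to see EOF */
    ssize_t total = 0, n;
    while ((size_t)total < buflen &&
           (n = read(fds[0], buf + total, buflen - (size_t)total)) > 0)
        total += n;
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return total;
}
```
-
- <para>Because
- <emphasis>dup2</emphasis>
- names the target descriptor explicitly, the sketch does not depend on
- which lower-numbered descriptors happen to be closed.</para>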
- </sect2>
-
- <sect2>
- <title>Devices</title>
-
- <para>Hardware devices have filenames, and may be
- accessed by the user via the same system calls used for regular files.
- The kernel can distinguish a
- <emphasis>device special file</emphasis>
- or
- <emphasis>special file</emphasis>,
- and can determine to what device it refers,
- but most processes do not need to make this determination.
- Terminals, printers, and tape drives are all accessed as though they
- were streams of bytes, like 4.4BSD disk files.
- Thus, device dependencies and peculiarities are kept in the kernel
- as much as possible, and even in the kernel most of them are segregated
- in the device drivers.</para>
-
- <para>Hardware devices can be categorized as either
- <emphasis>structured</emphasis>
- or
- <emphasis>unstructured</emphasis>;
- they are known as
- <emphasis>block</emphasis>
- or
- <emphasis>character</emphasis>
- devices, respectively.
- Processes typically access devices through
- <emphasis>special files</emphasis>
- in the filesystem.
- I/O operations to these files are handled by
- kernel-resident software modules termed
- <emphasis>device drivers</emphasis>.
- Most network-communication hardware devices are accessible through only
- the interprocess-communication facilities,
- and do not have special files in the filesystem name space,
- because the
- <emphasis>raw-socket</emphasis>
- interface provides a more natural interface than does a special file.</para>
-
- <para>Structured or block devices are typified by disks and magnetic tapes,
- and include most random-access devices.
- The kernel supports read-modify-write-type buffering actions
- on block-oriented structured devices to allow the latter
- to be read and written in a
- totally random byte-addressed fashion, like regular files.
- Filesystems are created on block devices.</para>
-
- <para>Unstructured devices are those devices that do not support a block
- structure.
- Familiar unstructured devices are communication lines, raster
- plotters, and unbuffered magnetic tapes and disks.
- Unstructured devices typically support large block I/O transfers.</para>
-
- <para>Unstructured devices are called
- <emphasis>character devices</emphasis>
- because the first of these to be implemented were terminal device drivers.
- The kernel interface to the driver for these devices proved convenient
- for other devices that were not block structured.</para>
-
- <para>Device special files are created by the
- <emphasis>mknod</emphasis>
- system call.
- There is an additional system call,
- <emphasis>ioctl</emphasis>,
- for manipulating the underlying device parameters of special files.
- The operations that can be done differ for each device.
- This system call allows the special characteristics of devices to
- be accessed, rather than overloading the semantics of other system calls.
- For example, there is an
- <emphasis>ioctl</emphasis>
- on a tape drive to write an end-of-tape mark,
- instead of there being a special or modified version of
- <emphasis>write</emphasis>.</para>
- </sect2>
-
- <sect2>
- <title>Socket IPC</title>
-
- <para>The 4.2BSD kernel introduced an
- IPC
- mechanism more flexible than pipes, based on
- <emphasis>sockets</emphasis>.
- A socket is an endpoint of communication referred to by
- a descriptor, just like a file or a pipe.
- Two processes can each create a socket, and then connect those
- two endpoints to produce a reliable byte stream.
- Once connected, the descriptors for the sockets can be read or written
- by processes, just as the latter would do with a pipe.
- The transparency of sockets allows the kernel to redirect the output
- of one process to the input of another process residing on another machine.
- A major difference between pipes and sockets is that
- pipes require a common parent process to set up the
- communications channel.
- A connection between sockets can be set up by two unrelated processes,
- possibly residing on different machines.</para>
-
- <para>System V provides local interprocess communication through
- FIFOs
- (also known as
- <emphasis>named pipes</emphasis>).
- FIFOs
- appear as an object in the filesystem that unrelated
- processes can open and send data through in the same
- way as they would communicate through a pipe.
- Thus,
- FIFOs
- do not require a common parent to set them up;
- they can be connected after a pair of processes are up and running.
- Unlike sockets,
- FIFOs
- can be used on only a local machine;
- they cannot be used to communicate between processes on different machines.
- FIFOs
- are implemented in 4.4BSD only because they are required by the
- POSIX.1
- standard.
- Their functionality is a subset of the socket interface.</para>
-
- <para>The socket mechanism requires extensions to the traditional UNIX
- I/O system calls to provide the associated naming and connection semantics.
- Rather than overloading the existing interface,
- the developers used the existing interfaces to the extent that
- the latter worked without being changed,
- and designed new interfaces to handle the added semantics.
- The
- <emphasis>read</emphasis>
- and
- <emphasis>write</emphasis>
- system calls were used for byte-stream type connections,
- but six new system calls were added
- to allow sending and receiving addressed messages
- such as network datagrams.
- The system calls for writing messages include
- <emphasis>send</emphasis>,
- <emphasis>sendto</emphasis>,
- and
- <emphasis>sendmsg</emphasis>.
- The system calls for reading messages include
- <emphasis>recv</emphasis>,
- <emphasis>recvfrom</emphasis>,
- and
- <emphasis>recvmsg</emphasis>.
- In retrospect, the first two in each class are special cases of the others;
- <emphasis>recvfrom</emphasis>
- and
- <emphasis>sendto</emphasis>
- probably should have been added as library interfaces to
- <emphasis>recvmsg</emphasis>
- and
- <emphasis>sendmsg</emphasis>,
- respectively.</para>
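-
- <para>The preservation of message boundaries by datagram sockets can be
- sketched with a connected pair of local sockets.
- The example assumes the
- <emphasis>socketpair</emphasis>
- system call and the AF_UNIX datagram socket type, which are not
- discussed in this section;
- the function name
- <emphasis>datagram_boundaries</emphasis>
- is hypothetical.</para>
-
```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Send two datagrams of 2 and 3 bytes over a local socket pair and
 * record the length returned by each recv. Returns 0 on success,
 * -1 on error.
 */
int datagram_boundaries(ssize_t lens[2])
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;

    char buf[16];
    int ok = send(sv[0], "ab", 2, 0) == 2 &&
             send(sv[0], "cde", 3, 0) == 3;
    if (ok) {
        /* Each recv returns one whole message, even though the
           buffer could hold both. */
        lens[0] = recv(sv[1], buf, sizeof buf, 0);
        lens[1] = recv(sv[1], buf, sizeof buf, 0);
    }

    close(sv[0]);
    close(sv[1]);
    return ok ? 0 : -1;
}
```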
- </sect2>
-
- <sect2>
- <title>Scatter/Gather I/O</title>
-
- <para>In addition to the traditional
- <emphasis>read</emphasis>
- and
- <emphasis>write</emphasis>
- system calls, 4.2BSD introduced the ability to do scatter/gather I/O.
- Scatter input uses the
- <emphasis>readv</emphasis>
- system call to allow a single read
- to be placed in several different buffers.
- Conversely, the
- <emphasis>writev</emphasis>
- system call allows several different buffers
- to be written in a single atomic write.
- Instead of passing a single buffer and length parameter, as is done with
- <emphasis>read</emphasis>
- and
- <emphasis>write</emphasis>,
- the process passes in a pointer to an array of buffers and lengths,
- along with a count describing the size of the array.</para>
-
- <para>This facility allows buffers in different parts of a process
- address space to be written atomically, without the
- need to copy them to a single contiguous buffer.
- Atomic writes are necessary in the case where the underlying
- abstraction is record based, such as tape drives that output a
- tape block on each write request.
- It is also convenient to be able to read a single request into
- several different buffers (such as a record header into one place
- and the data into another).
- Although an application can simulate the ability to scatter data
- by reading the data into a large buffer and then copying the pieces
- to their intended destinations,
- the cost of memory-to-memory copying in such cases often
- would more than double the running time of the affected application.</para>
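-
- <para>A minimal sketch of scatter/gather I/O in C follows.
- It gathers a header and a body from separate buffers into one
- <emphasis>writev</emphasis>
- call on a pipe, then scatters them back into separate buffers with one
- <emphasis>readv</emphasis>
- call.
- The function name
- <emphasis>scatter_gather_demo</emphasis>
- and the buffer contents are hypothetical, chosen only for illustration.</para>
-
```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Write a 4-byte header and an 8-byte body with a single writev, then
 * read them into two separate buffers with a single readv.
 * Returns 0 on success, -1 on error.
 */
int scatter_gather_demo(char hdr_out[5], char body_out[9])
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    char hdr[] = "HEAD";
    char body[] = "payload!";

    /* Array of (buffer, length) pairs, plus a count, replaces the
       single buffer and length taken by read and write. */
    struct iovec out[2] = {
        { .iov_base = hdr,  .iov_len = 4 },
        { .iov_base = body, .iov_len = 8 },
    };
    struct iovec in[2] = {
        { .iov_base = hdr_out,  .iov_len = 4 },
        { .iov_base = body_out, .iov_len = 8 },
    };

    int ok = writev(fds[1], out, 2) == 12 &&
             readv(fds[0], in, 2) == 12;
    hdr_out[4] = '\0';
    body_out[8] = '\0';

    close(fds[0]);
    close(fds[1]);
    return ok ? 0 : -1;
}
```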
-
- <para>Just as
- <emphasis>send</emphasis>
- and
- <emphasis>recv</emphasis>
- could have been implemented as library interfaces to
- <emphasis>sendto</emphasis>
- and
- <emphasis>recvfrom</emphasis>,
- it also would have been possible to simulate
- <emphasis>read</emphasis>
- with
- <emphasis>readv</emphasis>
- and
- <emphasis>write</emphasis>
- with
- <emphasis>writev</emphasis>.
- However,
- <emphasis>read</emphasis>
- and
- <emphasis>write</emphasis>
- are used so much more frequently that the added cost
- of simulating them would not have been worthwhile.</para>
- </sect2>
-
- <sect2>
- <title>Multiple Filesystem Support</title>
-
- <para>With the expansion of network computing,
- it became desirable to support both local and remote filesystems.
- To simplify the support of multiple filesystems,
- the developers added a new virtual node or
- <emphasis>vnode</emphasis>
- interface to the kernel.
- The set of operations exported from the vnode interface
- appears much like the filesystem operations previously supported
- by the local filesystem.
- However, they may be supported by a wide range of filesystem types:</para>
-
- <itemizedlist>
- <listitem>
- <para>Local disk-based filesystems</para>
- </listitem>
-
- <listitem>
- <para>Files imported using a variety of remote filesystem protocols</para>
- </listitem>
-
- <listitem>
- <para>Read-only
- CD-ROM
- filesystems</para>
- </listitem>
-
- <listitem>
- <para>Filesystems providing special-purpose interfaces -- for example, the
- <filename>/proc</filename>
- filesystem</para>
- </listitem>
- </itemizedlist>
-
- <para>A few variants of 4.4BSD, such as FreeBSD,
- allow filesystems to be loaded dynamically
- when the filesystems are first referenced by the
- <emphasis>mount</emphasis>
- system call.
- The vnode interface is described in
- Section 6.5;
- its ancillary support routines are described in
- Section 6.6;
- several of the special-purpose filesystems are described in
- Section 6.7.</para>
- </sect2>
- </sect1>
-
- <sect1>
- <title>Filesystems</title>
-
- <para>A regular file is a linear array of bytes,
- and can be read and written starting at any byte in the file.
- The kernel distinguishes no record boundaries in regular files, although
- many programs recognize line-feed characters as distinguishing
- the ends of lines, and other programs may impose other structure.
- No system-related information about a file is kept in the file itself,
- but the filesystem stores a small amount of ownership, protection,
- and usage information with each file.</para>
-
- <para>A
- <emphasis>filename</emphasis>
- component is a string of up to 255 characters.
- These filenames are stored in a type of file called a
- <emphasis>directory</emphasis>.
- The information in a directory about a file is called a
- <emphasis>directory entry</emphasis>
- and includes, in addition to the filename,
- a pointer to the file itself.
- Directory entries may refer to other directories, as well as to plain files.
- A hierarchy of directories and files is thus formed, and is called a
- <emphasis>filesystem</emphasis>.</para>
-
- <figure id="fig-small-fs">
- <title>A small filesystem</title>
-
- <mediaobject>
- <imageobject>
- <imagedata fileref="fig2" format="EPS">
- </imageobject>
-
- <textobject>
- <literallayout class="monospaced"> +-------+
- | |
- +-------+
- / \
- usr / \ vmunix
- |/ \|
- +-------+ +-------+
- | | | |
- +-------+ +-------+
- / | \
- staff / | \ bin
- |/ | tmp \|
- +-------+ V +-------+
- | | +-------+ | |
- +-------+ | | +-------+
- / | \ +-------+ / | \
- mckusick / | \| |/ | \ ls
- |/ | karels | vi \|
-+-------+ V V +-------+
-| | +-------+ +-------+ | |
-+-------+ | | | | +-------+
- +-------+ +-------+</literallayout>
- </textobject>
-
- <textobject>
- <phrase>A small filesystem tree</phrase>
- </textobject>
- </mediaobject>
- </figure>
-
- <para>A small filesystem is shown in <xref linkend="fig-small-fs">.
- Directories may contain subdirectories, and there is no inherent
- limitation to the depth with which directory nesting may occur.
- To protect the consistency of the filesystem, the kernel
- does not permit processes to write directly into directories.
- A filesystem may include not only plain files and directories,
- but also references to other objects, such as devices and sockets.</para>
-
- <para>The filesystem forms a tree, the beginning of which is the
- <emphasis>root directory</emphasis>,
- sometimes referred to by the name
- <emphasis>slash</emphasis>,
- spelled with a single solidus character (/).
- The root directory contains files; in our example in Fig. 2.2, it contains
- <filename>vmunix</filename>,
- a copy of the kernel-executable object file.
- It also contains directories; in this example, it contains the
- <filename>usr</filename>
- directory.
- Within the
- <filename>usr</filename>
- directory is the
- <filename>bin</filename>
- directory, which mostly contains executable object code of programs,
- such as the files
- <!-- FIXME -->
- <filename>ls</filename>
- and
- <filename>vi</filename>.</para>
-
- <para>A process identifies a file by specifying that file's
- <emphasis>pathname</emphasis>,
- which is a string composed of zero or more
- filenames separated by slash (/) characters.
- The kernel associates two directories with each process for use
- in interpreting pathnames.
- A process's
- <emphasis>root directory</emphasis>
- is the topmost point in the filesystem that the process can access;
- it is ordinarily set to the root directory of the entire filesystem.
- A pathname beginning with a slash is called an
- <emphasis>absolute pathname</emphasis>,
- and is interpreted by the kernel starting with the process's root directory.</para>
-
- <para>A pathname that does not begin with a slash is called a
- <emphasis>relative pathname</emphasis>,
- and is interpreted relative to the
- <emphasis>current working directory</emphasis>
- of the process.
- (This directory also is known by the shorter names
- <emphasis>current directory</emphasis>
- or
- <emphasis>working directory</emphasis>.)
- The current directory itself may be referred to directly by the name
- <emphasis>dot</emphasis>,
- spelled with a single period
- (<filename>.</filename>).
- The filename
- <emphasis>dot-dot</emphasis>
- (<filename>..</filename>)
- refers to a directory's parent directory.
- The root directory is its own parent.</para>
-
- <para>A process may set its root directory with the
- <emphasis>chroot</emphasis>
- system call,
- and its current directory with the
- <emphasis>chdir</emphasis>
- system call.
- Any process may do
- <emphasis>chdir</emphasis>
- at any time, but
- <emphasis>chroot</emphasis>
- is permitted only to a process with superuser privileges.
- <emphasis>Chroot</emphasis>
- is normally used to set up restricted access to the system.</para>
-
- <para>Using the filesystem shown in Fig. 2.2,
- if a process has the root of the filesystem as its root directory, and has
- <filename>/usr</filename>
- as its current directory, it can refer to the file
- <filename>vi</filename>
- either from the root with the absolute pathname
- <filename>/usr/bin/vi</filename>,
- or from its current directory with the relative pathname
- <filename>bin/vi</filename>.</para>
-
- <para>System utilities and databases are kept in certain well-known directories.
- Part of the well-defined hierarchy includes a directory that contains the
- <emphasis>home directory</emphasis>
- for each user -- for example,
- <filename>/usr/staff/mckusick</filename>
- and
- <filename>/usr/staff/karels</filename>
- in Fig. 2.2.
- When users log in,
- the current working directory of their shell is set to the
- home directory.
- Within their home directories,
- users can create directories as easily as they can regular files.
- Thus, a user can build arbitrarily complex subhierarchies.</para>
-
- <para>The user usually knows of only one filesystem, but the system may
- know that this one virtual filesystem
- is really composed of several physical
- filesystems, each on a different device.
- A physical filesystem may not span multiple hardware devices.
- Since most physical disk devices are divided into several logical devices,
- there may be more than one filesystem per physical device,
- but there will be no more than one per logical device.
- One filesystem -- the filesystem that
- anchors all absolute pathnames -- is called the
- <emphasis>root filesystem</emphasis>,
- and is always available.
- Others may be mounted;
- that is, they may be integrated into the
- directory hierarchy of the root filesystem.
- References to a directory that has a filesystem mounted on it
- are converted transparently by the kernel
- into references to the root directory of the mounted filesystem.</para>
-
- <para>The
- <emphasis>link</emphasis>
- system call takes the name of an existing file and another name
- to create for that file.
- After a successful
- <emphasis>link</emphasis>,
- the file can be accessed by either filename.
- A filename can be removed with the
- <emphasis>unlink</emphasis>
- system call.
- When the final name for a file is removed (and the final process that
- has the file open closes it), the file is deleted.</para>
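-
- <para>The behavior of the two calls can be sketched in a few lines.
- This is a minimal illustration, not part of the original text;
- it assumes a Python interpreter on a POSIX system, where
- <literal>os.link</literal> and <literal>os.unlink</literal>
- wrap the system calls of the same names.
- The file names and the helper <literal>link_demo</literal> are hypothetical.</para>
-
```python
import os
import tempfile


def link_demo():
    """Create a file, give it a second name, then remove the first name."""
    with tempfile.TemporaryDirectory() as d:
        first = os.path.join(d, "first")
        second = os.path.join(d, "second")
        with open(first, "w") as f:
            f.write("hello\n")

        os.link(first, second)          # add a second name for the same file
        links_after_link = os.stat(first).st_nlink

        os.unlink(first)                # remove one name; the file itself survives
        with open(second) as f:
            data = f.read()
        links_after_unlink = os.stat(second).st_nlink

        return links_after_link, links_after_unlink, data


result = link_demo()
```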
-
- <para>Files are organized hierarchically in
- <emphasis>directories</emphasis>.
- A directory is a type of file,
- but, in contrast to regular files,
- a directory has a structure imposed on it by the system.
- A process can read a directory as it would an ordinary file,
- but only the kernel is permitted to modify a directory.
- Directories are created by the
- <emphasis>mkdir</emphasis>
- system call and are removed by the
- <emphasis>rmdir</emphasis>
- system call.
- Before 4.2BSD, the
- <emphasis>mkdir</emphasis>
- and
- <emphasis>rmdir</emphasis>
- system calls were implemented by a series of
- <emphasis>link</emphasis>
- and
- <emphasis>unlink</emphasis>
- system calls.
- There were three reasons for adding system calls
- explicitly to create and delete directories:</para>
-
- <orderedlist>
- <listitem>
- <para>The operation could be made atomic.
- If the system crashed,
- the directory would not be left half-constructed,
- as could happen when a series of link operations was used.</para>
- </listitem>
- <listitem>
- <para>When a
- networked filesystem is being run,
- the creation and deletion of files and directories need to be
- specified atomically so that they can be serialized.</para>
- </listitem>
- <listitem>
- <para>When supporting non-UNIX filesystems, such as an
- MS-DOS
- filesystem, on another partition of the disk,
- the other filesystem may not support link operations.
- Although other filesystems might support the concept of directories,
- they probably would not create and delete the directories with links,
- as the UNIX filesystem does.
- Consequently, they could create and delete directories only
- if explicit directory create and delete requests were presented.</para>
- </listitem>
- </orderedlist>
-
- <para>The
- <emphasis>chown</emphasis>
- system call sets the owner and group of a file, and
- <emphasis>chmod</emphasis>
- changes protection attributes.
- <emphasis>Stat</emphasis>
- applied to a filename can be used to read back such properties of a file.
- The
- <emphasis>fchown</emphasis>,
- <emphasis>fchmod</emphasis>,
- and
- <emphasis>fstat</emphasis>
- system calls are applied to a descriptor, instead of
- to a filename, to do the same set of operations.
- The
- <emphasis>rename</emphasis>
- system call can be used to give a file a new name in the filesystem,
- replacing one of the file's old names.
- Like the directory-creation and directory-deletion operations, the
- <emphasis>rename</emphasis>
- system call was added to 4.2BSD
- to provide atomicity to name changes in the local filesystem.
- Later, it proved useful to export renaming operations explicitly
- to foreign filesystems and over the network.</para>
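-
- <para>As a rough sketch of the calls named above
- (assuming Python on a POSIX system, which is not part of the original text),
- attributes can be changed and read back either by filename or by descriptor,
- and a name can be replaced with a single
- <literal>os.rename</literal>
- call. The file names are hypothetical.</para>
-
```python
import os
import stat
import tempfile

with tempfile.TemporaryDirectory() as d:
    old = os.path.join(d, "draft")
    new = os.path.join(d, "final")
    fd = os.open(old, os.O_WRONLY | os.O_CREAT, 0o600)

    os.chmod(old, 0o640)                              # change protection by filename
    mode_by_name = stat.S_IMODE(os.stat(old).st_mode)

    os.fchmod(fd, 0o600)                              # same operation on a descriptor
    mode_by_fd = stat.S_IMODE(os.fstat(fd).st_mode)
    os.close(fd)

    os.rename(old, new)                               # one call replaces the old name
    names = os.listdir(d)
```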
-
- <para>The
- <emphasis>truncate</emphasis>
- system call was added to 4.2BSD to allow files to be shortened
- to an arbitrary offset.
- The call was added primarily in support of the Fortran
- run-time library,
- whose semantics set the end of a random-access
- file to wherever the program most recently accessed that file.
- Without the
- <emphasis>truncate</emphasis>
- system call, the only way to shorten a file was to
- copy the part that was desired to a new file, to delete the old file,
- then to rename the copy to the original name.
- Besides being slow, this algorithm
- could cause the library to fail on a full filesystem.</para>
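-
- <para>A minimal sketch of shortening a file in place, assuming Python on a
- POSIX system, where
- <literal>os.truncate</literal>
- wraps the system call.
- The helper <literal>shorten</literal> and the file name are hypothetical.</para>
-
```python
import os
import tempfile


def shorten(path, length):
    """Cut the file at `length` bytes without copying it; return the new size."""
    os.truncate(path, length)
    return os.stat(path).st_size


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "data")
    with open(path, "wb") as f:
        f.write(b"0123456789")

    new_size = shorten(path, 4)       # no temporary copy, no extra disk space
    with open(path, "rb") as f:
        remaining = f.read()
```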
-
- <para>Once the filesystem had the ability to shorten files,
- the kernel took advantage of that ability
- to shorten large empty directories.
- The advantage of shortening empty directories is that it reduces the
- time spent in the kernel searching them
- when names are being created or deleted.</para>
-
- <para>Newly created files are assigned the user identifier of the process
- that created them and the group identifier of the directory
- in which they were created.
- A three-level access-control mechanism is provided for
- the protection of files.
- These three levels specify the accessibility of a file to</para>
-
- <orderedlist>
- <listitem>
- <para>The user who owns the file</para>
- </listitem>
- <listitem>
- <para>The group that owns the file</para>
- </listitem>
- <listitem>
- <para>Everyone else</para>
- </listitem>
- </orderedlist>
-
- <para>Each level of access has separate indicators for read permission,
- write permission, and execute permission.</para>
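-
- <para>The three levels and their three indicators can be decoded from a
- file's mode bits. The following is an illustrative sketch only; it assumes
- Python's <literal>stat</literal> module, and the helper
- <literal>describe</literal> is hypothetical.</para>
-
```python
import stat


def describe(mode):
    """Return (user, group, other) permission strings such as 'rw-'."""
    levels = (
        (stat.S_IRUSR, stat.S_IWUSR, stat.S_IXUSR),   # the user who owns the file
        (stat.S_IRGRP, stat.S_IWGRP, stat.S_IXGRP),   # the group that owns the file
        (stat.S_IROTH, stat.S_IWOTH, stat.S_IXOTH),   # everyone else
    )
    return tuple(
        "".join(ch if mode & bit else "-" for bit, ch in zip(bits, "rwx"))
        for bits in levels
    )
```
-
- <para>For a real file, the argument would come from
- <literal>os.stat(path).st_mode</literal>.</para>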
-
- <para>Files are created with zero length, and may grow when they are written.
- While a file is open, the system maintains a pointer into
- the file indicating the current location in
- the file associated with the descriptor.
- This pointer can be moved about in the file in a random-access fashion.
- Processes sharing a file descriptor through a
- <emphasis>fork</emphasis>
- or
- <emphasis>dup</emphasis>
- system call share the current location pointer.
- Descriptors created by separate
- <emphasis>open</emphasis>
- system calls have separate current location pointers.
- Files may have
- <emphasis>holes</emphasis>
- in them.
- Holes are void areas in the linear extent of the file where data have
- never been written.
- A process can create these holes by positioning
- the pointer past the current end-of-file and writing.
- When read, holes are treated by the system as zero-valued bytes.</para>
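-
- <para>The following sketch, which assumes Python on a POSIX system and a
- hypothetical file name, creates a hole by writing past end-of-file and then
- compares the location pointer of a duplicated descriptor with that of a
- descriptor from a separate
- <literal>os.open</literal>
- call.</para>
-
```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "sparse")
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)

    os.lseek(fd, 4096, os.SEEK_SET)    # position past the end of the empty file
    os.write(fd, b"end")               # the first 4096 bytes are never written
    size = os.fstat(fd).st_size

    os.lseek(fd, 0, os.SEEK_SET)
    hole = os.read(fd, 4096)           # reading the hole yields zero-valued bytes

    fd_dup = os.dup(fd)                # shares the location pointer with fd
    os.lseek(fd, 10, os.SEEK_SET)
    shared = os.lseek(fd_dup, 0, os.SEEK_CUR)

    fd_new = os.open(path, os.O_RDONLY)   # separate open, separate pointer
    separate = os.lseek(fd_new, 0, os.SEEK_CUR)

    for f in (fd, fd_dup, fd_new):
        os.close(f)
```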
-
- <para>Earlier UNIX systems had a limit of 14 characters per filename component.
- This limitation was often a problem.
- For example,
- in addition to the natural desire of users
- to give files long descriptive names,
- a common way of forming filenames is as
- <filename><replaceable>basename</replaceable>.<replaceable>extension</replaceable></filename>,
- where the extension (indicating the kind of file, such as
- <literal>.c</literal>
- for C source or
- <literal>.o</literal>
- for intermediate binary object)
- is one to three characters,
- leaving 10 to 12 characters for the basename.
- Source-code-control systems and editors usually take up another
- two characters, either as a prefix or a suffix, for their purposes,
- leaving eight to 10 characters.
- It is easy to use 10 or 12 characters in a single
- English word as a basename (e.g., ``multiplexer'').</para>
-
- <para>It is possible to keep within these limits,
- but it is inconvenient or even dangerous, because other UNIX
- systems accept strings longer than the limit when creating files,
- but then truncate them to the limit.
- A C language source file named
- <filename>multiplexer.c</filename>
- (already 13 characters) might have a source-code-control file
- with
- <literal>s.</literal>
- prepended, producing a filename
- <filename>s.multiplexer</filename>
- that is indistinguishable from the source-code-control file for
- <filename>multiplexer.ms</filename>,
- a file containing
- <!-- FIXME -->
- <literal>troff</literal>
- source for documentation for the C program.
- The contents of the two original files could easily get confused
- with no warning from the source-code-control system.
- Careful coding can detect this problem, but the
- long filenames
- first introduced in 4.2BSD practically eliminate it.</para>
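-
- <para>The collision is easy to reproduce. The sketch below assumes a
- filesystem that silently keeps only the first 14 characters of each
- component; the helper <literal>stored_name</literal> is hypothetical and
- only models that truncation.</para>
-
```python
LIMIT = 14  # characters per filename component on the older systems


def stored_name(name, limit=LIMIT):
    """Model a filesystem that silently drops characters past the limit."""
    return name[:limit]


c_source = stored_name("s." + "multiplexer.c")
troff_source = stored_name("s." + "multiplexer.ms")
collide = c_source == troff_source   # both control files end up with one name
```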
- </sect1>
-
- <sect1>
- <title>Filestores</title>
-
- <para>The operations defined for local filesystems are divided into two parts.
- Common to all local filesystems are hierarchical naming,
- locking, quotas, attribute management, and protection.
- These features are independent of how the data will be stored.
- 4.4BSD has a single implementation to provide these semantics.</para>
-
- <para>The other part of the local filesystem is the organization
- and management of the data on the storage media.
- Laying out the contents of files on the storage media is
- the responsibility of the filestore.
- 4.4BSD supports three different filestore layouts:</para>
-
- <itemizedlist>
- <listitem>
- <para>The traditional Berkeley Fast Filesystem</para>
- </listitem>
- <listitem>
- <para>The log-structured filesystem,
- based on the Sprite operating-system design
- <xref linkend="biblio-rosenblum"></para>
- </listitem>
- <listitem>
- <para>A memory-based filesystem</para>
- </listitem>
- </itemizedlist>
-
- <para>Although the organizations of these filestores are completely different,
- these differences are indistinguishable
- to the processes using the filestores.</para>
-
- <para>The Fast Filesystem organizes data into cylinder groups.
- Files that are likely to be accessed together,
- based on their locations in the filesystem hierarchy,
- are stored in the same cylinder group.
- Files that are not expected to be accessed together are moved into
- different cylinder groups.
- Thus, files written at the same time may be placed far apart on the
- disk.</para>
-
- <para>The log-structured filesystem organizes data as a log.
- All data being written at any point in time are gathered together,
- and are written at the same disk location.
- Data are never overwritten;
- instead, a new copy of the file is written that replaces the old one.
- The old files are reclaimed by a garbage-collection process that runs
- when the filesystem becomes full and additional free space is needed.</para>
-
- <para>The memory-based filesystem is designed to store data in virtual memory.
- It is used for filesystems that need to support
- fast but temporary data, such as
- <filename>/tmp</filename>.
- The goal of the memory-based filesystem is to keep
- the storage packed as compactly as possible to minimize
- the usage of virtual-memory resources.</para>
- </sect1>
-
- <sect1>
- <title>Network Filesystem</title>
-
- <para>Initially, networking was used
- to transfer data from one machine to another.
- Later, it evolved to allow users to log in remotely to another machine.
- The next logical step was to bring the data to the user,
- instead of having the user go to the data --
- and network filesystems were born.
- Users working locally
- do not experience the network delays on each keystroke,
- so they have a more responsive environment.</para>
-
- <para>Bringing the filesystem to a local machine was among the first
- of the major client-server applications.
- The
- <emphasis>server</emphasis>
- is the remote machine that exports one or more of its filesystems.
- The
- <emphasis>client</emphasis>
- is the local machine that imports those filesystems.
- From the local client's point of view,
- a remotely mounted filesystem appears in the file-tree name space
- just like any other locally mounted filesystem.
- Local clients can change into directories on the remote filesystem,
- and can read, write, and execute binaries within that remote filesystem
- identically to the way that they can do these operations
- on a local filesystem.</para>
-
- <para>When the local client does an operation on a remote filesystem,
- the request is packaged and is sent to the server.
- The server does the requested operation and
- returns either the requested information or an error
- indicating why the request was denied.
- To get reasonable performance,
- the client must cache frequently accessed data.
- The complexity of remote filesystems lies in maintaining cache
- consistency between the server and its many clients.</para>
-
- <para>Although many remote-filesystem protocols
- have been developed over the years,
- the most pervasive one in use among UNIX
- systems is the Network Filesystem
- (NFS),
- whose protocol and most widely used implementation were
- done by Sun Microsystems.
- The 4.4BSD kernel supports the
- NFS
- protocol, although the implementation was done independently
- from the protocol specification
- <xref linkend="biblio-macklem">.
- The
- NFS
- protocol is described in
- Chapter 9.
- </para>
- </sect1>
-
- <sect1>
- <title>Terminals</title>
-
- <para>Terminals support the standard system I/O operations, as well
- as a collection of terminal-specific operations to control input-character
- editing and output delays.
- At the lowest level are the terminal device drivers that control
- the hardware terminal ports.
- Terminal input is handled according to the underlying communication
- characteristics, such as baud rate,
- and according to a set of software-controllable
- parameters, such as parity checking.</para>
-
- <para>Layered above the terminal device drivers are line disciplines
- that provide various degrees of character processing.
- The default line discipline is selected when a port is being
- used for an interactive login.
- The line discipline is run in
- <emphasis>canonical mode</emphasis>;
- input is processed to provide standard line-oriented editing functions,
- and input is presented to a process on a line-by-line basis.</para>
-
-
- <para>Screen editors and programs that communicate with other computers
- generally run in
- <emphasis>noncanonical mode</emphasis>
- (also commonly referred to as
- <emphasis>raw mode</emphasis>
- or
- <emphasis>character-at-a-time mode</emphasis>).
- In this mode, input is passed through to the reading process immediately
- and without interpretation.
- All special-character input processing is disabled,
- no erase or other line editing processing is done,
- and all characters are passed to the program
- that is reading from the terminal.</para>
-
-
- <para>It is possible to configure the terminal in thousands
- of combinations between these two extremes.
- For example,
- a screen editor that wanted to receive user interrupts asynchronously
- might enable the special characters that
- generate signals and enable output flow control,
- but otherwise run in noncanonical mode;
- all other characters would be passed through to the process uninterpreted.</para>
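-
- <para>One such intermediate configuration might be sketched as follows.
- The sketch assumes the POSIX termios flags as exposed by Python's
- <literal>termios</literal> module; the helper
- <literal>editor_mode</literal> is hypothetical, and turning off echo is an
- assumption about what a screen editor would want rather than something the
- text specifies.</para>
-
```python
import termios


def editor_mode(attrs):
    """Return a copy of a tcgetattr()-style list configured for a screen editor."""
    iflag, oflag, cflag, lflag, ispeed, ospeed, cc = attrs
    lflag &= ~(termios.ICANON | termios.ECHO)   # noncanonical mode, no echo (assumed)
    lflag |= termios.ISIG                       # keep the signal-generating characters
    iflag |= termios.IXON                       # keep output flow control
    cc = list(cc)
    cc[termios.VMIN] = 1                        # return as soon as one character arrives
    cc[termios.VTIME] = 0                       # no read timer
    return [iflag, oflag, cflag, lflag, ispeed, ospeed, cc]
```
-
- <para>On a real terminal descriptor <literal>fd</literal>, the result could
- be applied with
- <literal>termios.tcsetattr(fd, termios.TCSADRAIN, editor_mode(termios.tcgetattr(fd)))</literal>.</para>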
-
- <para>On output, the terminal handler provides simple formatting services,
- including</para>
-
-
- <itemizedlist>
- <listitem>
- <para>Converting the line-feed character
- to the two-character carriage-return-line-feed sequence</para>
- </listitem>
-
- <listitem>
- <para>Inserting delays after certain standard control characters</para>
- </listitem>
-
- <listitem>
- <para>Expanding tabs</para>
- </listitem>
-
- <listitem>
- <para>Displaying echoed nongraphic
- ASCII
- characters as a two-character sequence of the
- form ``^C''
- (i.e., the
- ASCII
- caret character followed by the
- ASCII
- character that is the character's value offset from the
- ASCII
- ``@'' character).</para>
- </listitem>
- </itemizedlist>
-
- <para>Each of these formatting services can be disabled individually by
- a process through control requests.</para>
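-
- <para>The last of these services, the caret notation for echoed control
- characters, amounts to a small calculation. The helper
- <literal>caret</literal> below is a hypothetical illustration in Python,
- not the terminal handler's code; showing DEL as a caret followed by a
- question mark is the usual convention and is not stated in the text.</para>
-
```python
def caret(code):
    """Render one ASCII code the way the terminal handler echoes it."""
    if code < 0x20:
        # Control characters: caret plus the character offset from '@'.
        return "^" + chr(code + ord("@"))
    if code == 0x7F:
        return "^?"          # DEL, by convention (an assumption here)
    return chr(code)         # graphic characters are echoed unchanged
```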
-
- </sect1>
-
- <sect1>
- <title>Interprocess Communication</title>
-
- <para>Interprocess communication in 4.4BSD is organized in
- <emphasis>communication domains</emphasis>.
- Domains currently supported include the
- <emphasis>local domain</emphasis>,
- for communication between processes executing on the same machine; the
- <emphasis>internet domain</emphasis>,
- for communication between processes using the
- TCP/IP
- protocol suite (perhaps within the Internet); the
- ISO/OSI
- protocol family for communication between sites required to run them;
- and the
- <emphasis>XNS domain</emphasis>,
- for communication between processes using the
- XEROX
- Network Systems
- (XNS)
- protocols.</para>
-
- <para>Within a domain, communication takes place between communication
- endpoints known as
- <emphasis>sockets</emphasis>.
- As mentioned in
- Section 2.6,
- the
- <emphasis>socket</emphasis>
- system call creates a socket and returns a descriptor;
- other
- IPC
- system calls are described in
- Chapter 11.
- Each socket has a type that defines its communications semantics;
- these semantics include properties such as reliability, ordering,
- and prevention of duplication of messages.</para>
-
- <para>Each socket has associated with it a
- <emphasis>communication protocol</emphasis>.
- This protocol provides the semantics required
- by the socket according to the latter's type.
- Applications may request a specific protocol when creating a socket, or
- may allow the system to select a protocol that is appropriate for the type
- of socket being created.</para>
-
- <para>Sockets may have addresses bound to them.
- The form and meaning of socket addresses are dependent on the
- communication domain in which the socket is created.
- Binding a name to a socket in the
- local domain causes a file to be created in the filesystem.</para>
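-
- <para>The effect of binding in the local domain can be observed directly.
- This sketch assumes Python's <literal>socket</literal> module on a POSIX
- system, where the local domain is named
- <literal>AF_UNIX</literal>; the path is hypothetical.</para>
-
```python
import os
import socket
import stat
import tempfile

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "endpoint")
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)

    existed_before = os.path.exists(path)
    s.bind(path)                                   # binding creates the file
    is_socket_file = stat.S_ISSOCK(os.stat(path).st_mode)

    s.close()
    os.unlink(path)                                # the name outlives the socket
```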
-
- <para>Normal data transmitted and received through sockets are untyped.
- Data-representation issues are the responsibility of libraries built
- on top of the interprocess-communication facilities.
- In addition to transporting normal data, communication domains may
- support the transmission and reception of specially typed data, termed
- <emphasis>access rights</emphasis>.
- The local domain, for example,
- uses this facility to pass descriptors between processes.</para>
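-
- <para>Descriptor passing can be sketched with the
- <literal>SCM_RIGHTS</literal> control message that POSIX systems use for
- access rights in the local domain. The example assumes Python's
- <literal>socket</literal> module; the helpers
- <literal>send_fd</literal> and <literal>recv_fd</literal> are hypothetical,
- and both ends live in one process only to keep the sketch short.</para>
-
```python
import array
import os
import socket


def send_fd(sock, fd):
    """Send one descriptor as access rights, alongside one byte of normal data."""
    rights = array.array("i", [fd])
    sock.sendmsg([b"x"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, rights)])


def recv_fd(sock):
    """Receive one descriptor sent by send_fd."""
    fds = array.array("i")
    _, ancdata, _, _ = sock.recvmsg(1, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            fds.frombytes(data[: len(data) - (len(data) % fds.itemsize)])
    return fds[0]


left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
r, w = os.pipe()

send_fd(left, r)                 # pass the read end of the pipe
received = recv_fd(right)        # a new descriptor for the same open file

os.write(w, b"via pipe")
payload = os.read(received, 8)

for fd in (r, w, received):
    os.close(fd)
left.close()
right.close()
```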
-
- <para>Networking implementations on UNIX before 4.2BSD
- usually worked by overloading the character-device interfaces.
- One goal of the socket interface was for naive
- programs to be able to work without change on stream-style connections.
- Such programs can work only if the
- <emphasis>read</emphasis>
- and
- <emphasis>write</emphasis>
- system calls are unchanged.
- Consequently, the original interfaces were left intact,
- and were made to work on stream-type sockets.
- A new interface was added for more complicated sockets,
- such as those used to send datagrams, with which a destination address
- must be presented with each
- <emphasis>send</emphasis>
- call.</para>
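-
- <para>The contrast between the two interfaces can be sketched as follows,
- assuming Python on a POSIX system. Local-domain sockets and a hypothetical
- path are used so that the example needs no network.</para>
-
```python
import os
import socket
import tempfile

# Stream sockets: the ordinary read and write calls work on the descriptors.
a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
os.write(a.fileno(), b"plain write")
echoed = os.read(b.fileno(), 64)
a.close()
b.close()

# Datagram sockets: a destination address accompanies each send.
with tempfile.TemporaryDirectory() as d:
    addr = os.path.join(d, "rx")
    rx = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    rx.bind(addr)
    tx = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    tx.sendto(b"datagram", addr)
    datagram = rx.recv(64)
    tx.close()
    rx.close()
```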
-
- <para>Another benefit is that the new interface is highly portable.
- Shortly after a test release was available from Berkeley,
- the socket interface had been ported to System III
- by a UNIX vendor
- (although AT&amp;T did not support the socket interface
- until the release of System V Release 4,
- deciding instead to use the
- Eighth Edition stream mechanism).
- The socket interface was also ported to run in many
- Ethernet boards by vendors, such as Excelan and Interlan, that were
- selling into the PC market, where the machines were
- too small to run networking in the main processor.
- More recently, the socket interface was used as the basis for
- Microsoft's Winsock networking interface for Windows.</para>
- </sect1>
-
- <sect1>
- <title>Network Communication</title>
-
- <para>Some of the communication domains supported by the
- <emphasis>socket</emphasis>
- IPC
- mechanism provide access to network protocols.
- These protocols are implemented as a separate software
- layer logically below the socket software in the kernel.
- The kernel provides many ancillary services, such as
- buffer management, message routing, standardized interfaces
- to the protocols, and interfaces to the network interface drivers
- for the use of the various network protocols.</para>
-
- <para>At the time that 4.2BSD was being implemented,
- there were many networking protocols in use or under development,
- each with its own strengths and weaknesses.
- There was no clearly superior protocol or protocol suite.
- By supporting multiple protocols, 4.2BSD
- could provide interoperability and resource sharing
- among the diverse set of machines that was available
- in the Berkeley environment.
- Multiple-protocol support also provides for future changes.
- Today's protocols designed for 10- to 100-Mbit-per-second
- Ethernets are likely to be inadequate for
- tomorrow's 1- to 10-Gbit-per-second fiber-optic networks.
- Consequently, the network-communication layer is
- designed to support multiple protocols.
- New protocols are added to the kernel without
- the support for older protocols being affected.
- Older applications can continue to operate using the old protocol
- over the same physical network as is used by newer applications
- running with a newer network protocol.</para>
- </sect1>
-
- <sect1>
- <title>Network Implementation</title>
-
- <para>The first protocol suite implemented in 4.2BSD was
- DARPA's
- Transmission Control Protocol/Internet Protocol
- (TCP/IP).
- The
- CSRG
- chose
- TCP/IP
- as the first network to incorporate into the socket
- IPC
- framework,
- because a 4.1BSD-based implementation was publicly available from a
- DARPA-sponsored
- project at Bolt, Beranek, and Newman
- (BBN).
- That was an influential choice:
- The 4.2BSD implementation
- is the main reason for the extremely widespread use of this protocol suite.
- Later performance and capability improvements to the
- TCP/IP
- implementation have also been widely adopted.
- The
- TCP/IP
- implementation is described in detail in
- Chapter 13.</para>
-
- <para>The release of 4.3BSD added the Xerox Network Systems
- (XNS)
- protocol suite,
- partly building on work done at the
- University of Maryland and at
- Cornell University.
- This suite was needed to connect
- isolated machines that could not communicate using
- TCP/IP.</para>
-
- <para>The release of 4.4BSD added the
- ISO
- protocol suite because of the latter's increasing
- visibility both within and outside the United States.
- Because of the somewhat different semantics defined for the
- ISO
- protocols, some minor changes were required in the socket interface
- to accommodate these semantics.
- The changes were made such that they were invisible to clients
- of other existing protocols.
- The
- ISO
- protocols also required extensive additions to the two-level routing
- tables provided by the kernel in 4.3BSD.
- The greatly expanded routing capabilities of 4.4BSD include
- arbitrary levels of routing with variable-length addresses and
- network masks.</para>
- </sect1>
-
- <sect1>
- <title>System Operation</title>
-
- <para>Bootstrapping mechanisms are used to start the system running.
- First, the 4.4BSD
- kernel must be loaded into the main memory of the processor.
- Once loaded, it must go through an initialization phase to
- set the hardware into a known state.
- Next, the kernel must do
- autoconfiguration, a process that finds
- and configures the peripherals that are attached to the processor.
- The system begins running in single-user mode while a start-up script does
- disk checks and starts the accounting and quota checking.
- Finally, the start-up script starts the general system services
- and brings up
- the system to full multiuser operation.</para>
-
- <para>During multiuser operation, processes wait for login requests
- on the terminal lines and network ports that have been configured
- for user access.
- When a login request is detected,
- a login process is spawned and user validation is done.
- When the login validation is successful, a
- login shell is created from which
- the user can run additional processes.</para>
- </sect1>
-
- <bibliography>
- <title>References</title>
-
- <biblioentry id="biblio-accetta">
- <abbrev>Accetta et al, 1986</abbrev>
-
- <biblioset relation="article">
- <title>Mach: A New Kernel Foundation for UNIX Development</title>
-
- <authorgroup>
- <author>
- <firstname>M.</firstname>
- <surname>Accetta</surname>
- </author>
- <author>
- <firstname>R.</firstname>
- <surname>Baron</surname>
- </author>
- <author>
- <firstname>W.</firstname>
- <surname>Bolosky</surname>
- </author>
- <author>
- <firstname>D.</firstname>
- <surname>Golub</surname>
- </author>
- <author>
- <firstname>R.</firstname>
- <surname>Rashid</surname>
- </author>
- <author>
- <firstname>A.</firstname>
- <surname>Tevanian</surname>
- </author>
- <author>
- <firstname>M.</firstname>
- <surname>Young</surname>
- </author>
- </authorgroup>
-
- <pagenums>93-113</pagenums>
- </biblioset>
-
- <biblioset relation="journal">
- <title>USENIX Association Conference Proceedings</title>
- <publishername>USENIX Association</publishername>
- <pubdate>June 1986</pubdate>
- </biblioset>
- </biblioentry>
-
- <biblioentry id="biblio-cheriton">
- <abbrev>Cheriton, 1988</abbrev>
-
- <biblioset relation="article">
- <title>The V Distributed System</title>
-
- <author>
- <firstname>D. R.</firstname>
- <surname>Cheriton</surname>
- </author>
-
- <pagenums>314-333</pagenums>
- </biblioset>
-
- <biblioset relation="journal">
- <title>Comm ACM, 31, 3</title>
-
- <pubdate>March 1988</pubdate>
- </biblioset>
- </biblioentry>
-
- <biblioentry id="biblio-ewens">
- <abbrev>Ewens et al, 1985</abbrev>
-
- <biblioset relation="article">
- <title>Tunis: A Distributed Multiprocessor Operating System</title>
-
- <authorgroup>
- <author>
- <firstname>P.</firstname>
- <surname>Ewens</surname>
- </author>
-
- <author>
- <firstname>D. R.</firstname>
- <surname>Blythe</surname>
- </author>
-
- <author>
- <firstname>M.</firstname>
- <surname>Funkenhauser</surname>
- </author>
-
- <author>
- <firstname>R. C.</firstname>
- <surname>Holt</surname>
- </author>
- </authorgroup>
-
- <pagenums>247-254</pagenums>
- </biblioset>
-
- <biblioset relation="journal">
- <title>USENIX Association Conference Proceedings</title>
- <publishername>USENIX Association</publishername>
- <pubdate>June 1985</pubdate>
- </biblioset>
- </biblioentry>
-
- <biblioentry id="biblio-gingell">
- <abbrev>Gingell et al, 1987</abbrev>
-
- <biblioset relation="article">
- <title>Virtual Memory Architecture in SunOS</title>
-
- <authorgroup>
- <author>
- <firstname>R.</firstname>
- <surname>Gingell</surname>
- </author>
-
- <author>
- <firstname>J.</firstname>
- <surname>Moran</surname>
- </author>
-
- <author>
- <firstname>W.</firstname>
- <surname>Shannon</surname>
- </author>
- </authorgroup>
-
- <pagenums>81-94</pagenums>
- </biblioset>
-
- <biblioset relation="journal">
- <title>USENIX Association Conference Proceedings</title>
- <publishername>USENIX Association</publishername>
- <pubdate>June 1987</pubdate>
- </biblioset>
- </biblioentry>
-
- <biblioentry id="biblio-kernighan">
- <abbrev>Kernighan &amp; Pike, 1984</abbrev>
-
- <title>The UNIX Programming Environment</title>
-
- <authorgroup>
- <author>
- <firstname>B. W.</firstname>
- <surname>Kernighan</surname>
- </author>
-
- <author>
- <firstname>R.</firstname>
- <surname>Pike</surname>
- </author>
- </authorgroup>
-
- <publisher>
- <publishername>Prentice-Hall</publishername>
- <address>
- <city>Englewood Cliffs</city>
- <state>NJ</state>
- </address>
- </publisher>
-
- <pubdate>1984</pubdate>
- </biblioentry>
-
- <biblioentry id="biblio-macklem">
- <abbrev>Macklem, 1994</abbrev>
-
- <biblioset relation="chapter">
- <title>The 4.4BSD NFS Implementation</title>
-
- <author>
- <firstname>R.</firstname>
- <surname>Macklem</surname>
- </author>
-
- <pagenums>6:1-14</pagenums>
- </biblioset>
-
- <biblioset relation="book">
- <title>4.4BSD System Manager's Manual</title>
-
- <publisher>
- <publishername>O'Reilly &amp; Associates, Inc.</publishername>
- <address>
- <city>Sebastopol</city>
- <state>CA</state>
- </address>
- </publisher>
-
- <pubdate>1994</pubdate>
- </biblioset>
- </biblioentry>
-
- <biblioentry id="biblio-mckusick-2">
- <abbrev>McKusick &amp; Karels, 1988</abbrev>
-
- <biblioset relation="article">
- <title>Design of a General Purpose Memory Allocator for the 4.3BSD
- UNIX Kernel</title>
-
- <authorgroup>
- <author>
- <firstname>M. K.</firstname>
- <surname>McKusick</surname>
- </author>
-
- <author>
- <firstname>M. J.</firstname>
- <surname>Karels</surname>
- </author>
- </authorgroup>
-
- <pagenums>295-304</pagenums>
- </biblioset>
-
- <biblioset relation="journal">
- <title>USENIX Association Conference Proceedings</title>
- <publishername>USENIX Association</publishername>
- <pubdate>June 1988</pubdate>
- </biblioset>
- </biblioentry>
-
- <biblioentry id="biblio-mckusick-1">
- <abbrev>McKusick et al, 1994</abbrev>
-
- <biblioset relation="manual">
- <title>Berkeley Software Architecture Manual, 4.4BSD Edition</title>
-
- <authorgroup>
- <author>
- <firstname>M. K.</firstname>
- <surname>McKusick</surname>
- </author>
-
- <author>
- <firstname>M. J.</firstname>
- <surname>Karels</surname>
- </author>
-
- <author>
- <firstname>S. J.</firstname>
- <surname>Leffler</surname>
- </author>
-
- <author>
- <firstname>W. N.</firstname>
- <surname>Joy</surname>
- </author>
-
- <author>
- <firstname>R. S.</firstname>
- <surname>Faber</surname>
- </author>
- </authorgroup>
-
- <pagenums>5:1-42</pagenums>
- </biblioset>
-
- <biblioset relation="book">
- <title>4.4BSD Programmer's Supplementary Documents</title>
-
- <publisher>
- <publishername>O'Reilly &amp; Associates, Inc.</publishername>
- <address>
- <city>Sebastopol</city>
- <state>CA</state>
- </address>
- </publisher>
-
- <pubdate>1994</pubdate>
- </biblioset>
- </biblioentry>
-
- <biblioentry id="biblio-ritchie">
- <abbrev>Ritchie, 1988</abbrev>
-
- <title>Early Kernel Design</title>
- <subtitle>private communication</subtitle>
-
- <author>
- <firstname>D. M.</firstname>
- <surname>Ritchie</surname>
- </author>
-
- <pubdate>March 1988</pubdate>
- </biblioentry>
-
- <biblioentry id="biblio-rosenblum">
- <abbrev>Rosenblum &amp; Ousterhout, 1992</abbrev>
-
- <biblioset relation="article">
- <title>The Design and Implementation of a Log-Structured File
- System</title>
-
- <authorgroup>
- <author>
- <firstname>M.</firstname>
- <surname>Rosenblum</surname>
- </author>
-
- <author>
- <firstname>J.</firstname>
- <surname>Ousterhout</surname>
- </author>
- </authorgroup>
-
- <pagenums>26-52</pagenums>
- </biblioset>
-
- <biblioset relation="journal">
- <title>ACM Transactions on Computer Systems, 10, 1</title>
-
- <publishername>Association for Computing Machinery</publishername>
- <pubdate>February 1992</pubdate>
- </biblioset>
- </biblioentry>
-
- <biblioentry id="biblio-rozier">
- <abbrev>Rozier et al, 1988</abbrev>
-
- <biblioset relation="article">
- <title>Chorus Distributed Operating Systems</title>
-
- <authorgroup>
- <author>
- <firstname>M.</firstname>
- <surname>Rozier</surname>
- </author>
-
- <author>
- <firstname>V.</firstname>
- <surname>Abrossimov</surname>
- </author>
-
- <author>
- <firstname>F.</firstname>
- <surname>Armand</surname>
- </author>
-
- <author>
- <firstname>I.</firstname>
- <surname>Boule</surname>
- </author>
-
- <author>
- <firstname>M.</firstname>
- <surname>Gien</surname>
- </author>
-
- <author>
- <firstname>M.</firstname>
- <surname>Guillemont</surname>
- </author>
-
- <author>
- <firstname>F.</firstname>
- <surname>Herrmann</surname>
- </author>
-
- <author>
- <firstname>C.</firstname>
- <surname>Kaiser</surname>
- </author>
-
- <author>
- <firstname>S.</firstname>
- <surname>Langlois</surname>
- </author>
-
- <author>
- <firstname>P.</firstname>
- <surname>Leonard</surname>
- </author>
-
- <author>
- <firstname>W.</firstname>
- <surname>Neuhauser</surname>
- </author>
- </authorgroup>
-
- <pagenums>305-370</pagenums>
- </biblioset>
-
- <biblioset relation="journal">
- <title>USENIX Computing Systems, 1, 4</title>
- <pubdate>Fall 1988</pubdate>
- </biblioset>
- </biblioentry>
-
- <biblioentry id="biblio-tevanian">
- <abbrev>Tevanian, 1987</abbrev>
-
- <title>Architecture-Independent Virtual Memory Management for Parallel
- and Distributed Environments: The Mach Approach</title>
- <subtitle>Technical Report CMU-CS-88-106</subtitle>
-
- <author>
- <firstname>A.</firstname>
- <surname>Tevanian</surname>
- </author>
-
- <publisher>
- <publishername>Department of Computer Science, Carnegie-Mellon
- University</publishername>
-
- <address>
- <city>Pittsburgh</city>
- <state>PA</state>
- </address>
- </publisher>
-
- <pubdate>December 1987</pubdate>
- </biblioentry>
- </bibliography>
- </chapter>
-</book>