aboutsummaryrefslogtreecommitdiff
path: root/bin/sh/TOUR
diff options
context:
space:
mode:
Diffstat (limited to 'bin/sh/TOUR')
-rw-r--r--bin/sh/TOUR301
1 files changed, 301 insertions, 0 deletions
diff --git a/bin/sh/TOUR b/bin/sh/TOUR
new file mode 100644
index 000000000000..8f7a741ba5c4
--- /dev/null
+++ b/bin/sh/TOUR
@@ -0,0 +1,301 @@
+# @(#)TOUR 8.1 (Berkeley) 5/31/93
+# $FreeBSD$
+
+NOTE -- This is the original TOUR paper distributed with ash and
+does not represent the current state of the shell. It is provided anyway
+since it provides helpful information for how the shell is structured,
+but be warned that things have changed -- the current shell is
+still under development.
+
+================================================================
+
+ A Tour through Ash
+
+ Copyright 1989 by Kenneth Almquist.
+
+
+DIRECTORIES: The subdirectory bltin contains commands which can
+be compiled stand-alone. The rest of the source is in the main
+ash directory.
+
+SOURCE CODE GENERATORS: Files whose names begin with "mk" are
+programs that generate source code. A complete list of these
+programs is:
+
+ program input files generates
+ ------- ----------- ---------
+ mkbuiltins builtins.def builtins.h builtins.c
+ mknodes nodetypes nodes.h nodes.c
+ mksyntax - syntax.h syntax.c
+ mktokens - token.h
+
+There are undoubtedly too many of these.
+
+EXCEPTIONS: Code for dealing with exceptions appears in
+exceptions.c. The C language doesn't include exception handling,
+so I implement it using setjmp and longjmp. The global variable
+exception contains the type of exception. EXERROR is raised by
+calling error or errorwithstatus. EXINT is an interrupt.
+
+INTERRUPTS: In an interactive shell, an interrupt will cause an
+EXINT exception to return to the main command loop. (Exception:
+EXINT is not raised if the user traps interrupts using the trap
+command.) The INTOFF and INTON macros (defined in exception.h)
+provide uninterruptible critical sections. Between the execution
+of INTOFF and the execution of INTON, interrupt signals will be
+held for later delivery. INTOFF and INTON can be nested.
+
+MEMALLOC.C: Memalloc.c defines versions of malloc and realloc
+which call error when there is no memory left. It also defines a
+stack oriented memory allocation scheme. Allocating off a stack
+is probably more efficient than allocation using malloc, but the
+big advantage is that when an exception occurs all we have to do
+to free up the memory in use at the time of the exception is to
+restore the stack pointer. The stack is implemented using a
+linked list of blocks.
+
+STPUTC: If the stack were contiguous, it would be easy to store
+strings on the stack without knowing in advance how long the
+string was going to be:
+ p = stackptr;
+ *p++ = c; /* repeated as many times as needed */
+ stackptr = p;
+The following three macros (defined in memalloc.h) perform these
+operations, but grow the stack if you run off the end:
+ STARTSTACKSTR(p);
+ STPUTC(c, p); /* repeated as many times as needed */
+ grabstackstr(p);
+
+We now start a top-down look at the code:
+
+MAIN.C: The main routine performs some initialization, executes
+the user's profile if necessary, and calls cmdloop. Cmdloop
+repeatedly parses and executes commands.
+
+OPTIONS.C: This file contains the option processing code. It is
+called from main to parse the shell arguments when the shell is
+invoked, and it also contains the set builtin. The -i and -m op-
+tions (the latter turns on job control) require changes in signal
+handling. The routines setjobctl (in jobs.c) and setinteractive
+(in trap.c) are called to handle changes to these options.
+
+PARSING: The parser code is all in parser.c. A recursive des-
+cent parser is used. Syntax tables (generated by mksyntax) are
+used to classify characters during lexical analysis. There are
+four tables: one for normal use, one for use when inside single
+quotes and dollar single quotes, one for use when inside double
+quotes and one for use in arithmetic. The tables are machine
+dependent because they are indexed by character variables and
+the range of a char varies from machine to machine.
+
+PARSE OUTPUT: The output of the parser consists of a tree of
+nodes. The various types of nodes are defined in the file node-
+types.
+
+Nodes of type NARG are used to represent both words and the con-
+tents of here documents. An early version of ash kept the con-
+tents of here documents in temporary files, but keeping here do-
+cuments in memory typically results in significantly better per-
+formance. It would have been nice to make it an option to use
+temporary files for here documents, for the benefit of small
+machines, but the code to keep track of when to delete the tem-
+porary files was complex and I never fixed all the bugs in it.
+(AT&T has been maintaining the Bourne shell for more than ten
+years, and to the best of my knowledge they still haven't gotten
+it to handle temporary files correctly in obscure cases.)
+
+The text field of a NARG structure points to the text of the
+word. The text consists of ordinary characters and a number of
+special codes defined in parser.h. The special codes are:
+
+ CTLVAR Parameter expansion
+ CTLENDVAR End of parameter expansion
+ CTLBACKQ Command substitution
+ CTLBACKQ|CTLQUOTE Command substitution inside double quotes
+ CTLARI Arithmetic expansion
+ CTLENDARI End of arithmetic expansion
+ CTLESC Escape next character
+
+A variable substitution contains the following elements:
+
+ CTLVAR type name '=' [ alternative-text CTLENDVAR ]
+
+The type field is a single character specifying the type of sub-
+stitution. The possible types are:
+
+ VSNORMAL $var
+ VSMINUS ${var-text}
+ VSMINUS|VSNUL ${var:-text}
+ VSPLUS ${var+text}
+ VSPLUS|VSNUL ${var:+text}
+ VSQUESTION ${var?text}
+ VSQUESTION|VSNUL ${var:?text}
+ VSASSIGN ${var=text}
+ VSASSIGN|VSNUL ${var:=text}
+ VSTRIMLEFT ${var#text}
+ VSTRIMLEFTMAX ${var##text}
+ VSTRIMRIGHT ${var%text}
+ VSTRIMRIGHTMAX ${var%%text}
+ VSLENGTH ${#var}
+ VSERROR delayed error
+
+In addition, the type field will have the VSQUOTE flag set if the
+variable is enclosed in double quotes and the VSLINENO flag if
+LINENO is being expanded (the parameter name is the decimal line
+number). The parameter's name comes next, terminated by an equals
+sign. If the type is not VSNORMAL (including when it is VSLENGTH),
+then the text field in the substitution follows, terminated by a
+CTLENDVAR byte.
+
+The type VSERROR is used to allow parsing bad substitutions like
+${var[7]} and generate an error when they are expanded.
+
+Commands in back quotes are parsed and stored in a linked list.
+The locations of these commands in the string are indicated by
+CTLBACKQ and CTLBACKQ+CTLQUOTE characters, depending upon whether
+the back quotes were enclosed in double quotes.
+
+Arithmetic expansion starts with CTLARI and ends with CTLENDARI.
+
+The character CTLESC escapes the next character, so that in case
+any of the CTL characters mentioned above appear in the input,
+they can be passed through transparently. CTLESC is also used to
+escape '*', '?', '[', and '!' characters which were quoted by the
+user and thus should not be used for file name generation.
+
+CTLESC characters have proved to be particularly tricky to get
+right. In the case of here documents which are not subject to
+variable and command substitution, the parser doesn't insert any
+CTLESC characters to begin with (so the contents of the text
+field can be written without any processing). Other here docu-
+ments, and words which are not subject to file name generation,
+have the CTLESC characters removed during the variable and command
+substitution phase. Words which are subject to file name
+generation have the CTLESC characters removed as part of the file
+name phase.
+
+EXECUTION: Command execution is handled by the following files:
+ eval.c The top level routines.
+ redir.c Code to handle redirection of input and output.
+ jobs.c Code to handle forking, waiting, and job control.
+ exec.c Code to do path searches and the actual exec sys call.
+ expand.c Code to evaluate arguments.
+ var.c Maintains the variable symbol table. Called from expand.c.
+
+EVAL.C: Evaltree recursively executes a parse tree. The exit
+status is returned in the global variable exitstatus. The alter-
+native entry evalbackcmd is called to evaluate commands in back
+quotes. It saves the result in memory if the command is a buil-
+tin; otherwise it forks off a child to execute the command and
+connects the standard output of the child to a pipe.
+
+JOBS.C: To create a process, you call makejob to return a job
+structure, and then call forkshell (passing the job structure as
+an argument) to create the process. Waitforjob waits for a job
+to complete. These routines take care of process groups if job
+control is defined.
+
+REDIR.C: Ash allows file descriptors to be redirected and then
+restored without forking off a child process. This is accom-
+plished by duplicating the original file descriptors. The redir-
+tab structure records where the file descriptors have been dupli-
+cated to.
+
+EXEC.C: The routine find_command locates a command, and enters
+the command in the hash table if it is not already there. The
+third argument specifies whether it is to print an error message
+if the command is not found. (When a pipeline is set up,
+find_command is called for all the commands in the pipeline be-
+fore any forking is done, so to get the commands into the hash
+table of the parent process. But to make command hashing as
+transparent as possible, we silently ignore errors at that point
+and only print error messages if the command cannot be found
+later.)
+
+The routine shellexec is the interface to the exec system call.
+
+EXPAND.C: As the routine argstr generates words by parameter
+expansion, command substitution and arithmetic expansion, it
+performs word splitting on the result. As each word is output,
+the routine expandmeta performs file name generation (if enabled).
+
+VAR.C: Variables are stored in a hash table. Probably we should
+switch to extensible hashing. The variable name is stored in the
+same string as the value (using the format "name=value") so that
+no string copying is needed to create the environment of a com-
+mand. Variables which the shell references internally are preal-
+located so that the shell can reference the values of these vari-
+ables without doing a lookup.
+
+When a program is run, the code in eval.c sticks any environment
+variables which precede the command (as in "PATH=xxx command") in
+the variable table as the simplest way to strip duplicates, and
+then calls "environment" to get the value of the environment.
+
+BUILTIN COMMANDS: The procedures for handling these are scat-
+tered throughout the code, depending on which location appears
+most appropriate. They can be recognized because their names al-
+ways end in "cmd". The mapping from names to procedures is
+specified in the file builtins.def, which is processed by the
+mkbuiltins command.
+
+A builtin command is invoked with argc and argv set up like a
+normal program. A builtin command is allowed to overwrite its
+arguments. Builtin routines can call nextopt to do option pars-
+ing. This is kind of like getopt, but you don't pass argc and
+argv to it. Builtin routines can also call error. This routine
+normally terminates the shell (or returns to the main command
+loop if the shell is interactive), but when called from a non-
+special builtin command it causes the builtin command to
+terminate with an exit status of 2.
+
+The directory bltins contains commands which can be compiled in-
+dependently but can also be built into the shell for efficiency
+reasons. The header file bltin.h takes care of most of the
+differences between the ash and the stand-alone environment.
+The user should call the main routine "main", and #define main to
+be the name of the routine to use when the program is linked into
+ash. This #define should appear before bltin.h is included;
+bltin.h will #undef main if the program is to be compiled
+stand-alone. A similar approach is used for a few utilities from
+bin and usr.bin.
+
+CD.C: This file defines the cd and pwd builtins.
+
+SIGNALS: Trap.c implements the trap command. The routine set-
+signal figures out what action should be taken when a signal is
+received and invokes the signal system call to set the signal ac-
+tion appropriately. When a signal that a user has set a trap for
+is caught, the routine "onsig" sets a flag. The routine dotrap
+is called at appropriate points to actually handle the signal.
+When an interrupt is caught and no trap has been set for that
+signal, the routine "onint" in error.c is called.
+
+OUTPUT: Ash uses its own output routines. There are three out-
+put structures allocated. "Output" represents the standard out-
+put, "errout" the standard error, and "memout" contains output
+which is to be stored in memory. This last is used when a buil-
+tin command appears in backquotes, to allow its output to be col-
+lected without doing any I/O through the UNIX operating system.
+The variables out1 and out2 normally point to output and errout,
+respectively, but they are set to point to memout when appropri-
+ate inside backquotes.
+
+INPUT: The basic input routine is pgetc, which reads from the
+current input file. There is a stack of input files; the current
+input file is the top file on this stack. The code allows the
+input to come from a string rather than a file. (This is for the
+-c option and the "." and eval builtin commands.) The global
+variable plinno is saved and restored when files are pushed and
+popped from the stack. The parser routines store the number of
+the current line in this variable.
+
+DEBUGGING: If DEBUG is defined in shell.h, then the shell will
+write debugging information to the file $HOME/trace. Most of
+this is done using the TRACE macro, which takes a set of printf
+arguments inside two sets of parenthesis. Example:
+"TRACE(("n=%d0, n))". The double parenthesis are necessary be-
+cause the preprocessor can't handle functions with a variable
+number of arguments. Defining DEBUG also causes the shell to
+generate a core dump if it is sent a quit signal. The tracing
+code is in show.c.