2  Unix shells

Unix shells, such as sh or csh, provide two things at once: an interactive command language and a programming language. Let us focus on the latter function: the writing of ``shell scripts'' -- interpreted programs that perform small tasks or assemble a collection of Unix tools into a single application.

Unix shells are real programming languages. They have variables, if/then conditionals, and loops. But they are terrible programming languages. The data structures typically consist only of integers and vectors of strings. The facilities for procedural abstraction are non-existent to minimal. The lexical and syntactic structures are multi-phased, unprincipled, and baroque.
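
As a rough illustration of these points (a sketch that is not part of the original text; the function name count_long and its arguments are hypothetical), the following POSIX sh fragment uses a variable, an if/then conditional, and a loop. It also shows how thin the procedural abstraction is: the function's variables are global, and its only practical way to return a value is to print it.

```shell
#!/bin/sh
# Hypothetical example: count the arguments longer than a given length.
count_long() {
    limit=$1; shift          # POSIX sh has no local variables: limit and count are global
    count=0
    for word in "$@"; do     # "$@" is the one vector-of-strings structure available
        if [ "${#word}" -gt "$limit" ]; then
            count=$((count + 1))   # integer arithmetic on what is still a string
        fi
    done
    echo "$count"            # the result is "returned" by writing it to standard output
}

count_long 3 pipe shell sh csh scheme   # prints 3
```

After the call, the caller's environment contains limit, count, and word, which is one small example of why larger scripts become hard to maintain.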

If most shell languages are so awful, why does anyone use them? There are a few important reasons.

There is a tension between the shell's dual role as interactive command language and shell-script programming language. A command language should be terse and convenient to type. It doesn't have to be comprehensible. Users don't have to maintain or understand a command they typed into a shell a month ago. A command language can be ``write-only,'' because commands are thrown away after they are used. However, it is important that most commands fit on one line, because most interaction is through tty drivers that don't let the user back up and edit a line after its terminating newline has been entered. This seems like a trivial point, but imagine how irritating it would be if typical shell commands required several lines of input. Terse notation is important for interactive tasks.

Shell syntax is also carefully designed to allow it to be parsed on-line -- that is, to allow parsing and interpretation to be interleaved. This usually penalizes the syntax in other ways (for example, consider rc's clumsy if/then/else syntax [rc]).

Programming languages, on the other hand, can be a little more verbose, in return for generality and readability. The programmer enters programs into a text editor, so the language can spread out a little more.

The constraints of the shell's role as command language are among the things that make it unpleasant as a programming language.

The really compelling advantage of shell languages over other programming languages is the first one mentioned above. Shells provide a powerful notation for connecting processes and files together. In this respect, shell languages are extremely well adapted to the general paradigm of the Unix operating system. In Unix, the fundamental computational agents are programs, running as processes in individual address spaces. These agents cooperate to solve a problem by communicating over directed byte streams called pipes. Viewed at this level, Unix is a data-flow architecture. From this perspective, the shell serves a critical role as the language designed to assemble the individual computational agents to solve a particular task.
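
The data-flow view can be made concrete with a short pipeline. This is only a sketch under stated assumptions: the function name top_words and the sample input are hypothetical, and only standard POSIX utilities are used. Each stage runs as a separate process in its own address space, and each | connects one stage's output to the next stage's input as a one-way byte stream.

```shell
#!/bin/sh
# Hypothetical example: list the most frequent words read from standard input.
top_words() {
    tr -cs '[:alpha:]' '\n' |      # split the input into one word per line
    tr '[:upper:]' '[:lower:]' |   # normalize case
    sort |                         # bring identical words together
    uniq -c |                      # collapse each group into "count word"
    sort -rn |                     # order by count, largest first
    head -n "${1:-3}"              # keep the top N lines (default 3)
}

# Lists "the" (count 3) and then "and" (count 2), each preceded by its count.
printf 'the cat and the dog and the bird\n' | top_words 2
```

None of the six programs knows about the others; the shell notation alone assembles them into a single application.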

This interprocess ``glue'' aspect is the shell's key desirable feature as a programming language. This leads us to a fairly obvious idea: instead of adding weak programming features to a Unix process-control language, why not add process invocation features to a strong programming language?

What programming language would make a good base? We would want a language that is powerful and high-level. It should allow for implementations based on interactive interpreters, for ease of debugging and to keep programs small. Since we want to add new notation to the language, it would help if the language were syntactically extensible. High-level features such as automatic storage allocation would help keep programs small and simple. Scheme is an obvious choice. It has all of the desired features, and its weak points, such as its lack of a module system or its poor performance relative to compiled C on certain classes of program, do not apply to the writing of shell scripts.

I have designed and implemented a Unix shell called scsh that is embedded inside Scheme. I had the following design goals and non-goals:

The resulting design, scsh, has two dependent components, embedded within a very portable Scheme system: a process-control notation and a syscall library.

The process-control notation allows the user to control Unix programs with a compact notation. The syscall library gives the programmer full low-level access to the kernel for tasks that cannot be handled by the high-level notation. In this way, scsh's functionality spans a spectrum of detail that is not available to either C or sh.