10 Systems programming in Scheme
Unix systems programming in Scheme is a much more pleasant experience than Unix systems programming in C. Several features of the language eliminate many of the painful, error-prone problems that C systems programmers are accustomed to suffering. The most important of these features are:
exceptions
automatic storage management
real strings
higher-order procedures
S-expression syntax and backquote
Many of these features are available in other advanced programming languages, such as Modula-3 or ML. None are available in C.
10.1 Exceptions and robust error handling
In scsh, system calls never return the error codes that make careful systems programming in C so difficult. Errors are signaled by raising exceptions. Exceptions are usually handled by default handlers that either abort the program or invoke a run-time debugger; the programmer can override these when desired by using exception-handler expressions. Not having to return error codes frees up procedures to return useful values, which encourages procedural composition. It also keeps the programmer from cluttering up his code with (or, as is all too often the case, just forgetting to include) error checks for every system call. In scsh, the programmer can assume that if a system call returns at all, it returns successfully. This greatly simplifies the flow of the code from the programmer's point of view, as well as greatly increasing the robustness of the program.
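C offers nothing like this, but the flavour of the model can be approximated with setjmp/longjmp. The sketch below is only an illustration under our own assumptions, not how scsh is implemented: checked_open, try_open, raise_syscall_error and the single-level current_handler slot are hypothetical names. checked_open either returns a valid descriptor or does not return at all; by default an error aborts the program, and try_open shows a caller overriding that default.

```c
#include <errno.h>
#include <fcntl.h>
#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical single-level handler slot: NULL means "use the default". */
static jmp_buf *current_handler = NULL;
static int last_errno = 0;

/* "Raise" a system-call error: jump to the installed handler if there is one;
   otherwise behave like a default handler and abort the program. */
static void raise_syscall_error(const char *what) {
    last_errno = errno;
    if (current_handler != NULL)
        longjmp(*current_handler, 1);
    fprintf(stderr, "%s: %s\n", what, strerror(last_errno));
    exit(EXIT_FAILURE);
}

/* Checked wrapper: if it returns at all, it returned successfully. */
static int checked_open(const char *path, int flags) {
    int fd = open(path, flags);
    if (fd < 0)
        raise_syscall_error(path);
    return fd;
}

/* A caller overriding the default handler: returns the descriptor, or -1
   with the caught errno stored in *err. */
static int try_open(const char *path, int *err) {
    jmp_buf handler;
    jmp_buf *saved = current_handler;      /* not modified after setjmp() */
    if (setjmp(handler) != 0) {            /* arrived via longjmp(): the "catch" */
        current_handler = saved;
        *err = last_errno;
        return -1;
    }
    current_handler = &handler;
    int fd = checked_open(path, O_RDONLY); /* no error check needed here */
    current_handler = saved;
    *err = 0;
    return fd;
}
```

Even this toy version shows the cost: C needs a global handler slot and non-local jumps to get what scsh provides as a language feature.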
10.2 Automatic storage management
Further, Scheme's automatic storage allocation removes the "result" parameters from procedure argument lists. When composite data is returned, it is simply returned in a freshly allocated data structure. Again, this helps make it possible for procedures to return useful values.
For example, the C system call readlink() dereferences a symbolic link in the file system. A working definition for the system call is given in figure 5b. It is complicated by many small bookkeeping details, made necessary by C's weak linguistic facilities.
In contrast, scsh's equivalent procedure, read-symlink, has a much simpler definition (fig. 5a).
(a) scsh:
(read-symlink fname)
(b) C:
readlink(char *path, char *buf, int bufsiz)
If there is a real error, the procedure will, in most cases, return an error code. (We will gloss over the error-code mechanism for the sake of brevity.) However, if the length of buf does not actually match the argument bufsiz, the system call may succeed anyway, overwrite other storage and silently proceed, crash the program, or report an error. It all depends.
Figure 5: Two definitions of readlink
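For comparison, here is a sketch of the bookkeeping a careful C caller must do to get read-symlink's behaviour out of readlink(). read_symlink_c is a hypothetical name, and the initial size and doubling strategy are our own choices. On current POSIX systems readlink() silently truncates a result that does not fit rather than failing, so the sketch retries with a larger buffer whenever the result fills the space it was given; it also reserves a byte for the nul terminator, which readlink() does not write.

```c
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: return a freshly malloc'd, nul-terminated copy of the
   target of symbolic link `path`, or NULL (with errno set) on error.
   The caller must free() the result. */
char *read_symlink_c(const char *path) {
    size_t size = 64;                    /* initial guess; grown as needed */
    for (;;) {
        char *buf = malloc(size);
        if (buf == NULL)
            return NULL;                 /* errno is ENOMEM */
        /* Pass size - 1 so there is always room for the terminating nul. */
        ssize_t n = readlink(path, buf, size - 1);
        if (n < 0) {
            int saved = errno;           /* free() may clobber errno */
            free(buf);
            errno = saved;
            return NULL;
        }
        if ((size_t)n < size - 1) {      /* room to spare: not truncated */
            buf[n] = '\0';
            return buf;
        }
        free(buf);                       /* possibly truncated: retry larger */
        size *= 2;
    }
}
```

Every line here is one of the small bookkeeping details the text refers to; the scsh caller writes only (read-symlink fname).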
With the scsh version, there is no possibility that the result buffer will be too small. There is no possibility that the programmer will misrepresent the size of the result buffer with an incorrect bufsiz argument. These sorts of issues are completely eliminated by the Scheme programming model. Instead of having to worry about seven or eight trivial but potentially fatal issues, and writing the necessary 10 or 15 lines of code to handle the operation correctly, the programmer can write a single function call and get on with his task.
10.3 Return values and procedural composition
Exceptions and automatic storage allocation make it easier for procedures to return useful values. This increases the odds that the programmer can use the compact notation of function composition, f(g(x)), to connect producers and consumers of data, a style that is surprisingly difficult to achieve in C.
In C, if we wish to compose two procedure calls, we frequently must write:
/* C style: */
g(x,&y);
...f(y)...
Procedures that compute composite data structures for a result commonly return them by storing them into a data structure passed by reference as a parameter. If g does this, we cannot nest calls, but must write the code as shown.
In fact, the above code is not quite what we want; we forgot to check g for an error return. What we really wanted was:
/* Worse/better: */
err=g(x,&y);
if( err ) {
    <handle error on g call>
}
...f(y)...
The person who writes this code has to remember to check for the error; the person who reads it has to visually link up the data flow by connecting y's def and use points. This is the data-flow equivalent of gotos, with equivalent effects on program clarity.
In Scheme, none of this is necessary. We simply write
(f (g x))   ; Scheme
Easy to write; easy to read and understand. Figure 6 shows an example of this problem, where the task is determining whether a given file is owned by root.
(if (zero? (file-owner fname)) ...)   ; Scheme
/* C */
if( stat(fname,&statbuf) ) {
    perror(progname);
    exit(-1);
}
if( statbuf.st_uid == 0 ) ...
Figure 6: Why we program with Scheme.
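The C side of figure 6 can be made to compose only by hiding both the out-parameter and the error check inside a helper that returns the value directly. A minimal sketch, assuming it is acceptable to mimic a default handler by printing a message and exiting on error; file_owner_or_die is a hypothetical name, loosely modelled on scsh's file-owner.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: return the owner uid of fname. Errors are not
   returned as codes; as a default handler would, we report and exit. */
static uid_t file_owner_or_die(const char *fname) {
    struct stat statbuf;              /* the out-parameter stays hidden in here */
    if (stat(fname, &statbuf) != 0) {
        perror(fname);
        exit(EXIT_FAILURE);
    }
    return statbuf.st_uid;
}
```

With it, the test collapses to if (file_owner_or_die(fname) == 0) ..., but only because the helper gives the caller no way to handle the error itself; C cannot offer both composition and an overridable handler without extra machinery.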
10.4 Strings
Having a true string datatype turns out to be surprisingly valuable in making systems programs simpler and more robust. The programmer never has to expend effort to make sure that a string length kept in a variable matches the actual length of the string; never has to expend effort wondering how it will affect his program if a nul byte gets stored into his string. This is a minor feature, but like garbage collection, it eliminates a whole class of common C programming bugs.
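To see what is being avoided: a C string carries no length, so strlen() must search for the first nul byte, and anything after an embedded nul becomes invisible. The sketch below pairs the length with the bytes in a hypothetical counted_str type, roughly what a true string datatype gives the programmer for free; all names are ours, for illustration only.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counted string: the length travels with the bytes, so the two
   cannot disagree, and a nul byte is just another character. */
struct counted_str {
    size_t len;
    const char *bytes;
};

/* Wrap a buffer whose true length is known. */
static struct counted_str cs_make(const char *bytes, size_t len) {
    struct counted_str s = { len, bytes };
    return s;
}

/* Wrap a nul-terminated C string: strlen() stops at the first nul byte,
   so anything after an embedded nul is silently lost. */
static struct counted_str cs_from_cstr(const char *cstr) {
    return cs_make(cstr, strlen(cstr));
}

/* Count occurrences of ch, examining every byte, including nul bytes. */
static size_t cs_count(struct counted_str s, char ch) {
    size_t n = 0;
    for (size_t i = 0; i < s.len; i++)
        if (s.bytes[i] == ch)
            n++;
    return n;
}
```

For the five-byte buffer "ab\0ab", cs_make(data, 5) sees both b characters, while cs_from_cstr(data) reports a length of 2 and sees only one.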
10.5 Higher-order procedures
Scheme's first-class procedures are very convenient for systems programming. Scsh uses them to parameterise the action of procedures that create Unix processes. The ability to package up an arbitrary computation as a thunk turns out to be as useful in the domain of Unix processes as it is in the domain of Scheme computation. Being able to pass computations in this way to the procedures that create Unix processes, such as fork, fork/pipe and run/port* is a powerful programming technique.
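C has no closures, but a function pointer paired with an explicit environment pointer is a common stand-in for a thunk. The sketch below is a rough analogue of passing a thunk to scsh's fork; fork_thunk and add_pair are hypothetical names, and returning the thunk's result as the child's exit status is our own simplification (only the low 8 bits survive).

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* A "thunk" in C: a function pointer plus an explicit environment. */
typedef int (*thunk_fn)(void *env);

/* Hypothetical analogue of (fork thunk): run thunk(env) in a child process,
   which exits with the thunk's return value. Returns the child's pid in the
   parent, or -1 if fork() fails. */
static pid_t fork_thunk(thunk_fn thunk, void *env) {
    pid_t pid = fork();
    if (pid == 0)
        _exit(thunk(env) & 0xff);   /* child: run the computation, then exit */
    return pid;                     /* parent, or -1 on failure */
}

/* Example computation: the sum of two integers held in env. */
static int add_pair(void *env) {
    int *pair = env;
    return pair[0] + pair[1];
}
```

A parent would call fork_thunk(add_pair, pair) and collect the result with waitpid(); with pair = {3, 4}, WEXITSTATUS of the child's status is 7. The explicit env parameter is exactly the plumbing that Scheme's closures make unnecessary.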
First-class procedures allow us to parameterise port readers over different parsers, with the (port->list parser port) procedure. This is the essential Scheme ability to capture abstraction in a procedure definition. If the user wants to read a list of objects written in some syntax from an I/O source, he need only write a parser capable of parsing a single object. The port->list procedure can work with the user's parser as easily as it works with read or read-line.
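A C analogue needs a function pointer and untyped storage where Scheme needs only a procedure argument. In the sketch below, port_to_array and parse_int are hypothetical names: a parser reads one object and reports whether it succeeded, and the driver collects objects into a growing array until the parser stops.

```c
#include <stdio.h>
#include <stdlib.h>

/* A parser reads one object from `in` into *out; it returns 1 on success and
   0 at end of input (or on malformed input, which also ends the list). */
typedef int (*parser_fn)(FILE *in, void *out);

/* Hypothetical analogue of (port->list parser port): call parse until it
   stops, collecting each elem_size-byte object into a growing array.
   Returns the array (caller frees) and stores the count in *count. */
static void *port_to_array(parser_fn parse, FILE *in, size_t elem_size, size_t *count) {
    size_t cap = 8, n = 0;
    char *items = malloc(cap * elem_size);
    if (items == NULL)
        return NULL;
    for (;;) {
        if (n == cap) {                          /* grow before parsing into slot n */
            char *bigger = realloc(items, 2 * cap * elem_size);
            if (bigger == NULL) {
                free(items);
                return NULL;
            }
            items = bigger;
            cap *= 2;
        }
        if (!parse(in, items + n * elem_size))
            break;
        n++;
    }
    *count = n;
    return items;
}

/* One possible parser: a single whitespace-separated decimal integer. */
static int parse_int(FILE *in, void *out) {
    return fscanf(in, "%d", (int *)out) == 1;
}
```

Reading the text "10 20\n30" with parse_int yields the three-element array {10, 20, 30}; any other single-object parser can be passed in its place.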
First-class procedures also allow iterators such as for-each and filter to loop over lists of data. For example, to build the list of all my files in /usr/tmp, I write:
(filter (lambda (f) (= (file-owner f) (user-uid)))
        (glob "/usr/tmp/*"))
To delete every C file in my directory, I write:
(for-each delete-file (glob "*.c"))
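The same pattern in C needs POSIX glob(3), function pointers for the predicate and the action, and explicit cleanup. A sketch under our own assumptions: for_each_matching, owned_by_me and delete_path are hypothetical names, owned_by_me compares against the real user ID much as the Scheme example compares against (user-uid), and delete_path ignores errors for brevity.

```c
#include <glob.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

typedef int (*pred_fn)(const char *path);
typedef void (*action_fn)(const char *path);

/* Hypothetical filter + for-each: apply act to every path matching pattern
   for which keep(path) is true (a NULL keep accepts everything). Returns the
   number of paths acted on, or -1 if glob() fails other than "no match". */
static int for_each_matching(const char *pattern, pred_fn keep, action_fn act) {
    glob_t g;
    int rc = glob(pattern, 0, NULL, &g);
    if (rc == GLOB_NOMATCH)
        return 0;
    if (rc != 0)
        return -1;
    int done = 0;
    for (size_t i = 0; i < g.gl_pathc; i++) {
        if (keep == NULL || keep(g.gl_pathv[i])) {
            act(g.gl_pathv[i]);
            done++;
        }
    }
    globfree(&g);                     /* the cleanup Scheme's GC does for us */
    return done;
}

/* Predicate: is the file owned by the current (real) user? */
static int owned_by_me(const char *path) {
    struct stat sb;
    return stat(path, &sb) == 0 && sb.st_uid == getuid();
}

/* Action: delete the file; errors are ignored in this sketch. */
static void delete_path(const char *path) {
    (void)remove(path);
}
```

The second Scheme expression corresponds roughly to for_each_matching("*.c", NULL, delete_path); the first would need yet another action that appends each path to a caller-managed list.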
10.6 S-expression syntax and backquote
In general, Scheme's s-expression syntax is much, much simpler to understand and use than most shells' complex syntax, with their embedded pattern matching, variable expansion, alias substitution, and multiple rounds of parsing. This costs scsh's notation some compactness, but gains comprehensibility.
Recursive embeddings and balls of mud
Scsh's ability to cover a high-level/low-level spectrum of expressiveness is a function of its uniform s-expression notational framework. Since scsh's process notation is embedded within Scheme, and Scheme escapes are embedded within the process notation, the programmer can easily switch back and forth as needed, using the simple notation where possible, and escaping to system calls and general Scheme where necessary. This recursive embedding is what gives scsh its broad-spectrum coverage of systems functionality not available to either shells or traditional systems programming languages; it is essentially related to the "ball of mud" extensibility of the Lisp and Scheme family of languages.
Backquote and reliable argument lists
Scsh's use of implicit backquoting in the process notation is a particularly nice feature of the s-expression syntax. Most Unix shells provide the user with a way to take a computed string, split it into pieces, and pass them as arguments to a program. This usually requires the introduction of some sort of $IFS separator variable to control how the string is parsed into separate arguments. This is error-prone whenever a single argument might contain a space or other parser delimiter. Worse than error-prone, $IFS rescanning is in fact the source of a famous security hole in Unix [Reeds].
In scsh, data are used to construct argument lists using the implicit backquote feature of process forms, e.g.:
(run (cc ,file -o ,binary ,@flags))
Backquote completely avoids the parsing issue because it deals with pre-parsed data: it constructs expressions from lists, not character strings. When the programmer computes a list of arguments, he has complete confidence that they will be passed to the program exactly as is, without running the risk of being re-parsed by the shell.
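In C, the equivalent discipline is to build the argv array directly and call execvp(), never passing a command string through a shell. The sketch below mirrors the cc example above; make_cc_argv and run_argv are hypothetical names, and error handling is kept to a minimum.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper mirroring (run (cc ,file -o ,binary ,@flags)): build a
   NULL-terminated argv from already-separate strings. Each element becomes
   exactly one argument; nothing is ever re-split on spaces. */
static char **make_cc_argv(char *file, char *binary, char **flags, size_t nflags) {
    char **argv = malloc((4 + nflags + 1) * sizeof *argv);
    if (argv == NULL)
        return NULL;
    size_t i = 0;
    argv[i++] = "cc";
    argv[i++] = file;                 /* ,file   : one argument, spaces and all */
    argv[i++] = "-o";
    argv[i++] = binary;               /* ,binary */
    for (size_t j = 0; j < nflags; j++)
        argv[i++] = flags[j];         /* ,@flags : spliced in element by element */
    argv[i] = NULL;
    return argv;
}

/* Run argv[0] with exactly these arguments (no shell, no $IFS) and return
   its exit status, or -1 on failure. */
static int run_argv(char **argv) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                   /* only reached if exec failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A file name such as "my file.c" reaches cc as a single argument, and each element of flags becomes one argument of its own, just as ,@flags splices a list.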