6  I/O

Perhaps the most difficult part of the design of scsh was the integration of Scheme ports and Unix file descriptors. In Unix, open files are part of the process state, and are referenced by small integers called file descriptors. Open file descriptors are the fundamental way i/o redirections are passed to subprocesses, since file descriptors are preserved across fork() and exec() calls.
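To make the fork()/exec() point concrete, here is a minimal Python sketch (not scsh code) of a redirection passed to a subprocess through an inherited file descriptor. The function name run_with_stdout_redirected is hypothetical, and the sketch assumes a Unix system.

```python
import os

def run_with_stdout_redirected(argv):
    """Run argv in a child whose descriptor 1 is a pipe; return what it wrote.

    Hypothetical helper for illustration only; assumes a Unix system.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: install the pipe's write end as descriptor 1, then exec.
        try:
            os.close(read_fd)
            if write_fd != 1:
                os.dup2(write_fd, 1)   # descriptor 1 now names the pipe
                os.close(write_fd)
            os.execvp(argv[0], argv)   # the new program inherits descriptor 1
        finally:
            os._exit(127)              # reached only if exec failed
    # Parent: close our copy of the write end so we see EOF when the child exits.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return b"".join(chunks)
```

The exec'd program never learns that a pipe is involved; it simply writes to descriptor 1, which is why the numeric value of the descriptor matters.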

Scheme, on the other hand, uses ports for specifying i/o sources. Ports are anonymous, garbage-collected Scheme objects, not integers. When a port is collected, it is also closed. Because file descriptors are just integers, it's impossible to garbage collect them -- in order to close file descriptor 3, you must prove that the process will never again pass a 3 as a file descriptor to a system call doing i/o, and that it will never exec() a program that will refer to file descriptor 3.

This is difficult at best.

If a Scheme program only used Scheme ports, and never directly used file descriptors, this would not be a problem. But Scheme code must descend to the file-descriptor level in at least two circumstances:

This causes problems. Suppose we have a Scheme port constructed on top of file descriptor 2. We intend to fork off a C program that will inherit this file descriptor. If we drop references to the port, the garbage collector may prematurely close file descriptor 2 before we exec the C program.
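The same hazard can be reproduced outside Scheme. The following sketch uses a Python file object as a stand-in for a port; it is an analogy, not scsh code, and it assumes CPython, where dropping the last reference closes the file promptly. The names fd_is_open and drop_port_and_check are hypothetical.

```python
import gc
import os
import tempfile
import warnings

def fd_is_open(fd):
    """Return True if fd currently names an open file in this process."""
    try:
        os.fstat(fd)
        return True
    except OSError:
        return False

def drop_port_and_check():
    """Open a file object, remember its descriptor, then drop the object."""
    with tempfile.NamedTemporaryFile() as tmp:
        port = open(tmp.name, "rb")      # stand-in for a Scheme port
        fd = port.fileno()               # the integer a subprocess would inherit
        open_before = fd_is_open(fd)
        with warnings.catch_warnings():
            # CPython warns about unclosed files; silence that for the demo.
            warnings.simplefilter("ignore", ResourceWarning)
            del port                     # drop the only reference to the port
            gc.collect()
        open_after = fd_is_open(fd)      # the descriptor was closed behind our back
    return open_before, open_after
```

A program that held on to the integer, intending to pass it to a child, would now be holding a dangling descriptor.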

A second difficulty, stemming from the anonymity of ports and the explicit naming of file descriptors, appears when the user explicitly manipulates file descriptors, as is required by Unix. For example, when a file port is opened in Scheme, the underlying run-time Scheme kernel must open a file and allocate an integer file descriptor. When the user subsequently manipulates particular file descriptors, perhaps preparatory to executing some Unix subprocess, the port's underlying file descriptor could be silently redirected to some new file.
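A small Python sketch of this silent redirection follows, again using a file object in place of a port. The function name and file names are hypothetical. In a real program the collision would be accidental; here the sketch forces it by asking the file object for its descriptor.

```python
import os
import tempfile

def silently_redirected_write():
    """Show a write landing in the wrong file after a dup2 onto the port's fd."""
    with tempfile.TemporaryDirectory() as tmpdir:
        intended = os.path.join(tmpdir, "intended.txt")
        other = os.path.join(tmpdir, "other.txt")

        port = open(intended, "w")       # the runtime picks some descriptor
        port_fd = port.fileno()          # forced collision, for the demo only

        # The user redirects "descriptor N", unaware that the port lives there.
        other_fd = os.open(other, os.O_WRONLY | os.O_CREAT, 0o600)
        os.dup2(other_fd, port_fd)
        os.close(other_fd)

        port.write("hello")              # the port object notices nothing
        port.close()

        with open(intended) as f:
            intended_text = f.read()
        with open(other) as f:
            other_text = f.read()
    return intended_text, other_text
```

The data ends up in other.txt and intended.txt stays empty, with no error raised anywhere.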

Scsh's Unix i/o interface is intended to fix this and other problems arising from the mismatch between ports and file descriptors. The fundamental principle is that in scsh, most ports are attached to files, not to particular file descriptors. When the user does an i/o redirection (e.g., with dup2()) that must allocate a particular file descriptor fd, there is a chance that fd has already been inadvertently allocated to a port by a prior operation (e.g., an open-input-file call). If so, fd's original port will be shifted to some new file descriptor with a dup(fd) operation, freeing up fd for use. The port machinery is allowed to do this as it does not in general reveal which file descriptors are allocated to particular Scheme ports. Not revealing the particular file descriptors allocated to Scheme ports allows the system two important freedoms:

Users can explicitly manipulate file descriptors, if so desired. In this case, the associated ports are marked by the run time as ``revealed,'' and are no longer subject to automatic collection. The machinery for handling this is carefully marked in the documentation and, with some simple invariants in mind, follows the user's intuitions. This facility preserves the transparent close-on-collect property for file ports that are used in straightforward ways, yet allows access to the underlying Unix substrate without interference from the garbage collector. This is essential, since shell programming absolutely requires access to the Unix file descriptors: their numerical values are a critical part of the process interface.
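The ``revealed'' idea can be modelled in a few lines of Python. This is a toy model under stated assumptions, not scsh's implementation: the class FilePort, its reveal_fd method, and the boolean flag are all hypothetical, and the sketch relies on CPython running finalizers as soon as the last reference is dropped.

```python
import gc
import os
import weakref

def _close_if_unrevealed(state):
    """Finalizer: close the descriptor only if it was never revealed."""
    if not state["revealed"]:
        os.close(state["fd"])

class FilePort:
    """Toy port that closes its descriptor on collection unless revealed."""
    def __init__(self, fd):
        # State lives in a separate dict so the finalizer does not keep
        # the port itself alive.
        self._state = {"fd": fd, "revealed": False}
        weakref.finalize(self, _close_if_unrevealed, self._state)

    def reveal_fd(self):
        """Hand out the descriptor; from now on the caller must manage it."""
        self._state["revealed"] = True
        return self._state["fd"]

def fd_is_open(fd):
    try:
        os.fstat(fd)
        return True
    except OSError:
        return False

def demo():
    hidden_fd = os.open(os.devnull, os.O_RDONLY)
    hidden = FilePort(hidden_fd)
    del hidden                           # never revealed: closed on collection
    gc.collect()
    hidden_still_open = fd_is_open(hidden_fd)

    shown = FilePort(os.open(os.devnull, os.O_RDONLY))
    shown_fd = shown.reveal_fd()         # revealed: survives collection
    del shown
    gc.collect()
    shown_still_open = fd_is_open(shown_fd)
    os.close(shown_fd)                   # the caller's responsibility now
    return hidden_still_open, shown_still_open
```

The single flag is a simplification chosen to keep the sketch short; the point is only that revealing a descriptor transfers responsibility for closing it from the collector to the user.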

Under normal circumstances, all this machinery just works behind the scenes to keep things straightened out. The only time the user has to think about it is when he starts accessing file descriptors from ports, which he should almost never have to do. If a user starts asking what file descriptors have been allocated to what ports, he has to take responsibility for managing this information.
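The behind-the-scenes shuffling described earlier, where a port is moved off a descriptor the user wants, can be sketched as follows. This is a toy Python model, not scsh's implementation; PortTable, Port, evict, and redirect are hypothetical names.

```python
import os
import tempfile

class Port:
    """Toy port: it knows its descriptor but does not advertise it."""
    def __init__(self, fd):
        self._fd = fd

    def read(self, n):
        return os.read(self._fd, n)

    def close(self):
        os.close(self._fd)

class PortTable:
    """Tracks which descriptors are privately held by ports."""
    def __init__(self):
        self._by_fd = {}

    def open_input(self, path):
        fd = os.open(path, os.O_RDONLY)
        port = Port(fd)
        self._by_fd[fd] = port
        return port

    def evict(self, target_fd):
        """If a port sits on target_fd, shift it to a fresh descriptor."""
        port = self._by_fd.pop(target_fd, None)
        if port is not None:
            new_fd = os.dup(target_fd)   # same open file, new number
            os.close(target_fd)          # target_fd is now free
            port._fd = new_fd
            self._by_fd[new_fd] = port

    def redirect(self, source_fd, target_fd):
        """dup2 with the port-shuffling step performed first."""
        self.evict(target_fd)
        os.dup2(source_fd, target_fd)

def demo():
    with tempfile.TemporaryDirectory() as tmpdir:
        a = os.path.join(tmpdir, "a.txt")
        b = os.path.join(tmpdir, "b.txt")
        with open(a, "w") as f:
            f.write("port data")
        with open(b, "w") as f:
            f.write("new data")

        table = PortTable()
        port = table.open_input(a)
        wanted_fd = port._fd             # the user happens to want this number
        source_fd = os.open(b, os.O_RDONLY)
        table.redirect(source_fd, wanted_fd)

        moved = port._fd != wanted_fd
        port_bytes = port.read(100)      # the port still reads its own file
        fd_bytes = os.read(wanted_fd, 100)

        os.close(source_fd)
        os.close(wanted_fd)
        port.close()
    return moved, port_bytes, fd_bytes
```

Because the port never told anyone which descriptor it held, moving it is invisible to the program: the port keeps reading a.txt while the requested descriptor now refers to b.txt.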

Further details on the port mechanisms in scsh are beyond the scope of this note; for more information, see the reference manual [refman].