PCLSRing: Keeping Process State Modular Alan Bawden A_b_s_t_r_a_c_t_: This paper describes the "PCLSRing" feature of the Incompatible Time Sharing (ITS) operating system. PCLSRing permits a process to access the state of another process in a modular fashion. The benefits of PCLSRing are explained, its implementation is described in detail, and the results of some simple performance metering are presented. 1. Introduction The Incompatible Time Sharing (ITS) operating system was originally written for the PDP-6 at the MIT Artificial Intelligence Laboratory during the late 1960's. At MIT during the 1970's ITS ran on three KA-10's and one KL-10, providing service to researchers in the AI Laboratory and in MIT's Laboratory for Computer Science. Many well-known PDP-10 programs, such as the Emacs editor, the MacLisp dialect of Lisp, the Macsyma symbolic algebra system, and the Midas assembler, were originally developed under ITS. More than twenty years later ITS still has a small, but devoted, user community. It now runs on a handful of KS-10 systems around the world. The designers of ITS were not primarily interested in doing research in operating systems. They held a pragmatic view of ITS as a tool for enabling the AI Lab's small user community to get the most out of their PDP-6/10 and its peripherals (robot arms, TV cameras, etc.). This is not to say that the designers were ignorant of the state of the art in operating systems, but they were unlikely to publish papers about their experiences. As a result, many of their pioneering ideas remain largely unknown outside the ITS user community. Among other features, ITS boasts a convenient symbolic system call mechanism that allows calls to be invoked by name and to pass values in arbitrary locations, a flexible system for delivering software interrupts to processes, device-independent display terminal support, a process organization that allows a single command processor to control multiple sub-processes, and a powerful mechanism for implementing devices in software that enabled ITS to have what was probably the first network file system. Many of these features have since been adopted or rediscovered by *DRAFT* DSK:BAWDEN;PCLSR 200 Page 2 other operating systems, but one of ITS's major design principles, called "PCLSRing", has never found its way into any other operating system. It is my belief that PCLSRing is an important idea that does not deserve to be neglected merely because it has never been adequately explained in print. Therefore, in this paper I have endeavored to explain what PCLSRing is, what its benefits are for both the users and the implementors of ITS, and how it is implemented. The reader should not expect to completely understand the description of the implementation of PCLSRing on first reading. Many of the points are subtle, and can probably only be fully appreciated by someone who tries to implement PCLSRing for himself. (If any reader makes this attempt, I would like to hear from him!) 2. The PCLSRing Modularity Principle Under any timesharing operating system there will be occasions when a process must access the state of another process. A process may need to start, stop, debug, load, dump, create, or destroy another process. There is also one occasion when a process must access its own state: an interrupt handler needs access to the state of the running process prior to the arrival of the interrupt, so that the process may continue after the interrupt has been dealt with. "PCLSRing" is a mechanism the ITS operating system uses to enforce a kind of modularity when a process must access the state of another process. The modularity principle is very simple: no process ever catches another process (including itself) in the act of executing a system call. System calls thus behave as if they were directly implemented in hardware. A process can no more catch another process in the middle of deleting a file than it can catch another process in the middle of a multiply instruction. Another way to state the PCLSRing modularity principle is to say that when a process looks at the program counter of another process, it must always see a User mode PC, never an Exec mode PC. When one process wants to access the state of another process, and the target process happens to be executing in Exec mode, then something must be done in order to put the target process into User mode. Either the target process must be backed out of the system call, leaving the User mode PC pointing at the call, or it must be allowed to finish, leaving the User mode PC pointing just after the call. This act of getting a process into User mode as quickly as possible is called "PCLSRing" (pronounced "pee-see-loo-zer-ing") a process. We say that one process "PCLSRed" ("pee-see-loo-zerd") another process, or we speak of the need to "PCLSR" ("pee-see-loo-zer") a process before giving it an interrupt. Just as certain PDP-10 instructions cannot be guaranteed to run to *DRAFT* DSK:BAWDEN;PCLSR 200 Page 3 completion, either because they might be interrupted or because they run the risk of taking a page fault (e.g. BLT), certain ITS system calls cannot always run to completion before the executing process must be PCLSRed. A prime example is the SIOT system call, which is used to transfer a string of bytes to or from an I/O device. Suppose a process is using SIOT to send a long string of characters to a slow device (such as a terminal) when an interrupt arrives. It is impossible to take back the characters that have already reached the device, and there may be an indefinite delay before the rest of the characters can be output. The PCLSRing modularity principle requires that the process be maneuvered out of the system call and into User mode as quickly as possible, else the interrupt cannot be promptly delivered. How can the state of the process be captured in a User mode PC given that the process really is halfway through outputting the string of characters? The solution is to employ the same trick that the PDP-10 hardware uses when it is necessary to interrupt an instruction, such as BLT, that has only partly performed its task. The arguments originally passed to the system call are updated in place to reflect any work already accomplished, and the PC is left pointing at the system call, so that when the system call is restarted it will proceed from where it left off. In the case of the SIOT system call, the locations in memory from which the byte count and the byte pointer arguments were originally fetched will be overwritten with a new count and pointer that address the bytes that remain to be output. All ITS system calls that might be interrupted with their tasks partially accomplished are designed to update their arguments in this fashion. This requirement might be expected to introduce some system calls with baroque calling conventions, but in fact the vast majority of system calls have no need of ever updating any of their arguments, and those few that do are generally able to do so in a natural way. (One rarely used system call does require an additional argument that should initially be supplied as zero and is used as a "state word" to keep track of the progress of the call. In this obscure case, the documentation explains what the non-zero values of this argument mean, although no program is known to make use of this information.) 3. The Benefits of PCLSRing The PCLSRing modularity principle is easy to state and easy for the user to understand, but it is tricky to implement. The majority of this paper is devoted to explaining the techniques used in the ITS implementation of PCLSRing. Before going into that, however, let us consider why PCLSRing is worth all the trouble it entails. I will present examples of the benefits that PCLSRing has for both the users and the implementors of ITS. Users are chiefly conscious of the effects of PCLSRing when they write *DRAFT* DSK:BAWDEN;PCLSR 200 Page 4 interrupt handlers. The main program level PC that is passed to an interrupt handler is always a User mode PC. By examining this PC and the contents of memory, the interrupt handler can determine precisely what the state of the interrupted task was. If the interrupt handler needs to know, for example, whether or not the main program level had successfully deleted a file before the user typed an interrupt character, it can simply look to see if the PC points before or after the DELETE system call. Any question the interrupt handler may have about the state of the computation when the interrupt occurred can be answered in terms of a simple machine model in which all system calls behave as if they were hardware instructions. The effects of eccentric interrupt handlers that dismiss interrupts by returning to a PC other than the one interrupted from, or that alter the main program level accumulators, are straightforward to predict. It is, for example, easy to construct a simple interrupt driven scheduler that multiplexes one process into multiple tasks. For the implementors of ITS, PCLSRing is a powerful aid in managing the interactions between processes. This can be illustrated by considering the example of the DTTY system call. ITS supports a notion of "ownership" of the user's terminal. As various processes run on the user's behalf (editor, compiler, debugger, etc.), they pass ownership of the terminal back and forth. A critical moment occurs when one process uses DTTY to take ownership of the terminal back from a process that it had previously given it to. When the acquiring process wrests control of the terminal away from its current owner, how can it know it won't catch the owner in the middle of actually using the terminal, potentially leaving some per-terminal data structures in an inconsistent state? The answer is for the acquiring process to start by PCLSRing the terminal's current owner. This will insure that the current owner is not in the middle of any system call. Therefore all terminal-specific data structures will be in a known state, and it will be safe to transfer ownership of the terminal. This example shows most clearly why PCLSRing is a tool for promoting m_o_d_u_l_a_r_i_t_y_. PCLSRing can be used to hide the implementation details of a system call that modifies the state of some data structure. Only documented state transitions are ever exposed to other processes. 4. The Basics of ITS In order to understand the techniques used to implement PCLSRing, it is necessary to have some understanding of the basic structure of ITS. This section introduces the players. *DRAFT* DSK:BAWDEN;PCLSR 200 Page 5 4.1 Processes and "User" Variables The basic job of any timesharing operating system is to share a single processor among set of independent processes. An ITS process has a name, some I/O channels, a program counter, sixteen accumulators, a page table, an interrupt system, and assorted other properties. These properties are all held in a collection of per-process variables that ITS keeps in its own address space. At any particular moment a process can be running in either User mode or Exec mode (as determined by a bit in the left half of the program counter). In User mode, each process has its own page table, and can configure its User mode address space to suit itself. In Exec mode, all processes share the same page table, and thus all execute in the same Exec mode address space. Hardware interrupt service routines also execute in this single Exec mode address space. An important property of the Exec mode address space is that its pages are never swapped out to secondary storage. All system code and data structures, including the per-process variables, are stored in Exec address space, and are always available without risk of causing a page fault. This convenient property is possible only because the implementors have taken great pains to minimize the size of ITS and its data structures. ITS processes are variously called "processes", "jobs", "users", and "procedures". To avoid confusion, this paper consistently uses the word "process", with one exception: the set of per-process variables that ITS maintains are referred to as "user variables", rather than the more consistent "process variables", because to do otherwise would violate tradition. (Knowing that processes are sometimes called "users" will also help the reader understand why there are so many U's in the names of various routines and variables!) Typical user variables are the UNAME and JNAME; these contain two SIXBIT words that are unique to this process, and together serve as the process's name. Additional user variables will be introduced as they are needed. For easy reference, a dictionary of all user variables mentioned anywhere in the paper, each with a brief description of its function, is given in an appendix. This appendix also reveals some simplifications made by the main text. 4.2 System Calls As in any other PDP-10 operating system, ITS system calls are performed by executing instructions that cause a trap to Exec mode. The Exec mode trap handler saves the User mode PC in the SUUOH user variable, saves the contents of the User mode accumulators in the UUOACS user variable (an *DRAFT* DSK:BAWDEN;PCLSR 200 Page 6 array of sixteen words), examines the instruction that caused the trap, and dispatches to the appropriate system call. Each system call is responsible for obtaining its own arguments from the User mode address space, performing its particular task, and writing any return values back into User mode memory. (Many system calls make use of a flexible and convenient facility ITS provides for processing arguments and returning values.) All system calls return to a common point in order to return to User mode. This common return code checks for a number of possible error conditions (such as returning to User mode while still holding a lock or with hardware interrupts deferred), performs some general cleanup, restores the accumulators from UUOACS, and returns to the User mode PC saved in SUUOH. 4.3 Scheduler The ITS Scheduler runs at the lowest priority of the PDP-10's seven hardware interrupt levels. This interrupt level is commonly called "clock level" because a hardware clock interrupt is signaled there periodically. (Actually, several minor interrupts are also signaled at clock level, and a variety of periodic overhead tasks are performed there as well, but in this paper the only clock level activity that will concern us is the Scheduler.) The Scheduler's job is to switch the processor back and forth between those processes that are currently runnable. When a process is not actually running on the processor the Scheduler keeps its PC in the UPC user variable and keeps its accumulators in the AC0S user variable (an array of sixteen words). When the Scheduler examines a process it may discover that the process is either "stopped", "blocked", or "running". If a process is stopped, then some external agent has decided that this process must not run, so the Scheduler can do nothing with it. If a process is blocked, it is waiting for something, so the Scheduler must check for the awaited condition to see if the process can be unblocked and allowed to run. If a process is running, then it is capable of making immediate progress, so the Scheduler can allow it to use the processor. The Scheduler uses the user variables USTP and FLSINS to keep track of the runnability of a process. If a process's USTP is non-zero, then the process is stopped. The left half contains various bits corresponding to specific reasons why the process should not run (for example, one bit is set when the process receives a fatal interrupt). The right half contains a count of the number of other processes that are currently keeping this process stopped. As each such process finishes, it decrements this variable so that the Scheduler will know when everybody is finished and it is safe to run the process once again. *DRAFT* DSK:BAWDEN;PCLSR 200 Page 7 If the FLSINS user variable is non-zero, then the process is blocked waiting for some condition, such as the typing of a character on the user's terminal, the passage of a given amount of time, or the arrival of a page to satisfy a page fault. In such a case FLSINS contains an instruction that the Scheduler can execute in order to determine if the awaited condition has come to pass. Such an instruction should skip if the process is no longer blocked. At any given moment a process can be in one of five possible states: (1) Stopped. In this case USTP is non-zero and UPC contains a User mode PC. (FLSINS is zero, which isn't immediately relevant, but does insure that the process will run as soon as its USTP is cleared.) (2) Blocked in Exec mode. USTP is zero, FLSINS contains an instruction that will skip when the process is no longer blocked, and UPC contains an Exec mode PC. (3) Blocked in User mode. USTP is zero, FLSINS contains an instruction that will skip when the process is no longer blocked, and UPC contains a User mode PC. The only possible reason for a process to be blocked in User mode is that it is waiting for a page to satisfy a page fault; all other cases of blocking occur after a process has executed a system call and entered Exec mode. (4) Running in Exec mode. USTP is zero, FLSINS is zero, and UPC contains an Exec mode PC. (5) Running in User mode. USTP is zero, FLSINS is zero, and UPC contains a User mode PC. Note that a stopped process m_u_s_t_ be in User mode. Once they get a chance to run, processes have ways of influencing the behavior of the Scheduler. For example, a process can temporarily mask clock-level interrupts, which insures that it has exclusive use of the processor while it performs some critical action that the Scheduler must not interrupt. Processes can also intentionally yield the processor and allow the Scheduler to run other processes; a process will do this when it first becomes blocked. 4.4 Software Interrupts ITS has a sophisticated software interrupt system that it uses to inform processes of a variety of conditions that might require attention. When a hardware device driver or another process wishes to interrupt a given process, it merely sets a bit in one of two user variables, PIRQC or IFPIR. It is then up to the Scheduler to actually deliver the interrupt. *DRAFT* DSK:BAWDEN;PCLSR 200 Page 8 Whenever the Scheduler is considering running a process it looks at its PIRQC and IFPIR variables as well as its USTP and FLSINS. If it discovers that an interrupt is pending, it tries to manipulate the process into its interrupt handler. This is a somewhat tricky procedure, because the required manipulations can cause page faults! Fortunately, only the first step of the procedure concerns us in this paper: the Scheduler first must PCLSR the process so that the process's interrupt handler sees a User mode PC. 5. The Implementation of PCLSRing Now that we have introduced the players, we can proceed to examine the implementation of PCLSRing in detail. When it becomes necessary to PCLSR some target process, we will find it in one of the five possible states described in the introduction to the Scheduler above. Considering each state in turn, what needs to be done in order to get the target process into User mode as quickly as possible? (1) If the process is stopped, it is already in User mode and nothing needs to be done. (2) If the process is blocked in Exec mode, then we must force it back to the start of whatever system call brought it into Exec mode. First we clean up what the process was doing: the LSWPR user variable, described below, contains a linked list of cleanup actions that we perform. Then we copy the saved User mode accumulators from UUOACS into AC0S, and we pick up the User mode return PC from SUUOH, decrement it so that the process will start the system call from the beginning, and put it in UPC. Finally we clear the process's FLSINS. (3) If the process is blocked in User mode, it must be waiting for a page to satisfy a page fault. Fortunately, nothing is harmed if the process simply forgets this fact, because as soon as it starts running it will either immediately take the same fault again or the page will have arrived in the interim. So in this case we simply clear the process's FLSINS. (4) If the process is running in Exec mode, we know that it cannot remain in this state for long. If it is not able to quickly finish its task and return to User mode, this can only be because it is about to become blocked. Rather than bother it immediately, we set a flag (the SUEXIT user variable) that tells the process that it should PCLSR itself either the next time it tries to block or when it tries to return to User mode. (5) If the process is running in User mode, nothing needs to be done. *DRAFT* DSK:BAWDEN;PCLSR 200 Page 9 The decision made in case 4 has some important corollaries. First, it means that when a PCLSR is requested, the requester cannot be assured of immediate success. Second, it means that a process can run in Exec mode without risk of being PCLSRed, provided it doesn't do anything that might cause it to become blocked. In the sections that follow we explore the consequences of the algorithm given above. The algorithm itself is available as a routine named "PCLSR". 5.1 Rules for System Calls PCLSRing cannot work unless all system calls are written with attention to the fact that they might be PCLSRed. This isn't all that great of a burden once the rules of the game are explained. The most obvious rule is that a system call is free to do anything it likes if it disables clock level interrupts before it starts, and cleans up after itself before reenabling them. This may seem a rather heavy-handed technique, especially given the subtlety of some of the other techniques discussed here, but for critical operations no more than a few instructions in duration it's hard to imagine anything quicker or more effective. (The basic PCLSRing routines are themselves built on the ability to temporarily turn off clock level interrupts; if ITS were ever converted to run with multiple processors, it would be necessary to rework this code. PCLSRing would be just as useful on a multiprocessor ITS, but the substrate upon which it is built would have to change.) System calls may also rely on the fact that they cannot be PCLSRed unless they attempt an operation that might cause them to become blocked. A process can mess around with its own state as much as it likes and no other process will be able to see what it is doing, as long as it is careful not to block. If another process wants to examine this one, it must first PCLSR it, but as long as this process remains running in Exec mode, the PCLSRing algorithm given above guarantees that it will continue to run, and the other process will have to wait. This guarantee is only useful if a system call can easily tell when it runs the risk of going blocked. Fortunately there are only two things that a process can do to become blocked in Exec mode. First, it can explicitly ask to go blocked, directly supplying the Scheduler with the instruction for its FLSINS user variable. Second, it can take a page fault referencing a page in User address space, an action that requires the use of a special instruction. A third possibility, taking a page fault referencing a page in Exec address space -- perhaps doing something as innocuous as fetching the next instruction -- cannot occur. This is because, as was pointed out above, pages in Exec address space are never swapped out. One additional guarantee is made concerning references to pages in a *DRAFT* DSK:BAWDEN;PCLSR 200 Page 10 process's own User address space. System calls can be certain that any page successfully referenced will remain swapped in for the duration of the execution of the system call. (More on how this is accomplished later.) A system call can thus be certain that a reference to a given User address will not cause the process to block, and therefore risk PCLSRing, if it has previously referenced that address. Some system calls take advantage of this guarantee by touching some selected User addresses at the beginning of their execution, thus insuring that they can modify the contents of those addresses later without fear of being PCLSRed at an awkward moment. For example, the SIOT system call starts by touching the locations that contain its byte pointer and byte count arguments. It can then proceed, secure in the knowledge that those locations are swapped in (as well as writable!). Most system calls cannot completely avoid the possibility of going blocked. Given the PCLSRing algorithm, every point where the execution of a system call might block is also a point where that execution might be terminated. In order to allow system calls to prepare themselves for this possibility, each process's LSWPR user variable contains a list of actions that must be performed in order to clean up if it is PCLSRed. LSWPR is usually maintained as a stack, with various system subroutines pushing and popping entries as needed. The most common cleanup handler calls for the release of a lock that the process holds, but it is possible for a system call to supply an arbitrary routine. (There are some restrictions on what such a routine can do, given the delicacy of the situations in which it will be invoked.) System calls also prepare themselves for possible PCLSRing by updating the locations in User memory that contained their original arguments to reflect whatever work they have accomplished thus far. This insures that if the call is PCLSRed and restarted, it will take up where it left off. We have already seen how users are made aware of this aspect of PCLSRing. The need to update locations in User memory in order to properly prepare for possible PCLSRing is the reason why ITS does not swap out User pages during the execution of a system call. If it did, then the very attempt to update arguments before possibly blocking might itself cause the process to block due to a page fault. If the process were to be PCLSRed at that point, there would be no record of what the system call had already accomplished, since the arguments would remain unmodified. PCLSRing has consequences that go beyond simply supplying a few cleanup handlers to protect system calls where they might block. Consider the case of a system call that must read some data from the disk. After it makes its request of the disk system, it will want to block waiting for the disk interrupt level to complete the transfer. If the executing process is PCLSRed at that point, then when the disk interrupt level finishes, there won't be any process waiting for the answer. *DRAFT* DSK:BAWDEN;PCLSR 200 Page 11 The process may come back later looking for the same information, so the interrupt level does not want to discard the data that it just read. On the other hand, the process may never come back, so something must make certain to eventually reclaim the space occupied by the data. The interface between the system call and interrupt level must be designed with these possibilities in mind. In general, certain operations that in other operating systems could be performed directly by system calls must be performed by someone else, since any code in a system call beyond a point where the call might block may never get executed. One final rule that system calls must observe in order for PCLSRing to work: system calls may not run indefinitely without presenting opportunities to be PCLSRed. For some system calls it is possible to supply arguments that require the call to loop. Those system calls periodically check the SUEXIT user variable themselves to see if someone has requested a PCLSR, and if so, they submit to the request. 5.2 Delivering Software Interrupts It is the Scheduler's job to deliver software interrupts to a process when its PIRQC or IFPIR user variable is non-zero. The first step of delivering such interrupts is to PCLSR the process so that its interrupt handler will see a User mode PC. The Scheduler discovers pending software interrupts when it is examining a process to determine if it should actually be run. If a process is not stopped (i.e., if its USTP is zero) and if it has pending interrupts, then the Scheduler will immediately try to PCLSR the process. If the PCLSR attempt fails, then the process must be running in Exec mode. In this case, the Scheduler just treats the process as it would any other runnable process. Since the process's SUEXIT now indicates that the process needs to be PCLSRed, the next time the process tries to go blocked, or return to User mode, it will PCLSR itself and return to the Scheduler. The Scheduler will then be able to deliver the interrupt. If the PCLSR attempt succeeds, then the process has been forced back to User mode. The Scheduler then proceeds to manipulate the process to put it into its interrupt handler. The subsequent steps of this procedure (while quite interesting!) are beyond the scope of this paper. 5.3 When One Process Wants to PCLSR Another The most interesting case of PCLSRing occurs when one process wants to PCLSR another process, as in the example of the DTTY system call given above. In this case the PCLSRing process will want to PCLSR its target and immediately stop it by incrementing the target's USTP. This common operation is done by calling a routine named "RPCLSR", so we often speak of *DRAFT* DSK:BAWDEN;PCLSR 200 Page 12 one process RPCLSRing ("ar-pee-see-loo-zer-ing") another. The difficulties with RPCLSRing all stem from the possibility that the target process might be running in Exec mode, and thus will be unable to be PCLSRed immediately. In this case the perpetrating process will have to go blocked (!) waiting for the target to notice the PCLSR attempt. As we are about to discover, this situation is more delicate than blocking ordinarily is. Later, when the target PCLSRs itself, it will need to be able to find the perpetrator. Thus we introduce the RPCL user variable to keep track of this situation when it occurs. When a process fails to immediately RPCLSR its target, it will store the address of the target process in its own RPCL, and its own address in the target's RPCL. The left half of RPCL is used as a flag to record which process is the target and which process the perpetrator. The algorithm for RPCLSRing is as follows: First, try to PCLSR the target process. If this succeeds, then immediately increment the target's USTP and return. Clock level interrupts are turned off during this procedure to prevent the target from getting a chance to run after it is PCLSRed, but before it is stopped. If the PCLSR attempt fails, the target's SUEXIT has been set to indicate a PCLSR attempt in progress. The perpetrator should now set up the appropriate pointers in both its own and the target's RPCL user variables, but two rare cases must be considered first: (1) The perpetrator might discover that its own RPCL indicates that it itself is the target of an RPCLSR by some third process. In this case the perpetrator immediately submits to the third party. Since the target's RPCL is unmodified, but its SUEXIT has been set, it will uselessly PCLSR itself at some point in the future and then continue to run. These useless PCLSRs are not a problem because this case rarely happens, and a few extra PCLSRs are harmless. (2) The perpetrator might discover that the target's RPCL indicates that some third process is already trying to RPCLSR the target. In this case the perpetrator simply blocks waiting for the target's RPCL to be cleared, and then starts the RPCLSR attempt again from the beginning. (It is possible that after the Scheduler sees that the target's RPCL is clear and unblocks the perpetrator, but before the perpetrator gets a chance to run, the target will start up and get back into Exec mode and become the target of yet another third-party RPCLSR attempt. A process might therefore starve forever waiting for its chance to RPCLSR someone, even though all other processes are making progress. This is not a problem in practice because it is extraordinarily unlikely. It may be more likely to snow in the *DRAFT* DSK:BAWDEN;PCLSR 200 Page 13 machine room than it is for a process to starve in this way for a noticeable amount of time.) If neither of these cases occurs, the perpetrator sets up the two RPCL user variables and blocks. Since the act of going blocked runs the risk of being PCLSRed, something must be done to clear out the RPCL pointers if the perpetrator is itself PCLSRed while it waits. This could be done by using an appropriate cleanup handler pushed on the perpetrator's LSWPR, but instead, the PCLSR routine itself always checks the RPCL of any process it is applied to for the possibility that that process is RPCLSRing someone, and cleans up appropriately. The final step in RPCLSRing occurs when the target tries to block or to return to User mode, and notices that its SUEXIT indicates that it should PCLSR itself, and that in addition its RPCL indicates that it is the target of an RPCLSR request. The target must then PCLSR itself, clear the RPCLs of both itself and the perpetrator, increment its own USTP in order to stop itself, and clear the FLSINS of the perpetrator in order to unblock it. (And it must do all this with clock level interrupts turned off, to prevent a variety of timing errors.) It is important to unblock the PCLSRing process instantly; otherwise it might be PCLSRed and the target's USTP would never get decremented. This risk is eliminated by starting the perpetrator running in Exec mode where it can't be PCLSRed until after taking appropriate precautions. In fact, the next thing many RPCLSRing processes do is add a cleanup handler to their LSWPR to decrement the target's USTP. (Of course, if there is no danger of going blocked, the perpetrator can simply perform its task and decrement the target's USTP when it is done without bothering about cleanup handlers at all.) 5.4 Swapping Recall that ITS guarantees that no User mode pages will be swapped out during the execution of a system call. This cannot be accomplished by simply granting blanket immunity to processes in Exec mode, because processes spend most of their time blocked in Exec mode (waiting for the next character from the terminal or whatever). It is therefore necessary to resort to PCLSRing blocked processes in order to find pages to swap out to secondary storage. Unfortunately, simply PCLSRing a process blocked in Exec mode backs it out of the system call and starts it running in User mode again, which causes it to immediately reexecute the call, and usually to return to the same blocked condition as before. Thus if ITS were to immediately PCLSR a process every time it wanted to swap out some of its pages, that would at least cause a lot of unnecessary activity, and might even cause the process to swap pages back in that were just swapped out. *DRAFT* DSK:BAWDEN;PCLSR 200 Page 14 Instead, when ITS wants to swap out a page belonging to a process that is blocked in Exec mode, it simply sets a bit (named "%SWPCL") in the process's USWST user variable. %SWPCL is checked by the Scheduler whenever it is about to unblock a process, and if it is set, the Scheduler will PCLSR the process instead. The process is allowed to finish waiting for whatever it was waiting for, and only then is it PCLSRed. This is just as effective at preventing the process from ever seeing its pages swapped out as immediately PCLSRing it would be. (Actually, the swapping code will PCLSR a process immediately if it has any entries on its LSWPR. Presumably the idea is to release any locks that the process is holding as soon as possible, so other processes can seize them and make progress. Perhaps because of this, the swapping code tries to avoid swapping out pages that belong to processes with LSWPR entries. The author is not prepared to judge these aspects of ITS's page swapping algorithm.) Since processes that are running in Exec mode cannot be PCLSRed, and since they might depend on keeping any of their pages swapped in, those pages must be kept immune from being swapped out. Fortunately, processes spend very little time actually running in Exec mode, so the number of pages locked down by this restriction is generally small. 5.5 PCLSR Test Mode To debug ITS system calls, it is convenient to have a way to test their behavior when PCLSRed from various points in their execution. PCLSR test mode allows the system call writer to choose a set of such test points by inserting calls to an assembler macro, named "PCLT", in the system call. Each call to PCLT is positioned immediately before a place where the call would block. Then PCLSR test mode lets the system call writer run the system call until it reaches one of the test points. There are three modes that control what happens each time the process being tested reaches one of the test points: (1) In "hold" mode, the process PCLSRs itself and stops. If the system call writer starts the process running again, it will run to the exact same test point and then PCLSR and stop again. Thus the system call writer can test the same point over and over again. (2) In "advance" mode, each time the system call writer starts the process it runs to the same test point as last time, and then sets a flag that causes the process to PCLSR and stop at the n_e_x_t_ test point, where and whenever that happens to occur. Thus the system call writer can test all of the test points in succession in a particular path through a system call. (3) "Advance but don't stop" mode is just like advance mode except that the process is not stopped each time it is PCLSRed, it just keeps *DRAFT* DSK:BAWDEN;PCLSR 200 Page 15 running and thus immediately starts the system call over. This lets the system call writer exercise a system call over and over, presumably waiting for some bug to happen. The exact way in which PCLSR test mode decides that the call has reached "the same test point" is not simply a test of the PC. This would not work very well because a particular test point might be in a subroutine that was called from several places. In fact, PCLSR test mode crawls down the stack sniffing around for return addresses, in an attempt to discover the call chain that got to the current routine. These return addresses are then hashed together. It is the resulting hash that is used to compare test points. 6. Statistics Many of the design decisions made in the ITS implementation of PCLSRing can only be justified by measuring the behavior of a running ITS. Of primary interest are the frequencies at which the various events considered above occur. The statistics below were collected on AI.AI.MIT.EDU, a KS-10, during December 1988 and January 1989. AI.AI.MIT.EDU supports a moderately small user community; typically there are 3 or 4 users logged in, each with 3 or 4 processes, plus a handful of system demons and network servers, for a total of between 15 and 20 processes. There are rarely more than 40 processes, and the system never runs up against its configured limit of 62 processes. How often do the various cases in the PCLSRing algorithm occur? During a 10 day period PCLSR was called once every 0.32 seconds. 35.9% of the time the target was in User mode already (stopped, blocked, or running). 60.3% of the time the target was blocked in Exec mode. The remaining 3.8% of the time the target was running in Exec mode and could not be immediately PCLSRed. This last case therefore happened once every 8.5 seconds. The problem case of PCLSRing, where the target cannot be immediately forced into User mode, is observed to be a relatively rare occurrence. A process that calls RPCLSR will only rarely have to block, and will even more rarely find a third party already blocked attempting to RPCLSR the same target. Also, at any given moment, only a few processes will be holding their pages locked in memory because they are running in Exec mode. We might also wonder for what reasons PCLSR is called. During a 32 hour period PCLSR was called once every 0.40 seconds. The Scheduler called PCLSR in order to deliver a software interrupt once every 0.58 seconds, accounting for 69.0% of all calls to PCLSR. A process called RPCLSR once every 1.44 seconds, accounting for 27.7% of all calls to PCLSR. The page swapping code caused a process to be PCLSRed once every 11.77 seconds, *DRAFT* DSK:BAWDEN;PCLSR 200 Page 16 accounting for the remaining 3.4% of all calls to PCLSR. Given that PCLSR fails 3.8% of the time, we can estimate that the Scheduler failed to immediately deliver an interrupt (and ran the target process instead) once every 15 seconds. We can also estimate that a process called RPCLSR and had to block once every 38 seconds. Of course these events would have happened more frequently on an ITS supporting more processes; still, it is nice to have a feel for just how rare these events are. It is also nice that the page swapping code was doing so little PCLSRing. Of course an ITS with less physical memory, or running more memory intensive applications, would exhibit a higher percentage of PCLSRs for swapping. Finally, we might wonder just how quickly a process that is PCLSRed while running in Exec mode normally reaches User mode. During a 53 hour period PCLSR failed to immediately force a process into User mode 13381 times. On 7376 of these occasions the target process was already the subject of a PCLSR attempt. (These occasions mostly represent duplicate attempts by the Scheduler to deliver an interrupt to a process that had been continuously running in Exec mode since the last time the Scheduler ran.) Therefore on the remaining 6005 occasions the target was n_o_t_ the subject of a prior PCLSR attempt. On these 6005 occasions a timer was started to measure the amount of time it took for the process to eventually reach User mode. The resulting distribution of times has some very interesting features. The average time taken was 0.116 seconds, which is misleading because a minority of cases that took a very long time inflate the average. Perhaps a more reasonable measure is the median: 50% of the time the process reached User mode in under 0.015 seconds. 80% of the time the process took less than 0.107 seconds (still less than the average). 90% of the time the process took less than 0.267 seconds. The Scheduler allows a process 0.133 seconds of runtime before the next schedule, so 82% of the time the target of a PCLSR attempt reached User mode before the next schedule. The time measured was elapsed time, not per-process run time. This makes little difference to the results because typically the target process was the only runnable process in the system, so no delay was introduced while some other process ran. Occasionally, however, there was another runnable process, and as a result the distribution shows a small spike just after 0.133 seconds, caused by PCLSRed processes that had to wait one turn, while another process ran, before they got their chance to quickly reach User mode. An even smaller spike appears after 0.266 seconds. 7. Conclusion The PCLSRing modularity principle is simple: a process should never catch another process with its pants down. PCLSRing gives a process a clean way to access the state of another process. *DRAFT* DSK:BAWDEN;PCLSR 200 Page 17 In this paper I have explained the benefits of this principle for both users and system programmers, and I have explained how the ITS operating system goes about supporting it. I described the implementation in some detail, and also provided some performance metering, in order to demonstrate that the costs of applying the principle are acceptable. I would like to believe that some reader of this paper will be inspired to try bringing the benefits of PCLSRing to some younger operating system. 8. Acknowledgments I am grateful to the early ITS developers who originally designed, implemented, and debugged the ideas presented here: Donald Eastlake, Jerry Freiberg, Richard Greenblatt, Jack Holloway, Tom Knight, George Mitchel, Stewart Nelson, Jeff Rubin, and Frederick Wright. Dave Moon and Tom Knight provided many useful comments on early drafts of this paper and helped me debug my understanding of PCLSRing. Joe Dempster, Ian Horswill, Richard Greenblatt, Leigh Klotz, Chris Maeda, and Guy Steele also provided valuable feedback. It was Joe Dempster who convinced me that s_o_m_e_t_h_i_n_g_ about ITS should be recorded for posterity. Pandora Berman sometimes helped me find the right words. Erasmus helped too. Appendix: User Variable Dictionary AC0S When a process is not actually running, the Scheduler stores the contents of its accumulators in this block of sixteen words. This might be the User mode or the Exec mode accumulators, depending on what the process was doing when the Scheduler took the processor away from it. See also UPC. FLSINS If this variable is non-zero, the process is blocked, and the contents are an instruction the Scheduler can execute to determine if the awaited condition has come to pass. The instruction should skip if the process no longer needs to wait. See also USTP. LSWPR This variable addresses a linked list of actions that must be taken in order to clean up properly if this process is PCLSRed. Common cleanup actions are releasing locks held by this process and decrementing counters incremented by this process. An important instance of such a cleanup action is ensuring that the USTP variable of another process is properly decremented. *DRAFT* DSK:BAWDEN;PCLSR 200 Page 18 PIRQC IFPIR These two variables contain bits that indicate pending software interrupts. RPCL This variable is used to keep track of which processes are trying to RPCLSR which other processes. If process A is trying to RPCLSR process B, then process A's RPCL will contain the address of process B in its right half and zero in its left half. Process B's RPCL will contain the address of process A in its right half and minus one in its left half. SUEXIT If the process is currently executing a system call in Exec mode, this variable contains the instruction that will be used to return to User mode -- usually an instruction that restores the PC from this process's SUUOH. If there is an attempt in progress to PCLSR this process, this variable will instead contain a jump to a special routine. This variable could actually be replaced with a flag that just indicates a pending PCLSR attempt. SUUOH If the process is currently executing a system call in Exec mode, this variable contains the User mode PC that will be returned to when the call completes. See also UUOACS. UNAME JNAME These two variables contain the two SIXBIT names of the process. These variables actually have nothing to do with PCLSRing; they are here to serve as examples of "typical" variables. UPC When a process is not actually running, this is where the Scheduler stores its PC. This might be a User mode or an Exec mode PC, depending on what the process was doing when the Scheduler took the processor away from it. See also AC0S. USTP If this variable is non-zero, the process is stopped. The left half contains various bits corresponding to specific reasons why the process should not run. The right half contains a count of the number of other processes that are keeping this process from running. See also FLSINS. *DRAFT* DSK:BAWDEN;PCLSR 200 Page 19 USWST This variable contains various bits controlling page swapping. Only one of these bits, %SWPCL, concerns us in this paper. %SWPCL is set if any of this process's pages were swapped out while it was blocked in Exec mode. When the process becomes runnable again, the Scheduler will see this bit and PCLSR the process instead of letting it continue in Exec mode. UUOACS While a process is executing a system call in Exec mode, its User mode accumulators are stored in this block of sixteen words. (On a KL or KS the User mode accumulators might instead be in a different hardware AC block, but we ignore that possibility in this paper.) See also SUUOH.