PCLSRing:
                           Keeping Process State Modular

                                    Alan Bawden

        A_b_s_t_r_a_c_t_:  This paper describes the "PCLSRing" feature of the
        Incompatible Time Sharing (ITS) operating system.  PCLSRing
        permits a process to access the state of another process in a
        modular fashion.  The benefits of PCLSRing are explained, its
        implementation is described in detail, and the results of some
        simple performance metering are presented.

    1.  Introduction

    The Incompatible Time Sharing (ITS) operating system was originally written
    for the PDP-6 at the MIT Artificial Intelligence Laboratory during the late
    1960's.  At MIT during the 1970's ITS ran on three KA-10's and one KL-10,
    providing service to researchers in the AI Laboratory and in MIT's
    Laboratory for Computer Science.  Many well-known PDP-10 programs, such as
    the Emacs editor, the MacLisp dialect of Lisp, the Macsyma symbolic algebra
    system, and the Midas assembler, were originally developed under ITS.  More
    than twenty years later ITS still has a small, but devoted, user community.
    It now runs on a handful of KS-10 systems around the world.

    The designers of ITS were not primarily interested in doing research in
    operating systems.  They held a pragmatic view of ITS as a tool for
    enabling the AI Lab's small user community to get the most out of their
    PDP-6/10 and its peripherals (robot arms, TV cameras, etc.).  This is not
    to say that the designers were ignorant of the state of the art in
    operating systems, but they were unlikely to publish papers about their
    experiences.  As a result, many of their pioneering ideas remain largely
    unknown outside the ITS user community.

    Among other features, ITS boasts a convenient symbolic system call
    mechanism that allows calls to be invoked by name and to pass values in
    arbitrary locations, a flexible system for delivering software interrupts
    to processes, device-independent display terminal support, a process
    organization that allows a single command processor to control multiple
    sub-processes, and a powerful mechanism for implementing devices in
    software that enabled ITS to have what was probably the first network file
    system.  Many of these features have since been adopted or rediscovered by
                                        *DRAFT*   DSK:BAWDEN;PCLSR 200   Page 2

    other operating systems, but one of ITS's major design principles, called
    "PCLSRing", has never found its way into any other operating system.

    It is my belief that PCLSRing is an important idea that does not deserve to
    be neglected merely because it has never been adequately explained in
    print.  Therefore, in this paper I have endeavored to explain what PCLSRing
    is, what its benefits are for both the users and the implementors of ITS,
    and how it is implemented.

    The reader should not expect to completely understand the description of
    the implementation of PCLSRing on first reading.  Many of the points are
    subtle, and can probably only be fully appreciated by someone who tries to
    implement PCLSRing for himself.  (If any reader makes this attempt, I would
    like to hear from him!)

    2.  The PCLSRing Modularity Principle

    Under any timesharing operating system there will be occasions when a
    process must access the state of another process.  A process may need to
    start, stop, debug, load, dump, create, or destroy another process.  There
    is also one occasion when a process must access its own state: an interrupt
    handler needs access to the state of the running process prior to the
    arrival of the interrupt, so that the process may continue after the
    interrupt has been dealt with.

    "PCLSRing" is a mechanism the ITS operating system uses to enforce a kind
    of modularity when a process must access the state of another process.  The
    modularity principle is very simple: no process ever catches another
    process (including itself) in the act of executing a system call.  System
    calls thus behave as if they were directly implemented in hardware.  A
    process can no more catch another process in the middle of deleting a file
    than it can catch another process in the middle of a multiply instruction.

    Another way to state the PCLSRing modularity principle is to say that when
    a process looks at the program counter of another process, it must always
    see a User mode PC, never an Exec mode PC.  When one process wants to
    access the state of another process, and the target process happens to be
    executing in Exec mode, then something must be done in order to put the
    target process into User mode.  Either the target process must be backed
    out of the system call, leaving the User mode PC pointing at the call, or
    it must be allowed to finish, leaving the User mode PC pointing just after
    the call.  This act of getting a process into User mode as quickly as
    possible is called "PCLSRing" (pronounced "pee-see-loo-zer-ing") a process.
    We say that one process "PCLSRed" ("pee-see-loo-zerd") another process, or
    we speak of the need to "PCLSR" ("pee-see-loo-zer") a process before giving
    it an interrupt.

    Just as certain PDP-10 instructions cannot be guaranteed to run to
                                        *DRAFT*   DSK:BAWDEN;PCLSR 200   Page 3

    completion, either because they might be interrupted or because they run
    the risk of taking a page fault (e.g. BLT), certain ITS system calls cannot
    always run to completion before the executing process must be PCLSRed.  A
    prime example is the SIOT system call, which is used to transfer a string
    of bytes to or from an I/O device.

    Suppose a process is using SIOT to send a long string of characters to a
    slow device (such as a terminal) when an interrupt arrives.  It is
    impossible to take back the characters that have already reached the
    device, and there may be an indefinite delay before the rest of the
    characters can be output.  The PCLSRing modularity principle requires that
    the process be maneuvered out of the system call and into User mode as
    quickly as possible, else the interrupt cannot be promptly delivered.  How
    can the state of the process be captured in a User mode PC given that the
    process really is halfway through outputting the string of characters?

    The solution is to employ the same trick that the PDP-10 hardware uses when
    it is necessary to interrupt an instruction, such as BLT, that has only
    partly performed its task.  The arguments originally passed to the system
    call are updated in place to reflect any work already accomplished, and the
    PC is left pointing at the system call, so that when the system call is
    restarted it will proceed from where it left off.  In the case of the SIOT
    system call, the locations in memory from which the byte count and the byte
    pointer arguments were originally fetched will be overwritten with a new
    count and pointer that address the bytes that remain to be output.

    All ITS system calls that might be interrupted with their tasks partially
    accomplished are designed to update their arguments in this fashion.  This
    requirement might be expected to introduce some system calls with baroque
    calling conventions, but in fact the vast majority of system calls have no
    need of ever updating any of their arguments, and those few that do are
    generally able to do so in a natural way.  (One rarely used system call
    does require an additional argument that should initially be supplied as
    zero and is used as a "state word" to keep track of the progress of the
    call.  In this obscure case, the documentation explains what the non-zero
    values of this argument mean, although no program is known to make use of
    this information.)

    3.  The Benefits of PCLSRing

    The PCLSRing modularity principle is easy to state and easy for the user to
    understand, but it is tricky to implement.  The majority of this paper is
    devoted to explaining the techniques used in the ITS implementation of
    PCLSRing.  Before going into that, however, let us consider why PCLSRing is
    worth all the trouble it entails.  I will present examples of the benefits
    that PCLSRing has for both the users and the implementors of ITS.

    Users are chiefly conscious of the effects of PCLSRing when they write
                                        *DRAFT*   DSK:BAWDEN;PCLSR 200   Page 4

    interrupt handlers.  The main program level PC that is passed to an
    interrupt handler is always a User mode PC.  By examining this PC and the
    contents of memory, the interrupt handler can determine precisely what the
    state of the interrupted task was.  If the interrupt handler needs to know,
    for example, whether or not the main program level had successfully deleted
    a file before the user typed an interrupt character, it can simply look to
    see if the PC points before or after the DELETE system call.

    Any question the interrupt handler may have about the state of the
    computation when the interrupt occurred can be answered in terms of a
    simple machine model in which all system calls behave as if they were
    hardware instructions.  The effects of eccentric interrupt handlers that
    dismiss interrupts by returning to a PC other than the one interrupted
    from, or that alter the main program level accumulators, are
    straightforward to predict.  It is, for example, easy to construct a simple
    interrupt driven scheduler that multiplexes one process into multiple
    tasks.

    For the implementors of ITS, PCLSRing is a powerful aid in managing the
    interactions between processes.  This can be illustrated by considering the
    example of the DTTY system call.

    ITS supports a notion of "ownership" of the user's terminal.  As various
    processes run on the user's behalf (editor, compiler, debugger, etc.), they
    pass ownership of the terminal back and forth.  A critical moment occurs
    when one process uses DTTY to take ownership of the terminal back from a
    process that it had previously given it to.  When the acquiring process
    wrests control of the terminal away from its current owner, how can it know
    it won't catch the owner in the middle of actually using the terminal,
    potentially leaving some per-terminal data structures in an inconsistent
    state?

    The answer is for the acquiring process to start by PCLSRing the terminal's
    current owner.  This will insure that the current owner is not in the
    middle of any system call.  Therefore all terminal-specific data structures
    will be in a known state, and it will be safe to transfer ownership of the
    terminal.

    This example shows most clearly why PCLSRing is a tool for promoting
    m_o_d_u_l_a_r_i_t_y_.  PCLSRing can be used to hide the implementation details of a
    system call that modifies the state of some data structure.  Only
    documented state transitions are ever exposed to other processes.

    4.  The Basics of ITS

    In order to understand the techniques used to implement PCLSRing, it is
    necessary to have some understanding of the basic structure of ITS.  This
    section introduces the players.
                                        *DRAFT*   DSK:BAWDEN;PCLSR 200   Page 5


    4.1  Processes and "User" Variables

    The basic job of any timesharing operating system is to share a single
    processor among set of independent processes.  An ITS process has a name,
    some I/O channels, a program counter, sixteen accumulators, a page table,
    an interrupt system, and assorted other properties.  These properties are
    all held in a collection of per-process variables that ITS keeps in its own
    address space.

    At any particular moment a process can be running in either User mode or
    Exec mode (as determined by a bit in the left half of the program counter).
    In User mode, each process has its own page table, and can configure its
    User mode address space to suit itself.  In Exec mode, all processes share
    the same page table, and thus all execute in the same Exec mode address
    space.  Hardware interrupt service routines also execute in this single
    Exec mode address space.

    An important property of the Exec mode address space is that its pages are
    never swapped out to secondary storage.  All system code and data
    structures, including the per-process variables, are stored in Exec address
    space, and are always available without risk of causing a page fault.  This
    convenient property is possible only because the implementors have taken
    great pains to minimize the size of ITS and its data structures.

    ITS processes are variously called "processes", "jobs", "users", and
    "procedures".  To avoid confusion, this paper consistently uses the word
    "process", with one exception:  the set of per-process variables that ITS
    maintains are referred to as "user variables", rather than the more
    consistent "process variables", because to do otherwise would violate
    tradition.  (Knowing that processes are sometimes called "users" will also
    help the reader understand why there are so many U's in the names of
    various routines and variables!)

    Typical user variables are the UNAME and JNAME; these contain two SIXBIT
    words that are unique to this process, and together serve as the process's
    name.  Additional user variables will be introduced as they are needed.
    For easy reference, a dictionary of all user variables mentioned anywhere
    in the paper, each with a brief description of its function, is given in an
    appendix.  This appendix also reveals some simplifications made by the main
    text.

    4.2  System Calls

    As in any other PDP-10 operating system, ITS system calls are performed by
    executing instructions that cause a trap to Exec mode.  The Exec mode trap
    handler saves the User mode PC in the SUUOH user variable, saves the
    contents of the User mode accumulators in the UUOACS user variable (an
                                        *DRAFT*   DSK:BAWDEN;PCLSR 200   Page 6

    array of sixteen words), examines the instruction that caused the trap, and
    dispatches to the appropriate system call.

    Each system call is responsible for obtaining its own arguments from the
    User mode address space, performing its particular task, and writing any
    return values back into User mode memory.  (Many system calls make use of a
    flexible and convenient facility ITS provides for processing arguments and
    returning values.)

    All system calls return to a common point in order to return to User mode.
    This common return code checks for a number of possible error conditions
    (such as returning to User mode while still holding a lock or with hardware
    interrupts deferred), performs some general cleanup, restores the
    accumulators from UUOACS, and returns to the User mode PC saved in SUUOH.

    4.3  Scheduler

    The ITS Scheduler runs at the lowest priority of the PDP-10's seven
    hardware interrupt levels.  This interrupt level is commonly called "clock
    level" because a hardware clock interrupt is signaled there periodically.
    (Actually, several minor interrupts are also signaled at clock level, and a
    variety of periodic overhead tasks are performed there as well, but in this
    paper the only clock level activity that will concern us is the Scheduler.)

    The Scheduler's job is to switch the processor back and forth between those
    processes that are currently runnable.  When a process is not actually
    running on the processor the Scheduler keeps its PC in the UPC user
    variable and keeps its accumulators in the AC0S user variable (an array of
    sixteen words).

    When the Scheduler examines a process it may discover that the process is
    either "stopped", "blocked", or "running".  If a process is stopped, then
    some external agent has decided that this process must not run, so the
    Scheduler can do nothing with it.  If a process is blocked, it is waiting
    for something, so the Scheduler must check for the awaited condition to see
    if the process can be unblocked and allowed to run.  If a process is
    running, then it is capable of making immediate progress, so the Scheduler
    can allow it to use the processor.

    The Scheduler uses the user variables USTP and FLSINS to keep track of the
    runnability of a process.  If a process's USTP is non-zero, then the
    process is stopped.  The left half contains various bits corresponding to
    specific reasons why the process should not run (for example, one bit is
    set when the process receives a fatal interrupt).  The right half contains
    a count of the number of other processes that are currently keeping this
    process stopped.  As each such process finishes, it decrements this
    variable so that the Scheduler will know when everybody is finished and it
    is safe to run the process once again.

                                        *DRAFT*   DSK:BAWDEN;PCLSR 200   Page 7

    If the FLSINS user variable is non-zero, then the process is blocked
    waiting for some condition, such as the typing of a character on the user's
    terminal, the passage of a given amount of time, or the arrival of a page
    to satisfy a page fault.  In such a case FLSINS contains an instruction
    that the Scheduler can execute in order to determine if the awaited
    condition has come to pass.  Such an instruction should skip if the process
    is no longer blocked.

    At any given moment a process can be in one of five possible states:

      (1)  Stopped.  In this case USTP is non-zero and UPC contains a User mode
           PC.  (FLSINS is zero, which isn't immediately relevant, but does
           insure that the process will run as soon as its USTP is cleared.)

      (2)  Blocked in Exec mode.  USTP is zero, FLSINS contains an instruction
           that will skip when the process is no longer blocked, and UPC
           contains an Exec mode PC.

      (3)  Blocked in User mode.  USTP is zero, FLSINS contains an instruction
           that will skip when the process is no longer blocked, and UPC
           contains a User mode PC.  The only possible reason for a process to
           be blocked in User mode is that it is waiting for a page to satisfy
           a page fault; all other cases of blocking occur after a process has
           executed a system call and entered Exec mode.

      (4)  Running in Exec mode.  USTP is zero, FLSINS is zero, and UPC
           contains an Exec mode PC.

      (5)  Running in User mode.  USTP is zero, FLSINS is zero, and UPC
           contains a User mode PC.

    Note that a stopped process m_u_s_t_ be in User mode.

    Once they get a chance to run, processes have ways of influencing the
    behavior of the Scheduler.  For example, a process can temporarily mask
    clock-level interrupts, which insures that it has exclusive use of the
    processor while it performs some critical action that the Scheduler must
    not interrupt.  Processes can also intentionally yield the processor and
    allow the Scheduler to run other processes; a process will do this when it
    first becomes blocked.

    4.4  Software Interrupts

    ITS has a sophisticated software interrupt system that it uses to inform
    processes of a variety of conditions that might require attention.  When a
    hardware device driver or another process wishes to interrupt a given
    process, it merely sets a bit in one of two user variables, PIRQC or IFPIR.
    It is then up to the Scheduler to actually deliver the interrupt.

                                        *DRAFT*   DSK:BAWDEN;PCLSR 200   Page 8

    Whenever the Scheduler is considering running a process it looks at its
    PIRQC and IFPIR variables as well as its USTP and FLSINS.  If it discovers
    that an interrupt is pending, it tries to manipulate the process into its
    interrupt handler.  This is a somewhat tricky procedure, because the
    required manipulations can cause page faults!  Fortunately, only the first
    step of the procedure concerns us in this paper:  the Scheduler first must
    PCLSR the process so that the process's interrupt handler sees a User mode
    PC.

    5.  The Implementation of PCLSRing

    Now that we have introduced the players, we can proceed to examine the
    implementation of PCLSRing in detail.

    When it becomes necessary to PCLSR some target process, we will find it in
    one of the five possible states described in the introduction to the
    Scheduler above.  Considering each state in turn, what needs to be done in
    order to get the target process into User mode as quickly as possible?

      (1)  If the process is stopped, it is already in User mode and nothing
           needs to be done.

      (2)  If the process is blocked in Exec mode, then we must force it back
           to the start of whatever system call brought it into Exec mode.
           First we clean up what the process was doing: the LSWPR user
           variable, described below, contains a linked list of cleanup actions
           that we perform.  Then we copy the saved User mode accumulators from
           UUOACS into AC0S, and we pick up the User mode return PC from SUUOH,
           decrement it so that the process will start the system call from the
           beginning, and put it in UPC.  Finally we clear the process's
           FLSINS.

      (3)  If the process is blocked in User mode, it must be waiting for a
           page to satisfy a page fault.  Fortunately, nothing is harmed if the
           process simply forgets this fact, because as soon as it starts
           running it will either immediately take the same fault again or the
           page will have arrived in the interim.  So in this case we simply
           clear the process's FLSINS.

      (4)  If the process is running in Exec mode, we know that it cannot
           remain in this state for long.  If it is not able to quickly finish
           its task and return to User mode, this can only be because it is
           about to become blocked.  Rather than bother it immediately, we set
           a flag (the SUEXIT user variable) that tells the process that it
           should PCLSR itself either the next time it tries to block or when
           it tries to return to User mode.

      (5)  If the process is running in User mode, nothing needs to be done.
                                        *DRAFT*   DSK:BAWDEN;PCLSR 200   Page 9

    The decision made in case 4 has some important corollaries.  First, it
    means that when a PCLSR is requested, the requester cannot be assured of
    immediate success.  Second, it means that a process can run in Exec mode
    without risk of being PCLSRed, provided it doesn't do anything that might
    cause it to become blocked.

    In the sections that follow we explore the consequences of the algorithm
    given above.  The algorithm itself is available as a routine named "PCLSR".

    5.1  Rules for System Calls

    PCLSRing cannot work unless all system calls are written with attention to
    the fact that they might be PCLSRed.  This isn't all that great of a burden
    once the rules of the game are explained.

    The most obvious rule is that a system call is free to do anything it likes
    if it disables clock level interrupts before it starts, and cleans up after
    itself before reenabling them.  This may seem a rather heavy-handed
    technique, especially given the subtlety of some of the other techniques
    discussed here, but for critical operations no more than a few instructions
    in duration it's hard to imagine anything quicker or more effective.

    (The basic PCLSRing routines are themselves built on the ability to
    temporarily turn off clock level interrupts; if ITS were ever converted to
    run with multiple processors, it would be necessary to rework this code.
    PCLSRing would be just as useful on a multiprocessor ITS, but the substrate
    upon which it is built would have to change.)

    System calls may also rely on the fact that they cannot be PCLSRed unless
    they attempt an operation that might cause them to become blocked.  A
    process can mess around with its own state as much as it likes and no other
    process will be able to see what it is doing, as long as it is careful not
    to block.  If another process wants to examine this one, it must first
    PCLSR it, but as long as this process remains running in Exec mode, the
    PCLSRing algorithm given above guarantees that it will continue to run, and
    the other process will have to wait.

    This guarantee is only useful if a system call can easily tell when it runs
    the risk of going blocked.  Fortunately there are only two things that a
    process can do to become blocked in Exec mode.  First, it can explicitly
    ask to go blocked, directly supplying the Scheduler with the instruction
    for its FLSINS user variable.  Second, it can take a page fault referencing
    a page in User address space, an action that requires the use of a special
    instruction.  A third possibility, taking a page fault referencing a page
    in Exec address space -- perhaps doing something as innocuous as fetching
    the next instruction -- cannot occur.  This is because, as was pointed out
    above, pages in Exec address space are never swapped out.

    One additional guarantee is made concerning references to pages in a
                                       *DRAFT*   DSK:BAWDEN;PCLSR 200   Page 10

    process's own User address space.  System calls can be certain that any
    page successfully referenced will remain swapped in for the duration of the
    execution of the system call.  (More on how this is accomplished later.)  A
    system call can thus be certain that a reference to a given User address
    will not cause the process to block, and therefore risk PCLSRing, if it has
    previously referenced that address.

    Some system calls take advantage of this guarantee by touching some
    selected User addresses at the beginning of their execution, thus insuring
    that they can modify the contents of those addresses later without fear of
    being PCLSRed at an awkward moment.  For example, the SIOT system call
    starts by touching the locations that contain its byte pointer and byte
    count arguments.  It can then proceed, secure in the knowledge that those
    locations are swapped in (as well as writable!).

    Most system calls cannot completely avoid the possibility of going blocked.
    Given the PCLSRing algorithm, every point where the execution of a system
    call might block is also a point where that execution might be terminated.
    In order to allow system calls to prepare themselves for this possibility,
    each process's LSWPR user variable contains a list of actions that must be
    performed in order to clean up if it is PCLSRed.  LSWPR is usually
    maintained as a stack, with various system subroutines pushing and popping
    entries as needed.  The most common cleanup handler calls for the release
    of a lock that the process holds, but it is possible for a system call to
    supply an arbitrary routine.  (There are some restrictions on what such a
    routine can do, given the delicacy of the situations in which it will be
    invoked.)

    System calls also prepare themselves for possible PCLSRing by updating the
    locations in User memory that contained their original arguments to reflect
    whatever work they have accomplished thus far.  This insures that if the
    call is PCLSRed and restarted, it will take up where it left off.  We have
    already seen how users are made aware of this aspect of PCLSRing.

    The need to update locations in User memory in order to properly prepare
    for possible PCLSRing is the reason why ITS does not swap out User pages
    during the execution of a system call.  If it did, then the very attempt to
    update arguments before possibly blocking might itself cause the process to
    block due to a page fault.  If the process were to be PCLSRed at that
    point, there would be no record of what the system call had already
    accomplished, since the arguments would remain unmodified.

    PCLSRing has consequences that go beyond simply supplying a few cleanup
    handlers to protect system calls where they might block.  Consider the case
    of a system call that must read some data from the disk.  After it makes
    its request of the disk system, it will want to block waiting for the disk
    interrupt level to complete the transfer.  If the executing process is
    PCLSRed at that point, then when the disk interrupt level finishes, there
    won't be any process waiting for the answer.

                                       *DRAFT*   DSK:BAWDEN;PCLSR 200   Page 11

    The process may come back later looking for the same information, so the
    interrupt level does not want to discard the data that it just read.  On
    the other hand, the process may never come back, so something must make
    certain to eventually reclaim the space occupied by the data.  The
    interface between the system call and interrupt level must be designed with
    these possibilities in mind.  In general, certain operations that in other
    operating systems could be performed directly by system calls must be
    performed by someone else, since any code in a system call beyond a point
    where the call might block may never get executed.

    One final rule that system calls must observe in order for PCLSRing to
    work:  system calls may not run indefinitely without presenting
    opportunities to be PCLSRed.  For some system calls it is possible to
    supply arguments that require the call to loop.  Those system calls
    periodically check the SUEXIT user variable themselves to see if someone
    has requested a PCLSR, and if so, they submit to the request.

    5.2  Delivering Software Interrupts

    It is the Scheduler's job to deliver software interrupts to a process when
    its PIRQC or IFPIR user variable is non-zero.  The first step of delivering
    such interrupts is to PCLSR the process so that its interrupt handler will
    see a User mode PC.

    The Scheduler discovers pending software interrupts when it is examining a
    process to determine if it should actually be run.  If a process is not
    stopped (i.e., if its USTP is zero) and if it has pending interrupts, then
    the Scheduler will immediately try to PCLSR the process.

    If the PCLSR attempt fails, then the process must be running in Exec mode.
    In this case, the Scheduler just treats the process as it would any other
    runnable process.  Since the process's SUEXIT now indicates that the
    process needs to be PCLSRed, the next time the process tries to go blocked,
    or return to User mode, it will PCLSR itself and return to the Scheduler.
    The Scheduler will then be able to deliver the interrupt.

    If the PCLSR attempt succeeds, then the process has been forced back to
    User mode.  The Scheduler then proceeds to manipulate the process to put it
    into its interrupt handler.  The subsequent steps of this procedure (while
    quite interesting!) are beyond the scope of this paper.

    5.3  When One Process Wants to PCLSR Another

    The most interesting case of PCLSRing occurs when one process wants to
    PCLSR another process, as in the example of the DTTY system call given
    above.  In this case the PCLSRing process will want to PCLSR its target and
    immediately stop it by incrementing the target's USTP.  This common
    operation is done by calling a routine named "RPCLSR", so we often speak of
                                       *DRAFT*   DSK:BAWDEN;PCLSR 200   Page 12

    one process RPCLSRing ("ar-pee-see-loo-zer-ing") another.

    The difficulties with RPCLSRing all stem from the possibility that the
    target process might be running in Exec mode, and thus will be unable to be
    PCLSRed immediately.  In this case the perpetrating process will have to go
    blocked (!) waiting for the target to notice the PCLSR attempt.

    As we are about to discover, this situation is more delicate than blocking
    ordinarily is.  Later, when the target PCLSRs itself, it will need to be
    able to find the perpetrator.  Thus we introduce the RPCL user variable to
    keep track of this situation when it occurs.  When a process fails to
    immediately RPCLSR its target, it will store the address of the target
    process in its own RPCL, and its own address in the target's RPCL.  The
    left half of RPCL is used as a flag to record which process is the target
    and which process the perpetrator.

    The algorithm for RPCLSRing is as follows:

    First, try to PCLSR the target process.  If this succeeds, then immediately
    increment the target's USTP and return.  Clock level interrupts are turned
    off during this procedure to prevent the target from getting a chance to
    run after it is PCLSRed, but before it is stopped.

    If the PCLSR attempt fails, the target's SUEXIT has been set to indicate a
    PCLSR attempt in progress.  The perpetrator should now set up the
    appropriate pointers in both its own and the target's RPCL user variables,
    but two rare cases must be considered first:

      (1)  The perpetrator might discover that its own RPCL indicates that it
           itself is the target of an RPCLSR by some third process.  In this
           case the perpetrator immediately submits to the third party.  Since
           the target's RPCL is unmodified, but its SUEXIT has been set, it
           will uselessly PCLSR itself at some point in the future and then
           continue to run.  These useless PCLSRs are not a problem because
           this case rarely happens, and a few extra PCLSRs are harmless.

      (2)  The perpetrator might discover that the target's RPCL indicates that
           some third process is already trying to RPCLSR the target.  In this
           case the perpetrator simply blocks waiting for the target's RPCL to
           be cleared, and then starts the RPCLSR attempt again from the
           beginning.

           (It is possible that after the Scheduler sees that the target's RPCL
           is clear and unblocks the perpetrator, but before the perpetrator
           gets a chance to run, the target will start up and get back into
           Exec mode and become the target of yet another third-party RPCLSR
           attempt.  A process might therefore starve forever waiting for its
           chance to RPCLSR someone, even though all other processes are making
           progress.  This is not a problem in practice because it is
           extraordinarily unlikely.  It may be more likely to snow in the
                                       *DRAFT*   DSK:BAWDEN;PCLSR 200   Page 13

           machine room than it is for a process to starve in this way for a
           noticeable amount of time.)

    If neither of these cases occurs, the perpetrator sets up the two RPCL user
    variables and blocks.  Since the act of going blocked runs the risk of
    being PCLSRed, something must be done to clear out the RPCL pointers if the
    perpetrator is itself PCLSRed while it waits.  This could be done by using
    an appropriate cleanup handler pushed on the perpetrator's LSWPR, but
    instead, the PCLSR routine itself always checks the RPCL of any process it
    is applied to for the possibility that that process is RPCLSRing someone,
    and cleans up appropriately.

    The final step in RPCLSRing occurs when the target tries to block or to
    return to User mode, and notices that its SUEXIT indicates that it should
    PCLSR itself, and that in addition its RPCL indicates that it is the target
    of an RPCLSR request.  The target must then PCLSR itself, clear the RPCLs
    of both itself and the perpetrator, increment its own USTP in order to stop
    itself, and clear the FLSINS of the perpetrator in order to unblock it.
    (And it must do all this with clock level interrupts turned off, to prevent
    a variety of timing errors.)

    It is important to unblock the PCLSRing process instantly; otherwise it
    might be PCLSRed and the target's USTP would never get decremented.  This
    risk is eliminated by starting the perpetrator running in Exec mode where
    it can't be PCLSRed until after taking appropriate precautions.  In fact,
    the next thing many RPCLSRing processes do is add a cleanup handler to
    their LSWPR to decrement the target's USTP.  (Of course, if there is no
    danger of going blocked, the perpetrator can simply perform its task and
    decrement the target's USTP when it is done without bothering about cleanup
    handlers at all.)

    5.4  Swapping

    Recall that ITS guarantees that no User mode pages will be swapped out
    during the execution of a system call.  This cannot be accomplished by
    simply granting blanket immunity to processes in Exec mode, because
    processes spend most of their time blocked in Exec mode (waiting for the
    next character from the terminal or whatever).  It is therefore necessary
    to resort to PCLSRing blocked processes in order to find pages to swap out
    to secondary storage.

    Unfortunately, simply PCLSRing a process blocked in Exec mode backs it out
    of the system call and starts it running in User mode again, which causes
    it to immediately reexecute the call, and usually to return to the same
    blocked condition as before.  Thus if ITS were to immediately PCLSR a
    process every time it wanted to swap out some of its pages, that would at
    least cause a lot of unnecessary activity, and might even cause the process
    to swap pages back in that were just swapped out.

                                       *DRAFT*   DSK:BAWDEN;PCLSR 200   Page 14

    Instead, when ITS wants to swap out a page belonging to a process that is
    blocked in Exec mode, it simply sets a bit (named "%SWPCL") in the
    process's USWST user variable.  %SWPCL is checked by the Scheduler whenever
    it is about to unblock a process, and if it is set, the Scheduler will
    PCLSR the process instead.  The process is allowed to finish waiting for
    whatever it was waiting for, and only then is it PCLSRed.  This is just as
    effective at preventing the process from ever seeing its pages swapped out
    as immediately PCLSRing it would be.

    (Actually, the swapping code will PCLSR a process immediately if it has any
    entries on its LSWPR.  Presumably the idea is to release any locks that the
    process is holding as soon as possible, so other processes can seize them
    and make progress.  Perhaps because of this, the swapping code tries to
    avoid swapping out pages that belong to processes with LSWPR entries.  The
    author is not prepared to judge these aspects of ITS's page swapping
    algorithm.)

    Since processes that are running in Exec mode cannot be PCLSRed, and since
    they might depend on keeping any of their pages swapped in, those pages
    must be kept immune from being swapped out.  Fortunately, processes spend
    very little time actually running in Exec mode, so the number of pages
    locked down by this restriction is generally small.

    5.5  PCLSR Test Mode

    To debug ITS system calls, it is convenient to have a way to test their
    behavior when PCLSRed from various points in their execution.  PCLSR test
    mode allows the system call writer to choose a set of such test points by
    inserting calls to an assembler macro, named "PCLT", in the system call.
    Each call to PCLT is positioned immediately before a place where the call
    would block.  Then PCLSR test mode lets the system call writer run the
    system call until it reaches one of the test points.  There are three modes
    that control what happens each time the process being tested reaches one of
    the test points:

      (1)  In "hold" mode, the process PCLSRs itself and stops.  If the system
           call writer starts the process running again, it will run to the
           exact same test point and then PCLSR and stop again.  Thus the
           system call writer can test the same point over and over again.

      (2)  In "advance" mode, each time the system call writer starts the
           process it runs to the same test point as last time, and then sets a
           flag that causes the process to PCLSR and stop at the n_e_x_t_ test
           point, where and whenever that happens to occur.  Thus the system
           call writer can test all of the test points in succession in a
           particular path through a system call.

      (3)  "Advance but don't stop" mode is just like advance mode except that
           the process is not stopped each time it is PCLSRed, it just keeps
                                       *DRAFT*   DSK:BAWDEN;PCLSR 200   Page 15

           running and thus immediately starts the system call over.  This lets
           the system call writer exercise a system call over and over,
           presumably waiting for some bug to happen.

    The exact way in which PCLSR test mode decides that the call has reached
    "the same test point" is not simply a test of the PC.  This would not work
    very well because a particular test point might be in a subroutine that was
    called from several places.  In fact, PCLSR test mode crawls down the stack
    sniffing around for return addresses, in an attempt to discover the call
    chain that got to the current routine.  These return addresses are then
    hashed together.  It is the resulting hash that is used to compare test
    points.

    6.  Statistics

    Many of the design decisions made in the ITS implementation of PCLSRing can
    only be justified by measuring the behavior of a running ITS.  Of primary
    interest are the frequencies at which the various events considered above
    occur.

    The statistics below were collected on AI.AI.MIT.EDU, a KS-10, during
    December 1988 and January 1989.  AI.AI.MIT.EDU supports a moderately small
    user community; typically there are 3 or 4 users logged in, each with 3 or
    4 processes, plus a handful of system demons and network servers, for a
    total of between 15 and 20 processes.  There are rarely more than 40
    processes, and the system never runs up against its configured limit of 62
    processes.

    How often do the various cases in the PCLSRing algorithm occur?  During a
    10 day period PCLSR was called once every 0.32 seconds.  35.9% of the time
    the target was in User mode already (stopped, blocked, or running).  60.3%
    of the time the target was blocked in Exec mode.  The remaining 3.8% of the
    time the target was running in Exec mode and could not be immediately
    PCLSRed.  This last case therefore happened once every 8.5 seconds.

    The problem case of PCLSRing, where the target cannot be immediately forced
    into User mode, is observed to be a relatively rare occurrence.  A process
    that calls RPCLSR will only rarely have to block, and will even more rarely
    find a third party already blocked attempting to RPCLSR the same target.
    Also, at any given moment, only a few processes will be holding their pages
    locked in memory because they are running in Exec mode.

    We might also wonder for what reasons PCLSR is called.  During a 32 hour
    period PCLSR was called once every 0.40 seconds.  The Scheduler called
    PCLSR in order to deliver a software interrupt once every 0.58 seconds,
    accounting for 69.0% of all calls to PCLSR.  A process called RPCLSR once
    every 1.44 seconds, accounting for 27.7% of all calls to PCLSR.  The page
    swapping code caused a process to be PCLSRed once every 11.77 seconds,
                                       *DRAFT*   DSK:BAWDEN;PCLSR 200   Page 16

    accounting for the remaining 3.4% of all calls to PCLSR.

    Given that PCLSR fails 3.8% of the time, we can estimate that the Scheduler
    failed to immediately deliver an interrupt (and ran the target process
    instead) once every 15 seconds.  We can also estimate that a process called
    RPCLSR and had to block once every 38 seconds.  Of course these events
    would have happened more frequently on an ITS supporting more processes;
    still, it is nice to have a feel for just how rare these events are.  It is
    also nice that the page swapping code was doing so little PCLSRing.  Of
    course an ITS with less physical memory, or running more memory intensive
    applications, would exhibit a higher percentage of PCLSRs for swapping.

    Finally, we might wonder just how quickly a process that is PCLSRed while
    running in Exec mode normally reaches User mode.  During a 53 hour period
    PCLSR failed to immediately force a process into User mode 13381 times.  On
    7376 of these occasions the target process was already the subject of a
    PCLSR attempt.  (These occasions mostly represent duplicate attempts by the
    Scheduler to deliver an interrupt to a process that had been continuously
    running in Exec mode since the last time the Scheduler ran.)  Therefore on
    the remaining 6005 occasions the target was n_o_t_ the subject of a prior
    PCLSR attempt.  On these 6005 occasions a timer was started to measure the
    amount of time it took for the process to eventually reach User mode.

    The resulting distribution of times has some very interesting features.
    The average time taken was 0.116 seconds, which is misleading because a
    minority of cases that took a very long time inflate the average.  Perhaps
    a more reasonable measure is the median:  50% of the time the process
    reached User mode in under 0.015 seconds.  80% of the time the process took
    less than 0.107 seconds (still less than the average).  90% of the time the
    process took less than 0.267 seconds.  The Scheduler allows a process 0.133
    seconds of runtime before the next schedule, so 82% of the time the target
    of a PCLSR attempt reached User mode before the next schedule.

    The time measured was elapsed time, not per-process run time.  This makes
    little difference to the results because typically the target process was
    the only runnable process in the system, so no delay was introduced while
    some other process ran.  Occasionally, however, there was another runnable
    process, and as a result the distribution shows a small spike just after
    0.133 seconds, caused by PCLSRed processes that had to wait one turn, while
    another process ran, before they got their chance to quickly reach User
    mode.  An even smaller spike appears after 0.266 seconds.

    7.  Conclusion

    The PCLSRing modularity principle is simple:  a process should never catch
    another process with its pants down.  PCLSRing gives a process a clean way
    to access the state of another process.

                                       *DRAFT*   DSK:BAWDEN;PCLSR 200   Page 17

    In this paper I have explained the benefits of this principle for both
    users and system programmers, and I have explained how the ITS operating
    system goes about supporting it.  I described the implementation in some
    detail, and also provided some performance metering, in order to
    demonstrate that the costs of applying the principle are acceptable.

    I would like to believe that some reader of this paper will be inspired to
    try bringing the benefits of PCLSRing to some younger operating system.

    8.  Acknowledgments

    I am grateful to the early ITS developers who originally designed,
    implemented, and debugged the ideas presented here:  Donald Eastlake, Jerry
    Freiberg, Richard Greenblatt, Jack Holloway, Tom Knight, George Mitchel,
    Stewart Nelson, Jeff Rubin, and Frederick Wright.

    Dave Moon and Tom Knight provided many useful comments on early drafts of
    this paper and helped me debug my understanding of PCLSRing.  Joe Dempster,
    Ian Horswill, Richard Greenblatt, Leigh Klotz, Chris Maeda, and Guy Steele
    also provided valuable feedback.  It was Joe Dempster who convinced me that
    s_o_m_e_t_h_i_n_g_ about ITS should be recorded for posterity.  Pandora Berman
    sometimes helped me find the right words.  Erasmus helped too.

    Appendix:  User Variable Dictionary

    AC0S
            When a process is not actually running, the Scheduler stores the
            contents of its accumulators in this block of sixteen words.  This
            might be the User mode or the Exec mode accumulators, depending on
            what the process was doing when the Scheduler took the processor
            away from it.  See also UPC.

    FLSINS
            If this variable is non-zero, the process is blocked, and the
            contents are an instruction the Scheduler can execute to determine
            if the awaited condition has come to pass.  The instruction should
            skip if the process no longer needs to wait.  See also USTP.

    LSWPR
            This variable addresses a linked list of actions that must be taken
            in order to clean up properly if this process is PCLSRed.  Common
            cleanup actions are releasing locks held by this process and
            decrementing counters incremented by this process.  An important
            instance of such a cleanup action is ensuring that the USTP
            variable of another process is properly decremented.
                                       *DRAFT*   DSK:BAWDEN;PCLSR 200   Page 18

    PIRQC
    IFPIR
            These two variables contain bits that indicate pending software
            interrupts.

    RPCL
            This variable is used to keep track of which processes are trying
            to RPCLSR which other processes.  If process A is trying to RPCLSR
            process B, then process A's RPCL will contain the address of
            process B in its right half and zero in its left half.  Process B's
            RPCL will contain the address of process A in its right half and
            minus one in its left half.

    SUEXIT
            If the process is currently executing a system call in Exec mode,
            this variable contains the instruction that will be used to return
            to User mode -- usually an instruction that restores the PC from
            this process's SUUOH.  If there is an attempt in progress to PCLSR
            this process, this variable will instead contain a jump to a
            special routine.  This variable could actually be replaced with a
            flag that just indicates a pending PCLSR attempt.

    SUUOH
            If the process is currently executing a system call in Exec mode,
            this variable contains the User mode PC that will be returned to
            when the call completes.  See also UUOACS.

    UNAME
    JNAME
            These two variables contain the two SIXBIT names of the process.
            These variables actually have nothing to do with PCLSRing; they are
            here to serve as examples of "typical" variables.

    UPC
            When a process is not actually running, this is where the Scheduler
            stores its PC.  This might be a User mode or an Exec mode PC,
            depending on what the process was doing when the Scheduler took the
            processor away from it.  See also AC0S.

    USTP
            If this variable is non-zero, the process is stopped.  The left
            half contains various bits corresponding to specific reasons why
            the process should not run.  The right half contains a count of the
            number of other processes that are keeping this process from
            running.  See also FLSINS.


                                       *DRAFT*   DSK:BAWDEN;PCLSR 200   Page 19

    USWST
            This variable contains various bits controlling page swapping.
            Only one of these bits, %SWPCL, concerns us in this paper.  %SWPCL
            is set if any of this process's pages were swapped out while it was
            blocked in Exec mode.  When the process becomes runnable again, the
            Scheduler will see this bit and PCLSR the process instead of
            letting it continue in Exec mode.

    UUOACS
            While a process is executing a system call in Exec mode, its User
            mode accumulators are stored in this block of sixteen words.  (On a
            KL or KS the User mode accumulators might instead be in a different
            hardware AC block, but we ignore that possibility in this paper.)
            See also SUUOH.
  