Subject: KLH10 info 
From: Ken Harrenstien <klh@us.oracle.com>
To: its-lovers@mc.lcs.mit.edu, tops-20@wsmr-simtel20.army.mil
Cc: klh@us.oracle.com
Date: 25 Jul 1992 05:52

OK, here are some more details about the KLH10.  I've been a bit
overwhelmed by the many private responses and hope this answers most
of the questions.


First, the FAQs I've been getting:

    * "When/How can I get a copy?"
	Ouch, probably not until after Interop (Oct).  It kills me to say
	this, but my time *is* limited.  (The usual bribes might help :-)
	RMS has suggested distributing it via GNU/FSF, and I like that idea.

    * "Will it run <my favorite PDP-10 OS>?"
	Anything based on the KS10 should run with minor mods.
	The 166, KA and KI will require moderate I/O instruction changes.
	The KL10/20 (extended addressing, DTE20, etc) would require
	major surgery -- without anesthetic.

    * "How fast is it?"
	That depends mostly on the host platform.  Using a 33 MHz Sparc-2,
	very roughly 0.2 MIPS, perhaps 50% of a KA.
	More about this farther on.	

    * "Why can't I connect to NX?"
	Oracle uses firewalls to prevent full Internet access to their
	internal networks.  We are trying to set up an ITS turist machine
	but it needs to be done very carefully.  Wait for an announcement.

    * "Thanks!"
	You're welcome!!

The rest of this message talks about various aspects at more length:
specifics of the target machine & devices, the emulator, and various
other thoughts.  It is a bit long for one message, but I figure it's
better to get it all out of the way.  My apologies to any uninterested
people.

Summary:
-------
	Target machine: KS10 with MIT ITS microcode
	Target config: 512K memory, RH11/RP06, RH11/TM03, FE console,
			ACC LH/DH & IMP
	Current speed:  ~0.2 MIPS
	Current host: 33 MHz Sparcstation-2 with internal disk, SunOS 4.1.1
	Compiler: GCC 2.1
	Misc: Sources .5MB, runtime memory 4.3MB, disk lotsaMB

About the Target Machine:
------------------------

	The decisions I made while coding the emulator all boil down
	to a simple desire to get something useful and interesting up
	as quickly as possible.  Among other things this meant using
	an ITS-microcode KS10 as the target machine, because:


	[1] I don't have official access to TOPS-20 binaries or
	sources, and didn't want to worry about licensing hassles.
	Ditto TOPS-10.   I also am lacking the voluminous
	documentation about installation and operating procedures.


	[2] The most recent ITS binaries and sources are all archived
	online (thanks to Alan), and all are for the KS10 version.
	All documentation is likewise online, and in any case I have
	a somewhat more intimate understanding of ITS.


	[3] The KS10 has a simpler architecture than the KL10, which
	allows an emulator to run much faster.  If not for [2] I might
	have started with the KA10 for this reason.

It is important to realize that the KLH10 is emulating the ITS
microcode, which is slightly different from that for TOPS-10 or
TOPS-20.  The primary differences have to do with the pager and the
absence of the extended-instruction set (the noteworthy ITS features
such as one-proceed, JPC, and CIRC are of course supported).

In order to bring up TOPS-10 or TOPS-20, minor changes would be
needed to the pager code (somewhat more for TOPS-20, owing to its
more complex design).  The extended string instructions are ugly but
straightforward for anybody with a strong stomach (and weak mind?).


Everything else is fully supported, including the double-precision
floating point instructions.  I took pains to verify that all results
and PC flags are identical to the real hardware, even for wildly
unnormalized operands.  This was *not* one of the fun parts.

With respect to a KA or KI version, the I/O instructions would also
need to be modified and a number of things changed in the APR and PI
system, but not a great deal.  Bringing up TENEX should be relatively
easy given a complete description of the BBN pager.  WAITS, however,
would be up to a true SAILor.

I know many people are going to wish for a KL10 they can run a
full-grown TOPS-20 on.  Of course this is theoretically possible, but
it is not going to be easy to get something that runs at a useful
speed without a superfast host.  Remember the KL10 is an oversexed
mutant with these strange bulging growths oozing out of random body
parts, all of which have to be duplicated no matter how bizarre.
Just dealing with extended addressing is going to slow down the basic
instructions as well as requiring much more pager overhead; this is
one area where the hardware parallelism is hard to beat.  The
additional device cruft (DTE20s, meters, address breaks, etc) doesn't
help.  Personally I won't try it without some concrete incentives.


P.S.  Just for grins I included the KA10 and PDP-6 arithmetic ops,
although I don't have a reference machine for verification (ha).  Who
knows, maybe a 340 will come along!  Spacewar, yeah!

		
I/O Devices:
-----------

	A great deal of the work consisted not of emulating the 10 but
rather emulating a basic set of peripherals.  The KS10's use of Unibus
devices made it somewhat more painful than the old I/O device scheme,
because instead of a small matrix of devices and I/O instruction
opcodes, there's a long list of bus addresses to check, each of which
has completely arbitrary meanings.  Not to mention the Unibus adapters
with their individual Unibus page maps, all of which are emulated as
well.

DSK: Currently one RH11/RP06 is supported as a virtual disk.  The
actual implementation uses a standard Unix disk file (not a raw disk)
to hold the "RP06" contents; this is so all blocks that haven't been
written will simply not exist (also known as a holey file), thus
taking up much less space than a full raw disk would.  Formatting the
disk is obviously unnecessary; sector headers are not written or
read.  Errors are pretty much limited to those caused by garbage
written into the registers, so the interface is a bit simpler than
the real thing.  It's possible that an OS which can't leave well
enough alone will insist on using some weird maintenance bits, in
which case the device code will need work.  Mods for multiple drives
or other RH-controlled drive types are trivial.


MTA: Similar considerations apply to the RH11/TM03 magtape interface.
>From the "FE" (front-end controller interface) one can mount or
unmount "tapes" that consist of either binary tape images or virtual
tapes built on the fly from lists of unix files.  Hooking up a real
magtape drive would probably have been easier, but loading virtual
dump tapes into the filesystem is very fast, actually -- much faster
than the real devices would be.  In fact I found one race condition
in the ITS bootstrap loader that required slowing down disk I/O until
I was able to reassemble a fixed version.


NET: The network interface is an emulation of the ACC LH/DH that some
MIT machines used to have, as well as a virtual IMP.  Putting this
one together was a major project in network hacking, not to mention
deceit.   Using Sun's NIT (Network Interface Tap) and various
trickery, I was able to set up NX with its own IP address,
independent of its platform's address and thus permitting me to run
ITS without interfering with the other network stuff I'm doing on my
workstation.   For efficiency, the virtual IMP is actually a "Simple
IMP" that doesn't bother sending RFNMs, and the virtual LH/DH does
I/O in PDP-10 byte order (not Unibus order) -- this all required
changes to the ITS IMP driver.  For a while I considered munging the
packets within the virtual IMP to pretend that the local net was the
ARPANET, but finally decided it was better to fix ITS itself, and did
so; ITS can now be configured for non-ARPA subnets.  Geez, I never
thought after I did ITS TCP/IP that I'd be hacking the code again ten
years later!


TTY: The FE console TTY "interface" is emulated, tho the 8080 FE
commands aren't -- no need.  There is also a dummy DZ11 driver that
merely reads and writes registers without doing anything.  This (and
an equally empty Chaosnet driver) was needed because that's what the
last AI ITS binary image was configured for, and I had to get that up
before I could reassemble a new system (oh the joys of
bootstrapping).  The DZ11 won't be hard to finish off, but I'm not
sure it's worthwhile; it's a horribly inefficient device and it's
much faster to telnet in over the network.  If realio trulio
serial-line I/O is needed, I'd recommend going for a DH11 if the OS
supports it.  (It isn't as if it still costs three times as much as a
DZ :-)


About the emulator:
-----------------

	The KLH10 is written in C for a vanilla Unix OS, largely so
	it can be readily ported to other platforms; in particular,
	those of the future as well as those of today.  Although
	re-coding critical sections in assembler would readily double
	or triple the basic instruction speed, it is much easier to
	simply recompile it on a faster machine.   Although I use the
	GCC compiler for its efficiency, I don't use any of its
	non-standard language constructs, again for portability.


The most fundamental design decision had to do with the method of
representing 36-bit words on a 32-bit architecture.  I wound up coding
around a halfword-based scheme, with each 18-bit PDP-10 halfword
right-justified in a 32-bit host word; thus each PDP-10 word uses 8
octets.  The same format is used for memory, ACs, and disk storage;
basically I traded off space for speed.

On a machine with a word size larger than 36, or a C compiler that
supported an equivalent integer type (such as double-word ints), many
things become easier, and I'd want to re-do a fair amount of the
arithmetic code.  Not because the current version won't work -- it
will
-- but because it won't be as efficient.  I decided early on that the
differences between optimal code on a 32-bit and +36-bit machine were
just too great to easily combine with the primitive tools available in
C.

The arithmetic emulation relies only on well-defined native integer
operations; native floating point is never used, both because it is
non-portable (formats vary) and because it is very difficult to
precisely emulate the PDP-10's behavior without actually carrying out
the same internal operations "by hand".  This is by far the slowest
part of the KLH10.  I checked it out by compiling it on a real 20
(using KCC, of course!) and running it for hours under a test program
that generated various operand bit patterns and compared the results
with those for real instructions.

The current implementation is being used as a testbed for threads, and
includes code for both a non-threaded and threaded version.  (I
figured, what better test of software parallelism than to emulate
hardware parallelism?) This means that running on a true
multi-processor platform could produce somewhat better performance,
although the main APR execution thread is still serial and will impose
an upper bound on the speedup.  Let's not talk about pipelining just
now.

To clear up one thing that has caused confusion: the emulator is
*not* running standalone on my Sparc.  Except for the net device it
uses the usual Unix system calls and burns 100% of the CPU, running
flat out.  I don't really notice it since the amount of CPU I use
most of the time is typically a negligible fraction of the Sparc's
capability; for example, I'm writing this note in one window while
ITS runs in the other, and everything's cool.


The NEED for SPEED:
------------------

	In the summary I mentioned a speed of 0.2 MIPS, or about 5
	usec per instruction.  This is probably the number people are
	most avidly interested in, but at the same time it's also one
	of the hardest to measure.  PDP-10 speeds have always been
	tricky, mainly because it all depends on the precise mix of
	instructions and operands.  Now it's even fuzzier because so
	much of the KLH10 is variable.


Of course, the host platform speed is an overriding factor.  But I've
seen that even minor changes in the main loop can produce noticeable
differences in response time, as can the exact nature of memory
references.  For example, the SPARC is a register-window machine and
it's faster to pass arguments to functions than to set and read global
variables; it's also faster to use C structure members than globals!
Another machine could have precisely the opposite behavior.  I should
add here that the compiler is also tremendously important; using
GCC results in code that is 40% faster than Sun's CC!  Yow!

Anyway, I haven't really buckled down and tried to tune or measure its
performance, so take the 0.2 MIPS loosely.  My back-of-the-envelope
calculations before I started suggested that an assembler-coded SPARC
implementation could achieve a KA performance level; the current C
version appears to get perhaps half that speed.  Assembling the entire
ITS OS takes about 26 minutes of real time, but I don't know if anyone
remembers how long it took on a real KS10, much less a KA10.

The Sparc is already a slowpoke compared with some of the new
workstations and chipsets coming out, so it's just a matter of time
before someone cracks the KS10 speed limit on reasonably cheap
personal hardware.  I can't help but wonder if might be worth paying
serious attention to a high-speed version, one that would represent a
cost-effective solution for those places still committed to PDP-10
software.  Systems Concepts probably doesn't need to worry -- yet!

Future improvements:
-------------------

	The obvious one is speed.  Then more machine variants, more
devices.  More instrumentation & profiling.  A nicely packaged
distribution with your choice of OS and initial filesystem included.
The usual embellishments.  Whatever people suggest.

There's also the possibility of evolving a new virtual machine, one
which realizes more closely whatever the ideal PDP-10 is conceived to
be.  (That itself is an interesting question.) For example, the real
KS10 ITS microcode does not actually support JPC, which was a feature
of Holloway's MAC-10 pager; the KLH10 does it trivially and in that
sense is more than a simple KS10.  Its pager could support 2048K as
easily as 512K.  And so on, all of which implies reconfiguring a real
OS to run on an unreal machine.  The first step down this road has
already been taken with NX ITS, assembled to use a non-existent net
interface.


And I've always thought it would be lovely to have a window displaying
a KA-style console panel with a full bank of lights and clickable
switches.  I know the KS10 never had one, but that doesn't mean it has
to stay that way, and besides, the KL10 cheated with a 11/40 panel, so
it's down to either the KA or KI.  Of course it's just for fun, what
else is this all about?

Final thoughts:
--------------

	I was going to close with the last paragraph, but realized
	there's one more thing I want to say.  It's just that, um,
	well, this will sound silly, but it feels so... weird? ...
	eerie?   ... just plain literally *mind-blowing* to watch
	this system boot up and run happily, utterly unaware that
	it's not on a real machine, or that anything odd happened
	since the last time it ran, or that its earthly incarnation
	of a noisy roomful of huge cabinets and washing machines is
	now entirely self-contained within a small innocuous pizza
	box holding up my ITS manuals.  Do systems have wathans?
	I've gotten a bit more used to it, but every now and then I
	still sit back, realize once again what the hell is going on,
	and hold on to something while the chills pass.  I didn't
	expect this at all.  A side effect of being imprinted at a
	tender young age, or something...


--Ken