Rockwell PPS-8

General information
Launched	1974
Common manufacturer	Rockwell International
Performance
Max. CPU clock rate	256 kHz
Data width	8
Address width	14
Physical specifications
Package	42-pin quad in-line package

The Rockwell PPS-8, short for "Parallel Processing System, 8-bit", was an early 8-bit microprocessor from Rockwell International, announced in 1974. It had a number of unique design features, which Adam Osborne described as "most unusual... more powerful... also one of the most difficult to understand."^[1] It was released with a suite of support chips, including ROM and RAM, parallel and serial controllers, and a direct memory access (DMA) system.

The release of simpler and less expensive designs like the MOS 6502 around the same time led Rockwell to pull the design from the market without entering widespread production. National Semiconductor had a cross-licensing arrangement with Rockwell, but they did not produce the PPS-8. The simpler Rockwell PPS-4 did not suffer the same fate, finding a number of roles in low-end systems and being produced into the 1980s.

Description

Physical construction

The PPS-8 was built on a metal gate process, in contrast to the contemporary Intel 8008 and similar designs, which were based on the more advanced silicon gate PMOS logic process. PMOS logic required large amounts of power; the PPS-8 ran on a -17 VDC power supply and also needed separate -12 V, +5 V and ground connections.^[1] The circuitry dissipated so much power that the chip could not generate a strong enough clock signal internally, and the clock had to be an external chip in its own TO-100 package.^[2] As with the PPS-4, the clock was based on a standard NTSC timing crystal, as these were widely available. Inside the CPU, the clock's two-phase output, A and B, was used to build a four-phase internal clock running at four times the external clock rate. For instance, the normal 250 kHz clock became 1 MHz inside the CPU.^[3]

Like the PPS-4, the PPS-8 was packaged in a 42-pin quad in-line package. The system had a 14-bit address bus, allowing it to address up to 16 kB of memory. This was normally used in conjunction with the read inhibit (RIH) and write inhibit (WIO) lines to address two banks of memory: up to 16 kB of ROM containing a program, and up to 16 kB of RAM for data storage.^[4] In an era when memory was very expensive and machines often used 2 to 4 kB of ROM and even less RAM, the small address space of the PPS-8 was not a significant limitation. The CPU treated the two banks differently: one held program code in ROM, while the other held data in RAM. The shared data bus was 8 bits wide, allowing it to read one instruction or word of data in a single cycle.^[5]
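
The memory sizes quoted above follow directly from the bus width. The short Python sketch below works through the arithmetic; the names `ADDRESS_BITS` and `addressable_bytes` are illustrative and do not come from Rockwell's documentation.

```python
ADDRESS_BITS = 14  # width of the address bus, as stated in the text

def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte locations reachable with the given bus width."""
    return 1 << address_bits

bank_size = addressable_bytes(ADDRESS_BITS)  # one bank (ROM or RAM)
total_size = 2 * bank_size                   # program bank plus data bank

print(bank_size, bank_size // 1024)    # prints: 16384 16
print(total_size, total_size // 1024)  # prints: 32768 32
```

Because the two banks are selected by control lines rather than by an extra address bit, the total of 32 kB is reached without widening the 14-bit bus.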

Registers and memory

The PPS-8 was an accumulator-based design with only one general-purpose 8-bit processor register, A. A second 8-bit register, W, was used to buffer data for some of the accumulator instructions, but could otherwise be used for general storage. It also had three 8-bit "data counters", X, Y and Z. X and Z mostly acted as index registers, but could only access "data memory", whereas Y was a secondary accumulator and also a buffer (like W) for the X register. The 16-bit L register, for Link, was used as an index register into program memory (ROM), whereas X, Y and Z pointed to data memory (RAM). There was also a 14-bit program counter and a 5-bit stack pointer.^[4]^[6]

Referring to a location in the data memory required a 14-bit address to be constructed from a 7-bit "page number" in the Z register and a 7-bit "byte number" in X. If bit 7 of the X register was 0, then the Z register value was ignored. This was used as a form of short addressing, more commonly seen in other processors as a "zero page" or "base page", the idea being that only a single byte was needed to specify an address, saving memory in the program code and the time needed to load an extra byte of the instruction.^[6] A single index register is a limitation, and additional addressing could be arranged using the L and Y registers. One operation swapped the values in X and Y, allowing Y to be used as a direct backup for X. A second allowed the values in Z and X to both be copied into L, allowing L to be used as a buffer for a complete 14-bit address.^[7]
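
The address construction described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a description of the actual hardware: it assumes the page number forms the upper 7 bits of the address, the byte number the lower 7 bits, and that an ignored Z register means page zero. The function name `data_address` is hypothetical.

```python
def data_address(z: int, x: int) -> int:
    """Form a 14-bit data-memory address from the Z and X registers.

    Assumptions (not confirmed by the sources): the page number occupies
    the upper 7 address bits, and when bit 7 of X is clear the access
    falls in page zero.
    """
    byte_number = x & 0x7F        # low 7 bits of X
    if (x & 0x80) == 0:           # bit 7 of X clear: Z is ignored
        return byte_number
    page_number = z & 0x7F        # low 7 bits of Z
    return (page_number << 7) | byte_number

print(hex(data_address(0x05, 0x12)))  # bit 7 clear, Z ignored: 0x12
print(hex(data_address(0x05, 0x92)))  # bit 7 set, page 5, byte 0x12: 0x292
```

Under this model a program that keeps its working data in the short-addressed region never needs to load Z at all, which is the saving the text describes.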

Additionally, the value in the X register could be automatically incremented or decremented by certain branch instructions. This improved the speed of loops, as a single instruction would increment or decrement the value, test whether it crossed zero (incremented from 0xFF to 0x00, or decremented from 0x00 to 0xFF), and if it had, skip forward one or two bytes. To use this feature, the programmer wrote a loop with one of these instructions at the bottom, followed by the address of the top of the loop. Normally the value would not cross zero and the PC would be loaded with the following address. When it did cross, the address would be skipped over and the program would continue at the next instruction. Addresses could be 1 or 2 bytes long depending on whether the 7th bit of the first byte was a zero or one.^[7]
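
The loop idiom described above can be modelled as follows. This Python sketch only imitates the control flow (increment X, test for the wrap from 0xFF to 0x00, then either branch back or fall through); the function names are hypothetical and no real PPS-8 opcodes are represented.

```python
from typing import Tuple

def increment_and_test(x: int) -> Tuple[int, bool]:
    """Increment the 8-bit X value and report whether it crossed zero."""
    new_x = (x + 1) & 0xFF
    crossed_zero = new_x == 0x00   # wrapped from 0xFF to 0x00
    return new_x, crossed_zero

def run_loop(start_x: int) -> int:
    """Count how many times a loop body runs before X crosses zero."""
    x = start_x
    iterations = 0
    while True:
        iterations += 1            # stands in for the loop body
        x, crossed_zero = increment_and_test(x)
        if crossed_zero:
            break                  # branch address is skipped, loop exits
        # otherwise the PC is reloaded with the address of the loop top
    return iterations

print(run_loop(0xFD))  # prints: 3
```

To run a loop N times with this scheme, X is preloaded with 256 minus N, so the counter reaches zero on the final pass.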

The 5-bit stack pointer, S, addresses a stack that is normally implemented in 32 bytes of RAM. As in most systems, the S register is updated when data is pushed or pulled.^[8] If S ever reaches 31, indicating the stack is full, the next instruction will be skipped, similar to the increment/decrement modes. This feature allows one to place a branch or return instruction just after the push or pop, and then place a branch to a stack-full handler beyond that. Normally execution would continue with the first branch, but if the stack fills, alternate code can be called which might, for instance, write out the stack to memory and clear it to provide more room. While small by the standards set by processors like the Zilog Z80 or even the MOS 6502, this size of stack is suitable for systems that generally used it only for subroutine calls into ROM or interrupt handlers.^[9]
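
A minimal Python model of the stack-full skip is given below. The sources do not say which way the stack grows or exactly what S points at, so this sketch assumes S starts at 0, is incremented after each push, and wraps at 5 bits; the class name `SmallStack` is hypothetical.

```python
class SmallStack:
    """32-byte stack addressed by a 5-bit pointer (illustrative model)."""

    def __init__(self) -> None:
        self.ram = [0] * 32
        self.s = 0                      # assumed initial pointer value

    def push(self, value: int) -> bool:
        """Store a byte; return True if the next instruction is skipped."""
        self.ram[self.s] = value & 0xFF
        self.s = (self.s + 1) & 0x1F    # 5-bit pointer wraps at 32
        return self.s == 31             # S reaching 31 signals a full stack

stack = SmallStack()
skips = [stack.push(n) for n in range(31)]
print(skips.count(True), skips[-1])     # prints: 1 True
```

In a program, the instruction that gets skipped would be the ordinary branch or return, so execution lands on the branch to the overflow handler only in the full-stack case.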

Among its more curious features was that it only had two status bits, carry and intermediate carry. These were placed in the bits of L that were otherwise unused due to the 14-bit addresses. Most processors of the era had additional status bits to indicate the outcome of comparisons, like whether the value in A is zero, but in the PPS-8 these tests were combined with the branch instructions, so the results did not have to be user-visible.^[10]

Instruction set

The instruction set in the PPS-8 was relatively simple in terms of the numbers and types of instructions, including the typical add and subtract, increment and decrement, logical operations for AND, OR and XOR, and shifts and rotates. As was typical for the era, the PPS-8 included instructions for directly working with binary coded decimal (BCD) data. This is a numeric format that stores a decimal digit in 4 bits, with two digits per byte. BCD is generally slower to process than binary numbers due to the "decimal adjust" step that is not needed for binary data, but can more than make up for this because it is very easy to convert to and from ASCII values. Not only did the PPS-8 offer BCD addition and subtraction, it also had instructions that shifted (rolled) an 8-bit value by 4 bits, making it easier to extract individual digits. The system also had instructions to load values from memory to registers, save from registers to memory, and, somewhat uncommonly, exchange them in a single operation. Note that it did not have compare operations, as these were combined with the branches into a single operation.^[11]
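
The following Python sketch illustrates packed BCD arithmetic in general terms: a two-digit addition with a decimal adjust on each digit, a 4-bit rotate to swap digits, and the conversion to ASCII. It is not a model of the PPS-8's actual instructions; the function names are illustrative, and the intermediate carry here simply denotes the carry out of the low digit.

```python
from typing import Tuple

def bcd_add(a: int, b: int, carry_in: int = 0) -> Tuple[int, int]:
    """Add two packed-BCD bytes; return (packed result, carry out)."""
    low = (a & 0x0F) + (b & 0x0F) + carry_in
    intermediate_carry = 0
    if low > 9:                     # decimal adjust of the low digit
        low -= 10
        intermediate_carry = 1
    high = (a >> 4) + (b >> 4) + intermediate_carry
    carry = 0
    if high > 9:                    # decimal adjust of the high digit
        high -= 10
        carry = 1
    return (high << 4) | low, carry

def rotate_4(value: int) -> int:
    """Rotate an 8-bit value by 4 bits, swapping its two digits."""
    return ((value << 4) | (value >> 4)) & 0xFF

def bcd_to_ascii(value: int) -> str:
    """Convert a packed-BCD byte to two ASCII digit characters."""
    return chr(0x30 + (value >> 4)) + chr(0x30 + (value & 0x0F))

result, carry = bcd_add(0x38, 0x45)
print(hex(result), carry, bcd_to_ascii(result))  # prints: 0x83 0 83
```

The ASCII conversion needs only an add of 0x30 per digit, which is why BCD was attractive for display-oriented applications despite the extra adjust step.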

Although the instructions were fairly typical, there were a large number of addressing modes and thus a large number of opcodes. The load/store operations came in several versions. The base instructions, L(oad), S(tore) and X(change), accessed the data memory location pointed to by Z and X. Adding N to the assembler mnemonic post-incremented the X value after the instruction ran, while D post-decremented it. NXL and DXL did the same but then also swapped the ZX value into L, and NXY incremented X and then swapped it with Y. Finally, NCX incremented X and then compared it with Y, rather than testing for crossing zero, to decide whether the loop was complete, and DCX did the same with a decrement.^[12] Access to the program memory was only through the PC or the L register; when using L, bits 16 and 8 were ignored to produce the 14-bit value. With all of these modes, if the upper bit of the lower byte of the address was zero, the upper byte was ignored, allowing a form of short addressing similar to a zero page.^[10]
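
As a rough illustration of the base, N and D variants of the load instruction, the Python sketch below models a load from the location addressed by Z and X followed by the optional adjustment of X. The register dictionary, the `load` helper and the address layout (page number in the upper 7 bits, page zero when bit 7 of X is clear) are assumptions made for the example; the skip-on-zero behaviour and the other suffixes are not modelled.

```python
from typing import Dict

def data_address(z: int, x: int) -> int:
    """Assumed address layout: 7-bit page from Z, 7-bit byte from X."""
    if (x & 0x80) == 0:
        return x & 0x7F                     # short form, Z ignored
    return ((z & 0x7F) << 7) | (x & 0x7F)

def load(memory: Dict[int, int], regs: Dict[str, int], suffix: str = "") -> None:
    """Load A from data memory, then adjust X according to the suffix."""
    regs["A"] = memory.get(data_address(regs["Z"], regs["X"]), 0)
    if suffix == "N":
        regs["X"] = (regs["X"] + 1) & 0xFF  # post-increment
    elif suffix == "D":
        regs["X"] = (regs["X"] - 1) & 0xFF  # post-decrement
    elif suffix:
        raise ValueError("suffix not modelled in this sketch")

memory = {0x80: 0xAA, 0x81: 0xBB}
regs = {"A": 0, "X": 0x80, "Z": 1}
load(memory, regs, "N")
print(hex(regs["A"]), hex(regs["X"]))       # prints: 0xaa 0x81
```

Folding the pointer update into the load is what lets a table scan or block copy proceed without a separate increment instruction in the loop body.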

Data pools

One of the unique features of the PPS-8 was the concept of "data pooling". This was similar to the concept of a zero page in that it allowed instructions to be written in fewer bytes, normally two instead of three, but took this one level further to produce instructions that required only a single byte. To do this, several "pools" of memory were set aside for special purposes.^[13]

The first pool, the first four bytes in memory, was used to point to the power-on routines. The next 60 bytes, making up the rest of the first half of page zero, contained the "command pool". These were single-byte values that held the second or third bytes of multi-byte instructions. The idea was to allow those instructions that were widely used in the program to be placed in the command pool and then referred to using a single byte whose lower 6 bits encoded one of the locations in the pool.^[14]
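
The pool layout described above can be sketched as a simple decode step. This Python example assumes that the lower 6 bits of the referring byte give the location within the first 64 bytes of page zero directly, so that values 0 to 3 fall in the power-on pool and 4 to 63 in the command pool. That mapping, and the function name, are assumptions rather than documented behaviour.

```python
POWER_ON_POOL_SIZE = 4      # first four bytes of memory
COMMAND_POOL_END = 64       # first half of a 128-byte page

def command_pool_location(byte: int) -> int:
    """Extract the command-pool location from a single referring byte."""
    location = byte & 0x3F                  # lower 6 bits
    if location < POWER_ON_POOL_SIZE:
        raise ValueError("locations 0-3 belong to the power-on pool")
    return location

print(COMMAND_POOL_END - POWER_ON_POOL_SIZE)  # prints: 60
print(command_pool_location(0xC5))            # prints: 5
```

Six bits select among 64 locations, which matches the four power-on bytes plus the 60-byte command pool given in the text.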

The second half of page zero was used as the "literal pool". These could only be used with the instructions LXI, LYI, LZI/AISK and LAI/ANI. Each of these instructions was given its own set of 15 bytes, with a 16th left empty. The "I" in these instructions indicated that they used "immediate" addressing, meaning the operand was a literal value in the code. If that literal value appeared in many places, which is often the case for hardware addresses and loop counters, moving it to the literal pool would reduce them all by one byte. This could be combined with the command pool to reduce the remaining instruction to a single opcode as well, making these common instructions go from three bytes to one.^[15]

Finally, the first half of page one held the "subroutine entry pool", a list of addresses in program memory. These were stored with the high bytes in locations 0 to 31, and the low bytes in 32 to 63. Using these slots, branches and subroutine calls could also be reduced to a single byte opcode. The use of data pools and the increment-compare-and-branch instructions allowed the PPS-8 to build loop structures in very little code, often as few as three instructions, and by using the command pool, a single byte each. This made the system extremely efficient, but at the cost of being much more complex to program.^[16]
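
The split storage of the subroutine entry pool, with high bytes in locations 0 to 31 and low bytes in 32 to 63, can be illustrated with the Python sketch below. It returns the raw 16-bit pair for a slot; how the hardware reduces that pair to a 14-bit program address is not modelled, and the function name `entry_address` is hypothetical.

```python
def entry_address(pool: bytes, slot: int) -> int:
    """Look up the address pair stored for one of the 32 entry slots."""
    if len(pool) != 64:
        raise ValueError("the pool is modelled as exactly 64 bytes")
    if not 0 <= slot < 32:
        raise ValueError("slot must be in the range 0-31")
    high = pool[slot]           # high bytes occupy locations 0-31
    low = pool[32 + slot]       # low bytes occupy locations 32-63
    return (high << 8) | low

pool = bytearray(64)
pool[2] = 0x12                  # high byte for slot 2
pool[34] = 0x34                 # low byte for slot 2
print(hex(entry_address(bytes(pool), 2)))  # prints: 0x1234
```

With 32 slots, a 5-bit field in the opcode is enough to name any entry, which is how a branch or call can fit in a single byte.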

Bus interface

The control bus was at the simple end for processors of the era. The entire multi-chip system, including RAM and ROM, used only two control pins along with the clock signals. Normally the processor itself would alternate access to memory based on the four-phase internal clock, with the address on the address bus referring to the data memory (RAM) or other devices on phase 2, and program memory (ROM) on phase 4.^[17] When accessing memory, the RIH pin signals whether memory should be read (low) or written (high). To talk to devices, the WIO pin is pulled high, signalling memory to disconnect and devices to listen for read and write instructions based on RIH.^[10]
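
The bus rules in the paragraph above can be collected into a small decoder. This Python sketch is a simplification: it assumes that WIO is only examined on phase 2 and that RIH is interpreted the same way on both phases. The function name and the returned labels are illustrative.

```python
from typing import Tuple

def decode_bus_cycle(phase: int, rih_high: bool, wio_high: bool) -> Tuple[str, str]:
    """Return (target, operation) for the address currently on the bus."""
    if phase == 4:
        target = "program memory"           # ROM bank
    elif phase == 2:
        # WIO high disconnects memory and addresses I/O devices instead
        target = "device" if wio_high else "data memory"
    else:
        raise ValueError("only phases 2 and 4 are modelled")
    operation = "write" if rih_high else "read"   # RIH low means read
    return target, operation

print(decode_bus_cycle(2, rih_high=False, wio_high=False))  # ('data memory', 'read')
print(decode_bus_cycle(2, rih_high=True, wio_high=True))    # ('device', 'write')
```

Using the clock phase as an implicit bank select is what allows two 16 kB banks and the I/O devices to share one address bus with only two control pins.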

There are three levels of interrupts, signalled by INT0, INT1 and INT2. The system supported vectored interrupts; when one of these pins is pulled high, the processor finishes the current instruction, swaps the PC onto L, and then calls the code located at an address stored in the first three slots in the subroutine entry pool. As these were in program memory, normally in ROM, generally these were dedicated to specific tasks. INT0 was the highest level, and normally used for a power failure signal which required immediate shutdown. INT1 was used for high-priority interrupts like the realtime clock, and INT2 was used for all devices. Once the system entered the interrupt, it pulled the ACK0 line high. The DMRA pin was used in conjunction with the DMAC device (see below), which pulled it high to indicate it wanted to take over the bus, and the processor then pulled it low again to indicate it was releasing the bus when it was ready. PO (power on) was the reset pin, which cleared the contents of all of the registers and began executing the instruction at address zero. SPO (synchronized power on) was then pulled low to cause the same to occur in any attached devices.^[18]
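
The priority ordering of the three interrupt lines can be expressed as a small selection routine. In this Python sketch the mapping of INT0, INT1 and INT2 to entry pool slots 0, 1 and 2 is an assumption; the text only states that the vectors occupy the first three slots. Both function names are hypothetical.

```python
from typing import Iterable, Optional

def highest_priority(pending: Iterable[int]) -> Optional[int]:
    """Pick the interrupt level to service; lower numbers win."""
    levels = [level for level in pending if level in (0, 1, 2)]
    return min(levels) if levels else None

def vector_slot(level: int) -> int:
    """Entry-pool slot holding the handler address (assumed mapping)."""
    if level not in (0, 1, 2):
        raise ValueError("the PPS-8 has interrupt levels 0 to 2 only")
    return level

print(highest_priority({2, 1}))   # prints: 1
print(highest_priority(set()))    # prints: None
```

A pending power-failure signal on INT0 therefore always wins over the real-time clock and ordinary device requests.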

Support chips

Rockwell designed the PPS-8 to be used with a series of additional devices that used the same bus protocol and 42-pin layout. Among these were a 2 kB ROM, a 256-byte RAM, the GPIO, PDC, DMAC and SDC, and some device controllers.^[1]

The GPIO, short for general purpose input/output, buffered data to and from external devices. The chip included a set of 12 input lines and 12 output lines which would be connected to external devices, arranged in groups of 4 lines each. Communications with the processor took place over the existing 8-bit data bus. "Instructions" were sent to the GPIO over the data bus using the upper four bits as a device number, 0 to 15, with the other four bits as an instruction code. The upper four bits are compared against four "address strap" pins, which activates a single I/O device. These are used as device select lines, selecting which of the GPIOs or PDCs is being addressed. The lower four bits then read or write one set of I/O pins. There are no connections from the GPIO to the address bus, meaning that all transfers to memory must be mediated by another chip.^[19] The GPIO also passes through one clock signal on the A pin to allow external devices to match the timing of the system.^[20]
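
The split of the I/O instruction byte into a device number and an instruction code can be shown with a short decode routine. This Python sketch only illustrates the bit fields described above; the function names are hypothetical and the meaning of individual instruction codes is not modelled.

```python
from typing import Tuple

def decode_io_instruction(byte: int) -> Tuple[int, int]:
    """Split an I/O instruction byte into (device number, instruction code)."""
    device = (byte >> 4) & 0x0F     # upper four bits: device 0 to 15
    code = byte & 0x0F              # lower four bits: instruction code
    return device, code

def is_selected(strap: int, byte: int) -> bool:
    """True if a chip strapped to the given device number should respond."""
    device, _ = decode_io_instruction(byte)
    return device == (strap & 0x0F)

print(decode_io_instruction(0xA3))  # prints: (10, 3)
print(is_selected(10, 0xA3))        # prints: True
```

Because selection happens on the data bus, up to sixteen such chips can share the bus without any of them needing address lines.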

The PDC, short for parallel data controller, was essentially a GPIO re-arranged into two 8-bit ports, A and B, instead of three 4-bit ones. The chip includes more logic than the GPIO, which allows it to read or write one byte of data for each instruction from the CPU, or use the more advanced "handshake" mode in which it continues to read or write until a stop signal is received on the CA2 pin. Additionally, it offers an alternative way to move data to and from the chip using the pins for channel A connected to the data bus, copying the value read from the device on port B to the processor.^[21] DMA was turned on by the CPU writing a bit to the PDC's internal status register, after which an interrupt was generated whenever the B port had a new byte of data, indicating the CPU or DMAC should copy that value.^[22]

The DMAC, for DMA controller, was the only I/O chip that had connections to the address bus, allowing it to access data memory as an equal partner with the CPU. The DMAC had internal registers to support up to eight external devices. Each device had an associated memory address and buffer length register, mapping the area of memory they would communicate with. When a device had data, it would pull its DMA line high (1 through 8), and the DMAC would then signal this to the CPU by pulling the DMRA line high. When the CPU completed its current operation it would pull the DMRA low again, and the DMAC was then free to take over the bus.^[23]

The DMAC does not actually transfer data to and from the devices; it simply listens for DMA requests and then uses another chip, typically the PDC, to handle the transfer. When the request is received, it uses the pin number to select the base address for that device from its registers, offsets that by a counter value, and expresses the result on the address bus. It then uses the DMA0 pin to signal the PDC to activate that device, which reads or writes a value on channel B and places the result on the data bus. It is the combination of signals from the DMAC to present an address and the PDC to read or write on the data bus that completes the transfer. The DMAC could operate in two basic modes. In the one-off mode, the chip reads data from a device when it receives a signal on one of the device DMA lines, uses cycle stealing to transmit it to memory, and decrements the counter. The permanent mode, which mapped an area of memory to a particular device, operates in a similar fashion but resets the buffer length and address to their original values when the transfer is complete.^[24]
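
The difference between the one-off and permanent modes can be seen in a small model of a single DMA channel. This Python sketch tracks only the address generation: a base address, an offset that advances with each transfer, and a remaining-length counter. The class and field names are hypothetical, and the bus handshaking with the CPU and PDC is not modelled.

```python
from dataclasses import dataclass, field
from typing import Optional

ADDRESS_MASK = 0x3FFF   # 14-bit address bus

@dataclass
class DmaChannel:
    base: int                       # start of the device's memory buffer
    length: int                     # buffer length in bytes
    permanent: bool = False         # True selects the permanent mode
    offset: int = field(default=0, init=False)
    remaining: int = field(default=0, init=False)

    def __post_init__(self) -> None:
        self.remaining = self.length

    def next_address(self) -> Optional[int]:
        """Address for the next transfer, or None if the buffer is used up."""
        if self.remaining == 0:
            return None
        address = (self.base + self.offset) & ADDRESS_MASK
        self.offset += 1
        self.remaining -= 1
        if self.remaining == 0 and self.permanent:
            self.offset = 0                 # restore the original address
            self.remaining = self.length    # and the original length
        return address

one_off = DmaChannel(base=0x100, length=2)
print([one_off.next_address() for _ in range(3)])    # prints: [256, 257, None]
permanent = DmaChannel(base=0x100, length=2, permanent=True)
print([permanent.next_address() for _ in range(3)])  # prints: [256, 257, 256]
```

The permanent mode behaves like a ring buffer that needs no CPU intervention to rearm, while the one-off mode stops once the buffer has been filled.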

Another chip in the family is the SDC, for serial data controller. In general terms this is a three-output, two-input UART that includes additional logic to allow it to trigger the DMAC to perform the transfer into memory. For instance, one could set up the DMAC with a base address and buffer length, set up the SDC to read from device 1 with a particular configuration such as 8-bit data with 1 start and 1 stop bit, and then whenever the SDC has received a full byte it will signal the DMAC to transfer it. The SDC could be used synchronously or asynchronously, with 5 to 8 data bits, optional odd or even parity bits, and one or two stop bits.^[24] It was quite fast for the era, running in synchronous mode at up to 250 kbit/s and in asynchronous mode at up to 18 kB.^[23]
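
The framing options listed above determine how many bits are sent per character in asynchronous mode. The Python sketch below computes the frame length and the resulting character rate; it assumes a single start bit, and the 9600 bit/s line rate in the example is an arbitrary illustration, not a figure from the SDC documentation.

```python
def frame_bits(data_bits: int, parity: bool, stop_bits: int, start_bits: int = 1) -> int:
    """Total bits on the line for one asynchronous character."""
    if not 5 <= data_bits <= 8:
        raise ValueError("the SDC supports 5 to 8 data bits")
    if stop_bits not in (1, 2):
        raise ValueError("the SDC supports one or two stop bits")
    return start_bits + data_bits + (1 if parity else 0) + stop_bits

def characters_per_second(bit_rate: float, bits_per_frame: int) -> float:
    """Character throughput for a given line rate and frame length."""
    return bit_rate / bits_per_frame

bits = frame_bits(data_bits=8, parity=False, stop_bits=1)
print(bits)                                # prints: 10
print(characters_per_second(9600, bits))   # prints: 960.0
```

The framing overhead means the useful data rate of an asynchronous link is always lower than its nominal bit rate, by 20 percent in the common 8-bit, no-parity, one-stop-bit case.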

On top of all of this, Rockwell also had a number of chips dedicated to specific devices, like a printer controller (PCC), telecommunication controller (TDI), and keyboard and display (GPKD).^[11]

Development

A PPS-8 assembler and emulator were hosted on Tymshare, GEnie and Rockwell's Time Sharing Option system. These were written in Fortran, and were also combined as an "assemulator". Rockwell also offered a number of pre-assembled evaluation systems.^[25]

Complexity

A number of design features of the PPS-8 made it difficult to work with, or as Osborne put it, "the most difficult to understand."^[1]

Among these was the split addressing system. The basic concept was not unusual at the time; processors such as the Signetics 2650 and EA9002 had similar features in which an address had to be expressed as multiple parts that produced a series of blocks or pages. But in these processors, memory was still treated in the same general fashion no matter what bank the address was in; instructions that could access an instruction or data in one page generally could do the same in any other.^[26] In contrast, the PPS-8 used entirely different addressing modes and registers for the two banks. Referring to a location in program memory used the PC and L registers as a single 14-bit value, whereas data was referred to using the separate X and Z registers, and sometimes Y. And while L was generally used to refer to locations in program memory, it was also used as a buffer for the X and Z values that pointed into data memory. On top of all of this was the data pooling system, which added further addressing modes.^[6]

Another complex feature of the PPS-8 was its bus system, which was designed for very high throughput. Using a combination of tight loops based on code in the data pools and the separate I/O bus formed by the DMAC, the system could process data at up to 250 kB per second, even running at a relatively slow system clock of 250 kHz. But this also required significant complexity to achieve. The Rockwell documentation shows a basic system layout containing the clock chip, CPU, DMAC, PDC, SDC, several device controllers, and a user-selected amount of RAM and ROM.^[27] In contrast, something like the two-chip Fairchild F8 included ROM and RAM, and three parallel ports that could also be used as serial lines or controllers like the GPIO. This allowed designers to build the same basic system as the PPS-8 using only two standard 40-pin DIPs.^[28]

Performance was good overall; Osborne lists a simple looping benchmark where the core of the loop requires only three bytes, allowing it to run much faster than the same program on other systems.^[16]

References

Citations

[1] Osborne 1976, p. 8.1.
[2] Soucek 1976, p. 381.
[3] Soucek 1976, p. 385.
[4] Soucek 1976, pp. 383–384.
[5] Osborne 1976, p. 8.11.
[6] Osborne 1976, p. 8.4.
[7] Osborne 1976, p. 8.6.
[8] Soucek 1976, p. 407.
[9] Soucek 1976, p. 408.
[10] Osborne 1976, p. 8.9.
[11] Rockwell 1974, p. 2.1.
[12] Osborne 1976, p. 8.8.
[13] Osborne 1976, p. 8.24.
[14] Rockwell 1974, p. 3.24.
[15] Rockwell 1974, p. 3.25.
[16] Osborne 1976, p. 8.48.
[17] Soucek 1976, p. 383.
[18] Osborne 1976, p. 8.10.
[19] Osborne 1976, p. 8.17.
[20] Osborne 1976, p. 8.18.
[21] Osborne 1976, p. 8.19.
[22] Osborne 1976, p. 8.20.
[23] Osborne 1976, p. 8.21.
[24] Osborne 1976, p. 8.22.
[25] Rockwell 1974, p. 2.3.
[26] Rowe, Jamieson (September 1976). "The Signetics 2650" (PDF). Electronics Australia.
[27] Rockwell 1974, p. 1.1.
[28] Osborne 1976, p. 2.1.

Bibliography

Introduction and Description Parallel Processing System (PPS-8) (PDF) (Technical report). Rockwell International. October 1974.
Soucek, Branko (1976). Microprocessors and Microcomputers. John Wiley and Sons.
Osborne, Adam (1976). An Introduction to Microcomputers, Volume II. Sybex.
