Computers in Spaceflight: The NASA Experience
 
- Chapter Five -
- From Sequencers to Computers: Exploring the Moon and the Inner Planets -
 
- The Star Computer -
 
 
 
Researching the Reliability Problem
 
[149] In 1961 a Lithuanian-born computer scientist named Algirdas Avizienis, employed at UCLA, began research on a highly fault-tolerant computer system for use on long-duration space missions. The nonprogrammable version of the Central Computer and Sequencer would soon make its first flight on Mariner Venus 1962. Even at that early date, JPL expected to use computers on board the "Grand Tour" spacecraft planned for the 1970s. A favorable alignment of the outer planets would make possible a mission that could fly by Jupiter, Saturn, Uranus, and Neptune, thus having encounters with all the gas giants in one sweep. Such a mission would have to last for years, with the spacecraft operating autonomously for long periods of time. Inconvenient speed-of-light communications delays in the exploration of the inner planets would become crippling in an outer planets mission, requiring a spacecraft to carry its own "brain," because the earthbound brains of its makers would be hours away in an emergency.
 
Avizienis' chief interest was in computer reliability. Computer failures occurred much more frequently then than in today's world of ICs. A computer entrusted with the successful completion of a deep space mission could not afford to fail before or during its long-awaited encounter, so JPL's and Avizienis' interests came together at just the right time. During the period from 1961 to 1965, the Laboratory sponsored his search for a more fault-tolerant computer. In 1965 the reliability scheme was settled and construction of a prototype began. The breadboard version first ran a program in March of 1969, after a 2-year effort at software development38. Avizienis named the computer STAR, for self-testing and repair, and the name gives a clue to the architecture. JPL's Flight Computers and Sequencers Section of the Guidance and Navigation Division paid for the work. Avizienis was responsible for the concept; David A. Rennels, later a colleague at UCLA, for the hardware; John A. Rohr, for the software. F.P. Mathur did the reliability calculations, and the MIT Instrumentation Laboratory developed the read-only memory, which was basically a core rope type of memory39.
 
[150] Avizienis used selective redundancy to achieve reliability. On the Space Shuttle, the on-board computers are complete redundant versions of each other and are considered multiple computers. In the STAR, the computer is considered a single entity with its separate components replicated. Thus, each subsystem of STAR had several duplicate versions of itself in the computer as spares. The key advantage is that the spares were unpowered as long as the primary component ran successfully. Only when there was a failure would the spare come to life, and then power to the failed component would be cut off. Thus, the total power consumption of the STAR equaled, but did not exceed, that of a similar computer without the spares, making it attractive to power-conscious spacecraft designers40. In the 1960s, all spacecraft computers were simplex systems. The only ultrareliable system was the Launch Vehicle Digital Computer used on the Saturn IB and Saturn V boosters. Its reliability was achieved by using triple modular redundant (TMR) circuits such as those in the Common Section of the Skylab computer system. Avizienis evaluated TMR circuitry and found that the number of independent failures a TMR system could tolerate before failing was much smaller than the number a component-redundant computer such as STAR could tolerate41. Also, reliability theoretically increased through dormancy42. Mean-time-between-failure (MTBF) figures for a component begin when the component is turned on; thus, a subsystem with an MTBF of 1,000 hours, backed up with two identical spares, yields an MTBF of 3,000 hours. That was the theory behind STAR.
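The dormancy arithmetic can be written out as a short sketch. It is an idealized illustration only: it assumes that an unpowered spare accumulates no failure time and that switchover to a spare never fails. The function name `standby_mtbf` is hypothetical and does not come from the STAR documentation.

```python
def standby_mtbf(unit_mtbf_hours: float, spares: int) -> float:
    """Idealized MTBF of one powered unit backed by unpowered spares.

    Assumptions (illustrative, not from the STAR reports): a dormant
    spare does not age, and switching to a spare always succeeds.
    """
    if unit_mtbf_hours <= 0 or spares < 0:
        raise ValueError("MTBF must be positive and spares non-negative")
    # The units are used one after another, so their lifetimes add up.
    return unit_mtbf_hours * (spares + 1)


# The example from the text: 1,000 hours with two identical spares.
print(standby_mtbf(1000, 2))  # 3000
```

In practice the switches and dormant parts are not perfect, so the figure is best read as an upper bound under these assumptions.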
 
Avizienis reasoned that failures were caused either by transient conditions or by permanent component failures. In order to check for transient faults, STAR would repeat the program segment in which a fault was first detected. If the fault repeated itself, the affected component would be turned off and its spare activated, and the program segment would be repeated again. All fault detection was by hardware techniques, with error-correcting codes included in the software43. Potentially, STAR could be an "automatic repairman" for the entire spacecraft, if other spacecraft systems used the same concepts44.
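The retry-then-replace policy described above can be sketched as follows. This is a simplified model, not STAR's actual logic: the function `run_segment`, the unit names, and the idea of passing the program segment as a callable are all assumptions made for illustration.

```python
from typing import Callable, List


def run_segment(segment: Callable[[str], bool], units: List[str]) -> str:
    """Run a program segment and return the name of the unit that completed it.

    `segment(unit)` returns True when no fault was detected.
    `units[0]` is the powered primary; the rest are unpowered spares.
    """
    for unit in units:
        if segment(unit):
            return unit              # no fault detected
        if segment(unit):
            return unit              # fault did not recur: it was transient
        # The fault repeated, so the unit is treated as permanently failed.
        # It is switched off and the loop activates the next spare.
    raise RuntimeError("all units and spares have failed")


# A unit that faults once and then recovers stays in service.
attempts = {"count": 0}

def flaky(unit: str) -> bool:
    attempts["count"] += 1
    return attempts["count"] > 1

print(run_segment(flaky, ["primary", "spare"]))                       # primary
print(run_segment(lambda u: u != "primary", ["primary", "spare"]))    # spare
```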
 
 
Applications for STAR
 
 
In 1969, JPL began designing a Thermoelectric Outer Planet Spacecraft, or TOPS. In previous inner planet probes, the flight paths were close enough to the sun to enable the spacecraft to use solar cells for power generation. Outer planet missions ranged so far from the sun that solar cells would be inadequate. TOPS would carry radioisotope thermoelectric generators to provide electrical power.
 
 

[151]
 
Figure 5-3. The STAR computer configuration. (From Avizienis, "Design Methods for Fault-Tolerant Navigation Computers," JPL TR-32-1409)
 

[152] Box 5-2: STAR Architecture and Software
 
STAR was a fixed-point machine with a 32-bit word. Using separated components for redundancy meant that they had to be connected on a bus, which had 4-bit bytes as the basic transfer block45. There were 16K words of read-only memory, which Avizienis said consisted of a "braid" of transformers and wires46. Since MIT built the device, the description almost certainly indicates that it was a core rope similar to that used in the Apollo Guidance Computer (AGC). The basic version used two copies of 4K of random-access memory, with up to 12 units attachable. Avizienis foresaw that the memory would have to be reprogrammed in flight on a mission like Grand Tour, so provision was made for that function47.
 
Use of a large word size was not to increase arithmetic power as much as to provide space for error-checking codes. A STAR address consisted of a 16-bit field for the address and a 4-bit check field. The address would be multiplied by 15 (yielding 20 bits), and then stored or transmitted along the bus during an operation. At the receiving end, the address would be evaluated according to the following equation:
 
$C(a) = 15 - |a|_{15}$

where $|a|_{15}$ is the modulo-15 residue of a. Numeric data were handled similarly: the 28-bit operands were multiplied by 15 to get a 32-bit word. If the result of the check operation was a zero residue, the data or address was correct. If not, STAR issued a fault signal48.
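A minimal sketch of the modulo-15 check may make the scheme concrete. The function names are hypothetical, and the code models only the arithmetic described above (operands multiplied by 15, a check value of 15 minus the modulo-15 residue, and a zero-residue test), not STAR's hardware checker.

```python
CHECK_MODULUS = 15
OPERAND_BITS = 28


def encode_operand(value: int) -> int:
    """Multiply a 28-bit operand by 15; the product always fits in 32 bits."""
    if not 0 <= value < (1 << OPERAND_BITS):
        raise ValueError("operand must fit in 28 bits")
    return value * CHECK_MODULUS


def address_check_value(address: int) -> int:
    """Check value C(a): 15 minus the modulo-15 residue of the address."""
    return CHECK_MODULUS - (address % CHECK_MODULUS)


def is_fault_free(word: int) -> bool:
    """A zero modulo-15 residue means the word passes the check."""
    return word % CHECK_MODULUS == 0


word = encode_operand(1234)
print(is_fault_free(word))              # True
print(is_fault_free(word ^ (1 << 7)))   # False: one flipped bit is caught

# An address plus its check value always has a zero residue.
print(is_fault_free(100 + address_check_value(100)))  # True
```

A single flipped bit changes the word by a power of two, and no power of two is divisible by 15, so the residue can no longer be zero. That is why any single-bit error in an encoded word is detected by this test.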
 
STAR had three control signals. One was the common 1-megahertz CLOCK signal. RESET indicated a return to a standard initial state. SYNC signaled the beginning of a new 10-step instruction cycle49. If a fault was detected, the computer would return to the last SYNC point and begin executing instructions from there. If the 10 instructions after the SYNC were executed successfully, STAR sent a new SYNC signal.
 
STAR's read/write memory units were different in that they would recognize either their hard-wired name or an assigned name50. In this way, if a memory unit and its backup copy failed, another memory unit could be assigned its name, loaded with the appropriate data, and then act like the original memory unit, thus avoiding the necessity of changing all the addresses in the software. When an instruction appeared on the memory in (MI) bus, the memory unit that had that address put its contents on the memory out (MO) bus, and the Arithmetic Processor or other component loaded it in for processing51.
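The naming scheme can be illustrated with a toy model. The class `MemoryUnit`, the `read` helper, and the integer names are all assumptions made for this sketch; the real units responded to bus signals, not method calls.

```python
from typing import Dict, List, Optional


class MemoryUnit:
    """Toy memory unit that answers to a hard-wired or an assigned name."""

    def __init__(self, hardwired_name: int) -> None:
        self.hardwired_name = hardwired_name
        self.assigned_name: Optional[int] = None
        self.powered = True
        self.cells: Dict[int, int] = {}

    def responds_to(self, name: int) -> bool:
        if not self.powered:
            return False
        return name == self.hardwired_name or name == self.assigned_name


def read(units: List[MemoryUnit], name: int, offset: int) -> int:
    """Return the word at `offset` from whichever unit answers to `name`."""
    for unit in units:
        if unit.responds_to(name):
            return unit.cells.get(offset, 0)
    raise LookupError(f"no powered unit answers to name {name}")


units = [MemoryUnit(0), MemoryUnit(1)]
units[0].cells[5] = 42
print(read(units, 0, 5))   # 42

# Unit 0 fails; unit 1 is given its name and reloaded with the data.
units[0].powered = False
units[1].assigned_name = 0
units[1].cells[5] = 42
print(read(units, 0, 5))   # 42, with no change to the address used
```

The point of the design is visible in the last line: the software keeps using the same name, so no addresses have to be rewritten after a replacement.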
 
[153] The heart of the STAR was the Testing and Repair Processor, or TARP. Whereas the other components of STAR had either one or two unpowered spares, the TARP had three active versions and two inactive spares. Functions of the TARP were to maintain the rollback points to which the software returned after a failure detection, to diagnose failures, and to check itself. Each time an error check was made, TARP's three units would vote. If all three or two of three indicated a failure, then the TARP issued an unconditional transfer to the rollback point. In the case of a 2-to-1 vote, the dissenting unit was considered failed, and was shut off as a spare was activated52. Another TARP disagreement caused the last spare to be activated. On the third TARP failure, one of the previously shut down units would be reactivated, so that there were always three TARPs in action at any given time. Avizienis thought that since most failures would be transients, it would be safe to reactivate a unit. After all, if it disagreed again, it would be shut down.
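The voting and replacement rules can be sketched as a small model. The class `TarpPool`, the unit names, and the choice to reactivate the longest-idle shut-down unit first are assumptions for illustration; the text does not say which shut-down unit was chosen.

```python
from collections import deque
from typing import Dict, List


class TarpPool:
    """Toy model: three voting TARP units, two spares, reuse of retired units."""

    def __init__(self, names: List[str]) -> None:
        self.active = list(names[:3])
        self.spares = deque(names[3:])   # unpowered spares
        self.retired: deque = deque()    # units shut down after dissenting

    def vote(self, fault_seen: Dict[str, bool]) -> bool:
        """Return True when the majority reports a fault (rollback needed)."""
        votes = [fault_seen[name] for name in self.active]
        majority = sum(votes) >= 2
        for name, vote in zip(list(self.active), votes):
            if vote != majority:
                self._replace(name)      # the dissenter is considered failed
        return majority

    def _replace(self, failed: str) -> None:
        # Use a fresh spare if one remains, else reactivate a retired unit.
        if self.spares:
            replacement = self.spares.popleft()
        else:
            replacement = self.retired.popleft()
        self.active[self.active.index(failed)] = replacement
        self.retired.append(failed)


pool = TarpPool(["T1", "T2", "T3", "T4", "T5"])
print(pool.vote({"T1": True, "T2": True, "T3": False}))   # True
print(pool.active)                                        # ['T1', 'T2', 'T4']
```

Whatever the order of failures, the pool always ends a vote with three active units, which matches the behavior described above.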
 
John A. Rohr's software group did not begin work until 1967. An assembler, loader, and simulator were developed on a UNIVAC 1108 mainframe computer owned by JPL53. Software was all done in assembler, with a rich set of 180 single address instructions54. The assembler did allow some types of higher level statements, mostly for arithmetic. For instance, COMP Y=Y + 1 was directly compiled into the several machine instructions necessary for execution55. In this way, some of the tedium associated with assembly language programming was avoided. A floating-point subroutine to extend the calculating power of the machine was planned, but there is no evidence it was ever implemented56. It would have had to have been done in software. The STARlet, a limited breadboard version, ran its first program on March 24, 196957. The full system, save the timing processor, was on the breadboard by April 197858.


 

STAR was considered as the on-board computer for TOPS59. A control computer subsystem for the TOPS would use STAR technology, the full 32-bit word, but just 4K of read-only memory and 8K of the read/write memory60. The chief physical obstacle to using STAR on a spacecraft was size. The breadboard version filled 100 cubic feet. Avizienis wanted to reduce it to 2 cubic feet and 50 watts61. By 1971, the requirements had been reduced to 1 cubic foot, 40 pounds weight, and 40 watts power62. Even though progress was made in this area, STAR never flew on a spacecraft. Components built to STAR specifications found their way into the NASA Standard Spacecraft Computer 1 (NSSC-1), used in earth orbital operations, but the concept of selective redundancy was not incorporated into flight computers to the extent desired by Avizienis.
 
STAR did not find its way to the outer planets for two reasons. [154] One was budget cuts63. Even though the Voyagers were launched in the late 1970s, the original TOPS program and the Grand Tour were canceled due to budget constraints. The fact that Voyager 2 is essentially executing the Grand Tour is a bonus. On-board computers used on Voyager developed from a different line. So, even though Avizienis designed a Super-STAR with a microprogrammable processor using large-scale integration technology, which seemed certain to fulfill the requirements of size, power, and weight, he never sold it to JPL64. A second reason STAR never flew was that engineers were concerned that the STAR's TARP and its failure switches were a weak point. The concept of a TARP, as with TMR, is always limited by the question of "who tests the tester?"65. The actual switches entrusted with powering down a failed component and charging up another are the weakest link in the system. At one point, JPL subcontracted to the Stanford Research Institute for work on a magnetic switch, but apparently the results were not satisfying66.
 
The STAR research program was not a waste even though the computer itself did not fly. It contributed to the development of new, reliable electronic components, such as those used on NSSC-1. It also provided a contrast to the development track being taken on the Mariner and Viking Orbiter spacecraft. One engineer involved in Viking Orbiter computer development said that STAR-type hardware was considered but deemed too complex. He thought that a two-machine system running in parallel would be simpler and as reliable for a Mars orbiter/lander67. Even though the technology of computers was not ready for STAR, it remains an innovative design and one of the few computer research projects funded by NASA. The principles developed remained valid for possible future applications that JPL was about to begin.
 
By far the most direct and far-reaching contribution of the STAR program to the future of JPL projects was John Rohr's work on the assembler/linker/loader for the software. It was the basis for the command sequence translators used through the present. Though extensively reworked and redesigned, the fundamental concepts were established by Rohr during the STAR development68.

