
DSKY — Volume 12 — 1202: The Alarms That Almost Stopped Apollo 11

Twelve minutes from the Moon, the computer cried for help — and the design answered

About This Volume

This is the dramatic peak of the series. Everything the previous eleven volumes built toward — the rope memory, the verb-noun grammar, the priority-scheduled executive, the restart protection — was, on the afternoon of 20 July 1969, put to the only test that ever truly mattered. For about twelve minutes, a one-cubic-foot computer with less memory than a modern greeting card flew two men toward the surface of another world, and partway down it began throwing alarms that none of the people aboard could decode from memory.

This volume reconstructs those minutes closely. It is the story of the 1202 and 1201 program alarms: what the DSKY was actually trying to say, why it was saying it, why the computer did not crash, and how a 26-year-old guidance officer and a 24-year-old backroom specialist — armed with a hand-written cheat sheet — kept the first lunar landing alive.

The piloting of the final descent — Armstrong flying past a boulder field, the fuel running low — is developed in Volume 13. Here we stay centered on the alarms, because the alarms are where the AGC’s whole philosophy either held or failed. It held.

Figure 1 — Lunar Module Eagle in flight, photographed from Columbia over the Moon during Apollo 11. Photo: File:Earth, Moon and Lunar Module, AS11-44-6643.jpg by NASA / Apollo 11. License: Public domain. Via Wikimedia Commons.

The Scene: Powered Descent

At a little after 4:05 p.m. Eastern, the Lunar Module Eagle was falling toward the Sea of Tranquility, engine pointed forward and slightly down, riding the long parabola of powered descent. Neil Armstrong, mission commander, stood at the left; Buzz Aldrin, lunar module pilot, at the right. There were no seats: the men flew standing, tethered, faces close to the small triangular windows and to the single display-and-keyboard unit on the panel between them that connected them to the Apollo Guidance Computer.

The descent ran as a sequence of major programs. P63, the braking phase, had lit the descent engine at “powered descent initiation” some 50,000 feet above the surface and was burning off the enormous horizontal velocity left over from orbit. It would hand off to P64, the approach phase, when Eagle pitched upright and the landing point came into view in Armstrong’s window. There he could sight it through the Landing Point Designator, a scale etched on the window that he read against an angle the computer displayed. The plan was for the AGC to fly almost the entire descent automatically, with Armstrong taking over only at the very end.

On the DSKY, which Aldrin worked during the descent, the computer was running a verb-noun pair the crew had rehearsed hundreds of times in the simulators: Verb 16 Noun 68, a continuously updating monitor of altitude, altitude rate, and time, the numbers rolling over every second. Aldrin called them out. Armstrong watched the world come up to meet him. Everything, for the first three minutes, was nominal.

Figure 2 — Buzz Aldrin inside the Lunar Module during Apollo 11; the DSKY and the LM control panels are the crew's window into the guidance computer. Photo: File:AS11-36-5396 — Astronaut Edwin E. Aldrin inside the Lunar Module — NARA — 16683256.jpg. License: Public domain. Via Wikimedia Commons.

The Alarm: “It’s a 1202”

About five minutes into the burn, with Eagle somewhere around 33,000 feet, the DSKY did something it was not supposed to do during a landing. The PROG light — program alarm — lit up, and the displays flashed. Aldrin keyed the computer to read out the trouble code, and Armstrong’s voice came down to Houston, terse and tighter than usual:

“Program alarm.” A pause. “It’s a 1202.”

And then, a moment later, the request that betrayed exactly how thin the margin was: “Give us a reading on the 1202 program alarm.”

Neither astronaut knew the number cold. Why would they? The AGC defined dozens of alarm codes, most of which would never appear in flight, and “1202” was not on any list a pilot was expected to memorize. The DSKY had just told two of the most carefully trained pilots alive that something was wrong, in a vocabulary they could not instantly translate, with the lunar surface rushing up beneath them and roughly six minutes of usable engine time in the tanks.

This is the moment the entire series has been pointing at. A program alarm during the most unforgiving phase of the most scrutinized flight in history. Armstrong later said his heart rate told the truth even when his voice did not. The question that hung in the cabin and across the loops in Houston was brutally simple: do we keep going, or do we abort? An abort meant firing the ascent engine, blowing the descent stage away, and climbing back to a rendezvous with Michael Collins in Columbia — failure, but survivable failure. Pressing on with a computer that was complaining might mean no computer at all at the instant they needed it most.

The decision had to be made in seconds. It was.

What 1201 and 1202 Actually Meant

To understand the call Houston made, you have to understand what the DSKY was reporting — and here the previous volumes pay off.

The AGC did not run one program at a time. As Volume 10 described, its Executive juggled many jobs at once on a priority basis, and its lower-level Waitlist handled short time-driven tasks. To run a job, the Executive needed a scratchpad: a block of erasable memory called a core set, which held the job’s registers and bookkeeping. A job that needed to do vector and matrix arithmetic — most of the guidance and navigation jobs did — also needed a second, larger scratchpad called a VAC area (for “vector accumulator”). The computer had a fixed, small number of each: in the landing software, on the order of seven or eight core sets and five VAC areas. That was all the simultaneity the hardware could afford.

The two alarm codes meant precisely this:

  • 1202 — “no core sets.” A job tried to start, and the Executive had no free core set to give it. Executive overflow.
  • 1201 — “no VAC areas.” A job that needed vector arithmetic tried to start, and there was no free VAC area. The same overflow, one rung deeper.

Both are the same underlying condition stated two ways: the computer was being asked to start more work than it had room to hold at once. The computer itself was not faulty; nothing inside it was broken. The AGC was simply over-subscribed: more jobs were piling into the queue, second after second, than it could schedule in the time available. The pool of scratchpads filled, the next request found none free, and the software raised its hand and reported the overflow rather than silently corrupting itself.
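The bookkeeping behind the two codes can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the AGC’s actual Executive: the class and method names are hypothetical, the default pool sizes only echo the rough figures above, and the order in which it checks the two pools is a simplification. What it does show is the key behavior: when a pool is empty, the request is refused with an alarm code instead of being squeezed in.

```python
class ExecutiveOverflow(Exception):
    """Raised when a job cannot be given a scratchpad; carries the alarm code."""

    def __init__(self, code):
        super().__init__(f"program alarm {code}")
        self.code = code


class Executive:
    """Toy model of fixed scratchpad pools (hypothetical names and sizes)."""

    def __init__(self, core_sets=8, vac_areas=5):
        self.free_core_sets = core_sets
        self.free_vac_areas = vac_areas

    def start_job(self, needs_vac):
        # A vector job needs a VAC area as well as a core set.
        if needs_vac and self.free_vac_areas == 0:
            raise ExecutiveOverflow(1201)  # no VAC areas
        if self.free_core_sets == 0:
            raise ExecutiveOverflow(1202)  # no core sets
        # Both checks passed: claim the scratchpads.
        self.free_core_sets -= 1
        if needs_vac:
            self.free_vac_areas -= 1

    def end_job(self, had_vac):
        # A finished job hands its scratchpads back to the pools.
        self.free_core_sets += 1
        if had_vac:
            self.free_vac_areas += 1


exe = Executive(core_sets=2, vac_areas=1)
exe.start_job(needs_vac=True)
exe.start_job(needs_vac=False)
try:
    exe.start_job(needs_vac=False)
except ExecutiveOverflow as alarm:
    print(alarm.code)  # 1202
```

Because both checks run before anything is claimed, a refused job never leaks a half-allocated scratchpad; the pools stay consistent even at the moment of overflow.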

That distinction — raised its hand rather than silently corrupted — is the whole ballgame, and we will come back to it.

Figure 3 — A flight-type Apollo display and keyboard (DSKY). The PROG light at upper left is the lamp that flashed the program alarms during Eagle's descent. Photo: File:Apollo display and keyboard unit (DSKY) used on F-8 DFBW DVIDS683588.jpg by NASA/Dennis Taylor. License: Public domain. Via Wikimedia Commons.

The Cause: A Radar That Should Have Been Quiet

Why was the computer over-subscribed? The answer is one of the great cautionary tales in the history of engineering, and it has almost nothing to do with the landing software itself.

Eagle carried two radars. The landing radar pointed down and fed the AGC the altitude and velocity it needed to touch down — that radar was supposed to be talking to the computer, and it was. The rendezvous radar pointed up and out; its job was to track the orbiting command module, and its data mattered only if the crew had to abort and climb back to Collins. During a normal descent, the rendezvous radar rode along strictly as insurance.

To have it ready in case of an abort, the crew left the rendezvous radar powered up, with its mode switch in a position that kept it slaved and energized rather than under computer control. And here a subtle hardware flaw bit hard. The rendezvous radar’s antenna position was encoded by synchros, angle sensors excited by an 800-hertz alternating-current reference. The radar’s 800 Hz supply and the computer’s own 800 Hz timing reference came from two different sources. The two were frequency-locked but not phase-locked: they ran at the same rate, but nothing fixed the phase relationship between them, so the offset could settle at an arbitrary value.

To the Coupling Data Units — the converters that turned those analog synchro angles into digital counts for the AGC — that random phase offset looked exactly like motion. The antenna was bolted still, but the electronics reported it dithering rapidly back and forth, and every apparent flicker of motion arrived at the computer as a counter-interrupt: a tiny, involuntary “cycle steal” in which the AGC paused whatever it was doing to tick a counter up or down.
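A toy model makes the “stationary antenna, busy counter” effect concrete. It assumes a tracking converter that steps its digital count one unit at a time toward each new reading and sends the computer one pulse per step; the function name and the angle sequences are made-up illustrations, not flight data, and the real CDU circuitry was considerably more involved.

```python
def counter_pulses(apparent_angles):
    """Count the one-step pulses a tracking converter sends while following
    a sequence of apparent angle readings (integer converter counts)."""
    if not apparent_angles:
        return 0
    count = apparent_angles[0]  # start lined up with the first reading
    pulses = 0
    for reading in apparent_angles[1:]:
        # Step one count at a time toward the new reading; in this model each
        # step is one counter increment or decrement, i.e. one cycle steal.
        while count != reading:
            count += 1 if reading > count else -1
            pulses += 1
    return pulses


steady = [100] * 8                                    # antenna still, clean reference
dithering = [100, 101, 100, 101, 100, 101, 100, 101]  # antenna still, reading flickers
print(counter_pulses(steady))     # 0
print(counter_pulses(dithering))  # 7
```

Seven flickers cost seven cycle steals even though the antenna never moved; at the thousands of flickers per second described in the text, that cost becomes a measurable slice of the machine.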

The numbers are merciless. The spurious dithering generated on the order of 6,400 extra cycle-steals per second, and servicing them consumed roughly 13 percent of the AGC’s total processing capacity, capacity the descent had not budgeted for. The landing software had been sized with real margin, but not enough to lose 13 percent of everything, and not on top of a descent that was already the busiest the computer would ever fly. The guidance equations were rescheduled on a fixed cycle whether or not the previous pass had finished, so in the slowed-down computer each unfinished pass kept its core set while the next one started. The pool filled, a new job found none free, and the DSKY lit the PROG light: 1202.
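Why does a fixed percentage loss end in an alarm rather than a uniform slowdown? A minimal simulation shows the mechanism under loudly hypothetical assumptions: one guidance job is scheduled every fixed cycle, jobs are served oldest first, each unfinished job keeps holding a core set, and the demand and pool-size numbers are invented for illustration rather than taken from the landing software.

```python
def first_overflow_cycle(demand, steal, core_sets=8, max_cycles=200):
    """Toy model of a periodic job outrunning a slowed computer.

    demand    -- CPU time one guidance pass needs, as a fraction of a cycle
    steal     -- fraction of every cycle lost to counter cycle steals
    core_sets -- number of scratchpads available to hold unfinished jobs
    Returns the first cycle on which a newly scheduled job finds no free
    core set, or None if that never happens within max_cycles.
    """
    outstanding = []          # remaining work of each unfinished job, oldest first
    capacity = 1.0 - steal    # job time actually available in each cycle
    for cycle in range(1, max_cycles + 1):
        if len(outstanding) >= core_sets:
            return cycle      # every core set is held: the 1202 moment
        outstanding.append(demand)   # the next pass is scheduled regardless
        budget = capacity
        while outstanding and budget > 0:
            done = min(budget, outstanding[0])
            outstanding[0] -= done
            budget -= done
            if outstanding[0] <= 1e-12:   # finished (allowing for float rounding)
                outstanding.pop(0)         # its core set is released
    return None


print(first_overflow_cycle(0.85, 0.13))  # None: each pass finishes inside its cycle
print(first_overflow_cycle(0.95, 0.13))  # 85: a small deficit per cycle piles up
```

The second call is the point: a deficit of a few percent per cycle never clears on its own. In this model it accumulates until the pool is exhausted, and a bigger pool only postpones the moment.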

It is worth being exact about blame, because the story is often told as “Aldrin flipped the wrong switch.” The crew’s switch configuration was per the checklist — they did what the procedures told them. The deep fault was the unsynchronized 800 Hz power phasing, an interface flaw between two subsystems that had each been tested and signed off in isolation. The lesson the program drew was not “watch your switches.” It was “test the system, not just the boxes.”

Why It Did Not Crash

Here is the thing most retellings rush past, and it is the most important thing in this volume: the 1202 was not a failure. It was the system working.

A naïvely written real-time computer, handed more work than it can do, fails in one of two ugly ways. It can lock up, spinning forever on a queue it can never drain. Or it can quietly corrupt itself, overwriting one job’s scratchpad with another’s, flying the spacecraft on numbers that are subtly, fatally wrong. Either way the pilot may not learn there is a problem until the vehicle is already lost.

The AGC did neither, because the people at the MIT Instrumentation Laboratory had designed for exactly this contingency, as Volumes 10 and 11 detailed. Two features carried the day.

Priority scheduling. The Executive ran jobs in strict priority order. The guidance equations that kept the engine pointed and throttled correctly were the highest-priority work in the machine, and lower-priority chores sat beneath them. The spurious radar counts themselves were not jobs at all: as hardware cycle steals they bypassed the Executive entirely and simply took their time off the top. What priority decided was who got the time that was left. When the computer ran short, the important jobs still got the machine; the unimportant ones waited.
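Strict priority can be sketched as handing out one cycle’s time budget in priority order. The job names, priorities, and costs below are hypothetical, and the function is a toy rather than the Executive’s real dispatch loop. It uses Python’s standard heapq module, negating priorities because heapq always pops the smallest item first.

```python
import heapq


def run_cycle(jobs, budget):
    """Give one cycle's CPU budget to ready jobs in strict priority order.

    jobs   -- list of (priority, name, cost); a larger priority is more important
    budget -- CPU time available this cycle, in the same units as cost
    Returns (finished, waiting) lists of job names.
    """
    # Negate priorities so the most important job pops first
    # (ties fall back to alphabetical order by name).
    heap = [(-priority, name, cost) for priority, name, cost in jobs]
    heapq.heapify(heap)
    finished, waiting = [], []
    while heap:
        _, name, cost = heapq.heappop(heap)
        if cost <= budget:
            budget -= cost
            finished.append(name)
        else:
            budget = 0          # it absorbs what is left; nothing below it runs
            waiting.append(name)
    return finished, waiting


jobs = [(5, "housekeeping", 3), (30, "descent guidance", 6), (10, "display update", 2)]
print(run_cycle(jobs, 10))  # (['descent guidance', 'display update'], ['housekeeping'])
print(run_cycle(jobs, 7))   # (['descent guidance'], ['display update', 'housekeeping'])
```

Shrinking the budget never touches the top of the list: the guidance job finishes in both cycles, and only the lower entries are pushed into waiting.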

Restart protection. When the Executive detected overflow — no core set, no VAC area — it did not stagger on in a corrupted state. It triggered a software restart: a fast, deliberate reboot of the running software, designed to take a fraction of a second. The restart logic, built on the “restart groups” and protected register design covered in Volume 11, knew where each critical computation had reached a safe checkpoint. On restart, the computer threw away the backlog of half-started, piled-up jobs, cleared the scratchpad pools, and resumed the high-priority guidance from its last good checkpoint. The work that mattered picked up almost exactly where it left off. The work that did not matter was simply gone, and good riddance.
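The restart idea reduces to a separation between volatile and protected state. In this sketch, which illustrates the principle and is not the AGC’s restart-group mechanism, the queue of started jobs is volatile, a dictionary of checkpoints stands in (hypothetically) for the protected restart tables, and restart() discards the whole queue before resuming each protected computation exactly once.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    step: int  # where the job will pick up its work


@dataclass
class Computer:
    # Volatile: jobs that have been started (each would occupy a core set).
    jobs: list = field(default_factory=list)
    # Protected: last safe checkpoint of each critical computation
    # (a hypothetical stand-in for the restart-group tables).
    restart_points: dict = field(default_factory=dict)

    def start(self, name, step=0):
        self.jobs.append(Job(name, step))

    def checkpoint(self, name, step):
        self.restart_points[name] = step

    def restart(self):
        """Throw away the whole job queue, then resume each protected
        computation once, from its last recorded checkpoint."""
        self.jobs = [Job(name, step) for name, step in self.restart_points.items()]


agc = Computer()
agc.checkpoint("descent guidance", 41)
for _ in range(7):
    agc.start("descent guidance")   # a pile-up of unfinished copies
agc.start("display monitor")        # an unprotected chore
agc.restart()
print(agc.jobs)  # [Job(name='descent guidance', step=41)]
```

Eight queued jobs collapse to one, resumed from its checkpoint; the unprotected chore simply disappears, which is the trade the text describes.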

To Armstrong and Aldrin the visible symptom was a flashing PROG light and, for a heartbeat, the DSKY’s numbers freezing or blanking before they resumed updating. The descent guidance never actually stopped steering the ship. The engine kept pointing where it should. The landing radar kept feeding altitude. The computer was, in effect, periodically dropping its least important chores, brushing itself off, and getting back to the only jobs that could not be allowed to fail. Across the descent it did this five times — three 1202s and two 1201s — and each time it recovered in well under a second.

The DSKY, in other words, was not announcing a disaster. It was announcing that the disaster had just been averted, automatically, again. The hard part was that nobody in Eagle or, for a few seconds, in Houston, knew that yet.

Mission Control: The Go on the Alarm

Figure 4 — The Mission Operations Control Room in Houston during Apollo 11. The guidance officer's console — where Steve Bales made the call — sat in the "Trench," the front row of consoles. Photo: File:Mission Operations Control Room at the conclusion of Apollo 11.jpg by NASA. License: Public domain. Via Wikimedia Commons.

In Houston the alarm landed on the console of the guidance officer — call sign GUIDO — a 26-year-old named Steve Bales, sitting in the front row of consoles known as the Trench. Bales was responsible for the health of the guidance and navigation system, and the abort/no-abort recommendation for a computer problem was his to make. When Armstrong’s “1202” came down, the decision clock started, and it was short.

Bales did not have the alarm meanings memorized either — but he was not relying on memory. Behind him, in a back support room, sat Jack Garman, a 24-year-old AGC specialist who knew the computer’s software cold. And Garman had a piece of paper.

The paper existed because of a near-disaster in training. Weeks earlier, in a landing simulation, the team had been hit with program alarms, had not understood them quickly enough, and had called an abort — the wrong call, as the post-sim review showed; those particular alarms had been survivable. Stung, the flight controllers and the support team did their homework. Garman drew up, by hand, a list of every program alarm that could plausibly appear during descent and what to do about each one. Bales kept his copy under the glass on his console. Garman had his right in front of him.

So when “1202” came over the loop, Garman did not hesitate. He found it on his list. The key insight he carried — the thing that turned a heart-stopping number into a manageable one — was this: an executive-overflow alarm was okay to continue on, as long as it was not continuous. A single 1202, followed by a clean restart and resumed guidance, meant the computer had hiccuped and recovered. A steady stream of them, with no recovery between, would mean the machine was genuinely saturated and the guidance might no longer be trustworthy. The test was not “did an alarm happen” but “is the computer still recovering.”
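Garman’s test can be written down as a small decision rule. The event-log format and the cheat-sheet wording below are hypothetical paraphrases of the text, not a reproduction of his hand-written list: an alarm is acceptable only if the computer reports recovery before the next alarm arrives.

```python
# Hypothetical paraphrase of two cheat-sheet lines, not Garman's actual wording.
CHEAT_SHEET = {
    1201: "Executive overflow, no VAC areas: go, unless continuous",
    1202: "Executive overflow, no core sets: go, unless continuous",
}


def still_go(events):
    """Return True if every alarm so far was followed by a recovery
    before the next alarm arrived.

    events -- chronological list of "alarm" and "recovered" strings
    """
    awaiting_recovery = False
    for event in events:
        if event == "alarm":
            if awaiting_recovery:
                return False      # alarm on top of alarm: looks continuous
            awaiting_recovery = True
        elif event == "recovered":
            awaiting_recovery = False
    return not awaiting_recovery


print(CHEAT_SHEET[1202])
print(still_go(["alarm", "recovered", "alarm", "recovered"]))  # True
print(still_go(["alarm", "alarm", "alarm"]))                   # False
```

The rule never asks whether an alarm happened, only whether recovery keeps winning the race against the next one.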

Garman called it to Bales: it’s okay, we’re go, as long as it doesn’t recur continuously. Bales, trusting the man and the homework, made the GUIDO call up the chain to the flight director, Gene Kranz: “We’re go on that alarm.” Kranz, holding the whole landing in his head, passed it on, and the capsule communicator — Charlie Duke, an astronaut himself and the one voice cleared to talk to the crew — keyed his mic and sent up the words Eagle was waiting for:

“Roger. We got — we’re go on that alarm.”

The whole loop, from Armstrong’s callout to Duke’s reassurance, ran in a matter of seconds. Then the 1201 came, and then more 1202s, and each time the answer was faster and steadier, because now everyone understood the pattern: alarm, restart, recover, keep flying. We’re go on that alarm. It became almost a refrain. The computer kept asking; Houston kept answering; Eagle kept descending.

The Landing

With the alarms triaged, the descent went on, and promptly handed Armstrong a second problem that belongs to Volume 13 but cannot go unmentioned here. During P64, the approach phase, the automatic guidance was steering Eagle toward a crater rimmed with car-sized boulders. Armstrong did not like the look of his landing site. He took semi-manual control, flying in P66, the rate-of-descent mode in which the computer held the throttle and altitude rate while the commander steered the spacecraft’s path by hand. He flew long, past the boulder field, hunting for smooth ground, the fuel gauges falling the whole way.

He found it. Eagle settled onto the Sea of Tranquility with famously little propellant remaining — the call from Houston was that they were down to seconds of margin — and Armstrong’s voice, calm again, made it official: “Houston, Tranquility Base here. The Eagle has landed.” Through the entire final descent, including the manual flyover, the guidance computer that had thrown five alarms on the way down never once stopped doing its job.

Figure 5 — The lunar horizon over the Sea of Tranquility, photographed from Tranquility Base after Eagle landed. Photo: File:AS11-37-5497 — Lunar horizon from the Sea of Tranquility — NARA — 16683567.jpg. License: Public domain. Via Wikimedia Commons.

Aftermath: The Textbook Case

The radar problem was understood and fixed; later landings phased the power correctly, and the alarms never recurred in flight. Don Eyles and the other young MIT programmers who had written the landing software — and who spent the descent listening, white-knuckled, to a machine doing precisely what they had built it to do — got to watch their restart logic save the mission in real time. Eyles would later call the whole experience a “controlled panic,” which is about right.

A few weeks after the landing, when the crew and the flight team were honored, Steve Bales was chosen to accept the Presidential Medal of Freedom on behalf of the entire mission operations team: the young man who, given seconds to decide whether a complaining computer should land on the Moon, had said go. He always insisted the honor belonged to everyone behind him, Garman most of all, and to the design itself. Jack Garman went on to a long career in NASA software and is remembered, fairly, as the man whose hand-written cheat sheet helped save the first lunar landing.

But the deepest legacy is not a medal. It is a principle. In the decades since, the 1202 alarm has been the textbook case study in robust real-time software, taught to engineers building flight computers, medical devices, and any system where a machine must keep doing the one thing that cannot fail even when it is handed more than it can do. The right behavior under overload, the AGC proved, is not to crash and not to lie. It is to know your priorities, shed what you can afford to lose, tell the humans clearly what is happening, and keep flying. A one-cubic-foot computer with rope memory demonstrated that, twelve minutes from the surface of the Moon, in front of the largest audience in human history, and it is still the right answer today.

The DSKY had flashed a number nobody aboard could read. The number, in the end, meant: I have got this.

Next — Volume 13: Landing on the Moon — The Descent Programs.