The Midnight Search
You are three months into a rocket project. You type “rocket avionics teardown” at midnight and find a YouTube channel called Le labo de Michel, where old avionics, missile electronics, Thales flight units, and pitot-static transducers are opened, traced, powered up, and tested. One video shows a sensor being probed in real time. Another walks through a circuit board trace by trace. It is not entertainment. It is documentation.
You find several threads on rocketryforum.com. One builder writes: “I hear to ground test, ground test, ground test over and over again.” Another, in a different thread about dual deployment: “Once you demonstrate vigorous satisfactory ejection, you can be pretty confident in actual flight.” Dozens of builders, across dozens of threads, converge on the same point: the people who fly are the people who test.
You look at your CAD model. You look at your avionics bench. And you realize the thing you have not designed yet – not budgeted for, not scheduled for, not staffed for – is the test system. Not a test. A system.
The thesis is simple: your first system is the test system.
A Launch Is a Terrible First Integration Test
On June 4, 1996, the first Ariane 5 stood on the pad in Kourou. Approximately thirty-seven seconds after the start of the main engine ignition sequence – roughly thirty seconds after liftoff – the launcher and payload were lost.
The failure originated in the inertial reference system’s alignment routine. The code had been carried over from Ariane 4, where it served a real purpose. On Ariane 5 it served none – the trajectory was different, the alignment window was different. But the code ran anyway because it was “known to work.” At T+36.7 seconds it produced an operand error. Both inertial reference units shut down. The flight computer, now blind, interpreted diagnostic bit patterns as flight data, commanded full nozzle deflection, and the vehicle tore apart sideways at Mach 0.9.
That alignment routine had been unit-tested. It had flown on Ariane 4. But it had never been tested in a system-level configuration that included the actual Ariane 5 trajectory. The flight was the integration test. The ESA Inquiry Board found that including the system in integrated simulations “would have been technically feasible” and “the failure could have been detected.”
Twenty-three years later, on December 20, 2019, Boeing’s Starliner launched on OFT-1. It separated from the Atlas V and began flying the wrong mission timeline. Its mission elapsed timer was off by roughly eleven hours because the spacecraft had polled the Atlas V time reference at the wrong phase of the countdown. Every maneuver after separation was based on a clock that was wrong by nearly half a day.
A full end-to-end mission timeline test using flight-configured hardware, flight software, and the actual mission sequence would have caught this the moment the spacecraft’s state vector diverged. That test was never run. The flight was – again – the integration test. The unplanned second flight cost Boeing an estimated $410 million.
Two programs, two decades apart, same root cause: the launch vehicle was used as the test fixture. These are not organizations that skip testing. They test more than almost anyone. And still – when the wrong test is cut, the physics does not care how many you ran. A launch vehicle is a catastrophic place to discover integration behavior for the first time.
The Cost of Finding Bugs Late
There is a curve that every test organization learns eventually. Find a bug at the component level: one unit of cost. At subsystem integration: ten. At system-level testing: one hundred. In flight: one thousand. After a field failure – with investigation, stand-down, public scrutiny – ten thousand.
This is not unique to aerospace. It is the same logic behind the FDA’s design verification requirements and the automotive industry’s investment in hardware-in-the-loop (HIL) rigs. The curve holds at every scale. For a weekend rocket, a flight failure costs a rocket. For an orbital vehicle, it costs a program.
What changes is the denominator.
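The tenfold escalation described above is easy to put into numbers. The following is a minimal sketch that uses the stage names and the factor of ten from the text as a rule of thumb, not as measured data; the unit cost of 50 is a hypothetical figure chosen only for illustration.

```python
# Relative cost of finding the same bug at each stage, using the
# tenfold-per-stage rule of thumb from the text (not measured data).
STAGES = [
    "component",
    "subsystem integration",
    "system test",
    "flight",
    "field failure",
]


def escalation_factor(stage: str) -> int:
    """Return the cost multiplier for a bug found at the given stage."""
    return 10 ** STAGES.index(stage)


def cost_of_bug(stage: str, unit_cost: float) -> float:
    """Scale a hypothetical component-level fix cost to a later stage."""
    return unit_cost * escalation_factor(stage)


if __name__ == "__main__":
    unit_cost = 50.0  # hypothetical cost of a component-level fix, any currency
    for stage in STAGES:
        print(f"{stage:>22}: {cost_of_bug(stage, unit_cost):>12,.0f}")
```

The multipliers are the same for a weekend rocket and an orbital vehicle. Only `unit_cost` differs.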
The Rocket Is a Vehicle. The Test System Is the Memory.
A rocket is consumed by its own operation. The airframe burns up, crashes into the ocean, or drifts into a graveyard orbit. Even if you recover hardware from salt water, you do not get the flight back. The flight happened once. Its data is the only record.
This means your test data and your decision records are the only engineering artifacts that survive the project. The CAD models, the analysis reports, the material certs – that is input. The test data is the output. It is the only thing that tells you whether any of the inputs were correct.
Your test system is more important than your flight vehicle. The flight vehicle is a single data point. The test system generates the thousands of data points that justify building it. Lose the vehicle and keep the test data, and you can figure out what happened. Lose the test data and keep the vehicle, and you have a machine you cannot trust.
The practical implication: the test system deserves its own architecture, budget, schedule, and design reviews. It is not an accessory to the flight program. It is the flight program’s memory.
Consider the CRS-7 investigation. SpaceX had high-speed video from multiple cameras, hundreds of telemetry channels, and the flight computer’s final state vectors. Still, the initial failure signature was ambiguous. It took weeks to cross-reference video timeline against telemetry timeline and converge on a leading failure scenario involving a failed strut inside the second-stage LOX tank. If any link in that telemetry chain had been misconfigured – a sensor range too narrow, a data channel sampled too slowly – the anomaly might never have been definitively explained. End-to-end data path testing is what turns physical events into engineering records. Without it, a flight failure produces an unrecoverable mystery.
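One part of end-to-end data path testing can be checked before anything is powered up: whether each channel's configured range and sample rate can capture what the flight is expected to produce. The following is a minimal sketch. The `Channel` fields, the 20 percent range margin, the tenfold oversampling factor, and the example values are all illustrative assumptions, not figures from the CRS-7 investigation.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    name: str
    range_min: float         # lowest value the channel can record
    range_max: float         # highest value the channel can record
    sample_rate_hz: float    # configured sampling rate
    expected_min: float      # lowest value expected in flight
    expected_max: float      # highest value expected in flight
    fastest_event_hz: float  # highest signal frequency worth resolving


def check_channel(ch: Channel, margin: float = 0.2, oversample: float = 10.0) -> list[str]:
    """Return a list of configuration problems; an empty list means none found."""
    problems = []
    span = ch.expected_max - ch.expected_min
    if ch.range_min > ch.expected_min - margin * span:
        problems.append(f"{ch.name}: range floor too close to expected minimum")
    if ch.range_max < ch.expected_max + margin * span:
        problems.append(f"{ch.name}: range ceiling too close to expected maximum")
    if ch.sample_rate_hz < oversample * ch.fastest_event_hz:
        problems.append(f"{ch.name}: sampled too slowly for the fastest expected event")
    return problems


if __name__ == "__main__":
    # Hypothetical channel: ceiling has too little margin and the rate is too low.
    tank = Channel("tank_pressure", 0.0, 100.0, 50.0, 20.0, 90.0, 20.0)
    for problem in check_channel(tank):
        print(problem)
```

A check like this only covers configuration. It does not replace driving a known stimulus through the real sensor, harness, and recorder and confirming that the recorded value matches.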
There is also the physical artifact. In the Le labo de Michel videos, the teardowns feel less like entertainment than a working archive: open the case, trace the board, power it up, test, take notes. Hardware reveals things that specifications, simulations, and datasheets cannot.
Redundancy Is Not a Checkbox
“Redundancy” sounds like rigor. You add a second battery, a second computer, a second charge. Checkbox checked. But redundancy without a failure hypothesis and a tested switchover mechanism is not protection – it is the duplication of unknown failure modes.
Five questions for every redundant element:
First: What specific failure mode does this protect against? Not “sensor failure” – a category. A specific mode: “the pressure transducer output saturates high due to an open bridge circuit.”
Second: How is that failure detected? What is the detection threshold, the latency, the false-positive rate?
Third: What is the switchover mechanism, and has it been tested under flight-representative conditions?
Fourth: What is the common mode? Two sensors on the same bracket share a vibration mode. Two computers from the same batch share a latent defect. The Challenger SRBs had primary and secondary O-rings – but both were the same material at the same temperature compromised by the same mechanism. The redundancy was nominal, not functional.
Fifth: Has this exact configuration – both paths, the switchover logic, the fault injection – been tested end-to-end in a single run?
Many builders would discover that their confidence in redundant altimeters is based more on the design pattern than on test evidence. The answers to questions three through five – tested switchover? tested common mode? tested end-to-end? – would often be no. And design patterns, as Challenger demonstrated, are only as good as their validation.
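Questions two, three, and five can be exercised together in a single fault-injection run. The following is a minimal sketch, assuming a hypothetical selector that declares the primary transducer failed once it reads at or above a saturation threshold for a set number of consecutive samples. The threshold, the persistence count, and the signal values are illustrative, and the backup is assumed to be independent, so common-mode failures (question four) are outside its scope.

```python
from dataclasses import dataclass


@dataclass
class Selector:
    """Chooses between primary and backup readings (hypothetical logic)."""
    saturation_threshold: float  # a reading at or above this counts as saturated
    persistence: int             # consecutive saturated samples before switchover
    saturated_count: int = 0
    using_backup: bool = False

    def step(self, primary: float, backup: float) -> float:
        if not self.using_backup:
            if primary >= self.saturation_threshold:
                self.saturated_count += 1
            else:
                self.saturated_count = 0
            if self.saturated_count >= self.persistence:
                self.using_backup = True  # latched: never switches back
        return backup if self.using_backup else primary


def run_fault_injection(fault_at: int, n_samples: int = 20):
    """Inject 'primary saturates high' at sample fault_at and report the latency."""
    sel = Selector(saturation_threshold=4.9, persistence=3)
    outputs, switched_at = [], None
    for i in range(n_samples):
        truth = 1.0 + 0.1 * i                      # simulated true signal
        primary = 5.0 if i >= fault_at else truth  # injected fault: stuck high
        outputs.append(sel.step(primary, backup=truth))
        if sel.using_backup and switched_at is None:
            switched_at = i
    latency = None if switched_at is None else switched_at - fault_at
    return outputs, switched_at, latency
```

With the fault injected at sample 10, this selector switches at sample 12, so two saturated readings reach the consumer before the backup takes over. Whether the downstream logic tolerates those two samples is exactly the kind of answer the design pattern alone does not give.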
Simulation Is Not Reality. But Reality Without Data Is Just Anecdote.
Some say “we simulated it.” Others say “just test it.” Both are incomplete.
Challenger’s O-ring blow-by was simulated. The simulation said it was fine – because the model did not capture cold-temperature resilience loss in the elastomer. Only the 1977 test data told the truth. Simulation models what you know to model. Testing observes what you did not think to model. You need both.
Hardware-in-the-loop testing combines the two: real hardware, simulated environment, structured data – at a fraction of the cost of full physical testing. Every integration bug found during HIL is a bug not found in flight.
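The structure of a hardware-in-the-loop run can be sketched in software alone. In the sketch below, `deployment_logic` is a stand-in for the real unit under test (in an actual rig that call would be replaced by I/O to the flight computer), and the trajectory is a hypothetical drag-free ballistic arc, not a validated flight model. The 5 m drop threshold and all other numbers are assumptions for illustration.

```python
import csv
import io


def simulated_altitude(t: float, v0: float = 100.0, g: float = 9.81) -> float:
    """Hypothetical drag-free ballistic arc: altitude in metres at time t."""
    return max(0.0, v0 * t - 0.5 * g * t * t)


def deployment_logic(altitude: float, state: dict) -> bool:
    """Stand-in for the unit under test.

    Commands deployment once altitude has dropped more than 5 m below
    the highest altitude seen so far, and keeps the command latched.
    """
    state["peak"] = max(state.get("peak", 0.0), altitude)
    fired = state.get("fired", False) or (state["peak"] - altitude > 5.0)
    state["fired"] = fired
    return fired


def run_hil(duration_s: float = 25.0, dt: float = 0.1):
    """Step the simulated environment, feed the unit under test, log every sample."""
    log = io.StringIO()
    writer = csv.writer(log)
    writer.writerow(["t_s", "altitude_m", "deploy_cmd"])
    state, fire_time = {}, None
    for i in range(int(duration_s / dt) + 1):
        t = i * dt
        alt = simulated_altitude(t)
        cmd = deployment_logic(alt, state)
        writer.writerow([f"{t:.1f}", f"{alt:.2f}", int(cmd)])
        if cmd and fire_time is None:
            fire_time = t
    return fire_time, log.getvalue()


if __name__ == "__main__":
    fire_time, log_text = run_hil()
    print(f"deploy commanded at t = {fire_time:.1f} s")
    print(log_text.splitlines()[0])
```

The point of the structure is that every sample is logged, whether or not anything interesting happened, so the run leaves a record that can be reviewed later rather than a remembered impression.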
The V-model formalizes this: every requirement on the left side gets a corresponding verification on the right, from unit through system. The model is clear, widely documented – and repeatedly abandoned under schedule pressure. Ariane 5 skipped system-level trajectory testing. Starliner skipped the end-to-end mission timeline. In both cases the V-model was known. The verification step was not in the plan.
The Minimum Evidence Stack
You cannot test everything. The goal is to test the right things: know what each test answers, know what happens if it is missing, and build an evidence pack that survives the project.
| Evidence Artifact | What It Answers | What Happens If Missing |
|---|---|---|
| Requirement verification matrix | Does every requirement trace to a test? | Untestable requirements become unverifiable safety claims. |
| Risk and hazard log | What failure modes are credible and what mitigations exist? | Risks that are never identified are never tested. |
| Configuration-under-test record | What exactly was tested – firmware, hardware, calibration, parameters? | When something fails later, you cannot reproduce the test conditions. |
| Test procedure with hold/abort criteria | What is the sequence, and when should the test stop? | A test that never aborts generates useless data – the sensor was drifting, the condition was invalid, but nobody stopped it. |
| Instrumentation and calibration record | Were the sensors measuring the right things, within tolerance? | The data says “the temperature was 22 degrees” but the sensor had drifted 3 degrees and nobody knew. |
| Pass/fail criteria per requirement | What constitutes success and what constitutes failure? | “The test looked good” is not engineering. It is a memory. |
| Anomaly log | What unexpected behavior occurred, and how was it resolved? | Anomalies that are not recorded become mysteries that repeat. |
| Decision record | Who reviewed the evidence and authorized the next step? | When the reviewer leaves, the rationale leaves with them. |
Each artifact answers a specific question. If the question is not answered, the answer is not “probably fine.” It is “unknown.”
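The first row of the table, the requirement verification matrix, lends itself to an automated check. The following is a minimal sketch, assuming requirements and test records are kept as plain Python data. The requirement IDs, test names, and field names are hypothetical.

```python
def verification_status(requirements: dict[str, str], tests: list[dict]) -> dict[str, str]:
    """Return 'pass', 'fail', or 'unknown' for every requirement."""
    status = {req_id: "unknown" for req_id in requirements}
    for test in tests:
        for req_id in test["verifies"]:
            if req_id not in status:
                continue  # test points at a requirement that does not exist
            if test["result"] == "fail":
                status[req_id] = "fail"  # any failure dominates
            elif test["result"] == "pass" and status[req_id] == "unknown":
                status[req_id] = "pass"
    return status


if __name__ == "__main__":
    requirements = {
        "REQ-001": "Main charge fires at apogee",
        "REQ-002": "Backup charge fires after main",
        "REQ-003": "Telemetry is logged for the whole flight",
    }
    tests = [
        {"name": "ground_ejection_test_1", "verifies": ["REQ-001"], "result": "pass"},
        {"name": "logger_endurance_check", "verifies": ["REQ-003"], "result": "fail"},
    ]
    for req_id, state in verification_status(requirements, tests).items():
        print(req_id, state)
```

In this example REQ-002 has no test pointing at it, so it is reported as `unknown` rather than silently treated as acceptable.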
Flight Readiness Is a Review of Evidence, Not Confidence
Confidence is a feeling. A Flight Readiness Review is not a vote on confidence. It is a structured review of evidence: tests, demonstrations, analyses, audits, anomalies, waivers, and readiness constraints. The NASA Systems Engineering Handbook describes it in the same terms, as a review of test results that demonstrate readiness.
Challenger shows what happens when readiness becomes a communication failure. The Rogers Commission found that engineers had evidence and objections about O-ring behavior at low temperature, but the decision chain did not preserve those concerns as launch constraints. The key people making the final call did not have the full history of the O-ring problem in front of them. The test data existed. It was not invisible. But the process that should have carried it into the go/no-go decision broke down.
The FDA’s QMSR provides a parallel: design verification requires test evidence showing that outputs meet inputs. “We analyzed it” is not accepted. “We tested it and here is the data” is. The principle applies across all safety-critical industries.
An FRR that works is methodical. For each flight-critical function: test performed, conditions, data, pass/fail criterion. Waived test: written rationale. Open anomaly: risk assessment. Go/No-Go is a logical consequence of the evidence, not a separate judgment.
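The idea that Go/No-Go follows from the evidence can be written down as a function. The following is a minimal sketch, assuming one hypothetical record per flight-critical function. The field names and the specific blocking rules are illustrative and are not taken from the NASA handbook.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    function: str                        # flight-critical function under review
    test_result: str                     # "pass", "fail", "waived", or "not_run"
    waiver_rationale: str = ""           # required when test_result == "waived"
    open_anomalies: list[str] = field(default_factory=list)
    anomaly_risk_assessed: bool = False  # True once open anomalies are assessed


def blocking_items(records: list[Evidence]) -> list[str]:
    """List every reason the evidence does not yet support a Go."""
    if not records:
        return ["no evidence records presented"]
    blockers = []
    for r in records:
        if r.test_result == "waived":
            if not r.waiver_rationale.strip():
                blockers.append(f"{r.function}: waived without written rationale")
        elif r.test_result != "pass":
            # Anything other than a pass or a justified waiver blocks.
            blockers.append(f"{r.function}: test result is {r.test_result!r}")
        if r.open_anomalies and not r.anomaly_risk_assessed:
            blockers.append(f"{r.function}: open anomaly without risk assessment")
    return blockers


def go_no_go(records: list[Evidence]) -> str:
    """Go only when nothing in the evidence blocks it."""
    return "NO-GO" if blocking_items(records) else "GO"
```

There is no input for how confident anyone feels. An empty evidence set or an unrecognized test result yields a No-Go, which mirrors the rule that an unanswered question is "unknown", not "probably fine".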
The Real Question After a Test
You run a test. The hardware survives. The data looks clean. Someone exhales and says “it worked.” The test is declared a success. You move on.
A test that only confirms what you already believed was not a test. It was a ceremony. The most valuable tests produce surprises – a failure mode on the ground, where it costs time, not a vehicle.
The real question after a test is not “Did it work?” It is “What did we learn that changes the next decision?” Consider what happens when that question goes unanswered. An AJ-26 engine failed during ground testing at Stennis in May 2014. Five months later, the Orb-3 vehicle was destroyed by a turbopump bearing failure about 15 seconds into its ascent. The NASA investigation identified three credible root causes and noted an insufficient acceptance test program for the engine. The earlier test stand anomaly had not been fully incorporated into the flight risk assessment.
You will hear, in the rocketry community and in professional aerospace, a persistent temptation to skip testing because “the analysis shows it will work.” The history of launch vehicle failures is a history of analysis that was rigorous, internally consistent, peer-reviewed – and wrong. Wrong because the boundary conditions differed from flight. Wrong because a material property was not what the spec sheet said. Wrong because physics is larger than anyone’s model of physics.
Richard Feynman wrote, in Appendix F to the Rogers Commission report: “For a successful technology, reality must take precedence over public relations, for nature cannot be fooled.”
The test system is the only part of your program that tells you the truth – and the only part that survives to tell it to the next team. Build it before you need it. Build it with the same rigor as the flight vehicle. Instrument it to answer specific questions. Run it to failure. Learn what it teaches.
The rocket will fly, or it won’t. The test data will remain.
Confidence is for spectators. Evidence is for engineers.
Sources: ESA Ariane 501 Inquiry Board Report (Lions et al., 1996); NASA/Boeing OFT-1 Post-Flight Assessment (2020); Rogers Commission Report on the Challenger Accident (1986); NASA CRS-7 Independent Review Team Summary (2015); NASA Orb-3 Accident Investigation Executive Summary (2015); NASA Systems Engineering Handbook (NASA/SP-2016-6105 Rev 2); Feynman Appendix F; rocketryforum.com public posts; Le labo de Michel (YouTube).
Why test infrastructure must be the first system in mission-critical engineering – using rocketry as the extreme case.