You know something’s wrong when a Mars spacecraft lands without any software glitches or crashes, while at the same time here on Earth a company’s financial trading system fails disastrously, costing nearly half a billion dollars. I expected the opposite, didn’t you? These two separate events demonstrate the brutally obvious outcomes of an organization’s level of commitment to quality. And the differences couldn’t be more stark.
The Mars landing (a.k.a. the 7 minutes of terror) required about 500,000 lines of code to execute with perfect timing, slowing Curiosity’s descent from 13,000 miles per hour to less than three miles per hour in a hovering maneuver to touch down on the surface of Mars. The system flawlessly triggered 76 explosive charges without damaging the sensitive electronics of Curiosity’s scientific lab equipment.
By contrast, the high-frequency, low-latency trading software used by Knight Capital Group might involve far more code when you consider all of the end-to-end components that make up the system’s architecture. Knight’s software is responsible for routing trades across the various exchanges with extremely quick response times. There are typically no explosive charges in Knight’s trading system; that is, unless you count the $440 million charge to rescue the company from a financial crash.
With similar levels of system complexity, both of these organizations had a common goal: avoiding a crash. But the difference between them is their attitude toward quality. Some think that quality is just a characteristic that must be technically validated. But there aren’t just knobs to twist, buttons to press and formulas to compute for quality. Quality is not just a technical aspect of software. Quality is the result of human attention to detail and critical thinking about the outcomes and impacts of our actions. An attitude that supports high quality comes from people who are engaged in and supported in the pursuit of excellence in their work.
The commitment to quality software starts with the attitudes of the people at the very top, through investment in resources and support for a culture that rewards individuals for prioritizing and ensuring quality outcomes. There’s an expectation that if individuals are given time and respect to deliver high quality, they will be committed to doing so. As an example, with regard to the engineers working on the Curiosity program, NASA Administrator Charles Bolden was quoted as saying: "This is an amazing achievement, made possible by a team of scientists and engineers from around the world and led by the extraordinary men and women of NASA and our Jet Propulsion Laboratory." Contrast that with this statement from Knight’s CEO Thomas Joyce: "You cannot keep people from doing stupid things…that is what happens when you have a culture of risk." Joyce’s company lost nearly 70% of its value in 2 days.
The “culture of risk” Joyce refers to is about taking financial risks for gain, which includes hedging against poor quality financial decisions. This attitude of risk-taking is counter-productive to quality. It can permeate the culture of a company even when it comes to decisions about technology. You can see this in the form of extreme cost-cutting on tools and expertise, a lack of training and professional development, and the prioritization of project completion over quality. When it comes to technology systems, the most effective approach to hedging risk is to thoroughly test everything involved with the system rather than simply rationalizing impending failure.
As evidence of their attitude toward quality, the engineers at NASA know how to manage technical risk.
UPDATE: this article was re-posted in modified form on the STPCON website as “Quality Risk Takers: NASA vs. Wall Street” (which is totally cool!)