multipart-mixed

Quality is Job 0.99999

This is intended to introduce a chapter on quality. There’s a myriad of topics here; this post is merely setting the stage.

I take it as a given that any product should be a high-quality product. Why bother making it otherwise? This seems like a straightforward goal, but the technology industry has had a hell of a time figuring out how to build quality software. There’s a joke that’s been around for at least a decade:

There’s word in business circles that the computer industry likes to measure itself against the Big Three auto-makers. The comparison goes this way: If automotive technology had kept pace with Silicon Valley, motorists could buy a V-32 engine that goes 10,000 m.p.h. or a 30-pound car that gets 1,000 miles to the gallon—either one at a sticker price of less than $50. Detroit’s response: “OK. But who would want a car that crashes twice a day?”

Any good joke has an element of truth to it, and this one especially so. Auto makers can build cars that’ll drive hundreds of thousands of miles, but Windows 95 would crash after 49.7 days of continuous operation. That bug took four years to discover because other bugs would crash Windows 95 long before 49.7 days could pass.
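The 49.7-day figure is not arbitrary. It matches the time an unsigned 32-bit counter takes to wrap around when it counts milliseconds, which is the commonly reported cause of the bug. Here is a minimal sketch of that arithmetic; the counter width and the millisecond tick are assumptions for illustration, and the function names are made up for this example.

```python
# Assumption: uptime is tracked in milliseconds in an unsigned 32-bit counter.
COUNTER_BITS = 32
MS_PER_DAY = 24 * 60 * 60 * 1000


def days_until_wrap(bits: int = COUNTER_BITS) -> float:
    """Days until an unsigned millisecond counter of this width overflows."""
    return (2 ** bits) / MS_PER_DAY


def tick(counter: int, elapsed_ms: int, bits: int = COUNTER_BITS) -> int:
    """Advance the counter, wrapping around the way a fixed-width integer does."""
    return (counter + elapsed_ms) % (2 ** bits)


print(round(days_until_wrap(), 1))  # 49.7
print(tick(2 ** 32 - 1, 1))         # 0: the counter silently starts over
```

Any code that assumes the counter only ever goes up gets a nasty surprise at the moment it wraps back to zero.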

Complexity

Despite the opinion of some managers, the primary reason for quality problems is not, in fact, stupid programmers. The reason is complexity. Computer systems are the most complex machines that humans build—airplanes, by comparison, are stupid simple in their principles and construction.

If you start thinking about the principles and construction of a computer, you need to start with electricity, then circuits, digital logic, gates, adders, and so on. In my 2007 vintage laptop, the CPU alone has 291 million transistors. The graphics processor has another 289 million transistors. Two chips, over half a billion transistors.

And that’s just the electronics. Then you get into firmware, drivers, the scheduler, the memory manager, file systems, and so on. By the time I can print “hello world” to a terminal window I’ve far exceeded the computing power used to put a man on the moon. This stuff isn’t rocket science; it’s way more complex than rocket science.

Abstraction

Given the level of complexity inherent in computing, how does the beleaguered programmer get anything done? Only by hiding complexity behind layers of abstraction. When I run “hello world,” is my program actually setting pixels on the display? No, it’s calling printf(), which puts characters on standard output, which my terminal program puts in a window, and the windowing toolkit decides what font to use and how to scroll the display, and somewhere fifteen steps later pixels start changing. It may sound inefficient, but as a programmer I either need printf() or else I need years to print “hello world” to the screen.

This layering and hiding of complexity—better known as abstraction—is the only thing that keeps us productive and sane. However, abstraction is a tricky thing to get right. Some things are straightforward; for example an abstract file needs operations for open, close, read, write, seek, and that’s about it. What about an abstract database? There are many abstractions for dealing with databases; the most common, SQL, has an ISO specification that’s 3777 pages long.
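To make the "abstract file" idea concrete, here is a rough sketch of such an interface in Python. The class and function names (AbstractFile, MemoryFile, copy_greeting) are hypothetical, invented for this illustration, and "open" is modeled simply as constructing the object. It is one possible shape for the abstraction, not a description of any real file API.

```python
from abc import ABC, abstractmethod


class AbstractFile(ABC):
    """Hypothetical interface covering the operations named in the text."""

    @abstractmethod
    def read(self, size: int) -> bytes: ...

    @abstractmethod
    def write(self, data: bytes) -> int: ...

    @abstractmethod
    def seek(self, offset: int) -> int: ...

    @abstractmethod
    def close(self) -> None: ...


class MemoryFile(AbstractFile):
    """Toy implementation backed by a bytearray; 'open' is the constructor."""

    def __init__(self) -> None:
        self._buf = bytearray()
        self._pos = 0
        self._open = True

    def _check_open(self) -> None:
        if not self._open:
            raise ValueError("file is closed")

    def read(self, size: int) -> bytes:
        self._check_open()
        chunk = bytes(self._buf[self._pos:self._pos + size])
        self._pos += len(chunk)
        return chunk

    def write(self, data: bytes) -> int:
        self._check_open()
        # Pad with zero bytes if the position was moved past the end.
        if self._pos > len(self._buf):
            self._buf.extend(b"\x00" * (self._pos - len(self._buf)))
        end = self._pos + len(data)
        self._buf[self._pos:end] = data
        self._pos = end
        return len(data)

    def seek(self, offset: int) -> int:
        self._check_open()
        if offset < 0:
            raise ValueError("negative seek position")
        self._pos = offset
        return self._pos

    def close(self) -> None:
        self._open = False


def copy_greeting(f: AbstractFile) -> bytes:
    """Caller code sees only the interface, not the storage behind it."""
    f.write(b"hello world")
    f.seek(0)
    return f.read(5)


print(copy_greeting(MemoryFile()))  # b'hello'
```

The caller in copy_greeting never learns whether the bytes live in memory, on a disk, or across a network. That ignorance is the whole point of the abstraction.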

Provability

In a perfect world, abstractions would be very clearly defined and you could write an inductive proof of every function in each layer. In the book Structure and Interpretation of Computer Programs the authors masterfully construct layers of abstraction that can, indeed, be proven correct without ever running them on a computer; you can treat the functions like math and simply prove them on paper.

Our world is not perfect. Even provably correct functions, if they do anything useful, will interact with other functions. Can you guarantee that those functions are also correct? And the functions that they call? No way. Step all the way back and consider the whole system—you can never guarantee with 100% certainty that the system will work. Regardless of who deserves the blame when the system fails, it’s still your responsibility to make it work.

The Cost of Quality

Technology companies, however much they might want to, can’t just tell the programmers, “make a system that works.” The problems behind quality are myriad, some in the programmers’ control and some not. Either way, it’s really hard to build a reliable system. That difficulty translates to a lot of time, effort, and skill. And that translates to a lot of money.

When a company wants to build a quality product, the response needs to be: how much are you willing to pay for it? Frankly, most tech companies would rather get a product of dubious quality out the door, so they can start making money on the product rather than spending money on it, and try to fix the quality problems afterwards.

Problem is, it’s awful hard to fix quality after the fact.

Comments

So sad, but true :(
