
Software Over Time

NOTE: I’m really skipping around here. Let’s get away from business principles for a bit and focus on software.

In school, the lifetime of your programs is often days, weeks, maybe months at the very most. You’re given an assignment, some input data, and expectations about what the output should be. Write the program, turn it in, and now that program is dead.

That’s not how it works in industry. We’ve got a saying, “bad code lives forever.” In reality, it seems like most code lives forever, good or bad. The lifetime of a program is at least measured in years, sometimes decades. This one factor, time, has tremendous implications for how you write and maintain programs.

Convergence vs. Divergence

Let’s first look at time and how it interacts with the scope of a product. Most problems you face in school, computer science or otherwise, require looking at a problem and converging on a solution. Thus the field of possibilities starts broad and narrows over time.

In industry the opposite happens. The product tends to start with a small core—the nugget of a good idea—and over time the field of possibilities diverges in any number of directions.

Obviously the product can’t go in every possible direction—though some companies try—but the field of where it could go opens up almost infinitely.

Design Over Time

How do you design for divergent possibilities? In the worst case, you guess wrong and wind up having to rip up a bunch of code and do it over. I’ve been in situations where I designed a perfect object-oriented hierarchy to model a problem domain—one such hierarchy modeled physical objects in an industrial robot—only to have new types of physical objects added several years later that blew my “perfect” model out of the water.

Fact is, and we’ll see this pattern repeated over and over, textbook solutions for today may be completely wrong for tomorrow. Does that mean today’s solution is wrong? Not necessarily. Just be prepared for the world to change—it has a nasty habit of doing that—and you may need to redo some work.

However, there are many design approaches that seek to minimize your rework. For example, the Model/View/Controller (MVC) paradigm disentangles business logic from presentation logic. Programmers tend to vastly underestimate the amount of time needed to get the presentation part right, and also underestimate the number of business logic changes required over time, so MVC is a widely used paradigm for good reason.
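To make the MVC split concrete, here is a minimal sketch in Python. The names (Inventory, render_text, handle_add) are made up for illustration and don't come from any real framework; the point is only that the business rules and the presentation live in separate pieces that can change independently.

```python
class Inventory:
    """Model: business logic only, with no idea how it will be displayed."""

    def __init__(self):
        self._counts = {}

    def add(self, item, quantity):
        # Business rule (hypothetical): quantities must be positive.
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        self._counts[item] = self._counts.get(item, 0) + quantity

    def counts(self):
        # Return a copy so callers cannot modify the model's state directly.
        return dict(self._counts)


def render_text(counts):
    """View: presentation only. Swapping this for an HTML view leaves the model alone."""
    return "\n".join(f"{item}: {qty}" for item, qty in sorted(counts.items()))


def handle_add(model, item, quantity):
    """Controller: turns a user action into a model update, then renders a view."""
    model.add(item, quantity)
    return render_text(model.counts())
```

If the presentation changes next year, only render_text gets rewritten. If the business rule changes, only Inventory does.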

Code Size

As you build a product, its code size will grow vastly beyond anything you’ve worked on in school. Code bases are measured in KLOC — thousands of lines of code. Projects can easily range into the hundreds of KLOC. This scale requires modular and layered construction so programmers can understand what goes where, and who’s working on what. The lines are rarely black-and-white and require constant judgement calls by senior programmers.

As with MVC, there are approaches that seek to handle code size problems more gracefully than “the one huge build.” For example, Service-Oriented Architecture (SOA) breaks up separate functions into processes that communicate through web services. Then processes can be written by separate teams and evolve (somewhat) independently of each other.

Scaling

In school your data sets were probably small—large enough that you could measure the difference between bubble sort and quicksort, small enough that the CS department didn’t complain about disk space. It’s estimated that the total amount of digital information in the world doubles every 18 months. Now multiply that by the time spans we’ve been discussing. I don’t care how fast your processor is, by the time that CS101 bubble sort gets done with a petabyte of numbers, the sun will have burned out.
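To put rough numbers on that, here is a sketch of a plain bubble sort that counts its own comparisons. It is the textbook version with no early-exit optimization, and the function name is just for illustration.

```python
def bubble_sort_comparisons(values):
    """Sort a copy of values with a plain bubble sort.

    Returns (sorted_list, number_of_comparisons).
    """
    items = list(values)
    comparisons = 0
    n = len(items)
    # After each outer pass, the largest remaining item has bubbled to position `end`.
    for end in range(n - 1, 0, -1):
        for i in range(end):
            comparisons += 1
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
    return items, comparisons
```

This version always makes n(n-1)/2 comparisons: 45 for 10 items, 4,950 for 100. Ten times the data costs roughly a hundred times the work, and that gap keeps widening as the data grows.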

This is why the Big-O stuff your CS101 professor drilled into you is important. Data sets get vastly huge over time, and you need to know how the algorithms you’re writing today will handle the data sets of tomorrow. Also consider how algorithms combine. Let’s say you have one function that’s O(n), and you figure that’s good enough. But it turns out some other function is iterating over those same n elements and calling your function each time. Now you’ve multiplied one O(n) by another O(n) and you’ve got an O(n^2) problem on your hands.
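Here is a small Python sketch of how that happens in practice. The function names are hypothetical. The helper looks harmless on its own; the trouble only shows up when someone calls it in a loop.

```python
from collections import Counter


def is_duplicate(value, items):
    """O(n): list.count scans the whole list."""
    return items.count(value) > 1


def find_duplicates_quadratic(items):
    """O(n^2): calls the O(n) helper once for each of the n elements."""
    return {x for x in items if is_duplicate(x, items)}


def find_duplicates_linear(items):
    """O(n) on average: one pass to count, one pass to filter.

    Assumes the items are hashable.
    """
    counts = Counter(items)
    return {x for x, seen in counts.items() if seen > 1}
```

Both versions return the same answer. On a few hundred items you won't notice the difference; on a few million you will.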

Many times you will be using library functions to do common tasks like maintaining lists or queues. Be sure to understand how these scale. The Standard Template Library in C++, despite all its weirdness, is notably good about specifying the complexity for every operation on every data structure.
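The same homework applies in any language. As an illustration in Python rather than C++, the standard library documents that removing from the front of a list costs O(n), while collections.deque does it in O(1). The two function names below are made up for the example.

```python
from collections import deque


def drain_list_queue(items):
    """FIFO queue on a plain list.

    Each pop(0) shifts every remaining element, so draining n items is O(n^2).
    """
    queue = list(items)
    out = []
    while queue:
        out.append(queue.pop(0))
    return out


def drain_deque_queue(items):
    """FIFO queue on a deque.

    popleft() is O(1), so draining n items is O(n).
    """
    queue = deque(items)
    out = []
    while queue:
        out.append(queue.popleft())
    return out
```

The two functions behave identically on small inputs, which is exactly why the slow one survives code review.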

Correctness

When you’re programming, you may determine a program is correct by careful code review and observation with a debugger. However, that only tells you the program is correct right now. Over time there are many monkeys that can throw a wrench into your otherwise perfect program.

First of all, rarely does a program live in isolation. Say it gets data from a network connection; the format of that data could change. Say it depends on a module in another part of the system; the behavior of that module could change. Or say some coworker just monkeys with the program and screws it up.

How do you verify that a program is correct and stays correct? The best answer is automated testing. Rather than tell people how your program behaves, either via documentation or yelling over a cubicle wall, it’s far better to write tests that document in code how the program should behave. Then you can validate it at any time. We’ll discuss automated testing in more detail later.
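As a small taste, here is a sketch using Python's built-in unittest module. The parse_reading function and its "sensor_id,value" input format are hypothetical, invented for this example; the tests are what document how it is supposed to behave.

```python
import unittest


def parse_reading(line):
    """Hypothetical parser: turns 'sensor_id,value' into (sensor_id, float_value)."""
    sensor_id, raw_value = line.strip().split(",")
    return sensor_id, float(raw_value)


class ParseReadingTest(unittest.TestCase):
    def test_parses_id_and_value(self):
        self.assertEqual(parse_reading("temp1,21.5\n"), ("temp1", 21.5))

    def test_rejects_extra_fields(self):
        # If the upstream format grows a third field, we want to hear about it.
        with self.assertRaises(ValueError):
            parse_reading("temp1,21.5,celsius")

    def test_rejects_non_numeric_value(self):
        with self.assertRaises(ValueError):
            parse_reading("temp1,warm")
```

Saved in a file, these run with python -m unittest. When the data format changes or a coworker edits the parser, a failing test tells you the same day instead of a customer telling you a month later.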