
Programming Languages

There are few topics that can get programmers riled up more than programming languages. Language flame-wars have been raging on Usenet since about the dawn of Usenet. They’ll continue burning until, as best I can tell, the end of time. Why? The programming language is your most-used tool (second only to your text editor) and it takes a long time to master, so it’s reasonable that people get attached to their languages.

In college you likely focused on one “do it all” language like Java. There’s nothing wrong with Java; in fact, it’s a great language. The problem is the “do it all” part: no language does it all, at least not well. You simply cannot be productive and versatile without knowing a couple of programming languages.

The High Road, The Low Road

Languages are characterized as “high level” or “low level” depending (very roughly) on how many machine instructions each line of code translates to. Or, said another way, how much distance is there between your code and what the machine is actually doing? As we’ll see later in the book, each line of C doesn’t translate to a lot of machine code, so I consider it a low-level language. Ruby sits at the other end of the spectrum: it’s interpreted at runtime and runs on a virtual machine on top of your physical machine, so it’s tremendously abstract and therefore high-level.

Low-level languages have their benefits. You get much more deterministic control of the machine: the number of instructions it’s running, the amount of memory it’s using, the ability to fiddle bits in hardware, and so forth. For this reason, most operating systems have been written in assembly language or C. However, that level of control comes at a price: you must manage memory yourself and you’ll crash the machine if you fiddle the wrong hardware bit. I consider these “sharp” tools: amazingly good for surgical work, but you’re going to cut yourself unless you know exactly what you’re doing.

High-level languages have different benefits. In many of them, you don’t need to endlessly worry about memory leaks (unused objects are automatically reclaimed), you can do dynamic things like add methods to objects at runtime, and you have very powerful abstractions for shuffling data around. In general, your code more closely reflects your problem domain, and this allows you to reason much more clearly about problems. This level of abstraction, however, comes at its own price: the application tends to be much slower and more memory-intensive than the same code written in a low-level language.
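To make the “add methods to objects at runtime” point concrete, here is a minimal Ruby sketch. The `Greeter` class and its method names are hypothetical, made up purely for illustration.

```ruby
# Hypothetical class, used only for illustration.
class Greeter
  def initialize(name)
    @name = name
  end
end

greeter = Greeter.new("Ada")

# Add a method to this one object while the program is running.
greeter.define_singleton_method(:hello) { "Hello, #{@name}!" }

# Reopen the class at runtime to add a method to every instance.
Greeter.class_eval do
  def goodbye
    "Goodbye, #{@name}!"
  end
end

puts greeter.hello
puts greeter.goodbye

# Other instances get goodbye, but not the singleton method hello.
puts Greeter.new("Bob").respond_to?(:hello)
```

Doing the same thing in C would mean building your own dispatch tables by hand, which is roughly the distance between the two ends of the spectrum.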

Efficiency

Many programmers, especially those who started on small computers, strive for efficient code: code that maximizes the use of the CPU and other resources. Now that most computers aren’t small (in fact, most are staggeringly large compared to those of just ten years ago), we can look at a different kind of efficiency: programmer efficiency. Namely, how much value can the programmer deliver in a day? These days, computers are cheap but good programmers are not, so it makes sense to maximize the programmer’s efficiency and screw the computer. For this reason, high-level languages are becoming quite popular.

But again, consider the problem domain. If the problem is a web site or payroll system—i.e. something concerned with business logic or presentation—then a high-level language is a good match for the problem domain. If the problem is making a fast clustered storage system, then you need deterministic control of performance and resource use, so a low-level language is the tool for the job.

Design Intent of Languages

Not all programming languages were designed upfront for the uses they’ve been applied to. For example, Perl was designed to help system administrators automate tasks. It beat shell scripting, that’s for sure. In the ’90s, people started using it to build web-based applications because Perl was great at handling text. However, Perl was never designed for the complexity required by these applications, and most large Perl applications are a complete mess.

This mismatch of design intent is readily evident in Perl. For example, variables default to having global scope rather than local scope. This is great for writing small automation scripts. It’s terrible beyond belief when trying to build a large system. The result of using global scope by accident is often bizarre behavior in a completely different part of the system—something very difficult to debug.
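The hazard is easy to reproduce in any language with globals. Here is a rough sketch in Ruby rather than Perl, using an explicit global (`$total`) to stand in for Perl’s default-global variables. The function names are hypothetical.

```ruby
# $total is a deliberate global, standing in for a variable that is
# global by default. All names here are made up for illustration.
$total = 0

def add_invoice(amount)
  $total += amount
end

# Elsewhere in the program, someone reuses the same name as scratch space.
def count_log_lines(lines)
  $total = 0
  lines.each { $total += 1 }
  $total
end

add_invoice(100)
count_log_lines(["a", "b", "c"])  # silently clobbers the invoice total
add_invoice(50)
puts $total                        # prints 53, not 150

# With a local variable, the counter cannot leak out of the method.
def count_log_lines_locally(lines)
  total = 0
  lines.each { total += 1 }
  total
end
```

In a ten-line script the collision is obvious. In a large system, the two functions live in different files written by different people, and the wrong invoice total shows up far from its cause.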

Other languages were designed from the outset for building large systems. Java is a good example: it has all kinds of constructs for packaging, controlling the visibility of methods, various tools for abstraction and combination. It’s absolutely terrible for simple text-munging tasks (where Perl excels) but you can write very large systems in Java and not go completely nuts.

Idiom

idiom (noun): a characteristic mode of expression in music or art.

When you’re used to one language and you learn another, it’s tempting to apply the same style of coding from language A to language B. However, each language has its own idiomatic style: its own characteristic way of approaching problems. For example, some C code might look like:

for (i = 0; i < length; i++) {
  doStuff(array[i]);
}

You could approach Ruby in the same way:

# BAD BAD BAD
for i in 0...length do
  do_stuff(array[i]);
end

However, idiomatic Ruby looks very different:

array.each { |a| do_stuff a }

To learn idiomatic use of a language, you need to read a lot of code in that language. Specifically, good code in that language. Not necessarily wizard-level code, just good solid code. For each problem you face, try to find “The [Ruby, Python, etc.] Way” of approaching the problem.

Idiomatic use of the language has several benefits. First, it’s probably a lot less code to write, and therefore the code will be more reliable. Second, it’ll be easier for other programmers familiar with the language to read. Third, it’s probably more efficient from both the programmer’s and the computer’s perspective.
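The “less code” benefit shows up as soon as the loop does more than call one function. Here is a small sketch that filters and transforms a list both ways. The `prices` data is made up for illustration.

```ruby
# Hypothetical data, used only for illustration.
prices = [5, 12, 8, 20]

# C-style: manage the index and the result array by hand.
c_style = []
i = 0
while i < prices.length
  if prices[i] > 7
    c_style << prices[i] * 2
  end
  i += 1
end

# Idiomatic: say what you want and let the collection do the iterating.
idiomatic = prices.select { |price| price > 7 }.map { |price| price * 2 }

puts c_style.inspect
puts idiomatic.inspect
```

Both produce the same list, but the idiomatic version has no index to get wrong and no result array to forget to initialize.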

Today’s Languages

It’s worth mentioning some common languages as of this writing. This might give you ideas for what language to learn next.

  • C: the de-facto standard for operating systems, device drivers, and other code that needs (near-)absolute control over what the computer is doing. As I mentioned, this is a sharp tool and it requires considerable attention to detail. The well-rounded programmer needs to know C well.

  • C++: not just C with objects, C++ provides significant benefits for encapsulation (e.g. classes), error handling (exceptions), and more. If you understand what C++ is doing, it’s very useful for low-level stuff like device drivers and high-performance work. However, if you’re not very clear on what it’s doing, you’re going to be in a world of hurt. In my mind, C++ is the katana of programming languages: very sharp and very powerful, and you can chop your head off if you’re swinging it around without paying very careful attention to what you’re doing. C++ requires more study than any other language to understand and use well.

  • Java: originally Java was presented as the C++ katana without the sharpened blade. Many people (especially managers) took this to mean that mediocre programmers could use Java without killing themselves. Turns out mediocre programmers still write mediocre code, even in Java, and therefore Java has earned a reputation as being the language of mediocrity. I disagree. I’ve found Java and its libraries tremendously useful in some cases, and it’s a very well-designed language for what it was trying to accomplish.

  • The Java Virtual Machine: the JVM deserves its own entry. Originally intended to be the runtime environment for Java, the JVM is a semi-generic abstract machine that you can write other languages for. Scala, Clojure, JRuby, and others can all run on the JVM. Furthermore, they can interoperate with each other in very compelling ways—far easier and better than traditional C library interfaces.

  • C#: Microsoft’s Java that’s not Java. Yawn. Leave this one to the timecard-punchers.

  • Perl: don’t go here, really. I’ve written large Perl applications and a couple of open-source modules available on CPAN, so I would know. There are better languages.

  • PHP: like Perl, there are better ways to do everything that PHP does.

  • JavaScript: this is the language of the web browser, so if you’re doing web applications, you’ve got to learn it. However, I don’t think the language was very well designed, and it has several pitfalls. Note that JavaScript is not only for the browser; there are JS engines that can run from the command line (e.g. on a server), so it can be used for back-end work too. Regardless, if you’re not doing front-end work on web sites, skip it in favor of better languages.

  • Python: a very good high-level language. Very broadly useful, and used in a tremendous number of places in industry. Very solid choice for a high-level language to learn and use frequently.

  • Ruby: also a very good language, one of the very highest-level around. This is my personal favorite for its purity of design and its extremely dynamic nature. It’s also easy to read; well-written Ruby code is like reading natural language. On the downside, Ruby is the slowest and most memory-intensive language of the bunch.

  • Lua: a very small language designed to be embedded into other programs. We’ll discuss Lua further in just a moment.

  • Lisp, Scheme: very powerful languages with a tremendous history. Also probably the hardest to learn to program idiomatically. Code is barely readable to the newcomer. To the wizard they’re extremely powerful. I have never seen a Lisp application in production, though they do exist. At this time I would suggest Clojure instead.

  • [Newcomer] Scala: I consider it Java with a ton of the syntax cruft removed, and some very powerful new features added. It runs on the JVM and interoperates cleanly with existing Java classes. Definitely worth learning, possibly as an alternative to Java.

  • [Newcomer] Clojure: Lisp for the JVM, with some twists. Extends Lisp’s power for list processing to all kinds of sequences (files, XML, and more). Extremely interesting to anyone with an inkling of interest in Lisp. Not likely to gain wide acceptance in industry, however, because it still takes considerable learning just to read Clojure effectively.

What to Learn

I won’t give definite advice here; as I’ve been saying all along, it depends on what you want to do. The career programmer, however, does need to cover a wide range, and needs to master at least one low-level language and one high-level one. You can go to opposite ends of the spectrum (C and Ruby, for example) and do well. I personally favor learning things that are significantly different simply for the purpose of broadening your mindset.

Consider, too, how languages can be combined. You don’t need to write an entire product in one language. For example, many games have very performance-critical pieces that are written in C. However, there’s lots of game logic that would be extremely tedious to write in C, so they embed a Lua interpreter in the game for that part. Then the graphics guys get their speed and the game artists get to script stuff like “hitting this switch causes that platform to go up” in Lua. Best of both worlds.

Comments

I would consider adding Smalltalk to the list. I have been playing with it recently and believe it should be part of the CS curriculum. That's how OOP should be taught.
Also:
OCaml - a language for writing language parsers. Very fast but lacks libraries.
Haskell - a practical OCaml, but slower. Interesting languages, like Scheme and Lisp, unlikely to be practically useful in clean form. However, for writing Flash applications, Haxe may be a good approach.

Hey,
this is my book :D But can you please explain exactly why the C# statement? Some of my friends are making fancy things with it plus WPF.

@Brian

I don't appreciate the intent behind C#. Sun built Java to break the platform lock-in that Microsoft had with MFC and Win32. Sun made a platform-independent language and set of libraries. Microsoft responded with another way to lock developers back into their platform. (There's Mono, but it's not a first-class citizen; it will always track behind .NET because Microsoft will do their own thing and the community must play catch-up.)

I say "leave this one to the timecard-punchers" because I can't encourage programmers to learn a language and libraries that are only formally supported on one operating system. There's money to be made as a C# programmer, and if I wanted to write Windows-only applications I'd go ahead and use C#, but I don't think a junior engineer should choose Windows-only as their platform of choice.

@Alex

Instead of adding more languages I should actually remove a few and limit the list to "common" languages used in practice. For example, I'm passionate about Lisp and its variants, but they're not commonly used. Lisp may deserve a separate discussion (e.g. learning a language for the purpose of stretching your mind rather than padding your resumé) but at this point I don't think a junior engineer should focus on Lisp. I count OCaml and Haskell in the same boat; I experiment with them and read research papers about them, but I wouldn't suggest that a junior engineer focus on them before mastering at least C and one common scripting language.
