Introducing the Machine
As best I can tell, the Intel 8086/8088 processor and its successors were designed by sadists. In the future, Skynet will win and the machines will destroy humankind because the great-great-[…]-grandchildren of the 8086 became so unfathomable that nobody could comprehend their function and, therefore, their evil plot.
I will focus on the 32-bit i386 instruction set because:
i386 32-bit instructions will work on both 32-bit and 64-bit x86 processors.
Most of the existing documentation, on the web and in books, covers i386.
If you don’t have an i386-compatible machine, you can buy one cheap, cheap, cheap and run a free operating system on it.
I will use the GCC compiler and GDB debugger for my examples. Both are free for almost any operating system; on Windows, I recommend running them under Cygwin.
This section is not intended to teach you how to program in assembly. We’re going to look at assembly code and understand what it’s doing. To actually compose assembly code by hand is another matter entirely. There are numerous lengthy tomes on that topic. Remember, I’m using assembly as a tool to better understand what C code is doing.
Why use C as my “high level” language? In industry, C is a foundational language that you must know. Not necessarily for your first job, but any well-versed programmer will need to master C at some point. Operating systems, drivers, libraries, and so forth are largely written in C. C gives you very direct control over memory and runtime behavior, so you know what your program is doing. And that knowing is exactly what we’re going for.
Also note, the terms “assembly code” and “machine code” are nearly synonyms; I think of assembly code as a human-readable version of machine code. For example, in assembly you’ll see addl $5, %eax to add five to the register EAX. The machine code is the binary representation of that instruction. They’re two ways of saying the same thing.
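To make that concrete, here’s a minimal C sketch that holds the bytes an assembler typically emits for addl $5, %eax and prints them as hex. The names addl_5_eax and hex_dump are hypothetical, and the three-byte encoding shown is the common short form with an 8-bit immediate; an assembler could legally pick a longer encoding for the same instruction.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical name: machine code for the GAS instruction  addl $5, %eax
 * in its usual 3-byte encoding:
 *   0x83  opcode "ALU op on r/m32 with an 8-bit immediate";
 *         the reg field of the next byte (/0) selects ADD
 *   0xC0  ModRM byte: mod=11 (register operand), reg=000 (ADD), r/m=000 (EAX)
 *   0x05  the immediate 5, sign-extended to 32 bits by the CPU */
static const unsigned char addl_5_eax[] = { 0x83, 0xC0, 0x05 };

/* Hypothetical helper: format bytes as space-separated hex, e.g. "83 C0 05".
 * out must have room for at least 3 * len + 1 chars. */
static void hex_dump(const unsigned char *bytes, size_t len, char *out) {
    out[0] = '\0';
    for (size_t i = 0; i < len; i++) {
        /* Two hex digits per byte, with a space before every byte but the first. */
        sprintf(out + strlen(out), i == 0 ? "%02X" : " %02X", bytes[i]);
    }
}
```

Calling hex_dump(addl_5_eax, sizeof addl_5_eax, buf) leaves "83 C0 05" in buf: three bytes of machine code for one line of assembly.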
CPU and Memory
The CPU has an instruction pipeline that loads and runs your program’s instructions in sequence. (I’ll touch on multiple pipelines, caching, and reordering later; for the moment let’s consider a single pipeline.) Most instructions operate on registers within the CPU. Some registers are special and are used to control the CPU; others hold the data that instructions operate on. These operations can run extremely fast because the registers are within the CPU, whereas loading data from memory is considerably slower. (I’ll touch on data cache later, too.)
Here’s a diagram of how the system is wired at a high level.
Let’s put this into practice with some examples. First, we’ll load a value from memory, add 5 to it, and store the result back to memory:
    movl ..., %edx        # load address into register EDX
    movl (%edx), %eax     # load value from that address into EAX
    addl $5, %eax         # add five to EAX
    movl %eax, (%edx)     # store EAX back to the address
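For comparison, here’s a rough C equivalent. It’s a sketch, not what GCC literally produces: the locals edx and eax are hypothetical stand-ins for the registers, and the function names are made up for illustration. Depending on version and optimization flags, GCC may compile either function to the three-step sequence above or to a single add on memory.

```c
/* Hypothetical name: load, add, store, spelled out one step at a time.
 * The locals are named after the registers they stand in for. */
void add_five_stepwise(int *p) {
    int *edx = p;      /* movl ..., %edx    : address goes into EDX         */
    int eax = *edx;    /* movl (%edx), %eax : load the value at that address */
    eax += 5;          /* addl $5, %eax     : add five in the "register"     */
    *edx = eax;        /* movl %eax, (%edx) : store the result back          */
}

/* Hypothetical name: the same operation as you'd normally write it in C. */
void add_five(int *p) {
    *p += 5;
}
```

Both functions have the same effect: after int x = 10; add_five_stepwise(&x); the variable x holds 15. The difference is only in how explicitly the steps are written out.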
There are several things to note about this code:
We’ll get to the “…” part later.
I’m using the GAS (GNU Assembler) syntax, also known as AT&T syntax. The general format of each line is operation source, destination. This differs from the way you’ll often see assembly documented (Intel syntax), where the destination is listed first. I’m using GAS syntax because our intent is not to write assembly from scratch, but to investigate what the compiler is generating. GCC generates this syntax by default, and GDB uses it by default when you switch it into assembly mode.
EAX, EBX, ECX, and EDX are 32-bit “general purpose” registers for storing data. I put “general purpose” in quotes because they also have special oddities. For example, the CPU has a special, shorter form of the add instruction that works only with EAX (the “accumulator”), so EAX is often used to add things. This was heavily utilized back in the 8086 days; aside from that shorter opcode, I’m not sure whether EAX has any magic optimizations over the other registers on a modern chip.
%eax refers to the EAX register itself, whereas (%eax) refers to the memory at the address stored in EAX. The parentheses dereference the register, much like * dereferences a pointer in C.
In x86, there’s more than one way to do it. In this example, I could have replaced the last three lines with addl $5, (%edx) to add 5 directly to the value pointed to by EDX. However, there’s no such thing as a free lunch. Is that one line faster than the three lines I wrote? Possibly by a clock cycle or two, but the CPU must load the value from that address and store the new value back either way; if the data is not in cache, fetching it from memory will take many clock cycles.
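To illustrate EAX’s shorter opcode, here’s a small sketch. When an immediate doesn’t fit in 8 bits, x86 offers a one-byte opcode that works only with EAX (0x05, “add a 32-bit immediate to EAX”), while other registers need the general opcode plus a ModRM byte. The arrays below are the encodings I’d expect an assembler to emit for adding 1000; the names add_1000_eax, add_1000_ebx, and imm32_le are hypothetical.

```c
/* Hypothetical name: addl $1000, %eax
 * 0x05 = "ADD EAX, imm32" (EAX-only short form), followed by the
 * 32-bit immediate 1000 (0x000003E8) in little-endian byte order. */
static const unsigned char add_1000_eax[] = { 0x05, 0xE8, 0x03, 0x00, 0x00 };

/* Hypothetical name: addl $1000, %ebx
 * 0x81 = "ALU op on r/m32 with a 32-bit immediate",
 * 0xC3 = ModRM: mod=11 (register), reg=000 (ADD), r/m=011 (EBX),
 * then the same little-endian immediate. */
static const unsigned char add_1000_ebx[] = { 0x81, 0xC3, 0xE8, 0x03, 0x00, 0x00 };

/* Hypothetical helper: decode a little-endian 32-bit immediate starting at b. */
static unsigned long imm32_le(const unsigned char *b) {
    return (unsigned long)b[0]
         | (unsigned long)b[1] << 8
         | (unsigned long)b[2] << 16
         | (unsigned long)b[3] << 24;
}
```

Both instructions carry the same immediate (imm32_le(add_1000_eax + 1) and imm32_le(add_1000_ebx + 2) both return 1000), but the EAX version is one byte shorter. That saving mattered a great deal on the 8086; whether it buys anything beyond code size on a modern chip is the open question raised earlier.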
If you’re confused, you should be. As I said, this chip harbors an evil plot and the machines will eventually rise up to destroy us. However, we’ll work through several exercises with GCC and GDB to illustrate what’s going on. But first, we need to understand one more concept: the stack.