The Stack

Every thread in every process has a stack. This is where local state is kept, including:

  • Local variables declared inside functions.

  • Parameters to function calls.

  • The pointer to the calling function’s stack base.

Let’s look at an example. Say my C code defined some variables on the stack like so:

int i = 0x11223344;
char s[4];
s[0] = 'f';
s[1] = 'o';
s[2] = 'o';
s[3] = '\0';

The stack would look something like:

Address Value
0xbffff888 f
0xbffff889 o
0xbffff88a o
0xbffff88b \0
0xbffff88c 44
0xbffff88d 33
0xbffff88e 22
0xbffff88f 11

Each function has a stack frame for its local variables. In this case, the stack frame only needs to be 8 bytes in size: enough to hold i and s. (I notice GCC sometimes uses more than it strictly needs.) Two registers in the CPU are used to maintain the stack: ESP (stack pointer) and EBP (base pointer). The base pointer refers to the top of the stack frame (its highest address); therefore all accesses to variables in the stack frame are relative to EBP.

The assembly code generated by GCC for the above code looks like:

movl    %esp, %ebp            # we'll get to this in a moment
subl    $24, %esp             # ...and this...
movl    $287454020, -12(%ebp) # move 0x11223344 to -12 off of EBP
movb    $102, -16(%ebp)       # 102 = ASCII for 'f'
movb    $111, -15(%ebp)       # 111 = ASCII for 'o'
movb    $111, -14(%ebp)       # 111 = ASCII for 'o'
movb    $0, -13(%ebp)         # null terminator '\0'

For whatever reason, GCC uses 24 bytes for the stack frame. EBP is 0xbffff898, and GCC puts the stack variable i at EBP minus 12, therefore it occupies addresses 0xbffff88c..8f. (We’ll get to why the 0x44 byte is first later.) GCC puts s at 0xbffff888..8b.

Down the Rabbit Hole…

You should be asking, among other things, why the stack variables are below EBP. That’s because the stack grows downward. (Remember, the machine is evil.) GCC can place stack variables in any order within its stack frame, but with each function call the stack consistently moves down in address.

Now it’s worth investigating the first two lines of that assembly code:

movl    %esp, %ebp  # set base pointer to stack pointer
subl    $24, %esp   # move stack pointer down 24

The base pointer (EBP) always tells us the top of the current stack frame. The stack pointer (ESP) tells us where the next stack frame will begin. Therefore the size of the stack frame is EBP minus ESP. Before our function was called, ESP was set up for us. Likewise, if we call another function, ESP is ready to go and that function will copy ESP for its base pointer.

Function Calls

Now let’s investigate how the stack is used with function calls. Here’s some C code with some stack variables and a function call:

void foo()
{
    int j = 0x55667788;
}

int main()
{
    int i = 0x11223344;
    foo();
    return 0;
}

Here’s the matching assembly:

_foo:
    pushl   %ebp
    movl    %esp, %ebp
    subl    $24, %esp
    movl    $1432778632, -12(%ebp)  # j = 0x55667788;
    leave
    ret
_main:
    pushl   %ebp
    movl    %esp, %ebp
    subl    $24, %esp
    movl    $287454020, -12(%ebp)   # i = 0x11223344;
    call    _foo
    movl    $0, %eax
    leave
    ret

When main() starts, it pushes the base pointer onto the stack (I’ll get back to that) and then sets the base pointer to the stack pointer. The stack pointer was already set up before main() was called. Now EBP tells us the top of our stack frame. GCC decrements the stack pointer by 24, setting it up for a future function call. It does the assignment of i with a MOV instruction; the address is relative to the base pointer as we saw earlier. Then it uses the CALL instruction to call foo().

The behavior of foo() looks very similar, doesn’t it? The function can assume the stack pointer was already set up (main() did that earlier), so after saving main()’s base pointer with PUSH, foo() copies ESP into EBP to get its own base pointer. Then it subtracts 24 from ESP, giving foo() its own 24-byte stack frame.

Now I said we’d get back to why EBP was pushed onto the stack. That’s because the LEAVE instruction first copies the base pointer EBP back to ESP, thus making our stack frame’s size zero. Now what’s on the stack right at the address pointed to by EBP (and now ESP)? That’s the previous value of EBP, which we pushed when entering the function. So LEAVE pops that value back into EBP (now it’s the base pointer for main() again), and the RET instruction pops the return address that CALL pushed and returns control to the instruction just after the CALL.

LEAVE is really just a shorthand instruction for:

movl    %ebp, %esp
popl    %ebp

…which is simply the inverse of what we did when entering the function.

At this point, go have another coffee, you earned it. From here out things will get easier because I’ll show you how to use the GDB debugger to wade through this morass.