thomasloven.com/pages/2012-08-08-Thread-Stacks.md

layout: post
title: "Thread Stacks"
subtitle: "4 a.m. -  know where your stack pointer is?"
tags: [osdev]

Since the x86 architecture has relatively few processor registers, a
programmer may need additional space to store temporary values. For most
compilers and languages, this space is the stack. C, for example, (gcc
and clang at least) uses the stack to store local variables, function
arguments and return addresses. In other words, the stack comes in use
every time there is a function call.

The common way a function call is handled by a c compiler is this:

- Push each argument to the stack (in reverse order)
- Execute the `CALL` instruction (which pushes the address of the next

instruction to the stack and jumps to the callee)

The callee does the following:

- Push the base pointer to the stack
- Sets the base pointer to the current stack pointer
- Subtracts the stack pointer to reserve place for local variables.
- Do its thing
- Increase the stack pointer to free the space used by local variables.
- Pop the base pointer from stack.
- Execute the `RET` instruction (which puts the return value in EAX and

jumps to the position at the top of the stack.

While the callee is doing its thing it now has access to all the pushed
arguments at addresses (ebp + 8) and forwards and all local variables
at addresses up to ebp. The return address is reachable at (ebp + 4) if
you'd ever want that.

This convention makes it really easy to have functions which takes an
undefined number of arguments, like `printf` does.

###Stacks in context switching
It also makes for really simple _context switching_.
Since the return address is stored on the stack, if you were to switch
stacks inside a function, when you return, you'll be somewhere else.
This is a common way of making usermode threads. Ponder the following:

    :::c
    void switch_thread()
    {
        push_all_registers();
        switch_stack_pointer();
        pop_all_registers();
        return;
    }

    void a()
    {
        while(1)
        {
            do_something();
            switch_thread();
        }
    }

    void b()
    {
        while(1)
        {
            do_something_else();
            switch_thread();
        }
    }

Imagine two threads - __A__ and __B__ running, __A__ runs `a()` and __B__
runs `b()`. Each has a stack somewhere in memory, and __A__ is currently
running. The top of the stacks looks like:

                               +-----------------------+
                               |switch_stack_pointer RA|
                               |all registers          |
    +----------ESP----------+  |switch_thread RA       |
    |a RA                   |  |b RA                   |
    |          ...          |  |          ...          |

where `RA` means Return Address and `ESP` is where the stack pointer is
currently pointing.
As execution of __A__ continues, the processor will `do_something()` and
then call `switch_thread()`...

                               +-----------------------+
                               |switch_stack_pointer RA|
    +----------ESP----------+  |all registers          |
    |switch_thread RA       |  |switch_thread RA       |
    |a RA                   |  |b RA                   |
    |          ...          |  |          ...          |

`switch_thread()` pushes all registers to the stack and calls
`switch_stack_pointer()`

    +----------ESP----------+  +-----------------------+
    |switch_stack_pointer RA|  |switch_stack_pointer RA|
    |all registers          |  |all registers          |
    |switch_thread RA       |  |switch_thread RA       |
    |a RA                   |  |b RA                   |
    |          ...          |  |          ...          |

`switch_stack_pointer()` performs some scheduling to find out which
thread is to run next, and then switches the stack pointer over to the
top of __B__'s stack.

    +-----------------------+  +----------ESP----------+
    |switch_stack_pointer RA|  |switch_stack_pointer RA|
    |all registers          |  |all registers          |
    |switch_thread RA       |  |switch_thread RA       |
    |a RA                   |  |b RA                   |
    |          ...          |  |          ...          |

The processor keeps on executing code, and `switch_stack_pointer()` soon
returns

    +-----------------------+
    |switch_stack_pointer RA|  +----------ESP----------+
    |all registers          |  |all registers          |
    |switch_thread RA       |  |switch_thread RA       |
    |a RA                   |  |b RA                   |
    |          ...          |  |          ...          |

`switch_thread()` pops all registers and returns...

    +-----------------------+
    |switch_stack_pointer RA|
    |all registers          |
    |switch_thread RA       |  +----------ESP----------+
    |a RA                   |  |b RA                   |
    |          ...          |  |          ...          |

... and we're now in `b()` with all registers of __B__ loaded.

###Stacks in the kernel
When an interrupt or exception happens in user mode, [a new stack is
loaded from the tss](/blog/2012/08/Return-To-Kernel-Mode/) and
(usually) all registers are pushed onto it before the kernel starts the
__Interrupt Service Routine__.

Wait... _all registers are pushed onto it_? I like the sound of that.
That's, like, half the work of changing threads, right? Right!

If you've been following a kernel development tutorial (like [James
Molloys](http://www.jamesmolloy.co.uk/tutorial_html/) or [Brandon
Friesens](http://www.osdever.net/bkerndev/Docs/title.htm)) you probably
have something like this to handle interrupts:

    :::nasm
    int_stub:
        pusha

        xor eax, eax
        mov ax, ds
        push eax

        mov eax, 0x10
        mov ds, ax
        mov es, ax
        mov fs, ax
        mov gs, ax

        call int_handler

        pop eax
        mov ds, ax
        mov es, ax
        mov fs, ax
        mov gs, ax

        popa

        add esp, 8

        iret

&nbsp;

    :::c
    void int_handler(registers_t r)
    {
        do_stuff();
    }

In fact, if you've been following one of those tutorials, you probably
have the above code twice, for some reason...

Anyway. This would take care of both pushing and poping all registers,
and with only a small modification, it becomes very easy to switch the
stacks too...

    :::nasm
    int_stub:
        pusha

        xor eax, eax
        mov ax, ds
        push eax

        mov eax 0x10
        mov ds, ax
        mov es, ax
        mov fs, ax
        mov gs, ax

        push esp ;Pass stack pointer to int_handler
        call int_handler
        mov esp, eax ;int_handler returns a new stack pointer

        pop eax
        mov ds, ax
        mov es, ax
        mov fs, ax
        mov gs, ax

        popa

        add esp, 8

        iret

&nbsp;

    :::c
    registers_t *int_handler(registers_t *r)
    {
        do_stuff();
        r = get_next_thread(r);
        return r;
    }

This gives a pointer to the threads registers as input to the ISR and
expect a pointer to some registers in return. They may or may not be the
same.

###Keeping track of the stacks
The saved registers are a large part of what defines each thread, but
there are actually a few things more that are needed.

First of all, the kernel may want some extra information associated
with each thread, such as scheduling information and a list of signal
handlers.

Sometimes a thread in user mode will need help from the kernel which
it cannot offer immediately. The thread may for example issue a read
request to a file that's on a drive which has some spin-up time before
it can be read. The kernel may then switch to another thread while the
disk spins up. Therefore it's a good idea to have a separate kernel
stack space for each thread.

With some thought, those three things can be easily combined into a
single data structure. So let's think about it for a while.

While the thread is running we want some information stored somewhere in
kernel space about it.

    +-----------------------+
    |thread information     |
    +-----------------------+

Then, when an interrupt or syscall happens, a new stack is loaded
and some stuff is pushed onto it. If we want this near our thread
information it will have to go right before it, since the stack grows
backwards.

    +-----------------------+
    |thread registers       |
    |thread information     |
    +-----------------------+

Finally, we want the kernel mode stack. Well... the stack pointer is
right at the start of the registers now, so why not just continue the
stack from there?

    +-----------------------+
    |   ...                 |
    |kernel mode stack      |
    |thread registers       |
    |thread information     |
    +-----------------------+

###Setting this up
To set this up, the thread information structure has to be set up
something like:

    :::c
    struct thread_info_struct
    {
        uint8_t stack_space[KERNEL_STACK_SIZE];
        registers_t r;
        struct thread_data_struct thread_data;
    } my_thread_info;

When the thread is running in user mode, the TSS should be set up in
such a way that the stack pointer loaded at an interrupt points to the
end of the registers, i.e. the beginning of the thread data.

    :::c
    TSS.esp0 = &my_thread_info.thread_data;

And that's really all there is to it. Unbelievable, really, how many
years it took for me to figure this out.

In the process, I've found inspiration in [Rhombus by Nick
Johnson](https://github.com/nickbjohnson4224/rhombus/) and
[linux](http://www.linux.org).

###Some considerations
In order to do the actual switching of threads, I implemented a special
syscall which can be called only from kernel mode.

Let's say a user mode program calls `yield()`. This performs a syscall
in the form of an interrupt instruction `INT 0x80` and thus we jump into
the kernel.

The kernel performs some housekeeping and selects a new thread to run.
It then performs the special switching interrupt `INT 0x82`.

Since we're already in kernel mode, no new stack is loaded but the
registers are pushed onto the old one. The top of the kernel stack will
then contain a `registers_t` structure and a pointer to it is saved in
a `kernel_stack` variable in the `thread_data` portion of the thread
information structure.

Next, the thread information structure of the new thread is read and
the `kernel_stack` pointer from it is returned to the `int_stub` as
above. The `IRET` instruction brings us back to wherever we were before
(probably in kernel mode, but could as well be user mode). If the new
thread was swapped out while in kernel mode, it will carry on from
wherever it was and eventually return to user mode.

This way of handling kernel stacks also makes for really clean nesting
of interrupts.

###Usage
This method has been implemented in git commit
[756852fc66](https://github.com/thomasloven/os5/tree/756852fc66b80b1e605
8d74b8dc334ad841ec5ea)

###A warning
I recently learned - the hard way - that the [clang
compiler](http://clang.llvm.org) does not use this calling convention
for functions which do not in turn call other functions. I.e

    :::c
    int double_integer(int a)
    {
        return 2*a;
    }

    int main(int argc, char **argv)
    {
        double_integer(5);
    }

If this code is compiled with clang `double_integer` will (in some
cases) not push `ebp` to stack.

This severely hinders many debuggers and should be considered a bug in
my oppinion.