mittos64/doc/10_Threading.md

# Threading

In this chapter we'll implement context switching and a simple scheduler.

## Context switching

Switching from one thread to another is actually really easy. All you need to
do is replace everything in every processor register at the same time, and
you're good to go. Ok... that doesn't sound so simple, but we get some help.

In the conventions used by our gcc cross compiler (known as [System V
ABI](https://wiki.osdev.org/System_V_ABI)), when calling a function only a few
registers are guaranteed to be preserved. Those are `rbx`, `rsp`, `rbp`, `r12`,
`r13`, `r14` and `r15`. The rest can not be assumed to retain the same value.

So, if we make our context switch out to look like a function call, we only
need to replace those seven registers. This can easily be done with a small asm
routine:

`src/kernel/proc/swtch.S`
```asm
.intel_syntax noprefix

.global switch_stack
switch_stack:
  push rbp
  mov rbp, rsp

  push r15
  push r14
  push r13
  push r12
  push rbx
  push rbp

  mov [rdi], rsp
  mov rsp, [rsi]

  pop rbp
  pop rbx
  pop r12
  pop r13
  pop r14
  pop r15

  leaveq
  ret
```

This pushes all registers, writes the stack pointer to the address passed as
the first argument, reads a new stack pointer from the address in the second
argument, and pops all registers. Since the return address is on the stack, the
`ret` instruction will return to the new thread.

> #### A note on credit
>
> Not everything I present here is my own original ideas. In fact, most of it
> probably isn't. I've been itterating my kernel design from the ground up a
> dozen times or more through the last ten years, and where I picked up methods
> and ideas have gotten lost along the way.
>
> This method of switching threads, though, I know I got from XV6, where it may
> or may not have originated.
>
> I'm sorry I can't always give proper and detailed credit to the giants on
> whose shoulders I stand, but for a list of my most significant sources of
> inspiration through the years, see Chapter 0.

## Threads

Ok, so now switching between two threads of execution only requires:

```c
switch_stack(&old_stack_ptr, &new_stack_ptr);
```

So the next step would be a structured and reliable way of keeping track of new
and old stack pointers, and to allocate the stacks themselves. We might also
want some extra information relating to each thread. A struct would be ideal
for this.

`src/kernel/include/thread.h`
```c
struct thread
{
  uint64_t tid;
  void *stack_ptr;
  uint64_t state;
};
```

This will grow with a lot of more information later.

But where should we put this struct? you may remember from an earlier chapter
that we don't have a `malloc` implementation to assign storage. We could use
`pmm_alloc` to get some memory, but that would give us an entire page, and this
struct is only 24 bytes. So what should we do with the rest of the space?

How about using it for the thread stack?
Allocating a new thread would then look something like this:

`src/kernel/proc/thread.c`
```c
uint64_t next_tid = 1;
struct thread *new_thread()
{
  struct thread *th = P2V(pmm_calloc());
  th->tid = next_tid++;
  th->stack_ptr = incptr(th, PAGE_SIZE);

  return th;
}
```

Of course, this thread can't be run. If we try to switch to it, the
`switch_stack` function won't get a propper stack to start from, so we need to
mock that up first:

`src/kernel/proc/thread.c`
```c
struct swtch_stack
{
  uint64_t RBP;
  uint64_t RBX;
  uint64_t R12;
  uint64_t R13;
  uint64_t R14;
  uint64_t R15;
  uint64_t RBP2;
  uint64_t ret;
};

uint64_t next_tid = 1;
struct thread *new_thread(void (*function)(void))
{
  struct thread *th = P2V(pmm_calloc());
  th->tid = next_tid++;
  th->stack_ptr = incptr(th, PAGE_SIZE - sizeof(struct swtch_stack));

  struct swtch_stack *stk = th->stack_ptr;
  stk->RBP = (uint64_t)&stk->RBP2;
  stk->ret = (uint64_t)function;

  return th;
}
```

That's all you need to do to set up a new thread with it's own stack, which
will start running the function you pass to `new_thread`.

If you wish, you can try this out now. Here's a quick (untested) mockup on how
it might be done:

`src/kernel/boot/kmain.c`
```c

struct thread *current, *next;

void thread_function()
{
  int thread_id = current->tid;

  while(1)
  {
    debug("Thread %d\n", thread_id);
    struct thread *_next = next;
    struct thread *_current = current;

    // Update "scheduler"
    next = current;
    current = _next;

    // Switch thread
    switch_stack(&_current->stack_ptr, &_next->stack_ptr)
  }
}

void kmain(...)
{
  ...

  current = new_thread(thread_function);
  next = new_thread(thread_function);

  uint64_t dummy_stack_ptr;
  switch_stack(&dummy_stack_ptr, &current->stack_ptr);

  ...
}

```

This implements a simple "scheduler" which keeps track of two threads and
switches between them.

Switching in the first thread requires a dummy variable to store the old stack
pointer. This value is thrown away, because we will never switch back into
`kmain`.

If you run this, the screen should fill with alternating lines of "Thread 1"
and "Thread 2". Note that the variable `thread_id` is function local, and thus
stored on the stack. If you're not convinced, you can make the threads run
different functions instead.

I don't get why tutorial writers make this look so hard...

Now it's time to make it scaleable.

## Queueing

Ok, so now we can create and switch between threads, but we still need some way
of keeping track of them.

Threads can be in one of several states. They are either *running*, *waiting to
run*, or *waiting for something else*. This might seem obvious, kind of like
how everything is either *a banana* or *not a banana*, but those three states
are kind of important and decide how we keep track of the thread.

For the running threads it's easy. Only one thread per cpu core can be running
at a time, so it makes sense to keep a global variable which points to the
running thread (one per cpu core).

Threads that are waiting to run will be kept in the scheduler queue. When a
running thread has finished running for one reason or another, the sheduler
will pick the next one to run from the run queue based on various conditions
and, if necessary, put the previously running thread back in the queue.

Where the threads waiting for something else are kept depends on what they are
waiting for. For example, a thread waiting for a disk read to finish will
probably be kept track of by the disk driver or simmilar. Once the read
completes, the driver will hand the thread over to the scheduler to be put in
the run queue.

Either way, almost all waiting threads will be kept in a queue somewhere.

I won't go into the pros and cons of different queueing methods here, but just
use a simple linked list, where the queue header keeps track of the first and
last item, and each item in the queue keeps track of the next one. Something
like this:

```c
struct {
  struct thread *first;
  struct thread *last;
} run_queue;


struct thread
{
  ...
  struct thread *run_queue_next;
  ..
};

void init_queue()
{
  run_queue.first = 0;
  run_queue.last = 0;
}

void queue_add(struct thread *th)
{
  if(!run_queue.last)
    run_queue.first = th;
  else
    run_queue.last->run_queue_next = th;
  run_queue.last = th;
  th->run_queue_next = 0;
}

thread *queue_pop()
{
  thread *ret = run_queue.first;
  if(run_queue.first && !(run_queue.first = run_queue.first->run_queue_next))
    run_queue.last = 0;
  return ret;
}
```

That's a full FIFO queue setup. Simple, but not very fun.

Let's generalize!

There are three things which are unique for each queue.

- The name of the queue header struct
- The name of the pointer to the next item in the item struct
- The type of item in the queue

With some macro magic, we can condense this into a single symbol:

```c
#define RunQ run_queue, run_queue_next, struct thread
```

The plan is that `RunQ` should be used every time we want to do or define
something with the run queue. Such as `queue_add(RunQ, my_thread)` or
`queue_pop(RunQ)`, or:

```c
struct thread
{
  ...
  QUEUE_SPOT(RunQ);
  ...
};
```

You probably see by now that `queue_add` would need to be a macro. You can't
pass a type to a function, and the example above would expand to
`queue_add(run_queue, run_queue_next, struct thread, my_thread)`.

But if `queue_add` is a macro, `RunQ` won't be expanded...

This can be solved with an extra layer of indirection. We define a variadic
wrapper macro which expands all arguments and pass them to another macro. This
will make the preprocessor realize it needs to make another run of the code.

```c
#define _QUEUE_ADD(queue, entry, type, item) \
  if(!queue.last) \
    queue.first = (item) \
  else
    queue.last->entry = (item); \
  queue.last = (item); \
  (item)->entry = 0;
#define queue_add(...) _QUEUE_ADD(__VA_ARGS__)
```

Then we do the same thing with any other queue operations we might need, as
well as declaring and defining the queue head and next item pointer.

```c
#define _QUEUE_DECL(queue, entry, type) \
  struct queue{ \
    type *first; \
    type *last; \
  } queue;
#define QUEUE_DECLARE(...) _QUEUE_DECL(__VA_ARGS__)

#define _QUEUE_HEAD(queue, entry, type) \
  struct queue queue = {0, 0};
#define QUEUE_DEFINE(...) _QUEUE_HEAD(__VA_ARGS__)

#define _QUEUE_SPOT(queue, entry, type) \
  type *entry
#define QUEUE_SPOT(...) _QUEUE_SPOT(__VA_ARGS__)

#define _QUEUE_POP(queue, entry, type) \
  __extension__({ \
    type *_ret = _(queue.first); \
    if(queue.first && !(queue.first = queue.first->entry)) \
      queue.last = 0; \
    _ret; \
  })
#define queue_pop(...) _QUEUE_POP(__VA_ARGS__)
```

> The `__extension__` thing is a workaround to make gcc accept a macro with a
> return value without warnings.  For some reason I get warnings that this is
> valid ANSI c even when compiling with `-std=gnu11`...