From 0b3648de7cf10823e26eef9bb5348ad0cf81f3be Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Thomas=20Lov=C3=A9n?= <thomasloven@gmail.com>
Date: Thu, 21 Dec 2017 16:21:58 +0100
Subject: [PATCH] Chapter 8: Exceptions - COMPLETE

---
 doc/8_Exceptions.md | 294 ++++++++++++++++++++++++++++++++++++++++++++
 doc/README.md       |   1 +
 2 files changed, 295 insertions(+)
 create mode 100644 doc/8_Exceptions.md

diff --git a/doc/8_Exceptions.md b/doc/8_Exceptions.md
new file mode 100644
index 0000000..f10b8b3
--- /dev/null
+++ b/doc/8_Exceptions.md
@@ -0,0 +1,294 @@
+# Exceptions and Interrupts
+
+Sometimes, things go wrong. When they do, we want to fail gracefully - or even
+recover. That's the point of exceptions.
+
+## Interrupt Service Routines
+
+The x86 interrupt handling method is, for historical reasons I assume, messy.
+The x86\_64 architecture saw a slight improvement in that the stack pointer and
+segment are always pushed, even if the CPU was running in ring 0 when the
+interrupt happened. Still, though, some exceptions push an error code, and
+others do not. And no data is provided to determine which interrupt occurred,
+besides which interrupt service routine was called.
+
+If all interrupts pushed a dummy error code and an identifying number, a single
+ISR would be enough, and the rest could be done in software.
+
+Anyway. Let's play with the cards we're dealt.
+
+The most common way of solving this discrepancy is by having a number of short
+ISRs in the form
+
+```asm
+isr1:
+  push 0 //; Dummy error code
+  push 1 //; Interrupt number
+  jmp isr_common //; The rest is the same for all interrupts
+```
+
+You may want up to 256 ISRs, so let's do some finger warmup exercises!
+
+Or better yet, let's generate the ISRs automatically. With Python!
+
+`src/kernel/cpu/isr.S.py`
+```python
+#!/usr/bin/env python2
+# -*- coding: utf-8 -*-
+
+from __future__ import print_function
+
+num_isr = 256
+pushes_error = [8, 10, 11, 12, 13, 14, 17]
+
+print('''
+.intel_syntax noprefix
+.extern isr_common
+''')
+
+
+print('// Interrupt Service Routines')
+for i in range(num_isr):
+    print('''isr{0}:
+    cli
+    {1}
+    push {0}
+    jmp isr_common '''.format(i,
+        'push 0' if i not in pushes_error else 'nop'))
+
+print('')
+print('''
+// Vector table
+
+.section .data
+.global isr_table
+isr_table:''')
+
+for i in range(num_isr):
+    print('  .quad isr{}'.format(i))
+```
+
+This outputs an assembly file with 256 ISRs like the one above, except numbers
+8, 10, 11, 12, 13, 14 and 17, which have a `nop` instruction instead of pushing
+a dummy error code, since the CPU already pushes a real one for those.
+
+It's written for Python 2 because that's what's included in the Alpine version
+the build docker image is based on - despite it being 2018. The encoding is
+utf-8, and I import the print function from \_\_future\_\_, because it's 2018.
+
+It also makes a table with pointers to each ISR, which makes it easy to set up
+the Interrupt Descriptor Table later:
+
+`src/kernel/cpu/interrupts.c`
+```c
+...
+struct idt
+{
+  uint16_t base_l;
+  uint16_t cs;
+  uint8_t ist;
+  uint8_t flags;
+  uint16_t base_m;
+  uint32_t base_h;
+  uint32_t _;
+}__attribute__((packed)) idt[NUM_INTERRUPTS];
+
+extern uintptr_t isr_table[];
+
+void interrupt_init()
+{
+  memset(idt, 0, sizeof(idt));
+  for(int i=0; i < NUM_INTERRUPTS; i++)
+  {
+    idt[i].base_l = isr_table[i] & 0xFFFF;
+    idt[i].base_m = (isr_table[i] >> 16) & 0xFFFF;
+    idt[i].base_h = (isr_table[i] >> 32) & 0xFFFFFFFF;
+    idt[i].cs = 0x8;
+    idt[i].ist = 0;
+    idt[i].flags = IDT_PRESENT | IDT_DPL0 | IDT_INTERRUPT;
+  }
+...
+```
+
+`isr_common` pushes all registers to the stack (one by one, there's no `pusha`
+instruction in x86\_64) and passes control to a C interrupt handler. Note that
+on x86\_64, function arguments are not primarily passed on the stack, but in
+registers. So the last thing it does before calling the C function is move the
+stack pointer into `rdi`. In case the handler returns, `isr_common` restores
+the stack pointer from `rax` - which holds the function return value - pops
+all values again, and performs an `iretq` instruction, which is pretty much a
+backwards interrupt.
+
+`src/kernel/cpu/isr_common.S`
+```asm
+...
+isr_common:
+  push r15
+  push r14
+...
+  push rbx
+  push rax
+  mov rdi, rsp
+  call int_handler
+
+  mov rdi, rax
+isr_return:
+  mov rsp, rdi
+  pop rax
+  pop rbx
+...
+  pop r14
+  pop r15
+  add rsp, 0x10
+  iretq
+```
+
+But what's the deal with passing `rax` to `rsp` via `rdi`? Doing it this way
+will allow us to call `isr_return` as a function, with a faked interrupt stack.
+We'll use this later to get into user mode.
+
+## Building isr.S.py
+
+But back to the ISRs. In order to build this, we need some changes in the
+kernel makefile.
+First of all, the lines
+
+```make
+SRC := $(wildcard **/*.[cS])
+OBJ := $(patsubst %, %.o, $(basename $(SRC)))
+```
+
+need to be updated to allow more file extensions:
+
+```make
+SRC := $(wildcard **/*.[cS]*)
+OBJ := $(patsubst %, %.o, $(basename $(basename $(SRC))))
+```
+
+We also need a special rule to generate .o files from .S.py:
+
+`src/kernel/Makefile`
+```make
+%.o: %.S.py
+	python $^ | $(COMPILE.S) $(DEPFLAGS) -x assembler-with-cpp - -o $@
+```
+
+In theory, a rule of the following form should be enough:
+
+```make
+%.S: %.S.py
+	python $^ > $@
+```
+
+However, this makes make build through the chain .o <- .s <- .S <- .py rather
+than .o <- .S <- .py. The longer chain assembles with plain `as`, and it also
+leaves intermediate files that are removed after the first build, but not if
+you run make again.
+
+Some of this can be solved with an `.INTERMEDIATE:` rule, but that's not very
+elegant. The big problem's probably with me rather than make.
+
+
+## The Interrupt Handler
+
+The C interrupt handler routine is a simple thing. Its default modus operandi
+is to print an error message and hang.
+
+However, before doing this, it checks a table of other interrupt handlers, and
+if one exists for the current interrupt, it passes execution over to that.
+
+`src/kernel/cpu/interrupts.c`
+```c
+registers *int_handler(registers *r)
+{
+  if(int_handlers[r->int_no])
+    return int_handlers[r->int_no](r);
+
+  debug("Unhandled interrupt occurred\n");
+  debug("Interrupt number: %d Error code: %d\n", r->int_no, r->err_code);
+  debug_print_registers(r);
+
+  PANIC("Unhandled interrupt occurred");
+  for(;;);
+}
+```
+
+## Final Note
+
+For tidiness' sake, I wrapped the call to `interrupt_init` inside a function
+called `cpu_init`, which in turn is called from `kmain`. For now, that's all it
+is, but it will soon grow more important.
+
+## Bonus: Debugging Interrupts
+
+There's a small problem with the way interrupts are handled by the processor;
+they don't follow the calling convention.
+
+This means that when an interrupt occurs, and the debugger breaks in the
+`PANIC` macro, it has lost all context, and we can't see what happened.
+
+But wait. The entire context is saved. It was pushed to the stack and passed to
+the interrupt handler. And by using gdb's ability to set the value of registers
+in qemu, we can bring it back into scope.
+
+I put the following function in `toolchain/gdbinit`
+
+```gdb
+define restore_env
+set $name = $arg0
+python
+
+registers = {r: gdb.parse_and_eval('$name->' + r) for r in
+['rax', 'rbx', 'rcx', 'rdx', 'rsi', 'rdi', 'rbp', 'rsp', 'r8', 'r9', 'r10',
+'r11', 'r12', 'r13', 'r14', 'r15', 'rip']}
+
+for r in registers.items():
+  gdb.parse_and_eval('$%s=%s' % r)
+gdb.execute('frame 0')
+end
+end
+```
+
+And it's used like this:
+
+```
+(gdb) c
+Continuing.
+
+Thread 1 hit Breakpoint 2, int_handler (r=0xffffff8000019f10) at cpu/interrupts.c:74
+74        PANIC("Unhandled interrupt occurred");
+(gdb) restore_env r
+#0  0xffffff8000010caa in divide_two_numbers (divisor=0, dividend=0) at boot/kmain.c:18
+18        return dividend/divisor;
+(gdb) bt
+#0  0xffffff8000010caa in divide_two_numbers (divisor=0, dividend=0) at boot/kmain.c:18
+#1  0xffffff8000010dbd in kmain (multiboot_magic=920085129, multiboot_data=0x105fa0) at boot/kmain.c:33
+#2  0xffffff8000010efd in .reload_cs () at boot/boot.S:96
+#3  0x0000000000000007 in ?? ()
+#4  0x0000000000000730 in ?? ()
+#5  0x0000000000000000 in ?? ()
+(gdb) list
+13        for(;;);
+14      }
+15
+16      int divide_two_numbers(int divisor, int dividend)
+17      {
+18        return dividend/divisor;
+19      }
+20
+21      void kmain(uint64_t multiboot_magic, void *multiboot_data)
+22      {
+(gdb) p divisor
+$1 = 0
+(gdb) p dividend
+$2 = 5
+(gdb) frame 1
+#1  0xffffff8000010dbd in kmain (multiboot_magic=920085129, multiboot_data=0x105fa0) at boot/kmain.c:33
+33        divide_two_numbers(0,5); // Calculate 0/5 and discard the results
+(gdb)
+```
+
+By restoring the processor to the state stored in `r`, we can debug from where
+the interrupt occurred as normal. By backtracing and inspecting variables we
+find that whoever wrote line 33 in `kmain.c` got the divisor and dividend mixed
+up, which resulted in a divide-by-zero exception.
diff --git a/doc/README.md b/doc/README.md
index 8c85053..a41c206 100644
--- a/doc/README.md
+++ b/doc/README.md
@@ -10,4 +10,5 @@
 [Chapter 5: Unit Testing Framework](5_Unit_Testing.md)<br>
 [Chapter 6: Debug output](6_Debug_Output.md)<br>
 [Chapter 7: Multiboot Data](7_Multiboot_Data.md)<br>
+[Chapter 8: Exceptions and Interrupts](8_Exceptions.md)<br>