Chapter 4: Higher Half Kernel - COMPLETE

2017-11-17 23:20:50 +01:00
parent 823560d3ae
commit a908284dc6
2 changed files with 335 additions and 0 deletions
--- a/doc/4_Higher_Half_Kernel.md
+++ b/doc/4_Higher_Half_Kernel.md
@@ -0,0 +1,334 @@
+# Chapter 4 - "Higher Half" Kernel
+
+In this chapter we'll make our kernel run in the top of memory - well out of
+the way of user programs and memory mapped devices.
+
+## What is a higher half kernel
+
+Some arguments for a higher half kernel can be found at [the osdev
+wiki](http://wiki.osdev.org/Higher_Half_Kernel).  There are arguments against
+as well, such as it being pointless with modern memory management routines.  My
+main argument for using it is that it makes things simpler.
+
+I chose to put the split at `0xFFFFFF8000000000`, which corresponds to the last
+entry in P4.
+
+> A note about the address. If you go through the calculations, you'll find
+> that the addresses mapped from the last entry in P4 actually start at
+> `0xFF8000000000` or `0x0000FF8000000000`. However, since the hardware only
+> implements 48 address bits, the x86-64 architecture requires the most
+> significant 17 bits of an address (bits 47 through 63) to be equal; thus
+> `0xFFFFFF8000000000`. This address format is called 'canonical'.
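+
+The calculation can be sketched in c as below. This is only an illustration
+and not part of the kernel source; the function names `p4_entry_base` and
+`canonical` are made up for the example, and it assumes 4-level paging with
+48-bit virtual addresses.
+
+```c
+#include <stdint.h>
+
+/* Start of the region covered by P4 entry `index` (0..511),
+ * before sign extension. Each P4 entry covers 2^39 bytes. */
+static uint64_t p4_entry_base(unsigned index)
+{
+  return (uint64_t)index << 39;
+}
+
+/* Copy bit 47 into bits 48..63 to get the canonical form
+ * of a 48-bit virtual address. */
+static uint64_t canonical(uint64_t addr)
+{
+  addr &= 0x0000FFFFFFFFFFFFULL;
+  if (addr & (1ULL << 47))
+    addr |= 0xFFFF000000000000ULL;
+  return addr;
+}
+```
+
+With these, `p4_entry_base(511)` gives `0x0000FF8000000000` and passing that
+through `canonical()` gives `0xFFFFFF8000000000`.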
+
+## Higher half linking
+
+I want to have the kernel running at address `0xFFFFFF8000000000` and above.
+However, when we're booted from GRUB, paging is not enabled, so
+we're limited to physical RAM.
+
+The solution to this problem is to tell GRUB to load the kernel into a low
+memory position, but make the kernel think it's loaded at a high position. This
+can be done with a linker script trick.
+
+We change the linker script from Chapter 2 to something like this:
+
+`src/kernel/Link.ld`
+```
+ENTRY(_start)
+
+KERNEL_OFFSET = 0xFFFFFF8000000000;
+KERNEL_START = 0x10000;
+
+SECTIONS
+{
+  . = KERNEL_START + KERNEL_OFFSET;
+  .text : AT(ADDR(.text) - KERNEL_OFFSET)
+  {
+    *(.multiboot)
+    *(.text)
+  }
+}
+```
+
+What this does is tell the linker to assume the code starts at
+`0xFFFFFF8000010000` when calculating addresses for things like function calls
+and jumps and such, but to generate an ELF file with headers saying the `.text`
+section should be *loaded* at an address `0xFFFFFF8000000000` below that, i.e.
+`0x10000` which is within the physical RAM limits.
+
+This also means that GRUB will jump to address `0x10000` after loading the
+kernel, and from there we can set up paging and jump to above
+`0xFFFFFF8000000000`. We just need to take care with all memory references,
+since we can't trust the linker to sort them out before paging is set up.
+
+Oh, and you will also want to do the same with the other default elf sections
+in the linker file, such as `.rodata`, `.data` and `.bss`.
+
+## Fixing memory references
+
+In order to make sure all memory references are correct, we'll define some
+helpful macros.
+
+`src/kernel/include/memory.h`
+```c
+#define KERNEL_OFFSET 0xFFFFFF8000000000
+
+#ifdef __ASSEMBLER__
+#define V2P(a) ((a) - KERNEL_OFFSET)
+#define P2V(a) ((a) + KERNEL_OFFSET)
+#else
+#include <stdint.h>
+#define V2P(a) ((uintptr_t)(a) & ~KERNEL_OFFSET)
+#define P2V(a) ((uintptr_t)(a) | KERNEL_OFFSET)
+#endif
+...
+```
+
+I define two versions of the macros, one for use in assembly and one for c.
+`__ASSEMBLER__` is set by gcc when compiling a `.S` file.  The c version uses
+bit operations, which means you can run `V2P(V2P(address))` without any
+problems due to the format of `KERNEL_OFFSET`. The proof of this is left as an exercise
+to the reader.
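+
+Not a proof, but a quick way to convince yourself is to try a few values in
+a normal hosted c program. The sketch below repeats the c macros and adds a
+subtraction based variant for comparison; `V2P_SUB` is a name made up for
+this example, and the code assumes a 64 bit `uintptr_t`.
+
+```c
+#include <stdint.h>
+
+#define KERNEL_OFFSET 0xFFFFFF8000000000
+
+/* Bitwise versions, as in the c branch of memory.h */
+#define V2P(a) ((uintptr_t)(a) & ~KERNEL_OFFSET)
+#define P2V(a) ((uintptr_t)(a) | KERNEL_OFFSET)
+
+/* Hypothetical subtraction based version, for comparison only */
+#define V2P_SUB(a) ((uintptr_t)(a) - KERNEL_OFFSET)
+```
+
+Applying `V2P` twice to `0xFFFFFF80000B8000` still gives `0xB8000`, while
+applying `V2P_SUB` twice wraps around and gives `0x80000B8000`.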
+
+Note that although we don't have access to a standard c library at this point,
+`stdint.h` (which defines `uintptr_t`) can still be used since it's one of the
+freestanding headers shipped with gcc, which we built and installed in the
+docker image.
+
+Then we need to go through our code and make sure all memory references are
+corrected.
+
+`src/kernel/boot/boot.S`
+```asm
+#include <memory.h>
+.intel_syntax noprefix
+
+...
+_start:
+  cli
+  mov esp, offset V2P(BootStack)
+
+  ...
+
+  mov eax, offset V2P(BootP4)
+  mov cr3, eax
+
+  ...
+
+  lgdt [V2P(BootGDTp)]
+
+  ...
+
+  jmp 0x8:V2P(long_mode_start)
+  ...
+```
+
+Note that `call` instructions don't have to be modified, since `call` uses
+relative addressing.
+
+And don't forget about the memory references in the page tables:
+
+`src/kernel/boot/boot_PT.S`
+```asm
+...
+BootP4:
+  .quad offset V2P(BootP3) + (PAGE_PRESENT | PAGE_WRITE)
+...
+```
+
+Note also that the base address stored in the GDT pointer does not need to be changed to `V2P(BootGDT)`, as one might assume. But why is that? Because of pure luck and coincidence.
+
+Before entering long mode, the `lgdt` instruction expects a GDT pointer with a 32 bit base address. If there's any more data, the top bits are simply ignored (since the processor is little-endian, the low 32 bits of a 64 bit value come first in memory). As luck would have it, the only difference between `BootGDT` and `V2P(BootGDT)` lies in the top 32 bits. This also means that when it's time to load a 64 bit GDT, we can use the same pointer. Neat!
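+
+To make the coincidence concrete, here is a small hosted c sketch. The helper
+`low32` and the example address `EXAMPLE_GDT_VIRT` are made up for
+illustration; the real address of `BootGDT` depends on how the kernel is
+linked.
+
+```c
+#include <stdint.h>
+
+#define KERNEL_OFFSET 0xFFFFFF8000000000ULL
+
+/* Hypothetical linked (virtual) address of BootGDT */
+#define EXAMPLE_GDT_VIRT 0xFFFFFF8000010100ULL
+
+/* What a 32 bit base field keeps of a 64 bit address:
+ * only the low 32 bits. */
+static uint32_t low32(uint64_t addr)
+{
+  return (uint32_t)addr;
+}
+```
+
+Since the low 32 bits of `KERNEL_OFFSET` are all zero,
+`low32(EXAMPLE_GDT_VIRT)` and `low32(EXAMPLE_GDT_VIRT - KERNEL_OFFSET)` are
+the same value. This only works as long as the physical address itself fits
+in 32 bits.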
+
+At this point, it would be a good idea to check that the kernel still boots.
+However, gdb won't be able to tell you anything about the code since we're
+running outside of the linked addresses. You can still use it to inspect
+registers and such, though. You can also set breakpoints by modifying the
+address manually: `(gdb) break *(long_mode_start - 0xFFFFFF8000000000)`.
+
+## Jumping to higher half
+
+The final piece of setup we need to do before we can start running in the
+higher half is to update the page table.
+
+We'll do this by adding a pointer to the same P3 we set up earlier as the
+last entry of `BootP4`.
+
+`src/kernel/boot/boot_PT.S`
+```asm
+...
+BootP4:
+  .quad offset V2P(BootP3) + (PAGE_PRESENT | PAGE_WRITE)
+  .rept ENTRIES_PER_PT - 2
+    .quad 0
+  .endr
+  .quad offset V2P(BootP3) + (PAGE_PRESENT | PAGE_WRITE)
+...
+```
+
+If you start up the emulator now you can check that the higher half is mapped:
+
+```
+(gdb) mmap
+0000000000000000-0000000040000000 0000000040000000 -rw
+0000ff8000000000-0000ff8040000000 0000000040000000 -rw
+```
+
+Note that qemu doesn't report the addresses in canonical form.
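+
+If you want to double check which page table entries an address goes
+through, the index calculation can be sketched in c like this. `pt_index` is
+a hypothetical helper, not something in the kernel source, and it assumes
+4-level paging with 512 entries per table and 4 KiB pages.
+
+```c
+#include <stdint.h>
+
+/* Index into the page table at `level` (4 = P4 ... 1 = P1)
+ * for a virtual address. Each level consumes 9 bits of the
+ * address, above the 12 bit page offset. */
+static unsigned pt_index(uint64_t vaddr, int level)
+{
+  return (unsigned)((vaddr >> (12 + 9 * (level - 1))) & 0x1FF);
+}
+```
+
+For `0xFFFFFF8000000000` this gives index 511 in P4 and index 0 in P3, which
+is why adding one entry at the very end of `BootP4` is enough.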
+
+Anyway. It should now be safe to jump to higher half code:
+
+`src/kernel/boot/boot.S`
+```asm
+...
+.code64
+long_mode_start:
+  mov eax, 0x0
+  mov ss, eax
+  mov ds, eax
+  mov es, eax
+  mov fs, eax
+  mov gs, eax
+
+  movabs rax, offset upper_memory
+  jmp rax
+
+upper_memory:
+
+  jmp $
+```
+
+By loading the address of the `upper_memory` into a register and jumping to it
+we force the assembler to make a non-relative jump.
+
+If you run this, you'll find that gdb will be able to track where in the code
+you are again (after passing `upper_memory:`), or you could check the `RIP`
+register.
+
+```
+upper_memory () at boot/boot.S:116
+116       jmp $
+(gdb) reg RIP
+RIP=ffffff800001019f
+```
+
+Great! Now we can do some cleanup.
+
+Move the stack pointer to higher half memory:
+```asm
+...
+upper_memory:
+  mov rax, KERNEL_OFFSET
+  add rsp, rax
+...
+```
+
+and unmap the identity mapping of the first gigabyte and reload the page table:
+
+```asm
+...
+  mov rax, 0
+  movabs [BootP4], rax
+
+  mov rax, cr3
+  mov cr3, rax
+...
+```
+
+Run it all again, and check that the low memory is unmapped:
+
+```
+(gdb) mmap
+0000ff8000000000-0000ff8040000000 0000000040000000 -rw
+```
+
+Finally, we also need to reload the GDT. The GDT register holds a virtual
+address, and it still points to the GDT through the identity mapping we just
+unmapped...
+
+So we need to
+
+- reload the GDT and update the data selectors:
+```asm
+...
+  lgdt [rax]
+  mov rax, 0x0
+  mov ss, rax
+  mov ds, rax
+  mov es, rax
+...
+```
+- and reload the code selector. There is no far jump with an immediate
+  target in long mode, so instead we'll use the `retfq` instruction which
+  pops a return address and code segment selector off the stack:
+```asm
+...
+  movabs rax, offset .reload_cs
+  pushq 0x8
+  push rax
+  retfq
+.reload_cs:
+```
+
+## Running c code
+
+Now that the instruction pointer is safely within our linked memory, we can
+trust c code to run.
+
+Calling a c function is simple enough:
+
+`src/kernel/boot/boot.S`
+```asm
+...
+.reload_cs:
+
+.extern kmain
+  movabs rax, offset kmain
+  call rax
+
+  hlt
+  jmp $
+```
+
+And the c source file:
+`src/kernel/boot/kmain.c`
+```c
+#include <memory.h>
+
+void clear_screen()
+{
+  unsigned char *vidmem = (unsigned char *)P2V(0xB8000);
+  for(int i=0; i < 80*25*2; i++)
+    *vidmem++ = 0;
+}
+
+void print_string(char *str)
+{
+  unsigned char *vidmem = (unsigned char *)P2V(0xB8000);
+  while(*str)
+  {
+    *vidmem++ = *str++;
+    *vidmem++ = 0x7;
+  }
+}
+
+void kmain()
+{
+  clear_screen();
+  print_string("Hello from c, world!");
+  for(;;);
+}
+```
+
+... which will clear the screen and print "Hello from c, world!".
+Things are so much simpler in c...
+
+But what now? This doesn't compile!
+
+You'll probably get an error about "relocation truncated to fit: R_X86_64_32
+against \`.rodata\`"
+
+This is because gcc by default uses the small code model, which assumes that
+all code and data is linked in the lower 2 GB of the address space, and
+generates 32 bit address references accordingly. The solution is to tell gcc
+to make no assumption about addresses by adding the switch `-mcmodel=large`
+to `CFLAGS` in your makefile.
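+
+To see why the linker complains, it can help to check which addresses fit in
+a 32 bit relocation at all. The sketch below is a hosted c illustration with
+made up helper names (`fits_32`, `fits_32s`). It follows my reading of the
+x86-64 ABI, where `R_X86_64_32` requires the value to survive zero extension
+and `R_X86_64_32S` requires it to survive sign extension.
+
+```c
+#include <stdbool.h>
+#include <stdint.h>
+
+/* R_X86_64_32: truncating to 32 bits and zero extending
+ * must give back the original value. */
+static bool fits_32(uint64_t addr)
+{
+  return (uint64_t)(uint32_t)addr == addr;
+}
+
+/* R_X86_64_32S: truncating to 32 bits and sign extending
+ * must give back the original value. */
+static bool fits_32s(uint64_t addr)
+{
+  return (uint64_t)(int64_t)(int32_t)(uint32_t)addr == addr;
+}
+```
+
+A low address like `0x10000` fits both, but `0xFFFFFF8000010000` fits
+neither. An address in the top 2 GB, such as `0xFFFFFFFF80010000`, would fit
+the sign extended form, which is the layout `-mcmodel=kernel` is meant for.
+Our split at `0xFFFFFF8000000000` is below that, so `-mcmodel=large` it is.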
--- a/doc/README.md
+++ b/doc/README.md
@@ -6,4 +6,5 @@
 [Chapter 1: Toolchain](1_Toolchain.md)<br>
 [Chapter 2: Booting a Kernel](2_A_Bootable_Kernel.md)<br>
 [Chapter 3: Activate Long Mode](3_Activate_Long_Mode.md)<br>
+[Chapter 4: "Higher Half" Kernel](4_Higher_Half_Kernel.md)<br>