diff --git a/doc/4_Higher_Half_Kernel.md b/doc/4_Higher_Half_Kernel.md
new file mode 100644
index 0000000..a0171c8
--- /dev/null
+++ b/doc/4_Higher_Half_Kernel.md
@@ -0,0 +1,334 @@
+# Chapter 4 - "Higher Half" Kernel
+
+In this chapter we'll make our kernel run at the top of memory - well out of
+the way of user programs and memory mapped devices.
+
+## What is a higher half kernel
+
+Some arguments for a higher half kernel can be found at [the osdev
+wiki](http://wiki.osdev.org/Higher_Half_Kernel). There are arguments against
+as well, such as it being pointless with modern memory management routines. My
+main argument for using it is that it makes things simpler.
+
+I chose to put the split at `0xFFFFFF8000000000`, which corresponds to the last
+entry in P4.
+
+> A note about the address. If you go through the calculations, you'll find
+> that the addresses mapped from the last entry in P4 actually start at
+> `0xFF8000000000` or `0x0000FF8000000000`. However, due to limitations in the
+> hardware, the x86 architecture requires the most significant 17 bits of an
+> address to be equal; thus `0xFFFFFF8000000000`. This address format is called
+> 'canonical'.
+
+## Higher half linking
+
+I want to have the kernel running at address `0xFFFFFF8000000000` and above.
+However, when we're booted from GRUB, paging is not enabled, so
+we're limited to addresses backed by physical RAM.
+
+The solution to this problem is to tell GRUB to load the kernel into a low
+memory position, but make the kernel think it's loaded at a high position. This
+can be done with a linker script trick.
+
+We change the linker script from Chapter 2 to something like this:
+
+`src/kernel/Link.ld`
+```
+ENTRY(_start)
+
+KERNEL_OFFSET = 0xFFFFFF8000000000;
+KERNEL_START = 0x10000;
+
+SECTIONS
+{
+    . = KERNEL_START + KERNEL_OFFSET;
+    .text : AT(ADDR(.text) - KERNEL_OFFSET)
+    {
+        *(.multiboot)
+        *(.text)
+    }
+}
+```
+
+What this does is tell the linker to assume the code starts at
+`0xFFFFFF8000010000` when calculating addresses for things like function calls
+and jumps and such, but to generate an ELF file with headers saying the `.text`
+section should be *loaded* at an address `0xFFFFFF8000000000` below that, i.e.
+`0x10000`, which is within the physical RAM limits.
+
+This also means that GRUB will jump to address `0x10000` after loading the
+kernel, and from there we can set up paging and jump to above
+`0xFFFFFF8000000000`. We just need to take care with all memory references, since
+we can't trust the linker to sort them out before paging is set up.
+
+Oh, and you will also want to do the same with the other default ELF sections
+in the linker file, such as `.rodata`, `.data` and `.bss`.
+
+## Fixing memory references
+
+In order to make sure all memory references are correct, we'll define some
+helpful macros.
+
+`src/kernel/include/memory.h`
+```c
+#define KERNEL_OFFSET 0xFFFFFF8000000000
+
+#ifdef __ASSEMBLER__
+#define V2P(a) ((a) - KERNEL_OFFSET)
+#define P2V(a) ((a) + KERNEL_OFFSET)
+#else
+#include <stdint.h>
+#define V2P(a) ((uintptr_t)(a) & ~KERNEL_OFFSET)
+#define P2V(a) ((uintptr_t)(a) | KERNEL_OFFSET)
+#endif
+...
+```
+
+I define two versions of the macros, one for use in assembly and one for c.
+`__ASSEMBLER__` is set by gcc when compiling a `.S` file. The c version uses
+bit operations, which means you can run `V2P(V2P(address))` without any problems
+due to the format of `KERNEL_OFFSET`. The proof of this is left as an exercise
+to the reader.
+
+Note that although we don't have access to a standard c library at this point,
+`stdint.h` (which defines `uintptr_t`) can still be used since it's one of the
+freestanding headers that gcc provides itself, and those were installed in the
+docker image together with the compiler.
+
+Then we need to go through our code and make sure all memory references are
+corrected.
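
Before touching the boot code, the claim about applying the c macros more than once can be checked on the development machine rather than left entirely as an exercise. The following is only a sketch under a few assumptions: it is ordinary host code, not part of the kernel; `uint64_t` and the `ULL` suffix stand in for `uintptr_t` and the bare constant so the result doesn't depend on the host; `V2P_ARITH` and `check_macros` are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Same value as in memory.h; the ULL suffix is an addition so that the
 * constant is 64 bits wide on any host. */
#define KERNEL_OFFSET 0xFFFFFF8000000000ULL

/* Bit operation versions, as in the c branch of memory.h. */
#define V2P(a) ((uint64_t)(a) & ~KERNEL_OFFSET)
#define P2V(a) ((uint64_t)(a) | KERNEL_OFFSET)

/* Arithmetic version, as in the assembler branch (hypothetical name). */
#define V2P_ARITH(a) ((uint64_t)(a) - KERNEL_OFFSET)

/* Hypothetical helper: asserts each property and returns 1 if all hold. */
int check_macros(void)
{
    uint64_t phys = 0x10000;    /* KERNEL_START from the linker script */
    uint64_t virt = P2V(phys);

    assert(virt == 0xFFFFFF8000010000ULL);
    assert(V2P(virt) == phys);          /* plain round trip */
    assert(V2P(V2P(virt)) == phys);     /* applying V2P twice is harmless */
    assert(P2V(P2V(phys)) == virt);     /* and so is applying P2V twice */

    /* Subtracting the offset twice wraps around to a different address. */
    assert(V2P_ARITH(V2P_ARITH(virt)) == 0x0000008000010000ULL);
    return 1;
}
```

Called from a small host-side `main`, `check_macros()` should return 1. Note that these identities only hold for physical addresses below `0x8000000000` (512 GB), since anything above that overlaps the bits set in `KERNEL_OFFSET`. With that out of the way, back to the boot code.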
+
+`src/kernel/boot/boot.S`
+```asm
+#include <memory.h>
+.intel_syntax noprefix
+
+...
+_start:
+    cli
+    mov esp, offset V2P(BootStack)
+
+    ...
+
+    mov eax, offset V2P(BootP4)
+    mov cr3, eax
+
+    ...
+
+    lgdt [V2P(BootGDTp)]
+
+    ...
+
+    jmp 0x8:V2P(long_mode_start)
+    ...
+```
+
+Note that `call` instructions don't have to be modified, since `call` uses
+relative addressing.
+
+And don't forget about the memory references in the page tables:
+
+`src/kernel/boot/boot_PT.S`
+```asm
+...
+BootP4:
+    .quad offset V2P(BootP3) + (PAGE_PRESENT | PAGE_WRITE)
+...
+```
+
+Note also that the GDT pointer does not need to be redirected to `V2P(BootGDT)` as one would assume. But why is that? Because of pure luck and coincidence.
+
+Before starting long mode the `lgdt` instruction will expect a 32 bit GDT pointer. If there's any more data, the top bits will just be ignored (due to the little-endian nature of the processor, the low 32 bits come first in memory). As luck would have it, the only difference between `BootGDT` and `V2P(BootGDT)` lies in the top 32 bits. This also means that when it's time to load a 64 bit GDT, we can use the same pointer. Neat!
+
+At this point, it would be a good idea to check that the kernel still boots.
+However, gdb won't be able to tell you anything about the code since we're
+running outside of the linked addresses. You can still use it to inspect
+registers and such, though. You can also set breakpoints by modifying the
+address manually: `(gdb) break *(long_mode_start - 0xFFFFFF8000000000)`.
+
+## Jumping to higher half
+
+The final piece of setup we need to do before we can start running in the
+higher half is to update the page table.
+
+We'll do this by adding a pointer to the same P3 we set up earlier at the end
+of `BootP4`.
+
+`src/kernel/boot/boot_PT.S`
+```asm
+...
+BootP4:
+    .quad offset V2P(BootP3) + (PAGE_PRESENT | PAGE_WRITE)
+    .rept ENTRIES_PER_PT - 2
+        .quad 0
+    .endr
+    .quad offset V2P(BootP3) + (PAGE_PRESENT | PAGE_WRITE)
+...
+```
+
+If you start up the emulator now you can check that the higher half is mapped:
+
+```
+(gdb) mmap
+0000000000000000-0000000040000000 0000000040000000 -rw
+0000ff8000000000-0000ff8040000000 0000000040000000 -rw
+```
+
+Note that qemu doesn't report the addresses in canonical form.
+
+Anyway. It should now be safe to jump to higher half code:
+
+`src/kernel/boot/boot.S`
+```asm
+...
+.code64
+long_mode_start:
+    mov eax, 0x0
+    mov ss, eax
+    mov ds, eax
+    mov es, eax
+    mov fs, eax
+    mov gs, eax
+
+    movabs rax, offset upper_memory
+    jmp rax
+
+upper_memory:
+
+    jmp $
+```
+
+By loading the address of `upper_memory` into a register and jumping to it
+we force the assembler to make an absolute jump.
+
+If you run this, you'll find that gdb will be able to track where in the code
+you are again (after passing `upper_memory:`), or you could check the `RIP`
+register.
+
+```
+upper_memory () at boot/boot.S:116
+116       jmp $
+(gdb) reg RIP
+RIP=ffffff800001019f
+```
+
+Great! Now we can do some cleanup.
+
+Move the stack pointer to higher half memory:
+```asm
+...
+upper_memory:
+    mov rax, KERNEL_OFFSET
+    add rsp, rax
+...
+```
+
+and unmap the identity mapping of the first gigabyte and reload the page table:
+
+```asm
+...
+    mov rax, 0
+    movabs [BootP4], rax
+
+    mov rax, cr3
+    mov cr3, rax
+...
+```
+
+Run it all again, and check that the low memory is unmapped:
+
+```
+(gdb) mmap
+0000ff8000000000-0000ff8040000000 0000000040000000 -rw
+```
+
+Finally, we also need to reload the GDT. The GDT register holds a linear
+address, and it still contains the low (identity mapped) address of the GDT,
+which we just unmapped...
+
+So we need to
+
+- reload the GDT and update the data selectors:
+```asm
+...
+    lgdt [rax]
+    mov rax, 0x0
+    mov ss, rax
+    mov ds, rax
+    mov es, rax
+...
+```
+- and reload the code selector. There are no direct far jumps in long mode,
+  so instead we'll use the `retfq` instruction which pops a return
+  address and code segment selector off the stack:
+```asm
+...
+    movabs rax, offset .reload_cs
+    pushq 0x8
+    push rax
+    retfq
+.reload_cs:
+```
+
+## Running c code
+
+Now that the instruction pointer is safely within our linked memory, we can
+trust c code to run.
+
+Calling a c function is simple enough:
+
+`src/kernel/boot/boot.S`
+```asm
+...
+.reload_cs:
+
+.extern kmain
+    movabs rax, offset kmain
+    call rax
+
+    hlt
+    jmp $
+```
+
+And the c source file:
+`src/kernel/boot/kmain.c`
+```c
+#include <memory.h>
+
+void clear_screen()
+{
+    unsigned char *vidmem = (unsigned char *)P2V(0xB8000); /* VGA text buffer */
+    for(int i=0; i < 80*25*2; i++) /* 80x25 cells, 2 bytes each */
+        *vidmem++ = 0;
+}
+
+void print_string(char *str)
+{
+    unsigned char *vidmem = (unsigned char *)P2V(0xB8000);
+    while(*str)
+    {
+        *vidmem++ = *str++; /* character */
+        *vidmem++ = 0x7;    /* attribute: light grey on black */
+    }
+}
+
+void kmain()
+{
+    clear_screen();
+    print_string("Hello from c, world!");
+    for(;;);
+}
+```
+
+... which will clear the screen and print "Hello from c, world!".
+Things are so much simpler in c...
+
+But what now? This doesn't compile!
+
+You'll probably get an error about "relocation truncated to fit: R_X86_64_32
+against \`.rodata\`".
+
+This is because gcc's default code model assumes your code and data will be
+linked in the low 2 GB of memory. The solution is to tell gcc to make no
+assumption about addresses by adding the switch `-mcmodel=large` to `CFLAGS` in
+your makefile.
diff --git a/doc/README.md b/doc/README.md
index cf33e60..c8c3b59 100644
--- a/doc/README.md
+++ b/doc/README.md
@@ -6,4 +6,5 @@
[Chapter 1: Toolchain](1_Toolchain.md)
[Chapter 2: Booting a Kernel](2_A_Bootable_Kernel.md)
[Chapter 3: Activate Long Mode](3_Activate_Long_Mode.md)
+[Chapter 4: "Higher Half" Kernel](4_Higher_Half_Kernel.md)