mittos64/doc/4_Higher_Half_Kernel.md

8.4 KiB

Chapter 4 - "Higher Half" Kernel

In this chapter we'll make our kernel run in the top of memory - well out of the way of user programs and memory mapped devices.

What is a higher half kernel

Some arguments for a higher half kernel can be found at the osdev wiki. There are arguments against as well, such as it being pointless with modern memory management routines. My main argument for using it is that it makes things simpler.

I chose to put the split at 0xFFFFFF8000000000, which corresponds to the last entry in P4.

A note about the address. If you go through the calculations, you'll find that the addresses mapped from the last entry in P4 actually starts at 0xFF8000000000 or 0x0000FF8000000000. However, due to limitations in the hardware, the X86 architecture requires the most significant 13 bits of an address to be equal; thus 0xFFFFFF8000000000. This address format is called 'canonical'.

Higher half linking

I want to have the kernel running at address 0xFFFFFF8000000000 and above. However, when we're booted from GRUB, paging is not enabled, so we're limited to physical RAM.

The solution to this problem is to tell GRUB to load the kernel into a low memory position, but make the kernel think it's loaded at a high position. This can be done with a linker script trick.

We change the linker script from Chapter 2 to something like this:

src/kernel/Link.ld

ENTRY(_start)

KERNEL_OFFSET = 0xFFFFFF8000000000;
KERNEL_START = 0x10000;

SECTIONS
{
  . = KERNEL_START + KERNEL_OFFSET;
  .text : AT(ADDR(.text.) - KERNEL_OFFSET)
  {
    *(.multiboot)
    *(.text)
  }
}

What this does is tell the linker to assume the code starts at 0xFFFFFF8000010000 when calculating addresses for things like function calls and jumps and such, but to generate an ELF file with headers saying the .text section should be loaded at an address 0xFFFFFF8000000000 below that, i.e. 0x10000 which is within the physical RAM limits.

This also means that GRUB will jump to address 0x10000 after loading the kernel, and from there we can set up paging and jump to above 0xFFFFFF8000000000. We just need to take care at all memory references, since we can't trust the linker to sort them out before paging is setup.

Oh, and you will also want to do the same with the other default elf sections in the linker file, such as .rodata, .data and .bss.

Fixing memory references

In order to make sure all memory references are correct, we'll define some helpful macros.

src/kernel/include/memory.h

#define KERNEL_OFFSET 0xFFFFFF8000000000

#ifdef __ASSEMBLER__
#define V2P(a) ((a) - KERNEL_OFFSET)
#define P2V(a) ((a) + KERNEL_OFFSET)
#else
#include <stdint.h>
#define V2P(a) ((uintptr_t)(a) & ~KERNEL_OFFSET)
#define P2V(a) ((uintptr_t)(a) | KERNEL_OFFSET))
#endif
...

I define two versions of the macros, one for use in assembly and one for c. __ASSEMBLER__ is set by gcc when compiling a .S file. The c version uses bit operations which means you can run V2P(VP2(address)) without any problems due to the format of KERNEL_OFFSET. The proof of this is left as an exercise to the reader.

Note that although we don't have access to a standard c library at this point, stdint.h (which defines uintptr_t) can still be used since it's included in libgcc, which we built and installed in the docker image together with the compiler.

Then we need to go through our code and make sure all memory references are corrected.

src/kernel/boot/boot.S

#include <memory.h>
.intel_syntax noprefix

...
_start:
  cli
  mov esp, offset V2P(BootStack)

  ...

  mov eax, offset V2P(BootP4)
  mov cr3, eax

  ...

  lgdt [V2P(BootGDTp)]

  ...

  jmp 0x8:V2P(long_mode_start)
  ...

Note that call instructions don't have to be modified, since call uses relative addressing.

And don't forget about the memory references in the page tables:

src/kernel/boot/boot_PT.S

...
BootP4:
  .quad offset V2P(BootP3) + (PAGE_PRESENT | PAGE_WRITE)
...

Note also that the GDT pointer does not require to be redirected to V2P(BootGDT) as one would assume. But why is that? Because of pure luck and coincidence.

Before starting long mode the lgdt instruction will expect a 32 bit gdt pointer. If there's any more data, the top bits will just be truncated (due to the small-endian nature of the processor). As luck would have it, the only difference between BootGDT and V2P(BootGDT) lies in the top 32 bits. This also means that when it's time to load a 64 bit GDT, we can use the same pointer. Neat!

At this point, it would be a good idea to check that the kernel still boots. However, gdb won't be able to tell you anything about the code since we're running outside of the linked addresses. You can still use it to inspect registers and such, though. You can also set breakpoints by modifying the address manually: (gdb) break *(long_mode_start - 0xFFFFFF8000000000).

Jumping to higher half

The final piece of setup we need to do before we can start running in the higher half is update the page table.

We'll do this by adding a pointer to the same P3 we set up earlier at the end of the BootP4.

src/kernel/boot/boot_PT.S

...
BootP4:
  .quad offset V2P(BootP3) + (PAGE_PRESENT | PAGE_WRITE)
  .rept ENTRIES_PER_PT - 2
    .quad 0
  .endr
  .quad offset V2P(BootP3) + (PAGE_PRESENT | PAGE_WRITE)
...

If you start up the emulator now you can check that the higher half is mapped

(gdb) mmap
0000000000000000-0000000040000000 0000000040000000 -rw
0000ff8000000000-0000ff8040000000 0000000040000000 -rw

Note that qemu doesn't report the addresses in canonical mode.

Anyway. It should now be safe to jump to higher half code:

src/kernel/boot/boot.S

...
.code64
long_mode_start:
  mov eax, 0x0
  mov ss, eax
  mov ds, eax
  mov es, eax
  mov fs, eax
  mov gs, eax

  movabs rax, offset upper_memory
  jmp rax

upper_memory:

  jmp $

By loading the address of the upper_memory into a register and jumping to it we force the assembler to make a non-relative jump.

If you run this, you'll find that gdb will be able to track where in the code you are again (after passing upper_memory:, or you could check the RIP register.

upper_memory () at boot/boot.S:116
116       jmp $
(gdb) reg RIP
RIP=ffffff800001019f

Great! Now we can do some cleanup

Move the stack pointer to higher half memory:

...
upper_memory:
  mov rax, KERNEL_OFFSET
  add rsp, rax
...

and unmap the identity mapping of the first gigabyte and reload the page table:

...
  mov rax, 0
  movabs [BootP4], rax

  mov rax, cr3
  mov cr3, rax
...

Run it all again, and check that the low memory is unmapped:

(gdb) mmap
0000ff8000000000-0000ff8040000000 0000000040000000 -rw

Finally, we also need to reload the GDT. In long mode, the GDT register points to the physical address of the GDT, and we just unmapped that...

So we need to

  • reload the GDT and update the data selectors:
...
  lgdt[rax]
  mov rax, 0x0
  mov ss, rax
  mod ds, rax
  mov es, rax
...
  • and reload the code selector. There are no long jumps in long mode, so instead we'll use the retfq instruction which pops a return address and code segment selector off the stack:
...
  movabs rax, offset .reload_cs
  pushq 0x8
  push rax
  retfq
.reload_cs:

Running c code

Now that the instruction pointer is safely within our linked memory, we can trust c code to run.

Calling a c function is simple enough:

src/kernel/boot/boot.S

...
.reload_cs:

.extern kmain
  movabs rax, offset kmain
  call rax

  hlt
  jmp $

And the c source file: src/kernel/boot/kmain.c

#include <memory.h>

void clear_screen()
{
  unsigned char *vidmem = P2V(0xB8000);
  for(int i=0; i < 80*24*2; i++)
    *vidmem++ = 0;
}

void print_string(char *str)
{
  unsigned char *vidmem = P2V(0xB8000);
  while(*str)
  {
    *vidmem++ = *str++;
    *vidmem++ = 0x7;
  }
}

void kmain()
{
  clear_screen();
  print_string("Hello from c, world!");
  for(;;);
}

... which will clear the screen and print "Hello from c, world!". Things are so much simpler in c...

But what now? This doesn't compile!

You'll probably get an error about "relocation truncated to fit: R_X86_64_32 against `.rodata`"

This is because gcc assumes your code will be running at a lower memory address, and optimizes it as such. The solution is to tell gcc to make no assumption about addresses by adding the switch -mcmodel=large to CFLAGS in your makefile.