diff --git a/doc/9_Memory_Management.md b/doc/9_Memory_Management.md new file mode 100644 index 0000000..9ce7990 --- /dev/null +++ b/doc/9_Memory_Management.md @@ -0,0 +1,238 @@ +# Memory Management + +Normally, I think of the memory management of the kernel as four parts. + +- Physical Memory Manager +- Virtual Memory Manager +- Kernel Heap +- Process Memory Manager + +We'll go through them in order, but first a radical design decision. + +## Mapping the entire physical memory into kernel space + +A 32-bit processor can address about 4 Gb of memory. Near the end of the 32-bit +era, even home desktop computers were actually running into this limit. + +In chapter 4, I mentioned that te top 16 address bits must be the same due to +hardware limitations, in what's called canonical addressing. This leaves 48 +bits, which lets us address 256 Tb. + +I also mentioned that I reserve a 512th of this for the kernel (virtual +addresses above 0xFFFFFF8000000000), which is about 512 Gb. + +512 Gb - to paraphrase something Bill Gates probably never actually said - +ought to be enough for anybody. + +This means we could - for any forseeable future, and definitely longer than the +lifespan of my hobby kernel designed for running in an emulated environment +with a couple of Mb of RAM - map the entirety of RAM right into the kernel +address space. + +The main advantage of this is that we will never have to map anything into +kernel space temporarily, say to modifiy the page tables of a process. It also +means that we can use unused pages for temporary storage (more on this in a +minute). + +The main disadvantage is that the entirety of memory is mapped into the kernels +virtual memory at all times. At the time of writing this, the Spectre and +Meltdown hardware bugs affecting almost all modern Intel CPUs have recently +proved that anything mapped into memory is insecure. + +There are ways to remedy this, but they come at a performance cost, and - +honestly - I don't feel like implementing them right now. Perhaps I will later, +or perhaps I'll never have to because Intel decides that maybe they should fix +the issue. + +Anyway. The entirety of physical RAM will be mapped into kernel space. That +means a physical page with address `addr` can be accessed at `P2V(addr)`, where +`P2V` is one of the two macros defined in Chapter 4. This explains their name +P2V - Physical to Virtual, and V2P - Virtual to Physical. We'll use that a lot. + +## The physical memory manager - PMM + +The PMM keeps track of available physical memory and hands out free pages when +requested. + +Thanks to virtual address translation, we almost never need to care where a +physical page is located, and therefor the PMM can be made very simple. + +In reality, we sometimes do need a number of continuous physical pages - for +example when reading from disk using Direct Memory Access, but we'll save that +for another day. + +The entire PMM has only one single state variable. A pointer to the last freed +page. This page, in turn, holds a pointer to the one that was freed before, and +so on, so when we free a page, we write the pointer to it, and update the free +pointer: + +`src/kernel/memory/pmm.c` +```c +... +uint64_t next = 0; + +void pmm_free(uint64_t page) +{ + *(uint64_t *)P2V(page) = next; + next = (uint64_t)P2V(page); +} +... +``` + +That's all. Allocating a page is equally simple: + +```c +... +uint64_t pmm_alloc() +{ + if(!next) return 0; + uint64_t page = next; + next = *(uint64_t *)page; + return V2P(page); +} +... +``` + +Feeling comfortable about dereferencing a pointer cast will help you when +debugging your VMM in gdb, by the way. + +I also define a `pmm_calloc()` function, which allocates and zeros a page. + +## The virtual memory manager - VMM + +The VMM handles setting up page tables, and separating user and kernel space. + +As mentioned earlier, x86\_64 uses four levels of page tables, each containing +512 64 bit entries. The 52 most significant bits of each entry is a pointer to +the next level of page table, or the page itself in the case of the last page +table (P1 in my inofficial nomenclature). This pointer is obviously always page +alligned. + +The 12 least significant bits of each entry are for various flags. In order to +find the next level, they need to be masked out. So let's start with some +macros: + +`src/kernel/memory/vmm.c` +```c +... +#define FLAGS_MASK (PAGE_SIZE - 1) +#define MASK_FLAGS(addr) ((uint64_t)addr & ~FLAGS_MASK) +... +``` + +Now it's easy to go from a page table entry to something we can parse: + +```c +... +#define PT(ptr) ((uint64_t *)P2V(MASK_FLAGS(ptr))) +... +``` + +Page table entries are physical addresses, so we need to go through `P2V` to +access them. To access a certain page table entry, you just do e.g. +`PT(P4)[num]`.I chose to define some macros to access the entries for a certain +address in each page table (P4 trough P1). They look like this: + +```c +... +#define P4E (PT(P4)[P4_OFFSET(addr)]) +#define P3E PT(P4E)[P3_OFFSET(addr)]) +#define P2E PT(P3E)[P2_OFFSET(addr)]) +#define P1E PT(P2E)[P1_OFFSET(addr)]) +... +``` + +This assumes that there's a variable named `addr` which +contains the virtual address whose page table entries you +want, and a variable named `P4` which points to the top +level page directory. The `OFFSET` macros finds the +correct entry for an address for each page table level +(`#define P4_OFFSET(a) (((a)>>39 & 0x1FF)` and so on). + +Now getting the physical page for a virtual address is very easy: + +```c +... +uint64_t vmm_get_page(uint64_t P4, uint64_t addr) +{ + if(P4 && PRESENT(P4E) && PRESENT(P3E) && PRESENT(P2E)) + return P1E; + return -1; +} +... +``` + +Where the `PRESENT` macro just checks for the `PAGE_PRESENT` bit being set. + +Setting the physical page for a virtal address is also very easy: + +```c +... +int vmm_set_page(uint64_t P4, uint64_t addr, uint64_t page, uint16_t flags) +{ + ... + + if(!PRESENT(P4E) && !(P4E = pmm_calloc())) + return -1; + P4E |= flags | PAGE_PRESENT; + + // Do the same thing for P3E and P2E + ... + + P1E = page | flags; + return 0; +} +... +``` + +The first three lines checks if the P4 entry is set, and if not, allocates a P3 +and sets the entry to point to it. If the allocation fails, `vmm_set_page` +fails as well. The same is then done with P2 and P1, and finally the correct P1 +entry is set with the page address and flags. + +I also wrote a function `void vmm_clear_page(uint64_t P4, uint64_t addr, int +free)` which zeros the P1 entry, and - if `free` is true - frees P1, P2 and P3 +if they are empty. This is left as an exercise to the reader. + +## The kernel heap + +The kernel heap keeps track of and hands out small chunks of memory for +temporary storage. This corresponds to the `malloc()`-family of functions. + +In this case, I actually chose to forego the heap, and try to make do with +hard-coded global variables and structures using entire pages. + +For example, when setting up a new process, I might normally do something like + +```c +struct process *new_process = kmalloc(sizeof(struct process)); +setup_process(new_process); +``` + +Now, I'll instead do: + +```c +struct process *new_process = (void *)pmm_alloc(); +setup_process(new_process); +``` + +This will reqire me to think a bit more carefully of how I define my various +data structures in order to keep wasted space to a minimum. It'll be an +interesting experiment, but we'll see - perhaps I'll end up implementing a heap +later... + +## The process memory manager - PROCMM + +The final memory related part of the kernel - the procmm - handles user space +memory. Setting up and cloning process memory spaces, replacing them with new +executables, and handling the user stack and `brk()` calls are some of its +tasks. + +Since there are no processes yet, having a process memory manager doesn't really make sense, so I'll save this for later... + +## A remark about the git history + +For this chapter, I went a bit crazy with the TDD and made one git commit every +time I wrote and passed a test. Perhaps that would make sense if I had a +finished API to conform agains. Now, it just got a bit messy... If you explore +the git history - I'm sorry. diff --git a/doc/README.md b/doc/README.md index a41c206..af5af8a 100644 --- a/doc/README.md +++ b/doc/README.md @@ -11,4 +11,5 @@ [Chapter 6: Debug output](6_Debug_Output.md)
[Chapter 7: Multiboot Data](7_Multiboot_Data.md)
[Chapter 8: Exceptions and Interrupts](8_Exceptions.md)
+[Chapter 9: Memory Management](9_Memory_Management.md)