# Chapter 3 - Entering Long Mode
In this chapter, we'll put the processor in long mode with the least
possible effort.

## Preparation
The AMD64 manual volume 2 (*System Programming*) outlines what needs to be
done in order to activate long mode (chapter 14). It says we need:

- An IDT with 64-bit interrupt-gate descriptors *We don't need this as long as
  interrupts are disabled*
- 64-bit interrupt and exception handlers *See above*
- A GDT containing:
  - Any LDT descriptors *We don't have any*
  - A TSS descriptor *Only needed when we want to enter User mode*
  - Code descriptors for long mode code *One is enough for now*
  - Data-segment descriptors for software running in compatibility mode *We
    don't have that*
  - FS and GS data-segment descriptors *We won't be using those*
- A 64-bit TSS *See note about TSS descriptor above*
- The 4-level page translation tables

So if we bring it down to the essentials:

- A GDT with one entry (in addition to the mandatory null descriptor)
- A Page Table

Shouldn't be too hard. In fact, for now we can actually pretty much
hardcode those...

## GDT
In long mode, segmentation and the GDT don't really serve any
purpose... The GDT is still required, for some reason, but if you read the
AMD manual, you'll see that in long mode, almost all fields of the GDT
entries are ignored.

What's left can be set up like this:

`src/kernel/boot/boot_GDT.S`
```asm
#include <gdt.h>
.intel_syntax noprefix
.section .rodata
.global BootGDT
.global BootGDTp
BootGDT:
.long 0,0
.long 0, (GDT_PRESENT | GDT_CODE | GDT_LONG)
BootGDTp:
.short 2*8-1
.quad offset BootGDT
```
where
- `GDT_PRESENT = 1<<15`
- `GDT_CODE = 3<<11`
- `GDT_LONG = 1<<21`

The GDT is page aligned, of course, and the GDT pointer is configured in
the same way as in 32-bit mode: a 16-bit limit (the table size minus one)
followed by the base address.

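
To see what the second GDT entry actually contains, the flag arithmetic can be reproduced outside the assembler. The following is a minimal sketch in Python; the constant names mirror the ones listed above, while the helper `code_descriptor` is made up for this illustration.

```python
# Flag values as listed above; they apply to the high 32 bits of a descriptor.
GDT_PRESENT = 1 << 15
GDT_CODE = 3 << 11
GDT_LONG = 1 << 21


def code_descriptor():
    """Return the 64-bit long mode code descriptor as one integer.

    Hypothetical helper: mirrors `.long 0, (flags)`, i.e. a zero low dword
    followed by the flags in the high dword.
    """
    low = 0
    high = GDT_PRESENT | GDT_CODE | GDT_LONG
    return (high << 32) | low


descriptor = code_descriptor()
gdt_limit = 2 * 8 - 1  # two 8-byte entries; the limit is the size minus one

print(hex(descriptor >> 32))  # high dword of the descriptor
print(gdt_limit)
```

The high dword works out to `0x209800`, which is the same value qemu reports for `CS` in the debugging session once the jump to 64-bit code has been made.
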
## Page Tables
Paging works pretty much exactly the same way in 64-bit mode as in
32-bit mode, but with four levels of nested tables instead of two. If you have
trouble wrapping your head around it, chapter 5 *Page Translation and
Protection* of the AMD64 System Programming manual should help.

The four levels do have names, "Page-Map Level-4 Table", "Page-Directory
Pointer Table", "Page Directory Table" and "Page Table", but I like to
think of them as P4, P3, P2 and P1.

We could make use of the 2 MB page translation feature, which uses only three
levels, i.e. the entries of P2 point directly at the start of a 2 MB memory
area rather than at a P1. This is indicated by a special flag in the P2 entry.
Doing so would make the memory management a bit more complicated later, though,
so I won't use that for now.

For now, we'll just identity map the first two megabytes of memory. That should
be enough to get the kernel started. So we just need a P4 where the first
entry points to a P3 where the first entry points to a P2 where the first entry
points to a P1 filled with 512 entries covering the addresses from 0 to 2 MB.

`src/kernel/boot/boot_PT.S`
```asm
#include <memory.h>
.intel_syntax noprefix
.section .data
.align PAGE_SIZE
.global BootP4
BootP4:
.quad offset BootP3 + (PAGE_PRESENT | PAGE_WRITE)
.rept ENTRIES_PER_PT - 1
.quad 0
.endr
BootP3:
.quad offset BootP2 + (PAGE_PRESENT | PAGE_WRITE)
.rept ENTRIES_PER_PT - 1
.quad 0
.endr
BootP2:
.quad offset BootP1 + (PAGE_PRESENT | PAGE_WRITE)
.rept ENTRIES_PER_PT - 1
.quad 0
.endr
BootP1:
.set i, 0
.rept ENTRIES_PER_PT
.quad (i << 12) + (PAGE_PRESENT | PAGE_WRITE)
.set i, (i+1)
.endr
```
where
- `PAGE_PRESENT = 0x001`
- `PAGE_WRITE = 0x002`
- `ENTRIES_PER_PT = 512`
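
To make the four-level walk concrete, here is a small sketch of how a virtual address is split into table indices, and of what the entries of `BootP1` contain. It assumes 4 KiB pages and 9 index bits per level, and it assumes that `PAGE_SIZE` in `memory.h` is 4096; the function names are made up for this example.

```python
PAGE_PRESENT = 0x001
PAGE_WRITE = 0x002
ENTRIES_PER_PT = 512
PAGE_SIZE = 4096  # assumed value of PAGE_SIZE from memory.h


def table_indices(vaddr):
    """Split a virtual address into (P4, P3, P2, P1, offset)."""
    offset = vaddr & (PAGE_SIZE - 1)
    p1 = (vaddr >> 12) & (ENTRIES_PER_PT - 1)
    p2 = (vaddr >> 21) & (ENTRIES_PER_PT - 1)
    p3 = (vaddr >> 30) & (ENTRIES_PER_PT - 1)
    p4 = (vaddr >> 39) & (ENTRIES_PER_PT - 1)
    return p4, p3, p2, p1, offset


def boot_p1_entries():
    """Rebuild the identity-mapping entries generated by the .rept loop."""
    flags = PAGE_PRESENT | PAGE_WRITE
    return [(i << 12) + flags for i in range(ENTRIES_PER_PT)]


entries = boot_p1_entries()
print(table_indices(0x100000))  # an address at 1 MB
print(hex(entries[-1]))         # last entry of BootP1
```

An address at 1 MB lands in entry 256 of P1 with index 0 in every other table, which is why one P4, one P3, one P2 and one P1 are enough for the first two megabytes.
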
## Activating Long Mode
Again, consulting the AMD64 manual we find the following steps to
activate long mode:

1. Disable paging *Paging isn't enabled by GRUB, so we're good to go*
2. In any order:
   - Enable PAE by setting CR4.PAE to 1
   - Load CR3 with the address of P4
   - Enable long mode by setting EFER.LME to 1
3. Enable paging

We should then reload the system tables (in our case only GDT) with 64-bit
descriptors.

The manual is even kind enough to supply us with some sample code which
also performs some checks to ensure that long mode is available. So
let's go.

`src/kernel/boot/boot.S`
```asm
...
.code32
.global _start
_start:
cli
mov esp, offset BootStack
...
```
First we set up a temporary stack for booting. The label BootStack is
defined earlier:
```asm
.section .bss
.align PAGE_SIZE
.skip PAGE_SIZE
BootStack:
```

Note that the label is after the reserved memory, since the stack grows downwards.

If you wish to do things The Right Way, you should probably check if the
processor supports long mode before going further. This can be done through
the `cpuid` instruction and the process is described in the AMD64 manual. I
opted to skip this check, and just fail in an uncontrolled manner in the
unlikely event that the code is run on a 32-bit processor.

Ok. Let's get to the meat of it.

`src/kernel/boot/boot.S`
```asm
...
//; Set CR4.PAE
//; enabling Physical Address Extension
mov eax, cr4
or eax, 1<<5
mov cr4, eax
//; Load a P4 page table
mov eax, offset BootP4
mov cr3, eax
//; Set EFER.LME
//; enabling Long Mode
mov ecx, 0x0C0000080
rdmsr
or eax, 1<<8
wrmsr
//; Set CR0.PG
//; enabling Paging
mov eax, cr0
or eax, 1<<31
mov cr0, eax
...
```
I think the comments explain this well enough. It's just following the
list of actions from the AMD manual anyway.
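
As a sanity check for the debugging session, the effect of each `or` can be worked out beforehand. This is only a sketch in Python; the starting values of `CR0` and `CR4` are the ones qemu reports in the testing section, and the starting value of `EFER` is assumed to be zero.

```python
CR4_PAE = 1 << 5    # Physical Address Extension
EFER_LME = 1 << 8   # Long Mode Enable
EFER_LMA = 1 << 10  # Long Mode Active, set by the processor itself
CR0_PG = 1 << 31    # Paging

cr4 = 0x00000000 | CR4_PAE
efer = 0x0 | EFER_LME          # assumption: EFER starts out as zero
cr0 = 0x00000011 | CR0_PG
efer_active = efer | EFER_LMA  # what EFER should read once paging is on

print("CR4=%08x" % cr4)
print("EFER=%016x" % efer)
print("CR0=%08x" % cr0)
print("EFER=%016x" % efer_active)
```
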

> Speaking of comments, I apologize for the unconventional comment style `//;`.
> Normally x86 GAS assembly is commented with a `#`, but I run all my files through
> the gcc preprocessor, which would try to interpret that as a directive.
> Instead, I have to use C-style comments (`//` or `/* */`). Those are,
> however, not recognized by the GitHub markdown syntax coloring engine, which
> looks for `;`, and the results look messy with weird colors all over the
> place. That's why I use the combination.

The only step that's left is reloading the system tables. This is done
in exactly the same way as when going to protected mode, by loading a
GDT, loading selectors, and performing a far jump (a "long jump") to load CS.

`src/kernel/boot/boot.S`
```asm
...
//; Load a new GDT
lgdt [BootGDTp]
//; and update the code selector by a long jump
jmp 0x8:long_mode_start
.code64
long_mode_start:
//; Clear out all other selectors
mov eax, 0x0
mov ss, eax
mov ds, eax
mov es, eax
//; Loop infinitely
jmp $
```
And that's all!
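
As an aside, the `0x8` in the far jump is a segment selector. A selector packs a descriptor index into its upper 13 bits, followed by a table indicator bit and a two-bit requested privilege level. A quick sketch of the decoding in Python (the function name is made up for this illustration):

```python
def decode_selector(selector):
    """Split a segment selector into (index, table indicator, RPL)."""
    index = selector >> 3     # which descriptor in the table
    ti = (selector >> 2) & 1  # 0 = GDT, 1 = LDT
    rpl = selector & 3        # requested privilege level
    return index, ti, rpl


print(decode_selector(0x8))
```

Index 1 in the GDT is the code descriptor that follows the null entry in `BootGDT`.
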
## Testing it out
Fire up the emulator, make sure the kernel is loaded into gdb, and let's go!

Let's step through the entire boot process.

    (gdb) b _start
    Breakpoint 1 at 0x91: file boot/boot.S, line 63
    (gdb) c
    Continuing.
    Breakpoint 1, _start () at boot/boot.S:63
    64      cli
    (gdb)

The first thing that happens is that we set the stack pointer. You can
see that this happens by printing `esp`.

    (gdb) p/x $esp
    $1 = 0x7ff00
    (gdb) si
    65      mov esp, offset BootStack
    (gdb) si
    67      call check_cpuid
    (gdb) p/x $esp
    $2 = 0x5000

So things seem to work so far.

The next thing that happens in this session is that two functions are called to
check cpuid and long mode availability. You can step through those and
inspect values as you wish. I'll just skip to after returning from
`check_longmode`.

You can't print the contents of `CR4` in gdb, but you can read it with
the qemu monitor command `info registers`, which can be called from gdb
through the `monitor` command.

    _start () at boot/boot.S:72
    72      mov eax, cr4
    (gdb) monitor info registers
    EAX=00000664 EBX=00000000 ECX=00000005 EDX=2193fbfd
    ...
    CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000
    ...
    XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000
    (gdb)

To simplify things, I wrote a python function to extract individual
registers and put it in `toolchain/gdbinit` - see next section. This
lets you run `reg cr4` to see the register contents.

    (gdb) reg cr4
    CR4=00000000
    (gdb) n
    73      or eax, 1<<5
    (gdb)
    74      mov cr4, eax
    (gdb)
    77      mov eax, offset BootP4
    (gdb) reg cr4
    CR4=00000020
    (gdb)

No surprises there, really. Physical Address Extension should now be enabled.

The next step is loading the page table. This shouldn't actually matter yet,
since paging is disabled, so just step through it and make sure the
value loaded into `CR3` is page aligned.

    82      mov ecx, 0x0C0000080
    (gdb) reg cr3
    CR3=00002000

The same goes for setting the Long Mode Enable bit of the EFER register.
Make sure to remember the value of EFER, though ...

    89      mov eax, cr0
    (gdb) reg efer
    EFER=0000000000000100
    (gdb) monitor info mem
    PG disabled
    (gdb)

... because after we enable paging by setting the Paging bit in CR0 ...

    (gdb) reg cr0
    CR0=00000011
    (gdb) n
    90      or eax, 1<<31
    (gdb)
    91      mov cr0, eax
    (gdb)
    94      lgdt [BootGDTp]
    (gdb) reg cr0
    CR0=80000011
    (gdb)

... the Long Mode Active bit (bit 10) should also be set.

    (gdb) reg efer
    EFER=0000000000000500
    (gdb) monitor info mem
    0000000000000000-0000000040000000 0000000040000000 -rw
    (gdb)

This means the processor is in Long Mode!

You'll also see that the command `monitor info mem` (which I mapped to
`mmap` in my gdbinit - see next section) shows that paging is enabled
and that the first GB of virtual memory is mapped. (With only the 2 MB
identity map built above, the range should end at `0000000000200000`
instead.) Also note that virtual addresses are now 64 bits wide.

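
The `info mem` line is easy to pick apart if you want to check the numbers. A small sketch, assuming the three-column format shown above (address range, size, flags); `parse_mem_line` is a name made up for this example:

```python
def parse_mem_line(line):
    """Parse one line of `info mem` output into (start, end, size, flags)."""
    span, size, flags = line.split()
    start, end = span.split("-")
    return int(start, 16), int(end, 16), int(size, 16), flags


line = "0000000000000000-0000000040000000 0000000040000000 -rw"
start, end, size, flags = parse_mem_line(line)
print(size // (1024 * 1024), "MB mapped, flags", flags)
```
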
But we're still running 32-bit code in compatibility mode. That's why we load the GDT
and reload the segment selectors next.

    (gdb) n
    97      jmp 0x8:long_mode_start
    (gdb)
    long_mode_start () at boot/boot.S:103
    103     mov eax, 0x0
    (gdb) reg
    RAX=0000000080000011 RBX=0000000000000000 RCX=00000000c0000080 RDX=0000000000000000
    ...
    CS =0008 0000000000000000 00000000 00209800 DPL=0 CS64 [---]
    ...
    (gdb)

You'll note that the `reg` command now outputs registers with the R
prefix (`RAX` instead of `EAX` etc.) and that they are twice as wide.

We are now running 64-bit code!

## Bonus
I mentioned two custom gdb commands in the previous section.
`mmap` which shows the memory map from qemu, and `reg` which prints the
value of a register. Those are defined in `toolchain/gdbinit`:
```
define mmap
  monitor info mem
end

python

import re

class Reg(gdb.Command):

    def __init__(self):
        super(Reg, self).__init__("reg", gdb.COMMAND_USER)

    def invoke(self, arg, from_tty):
        # Capture the output of the qemu monitor command as a string
        regs = gdb.execute('monitor info registers', False, True)

        if not arg:
            # If no argument was given, print the output from qemu
            print(regs)
            return

        if arg.upper() in ['CS', 'DS', 'ES', 'FS', 'GS', 'SS']:
            # Segment register lines may contain equals signs
            for l in regs.splitlines():
                if l.startswith(arg.upper()):
                    print(l)
        elif arg.upper() in ['EFL', 'RFL']:
            # The xFLAGS register line contains equals signs, and
            # the xFLAGS register is the second one on the line
            for l in regs.splitlines():
                if arg.upper() in l:
                    print(' '.join(l.split()[1:]))
        else:
            # Split at any word followed by an equals sign
            # Clean up both sides of the split and put into a dictionary
            # then print the requested register value
            regex = re.compile(r"[A-Z0-9]+\s?=")
            names = [v[:-1].strip() for v in regex.findall(regs)]
            values = [v.strip() for v in regex.split(regs)][1:]
            regs = dict(zip(names, values))
            print("%s=%s" % (arg.upper(), regs[arg.upper()]))

Reg()
end
```

The `mmap` command is obvious enough, but the `reg` one is a bit tougher.
A bit of information on the syntax of python commands in gdb can be
found [here](https://sourceware.org/gdb/onlinedocs/gdb/Python-API.html).

The rest is some rather messy python, but the basic flow is this:

- Get the register output from qemu by running `monitor info registers`
- If no argument was given, print the output we got and end
- Look for text in the format `SOMETHING=SOMETHINGELSE` and split it into `SOMETHING` and `SOMETHINGELSE`
- Put `SOMETHING` and `SOMETHINGELSE` back together in a way that's more useful for python
- Print the value we want

Then there are some special cases for things like `EFL`, whose output contains
equals signs, e.g. when displaying bit flags:
`EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0`. Note that I'm not catching all
such cases, but only the ones I think might be interesting.

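
The general case can be tried outside gdb as well. Below is a minimal standalone sketch of the same split-and-zip idea, run on two lines copied from the qemu output earlier in this chapter; `parse_registers` and `SAMPLE` are names made up for this example, not part of `toolchain/gdbinit`.

```python
import re

# Sample text in the format printed by `monitor info registers`
SAMPLE = (
    "EAX=00000664 EBX=00000000 ECX=00000005 EDX=2193fbfd\n"
    "CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000\n"
)


def parse_registers(text):
    """Turn NAME=VALUE pairs into a dictionary, like the general case of `reg`."""
    regex = re.compile(r"[A-Z0-9]+\s?=")
    # Everything the regex matches is a register name plus the equals sign
    names = [v[:-1].strip() for v in regex.findall(text)]
    # Everything between the matches is a value; the first piece is empty
    values = [v.strip() for v in regex.split(text)][1:]
    return dict(zip(names, values))


regs = parse_registers(SAMPLE)
print("CR0=%s" % regs["CR0"])
```
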
Note also that gdb doesn't require the entire command name, but only enough to
make it unambiguous. As such, you can run `(gdb) mm` instead of `(gdb) mmap` if
you'd like. Just a heads up...