Chapter 3: Enter Long Mode - COMPLETE
This commit is contained in:
parent
646d25825a
commit
8f8e03de10
456
doc/3_Activate_Long_Mode.md
Normal file
@ -0,0 +1,456 @@
|
||||
# Chapter 3 - Entering Long Mode
|
||||
|
||||
In this chapter, we'll put the processor in long mode with the least
possible effort.
|
||||
|
||||
|
||||
## Preparation
|
||||
|
||||
The AMD64 Architecture Programmer's Manual, Volume 2 (System Programming),
outlines what needs to be done in order to activate long mode (chapter 14).
It says we need:
|
||||
|
||||
- An IDT with 64-bit interrupt-gate descriptors *We don't need this as long as
  interrupts are disabled*
- 64-bit interrupt and exception handlers *See above*
- A GDT containing:
  - Any LDT descriptors *We don't have any*
  - A TSS descriptor *Only needed when we want to enter User mode*
  - Code descriptors for long mode code *One is enough for now*
  - Data-segment descriptors for software running in compatibility mode *We
    don't have that*
  - FS and GS data-segment descriptors *We won't be using those*
- A 64-bit TSS *See note about TSS descriptor above*
- The 4-level page translation tables
|
||||
|
||||
So if we bring it down to the essentials:
|
||||
|
||||
- A GDT with one entry
|
||||
- A Page Table
|
||||
|
||||
Shouldn't be too hard. In fact, for now we can actually pretty much
|
||||
hardcode those...
|
||||
|
||||
## GDT
|
||||
|
||||
In long mode, segmentation and the GDT don't really serve any
purpose... A GDT is still required, for some reason, but if you read the
AMD manual, you'll see that in long mode, almost all fields of the GDT
entries are ignored.
|
||||
|
||||
What's left can be set up like this:
|
||||
|
||||
`src/kernel/boot/boot_GDT.S`
|
||||
```asm
#include <gdt.h>
.intel_syntax noprefix

.section .rodata
.global BootGDT
.global BootGDTp

BootGDT:
    //; Entry 0: the mandatory null descriptor
    .long 0,0
    //; Entry 1: 64-bit code descriptor (flags live in the high dword)
    .long 0, (GDT_PRESENT | GDT_CODE | GDT_LONG)

BootGDTp:
    //; Limit: two 8-byte entries, minus one
    .short 2*8-1
    //; Base address of the table
    .quad offset BootGDT
```
|
||||
|
||||
where
|
||||
- `GDT_PRESENT = 1<<15`
|
||||
- `GDT_CODE = 3<<11`
|
||||
- `GDT_LONG = 1<<21`
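
As a quick sanity check of those three flags, here is a small Python sketch (not part of the kernel sources) that composes the high 32 bits of the code descriptor from the same constants. The helper name `code_descriptor_high` is made up for this illustration.

```python
# Flag values as listed above; they apply to the high 32 bits of an
# 8-byte GDT entry.
GDT_PRESENT = 1 << 15   # segment is present
GDT_CODE = 3 << 11      # non-system segment + executable (i.e. a code segment)
GDT_LONG = 1 << 21      # 64-bit code segment


def code_descriptor_high():
    """Return the high dword of the long mode code descriptor (hypothetical helper)."""
    return GDT_PRESENT | GDT_CODE | GDT_LONG


if __name__ == "__main__":
    # Print as an 8-digit hex word, the way qemu displays descriptor words
    print("%08x" % code_descriptor_high())
```

The result is `00209800`, which is the same descriptor word that qemu reports for `CS` in the testing section further down.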
|
||||
|
||||
The GDT is page aligned, of course, and the GDT pointer is configured in
|
||||
the same way as in 32 bit mode.
|
||||
|
||||
|
||||
## Page Tables
|
||||
|
||||
Paging works pretty much exactly the same way in 64 bit mode as in
|
||||
32, but with four levels of nested tables instead of two. If you have
|
||||
trouble wrapping your head around it, chapter 5 *Page Translation and
|
||||
Protection* of the AMD64 Systems programming manual should help.
|
||||
|
||||
The four levels do have names, "Page-Map Level-4 Table", "Page-Directory
|
||||
Pointer Table", "Page Directory Table" and "Page Table", but I like to
|
||||
think of them as P4, P3, P2 and P1.
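
To make the four levels a bit more concrete, here's a minimal Python sketch of how a virtual address is split into table indices when 4 KiB pages are used: each level consumes 9 bits (512 entries per table) and the lowest 12 bits are the offset into the page. The function name `split_address` is my own and doesn't appear in the kernel sources.

```python
def split_address(vaddr):
    """Split a virtual address into (P4, P3, P2, P1, offset) for 4 KiB pages."""
    offset = vaddr & 0xFFF          # low 12 bits: offset into the page
    p1 = (vaddr >> 12) & 0x1FF      # 9 bits per level = 512 entries per table
    p2 = (vaddr >> 21) & 0x1FF
    p3 = (vaddr >> 30) & 0x1FF
    p4 = (vaddr >> 39) & 0x1FF
    return p4, p3, p2, p1, offset


if __name__ == "__main__":
    # An address just below 2 MiB uses entry 0 of P4, P3 and P2
    # and the last entry (511) of P1.
    print(split_address(0x1FF123))
```

Any address below 2 MiB ends up with index 0 at the P4, P3 and P2 levels, so a single table per level is enough to cover that range.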
|
||||
|
||||
We could make use of the 2 MiB page translation feature, which uses only three
levels, i.e. the entries of P2 point directly at the start of a 2 MiB memory
area rather than at a P1. This is indicated by a special flag in the P2 entry.
Doing so would make the memory management a bit more complicated later, though,
so I won't use that for now.
|
||||
|
||||
For now, we'll just identity map the first two megabytes of memory. That should
be enough to get the kernel started. So we just need a P4 where the first
entry points to a P3 where the first entry points to a P2 where the first entry
points to a P1 filled with 512 entries, each mapping one 4 KiB page, which
together cover the addresses from 0 to 2 MiB.
|
||||
|
||||
`src/kernel/boot/boot_PT.S`
|
||||
```asm
#include <memory.h>
.intel_syntax noprefix

.section .data
.align PAGE_SIZE
.global BootP4

BootP4:
    .quad offset BootP3 + (PAGE_PRESENT | PAGE_WRITE)
    .rept ENTRIES_PER_PT - 1
        .quad 0
    .endr
BootP3:
    .quad offset BootP2 + (PAGE_PRESENT | PAGE_WRITE)
    .rept ENTRIES_PER_PT - 1
        .quad 0
    .endr
BootP2:
    .quad offset BootP1 + (PAGE_PRESENT | PAGE_WRITE)
    .rept ENTRIES_PER_PT - 1
        .quad 0
    .endr
BootP1:
    .set i, 0
    .rept ENTRIES_PER_PT
        .quad (i << 12) + (PAGE_PRESENT | PAGE_WRITE)
        .set i, (i+1)
    .endr
```
|
||||
|
||||
where
|
||||
- `PAGE_PRESENT = 0x001`
|
||||
- `PAGE_WRITE = 0x002`
|
||||
- `ENTRIES_PER_PT = 512`
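
If the `.rept` loop for `BootP1` looks cryptic, the same table can be generated in a few lines of Python. This is only an illustration of what the assembler produces; `boot_p1_entries` is a hypothetical name, and `PAGE_SIZE` is assumed to be 4 KiB, which is what the `i << 12` in the assembly implies.

```python
PAGE_PRESENT = 0x001
PAGE_WRITE = 0x002
ENTRIES_PER_PT = 512
PAGE_SIZE = 1 << 12   # assumption: 4 KiB pages, matching `i << 12`


def boot_p1_entries():
    """Entries of an identity-mapping P1, mirroring the .rept loop for BootP1."""
    flags = PAGE_PRESENT | PAGE_WRITE
    return [(i << 12) + flags for i in range(ENTRIES_PER_PT)]


if __name__ == "__main__":
    entries = boot_p1_entries()
    # First, second and last entry of the table
    print(hex(entries[0]), hex(entries[1]), hex(entries[-1]))
    # The table as a whole covers exactly 2 MiB
    print(ENTRIES_PER_PT * PAGE_SIZE == 2 * 1024 * 1024)
```

Each entry is the physical address of a page with the flag bits in the low 12 bits, so entry `i` maps the page starting at `i * 4096` onto itself.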
|
||||
|
||||
|
||||
## Activating Long Mode
|
||||
|
||||
Again, consulting the AMD64 manual we find the following steps to
|
||||
activate long mode:
|
||||
|
||||
1. Disable paging *Paging isn't enabled by GRUB, so we're good to go*
2. In any order:
   - Enable PAE by setting CR4.PAE to 1
   - Load CR3 with the address of P4
   - Enable long mode by setting EFER.LME to 1
3. Enable paging
|
||||
|
||||
We should then reload the system tables (in our case only GDT) with 64
|
||||
bit descriptors.
|
||||
|
||||
The manual is even kind enough to supply us with some sample code which
|
||||
also performs some checks to ensure that long mode is available. So
|
||||
let's go.
|
||||
|
||||
`src/kernel/boot/boot.S`
|
||||
|
||||
```asm
...
.code32
.global _start
_start:
    cli
    mov esp, offset BootStack
...
```
|
||||
|
||||
First we set up a temporary stack for booting. The label BootStack is
|
||||
defined earlier:
|
||||
|
||||
```asm
.section .bss
.align PAGE_SIZE
.skip PAGE_SIZE
BootStack:
```
|
||||
|
||||
Note that the label is after the reserved memory, since the stack grows downwards.
|
||||
|
||||
If you wish to do things The Right Way, you should probably check if the
processor supports long mode before going further. This can be done through
the `cpuid` instruction and the process is described in the AMD64 manual. I
opted to skip this check, and just fail in an uncontrolled manner in the
unlikely event that the code is run on a 32-bit processor.
|
||||
|
||||
Ok. Let's get to the meat of it
|
||||
|
||||
`src/kernel/boot/boot.S`
|
||||
```asm
...
    //; Set CR4.PAE
    //; enabling Physical Address Extension
    mov eax, cr4
    or eax, 1<<5
    mov cr4, eax

    //; Load a P4 page table
    mov eax, offset BootP4
    mov cr3, eax

    //; Set EFER.LME
    //; enabling Long Mode
    mov ecx, 0x0C0000080
    rdmsr
    or eax, 1<<8
    wrmsr

    //; Set CR0.PG
    //; enabling Paging
    mov eax, cr0
    or eax, 1<<31
    mov cr0, eax
...
```
|
||||
|
||||
I think the comments explain this well enough. It's just following the
|
||||
list of actions from the AMD manual anyway.
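
The bit manipulations in the listing are easy to check in Python. The sketch below applies the same masks to the register values that qemu reports in the testing section (`CR4=00000000`, `CR0=00000011`). The starting value of 0 for EFER is my assumption, and the constant and function names are my own.

```python
CR4_PAE = 1 << 5     # Physical Address Extension
EFER_LME = 1 << 8    # Long Mode Enable
CR0_PG = 1 << 31     # Paging


def enable_long_mode_bits(cr4, efer, cr0):
    """Return (cr4, efer, cr0) after the three `or` instructions (hypothetical helper)."""
    return cr4 | CR4_PAE, efer | EFER_LME, cr0 | CR0_PG


if __name__ == "__main__":
    # Start values: CR4 and CR0 as shown by qemu, EFER assumed to be 0
    cr4, efer, cr0 = enable_long_mode_bits(0x00000000, 0x0, 0x00000011)
    print("CR4=%08x EFER=%016x CR0=%08x" % (cr4, efer, cr0))
```

This only models the bits we set ourselves. The processor additionally sets EFER.LMA (bit 10) on its own once paging is enabled with LME set.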
|
||||
|
||||
> Speaking of comments, I apologize for the unconventional comment style `//;`.
> Assembly is often commented with a `;`, but GAS on x86 treats a semicolon as
> a statement separator, not as the start of a comment. Since I run all my
> files through the gcc preprocessor, I can use C-style comments (`//` or
> `/* */`) instead. Those are, however, not recognized by the GitHub markdown
> syntax coloring engine, and the results look messy with weird colors all over
> the place. That's why I use the combination.
|
||||
|
||||
The only step that's left is reloading the system tables. This is done
in exactly the same way as when going to protected mode, by loading a
GDT, loading selectors, and performing a far jump to load CS.
|
||||
|
||||
`src/kernel/boot/boot.S`
|
||||
```asm
...
    //; Load a new GDT
    lgdt [BootGDTp]

    //; and update the code selector by a long jump
    jmp 0x8:long_mode_start

.code64
long_mode_start:

    //; Clear out all other selectors
    mov eax, 0x0
    mov ss, eax
    mov ds, eax
    mov es, eax

    //; Loop infinitely
    jmp $
```
|
||||
|
||||
And that's all!
|
||||
|
||||
## Testing it out
|
||||
|
||||
Fire up the emulator, make sure the kernel is loaded into gdb, and let's go!
|
||||
|
||||
Let's step through the entire boot process
|
||||
|
||||
    (gdb) b _start
    Breakpoint 1 at 0x91: file boot/boot.S, line 63
    (gdb) c
    Continuing.

    Breakpoint 1, _start () at boot/boot.S:63
    64          cli
    (gdb)
|
||||
|
||||
The first thing that happens is that we set the stack pointer. You can
|
||||
see that this happens by printing `esp`.
|
||||
|
||||
    (gdb) p/x $esp
    $1 = 0x7ff00
    (gdb) si
    65          mov esp, offset BootStack
    (gdb) si
    67          call check_cpuid
    (gdb) p/x $esp
    $2 = 0x5000
|
||||
|
||||
So things seem to work so far.
|
||||
|
||||
The next thing that happens is that the two functions are called to
|
||||
check cpuid and long mode availability. You can step through those and
|
||||
inspect values as you wish. I'll just skip to after returning from
|
||||
check\_longmode.
|
||||
|
||||
You can't print the contents of `CR4` in gdb, but you can read it with
the qemu monitor command `info registers`, which can be called from gdb
through the `monitor` command.
|
||||
|
||||
    _start () at boot/boot.S:72
    72          mov eax, cr4
    (gdb) monitor info registers
    EAX=00000664 EBX=00000000 ECX=00000005 EDX=2193fbfd
    ...
    CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000
    ...
    XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000
    (gdb)
|
||||
|
||||
To simplify things, I wrote a python function to extract individual
registers and put it in `toolchain/gdbinit` - see next section. This
lets you run `reg cr4` to see the register contents.
|
||||
|
||||
    (gdb) reg cr4
    CR4=00000000
    (gdb) n
    73          or eax, 1<<5
    (gdb)
    74          mov cr4, eax
    (gdb)
    77          mov eax, offset BootP4
    (gdb) reg cr4
    CR4=00000020
    (gdb)
|
||||
|
||||
No surprises there, really. Physical Address Extension should now be enabled.
|
||||
|
||||
The next step is loading the page table. This shouldn't actually matter,
|
||||
since paging is disabled, so just step through it and make sure the
|
||||
value loaded into `CR3` is page aligned.
|
||||
|
||||
    82          mov ecx, 0x0C0000080
    (gdb) reg cr3
    CR3=00002000
|
||||
|
||||
The same goes for setting the Long Mode Enable bit of the EFER register.
|
||||
Make sure to remember the value of EFER, though ...
|
||||
|
||||
    89          mov eax, cr0
    (gdb) reg efer
    EFER=0000000000000100
    (gdb) monitor info mem
    PG disabled
    (gdb)
|
||||
|
||||
... because after we enable paging by setting the Paging bit in CR0 ...
|
||||
|
||||
    (gdb) reg cr0
    CR0=00000011
    (gdb) n
    90          or eax, 1<<31
    (gdb)
    91          mov cr0, eax
    (gdb)
    94          lgdt [BootGDTp]
    (gdb) reg cr0
    CR0=80000011
    (gdb)
|
||||
|
||||
... the Long Mode Active bit (bit 10) should also be set.
|
||||
|
||||
    (gdb) reg efer
    EFER=0000000000000500
    (gdb) monitor info mem
    0000000000000000-0000000040000000 0000000040000000 -rw
    (gdb)
|
||||
|
||||
This means the processor is in Long Mode!
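
If you don't feel like counting bits in hex dumps, a tiny Python helper can decode the two EFER values seen above. This is just a sketch; `decode_efer` is a hypothetical name and only the two bits relevant to this chapter are handled.

```python
EFER_LME = 1 << 8    # Long Mode Enable, set by our boot code
EFER_LMA = 1 << 10   # Long Mode Active, set by the processor


def decode_efer(value):
    """Return the names of the long mode related bits set in an EFER value."""
    names = []
    if value & EFER_LME:
        names.append("LME")
    if value & EFER_LMA:
        names.append("LMA")
    return names


if __name__ == "__main__":
    print(decode_efer(0x100))   # before enabling paging
    print(decode_efer(0x500))   # after enabling paging
```

The first value decodes to `LME` only, the second to both `LME` and `LMA`.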
|
||||
|
||||
You'll also see that the command `monitor info mem` (which I mapped to
`mmap` in my gdbinit - see next section) shows that paging is enabled
and that the first GiB of virtual memory is mapped. Also note that the
virtual address space expects addresses of 64 bits now.
|
||||
|
||||
But we're still running 32-bit code, in the compatibility submode of long
mode. That's why we load the GDT and reload the segment selectors next.
|
||||
|
||||
    (gdb) n
    97          jmp 0x8:long_mode_start
    (gdb)
    long_mode_start () at boot/boot.S:103
    103         mov eax, 0x0
    (gdb) reg
    RAX=0000000080000011 RBX=0000000000000000 RCX=00000000c0000080 RDX=0000000000000000
    ...
    CS =0008 0000000000000000 00000000 00209800 DPL=0 CS64 [---]
    ...
    (gdb)
|
||||
|
||||
You'll note that the `reg` command now outputs registers with the R
|
||||
prefix (`RAX` instead of `EAX` etc.) and that they are twice as big.
|
||||
|
||||
We are now running 64 bit code!
|
||||
|
||||
## Bonus
|
||||
|
||||
I mentioned two custom gdb commands in the previous section.
|
||||
|
||||
`mmap` which shows the memory map from qemu, and `reg` which prints the
|
||||
value of a register. Those are defined in `toolchain/gdbinit`:
|
||||
|
||||
```
define mmap
    monitor info mem
end

python

import re

class Reg(gdb.Command):

    def __init__(self):
        super(Reg, self).__init__("reg", gdb.COMMAND_USER)

    def invoke(self, arg, from_tty):
        regs = gdb.execute('monitor info registers', False, True)

        if not arg:
            # If no argument was given, print the output from qemu
            print(regs)
            return

        if arg.upper() in ['CS', 'DS', 'ES', 'FS', 'GS', 'SS']:
            # Segment selector lines may contain equals signs
            for l in regs.splitlines():
                if l.startswith(arg.upper()):
                    print(l)
        elif arg.upper() in ['EFL', 'RFL']:
            # The xFLAGS register lines contain equals signs
            for l in regs.splitlines():
                if arg.upper() in l:
                    # The xFLAGS register is the second one on the line
                    print(' '.join(l.split()[1:]))
        else:
            # Split at any word followed by an equals sign
            # Clean up both sides of the split and put into a dictionary
            # then print the requested register value
            regex = re.compile(r"[A-Z0-9]+\s?=")
            names = [v[:-1].strip() for v in regex.findall(regs)]
            values = [v.strip() for v in regex.split(regs)][1:]
            regs = dict(zip(names, values))
            print("%s=%s" % (arg.upper(), regs[arg.upper()]))


Reg()

end
```
|
||||
|
||||
The `mmap` command is obvious enough, but the `reg` one is a bit tougher.
|
||||
A bit of information on the syntax of python commands in gdb can be
|
||||
found [here](https://sourceware.org/gdb/onlinedocs/gdb/Python-API.html).
|
||||
|
||||
The rest is some rather messy python, but the basic flow is this:

- Get the register output from qemu by running `monitor info registers`
- If no argument was given, print the output we got and end
- Look for text in the format `SOMETHING=SOMETHINGELSE` and split it into `SOMETHING` and `SOMETHINGELSE`
- Put `SOMETHING` and `SOMETHINGELSE` back together in a way that's more useful for python
- Print the value we want
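
The split-and-zip step can be tried outside gdb as well. Below is a standalone sketch of just that part, run against a shortened, hand-copied sample in the style of qemu's `info registers` output. The function name `parse_registers` and the `SAMPLE` string are my own; the real command gets its text from `gdb.execute`.

```python
import re

# Shortened sample in the style of `monitor info registers`;
# the real output has many more lines.
SAMPLE = """EAX=00000664 EBX=00000000 ECX=00000005 EDX=2193fbfd
CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000
"""


def parse_registers(text):
    """Turn NAME=VALUE pairs into a dictionary (hypothetical helper)."""
    regex = re.compile(r"[A-Z0-9]+\s?=")
    # Everything matched by the regex is a name followed by '='
    names = [m[:-1].strip() for m in regex.findall(text)]
    # Everything between the matches is a value; the first piece is empty
    values = [v.strip() for v in regex.split(text)][1:]
    return dict(zip(names, values))


if __name__ == "__main__":
    regs = parse_registers(SAMPLE)
    print("CR0=%s" % regs["CR0"])
```

This works for simple lines like the ones in the sample, but not for lines where the value itself contains equals signs, which is why the segment and flags registers get special treatment.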
|
||||
|
||||
Then there are some special cases for things like `EFL`, whose output contains
equals signs, e.g. when displaying bit flags:
`EFL=0000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0`. Note that I'm not
catching all such cases, but only the ones I think might be interesting.
|
||||
|
||||
Note also that gdb doesn't require the entire command name, but only enough to
|
||||
make it unambiguous. As such, you can run `(gdb) mm` instead of `(gdb) mmap` if
|
||||
you'd like. Just a heads up...
|
||||
|
@ -5,4 +5,5 @@
|
||||
[Chapter 0: Introduction](0_Introduction.md)<br>
|
||||
[Chapter 1: Toolchain](1_Toolchain.md)<br>
|
||||
[Chapter 2: Booting a Kernel](2_A_Bootable_Kernel.md)<br>
|
||||
[Chapter 3: Activate Long Mode](3_Activate_Long_Mode.md)<br>
|
||||
|
||||
|