457 lines
12 KiB
Markdown
457 lines
12 KiB
Markdown
# Chapter 3 - Entering Long Mode
|
|
|
|
In this chapter, we'll put the processor in long mode with minimal
|
|
possible effort.
|
|
|
|
|
|
## Preparation
|
|
|
|
The AMD64 manual volume 4 outlines what needs to be done in order to
|
|
actiave long mode (chapter 14). It says we need:
|
|
|
|
- An IDT with 64 bit interrupt-gate descriptors *We don't need this as long as
|
|
interrupts are disabled*
|
|
- 64-bit interrupt and exception handlers *See above*
|
|
- A GDT containing:
|
|
- Any LDT descriptors *We don't have any*
|
|
- A TSS descriptor *Only needed when we want to enter User mode*
|
|
- Code descriptors for long mode code *One is enough for now*
|
|
- Data-segment descriptors for software running in compatibility mode *We
|
|
don't have that*
|
|
- FS and GS data-segment descriptors *We won't be using those*
|
|
- A 64-bit TSS *See note about TSS descriptor above*
|
|
- The 4-level page translation tables
|
|
|
|
So if we bring it down to the essentials:
|
|
|
|
- A GDT with one entry
|
|
- A Page Table
|
|
|
|
Shouldn't be too hard. In fact, for now we can actually pretty much
|
|
hardcode those...
|
|
|
|
## GDT
|
|
|
|
In long mode, segmentation and the GDT doesn't really fill any
|
|
purpose... It's still required, for some reason, but if you read the
|
|
AMD manual, you'll see that in long mode, almost all fields of the GDT
|
|
entries are ignored.
|
|
|
|
What's left can be set up like this:
|
|
|
|
`src/kernel/boot/boot_GDT.S`
|
|
```asm
|
|
#include <gdt.h>
|
|
.intel_syntax noprefix
|
|
|
|
.section .rodata
|
|
.global BootGDT
|
|
.global BootGDTp
|
|
|
|
BootGDT:
|
|
.long 0,0
|
|
.long 0, (GDT_PRESENT | GDT_CODE | GDT_LONG)
|
|
|
|
BootGDTp:
|
|
.short 2*8-1
|
|
.quad offset BootGDT
|
|
```
|
|
|
|
where
|
|
- `GDT_PRESENT = 1<<15`
|
|
- `GDT_CODE = 3<<11`
|
|
- `GDT_LONG = 1<<21`
|
|
|
|
The GDT is page aligned, of course, and the GDT pointer is configured in
|
|
the same way as in 32 bit mode.
|
|
|
|
|
|
## Page Tables
|
|
|
|
Paging works pretty much exactly the same way in 64 bit mode as in
|
|
32, but with four levels of nested tables instead of two. If you have
|
|
trouble wrapping your head around it, chapter 5 *Page Translation and
|
|
Protection* of the AMD64 Systems programming manual should help.
|
|
|
|
The four levels do have names, "Page-Map Level-4 Table", "Page-Directory
|
|
Pointer Table", "Page Directory Table" and "Page Table", but I like to
|
|
think of them as P4, P3, P2 and P1.
|
|
|
|
We could make use of the 2 Mb page translation feature, which uses only three
|
|
levels. I.e. the entries of P2 points directly at the start of a 2 Mb memory
|
|
area rather than at a P1. This is indicated by a special flag in the P2 entry.
|
|
Doing so would make the memory management a bit more complicated later, though,
|
|
so I won't use that for now.
|
|
|
|
For now, we'll just identity map the first two megabytes of memory. That should
|
|
be enough to get the kernel started. So we just need a P4 where the first
|
|
entry points to a P3 where the first entry points to a P2 where the first entry
|
|
points to a P1 filled with 512 entries ranging from 0 to 2 mb.
|
|
|
|
`src/kernel/boot/boot_PT.S`
|
|
```asm
|
|
.#include <memory.h>
|
|
.intel_syntax noprefix
|
|
|
|
.section .data
|
|
.align PAGE_SIZE
|
|
.global BootP4
|
|
|
|
BootP4:
|
|
.quad offset BootP3 + (PAGE_PRESENT | PAGE_WRITE)
|
|
.rept ENTRIES_PER_PT - 1
|
|
.quad 0
|
|
.endr
|
|
BootP3:
|
|
.quad offset BootP2 + (PAGE_PRESENT | PAGE_WRITE)
|
|
.rept ENTRIES_PER_PT - 1
|
|
.quad 0
|
|
.endr
|
|
BootP2:
|
|
.quad offset BootP1 + (PAGE_PRESENT | PAGE_WRITE)
|
|
.rept ENTRIES_PER_PT - 1
|
|
.quad 0
|
|
.endr
|
|
BootP1:
|
|
.set i, 0
|
|
.rept ENTRIES_PER_PT
|
|
.quad (i << 12) + (PAGE_PRESENT | PAGE_WRITE)
|
|
.set i, (i+1)
|
|
.endr
|
|
```
|
|
|
|
where
|
|
- `PAGE_PRESENT = 0x001`
|
|
- `PAGE_WRITE = 0x002`
|
|
- `ENTRIES_PER_PT = 512`
|
|
|
|
|
|
## Activating Long Mode
|
|
|
|
Again, consulting the AMD64 manual we find the following steps to
|
|
activate long mode:
|
|
|
|
1. Disable paging *Paging isn't enabled by GRUB, so we're good to go*
|
|
2. In any order:
|
|
- Enable PAE by setting CR4.PAE to 1
|
|
- Load CR3 with the address of P4
|
|
- Enable long mode by setting EFER.LME to 1
|
|
3. Enable paging
|
|
|
|
We should then reload the system tables (in our case only GDT) with 64
|
|
bit descriptors.
|
|
|
|
The manual is even kind enough to supply us with some sample code which
|
|
also performs some checks to ensure that long mode is available. So
|
|
let's go.
|
|
|
|
`src/kernel/boot/boot.S`
|
|
|
|
```asm
|
|
...
|
|
.code32
|
|
.global _start
|
|
_start:
|
|
cli
|
|
mov esp, offset BootStack
|
|
...
|
|
```
|
|
|
|
First we set up a temporary stack for booting. The label BootStack is
|
|
defined earlier:
|
|
|
|
```asm
|
|
.section .bss
|
|
.align PAGE_SIZE
|
|
.skip PAGE_SIZE
|
|
BootStack:
|
|
```
|
|
|
|
Note that the label is after the reserved memory, since the stack grows upwards.
|
|
|
|
If you wish to make things The Right Way, you should probably check if the
|
|
processor supports long mode before going further. This can be done through
|
|
the `cpuid` instruction and the process is described in the AMD64 manual. I
|
|
opted to skip this check, and just fail in an uncontrolled manner in the
|
|
unlikely event that the code is run on 32 bit processor.
|
|
|
|
Ok. Let's get to the meat of it
|
|
|
|
`src/kernel/boot/boot.S`
|
|
```asm
|
|
...
|
|
//; Set CR4.PAE
|
|
//; enabling Page Address Extension
|
|
mov eax, cr4
|
|
or eax, 1<<5
|
|
mov cr4, eax
|
|
|
|
//; Load a P4 page table
|
|
mov eax, offset BootP4
|
|
mov cr3, eax
|
|
|
|
//; Set EFER.LME
|
|
//; enabling Long Mode
|
|
mov ecx, 0x0C0000080
|
|
rdmsr
|
|
or eax, 1<<8
|
|
wrmsr
|
|
|
|
//; Set CR0.PG
|
|
//; enabling Paging
|
|
mov eax, cr0
|
|
or eax, 1<<31
|
|
mov cr0, eax
|
|
...
|
|
```
|
|
|
|
I think the comments explain this well enough. It's just following the
|
|
list of actions from the AMD manual anyway.
|
|
|
|
> Speaking of comments, I apologize for the unconventional comment style `//;`.
|
|
> Normally GAS assembly is commented by a `;`, but I run all my files through
|
|
> the gcc preprocessor, which interprets semicolon as the end of a line.
|
|
> Instead, I have to use c-style comments (`//` or `/* */`). Those are,
|
|
> however, not recognized by the github markdown syntax coloring engine, and
|
|
> the results look messy with weird colors all over the place. That's why I use
|
|
> the combination.
|
|
|
|
The only step that's left is reloading the system tables. This is done
|
|
in exactly the same way as when going to protected mode, by loading a
|
|
GDT, loading selectors, and performing a long jump to load CS.
|
|
|
|
`src/kernel/boot/boot.S`
|
|
```asm
|
|
...
|
|
//; Load a new GDT
|
|
lgdt [BootGDTp]
|
|
|
|
//; and update the code selector by a long jump
|
|
jmp 0x8:long_mode_start
|
|
|
|
.code64
|
|
long_mode_start:
|
|
|
|
//; Clear out all other selectors
|
|
mov eax, 0x0
|
|
mov ss, eax
|
|
mov ds, eax
|
|
mov es, eax
|
|
|
|
//; Loop infinitely
|
|
jmp $
|
|
```
|
|
|
|
And that's all!
|
|
|
|
## Testing it out
|
|
|
|
Fire up the emulator, make sure the kernel is loaded into gdb, and let's go!
|
|
|
|
Let's step through the entire boot process
|
|
|
|
(gdb) b _start
|
|
Breakpoint 1 at 0x91: file boot/boot.S, line 63
|
|
(gdb) c
|
|
Continuing.
|
|
|
|
Breakpoint 1, _start () at boot/boot.S:63
|
|
64 cli
|
|
(gdb)
|
|
|
|
The first thing that happens is that we set the stack pointer. You can
|
|
see that this happens by printing `esp`.
|
|
|
|
(gdb)p/x $esp
|
|
$1 = 0x7ff00
|
|
(gdb)si
|
|
65 mov esp, offset BootStack
|
|
(gdb)si
|
|
67 call check_cpuid
|
|
(gdb)p/x $esp
|
|
$2 = 0x5000
|
|
|
|
So things seem to work so far.
|
|
|
|
The next thing that happens is that the two functions are called to
|
|
check cpuid and long mode availability. You can step through those and
|
|
inspect values as you wish. I'll just skip to after returning from
|
|
check\_longmode.
|
|
|
|
You can't print the contents of `CR4` in gdb, but you can read it from
|
|
the qemu monitor command `info registers` which can be called from gdb
|
|
by the `monitor command`.
|
|
|
|
_start () at boot/boot.S:72
|
|
72 mov eax, cr4
|
|
(gdb) monitor info registers
|
|
EAX=00000664 EBX=00000000 ECX=00000005 EDX=2193fbfd
|
|
...
|
|
CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000
|
|
...
|
|
XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000
|
|
(gdb)
|
|
|
|
To simplify things, I wrote a python function to extract individual
|
|
registers and put it in `toolchain/gdbinit` - see next section. This
|
|
let's you run `reg cr4` to se the register contents.
|
|
|
|
(gdb) reg cr4
|
|
CR4=00000000
|
|
(gdb) n
|
|
73 or eax, 1<<5
|
|
(gdb)
|
|
74 mov cr4, eax
|
|
(gdb)
|
|
77 mov eax, offset BootP4
|
|
(gdb) reg cr4
|
|
CR4=00000020
|
|
(gdb)
|
|
|
|
No surprises there, really. Page address extension should now be enabled.
|
|
|
|
The next step is loading the page table. This shouldn't actually matter,
|
|
since paging is disabled, so just step through it and make sure the
|
|
value loaded into `CR3` is page aligned.
|
|
|
|
82 mov ecx, 0x8C000080
|
|
(gdb) reg cr3
|
|
CR3=00002000
|
|
|
|
The same goes for setting the Long Mode Enable bit of the EFER register.
|
|
Make sure to remember the value of EFER, though ...
|
|
|
|
89 mov eax, cr0
|
|
(gdb) reg efer
|
|
EFER=0000000000000100
|
|
(gdb) monitor info mem
|
|
PG disabled
|
|
(gdb)
|
|
|
|
... because after we enable paging by setting the Paging bit in CR0 ...
|
|
|
|
(gdb) reg cr0
|
|
CR0=00000011
|
|
(gdb) n
|
|
90 or eax, 1<<31
|
|
(gdb)
|
|
91 mov cr0, eax
|
|
(gdb)
|
|
94 lgdt [BootGDTp]
|
|
(gdb) reg cr0
|
|
CR0=80000011
|
|
(gdb)
|
|
|
|
... the Long Mode Active bit (bit 10) should also be set.
|
|
|
|
(gdb) reg efer
|
|
EFER=0000000000000500
|
|
(gdb) monitor info mem
|
|
0000000000000000-0000000040000000 0000000040000000 -rw
|
|
(gdb)
|
|
|
|
This means the processor is in Long Mode!
|
|
|
|
You'll also see that the command `monitor info mem` (which I mapped to
|
|
`mmap` in my gdbinit - see next section) show that paging is enabled
|
|
and that the first Gb of virtual memory is mapped. Also note that the
|
|
virtual address space expects addresses of 64 bits now.
|
|
|
|
But we're still running code in Legacy Mode. That's why we load the GDT
|
|
and reload the segment selectors next.
|
|
|
|
(gdb) n
|
|
97 jmp 0x8:long_mode_start
|
|
(gdb)
|
|
long_mode_start () at boot/boot.S:103
|
|
103 mov eax, 0x0
|
|
(gdb) reg
|
|
RAX=0000000080000011 RBX=0000000000000000 RCX=00000000c0000080 RDX=0000000000000000
|
|
...
|
|
CS =0008 0000000000000000 00000000 00209800 DPL=0 CS64 [---]
|
|
...
|
|
(gdb)
|
|
|
|
You'll note that the `reg` command now outputs registers with the R
|
|
prefix (`RAX` instead of `EAX` etc.) and that they are twice as big.
|
|
|
|
We are now running 64 bit code!
|
|
|
|
## Bonus
|
|
|
|
I mentioned two custom gdb commands in the previous section.
|
|
|
|
`mmap` which shows the memory map from qemu, and `reg` which prints the
|
|
value of a register. Those are defined in `toolchain/gdbinit`:
|
|
|
|
```
|
|
define mmap
|
|
monitor info mem
|
|
end
|
|
|
|
python
|
|
|
|
import re
|
|
|
|
class Reg(gdb.Command):
|
|
|
|
def __init__(self):
|
|
super(Reg, self).__init__("reg", gdb.COMMAND_USER)
|
|
|
|
def invoke(self, arg, from_tty):
|
|
regs = gdb.execute('monitor info registers', False, True)
|
|
|
|
if not arg:
|
|
# If no argument was given, print the output from qemu
|
|
print regs
|
|
return
|
|
|
|
if arg.upper() in ['CS', 'DS', 'ES', 'FS', 'GS', 'SS']:
|
|
# Code selectors may contain equals signs
|
|
for l in regs.splitlines():
|
|
if l.startswith(arg.upper()):
|
|
print l
|
|
elif arg.upper() in ['EFL', 'RFL']:
|
|
# The xFLAGS registers contains equals signs
|
|
for l in regs.splitlines():
|
|
if arg.upper() in l:
|
|
print ' '.join(l.split()[1:])
|
|
# The xFLAGS register is the second one on the line
|
|
else:
|
|
# Split at any word followed by and equals sign
|
|
# Clean up both sides of the split and put into a dictionary
|
|
# then print the requested register value
|
|
regex = re.compile("[A-Z0-9]+\s?=")
|
|
names = [v[:-1].strip() for v in regex.findall(regs)]
|
|
values = [v.strip() for v in regex.split(regs)][1:]
|
|
regs = dict(zip(names, values))
|
|
print "%s=%s" % (arg.upper(), regs[arg.upper()])
|
|
|
|
|
|
Reg()
|
|
|
|
end
|
|
```
|
|
|
|
The `mmap` command is obvious enough, but the `reg` one is a bit tougher.
|
|
A bit of information on the syntax of python commands in gdb can be
|
|
found [here](https://sourceware.org/gdb/onlinedocs/gdb/Python-API.html).
|
|
|
|
The rest is some rather messy python, but the basic flow is this
|
|
|
|
- Get the register output from grub by running `monitor info registers`
|
|
- If no argument was given, print the output we got and end
|
|
- Look for text in the format `SOMETHING=SOMETHINGELSE` and split it into `SOMETHING` and `SOMETHINGELSE`
|
|
- Put `SOMETHING` and `SOMETHINGELSE` back together in a way that's more useful for python
|
|
- Print the value we want
|
|
|
|
Then there are some special cases for things like `EFL` which contains
|
|
equals signs in the output. E.g. displaying bit flags `EFL=0000002
|
|
[-------] CPL=0 II=0 A20=1 SMM=0 HLT=0`. Note that I'm not catching all
|
|
such cases, but only the ones I think might be interesting.
|
|
|
|
Note also that gdb doesn't require the entire command name, but only enough to
|
|
make it unambiguous. As such, you can run `(gdb) mm` instead of `(gdb) mmap` if
|
|
you'd like. Just a heads up...
|
|
|