# Using SSE
Streaming SIMD Extensions (SSE) is a set of processor instructions that speed up floating point math by operating on several values at once. They have been around for over 20 years, but are not enabled by default.
The reason for this is that the SSE instructions use their own set of CPU registers, and the OS needs to be aware of them so that they can be saved and restored on context switches.
TODO: Three versions of SSE instructions and registers?
## Preserving registers
Saving and restoring the SSE registers is actually very easy, and there are even dedicated instructions for it:
```asm
.global sse_save
sse_save:
fxsave [rdi]
ret
.global sse_restore
sse_restore:
fxrstor [rdi]
ret
```
Those functions save or restore the SSE registers to or from the 512 byte buffer passed as the first argument (in `rdi`).
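To call these routines from C they need declarations. The following is a minimal sketch, not this project's actual header: `struct sse_state` is a hypothetical name introduced here, and the typed prototypes are an assumption (the functions could just as well take a plain `void *`). The type records the 512 byte size of the buffer and the 16 byte alignment that `fxsave` and `fxrstor` require, so the compiler can check both.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical buffer type for the FXSAVE/FXRSTOR area (name is an assumption). */
struct sse_state {
    alignas(16) uint8_t data[512];
};

/* Compile-time checks: wrong size or alignment fails the build. */
static_assert(sizeof(struct sse_state) == 512, "FXSAVE area is 512 bytes");
static_assert(alignof(struct sse_state) == 16, "FXSAVE area is 16 byte aligned");

/* Implemented by the assembly routines; the pointer arrives in rdi. */
void sse_save(struct sse_state *state);
void sse_restore(const struct sse_state *state);
```

The alignment attribute only helps for objects the compiler places itself, such as static or stack variables. Memory that comes from an allocator is only as aligned as the allocator makes it.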
A good time to save and restore is around the stack switch: restore the registers right before switching to a new thread, and save them right after the switch returns:
```c
process *next = scheduler_next();
...
sse_restore(next->sse);
switch_stack(scheduler->stack_ptr, next->stack_ptr);
sse_save(next->sse);
...
```
TODO: Can SSE usage be checked through exceptions?
Now, before moving on to how to actually enable SSE, there's one pitfall to look out for.
The memory location for storing the SSE registers MUST BE 16 BYTE ALIGNED, otherwise `fxsave` and `fxrstor` raise a general protection fault.
The easiest way to get this is to allocate 16 bytes extra and round the pointer up to the next 16 byte boundary:
TODO: Make sure this is freed correctly!
```c
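struct process *new_process() {
...
// Over-allocate by 16 bytes so an aligned 512 byte window always fits.
void *sse = malloc(512+16);
// Round up to the next 16 byte boundary. free() needs the unadjusted
// pointer, so that one has to be kept somewhere as well.
new->sse = (void *) (((uintptr_t)sse + 0xF) & ~(uintptr_t)0xF);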
```
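Rounding the pointer up means the value stored in the process is no longer the one `malloc` returned, so it cannot be passed to `free`. One possible way to handle this, sketched below, is to store both pointers. The `struct process` here is a hypothetical stand-in with only the two relevant fields, and `sse_raw`, `process_alloc_sse` and `process_free_sse` are names made up for this example. It also assumes a hosted C environment with `malloc`, `free` and `assert` available.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SSE_AREA_SIZE  512  /* size of the FXSAVE/FXRSTOR area */
#define SSE_AREA_ALIGN 16   /* alignment the instructions require */

/* Hypothetical stand-in for the real process structure. */
struct process {
    void *sse_raw; /* pointer returned by malloc(); the one to free() */
    void *sse;     /* aligned pointer inside sse_raw, for sse_save/sse_restore */
};

/* Allocate the save area. Returns 0 on success, -1 if malloc() fails. */
int process_alloc_sse(struct process *p)
{
    /* At most 15 bytes of padding are needed, so 16 extra always suffice. */
    p->sse_raw = malloc(SSE_AREA_SIZE + SSE_AREA_ALIGN);
    if (p->sse_raw == NULL) {
        p->sse = NULL;
        return -1;
    }

    /* Round up to the next multiple of 16. */
    uintptr_t addr = (uintptr_t)p->sse_raw;
    addr = (addr + (SSE_AREA_ALIGN - 1)) & ~(uintptr_t)(SSE_AREA_ALIGN - 1);
    assert((addr & (SSE_AREA_ALIGN - 1)) == 0);

    p->sse = (void *)addr;
    return 0;
}

/* Release the save area using the original pointer, never the adjusted one. */
void process_free_sse(struct process *p)
{
    free(p->sse_raw);
    p->sse_raw = NULL;
    p->sse = NULL;
}
```

The cost is one extra pointer per process. An allocator that can hand out 16 byte aligned memory directly would make the second pointer unnecessary.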
## Enabling SSE
Ok, so finally - enabling SSE.
This is simply done by setting or clearing four control register bits.
```asm
sse_init:
mov rax, cr4
or rax, 1<<9 //; Enable Operating System FXSAVE/FXRSTOR Support bit (OSFXSR)
or rax, 1<<10 //; Enable Operating System Unmasked Exception Support bit (OSXMMEXCPT)
mov cr4, rax
mov rax, cr0
or rax, 1<<1 //; Enable Monitor Coprocessor bit (MP)
and rax, ~(1<<2) //; Clear Emulate Coprocessor bit (EM)
mov cr0, rax
ret
```
And that's it.
We can now use floating point math!
TODO: Find out and explain what the flags actually do and why they are all needed.
OSFXSR: Enables the legacy SSE instructions and indicates that the OS saves the 64 and 128 bit registers on task switches
OSXMMEXCPT: Indicates that the SIMD floating point exception (#XF, called #XM in Intel's manuals) is handled by the OS
MP: CHECK IF THIS IS REQUIRED HERE OR ALREADY SET
EM: Must be cleared to 0 when the processor has an x87 FPU, since SSE instructions raise #UD while it is set - check if this is already clear
Control registers before `sse_init`: CR0 = 0x80000011 (EM already clear, MP not set), CR4 = 0x00000020
Control registers after `sse_init`: CR0 = 0x80000013, CR4 = 0x00000620