thomasloven.com/pages/2013-06-07-System-Calls.md


layout: post
title: "System calls"
subtitle: "Bend the stack to your will"
tags: [osdev]
System calls are the way user processes communicate with the kernel. Look
at the following program, for example.

    :::c
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        printf("Hello, world!");
        return 0;
    }

When you run the program, even before it is started, the shell makes a
couple of system calls such as `fork()` and `exec()`. The program itself
then makes several more system calls before it reaches the `write()` and
`exit()` calls that the two lines in `main()` boil down to.
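
To make those last two calls visible, here is a minimal sketch that skips `printf()` and asks the kernel for the same work directly, assuming a POSIX system. The function names `hello_with_write` and `finish` are made up for this illustration, and a real `printf()` also buffers its output before it ever calls `write()`.

```c
#include <unistd.h>

/* Roughly what printf("Hello, world!") ends up asking the kernel to do:
   write the bytes to file descriptor 1 (standard output).
   Returns the number of bytes written, or -1 on error. */
ssize_t hello_with_write(void)
{
    static const char msg[] = "Hello, world!";
    return write(STDOUT_FILENO, msg, sizeof msg - 1); /* skip the '\0' */
}

/* Roughly where `return 0;` from main() ends up: the exit system call.
   _exit() goes straight to the kernel, skipping the stdio flushing
   that exit() performs first. This function does not return. */
void finish(void)
{
    _exit(0);
}
```

A `main()` that calls `hello_with_write()` followed by `finish()` behaves much like the program above.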

System calls can be performed in several ways, but one of the most
common is a special software interrupt raised with the `int`
instruction. For example, Linux (on 32-bit x86) and most Unix-like hobby
kernels I've studied use `int 0x80`. That's also what I chose to use in
my kernel.

Next is the problem of passing data. The simplest way is to use
registers, and that's what most projects seem to do. I chose instead
a combination of a single register and the process's own stack.

###Sample system call
Let's look at how `read()` would be implemented. I've not actually
implemented it in my kernel yet, but here's how it would work.
####User side
First, the definition in the C library:

    :::c
    int read(int file, char *ptr, int len)
    {
        return _syscall_read(file, ptr, len);
    }

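This is simply a wrapper for an assembly function:

    :::nasm
    [global _syscall_read]
    _syscall_read:
        mov eax, SYSCALL_READ     ; system call number goes in eax
        int 0x80                  ; trap into the kernel
        mov [syscall_error], edx  ; the kernel's error code comes back in edx
        ret                       ; the return value is already in eax

This function puts an identifier for the system call in the `eax`
register and then executes the system call interrupt.
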
_Note:_ Here I return the error code through register `edx`. In the
actual code at this point, I used the register `ebx`, which `cdecl`
expects a function to preserve, whereas `edx` may be freely overwritten.
I should have looked up
[Calling Conventions](http://wiki.osdev.org/Calling_Conventions) more carefully.
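
What happens to `syscall_error` afterwards is not shown here. One possible use, purely as an assumption, is to copy it into `errno` in the C wrapper. In the sketch below `fake_syscall_read` is a stand-in for the assembly stub so that it runs as an ordinary program, and `read_with_errno` is a hypothetical name chosen to avoid clashing with the real `read()`.

```c
#include <errno.h>

/* In the real stub this is written from edx right after `int 0x80`. */
int syscall_error;

/* Stand-in for the assembly stub (hypothetical): pretends the kernel
   rejects negative file descriptors and otherwise "reads" len bytes. */
int fake_syscall_read(int file, char *ptr, int len)
{
    (void)ptr; /* unused in this simulation */
    syscall_error = (file < 0) ? EBADF : 0;
    return (file < 0) ? -1 : len;
}

/* Hypothetical variant of the library wrapper that publishes the
   kernel's error code through errno when the call failed. */
int read_with_errno(int file, char *ptr, int len)
{
    int ret = fake_syscall_read(file, ptr, len);
    if (syscall_error)
        errno = syscall_error;
    return ret;
}
```

With this arrangement, user code can check `errno` after a failed call just as it would on any other Unix-like system.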

Of course, this can be simplified with a macro to:

    :::nasm
    [global _syscall_read]
    DEF_SYSCALL(read, SYSCALL_READ)

####Kernel side
In the kernel, the system call is caught by the following function:

    :::c
    registers_t *syscall_handler(registers_t *r)
    {
        /* eax holds the system call number set by the user-side stub */
        if(syscall_handlers[r->eax])
            r = syscall_handlers[r->eax](r);
        else
            r->edx = ERR_NOSYSCALL;  /* no handler registered */
        return r;
    }

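
The dispatch above can be tried out in user space. This is a minimal sketch, not the kernel's code: `registers_t` is cut down to two fields, the values of `SYSCALL_READ` and `ERR_NOSYSCALL` are invented, and `NUM_SYSCALLS`, `register_syscall()` and `fake_read()` are hypothetical helpers. It also adds a bounds check, since `eax` comes straight from user code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values; the real ones are defined in the kernel. */
#define NUM_SYSCALLS  256
#define SYSCALL_READ  3
#define ERR_NOSYSCALL 1

/* Only the registers this sketch touches; the kernel's registers_t
   holds the whole interrupt frame. */
typedef struct {
    uint32_t eax;  /* in: system call number, out: return value */
    uint32_t edx;  /* out: error code */
} registers_t;

typedef registers_t *(*syscall_t)(registers_t *);

/* File-scope array is zero-initialised, so unregistered slots are NULL. */
syscall_t syscall_handlers[NUM_SYSCALLS];

/* Hypothetical registration helper: store a handler under its number. */
void register_syscall(uint32_t num, syscall_t handler)
{
    if (num < NUM_SYSCALLS)
        syscall_handlers[num] = handler;
}

/* Same dispatch as above, plus a bounds check on the user-supplied number. */
registers_t *syscall_handler(registers_t *r)
{
    if (r->eax < NUM_SYSCALLS && syscall_handlers[r->eax])
        r = syscall_handlers[r->eax](r);
    else
        r->edx = ERR_NOSYSCALL;
    return r;
}

/* Stand-in handler: reports 42 bytes read and no error. */
registers_t *fake_read(registers_t *r)
{
    r->eax = 42;
    r->edx = 0;
    return r;
}
```

After `register_syscall(SYSCALL_READ, fake_read)`, passing a frame with `eax = SYSCALL_READ` to `syscall_handler()` leaves 42 in `eax`, while any unregistered or out-of-range number leaves `ERR_NOSYSCALL` in `edx`.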
If the system call has been registered in the kernel (through the
macro `KREG_SYSCALL(read, SYSCALL_READ)`), this passes everything
on to the following function:

    :::c
    KDEF_SYSCALL(read, r)
    {
        /* The arguments live on the calling process's stack */
        process_stack stack = init_pstack();
        r->eax = read((int)stack[0], (char *)stack[1], (int)stack[2]);
        r->edx = errno;  /* the error code travels back in edx */
        return r;
    }

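The `init_pstack()` macro expands to `(uintptr_t *)(r->useresp + 0x4)`.
Since `r->useresp` is the user stack pointer at the time of the
interrupt, it points at the return address pushed by the call to
`_syscall_read`. Adding `0x4` skips that address, which lets us read the
arguments passed to the system call from where the caller pushed them.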
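
The stack trick can be simulated in an ordinary program. The sketch below is an illustration under several assumptions: `registers_t` is reduced to `useresp`, the user stack is a plain array, and the slot size is `sizeof(uintptr_t)` instead of the literal `0x4` so that it also behaves on a 64-bit host. `third_argument()` and `demo_len()` are hypothetical names.

```c
#include <stdint.h>

/* Only the field this sketch needs from the kernel's registers_t. */
typedef struct {
    uintptr_t useresp;  /* user stack pointer saved at interrupt time */
} registers_t;

/* On 32-bit x86 one stack slot is 4 bytes, hence the 0x4 in the kernel.
   sizeof is used here so the simulation also works on a 64-bit host. */
#define SLOT (sizeof(uintptr_t))
#define init_pstack() ((uintptr_t *)(r->useresp + SLOT))

/* Layout seen by the kernel when `int 0x80` fires inside _syscall_read:
     useresp + 0*SLOT : return address into read()
     useresp + 1*SLOT : file
     useresp + 2*SLOT : ptr
     useresp + 3*SLOT : len                                            */
int third_argument(registers_t *r)
{
    uintptr_t *stack = init_pstack();  /* now stack[0] is `file` */
    return (int)stack[2];              /* and stack[2] is `len`  */
}

/* Build a fake user stack and let the "kernel" read len from it. */
int demo_len(void)
{
    char buffer[16];
    uintptr_t user_stack[4] = {
        0xdeadbeef,         /* pretend return address */
        0,                  /* file */
        (uintptr_t)buffer,  /* ptr  */
        sizeof buffer       /* len  */
    };
    registers_t regs = { .useresp = (uintptr_t)user_stack };
    return third_argument(&regs);
}
```

`demo_len()` returns 16, the `len` value placed in the fake stack, which shows that skipping one slot lines `stack[0]` up with the first argument.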

The kernel's `read()` function then has the same signature as the library version.

    :::c
    int read(int file, char *ptr, int len)
    {
        ...
    }

_Spoiler alert:_ Keeping a version of `read()` (and in fact every
syscall function) inside the kernel will turn out to have some really
cool advantages...
This works for C compiled with the `cdecl` calling convention. For other
languages or calling conventions, the asm functions will have to be
adjusted.
###Git
The methods described in this post have been implemented in git commit
[8a26e26163](https://github.com/thomasloven/os5/tree/8a26e26163c15c9d9854554dce9d4fc5ad8baee5).