thomasloven.com/pages/2013-08-21-Loading-Elf.md

layout: post
title: "Loading elf"
subtitle: "there's DWARF in my ELF."
tags: [osdev]

### Elf header format

ELF files all start with a header that identifies the file and explains
where to find everything. It has the following structure (this is the
32-bit variant). The
[ELF specification](http://www.skyfree.org/linux/references/ELF_Format.pdf)
gives an excellent description of the meaning and use of each field.

    :::c
    typedef struct
    {
        uint8_t identity[16];
        uint16_t type;
        uint16_t machine;
        uint32_t version;
        uint32_t entry;
        uint32_t ph_offset;
        uint32_t sh_offset;
        uint32_t flags;
        uint16_t header_size;
        uint16_t ph_size;
        uint16_t ph_num;
        uint16_t sh_size;
        uint16_t sh_num;
        uint16_t strtab_index;
    }__attribute__((packed)) elf_header;
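
A wrong field width silently shifts every field after it, so it can be
worth letting the compiler check the layout. The sketch below repeats the
struct under a hypothetical name (`elf32_header_check`, not part of my
kernel) and asserts the size and a few offsets that the 32-bit ELF format
prescribes. It assumes a C11 compiler with GCC-style attributes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same fields as the header above; the name is hypothetical. */
typedef struct
{
    uint8_t identity[16];
    uint16_t type;
    uint16_t machine;
    uint32_t version;
    uint32_t entry;
    uint32_t ph_offset;
    uint32_t sh_offset;
    uint32_t flags;
    uint16_t header_size;
    uint16_t ph_size;
    uint16_t ph_num;
    uint16_t sh_size;
    uint16_t sh_num;
    uint16_t strtab_index;
} __attribute__((packed)) elf32_header_check;

/* A 32-bit ELF header is 52 bytes long. */
static_assert(sizeof(elf32_header_check) == 52, "bad ELF header size");
/* The type field follows directly after the 16 identity bytes. */
static_assert(offsetof(elf32_header_check, type) == 16, "bad type offset");
static_assert(offsetof(elf32_header_check, entry) == 24, "bad entry offset");
static_assert(offsetof(elf32_header_check, ph_offset) == 28, "bad ph_offset offset");
static_assert(offsetof(elf32_header_check, strtab_index) == 50, "bad strtab_index offset");
```

If any of these fail, the build stops instead of the loader reading
garbage at run time.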

The first thing we should do is check whether we actually got an
executable ELF file. (In the following code, I'll assume the entire ELF
file is located somewhere in memory and that this location is passed to
the `load_elf()` function.)

To check if the file is an ELF executable we can look at the
`identity` field. The first four bytes of this field should always be
`0x7F`,`'E'`,`'L'`,`'F'`. If that's correct, we can look at the `type`
field. For an executable standalone program, this should be `2`.

    :::c
    int load_elf(uint8_t *data)
    {
        elf_header *elf = (elf_header *)data;
        if(is_elf(elf) != ELF_TYPE_EXECUTABLE)
            return -1;
    ...

`is_elf` looks as follows. Note the use of `strncmp` which I can do
because I link [newlib into my kernel](/blog/2013/08/Catching-Up/).

    :::c
    int is_elf(elf_header *elf)
    {
        int iself = -1;

        if((elf->identity[0] == 0x7f) && \
            !strncmp((char *)&elf->identity[1], "ELF", 3))
        {
            iself = 0;
        }

        if(iself != -1)
            iself = elf->type;

        return iself;
    }
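
If you don't have a libc to lean on, the same check can be done by
comparing the bytes by hand. Here's a minimal sketch of that. The function
name `elf_type_checked` and the `size` parameter are my own inventions for
this example, and it assumes a little-endian ELF file (which is what you
get on x86).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ELF_TYPE_EXECUTABLE 2 /* value of the type field for an executable */

/* Returns the type field if data starts with the ELF magic, -1 otherwise.
 * size is the number of valid bytes at data (hypothetical parameter). */
int elf_type_checked(const uint8_t *data, size_t size)
{
    static const uint8_t magic[4] = { 0x7f, 'E', 'L', 'F' };
    size_t i;

    assert(data != NULL); /* a NULL image is a caller bug, not a bad file */

    /* The type field occupies bytes 16 and 17, so we need 18 bytes. */
    if (size < 18)
        return -1;

    for (i = 0; i < sizeof(magic); i++)
    {
        if (data[i] != magic[i])
            return -1;
    }

    /* Assemble the 16-bit type field, assuming little-endian encoding. */
    return data[16] | (data[17] << 8);
}
```

It's used the same way as `is_elf`: compare the return value to
`ELF_TYPE_EXECUTABLE` and bail out if they differ.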

Should be pretty straightforward. Let's continue.

For just loading a simple ELF program, we only need to look at the
program headers which are located in a table at offset `ph_offset` in
the file.

    :::c
    typedef struct
    {
        uint32_t type;
        uint32_t offset;
        uint32_t virtual_address;
        uint32_t physical_address;
        uint32_t file_size;
        uint32_t mem_size;
        uint32_t flags;
        uint32_t align;
    }__attribute__((packed)) elf_phead;

The program headers each tell us about one segment of the file, and we
use them to find out what parts of the ELF image should be loaded where
in memory. So, the next step would be to go through all program headers
looking for loadable segments and load them into memory.

    :::c
        ...
        elf_phead *phead = (elf_phead *)&data[elf->ph_offset];
        uint32_t i;
        for(i = 0; i < elf->ph_num; i++)
        {
            if(phead[i].type == ELF_PT_LOAD)
            {
                load_elf_segment(data, &phead[i]);
            }
        }
        return 0;
    }

This would also be a good time to update the memory manager information
about the executable. You might want to keep track of the start and end
of code and data for example.
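
As a rough idea of what that bookkeeping could look like, the sketch
below walks the program headers and finds the lowest and highest address
covered by any loadable segment. The `image_extent` struct and the
`elf_image_extent()` function are hypothetical names made up for this
example, and it ignores address wrap-around for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ELF_PT_LOAD 1 /* program header type for a loadable segment */

typedef struct
{
    uint32_t type;
    uint32_t offset;
    uint32_t virtual_address;
    uint32_t physical_address;
    uint32_t file_size;
    uint32_t mem_size;
    uint32_t flags;
    uint32_t align;
} __attribute__((packed)) elf_phead;

/* Hypothetical result type: the image occupies [start, end). */
typedef struct
{
    uint32_t start;
    uint32_t end;
} image_extent;

/* Returns 0 and fills in out, or -1 if nothing is loadable. */
int elf_image_extent(const elf_phead *phead, uint16_t ph_num, image_extent *out)
{
    uint32_t lo = UINT32_MAX;
    uint32_t hi = 0;
    int found = 0;
    uint16_t i;

    assert(phead != NULL && out != NULL);

    for (i = 0; i < ph_num; i++)
    {
        /* Skip non-loadable and empty segments. */
        if (phead[i].type != ELF_PT_LOAD || phead[i].mem_size == 0)
            continue;

        uint32_t start = phead[i].virtual_address;
        uint32_t end = start + phead[i].mem_size; /* overflow not checked */

        if (start < lo) lo = start;
        if (end > hi) hi = end;
        found = 1;
    }

    if (!found)
        return -1;

    out->start = lo;
    out->end = hi;
    return 0;
}
```

Splitting this into separate code and data ranges could be done the same
way, for instance by looking at the write flag of each segment.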

Anyway, `load_elf_segment()` looks like this:

    :::c
    void load_elf_segment(uint8_t *data, elf_phead *phead)
    {

        uint32_t memsize = phead->mem_size; // Size in memory
        uint32_t filesize = phead->file_size; // Size in file
        uint32_t mempos = phead->virtual_address; // Offset in memory
        uint32_t filepos = phead->offset; // Offset in file

        uint32_t flags = MM_FLAG_READ;
        if(phead->flags & ELF_PT_W) flags |= MM_FLAG_WRITE;

        new_area(current->proc, mempos, mempos + memsize, \
            flags, MM_TYPE_DATA);

        if(memsize == 0) return;

        memcpy((void *)mempos, &data[filepos], filesize);
        memset((void *)(mempos + filesize), 0, memsize - filesize);
    }

Let's go through it.

First we define some helper variables.

Next we check if the segment we're loading should be writable.

Then we request a new memory area from the [process memory
manager](/blog/2013/06/Even-More-Memory/).

Finally, we copy as much data as is provided in the file and fill the
rest of the new area with zeros.
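
The copy-and-zero step trusts the sizes in the program header completely.
Here's a sketch of the same step with a couple of sanity checks added, so
a corrupt file can't make us read outside the image. `copy_segment()` and
its `dest` and `image_size` parameters are hypothetical; `dest` stands in
for memory that has already been mapped for the segment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies file_size bytes from image + offset to dest and zero-fills up to
 * mem_size. Returns 0 on success, -1 if the header values don't add up. */
int copy_segment(uint8_t *dest, const uint8_t *image, size_t image_size,
                 uint32_t offset, uint32_t file_size, uint32_t mem_size)
{
    assert(dest != NULL && image != NULL);

    /* The size in the file may never exceed the size in memory. */
    if (file_size > mem_size)
        return -1;

    /* The file data must lie entirely inside the image. */
    if (offset > image_size || file_size > image_size - offset)
        return -1;

    memcpy(dest, image + offset, file_size);
    /* Whatever the file doesn't provide (e.g. .bss) starts out as zero. */
    memset(dest + file_size, 0, mem_size - file_size);
    return 0;
}
```

The second check is written as `file_size > image_size - offset` rather
than `offset + file_size > image_size` so that it can't be fooled by
integer overflow.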

And that's really all you need to do to load an ELF executable.
The only thing left is to jump to `elf->entry` and you're off.
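
In its simplest form, "jumping" can mean turning the entry address into
a function pointer and calling it. The sketch below shows just that
conversion. The `entry_fn` type and `elf_entry_point()` are hypothetical
names for this example, and it assumes the program takes no arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type for an entry point that takes no arguments. */
typedef void (*entry_fn)(void);

static_assert(sizeof(uintptr_t) >= sizeof(uint32_t),
              "a 32-bit entry address must fit in a pointer");

/* Converts the entry field to a callable pointer.
 * An entry of 0 means the file has no entry point, so return NULL. */
entry_fn elf_entry_point(uint32_t entry)
{
    if (entry == 0)
        return NULL;
    return (entry_fn)(uintptr_t)entry;
}
```

After loading, that would be used roughly as
`entry_fn start = elf_entry_point(elf->entry); if(start) start();`.
A kernel that runs programs in user mode will probably want to set up a
stack and switch privilege level instead of calling the address directly.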

### Improvements
Of course, the entire executable image won't normally be loaded into
memory, though it might be for e.g. an `init` program or similar
that your bootloader loads as a module to your kernel. Instead, you
should read the parts you want through your filesystem as you go along.

Or maybe you shouldn't. It doesn't make sense to load a huge program
into memory all at once. What if it encounters an error and exits with
99% of the code unexecuted?

Perhaps the process memory manager could be told where to find certain
parts of the program, and load them only when needed?
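
To make that idea a bit more concrete, here's a sketch of the core of
such a scheme: given a remembered segment description and a faulting
address, produce the contents of the one page that was touched. None of
this is implemented in my kernel; `lazy_segment`, `fill_page()` and the
4 KiB `PAGE_SIZE` are assumptions for the example, and `image` is still an
in-memory copy of the file where a real version would read from the
filesystem.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u /* assumed page size */

/* What the memory manager would remember instead of loading right away. */
typedef struct
{
    uint32_t vaddr;     /* virtual_address from the program header */
    uint32_t offset;    /* offset of the segment data in the file */
    uint32_t file_size; /* bytes provided by the file */
    uint32_t mem_size;  /* bytes occupied in memory */
} lazy_segment;

/* Fills page (PAGE_SIZE bytes) with the contents of the page containing
 * fault_addr. Returns 0, or -1 if fault_addr is outside the segment. */
int fill_page(const lazy_segment *seg, const uint8_t *image,
              uint32_t fault_addr, uint8_t *page)
{
    uint32_t page_start = fault_addr & ~(PAGE_SIZE - 1);
    uint32_t i;

    assert(seg != NULL && image != NULL && page != NULL);

    if (fault_addr < seg->vaddr || fault_addr - seg->vaddr >= seg->mem_size)
        return -1;

    for (i = 0; i < PAGE_SIZE; i++)
    {
        uint32_t addr = page_start + i;

        /* Bytes backed by the file are copied, everything else is zero. */
        if (addr >= seg->vaddr && addr - seg->vaddr < seg->file_size)
            page[i] = image[seg->offset + (addr - seg->vaddr)];
        else
            page[i] = 0;
    }
    return 0;
}
```

A page fault handler would call something like this, map the filled page
at the faulting address and resume the process. The sketch zeroes
everything in the page that lies outside the segment, which is only right
as long as no other segment shares that page.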

### Git
The methods described in this post have been implemented in git commit
[a4ca835d1d](https://github.com/thomasloven/os5/tree/a4ca835d1db61faf214b4b617d38a335ef05d142).