415 lines
13 KiB
Markdown
415 lines
13 KiB
Markdown
layout: post
|
|
title: "Virtual File System 2"
|
|
subtitle: "for real this time."
|
|
tags: [osdev, filesystems]
|
|
|
|
Once again, several months have passed since I wrote anything here.
|
|
I also worked very little on the kernel during this time, though. So no
|
|
losses there, at least. Instead I've been busy with a special side
|
|
project that I will write a bit about later and a lot of personal stuff
|
|
that you probably don't care about if you're reading those posts. Unless
|
|
you're my wife, that is (I got married! :) ). Then again, if you're my
|
|
wife you're probably not reading those posts anyway, so I guess the
|
|
entire point of this paragraph was to tell you I got married(!).
|
|
|
|
Ahem...
|
|
|
|
###The virtual filesystem
|
|
So, as I said in [my last post](/blog/2013/08/Virtual-File-System/)
|
|
I recently rewrote my VFS layer. Although, I must admit I'm not quite satisfied
|
|
with it yet...
|
|
|
|
...
|
|
|
|
Know what? Let's rewrite it again!
|
|
|
|
###What I want
|
|
First of all: What do I want from the virtual filesystem?
|
|
|
|
The VFS should be an abstraction layer for the files used by the kernel
|
|
and user processes. The files in this case could be files on a disk,
|
|
the disk hardware itself, pipes, files stored in ram, read-only files
|
|
generated on-the-fly by the kernel, network connections(?) etc.
|
|
|
|
Further, I want the VFS to be independent of any disk file system, e.g.
|
|
the VFS shouldn't have user-group-other read-write-execute tuples just
|
|
because it's designed with ext2 in mind. It might have those tuples -
|
|
I haven't decided yet - but it it does it won't be because ext2 uses
|
|
them. Nor will the VFS be designed with ext2 or any other disk file
|
|
system in mind.
|
|
|
|
The VFS should offer the functions
|
|
|
|
:::c
|
|
open() // Open or create a file
|
|
close() // Close a file
|
|
read() // Read data from opened file
|
|
write() // Write data to opened file
|
|
move() // Move a file or directory
|
|
link() // Put a file or directory in path tree
|
|
unlink() // Remove a file or directory from path tree
|
|
stat() // Get more info about a file or directory
|
|
isatty() // Returns true if the file is a terminal
|
|
mkdir() // Create a directory
|
|
readdir() // Get a directory entry
|
|
finddir() // Find a file by name from a directory
|
|
|
|
for all files and directories regardless of their underlying device or
|
|
driver.
|
|
|
|
The file system should have a tree structure with a single root.
|
|
|
|
Filesystems should be mountable at any path in the tree provided the
|
|
path points to a directory or non-existing node within an existing
|
|
parent directory. I.e. if the empty directory `/foo`, a filesystem
|
|
can be mounted to `/foo` or `/foo/bar` but not to `foo/baz/bar`. If
|
|
`/foo` is not empty, it should a filesystem can still be mounted to
|
|
it, unless it is already a mountpoint. All contents of `/foo` are then
|
|
hidden untill the filesystem is unmounted. Mounting filesystems to
|
|
non-directories should not be possible. Mounted filesystems does not
|
|
have to have a root directory, but can consist of a single file.
|
|
|
|
Those are just some rules for mounting that I thought of pretty much
|
|
arbitrarily. I might change my mind later, but this will do for now.
|
|
|
|
|
|
###Implementation
|
|
The most important data structure of the VFS is the inode.
|
|
Each file used by the kernel gets an inode which keeps track of some
|
|
important information of it, such as what type of file it is or which
|
|
driver controls it.
|
|
|
|
Some inodes live in the mount tree, which keeps track of all mounted
|
|
filesystems. Looking up a file by absolute path always starts in the
|
|
mount tree and is performed by a function called namei (name to inode).
|
|
|
|
Example:
|
|
|
|
- A user process wants the file `/mnt/floppy/foo/bar.txt`
|
|
- `namei` starts at the VFS root /
|
|
- `namei` searches for `/mnt` in the VFS tree, finds it and gets it's inode.
|
|
- `namei` searches for `/mnt/floppy` in the VFS tree, finds it and gets it's inode.
|
|
- `namei` searches for `/mnt/floppy/foo`, which is not found.
|
|
- `namei` asks the `/mnt/floppy` inode for `/mnt/floppy/foo` and gets it's inode.
|
|
- `namei` asks the `/mnt/floppy/foo` inode for `/mnt/floppy/foo/bar.txt` and gets it's inode.
|
|
- `namei` returns the inode for `/mnt/floppy/foo/bar.txt`
|
|
|
|
A good starting point for the inode structure might be some pointers to
|
|
allow it to be placed in a tree, then.
|
|
|
|
:::c
|
|
struct vfs_node_st;
|
|
typedef vfs_node_t * INODE;
|
|
|
|
typedef struct vfs_node_st
|
|
{
|
|
char name[VFS_NAME_SZ];
|
|
INODE parent;
|
|
INODE child;
|
|
INODE older, younger;
|
|
uint32_t type;
|
|
} vfs_node_t;
|
|
|
|
This does waste a bit of memory, since most inodes that are used by the
|
|
system won't be in the VFS tree, but four `size_t` isn't that much,
|
|
and it's far from the worst memory thief in this kernel anyway.
|
|
|
|
The `type` field in the VFS node struct is used by the VFS tree to make
|
|
sure stuff is only mounted onto directories.
|
|
|
|
I typedef `INODE` as a pointer to keep the code a bit cleaner and easier
|
|
to maintain.
|
|
|
|
|
|
Then, we need a way to keep track of the driver, i.e. the functions
|
|
called to access the file. To do this, I define a new struct:
|
|
|
|
:::c
|
|
typedef struct vfs_driver_st
|
|
{
|
|
uint32_t (*open)(INODE, uint32_t);
|
|
uint32_t (*close)(INODE);
|
|
uint32_t (*read)(INODE, void *, uint32_t, uint32_t);
|
|
uint32_t (*write)(INODE, void *, uint32_t, uint32_t);
|
|
uint32_t (*link)(INODE, INODE, const char *);
|
|
uint32_t (*unlink)(INODE, const char *);
|
|
uint32_t (*stat)(INODE, struct stat *st);
|
|
uint32_t (*isatty)(INODE);
|
|
uint32_t (*mkdir)(INODE, const char *);
|
|
dirent_t *(*readdir)(INODE, uint32_t);
|
|
INODE (*finddir)(INODE, const char *);
|
|
} vfs_driver_t;
|
|
|
|
and add `vfs_driver_t *d` to the inode struct. I also added a length
|
|
value, a void pointer for arbitrary data used by the drivers and a flags
|
|
value - also for use by the drivers. The
|
|
inode struct now looks like this:
|
|
|
|
:::c
|
|
typedef struct vfs_node_st
|
|
{
|
|
char name[VFS_NAME_SZ];
|
|
void *parent;
|
|
void *child;
|
|
void *older, *younger;
|
|
uint32_t type;
|
|
vfs_driver_t *d;
|
|
void *data;
|
|
uint32_t flags;
|
|
uint32_t length;
|
|
}
|
|
|
|
###Vfs functions
|
|
Next, I create some wrapper functions to call the driver functions.
|
|
|
|
:::c
|
|
uint32_t vfs_open(INODE ino, uint32_t mode)
|
|
{
|
|
if(ino->d->open)
|
|
return ino->d->open(ino, mode);
|
|
return 0;
|
|
}
|
|
|
|
and similar for all functions except `readdir` and `finddir` which
|
|
contain code to handle `.` and `..` for mount roots.
|
|
|
|
|
|
:::c
|
|
dirent_t *vfs_readdir(INODE ino, uint32_t num)
|
|
{
|
|
if(ino->type & FS_MOUNT)
|
|
{
|
|
if(num == 0)
|
|
{
|
|
dirent_t *ret = calloc(1, sizeof(dirent_t));
|
|
ret->ino = ino;
|
|
strcpy(ret->name, ".");
|
|
return ret;
|
|
} else if(num == 1) {
|
|
dirent_t *ret = calloc(1, sizeof(dirent_t));
|
|
ret->ino = ino->parent;
|
|
strcpy(ret->name, "..");
|
|
return ret;
|
|
}
|
|
}
|
|
if(ino->d->readdir)
|
|
return ino->d->readdir(ino, num);
|
|
return 0;
|
|
}
|
|
|
|
|
|
|
|
:::c
|
|
INODE vfs_finddir(INODE ino, const char *name)
|
|
{
|
|
if(ino->type & FS_MOUNT)
|
|
{
|
|
if(!strcmp(name, "."))
|
|
{
|
|
return ino;
|
|
} else if(!strcmp(name, "..")) {
|
|
return ino->parent;
|
|
}
|
|
}
|
|
if(ino->d->finddir)
|
|
return ino->d->finddir(ino, name);
|
|
if(ino->d->readdir)
|
|
{
|
|
// Backup solution
|
|
int num = 0;
|
|
dirent_t *de;
|
|
while(1)
|
|
{
|
|
de = vfs_readdir(ino, num);
|
|
if(!de)
|
|
return 0;
|
|
if(!strcmp(name, de->name))
|
|
break;
|
|
free(de->name);
|
|
free(de);
|
|
num++;
|
|
}
|
|
INODE ret = de->ino;
|
|
free(de->name);
|
|
free(de);
|
|
return ret;
|
|
}
|
|
return 0;
|
|
}
|
|
|
|
Finally, I needed a function for mounting filesystems in the mount tree
|
|
and the `namei` function, which can actually be combined since they both
|
|
need to traverse the entire path.
|
|
|
|
_Warning:_ Pointer-pointers ahead!
|
|
|
|
First: a function for traversing the mount tree as far as possible
|
|
|
|
:::c
|
|
INODE vfs_find_root(char **path)
|
|
{
|
|
// Find closest point in mount tree
|
|
INODE current = vfs_root;
|
|
INODE mount = current;
|
|
char *name;
|
|
while((name = strsep(path, "/")))
|
|
{
|
|
current = current->child;
|
|
while(current)
|
|
{
|
|
if(!strcmp(current->name, name))
|
|
{
|
|
mount = current;
|
|
break;
|
|
}
|
|
current = current->olderyounger;
|
|
}
|
|
if(!current)
|
|
{
|
|
if(*path)
|
|
{
|
|
*path = *path - 1;
|
|
*path[0] = '/';
|
|
}
|
|
*path = name;
|
|
break;
|
|
}
|
|
}
|
|
|
|
return (INODE)mount;
|
|
}
|
|
|
|
Pretty self explanatory. No? Well, `strsep` is a library function which
|
|
picks out one part of the path at a time and also advances the `path`
|
|
pointer. Then, for each part, we look through the children of the node
|
|
we're at for one with the right name. If it is not found, the path
|
|
pointer is backed up one step and the last node we found is returned.
|
|
|
|
The namei/mount function then uses this as a starting point:
|
|
|
|
|
|
|
|
:::c
|
|
INODE vfs_namei_mount(const char *path, INODE root)
|
|
{
|
|
char *npath = strdup(path);
|
|
char *pth = &npath[1];
|
|
// Find closest point in mount tree
|
|
INODE current = vfs_find_root(&pth);
|
|
char *name;
|
|
while(current && (name = strsep(&pth, "/")))
|
|
{
|
|
// Go through the path
|
|
INODE next = vfs_finddir(current, name);
|
|
|
|
if(root)
|
|
{
|
|
// If we want to mount someting
|
|
if(!next)
|
|
{
|
|
// Create last part of path if it doesn't exist
|
|
// But only if it is the last part.
|
|
if(pth)
|
|
return 0;
|
|
next = calloc(1, sizeof(vfs_node_t));
|
|
strcpy(next->name, name);
|
|
next->type = FS_DIRECTORY;
|
|
}
|
|
|
|
// Add path to mount tree
|
|
next->parent = current;
|
|
next->older = current->child;
|
|
current->child = next;
|
|
}
|
|
|
|
if(!next)
|
|
return 0;
|
|
if(!current->parent)
|
|
free(current);
|
|
|
|
current = next;
|
|
}
|
|
free(npath);
|
|
|
|
if(root && current->type == FS_DIRECTORY)
|
|
{
|
|
// Replace node in mount tree
|
|
root->parent = current->parent;
|
|
if(root->parent->child == current)
|
|
root->parent->child = root;
|
|
root->older = current->older;
|
|
if(root->older)
|
|
root->older->younger = current;
|
|
root->younger = current->younger;
|
|
if(root->younger)
|
|
root->younger->older = current;
|
|
strcpy(root->name, current->name);
|
|
root->type = FS_MOUNT;
|
|
if(current == vfs_root)
|
|
vfs_root = root;
|
|
|
|
free(current);
|
|
}
|
|
return current;
|
|
}
|
|
|
|
Note how `pth` is changed by `vfs_find_root()` to only contain the part
|
|
of the path that wasn't found. After that, we ask each node for the next
|
|
until we reach the target or a dead end. If the dead end is at the very
|
|
last part of the path (`/foo/bar` in the example above) and we want to
|
|
mount something a new node is created. Otherwise the function returns.
|
|
Also, if the goal is to mount something, each part of the path is added
|
|
to the mount tree. Finally, the mounting is performed - if requested and
|
|
the final inode is returned.
|
|
|
|
I also made two simple wrappers for this function:
|
|
|
|
|
|
:::c
|
|
INODE vfs_namei(const char *path)
|
|
{
|
|
return vfs_namei_mount(path, 0);
|
|
}
|
|
|
|
INODE vfs_mount(const char *path, INODE root)
|
|
{
|
|
return vfs_namei_mount(path, root);
|
|
}
|
|
|
|
And finally, a function for unmounting file systems:
|
|
|
|
|
|
:::c
|
|
INODE vfs_umount(const char *path)
|
|
{
|
|
char *npath = strdup(path);
|
|
char *pth = &npath[1];
|
|
INODE ino = vfs_find_root(&pth);
|
|
if(!ino || pth)
|
|
{
|
|
free(npath);
|
|
return 0;
|
|
}
|
|
if(ino->child)
|
|
{
|
|
free(npath);
|
|
return 0;
|
|
} else {
|
|
// Remove node from mount tree
|
|
if(ino->parent->child == ino)
|
|
ino->parent->child = ino->older;
|
|
if(ino->younger)
|
|
ino->younger->older = ino->older;
|
|
if(ino->older)
|
|
ino->older->younger = ino->younger;
|
|
free(npath);
|
|
return ino;
|
|
}
|
|
}
|
|
|
|
And that's it for now. A lot of code this time, but that's because I
|
|
don't want to push my changes to github quite yet, so I can't give you a
|
|
commit link.
|
|
|
|
Next time, I'll look at some file related system calls.
|