| layout: post
 | |
| title: "Virtual File System 2"
 | |
| subtitle: "for real this time."
 | |
| tags: [osdev, filesystems]
 | |
| 
 | |
| Once again, several months have passed since I wrote anything here.
 | |
| I also worked very little on the kernel during this time, though. So no
 | |
| losses there, at least. Instead I've been busy with a special side
 | |
| project that I will write a bit about later and a lot of personal stuff
 | |
| that you probably don't care about if you're reading these posts. Unless
 | |
| you're my wife, that is (I got married! :) ). Then again, if you're my
 | |
| wife you're probably not reading these posts anyway, so I guess the
 | |
| entire point of this paragraph was to tell you I got married(!).
 | |
| 
 | |
| Ahem...
 | |
| 
 | |
| ###The virtual filesystem
 | |
| So, as I said in [my last post](/blog/2013/08/Virtual-File-System/)
 | |
| I recently rewrote my VFS layer. Although, I must admit I'm not quite satisfied
 | |
| with it yet...
 | |
| 
 | |
| ...
 | |
| 
 | |
| Know what? Let's rewrite it again!
 | |
| 
 | |
| ###What I want
 | |
| First of all: What do I want from the virtual filesystem?
 | |
| 
 | |
| The VFS should be an abstraction layer for the files used by the kernel
 | |
| and user processes. The files in this case could be files on a disk,
 | |
| the disk hardware itself, pipes, files stored in ram, read-only files
 | |
| generated on-the-fly by the kernel, network connections(?) etc.
 | |
| 
 | |
| Further, I want the VFS to be independent of any disk file system, e.g.
 | |
| the VFS shouldn't have user-group-other read-write-execute tuples just
 | |
| because it's designed with ext2 in mind. It might have those tuples -
 | |
| I haven't decided yet - but if it does it won't be because ext2 uses
 | |
| them. Nor will the VFS be designed with ext2 or any other disk file
 | |
| system in mind.
 | |
| 
 | |
| The VFS should offer the functions
 | |
| 
 | |
|     :::c
 | |
|     open() // Open or create a file
 | |
|     close() // Close a file
 | |
|     read() // Read data from opened file
 | |
|     write() // Write data to opened file
 | |
|     move() // Move a file or directory
 | |
|     link() // Put a file or directory in path tree
 | |
|     unlink() // Remove a file or directory from path tree
 | |
|     stat() // Get more info about a file or directory
 | |
|     isatty() // Returns true if the file is a terminal
 | |
|     mkdir() // Create a directory
 | |
|     readdir() // Get a directory entry
 | |
|     finddir() // Find a file by name from a directory
 | |
| 
 | |
| for all files and directories regardless of their underlying device or
 | |
| driver.
 | |
| 
 | |
| The file system should have a tree structure with a single root.
 | |
| 
 | |
| Filesystems should be mountable at any path in the tree provided the
 | |
| path points to a directory or non-existing node within an existing
 | |
| parent directory. I.e. if the empty directory `/foo` exists, a filesystem
 | |
| can be mounted to `/foo` or `/foo/bar` but not to `/foo/baz/bar`. If
 | |
| `/foo` is not empty, a filesystem can still be mounted to
 | |
| it, unless it is already a mountpoint. All contents of `/foo` are then
 | |
| hidden until the filesystem is unmounted. Mounting filesystems to
 | |
| non-directories should not be possible. Mounted filesystems do not
 | |
| have to have a root directory, but can consist of a single file.
 | |
| 
 | |
| Those are just some rules for mounting that I thought of pretty much
 | |
| arbitrarily. I might change my mind later, but this will do for now.
 | |
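The mounting rules above can be condensed into a single predicate. This is only a sketch under my own assumptions: `mount_target_t` and `can_mount` are hypothetical names that do not exist in the kernel, and the flags would in practice come from a path lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what a lookup found at the mount path.
 * None of these names come from the kernel source. */
typedef struct
{
    bool exists;              /* the path itself resolves to a node */
    bool is_directory;        /* only meaningful if exists is true */
    bool is_mountpoint;       /* something is already mounted here */
    bool parent_is_directory; /* the parent path resolves to a directory */
} mount_target_t;

/* Returns true if the rules listed above allow mounting at this target. */
bool can_mount(const mount_target_t *t)
{
    assert(t != NULL);
    if (t->exists)
    {
        /* Existing node: must be a directory that is not yet a mountpoint.
         * Whether the directory is empty does not matter. */
        return t->is_directory && !t->is_mountpoint;
    }
    /* Non-existing node: allowed only directly inside an existing directory. */
    return t->parent_is_directory;
}
```

With `/foo` as an existing directory, this accepts `/foo` and `/foo/bar` but rejects `/foo/baz/bar`, since the parent `/foo/baz` does not exist.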
| 
 | |
| 
 | |
| ###Implementation
 | |
| The most important data structure of the VFS is the inode.
 | |
| Each file used by the kernel gets an inode which keeps track of some
 | |
| important information about it, such as what type of file it is or which
 | |
| driver controls it.
 | |
| 
 | |
| Some inodes live in the mount tree, which keeps track of all mounted
 | |
| filesystems. Looking up a file by absolute path always starts in the
 | |
| mount tree and is performed by a function called namei (name to inode).
 | |
| 
 | |
| Example:
 | |
| 
 | |
| - A user process wants the file `/mnt/floppy/foo/bar.txt`
 | |
| - `namei` starts at the VFS root /
 | |
| - `namei` searches for `/mnt` in the VFS tree, finds it and gets its inode.
 | |
| - `namei` searches for `/mnt/floppy` in the VFS tree, finds it and gets its inode.
 | |
| - `namei` searches for `/mnt/floppy/foo`, which is not found.
 | |
| - `namei` asks the `/mnt/floppy` inode for `/mnt/floppy/foo` and gets its inode.
 | |
| - `namei` asks the `/mnt/floppy/foo` inode for `/mnt/floppy/foo/bar.txt` and gets its inode.
 | |
| - `namei` returns the inode for `/mnt/floppy/foo/bar.txt`
 | |
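The sequence of lookups in the example above boils down to growing the path one component at a time. Here is a minimal sketch of just that stepping, with no inodes involved; `next_prefix` is a hypothetical helper, not a function from the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: extend *len so that the first *len characters of
 * path cover one more component. Start with *len == 0.
 * Returns 1 if another component was found, 0 when the path is exhausted. */
int next_prefix(const char *path, size_t *len)
{
    assert(path && path[0] == '/');
    size_t i = *len;
    while (path[i] == '/')      /* skip the separator(s) */
        i++;
    if (path[i] == '\0')        /* nothing left after the separators */
        return 0;
    while (path[i] != '\0' && path[i] != '/')
        i++;                    /* consume one component */
    *len = i;
    return 1;
}
```

Called repeatedly on `/mnt/floppy/foo/bar.txt`, it yields the prefixes `/mnt`, `/mnt/floppy`, `/mnt/floppy/foo` and finally the full path, which is the order in which `namei` resolves them in the example.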
| 
 | |
| A good starting point for the inode structure might be some pointers to
 | |
| allow it to be placed in a tree, then.
 | |
| 
 | |
|     :::c
 | |
|     struct vfs_node_st;
 | |
|     typedef struct vfs_node_st * INODE;
 | |
| 
 | |
|     typedef struct vfs_node_st
 | |
|     {
 | |
|         char name[VFS_NAME_SZ];
 | |
|         INODE parent;
 | |
|         INODE child;
 | |
|         INODE older, younger;
 | |
|         uint32_t type;
 | |
|     } vfs_node_t;
 | |
| 
 | |
| This does waste a bit of memory, since most inodes that are used by the
 | |
| system won't be in the VFS tree, but four pointers aren't that much,
 | |
| and it's far from the worst memory thief in this kernel anyway.
 | |
| 
 | |
| The `type` field in the VFS node struct is used by the VFS tree to make
 | |
| sure stuff is only mounted onto directories.
 | |
| 
 | |
| I typedef `INODE` as a pointer to keep the code a bit cleaner and easier
 | |
| to maintain.
 | |
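To show how the `parent`, `child`, `older` and `younger` pointers fit together, here is a small sketch of adding and finding children. The helper names `tree_add_child` and `tree_find_child` are hypothetical, the value of `VFS_NAME_SZ` is a guess, and I assume that `child` points to the most recently added child with `older` leading to the ones added before it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VFS_NAME_SZ 256   /* assumed value, not given in the post */

struct vfs_node_st;
typedef struct vfs_node_st *INODE;

typedef struct vfs_node_st
{
    char name[VFS_NAME_SZ];
    INODE parent;
    INODE child;          /* assumed: the newest child */
    INODE older, younger; /* siblings under the same parent */
    uint32_t type;
} vfs_node_t;

/* Hypothetical helper: link node in as the newest child of parent. */
void tree_add_child(INODE parent, INODE node)
{
    assert(parent && node);
    node->parent = parent;
    node->older = parent->child;
    node->younger = NULL;
    if (node->older)
        node->older->younger = node;
    parent->child = node;
}

/* Hypothetical helper: find a direct child by name, or NULL. */
INODE tree_find_child(INODE parent, const char *name)
{
    for (INODE n = parent->child; n; n = n->older)
        if (!strcmp(n->name, name))
            return n;
    return NULL;
}
```

Adding at the head keeps insertion constant-time, at the cost of a linear scan when looking a name up. For a mount tree with a handful of entries that seems like a fair trade.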
| 
 | |
| 
 | |
| Then, we need a way to keep track of the driver, i.e. the functions
 | |
| called to access the file. To do this, I define a new struct:
 | |
| 
 | |
|     :::c
 | |
|     typedef struct vfs_driver_st
 | |
|     {
 | |
|         uint32_t (*open)(INODE, uint32_t);
 | |
|         uint32_t (*close)(INODE);
 | |
|         uint32_t (*read)(INODE, void *, uint32_t, uint32_t);
 | |
|         uint32_t (*write)(INODE, void *, uint32_t, uint32_t);
 | |
|         uint32_t (*link)(INODE, INODE, const char *);
 | |
|         uint32_t (*unlink)(INODE, const char *);
 | |
|         uint32_t (*stat)(INODE, struct stat *st);
 | |
|         uint32_t (*isatty)(INODE);
 | |
|         uint32_t (*mkdir)(INODE, const char *);
 | |
|         dirent_t *(*readdir)(INODE, uint32_t);
 | |
|         INODE (*finddir)(INODE, const char *);
 | |
|     } vfs_driver_t;
 | |
| 
 | |
| and add `vfs_driver_t *d` to the inode struct. I also added a length
 | |
| value, a void pointer for arbitrary data used by the drivers and a flags
 | |
| value - also for use by the drivers. The
 | |
| inode struct now looks like this:
 | |
| 
 | |
|     :::c
 | |
|     typedef struct vfs_node_st
 | |
|     {
 | |
|         char name[VFS_NAME_SZ];
 | |
|         INODE parent;
 | |
|         INODE child;
 | |
|         INODE older, younger;
 | |
|         uint32_t type;
 | |
|         vfs_driver_t *d;
 | |
|         void *data;
 | |
|         uint32_t flags;
 | |
|         uint32_t length;
 | |
|     } vfs_node_t;
 | |
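As an illustration of how a driver might plug into these structs, here is a sketch of a read-only file kept in RAM. Everything in it is an assumption for the sake of the example: the structs are cut down to the fields needed, `ram_read`, `ram_driver` and `node_read` are made-up names, and I assume the `read` arguments are (inode, buffer, size, offset).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct vfs_node_st;
typedef struct vfs_node_st *INODE;

/* Reduced versions of the structs above, with only what this sketch uses. */
typedef struct vfs_driver_st
{
    uint32_t (*read)(INODE, void *, uint32_t, uint32_t);
    uint32_t (*write)(INODE, void *, uint32_t, uint32_t);
} vfs_driver_t;

typedef struct vfs_node_st
{
    vfs_driver_t *d;
    void *data;       /* here: pointer to the file contents in RAM */
    uint32_t length;  /* here: size of the contents in bytes */
} vfs_node_t;

/* Assumed argument order: inode, destination buffer, size, offset.
 * Returns the number of bytes actually copied. */
uint32_t ram_read(INODE ino, void *buffer, uint32_t size, uint32_t offset)
{
    if (offset >= ino->length)
        return 0;
    if (size > ino->length - offset)
        size = ino->length - offset;   /* clamp to the end of the file */
    memcpy(buffer, (char *)ino->data + offset, size);
    return size;
}

/* No write function is set, so the file is effectively read-only. */
vfs_driver_t ram_driver = { .read = ram_read };

/* Dispatch through the driver table, skipping missing functions. */
uint32_t node_read(INODE ino, void *buffer, uint32_t size, uint32_t offset)
{
    assert(ino && ino->d);
    if (ino->d->read)
        return ino->d->read(ino, buffer, size, offset);
    return 0;
}
```

Leaving a function pointer as `NULL` is what lets a driver opt out of an operation without writing a stub for it.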
| 
 | |
| ###Vfs functions
 | |
| Next, I create some wrapper functions to call the driver functions.
 | |
| 
 | |
|     :::c
 | |
|     uint32_t vfs_open(INODE ino, uint32_t mode)
 | |
|     {
 | |
|         if(ino->d->open)
 | |
|             return ino->d->open(ino, mode);
 | |
|         return 0;
 | |
|     }
 | |
| 
 | |
| and similar for all functions except `readdir` and `finddir` which
 | |
| contain code to handle `.` and `..` for mount roots.
 | |
| 
 | |
| 
 | |
|     :::c
 | |
|     dirent_t *vfs_readdir(INODE ino, uint32_t num)
 | |
|     {
 | |
|         if(ino->type & FS_MOUNT)
 | |
|         {
 | |
|             if(num == 0)
 | |
|             {
 | |
|                 dirent_t *ret = calloc(1, sizeof(dirent_t));
 | |
|                 ret->ino = ino;
 | |
|                 // vfs_finddir() frees de->name, so (assuming name is a char *)
 | |
|                 // it has to be heap-allocated here
 | |
|                 ret->name = strdup(".");
 | |
|                 return ret;
 | |
|             } else if(num == 1) {
 | |
|                 dirent_t *ret = calloc(1, sizeof(dirent_t));
 | |
|                 ret->ino = ino->parent;
 | |
|                 ret->name = strdup("..");
 | |
|                 return ret;
 | |
|             }
 | |
|         }
 | |
|         if(ino->d->readdir)
 | |
|             return ino->d->readdir(ino, num);
 | |
|         return 0;
 | |
|     }
 | |
| 
 | |
|  
 | |
| 
 | |
|     :::c
 | |
|     INODE vfs_finddir(INODE ino, const char *name)
 | |
|     {
 | |
|         if(ino->type & FS_MOUNT)
 | |
|         {
 | |
|             if(!strcmp(name, "."))
 | |
|             {
 | |
|                 return ino;
 | |
|             } else if(!strcmp(name, "..")) {
 | |
|                 return ino->parent;
 | |
|             }
 | |
|         }
 | |
|         if(ino->d->finddir)
 | |
|             return ino->d->finddir(ino, name);
 | |
|         if(ino->d->readdir)
 | |
|         {
 | |
|             // Backup solution
 | |
|             int num = 0;
 | |
|             dirent_t *de;
 | |
|             while(1)
 | |
|             {
 | |
|                 de = vfs_readdir(ino, num);
 | |
|                 if(!de)
 | |
|                     return 0;
 | |
|                 if(!strcmp(name, de->name))
 | |
|                     break;
 | |
|                 free(de->name);
 | |
|                 free(de);
 | |
|                 num++;
 | |
|             }
 | |
|             INODE ret = de->ino;
 | |
|             free(de->name);
 | |
|             free(de);
 | |
|             return ret;
 | |
|         }
 | |
|         return 0;
 | |
|     }
 | |
| 
 | |
| Finally, I needed a function for mounting filesystems in the mount tree
 | |
| and the `namei` function, which can actually be combined since they both
 | |
| need to traverse the entire path.
 | |
| 
 | |
| _Warning:_ Pointer-pointers ahead!
 | |
| 
 | |
| First: a function for traversing the mount tree as far as possible
 | |
| 
 | |
|     :::c
 | |
|     INODE vfs_find_root(char **path)
 | |
|     {
 | |
|         // Find closest point in mount tree
 | |
|         INODE current = vfs_root;
 | |
|         INODE mount = current;
 | |
|         char *name;
 | |
|         while((name = strsep(path, "/")))
 | |
|         {
 | |
|             current = current->child;
 | |
|             while(current)
 | |
|             {
 | |
|                 if(!strcmp(current->name, name))
 | |
|                 {
 | |
|                     mount = current;
 | |
|                     break;
 | |
|                 }
 | |
|                 current = current->older;
 | |
|             }
 | |
|             if(!current)
 | |
|             {
 | |
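|                 if(*path)
 | |
|                 {
 | |
|                     // Restore the '/' that strsep() overwrote
 | |
|                     *path = *path - 1;
 | |
|                     (*path)[0] = '/';
 | |
|                 }
 | |
|                 // Leave *path at the first part that was not found
 | |
|                 *path = name;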
 | |
|                 break;
 | |
|             }
 | |
|         }
 | |
| 
 | |
|         return (INODE)mount;
 | |
|     }
 | |
| 
 | |
| Pretty self-explanatory. No? Well, `strsep` is a library function which
 | |
| picks out one part of the path at a time and also advances the `path`
 | |
| pointer. Then, for each part, we look through the children of the node
 | |
| we're at for one with the right name. If it is not found, the path
 | |
| pointer is backed up one step and the last node we found is returned.
 | |
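If `strsep` is new to you, this small sketch shows what it does to a path. `split_path` is a hypothetical helper written for this illustration only. Note that `strsep` is a BSD/glibc extension rather than standard C, and that it modifies the string it is given, which is why the code below works on a `strdup`'d copy.

```c
#define _DEFAULT_SOURCE   /* strsep() is a BSD/glibc extension, not ISO C */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split a writable path in place and store pointers
 * to at most max components in parts. Returns the number stored. */
size_t split_path(char *path, char *parts[], size_t max)
{
    assert(path && parts);
    size_t n = 0;
    char *cursor = path;   /* strsep() advances this past each component */
    char *name;
    while (n < max && (name = strsep(&cursor, "/")))
    {
        if (*name == '\0') /* empty part from a leading or doubled '/' */
            continue;
        parts[n++] = name;
    }
    return n;
}
```

Each `/` that `strsep` finds is overwritten with a terminating zero, so every entry in `parts` is a proper string pointing into the original buffer. Starting the scan one character in, as `&npath[1]` does in the code that follows, is another way to avoid the empty first part.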
| 
 | |
| The namei/mount function then uses this as a starting point:
 | |
| 
 | |
| 
 | |
| 
 | |
|     :::c
 | |
|     INODE vfs_namei_mount(const char *path, INODE root)
 | |
|     {
 | |
|         char *npath = strdup(path);
 | |
|         char *pth = &npath[1];
 | |
|         // Find closest point in mount tree
 | |
|         INODE current = vfs_find_root(&pth);
 | |
|         char *name;
 | |
|         while(current && (name = strsep(&pth, "/")))
 | |
|         {
 | |
|             // Go through the path
 | |
|             INODE next = vfs_finddir(current, name);
 | |
| 
 | |
|             if(root)
 | |
|             {
 | |
|                 // If we want to mount something
 | |
|                 if(!next)
 | |
|                 {
 | |
|                     // Create last part of path if it doesn't exist
 | |
|                     // But only if it is the last part.
 | |
|                     if(pth)
 | |
|                     {
 | |
|                         free(npath);
 | |
|                         return 0;
 | |
|                     }
 | |
|                     next = calloc(1, sizeof(vfs_node_t));
 | |
|                     strcpy(next->name, name);
 | |
|                     next->type = FS_DIRECTORY;
 | |
|                 }
 | |
| 
 | |
|                 // Add path to mount tree
 | |
|                 next->parent = current;
 | |
|                 next->older = current->child;
 | |
|                 if(next->older)
 | |
|                     next->older->younger = next;
 | |
|                 current->child = next;
 | |
|             }
 | |
| 
 | |
|             if(!next)
 | |
|             {
 | |
|                 free(npath);
 | |
|                 return 0;
 | |
|             }
 | |
|             if(!current->parent)
 | |
|                 free(current);
 | |
| 
 | |
|             current = next;
 | |
|         }
 | |
|         free(npath);
 | |
| 
 | |
|         if(root && current->type == FS_DIRECTORY)
 | |
|         {
 | |
|             // Replace node in mount tree
 | |
|             root->parent = current->parent;
 | |
|             if(root->parent && root->parent->child == current)
 | |
|                 root->parent->child = root;
 | |
|             root->older = current->older;
 | |
|             if(root->older)
 | |
|                 root->older->younger = root;
 | |
|             root->younger = current->younger;
 | |
|             if(root->younger)
 | |
|                 root->younger->older = root;
 | |
|             strcpy(root->name, current->name);
 | |
|             root->type = FS_MOUNT;
 | |
|             if(current == vfs_root)
 | |
|                 vfs_root = root;
 | |
| 
 | |
|             free(current);
 | |
|             // Return the mounted root, not the node that was just freed
 | |
|             current = root;
 | |
|         }
 | |
|         return current;
 | |
|     }
 | |
| 
 | |
| Note how `pth` is changed by `vfs_find_root()` to only contain the part
 | |
| of the path that wasn't found. After that, we ask each node for the next
 | |
| until we reach the target or a dead end. If the dead end is at the very
 | |
| last part of the path (`/foo/bar` in the example above) and we want to
 | |
| mount something, a new node is created. Otherwise the function returns.
 | |
| Also, if the goal is to mount something, each part of the path is added
 | |
| to the mount tree. Finally, the mounting is performed - if requested - and
 | |
| the final inode is returned.
 | |
| 
 | |
| I also made two simple wrappers for this function:
 | |
| 
 | |
| 
 | |
|     :::c
 | |
|     INODE vfs_namei(const char *path)
 | |
|     {
 | |
|         return vfs_namei_mount(path, 0);
 | |
|     }
 | |
| 
 | |
|     INODE vfs_mount(const char *path, INODE root)
 | |
|     {
 | |
|         return vfs_namei_mount(path, root);
 | |
|     }
 | |
| 
 | |
| And finally, a function for unmounting file systems:
 | |
| 
 | |
| 
 | |
|     :::c
 | |
|     INODE vfs_umount(const char *path)
 | |
|     {
 | |
|         char *npath = strdup(path);
 | |
|         char *pth = &npath[1];
 | |
|         INODE ino = vfs_find_root(&pth);
 | |
|         if(!ino || pth)
 | |
|         {
 | |
|             free(npath);
 | |
|             return 0;
 | |
|         }
 | |
|         if(ino->child)
 | |
|         {
 | |
|             free(npath);
 | |
|             return 0;
 | |
|         } else {
 | |
|             // Remove node from mount tree
 | |
|             if(ino->parent->child == ino)
 | |
|                 ino->parent->child = ino->older;
 | |
|             if(ino->younger)
 | |
|                 ino->younger->older = ino->older;
 | |
|             if(ino->older)
 | |
|                 ino->older->younger = ino->younger;
 | |
|             free(npath);
 | |
|             return ino;
 | |
|         }
 | |
|     }
 | |
| 
 | |
| And that's it for now. A lot of code this time, but that's because I
 | |
| don't want to push my changes to GitHub quite yet, so I can't give you a
 | |
| commit link.
 | |
| 
 | |
| Next time, I'll look at some file related system calls.
 |