Current LinLogFS Design Issues

Latest Modification: Feb 28th, 2000


This page contains some suggestions for future changes to LinLogFS. They are motivated by the experience I have gained while working on LinLogFS so far. Please note that the issues listed here are still under discussion, so they may or may not appear in future LinLogFS versions.

Feel free to comment on these issues...

LinLogFS Features Under Discussion

The Catalog Filesystem

The original LinLogFS design had some data structures mapped to fixed on-disk locations. This was done for metadata that is not tied to a specific filesystem, such as segment allocation and a table of all available filesystems and their versions.

However, since we plan to support more than one filesystem, it would be better to put such information into a catalog filesystem. The catalog filesystem contains files that hold this metadata, such as the free segment bitmap and the list of currently active filesystems and their versions.

Change Segment Linkage

Currently, the segments that are part of the log form a doubly linked list. This makes the task of removing a segment from the log rather time-consuming, since this task must be performed in a crash-proof way.

However, once a major checkpoint has been written, the exact sequence in which the earlier segments were written is no longer important. So a linked list is needed only for the segments that have been written after the last major checkpoint.

So when cleaning a segment that was written before the last major checkpoint, it is sufficient to mark that segment as clean in its segment header and to update the logical timestamp stored there. The pointer to it in its predecessor can be left unchanged, because the linked list is not required for segments older than the latest major checkpoint.

However, this might complicate the task of re-synchronizing with a slave node in conjunction with DRBD...
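The cleaning rule described above can be sketched roughly as follows. This is only an illustration, not LinLogFS code: the `SegmentHeader` class, its field names, and the `clean_old_segment` function are all assumptions made up for this example, and the real on-disk layout may look quite different.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SegmentHeader:
    """Hypothetical in-memory view of a segment header (names are assumptions)."""
    timestamp: int                  # logical timestamp stored in the header
    next_seg: Optional[int] = None  # link to the successor segment in the log
    clean: bool = False             # set once the segment has been cleaned


def clean_old_segment(segments: Dict[int, SegmentHeader], seg_no: int,
                      checkpoint_ts: int, now: int) -> None:
    """Mark a segment older than the last major checkpoint as clean.

    Only the header of the cleaned segment is touched. The pointer held by
    its predecessor is deliberately left alone.
    """
    hdr = segments[seg_no]
    if hdr.clean:
        return  # nothing to do
    if hdr.timestamp > checkpoint_ts:
        # Segments written after the checkpoint still need the linked list,
        # so they cannot take this shortcut.
        raise ValueError("segment is newer than the last major checkpoint")
    hdr.clean = True
    hdr.timestamp = now


# Three segments in log order; the last major checkpoint has timestamp 20.
segments = {
    1: SegmentHeader(timestamp=10, next_seg=2),
    2: SegmentHeader(timestamp=11, next_seg=3),
    3: SegmentHeader(timestamp=25, next_seg=None),
}
clean_old_segment(segments, 2, checkpoint_ts=20, now=30)
```

After the call, segment 2 is marked clean and carries the new timestamp, while segment 1 still points to it. Anything that walks the list therefore has to start at the last major checkpoint and must not trust older links.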

Eliminate Blocksize Changes

Currently, LinLogFS switches the blocksize to 1 KB when it has to write out data at fixed on-disk locations, such as the super-block. The reason for this is that these data structures are aligned on 1 KB boundaries.

Although this happens quite infrequently (in general, only during major checkpoint writes), it has the negative side effect that the buffer cache for the underlying device is cleared every time the blocksize is switched. This might even cause harmful interference with DRBD...

Eliminating these blocksize switches should not be too hard once a Catalog Filesystem has been introduced.
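To illustrate the alignment problem, here is a small arithmetic sketch. The 4 KB regular blocksize and the 5 KB offset are assumptions chosen for the example; this page does not state the actual values, and the helper names are hypothetical.

```python
from typing import Tuple

KB = 1024


def locate(byte_offset: int, blocksize: int) -> Tuple[int, int]:
    """Return (block number, offset inside that block) for a byte offset."""
    return divmod(byte_offset, blocksize)


def needs_partial_block_write(byte_offset: int, length: int,
                              blocksize: int) -> bool:
    """True if writing `length` bytes at `byte_offset` does not cover whole blocks."""
    return byte_offset % blocksize != 0 or length % blocksize != 0


# Hypothetical example: a 1 KB structure stored at byte offset 5 KB.
offset, length = 5 * KB, 1 * KB
for blocksize in (1 * KB, 4 * KB):
    block, within = locate(offset, blocksize)
    partial = needs_partial_block_write(offset, length, blocksize)
    print(blocksize, block, within, partial)
```

With 1 KB blocks the structure is exactly block 5 and can be written on its own. With 4 KB blocks it sits 1 KB into block 1, so it can only be written as part of a larger block, which is the situation the blocksize switch works around.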

Eliminate the .atime file?

Currently, the atime and the LinLog inode version number (needed for efficient cleaning) for each inode are stored in a separate file. However, it still needs to be checked whether the usage pattern for this file is a sparse one. Furthermore, all the information stored in this file could also be put in data checkpoints, so that updating the inode file itself every now and then is probably the better way...

Make usage of data checkpoints more general

Use data checkpoints in a way similar to how journaling filesystems use their journal. This could minimize the need for cascaded metadata updates...

Take the iusage and atime inodes out of the ifile

To save one block per checkpoint, these inodes could be put directly into the filesystem-specific part of a checkpoint.

Modularize Filesystem Personality Interface

Idea: Put the code for indirect block handling (basically hidden in ext2_getblk) and for directory access into separate modules, and save a version number for these modules in the on-disk data structures.

This should allow flexible upgrades of these modules in the future (just revert to older modules for older filesystems)...
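A minimal sketch of how such version-based module selection might look is given below. Everything in it is an assumption for illustration only: the registry, the function names, and the two example directory lookup strategies are invented here and do not describe actual LinLogFS modules.

```python
import bisect
from typing import Callable, Dict, List, Optional, Tuple

DirEntries = List[Tuple[str, int]]  # (file name, inode number)
Lookup = Callable[[DirEntries, str], Optional[int]]

# Hypothetical registry: on-disk module version -> directory lookup code.
_DIR_MODULES: Dict[int, Lookup] = {}


def register_dir_module(version: int) -> Callable[[Lookup], Lookup]:
    """Decorator that registers a directory module under a version number."""
    def wrap(fn: Lookup) -> Lookup:
        _DIR_MODULES[version] = fn
        return fn
    return wrap


@register_dir_module(1)
def lookup_linear(entries: DirEntries, name: str) -> Optional[int]:
    """Version 1: unsorted entries, linear scan."""
    for entry_name, ino in entries:
        if entry_name == name:
            return ino
    return None


@register_dir_module(2)
def lookup_sorted(entries: DirEntries, name: str) -> Optional[int]:
    """Version 2: entries sorted by name, binary search."""
    i = bisect.bisect_left(entries, (name,))
    if i < len(entries) and entries[i][0] == name:
        return entries[i][1]
    return None


def dir_module_for(on_disk_version: int) -> Lookup:
    """Pick the module matching the version number saved on disk."""
    try:
        return _DIR_MODULES[on_disk_version]
    except KeyError:
        raise ValueError("no directory module for version %d" % on_disk_version)


entries = [("a", 11), ("b", 12), ("c", 13)]  # sorted, so both versions work
old_fs_lookup = dir_module_for(1)
new_fs_lookup = dir_module_for(2)
```

An older filesystem whose on-disk structures carry version 1 keeps using `lookup_linear`, while a newer one selects `lookup_sorted`, without either having to know about the other.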

Re-think the Layout Within a Segment

New write semantics (write barriers), both for more generic block device I/O reordering and for DRBD, should be taken into account...

Christian Czezatke, email: