Tired of fscking? Try a journaling filesystem!

One of the most-anticipated of recent Linux developments is the availability of journaling filesystems. In today's editorial, Philipp Tomsich provides an overview of the alternatives and his thoughts on which you should consider using, depending on your needs.

Journaling filesystems

Waiting for a fsck to complete on a server system can tax your patience more than it should. Fortunately, a new breed of filesystem is coming to your Linux machine soon. Journaling filesystems maintain a special file called a log (or journal), the contents of which are not cached. Whenever the filesystem is updated, a record describing the transaction is added to the log. An idle thread processes these transactions, writes data to the filesystem, and flags each processed transaction as completed. If the machine crashes, the background process is run on reboot and simply finishes copying updates from the journal to the filesystem. Incomplete transactions in the journal file are discarded, so the filesystem's internal consistency is guaranteed.

This cuts the complexity of a filesystem check by a couple of orders of magnitude. A full-blown consistency check is never necessary (in contrast to ext2fs and similar filesystems) and restoring a filesystem after a reboot is a matter of seconds at most.
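The logging and recovery cycle described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the on-disk format of any of the filesystems discussed here: the names `Transaction`, `JournaledStore`, `log`, and `replay` are hypothetical, and a dictionary stands in for the disk.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    """One journal record: a set of block updates plus two status flags."""
    txid: int
    updates: dict            # block number -> new contents
    committed: bool = False  # the whole record reached the log before the crash
    completed: bool = False  # the updates have been copied to the filesystem


@dataclass
class JournaledStore:
    """Toy model: `blocks` is the filesystem proper, `journal` is the log."""
    blocks: dict = field(default_factory=dict)
    journal: list = field(default_factory=list)

    def log(self, txid, updates, committed=True):
        # Record the intended change in the journal before touching the blocks.
        self.journal.append(Transaction(txid, dict(updates), committed))

    def replay(self):
        """Finish committed transactions and discard incomplete ones."""
        applied = 0
        for tx in self.journal:
            if tx.committed and not tx.completed:
                self.blocks.update(tx.updates)
                tx.completed = True
                applied += 1
        # Everything left is either completed or incomplete; drop it all.
        self.journal.clear()
        return applied


# Simulated crash: one record fully logged, one cut off halfway.
store = JournaledStore(blocks={1: "old"})
store.log(1, {1: "new", 2: "data"})
store.log(2, {3: "partial"}, committed=False)
applied = store.replay()
print(store.blocks)  # {1: 'new', 2: 'data'}
```

Recovery only walks the journal, never the whole volume, which is why its cost depends on the number of outstanding transactions rather than on the size of the disk.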

The players

Today, at least four major players exist in the Linux journaling filesystem arena. They are in various stages of completion, with some of them becoming ready for use in production systems. They are ReiserFS, XFS, JFS, and ext3fs.

Each offers distinct advantages. A detailed technical comparison is available from issue 55 of Linux Gazette.

Most of the available options provide support for dynamically extending the filesystems using a logical volume manager (such as LVM), which makes them perfect for large server installations.

ReiserFS

ReiserFS is a radical departure from the traditional Unix filesystems, which are block-structured. It will be available in the upcoming Red Hat 7.1 distribution and is already available in SuSE Linux 7.0.

Hans Reiser writes about the filesystem he designed: "In my approach, I store both files and filenames in a balanced tree, with small files, directory entries, inodes, and the tail ends of large files all being more efficiently packed as a result of relaxing the requirements of block alignment and eliminating the use of a fixed space allocation for inodes." The effect is that a wide array of common operations, such as filename resolution and file accesses, are optimized when compared to traditional filesystems such as ext2fs. Furthermore, optimizations for small files are well developed, reducing storage overheads due to fragmentation.

ReiserFS is not yet a true journaling filesystem (although full journaling support is currently under development). Instead, buffering and preserve lists are used to track all tree modifications, which achieves a very similar effect. This reduces the risk of filesystem inconsistencies in the event of a crash and thus provides rapid recovery on restart.

Besides offering rapid restart capability after a crash and efficient storage of large numbers of small files, it is the developers' intention to offer facilities to store objects much smaller than those that are normally saved as separate files. Future design plans include adding set-theoretic semantics, making it possible to retrieve files by specifying their attributes instead of an explicit pathname.

ReiserFS was the first of this new breed that managed to be included in the standard Linux kernel distribution, giving it a head start in building a user community.

XFS/Linux

When SGI needed a high performance and scalable filesystem to replace EFS in 1990, it developed XFS to handle the demands of increased disk capacity and bandwidth, and parallelism with new applications such as film, video, and large databases. These demands included extremely fast crash recovery, support for large filesystems, directories with large numbers of files, and fair performance with small and large files. Now SGI is contributing this technology to the Open Source community and is in the process of finalizing its port to Linux.

Technically, XFS is based on the use of B+ trees (similar to the use of balanced trees in ReiserFS) to replace the conventional linear file system structure. B+ trees provide an efficient way to index directory entries and manage file extents, free space, and filesystem metadata. This guarantees quick directory listing and file accesses. The allocation of disk blocks to inodes is done dynamically, which means that you no longer need to create a filesystem with smaller block sizes for your mail server; your filesystem will handle this automatically for you. XFS is also a 64-bit filesystem, which theoretically allows the creation of files that are a few million terabytes in size, which compares favorably to the limitations of 32-bit filesystems. The ability to attach free-form metadata tags to files on an XFS volume is yet another useful feature of this filesystem.
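To put the "few million terabytes" figure in perspective, here is a rough back-of-the-envelope calculation. It assumes file offsets are stored as signed integers, so one bit is lost to the sign; that is an assumption made for illustration, and the limit you actually get on a given system also depends on the kernel and the tools. The helper name `max_offset_bytes` is hypothetical.

```python
TERABYTE = 2 ** 40  # bytes in a binary terabyte
GIGABYTE = 2 ** 30  # bytes in a binary gigabyte


def max_offset_bytes(offset_bits, signed=True):
    """Largest byte count addressable with a file offset of the given width."""
    usable_bits = offset_bits - 1 if signed else offset_bits
    return 2 ** usable_bits


limit_32 = max_offset_bytes(32)  # signed 32-bit offset
limit_64 = max_offset_bytes(64)  # signed 64-bit offset

print(limit_32 // GIGABYTE, "gigabytes")  # 2 gigabytes
print(limit_64 // TERABYTE, "terabytes")  # 8388608 terabytes
```

Under these assumptions a 32-bit offset tops out at 2 gigabytes per file, while a 64-bit offset reaches roughly eight million terabytes.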

XFS also contains good support for multiprocessor machines. This is visible in the implementation of the page buffer subsystem, which uses an AVL tree which is kept separate from the objects to avoid locking problems and cache thrashing on larger SMP systems. Multithreaded operation has been a declared design goal of this filesystem and has been well tested in large multiprocessor IRIX systems worldwide.

The Linux port is still undergoing development and some features are still to be finalized. For example, loop-mounting a file containing an XFS volume does not yet work reliably. The X/Open data management API provided on IRIX is still incomplete in the Linux port, and guaranteed-rate I/O is so far an IRIX exclusive. Even now, XFS is more than just a viable alternative on Linux. I've personally used it for a few months on my own systems and have been very happy with its performance, which is at least on a par with ext2fs. Now that an installable CD image (based on the first CD of the Red Hat 7.0 distribution) is available for download, it will be even easier to enjoy the benefits of this filesystem. The user-level tools for filesystem creation, maintenance, and resizing are more functional and easier to use than their ReiserFS counterparts, mostly because they have been around far longer.

So why should one switch to XFS/Linux if ReiserFS will be readily available in Red Hat 7.1 and SuSE 7.0 (and it will be a while until XFS is equally well integrated into and supported by the major distributions)? The main factors are trust, robustness, and maturity. XFS has been deployed on IRIX systems since 1994 and has been used in a wide array of mission-critical applications. It's a proven technology, while ReiserFS and ext3fs are relatively new without offering too much new functionality.

JFS

IBM's JFS is a journaling filesystem used in its enterprise servers. It was designed for "high-throughput server environments, key to running intranet and other high-performance e-business file servers" according to IBM's Web site. Judging from the documentation available and the source drops, it will still be a while before the Linux port is completed and included in the standard kernel distribution.

JFS offers a sound design foundation and a proven track record on IBM servers. It uses an interesting approach to organizing free blocks by structuring them in a tree and using a special technique to collect and group contiguous runs of free logical blocks. Although it uses extents for a file's block addressing, extents are not used to maintain the free space. Small directories are supported in an optimized fashion (i.e., stored directly within an inode), although with different limitations than those of XFS. However, small files cannot be stored directly within an inode.
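The idea of collecting contiguous free blocks into groups can be illustrated with a small sketch. This is not JFS's actual technique, which the documentation describes as tree-based; it is a plain list-based illustration, and the function name `group_free_blocks` is hypothetical.

```python
def group_free_blocks(free_blocks):
    """Collapse free block numbers into (start, length) runs of contiguous blocks."""
    runs = []
    for block in sorted(set(free_blocks)):
        if runs and block == runs[-1][0] + runs[-1][1]:
            # The block directly follows the last run, so extend that run.
            start, length = runs[-1]
            runs[-1] = (start, length + 1)
        else:
            # There is a gap, so start a new run of length one.
            runs.append((block, 1))
    return runs


runs = group_free_blocks([7, 3, 4, 5, 20, 21, 9])
print(runs)  # [(3, 3), (7, 1), (9, 1), (20, 2)]
```

Keeping free space as runs rather than as individual blocks makes it cheap to find room for a file that should be stored in one piece.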

The port of JFS is an interesting project and will benefit the Linux community. However, it seems to be farther from being usable for production systems than its competitors.

ext3fs

ext3fs is an alternative for all those who do not want to switch their filesystem, but require journaling capabilities. It is distributed in the form of a kernel patch and provides full backward compatibility. It also allows the conversion of an ext2fs partition without reformatting and a reverse conversion to ext2fs, if desired.

However, using such an add-on to ext2fs has the drawback that none of the advanced optimization techniques employed in the other journaling filesystems is available: no balanced trees, no extents for free space, etc.

My personal opinion on ext3fs is that it is about to meet its fate with the availability of more powerful journaling filesystems. A handful of successful sites, such as RPMFind, use this filesystem, but it lacks the momentum that the others have.

Conclusion

With the increasing size of hard disks, journaling filesystems are becoming important to an ever-increasing number of users. If you have ever waited for a filesystem check on a machine with an 80GB hard disk, you know what I'm talking about. Even if you do not plan to reboot your system often, they can save you a lot of time and trouble if you experience a power failure or a hardware glitch. With the large number of contenders striving to become the de facto standard in the journaling filesystem space on Linux, we can look forward to interesting months as these filesystems' code bases mature, are integrated into the standard kernel, and are supported in upcoming releases of the major Linux distributions.

However, keep in mind that migrating to another filesystem is not a trivial task. It usually requires backing up your data, reformatting, and restoring the data onto the newly created volume. You should thoroughly evaluate your options before making the switch.

Recent comments

17 Feb 2001 15:32 sracer9

ReiserFS works for me.
I've been using ReiserFS for about the last 6
months and it's worked rather well for me.
Performance feels a bit snappier than ext2 and in
the event that something happens requiring a
not-quite-so-clean restart, the time to boot is
greatly reduced. Overall, I've been happy with
ReiserFS and would recommend it. Also, ReiserFS is
included in Mandrake since 7.0 (I believe - maybe
7.1).

17 Feb 2001 16:51 slask

What about NWFS & TUX2 ????
Why aren't they in the editorial?

17 Feb 2001 19:29 jjramsey

Define "proven technology"
"The main factor is trust, robustness, and maturity... XFS has been deployed on IRIX systems since 1994 and been used in a wide array of mission-critical applications. It's a proven technology, while ReiserFS and ext3fs are relatively new without offering too much new functionality. "

The problem with that statement is that while XFS has proven itself on IRIX, it is still in a tender state on Linux. It seems to me that looking at the relative progress of XFS and JFS, file systems are not easy to port from one OS to another, and that to get the file systems to work on another OS, some of the file systems' code needs to be replaced with new, green code. While that does not mean that XFS or JFS are inherently bad, it does mean that the track records of these file systems when they were used on the OSs for which they were originally designed don't have that much bearing on how well these file systems will work on Linux. That means that when one is talking about how mature these file systems are, it is, for the most part, only useful to talk about how well they perform on Linux.

Given that, it is not all that accurate to describe XFS on Linux as more mature than ReiserFS. Realistically, XFS on Linux is still beta.

17 Feb 2001 20:19 jeffcovey

Re: What about NWFS & TUX2 ????

> Why aren't they in the editorial?

He can only write what he knows. Tell us about them. :-)

17 Feb 2001 20:34 Sleuth

ReiserFS Availability
Reiser is also available as far back as SuSE 6.4. Whether or not it's production ready may be in question, but about a year ago there was traffic on the Reiser development email list about several people using it on mail servers and such. I'd suggest that it's more stable than the article implies (and has been under fairly heavy use.)

17 Feb 2001 20:44 blutgens1

There's also GFS
GFS is the Global File System. You can find out
more about it at http://www.sistina.com/gfs/. It's
not just a journaled filesystem, it's also cluster
aware. It's meant to be used on a SAN; you can
actually read and write to the same filesystem
from many nodes at once. It's really a great piece
of software; the latest release is 4.0.

17 Feb 2001 23:28 zoftie

Snappy, fragmentation horror?
Weird, my system works well, but I do not do
a lot of IO. My friend often burns MP3s onto
regular CDs, so large filesets have to be
normalised for volume. His hard drive is fragmented
into shreds, and he is using ReiserFS. So do I. I admit
using alpha software with Reiser is a blast because
of frequent reboots, but the performance loss on my
friend's computer is rather disconcerting to me.
He said it never happened with ext2. Ext2 got
corrupted more often though.

18 Feb 2001 05:42 slask

18 Feb 2001 10:49 leonbrooks

Re: ReiserFS Availability - production systems

> Reiser is also available as far back as
> SuSE 6.4. Whether or not it's
> production ready may be in question, but
> about a year ago there was traffic on
> the Reiser development email list about
> several people using it on mail servers
> and such. I'd suggest that it's more
> stable than the article implies (and has
> been under fairly heavy use.)

I've used ReiserFS on production systems for some
time now (most notably under Mandrake 7.2 since
setup is trivial) and the only time I had a
problem was with a semi-sane motherboard (it
*really* screwed up the data!). All else has been
plain sailing and no long fsck's (worst case so
far was 3 seconds on a 27GB drive).

19 Feb 2001 10:09 rullskidor

Re: Snappy, fragmentation horror?
I've got the impression that fragmentation isn't a
big deal with multiprocessing systems anyway. You
can't assume the same process will be able to read
a file from beginning to end since it will be
interrupted by other processes. You don't want
sequential files, you want them slightly fragmented.

Strange problem though

19 Feb 2001 13:39 LEgregius

Re: ReiserFS Availability - production systems

> % Reiser is also available as far back as SuSE 6.4. Whether or not it's
> % production ready may be in question, but about a year ago there was
> % traffic on the Reiser development email list about several people using
> % it on mail servers and such. I'd suggest that it's more stable than the
> % article implies (and has been under fairly heavy use.)
>
> I've used ReiserFS on production systems for some time now (most notably
> under Mandrake 7.2 since setup is trivial) and the only time I had a
> problem was with a semi-sane motherboard (it *really* screwed up the
> data!). All else has been plain sailing and no long fsck's (worst case so
> far was 3 seconds on a 27GB drive).

I agree. We have several servers running
ReiserFS, including our high-traffic mail server,
and it runs like a dream. I had some flaky
hardware on my home machine that was causing it to
crash pretty regularly for a while there. I've been
running ReiserFS there for a while too and I never
had longer than about a one-second fs check on a
9 GB drive.

I ran it on a laptop and someone attempted,
unsuccessfully, to reinstall it. They only got as
far as repartitioning it. I set the partitions
back and ended up with a corrupted drive. I ran
the full fs check for reiserfs and it fixed
everything. After fixing a few files that ended
up corrupted, it was running great again.

Also, the article forgot to mention that ReiserFS
has the ability to plug in different hashing
algorithms for different types of uses. This
could, in theory, be very useful.

20 Feb 2001 09:55 wesmo

Re: Define "proven technology"

> ...[XFS] is still in a tender state on
> Linux. While
> that does not mean that XFS or JFS are
> inherently bad, it does mean that the
> track records of these file systems when
> they were used on the OSs for which they
> were originally designed don't have that
> much bearing on how well these file
> systems will work on Linux. That means
> that when one is talking about how
> mature these file systems are, it is,
> for the most part, only useful to talk
> about how well they perform on Linux.
>
> Given that, it is not all that
> accurate to describe XFS on Linux as
> more mature than ReiserFS.
> Realistically, XFS on Linux is still
> beta.

Agreed! Often it is easier/safer to pull from one's experience (In this case, the author has close ties with SGI), and that can make it a bit easier to forget some of the smaller details: filesystem access at the kernel level is coded specifically to the architecture it is on.

To reiterate the above quoted message: you have to consider the length of time it has existed on that architecture in order to properly measure what is proven technology and what is still alpha/beta.

21 Feb 2001 11:08 adamjacobmuller

Ok so i did it.
I reformatted and upgraded to ReiserFS. I decided on ReiserFS because 1. it sounds the best of all and 2. it's natively supported by mandrake linux, my os of choice. not that that was a big deal, as my first task now that i'm back in linux is to rebuild with the new 2.4.1-ac20 kernel. the only problem i encountered was that it refused to reformat my /boot partition. it's only 30 megs in size, so it's not like i really care when i have to fsck it. i'm more concerned about the others, which combined can take almost an hour (i have a 32 gb hard drive). with that in mind, ReiserFS was perfect for me. now i am wondering if there is a way to go from ext2 to ReiserFS without destroying my data. it wasn't a problem on my laptop (that was the computer i just did); i could transfer the relatively little data to other computers. but my other computers have so much data that i don't have the disk space to convert them. some of these computers virtually never fsck because of an invalid reboot, but when they do, they have spent 3 hours because i have massive hard drive space on them (need to put mp3's somewhere) and they are not that fast. they really don't need to be. but overall this first experience with a journaling FS is cool. i'm going to cold boot my computer now and see what happens.

21 Feb 2001 11:49 Aredhead

Re: Ok so i did it.

> i encountered was that it refused to
> reformat my /boot partition .. it's only
> 30 megs in size so it's not like i
> really care when i have to fsck it.

The reason is that the log for ReiserFS takes about 30 MB, so if you put ReiserFS on a partition of about 30 MB, there wouldn't be any room left for data.

> i'm more concerned about the others which
> combined can take almost an hour ( i
> have a 32 gb hard drive.) with that in
> mind ReiserFS was perfect for me. now i
> am wondering if there is a way to go
> from ext2 to ReiserFS without destroying
> my data. it wasnt a problem on my laptop

The problem with ReiserFS is that you will have to reformat the partitions, so I would recommend transferring the data onto other computers while you're converting to ReiserFS.

24 Feb 2001 13:03 headbulb

Re: Ok so i did it.
If you have that many mp3's SHARE

25 Feb 2001 07:02 mandree

ReiserFS not yet ready for prime time, ext3fs "works for me"
I've been using ReiserFS 3.5.28 and newer on Linux 2.2.18 now for quite some time, and it has consistently failed to work reliably when NFS has come into play. I usually get files inaccessible on a client every other day, while the server logs vs-13048 trouble to find inode and the server has the same file also inaccessible until ls -laR / is done. NFS and ReiserFS need some more work.
While I've never suffered from permanent file system corruption, ReiserFS is not my favourite. The ReiserFS team themselves agree that their 3.5 fsck tools which are supposed to repair FS damage are far from mature, there are also reports that failing disk blocks can render a whole file system unusable. Plus, there is no dump (NOTE: dumpreiserfs is NOT a backup program, but just reports meta data!), so backups will have to be done by tar which is a lengthy process. Keep away from it for NFS servers, decide on your own if you want to trust more data than an easily recovered /usr partition to it. I'm in the process of getting rid of ReiserFS for /export/home-style directories for now.
As a side note, you can get the Adstar or Distributed Storage Manager backup systems to work on ReiserFS when you define the file system mount point as virtualmountpoint.
On the other hand, I used ext3fs 0.0.2X (don't recall which letter) last summer for two months on a development machine at work. It never gave me any trouble, and it saved me a lot of time on a machine that would lock up for no apparent reason until I exchanged all of its memory modules. I cannot comment on whether ext3fs will play nicely with NFSv3, but will try soon.

28 Feb 2001 14:49 jdanield

reiser
I have used Reiser since SuSE 6.4 (approx. a year now), and am very satisfied. I note that erasing a bunch of files is far faster than with ext2.

However, there are problems booting from Reiser and I prefer keeping a small /boot partition with ext2.

One must note that there is no Windows utility to read ReiserFS. It's sometimes a problem, but can also be a good thing.

jdd

12 Mar 2001 08:19 stic

reiser on large partitions is a great relief
I have used reiser for 9 months on kernel 2.2.14 and have no reason to complain.

Mostly my /reiser has to store lots of medium-sized files (100k), which caused a terribly long fsck when on ext2. reiser solved that problem for me.

I cannot confirm the alleged storage economy for small files, though. One of my applications generates files with only a few hundred bytes of content. 20000 of them contain 5 MB (= 250 byte average) but du -sk says they occupied 80 MB (= 4 Kbyte average).

Maybe I misconfigured (but what?) or du and ReiserFS are badly coordinated on my SuSE 6.4 ... shrug, it's not a big drawback for me.
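The figures in this comment are what you would expect if every file were charged one whole block. The sketch below reproduces the arithmetic, assuming a 4096-byte block size and no packing of small files; both are assumptions made for illustration, not a diagnosis of this particular system, and the helper name `allocated_bytes` is hypothetical.

```python
import math


def allocated_bytes(file_sizes, block_size):
    """Disk space used when every file is rounded up to whole blocks."""
    return sum(math.ceil(size / block_size) * block_size for size in file_sizes)


file_sizes = [250] * 20000  # 20000 files of 250 bytes each
payload = sum(file_sizes)   # bytes of actual content
on_disk = allocated_bytes(file_sizes, block_size=4096)

print(payload)          # 5000000
print(on_disk // 1024)  # 80000, the figure a du -sk style report would show
```

If the space really were allocated this way, it would point to small-file packing not being in effect on that volume, or to du counting whole blocks regardless of how the data is stored.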

05 May 2001 20:52 tsikora

Re: ReiserFS not yet ready for prime time, ext3fs "works for me"
No problems here. Been running ReiserFS on Slack-current
since 2.2.14 with not one glitch. In fact I have found more problems on FFS/softupdates in FreeBSD than Linux. (actually only twice) I have it installed on a bunch of Slack production servers. I highly recommend it. It's the best thing that has happened to Linux in a while. Thanks Hans.

01 Aug 2001 18:08 hodeleri

Tired of fscking? Try FFS with softupdates

Well, OK, so it doesn't completely remove fscking, but your disks can be brought up immediately after a crash and checked in the background.

For more information, see Kirk McKusick's (the author's) site (http://www.mckusick.com/softdep/); a paper on softupdates (http://www.usenix.org/publications/library/proceedings/usenix2000/general/seltzer.html) is also available.

(Available on *BSD)

10 Aug 2001 10:12 dgunia

Re: ReiserFS Availability - production systems

I think ReiserFS still has some errors. I have a reproducible problem here: when I convert an RPM package of the commercial software sniff++ to a Debian package using alien, the computer slows down more and more. I managed to shut down the system because working was no longer possible, and it could not unmount the filesystem. After a reboot I had some files in my /tmp directory that I could not delete any more (I was root and had all rights, it was definitely a file system problem). I tried this on my notebook and my desktop PC, had these problems both times, and could not convert this package. Both were on ReiserFS.

Then I changed my filesystem to XFS and tried it again (or on my desktop-pc tried to convert it on a partition with XFS) and I had no problems!

So there IS an error in reiserfs.

Right now we are trying to rescue 50GB of data that were on a reiserfs partition that got some bad sectors. reiserfsck says everything is ok, but when one tries to mount this partition, one gets an "Oops" and the mount process hangs.

And we had more of these problems on different computers. So now we will try XFS, both of my computers already use it as root file system and I have no problems yet :)
