Posted on March 1, 2008
A good-natured ext3cow user named Nicholas finally nailed a problem that has been perplexing at least five people for the better part of a year. Under heavy usage, ext3cow would crash with a nasty journal abort. No damage was ever done, and simply remounting fixed it, but it was impossible to build your code on the same file system that was managing revisions.
Zachary Peterson (its creator), Scott Shinn, I and others have been poring over the ext3cow implementation to find something wrong. Nothing was wrong with ext3cow (the fs); a very simple bug in the patched e2fsprogs was the culprit. As Nicholas writes:
I’ve investigated the problem of journal aborts on ext3cow. When using ext3cow under not-so-heavy disk load, I always get messages like journal_bmap: journal block not found at offset 12 on loop0. This problem appears when the journal tries to reach its first indirect block. I’ve tried to remove the journal (tune2fs -O ^has_journal dev), then add it back using tune2fs -O has_journal dev. With an original (unmodified for ext3cow) version of e2fsprogs, this trick worked and allowed me to use ext3cow normally. In fact the mistake is not in the ext3cow implementation, but in the modification of e2fsprogs. While creating the filesystem, no indirect (nor doubly indirect and triply indirect) blocks are allocated for the journal. These blocks are not allocated due to the replacement of: ctx->fs->blocksize >> 2 by: ctx->fs->blocksize >> 2 - ((ctx->fs->blocksize >> 2)/32), which is evaluated to 0 due to the lack of parentheses. The correct expression is: (ctx->fs->blocksize >> 2) - ((ctx->fs->blocksize >> 2)/32). Replacing all the occurrences and rebuilding e3cfsprogs worked and solved the problem.
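To see why the missing parentheses matter: in C, the additive operators bind tighter than the shift operators, so a >> 2 - b parses as a >> (2 - b), not (a >> 2) - b. The sketch below is only an illustration, not the actual e2fsprogs code; the function names are hypothetical, and the block size is passed in as a plain integer instead of being read from ctx->fs->blocksize. It computes the value the fix intends, and separately the shift count that the unparenthesized form ends up using.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of 4-byte block pointers that fit in one
 * block, which is what blocksize >> 2 computes. */
static inline uint32_t addrs_per_block(uint32_t blocksize)
{
    assert(blocksize >= 1024); /* assumed: ext2/ext3 block sizes start at 1024 */
    return blocksize >> 2;
}

/* The corrected expression from the quote:
 * (blocksize >> 2) - ((blocksize >> 2) / 32) */
static inline uint32_t intended_limit(uint32_t blocksize)
{
    uint32_t n = addrs_per_block(blocksize);
    return n - n / 32;
}

/* What the unparenthesized form really asks for:
 * blocksize >> (2 - ((blocksize >> 2) / 32)).
 * This returns only the shift count. The shift itself is not performed
 * here, because shifting by a negative count is undefined behavior in C. */
static inline int32_t buggy_shift_count(uint32_t blocksize)
{
    return 2 - (int32_t)(addrs_per_block(blocksize) / 32);
}
```

For a 4096-byte block, intended_limit gives 1024 - 32 = 992, while buggy_shift_count gives 2 - 32 = -30. Since a negative shift count is undefined, the result depends on the compiler and CPU, which fits the 0 that Nicholas observed.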
He was right. After correcting that expression in several places, I’ve found ext3cow extremely hard to break.
Oh happy day