Message ID | AANLkTimAe-p1-ApmnueybQY474J3_G9JL+8WJ2WTn+04@mail.gmail.com |
---|---|
State | New, archived |
Headers | show |
Am I missing something? The kernel stores up to 3k worth of data, on a 4k block file system. Whereas e2fsck patch blindly assume 2k worhth of data regardless of the block size. The kernel patch looks ok, but the e2fsprogs patch seems badly broken.... - Ted -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Also, the kernel patch seems to write the messages in 256-byte records, whereas the e2fsck patch assumes the messages are packed together in null-terminated packed lines.... - Ted -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Jan 7, 2011 at 7:26 AM, Ted Ts'o <tytso@mit.edu> wrote: > Am I missing something? The kernel stores up to 3k worth of data, on > a 4k block file system. Whereas e2fsck patch blindly assume 2k worhth > of data regardless of the block size. The kernel patch looks ok, but > the e2fsprogs patch seems badly broken.... > > - Ted > I thought that ext4 super block on 4K block is 1K at offset 1K, so kernel only uses the remaining 2K. It may very well be that e2fsck patch only works with 4K block size properly, because I never tested it on other block sizes. Amir. -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Jan 7, 2011 at 7:28 AM, Ted Ts'o <tytso@mit.edu> wrote: > Also, the kernel patch seems to write the messages in 256-byte > records, whereas the e2fsck patch assumes the messages are packed > together in null-terminated packed lines.... > > - Ted > I guess that is the reason I always see only the first message on fsck :-) It was always enough information for me though... feel free to amend to e2fsck patch to match the kernel side. Thanks, Amir. -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Jan 7, 2011 at 9:43 PM, Amir Goldstein <amir73il@gmail.com> wrote: > On Fri, Jan 7, 2011 at 7:28 AM, Ted Ts'o <tytso@mit.edu> wrote: >> Also, the kernel patch seems to write the messages in 256-byte >> records, whereas the e2fsck patch assumes the messages are packed >> together in null-terminated packed lines.... I checked again. e2fsck patch DOES read messages in 256-byte records: +static void e2fsck_print_message_buffer(e2fsck_t ctx) ... +#define MSGLEN 256 ... + fputs(buf+offset, stdout); + offset += MSGLEN; >> >> - Ted >> > > I guess that is the reason I always see only the first message on fsck :-) > It was always enough information for me though... > feel free to amend to e2fsck patch to match the kernel side. > > Thanks, > Amir. > -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Jan 7, 2011 at 9:41 PM, Amir Goldstein <amir73il@gmail.com> wrote: > On Fri, Jan 7, 2011 at 7:26 AM, Ted Ts'o <tytso@mit.edu> wrote: >> Am I missing something? The kernel stores up to 3k worth of data, on >> a 4k block file system. Whereas e2fsck patch blindly assume 2k worhth >> of data regardless of the block size. The kernel patch looks ok, but >> the e2fsprogs patch seems badly broken.... So it's not badly broken, it copies blocksize-2K, which is clumsily written like this: + int len = ctx->fs->blocksize - 2*SUPERBLOCK_OFFSET; After that, only 4K block and 8K block will have a leftover, which will be copied from journal sb+1K to ext4 sb+1K. Yes, 2K blocks - no message buffer for you! The reason I am only copying 2K and throwing the extra K is this: print_message_buffer() is called from check_super_block(), *after* journal recovery, which was executed either by e2fsck itself or (and more likely in a errors=remount-ro situation) by ext4 on read-only mount. In the latter case, the extra K must be discarded, so I saw no reason to write special code for the first case. Neither did I find a good reason to complicate the recording code and limit it to record only blocksize-2K. >> >> - Ted >> > > I thought that ext4 super block on 4K block is 1K at offset 1K, so > kernel only uses the remaining 2K. > > It may very well be that e2fsck patch only works with 4K block size properly, > because I never tested it on other block sizes. > > Amir. > -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Jan 7, 2011 at 11:07 PM, Amir Goldstein <amir73il@gmail.com> wrote: > On Fri, Jan 7, 2011 at 9:41 PM, Amir Goldstein <amir73il@gmail.com> wrote: >> On Fri, Jan 7, 2011 at 7:26 AM, Ted Ts'o <tytso@mit.edu> wrote: >>> Am I missing something? The kernel stores up to 3k worth of data, on >>> a 4k block file system. Whereas e2fsck patch blindly assume 2k worhth >>> of data regardless of the block size. The kernel patch looks ok, but >>> the e2fsprogs patch seems badly broken.... > > So it's not badly broken, it copies blocksize-2K, which is clumsily > written like this: > + int len = ctx->fs->blocksize - 2*SUPERBLOCK_OFFSET; > > After that, only 4K block and 8K block will have a leftover, > which will be copied from journal sb+1K to ext4 sb+1K. > Yes, 2K blocks - no message buffer for you! > > The reason I am only copying 2K and throwing the extra K is this: > print_message_buffer() is called from check_super_block(), > *after* journal recovery, which was executed either by e2fsck itself > or (and more likely in a errors=remount-ro situation) by ext4 > on read-only mount. > In the latter case, the extra K must be discarded, so I saw no reason > to write special code for the first case. > Neither did I find a good reason to complicate the recording code > and limit it to record only blocksize-2K. > > Ted, I have a suggestion how to use the wasted extra K. As I pointed out in the past, the first/last_error_xxx statistics are most likely to be lost in errors=panic and errors=remount-ro (journal recovery will override super block) If you record this information in the last K of journal sb (even copy the entire ext4 sb), you can then override ext4 sb with the most up-to-date error stats after journal recovery. Amir. -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Jan 07, 2011 at 11:07:23PM +0200, Amir Goldstein wrote: > On Fri, Jan 7, 2011 at 9:41 PM, Amir Goldstein <amir73il@gmail.com> wrote: > > On Fri, Jan 7, 2011 at 7:26 AM, Ted Ts'o <tytso@mit.edu> wrote: > >> Am I missing something? The kernel stores up to 3k worth of data, on > >> a 4k block file system. Whereas e2fsck patch blindly assume 2k worhth > >> of data regardless of the block size. The kernel patch looks ok, but > >> the e2fsprogs patch seems badly broken.... > > So it's not badly broken, it copies blocksize-2K, which is clumsily > written like this: > + int len = ctx->fs->blocksize - 2*SUPERBLOCK_OFFSET; So this should be: int len = ctx->fs->blocksize - SUPERBLOCK_OFFSET - sizeof (<superblock>); Although those two numbers are equal right now, there is no reason to assume that they will remain so in the future. So if the superblock size (or the offset) changes in the future, it's much better to have programmed this so that it will keep on working as opposed to getting to deal with ugly bugs in code that hasn't changed in years... Roger.
On Sat, Jan 8, 2011 at 12:12 AM, Amir Goldstein <amir73il@gmail.com> wrote: > On Fri, Jan 7, 2011 at 11:07 PM, Amir Goldstein <amir73il@gmail.com> wrote: >> On Fri, Jan 7, 2011 at 9:41 PM, Amir Goldstein <amir73il@gmail.com> wrote: >>> On Fri, Jan 7, 2011 at 7:26 AM, Ted Ts'o <tytso@mit.edu> wrote: >>>> Am I missing something? The kernel stores up to 3k worth of data, on >>>> a 4k block file system. Whereas e2fsck patch blindly assume 2k worhth >>>> of data regardless of the block size. The kernel patch looks ok, but >>>> the e2fsprogs patch seems badly broken.... >> >> So it's not badly broken, it copies blocksize-2K, which is clumsily >> written like this: >> + int len = ctx->fs->blocksize - 2*SUPERBLOCK_OFFSET; >> >> After that, only 4K block and 8K block will have a leftover, >> which will be copied from journal sb+1K to ext4 sb+1K. >> Yes, 2K blocks - no message buffer for you! >> >> The reason I am only copying 2K and throwing the extra K is this: >> print_message_buffer() is called from check_super_block(), >> *after* journal recovery, which was executed either by e2fsck itself >> or (and more likely in a errors=remount-ro situation) by ext4 >> on read-only mount. >> In the latter case, the extra K must be discarded, so I saw no reason >> to write special code for the first case. >> Neither did I find a good reason to complicate the recording code >> and limit it to record only blocksize-2K. >> >> > > Ted, > > I have a suggestion how to use the wasted extra K. > > As I pointed out in the past, the first/last_error_xxx statistics are most > likely to be lost in errors=panic and errors=remount-ro (journal > recovery will override super block) > If you record this information in the last K of journal sb (even copy > the entire ext4 sb), > you can then override ext4 sb with the most up-to-date error stats > after journal recovery. > > Amir. > Ted, I just realize you did implement save&update of super block s_error_xxx fields. However, I wonder if it is not a bug to call ext4_commit_super() from save_error_info() to commit the s_error_xxx fields in the first place. The super block buffer is most likely participating in the current transaction and should not be committed to disk. The alleged bug is likely to be hidden by the fact that the super block has most likely participated also in the last committed transaction, so a valid version of it will most likely override the invalid version. I can imagine there could be a corner case, though, when committing the super block a head of transaction commit will result in inconsistencies. Am I missing something? Amir. -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Jan 08, 2011 at 09:05:20AM +0100, Rogier Wolff wrote: > Although those two numbers are equal right now, there is no reason to > assume that they will remain so in the future. So if the superblock > size (or the offset) changes in the future, it's much better to have > programmed this so that it will keep on working as opposed to getting > to deal with ugly bugs in code that hasn't changed in years... No. The superblock nor its offset will never change. It's like the syscall ABI, only worse. If we changed it would break *everybody*. Fortunately there is a huge amount of space left over in the 1024 byte superblock. - Ted -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Jan 08, 2011 at 05:00:59PM -0500, Ted Ts'o wrote: > On Sat, Jan 08, 2011 at 09:05:20AM +0100, Rogier Wolff wrote: > > Although those two numbers are equal right now, there is no reason to > > assume that they will remain so in the future. So if the superblock > > size (or the offset) changes in the future, it's much better to have > > programmed this so that it will keep on working as opposed to getting > > to deal with ugly bugs in code that hasn't changed in years... > > No. The superblock nor its offset will never change. It's like the > syscall ABI, only worse. If we changed it would break *everybody*. > Fortunately there is a huge amount of space left over in the 1024 byte > superblock. It's called defensive programming. It prevents bugs before they happen. By your reasoning you could've written 2048 or 0x800 there. My version: - documents why something is subtracted from the blockszize and how much. - keeps on working even if the superblock would oddly change size in the future. Even if now you don't expect that to happen ever. Roger.
On Sun, Jan 09, 2011 at 09:12:49AM +0100, Rogier Wolff wrote: > > No. The superblock nor its offset will never change. It's like the > > syscall ABI, only worse. If we changed it would break *everybody*. > > Fortunately there is a huge amount of space left over in the 1024 byte > > superblock. > > It's called defensive programming. It prevents bugs before they > happen. By your reasoning you could've written 2048 or 0x800 there. Defensive programming would be something like BUG_ON(sizeof(struct ext4_super_block) != 1024); (unfortunately #error sizeof(struct ext4_super_block) != 1024 won't work since #error is handled by the preprocessor, and I don't think we can trigger a compile-time warning for a structure size issue). We could add that, if people like. I do have regression tests (i.e., boot a system with ext4) which would die if anything like that changed, though. And yes, I have similar regression tests in e2fsprogs that would trigger if the superblock size were to ever change. - Ted P.S. The only way I can think of to do it at compile time would be to build a test .o file with -g, and then use a program like pahole that pulls the information out of the DWARF information. Might actually be a good thing to do that, since it could also be useful for automating searches for unoptimize structures. Unfortunately, many developers don't have the DWARF utilities installed, so that would add a dependency on the kernel build. -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2011-01-09, at 07:58, Ted Ts'o wrote: > Defensive programming would be something like > > BUG_ON(sizeof(struct ext4_super_block) != 1024); > > (unfortunately #error sizeof(struct ext4_super_block) != 1024 won't > work since #error is handled by the preprocessor, and I don't think we > can trigger a compile-time warning for a structure size issue). We actually use a compile-time assertion in the Lustre code: /* * compile-time assertions. @cond has to be constant expression. * ISO C Standard: * * 6.8.4.2 The switch statement * * .... * * [#3] The expression of each case label shall be an integer * constant expression and no two of the case constant expressions * in the same switch statement shall have the same value after * after conversion... */ #define CLASSERT(cond) ({ switch(42) { case (cond): case 0: break; } }) Of course, the "cond" expression must be resolvable at compile time, or it will turn into a compile error because it is not legal to have a variable "case". Cheers, Andreas -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, Jan 09, 2011 at 09:58:38AM -0500, Ted Ts'o wrote: > On Sun, Jan 09, 2011 at 09:12:49AM +0100, Rogier Wolff wrote: > > > No. The superblock nor its offset will never change. It's like the > > > syscall ABI, only worse. If we changed it would break *everybody*. > > > Fortunately there is a huge amount of space left over in the 1024 byte > > > superblock. > > > > It's called defensive programming. It prevents bugs before they > > happen. By your reasoning you could've written 2048 or 0x800 there. > > Defensive programming would be something like > > BUG_ON(sizeof(struct ext4_super_block) != 1024); It is one form of "defense", but not what I call defensive programming. Defensive programming, is that you make things robust in the the face of unexpected changes. If you do the BUG_ON and do this throughout the code, one day your grandson will be increasing the superblock size. He'll fix all the BUG_ON and your other "defensive" measures. But lo and behold. He's human, and forgot one or two. Especially the run-time detections that only get called occasionally (like in this case on an error sitatuation) might take a while before they are noticed. What use is it to turn a "we've found a serious error in your filesystem, we strongly recommend you no longer write to it and run fsck first" into a "system halted"? What is wrong with just putting the right formula where it belongs? We need to set the variable "len" to the amount of free space beyond the superblock in the first block of the filesystem. So we take the size of the first block, subtract the size of the superblock and we subtract the start of the superblock. It's as simple as that. > We could add that, if people like. I do have regression tests (i.e., > boot a system with ext4) which would die if anything like that > changed, though. How about Makefile: ext4.o: ...the objects.... testsbsize.out testsbsize.out: testsbsize ./testsbsize (oh and something about useing "hostcc" for testsbsize). with testsbsize.c: #include <stdio.h> #include <...ext4....> int main (int argc, char **argv) { if (sizeof (struct ext4_super_block) != 1024) { fprintf (stderr, "Superblock size is %d, should be 1024.\n", sizeof (struct ext4_super_block)); exit (1); } exit (0); } Roger.
diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 7728a4c..8888815 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -290,10 +290,53 @@ int __ext4_journal_stop(const char *where, unsigned int line, handle_t *handle) return err; } +/* record error messages after journal super block */ +static void ext4_record_journal_err(struct super_block *sb, const char *where, + const char *function, const char *fmt, va_list args) +{ +#define MSGLEN 256 + journal_t *journal = EXT4_SB(sb)->s_journal; + char *buf; + unsigned long offset; + int len; + + if (!journal) + return; + + buf = (char *)journal->j_superblock; + offset = (unsigned long)buf % sb->s_blocksize; + buf += sizeof(journal_superblock_t); + offset += sizeof(journal_superblock_t); + + /* seek to end of message buffer */ + while (offset < sb->s_blocksize && *buf) { + buf += MSGLEN; + offset += MSGLEN; + } + + if (offset+MSGLEN > sb->s_blocksize) + /* no space left in message buffer */ + return; + + len = snprintf(buf, MSGLEN, "%s: %s: ", where, function); + len += vsnprintf(buf+len, MSGLEN-len, fmt, args); +} + +static void ext4_record_journal_errstr(struct super_block *sb, + const char *where, const char *function, ...) +{ + va_list args; + + va_start(args, function); + ext4_record_journal_err(sb, where, function, "%s\n", args); + va_end(args); +} + void ext4_journal_abort_handle(const char *caller, unsigned int line, const char *err_fn, struct buffer_head *bh, handle_t *handle, int err) { + struct super_block *sb = handle->h_transaction->t_journal->j_private; char nbuf[16]; const char *errstr = ext4_decode_error(NULL, err, nbuf);