Message ID | AANLkTimLAL7hPL+Yb392hyVAWEfHtObv7WwjzKt2awur@mail.gmail.com |
---|---|
State | Not Applicable, archived |
On Sat, Feb 26, 2011 at 10:40:19PM +0200, Amir Goldstein wrote:
> This patch skips the orphan cleanup if readonly compatible features
> would prevent the fs from being mounted (or remounted) readwrite.

I use the "mount readonly" option to, for instance, view/check the filesystem to determine whether or not I need to fsck first. I use the "readonly" feature to prevent the mounting to be a mistake-prone situation. It prevents e.g. applications from dropping temporary files in my current directory.

Every time fsck or such a cleanup does something, there is the option of the cleanup or fixup being wrong. When you honour the "readonly" request from the user, the careful user can go back to the situation where he/she started.

If the cleanup/fixup is really necessary, do so in in-core buffers of the filesystem. Write the infrastructure that allows us to have dirty buffers that MAY NOT (yet?!?) be written to the device. This will also solve the problem of journal recovery on readonly mount of a root filesystem. When it has been fscked, and it's remounted rw, we can remove the ban on the writeback of the dirty buffers.

So I strongly disagree with your patch: It should not only prevent the cleanup when writing is not allowed due to ro-compat situation, but also when requested by the user.

Roger.

------------------

Back in the old days I was still using minix. Linux didn't exist or wasn't usable enough. I had something that needed removing, so I typed rm -rf *, thinking I was in the directory that needed removing. I wasn't! There went my (modified) kernel tree! It took me some three seconds to find, verify and execute the solution: Powerswitch.

In such an incident, cleaning up inodes that you think were deleted anyway removes information about files that may need recovery.

Imagine that accidentally my big database file was unlinked. Ooops. But the database server is still keeping the file open. Phew. We can continue to run until we find a solution.... So the inode is now orphaned. But we can recover it with some filesystem magic. Maybe not by answering yes to fsck questions, but it is recoverable without data loss, right?

Then the power goes out... Ooops. Instead of two more days to get everything ready for the recovery we have to do it NOW. There goes. Boot from CD, let's just mount the partition readonly to get access to our tools and binaries that may facilitate the recovery of our database. BAM! Away goes the nicely allocated inode. (ext2 used to just mark the inode as not in use, ext3 cleans it up so that we lose the pointers to the 7 datablocks, the indirect block and double indirect block!) Now we will have to guess where the file started etc etc.

The principle is: Do as I say. That keeps things predictable. If you try to outguess the user, it will be horribly wrong every once in a while.

You're right. In 99% of the cases, the system just crashed/rebooted while some temporary files were still open, but already deleted. And in 99% of the cases, the system will boot, perform an automatic fsck and eventually remount rw. So writing those orphaned inodes back early doesn't make a real difference.
On Mon, Feb 28, 2011 at 12:14 PM, Rogier Wolff <R.E.Wolff@bitwizard.nl> wrote:
>
> On Sat, Feb 26, 2011 at 10:40:19PM +0200, Amir Goldstein wrote:
>> This patch skips the orphan cleanup if readonly compatible features
>> would prevent the fs from being mounted (or remounted) readwrite.
>
> I use the "mount readonly" option to, for instance, view/check the
> filesystem to determine whether or not I need to fsck first. I use the
> "readonly" feature to prevent the mounting to be a mistake-prone
> situation. It prevents e.g. applications from dropping temporary files
> in my current directory.
>
> Every time fsck or such a cleanup does something, there is the option
> of the cleanup or fixup being wrong. When you honour the "readonly"
> request from the user, the careful user can go back to the situation
> where he/she started.
>
> If the cleanup/fixup is really necessary, do so in in-core buffers of
> the filesystem. Write the infrastructure that allows us to have dirty
> buffers that MAY NOT (yet?!?) be written to the device. This will also
> solve the problem of journal recovery on readonly mount of a root
> filesystem. When it has been fscked, and it's remounted rw, we can
> remove the ban on the writeback of the dirty buffers.
>
> So I strongly disagree with your patch: It should not only prevent the
> cleanup when writing is not allowed due to ro-compat situation, but
> also when requested by the user.
>

I am not disagreeing with your disagreement, but implementing what you desire has more implications than my patch. So I would rather my patch got merged, because it is mostly harmless, and we can continue in parallel the discussion of whether or not file systems need to honor the readonly mount option to the letter.

Amir.
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat 26-02-11 22:40:19, Amir Goldstein wrote:
> Orphan cleanup is currently executed even on readonly mount.
> It deletes inodes and frees blocks, which could be very bad for some
> RO_COMPAT features, HAS_SNAPSHOT to name one.
>
> Now the damage is done, because current stock kernels may
> corrupt future fs with readonly compatible features,
> when doing orphan cleanup.
>
> This patch skips the orphan cleanup if readonly compatible features
> would prevent the fs from being mounted (or remounted) readwrite.

Thanks Amir. I actually took the version Ted committed to ext4 and ported it for ext3. Anyway, the patch is in my tree.

Honza
On Mon 28-02-11 11:14:55, Rogier Wolff wrote:
>
> On Sat, Feb 26, 2011 at 10:40:19PM +0200, Amir Goldstein wrote:
> > This patch skips the orphan cleanup if readonly compatible features
> > would prevent the fs from being mounted (or remounted) readwrite.
>
> I use the "mount readonly" option to, for instance, view/check the
> filesystem to determine whether or not I need to fsck first. I use the
> "readonly" feature to prevent the mounting to be a mistake-prone
> situation. It prevents e.g. applications from dropping temporary files
> in my current directory.
>
> Every time fsck or such a cleanup does something, there is the option
> of the cleanup or fixup being wrong. When you honour the "readonly"
> request from the user, the careful user can go back to the situation
> where he/she started.
>
> If the cleanup/fixup is really necessary, do so in in-core buffers of

Mounting (even read-only) without recovering the journal will give you a view of a corrupted filesystem. Usually not what you want (although I agree with you that there are occasions where this *is* what you want).

> the filesystem. Write the infrastructure that allows us to have dirty
> buffers that MAY NOT (yet?!?) be written to the device. This will also
> solve the problem of journal recovery on readonly mount of a root
> filesystem. When it has been fscked, and it's remounted rw, we can
> remove the ban on the writeback of the dirty buffers.

Yes, this would be a nice feature but no one ever got to implementing it. You are welcome to contribute it :).

> So I strongly disagree with your patch: It should not only prevent the
> cleanup when writing is not allowed due to ro-compat situation, but
> also when requested by the user.

As Amir said, the patch is trivial and a clear improvement. So it goes in.

Honza
On Mon, Feb 28, 2011 at 11:14:55AM +0100, Rogier Wolff wrote:
> If the cleanup/fixup is really necessary, do so in in-core buffers of
> the filesystem. Write the infrastructure that allows us to have dirty
> buffers that MAY NOT (yet?!?) be written to the device. This will also
> solve the problem of journal recovery on readonly mount of a root
> filesystem. When it has been fscked, and it's remounted rw, we can
> remove the ban on the writeback of the dirty buffers.

That would be an interesting and useful thing to add, but it's not a trivial change. There is the risk that if the journal is very large, and system memory is very small, there might not be enough memory to hold all of the dirty buffers (or it might handicap the machine). That shouldn't be an issue on reasonably configured machines, but as we know, not all file servers are reasonably configured (see previous discussions about tdb :-)

If someone wants to work on this as a project, that would be great, but to be honest it's not high on my priority list at the moment.

- Ted
On Mon, Feb 28, 2011 at 07:22:01PM +0100, Jan Kara wrote:
> As Amir said, the patch is trivial and a clear improvement. So it goes
> in.

You might want to look at the improvements I made to the ext4 version of the patch, in terms of a slightly better error message and commit description, and make similar changes to the ext3 version of the patch.

See: http://article.gmane.org/gmane.comp.file-systems.ext4/23620

- Ted
On 2/28/11 12:22 PM, Jan Kara wrote:
> On Mon 28-02-11 11:14:55, Rogier Wolff wrote:
>>
>> On Sat, Feb 26, 2011 at 10:40:19PM +0200, Amir Goldstein wrote:
>>> This patch skips the orphan cleanup if readonly compatible features
>>> would prevent the fs from being mounted (or remounted) readwrite.
>>
>> I use the "mount readonly" option to, for instance, view/check the
>> filesystem to determine whether or not I need to fsck first. I use the
>> "readonly" feature to prevent the mounting to be a mistake-prone
>> situation. It prevents e.g. applications from dropping temporary files
>> in my current directory.
>>
>> Every time fsck or such a cleanup does something, there is the option
>> of the cleanup or fixup being wrong. When you honour the "readonly"
>> request from the user, the careful user can go back to the situation
>> where he/she started.
>>
>> If the cleanup/fixup is really necessary, do so in in-core buffers of
> Mounting (even read-only) without recovering the journal will give you a
> view of a corrupted filesystem. Usually not what you want (although I agree
> with you that there are occasions where this *is* what you want).

Also, you can tell ext3/4 -not- to recover the journal, or, you can mark the device RO before mount. So while not the default behavior, it is at least possible.

-Eric
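For readers who want to try the two workarounds Eric mentions, here is a rough C sketch. It assumes that "tell ext3/4 not to recover the journal" refers to the `noload` mount option and that "mark the device RO" refers to the `BLKROSET` block-device ioctl; neither is named in the message itself. The helper names `mount_ro_noload` and `set_device_readonly` are made up for illustration, and both calls normally require root.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mount.h>   /* mount(), MS_RDONLY, BLKROSET */
#include <unistd.h>

/* Option 1 (assumed to be what "do not recover the journal" means):
 * mount read-only and pass "noload" so the journal is not replayed.
 * Returns 0 on success, -1 on failure with errno set. */
int mount_ro_noload(const char *dev, const char *dir, const char *fstype)
{
    return mount(dev, dir, fstype, MS_RDONLY, "noload");
}

/* Option 2 (assumed to be what "mark the device RO" means):
 * flag the block device itself read-only before mounting it.
 * Returns 0 on success, -1 on failure. */
int set_device_readonly(const char *dev)
{
    int fd = open(dev, O_RDONLY);
    if (fd < 0)
        return -1;

    int ro = 1;                        /* non-zero requests read-only */
    int rc = ioctl(fd, BLKROSET, &ro);
    close(fd);
    return rc;
}
```

A caller would typically run `set_device_readonly(dev)` first and then mount with `MS_RDONLY`. As Jan points out above, a filesystem mounted without journal recovery may look inconsistent, so this is only suitable for inspection and recovery work.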
On Mon 28-02-11 13:49:29, Ted Ts'o wrote:
> On Mon, Feb 28, 2011 at 07:22:01PM +0100, Jan Kara wrote:
> > As Amir said, the patch is trivial and a clear improvement. So it goes
> > in.
>
> You might want to look at the improvements I made to the ext4 version
> of the patch, in terms of a slightly better error message and commit
> description, and make similar changes to the ext3 version of the
> patch.
>
> See: http://article.gmane.org/gmane.comp.file-systems.ext4/23620

Yes, I took your version and ported it to ext3 in the end :).

Honza
Hi,

Please apply commit ce654b37 to any stable kernel out there.

The relevant ext4 patch is in Ted's tree and has not reached mainline yet.
Ted, can you please add CC: stable to my patch.

Thanks,
Amir.

On Mon, Feb 28, 2011 at 8:09 PM, Jan Kara <jack@suse.cz> wrote:
> On Sat 26-02-11 22:40:19, Amir Goldstein wrote:
>> Orphan cleanup is currently executed even on readonly mount.
>> It deletes inodes and frees blocks, which could be very bad for some
>> RO_COMPAT features, HAS_SNAPSHOT to name one.
>>
>> Now the damage is done, because current stock kernels may
>> corrupt future fs with readonly compatible features,
>> when doing orphan cleanup.
>>
>> This patch skips the orphan cleanup if readonly compatible features
>> would prevent the fs from being mounted (or remounted) readwrite.
> Thanks Amir. I actually took the version Ted committed to ext4 and ported
> it for ext3. Anyway, the patch is in my tree.
>
> Honza
>
> --
> Jan Kara <jack@suse.cz>
> SUSE Labs, CR
>
On Thu, Mar 24, 2011 at 12:34:05PM +0200, Amir Goldstein wrote:
> Hi,
>
> Please apply commit ce654b37 to any stable kernel out there.

Now applied, thanks.

greg k-h
diff --git a/fs/ext3/super.c b/fs/ext3/super.c
index 85c8cc8..6cdd575 100644
--- a/fs/ext3/super.c
+++ b/fs/ext3/super.c
@@ -1464,6 +1464,13 @@ static void ext3_orphan_cleanup (struct super_block * sb,
 		return;
 	}
 
+	/* Check if feature set allows readwrite operations */
+	if (EXT3_HAS_RO_COMPAT_FEATURE(sb, ~EXT3_FEATURE_RO_COMPAT_SUPP)) {
+		ext3_msg(sb, KERN_INFO, "Skipping orphan cleanup on readonly-"
+			"compatible fs");
+		return;
+	}
+
 	if (EXT3_SB(sb)->s_mount_state & EXT3_ERROR_FS) {
 		if (es->s_last_orphan)
 			jbd_debug(1, "Errors on filesystem, "
Orphan cleanup is currently executed even on readonly mount.
It deletes inodes and frees blocks, which could be very bad for some
RO_COMPAT features, HAS_SNAPSHOT to name one.

Now the damage is done, because current stock kernels may
corrupt future fs with readonly compatible features,
when doing orphan cleanup.

This patch skips the orphan cleanup if readonly compatible features
would prevent the fs from being mounted (or remounted) readwrite.

Signed-off-by: Amir Goldstein <amir73il@users.sf.net>
---
 fs/ext3/super.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

--
1.7.0.4
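The test the patch adds, `EXT3_HAS_RO_COMPAT_FEATURE(sb, ~EXT3_FEATURE_RO_COMPAT_SUPP)`, asks whether the superblock carries any read-only-compatible feature bit that this kernel does not support. A minimal user-space sketch of that mask logic follows. The constants and the function name `skip_orphan_cleanup` are stand-ins chosen for illustration, not the kernel's headers, and the `HAS_SNAPSHOT` bit value in particular is an assumption.

```c
#include <stdint.h>

/* Illustrative ro-compat feature bits for this sketch only.
 * These are stand-ins, not definitions taken from kernel headers. */
#define RO_COMPAT_SPARSE_SUPER 0x0001u
#define RO_COMPAT_LARGE_FILE   0x0002u
#define RO_COMPAT_BTREE_DIR    0x0004u
#define RO_COMPAT_HAS_SNAPSHOT 0x0080u  /* assumed value for a future feature */

/* The set of ro-compat features this pretend kernel can write to safely. */
#define RO_COMPAT_SUPP \
    (RO_COMPAT_SPARSE_SUPER | RO_COMPAT_LARGE_FILE | RO_COMPAT_BTREE_DIR)

/* Returns 1 when any ro-compat bit outside the supported set is present,
 * which is the case where the patch skips orphan cleanup; 0 otherwise. */
int skip_orphan_cleanup(uint32_t s_feature_ro_compat)
{
    return (s_feature_ro_compat & ~RO_COMPAT_SUPP) != 0;
}
```

With these stand-in values, `skip_orphan_cleanup(RO_COMPAT_LARGE_FILE)` yields 0, while adding `RO_COMPAT_HAS_SNAPSHOT` to the mask yields 1. Note that this only covers the unsupported-feature case the patch targets; it does not address Rogier's wider request to skip cleanup on every user-requested readonly mount.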