Message ID | 1337158192-4591-1-git-send-email-tm@tao.ma |
---|---|
State | Superseded, archived |
Headers | show |
On 5/16/12 3:49 AM, Tao Ma wrote: > From: Tao Ma <boyu.mt@taobao.com> > > Now when we set the group inode free count, we don't have a proper > group lock so that multiple threads may decrease the inode free > count at the same time. And e2fsck will complain something like: This is only in the ! EXT4_FEATURE_RO_COMPAT_GDT_CSUM case I guess? That would be worth mentioning in the summary & changelog. I guess you were testing without that for some reason? -eric > Free inodes count wrong for group #1 (1, counted=0). > Fix? no > > Free inodes count wrong for group #2 (3, counted=0). > Fix? no > > Directories count wrong for group #2 (780, counted=779). > Fix? no > > Free inodes count wrong for group #3 (2272, counted=2273). > Fix? no > > So this patch try to protect it with the ext4_lock_group. > > btw, it is found by xfstests test case 269 and I have run it 100 > times and the error in e2fsck doesn't show up again. > > Cc: Theodore Ts'o <tytso@mit.edu> > Signed-off-by: Tao Ma <boyu.mt@taobao.com> > --- > fs/ext4/ialloc.c | 9 +++++---- > 1 files changed, 5 insertions(+), 4 deletions(-) > > diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c > index 409c2ee..25595e2 100644 > --- a/fs/ext4/ialloc.c > +++ b/fs/ext4/ialloc.c > @@ -772,7 +772,9 @@ got: > ext4_itable_unused_set(sb, gdp, > (EXT4_INODES_PER_GROUP(sb) - ino)); > up_read(&grp->alloc_sem); > - } > + } else > + ext4_lock_group(sb, group); > + > ext4_free_inodes_set(sb, gdp, ext4_free_inodes_count(sb, gdp) - 1); > if (S_ISDIR(mode)) { > ext4_used_dirs_set(sb, gdp, ext4_used_dirs_count(sb, gdp) + 1); > @@ -782,10 +784,9 @@ got: > atomic_inc(&sbi->s_flex_groups[f].used_dirs); > } > } > - if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) { > + if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) > gdp->bg_checksum = ext4_group_desc_csum(sbi, group, gdp); > - ext4_unlock_group(sb, group); > - } > + ext4_unlock_group(sb, group); > > BUFFER_TRACE(inode_bitmap_bh, "call ext4_handle_dirty_metadata"); > err = ext4_handle_dirty_metadata(handle, NULL, inode_bitmap_bh); -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 05/16/2012 09:43 PM, Eric Sandeen wrote: > On 5/16/12 3:49 AM, Tao Ma wrote: >> From: Tao Ma <boyu.mt@taobao.com> >> >> Now when we set the group inode free count, we don't have a proper >> group lock so that multiple threads may decrease the inode free >> count at the same time. And e2fsck will complain something like: > > This is only in the ! EXT4_FEATURE_RO_COMPAT_GDT_CSUM case I guess? > That would be worth mentioning in the summary & changelog. sure, I will add it in v2. > > I guess you were testing without that for some reason? See my comments below. I found it when running xfstests 269. Thanks Tao > > -eric > >> Free inodes count wrong for group #1 (1, counted=0). >> Fix? no >> >> Free inodes count wrong for group #2 (3, counted=0). >> Fix? no >> >> Directories count wrong for group #2 (780, counted=779). >> Fix? no >> >> Free inodes count wrong for group #3 (2272, counted=2273). >> Fix? no >> >> So this patch try to protect it with the ext4_lock_group. >> >> btw, it is found by xfstests test case 269 and I have run it 100 >> times and the error in e2fsck doesn't show up again. >> >> Cc: Theodore Ts'o <tytso@mit.edu> >> Signed-off-by: Tao Ma <boyu.mt@taobao.com> >> --- >> fs/ext4/ialloc.c | 9 +++++---- >> 1 files changed, 5 insertions(+), 4 deletions(-) >> >> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c >> index 409c2ee..25595e2 100644 >> --- a/fs/ext4/ialloc.c >> +++ b/fs/ext4/ialloc.c >> @@ -772,7 +772,9 @@ got: >> ext4_itable_unused_set(sb, gdp, >> (EXT4_INODES_PER_GROUP(sb) - ino)); >> up_read(&grp->alloc_sem); >> - } >> + } else >> + ext4_lock_group(sb, group); >> + >> ext4_free_inodes_set(sb, gdp, ext4_free_inodes_count(sb, gdp) - 1); >> if (S_ISDIR(mode)) { >> ext4_used_dirs_set(sb, gdp, ext4_used_dirs_count(sb, gdp) + 1); >> @@ -782,10 +784,9 @@ got: >> atomic_inc(&sbi->s_flex_groups[f].used_dirs); >> } >> } >> - if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) { >> + if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) >> gdp->bg_checksum = ext4_group_desc_csum(sbi, group, gdp); >> - ext4_unlock_group(sb, group); >> - } >> + ext4_unlock_group(sb, group); >> >> BUFFER_TRACE(inode_bitmap_bh, "call ext4_handle_dirty_metadata"); >> err = ext4_handle_dirty_metadata(handle, NULL, inode_bitmap_bh); > > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2012-05-16, at 2:49 AM, Tao Ma wrote: > From: Tao Ma <boyu.mt@taobao.com> > > Now when we set the group inode free count, we don't have a proper > group lock so that multiple threads may decrease the inode free > count at the same time. And e2fsck will complain something like: > > diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c > index 409c2ee..25595e2 100644 > --- a/fs/ext4/ialloc.c > +++ b/fs/ext4/ialloc.c > @@ -772,7 +772,9 @@ got: > ext4_itable_unused_set(sb, gdp, > (EXT4_INODES_PER_GROUP(sb) - ino)); > up_read(&grp->alloc_sem); > - } > + } else > + ext4_lock_group(sb, group); Minor coding style fix - when one half of if/else is using braces then the other half should also use braces, like: if { : } else { : } > + > ext4_free_inodes_set(sb, gdp, ext4_free_inodes_count(sb, gdp) - 1); > if (S_ISDIR(mode)) { > ext4_used_dirs_set(sb, gdp, ext4_used_dirs_count(sb, gdp) + 1); > @@ -782,10 +784,9 @@ got: > atomic_inc(&sbi->s_flex_groups[f].used_dirs); > } > } > - if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) { > + if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) > gdp->bg_checksum = ext4_group_desc_csum(sbi, group, gdp); > - ext4_unlock_group(sb, group); > - } > + ext4_unlock_group(sb, group); > > BUFFER_TRACE(inode_bitmap_bh, "call ext4_handle_dirty_metadata"); > err = ext4_handle_dirty_metadata(handle, NULL, inode_bitmap_bh); > -- > 1.7.1 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html Cheers, Andreas -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 05/16/2012 10:58 PM, Andreas Dilger wrote: > On 2012-05-16, at 2:49 AM, Tao Ma wrote: >> From: Tao Ma <boyu.mt@taobao.com> >> >> Now when we set the group inode free count, we don't have a proper >> group lock so that multiple threads may decrease the inode free >> count at the same time. And e2fsck will complain something like: >> >> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c >> index 409c2ee..25595e2 100644 >> --- a/fs/ext4/ialloc.c >> +++ b/fs/ext4/ialloc.c >> @@ -772,7 +772,9 @@ got: >> ext4_itable_unused_set(sb, gdp, >> (EXT4_INODES_PER_GROUP(sb) - ino)); >> up_read(&grp->alloc_sem); >> - } >> + } else >> + ext4_lock_group(sb, group); > > Minor coding style fix - when one half of if/else is using braces > then the other half should also use braces, like: > > if { > : > } else { > : > } thank you Andreas, and I will change it in the next version. Thanks Tao > >> + >> ext4_free_inodes_set(sb, gdp, ext4_free_inodes_count(sb, gdp) - 1); >> if (S_ISDIR(mode)) { >> ext4_used_dirs_set(sb, gdp, ext4_used_dirs_count(sb, gdp) + 1); >> @@ -782,10 +784,9 @@ got: >> atomic_inc(&sbi->s_flex_groups[f].used_dirs); >> } >> } >> - if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) { >> + if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) >> gdp->bg_checksum = ext4_group_desc_csum(sbi, group, gdp); >> - ext4_unlock_group(sb, group); >> - } >> + ext4_unlock_group(sb, group); >> >> BUFFER_TRACE(inode_bitmap_bh, "call ext4_handle_dirty_metadata"); >> err = ext4_handle_dirty_metadata(handle, NULL, inode_bitmap_bh); >> -- >> 1.7.1 >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > > Cheers, Andreas > > > > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 5/16/12 9:55 AM, Tao Ma wrote: > On 05/16/2012 09:43 PM, Eric Sandeen wrote: >> On 5/16/12 3:49 AM, Tao Ma wrote: >>> From: Tao Ma <boyu.mt@taobao.com> >>> >>> Now when we set the group inode free count, we don't have a proper >>> group lock so that multiple threads may decrease the inode free >>> count at the same time. And e2fsck will complain something like: >> >> This is only in the ! EXT4_FEATURE_RO_COMPAT_GDT_CSUM case I guess? >> That would be worth mentioning in the summary & changelog. > sure, I will add it in v2. >> >> I guess you were testing without that for some reason? > See my comments below. I found it when running xfstests 269. Still not sure how you got a filesystem w/o that feature though, unless I am forgetting something obvious. Isn't it on by default? -Eric -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 05/16/2012 11:49 PM, Eric Sandeen wrote: > On 5/16/12 9:55 AM, Tao Ma wrote: >> On 05/16/2012 09:43 PM, Eric Sandeen wrote: >>> On 5/16/12 3:49 AM, Tao Ma wrote: >>>> From: Tao Ma <boyu.mt@taobao.com> >>>> >>>> Now when we set the group inode free count, we don't have a proper >>>> group lock so that multiple threads may decrease the inode free >>>> count at the same time. And e2fsck will complain something like: >>> >>> This is only in the ! EXT4_FEATURE_RO_COMPAT_GDT_CSUM case I guess? >>> That would be worth mentioning in the summary & changelog. >> sure, I will add it in v2. >>> >>> I guess you were testing without that for some reason? >> See my comments below. I found it when running xfstests 269. > > Still not sure how you got a filesystem w/o that feature though, unless > I am forgetting something obvious. Isn't it on by default? oh, I see. Yes, we mkfs the system with the following configurations: mke2fs -O ^resize_inode,^uninit_bg,extent,meta_bg,flex_bg,ext_attr Maybe that's the reason why it has never be met by others before. ;) Thanks Tao -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 5/16/12 9:17 PM, Tao Ma wrote: > On 05/16/2012 11:49 PM, Eric Sandeen wrote: >> On 5/16/12 9:55 AM, Tao Ma wrote: >>> On 05/16/2012 09:43 PM, Eric Sandeen wrote: >>>> On 5/16/12 3:49 AM, Tao Ma wrote: >>>>> From: Tao Ma <boyu.mt@taobao.com> >>>>> >>>>> Now when we set the group inode free count, we don't have a proper >>>>> group lock so that multiple threads may decrease the inode free >>>>> count at the same time. And e2fsck will complain something like: >>>> >>>> This is only in the ! EXT4_FEATURE_RO_COMPAT_GDT_CSUM case I guess? >>>> That would be worth mentioning in the summary & changelog. >>> sure, I will add it in v2. >>>> >>>> I guess you were testing without that for some reason? >>> See my comments below. I found it when running xfstests 269. >> >> Still not sure how you got a filesystem w/o that feature though, unless >> I am forgetting something obvious. Isn't it on by default? > oh, I see. Yes, we mkfs the system with the following configurations: > mke2fs -O ^resize_inode,^uninit_bg,extent,meta_bg,flex_bg,ext_attr > Maybe that's the reason why it has never be met by others before. ;) Ok, good. I figured it was in some barely-reachable corner of the infinite test matrix ;) -Eric > Thanks > Tao > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, May 17, 2012 at 10:17:34AM +0800, Tao Ma wrote: > oh, I see. Yes, we mkfs the system with the following configurations: > mke2fs -O ^resize_inode,^uninit_bg,extent,meta_bg,flex_bg,ext_attr > Maybe that's the reason why it has never be met by others before. ;) I'm curious -- is there a specific reason you're disabling the group descriptor checksum? Or was that just something that was picked at one point and you haven't changed it since? Thanks, - Ted -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 05/29/2012 06:22 AM, Ted Ts'o wrote: > On Thu, May 17, 2012 at 10:17:34AM +0800, Tao Ma wrote: >> oh, I see. Yes, we mkfs the system with the following configurations: >> mke2fs -O ^resize_inode,^uninit_bg,extent,meta_bg,flex_bg,ext_attr >> Maybe that's the reason why it has never be met by others before. ;) > > I'm curious -- is there a specific reason you're disabling the group > descriptor checksum? Or was that just something that was picked at > one point and you haven't changed it since? We are just disabling the uninit_bg so as to let the block group initialization happen in the mkfs time. I don't know why the checksum is also disabled by ^uninit_bg. Thanks Tao > > Thanks, > > - Ted > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2012-05-28, at 8:18 PM, Tao Ma wrote: > On 05/29/2012 06:22 AM, Ted Ts'o wrote: >> On Thu, May 17, 2012 at 10:17:34AM +0800, Tao Ma wrote: >>> oh, I see. Yes, we mkfs the system with the following configurations: >>> mke2fs -O ^resize_inode,^uninit_bg,extent,meta_bg,flex_bg,ext_attr >>> Maybe that's the reason why it has never be met by others before. ;) >> >> I'm curious -- is there a specific reason you're disabling the group >> descriptor checksum? Or was that just something that was picked at >> one point and you haven't changed it since? > > We are just disabling the uninit_bg so as to let the block group > initialization happen in the mkfs time. I don't know why the checksum is > also disabled by ^uninit_bg. The checksum is controlled by uninit_bg, because there was a need to ensure the bg_itable_unused count and the UNINIT flags could be trusted when doing an e2fsck. If you don't want to do lazy inode table initialization, that is disabled by "mke2fs -E lazy_itable_init=0". Cheers, Andreas -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, May 28, 2012 at 11:57:59PM -0600, Andreas Dilger wrote: > > We are just disabling the uninit_bg so as to let the block group > > initialization happen in the mkfs time. I don't know why the checksum is > > also disabled by ^uninit_bg. > > The checksum is controlled by uninit_bg, because there was a need to > ensure the bg_itable_unused count and the UNINIT flags could be > trusted when doing an e2fsck. > > If you don't want to do lazy inode table initialization, that is > disabled by "mke2fs -E lazy_itable_init=0". You can also disable whether or not lazy inode table initialization happens by default via /etc/mke2fs.conf. At work we have a very fairly havily modified /etc/mke2fs.conf which has our production-specific mke2fs parameters defined, so that someone who runs mke2fs by hand will get the same result at as the automated systems. I can easily see how if you are trying for predictable latency numbers, disabling lazy inode table initialization makes a huge amount of sense. I'd recommend doing it via /etc/mke2fs.conf, though. Regards, - Ted -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 05/29/2012 08:39 PM, Ted Ts'o wrote: > On Mon, May 28, 2012 at 11:57:59PM -0600, Andreas Dilger wrote: >>> We are just disabling the uninit_bg so as to let the block group >>> initialization happen in the mkfs time. I don't know why the checksum is >>> also disabled by ^uninit_bg. >> >> The checksum is controlled by uninit_bg, because there was a need to >> ensure the bg_itable_unused count and the UNINIT flags could be >> trusted when doing an e2fsck. >> >> If you don't want to do lazy inode table initialization, that is >> disabled by "mke2fs -E lazy_itable_init=0". > > You can also disable whether or not lazy inode table initialization > happens by default via /etc/mke2fs.conf. At work we have a very > fairly havily modified /etc/mke2fs.conf which has our > production-specific mke2fs parameters defined, so that someone who > runs mke2fs by hand will get the same result at as the automated > systems. > > I can easily see how if you are trying for predictable latency > numbers, disabling lazy inode table initialization makes a huge amount > of sense. I'd recommend doing it via /etc/mke2fs.conf, though. OK, thank you all for the good suggestion. Thanks Tao -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c index 409c2ee..25595e2 100644 --- a/fs/ext4/ialloc.c +++ b/fs/ext4/ialloc.c @@ -772,7 +772,9 @@ got: ext4_itable_unused_set(sb, gdp, (EXT4_INODES_PER_GROUP(sb) - ino)); up_read(&grp->alloc_sem); - } + } else + ext4_lock_group(sb, group); + ext4_free_inodes_set(sb, gdp, ext4_free_inodes_count(sb, gdp) - 1); if (S_ISDIR(mode)) { ext4_used_dirs_set(sb, gdp, ext4_used_dirs_count(sb, gdp) + 1); @@ -782,10 +784,9 @@ got: atomic_inc(&sbi->s_flex_groups[f].used_dirs); } } - if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) { + if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) gdp->bg_checksum = ext4_group_desc_csum(sbi, group, gdp); - ext4_unlock_group(sb, group); - } + ext4_unlock_group(sb, group); BUFFER_TRACE(inode_bitmap_bh, "call ext4_handle_dirty_metadata"); err = ext4_handle_dirty_metadata(handle, NULL, inode_bitmap_bh);