Message ID | 1226053376.3542.8.camel@frecb007923.frec.bull.fr |
---|---|
State | Accepted, archived |
On Fri, Nov 07, 2008 at 11:22:56AM +0100, Frédéric Bohé wrote:
> From: Frederic Bohe <frederic.bohe@bull.net>
>
> Block group's checksum needs to be re-calculated during the
> initialization of an UNINIT'd group. This fixes a race when several
> threads try to allocate a new inode in an UNINIT'd group.

This patch looks sane, and so I'll accept it, but there's a higher-order
issue hiding here --- why are we initializing the block bitmap in
ext4_new_inode()? Sure, *most* of the time when we create a new
inode, we'll need to allocate a new block, but sometimes we
won't (i.e., when creating a symlink, device file, socket, or a
zero-length regular file).

More seriously, we don't account for the potential need for an extra
journal credit in all of the callers of ext4_new_inode(). Obviously
this doesn't get us in trouble because we generally massively
overestimate the number of journal credits we need --- but from the
point of view of code simplification, maybe the code block that
initializes the block bitmap in ext4_new_inode() should be dropped
entirely.

We have to do the exact same check in mballoc.c when we actually
allocate blocks --- and in that case we know we'll be modifying the
block bitmap, so there's no need to first initialize the block bitmap
in ext4_new_inode(), only to have to redirty that same block bitmap
in mballoc.c when we are really allocating data for the inode.

Does that make sense for a future cleanup?

- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

On Fri, Nov 07, 2008 at 08:52:22AM -0500, Theodore Tso wrote:
> On Fri, Nov 07, 2008 at 11:22:56AM +0100, Frédéric Bohé wrote:
> > From: Frederic Bohe <frederic.bohe@bull.net>
> >
> > Block group's checksum needs to be re-calculated during the
> > initialization of an UNINIT'd group. This fixes a race when several
> > threads try to allocate a new inode in an UNINIT'd group.
>
> This patch looks sane, and so I'll accept it, but there's a higher-order
> issue hiding here --- why are we initializing the block bitmap in
> ext4_new_inode()? Sure, *most* of the time when we create a new
> inode, we'll need to allocate a new block, but sometimes we
> won't (i.e., when creating a symlink, device file, socket, or a
> zero-length regular file).

Because when we clear the uninit_bg flag, the kernel expects the block
bitmap to correctly mark the blocks containing the block bitmap and
inode bitmap as used. If mke2fs didn't do that we would need to do the
same when we remove the uninit_bg flag.

-aneesh

On Fri, Nov 07, 2008 at 07:57:18PM +0530, Aneesh Kumar K.V wrote:
> On Fri, Nov 07, 2008 at 08:52:22AM -0500, Theodore Tso wrote:
> > On Fri, Nov 07, 2008 at 11:22:56AM +0100, Frédéric Bohé wrote:
> > > From: Frederic Bohe <frederic.bohe@bull.net>
> > >
> > > Block group's checksum needs to be re-calculated during the
> > > initialization of an UNINIT'd group. This fixes a race when several
> > > threads try to allocate a new inode in an UNINIT'd group.
> >
> > This patch looks sane, and so I'll accept it, but there's a higher-order
> > issue hiding here --- why are we initializing the block bitmap in
> > ext4_new_inode()? Sure, *most* of the time when we create a new
> > inode, we'll need to allocate a new block, but sometimes we
> > won't (i.e., when creating a symlink, device file, socket, or a
> > zero-length regular file).
>
> Because when we clear the uninit_bg flag, the kernel expects the block
> bitmap to correctly mark the blocks containing the block bitmap and
> inode bitmap as used. If mke2fs didn't do that we would need to do the
> same when we remove the uninit_bg flag.

We have separate flags indicating whether the block allocation bitmap
and the inode allocation bitmap are initialized or not:
EXT4_BG_BLOCK_UNINIT and EXT4_BG_INODE_UNINIT, respectively. So what
I am proposing is to not initialize the block bitmap in
ext4_new_inode(), and not to clear the EXT4_BG_BLOCK_UNINIT flag, either.

- Ted

On Nov 07, 2008 09:38 -0500, Theodore Ts'o wrote:
> On Fri, Nov 07, 2008 at 07:57:18PM +0530, Aneesh Kumar K.V wrote:
> > Because when we clear the uninit_bg flag, the kernel expects the block
> > bitmap to correctly mark the blocks containing the block bitmap and
> > inode bitmap as used. If mke2fs didn't do that we would need to do the
> > same when we remove the uninit_bg flag.
>
> We have separate flags indicating whether the block allocation bitmap
> and the inode allocation bitmap are initialized or not:
> EXT4_BG_BLOCK_UNINIT and EXT4_BG_INODE_UNINIT, respectively. So what
> I am proposing is to not initialize the block bitmap in
> ext4_new_inode(), and not to clear the EXT4_BG_BLOCK_UNINIT flag, either.

That would be dangerous, because the block group _would_ be in use due
to the fact that one of the inode table blocks is in use. That isn't to
say we couldn't adopt semantics as you suggest (e.g. that INODE_UNINIT
not being set implies that the inode table blocks are in use regardless
of whether or not BLOCK_UNINIT is set), but it needs careful consideration.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
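
For readers following the flag discussion, the relaxed semantics Andreas describes can be sketched as a small predicate. This is only an illustration and not kernel code: the `TOY_` names and the function `toy_group_may_have_used_blocks()` are hypothetical, and only the two flag bit values are taken from the kernel's EXT4_BG_INODE_UNINIT and EXT4_BG_BLOCK_UNINIT definitions.

```c
#include <stdint.h>

/* Bit values mirror the kernel's EXT4_BG_* flags; the TOY_ names are hypothetical. */
#define TOY_BG_INODE_UNINIT 0x0001 /* inode bitmap not initialized */
#define TOY_BG_BLOCK_UNINIT 0x0002 /* block bitmap not initialized */

/*
 * Hypothetical predicate for the relaxed reading discussed in the thread:
 * a group may have blocks in use if its block bitmap is initialized, or if
 * its inode bitmap is initialized (inode table blocks are then in use)
 * even though BLOCK_UNINIT is still set.
 */
int toy_group_may_have_used_blocks(uint16_t bg_flags)
{
	if (!(bg_flags & TOY_BG_BLOCK_UNINIT))
		return 1; /* block bitmap is valid, so it has to be consulted */
	/* BLOCK_UNINIT is set: only a cleared INODE_UNINIT implies usage */
	return !(bg_flags & TOY_BG_INODE_UNINIT);
}
```

Under this reading, only a group with both flags set could be treated as entirely unused, which is the case that would need the careful consideration mentioned above.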

Index: linux/fs/ext4/ialloc.c
===================================================================
--- linux.orig/fs/ext4/ialloc.c	2008-11-06 17:22:14.000000000 +0100
+++ linux/fs/ext4/ialloc.c	2008-11-07 10:43:41.000000000 +0100
@@ -718,6 +718,8 @@ got:
 			gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
 			free = ext4_free_blocks_after_init(sb, group, gdp);
 			gdp->bg_free_blocks_count = cpu_to_le16(free);
+			gdp->bg_checksum = ext4_group_desc_csum(sbi, group,
+								gdp);
 		}
 		spin_unlock(sb_bgl_lock(sbi, group));
--
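
To illustrate why the added line matters, here is a minimal user-space sketch, assuming a toy descriptor and a toy checksum. The struct, the `toy_` functions and the checksum formula are all hypothetical stand-ins; the real code uses struct ext4_group_desc and ext4_group_desc_csum(), and runs under the block group lock. The sketch only shows that once bg_flags and bg_free_blocks_count change, a checksum computed earlier no longer matches, which is presumably what another thread looking at the same descriptor would trip over.

```c
#include <stdint.h>

/* Bit values mirror the kernel's EXT4_BG_* flags; everything else is a toy. */
#define TOY_BG_INODE_UNINIT 0x0001
#define TOY_BG_BLOCK_UNINIT 0x0002

/* Hypothetical, heavily reduced stand-in for a group descriptor. */
struct toy_group_desc {
	uint16_t bg_flags;
	uint16_t bg_free_blocks_count;
	uint16_t bg_checksum;
};

/* Toy checksum over the group number and every field except the checksum.
 * This is NOT the algorithm used by ext4_group_desc_csum(). */
uint16_t toy_group_desc_csum(uint32_t group, const struct toy_group_desc *gdp)
{
	uint32_t sum = group * 31u;

	sum = sum * 31u + gdp->bg_flags;
	sum = sum * 31u + gdp->bg_free_blocks_count;
	return (uint16_t)(sum ^ (sum >> 16));
}

/* Returns 1 if the stored checksum matches the current field values. */
int toy_group_desc_verify(uint32_t group, const struct toy_group_desc *gdp)
{
	return gdp->bg_checksum == toy_group_desc_csum(group, gdp);
}

/*
 * Mark the block bitmap of an UNINIT'd group as initialized.
 * update_csum == 0 models the code before the patch,
 * update_csum != 0 models the code with the added line.
 */
void toy_init_block_group(uint32_t group, struct toy_group_desc *gdp,
			  uint16_t free_blocks, int update_csum)
{
	if (gdp->bg_flags & TOY_BG_BLOCK_UNINIT) {
		gdp->bg_flags &= (uint16_t)~TOY_BG_BLOCK_UNINIT;
		gdp->bg_free_blocks_count = free_blocks;
		if (update_csum)
			gdp->bg_checksum = toy_group_desc_csum(group, gdp);
	}
}
```

Calling toy_init_block_group() with update_csum set to 0 on a descriptor whose checksum was valid leaves toy_group_desc_verify() failing; with update_csum set to 1 the descriptor verifies again.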