Message ID | 20240215155009.94493-1-mheyne@amazon.de |
---|---|
State | Awaiting Upstream |
Series | [v2] ext4: fix corruption during on-line resize |
-----Original Message-----
From: Maximilian Heyne <mheyne@amazon.de>
Sent: Thursday, February 15, 2024 9:20 PM
Cc: ravib@amazon.com; Maximilian Heyne <mheyne@amazon.de>; stable@vger.kernel.org; Theodore Ts'o <tytso@mit.edu>; Andreas Dilger <adilger.kernel@dilger.ca>; Yongqiang Yang <xiaoqiangnk@gmail.com>; linux-ext4@vger.kernel.org; linux-kernel@vger.kernel.org
Subject: [External] : [PATCH v2] ext4: fix corruption during on-line resize

> We observed a corruption during on-line resize of a file system that is
> larger than 16 TiB with 4k block size. With more than 2^32 blocks,
> resize_inode is turned off by default by mke2fs. The issue can be
> reproduced on a smaller file system for convenience by explicitly
> turning off resize_inode. An on-line resize across an 8 GiB boundary (the
> size of a meta block group in this setup) then leads to a corruption:
>
>     dev=/dev/<some_dev> # should be >= 16 GiB
>     mkdir -p /corruption
>     /sbin/mke2fs -t ext4 -b 4096 -O ^resize_inode $dev $((2 * 2**21 - 2**15))
>     mount -t ext4 $dev /corruption
>
>     dd if=/dev/zero bs=4096 of=/corruption/test count=$((2*2**21 - 4*2**15))
>     sha1sum /corruption/test
>     # 79d2658b39dcfd77274e435b0934028adafaab11 /corruption/test
>
>     /sbin/resize2fs $dev $((2*2**21))
>     # drop page cache to force reload the block from disk
>     echo 1 > /proc/sys/vm/drop_caches
>
>     sha1sum /corruption/test
>     # 3c2abc63cbf1a94c9e6977e0fbd72cd832c4d5c3 /corruption/test
>
> 2^21 = 2^15*2^6 equals 8 GiB whereof 2^15 is the number of blocks per
> block group and 2^6 is the number of block groups that make a meta
> block group.
>
> The last checksum might be different depending on how the file is laid
> out across the physical blocks. The actual corruption occurs at physical
> block 63*2^15 = 2064384 which would be the location of the backup of the
> meta block group's block descriptor. During the on-line resize the file
> system will be converted to meta_bg starting at s_first_meta_bg which is
> 2 in the example - meaning all block groups after 16 GiB. However, in
> ext4_flex_group_add we might add block groups that are not part of the
> first meta block group yet. In the reproducer we achieved this by
> subtracting the size of a whole block group from the point where the
> meta block group would start. This must be considered when updating the
> backup block group descriptors to follow the non-meta_bg layout. The fix
> is to add a test whether the group to add is already part of the meta
> block group or not.
>
> Fixes: 01f795f9e0d67 ("ext4: add online resizing support for meta_bg and 64-bit file systems")
> Cc: stable@vger.kernel.org

Tested the patch across file systems of various sizes and block sizes. The patch stops the corruption.

> Signed-off-by: Maximilian Heyne <mheyne@amazon.de>

Tested-by: Srivathsa Dara <srivathsa.d.dara@oracle.com>
Reviewed-by: Srivathsa Dara <srivathsa.d.dara@oracle.com>

> ---
>  fs/ext4/resize.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
> index 4d4a5a32e310..3c0d12382e06 100644
> --- a/fs/ext4/resize.c
> +++ b/fs/ext4/resize.c
> @@ -1602,7 +1602,8 @@ static int ext4_flex_group_add(struct super_block *sb,
>  		int gdb_num = group / EXT4_DESC_PER_BLOCK(sb);
>  		int gdb_num_end = ((group + flex_gd->count - 1) /
>  				   EXT4_DESC_PER_BLOCK(sb));
> -		int meta_bg = ext4_has_feature_meta_bg(sb);
> +		int meta_bg = ext4_has_feature_meta_bg(sb) &&
> +			      gdb_num >= le32_to_cpu(es->s_first_meta_bg);
>  		sector_t padding_blocks = meta_bg ? 0 : sbi->s_sbh->b_blocknr -
>  					 ext4_group_first_block_no(sb, 0);
>
> --
> 2.40.1
>
> Amazon Development Center Germany GmbH
> Krausenstr. 38
> 10117 Berlin
> Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
> Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
> Sitz: Berlin
> Ust-ID: DE 289 237 879
On Thu, 15 Feb 2024 15:50:09 +0000, Maximilian Heyne wrote:
> We observed a corruption during on-line resize of a file system that is
> larger than 16 TiB with 4k block size. With more than 2^32 blocks,
> resize_inode is turned off by default by mke2fs. The issue can be
> reproduced on a smaller file system for convenience by explicitly
> turning off resize_inode. An on-line resize across an 8 GiB boundary (the
> size of a meta block group in this setup) then leads to a corruption:
>
> [...]

Applied, thanks!

[1/1] ext4: fix corruption during on-line resize
      commit: 3a944549dd26ccaf1f898a4be952e75a42bf37dd

Best regards,
diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index 4d4a5a32e310..3c0d12382e06 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -1602,7 +1602,8 @@ static int ext4_flex_group_add(struct super_block *sb,
 		int gdb_num = group / EXT4_DESC_PER_BLOCK(sb);
 		int gdb_num_end = ((group + flex_gd->count - 1) /
 				   EXT4_DESC_PER_BLOCK(sb));
-		int meta_bg = ext4_has_feature_meta_bg(sb);
+		int meta_bg = ext4_has_feature_meta_bg(sb) &&
+			      gdb_num >= le32_to_cpu(es->s_first_meta_bg);
 		sector_t padding_blocks = meta_bg ? 0 : sbi->s_sbh->b_blocknr -
 					 ext4_group_first_block_no(sb, 0);
 
We observed a corruption during on-line resize of a file system that is
larger than 16 TiB with 4k block size. With more than 2^32 blocks,
resize_inode is turned off by default by mke2fs. The issue can be
reproduced on a smaller file system for convenience by explicitly
turning off resize_inode. An on-line resize across an 8 GiB boundary (the
size of a meta block group in this setup) then leads to a corruption:

    dev=/dev/<some_dev> # should be >= 16 GiB
    mkdir -p /corruption
    /sbin/mke2fs -t ext4 -b 4096 -O ^resize_inode $dev $((2 * 2**21 - 2**15))
    mount -t ext4 $dev /corruption

    dd if=/dev/zero bs=4096 of=/corruption/test count=$((2*2**21 - 4*2**15))
    sha1sum /corruption/test
    # 79d2658b39dcfd77274e435b0934028adafaab11 /corruption/test

    /sbin/resize2fs $dev $((2*2**21))
    # drop page cache to force reload the block from disk
    echo 1 > /proc/sys/vm/drop_caches

    sha1sum /corruption/test
    # 3c2abc63cbf1a94c9e6977e0fbd72cd832c4d5c3 /corruption/test

2^21 = 2^15*2^6 equals 8 GiB whereof 2^15 is the number of blocks per
block group and 2^6 is the number of block groups that make a meta
block group.

The last checksum might be different depending on how the file is laid
out across the physical blocks. The actual corruption occurs at physical
block 63*2^15 = 2064384 which would be the location of the backup of the
meta block group's block descriptor. During the on-line resize the file
system will be converted to meta_bg starting at s_first_meta_bg which is
2 in the example - meaning all block groups after 16 GiB. However, in
ext4_flex_group_add we might add block groups that are not part of the
first meta block group yet. In the reproducer we achieved this by
subtracting the size of a whole block group from the point where the
meta block group would start. This must be considered when updating the
backup block group descriptors to follow the non-meta_bg layout. The fix
is to add a test whether the group to add is already part of the meta
block group or not.

Fixes: 01f795f9e0d67 ("ext4: add online resizing support for meta_bg and 64-bit file systems")
Cc: stable@vger.kernel.org
Signed-off-by: Maximilian Heyne <mheyne@amazon.de>
---
 fs/ext4/resize.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)