Message ID | 20240710040654.1714672-4-libaokun@huaweicloud.com |
---|---|
State | Superseded |
Headers | show |
Series | ext4: some bugfixes and cleanups for ext4 extents path | expand |
On Wed 10-07-24 12:06:37, libaokun@huaweicloud.com wrote: > From: Baokun Li <libaokun1@huawei.com> > > In ext4_ext_try_to_merge_up(), set path[1].p_bh to NULL after it has been > released, otherwise it may be released twice. > > An example of what triggers this is as follows: > > split2 map split1 > |--------|-------|--------| > > ext4_ext_map_blocks > ext4_ext_handle_unwritten_extents > ext4_split_convert_extents > // path->p_depth == 0 > ext4_split_extent > // 1. do split1 > ext4_split_extent_at > ext4_ext_insert_extent > ext4_ext_create_new_leaf > ext4_ext_grow_indepth > le16_add_cpu(&neh->eh_depth, 1) > ext4_find_extent > path->p_depth = 1 > ext4_ext_try_to_merge > ext4_ext_try_to_merge_up > path->p_depth = 0 > brelse(path[1].p_bh) ---> not set to NULL here > // 2. update path > ext4_find_extent > // 3. do split2 > ext4_split_extent_at > ext4_ext_insert_extent > ext4_ext_create_new_leaf > ext4_ext_grow_indepth > le16_add_cpu(&neh->eh_depth, 1) > ext4_find_extent > path[0].p_bh = NULL; > path->p_depth = 1 > read_extent_tree_block ---> return err > // path[1].p_bh is still the old value > ext4_free_ext_path > ext4_ext_drop_refs > // path->p_depth == 1 > brelse(path[1].p_bh) ---> brelse a buffer twice > > Finally got the following WARRNING when removing the buffer from lru: > > ============================================ > VFS: brelse: Trying to free free buffer > WARNING: CPU: 2 PID: 72 at fs/buffer.c:1241 __brelse+0x58/0x90 > CPU: 2 PID: 72 Comm: kworker/u19:1 Not tainted 6.9.0-dirty #716 > RIP: 0010:__brelse+0x58/0x90 > Call Trace: > <TASK> > __find_get_block+0x6e7/0x810 > bdev_getblk+0x2b/0x480 > __ext4_get_inode_loc+0x48a/0x1240 > ext4_get_inode_loc+0xb2/0x150 > ext4_reserve_inode_write+0xb7/0x230 > __ext4_mark_inode_dirty+0x144/0x6a0 > ext4_ext_insert_extent+0x9c8/0x3230 > ext4_ext_map_blocks+0xf45/0x2dc0 > ext4_map_blocks+0x724/0x1700 > ext4_do_writepages+0x12d6/0x2a70 > [...] > ============================================ > > Fixes: ecb94f5fdf4b ("ext4: collapse a single extent tree block into the inode if possible") > Cc: stable@kernel.org > Signed-off-by: Baokun Li <libaokun1@huawei.com> Looks good. Feel free to add: Reviewed-by: Jan Kara <jack@suse.cz> Honza > --- > fs/ext4/extents.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c > index 4d589d34b30e..657baf3991c1 100644 > --- a/fs/ext4/extents.c > +++ b/fs/ext4/extents.c > @@ -1888,6 +1888,7 @@ static void ext4_ext_try_to_merge_up(handle_t *handle, > path[0].p_hdr->eh_max = cpu_to_le16(max_root); > > brelse(path[1].p_bh); > + path[1].p_bh = NULL; > ext4_free_blocks(handle, inode, NULL, blk, 1, > EXT4_FREE_BLOCKS_METADATA | EXT4_FREE_BLOCKS_FORGET); > } > -- > 2.39.2 >
On Wed, Jul 10, 2024 at 12:06:37PM +0800, libaokun@huaweicloud.com wrote: > From: Baokun Li <libaokun1@huawei.com> > > In ext4_ext_try_to_merge_up(), set path[1].p_bh to NULL after it has been > released, otherwise it may be released twice. > > An example of what triggers this is as follows: > > split2 map split1 > |--------|-------|--------| > > ext4_ext_map_blocks > ext4_ext_handle_unwritten_extents > ext4_split_convert_extents > // path->p_depth == 0 > ext4_split_extent > // 1. do split1 > ext4_split_extent_at > ext4_ext_insert_extent > ext4_ext_create_new_leaf > ext4_ext_grow_indepth > le16_add_cpu(&neh->eh_depth, 1) > ext4_find_extent > path->p_depth = 1 > ext4_ext_try_to_merge > ext4_ext_try_to_merge_up > path->p_depth = 0 > brelse(path[1].p_bh) ---> not set to NULL here > // 2. update path > ext4_find_extent > // 3. do split2 > ext4_split_extent_at > ext4_ext_insert_extent > ext4_ext_create_new_leaf > ext4_ext_grow_indepth > le16_add_cpu(&neh->eh_depth, 1) > ext4_find_extent > path[0].p_bh = NULL; > path->p_depth = 1 > read_extent_tree_block ---> return err > // path[1].p_bh is still the old value > ext4_free_ext_path > ext4_ext_drop_refs > // path->p_depth == 1 > brelse(path[1].p_bh) ---> brelse a buffer twice Hi Baokun, If i'm not wrong, in this trace, we'll enter ext4_ext_insert_extent() with gb_flags having EXT4_GET_BLOCKS_PRE_IO so we won't actually go for a merge_up. That being said, there seems to be a few places where we follow the split-insert pattern and it might be possible that one of those call sites might not be passing EXT4_GET_BLOCKS_PRE_IO and we'll the double free issue you mentioned. I'll check and update if I see anything. On a separate note, isn't it a bit weird that we grow the tree indepth (which includes allocation, marking buffer dirty etc) only to later merge it up again and throwing all the changes we did while growing the tree. Surely we could optimize this particular case somehow right? > > Finally got the following WARRNING when removing the buffer from lru: > > ============================================ > VFS: brelse: Trying to free free buffer > WARNING: CPU: 2 PID: 72 at fs/buffer.c:1241 __brelse+0x58/0x90 > CPU: 2 PID: 72 Comm: kworker/u19:1 Not tainted 6.9.0-dirty #716 > RIP: 0010:__brelse+0x58/0x90 > Call Trace: > <TASK> > __find_get_block+0x6e7/0x810 > bdev_getblk+0x2b/0x480 > __ext4_get_inode_loc+0x48a/0x1240 > ext4_get_inode_loc+0xb2/0x150 > ext4_reserve_inode_write+0xb7/0x230 > __ext4_mark_inode_dirty+0x144/0x6a0 > ext4_ext_insert_extent+0x9c8/0x3230 > ext4_ext_map_blocks+0xf45/0x2dc0 > ext4_map_blocks+0x724/0x1700 > ext4_do_writepages+0x12d6/0x2a70 > [...] > ============================================ > > Fixes: ecb94f5fdf4b ("ext4: collapse a single extent tree block into the inode if possible") > Cc: stable@kernel.org > Signed-off-by: Baokun Li <libaokun1@huawei.com> > --- > fs/ext4/extents.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c > index 4d589d34b30e..657baf3991c1 100644 > --- a/fs/ext4/extents.c > +++ b/fs/ext4/extents.c > @@ -1888,6 +1888,7 @@ static void ext4_ext_try_to_merge_up(handle_t *handle, > path[0].p_hdr->eh_max = cpu_to_le16(max_root); > > brelse(path[1].p_bh); > + path[1].p_bh = NULL; Anyways, I agree that adding this here is the right thing to do: Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Thanks, Ojaswin > ext4_free_blocks(handle, inode, NULL, blk, 1, > EXT4_FREE_BLOCKS_METADATA | EXT4_FREE_BLOCKS_FORGET); > } > -- > 2.39.2 >
On 2024/7/26 19:45, Ojaswin Mujoo wrote: > On Wed, Jul 10, 2024 at 12:06:37PM +0800, libaokun@huaweicloud.com wrote: >> From: Baokun Li <libaokun1@huawei.com> >> >> In ext4_ext_try_to_merge_up(), set path[1].p_bh to NULL after it has been >> released, otherwise it may be released twice. >> >> An example of what triggers this is as follows: >> >> split2 map split1 >> |--------|-------|--------| >> >> ext4_ext_map_blocks >> ext4_ext_handle_unwritten_extents >> ext4_split_convert_extents >> // path->p_depth == 0 >> ext4_split_extent >> // 1. do split1 >> ext4_split_extent_at >> ext4_ext_insert_extent >> ext4_ext_create_new_leaf >> ext4_ext_grow_indepth >> le16_add_cpu(&neh->eh_depth, 1) >> ext4_find_extent >> path->p_depth = 1 >> ext4_ext_try_to_merge >> ext4_ext_try_to_merge_up >> path->p_depth = 0 >> brelse(path[1].p_bh) ---> not set to NULL here >> // 2. update path >> ext4_find_extent >> // 3. do split2 >> ext4_split_extent_at >> ext4_ext_insert_extent >> ext4_ext_create_new_leaf >> ext4_ext_grow_indepth >> le16_add_cpu(&neh->eh_depth, 1) >> ext4_find_extent >> path[0].p_bh = NULL; >> path->p_depth = 1 >> read_extent_tree_block ---> return err >> // path[1].p_bh is still the old value >> ext4_free_ext_path >> ext4_ext_drop_refs >> // path->p_depth == 1 >> brelse(path[1].p_bh) ---> brelse a buffer twice > Hi Baokun, > > If i'm not wrong, in this trace, we'll enter ext4_ext_insert_extent() with > gb_flags having EXT4_GET_BLOCKS_PRE_IO so we won't actually go for a > merge_up. > > That being said, there seems to be a few places where we follow the > split-insert pattern and it might be possible that one of those call > sites might not be passing EXT4_GET_BLOCKS_PRE_IO and we'll the double > free issue you mentioned. I'll check and update if I see anything. Hi Ojaswin, You're right. I am very sorry for the confusion. The trace here is wrong, this patch should actually be placed after the two UAF patches. Here ext4_ext_try_to_merge() is called when trying zeroout in ext4_split_extent_at(). It is called when trying zeroout with or without EXT4_GET_BLOCKS_PRE_IO.The correct trace is as follows: split2 map split1 |--------|-------|--------| ext4_ext_map_blocks ext4_ext_handle_unwritten_extents ext4_split_convert_extents // path->p_depth == 0 ext4_split_extent // 1. do split1 ext4_split_extent_at |ext4_ext_insert_extent | ext4_ext_create_new_leaf | ext4_ext_grow_indepth | le16_add_cpu(&neh->eh_depth, 1) | ext4_find_extent | // return -ENOMEM |// get error and try zeroout |path = ext4_find_extent | path->p_depth = 1 |ext4_ext_try_to_merge | ext4_ext_try_to_merge_up | path->p_depth = 0 | brelse(path[1].p_bh) ---> not set to NULL here |// zeroout success // 2. update path ext4_find_extent // 3. do split2 ext4_split_extent_at ext4_ext_insert_extent ext4_ext_create_new_leaf ext4_ext_grow_indepth le16_add_cpu(&neh->eh_depth, 1) ext4_find_extent path[0].p_bh = NULL; path->p_depth = 1 read_extent_tree_block ---> return err // path[1].p_bh is still the old value ext4_free_ext_path ext4_ext_drop_refs // path->p_depth == 1 brelse(path[1].p_bh) ---> brelse a buffer twice I'll adjust the order of the patches and correct the trace in the next version. > On a separate note, isn't it a bit weird that we grow the tree indepth > (which includes allocation, marking buffer dirty etc) only to later > merge it up again and throwing all the changes we did while growing the > tree. Surely we could optimize this particular case somehow right? Sorry that my trace misled you. It seems reasonable to try to merge extent in error handling. >> Finally got the following WARRNING when removing the buffer from lru: >> >> ============================================ >> VFS: brelse: Trying to free free buffer >> WARNING: CPU: 2 PID: 72 at fs/buffer.c:1241 __brelse+0x58/0x90 >> CPU: 2 PID: 72 Comm: kworker/u19:1 Not tainted 6.9.0-dirty #716 >> RIP: 0010:__brelse+0x58/0x90 >> Call Trace: >> <TASK> >> __find_get_block+0x6e7/0x810 >> bdev_getblk+0x2b/0x480 >> __ext4_get_inode_loc+0x48a/0x1240 >> ext4_get_inode_loc+0xb2/0x150 >> ext4_reserve_inode_write+0xb7/0x230 >> __ext4_mark_inode_dirty+0x144/0x6a0 >> ext4_ext_insert_extent+0x9c8/0x3230 >> ext4_ext_map_blocks+0xf45/0x2dc0 >> ext4_map_blocks+0x724/0x1700 >> ext4_do_writepages+0x12d6/0x2a70 >> [...] >> ============================================ >> >> Fixes: ecb94f5fdf4b ("ext4: collapse a single extent tree block into the inode if possible") >> Cc: stable@kernel.org >> Signed-off-by: Baokun Li <libaokun1@huawei.com> >> --- >> fs/ext4/extents.c | 1 + >> 1 file changed, 1 insertion(+) >> >> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c >> index 4d589d34b30e..657baf3991c1 100644 >> --- a/fs/ext4/extents.c >> +++ b/fs/ext4/extents.c >> @@ -1888,6 +1888,7 @@ static void ext4_ext_try_to_merge_up(handle_t *handle, >> path[0].p_hdr->eh_max = cpu_to_le16(max_root); >> >> brelse(path[1].p_bh); >> + path[1].p_bh = NULL; > Anyways, I agree that adding this here is the right thing to do: > > Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> > > Thanks, > Ojaswin Thanks for the review! >> ext4_free_blocks(handle, inode, NULL, blk, 1, >> EXT4_FREE_BLOCKS_METADATA | EXT4_FREE_BLOCKS_FORGET); >> } >> -- >> 2.39.2 >>
On Tue, Jul 30, 2024 at 04:47:56PM +0800, Baokun Li wrote: > On 2024/7/26 19:45, Ojaswin Mujoo wrote: > > On Wed, Jul 10, 2024 at 12:06:37PM +0800, libaokun@huaweicloud.com wrote: > > > From: Baokun Li <libaokun1@huawei.com> > > > snip > > Hi Baokun, > > > > If i'm not wrong, in this trace, we'll enter ext4_ext_insert_extent() with > > gb_flags having EXT4_GET_BLOCKS_PRE_IO so we won't actually go for a > > merge_up. > > > > That being said, there seems to be a few places where we follow the > > split-insert pattern and it might be possible that one of those call > > sites might not be passing EXT4_GET_BLOCKS_PRE_IO and we'll the double > > free issue you mentioned. I'll check and update if I see anything. > Hi Ojaswin, > > You're right. I am very sorry for the confusion. > > The trace here is wrong, this patch should actually be placed after the two > UAF patches. Here ext4_ext_try_to_merge() is called when trying zeroout in > ext4_split_extent_at(). It is called when trying zeroout with or without > EXT4_GET_BLOCKS_PRE_IO.The correct trace is as follows: > > split2 map split1 > |--------|-------|--------| > > ext4_ext_map_blocks > ext4_ext_handle_unwritten_extents > ext4_split_convert_extents > // path->p_depth == 0 > ext4_split_extent > // 1. do split1 > ext4_split_extent_at > |ext4_ext_insert_extent > | ext4_ext_create_new_leaf > | ext4_ext_grow_indepth > | le16_add_cpu(&neh->eh_depth, 1) > | ext4_find_extent > | // return -ENOMEM > |// get error and try zeroout > |path = ext4_find_extent > | path->p_depth = 1 > |ext4_ext_try_to_merge > | ext4_ext_try_to_merge_up > | path->p_depth = 0 > | brelse(path[1].p_bh) ---> not set to NULL here > |// zeroout success > // 2. update path > ext4_find_extent > // 3. do split2 > ext4_split_extent_at > ext4_ext_insert_extent > ext4_ext_create_new_leaf > ext4_ext_grow_indepth > le16_add_cpu(&neh->eh_depth, 1) > ext4_find_extent > path[0].p_bh = NULL; > path->p_depth = 1 > read_extent_tree_block ---> return err > // path[1].p_bh is still the old value > ext4_free_ext_path > ext4_ext_drop_refs > // path->p_depth == 1 > brelse(path[1].p_bh) ---> brelse a buffer twice > > I'll adjust the order of the patches and correct the trace in the next > version. Hey Baokun, The corrected trace looks good, thanks. Regards, Ojaswin
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index 4d589d34b30e..657baf3991c1 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -1888,6 +1888,7 @@ static void ext4_ext_try_to_merge_up(handle_t *handle, path[0].p_hdr->eh_max = cpu_to_le16(max_root); brelse(path[1].p_bh); + path[1].p_bh = NULL; ext4_free_blocks(handle, inode, NULL, blk, 1, EXT4_FREE_BLOCKS_METADATA | EXT4_FREE_BLOCKS_FORGET); }