Message ID | 20241010025855.2632516-1-liubaolin12138@163.com |
---|---|
State | New |
Headers | show |
Series | [v1] ext4: fix a assertion failure due to ungranted bh dirting | expand |
On Thu 10-10-24 10:58:55, Baolin Liu wrote: > From: Baolin Liu <liubaolin@kylinos.cn> > > Since the merge of commit 3910b513fcdf ("ext4: persist the new uptodate > buffers in ext4_journalled_zero_new_buffers"), a new assertion failure > occurred under a old kernel(ext3, data=journal, pagesize=64k) with > corresponding ported patches: ... > which was caused by bh dirting without calling > do_journal_get_write_access(). > > In the loop for all bhs of a page in ext4_block_write_begin(), > when a err occurred, it will jump out of loop. > But that will leaves some bhs being processed and some not, > which will lead to the asserion failure in calling write_end_fn(). Thanks for the patch but I don't understand one thing here: For ext4_journalled_zero_new_buffers() to call write_end_fn() the buffer must have buffer_new flag set. That flag can get set only by ext4_get_block() function when it succeeds in which case we also call do_journal_get_write_access(). So how is it possible that buffer_new was set on a buffer on which we didn't call do_journal_get_write_access()? This indicates there may be some deeper problem hidden. How exactly did you trigger this problem? Honza > > To fixed that, get write access for the rest unprocessed bhs, just > as what write_end_fn do. > > Fixes: 3910b513fcdf ("ext4: persist the new uptodate buffers in ext4_journalled_zero_new_buffers") > Reported-and-tested-by: Zhi Long <longzhi@sangfor.com.cn> > Suggested-by: Shida Zhang <zhangshida@kylinos.cn> > Signed-off-by: Baolin Liu <liubaolin@kylinos.cn> > --- > fs/ext4/inode.c | 17 ++++++++++++++++- > 1 file changed, 16 insertions(+), 1 deletion(-) > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > index 54bdd4884fe6..a72f951288e4 100644 > --- a/fs/ext4/inode.c > +++ b/fs/ext4/inode.c > @@ -1102,9 +1102,24 @@ int ext4_block_write_begin(handle_t *handle, struct folio *folio, > err = -EIO; > } > if (unlikely(err)) { > - if (should_journal_data) > + if (should_journal_data) { > + if (bh != head || !block_start) { > + do { > + block_end = block_start + bh->b_size; > + > + if (buffer_new(bh)) > + if (block_end > from && block_start < to) > + do_journal_get_write_access(handle, > + inode, bh); > + > + block_start = block_end; > + bh = bh->b_this_page; > + } while (bh != head); > + } > + > ext4_journalled_zero_new_buffers(handle, inode, folio, > from, to); > + } > else > folio_zero_new_buffers(folio, from, to); > } else if (fscrypt_inode_uses_fs_layer_crypto(inode)) { > -- > 2.39.2 >
> Greetings, > This problem is reproduced by our customer using their own testing tool “run_bug”. > When I consulted with a client, the testing tool “run_bug” used a variety of background programs to benchmark > (including memory pressure, cpu pressure, file cycle manipulation, fsstress Stress testing tool, postmark program,and so on). > The recurrence probability is relatively low. > > In response to your query, in ext4_block_write_begin, the new state will be clear before get block, > and the bh that failed get_block will not be set to new. > However, when the page size is greater than the block size, a page will contain multiple bh. > bh->b_this_page is a circular list for managing all bh on the same page. > After get_block jumps out of the for loop, then bh->b_this_page is not processed by clear new in the for loop. > So after calling ext4_journalled_zero_new_buffers, > The ext4_journalled_zero_new_buffers function will determine all bh of the same page and call write_end_fn if they are in new state, > get_block returns err's bh->b_this_page and circular list other unhandled bh if the state was previously set to new. > Because bh not get access, the corresponding transaction is not placed in jh->b_transaction, resulting in a crash. > > Therefore, the patch processing method I submit is to make unprocessed bh determines if it is in new state and get access. > There is another way to handle the remaining bh clear_buffer_new that is not processed. > I personally recommend get access this way, the impact is small. > Please guide the two processing methods, which one do you recommend? 在 2024/10/10 17:29, Jan Kara 写道: > On Thu 10-10-24 10:58:55, Baolin Liu wrote: >> From: Baolin Liu <liubaolin@kylinos.cn> >> >> Since the merge of commit 3910b513fcdf ("ext4: persist the new uptodate >> buffers in ext4_journalled_zero_new_buffers"), a new assertion failure >> occurred under a old kernel(ext3, data=journal, pagesize=64k) with >> corresponding ported patches: > ... >> which was caused by bh dirting without calling >> do_journal_get_write_access(). >> >> In the loop for all bhs of a page in ext4_block_write_begin(), >> when a err occurred, it will jump out of loop. >> But that will leaves some bhs being processed and some not, >> which will lead to the asserion failure in calling write_end_fn(). > > Thanks for the patch but I don't understand one thing here: For > ext4_journalled_zero_new_buffers() to call write_end_fn() the buffer must > have buffer_new flag set. That flag can get set only by ext4_get_block() > function when it succeeds in which case we also call > do_journal_get_write_access(). So how is it possible that buffer_new was > set on a buffer on which we didn't call do_journal_get_write_access()? This > indicates there may be some deeper problem hidden. How exactly did you > trigger this problem? > > Honza > >> >> To fixed that, get write access for the rest unprocessed bhs, just >> as what write_end_fn do. >> >> Fixes: 3910b513fcdf ("ext4: persist the new uptodate buffers in ext4_journalled_zero_new_buffers") >> Reported-and-tested-by: Zhi Long <longzhi@sangfor.com.cn> >> Suggested-by: Shida Zhang <zhangshida@kylinos.cn> >> Signed-off-by: Baolin Liu <liubaolin@kylinos.cn> >> --- >> fs/ext4/inode.c | 17 ++++++++++++++++- >> 1 file changed, 16 insertions(+), 1 deletion(-) >> >> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c >> index 54bdd4884fe6..a72f951288e4 100644 >> --- a/fs/ext4/inode.c >> +++ b/fs/ext4/inode.c >> @@ -1102,9 +1102,24 @@ int ext4_block_write_begin(handle_t *handle, struct folio *folio, >> err = -EIO; >> } >> if (unlikely(err)) { >> - if (should_journal_data) >> + if (should_journal_data) { >> + if (bh != head || !block_start) { >> + do { >> + block_end = block_start + bh->b_size; >> + >> + if (buffer_new(bh)) >> + if (block_end > from && block_start < to) >> + do_journal_get_write_access(handle, >> + inode, bh); >> + >> + block_start = block_end; >> + bh = bh->b_this_page; >> + } while (bh != head); >> + } >> + >> ext4_journalled_zero_new_buffers(handle, inode, folio, >> from, to); >> + } >> else >> folio_zero_new_buffers(folio, from, to); >> } else if (fscrypt_inode_uses_fs_layer_crypto(inode)) { >> -- >> 2.39.2 >>
> Greetings, > Regarding this issue, > I was able to reproduce it quickly by injecting faults via module parameter passing in fsstest while testing simultaneously. > And we tested that neither get access nor clear new would reproduce the issue after injecting faults. > Could you please take a look at which approach, get access or clear new, is better? > > The fsstress testing and injection fault command are as follows: > fsstress_arm -d "/fsstress_dir2/" -l 102400 -n 100 -p 128 -r -S -s 10 -c > watch -n 1 "echo 1 > /sys/module/ext4/parameters/inject_fault" > > The injected fault test is modified as follows, > where the module parameter inject_fault injects the fault, > and the module parameter try_fix selects whether to handle the fault by getting access or clearing the new: > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > index b231cd437..590f84391 100644 > --- a/fs/ext4/inode.c > +++ b/fs/ext4/inode.c > @@ -50,6 +50,12 @@ > > #define MPAGE_DA_EXTENT_TAIL 0x01 > > +static int ext4_inject_fault __read_mostly; > +module_param_named(inject_fault, ext4_inject_fault, int, 0644); > +static int ext4_try_fix __read_mostly; > +module_param_named(try_fix, ext4_try_fix, int, 0644); > + > + > static void ext4_journalled_zero_new_buffers(handle_t *handle, > struct page *page, > unsigned int from, unsigned int to); > @@ -1065,6 +1071,12 @@ int ext4_block_write_begin(handle_t *handle, struct page *page, loff_t pos, unsi > clear_buffer_new(bh); > if (!buffer_mapped(bh)) { > WARN_ON(bh->b_size != blocksize); > + if (unlikely(ext4_inject_fault)) { > + ext4_inject_fault = 0; > + ext4_warning(inode->i_sb, "XXX inject fault get_block return -ENOSPC\n"); > + err = -ENOSPC; > + break; > + } > err = get_block(inode, block, bh, 1); > if (err) > break; > @@ -1116,10 +1128,31 @@ int ext4_block_write_begin(handle_t *handle, struct page *page, loff_t pos, unsi > err = -EIO; > } > if (unlikely(err)) > - if (should_journal_data) > + if (should_journal_data) { > + if(bh != head || !block_start) { > + do { > + block_end = block_start + bh->b_size; > + > + if (buffer_new(bh)) > + if (block_end > from && block_start < to) { > + if (ext4_try_fix == 1) { > + ext4_warning(inode->i_sb, "XXX try fix 1\n"); > + do_journal_get_write_access(handle, > + bh); > + } else if (ext4_try_fix == 2) { > + ext4_warning(inode->i_sb, "XXX try fix 2\n"); > + clear_buffer_new(bh); > + } > + } > + > + block_start = block_end; > + bh = bh->b_this_page; > + } while (bh != head); > + } > + > ext4_journalled_zero_new_buffers(handle, page, from, > to); > - else > + } else > page_zero_new_buffers(page, from, to); > else if (decrypt) > err = fscrypt_decrypt_page(page->mapping->host, page, 在 2024/10/11 14:18, liubaolin 写道: >> Greetings, >> This problem is reproduced by our customer using their own testing >> tool “run_bug”. >> When I consulted with a client, the testing tool “run_bug” used a >> variety of background programs to benchmark (including memory >> pressure, cpu pressure, file cycle manipulation, fsstress Stress >> testing tool, postmark program,and so on). >> The recurrence probability is relatively low. >> >> In response to your query, in ext4_block_write_begin, the new state >> will be clear before get block, and the bh that failed get_block will >> not be set to new. >> However, when the page size is greater than the block size, a page >> will contain multiple bh. bh->b_this_page is a circular list for >> managing all bh on the same page. >> After get_block jumps out of the for loop, then bh->b_this_page is not >> processed by clear new in the for loop. >> So after calling ext4_journalled_zero_new_buffers, >> The ext4_journalled_zero_new_buffers function will determine all bh of >> the same page and call write_end_fn if they are in new state, >> get_block returns err's bh->b_this_page and circular list other >> unhandled bh if the state was previously set to new. >> Because bh not get access, the corresponding transaction is not placed >> in jh->b_transaction, resulting in a crash. >> >> Therefore, the patch processing method I submit is to make unprocessed >> bh determines if it is in new state and get access. >> There is another way to handle the remaining bh clear_buffer_new that >> is not processed. >> I personally recommend get access this way, the impact is small. >> Please guide the two processing methods, which one do you recommend? > > > > 在 2024/10/10 17:29, Jan Kara 写道: >> On Thu 10-10-24 10:58:55, Baolin Liu wrote: >>> From: Baolin Liu <liubaolin@kylinos.cn> >>> >>> Since the merge of commit 3910b513fcdf ("ext4: persist the new uptodate >>> buffers in ext4_journalled_zero_new_buffers"), a new assertion failure >>> occurred under a old kernel(ext3, data=journal, pagesize=64k) with >>> corresponding ported patches: >> ... >>> which was caused by bh dirting without calling >>> do_journal_get_write_access(). >>> >>> In the loop for all bhs of a page in ext4_block_write_begin(), >>> when a err occurred, it will jump out of loop. >>> But that will leaves some bhs being processed and some not, >>> which will lead to the asserion failure in calling write_end_fn(). >> >> Thanks for the patch but I don't understand one thing here: For >> ext4_journalled_zero_new_buffers() to call write_end_fn() the buffer must >> have buffer_new flag set. That flag can get set only by ext4_get_block() >> function when it succeeds in which case we also call >> do_journal_get_write_access(). So how is it possible that buffer_new was >> set on a buffer on which we didn't call do_journal_get_write_access()? >> This >> indicates there may be some deeper problem hidden. How exactly did you >> trigger this problem? >> >> Honza >> >>> >>> To fixed that, get write access for the rest unprocessed bhs, just >>> as what write_end_fn do. >>> >>> Fixes: 3910b513fcdf ("ext4: persist the new uptodate buffers in >>> ext4_journalled_zero_new_buffers") >>> Reported-and-tested-by: Zhi Long <longzhi@sangfor.com.cn> >>> Suggested-by: Shida Zhang <zhangshida@kylinos.cn> >>> Signed-off-by: Baolin Liu <liubaolin@kylinos.cn> >>> --- >>> fs/ext4/inode.c | 17 ++++++++++++++++- >>> 1 file changed, 16 insertions(+), 1 deletion(-) >>> >>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c >>> index 54bdd4884fe6..a72f951288e4 100644 >>> --- a/fs/ext4/inode.c >>> +++ b/fs/ext4/inode.c >>> @@ -1102,9 +1102,24 @@ int ext4_block_write_begin(handle_t *handle, >>> struct folio *folio, >>> err = -EIO; >>> } >>> if (unlikely(err)) { >>> - if (should_journal_data) >>> + if (should_journal_data) { >>> + if (bh != head || !block_start) { >>> + do { >>> + block_end = block_start + bh->b_size; >>> + >>> + if (buffer_new(bh)) >>> + if (block_end > from && block_start < to) >>> + do_journal_get_write_access(handle, >>> + inode, bh); >>> + >>> + block_start = block_end; >>> + bh = bh->b_this_page; >>> + } while (bh != head); >>> + } >>> + >>> ext4_journalled_zero_new_buffers(handle, inode, folio, >>> from, to); >>> + } >>> else >>> folio_zero_new_buffers(folio, from, to); >>> } else if (fscrypt_inode_uses_fs_layer_crypto(inode)) { >>> -- >>> 2.39.2 >>> >
Hello, On Fri 11-10-24 12:08:58, Baolin Liu wrote: > Greetings, > > This problem is reproduced by our customer using their own testing tool > “run_bug”. When I consulted with a client, the testing tool “run_bug” > used a variety of background programs to benchmark (including memory > pressure, cpu pressure, file cycle manipulation, fsstress Stress testing > tool, postmark program,and so on). > > The recurrence probability is relatively low. OK, thanks for asking! > In response to your query, in ext4_block_write_begin, the new state will > be clear before get block, and the bh that failed get_block will not be > set to new. However, when the page size is greater than the block size, a > page will contain multiple bh. True. I wanted to argue that the buffer_new bit should be either cleared in ext4_block_write_begin() (in case of error) or in ext4_journalled_write_end() (in case of success) but actually ext4_journalled_write_end() misses the clearing. So I think the better solution is like the attached patch. I'll submit it once testing finishes but it would be great if you could test that it fixes your problems as well. Thanks! Honza
> Hello, > I reviewed the patch attached in your email. The issue you mentioned about clearing buffer_new(bh) in write_end_fn() is indeed a bug. > However, this patch does not resolve the crash issue we encountered. > > Let me explain my analysis in detail below. > The crash occurs in the function jbd2_journal_dirty_metadata(). > > ext4_block_write_begin() -> ext4_journalled_zero_new_buffers() -> write_end_fn() > -> ext4_dirty_journalled_data() -> ext4_handle_dirty_metadata() -> __ext4_handle_dirty_metadata() > -> jbd2_journal_dirty_metadata() > > In the function jbd2_journal_dirty_metadata(), there is the following condition: > —--------------------------------------------------------------------------------------------------- > if (data_race(jh->b_transaction != transaction && > jh->b_next_transaction != transaction)) { > spin_lock(&jh->b_state_lock); > J_ASSERT_JH(jh, jh->b_transaction == transaction || > jh->b_next_transaction == transaction); > spin_unlock(&jh->b_state_lock); > } > ---------------------------------------------------------------------------------------------------- > By analyzing the vmcore, I found that both jh->b_transaction and jh->b_next_transaction are NULL. > Through code analysis, I discovered that the __jbd2_journal_file_buffer() function adds the corresponding transaction of bh to jh->b_transaction. > Normally, this is accessed through do_journal_get_write_access(), which can call __jbd2_journal_file_buffer(). > The detailed function call process is as follows: > do_journal_get_write_access() -> ext4_journal_get_write_access() -> __ext4_journal_get_write_access() > -> jbd2_journal_get_write_access() -> do_get_write_access() -> __jbd2_journal_file_buffer() > > > Therefore, resolving the crash issue requires obtaining write access before calling the jbd2_journal_dirty_metadata() function. > The comment at the definition of the jbd2_journal_dirty_metadata() function also states: > 'The buffer must have previously had jbd2_journal_get_write_access().' > > In the ext4_block_write_begin() function, if get_block() encounters an error, then neither bh->b_this_page nor the subsequent bh calls do_journal_get_write_access(). > If bh->b_this_page and the subsequent bh are in the new state, it will lead to a crash when reaching the jbd2_journal_dirty_metadata() function. > > So, there are two ways to resolve this crash issue: > 1、Call do_journal_get_write_access() on bh that is not handled due to get_block() error. > The patch modification is in the attachment 0001-ext4-fix-a-assertion-failure-due-to-ungranted-bh-dir.patch. > > 2、Call clear_buffer_new() on bh that is not handled due to get_block() error. > The patch modification is in the attachment 0001-ext4-fix-a-assertion-failure-due-to-bh-not-clear-new.patch. > > Additionally, I have found a method to quickly reproduce this crash issue. > For details, please refer to the email I previously sent you: “https://lore.kernel.org/all/bd41c24b-7325-4584-a965-392a32e32c74@163.com/”. > I have verified that this quick reproduction method works for both solutions to resolve the issue. > > Please continue to consider which method is better to resolve this issue. > If you think that using clear_buffer_new() is a better solution, I can resend the patch via git send-mail. 在 2024/10/16 18:33, Jan Kara 写道: > Hello, > > On Fri 11-10-24 12:08:58, Baolin Liu wrote: >> Greetings, >> >> This problem is reproduced by our customer using their own testing tool >> “run_bug”. When I consulted with a client, the testing tool “run_bug” >> used a variety of background programs to benchmark (including memory >> pressure, cpu pressure, file cycle manipulation, fsstress Stress testing >> tool, postmark program,and so on). >> >> The recurrence probability is relatively low. > > OK, thanks for asking! > >> In response to your query, in ext4_block_write_begin, the new state will >> be clear before get block, and the bh that failed get_block will not be >> set to new. However, when the page size is greater than the block size, a >> page will contain multiple bh. > > True. I wanted to argue that the buffer_new bit should be either cleared in > ext4_block_write_begin() (in case of error) or in > ext4_journalled_write_end() (in case of success) but actually > ext4_journalled_write_end() misses the clearing. So I think the better > solution is like the attached patch. I'll submit it once testing finishes > but it would be great if you could test that it fixes your problems as > well. Thanks! > > Honza From cbed522ccb695681d94ec02940e958fcf77e58cd Mon Sep 17 00:00:00 2001 From: Baolin Liu <liubaolin@kylinos.cn> Date: Wed, 9 Oct 2024 09:30:34 +0800 Subject: [PATCH] ext4: fix a assertion failure due to bh not clear new Since the merge of commit 3910b513fcdf ("ext4: persist the new uptodate buffers in ext4_journalled_zero_new_buffers"), a new assertion failure occurred under a old kernel(ext3, data=journal, pagesize=64k) with corresponding ported patches: ================================================================ Call trace: __ext4_handle_dirty_metadata+0x320/0x7e8 write_end_fn+0x78/0x178 ext4_journalled_zero_new_buffers+0xd0/0x2c8 ext4_block_write_begin+0x850/0xc00 ext4_write_begin+0x334/0xc68 generic_perform_write+0x1a4/0x380 ext4_buffered_write_iter+0x180/0x370 ext4_file_write_iter+0x194/0xfc0 new_sync_write+0x338/0x4b8 __vfs_write+0xc4/0xe8 vfs_write+0x12c/0x3d0 ksys_write+0xf4/0x230 sys_write+0x34/0x48 el0_svc_naked+0x44/0x48 ================================================================ which was caused by bh dirting without calling clear_buffer_new(). In the loop for all bhs of a page in ext4_block_write_begin(), when a err occurred, it will jump out of loop. But that will leaves some bhs being processed and some not, which will lead to the asserion failure in calling write_end_fn(). To fixed that, clear_buffer_new for the rest unprocessed bhs, just as what write_end_fn do. Fixes: 3910b513fcdf ("ext4: persist the new uptodate buffers in ext4_journalled_zero_new_buffers") Reported-and-tested-by: Zhi Long <longzhi@sangfor.com.cn> Suggested-by: Shida Zhang <zhangshida@kylinos.cn> Signed-off-by: Baolin Liu <liubaolin@kylinos.cn> --- fs/ext4/inode.c | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 54bdd4884fe6..26c107e083c6 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -1102,9 +1102,23 @@ int ext4_block_write_begin(handle_t *handle, struct folio *folio, err = -EIO; } if (unlikely(err)) { - if (should_journal_data) + if (should_journal_data) { + if (bh != head || !block_start) { + do { + block_end = block_start + bh->b_size; + + if (buffer_new(bh)) + if (block_end > from && block_start < to) + clear_buffer_new(bh); + + block_start = block_end; + bh = bh->b_this_page; + } while (bh != head); + } + ext4_journalled_zero_new_buffers(handle, inode, folio, from, to); + } else folio_zero_new_buffers(folio, from, to); } else if (fscrypt_inode_uses_fs_layer_crypto(inode)) {
> Hello, I am very sorry. > I did not previously understand the approach of your patch to solve the issue. > Yesterday, I intentionally injected faults during the quick reproduction test, > and indeed, after applying your patch, the crash issue was resolved and did not occur again. > I finally understood your approach to solving the problem. Please disregard my previous email. > Thank you for helping me solve this crash issue in a better way. > I still need to improve my skills in file systems, and I truly appreciate your guidance. 在 2024/10/16 21:38, liubaolin 写道: >> Hello, >> I reviewed the patch attached in your email. The issue you mentioned >> about clearing buffer_new(bh) in write_end_fn() is indeed a bug. >> However, this patch does not resolve the crash issue we encountered. >> >> Let me explain my analysis in detail below. >> The crash occurs in the function jbd2_journal_dirty_metadata(). >> >> ext4_block_write_begin() -> ext4_journalled_zero_new_buffers() -> >> write_end_fn() >> -> ext4_dirty_journalled_data() -> ext4_handle_dirty_metadata() -> >> __ext4_handle_dirty_metadata() >> -> jbd2_journal_dirty_metadata() >> >> In the function jbd2_journal_dirty_metadata(), there is the following >> condition: >> —--------------------------------------------------------------------------------------------------- >> if (data_race(jh->b_transaction != transaction && >> jh->b_next_transaction != transaction)) { >> spin_lock(&jh->b_state_lock); >> J_ASSERT_JH(jh, jh->b_transaction == transaction || >> jh->b_next_transaction == transaction); >> spin_unlock(&jh->b_state_lock); >> } >> ---------------------------------------------------------------------------------------------------- >> By analyzing the vmcore, I found that both jh->b_transaction and jh- >> >b_next_transaction are NULL. >> Through code analysis, I discovered that the >> __jbd2_journal_file_buffer() function adds the corresponding >> transaction of bh to jh->b_transaction. >> Normally, this is accessed through do_journal_get_write_access(), >> which can call __jbd2_journal_file_buffer(). >> The detailed function call process is as follows: >> do_journal_get_write_access() -> ext4_journal_get_write_access() -> >> __ext4_journal_get_write_access() >> -> jbd2_journal_get_write_access() -> do_get_write_access() -> >> __jbd2_journal_file_buffer() >> >> >> Therefore, resolving the crash issue requires obtaining write access >> before calling the jbd2_journal_dirty_metadata() function. >> The comment at the definition of the jbd2_journal_dirty_metadata() >> function also states: 'The buffer must have previously had >> jbd2_journal_get_write_access().' >> >> In the ext4_block_write_begin() function, if get_block() encounters an >> error, then neither bh->b_this_page nor the subsequent bh calls >> do_journal_get_write_access(). >> If bh->b_this_page and the subsequent bh are in the new state, it will >> lead to a crash when reaching the jbd2_journal_dirty_metadata() function. >> >> So, there are two ways to resolve this crash issue: >> 1、Call do_journal_get_write_access() on bh that is not handled due to >> get_block() error. >> The patch modification is in the attachment 0001-ext4-fix-a- >> assertion-failure-due-to-ungranted-bh-dir.patch. >> >> 2、Call clear_buffer_new() on bh that is not handled due to >> get_block() error. >> The patch modification is in the attachment 0001-ext4-fix-a- >> assertion-failure-due-to-bh-not-clear-new.patch. >> >> Additionally, I have found a method to quickly reproduce this crash >> issue. >> For details, please refer to the email I previously sent you: >> “https://lore.kernel.org/all/bd41c24b-7325-4584- >> a965-392a32e32c74@163.com/”. >> I have verified that this quick reproduction method works for both >> solutions to resolve the issue. >> >> Please continue to consider which method is better to resolve this >> issue. If you think that using clear_buffer_new() is a better >> solution, I can resend the patch via git send-mail. > > > > 在 2024/10/16 18:33, Jan Kara 写道: >> Hello, >> >> On Fri 11-10-24 12:08:58, Baolin Liu wrote: >>> Greetings, >>> >>> This problem is reproduced by our customer using their own testing tool >>> “run_bug”. When I consulted with a client, the testing tool “run_bug” >>> used a variety of background programs to benchmark (including memory >>> pressure, cpu pressure, file cycle manipulation, fsstress Stress testing >>> tool, postmark program,and so on). >>> >>> The recurrence probability is relatively low. >> >> OK, thanks for asking! >> >>> In response to your query, in ext4_block_write_begin, the new state will >>> be clear before get block, and the bh that failed get_block will not be >>> set to new. However, when the page size is greater than the block >>> size, a >>> page will contain multiple bh. >> >> True. I wanted to argue that the buffer_new bit should be either >> cleared in >> ext4_block_write_begin() (in case of error) or in >> ext4_journalled_write_end() (in case of success) but actually >> ext4_journalled_write_end() misses the clearing. So I think the better >> solution is like the attached patch. I'll submit it once testing finishes >> but it would be great if you could test that it fixes your problems as >> well. Thanks! >> >> Honza
On Fri 18-10-24 09:48:17, liubaolin wrote: > > Hello, I am very sorry. > > I did not previously understand the approach of your patch to solve the issue. > > Yesterday, I intentionally injected faults during the quick reproduction > > test, and indeed, after applying your patch, the crash issue was > > resolved and did not occur again. > > I finally understood your approach to solving the problem. Please disregard my previous email. > > Thank you for helping me solve this crash issue in a better way. > > I still need to improve my skills in file systems, and I truly appreciate your guidance. Great! Thanks for testing. I'll send the patch for inclusion then. Honza > 在 2024/10/16 21:38, liubaolin 写道: > > > Hello, > > > I reviewed the patch attached in your email. The issue you mentioned > > > about clearing buffer_new(bh) in write_end_fn() is indeed a bug. > > > However, this patch does not resolve the crash issue we encountered. > > > > > > Let me explain my analysis in detail below. > > > The crash occurs in the function jbd2_journal_dirty_metadata(). > > > > > > ext4_block_write_begin() -> ext4_journalled_zero_new_buffers() -> > > > write_end_fn() > > > -> ext4_dirty_journalled_data() -> ext4_handle_dirty_metadata() -> > > > __ext4_handle_dirty_metadata() > > > -> jbd2_journal_dirty_metadata() > > > > > > In the function jbd2_journal_dirty_metadata(), there is the > > > following condition: > > > —--------------------------------------------------------------------------------------------------- > > > if (data_race(jh->b_transaction != transaction && > > > jh->b_next_transaction != transaction)) { > > > spin_lock(&jh->b_state_lock); > > > J_ASSERT_JH(jh, jh->b_transaction == transaction || > > > jh->b_next_transaction == transaction); > > > spin_unlock(&jh->b_state_lock); > > > } > > > ---------------------------------------------------------------------------------------------------- > > > By analyzing the vmcore, I found that both jh->b_transaction and jh- > > > >b_next_transaction are NULL. > > > Through code analysis, I discovered that the > > > __jbd2_journal_file_buffer() function adds the corresponding > > > transaction of bh to jh->b_transaction. > > > Normally, this is accessed through do_journal_get_write_access(), > > > which can call __jbd2_journal_file_buffer(). > > > The detailed function call process is as follows: > > > do_journal_get_write_access() -> ext4_journal_get_write_access() -> > > > __ext4_journal_get_write_access() > > > -> jbd2_journal_get_write_access() -> do_get_write_access() -> > > > __jbd2_journal_file_buffer() > > > > > > > > > Therefore, resolving the crash issue requires obtaining write access > > > before calling the jbd2_journal_dirty_metadata() function. > > > The comment at the definition of the jbd2_journal_dirty_metadata() > > > function also states: 'The buffer must have previously had > > > jbd2_journal_get_write_access().' > > > > > > In the ext4_block_write_begin() function, if get_block() encounters > > > an error, then neither bh->b_this_page nor the subsequent bh calls > > > do_journal_get_write_access(). > > > If bh->b_this_page and the subsequent bh are in the new state, it > > > will lead to a crash when reaching the jbd2_journal_dirty_metadata() > > > function. > > > > > > So, there are two ways to resolve this crash issue: > > > 1、Call do_journal_get_write_access() on bh that is not handled due > > > to get_block() error. > > > The patch modification is in the attachment 0001-ext4-fix-a- > > > assertion-failure-due-to-ungranted-bh-dir.patch. > > > > > > 2、Call clear_buffer_new() on bh that is not handled due to > > > get_block() error. > > > The patch modification is in the attachment 0001-ext4-fix-a- > > > assertion-failure-due-to-bh-not-clear-new.patch. > > > > > > Additionally, I have found a method to quickly reproduce this crash > > > issue. > > > For details, please refer to the email I previously sent you: > > > “https://lore.kernel.org/all/bd41c24b-7325-4584- > > > a965-392a32e32c74@163.com/”. > > > I have verified that this quick reproduction method works for both > > > solutions to resolve the issue. > > > > > > Please continue to consider which method is better to resolve this > > > issue. If you think that using clear_buffer_new() is a better > > > solution, I can resend the patch via git send-mail. > > > > > > > > 在 2024/10/16 18:33, Jan Kara 写道: > > > Hello, > > > > > > On Fri 11-10-24 12:08:58, Baolin Liu wrote: > > > > Greetings, > > > > > > > > This problem is reproduced by our customer using their own testing tool > > > > “run_bug”. When I consulted with a client, the testing tool “run_bug” > > > > used a variety of background programs to benchmark (including memory > > > > pressure, cpu pressure, file cycle manipulation, fsstress Stress testing > > > > tool, postmark program,and so on). > > > > > > > > The recurrence probability is relatively low. > > > > > > OK, thanks for asking! > > > > > > > In response to your query, in ext4_block_write_begin, the new state will > > > > be clear before get block, and the bh that failed get_block will not be > > > > set to new. However, when the page size is greater than the > > > > block size, a > > > > page will contain multiple bh. > > > > > > True. I wanted to argue that the buffer_new bit should be either > > > cleared in > > > ext4_block_write_begin() (in case of error) or in > > > ext4_journalled_write_end() (in case of success) but actually > > > ext4_journalled_write_end() misses the clearing. So I think the better > > > solution is like the attached patch. I'll submit it once testing finishes > > > but it would be great if you could test that it fixes your problems as > > > well. Thanks! > > > > > > Honza >
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 54bdd4884fe6..a72f951288e4 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -1102,9 +1102,24 @@ int ext4_block_write_begin(handle_t *handle, struct folio *folio, err = -EIO; } if (unlikely(err)) { - if (should_journal_data) + if (should_journal_data) { + if (bh != head || !block_start) { + do { + block_end = block_start + bh->b_size; + + if (buffer_new(bh)) + if (block_end > from && block_start < to) + do_journal_get_write_access(handle, + inode, bh); + + block_start = block_end; + bh = bh->b_this_page; + } while (bh != head); + } + ext4_journalled_zero_new_buffers(handle, inode, folio, from, to); + } else folio_zero_new_buffers(folio, from, to); } else if (fscrypt_inode_uses_fs_layer_crypto(inode)) {