Message ID | 20210408113618.1033785-2-yi.zhang@huawei.com |
---|---|
State | New |
Series | ext4: fix two issue about bdev_try_to_free_page() |
On Thu 08-04-21 19:36:16, Zhang Yi wrote:
> There is a race between jbd2_journal_try_to_free_buffers() and
> jbd2_journal_destroy(), so jbd2_log_do_checkpoint() may still fail to
> detect the buffer write I/O error flag, which can lead to filesystem
> inconsistency.
>
> jbd2_journal_try_to_free_buffers()     ext4_put_super()
>                                         jbd2_journal_destroy()
>   __jbd2_journal_remove_checkpoint()
>   detect buffer write error              jbd2_log_do_checkpoint()
>                                          jbd2_cleanup_journal_tail()
>                                          <--- leads to inconsistency
>   jbd2_journal_abort()
>
> Fix this issue by adding j_checkpoint_mutex to protect journal buffer
> release in jbd2_journal_try_to_free_buffers().
>
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>

Thanks for the patch Zhang. I agree with your problem analysis but I don't
think the solution is correct:

> 	J_ASSERT(PageLocked(page));
>
> +	mutex_lock(&journal->j_checkpoint_mutex);

We cannot grab j_checkpoint_mutex inside jbd2_journal_try_to_free_buffers()
(or even ext4_releasepage()) because that function is called with a page
lock held, and the page lock ranks below the checkpoint mutex: generally
page locks are acquired within a transaction, and thus all locks required
to start a transaction (j_checkpoint_mutex is one of them) rank above the
page lock.

Also, even if the lock ordering was OK, grabbing j_checkpoint_mutex for
every page from memory reclaim just to close this rare race seems like
performance overkill.

What we seem to need is a quick way of marking the journal as "IO error
occurred" in __journal_try_to_free_buffer() before actually removing the
buffer from the checkpoint list. Perhaps this marking could even happen
already in __jbd2_journal_remove_checkpoint() and we can reuse it in
jbd2_log_do_checkpoint() for IO error handling as well... And then once we
are in a safer context, we can do:

if (!is_journal_aborted(journal) && journal_io_error_happened(journal))
	jbd2_journal_abort(...)

								Honza

> 	head = page_buffers(page);
> 	bh = head;
> 	do {
> @@ -2163,6 +2164,7 @@ int jbd2_journal_try_to_free_buffers(journal_t *journal, struct page *page)
> 	if (has_write_io_error)
> 		jbd2_journal_abort(journal, -EIO);
>
> +	mutex_unlock(&journal->j_checkpoint_mutex);
> 	return ret;
> }
>
> --
> 2.25.4
>
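
For readers following the thread, the suggestion in the reply above (mark the error cheaply while the page lock is held, abort later from a safer context) can be modelled in userspace roughly as follows. This is only a sketch under stated assumptions: `struct toy_journal` and every `toy_*` function are hypothetical stand-ins, not jbd2 code, and `journal_io_error_happened()` in the reply is itself a proposed helper rather than an existing kernel function.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy userspace model. NOT the kernel's journal_t; all names are hypothetical. */
struct toy_journal {
	atomic_bool io_error;	/* "IO error occurred" mark */
	atomic_bool aborted;	/* stands in for is_journal_aborted() */
	int abort_errno;	/* error code recorded by the first abort */
};

static void toy_journal_init(struct toy_journal *j)
{
	atomic_init(&j->io_error, false);
	atomic_init(&j->aborted, false);
	j->abort_errno = 0;
}

static bool toy_io_error_happened(struct toy_journal *j)
{
	return atomic_load(&j->io_error);
}

static bool toy_is_aborted(struct toy_journal *j)
{
	return atomic_load(&j->aborted);
}

/* Only the first caller records the error code. */
static void toy_abort(struct toy_journal *j, int err)
{
	if (!atomic_exchange(&j->aborted, true))
		j->abort_errno = err;
}

/*
 * Reclaim-side step: record the write error BEFORE the buffer leaves the
 * checkpoint list. It takes no sleeping lock, so in this model it would be
 * safe to call with a page lock held.
 */
static void toy_remove_checkpoint(struct toy_journal *j, bool buffer_write_failed)
{
	if (buffer_write_failed)
		atomic_store(&j->io_error, true);
	/* ... the buffer would be unlinked from the checkpoint list here ... */
}

/* Safer-context step: turn a recorded error into an abort, exactly once. */
static bool toy_check_deferred_abort(struct toy_journal *j)
{
	if (!toy_is_aborted(j) && toy_io_error_happened(j))
		toy_abort(j, -EIO);
	return toy_is_aborted(j);
}
```

In this model the reclaim path never aborts by itself; whichever path next reaches `toy_check_deferred_abort()` sees the recorded error and aborts before doing anything that depends on the checkpointed data being on disk.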
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 9396666b7314..b935b20cbae4 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -2123,6 +2123,7 @@ int jbd2_journal_try_to_free_buffers(journal_t *journal, struct page *page)
 	J_ASSERT(PageLocked(page));
 
+	mutex_lock(&journal->j_checkpoint_mutex);
 	head = page_buffers(page);
 	bh = head;
 	do {
@@ -2163,6 +2164,7 @@ int jbd2_journal_try_to_free_buffers(journal_t *journal, struct page *page)
 	if (has_write_io_error)
 		jbd2_journal_abort(journal, -EIO);
 
+	mutex_unlock(&journal->j_checkpoint_mutex);
 	return ret;
 }
There is a race between jbd2_journal_try_to_free_buffers() and
jbd2_journal_destroy(), so jbd2_log_do_checkpoint() may still fail to
detect the buffer write I/O error flag, which can lead to filesystem
inconsistency.

jbd2_journal_try_to_free_buffers()     ext4_put_super()
                                        jbd2_journal_destroy()
  __jbd2_journal_remove_checkpoint()
  detect buffer write error              jbd2_log_do_checkpoint()
                                         jbd2_cleanup_journal_tail()
                                         <--- leads to inconsistency
  jbd2_journal_abort()

Fix this issue by adding j_checkpoint_mutex to protect journal buffer
release in jbd2_journal_try_to_free_buffers().

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/jbd2/transaction.c | 2 ++
 1 file changed, 2 insertions(+)
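
The interleaving in the commit message can be replayed deterministically in a small single-threaded userspace model, which may make the window easier to see. This is a sketch only: `struct toy_state` and all functions below are hypothetical, and the model assumes that the checkpoint pass can notice a write error only on buffers still on the checkpoint list and that the journal tail is not advanced once the journal is aborted.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one buffer and one journal; none of this is kernel code. */
struct toy_state {
	bool buf_on_checkpoint_list;
	bool buf_write_io_error;
	bool reclaim_saw_error;	/* has_write_io_error in the reclaim path */
	bool journal_aborted;
	bool tail_advanced;	/* stands in for jbd2_cleanup_journal_tail() */
};

/* Reclaim path, first half: note the error, drop the buffer from the list. */
static void reclaim_remove_checkpoint(struct toy_state *s)
{
	s->reclaim_saw_error = s->buf_write_io_error;
	s->buf_on_checkpoint_list = false;
}

/* Reclaim path, second half: abort the journal if an error was noted. */
static void reclaim_abort_if_error(struct toy_state *s)
{
	if (s->reclaim_saw_error)
		s->journal_aborted = true;
}

/* Unmount path: checkpoint, then advance the tail unless aborted. */
static void unmount_checkpoint_and_cleanup(struct toy_state *s)
{
	/* Assumption: only buffers still on the list can report an error. */
	if (s->buf_on_checkpoint_list && s->buf_write_io_error)
		s->journal_aborted = true;
	if (!s->journal_aborted)
		s->tail_advanced = true;
}

/* Racy order: the unmount path runs between the two reclaim halves. */
static bool replay_racy_interleaving(void)
{
	struct toy_state s = {
		.buf_on_checkpoint_list = true,
		.buf_write_io_error = true,
	};

	reclaim_remove_checkpoint(&s);
	unmount_checkpoint_and_cleanup(&s);	/* sees no buffer, no abort */
	reclaim_abort_if_error(&s);		/* too late for the tail */
	return s.tail_advanced;
}

/* Serialized order: reclaim finishes before the unmount path starts. */
static bool replay_serialized(void)
{
	struct toy_state s = {
		.buf_on_checkpoint_list = true,
		.buf_write_io_error = true,
	};

	reclaim_remove_checkpoint(&s);
	reclaim_abort_if_error(&s);
	unmount_checkpoint_and_cleanup(&s);
	return s.tail_advanced;
}
```

In the racy order the tail advances even though the buffer write failed, which corresponds to the inconsistency marked in the diagram; in the serialized order the abort lands first and the tail stays put.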