[REBASED,0/7] ext4: Cleanup data=journal writeback path

Message ID	20230228051319.4085470-1-tytso@mit.edu
Headers	show Return-Path: <SRS0=KYHn=6Y=vger.kernel.org=linux-ext4-owner@ozlabs.org> From: "Theodore Ts'o" <tytso@mit.edu> To: Ext4 Developers List <linux-ext4@vger.kernel.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Subject: [PATCH REBASED 0/7] ext4: Cleanup data=journal writeback path Date: Tue, 28 Feb 2023 00:13:12 -0500 Message-Id: <20230228051319.4085470-1-tytso@mit.edu> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	ext4: Cleanup data=journal writeback path \| expand [REBASED,0/7] ext4: Cleanup data=journal writeback path [1/7] ext4: Update stale comment about write constraints [2/7] ext4: Use nr_to_write directly in mpage_prepare_extent_to_map() [3/7] ext4: Mark page for delayed dirtying only if it is pinned [4/7] ext4: Don't unlock page in ext4_bio_write_page() [5/7] ext4: Move page unlocking out of mpage_submit_page() [6/7] ext4: Move mpage_page_done() calls after error handling [7/7] ext4: Convert data=journal writeback to use ext4_writepages()

Theodore Ts'o Feb. 28, 2023, 5:13 a.m. UTC

This is Jan's data=journal cleanup patch series, previously submitted
here[1] rebased on top of Linus's patches to address merge conflicts
with mm-stable, per this discussion[2].

[1] https://lore.kernel.org/r/20230111152736.9608-1-jack@suse.cz
[2] https://lore.kernel.org/r/Y/k4Jvph15ugcY54@mit.edu

While retesting this patch series, I've noticed a potential regression
which doesn't trigger before applying the last patch in this series
(Convert data=journal writeback to use ext4_writepages), but which
triggers a WARNING in generic/390 about half the time.  I've gone back
and retested, and this was happening before the rebase.

Jan, could you take a look and (1) let me know what you think about my
patch conflict resolutions and (2) what you think about this warning
which is occasionally triggered by generic/390?  Many thanks!

      	 	      		   - Ted

generic/390 2s ...  [00:08:04][    2.708542] run fstests generic/390 at 2023-02-28 00:08:04
[    2.871030] EXT4-fs: Warning: mounting with data=journal disables delayed allocation, dioread_nolock, O_DIRECT and fast_commit support!
[    3.814748] ------------[ cut here ]------------
[    3.816039] WARNING: CPU: 1 PID: 151 at fs/ext4/ext4_jbd2.c:75 ext4_journal_check_start+0x67/0xa0
[    3.817902] CPU: 1 PID: 151 Comm: kworker/u4:3 Not tainted 6.2.0-xfstests-11603-g17b3ec378915 #953
[    3.818285] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-debian-1.16.0-5 04/01/2014
[    3.818674] Workqueue: writeback wb_workfn (flush-254:32)
[    3.818903] RIP: 0010:ext4_journal_check_start+0x67/0xa0
[    3.819277] Code: 00 04 74 26 48 8b 90 28 02 00 00 31 c0 48 85 d2 74 07 8b 02 83 e0 02 75 15 5b c3 cc cc cc cc b8 fb ff ff ff 5b c3 cc cc cc cc <0f> 0b eb d6 44 8b 42 10 45 31 c9 68 60 57 59 82 48 89 df b9 01 00
[    3.820069] RSP: 0018:ffffc900009f7930 EFLAGS: 00010246
[    3.820301] RAX: ffff888009c65000 RBX: ffff888006ec1800 RCX: ffff88800665f000
[    3.820605] RDX: 0000000000000000 RSI: 0000000000000044 RDI: ffffffff8259574c
[    3.820908] RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000
[    3.821213] R10: 0000000000000228 R11: 0000000000000000 R12: 0000000000000000
[    3.821512] R13: 0000000000000002 R14: 000000000000097f R15: 0000000000000008
[    3.821830] FS:  0000000000000000(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000
[    3.822179] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.822432] CR2: 00007f4ab2dce670 CR3: 0000000006764006 CR4: 0000000000770ee0
[    3.822739] PKRU: 55555554
[    3.822861] Call Trace:
[    3.822972]  <TASK>
[    3.823070]  __ext4_journal_start_sb+0x3f/0x190
[    3.823267]  mpage_prepare_extent_to_map+0x470/0x500
[    3.823483]  ext4_do_writepages+0x250/0x760
[    3.823661]  ext4_writepages+0x99/0x130
[    3.823828]  do_writepages+0xcf/0x1e0
[    3.823988]  ? fprop_fraction_percpu+0x2f/0x80
[    3.824186]  __writeback_single_inode+0x3d/0x280
[    3.824389]  writeback_sb_inodes+0x1ed/0x4b0
[    3.824571]  wb_writeback+0xdb/0x2f0
[    3.824724]  wb_do_writeback+0x87/0x2b0
[    3.824890]  ? set_worker_desc+0xc7/0xd0
[    3.825060]  wb_workfn+0x5f/0x260
[    3.825205]  ? ttwu_do_activate+0x83/0x1e0
[    3.825382]  ? _raw_spin_unlock_irqrestore+0xe/0x30
[    3.825592]  ? try_to_wake_up+0x275/0x480
[    3.825774]  process_one_work+0x1c3/0x3d0
[    3.825949]  worker_thread+0x51/0x3b0
[    3.826108]  ? __pfx_worker_thread+0x10/0x10
[    3.826291]  kthread+0xe7/0x110
[    3.826430]  ? __pfx_kthread+0x10/0x10
[    3.826593]  ret_from_fork+0x29/0x50
[    3.826747]  </TASK>
[    3.826846] ---[ end trace 0000000000000000 ]---
[    3.827056] ------------[ cut here ]------------


				   

Jan Kara (7):
  ext4: Update stale comment about write constraints
  ext4: Use nr_to_write directly in mpage_prepare_extent_to_map()
  ext4: Mark page for delayed dirtying only if it is pinned
  ext4: Don't unlock page in ext4_bio_write_page()
  ext4: Move page unlocking out of mpage_submit_page()
  ext4: Move mpage_page_done() calls after error handling
  ext4: Convert data=journal writeback to use ext4_writepages()

 fs/ext4/inode.c             | 410 +++++++++++-------------------------
 fs/ext4/page-io.c           |  10 +-
 include/trace/events/ext4.h |   7 -
 3 files changed, 126 insertions(+), 301 deletions(-)

Jan Kara Feb. 28, 2023, 1:58 p.m. UTC | #1

Hi Ted!

On Tue 28-02-23 00:13:12, Theodore Ts'o wrote:
> This is Jan's data=journal cleanup patch series, previously submitted
> here[1] rebased on top of Linus's patches to address merge conflicts
> with mm-stable, per this discussion[2].
> 
> [1] https://lore.kernel.org/r/20230111152736.9608-1-jack@suse.cz
> [2] https://lore.kernel.org/r/Y/k4Jvph15ugcY54@mit.edu
> 
> While retesting this patch series, I've noticed a potential regression
> which doesn't trigger before applying the last patch in this series
> (Convert data=journal writeback to use ext4_writepages), but which
> triggers a WARNING in generic/390 about half the time.  I've gone back
> and retested, and this was happening before the rebase.
> 
> Jan, could you take a look and (1) let me know what you think about my
> patch conflict resolutions and (2) what you think about this warning
> which is occasionally triggered by generic/390?  Many thanks!

I went through the patches and they look good to me. I have a side note
that obviously the code isn't quite going to work for folios larger than 1
page but for 1 page folios we should be fine.

Regarding the warning somehow there are dirty pages left after the
procedures freeze_super() goes through to flush all dirty data. That is not
too surprising since in data=journal mode pages get (as a surprise to
freezing code) dirtied when transaction commits. I thought we have this
covered by jbd2_journal_flush() call in ext4_freeze() but maybe there are
some stranded PageDirty bits left? It needs more debugging...

								Honza

Theodore Ts'o March 2, 2023, 1:37 p.m. UTC | #2

On Tue, Feb 28, 2023 at 02:58:54PM +0100, Jan Kara wrote:
> 
> Regarding the warning somehow there are dirty pages left after the
> procedures freeze_super() goes through to flush all dirty data. That is not
> too surprising since in data=journal mode pages get (as a surprise to
> freezing code) dirtied when transaction commits. I thought we have this
> covered by jbd2_journal_flush() call in ext4_freeze() but maybe there are
> some stranded PageDirty bits left? It needs more debugging...

So the question I have is whether this is a bug that was always there,
or perhaps code changes affected the timing, so that it was much more
likely to happen (although in my testing it's still only triggering
about 50% of the time).

This warning can mean that data can be lost, especially if someone was
doing hibernation, right?  We can discuss this at today's ext4 video
conference, but I'm inclined to wait until the next merge cycle for
this patch series until we can figure out what's going on.  Jan, what
do you think?

	     	     	       	      - Ted

Jan Kara March 3, 2023, 9:29 a.m. UTC | #3

On Thu 02-03-23 08:37:35, Theodore Ts'o wrote:
> On Tue, Feb 28, 2023 at 02:58:54PM +0100, Jan Kara wrote:
> > 
> > Regarding the warning somehow there are dirty pages left after the
> > procedures freeze_super() goes through to flush all dirty data. That is not
> > too surprising since in data=journal mode pages get (as a surprise to
> > freezing code) dirtied when transaction commits. I thought we have this
> > covered by jbd2_journal_flush() call in ext4_freeze() but maybe there are
> > some stranded PageDirty bits left? It needs more debugging...
> 
> So the question I have is whether this is a bug that was always there,
> or perhaps code changes affected the timing, so that it was much more
> likely to happen (although in my testing it's still only triggering
> about 50% of the time).
> 
> This warning can mean that data can be lost, especially if someone was
> doing hibernation, right?  We can discuss this at today's ext4 video
> conference, but I'm inclined to wait until the next merge cycle for
> this patch series until we can figure out what's going on.  Jan, what
> do you think?

As we discussed in the meeting, I'll have a look into the failure. For now
I'd keep patches on hold as it is not urgent patchset.

								Honza

Jan Kara March 3, 2023, 10:38 a.m. UTC | #4

On Fri 03-03-23 10:29:39, Jan Kara wrote:
> On Thu 02-03-23 08:37:35, Theodore Ts'o wrote:
> > On Tue, Feb 28, 2023 at 02:58:54PM +0100, Jan Kara wrote:
> > > 
> > > Regarding the warning somehow there are dirty pages left after the
> > > procedures freeze_super() goes through to flush all dirty data. That is not
> > > too surprising since in data=journal mode pages get (as a surprise to
> > > freezing code) dirtied when transaction commits. I thought we have this
> > > covered by jbd2_journal_flush() call in ext4_freeze() but maybe there are
> > > some stranded PageDirty bits left? It needs more debugging...
> > 
> > So the question I have is whether this is a bug that was always there,
> > or perhaps code changes affected the timing, so that it was much more
> > likely to happen (although in my testing it's still only triggering
> > about 50% of the time).
> > 
> > This warning can mean that data can be lost, especially if someone was
> > doing hibernation, right?  We can discuss this at today's ext4 video
> > conference, but I'm inclined to wait until the next merge cycle for
> > this patch series until we can figure out what's going on.  Jan, what
> > do you think?
> 
> As we discussed in the meeting, I'll have a look into the failure. For now
> I'd keep patches on hold as it is not urgent patchset.

FWIW I'm able to reproduce the issue and also as I'm looking into the
freeze to it was always there in principle. But my patches likely made it
much more probable. I'll have a look what I can do...

								Honza

Jan Kara March 8, 2023, 4:37 p.m. UTC | #5

On Fri 03-03-23 11:38:06, Jan Kara wrote:
> On Fri 03-03-23 10:29:39, Jan Kara wrote:
> > On Thu 02-03-23 08:37:35, Theodore Ts'o wrote:
> > > On Tue, Feb 28, 2023 at 02:58:54PM +0100, Jan Kara wrote:
> > > > 
> > > > Regarding the warning somehow there are dirty pages left after the
> > > > procedures freeze_super() goes through to flush all dirty data. That is not
> > > > too surprising since in data=journal mode pages get (as a surprise to
> > > > freezing code) dirtied when transaction commits. I thought we have this
> > > > covered by jbd2_journal_flush() call in ext4_freeze() but maybe there are
> > > > some stranded PageDirty bits left? It needs more debugging...
> > > 
> > > So the question I have is whether this is a bug that was always there,
> > > or perhaps code changes affected the timing, so that it was much more
> > > likely to happen (although in my testing it's still only triggering
> > > about 50% of the time).
> > > 
> > > This warning can mean that data can be lost, especially if someone was
> > > doing hibernation, right?  We can discuss this at today's ext4 video
> > > conference, but I'm inclined to wait until the next merge cycle for
> > > this patch series until we can figure out what's going on.  Jan, what
> > > do you think?
> > 
> > As we discussed in the meeting, I'll have a look into the failure. For now
> > I'd keep patches on hold as it is not urgent patchset.
> 
> FWIW I'm able to reproduce the issue and also as I'm looking into the
> freeze to it was always there in principle. But my patches likely made it
> much more probable. I'll have a look what I can do...

So in the end this was mostly a cosmetic issue (jbd2 checkpointing code
leaving dirty bits on pages whose buffer heads were already written). Patch
submitted.

								Honza

Theodore Ts'o March 15, 2023, 4:16 p.m. UTC | #6

On Tue, 28 Feb 2023 00:13:12 -0500, Theodore Ts'o wrote:
> This is Jan's data=journal cleanup patch series, previously submitted
> here[1] rebased on top of Linus's patches to address merge conflicts
> with mm-stable, per this discussion[2].
> 
> [1] https://lore.kernel.org/r/20230111152736.9608-1-jack@suse.cz
> [2] https://lore.kernel.org/r/Y/k4Jvph15ugcY54@mit.edu
> 
> [...]

Applied, thanks!

[1/7] ext4: Update stale comment about write constraints
      commit: e7a2d7ab32fd5bfe431d69df56eedf74c747dfa1
[2/7] ext4: Use nr_to_write directly in mpage_prepare_extent_to_map()
      commit: 6e6213b505709902199d69f56f64bba3dbd867ff
[3/7] ext4: Mark page for delayed dirtying only if it is pinned
      commit: 90f1929e5e5af0467572c81e6ac78ae20316f439
[4/7] ext4: Don't unlock page in ext4_bio_write_page()
      commit: 2c892bdd69ce89e534c915e84c3e4ba98de9ef6c
[5/7] ext4: Move page unlocking out of mpage_submit_page()
      commit: 22e00e971f55a0b7b375d937db614863e1a2e500
[6/7] ext4: Move mpage_page_done() calls after error handling
      commit: 47c6f573b4ef9bf68ce45bd4e4c35ef56d326fe1
[7/7] ext4: Convert data=journal writeback to use ext4_writepages()
      commit: f7233fb54d18b45b42d8f6ad6a95bd8641114c36

Best regards,

[REBASED,0/7] ext4: Cleanup data=journal writeback path

Message

Comments