Message ID | 20230228051319.4085470-1-tytso@mit.edu |
---|---|
Headers | show |
Series | ext4: Cleanup data=journal writeback path | expand |
Hi Ted! On Tue 28-02-23 00:13:12, Theodore Ts'o wrote: > This is Jan's data=journal cleanup patch series, previously submitted > here[1] rebased on top of Linus's patches to address merge conflicts > with mm-stable, per this discussion[2]. > > [1] https://lore.kernel.org/r/20230111152736.9608-1-jack@suse.cz > [2] https://lore.kernel.org/r/Y/k4Jvph15ugcY54@mit.edu > > While retesting this patch series, I've noticed a potential regression > which doesn't trigger before applying the last patch in this series > (Convert data=journal writeback to use ext4_writepages), but which > triggers a WARNING in generic/390 about half the time. I've gone back > and retested, and this was happening before the rebase. > > Jan, could you take a look and (1) let me know what you think about my > patch conflict resolutions and (2) what you think about this warning > which is occasionally triggered by generic/390? Many thanks! I went through the patches and they look good to me. I have a side note that obviously the code isn't quite going to work for folios larger than 1 page but for 1 page folios we should be fine. Regarding the warning somehow there are dirty pages left after the procedures freeze_super() goes through to flush all dirty data. That is not too surprising since in data=journal mode pages get (as a surprise to freezing code) dirtied when transaction commits. I thought we have this covered by jbd2_journal_flush() call in ext4_freeze() but maybe there are some stranded PageDirty bits left? It needs more debugging... Honza
On Tue, Feb 28, 2023 at 02:58:54PM +0100, Jan Kara wrote: > > Regarding the warning somehow there are dirty pages left after the > procedures freeze_super() goes through to flush all dirty data. That is not > too surprising since in data=journal mode pages get (as a surprise to > freezing code) dirtied when transaction commits. I thought we have this > covered by jbd2_journal_flush() call in ext4_freeze() but maybe there are > some stranded PageDirty bits left? It needs more debugging... So the question I have is whether this is a bug that was always there, or perhaps code changes affected the timing, so that it was much more likely to happen (although in my testing it's still only triggering about 50% of the time). This warning can mean that data can be lost, especially if someone was doing hibernation, right? We can discuss this at today's ext4 video conference, but I'm inclined to wait until the next merge cycle for this patch series until we can figure out what's going on. Jan, what do you think? - Ted
On Thu 02-03-23 08:37:35, Theodore Ts'o wrote: > On Tue, Feb 28, 2023 at 02:58:54PM +0100, Jan Kara wrote: > > > > Regarding the warning somehow there are dirty pages left after the > > procedures freeze_super() goes through to flush all dirty data. That is not > > too surprising since in data=journal mode pages get (as a surprise to > > freezing code) dirtied when transaction commits. I thought we have this > > covered by jbd2_journal_flush() call in ext4_freeze() but maybe there are > > some stranded PageDirty bits left? It needs more debugging... > > So the question I have is whether this is a bug that was always there, > or perhaps code changes affected the timing, so that it was much more > likely to happen (although in my testing it's still only triggering > about 50% of the time). > > This warning can mean that data can be lost, especially if someone was > doing hibernation, right? We can discuss this at today's ext4 video > conference, but I'm inclined to wait until the next merge cycle for > this patch series until we can figure out what's going on. Jan, what > do you think? As we discussed in the meeting, I'll have a look into the failure. For now I'd keep patches on hold as it is not urgent patchset. Honza
On Fri 03-03-23 10:29:39, Jan Kara wrote: > On Thu 02-03-23 08:37:35, Theodore Ts'o wrote: > > On Tue, Feb 28, 2023 at 02:58:54PM +0100, Jan Kara wrote: > > > > > > Regarding the warning somehow there are dirty pages left after the > > > procedures freeze_super() goes through to flush all dirty data. That is not > > > too surprising since in data=journal mode pages get (as a surprise to > > > freezing code) dirtied when transaction commits. I thought we have this > > > covered by jbd2_journal_flush() call in ext4_freeze() but maybe there are > > > some stranded PageDirty bits left? It needs more debugging... > > > > So the question I have is whether this is a bug that was always there, > > or perhaps code changes affected the timing, so that it was much more > > likely to happen (although in my testing it's still only triggering > > about 50% of the time). > > > > This warning can mean that data can be lost, especially if someone was > > doing hibernation, right? We can discuss this at today's ext4 video > > conference, but I'm inclined to wait until the next merge cycle for > > this patch series until we can figure out what's going on. Jan, what > > do you think? > > As we discussed in the meeting, I'll have a look into the failure. For now > I'd keep patches on hold as it is not urgent patchset. FWIW I'm able to reproduce the issue and also as I'm looking into the freeze to it was always there in principle. But my patches likely made it much more probable. I'll have a look what I can do... Honza
On Fri 03-03-23 11:38:06, Jan Kara wrote: > On Fri 03-03-23 10:29:39, Jan Kara wrote: > > On Thu 02-03-23 08:37:35, Theodore Ts'o wrote: > > > On Tue, Feb 28, 2023 at 02:58:54PM +0100, Jan Kara wrote: > > > > > > > > Regarding the warning somehow there are dirty pages left after the > > > > procedures freeze_super() goes through to flush all dirty data. That is not > > > > too surprising since in data=journal mode pages get (as a surprise to > > > > freezing code) dirtied when transaction commits. I thought we have this > > > > covered by jbd2_journal_flush() call in ext4_freeze() but maybe there are > > > > some stranded PageDirty bits left? It needs more debugging... > > > > > > So the question I have is whether this is a bug that was always there, > > > or perhaps code changes affected the timing, so that it was much more > > > likely to happen (although in my testing it's still only triggering > > > about 50% of the time). > > > > > > This warning can mean that data can be lost, especially if someone was > > > doing hibernation, right? We can discuss this at today's ext4 video > > > conference, but I'm inclined to wait until the next merge cycle for > > > this patch series until we can figure out what's going on. Jan, what > > > do you think? > > > > As we discussed in the meeting, I'll have a look into the failure. For now > > I'd keep patches on hold as it is not urgent patchset. > > FWIW I'm able to reproduce the issue and also as I'm looking into the > freeze to it was always there in principle. But my patches likely made it > much more probable. I'll have a look what I can do... So in the end this was mostly a cosmetic issue (jbd2 checkpointing code leaving dirty bits on pages whose buffer heads were already written). Patch submitted. Honza
On Tue, 28 Feb 2023 00:13:12 -0500, Theodore Ts'o wrote: > This is Jan's data=journal cleanup patch series, previously submitted > here[1] rebased on top of Linus's patches to address merge conflicts > with mm-stable, per this discussion[2]. > > [1] https://lore.kernel.org/r/20230111152736.9608-1-jack@suse.cz > [2] https://lore.kernel.org/r/Y/k4Jvph15ugcY54@mit.edu > > [...] Applied, thanks! [1/7] ext4: Update stale comment about write constraints commit: e7a2d7ab32fd5bfe431d69df56eedf74c747dfa1 [2/7] ext4: Use nr_to_write directly in mpage_prepare_extent_to_map() commit: 6e6213b505709902199d69f56f64bba3dbd867ff [3/7] ext4: Mark page for delayed dirtying only if it is pinned commit: 90f1929e5e5af0467572c81e6ac78ae20316f439 [4/7] ext4: Don't unlock page in ext4_bio_write_page() commit: 2c892bdd69ce89e534c915e84c3e4ba98de9ef6c [5/7] ext4: Move page unlocking out of mpage_submit_page() commit: 22e00e971f55a0b7b375d937db614863e1a2e500 [6/7] ext4: Move mpage_page_done() calls after error handling commit: 47c6f573b4ef9bf68ce45bd4e4c35ef56d326fe1 [7/7] ext4: Convert data=journal writeback to use ext4_writepages() commit: f7233fb54d18b45b42d8f6ad6a95bd8641114c36 Best regards,