Message ID: 20230605054601.1410517-1-gerald.yang@canonical.com
Series: sbitmap: fix IO hung due to waiters not
On 23/06/05 01:45PM, Gerald Yang wrote:
> BugLink: https://bugs.launchpad.net/bugs/2022318
>
> SRU Justification:
>
> [ Impact ]
>
> When running fio on an NVMe device on an AWS test instance with the 5.19
> kernel, IOs get stuck and fio never ends.
>
> fio command:
> sudo fio --name=read_iops_test --filename=/dev/nvme1n1 --filesize=50G --time_based --ramp_time=2s --runtime=1m --ioengine=libaio --direct=1 --verify=0 --randrepeat=0 --bs=16K --iodepth=256 --rw=randread
> read_iops_test: (g=0): rw=randread, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=libaio, iodepth=256
> fio-3.28
> Starting 1 process
> Jobs: 1 (f=0): [/(1)][-.-%][eta 01m:02s]
>
> IOs get completely stuck; after a while the kernel log shows:
> Jun 1 03:57:52 ip-172-31-39-141 kernel: [  370.230970] INFO: task fio:2545 blocked for more than 120 seconds.
> Jun 1 03:57:52 ip-172-31-39-141 kernel: [  370.232878] Not tainted 5.19.0-43-generic #44~22.04.1-Ubuntu
> Jun 1 03:57:52 ip-172-31-39-141 kernel: [  370.234738] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Jun 1 03:57:52 ip-172-31-39-141 kernel: [  370.237053] task:fio state:D stack: 0 pid: 2545 ppid: 2495 flags:0x00000002
> Jun 1 03:57:52 ip-172-31-39-141 kernel: [  370.237057] Call Trace:
> Jun 1 03:57:52 ip-172-31-39-141 kernel: [  370.237058] <TASK>
> Jun 1 03:57:52 ip-172-31-39-141 kernel: [  370.237061] __schedule+0x257/0x5d0
> Jun 1 03:57:52 ip-172-31-39-141 kernel: [  370.237066] schedule+0x68/0x110
> Jun 1 03:57:52 ip-172-31-39-141 kernel: [  370.237068] io_schedule+0x46/0x80
> Jun 1 03:57:52 ip-172-31-39-141 kernel: [  370.237069] blk_mq_get_tag+0x117/0x300
> Jun 1 03:57:52 ip-172-31-39-141 kernel: [  370.237072] ? destroy_sched_domains_rcu+0x40/0x40
> Jun 1 03:57:52 ip-172-31-39-141 kernel: [  370.237076] __blk_mq_alloc_requests+0xc4/0x1e0
> Jun 1 03:57:52 ip-172-31-39-141 kernel: [  370.237079] blk_mq_get_new_requests+0xf6/0x1a0
> Jun 1 03:57:52 ip-172-31-39-141 kernel: [  370.237080] blk_mq_submit_bio+0x1eb/0x440
> Jun 1 03:57:52 ip-172-31-39-141 kernel: [  370.237082] __submit_bio+0x109/0x1a0
> Jun 1 03:57:52 ip-172-31-39-141 kernel: [  370.237085] submit_bio_noacct_nocheck+0xc2/0x120
> Jun 1 03:57:52 ip-172-31-39-141 kernel: [  370.237087] submit_bio_noacct+0x209/0x590
> Jun 1 03:57:52 ip-172-31-39-141 kernel: [  370.237088] submit_bio+0x40/0xf0
> Jun 1 03:57:52 ip-172-31-39-141 kernel: [  370.237090] __blkdev_direct_IO_async+0x146/0x1f0
> Jun 1 03:57:52 ip-172-31-39-141 kernel: [  370.237092] blkdev_direct_IO.part.0+0x40/0xa0
> Jun 1 03:57:52 ip-172-31-39-141 kernel: [  370.237093] blkdev_read_iter+0x9f/0x1a0
> Jun 1 03:57:52 ip-172-31-39-141 kernel: [  370.237094] aio_read+0xec/0x1d0
> Jun 1 03:57:52 ip-172-31-39-141 kernel: [  370.237097] ? __io_submit_one.constprop.0+0x113/0x200
> Jun 1 03:57:52 ip-172-31-39-141 kernel: [  370.237099] __io_submit_one.constprop.0+0x113/0x200
> Jun 1 03:57:52 ip-172-31-39-141 kernel: [  370.237100] ? __io_submit_one.constprop.0+0x113/0x200
> Jun 1 03:57:52 ip-172-31-39-141 kernel: [  370.237101] io_submit_one+0xe8/0x3d0
> Jun 1 03:57:52 ip-172-31-39-141 kernel: [  370.237103] __x64_sys_io_submit+0x84/0x190
> Jun 1 03:57:52 ip-172-31-39-141 kernel: [  370.237104] ? do_syscall_64+0x69/0x90
> Jun 1 03:57:52 ip-172-31-39-141 kernel: [  370.237106] ? do_syscall_64+0x69/0x90
> Jun 1 03:57:52 ip-172-31-39-141 kernel: [  370.237107] do_syscall_64+0x59/0x90
> Jun 1 03:57:52 ip-172-31-39-141 kernel: [  370.237108] ? syscall_exit_to_user_mode+0x2a/0x50
> Jun 1 03:57:52 ip-172-31-39-141 kernel: [  370.237110] ? do_syscall_64+0x69/0x90
> Jun 1 03:57:52 ip-172-31-39-141 kernel: [  370.237111] entry_SYSCALL_64_after_hwframe+0x63/0xcd
> Jun 1 03:57:52 ip-172-31-39-141 kernel: [  370.237113] RIP: 0033:0x7f44f351ea3d
> Jun 1 03:57:52 ip-172-31-39-141 kernel: [  370.237116] RSP: 002b:00007fff1dcfe558 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1
> Jun 1 03:57:52 ip-172-31-39-141 kernel: [  370.237117] RAX: ffffffffffffffda RBX: 00007f44f2272b68 RCX: 00007f44f351ea3d
> Jun 1 03:57:52 ip-172-31-39-141 kernel: [  370.237118] RDX: 000056315d9ad828 RSI: 0000000000000001 RDI: 00007f44f224f000
> Jun 1 03:57:52 ip-172-31-39-141 kernel: [  370.237119] RBP: 00007f44f224f000 R08: 00007f44e9430000 R09: 00000000000002d8
> Jun 1 03:57:52 ip-172-31-39-141 kernel: [  370.237120] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
> Jun 1 03:57:52 ip-172-31-39-141 kernel: [  370.237120] R13: 0000000000000000 R14: 000056315d9ad828 R15: 000056315d9e1830
> Jun 1 03:57:52 ip-172-31-39-141 kernel: [  370.237122] </TASK>
>
> This issue cannot be reproduced on the 5.15 and 6.2 kernels.
>
> From the call trace, the task was stuck for more than 120 seconds waiting
> for previous IOs to complete and free some tags, so that new IO requests
> could obtain tags. But in fact not all previous IOs were stuck: at least
> some of them should have completed, yet the waiters were not woken up.
>
> This issue is fixed by the upstream commit below, which was merged in
> kernel 6.1:
> commit 4acb83417cadfdcbe64215f9d0ddcf3132af808e
> Author: Keith Busch <kbusch@kernel.org>
> Date:   Fri Sep 9 11:40:22 2022 -0700
>
>     sbitmap: fix batched wait_cnt accounting
>
>     Batched completions can clear multiple bits, but we're only decrementing
>     the wait_cnt by one each time. This can cause waiters to never be woken,
>     stalling IO. Use the batched count instead.
>
>     Link: https://bugzilla.kernel.org/show_bug.cgi?id=215679
>     Signed-off-by: Keith Busch <kbusch@kernel.org>
>     Link: https://lore.kernel.org/r/20220909184022.1709476-1-kbusch@fb.com
>     Signed-off-by: Jens Axboe <axboe@kernel.dk>
>
> To SRU the commit to the 5.19 kernel, we also need to SRU all its
> dependencies and one further fix:

All of the commits, as you mention above, have landed upstream (6.1). As
such, I would have expected the relevant footers, including your
Signed-off-by:

"(cherry picked from" ... or "(backported from ..."
Signed-off-by: ...

> 30514bd2dd4e sbitmap: fix lockup while swapping

This was a clean cherry-pick from
30514bd2dd4e86a3ecfd6a93a3eadf7b9ea164a0.

> 4acb83417cad sbitmap: fix batched wait_cnt accounting

Clean cherry-pick:
4acb83417cadfdcbe64215f9d0ddcf3132af808e.

> c35227d4e8cb sbitmap: Use atomic_long_try_cmpxchg in __sbitmap_queue_get_batch

Clean cherry-pick:
c35227d4e8cbc70a6622cc7cc5f8c3bff513f1fa.

> 48c033314f37 sbitmap: Avoid leaving waitqueue in invalid state in __sbq_wake_up()

Clean cherry-pick:
48c033314f372478548203c583529f53080fd078.

> bce1b56c7382 Revert "sbitmap: fix batched wait_cnt accounting"

I don't understand the meaning of this revert. You are adding the
commit and reverting it later. That revert is also upstream
(bce1b56c73826fec8caf6187f0c922ede397a5a8), but is this "dance" needed
for this submission?

> 16ede66973c8 sbitmap: fix batched wait_cnt accounting

4acb83417cadfdcbe64215f9d0ddcf3132af808e doesn't apply cleanly. Your
submission does, so this should have been mentioned as a backport with
some details on its implications.

> ddbfc34fcf5d sbitmap: remove unnecessary code in __sbitmap_queue_get_batch

Clean cherry-pick:
ddbfc34fcf5d0bc33b006b90c580c56edeb31068.

> 040b83fcecfb sbitmap: fix possible io hung due to lost wakeup

Clean cherry-pick:
040b83fcecfb86f3225d3a5de7fd9b3fbccf83b4.

>
> [ Test Plan ]
>
> This can simply be reproduced by launching an instance on AWS EC2 and
> running the fio command on an NVMe device for a few hours to make sure
> IOs don't get stuck.
>
> I've built a test kernel with the above commits on the 5.19.0-43 generic
> kernel here:
> https://launchpad.net/~gerald-yang-tw/+archive/ubuntu/361041-generic
>
> With this test kernel, fio has been running for a few hours without any
> issue.
>
> [ Where problems could occur ]
>
> sbitmap is mainly used by blk-mq in the block layer, by SCSI drivers,
> and by the Fungible ethernet driver. If any issue occurs in sbitmap, the
> symptom would be hung IO, or packets getting stuck in the Fungible
> driver.
>
> Hugh Dickins (1):
>   sbitmap: fix lockup while swapping
>
> Jan Kara (1):
>   sbitmap: Avoid leaving waitqueue in invalid state in __sbq_wake_up()
>
> Jens Axboe (1):
>   Revert "sbitmap: fix batched wait_cnt accounting"
>
> Keith Busch (2):
>   sbitmap: fix batched wait_cnt accounting
>   sbitmap: fix batched wait_cnt accounting
>
> Liu Song (1):
>   sbitmap: remove unnecessary code in __sbitmap_queue_get_batch
>
> Uros Bizjak (1):
>   sbitmap: Use atomic_long_try_cmpxchg in __sbitmap_queue_get_batch
>
> Yu Kuai (1):
>   sbitmap: fix possible io hung due to lost wakeup
>
>  block/blk-mq-tag.c      |   2 +-
>  include/linux/sbitmap.h |   3 +-
>  lib/sbitmap.c           | 109 ++++++++++++++++++++++++++--------------
>  3 files changed, 73 insertions(+), 41 deletions(-)

I'll avoid NACK-ing for now as we could fix all these when we apply,
but I would resubmit.
On 23/06/05 10:41AM, Andrei Gherzan wrote:
> On 23/06/05 01:45PM, Gerald Yang wrote:
> > BugLink: https://bugs.launchpad.net/bugs/2022318

You also miss the BugLink references in the patches. I'll NACK as I
think a refresh would be way easier to work with for the maintainers.

[remainder of quoted text trimmed; identical to the message above]
Thanks Andrei for the review and the details, I will adjust the patches
and resubmit the SRU.

Regards,
Gerald

On Mon, Jun 5, 2023 at 5:41 PM Andrei Gherzan
<andrei.gherzan@canonical.com> wrote:
> On 23/06/05 01:45PM, Gerald Yang wrote:
> > BugLink: https://bugs.launchpad.net/bugs/2022318

[remainder of quoted text trimmed; identical to the review above]

> --
> Andrei Gherzan