From patchwork Mon Jun  5 05:45:53 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Gerald Yang
X-Patchwork-Id: 1790219
From: Gerald Yang
To: kernel-team@lists.ubuntu.com
Subject: [SRU][K][PATCH 0/8] sbitmap: fix IO hung due to waiters not
Date: Mon, 5 Jun 2023 13:45:53 +0800
Message-Id: <20230605054601.1410517-1-gerald.yang@canonical.com>
X-Mailer: git-send-email 2.25.1

BugLink: https://bugs.launchpad.net/bugs/2022318

SRU Justification:

[ Impact ]

When running fio on an NVMe device on an AWS test instance with the 5.19
kernel, IOs get stuck and fio never finishes.

fio command:

sudo fio --name=read_iops_test --filename=/dev/nvme1n1 --filesize=50G --time_based --ramp_time=2s --runtime=1m --ioengine=libaio --direct=1 --verify=0 --randrepeat=0 --bs=16K --iodepth=256 --rw=randread

read_iops_test: (g=0): rw=randread, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=libaio, iodepth=256
fio-3.28
Starting 1 process
Jobs: 1 (f=0): [/(1)][-.-%][eta 01m:02s]

The IOs get completely stuck; after a while the kernel log shows:

Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.230970] INFO: task fio:2545 blocked for more than 120 seconds.
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.232878]       Not tainted 5.19.0-43-generic #44~22.04.1-Ubuntu
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.234738] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237053] task:fio             state:D stack:    0 pid: 2545 ppid:  2495 flags:0x00000002
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237057] Call Trace:
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237058]  <TASK>
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237061]  __schedule+0x257/0x5d0
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237066]  schedule+0x68/0x110
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237068]  io_schedule+0x46/0x80
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237069]  blk_mq_get_tag+0x117/0x300
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237072]  ? destroy_sched_domains_rcu+0x40/0x40
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237076]  __blk_mq_alloc_requests+0xc4/0x1e0
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237079]  blk_mq_get_new_requests+0xf6/0x1a0
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237080]  blk_mq_submit_bio+0x1eb/0x440
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237082]  __submit_bio+0x109/0x1a0
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237085]  submit_bio_noacct_nocheck+0xc2/0x120
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237087]  submit_bio_noacct+0x209/0x590
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237088]  submit_bio+0x40/0xf0
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237090]  __blkdev_direct_IO_async+0x146/0x1f0
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237092]  blkdev_direct_IO.part.0+0x40/0xa0
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237093]  blkdev_read_iter+0x9f/0x1a0
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237094]  aio_read+0xec/0x1d0
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237097]  ? __io_submit_one.constprop.0+0x113/0x200
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237099]  __io_submit_one.constprop.0+0x113/0x200
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237100]  ? __io_submit_one.constprop.0+0x113/0x200
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237101]  io_submit_one+0xe8/0x3d0
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237103]  __x64_sys_io_submit+0x84/0x190
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237104]  ? do_syscall_64+0x69/0x90
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237106]  ? do_syscall_64+0x69/0x90
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237107]  do_syscall_64+0x59/0x90
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237108]  ? syscall_exit_to_user_mode+0x2a/0x50
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237110]  ? do_syscall_64+0x69/0x90
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237111]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237113] RIP: 0033:0x7f44f351ea3d
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237116] RSP: 002b:00007fff1dcfe558 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237117] RAX: ffffffffffffffda RBX: 00007f44f2272b68 RCX: 00007f44f351ea3d
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237118] RDX: 000056315d9ad828 RSI: 0000000000000001 RDI: 00007f44f224f000
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237119] RBP: 00007f44f224f000 R08: 00007f44e9430000 R09: 00000000000002d8
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237120] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237120] R13: 0000000000000000 R14: 000056315d9ad828 R15: 000056315d9e1830
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237122]  </TASK>

This issue cannot be reproduced on the 5.15 and 6.2 kernels.

From the call trace, the task was stuck for more than 120 seconds waiting
for previous IOs to complete and free some tags so that new IO requests
could obtain tags. But in fact not all previous IOs were stuck: at least
some of them should have completed, yet the waiters were never woken up.

This issue is fixed by the upstream commit below, which was merged in
kernel 6.1:

commit 4acb83417cadfdcbe64215f9d0ddcf3132af808e
Author: Keith Busch
Date:   Fri Sep 9 11:40:22 2022 -0700

    sbitmap: fix batched wait_cnt accounting

    Batched completions can clear multiple bits, but we're only
    decrementing the wait_cnt by one each time. This can cause waiters to
    never be woken, stalling IO. Use the batched count instead.

    Link: https://bugzilla.kernel.org/show_bug.cgi?id=215679
    Signed-off-by: Keith Busch
    Link: https://lore.kernel.org/r/20220909184022.1709476-1-kbusch@fb.com
    Signed-off-by: Jens Axboe
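To make the accounting problem concrete, here is a minimal userspace model
of the bug (an illustration only: WAKE_BATCH, complete_buggy() and
complete_fixed() are hypothetical names; only the wait_cnt/wake-batch idea
mirrors lib/sbitmap.c):

/*
 * Simplified model of the batched wait_cnt accounting bug.
 * Hypothetical userspace sketch -- not the kernel implementation.
 */
#include <stdio.h>

#define WAKE_BATCH 8            /* wake waiters once 8 tags are freed */

static int wait_cnt = WAKE_BATCH;

/* Buggy accounting: a batched completion that frees 'nr' tags still
 * decrements the counter by only one, losing (nr - 1) completions. */
static int complete_buggy(int nr)
{
    (void)nr;
    return --wait_cnt <= 0;     /* nonzero => wake up waiters */
}

/* Fixed accounting: subtract the whole batch, as the upstream commit
 * describes ("Use the batched count instead"). */
static int complete_fixed(int nr)
{
    wait_cnt -= nr;
    return wait_cnt <= 0;
}

int main(void)
{
    /* Two batched completions freeing 4 tags each: 8 tags total,
     * which should reach the wake batch and trigger a wakeup. */
    wait_cnt = WAKE_BATCH;
    printf("buggy: wake=%d\n", complete_buggy(4) | complete_buggy(4));

    wait_cnt = WAKE_BATCH;
    printf("fixed: wake=%d\n", complete_fixed(4) | complete_fixed(4));
    return 0;
}

With the buggy accounting, the two batches count as only two completions,
the counter never reaches zero, and the wakeup never fires, matching the
stall seen in the trace above.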
To SRU the commit to the 5.19 kernel, we also need to SRU all of its
dependencies and one further fix:

30514bd2dd4e sbitmap: fix lockup while swapping
4acb83417cad sbitmap: fix batched wait_cnt accounting
c35227d4e8cb sbitmap: Use atomic_long_try_cmpxchg in __sbitmap_queue_get_batch
48c033314f37 sbitmap: Avoid leaving waitqueue in invalid state in __sbq_wake_up()
bce1b56c7382 Revert "sbitmap: fix batched wait_cnt accounting"
16ede66973c8 sbitmap: fix batched wait_cnt accounting
ddbfc34fcf5d sbitmap: remove unnecessary code in __sbitmap_queue_get_batch
040b83fcecfb sbitmap: fix possible io hung due to lost wakeup

The last commit in this list addresses a classic lost-wakeup ordering
hazard; a sketch of that pattern follows the diffstat at the end of this
cover letter.

[ Test Plan ]

This can be reproduced simply by launching an instance on AWS EC2 and
running the fio command above against an NVMe device for a few hours to
make sure IOs do not get stuck.

I have built a test kernel with the above commits on top of the
5.19.0-43 generic kernel here:
https://launchpad.net/~gerald-yang-tw/+archive/ubuntu/361041-generic

With this test kernel, fio has been running for a few hours without any
issue.

[ Where problems could occur ]

The sbitmap code is mainly used by blk-mq in the block layer, by SCSI
drivers, and by the Fungible Ethernet driver. If an issue were introduced
in sbitmap, the symptom would be hung IO, or packets getting stuck in the
Fungible driver.

Hugh Dickins (1):
  sbitmap: fix lockup while swapping

Jan Kara (1):
  sbitmap: Avoid leaving waitqueue in invalid state in __sbq_wake_up()

Jens Axboe (1):
  Revert "sbitmap: fix batched wait_cnt accounting"

Keith Busch (2):
  sbitmap: fix batched wait_cnt accounting
  sbitmap: fix batched wait_cnt accounting

Liu Song (1):
  sbitmap: remove unnecessary code in __sbitmap_queue_get_batch

Uros Bizjak (1):
  sbitmap: Use atomic_long_try_cmpxchg in __sbitmap_queue_get_batch

Yu Kuai (1):
  sbitmap: fix possible io hung due to lost wakeup

 block/blk-mq-tag.c      |   2 +-
 include/linux/sbitmap.h |   3 +-
 lib/sbitmap.c           | 109 ++++++++++++++++++++++++++--------
 3 files changed, 73 insertions(+), 41 deletions(-)
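For background on 040b83fcecfb ("sbitmap: fix possible io hung due to lost
wakeup"), referenced in the commit list above: the hazard is the classic
check-then-sleep race, where a waker signals between the waiter's condition
check and its enqueue on the wait queue, so the wakeup is lost. The sketch
below is a minimal userspace model of the general pattern and its standard
defense (re-check the condition while registered as a waiter); get_tag()
and put_tag() are hypothetical names, and the actual kernel fix orders
waitqueue state updates in lib/sbitmap.c rather than using pthreads.

/*
 * Hypothetical userspace model of the lost-wakeup pattern.
 *
 * Buggy shape (pseudocode):
 *
 *     if (no_free_tags())      // put_tag() may free a tag and signal
 *         enqueue_and_sleep(); // here, before we are on the queue:
 *                              // the wakeup is lost, we sleep forever
 *
 * Fixed shape: register as a waiter first, then re-check the condition
 * before sleeping.
 */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  wq   = PTHREAD_COND_INITIALIZER;
static int free_tags;

static void put_tag(void)               /* completion path */
{
    pthread_mutex_lock(&lock);
    free_tags++;
    pthread_cond_signal(&wq);           /* wake a queued waiter, if any */
    pthread_mutex_unlock(&lock);
}

static void get_tag(void)               /* submission path */
{
    pthread_mutex_lock(&lock);
    while (free_tags == 0)              /* re-check after every wakeup */
        pthread_cond_wait(&wq, &lock);
    free_tags--;
    pthread_mutex_unlock(&lock);
}

static void *waiter(void *arg)
{
    (void)arg;
    get_tag();
    printf("waiter: got a tag\n");
    return NULL;
}

int main(void)
{
    pthread_t t;

    pthread_create(&t, NULL, waiter, NULL);
    put_tag();      /* cannot be lost: the signal happens under the same
                     * lock the waiter holds while checking free_tags */
    pthread_join(t, NULL);
    return 0;
}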