From patchwork Tue Jun 6 07:22:23 2023
X-Patchwork-Submitter: Gerald Yang
X-Patchwork-Id: 1790842
From: Gerald Yang
To: kernel-team@lists.ubuntu.com
Subject: [PATCH 0/6] *** SUBJECT HERE ***
Date: Tue, 6 Jun 2023 15:22:23 +0800
Message-Id: <20230606072229.3988976-1-gerald.yang@canonical.com>
X-Mailer: git-send-email 2.25.1

BugLink: https://bugs.launchpad.net/bugs/2022318

SRU Justification:

[ Impact ]

When running fio against an NVMe device on an AWS test instance with the 5.19 kernel, IOs get stuck and fio never finishes.

fio command:

sudo fio --name=read_iops_test --filename=/dev/nvme1n1 --filesize=50G --time_based --ramp_time=2s --runtime=1m --ioengine=libaio --direct=1 --verify=0 --randrepeat=0 --bs=16K --iodepth=256 --rw=randread

read_iops_test: (g=0): rw=randread, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=libaio, iodepth=256
fio-3.28
Starting 1 process
Jobs: 1 (f=0): [/(1)][-.-%][eta 01m:02s]

The IOs get completely stuck, and after a while the kernel log shows:

Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.230970] INFO: task fio:2545 blocked for more than 120 seconds.
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.232878] Not tainted 5.19.0-43-generic #44~22.04.1-Ubuntu
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.234738] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237053] task:fio state:D stack: 0 pid: 2545 ppid: 2495 flags:0x00000002
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237057] Call Trace:
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237058] <TASK>
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237061] __schedule+0x257/0x5d0
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237066] schedule+0x68/0x110
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237068] io_schedule+0x46/0x80
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237069] blk_mq_get_tag+0x117/0x300
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237072] ? destroy_sched_domains_rcu+0x40/0x40
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237076] __blk_mq_alloc_requests+0xc4/0x1e0
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237079] blk_mq_get_new_requests+0xf6/0x1a0
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237080] blk_mq_submit_bio+0x1eb/0x440
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237082] __submit_bio+0x109/0x1a0
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237085] submit_bio_noacct_nocheck+0xc2/0x120
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237087] submit_bio_noacct+0x209/0x590
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237088] submit_bio+0x40/0xf0
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237090] __blkdev_direct_IO_async+0x146/0x1f0
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237092] blkdev_direct_IO.part.0+0x40/0xa0
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237093] blkdev_read_iter+0x9f/0x1a0
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237094] aio_read+0xec/0x1d0
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237097] ? __io_submit_one.constprop.0+0x113/0x200
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237099] __io_submit_one.constprop.0+0x113/0x200
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237100] ? __io_submit_one.constprop.0+0x113/0x200
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237101] io_submit_one+0xe8/0x3d0
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237103] __x64_sys_io_submit+0x84/0x190
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237104] ? do_syscall_64+0x69/0x90
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237106] ? do_syscall_64+0x69/0x90
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237107] do_syscall_64+0x59/0x90
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237108] ? syscall_exit_to_user_mode+0x2a/0x50
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237110] ? do_syscall_64+0x69/0x90
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237111] entry_SYSCALL_64_after_hwframe+0x63/0xcd
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237113] RIP: 0033:0x7f44f351ea3d
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237116] RSP: 002b:00007fff1dcfe558 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237117] RAX: ffffffffffffffda RBX: 00007f44f2272b68 RCX: 00007f44f351ea3d
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237118] RDX: 000056315d9ad828 RSI: 0000000000000001 RDI: 00007f44f224f000
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237119] RBP: 00007f44f224f000 R08: 00007f44e9430000 R09: 00000000000002d8
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237120] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237120] R13: 0000000000000000 R14: 000056315d9ad828 R15: 000056315d9e1830
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237122] </TASK>

This issue cannot be reproduced on the 5.15 and 6.2 kernels.

From the call trace, the task was stuck for more than 120 seconds waiting for previous IOs to complete and free some tags so that new IO requests could obtain tags. In fact, not all of the previous IOs were stuck: at least some of them should have completed, but the waiters were never woken up.

This issue is fixed by the upstream commit below, which was merged in kernel 6.1:

commit 4acb83417cadfdcbe64215f9d0ddcf3132af808e
Author: Keith Busch
Date:   Fri Sep 9 11:40:22 2022 -0700

    sbitmap: fix batched wait_cnt accounting

    Batched completions can clear multiple bits, but we're only decrementing
    the wait_cnt by one each time. This can cause waiters to never be woken,
    stalling IO. Use the batched count instead.

    Link: https://bugzilla.kernel.org/show_bug.cgi?id=215679
    Signed-off-by: Keith Busch
    Link: https://lore.kernel.org/r/20220909184022.1709476-1-kbusch@fb.com
    Signed-off-by: Jens Axboe
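To make the accounting problem concrete, here is a minimal, self-contained C sketch of the idea behind the fix. It is a toy model only: the structure, the wake-up threshold of 8 and the function names are invented for illustration, and the real change is in lib/sbitmap.c (see the diffstat at the end of this cover letter).

#include <stdio.h>

/* Toy stand-in for an sbitmap wait queue: sleepers are woken once
 * wait_cnt completions have been observed. */
struct toy_wait_queue {
        int wait_cnt;   /* completions left before sleepers are woken */
        int waiters;    /* tasks sleeping while waiting for a free tag */
};

/* Buggy accounting: a batch that frees nr_tags tags is counted as one. */
static void complete_batch_buggy(struct toy_wait_queue *wq, int nr_tags)
{
        (void)nr_tags;                  /* batch size ignored - the bug */
        if (--wq->wait_cnt == 0) {
                wq->waiters = 0;        /* wake everybody up */
                wq->wait_cnt = 8;       /* re-arm the threshold */
        }
}

/* Fixed accounting: charge the whole batch against wait_cnt. */
static void complete_batch_fixed(struct toy_wait_queue *wq, int nr_tags)
{
        wq->wait_cnt -= nr_tags;
        if (wq->wait_cnt <= 0) {
                wq->waiters = 0;
                wq->wait_cnt = 8;
        }
}

int main(void)
{
        struct toy_wait_queue buggy = { .wait_cnt = 8, .waiters = 4 };
        struct toy_wait_queue fixed = { .wait_cnt = 8, .waiters = 4 };

        /* Two batched completions free 8 tags in total, enough to wake... */
        complete_batch_buggy(&buggy, 4);
        complete_batch_buggy(&buggy, 4);
        complete_batch_fixed(&fixed, 4);
        complete_batch_fixed(&fixed, 4);

        /* ...but the buggy path still thinks 6 completions are outstanding,
         * so the 4 waiters keep sleeping. */
        printf("buggy: wait_cnt=%d waiters=%d\n", buggy.wait_cnt, buggy.waiters);
        printf("fixed: wait_cnt=%d waiters=%d\n", fixed.wait_cnt, fixed.waiters);
        return 0;
}

With enough IOs completing in batches, the buggy counter can stay above zero indefinitely, which matches the fio task hung in blk_mq_get_tag() in the trace above.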
This commit cannot be cherry-picked cleanly, so we also need to SRU all of its dependencies and one further fix, listed below:

30514bd2dd4e sbitmap: fix lockup while swapping
4acb83417cad sbitmap: fix batched wait_cnt accounting
c35227d4e8cb sbitmap: Use atomic_long_try_cmpxchg in __sbitmap_queue_get_batch
48c033314f37 sbitmap: Avoid leaving waitqueue in invalid state in __sbq_wake_up()
ddbfc34fcf5d sbitmap: remove unnecessary code in __sbitmap_queue_get_batch
040b83fcecfb sbitmap: fix possible io hung due to lost wakeup

[ Test Plan ]

This can be reproduced by launching an instance on AWS EC2 and running the fio command above against an NVMe device for a few hours to make sure IOs do not get stuck.

I have built a test kernel with the above commits on top of the 5.19.0-43 generic kernel here:
https://launchpad.net/~gerald-yang-tw/+archive/ubuntu/361041-generic

With this test kernel, fio has been running for a few hours without any issue.

[ Where problems could occur ]

sbitmap is mainly used by blk-mq in the block layer, by SCSI drivers and by the Fungible ethernet driver. If anything goes wrong in sbitmap, the expected symptom would be hung IO, or packets getting stuck in the Fungible driver.

Hugh Dickins (1):
  sbitmap: fix lockup while swapping

Jan Kara (1):
  sbitmap: Avoid leaving waitqueue in invalid state in __sbq_wake_up()

Keith Busch (1):
  sbitmap: fix batched wait_cnt accounting

Liu Song (1):
  sbitmap: remove unnecessary code in __sbitmap_queue_get_batch

Uros Bizjak (1):
  sbitmap: Use atomic_long_try_cmpxchg in __sbitmap_queue_get_batch

Yu Kuai (1):
  sbitmap: fix possible io hung due to lost wakeup

 block/blk-mq-tag.c      |   2 +-
 include/linux/sbitmap.h |   3 +-
 lib/sbitmap.c           | 109 ++++++++++++++++++++++++++--------
 3 files changed, 73 insertions(+), 41 deletions(-)