From patchwork Mon Jun 5 05:45:54 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Gerald Yang
X-Patchwork-Id: 1790217
From: Gerald Yang
To: kernel-team@lists.ubuntu.com
Subject: [SRU][K][PATCH 1/8] sbitmap: fix possible io hung due to lost wakeup
Date: Mon, 5 Jun 2023 13:45:54 +0800
Message-Id: <20230605054601.1410517-2-gerald.yang@canonical.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20230605054601.1410517-1-gerald.yang@canonical.com>
References: <20230605054601.1410517-1-gerald.yang@canonical.com>
List-Id: Kernel team discussions

From: Yu Kuai

Two problems can lead to a lost wakeup:

1) Invalid wakeup on the wrong waitqueue:

For example, 2 * wake_batch tags are put, while only wake_batch threads
are woken:

__sbq_wake_up
 atomic_cmpxchg -> reset wait_cnt
                        __sbq_wake_up -> decrease wait_cnt
                        ...
                        __sbq_wake_up -> wait_cnt is decreased to 0 again
                         atomic_cmpxchg
                         sbq_index_atomic_inc -> increase wake_index
                         wake_up_nr -> wake up and waitqueue might be empty
 sbq_index_atomic_inc -> increase again, one waitqueue is skipped
 wake_up_nr -> invalid wake up because old waitqueue might be empty

To fix the problem, increase 'wake_index' before resetting 'wait_cnt'.

2) 'wait_cnt' can be decreased while the waitqueue is empty

As pointed out by Jan Kara, the following race is possible:

CPU1                                    CPU2
__sbq_wake_up                           __sbq_wake_up
 sbq_wake_ptr()                          sbq_wake_ptr() -> the same
 wait_cnt = atomic_dec_return()
 /* decreased to 0 */
 sbq_index_atomic_inc()
 /* move to next waitqueue */
 atomic_set()
 /* reset wait_cnt */
 wake_up_nr()
 /* wake up on the old waitqueue */
                                         wait_cnt = atomic_dec_return()
                                         /*
                                          * decrease wait_cnt in the old
                                          * waitqueue, while it can be
                                          * empty.
                                          */

Fix the problem by waking up before updating 'wake_index' and
'wait_cnt'.

Note that with this patch 'wait_cnt' is still decreased in the old,
empty waitqueue; however, the wakeup is redirected to an active
waitqueue, and the extra decrement on the old empty waitqueue is not
handled.

Fixes: 88459642cba4 ("blk-mq: abstract tag allocation out into sbitmap library")
Signed-off-by: Yu Kuai
Reviewed-by: Jan Kara
Link: https://lore.kernel.org/r/20220803121504.212071-1-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe
Signed-off-by: Gerald Yang
---
 lib/sbitmap.c | 55 ++++++++++++++++++++++++++++++---------------------
 1 file changed, 33 insertions(+), 22 deletions(-)

diff --git a/lib/sbitmap.c b/lib/sbitmap.c
index 29eb0484215a..1f31147872e6 100644
--- a/lib/sbitmap.c
+++ b/lib/sbitmap.c
@@ -611,32 +611,43 @@ static bool __sbq_wake_up(struct sbitmap_queue *sbq)
 		return false;
 
 	wait_cnt = atomic_dec_return(&ws->wait_cnt);
-	if (wait_cnt <= 0) {
-		int ret;
+	/*
+	 * For concurrent callers of this, callers should call this function
+	 * again to wakeup a new batch on a different 'ws'.
+	 */
+	if (wait_cnt < 0 || !waitqueue_active(&ws->wait))
+		return true;
 
-		wake_batch = READ_ONCE(sbq->wake_batch);
+	if (wait_cnt > 0)
+		return false;
 
-		/*
-		 * Pairs with the memory barrier in sbitmap_queue_resize() to
-		 * ensure that we see the batch size update before the wait
-		 * count is reset.
-		 */
-		smp_mb__before_atomic();
+	wake_batch = READ_ONCE(sbq->wake_batch);
 
-		/*
-		 * For concurrent callers of this, the one that failed the
-		 * atomic_cmpxhcg() race should call this function again
-		 * to wakeup a new batch on a different 'ws'.
-		 */
-		ret = atomic_cmpxchg(&ws->wait_cnt, wait_cnt, wake_batch);
-		if (ret == wait_cnt) {
-			sbq_index_atomic_inc(&sbq->wake_index);
-			wake_up_nr(&ws->wait, wake_batch);
-			return false;
-		}
+	/*
+	 * Wake up first in case that concurrent callers decrease wait_cnt
+	 * while waitqueue is empty.
+	 */
+	wake_up_nr(&ws->wait, wake_batch);
 
-		return true;
-	}
+	/*
+	 * Pairs with the memory barrier in sbitmap_queue_resize() to
+	 * ensure that we see the batch size update before the wait
+	 * count is reset.
+	 *
+	 * Also pairs with the implicit barrier between decrementing wait_cnt
+	 * and checking for waitqueue_active() to make sure waitqueue_active()
+	 * sees result of the wakeup if atomic_dec_return() has seen the result
+	 * of atomic_set().
+	 */
+	smp_mb__before_atomic();
+
+	/*
+	 * Increase wake_index before updating wait_cnt, otherwise concurrent
+	 * callers can see valid wait_cnt in old waitqueue, which can cause
+	 * invalid wakeup on the old waitqueue.
+	 */
+	sbq_index_atomic_inc(&sbq->wake_index);
+	atomic_set(&ws->wait_cnt, wake_batch);
 
 	return false;
 }
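
For reviewers who want to step through the new ordering in __sbq_wake_up() outside the kernel, here is a rough user-space sketch using C11 atomics. It is not the kernel code: struct model_ws, struct model_sbq, model_wake_up_nr(), model_wake_up() and NR_QUEUES are hypothetical names invented for illustration, the waitqueue is reduced to a plain waiter counter, the scan that sbq_wake_ptr() performs is omitted, and the model is single-threaded, so it only shows the order of the three steps (wake, advance wake_index, reset wait_cnt) and does not reproduce the race itself.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NR_QUEUES 8	/* hypothetical number of wait queues in the model */

/* Hypothetical stand-in for one wait state: a counter plus a waiter count. */
struct model_ws {
	atomic_int wait_cnt;	/* frees left before the next batch wakeup */
	int waiters;		/* sleeping tasks; replaces the real waitqueue */
};

/* Hypothetical stand-in for the queue: wait states, wake index, batch size. */
struct model_sbq {
	struct model_ws ws[NR_QUEUES];
	atomic_int wake_index;
	int wake_batch;
};

/* Wake at most nr waiters and return how many were actually woken. */
static int model_wake_up_nr(struct model_ws *ws, int nr)
{
	int woken = ws->waiters < nr ? ws->waiters : nr;

	ws->waiters -= woken;
	return woken;
}

/*
 * Mirrors the control flow added by the patch. Returns true when the
 * caller should try again on another queue, false otherwise. *woken
 * reports how many waiters this call woke.
 */
static bool model_wake_up(struct model_sbq *sbq, int *woken)
{
	/* Simplification: use wake_index directly instead of scanning. */
	struct model_ws *ws = &sbq->ws[atomic_load(&sbq->wake_index)];
	/* fetch_sub returns the old value; subtract 1 to get the new one. */
	int wait_cnt = atomic_fetch_sub(&ws->wait_cnt, 1) - 1;
	int next;

	*woken = 0;

	/* Counter already consumed, or nobody is sleeping here: retry. */
	if (wait_cnt < 0 || ws->waiters == 0)
		return true;

	/* Batch not complete yet: nothing to do. */
	if (wait_cnt > 0)
		return false;

	/* Step 1: wake up first, while this queue is known to have waiters. */
	*woken = model_wake_up_nr(ws, sbq->wake_batch);

	/* Step 2: advance wake_index before the counter becomes valid again. */
	next = (atomic_load(&sbq->wake_index) + 1) % NR_QUEUES;
	atomic_store(&sbq->wake_index, next);

	/* Step 3: only now reset wait_cnt for the old queue. */
	atomic_store(&ws->wait_cnt, sbq->wake_batch);

	return false;
}
```

With wake_batch set to 2, the first call on a queue only decrements the counter, the second call wakes two waiters and moves wake_index on, and a call that lands on a queue whose counter is already exhausted asks the caller to retry.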