From patchwork Tue Aug 20 18:01:44 2024
X-Patchwork-Submitter: Philip Cox
X-Patchwork-Id: 1974571
From: Philip Cox
To: kernel-team@lists.ubuntu.com
Subject: [SRU][n][PATCH 1/2] md: change the return value type of md_write_start to void
Date: Tue, 20 Aug 2024 14:01:44 -0400
Message-Id: <20240820180145.1175010-2-philip.cox@canonical.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240820180145.1175010-1-philip.cox@canonical.com>
References: <20240820180145.1175010-1-philip.cox@canonical.com>

From: Li Nan

BugLink: https://bugs.launchpad.net/bugs/2073695

Commit cc27b0c78c79 ("md: fix deadlock between mddev_suspend() and
md_write_start()") made md_write_start() abort and return false when the
mddev is suspended, which fixed a deadlock that could occur when
mddev_suspend() was called while holding the reconfig_mutex. Since
mddev_suspend() now includes lockdep_assert_not_held(), it can no longer
be called with the reconfig_mutex held, so the abort is no longer needed.
Remove the unnecessary abort and change the function's return type to
void.

Signed-off-by: Li Nan
Reviewed-by: Yu Kuai
Signed-off-by: Song Liu
Link: https://lore.kernel.org/r/20240525185257.3896201-2-linan666@huaweicloud.com
(cherry picked from commit 03e792eaf18ec2e93e2c623f9f1a4bdb97fe4126)
Signed-off-by: Philip Cox
---
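
For reference, here is a condensed sketch of md_write_start() as it reads
with this patch applied (abbreviated from the md.c hunks below; the elided
middle section is untouched by the change, and the added comments are
illustrative only, not part of the commit):

  void md_write_start(struct mddev *mddev, struct bio *bi)
  {
  	/* Reads do not need the write accounting below. */
  	if (bio_data_dir(bi) != WRITE)
  		return;

  	/* ... unchanged array-state handling (did_change, sysfs notify) ... */

  	if (!mddev->has_superblocks)
  		return;

  	/*
  	 * The wait can no longer be cut short by a suspend, so there is
  	 * no failure left for the caller to handle.
  	 */
  	wait_event(mddev->sb_wait,
  		   !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags));
  }

Callers such as raid1_make_request() can therefore drop their
"return false" error path, as the raid1/raid10/raid5 hunks below show.
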
 drivers/md/md.c     | 14 ++++----------
 drivers/md/md.h     |  2 +-
 drivers/md/raid1.c  |  3 +--
 drivers/md/raid10.c |  3 +--
 drivers/md/raid5.c  |  3 +--
 5 files changed, 8 insertions(+), 17 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index f54012d68441..f2fec066ae4b 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8622,12 +8622,12 @@ EXPORT_SYMBOL(md_done_sync);
  * A return value of 'false' means that the write wasn't recorded
  * and cannot proceed as the array is being suspend.
  */
-bool md_write_start(struct mddev *mddev, struct bio *bi)
+void md_write_start(struct mddev *mddev, struct bio *bi)
 {
 	int did_change = 0;
 
 	if (bio_data_dir(bi) != WRITE)
-		return true;
+		return;
 
 	BUG_ON(mddev->ro == MD_RDONLY);
 	if (mddev->ro == MD_AUTO_READ) {
@@ -8660,15 +8660,9 @@ bool md_write_start(struct mddev *mddev, struct bio *bi)
 	if (did_change)
 		sysfs_notify_dirent_safe(mddev->sysfs_state);
 	if (!mddev->has_superblocks)
-		return true;
+		return;
 	wait_event(mddev->sb_wait,
-		   !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags) ||
-		   is_md_suspended(mddev));
-	if (test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags)) {
-		percpu_ref_put(&mddev->writes_pending);
-		return false;
-	}
-	return true;
+		   !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags));
 }
 EXPORT_SYMBOL(md_write_start);
 
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 375ad4a2df71..0f8efdd1924b 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -774,7 +774,7 @@ extern void md_unregister_thread(struct mddev *mddev, struct md_thread __rcu **t
 extern void md_wakeup_thread(struct md_thread __rcu *thread);
 extern void md_check_recovery(struct mddev *mddev);
 extern void md_reap_sync_thread(struct mddev *mddev);
-extern bool md_write_start(struct mddev *mddev, struct bio *bi);
+extern void md_write_start(struct mddev *mddev, struct bio *bi);
 extern void md_write_inc(struct mddev *mddev, struct bio *bi);
 extern void md_write_end(struct mddev *mddev);
 extern void md_done_sync(struct mddev *mddev, int blocks, int ok);
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 4f3c35f1320d..a88453c175b1 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1605,8 +1605,7 @@ static bool raid1_make_request(struct mddev *mddev, struct bio *bio)
 	if (bio_data_dir(bio) == READ)
 		raid1_read_request(mddev, bio, sectors, NULL);
 	else {
-		if (!md_write_start(mddev,bio))
-			return false;
+		md_write_start(mddev,bio);
 		raid1_write_request(mddev, bio, sectors);
 	}
 	return true;
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index a5f8419e2df1..a874c69143d7 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1861,8 +1861,7 @@ static bool raid10_make_request(struct mddev *mddev, struct bio *bio)
 	    && md_flush_request(mddev, bio))
 		return true;
 
-	if (!md_write_start(mddev, bio))
-		return false;
+	md_write_start(mddev, bio);
 
 	if (unlikely(bio_op(bio) == REQ_OP_DISCARD))
 		if (!raid10_handle_discard(mddev, bio))
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 40f951d0a752..0fa8cf58dd4d 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -6095,8 +6095,7 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 		ctx.do_flush = bi->bi_opf & REQ_PREFLUSH;
 	}
 
-	if (!md_write_start(mddev, bi))
-		return false;
+	md_write_start(mddev, bi);
 	/*
 	 * If array is degraded, better not do chunk aligned read because
 	 * later we might have to read it again in order to reconstruct

From patchwork Tue Aug 20 18:01:45 2024
X-Patchwork-Submitter: Philip Cox
X-Patchwork-Id: 1974573
From: Philip Cox
To: kernel-team@lists.ubuntu.com
Subject: [SRU][n][PATCH 2/2] md: fix deadlock between mddev_suspend and flush bio
Date: Tue, 20 Aug 2024 14:01:45 -0400
Message-Id: <20240820180145.1175010-3-philip.cox@canonical.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240820180145.1175010-1-philip.cox@canonical.com>
References: <20240820180145.1175010-1-philip.cox@canonical.com>

From: Li Nan

BugLink: https://bugs.launchpad.net/bugs/2073695

A deadlock can occur when the mddev is being suspended while some flush bio
is in progress. It is a complex issue:

T1. The first flush is at its final stage: it clears 'mddev->flush_bio'
    and tries to submit its data, but is blocked because the mddev has
    been suspended by T4.
T2. The second flush sets 'mddev->flush_bio' and attempts to queue
    md_submit_flush_data(), which is already running (T1) and will not be
    executed again if it is queued on the same CPU as T1.
T3. The third flush increments active_io and tries to flush, but is
    blocked because 'mddev->flush_bio' is not NULL (set by T2).
T4. mddev_suspend() is called and waits for active_io to drop to 0, but it
    was incremented by T3.

   T1                T2                T3                T4
   (flush 1)         (flush 2)         (flush 3)         (suspend)
   md_submit_flush_data
    mddev->flush_bio = NULL;
    .                 .
    .                 md_flush_request
    .                  mddev->flush_bio = bio
    .                  queue submit_flushes
    .                 .
    .                 .                 md_handle_request
    .                 .                  active_io + 1
    .                 .                  md_flush_request
    .                 .                   wait !mddev->flush_bio
    .                 .                 .
    .                 .                 .                 mddev_suspend
    .                 .                 .                  wait !active_io
    .                 submit_flushes
    .                 queue_work md_submit_flush_data
    .                 //md_submit_flush_data is already running (T1)
    .
    md_handle_request
     wait resume

The root issue is the non-atomic inc/dec of active_io during the flush
process: active_io is decremented before md_submit_flush_data() is queued,
and incremented again soon after md_submit_flush_data() runs.

   md_flush_request
     active_io + 1
     submit_flushes
       active_io - 1
   md_submit_flush_data
     md_handle_request
       active_io + 1
       make_request
     active_io - 1

If active_io is instead decremented after md_handle_request(), rather than
within submit_flushes(), then make_request() can be called directly instead
of md_handle_request() in md_submit_flush_data(), and active_io is
incremented and decremented exactly once over the whole flush process. This
fixes the deadlock.

Additionally, the only difference introduced by this fix is that the return
value of make_request() is no longer handled as an error.
After the previous patch cleaned up md_write_start(), make_request() can
only return an error from raid5_make_request() when used by dm-raid; see
commit 41425f96d7aa ("dm-raid456, md/raid456: fix a deadlock for dm-raid456
while io concurrent with reshape"). Since dm always splits the data and
flush operations into two separate bios, a flush bio submitted by dm always
has an I/O size of 0, so make_request() will not be called in
md_submit_flush_data().

To prevent future modifications from reintroducing the issue, add a WARN_ON
to ensure that make_request() does not return an error in this context.

Fixes: fa2bbff7b0b4 ("md: synchronize flush io with array reconfiguration")
Signed-off-by: Li Nan
Signed-off-by: Song Liu
Link: https://lore.kernel.org/r/20240525185257.3896201-3-linan666@huaweicloud.com
(cherry picked from commit 611d5cbc0b35a752e657a83eebadf40d814d006b)
Signed-off-by: Philip Cox
---
 drivers/md/md.c | 26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index f2fec066ae4b..9b276871a0e6 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -533,13 +533,9 @@ static void md_end_flush(struct bio *bio)
 
 	rdev_dec_pending(rdev, mddev);
 
-	if (atomic_dec_and_test(&mddev->flush_pending)) {
-		/* The pair is percpu_ref_get() from md_flush_request() */
-		percpu_ref_put(&mddev->active_io);
-
+	if (atomic_dec_and_test(&mddev->flush_pending))
 		/* The pre-request flush has finished */
 		queue_work(md_wq, &mddev->flush_work);
-	}
 }
 
 static void md_submit_flush_data(struct work_struct *ws);
@@ -570,12 +566,8 @@ static void submit_flushes(struct work_struct *ws)
 			rcu_read_lock();
 		}
 	rcu_read_unlock();
-	if (atomic_dec_and_test(&mddev->flush_pending)) {
-		/* The pair is percpu_ref_get() from md_flush_request() */
-		percpu_ref_put(&mddev->active_io);
-
+	if (atomic_dec_and_test(&mddev->flush_pending))
 		queue_work(md_wq, &mddev->flush_work);
-	}
 }
 
 static void md_submit_flush_data(struct work_struct *ws)
@@ -600,8 +592,20 @@ static void md_submit_flush_data(struct work_struct *ws)
 		bio_endio(bio);
 	} else {
 		bio->bi_opf &= ~REQ_PREFLUSH;
-		md_handle_request(mddev, bio);
+
+		/*
+		 * make_requst() will never return error here, it only
+		 * returns error in raid5_make_request() by dm-raid.
+		 * Since dm always splits data and flush operation into
+		 * two separate io, io size of flush submitted by dm
+		 * always is 0, make_request() will not be called here.
+		 */
+		if (WARN_ON_ONCE(!mddev->pers->make_request(mddev, bio)))
+			bio_io_error(bio);;
 	}
+
+	/* The pair is percpu_ref_get() from md_flush_request() */
+	percpu_ref_put(&mddev->active_io);
 }
 
 /*
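
For reference, here is a condensed sketch of the flush path with this patch
applied (pieced together from the hunks above; all names come from those
hunks, and the trailing comments are illustrative only, not part of the
commit). active_io is now taken exactly once per flush and released exactly
once, after the data portion has been handed to the personality:

  md_flush_request()
  	percpu_ref_get(&mddev->active_io);	/* single get for the whole flush */
  	...					/* pre-flush bios are sent to the rdevs */
  submit_flushes() / md_end_flush()
  	if (atomic_dec_and_test(&mddev->flush_pending))
  		queue_work(md_wq, &mddev->flush_work);	/* no intermediate put here any more */
  md_submit_flush_data()
  	bio->bi_opf &= ~REQ_PREFLUSH;
  	if (WARN_ON_ONCE(!mddev->pers->make_request(mddev, bio)))	/* data part, called directly */
  		bio_io_error(bio);
  	percpu_ref_put(&mddev->active_io);	/* single put, pairing with the get above */

With the reference held across the whole sequence, mddev_suspend(), which
waits for active_io to reach zero, can no longer interleave with a put and
re-get in the middle of a flush.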