Message ID | 20230304014141.2099204-1-wangzhaolong1@huawei.com
---|---
State | Accepted
Series | ubi: Fix deadlock caused by recursively holding work_sem
> During the processing of the bgt, if sync_erase() returns -EBUSY
> or some other error code in __erase_worker(), schedule_erase() is
> called again, so down_read(ubi->work_sem) is taken twice by the same
> task and may be blocked by down_write(ubi->work_sem) in
> ubi_update_fastmap(), which causes a deadlock.
>
>           ubi bgt                        other task
> do_work
>  down_read(&ubi->work_sem)          ubi_update_fastmap
>   erase_worker                        # Blocked by down_read
>    __erase_worker                    down_write(&ubi->work_sem)
>     schedule_erase
>      schedule_ubi_work
>       down_read(&ubi->work_sem)
>
> Fix this by changing the input parameter @nested of schedule_erase()
> to 'true' to avoid recursively acquiring down_read(&ubi->work_sem).
>
> Also, fix the incorrect comment about the @nested parameter of
> schedule_erase(): when down_write(ubi->work_sem) is held, @nested also
> needs to be true.
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217093
> Fixes: 2e8f08deabbc ("ubi: Fix races around ubi_refill_pools()")
> Signed-off-by: ZhaoLong Wang <wangzhaolong1@huawei.com>
> ---
>  drivers/mtd/ubi/wl.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com>

> diff --git a/drivers/mtd/ubi/wl.c b/drivers/mtd/ubi/wl.c
> index 40f39e5d6dfc..26a214f016c1 100644
> --- a/drivers/mtd/ubi/wl.c
> +++ b/drivers/mtd/ubi/wl.c
> @@ -575,7 +575,7 @@ static int erase_worker(struct ubi_device *ubi, struct ubi_work *wl_wrk,
>   * @vol_id: the volume ID that last used this PEB
>   * @lnum: the last used logical eraseblock number for the PEB
>   * @torture: if the physical eraseblock has to be tortured
> - * @nested: denotes whether the work_sem is already held in read mode
> + * @nested: denotes whether the work_sem is already held
>   *
>   * This function returns zero in case of success and a %-ENOMEM in case of
>   * failure.
> @@ -1131,7 +1131,7 @@ static int __erase_worker(struct ubi_device *ubi, struct ubi_work *wl_wrk)
>  			int err1;
>
>  			/* Re-schedule the LEB for erasure */
> -			err1 = schedule_erase(ubi, e, vol_id, lnum, 0, false);
> +			err1 = schedule_erase(ubi, e, vol_id, lnum, 0, true);
>  			if (err1) {
>  				spin_lock(&ubi->wl_lock);
>  				wl_entry_destroy(ubi, e);
----- Original Message -----
> From: "chengzhihao1" <chengzhihao1@huawei.com>
>> During the processing of the bgt, if sync_erase() returns -EBUSY
>> or some other error code in __erase_worker(), schedule_erase() is
>> called again, so down_read(ubi->work_sem) is taken twice by the same
>> task and may be blocked by down_write(ubi->work_sem) in
>> ubi_update_fastmap(), which causes a deadlock.
>>
>> [...]
>>
>> Signed-off-by: ZhaoLong Wang <wangzhaolong1@huawei.com>
>> ---
>>  drivers/mtd/ubi/wl.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com>

Applied to -next.
Thanks everyone!

Thanks,
//richard
diff --git a/drivers/mtd/ubi/wl.c b/drivers/mtd/ubi/wl.c
index 40f39e5d6dfc..26a214f016c1 100644
--- a/drivers/mtd/ubi/wl.c
+++ b/drivers/mtd/ubi/wl.c
@@ -575,7 +575,7 @@ static int erase_worker(struct ubi_device *ubi, struct ubi_work *wl_wrk,
  * @vol_id: the volume ID that last used this PEB
  * @lnum: the last used logical eraseblock number for the PEB
  * @torture: if the physical eraseblock has to be tortured
- * @nested: denotes whether the work_sem is already held in read mode
+ * @nested: denotes whether the work_sem is already held
  *
  * This function returns zero in case of success and a %-ENOMEM in case of
  * failure.
@@ -1131,7 +1131,7 @@ static int __erase_worker(struct ubi_device *ubi, struct ubi_work *wl_wrk)
 			int err1;
 
 			/* Re-schedule the LEB for erasure */
-			err1 = schedule_erase(ubi, e, vol_id, lnum, 0, false);
+			err1 = schedule_erase(ubi, e, vol_id, lnum, 0, true);
 			if (err1) {
 				spin_lock(&ubi->wl_lock);
 				wl_entry_destroy(ubi, e);
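For context on what the changed flag controls: @nested selects between the
plain queueing path, which takes work_sem for read itself, and the
already-locked path. The sketch below is simplified from the mainline
drivers/mtd/ubi/wl.c code around this change (allocation, field setup and
error handling elided); it is illustration, not part of the patch:

```c
/* Simplified sketch of the two queueing paths in drivers/mtd/ubi/wl.c;
 * allocation, field setup and error handling are elided. */
static void schedule_ubi_work(struct ubi_device *ubi, struct ubi_work *wrk)
{
	down_read(&ubi->work_sem);	/* a second hold if the caller is a worker */
	__schedule_ubi_work(ubi, wrk);	/* queue under wl_lock, wake the bgt */
	up_read(&ubi->work_sem);
}

static int schedule_erase(struct ubi_device *ubi, struct ubi_wl_entry *e,
			  int vol_id, int lnum, int torture, bool nested)
{
	struct ubi_work *wl_wrk;

	/* ... allocate and fill in wl_wrk ... */

	if (nested)
		__schedule_ubi_work(ubi, wl_wrk); /* work_sem already held */
	else
		schedule_ubi_work(ubi, wl_wrk);	  /* takes work_sem for read */
	return 0;
}
```

Since __erase_worker() runs from do_work(), which already holds work_sem for
read, passing nested=true routes the re-scheduled erase through
__schedule_ubi_work() and avoids the second down_read().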
During the processing of the bgt, if sync_erase() returns -EBUSY or
some other error code in __erase_worker(), schedule_erase() is called
again, so down_read(ubi->work_sem) is taken twice by the same task and
may be blocked by down_write(ubi->work_sem) in ubi_update_fastmap(),
which causes a deadlock.

          ubi bgt                        other task
do_work
 down_read(&ubi->work_sem)          ubi_update_fastmap
  erase_worker                        # Blocked by down_read
   __erase_worker                    down_write(&ubi->work_sem)
    schedule_erase
     schedule_ubi_work
      down_read(&ubi->work_sem)

Fix this by changing the input parameter @nested of schedule_erase() to
'true' to avoid recursively acquiring down_read(&ubi->work_sem).

Also, fix the incorrect comment about the @nested parameter of
schedule_erase(): when down_write(ubi->work_sem) is held, @nested also
needs to be true.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=217093
Fixes: 2e8f08deabbc ("ubi: Fix races around ubi_refill_pools()")
Signed-off-by: ZhaoLong Wang <wangzhaolong1@huawei.com>
---
 drivers/mtd/ubi/wl.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
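The hang depends on rwsem fairness: once a writer is queued, new readers
block behind it, even in a task that already holds the lock for read. A
minimal, self-contained userspace model of the same ordering, using a glibc
writer-preferring rwlock in place of ubi->work_sem (all names here are
demo-only, not UBI code):

```c
/* Userspace model of the work_sem deadlock: a reader that already holds
 * the lock re-acquires it while a writer is queued. Build: gcc -pthread.
 * The rwlock stands in for ubi->work_sem; names are hypothetical. */
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static pthread_rwlock_t work_sem;

static void *fastmap_update(void *arg)	/* models ubi_update_fastmap() */
{
	pthread_rwlock_wrlock(&work_sem);	/* queues behind the reader */
	pthread_rwlock_unlock(&work_sem);
	return NULL;
}

int main(void)
{
	pthread_rwlockattr_t attr;
	struct timespec to;
	pthread_t writer;
	int ret;

	pthread_rwlockattr_init(&attr);
	/* Like a kernel rwsem: a queued writer blocks new readers. */
	pthread_rwlockattr_setkind_np(&attr,
			PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP);
	pthread_rwlock_init(&work_sem, &attr);

	pthread_rwlock_rdlock(&work_sem);	/* do_work(): first down_read() */
	pthread_create(&writer, NULL, fastmap_update, NULL);
	sleep(1);				/* let the writer start waiting */

	clock_gettime(CLOCK_REALTIME, &to);
	to.tv_sec += 2;
	/* schedule_ubi_work(): second down_read() from the same task. It
	 * queues behind the writer, which in turn waits for our first hold
	 * to be dropped -- the deadlock, surfaced here as a timeout. */
	ret = pthread_rwlock_timedrdlock(&work_sem, &to);
	if (ret == ETIMEDOUT)
		printf("nested read lock blocked behind writer: deadlock\n");
	else if (ret == 0)
		pthread_rwlock_unlock(&work_sem); /* nested hold succeeded */

	pthread_rwlock_unlock(&work_sem);	/* release the first hold */
	pthread_join(writer, NULL);
	return 0;
}
```

With the patch applied, the bgt never attempts the second acquisition at
all, which is the right fix: rwsems intentionally have no concept of
recursive read ownership.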