From patchwork Mon Nov 20 11:13:45 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zhihao Cheng X-Patchwork-Id: 1865851 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; secure) header.d=lists.infradead.org header.i=@lists.infradead.org header.a=rsa-sha256 header.s=bombadil.20210309 header.b=VJJdyaBi; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.infradead.org (client-ip=2607:7c80:54:3::133; helo=bombadil.infradead.org; envelope-from=linux-mtd-bounces+incoming=patchwork.ozlabs.org@lists.infradead.org; receiver=patchwork.ozlabs.org) Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4SYXp52wzDz1ySQ for ; Mon, 20 Nov 2023 14:20:44 +1100 (AEDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To: Message-ID:Date:Subject:CC:To:From:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=Xsmhnn5iwJqvOgnsAfhkeytq5nLy112e5BJbRpwpx6E=; b=VJJdyaBiaDPp8U /z1IvYSP+WoHmpIXCOWHW+P0NTeDE9PixvrsRjAWW+SPNEcBK2XirjA9yOmUCHfwTbJmDmqbYqTbw 6Uq2KJKtE4ay70OvUtlzWM8xjK/sgZhZjz/2oRdyZw7mA7x6YIXlg7me7q4ItDultbE3dVO3/diyn c1dwX76BZsaH926gj2kf5D640hDU3zbf+2ePT8dmN4tJnppphP9THA/5GCFeMuoAdXlcjKdolFeOY TN8y4btYhLa9ErMSvH0e+OG7WKjMYCSuLfjK7dpzrqV6T9x+73ogNUedzn0Shl9DTykcfpNP7T9nh /TwHWEeYTnD+EeI9Wsag==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.96 #2 (Red Hat Linux)) id 1r4upT-00B0sI-0s; Mon, 20 Nov 2023 03:20:07 +0000 Received: from szxga02-in.huawei.com ([45.249.212.188]) by bombadil.infradead.org with esmtps (Exim 4.96 #2 (Red Hat Linux)) id 1r4upO-00B0os-2S for linux-mtd@lists.infradead.org; Mon, 20 Nov 2023 03:20:04 +0000 Received: from kwepemm000013.china.huawei.com (unknown [172.30.72.55]) by szxga02-in.huawei.com (SkyGuard) with ESMTP id 4SYXmQ11N8zWhYP; Mon, 20 Nov 2023 11:19:18 +0800 (CST) Received: from huawei.com (10.175.104.67) by kwepemm000013.china.huawei.com (7.193.23.81) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.35; Mon, 20 Nov 2023 11:19:47 +0800 From: Zhihao Cheng To: , , , , CC: , Subject: [PATCH 1/3] ubifs: Check @c->dirty_[n|p]n_cnt and @c->nroot state under @c->lp_mutex Date: Mon, 20 Nov 2023 19:13:45 +0800 Message-ID: <20231120111347.2254153-2-chengzhihao1@huawei.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20231120111347.2254153-1-chengzhihao1@huawei.com> References: <20231120111347.2254153-1-chengzhihao1@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.175.104.67] X-ClientProxiedBy: dggems703-chm.china.huawei.com (10.3.19.180) To kwepemm000013.china.huawei.com (7.193.23.81) X-CFilter-Loop: Reflected X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20231119_192002_984182_D5D41BF4 X-CRM114-Status: GOOD ( 12.85 ) X-Spam-Score: -2.3 (--) X-Spam-Report: Spam detection software, running on the system "bombadil.infradead.org", has NOT identified this incoming email as spam. The original message has been attached to this so you can view it or label similar future email. If you have any questions, see the administrator of that system for details. Content preview: The checking of @c->nroot->flags and @c->dirty_[n|p]n_cnt in function nothing_to_commit() is not atomic, which could be raced with modifying of lpt, for example: P1 P2 P3 run_gc ubifs_garbage_collect [...] Content analysis details: (-2.3 points, 5.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- 0.0 RCVD_IN_MSPIKE_H5 RBL: Excellent reputation (+5) [45.249.212.188 listed in wl.mailspike.net] -2.3 RCVD_IN_DNSWL_MED RBL: Sender listed at https://www.dnswl.org/, medium trust [45.249.212.188 listed in list.dnswl.org] -0.0 SPF_PASS SPF: sender matches SPF record 0.0 SPF_HELO_NONE SPF: HELO does not publish an SPF Record 0.0 DATE_IN_FUTURE_06_12 Date: is 6 to 12 hours after Received: date 0.0 RCVD_IN_MSPIKE_WL Mailspike good senders X-BeenThere: linux-mtd@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Linux MTD discussion mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-mtd" Errors-To: linux-mtd-bounces+incoming=patchwork.ozlabs.org@lists.infradead.org The checking of @c->nroot->flags and @c->dirty_[n|p]n_cnt in function nothing_to_commit() is not atomic, which could be raced with modifying of lpt, for example: P1 P2 P3 run_gc ubifs_garbage_collect do_commit ubifs_return_leb ubifs_lpt_lookup_dirty dirty_cow_nnode do_commit nothing_to_commit if (test_bit(DIRTY_CNODE, &c->nroot->flags) // false test_and_set_bit(DIRTY_CNODE, &nnode->flags) c->dirty_nn_cnt += 1 ubifs_assert(c, c->dirty_nn_cnt == 0) // false ! Fetch a reproducer in Link: UBIFS error (ubi0:0 pid 2747): ubifs_assert_failed UBIFS assert failed: c->dirty_pn_cnt == 0, in fs/ubifs/commit.c Call Trace: ubifs_ro_mode+0x58/0x70 [ubifs] ubifs_assert_failed+0x6a/0x90 [ubifs] do_commit+0x5b7/0x930 [ubifs] ubifs_run_commit+0xc6/0x1a0 [ubifs] ubifs_sync_fs+0xd8/0x110 [ubifs] sync_filesystem+0xb4/0x120 do_syscall_64+0x6f/0x140 Fix it by checking @c->dirty_[n|p]n_cnt and @c->nroot state with @c->lp_mutex locked. Fixes: 944fdef52ca9 ("UBIFS: do not start the commit if there is nothing to commit") Link: https://bugzilla.kernel.org/show_bug.cgi?id=218162 Signed-off-by: Zhihao Cheng --- fs/ubifs/commit.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/fs/ubifs/commit.c b/fs/ubifs/commit.c index c4fc1047fc07..5b3a840098b0 100644 --- a/fs/ubifs/commit.c +++ b/fs/ubifs/commit.c @@ -69,6 +69,14 @@ static int nothing_to_commit(struct ubifs_info *c) if (c->zroot.znode && ubifs_zn_dirty(c->zroot.znode)) return 0; + /* + * Increasing @c->dirty_pn_cnt/@c->dirty_nn_cnt and marking + * nnodes/pnodes as dirty in run_gc() could race with following + * checking, which leads inconsistent states between @c->nroot + * and @c->dirty_pn_cnt/@c->dirty_nn_cnt, holding @c->lp_mutex + * to avoid that. + */ + mutex_lock(&c->lp_mutex); /* * Even though the TNC is clean, the LPT tree may have dirty nodes. For * example, this may happen if the budgeting subsystem invoked GC to @@ -76,12 +84,15 @@ static int nothing_to_commit(struct ubifs_info *c) * free space. In this case GC would just change the lprops of this * LEB (by turning all space into free space) and unmap it. */ - if (c->nroot && test_bit(DIRTY_CNODE, &c->nroot->flags)) + if (c->nroot && test_bit(DIRTY_CNODE, &c->nroot->flags)) { + mutex_unlock(&c->lp_mutex); return 0; + } ubifs_assert(c, atomic_long_read(&c->dirty_zn_cnt) == 0); ubifs_assert(c, c->dirty_pn_cnt == 0); ubifs_assert(c, c->dirty_nn_cnt == 0); + mutex_unlock(&c->lp_mutex); return 1; } From patchwork Mon Nov 20 11:13:46 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Zhihao Cheng X-Patchwork-Id: 1865853 X-Patchwork-Delegate: richard@nod.at Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; secure) header.d=lists.infradead.org header.i=@lists.infradead.org header.a=rsa-sha256 header.s=bombadil.20210309 header.b=BMbtmyOY; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.infradead.org (client-ip=2607:7c80:54:3::133; helo=bombadil.infradead.org; envelope-from=linux-mtd-bounces+incoming=patchwork.ozlabs.org@lists.infradead.org; receiver=patchwork.ozlabs.org) Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4SYXp537rrz1ySR for ; Mon, 20 Nov 2023 14:20:43 +1100 (AEDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To: Message-ID:Date:Subject:CC:To:From:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=+TPACNIo8eePDKHNOMpzKDX8x6PIPJObj2Q/GZe+Omw=; b=BMbtmyOYtHBSCk hdmZ2yOYUtv+2LVUCB+WT6x3dhMWMBlJlbYo2T3LbDV9DxR9jYQB8ahQVZn0mnPcPwSppT8JlYAve 8rWYJn6Uo8BsaYZo3X31/Z/FtYn3JjCB472YuRKHjfcOpLNoh/DQIax1cJ1oBr2C3PYDqCJX2kVil GrVJWrMapEhHDvE4TtQBuZFS7MgDTi9abLS7xdu9PEubrvmWUa6rynALZbwkhpaVHqSlIgoGKBSeN t6+sybMnix2aoc4DFOxTnneoi60zPepMcXcejj/CjMxQMpRI3yuRCDtCnKrgwvuMBVerzbg74Mb4a XsXs5r6ZOzIJUyUeOJAQ==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.96 #2 (Red Hat Linux)) id 1r4upU-00B0sc-0i; Mon, 20 Nov 2023 03:20:08 +0000 Received: from szxga08-in.huawei.com ([45.249.212.255]) by bombadil.infradead.org with esmtps (Exim 4.96 #2 (Red Hat Linux)) id 1r4upO-00B0ou-1r for linux-mtd@lists.infradead.org; Mon, 20 Nov 2023 03:20:05 +0000 Received: from kwepemm000013.china.huawei.com (unknown [172.30.72.53]) by szxga08-in.huawei.com (SkyGuard) with ESMTP id 4SYXj30Chkz1P6j8; Mon, 20 Nov 2023 11:16:23 +0800 (CST) Received: from huawei.com (10.175.104.67) by kwepemm000013.china.huawei.com (7.193.23.81) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.35; Mon, 20 Nov 2023 11:19:48 +0800 From: Zhihao Cheng To: , , , , CC: , Subject: [PATCH 2/3] ubifs: ubifs_tnc_locate: Drop TNC mutex lockless reading path Date: Mon, 20 Nov 2023 19:13:46 +0800 Message-ID: <20231120111347.2254153-3-chengzhihao1@huawei.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20231120111347.2254153-1-chengzhihao1@huawei.com> References: <20231120111347.2254153-1-chengzhihao1@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.175.104.67] X-ClientProxiedBy: dggems703-chm.china.huawei.com (10.3.19.180) To kwepemm000013.china.huawei.com (7.193.23.81) X-CFilter-Loop: Reflected X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20231119_192003_001180_8D8AC18C X-CRM114-Status: GOOD ( 12.33 ) X-Spam-Score: -2.3 (--) X-Spam-Report: Spam detection software, running on the system "bombadil.infradead.org", has NOT identified this incoming email as spam. The original message has been attached to this so you can view it or label similar future email. If you have any questions, see the administrator of that system for details. Content preview: This problem is described in [1], and Tao has proposed a solution in [2]. There exists a race between node reading and gc, which is shown as: P1 P2 ubifs_tnc_locate zbr.lnum = lnum_a GC // node is mov [...] Content analysis details: (-2.3 points, 5.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- -2.3 RCVD_IN_DNSWL_MED RBL: Sender listed at https://www.dnswl.org/, medium trust [45.249.212.255 listed in list.dnswl.org] -0.0 SPF_PASS SPF: sender matches SPF record 0.0 SPF_HELO_NONE SPF: HELO does not publish an SPF Record 0.0 DATE_IN_FUTURE_06_12 Date: is 6 to 12 hours after Received: date 0.0 RCVD_IN_MSPIKE_H5 RBL: Excellent reputation (+5) [45.249.212.255 listed in wl.mailspike.net] 0.0 RCVD_IN_MSPIKE_WL Mailspike good senders X-BeenThere: linux-mtd@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Linux MTD discussion mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-mtd" Errors-To: linux-mtd-bounces+incoming=patchwork.ozlabs.org@lists.infradead.org This problem is described in [1], and Tao has proposed a solution in [2]. There exists a race between node reading and gc, which is shown as: P1 P2 ubifs_tnc_locate zbr.lnum = lnum_a GC // node is moved from lnum_a to lnum_b journal head switching // lnum_a becomes a bud ubifs_get_wbuf(c, zbr.lnum) // true ubifs_tnc_read_node ubifs_read_node_wbuf // read data from lnum_a check node failed ! There are two ways of reading node(See ubifs_tnc_read_node()): 1. Reading from wbuf. The node is written in wbuf(in mem), and wbuf is not written back to flash. 2. Otherwise, reading from flash. In most cases, ubifs_tnc_read_node() is invoked with TNC mutex locked, but except the lockless path in ubifs_tnc_locate() which is imported by commit 601c0bc46753("UBIFS: allow for racing between GC and TNC"). Function ubifs_tnc_locate() is mainly used for path lookup and file reading, VFS has inode/dentry/page cache for multiple times reading, the lockless optimization only works for first reading. Based on the discussion in [2], this patch simply drops the TNC mutex lockless reading path in ubifs_tnc_locate(). Fetch a reproducer in [3]. [1] https://lore.kernel.org/all/fda84926-09d1-1fc7-4b78-99e0d04508bc@huawei.com/T/ [2] https://lore.kernel.org/linux-mtd/20200305092205.127758-1-houtao1@huawei.com/ [3] https://bugzilla.kernel.org/show_bug.cgi?id=218163 Fixes: 601c0bc46753 ("UBIFS: allow for racing between GC and TNC") Fixes: 1e51764a3c2a ("UBIFS: add new flash file system") Reported-by: 李傲傲 (Carson Li1/9542) Analyzed-by: Hou Tao Signed-off-by: Zhihao Cheng --- fs/ubifs/tnc.c | 31 +++---------------------------- 1 file changed, 3 insertions(+), 28 deletions(-) diff --git a/fs/ubifs/tnc.c b/fs/ubifs/tnc.c index f4728e65d1bd..7b7d75ed3ec7 100644 --- a/fs/ubifs/tnc.c +++ b/fs/ubifs/tnc.c @@ -1478,11 +1478,10 @@ static int maybe_leb_gced(struct ubifs_info *c, int lnum, int gc_seq1) int ubifs_tnc_locate(struct ubifs_info *c, const union ubifs_key *key, void *node, int *lnum, int *offs) { - int found, n, err, safely = 0, gc_seq1; + int found, n, err; struct ubifs_znode *znode; - struct ubifs_zbranch zbr, *zt; + struct ubifs_zbranch *zt; -again: mutex_lock(&c->tnc_mutex); found = ubifs_lookup_level0(c, key, &znode, &n); if (!found) { @@ -1505,31 +1504,7 @@ int ubifs_tnc_locate(struct ubifs_info *c, const union ubifs_key *key, err = tnc_read_hashed_node(c, zt, node); goto out; } - if (safely) { - err = ubifs_tnc_read_node(c, zt, node); - goto out; - } - /* Drop the TNC mutex prematurely and race with garbage collection */ - zbr = znode->zbranch[n]; - gc_seq1 = c->gc_seq; - mutex_unlock(&c->tnc_mutex); - - if (ubifs_get_wbuf(c, zbr.lnum)) { - /* We do not GC journal heads */ - err = ubifs_tnc_read_node(c, &zbr, node); - return err; - } - - err = fallible_read_node(c, key, &zbr, node); - if (err <= 0 || maybe_leb_gced(c, zbr.lnum, gc_seq1)) { - /* - * The node may have been GC'ed out from under us so try again - * while keeping the TNC mutex locked. - */ - safely = 1; - goto again; - } - return 0; + err = ubifs_tnc_read_node(c, zt, node); out: mutex_unlock(&c->tnc_mutex); From patchwork Mon Nov 20 11:13:47 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zhihao Cheng X-Patchwork-Id: 1865854 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; secure) header.d=lists.infradead.org header.i=@lists.infradead.org header.a=rsa-sha256 header.s=bombadil.20210309 header.b=LIW75DbW; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.infradead.org (client-ip=2607:7c80:54:3::133; helo=bombadil.infradead.org; envelope-from=linux-mtd-bounces+incoming=patchwork.ozlabs.org@lists.infradead.org; receiver=patchwork.ozlabs.org) Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4SYXp52kfmz1ySN for ; Mon, 20 Nov 2023 14:20:43 +1100 (AEDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To: Message-ID:Date:Subject:CC:To:From:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=1l8qBtE+Fnkbdb1pu4F1PAzSCrMOmrZ8O5FEusgjyJs=; b=LIW75DbWyqmv6m pPtTNBIjn4moCItqDH5e65zccEWNMB+t5ACkI3wyvrhnHYQ6YFwzJEuJ54dseCJw1LIgXB9LEilsY SgrK9Y+B54TNzZpkMKBWr2fMVoSJpwCbrisewTC8e+OxGBlnEWcsuakgV3Qe+jPgYQPxNh7Xi2e6k eEeoVmSSQUUm2jpti551lfs0Lw3dU6Tr3dziSjdgN1dykonUHfDSL6y9H7zXe8UT6JYLbEoOxUAQF auki2g9fAs806Fz7NVRtM22PCnsUfrHdguSLK2vhP3jICrdmdmMQgjwoYPxlAnknLWDET50/jorhr DqEpfz9ACDjN+guaUEQA==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.96 #2 (Red Hat Linux)) id 1r4upW-00B0sn-0y; Mon, 20 Nov 2023 03:20:10 +0000 Received: from szxga01-in.huawei.com ([45.249.212.187]) by bombadil.infradead.org with esmtps (Exim 4.96 #2 (Red Hat Linux)) id 1r4upQ-00B0p3-1g for linux-mtd@lists.infradead.org; Mon, 20 Nov 2023 03:20:07 +0000 Received: from kwepemm000013.china.huawei.com (unknown [172.30.72.53]) by szxga01-in.huawei.com (SkyGuard) with ESMTP id 4SYXj26WJTzsRH6; Mon, 20 Nov 2023 11:16:22 +0800 (CST) Received: from huawei.com (10.175.104.67) by kwepemm000013.china.huawei.com (7.193.23.81) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.35; Mon, 20 Nov 2023 11:19:48 +0800 From: Zhihao Cheng To: , , , , CC: , Subject: [PATCH 3/3] ubifs: Don't stop retrying even if committing times exceeds two Date: Mon, 20 Nov 2023 19:13:47 +0800 Message-ID: <20231120111347.2254153-4-chengzhihao1@huawei.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20231120111347.2254153-1-chengzhihao1@huawei.com> References: <20231120111347.2254153-1-chengzhihao1@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.175.104.67] X-ClientProxiedBy: dggems703-chm.china.huawei.com (10.3.19.180) To kwepemm000013.china.huawei.com (7.193.23.81) X-CFilter-Loop: Reflected X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20231119_192005_058656_0BAD0861 X-CRM114-Status: GOOD ( 19.25 ) X-Spam-Score: -2.3 (--) X-Spam-Report: Spam detection software, running on the system "bombadil.infradead.org", has NOT identified this incoming email as spam. The original message has been attached to this so you can view it or label similar future email. If you have any questions, see the administrator of that system for details. Content preview: Recently we catched ENOSPC returned by make_reservation() while doing fsstress on UBIFS, we got following information when it occurred (See details in Link): UBIFS error (ubi0:0 pid 3640152): make_reservation [ubifs]: cannot reserve 112 bytes in jhead 2, error -28 CPU: 2 PID: 3640152 Comm: kworker/u16:2 Tainted: G B W Hardware name: Hisilicon PhosphorHi12 [...] Content analysis details: (-2.3 points, 5.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- -2.3 RCVD_IN_DNSWL_MED RBL: Sender listed at https://www.dnswl.org/, medium trust [45.249.212.187 listed in list.dnswl.org] -0.0 SPF_PASS SPF: sender matches SPF record 0.0 SPF_HELO_NONE SPF: HELO does not publish an SPF Record 0.0 DATE_IN_FUTURE_06_12 Date: is 6 to 12 hours after Received: date 0.0 RCVD_IN_MSPIKE_H5 RBL: Excellent reputation (+5) [45.249.212.187 listed in wl.mailspike.net] 0.0 RCVD_IN_MSPIKE_WL Mailspike good senders X-BeenThere: linux-mtd@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Linux MTD discussion mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-mtd" Errors-To: linux-mtd-bounces+incoming=patchwork.ozlabs.org@lists.infradead.org Recently we catched ENOSPC returned by make_reservation() while doing fsstress on UBIFS, we got following information when it occurred (See details in Link): UBIFS error (ubi0:0 pid 3640152): make_reservation [ubifs]: cannot reserve 112 bytes in jhead 2, error -28 CPU: 2 PID: 3640152 Comm: kworker/u16:2 Tainted: G B W Hardware name: Hisilicon PhosphorHi1230 EMU (DT) Workqueue: writeback wb_workfn (flush-ubifs_0_0) Call trace: dump_stack+0x114/0x198 make_reservation+0x564/0x610 [ubifs] ubifs_jnl_write_data+0x328/0x48c [ubifs] do_writepage+0x2a8/0x3e4 [ubifs] ubifs_writepage+0x16c/0x374 [ubifs] generic_writepages+0xb4/0x114 do_writepages+0xcc/0x11c writeback_sb_inodes+0x2d0/0x564 wb_writeback+0x20c/0x2b4 wb_workfn+0x404/0x510 process_one_work+0x304/0x4ac worker_thread+0x31c/0x4e4 kthread+0x23c/0x290 (pid 3640152) Budgeting info: data budget sum 17576, total budget sum 17768 budg_data_growth 4144, budg_dd_growth 13432, budg_idx_growth 192 min_idx_lebs 13, old_idx_sz 988640, uncommitted_idx 0 page_budget 4144, inode_budget 160, dent_budget 312 nospace 0, nospace_rp 0 dark_wm 8192, dead_wm 4096, max_idx_node_sz 192 freeable_cnt 0, calc_idx_sz 988640, idx_gc_cnt 0 dirty_pg_cnt 4, dirty_zn_cnt 0, clean_zn_cnt 4811 gc_lnum 21, ihead_lnum 14 jhead 0 (GC) LEB 16 jhead 1 (base) LEB 34 jhead 2 (data) LEB 23 bud LEB 16 bud LEB 23 bud LEB 34 old bud LEB 33 old bud LEB 31 old bud LEB 15 commit state 4 Budgeting predictions: available: 33832, outstanding 17576, free 15356 (pid 3640152) start dumping LEB properties (pid 3640152) Lprops statistics: empty_lebs 3, idx_lebs 11 taken_empty_lebs 1, total_free 1253376, total_dirty 2445736 total_used 3438712, total_dark 65536, total_dead 17248 LEB 15 free 0 dirty 248000 used 5952 (taken) LEB 16 free 110592 dirty 896 used 142464 (taken, jhead 0 (GC)) LEB 21 free 253952 dirty 0 used 0 (taken, GC LEB) LEB 23 free 0 dirty 248104 used 5848 (taken, jhead 2 (data)) LEB 29 free 253952 dirty 0 used 0 (empty) LEB 33 free 0 dirty 253952 used 0 (taken) LEB 34 free 217088 dirty 36544 used 320 (taken, jhead 1 (base)) LEB 37 free 253952 dirty 0 used 0 (empty) OTHERS: index lebs, zero-available non-index lebs According to the budget algorithm, there are 5 LEBs reserved for budget: three journal heads(16,23,34), 1 GC LEB(21) and 1 deletion LEB(can be used in make_reservation()). There are 2 empty LEBs used for index nodes, which is calculated as min_idx_lebs - idx_lebs = 2. In theory, LEB 15 and 33 should be reclaimed as free state after committing, but it is now in taken state. After looking the realization of reserve_space(), there's a possible situation: LEB 15: free 2000 dirty 248000 used 3952 (jhead 2) LEB 23: free 2000 dirty 248104 used 3848 (bud, taken) LEB 33: free 2000 dirty 251952 used 0 (bud, taken) wb_workfn wb_workfn_2 do_writepage // write 3000 bytes ubifs_jnl_write_data make_reservation reserve_space ubifs_garbage_collect ubifs_find_dirty_leb // ret ENOSPC, dirty LEBs are taken nospc_retries++ // 1 ubifs_run_commit do_commit LEB 15: free 2000 dirty 248000 used 3952 (jhead 2) LEB 23: free 2000 dirty 248104 used 3848 (dirty) LEB 33: free 2000 dirty 251952 used 0 (dirty) do_writepage // write 2000 bytes for 3 times ubifs_jnl_write_data // grabs 15\23\33 LEB 15: free 0 dirty 248000 used 5952 (bud, taken) LEB 23: free 0 dirty 248104 used 5848 (jhead 2) LEB 33: free 0 dirty 253952 used 0 (bud, taken) reserve_space ubifs_garbage_collect ubifs_find_dirty_leb // ret ENOSPC, dirty LEBs are taken if (nospc_retries++ < 2) // false ubifs_ro_mode ! Fetch a reproducer in Link. The dirty LEBs could be grabbed by other writeback threads, which fails finding dirty LEBs of GC in current thread, so make_reservation() should try many times to invoke GC&&committing, but current realization limits the times of retrying as 'nospc_retries'(twice). Actually the limitation of retrying times should be removed from make_reservation(), but considering the existence of 'cmt_retries' in case of infinite loop, just remove 'nospc_retries' and set the limitation of retrying times same as 'cmt_retries'. Link: https://bugzilla.kernel.org/show_bug.cgi?id=218164 Fixes: 1e51764a3c2a ("UBIFS: add new flash file system") Signed-off-by: Zhihao Cheng --- fs/ubifs/journal.c | 26 ++++++++++---------------- 1 file changed, 10 insertions(+), 16 deletions(-) diff --git a/fs/ubifs/journal.c b/fs/ubifs/journal.c index f0a5538c84b0..ec42d42f06dc 100644 --- a/fs/ubifs/journal.c +++ b/fs/ubifs/journal.c @@ -310,7 +310,7 @@ static int write_head(struct ubifs_info *c, int jhead, void *buf, int len, */ static int make_reservation(struct ubifs_info *c, int jhead, int len) { - int err, cmt_retries = 0, nospc_retries = 0; + int err, cmt_retries = 0; again: down_read(&c->commit_sem); @@ -323,21 +323,14 @@ static int make_reservation(struct ubifs_info *c, int jhead, int len) if (err == -ENOSPC) { /* * GC could not make any progress. We should try to commit - * once because it could make some dirty space and GC would - * make progress, so make the error -EAGAIN so that the below - * will commit and re-try. - */ - if (nospc_retries++ < 2) { - dbg_jnl("no space, retry"); - err = -EAGAIN; - } - - /* - * This means that the budgeting is incorrect. We always have - * to be able to write to the media, because all operations are - * budgeted. Deletions are not budgeted, though, but we reserve - * an extra LEB for them. + * because it could make some dirty space and GC would make + * progress, so that the below will commit and re-try. This + * process could repeat many times(cmt_retries < 128), + * journal heads in other threads could grab the dirty lebs, + * which fails finding dirty LEBs of GC in current thread. */ + dbg_jnl("no space, retry"); + err = -EAGAIN; } if (err != -EAGAIN) @@ -350,7 +343,8 @@ static int make_reservation(struct ubifs_info *c, int jhead, int len) if (cmt_retries > 128) { /* * This should not happen unless the journal size limitations - * are too tough. + * are too tough, or the racing between GC and journal heads + * switching is too frequent when space is nearly run out. */ ubifs_err(c, "stuck in space allocation"); err = -ENOSPC;