From patchwork Tue Feb 14 05:50:22 2023
X-Patchwork-Submitter: Matthew Ruffell
X-Patchwork-Id: 1742032
From: Matthew Ruffell
To: kernel-team@lists.ubuntu.com
Subject: [SRU][F][PATCH 1/1] xfs: drop submit side trans alloc for append ioends
Date: Tue, 14 Feb 2023 18:50:22 +1300
Message-Id: <20230214055022.7876-2-matthew.ruffell@canonical.com>
X-Mailer: git-send-email 2.37.2
In-Reply-To: <20230214055022.7876-1-matthew.ruffell@canonical.com>
References: <20230214055022.7876-1-matthew.ruffell@canonical.com>

From: Brian Foster

BugLink: https://bugs.launchpad.net/bugs/2007219

Per-inode ioend completion batching has a log reservation deadlock
vector between preallocated append transactions and transactions
that are acquired at completion time for other purposes (i.e.,
unwritten extent conversion or COW fork remaps). For example, if the
ioend completion workqueue task executes on a batch of ioends that
are sorted such that an append ioend sits at the tail, it's possible
for the outstanding append transaction reservation to block
allocation of transactions required to process preceding ioends in
the list.

Append ioend completion is historically the common path for on-disk
inode size updates. While file extending writes may have completed
sometime earlier, the on-disk inode size is only updated after
successful writeback completion. These transactions are preallocated
serially from writeback context to mitigate concurrency and
associated log reservation pressure across completions processed by
multi-threaded workqueue tasks.

However, now that delalloc blocks unconditionally map to unwritten
extents at physical block allocation time, size updates via append
ioends are relatively rare. This means that inode size updates most
commonly occur as part of the preexisting completion time transaction
to convert unwritten extents. As a result, there is no longer a
strong need to preallocate size update transactions.

Remove the preallocation of inode size update transactions to avoid
the ioend completion processing log reservation deadlock. Instead,
continue to send all potential size extending ioends to workqueue
context for completion and allocate the transaction from that context.
This ensures that no outstanding log reservation is owned by the ioend
completion worker task when it begins to process ioends.

Signed-off-by: Brian Foster
Reviewed-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
(backported from 7cd3099f4925d7c15887d1940ebd65acd66100f5)
[mruffell: context adjustments, in particular xfs_end_bio()]
Signed-off-by: Matthew Ruffell
---
 fs/xfs/xfs_aops.c | 43 +++----------------------------------------
 1 file changed, 3 insertions(+), 40 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index f16d5f196c6b..1b5b67b55f5f 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -125,33 +125,6 @@ static inline bool xfs_ioend_is_append(struct xfs_ioend *ioend)
 		XFS_I(ioend->io_inode)->i_d.di_size;
 }
 
-STATIC int
-xfs_setfilesize_trans_alloc(
-	struct xfs_ioend	*ioend)
-{
-	struct xfs_mount	*mp = XFS_I(ioend->io_inode)->i_mount;
-	struct xfs_trans	*tp;
-	int			error;
-
-	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_fsyncts, 0, 0, 0, &tp);
-	if (error)
-		return error;
-
-	ioend->io_append_trans = tp;
-
-	/*
-	 * We may pass freeze protection with a transaction. So tell lockdep
-	 * we released it.
-	 */
-	__sb_writers_release(ioend->io_inode->i_sb, SB_FREEZE_FS);
-	/*
-	 * We hand off the transaction to the completion thread now, so
-	 * clear the flag here.
-	 */
-	current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
-	return 0;
-}
-
 /*
  * Update on-disk file size now that data has been written to disk.
  */
@@ -269,12 +242,10 @@ xfs_end_ioend(
 		error = xfs_reflink_end_cow(ip, offset, size);
 	else if (ioend->io_state == XFS_EXT_UNWRITTEN)
 		error = xfs_iomap_write_unwritten(ip, offset, size, false);
-	else
-		ASSERT(!xfs_ioend_is_append(ioend) || ioend->io_append_trans);
 
+	if (!error && xfs_ioend_is_append(ioend))
+		error = xfs_setfilesize(ip, ioend->io_offset, ioend->io_size);
 done:
-	if (ioend->io_append_trans)
-		error = xfs_setfilesize_ioend(ioend, error);
 	list_replace_init(&ioend->io_list, &ioend_list);
 	xfs_destroy_ioend(ioend, error);
 
@@ -404,7 +375,7 @@ xfs_end_bio(
 
 	if (ioend->io_fork == XFS_COW_FORK ||
 	    ioend->io_state == XFS_EXT_UNWRITTEN ||
-	    ioend->io_append_trans != NULL) {
+	    xfs_ioend_is_append(ioend)) {
 		spin_lock_irqsave(&ip->i_ioend_lock, flags);
 		if (list_empty(&ip->i_ioend_list))
 			WARN_ON_ONCE(!queue_work(mp->m_unwritten_workqueue,
@@ -661,14 +632,6 @@ xfs_submit_ioend(
 				ioend->io_offset, ioend->io_size);
 	}
 
-	/* Reserve log space if we might write beyond the on-disk inode size. */
-	if (!status &&
-	    (ioend->io_fork == XFS_COW_FORK ||
-	     ioend->io_state != XFS_EXT_UNWRITTEN) &&
-	    xfs_ioend_is_append(ioend) &&
-	    !ioend->io_append_trans)
-		status = xfs_setfilesize_trans_alloc(ioend);
-
 	memalloc_nofs_restore(nofs_flag);
 
 	ioend->io_bio->bi_private = ioend;
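
A quick aside for reviewers, not part of the patch: the ordering hazard the
commit message describes is easier to see as a toy model. The sketch below is
plain userspace C, not XFS code; log_space, trans_alloc() and the two struct
ioend flags are invented stand-ins for the log grant space, xfs_trans_alloc()
and the real ioend state, and the values are chosen only to force the stall.

#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for the log grant space: one reservation unit left. */
static int log_space = 1;

/* Stand-in for xfs_trans_alloc(); the real call sleeps rather than fails. */
static bool trans_alloc(void)
{
	if (log_space == 0)
		return false;
	log_space--;
	return true;
}

struct ioend {
	bool needs_trans;		/* unwritten conversion or COW remap */
	bool holds_append_trans;	/* reservation preallocated at submit */
};

int main(void)
{
	/* Batch sorted so the append ioend sits at the tail, as in the bug. */
	struct ioend batch[] = {
		{ .needs_trans = true },		/* ioend A */
		{ .holds_append_trans = true },		/* ioend B (append) */
	};

	/* ioend B's append transaction was reserved back at submit time. */
	log_space--;

	for (unsigned int i = 0; i < 2; i++) {
		struct ioend *io = &batch[i];

		if (io->needs_trans && !trans_alloc()) {
			/*
			 * The real completion worker would sleep here waiting
			 * for log space, but the only reservation left is the
			 * one held for ioend B, which this same worker has not
			 * reached yet: nothing can make progress.
			 */
			printf("ioend %u stalls behind the tail append ioend's reservation\n", i);
			return 1;
		}
		if (io->holds_append_trans)
			log_space++;	/* reservation released at completion */
	}

	printf("batch completed\n");
	return 0;
}

With the patch applied, no ioend enters the completion worker already holding
a reservation (the submit-time log_space-- in the model goes away), so each
completion can allocate and release its own transaction in turn.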