Date: Tue, 5 Mar 2024 16:50:24 -0800
From: Krister Johansen
To: kernel-team@lists.ubuntu.com
Subject: [SRU][J][PATCH v2 0/2] KVM: arm64: fix softlockups in stage2_apply_range

BugLink: https://bugs.launchpad.net/bugs/2056227

[Impact]

Tearing down KVM VMs on arm64 can cause softlockups to appear on the console. When terminating VMs with more than 100 GB of memory and 4K pages, the memory unmap time often exceeds 20 seconds, which can trigger the softlockup detector.

Portions of the unmap path also run with interrupts disabled while TLB invalidation instructions execute, which can further contribute to latency problems. My team has observed networking latency problems when the CPU performing the teardown is also assigned to handle a NIC interrupt.

Fortunately, a fix has been in place since Linux 6.1. A small pair of patches modifies stage2_apply_range() to operate on smaller memory ranges before calling cond_resched(). With these patches applied, softlockups are no longer observed when tearing down VMs with large amounts of memory.

Although I also submitted the patches to 5.15 LTS (see the link in the [Backport] section), I'd appreciate it if Ubuntu were willing to take this submission in parallel: the impact has left us unable to use arm64 for KVM until we either migrate our hypervisors to hugepages, pick up this fix, or some combination of the two.
[Backport]

Backport the following fixes from Linux 6.1:

3b5c082bbf KVM: arm64: Work out supported block level at compile time
5994bc9e05 KVM: arm64: Limit stage2_apply_range() batch size to largest block

The fix is in 5994bc9e05; 3b5c082bbf is a dependency that was submitted as part of the same series. The original submission is here:

https://lore.kernel.org/all/20221007234151.461779-1-oliver.upton@linux.dev/

I've also submitted the patches to 5.15 LTS here:

https://lore.kernel.org/stable/cover.1709665227.git.kjlx@templeofstupid.com/

Both fixes cherry-picked cleanly with no conflicts.

[Test]

I ran a variation of the test from 5994bc9e05, as well as my own run of kvm_page_table_test, on a VM with 4K pages and more than 100 GB of memory. Without the patches, softlockups were observed in both tests. With the patches applied, both tests ran without incident. This was tested against both LTS 5.15.150 and linux-aws-5.15.0-1055.

[Potential Regression]

Regression potential is low. These patches have been present in Linux since 6.1 and appear to have needed no further maintenance.

[Change in v2]

I ran format-patch without the --from option, which generated the first series without Oliver as the author. v2 should retain the correct authorship. Apologies for the mistake.

Oliver Upton (2):
  KVM: arm64: Work out supported block level at compile time
  KVM: arm64: Limit stage2_apply_range() batch size to largest block

 arch/arm64/include/asm/kvm_pgtable.h    | 18 +++++++++++++-----
 arch/arm64/include/asm/stage2_pgtable.h | 20 --------------------
 arch/arm64/kvm/mmu.c                    |  9 ++++++++-
 3 files changed, 21 insertions(+), 26 deletions(-)

Acked-by: Tim Gardner
Acked-by: Roxana Nicolescu