From patchwork Tue May 18 15:25:32 2021
X-Patchwork-Submitter: Andrea Righi
X-Patchwork-Id: 1480323
From: Andrea Righi
To: kernel-team@lists.ubuntu.com
Subject: [SRU][F/aws][PATCH v2 0/6] aws: proper fix for c5.18xlarge hibernation issues
Date: Tue, 18 May 2021 17:25:32 +0200
Message-Id: <20210518152538.197174-1-andrea.righi@canonical.com>
List-Id: Kernel team discussions

BugLink: https://bugs.launchpad.net/bugs/1920944

[Impact]

In LP: #1918694 we applied a fix and a workaround to solve the hibernation issues on c5.18xlarge.
The workaround was in the form of a SAUCE patch:

  "UBUNTU: SAUCE: aws: kvm: double the size of hv_clock_boot"

It looks like we can replace this workaround with a proper fix by applying this patch set:

  http://next.patchew.org/Linux/20210414123544.1060604-1-vkuznets@redhat.com/

[Test plan]

Create a c5.18xlarge instance, run the memory stress test script (the same script that we use to stress test hibernation), trigger the hibernate event, then trigger the resume event. Repeat a couple of times and the problem is very likely to occur.

[Fix]

Replace "UBUNTU: SAUCE: aws: kvm: double the size of hv_clock_boot" with:

  http://next.patchew.org/Linux/20210414123544.1060604-1-vkuznets@redhat.com/

The fix has been tested extensively in the AWS infrastructure with positive results.

[Where problems could occur]

The new code introduced by the fix can also be executed when a CPU is put offline, so we may see regressions in KVM CPU hotplugging.

----------------------------------------------------------------
Changelog (v1 -> v2):
 - new patch set from Red Hat

NOTE: backport activity was minimal; it only required some context adjustments to properly apply the changes.

Andrea Righi (1):
  Revert "UBUNTU: SAUCE: aws: kvm: double the size of hv_clock_boot"

Vitaly Kuznetsov (5):
  x86/kvm: Fix pr_info() for async PF setup/teardown
  x86/kvm: Teardown PV features on boot CPU as well
  x86/kvm: Disable kvmclock on all CPUs on shutdown
  x86/kvm: Disable all PV features on crash
  x86/kvm: Unify kvm_pv_guest_cpu_reboot() with kvm_guest_cpu_offline()

 arch/x86/include/asm/kvm_para.h |   9 ++----
 arch/x86/kernel/kvm.c           | 113 ++++++++++++++++++++++++++++++++++++++++++++----------------------
 arch/x86/kernel/kvmclock.c      |  28 ++---------------
 3 files changed, 79 insertions(+), 71 deletions(-)