From patchwork Mon Jan 18 12:26:08 2021
X-Patchwork-Submitter: Nicholas Piggin <npiggin@gmail.com>
X-Patchwork-Id: 1428173
From: Nicholas Piggin <npiggin@gmail.com>
To: kvm-ppc@vger.kernel.org
Cc: Nicholas Piggin <npiggin@gmail.com>
Subject: [PATCH 1/2] KVM: PPC: Book3S HV: Remove shared-TLB optimisation from vCPU TLB coherency logic
Date: Mon, 18 Jan 2021 22:26:08 +1000
Message-Id: <20210118122609.1447366-1-npiggin@gmail.com>
Processors that implement ISA v3.0 or later don't necessarily have
threads in a core sharing all translations, and/or TLBIEL does not
necessarily invalidate translations on all other threads (the
architecture talks only about the effect on translations for "the
thread executing the tlbiel instruction"). While this worked for
POWER9, it may not for future implementations, so remove the
optimisation. A POWER9-specific optimisation would have to check a
POWER9-specific CPU feature bit, if it were to be re-added.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kvm/book3s_hv.c         | 32 +++++++++++++++++++---------
 arch/powerpc/kvm/book3s_hv_builtin.c |  9 ---------
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  |  6 ------
 3 files changed, 22 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 2d8627dbd9f6..752daf43f780 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2588,22 +2588,34 @@ static void radix_flush_cpu(struct kvm *kvm, int cpu, struct kvm_vcpu *vcpu)
 {
 	struct kvm_nested_guest *nested = vcpu->arch.nested;
 	cpumask_t *cpu_in_guest;
+	cpumask_t *need_tlb_flush;
 	int i;
 
-	cpu = cpu_first_thread_sibling(cpu);
 	if (nested) {
-		cpumask_set_cpu(cpu, &nested->need_tlb_flush);
+		need_tlb_flush = &nested->need_tlb_flush;
 		cpu_in_guest = &nested->cpu_in_guest;
 	} else {
-		cpumask_set_cpu(cpu, &kvm->arch.need_tlb_flush);
+		need_tlb_flush = &kvm->arch.need_tlb_flush;
 		cpu_in_guest = &kvm->arch.cpu_in_guest;
 	}
+
+	cpu = cpu_first_thread_sibling(cpu);
+	for (i = 0; i < threads_per_core; ++i)
+		cpumask_set_cpu(cpu + i, need_tlb_flush);
+
 	/*
-	 * Make sure setting of bit in need_tlb_flush precedes
+	 * Make sure setting of bits in need_tlb_flush precedes
 	 * testing of cpu_in_guest bits.  The matching barrier on
 	 * the other side is the first smp_mb() in kvmppc_run_core().
 	 */
 	smp_mb();
+
+	/*
+	 * Pull vcpus out of guests if necessary, such that they'll notice
+	 * the need_tlb_flush bit when they re-enter the guest. If this was
+	 * ever a performance concern, it would be interesting to compare
+	 * with performance of using TLBIE.
+	 */
 	for (i = 0; i < threads_per_core; ++i)
 		if (cpumask_test_cpu(cpu + i, cpu_in_guest))
 			smp_call_function_single(cpu + i, do_nothing, NULL, 1);
@@ -2632,18 +2644,18 @@ static void kvmppc_prepare_radix_vcpu(struct kvm_vcpu *vcpu, int pcpu)
 	 * can move around between pcpus.  To cope with this, when
 	 * a vcpu moves from one pcpu to another, we need to tell
 	 * any vcpus running on the same core as this vcpu previously
-	 * ran to flush the TLB.  The TLB is shared between threads,
-	 * so we use a single bit in .need_tlb_flush for all 4 threads.
+	 * ran to flush the TLB.
 	 */
 	if (prev_cpu != pcpu) {
-		if (prev_cpu >= 0 &&
-		    cpu_first_thread_sibling(prev_cpu) !=
-		    cpu_first_thread_sibling(pcpu))
-			radix_flush_cpu(kvm, prev_cpu, vcpu);
 		if (nested)
 			nested->prev_cpu[vcpu->arch.nested_vcpu_id] = pcpu;
 		else
 			vcpu->arch.prev_cpu = pcpu;
+
+		if (prev_cpu < 0)
+			return; /* first run */
+
+		radix_flush_cpu(kvm, prev_cpu, vcpu);
 	}
 }
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
index f3d3183249fe..dad118760a4e 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -789,15 +789,6 @@ void kvmppc_check_need_tlb_flush(struct kvm *kvm, int pcpu,
 {
 	cpumask_t *need_tlb_flush;
 
-	/*
-	 * On POWER9, individual threads can come in here, but the
-	 * TLB is shared between the 4 threads in a core, hence
-	 * invalidating on one thread invalidates for all.
-	 * Thus we make all 4 threads use the same bit.
-	 */
-	if (cpu_has_feature(CPU_FTR_ARCH_300))
-		pcpu = cpu_first_thread_sibling(pcpu);
-
 	if (nested)
 		need_tlb_flush = &nested->need_tlb_flush;
 	else
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 88da2764c1bb..f87237927096 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -62,12 +62,6 @@ static int global_invalidates(struct kvm *kvm)
 		smp_wmb();
 		cpumask_setall(&kvm->arch.need_tlb_flush);
 		cpu = local_paca->kvm_hstate.kvm_vcore->pcpu;
-		/*
-		 * On POWER9, threads are independent but the TLB is shared,
-		 * so use the bit for the first thread to represent the core.
-		 */
-		if (cpu_has_feature(CPU_FTR_ARCH_300))
-			cpu = cpu_first_thread_sibling(cpu);
 		cpumask_clear_cpu(cpu, &kvm->arch.need_tlb_flush);
 	}
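
A short note for anyone modelling the change: the scheme this patch moves
to can be sketched outside the kernel. The following stand-alone C program
is illustrative only; the flat bitmask, THREADS_PER_CORE, and helpers such
as mark_core_needs_flush() are simplified stand-ins for cpumask_t,
cpu_first_thread_sibling() and the real radix_flush_cpu() /
kvmppc_check_need_tlb_flush() paths. It shows the shape of the new
protocol: every sibling thread gets its own need_tlb_flush bit, and each
thread clears only its own bit with a local flush before guest entry.

/*
 * Illustrative model only (not kernel code): per-thread need_tlb_flush
 * bits, as introduced by this patch.
 */
#include <stdio.h>

#define THREADS_PER_CORE	4

static unsigned long need_tlb_flush;	/* one bit per cpu */

static int first_thread_sibling(int cpu)
{
	/* works because THREADS_PER_CORE is a power of two */
	return cpu & ~(THREADS_PER_CORE - 1);
}

/* Model of radix_flush_cpu(): mark every thread of @cpu's core. */
static void mark_core_needs_flush(int cpu)
{
	int base = first_thread_sibling(cpu);
	int i;

	for (i = 0; i < THREADS_PER_CORE; i++)
		need_tlb_flush |= 1UL << (base + i);
}

/*
 * Model of kvmppc_check_need_tlb_flush(): each thread flushes locally
 * (tlbiel) and clears only its own bit, since tlbiel is only
 * architecturally guaranteed to affect the executing thread.
 */
static void check_need_tlb_flush(int pcpu)
{
	if (need_tlb_flush & (1UL << pcpu)) {
		printf("cpu %d: local flush (tlbiel)\n", pcpu);
		need_tlb_flush &= ~(1UL << pcpu);
	}
}

int main(void)
{
	int i;

	mark_core_needs_flush(5);	/* vcpu previously ran on cpu 5 */
	for (i = 4; i < 8; i++)		/* each sibling flushes itself */
		check_need_tlb_flush(i);
	return 0;
}

Run as-is, it prints one local flush per sibling of cpu 5's core, which is
the per-thread invalidation that replaces the old single shared bit per
core.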
From patchwork Mon Jan 18 12:26:09 2021
X-Patchwork-Submitter: Nicholas Piggin <npiggin@gmail.com>
X-Patchwork-Id: 1428107
From: Nicholas Piggin <npiggin@gmail.com>
To: kvm-ppc@vger.kernel.org
Cc: Nicholas Piggin <npiggin@gmail.com>
Subject: [PATCH 2/2] KVM: PPC: Book3S HV: Optimise TLB flushing when a vcpu moves between threads in a core
Date: Mon, 18 Jan 2021 22:26:09 +1000
Message-Id: <20210118122609.1447366-2-npiggin@gmail.com>
In-Reply-To: <20210118122609.1447366-1-npiggin@gmail.com>
References: <20210118122609.1447366-1-npiggin@gmail.com>

As explained in the comment this patch adds, there is no need to flush
the TLB on all threads in a core when a vcpu moves between threads of
the same core: only the previous thread needs to be flushed. Thread
migrations can be a significant proportion of vcpu migrations, so this
can help reduce TLB flushing and IPI traffic.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
I believe we can do this and keep TLB coherency correct as per the
architecture, but I would appreciate someone else verifying my
thinking.
Thanks,
Nick

 arch/powerpc/kvm/book3s_hv.c | 28 ++++++++++++++++++++++++++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 752daf43f780..53d0cbfe5933 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2584,7 +2584,7 @@ static void kvmppc_release_hwthread(int cpu)
 	tpaca->kvm_hstate.kvm_split_mode = NULL;
 }
 
-static void radix_flush_cpu(struct kvm *kvm, int cpu, struct kvm_vcpu *vcpu)
+static void radix_flush_cpu(struct kvm *kvm, int cpu, bool core, struct kvm_vcpu *vcpu)
 {
 	struct kvm_nested_guest *nested = vcpu->arch.nested;
 	cpumask_t *cpu_in_guest;
@@ -2599,6 +2599,14 @@ static void radix_flush_cpu(struct kvm *kvm, int cpu, struct kvm_vcpu *vcpu)
 		cpu_in_guest = &kvm->arch.cpu_in_guest;
 	}
 
+	if (!core) {
+		cpumask_set_cpu(cpu, need_tlb_flush);
+		smp_mb();
+		if (cpumask_test_cpu(cpu, cpu_in_guest))
+			smp_call_function_single(cpu, do_nothing, NULL, 1);
+		return;
+	}
+
 	cpu = cpu_first_thread_sibling(cpu);
 	for (i = 0; i < threads_per_core; ++i)
 		cpumask_set_cpu(cpu + i, need_tlb_flush);
@@ -2655,7 +2663,23 @@ static void kvmppc_prepare_radix_vcpu(struct kvm_vcpu *vcpu, int pcpu)
 		if (prev_cpu < 0)
 			return; /* first run */
 
-		radix_flush_cpu(kvm, prev_cpu, vcpu);
+		/*
+		 * If changing cores, all threads on the old core should
+		 * flush, because TLBs can be shared between threads. More
+		 * precisely, the thread we previously ran on should be
+		 * flushed, and the thread to first run a vcpu on the old
+		 * core should flush, but we don't keep enough information
+		 * around to track that, so we flush all.
+		 *
+		 * If changing threads in the same core, only the old
+		 * thread need be flushed.
+		 */
+		if (cpu_first_thread_sibling(prev_cpu) !=
+		    cpu_first_thread_sibling(pcpu))
+			radix_flush_cpu(kvm, prev_cpu, true, vcpu);
+		else
+			radix_flush_cpu(kvm, prev_cpu, false, vcpu);
 	}
 }
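
For completeness, a matching stand-alone sketch of the decision this
patch adds; prepare_vcpu() and flush_cpu() are invented stand-ins for
kvmppc_prepare_radix_vcpu() and radix_flush_cpu(). An intra-core move
marks only the previous thread, while a cross-core move still marks
every thread of the old core.

/*
 * Illustrative model only (not kernel code) of the thread-vs-core
 * flush decision added by this patch.
 */
#include <stdbool.h>
#include <stdio.h>

#define THREADS_PER_CORE	4

static int first_thread_sibling(int cpu)
{
	return cpu & ~(THREADS_PER_CORE - 1);
}

/* Stand-in for radix_flush_cpu(kvm, cpu, core, vcpu). */
static void flush_cpu(int cpu, bool core)
{
	int base = first_thread_sibling(cpu);

	if (core)
		printf("core changed: mark cpus %d-%d for flush\n",
		       base, base + THREADS_PER_CORE - 1);
	else
		printf("same core: mark cpu %d for flush\n", cpu);
}

/* Stand-in for the migration check in kvmppc_prepare_radix_vcpu(). */
static void prepare_vcpu(int prev_cpu, int pcpu)
{
	if (prev_cpu == pcpu || prev_cpu < 0)
		return;	/* no migration, or first run */

	flush_cpu(prev_cpu, first_thread_sibling(prev_cpu) !=
			    first_thread_sibling(pcpu));
}

int main(void)
{
	prepare_vcpu(5, 6);	/* move within core 4-7: one thread */
	prepare_vcpu(6, 9);	/* move to core 8-11: whole old core */
	return 0;
}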