From patchwork Mon Dec 18 10:41:06 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Atish Kumar Patra X-Patchwork-Id: 1877348 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; secure) header.d=lists.infradead.org header.i=@lists.infradead.org header.a=rsa-sha256 header.s=bombadil.20210309 header.b=autBBhJW; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=rivosinc-com.20230601.gappssmtp.com header.i=@rivosinc-com.20230601.gappssmtp.com header.a=rsa-sha256 header.s=20230601 header.b=YIkV2pT9; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.infradead.org (client-ip=2607:7c80:54:3::133; helo=bombadil.infradead.org; envelope-from=kvm-riscv-bounces+incoming=patchwork.ozlabs.org@lists.infradead.org; receiver=patchwork.ozlabs.org) Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4StxG84xgzz23yd for ; Mon, 18 Dec 2023 21:41:52 +1100 (AEDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To: Message-Id:Date:Subject:Cc:To:From:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=WuyaTFbQWTIqIEPL2h68nYfd9fJ9FSkmegePN6+JRvY=; b=autBBhJW/YIUws GyNuSkETghy+gar3IsiLz122xGE/ccq/j/SYTxYfZxRWO+Dc9MZe7Nn8VKK4y+FhouCi8NunleUlW 7fMke30DJQ2B30yKMpejwOT80V2QqAxMaWs1Xs7H2Jfk5556mBXPBrpvYsuO48iWWQIOK09E+LR4Q 3Prsx/1UMy8dY9ImY0AWxsSPX22fk7sUWAPZnv3A/ABBNcnIX2218HqQtdVvBwBqSmKLMg5yCUlk4 sYRRVCoTw7bGnjx5Ne9YfyiLF90TJdD0tXQSODcIi0jnFwDAGiV0t4uqR+ld9JaUoI7LXx6AofRBD JER6as5Bx+OF55RnUAAQ==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.96 #2 (Red Hat Linux)) id 1rFB4H-009rgn-1w; Mon, 18 Dec 2023 10:41:49 +0000 Received: from mail-ot1-x32e.google.com ([2607:f8b0:4864:20::32e]) by bombadil.infradead.org with esmtps (Exim 4.96 #2 (Red Hat Linux)) id 1rFB42-009rOu-2O for kvm-riscv@lists.infradead.org; Mon, 18 Dec 2023 10:41:38 +0000 Received: by mail-ot1-x32e.google.com with SMTP id 46e09a7af769-6d9d29a2332so1256780a34.0 for ; Mon, 18 Dec 2023 02:41:32 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=rivosinc-com.20230601.gappssmtp.com; s=20230601; t=1702896092; x=1703500892; darn=lists.infradead.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=yWhxjJQFHhpgCrXyYb1BAisy4P65ruL5Q7URr418uvg=; b=YIkV2pT93zjjS48TCYRHN3f9miaPV6mNQfBfDqe/f4kenk1tK5tEyIbGZhgoEIGWJ4 h9jOd5xGmOrMm2RRVf49szbHImsWzOepFw2VMflxyB+p6vNqCBWSNF6/SFH6ijRn6YJO 3hzULL43XX8hBea5ff1mMQi09/tFtpRVhqXCdfrzL/JvvLgLieppbnIn7vlhYch3MDSt fpbVQNMuK0f4hmjfq6ERrTqRQHM+Ta5K255FpRFLUAc75Ct0VeAa4kTOTZQHo3rnVshj kWOB8UyQ4b5sftZ2E2TCbThjs6QzdFVUd1Erey/rhXEy/dR7Y/FWZ9Vj+QbiBbOQmI11 t92A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1702896092; x=1703500892; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=yWhxjJQFHhpgCrXyYb1BAisy4P65ruL5Q7URr418uvg=; b=cqyyXCtgUM5geKLD6pAaqQNdS4yuquSqY/m+9kPh0XBeHzIACPS6ad0qFQIYk7JZz+ wJ330dhewQWlvv0N0Lad/qVk2V5dZGitQGIogA67psWUWiEepoy7N2hwTHOFPqoxCHen GlvIHl1JqM+jwvQwAAbu7zA/Y8ztDvRmZxJDoWlintGhxUFk3UJllRzJHAmizqdSaaho hwqYSPNqgiDxlWRd8hOJwIqBVMepVuoXZ1zO0MKayVOEF/H4WwueYSTRgRu7FSKebrDC K38NcetYHxn99Q0HZegDa3gimJLAI07vbqVF32KzjEqvlyXZxo5afPVeyanrPZSPUHKl o2hg== X-Gm-Message-State: AOJu0YwGrw69ErVC4dvLkVdSUeiWOugG+0Y5aVVVG7K0eCkZkt6Uzzxj /t0IQh0usHOrYbDZ7RnSNDq1fQ== X-Google-Smtp-Source: AGHT+IExF7GaBDAb4dryRnXzB1yMb8xukypd5iJOpRABiVVaLNBvK9ZvR+T2QdN+D8mJUg+T3JtTQQ== X-Received: by 2002:a05:6830:358a:b0:6d7:ea6b:947f with SMTP id bd10-20020a056830358a00b006d7ea6b947fmr6141722otb.1.1702896091907; Mon, 18 Dec 2023 02:41:31 -0800 (PST) Received: from atishp.ba.rivosinc.com ([64.71.180.162]) by smtp.gmail.com with ESMTPSA id 185-20020a4a1ac2000000b005907ad9f302sm574970oof.37.2023.12.18.02.41.30 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 18 Dec 2023 02:41:31 -0800 (PST) From: Atish Patra To: linux-kernel@vger.kernel.org Cc: Atish Patra , Albert Ou , Alexandre Ghiti , Andrew Jones , Anup Patel , Atish Patra , Conor Dooley , Guo Ren , Icenowy Zheng , kvm-riscv@lists.infradead.org, kvm@vger.kernel.org, linux-riscv@lists.infradead.org, Mark Rutland , Palmer Dabbelt , Paul Walmsley , Will Deacon Subject: [v1 09/10] RISC-V: KVM: Add perf sampling support for guests Date: Mon, 18 Dec 2023 02:41:06 -0800 Message-Id: <20231218104107.2976925-10-atishp@rivosinc.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231218104107.2976925-1-atishp@rivosinc.com> References: <20231218104107.2976925-1-atishp@rivosinc.com> MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20231218_024134_811041_A2321E35 X-CRM114-Status: GOOD ( 27.59 ) X-Spam-Score: 0.0 (/) X-Spam-Report: Spam detection software, running on the system "bombadil.infradead.org", has NOT identified this incoming email as spam. The original message has been attached to this so you can view it or label similar future email. If you have any questions, see the administrator of that system for details. Content preview: KVM enables perf for guest via counter virtualization. However, the sampling can not be supported as there is no mechanism to enabled trap/emulate scountovf in ISA yet. Rely on the SBI PMU snapshot to [...] Content analysis details: (0.0 points, 5.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at https://www.dnswl.org/, no trust [2607:f8b0:4864:20:0:0:0:32e listed in] [list.dnswl.org] -0.0 SPF_PASS SPF: sender matches SPF record 0.0 SPF_HELO_NONE SPF: HELO does not publish an SPF Record -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature 0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily valid X-BeenThere: kvm-riscv@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "kvm-riscv" Errors-To: kvm-riscv-bounces+incoming=patchwork.ozlabs.org@lists.infradead.org KVM enables perf for guest via counter virtualization. However, the sampling can not be supported as there is no mechanism to enabled trap/emulate scountovf in ISA yet. Rely on the SBI PMU snapshot to provide the counter overflow data via the shared memory. In case of sampling event, the host first guest the LCOFI interrupt and injects to the guest via irq filtering mechanism defined in AIA specification. Thus, ssaia must be enabled in the host in order to use perf sampling in the guest. No other AIA dpeendancy w.r.t kernel is required. Signed-off-by: Atish Patra --- arch/riscv/include/asm/csr.h | 3 +- arch/riscv/include/asm/kvm_vcpu_pmu.h | 1 + arch/riscv/include/uapi/asm/kvm.h | 1 + arch/riscv/kvm/main.c | 1 + arch/riscv/kvm/vcpu.c | 8 +-- arch/riscv/kvm/vcpu_onereg.c | 2 + arch/riscv/kvm/vcpu_pmu.c | 70 +++++++++++++++++++++++++-- 7 files changed, 77 insertions(+), 9 deletions(-) diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h index 88cdc8a3e654..bec09b33e2f0 100644 --- a/arch/riscv/include/asm/csr.h +++ b/arch/riscv/include/asm/csr.h @@ -168,7 +168,8 @@ #define VSIP_TO_HVIP_SHIFT (IRQ_VS_SOFT - IRQ_S_SOFT) #define VSIP_VALID_MASK ((_AC(1, UL) << IRQ_S_SOFT) | \ (_AC(1, UL) << IRQ_S_TIMER) | \ - (_AC(1, UL) << IRQ_S_EXT)) + (_AC(1, UL) << IRQ_S_EXT) | \ + (_AC(1, UL) << IRQ_PMU_OVF)) /* AIA CSR bits */ #define TOPI_IID_SHIFT 16 diff --git a/arch/riscv/include/asm/kvm_vcpu_pmu.h b/arch/riscv/include/asm/kvm_vcpu_pmu.h index 5e6fc9ac2b90..931887e07a66 100644 --- a/arch/riscv/include/asm/kvm_vcpu_pmu.h +++ b/arch/riscv/include/asm/kvm_vcpu_pmu.h @@ -36,6 +36,7 @@ struct kvm_pmc { bool started; /* Monitoring event ID */ unsigned long event_idx; + struct kvm_vcpu *vcpu; }; /* PMU data structure per vcpu */ diff --git a/arch/riscv/include/uapi/asm/kvm.h b/arch/riscv/include/uapi/asm/kvm.h index 60d3b21dead7..741c16f4518e 100644 --- a/arch/riscv/include/uapi/asm/kvm.h +++ b/arch/riscv/include/uapi/asm/kvm.h @@ -139,6 +139,7 @@ enum KVM_RISCV_ISA_EXT_ID { KVM_RISCV_ISA_EXT_ZIHPM, KVM_RISCV_ISA_EXT_SMSTATEEN, KVM_RISCV_ISA_EXT_ZICOND, + KVM_RISCV_ISA_EXT_SSCOFPMF, KVM_RISCV_ISA_EXT_MAX, }; diff --git a/arch/riscv/kvm/main.c b/arch/riscv/kvm/main.c index 225a435d9c9a..5a3a4cee0e3d 100644 --- a/arch/riscv/kvm/main.c +++ b/arch/riscv/kvm/main.c @@ -43,6 +43,7 @@ int kvm_arch_hardware_enable(void) csr_write(CSR_HCOUNTEREN, 0x02); csr_write(CSR_HVIP, 0); + csr_write(CSR_HVIEN, 1UL << IRQ_PMU_OVF); kvm_riscv_aia_enable(); diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c index e087c809073c..2d9f252356c3 100644 --- a/arch/riscv/kvm/vcpu.c +++ b/arch/riscv/kvm/vcpu.c @@ -380,7 +380,8 @@ int kvm_riscv_vcpu_set_interrupt(struct kvm_vcpu *vcpu, unsigned int irq) if (irq < IRQ_LOCAL_MAX && irq != IRQ_VS_SOFT && irq != IRQ_VS_TIMER && - irq != IRQ_VS_EXT) + irq != IRQ_VS_EXT && + irq != IRQ_PMU_OVF) return -EINVAL; set_bit(irq, vcpu->arch.irqs_pending); @@ -395,14 +396,15 @@ int kvm_riscv_vcpu_set_interrupt(struct kvm_vcpu *vcpu, unsigned int irq) int kvm_riscv_vcpu_unset_interrupt(struct kvm_vcpu *vcpu, unsigned int irq) { /* - * We only allow VS-mode software, timer, and external + * We only allow VS-mode software, timer, counter overflow and external * interrupts when irq is one of the local interrupts * defined by RISC-V privilege specification. */ if (irq < IRQ_LOCAL_MAX && irq != IRQ_VS_SOFT && irq != IRQ_VS_TIMER && - irq != IRQ_VS_EXT) + irq != IRQ_VS_EXT && + irq != IRQ_PMU_OVF) return -EINVAL; clear_bit(irq, vcpu->arch.irqs_pending); diff --git a/arch/riscv/kvm/vcpu_onereg.c b/arch/riscv/kvm/vcpu_onereg.c index 855d12b6a4a5..e926681398f2 100644 --- a/arch/riscv/kvm/vcpu_onereg.c +++ b/arch/riscv/kvm/vcpu_onereg.c @@ -36,6 +36,7 @@ static const unsigned long kvm_isa_ext_arr[] = { /* Multi letter extensions (alphabetically sorted) */ KVM_ISA_EXT_ARR(SMSTATEEN), KVM_ISA_EXT_ARR(SSAIA), + KVM_ISA_EXT_ARR(SSCOFPMF), KVM_ISA_EXT_ARR(SSTC), KVM_ISA_EXT_ARR(SVINVAL), KVM_ISA_EXT_ARR(SVNAPOT), @@ -88,6 +89,7 @@ static bool kvm_riscv_vcpu_isa_disable_allowed(unsigned long ext) case KVM_RISCV_ISA_EXT_I: case KVM_RISCV_ISA_EXT_M: case KVM_RISCV_ISA_EXT_SSTC: + case KVM_RISCV_ISA_EXT_SSCOFPMF: case KVM_RISCV_ISA_EXT_SVINVAL: case KVM_RISCV_ISA_EXT_SVNAPOT: case KVM_RISCV_ISA_EXT_ZBA: diff --git a/arch/riscv/kvm/vcpu_pmu.c b/arch/riscv/kvm/vcpu_pmu.c index a6e9c2132e24..368c3a8a8d4a 100644 --- a/arch/riscv/kvm/vcpu_pmu.c +++ b/arch/riscv/kvm/vcpu_pmu.c @@ -229,6 +229,47 @@ static int kvm_pmu_validate_counter_mask(struct kvm_pmu *kvpmu, unsigned long ct return 0; } +static void kvm_riscv_pmu_overflow(struct perf_event *perf_event, + struct perf_sample_data *data, + struct pt_regs *regs) +{ + struct kvm_pmc *pmc = perf_event->overflow_handler_context; + struct kvm_vcpu *vcpu = pmc->vcpu; + struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu); + struct riscv_pmu *rpmu = to_riscv_pmu(perf_event->pmu); + u64 period; + + /* + * Stop the event counting by directly accessing the perf_event. + * Otherwise, this needs to deferred via a workqueue. + * That will introduce skew in the counter value because the actual + * physical counter would start after returning from this function. + * It will be stopped again once the workqueue is scheduled + */ + rpmu->pmu.stop(perf_event, PERF_EF_UPDATE); + + /* + * The hw counter would start automatically when this function returns. + * Thus, the host may continue to interrupt and inject it to the guest + * even without the guest configuring the next event. Depending on the hardware + * the host may have some sluggishness only if privilege mode filtering is not + * available. In an ideal world, where qemu is not the only capable hardware, + * this can be removed. + * FYI: ARM64 does this way while x86 doesn't do anything as such. + * TODO: Should we keep it for RISC-V ? + */ + period = -(local64_read(&perf_event->count)); + + local64_set(&perf_event->hw.period_left, 0); + perf_event->attr.sample_period = period; + perf_event->hw.sample_period = period; + + set_bit(pmc->idx, kvpmu->pmc_overflown); + kvm_riscv_vcpu_set_interrupt(vcpu, IRQ_PMU_OVF); + + rpmu->pmu.start(perf_event, PERF_EF_RELOAD); +} + static long kvm_pmu_create_perf_event(struct kvm_pmc *pmc, struct perf_event_attr *attr, unsigned long flags, unsigned long eidx, unsigned long evtdata) @@ -248,7 +289,7 @@ static long kvm_pmu_create_perf_event(struct kvm_pmc *pmc, struct perf_event_att */ attr->sample_period = kvm_pmu_get_sample_period(pmc); - event = perf_event_create_kernel_counter(attr, -1, current, NULL, pmc); + event = perf_event_create_kernel_counter(attr, -1, current, kvm_riscv_pmu_overflow, pmc); if (IS_ERR(event)) { pr_err("kvm pmu event creation failed for eidx %lx: %ld\n", eidx, PTR_ERR(event)); return PTR_ERR(event); @@ -473,6 +514,12 @@ int kvm_riscv_vcpu_pmu_ctr_start(struct kvm_vcpu *vcpu, unsigned long ctr_base, } } + /* The guest have serviced the interrupt and starting the counter again */ + if (test_bit(IRQ_PMU_OVF, vcpu->arch.irqs_pending)) { + clear_bit(pmc_index, kvpmu->pmc_overflown); + kvm_riscv_vcpu_unset_interrupt(vcpu, IRQ_PMU_OVF); + } + out: retdata->err_val = sbiret; @@ -539,7 +586,13 @@ int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base, else if (pmc->perf_event) pmc->counter_val += perf_event_read_value(pmc->perf_event, &enabled, &running); - /* TODO: Add counter overflow support when sscofpmf support is added */ + /* + * The counter and overflow indicies in the snapshot region are w.r.to + * cbase. Modify the set bit in the counter mask instead of the pmc_index + * which indicates the absolute counter index. + */ + if (test_bit(pmc_index, kvpmu->pmc_overflown)) + kvpmu->sdata->ctr_overflow_mask |= (1UL << i); kvpmu->sdata->ctr_values[i] = pmc->counter_val; kvm_vcpu_write_guest(vcpu, kvpmu->snapshot_addr, kvpmu->sdata, sizeof(struct riscv_pmu_snapshot_data)); @@ -548,15 +601,20 @@ int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base, if (flags & SBI_PMU_STOP_FLAG_RESET) { pmc->event_idx = SBI_PMU_EVENT_IDX_INVALID; clear_bit(pmc_index, kvpmu->pmc_in_use); + clear_bit(pmc_index, kvpmu->pmc_overflown); if (bSnapshot) { /* Clear the snapshot area for the upcoming deletion event */ kvpmu->sdata->ctr_values[i] = 0; + /* + * Only clear the given counter as the caller is responsible to + * validate both the overflow mask and configured counters. + */ + kvpmu->sdata->ctr_overflow_mask &= ~(1UL << i); kvm_vcpu_write_guest(vcpu, kvpmu->snapshot_addr, kvpmu->sdata, sizeof(struct riscv_pmu_snapshot_data)); } } } - out: retdata->err_val = sbiret; @@ -699,6 +757,7 @@ void kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu) pmc = &kvpmu->pmc[i]; pmc->idx = i; pmc->event_idx = SBI_PMU_EVENT_IDX_INVALID; + pmc->vcpu = vcpu; if (i < kvpmu->num_hw_ctrs) { pmc->cinfo.type = SBI_PMU_CTR_TYPE_HW; if (i < 3) @@ -731,13 +790,14 @@ void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu) if (!kvpmu) return; - for_each_set_bit(i, kvpmu->pmc_in_use, RISCV_MAX_COUNTERS) { + for_each_set_bit(i, kvpmu->pmc_in_use, RISCV_KVM_MAX_COUNTERS) { pmc = &kvpmu->pmc[i]; pmc->counter_val = 0; kvm_pmu_release_perf_event(pmc); pmc->event_idx = SBI_PMU_EVENT_IDX_INVALID; } - bitmap_zero(kvpmu->pmc_in_use, RISCV_MAX_COUNTERS); + bitmap_zero(kvpmu->pmc_in_use, RISCV_KVM_MAX_COUNTERS); + bitmap_zero(kvpmu->pmc_overflown, RISCV_KVM_MAX_COUNTERS); memset(&kvpmu->fw_event, 0, SBI_PMU_FW_MAX * sizeof(struct kvm_fw_event)); kvm_pmu_clear_snapshot_area(vcpu); }