From patchwork Thu Jun 13 14:19:02 2024
From: Max Chou <max.chou@sifive.com>
To: qemu-devel@nongnu.org, qemu-riscv@nongnu.org
Cc: Richard Henderson, Paolo Bonzini, Palmer Dabbelt, Alistair Francis,
    Bin Meng, Weiwei Li, Daniel Henrique Barboza, Liu Zhiwei, Max Chou
Subject: [RFC PATCH v3 1/5] accel/tcg: Avoid unnecessary call overhead from
 qemu_plugin_vcpu_mem_cb
Date: Thu, 13 Jun 2024 22:19:02 +0800
Message-Id: <20240613141906.1276105-2-max.chou@sifive.com>
In-Reply-To: <20240613141906.1276105-1-max.chou@sifive.com>
References: <20240613141906.1276105-1-max.chou@sifive.com>

If no QEMU plugin memory callback functions are registered, checking
before calling qemu_plugin_vcpu_mem_cb avoids the overhead of the
function call.
Signed-off-by: Max Chou <max.chou@sifive.com>
---
 accel/tcg/ldst_common.c.inc | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/accel/tcg/ldst_common.c.inc b/accel/tcg/ldst_common.c.inc
index c82048e377e..87ceb954873 100644
--- a/accel/tcg/ldst_common.c.inc
+++ b/accel/tcg/ldst_common.c.inc
@@ -125,7 +125,9 @@ void helper_st_i128(CPUArchState *env, uint64_t addr, Int128 val, MemOpIdx oi)
 
 static void plugin_load_cb(CPUArchState *env, abi_ptr addr, MemOpIdx oi)
 {
-    qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_R);
+    if (cpu_plugin_mem_cbs_enabled(env_cpu(env))) {
+        qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_R);
+    }
 }
 
 uint8_t cpu_ldb_mmu(CPUArchState *env, abi_ptr addr, MemOpIdx oi, uintptr_t ra)
@@ -188,7 +190,9 @@ Int128 cpu_ld16_mmu(CPUArchState *env, abi_ptr addr,
 
 static void plugin_store_cb(CPUArchState *env, abi_ptr addr, MemOpIdx oi)
 {
-    qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_W);
+    if (cpu_plugin_mem_cbs_enabled(env_cpu(env))) {
+        qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_W);
+    }
 }
 
 void cpu_stb_mmu(CPUArchState *env, abi_ptr addr, uint8_t val,

From patchwork Thu Jun 13 14:19:03 2024
From: Max Chou <max.chou@sifive.com>
To: qemu-devel@nongnu.org, qemu-riscv@nongnu.org
Cc: Richard Henderson, Paolo Bonzini, Palmer Dabbelt, Alistair Francis,
    Bin Meng, Weiwei Li, Daniel Henrique Barboza, Liu Zhiwei, Max Chou
Subject: [RFC PATCH v3 2/5] target/riscv: rvv: Provide a fast path using
 direct access to host ram for unmasked unit-stride load/store
Date: Thu, 13 Jun 2024 22:19:03 +0800
Message-Id: <20240613141906.1276105-3-max.chou@sifive.com>
In-Reply-To: <20240613141906.1276105-1-max.chou@sifive.com>
References: <20240613141906.1276105-1-max.chou@sifive.com>

This commit follows the approach of the ARM target's sve_ldN_r/sve_stN_r
helper functions to optimize the vector unmasked unit-stride load/store
instructions:

* Get a loose bound of the active elements.
* Probe the pages, resolve the host memory addresses, and handle
  watchpoints at the beginning.
* Provide a new interface for direct access to host memory.

The original element load/store interface is replaced by new element
load/store functions with _tlb and _host suffixes, which perform the
element access through the original softmmu flow and through direct
host memory access, respectively.
Signed-off-by: Max Chou <max.chou@sifive.com>
---
 target/riscv/insn_trans/trans_rvv.c.inc |   3 +
 target/riscv/vector_helper.c            | 630 ++++++++++++++++++------
 target/riscv/vector_internals.h         |  48 ++
 3 files changed, 544 insertions(+), 137 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvv.c.inc b/target/riscv/insn_trans/trans_rvv.c.inc
index 3a3896ba06c..14e10568bd7 100644
--- a/target/riscv/insn_trans/trans_rvv.c.inc
+++ b/target/riscv/insn_trans/trans_rvv.c.inc
@@ -770,6 +770,7 @@ static bool ld_us_mask_op(DisasContext *s, arg_vlm_v *a, uint8_t eew)
     /* Mask destination register are always tail-agnostic */
     data = FIELD_DP32(data, VDATA, VTA, s->cfg_vta_all_1s);
     data = FIELD_DP32(data, VDATA, VMA, s->vma);
+    data = FIELD_DP32(data, VDATA, VM, 1);
     return ldst_us_trans(a->rd, a->rs1, data, fn, s, false);
 }
 
@@ -787,6 +788,7 @@ static bool st_us_mask_op(DisasContext *s, arg_vsm_v *a, uint8_t eew)
     /* EMUL = 1, NFIELDS = 1 */
     data = FIELD_DP32(data, VDATA, LMUL, 0);
     data = FIELD_DP32(data, VDATA, NF, 1);
+    data = FIELD_DP32(data, VDATA, VM, 1);
     return ldst_us_trans(a->rd, a->rs1, data, fn, s, true);
 }
 
@@ -1106,6 +1108,7 @@ static bool ldst_whole_trans(uint32_t vd, uint32_t rs1, uint32_t nf,
     TCGv_i32 desc;
 
     uint32_t data = FIELD_DP32(0, VDATA, NF, nf);
+    data = FIELD_DP32(data, VDATA, VM, 1);
     dest = tcg_temp_new_ptr();
     desc = tcg_constant_i32(simd_desc(s->cfg_ptr->vlenb,
                                       s->cfg_ptr->vlenb, data));
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 1b4d5a8e378..d33ba5aeca1 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -29,6 +29,7 @@
 #include "tcg/tcg-gvec-desc.h"
 #include "internals.h"
 #include "vector_internals.h"
+#include "hw/core/tcg-cpu-ops.h"
 #include
 
 target_ulong HELPER(vsetvl)(CPURISCVState *env, target_ulong s1,
@@ -136,6 +137,263 @@ static void probe_pages(CPURISCVState *env, target_ulong addr,
     }
 }
 
+/*
+ * Find first active element on each page, and a loose bound for the
+ * final element on each page.  Identify any single element that spans
+ * the page boundary.  Return true if there are any active elements.
+ */
+static bool vext_cont_ldst_elements(RVVContLdSt *info, target_ulong addr,
+                                    void *v0, uint32_t vstart, uint32_t evl,
+                                    uint32_t desc, uint32_t log2_esz,
+                                    bool is_us_whole)
+{
+    uint32_t vm = vext_vm(desc);
+    uint32_t nf = vext_nf(desc);
+    uint32_t max_elems = vext_max_elems(desc, log2_esz);
+    uint32_t esz = 1 << log2_esz;
+    uint32_t msize = is_us_whole ? esz : nf * esz;
+    int32_t reg_idx_first = -1, reg_idx_last = -1, reg_idx_split;
+    int32_t mem_off_last, mem_off_split;
+    int32_t page_split, elt_split;
+    int32_t i;
+
+    /* Set all of the element indices to -1, and the TLB data to 0. */
+    memset(info, -1, offsetof(RVVContLdSt, page));
+    memset(info->page, 0, sizeof(info->page));
+
+    /* Gross scan over the mask register v0 to find bounds. */
+    if (vm == 0) {
+        for (i = vstart; i < evl; ++i) {
+            if (vext_elem_mask(v0, i)) {
+                reg_idx_last = i;
+                if (reg_idx_first < 0) {
+                    reg_idx_first = i;
+                }
+            }
+        }
+    } else {
+        reg_idx_first = vstart;
+        reg_idx_last = evl - 1;
+    }
+
+    if (unlikely(reg_idx_first < 0)) {
+        /* No active elements, no pages touched. */
+        return false;
+    }
+    tcg_debug_assert(reg_idx_last >= 0 && reg_idx_last < max_elems);
+
+    info->reg_idx_first[0] = reg_idx_first;
+    info->mem_off_first[0] = reg_idx_first * msize;
+    mem_off_last = reg_idx_last * msize;
+
+    page_split = -(addr | TARGET_PAGE_MASK);
+    if (likely(mem_off_last + msize <= page_split)) {
+        /* The entire operation fits within a single page. */
+        info->reg_idx_last[0] = reg_idx_last;
+        return true;
+    }
+
+    info->page_split = page_split;
+    elt_split = page_split / msize;
+    reg_idx_split = elt_split;
+    mem_off_split = elt_split * msize;
+
+    /*
+     * This is the last full element on the first page, but it is not
+     * necessarily active.  If there is no full element, i.e. the first
+     * active element is the one that's split, this value remains -1.
+     * It is useful as iteration bounds.
+     */
+    if (elt_split != 0) {
+        info->reg_idx_last[0] = reg_idx_split - 1;
+    }
+
+    /* Determine if an unaligned element spans the pages. */
+    if (page_split % msize != 0) {
+        /* It is helpful to know if the split element is active. */
+        if (vm == 1 || (vm == 0 && vext_elem_mask(v0, reg_idx_split))) {
+            info->reg_idx_split = reg_idx_split;
+            info->mem_off_split = mem_off_split;
+
+            if (reg_idx_split == reg_idx_last) {
+                /* The page crossing element is last. */
+                return true;
+            }
+        }
+        reg_idx_split++;
+        mem_off_split += msize;
+    }
+
+    /*
+     * We do want the first active element on the second page, because
+     * this may affect the address reported in an exception.
+     */
+    if (vm == 0) {
+        for (; reg_idx_split < evl; ++reg_idx_split) {
+            if (vext_elem_mask(v0, reg_idx_split)) {
+                break;
+            }
+        }
+    }
+    tcg_debug_assert(reg_idx_split <= reg_idx_last);
+    info->reg_idx_first[1] = reg_idx_split;
+    info->mem_off_first[1] = reg_idx_split * msize;
+    info->reg_idx_last[1] = reg_idx_last;
+    return true;
+}
+
+/*
+ * Resolve the guest virtual address to info->host and info->flags.
+ * If @nofault, return false if the page is invalid, otherwise
+ * exit via page fault exception.
+ */
+static bool vext_probe_page(CPURISCVState *env, RVVHostPage *info,
+                            bool nofault, target_ulong addr, int mem_off,
+                            int size, MMUAccessType access_type, int mmu_idx,
+                            uintptr_t ra)
+{
+    int flags;
+
+    addr += mem_off;
+
+#ifdef CONFIG_USER_ONLY
+    flags = probe_access_flags(env, adjust_addr(env, addr), size, access_type,
+                               mmu_idx, nofault, &info->host, ra);
+#else
+    CPUTLBEntryFull *full;
+    flags = probe_access_full(env, adjust_addr(env, addr), size, access_type,
+                              mmu_idx, nofault, &info->host, &full, ra);
+#endif
+    info->flags = flags;
+
+    if (flags & TLB_INVALID_MASK) {
+        g_assert(nofault);
+        return false;
+    }
+
+#ifdef CONFIG_USER_ONLY
+    memset(&info->attrs, 0, sizeof(info->attrs));
+#else
+    info->attrs = full->attrs;
+#endif
+
+    /* Ensure that info->host[] is relative to addr, not addr + mem_off. */
+    info->host -= mem_off;
+    return true;
+}
+
+/*
+ * Resolve the guest virtual addresses to info->page[].
+ * Control the generation of page faults with @fault.  Return false if
+ * there is no work to do, which can only happen with @fault == FAULT_NO.
+ */
+static bool vext_cont_ldst_pages(CPURISCVState *env, RVVContLdSt *info,
+                                 target_ulong addr, bool is_load,
+                                 uint32_t desc, uint32_t esz, uintptr_t ra)
+{
+    uint32_t vm = vext_vm(desc);
+    bool nofault = (vm == 1 ? false : true);
+    int mmu_index = riscv_env_mmu_index(env, false);
+    int mem_off = info->mem_off_first[0];
+    int size = (info->reg_idx_last[0] - info->reg_idx_first[0] + 1) * esz;
+    MMUAccessType access_type = is_load ? MMU_DATA_LOAD : MMU_DATA_STORE;
+    bool have_work;
+
+    have_work = vext_probe_page(env, &info->page[0], nofault, addr, mem_off,
+                                size, access_type, mmu_index, ra);
+    if (!have_work) {
+        /* No work to be done. */
+        return false;
+    }
+
+    if (likely(info->page_split < 0)) {
+        /* The entire operation was on the one page. */
+        return true;
+    }
+
+    /*
+     * If the second page is invalid, then we want the fault address to be
+     * the first byte on that page which is accessed.
+     */
+    if (info->mem_off_split >= 0) {
+        /*
+         * There is an element split across the pages.  The fault address
+         * should be the first byte of the second page.
+         */
+        mem_off = info->page_split;
+    } else {
+        /*
+         * There is no element split across the pages.  The fault address
+         * should be the first active element on the second page.
+         */
+        mem_off = info->mem_off_first[1];
+    }
+    size = info->reg_idx_last[1] * esz - mem_off + esz;
+    have_work |= vext_probe_page(env, &info->page[1], nofault, addr, mem_off,
+                                 size, access_type, mmu_index, ra);
+    return have_work;
+}
+
+#ifndef CONFIG_USER_ONLY
+void vext_cont_ldst_watchpoints(CPURISCVState *env, RVVContLdSt *info,
+                                uint64_t *v0, target_ulong addr,
+                                uint32_t esz, bool is_load, uintptr_t ra,
+                                uint32_t desc)
+{
+    int32_t i;
+    intptr_t mem_off, reg_off, reg_last;
+    uint32_t vm = vext_vm(desc);
+    int wp_access = is_load == true ? BP_MEM_READ : BP_MEM_WRITE;
+    int flags0 = info->page[0].flags;
+    int flags1 = info->page[1].flags;
+
+    if (likely(!((flags0 | flags1) & TLB_WATCHPOINT))) {
+        return;
+    }
+
+    /* Indicate that watchpoints are handled. */
+    info->page[0].flags = flags0 & ~TLB_WATCHPOINT;
+    info->page[1].flags = flags1 & ~TLB_WATCHPOINT;
+
+    if (flags0 & TLB_WATCHPOINT) {
+        mem_off = info->mem_off_first[0];
+        reg_off = info->reg_idx_first[0];
+        reg_last = info->reg_idx_last[0];
+
+        for (i = reg_off; i < reg_last; ++i, mem_off += esz) {
+            if (vm == 1 || (vm == 0 && vext_elem_mask(v0, i))) {
+                cpu_check_watchpoint(env_cpu(env),
+                                     adjust_addr(env, addr + mem_off), esz,
+                                     info->page[0].attrs, wp_access, ra);
+            }
+        }
+    }
+
+    mem_off = info->mem_off_split;
+    if (mem_off >= 0) {
+        if (vm == 1 || (vm == 0 && vext_elem_mask(v0, mem_off / esz))) {
+            cpu_check_watchpoint(env_cpu(env),
+                                 adjust_addr(env, addr + mem_off), esz,
+                                 info->page[0].attrs, wp_access, ra);
+        }
+    }
+
+    mem_off = info->mem_off_first[1];
+    if ((flags1 & TLB_WATCHPOINT) && mem_off >= 0) {
+        reg_off = info->reg_idx_first[1];
+        reg_last = info->reg_idx_last[1];
+
+        for (i = reg_off; i < reg_last; ++i, mem_off += esz) {
+            if (vm == 1 || (vm == 0 && vext_elem_mask(v0, i))) {
+                cpu_check_watchpoint(env_cpu(env),
+                                     adjust_addr(env, addr + mem_off), esz,
+                                     info->page[1].attrs, wp_access, ra);
+            }
+        }
+    }
+}
+#endif
+
 static inline void vext_set_elem_mask(void *v0, int index,
                                       uint8_t value)
 {
@@ -146,34 +404,51 @@ static inline void vext_set_elem_mask(void *v0, int index,
 }
 
 /* elements operations for load and store */
-typedef void vext_ldst_elem_fn(CPURISCVState *env, abi_ptr addr,
-                               uint32_t idx, void *vd, uintptr_t retaddr);
+typedef void vext_ldst_elem_fn_tlb(CPURISCVState *env, abi_ptr addr,
+                                   uint32_t idx, void *vd, uintptr_t retaddr);
+typedef void vext_ldst_elem_fn_host(void *vd, uint32_t idx, void *host);
 
-#define GEN_VEXT_LD_ELEM(NAME, ETYPE, H, LDSUF)            \
-static void NAME(CPURISCVState *env, abi_ptr addr,         \
-                 uint32_t idx, void *vd, uintptr_t retaddr)\
-{                                                          \
-    ETYPE *cur = ((ETYPE *)vd + H(idx));                   \
-    *cur = cpu_##LDSUF##_data_ra(env, addr, retaddr);      \
-}                                                          \
-
-GEN_VEXT_LD_ELEM(lde_b, int8_t,  H1, ldsb)
-GEN_VEXT_LD_ELEM(lde_h, int16_t, H2, ldsw)
-GEN_VEXT_LD_ELEM(lde_w, int32_t, H4, ldl)
-GEN_VEXT_LD_ELEM(lde_d, int64_t, H8, ldq)
-
-#define GEN_VEXT_ST_ELEM(NAME, ETYPE, H, STSUF)            \
-static void NAME(CPURISCVState *env, abi_ptr addr,         \
-                 uint32_t idx, void *vd, uintptr_t retaddr)\
-{                                                          \
-    ETYPE data = *((ETYPE *)vd + H(idx));                  \
-    cpu_##STSUF##_data_ra(env, addr, data, retaddr);       \
+#define GEN_VEXT_LD_ELEM(NAME, ETYPE, H, LDSUF)                        \
+static void NAME##_tlb(CPURISCVState *env, abi_ptr addr,               \
+                       uint32_t byte_off, void *vd, uintptr_t retaddr) \
+{                                                                      \
+    uint8_t *reg = ((uint8_t *)vd + byte_off);                         \
+    ETYPE *cur = ((ETYPE *)reg);                                       \
+    *cur = cpu_##LDSUF##_data_ra(env, addr, retaddr);                  \
+}                                                                      \
+                                                                       \
+static void NAME##_host(void *vd, uint32_t byte_off, void *host)       \
+{                                                                      \
+    ETYPE val = LDSUF##_p(host);                                       \
+    uint8_t *reg = (uint8_t *)(vd + byte_off);                         \
+    *(ETYPE *)(reg) = val;                                             \
+}
+
+GEN_VEXT_LD_ELEM(lde_b, uint8_t,  H1, ldub)
+GEN_VEXT_LD_ELEM(lde_h, uint16_t, H2, lduw)
+GEN_VEXT_LD_ELEM(lde_w, uint32_t, H4, ldl)
+GEN_VEXT_LD_ELEM(lde_d, uint64_t, H8, ldq)
+
+#define GEN_VEXT_ST_ELEM(NAME, ETYPE, H, STSUF)                        \
+static void NAME##_tlb(CPURISCVState *env, abi_ptr addr,               \
+                       uint32_t byte_off, void *vd, uintptr_t retaddr) \
+{                                                                      \
+    uint8_t *reg = ((uint8_t *)vd + byte_off);                         \
+    ETYPE data = *((ETYPE *)reg);                                      \
+    cpu_##STSUF##_data_ra(env, addr, data, retaddr);                   \
+}                                                                      \
+                                                                       \
+static void NAME##_host(void *vd, uint32_t byte_off, void *host)       \
+{                                                                      \
+    uint8_t *reg = ((uint8_t *)vd + byte_off);                         \
+    ETYPE val = *(ETYPE *)(reg);                                       \
+    STSUF##_p(host, val);                                              \
 }
 
-GEN_VEXT_ST_ELEM(ste_b, int8_t,  H1, stb)
-GEN_VEXT_ST_ELEM(ste_h, int16_t, H2, stw)
-GEN_VEXT_ST_ELEM(ste_w, int32_t, H4, stl)
-GEN_VEXT_ST_ELEM(ste_d, int64_t, H8, stq)
+GEN_VEXT_ST_ELEM(ste_b, uint8_t,  H1, stb)
+GEN_VEXT_ST_ELEM(ste_h, uint16_t, H2, stw)
+GEN_VEXT_ST_ELEM(ste_w, uint32_t, H4, stl)
+GEN_VEXT_ST_ELEM(ste_d, uint64_t, H8, stq)
 
 static void vext_set_tail_elems_1s(target_ulong vl, void *vd,
                                    uint32_t desc, uint32_t nf,
@@ -199,7 +474,7 @@ static void
 vext_ldst_stride(void *vd, void *v0, target_ulong base,
                  target_ulong stride, CPURISCVState *env,
                  uint32_t desc, uint32_t vm,
-                 vext_ldst_elem_fn *ldst_elem,
+                 vext_ldst_elem_fn_tlb *ldst_elem,
                  uint32_t log2_esz, uintptr_t ra)
 {
     uint32_t i, k;
@@ -221,7 +496,8 @@ vext_ldst_stride(void *vd, void *v0, target_ulong base,
                 continue;
             }
             target_ulong addr = base + stride * i + (k << log2_esz);
-            ldst_elem(env, adjust_addr(env, addr), i + k * max_elems, vd, ra);
+            ldst_elem(env, adjust_addr(env, addr),
+                      (i + k * max_elems) << log2_esz, vd, ra);
             k++;
         }
     }
@@ -240,10 +516,10 @@ void HELPER(NAME)(void *vd, void * v0, target_ulong base,               \
                      ctzl(sizeof(ETYPE)), GETPC());                     \
 }
 
-GEN_VEXT_LD_STRIDE(vlse8_v,  int8_t,  lde_b)
-GEN_VEXT_LD_STRIDE(vlse16_v, int16_t, lde_h)
-GEN_VEXT_LD_STRIDE(vlse32_v, int32_t, lde_w)
-GEN_VEXT_LD_STRIDE(vlse64_v, int64_t, lde_d)
+GEN_VEXT_LD_STRIDE(vlse8_v,  int8_t,  lde_b_tlb)
+GEN_VEXT_LD_STRIDE(vlse16_v, int16_t, lde_h_tlb)
+GEN_VEXT_LD_STRIDE(vlse32_v, int32_t, lde_w_tlb)
+GEN_VEXT_LD_STRIDE(vlse64_v, int64_t, lde_d_tlb)
 
 #define GEN_VEXT_ST_STRIDE(NAME, ETYPE, STORE_FN)                       \
 void HELPER(NAME)(void *vd, void *v0, target_ulong base,                \
@@ -255,10 +531,10 @@ void HELPER(NAME)(void *vd, void *v0, target_ulong base,                \
                      ctzl(sizeof(ETYPE)), GETPC());                     \
 }
 
-GEN_VEXT_ST_STRIDE(vsse8_v,  int8_t,  ste_b)
-GEN_VEXT_ST_STRIDE(vsse16_v, int16_t, ste_h)
-GEN_VEXT_ST_STRIDE(vsse32_v, int32_t, ste_w)
-GEN_VEXT_ST_STRIDE(vsse64_v, int64_t, ste_d)
+GEN_VEXT_ST_STRIDE(vsse8_v,  int8_t,  ste_b_tlb)
+GEN_VEXT_ST_STRIDE(vsse16_v, int16_t, ste_h_tlb)
+GEN_VEXT_ST_STRIDE(vsse32_v, int32_t, ste_w_tlb)
+GEN_VEXT_ST_STRIDE(vsse64_v, int64_t, ste_d_tlb)
 
 /*
  * unit-stride: access elements stored contiguously in memory
@@ -267,9 +543,14 @@ GEN_VEXT_ST_STRIDE(vsse64_v, int64_t, ste_d)
 /* unmasked unit-stride load and store operation */
 static void
 vext_ldst_us(void *vd, target_ulong base, CPURISCVState *env, uint32_t desc,
-             vext_ldst_elem_fn *ldst_elem, uint32_t log2_esz, uint32_t evl,
-             uintptr_t ra)
+             vext_ldst_elem_fn_tlb *ldst_tlb,
+             vext_ldst_elem_fn_host *ldst_host, uint32_t log2_esz,
+             uint32_t evl, uintptr_t ra, bool is_load)
 {
+    RVVContLdSt info;
+    void *host;
+    int flags;
+    intptr_t reg_start, reg_last;
     uint32_t i, k;
     uint32_t nf = vext_nf(desc);
     uint32_t max_elems = vext_max_elems(desc, log2_esz);
@@ -277,17 +558,88 @@ vext_ldst_us(void *vd, target_ulong base, CPURISCVState *env, uint32_t desc,
 
     VSTART_CHECK_EARLY_EXIT(env);
 
-    /* load bytes from guest memory */
-    for (i = env->vstart; i < evl; env->vstart = ++i) {
+    vext_cont_ldst_elements(&info, base, env->vreg, env->vstart, evl, desc,
+                            log2_esz, false);
+    /* Probe the page(s).  Exit with exception for any invalid page. */
+    vext_cont_ldst_pages(env, &info, base, is_load, desc, esz, ra);
+    /* Handle watchpoints for all active elements. */
+    vext_cont_ldst_watchpoints(env, &info, env->vreg, base, esz, is_load, ra,
+                               desc);
+
+    /* Load bytes from guest memory */
+    flags = info.page[0].flags | info.page[1].flags;
+    if (unlikely(flags != 0)) {
+        /* At least one page includes MMIO. */
+        reg_start = info.reg_idx_first[0];
+        reg_last = info.reg_idx_last[1];
+        if (reg_last < 0) {
+            reg_last = info.reg_idx_split;
+            if (reg_last < 0) {
+                reg_last = info.reg_idx_last[0];
+            }
+        }
+        reg_last += 1;
+
+        for (i = reg_start; i < reg_last; ++i) {
+            k = 0;
+            while (k < nf) {
+                target_ulong addr = base + ((i * nf + k) << log2_esz);
+                ldst_tlb(env, adjust_addr(env, addr),
+                         (i + k * max_elems) << log2_esz, vd, ra);
+                k++;
+            }
+        }
+
+        env->vstart = 0;
+        vext_set_tail_elems_1s(evl, vd, desc, nf, esz, max_elems);
+        return;
+    }
+
+    /* The entire operation is in RAM, on valid pages. */
+    reg_start = info.reg_idx_first[0];
+    reg_last = info.reg_idx_last[0] + 1;
+    host = info.page[0].host;
+
+    for (i = reg_start; i < reg_last; ++i) {
         k = 0;
         while (k < nf) {
-            target_ulong addr = base + ((i * nf + k) << log2_esz);
-            ldst_elem(env, adjust_addr(env, addr), i + k * max_elems, vd, ra);
+            ldst_host(vd, (i + k * max_elems) << log2_esz,
+                      host + ((i * nf + k) << log2_esz));
             k++;
         }
     }
-    env->vstart = 0;
 
+    /*
+     * Use the slow path to manage the cross-page misalignment.
+     * But we know this is RAM and cannot trap.
+     */
+    if (unlikely(info.mem_off_split >= 0)) {
+        reg_start = info.reg_idx_split;
+        k = 0;
+        while (k < nf) {
+            target_ulong addr = base + ((reg_start * nf + k) << log2_esz);
+            ldst_tlb(env, adjust_addr(env, addr),
+                     (reg_start + k * max_elems) << log2_esz, vd, ra);
+            k++;
+        }
+    }
+
+    if (unlikely(info.mem_off_first[1] >= 0)) {
+        reg_start = info.reg_idx_first[1];
+        reg_last = info.reg_idx_last[1] + 1;
+        host = info.page[1].host;
+
+        for (i = reg_start; i < reg_last; ++i) {
+            k = 0;
+            while (k < nf) {
+                ldst_host(vd, (i + k * max_elems) << log2_esz,
+                          host + ((i * nf + k) << log2_esz));
+                k++;
+            }
+        }
+    }
+
+    env->vstart = 0;
     vext_set_tail_elems_1s(evl, vd, desc, nf, esz, max_elems);
 }
 
@@ -296,47 +648,47 @@ vext_ldst_us(void *vd, target_ulong base, CPURISCVState *env, uint32_t desc,
  * stride, stride = NF * sizeof (ETYPE)
  */
 
-#define GEN_VEXT_LD_US(NAME, ETYPE, LOAD_FN)                            \
-void HELPER(NAME##_mask)(void *vd, void *v0, target_ulong base,         \
-                         CPURISCVState *env, uint32_t desc)             \
-{                                                                       \
-    uint32_t stride = vext_nf(desc) << ctzl(sizeof(ETYPE));             \
-    vext_ldst_stride(vd, v0, base, stride, env, desc, false, LOAD_FN,   \
-                     ctzl(sizeof(ETYPE)), GETPC());                     \
-}                                                                       \
-                                                                        \
-void HELPER(NAME)(void *vd, void *v0, target_ulong base,                \
-                  CPURISCVState *env, uint32_t desc)                    \
-{                                                                       \
-    vext_ldst_us(vd, base, env, desc, LOAD_FN,                          \
-                 ctzl(sizeof(ETYPE)), env->vl, GETPC());                \
+#define GEN_VEXT_LD_US(NAME, ETYPE, LOAD_FN_TLB, LOAD_FN_HOST)          \
+void HELPER(NAME##_mask)(void *vd, void *v0, target_ulong base,         \
+                         CPURISCVState *env, uint32_t desc)             \
+{                                                                       \
+    uint32_t stride = vext_nf(desc) << ctzl(sizeof(ETYPE));             \
+    vext_ldst_stride(vd, v0, base, stride, env, desc, false,            \
+                     LOAD_FN_TLB, ctzl(sizeof(ETYPE)), GETPC());        \
}                                                                       \
                                                                        \
+void HELPER(NAME)(void *vd, void *v0, target_ulong base,                \
+                  CPURISCVState *env, uint32_t desc)                    \
+{                                                                       \
+    vext_ldst_us(vd, base, env, desc, LOAD_FN_TLB, LOAD_FN_HOST,        \
+                 ctzl(sizeof(ETYPE)), env->vl, GETPC(), true);          \
 }
 
-GEN_VEXT_LD_US(vle8_v,  int8_t,  lde_b)
-GEN_VEXT_LD_US(vle16_v, int16_t, lde_h)
-GEN_VEXT_LD_US(vle32_v, int32_t, lde_w)
-GEN_VEXT_LD_US(vle64_v, int64_t, lde_d)
+GEN_VEXT_LD_US(vle8_v,  int8_t,  lde_b_tlb, lde_b_host)
+GEN_VEXT_LD_US(vle16_v, int16_t, lde_h_tlb, lde_h_host)
+GEN_VEXT_LD_US(vle32_v, int32_t, lde_w_tlb, lde_w_host)
+GEN_VEXT_LD_US(vle64_v, int64_t, lde_d_tlb, lde_d_host)
 
-#define GEN_VEXT_ST_US(NAME, ETYPE, STORE_FN)                           \
+#define GEN_VEXT_ST_US(NAME, ETYPE, STORE_FN_TLB, STORE_FN_HOST)        \
 void HELPER(NAME##_mask)(void *vd, void *v0, target_ulong base,         \
                          CPURISCVState *env, uint32_t desc)             \
 {                                                                       \
     uint32_t stride = vext_nf(desc) << ctzl(sizeof(ETYPE));             \
-    vext_ldst_stride(vd, v0, base, stride, env, desc, false, STORE_FN,  \
-                     ctzl(sizeof(ETYPE)), GETPC());                     \
+    vext_ldst_stride(vd, v0, base, stride, env, desc, false,            \
+                     STORE_FN_TLB, ctzl(sizeof(ETYPE)), GETPC());       \
 }                                                                       \
                                                                         \
 void HELPER(NAME)(void *vd, void *v0, target_ulong base,                \
                   CPURISCVState *env, uint32_t desc)                    \
 {                                                                       \
-    vext_ldst_us(vd, base, env, desc, STORE_FN,                         \
-                 ctzl(sizeof(ETYPE)), env->vl, GETPC());                \
+    vext_ldst_us(vd, base, env, desc, STORE_FN_TLB, STORE_FN_HOST,      \
+                 ctzl(sizeof(ETYPE)), env->vl, GETPC(), false);         \
 }
 
-GEN_VEXT_ST_US(vse8_v,  int8_t,  ste_b)
-GEN_VEXT_ST_US(vse16_v, int16_t, ste_h)
-GEN_VEXT_ST_US(vse32_v, int32_t, ste_w)
-GEN_VEXT_ST_US(vse64_v, int64_t, ste_d)
+GEN_VEXT_ST_US(vse8_v,  int8_t,  ste_b_tlb, ste_b_host)
+GEN_VEXT_ST_US(vse16_v, int16_t, ste_h_tlb, ste_h_host)
+GEN_VEXT_ST_US(vse32_v, int32_t, ste_w_tlb, ste_w_host)
+GEN_VEXT_ST_US(vse64_v, int64_t, ste_d_tlb, ste_d_host)
 
 /*
  * unit stride mask load and store, EEW = 1
@@ -346,8 +698,8 @@ void HELPER(vlm_v)(void *vd, void *v0, target_ulong base,
 {
     /* evl = ceil(vl/8) */
     uint8_t evl = (env->vl + 7) >> 3;
-    vext_ldst_us(vd, base, env, desc, lde_b,
-                 0, evl, GETPC());
+    vext_ldst_us(vd, base, env, desc, lde_b_tlb, lde_b_host,
+                 0, evl, GETPC(), true);
 }
 
 void HELPER(vsm_v)(void *vd, void *v0, target_ulong base,
@@ -355,8 +707,8 @@ void HELPER(vsm_v)(void *vd, void *v0, target_ulong base,
 {
     /* evl = ceil(vl/8) */
     uint8_t evl = (env->vl + 7) >> 3;
-    vext_ldst_us(vd, base, env, desc, ste_b,
-                 0, evl, GETPC());
+    vext_ldst_us(vd, base, env, desc, ste_b_tlb, ste_b_host,
+                 0, evl, GETPC(), false);
 }
 
 /*
@@ -381,7 +733,7 @@ static inline void
 vext_ldst_index(void *vd, void *v0, target_ulong base,
                 void *vs2, CPURISCVState *env, uint32_t desc,
                 vext_get_index_addr get_index_addr,
-                vext_ldst_elem_fn *ldst_elem,
+                vext_ldst_elem_fn_tlb *ldst_elem,
                 uint32_t log2_esz, uintptr_t ra)
 {
     uint32_t i, k;
@@ -405,7 +757,8 @@ vext_ldst_index(void *vd, void *v0, target_ulong base,
             continue;
         }
         abi_ptr addr = get_index_addr(base, i, vs2) + (k << log2_esz);
-        ldst_elem(env, adjust_addr(env, addr), i + k * max_elems, vd, ra);
+        ldst_elem(env, adjust_addr(env, addr),
+                  (i + k * max_elems) << log2_esz, vd, ra);
         k++;
     }
 }
@@ -422,22 +775,22 @@ void HELPER(NAME)(void *vd, void *v0, target_ulong base,                   \
                     LOAD_FN, ctzl(sizeof(ETYPE)), GETPC());                \
 }
 
-GEN_VEXT_LD_INDEX(vlxei8_8_v,   int8_t,  idx_b, lde_b)
-GEN_VEXT_LD_INDEX(vlxei8_16_v,  int16_t, idx_b, lde_h)
-GEN_VEXT_LD_INDEX(vlxei8_32_v,  int32_t, idx_b, lde_w)
-GEN_VEXT_LD_INDEX(vlxei8_64_v,  int64_t, idx_b, lde_d)
-GEN_VEXT_LD_INDEX(vlxei16_8_v,  int8_t,  idx_h, lde_b)
-GEN_VEXT_LD_INDEX(vlxei16_16_v, int16_t, idx_h, lde_h)
-GEN_VEXT_LD_INDEX(vlxei16_32_v, int32_t, idx_h, lde_w)
-GEN_VEXT_LD_INDEX(vlxei16_64_v, int64_t, idx_h, lde_d)
-GEN_VEXT_LD_INDEX(vlxei32_8_v,  int8_t,  idx_w, lde_b)
-GEN_VEXT_LD_INDEX(vlxei32_16_v, int16_t, idx_w, lde_h)
-GEN_VEXT_LD_INDEX(vlxei32_32_v, int32_t, idx_w, lde_w)
-GEN_VEXT_LD_INDEX(vlxei32_64_v, int64_t, idx_w, lde_d)
-GEN_VEXT_LD_INDEX(vlxei64_8_v,  int8_t,  idx_d, lde_b)
-GEN_VEXT_LD_INDEX(vlxei64_16_v, int16_t, idx_d, lde_h)
-GEN_VEXT_LD_INDEX(vlxei64_32_v, int32_t, idx_d, lde_w)
-GEN_VEXT_LD_INDEX(vlxei64_64_v, int64_t, idx_d, lde_d)
+GEN_VEXT_LD_INDEX(vlxei8_8_v,   int8_t,  idx_b, lde_b_tlb)
+GEN_VEXT_LD_INDEX(vlxei8_16_v,  int16_t, idx_b, lde_h_tlb)
+GEN_VEXT_LD_INDEX(vlxei8_32_v,  int32_t, idx_b, lde_w_tlb)
+GEN_VEXT_LD_INDEX(vlxei8_64_v,  int64_t, idx_b, lde_d_tlb)
+GEN_VEXT_LD_INDEX(vlxei16_8_v,  int8_t,  idx_h, lde_b_tlb)
+GEN_VEXT_LD_INDEX(vlxei16_16_v, int16_t, idx_h, lde_h_tlb)
+GEN_VEXT_LD_INDEX(vlxei16_32_v, int32_t, idx_h, lde_w_tlb)
+GEN_VEXT_LD_INDEX(vlxei16_64_v, int64_t, idx_h, lde_d_tlb)
+GEN_VEXT_LD_INDEX(vlxei32_8_v,  int8_t,  idx_w, lde_b_tlb)
+GEN_VEXT_LD_INDEX(vlxei32_16_v, int16_t, idx_w, lde_h_tlb)
+GEN_VEXT_LD_INDEX(vlxei32_32_v, int32_t, idx_w, lde_w_tlb)
+GEN_VEXT_LD_INDEX(vlxei32_64_v, int64_t, idx_w, lde_d_tlb)
+GEN_VEXT_LD_INDEX(vlxei64_8_v,  int8_t,  idx_d, lde_b_tlb)
+GEN_VEXT_LD_INDEX(vlxei64_16_v, int16_t, idx_d, lde_h_tlb)
+GEN_VEXT_LD_INDEX(vlxei64_32_v, int32_t, idx_d, lde_w_tlb)
+GEN_VEXT_LD_INDEX(vlxei64_64_v, int64_t, idx_d, lde_d_tlb)
 
 #define GEN_VEXT_ST_INDEX(NAME, ETYPE, INDEX_FN, STORE_FN)       \
 void HELPER(NAME)(void *vd, void *v0, target_ulong base,         \
@@ -448,22 +801,22 @@ void HELPER(NAME)(void *vd, void *v0, target_ulong base,         \
                     GETPC());                                    \
 }
 
-GEN_VEXT_ST_INDEX(vsxei8_8_v,   int8_t,  idx_b, ste_b)
-GEN_VEXT_ST_INDEX(vsxei8_16_v,  int16_t, idx_b, ste_h)
-GEN_VEXT_ST_INDEX(vsxei8_32_v,  int32_t, idx_b, ste_w)
-GEN_VEXT_ST_INDEX(vsxei8_64_v,  int64_t, idx_b, ste_d)
-GEN_VEXT_ST_INDEX(vsxei16_8_v,  int8_t,  idx_h, ste_b)
-GEN_VEXT_ST_INDEX(vsxei16_16_v, int16_t, idx_h, ste_h)
-GEN_VEXT_ST_INDEX(vsxei16_32_v, int32_t, idx_h, ste_w)
-GEN_VEXT_ST_INDEX(vsxei16_64_v, int64_t, idx_h, ste_d)
-GEN_VEXT_ST_INDEX(vsxei32_8_v,  int8_t,  idx_w, ste_b)
-GEN_VEXT_ST_INDEX(vsxei32_16_v, int16_t, idx_w, ste_h)
-GEN_VEXT_ST_INDEX(vsxei32_32_v, int32_t, idx_w, ste_w)
-GEN_VEXT_ST_INDEX(vsxei32_64_v, int64_t, idx_w, ste_d)
-GEN_VEXT_ST_INDEX(vsxei64_8_v,  int8_t,  idx_d, ste_b)
-GEN_VEXT_ST_INDEX(vsxei64_16_v, int16_t, idx_d, ste_h)
-GEN_VEXT_ST_INDEX(vsxei64_32_v, int32_t, idx_d, ste_w)
-GEN_VEXT_ST_INDEX(vsxei64_64_v, int64_t, idx_d, ste_d)
+GEN_VEXT_ST_INDEX(vsxei8_8_v,   int8_t,  idx_b, ste_b_tlb)
+GEN_VEXT_ST_INDEX(vsxei8_16_v, int16_t, idx_b, ste_h_tlb) +GEN_VEXT_ST_INDEX(vsxei8_32_v, int32_t, idx_b, ste_w_tlb) +GEN_VEXT_ST_INDEX(vsxei8_64_v, int64_t, idx_b, ste_d_tlb) +GEN_VEXT_ST_INDEX(vsxei16_8_v, int8_t, idx_h, ste_b_tlb) +GEN_VEXT_ST_INDEX(vsxei16_16_v, int16_t, idx_h, ste_h_tlb) +GEN_VEXT_ST_INDEX(vsxei16_32_v, int32_t, idx_h, ste_w_tlb) +GEN_VEXT_ST_INDEX(vsxei16_64_v, int64_t, idx_h, ste_d_tlb) +GEN_VEXT_ST_INDEX(vsxei32_8_v, int8_t, idx_w, ste_b_tlb) +GEN_VEXT_ST_INDEX(vsxei32_16_v, int16_t, idx_w, ste_h_tlb) +GEN_VEXT_ST_INDEX(vsxei32_32_v, int32_t, idx_w, ste_w_tlb) +GEN_VEXT_ST_INDEX(vsxei32_64_v, int64_t, idx_w, ste_d_tlb) +GEN_VEXT_ST_INDEX(vsxei64_8_v, int8_t, idx_d, ste_b_tlb) +GEN_VEXT_ST_INDEX(vsxei64_16_v, int16_t, idx_d, ste_h_tlb) +GEN_VEXT_ST_INDEX(vsxei64_32_v, int32_t, idx_d, ste_w_tlb) +GEN_VEXT_ST_INDEX(vsxei64_64_v, int64_t, idx_d, ste_d_tlb) /* * unit-stride fault-only-fisrt load instructions @@ -471,7 +824,7 @@ GEN_VEXT_ST_INDEX(vsxei64_64_v, int64_t, idx_d, ste_d) static inline void vext_ldff(void *vd, void *v0, target_ulong base, CPURISCVState *env, uint32_t desc, - vext_ldst_elem_fn *ldst_elem, + vext_ldst_elem_fn_tlb *ldst_elem, uint32_t log2_esz, uintptr_t ra) { void *host; @@ -537,7 +890,8 @@ ProbeSuccess: continue; } addr = base + ((i * nf + k) << log2_esz); - ldst_elem(env, adjust_addr(env, addr), i + k * max_elems, vd, ra); + ldst_elem(env, adjust_addr(env, addr), + (i + k * max_elems) << log2_esz, vd, ra); k++; } } @@ -554,10 +908,10 @@ void HELPER(NAME)(void *vd, void *v0, target_ulong base, \ ctzl(sizeof(ETYPE)), GETPC()); \ } -GEN_VEXT_LDFF(vle8ff_v, int8_t, lde_b) -GEN_VEXT_LDFF(vle16ff_v, int16_t, lde_h) -GEN_VEXT_LDFF(vle32ff_v, int32_t, lde_w) -GEN_VEXT_LDFF(vle64ff_v, int64_t, lde_d) +GEN_VEXT_LDFF(vle8ff_v, int8_t, lde_b_tlb) +GEN_VEXT_LDFF(vle16ff_v, int16_t, lde_h_tlb) +GEN_VEXT_LDFF(vle32ff_v, int32_t, lde_w_tlb) +GEN_VEXT_LDFF(vle64ff_v, int64_t, lde_d_tlb) #define DO_SWAP(N, M) (M) #define DO_AND(N, M) 
(N & M) @@ -574,7 +928,8 @@ GEN_VEXT_LDFF(vle64ff_v, int64_t, lde_d) */ static void vext_ldst_whole(void *vd, target_ulong base, CPURISCVState *env, uint32_t desc, - vext_ldst_elem_fn *ldst_elem, uint32_t log2_esz, uintptr_t ra) + vext_ldst_elem_fn_tlb *ldst_elem, uint32_t log2_esz, + uintptr_t ra) { uint32_t i, k, off, pos; uint32_t nf = vext_nf(desc); @@ -593,8 +948,8 @@ vext_ldst_whole(void *vd, target_ulong base, CPURISCVState *env, uint32_t desc, /* load/store rest of elements of current segment pointed by vstart */ for (pos = off; pos < max_elems; pos++, env->vstart++) { target_ulong addr = base + ((pos + k * max_elems) << log2_esz); - ldst_elem(env, adjust_addr(env, addr), pos + k * max_elems, vd, - ra); + ldst_elem(env, adjust_addr(env, addr), + (pos + k * max_elems) << log2_esz, vd, ra); } k++; } @@ -603,7 +958,8 @@ vext_ldst_whole(void *vd, target_ulong base, CPURISCVState *env, uint32_t desc, for (; k < nf; k++) { for (i = 0; i < max_elems; i++, env->vstart++) { target_ulong addr = base + ((i + k * max_elems) << log2_esz); - ldst_elem(env, adjust_addr(env, addr), i + k * max_elems, vd, ra); + ldst_elem(env, adjust_addr(env, addr), + (i + k * max_elems) << log2_esz, vd, ra); } } @@ -618,22 +974,22 @@ void HELPER(NAME)(void *vd, target_ulong base, \ ctzl(sizeof(ETYPE)), GETPC()); \ } -GEN_VEXT_LD_WHOLE(vl1re8_v, int8_t, lde_b) -GEN_VEXT_LD_WHOLE(vl1re16_v, int16_t, lde_h) -GEN_VEXT_LD_WHOLE(vl1re32_v, int32_t, lde_w) -GEN_VEXT_LD_WHOLE(vl1re64_v, int64_t, lde_d) -GEN_VEXT_LD_WHOLE(vl2re8_v, int8_t, lde_b) -GEN_VEXT_LD_WHOLE(vl2re16_v, int16_t, lde_h) -GEN_VEXT_LD_WHOLE(vl2re32_v, int32_t, lde_w) -GEN_VEXT_LD_WHOLE(vl2re64_v, int64_t, lde_d) -GEN_VEXT_LD_WHOLE(vl4re8_v, int8_t, lde_b) -GEN_VEXT_LD_WHOLE(vl4re16_v, int16_t, lde_h) -GEN_VEXT_LD_WHOLE(vl4re32_v, int32_t, lde_w) -GEN_VEXT_LD_WHOLE(vl4re64_v, int64_t, lde_d) -GEN_VEXT_LD_WHOLE(vl8re8_v, int8_t, lde_b) -GEN_VEXT_LD_WHOLE(vl8re16_v, int16_t, lde_h) -GEN_VEXT_LD_WHOLE(vl8re32_v, int32_t, lde_w) 
-GEN_VEXT_LD_WHOLE(vl8re64_v, int64_t, lde_d) +GEN_VEXT_LD_WHOLE(vl1re8_v, int8_t, lde_b_tlb) +GEN_VEXT_LD_WHOLE(vl1re16_v, int16_t, lde_h_tlb) +GEN_VEXT_LD_WHOLE(vl1re32_v, int32_t, lde_w_tlb) +GEN_VEXT_LD_WHOLE(vl1re64_v, int64_t, lde_d_tlb) +GEN_VEXT_LD_WHOLE(vl2re8_v, int8_t, lde_b_tlb) +GEN_VEXT_LD_WHOLE(vl2re16_v, int16_t, lde_h_tlb) +GEN_VEXT_LD_WHOLE(vl2re32_v, int32_t, lde_w_tlb) +GEN_VEXT_LD_WHOLE(vl2re64_v, int64_t, lde_d_tlb) +GEN_VEXT_LD_WHOLE(vl4re8_v, int8_t, lde_b_tlb) +GEN_VEXT_LD_WHOLE(vl4re16_v, int16_t, lde_h_tlb) +GEN_VEXT_LD_WHOLE(vl4re32_v, int32_t, lde_w_tlb) +GEN_VEXT_LD_WHOLE(vl4re64_v, int64_t, lde_d_tlb) +GEN_VEXT_LD_WHOLE(vl8re8_v, int8_t, lde_b_tlb) +GEN_VEXT_LD_WHOLE(vl8re16_v, int16_t, lde_h_tlb) +GEN_VEXT_LD_WHOLE(vl8re32_v, int32_t, lde_w_tlb) +GEN_VEXT_LD_WHOLE(vl8re64_v, int64_t, lde_d_tlb) #define GEN_VEXT_ST_WHOLE(NAME, ETYPE, STORE_FN) \ void HELPER(NAME)(void *vd, target_ulong base, \ @@ -643,10 +999,10 @@ void HELPER(NAME)(void *vd, target_ulong base, \ ctzl(sizeof(ETYPE)), GETPC()); \ } -GEN_VEXT_ST_WHOLE(vs1r_v, int8_t, ste_b) -GEN_VEXT_ST_WHOLE(vs2r_v, int8_t, ste_b) -GEN_VEXT_ST_WHOLE(vs4r_v, int8_t, ste_b) -GEN_VEXT_ST_WHOLE(vs8r_v, int8_t, ste_b) +GEN_VEXT_ST_WHOLE(vs1r_v, int8_t, ste_b_tlb) +GEN_VEXT_ST_WHOLE(vs2r_v, int8_t, ste_b_tlb) +GEN_VEXT_ST_WHOLE(vs4r_v, int8_t, ste_b_tlb) +GEN_VEXT_ST_WHOLE(vs8r_v, int8_t, ste_b_tlb) /* * Vector Integer Arithmetic Instructions diff --git a/target/riscv/vector_internals.h b/target/riscv/vector_internals.h index 9e1e15b5750..f59d7d5c19f 100644 --- a/target/riscv/vector_internals.h +++ b/target/riscv/vector_internals.h @@ -233,4 +233,52 @@ void HELPER(NAME)(void *vd, void *v0, target_ulong s1, \ #define WOP_UUU_H uint32_t, uint16_t, uint16_t, uint32_t, uint32_t #define WOP_UUU_W uint64_t, uint32_t, uint32_t, uint64_t, uint64_t +typedef struct { + void *host; + int flags; + MemTxAttrs attrs; +} RVVHostPage; + +typedef struct { + /* + * First and last element wholly contained 
within the two pages. + * mem_off_first[0] and reg_idx_first[0] are always set >= 0. + * reg_idx_last[0] may be < 0 if the first element crosses pages. + * All of mem_off_first[1], reg_idx_first[1] and reg_idx_last[1] + * are set >= 0 only if there are complete elements on a second page. + */ + int16_t mem_off_first[2]; + int16_t reg_idx_first[2]; + int16_t reg_idx_last[2]; + + /* + * One element that is misaligned and spans both pages, + * or -1 if there is no such active element. + */ + int16_t mem_off_split; + int16_t reg_idx_split; + + /* + * The byte offset at which the entire operation crosses a page boundary. + * Set >= 0 if and only if the entire operation spans two pages. + */ + int16_t page_split; + + /* TLB data for the two pages. */ + RVVHostPage page[2]; +} RVVContLdSt; + +#ifdef CONFIG_USER_ONLY +static inline void +vext_cont_ldst_watchpoints(CPURISCVState *env, RVVContLdSt *info, uint64_t *v0, + target_ulong addr, uint32_t log2_esz, bool is_load, + uintptr_t ra, uint32_t desc) +{} +#else +void vext_cont_ldst_watchpoints(CPURISCVState *env, RVVContLdSt *info, + uint64_t *v0, target_ulong addr, + uint32_t log2_esz, bool is_load, uintptr_t ra, + uint32_t desc); +#endif + #endif /* TARGET_RISCV_VECTOR_INTERNALS_H */ From patchwork Thu Jun 13 14:19:04 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Max Chou X-Patchwork-Id: 1947436 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=sifive.com header.i=@sifive.com header.a=rsa-sha256 header.s=google header.b=N5TfrMen; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=patchwork.ozlabs.org) 
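The RVVContLdSt bookkeeping above records which elements land wholly on each of the two pages and whether a single element straddles the boundary. A minimal, self-contained sketch of that split (hypothetical helper and field names, not the QEMU code) might look like:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096

/*
 * Hypothetical mini version of the two-page split bookkeeping:
 * for a contiguous access of nelem elements of size esz starting at addr,
 * report the index of the first element wholly on the second page (or -1)
 * and the index of an element crossing the page boundary (or -1).
 */
typedef struct {
    int first_page1;   /* first element wholly on the second page, or -1 */
    int split_elem;    /* element straddling the boundary, or -1 */
} MiniSplit;

static MiniSplit mini_cont_split(uint64_t addr, int nelem, int esz)
{
    MiniSplit s = { .first_page1 = -1, .split_elem = -1 };
    uint64_t page_end = (addr | (PAGE_SIZE - 1)) + 1;

    for (int i = 0; i < nelem; i++) {
        uint64_t lo = addr + (uint64_t)i * esz;
        uint64_t hi = lo + esz;           /* one past the last byte */
        if (lo < page_end && hi > page_end) {
            s.split_elem = i;             /* element crosses the boundary */
        } else if (lo >= page_end && s.first_page1 < 0) {
            s.first_page1 = i;            /* first whole element on page 2 */
        }
    }
    return s;
}
```

For a contiguous operation the split element, if any, is unique, which is why a single `mem_off_split`/`reg_idx_split` pair suffices in the real structure.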
From: Max Chou
To: qemu-devel@nongnu.org, qemu-riscv@nongnu.org
Cc: Richard Henderson, Paolo Bonzini, Palmer Dabbelt, Alistair Francis, Bin Meng, Weiwei Li, Daniel Henrique Barboza, Liu Zhiwei, Max Chou
Subject: [RFC PATCH v3 3/5] target/riscv: rvv: Provide a fast path using direct access to host ram for unit-stride whole register load/store
Date: Thu, 13 Jun 2024 22:19:04 +0800
Message-Id: <20240613141906.1276105-4-max.chou@sifive.com>
In-Reply-To: <20240613141906.1276105-1-max.chou@sifive.com>
References: <20240613141906.1276105-1-max.chou@sifive.com>

The vector unit-stride whole register load/store instructions are similar to the unmasked unit-stride load/store instructions, so they are likewise suitable for a fast path that accesses host RAM directly.
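The optimization described here can be sketched in a few lines (hypothetical names such as `whole_reg_load` and `slow_byte`; not QEMU's actual API): probe the pages once, then either take a host-pointer fast path or fall back to a per-element access that can cope with MMIO.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical sketch of a fast/slow path split for a whole-register load.
 * 'flags != 0' stands in for "at least one page is MMIO or watched".
 */
typedef void slow_elem_fn(void *dst, const uint8_t *src_mem, uint32_t byte_off);

static void slow_byte(void *dst, const uint8_t *src_mem, uint32_t byte_off)
{
    /* Per-element access, as an MMIO-safe TLB helper would perform. */
    ((uint8_t *)dst)[byte_off] = src_mem[byte_off];
}

static void whole_reg_load(uint8_t *vreg, const uint8_t *mem, uint32_t len,
                           int flags, slow_elem_fn *slow)
{
    if (flags != 0) {
        /* Slow path: element by element, able to handle MMIO. */
        for (uint32_t i = 0; i < len; i++) {
            slow(vreg, mem, i);
        }
        return;
    }
    /* Fast path: the whole range is plain RAM, copy via the host pointer. */
    memcpy(vreg, mem, len);
}
```

Both paths produce the same register contents; the win is that the RAM-only case avoids the per-element helper call overhead.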
Signed-off-by: Max Chou --- target/riscv/vector_helper.c | 185 +++++++++++++++++++++++++---------- 1 file changed, 133 insertions(+), 52 deletions(-) diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c index d33ba5aeca1..b34d10b1b5d 100644 --- a/target/riscv/vector_helper.c +++ b/target/riscv/vector_helper.c @@ -928,81 +928,162 @@ GEN_VEXT_LDFF(vle64ff_v, int64_t, lde_d_tlb) */ static void vext_ldst_whole(void *vd, target_ulong base, CPURISCVState *env, uint32_t desc, - vext_ldst_elem_fn_tlb *ldst_elem, uint32_t log2_esz, - uintptr_t ra) + vext_ldst_elem_fn_tlb *ldst_tlb, + vext_ldst_elem_fn_host *ldst_host, uint32_t log2_esz, + uintptr_t ra, bool is_load) { - uint32_t i, k, off, pos; + RVVContLdSt info; + target_ulong addr; + void *host; + int flags; + intptr_t reg_start, reg_last; + uint32_t idx_nf, off, evl; uint32_t nf = vext_nf(desc); uint32_t vlenb = riscv_cpu_cfg(env)->vlenb; uint32_t max_elems = vlenb >> log2_esz; + uint32_t esz = 1 << log2_esz; if (env->vstart >= ((vlenb * nf) >> log2_esz)) { env->vstart = 0; return; } - k = env->vstart / max_elems; - off = env->vstart % max_elems; + vext_cont_ldst_elements(&info, base, env->vreg, env->vstart, + nf * max_elems, desc, log2_esz, true); + vext_cont_ldst_pages(env, &info, base, is_load, desc, esz, ra); + vext_cont_ldst_watchpoints(env, &info, env->vreg, base, esz, is_load, ra, + desc); + + flags = info.page[0].flags | info.page[1].flags; + if (unlikely(flags != 0)) { + /* At least one page includes MMIO. 
*/ + reg_start = info.reg_idx_first[0]; + idx_nf = reg_start / max_elems; + off = reg_start % max_elems; + evl = (idx_nf + 1) * max_elems; + + if (off) { + /* + * load/store rest of elements of current segment pointed by vstart + */ + addr = base + (reg_start << log2_esz); + for (; reg_start < evl; reg_start++, addr += esz) { + ldst_tlb(env, adjust_addr(env, addr), reg_start << log2_esz, + vd, ra); + } + idx_nf++; + } + + /* load/store elements for rest of segments */ + evl = nf * max_elems; + addr = base + (reg_start << log2_esz); + for (; reg_start < evl; reg_start++, addr += esz) { + ldst_tlb(env, adjust_addr(env, addr), reg_start << log2_esz, vd, + ra); + } + + env->vstart = 0; + return; + } + + /* The entire operation is in RAM, on valid pages. */ + reg_start = info.reg_idx_first[0]; + reg_last = info.reg_idx_last[0] + 1; + host = info.page[0].host; + idx_nf = reg_start / max_elems; + off = reg_start % max_elems; + evl = (idx_nf + 1) * max_elems; if (off) { /* load/store rest of elements of current segment pointed by vstart */ - for (pos = off; pos < max_elems; pos++, env->vstart++) { - target_ulong addr = base + ((pos + k * max_elems) << log2_esz); - ldst_elem(env, adjust_addr(env, addr), - (pos + k * max_elems) << log2_esz, vd, ra); + for (; reg_start < evl; reg_start++) { + ldst_host(vd, reg_start << log2_esz, + host + (reg_start << log2_esz)); } - k++; + idx_nf++; } /* load/store elements for rest of segments */ - for (; k < nf; k++) { - for (i = 0; i < max_elems; i++, env->vstart++) { - target_ulong addr = base + ((i + k * max_elems) << log2_esz); - ldst_elem(env, adjust_addr(env, addr), - (i + k * max_elems) << log2_esz, vd, ra); + for (; reg_start < reg_last; reg_start++) { + ldst_host(vd, reg_start << log2_esz, host + (reg_start << log2_esz)); + } + + /* + * Use the slow path to manage the cross-page misalignment. + * But we know this is RAM and cannot trap. 
+ */ + if (unlikely(info.mem_off_split >= 0)) { + reg_start = info.reg_idx_split; + addr = base + (reg_start << log2_esz); + ldst_tlb(env, adjust_addr(env, addr), reg_start << log2_esz, vd, ra); + } + + if (unlikely(info.mem_off_first[1] >= 0)) { + reg_start = info.reg_idx_first[1]; + reg_last = info.reg_idx_last[1] + 1; + host = info.page[1].host; + idx_nf = reg_start / max_elems; + off = reg_start % max_elems; + evl = (idx_nf + 1) * max_elems; + + if (off) { + /* + * load/store rest of elements of current segment pointed by vstart + */ + for (; reg_start < evl; reg_start++) { + ldst_host(vd, reg_start << log2_esz, + host + (reg_start << log2_esz)); + } + idx_nf++; + } + + /* load/store elements for rest of segments */ + for (; reg_start < reg_last; reg_start++) { + ldst_host(vd, reg_start << log2_esz, + host + (reg_start << log2_esz)); } } env->vstart = 0; } -#define GEN_VEXT_LD_WHOLE(NAME, ETYPE, LOAD_FN) \ -void HELPER(NAME)(void *vd, target_ulong base, \ - CPURISCVState *env, uint32_t desc) \ -{ \ - vext_ldst_whole(vd, base, env, desc, LOAD_FN, \ - ctzl(sizeof(ETYPE)), GETPC()); \ -} - -GEN_VEXT_LD_WHOLE(vl1re8_v, int8_t, lde_b_tlb) -GEN_VEXT_LD_WHOLE(vl1re16_v, int16_t, lde_h_tlb) -GEN_VEXT_LD_WHOLE(vl1re32_v, int32_t, lde_w_tlb) -GEN_VEXT_LD_WHOLE(vl1re64_v, int64_t, lde_d_tlb) -GEN_VEXT_LD_WHOLE(vl2re8_v, int8_t, lde_b_tlb) -GEN_VEXT_LD_WHOLE(vl2re16_v, int16_t, lde_h_tlb) -GEN_VEXT_LD_WHOLE(vl2re32_v, int32_t, lde_w_tlb) -GEN_VEXT_LD_WHOLE(vl2re64_v, int64_t, lde_d_tlb) -GEN_VEXT_LD_WHOLE(vl4re8_v, int8_t, lde_b_tlb) -GEN_VEXT_LD_WHOLE(vl4re16_v, int16_t, lde_h_tlb) -GEN_VEXT_LD_WHOLE(vl4re32_v, int32_t, lde_w_tlb) -GEN_VEXT_LD_WHOLE(vl4re64_v, int64_t, lde_d_tlb) -GEN_VEXT_LD_WHOLE(vl8re8_v, int8_t, lde_b_tlb) -GEN_VEXT_LD_WHOLE(vl8re16_v, int16_t, lde_h_tlb) -GEN_VEXT_LD_WHOLE(vl8re32_v, int32_t, lde_w_tlb) -GEN_VEXT_LD_WHOLE(vl8re64_v, int64_t, lde_d_tlb) - -#define GEN_VEXT_ST_WHOLE(NAME, ETYPE, STORE_FN) \ -void HELPER(NAME)(void *vd, target_ulong 
base, \ - CPURISCVState *env, uint32_t desc) \ -{ \ - vext_ldst_whole(vd, base, env, desc, STORE_FN, \ - ctzl(sizeof(ETYPE)), GETPC()); \ -} - -GEN_VEXT_ST_WHOLE(vs1r_v, int8_t, ste_b_tlb) -GEN_VEXT_ST_WHOLE(vs2r_v, int8_t, ste_b_tlb) -GEN_VEXT_ST_WHOLE(vs4r_v, int8_t, ste_b_tlb) -GEN_VEXT_ST_WHOLE(vs8r_v, int8_t, ste_b_tlb) +#define GEN_VEXT_LD_WHOLE(NAME, ETYPE, LOAD_FN_TLB, LOAD_FN_HOST) \ +void HELPER(NAME)(void *vd, target_ulong base, CPURISCVState *env, \ + uint32_t desc) \ +{ \ + vext_ldst_whole(vd, base, env, desc, LOAD_FN_TLB, LOAD_FN_HOST, \ + ctzl(sizeof(ETYPE)), GETPC(), true); \ +} + +GEN_VEXT_LD_WHOLE(vl1re8_v, int8_t, lde_b_tlb, lde_b_host) +GEN_VEXT_LD_WHOLE(vl1re16_v, int16_t, lde_h_tlb, lde_h_host) +GEN_VEXT_LD_WHOLE(vl1re32_v, int32_t, lde_w_tlb, lde_w_host) +GEN_VEXT_LD_WHOLE(vl1re64_v, int64_t, lde_d_tlb, lde_d_host) +GEN_VEXT_LD_WHOLE(vl2re8_v, int8_t, lde_b_tlb, lde_b_host) +GEN_VEXT_LD_WHOLE(vl2re16_v, int16_t, lde_h_tlb, lde_h_host) +GEN_VEXT_LD_WHOLE(vl2re32_v, int32_t, lde_w_tlb, lde_w_host) +GEN_VEXT_LD_WHOLE(vl2re64_v, int64_t, lde_d_tlb, lde_d_host) +GEN_VEXT_LD_WHOLE(vl4re8_v, int8_t, lde_b_tlb, lde_b_host) +GEN_VEXT_LD_WHOLE(vl4re16_v, int16_t, lde_h_tlb, lde_h_host) +GEN_VEXT_LD_WHOLE(vl4re32_v, int32_t, lde_w_tlb, lde_w_host) +GEN_VEXT_LD_WHOLE(vl4re64_v, int64_t, lde_d_tlb, lde_d_host) +GEN_VEXT_LD_WHOLE(vl8re8_v, int8_t, lde_b_tlb, lde_b_host) +GEN_VEXT_LD_WHOLE(vl8re16_v, int16_t, lde_h_tlb, lde_h_host) +GEN_VEXT_LD_WHOLE(vl8re32_v, int32_t, lde_w_tlb, lde_w_host) +GEN_VEXT_LD_WHOLE(vl8re64_v, int64_t, lde_d_tlb, lde_d_host) + +#define GEN_VEXT_ST_WHOLE(NAME, ETYPE, STORE_FN_TLB, STORE_FN_HOST) \ +void HELPER(NAME)(void *vd, target_ulong base, CPURISCVState *env, \ + uint32_t desc) \ +{ \ + vext_ldst_whole(vd, base, env, desc, STORE_FN_TLB, STORE_FN_HOST, \ + ctzl(sizeof(ETYPE)), GETPC(), false); \ +} + +GEN_VEXT_ST_WHOLE(vs1r_v, int8_t, ste_b_tlb, ste_b_host) +GEN_VEXT_ST_WHOLE(vs2r_v, int8_t, ste_b_tlb, ste_b_host) 
+GEN_VEXT_ST_WHOLE(vs4r_v, int8_t, ste_b_tlb, ste_b_host)
+GEN_VEXT_ST_WHOLE(vs8r_v, int8_t, ste_b_tlb, ste_b_host)

 /*
  * Vector Integer Arithmetic Instructions

From patchwork Thu Jun 13 14:19:05 2024
X-Patchwork-Submitter: Max Chou
X-Patchwork-Id: 1947437
From: Max Chou
To: qemu-devel@nongnu.org, qemu-riscv@nongnu.org
Cc: Richard Henderson, Paolo Bonzini, Palmer Dabbelt, Alistair Francis, Bin Meng, Weiwei Li, Daniel Henrique Barboza, Liu Zhiwei, Max Chou
Subject: [RFC PATCH v3 4/5] target/riscv: rvv: Provide group continuous ld/st flow for unit-stride ld/st instructions
Date: Thu, 13 Jun 2024 22:19:05 +0800
Message-Id: <20240613141906.1276105-5-max.chou@sifive.com>
In-Reply-To: <20240613141906.1276105-1-max.chou@sifive.com>
References: <20240613141906.1276105-1-max.chou@sifive.com>

The vector unmasked unit-stride and whole register load/store instructions access contiguous memory. If the host and guest endianness match, the element loads/stores can be grouped so that more data is transferred per access.
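The grouping idea can be sketched as follows (a simplified stand-in with a hypothetical `group_copy` helper, assuming matching host/guest endianness; the patch itself dispatches among MO_8/MO_16/MO_32/MO_64 element helpers): at each byte offset, use the widest power-of-two access that fits in the remaining range.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Simplified sketch of grouping contiguous byte copies into the widest
 * power-of-two access that fits in the remaining range (8/4/2/1 bytes).
 * Only valid when host and guest endianness match, as the patch requires.
 */
static uint32_t group_copy(uint8_t *dst, const uint8_t *src, uint32_t len)
{
    uint32_t off = 0, naccess = 0;

    while (off < len) {
        uint32_t rem = len - off, width;
        if (rem >= 8) {
            width = 8;
        } else if (rem >= 4) {
            width = 4;
        } else if (rem >= 2) {
            width = 2;
        } else {
            width = 1;
        }
        memcpy(dst + off, src + off, width); /* one grouped access */
        off += width;
        naccess++;
    }
    return naccess; /* number of host accesses performed */
}
```

Copying 15 bytes this way takes four accesses (8 + 4 + 2 + 1) instead of fifteen single-byte ones, which is the whole point of the grouping.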
Signed-off-by: Max Chou --- target/riscv/vector_helper.c | 160 +++++++++++++++++++++++++---------- 1 file changed, 117 insertions(+), 43 deletions(-) diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c index b34d10b1b5d..09c9b231c3f 100644 --- a/target/riscv/vector_helper.c +++ b/target/riscv/vector_helper.c @@ -450,6 +450,69 @@ GEN_VEXT_ST_ELEM(ste_h, uint16_t, H2, stw) GEN_VEXT_ST_ELEM(ste_w, uint32_t, H4, stl) GEN_VEXT_ST_ELEM(ste_d, uint64_t, H8, stq) +static inline uint32_t +vext_group_ldst_host(CPURISCVState *env, void *vd, uint32_t byte_end, + uint32_t byte_offset, void *host, uint32_t esz, + bool is_load) +{ + uint32_t group_size; + static vext_ldst_elem_fn_host * const fns[2][4] = { + /* Store */ + { ste_b_host, ste_h_host, ste_w_host, ste_d_host }, + /* Load */ + { lde_b_host, lde_h_host, lde_w_host, lde_d_host } + }; + vext_ldst_elem_fn_host *fn; + + if (byte_offset + 8 < byte_end) { + group_size = MO_64; + } else if (byte_offset + 4 < byte_end) { + group_size = MO_32; + } else if (byte_offset + 2 < byte_end) { + group_size = MO_16; + } else { + group_size = MO_8; + } + + fn = fns[is_load][group_size]; + fn(vd, byte_offset, host + byte_offset); + + return 1 << group_size; +} + +static inline void +vext_continus_ldst_tlb(CPURISCVState *env, vext_ldst_elem_fn_tlb *ldst_tlb, + void *vd, uint32_t evl, target_ulong addr, + uint32_t reg_start, uintptr_t ra, uint32_t esz, + bool is_load) +{ + for (; reg_start < evl; reg_start++, addr += esz) { + ldst_tlb(env, adjust_addr(env, addr), reg_start * esz, vd, ra); + } +} + +static inline void +vext_continus_ldst_host(CPURISCVState *env, vext_ldst_elem_fn_host *ldst_host, + void *vd, uint32_t evl, uint32_t reg_start, void *host, + uint32_t esz, bool is_load) +{ +#if TARGET_BIG_ENDIAN != HOST_BIG_ENDIAN + for (; reg_start < evl; reg_start++) { + uint32_t byte_off = reg_start * esz; + ldst_host(vd, byte_off, host + byte_off); + } +#else + uint32_t group_byte; + uint32_t byte_start = reg_start * esz; 
+    uint32_t byte_end = evl * esz;
+    while (byte_start < byte_end) {
+        group_byte = vext_group_ldst_host(env, vd, byte_end, byte_start, host,
+                                          esz, is_load);
+        byte_start += group_byte;
+    }
+#endif
+}
+
 static void vext_set_tail_elems_1s(target_ulong vl, void *vd, uint32_t desc,
                                    uint32_t nf, uint32_t esz,
                                    uint32_t max_elems)
@@ -548,6 +611,7 @@ vext_ldst_us(void *vd, target_ulong base, CPURISCVState *env, uint32_t desc,
              uint32_t evl, uintptr_t ra, bool is_load)
 {
     RVVContLdSt info;
+    target_ulong addr;
     void *host;
     int flags;
     intptr_t reg_start, reg_last;
@@ -580,13 +644,19 @@ vext_ldst_us(void *vd, target_ulong base, CPURISCVState *env, uint32_t desc,
         }
         reg_last += 1;
 
-        for (i = reg_start; i < reg_last; ++i) {
-            k = 0;
-            while (k < nf) {
-                target_ulong addr = base + ((i * nf + k) << log2_esz);
-                ldst_tlb(env, adjust_addr(env, addr),
-                         (i + k * max_elems) << log2_esz, vd, ra);
-                k++;
+        if (nf == 1) {
+            addr = base + reg_start * esz;
+            vext_continus_ldst_tlb(env, ldst_tlb, vd, reg_last, addr,
+                                   reg_start, ra, esz, is_load);
+        } else {
+            for (i = reg_start; i < reg_last; ++i) {
+                k = 0;
+                while (k < nf) {
+                    addr = base + ((i * nf + k) * esz);
+                    ldst_tlb(env, adjust_addr(env, addr),
+                             (i + k * max_elems) << log2_esz, vd, ra);
+                    k++;
+                }
             }
         }
 
@@ -600,12 +670,17 @@ vext_ldst_us(void *vd, target_ulong base, CPURISCVState *env, uint32_t desc,
         reg_last = info.reg_idx_last[0] + 1;
         host = info.page[0].host;
 
-        for (i = reg_start; i < reg_last; ++i) {
-            k = 0;
-            while (k < nf) {
-                ldst_host(vd, (i + k * max_elems) << log2_esz,
-                          host + ((i * nf + k) << log2_esz));
-                k++;
+        if (nf == 1) {
+            vext_continus_ldst_host(env, ldst_host, vd, reg_last, reg_start, host,
+                                    esz, is_load);
+        } else {
+            for (i = reg_start; i < reg_last; ++i) {
+                k = 0;
+                while (k < nf) {
+                    ldst_host(vd, (i + k * max_elems) << log2_esz,
+                              host + ((i * nf + k) * esz));
+                    k++;
+                }
             }
         }
 
@@ -617,7 +692,7 @@ vext_ldst_us(void *vd, target_ulong base, CPURISCVState *env, uint32_t desc,
         reg_start = info.reg_idx_split;
         k = 0;
         while (k < nf) {
-            target_ulong addr = base + ((reg_start * nf + k) << log2_esz);
+            addr = base + ((reg_start * nf + k) << log2_esz);
             ldst_tlb(env, adjust_addr(env, addr),
                      (reg_start + k * max_elems) << log2_esz, vd, ra);
             k++;
@@ -629,12 +704,17 @@ vext_ldst_us(void *vd, target_ulong base, CPURISCVState *env, uint32_t desc,
         reg_last = info.reg_idx_last[1] + 1;
         host = info.page[1].host;
 
-        for (i = reg_start; i < reg_last; ++i) {
-            k = 0;
-            while (k < nf) {
-                ldst_host(vd, (i + k * max_elems) << log2_esz,
-                          host + ((i * nf + k) << log2_esz));
-                k++;
+        if (nf == 1) {
+            vext_continus_ldst_host(env, ldst_host, vd, reg_last, reg_start,
+                                    host, esz, is_load);
+        } else {
+            for (i = reg_start; i < reg_last; ++i) {
+                k = 0;
+                while (k < nf) {
+                    ldst_host(vd, (i + k * max_elems) << log2_esz,
+                              host + ((i * nf + k) << log2_esz));
+                    k++;
+                }
             }
         }
     }
@@ -967,20 +1047,17 @@ vext_ldst_whole(void *vd, target_ulong base, CPURISCVState *env, uint32_t desc,
          * load/store rest of elements of current segment pointed by vstart
          */
         addr = base + (reg_start << log2_esz);
-        for (; reg_start < evl; reg_start++, addr += esz) {
-            ldst_tlb(env, adjust_addr(env, addr), reg_start << log2_esz,
-                     vd, ra);
-        }
+        vext_continus_ldst_tlb(env, ldst_tlb, vd, evl, addr, reg_start, ra,
+                               esz, is_load);
         idx_nf++;
     }
 
     /* load/store elements for rest of segments */
     evl = nf * max_elems;
     addr = base + (reg_start << log2_esz);
-    for (; reg_start < evl; reg_start++, addr += esz) {
-        ldst_tlb(env, adjust_addr(env, addr), reg_start << log2_esz, vd,
-                 ra);
-    }
+    reg_start = idx_nf * max_elems;
+    vext_continus_ldst_tlb(env, ldst_tlb, vd, evl, addr, reg_start, ra,
+                           esz, is_load);
 
     env->vstart = 0;
     return;
@@ -996,17 +1073,16 @@ vext_ldst_whole(void *vd, target_ulong base, CPURISCVState *env, uint32_t desc,
         if (off) {
             /* load/store rest of elements of current segment pointed by
                vstart */
-            for (; reg_start < evl; reg_start++) {
-                ldst_host(vd, reg_start << log2_esz,
-                          host + (reg_start << log2_esz));
-            }
+            vext_continus_ldst_host(env, ldst_host, vd, evl, reg_start, host, esz,
+                                    is_load);
             idx_nf++;
         }
 
         /* load/store elements for rest of segments */
-        for (; reg_start < reg_last; reg_start++) {
-            ldst_host(vd, reg_start << log2_esz, host + (reg_start << log2_esz));
-        }
+        evl = reg_last;
+        reg_start = idx_nf * max_elems;
+        vext_continus_ldst_host(env, ldst_host, vd, evl, reg_start, host, esz,
+                                is_load);
 
         /*
         * Use the slow path to manage the cross-page misalignment.
@@ -1030,18 +1106,16 @@ vext_ldst_whole(void *vd, target_ulong base, CPURISCVState *env, uint32_t desc,
         /*
          * load/store rest of elements of current segment pointed by vstart
          */
-        for (; reg_start < evl; reg_start++) {
-            ldst_host(vd, reg_start << log2_esz,
-                      host + (reg_start << log2_esz));
-        }
+        vext_continus_ldst_host(env, ldst_host, vd, evl, reg_start, host,
+                                esz, is_load);
         idx_nf++;
     }
 
     /* load/store elements for rest of segments */
-    for (; reg_start < reg_last; reg_start++) {
-        ldst_host(vd, reg_start << log2_esz,
-                  host + (reg_start << log2_esz));
-    }
+    evl = reg_last;
+    reg_start = idx_nf * max_elems;
+    vext_continus_ldst_host(env, ldst_host, vd, evl, reg_start, host, esz,
+                            is_load);
     }
 
     env->vstart = 0;

From patchwork Thu Jun 13 14:19:06 2024
X-Patchwork-Submitter: Max Chou
X-Patchwork-Id: 1947435
From: Max Chou
To: qemu-devel@nongnu.org, qemu-riscv@nongnu.org
Cc: Richard Henderson, Paolo Bonzini, Palmer Dabbelt, Alistair Francis,
    Bin Meng, Weiwei Li, Daniel Henrique Barboza, Liu Zhiwei, Max Chou
Subject: [RFC PATCH v3 5/5] target/riscv: Inline unit-stride ld/st and
    corresponding functions for performance
Date: Thu, 13 Jun 2024 22:19:06 +0800
Message-Id: <20240613141906.1276105-6-max.chou@sifive.com>
In-Reply-To: <20240613141906.1276105-1-max.chou@sifive.com>
References: <20240613141906.1276105-1-max.chou@sifive.com>

In the vector unit-stride load/store helper functions, vext_ldst_us and
vext_ldst_whole account for most of the execution time. Inlining these
functions avoids the function call overhead and improves the performance
of the helpers.

Signed-off-by: Max Chou
---
 target/riscv/vector_helper.c | 64 +++++++++++++++++++-----------------
 1 file changed, 34 insertions(+), 30 deletions(-)

diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 09c9b231c3f..4a21064a366 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -408,20 +408,22 @@ typedef void vext_ldst_elem_fn_tlb(CPURISCVState *env, abi_ptr addr,
                                    uint32_t idx, void *vd, uintptr_t retaddr);
 typedef void vext_ldst_elem_fn_host(void *vd, uint32_t idx, void *host);
 
-#define GEN_VEXT_LD_ELEM(NAME, ETYPE, H, LDSUF)                  \
-static void NAME##_tlb(CPURISCVState *env, abi_ptr addr,         \
-                       uint32_t byte_off, void *vd, uintptr_t retaddr) \
-{                                                                \
-    uint8_t *reg = ((uint8_t *)vd + byte_off);                   \
-    ETYPE *cur = ((ETYPE *)reg);                                 \
-    *cur = cpu_##LDSUF##_data_ra(env, addr, retaddr);            \
-}                                                                \
-                                                                 \
-static void NAME##_host(void *vd, uint32_t byte_off, void *host) \
-{                                                                \
-    ETYPE val = LDSUF##_p(host);                                 \
-    uint8_t *reg = (uint8_t *)(vd + byte_off);                   \
-    *(ETYPE *)(reg) = val;                                       \
+#define GEN_VEXT_LD_ELEM(NAME, ETYPE, H, LDSUF)                  \
+static inline QEMU_ALWAYS_INLINE                                 \
+void NAME##_tlb(CPURISCVState *env, abi_ptr addr,                \
+                uint32_t byte_off, void *vd, uintptr_t retaddr)  \
+{                                                                \
+    uint8_t *reg = ((uint8_t *)vd + byte_off);                   \
+    ETYPE *cur = ((ETYPE *)reg);                                 \
+    *cur = cpu_##LDSUF##_data_ra(env, addr, retaddr);            \
+}                                                                \
+                                                                 \
+static inline QEMU_ALWAYS_INLINE                                 \
+void NAME##_host(void *vd, uint32_t byte_off, void *host)        \
+{                                                                \
+    ETYPE val = LDSUF##_p(host);                                 \
+    uint8_t *reg = (uint8_t *)(vd + byte_off);                   \
+    *(ETYPE *)(reg) = val;                                       \
 }
 
 GEN_VEXT_LD_ELEM(lde_b, uint8_t,  H1, ldub)
@@ -429,20 +431,22 @@ GEN_VEXT_LD_ELEM(lde_h, uint16_t, H2, lduw)
 GEN_VEXT_LD_ELEM(lde_w, uint32_t, H4, ldl)
 GEN_VEXT_LD_ELEM(lde_d, uint64_t, H8, ldq)
 
-#define GEN_VEXT_ST_ELEM(NAME, ETYPE, H, STSUF)                  \
-static void NAME##_tlb(CPURISCVState *env, abi_ptr addr,         \
-                       uint32_t byte_off, void *vd, uintptr_t retaddr) \
-{                                                                \
-    uint8_t *reg = ((uint8_t *)vd + byte_off);                   \
-    ETYPE data = *((ETYPE *)reg);                                \
-    cpu_##STSUF##_data_ra(env, addr, data, retaddr);             \
-}                                                                \
-                                                                 \
-static void NAME##_host(void *vd, uint32_t byte_off, void *host) \
-{                                                                \
-    uint8_t *reg = ((uint8_t *)vd + byte_off);                   \
-    ETYPE val = *(ETYPE *)(reg);                                 \
-    STSUF##_p(host, val);                                        \
+#define GEN_VEXT_ST_ELEM(NAME, ETYPE, H, STSUF)                  \
+static inline QEMU_ALWAYS_INLINE                                 \
+void NAME##_tlb(CPURISCVState *env, abi_ptr addr,                \
+                uint32_t byte_off, void *vd, uintptr_t retaddr)  \
+{                                                                \
+    uint8_t *reg = ((uint8_t *)vd + byte_off);                   \
+    ETYPE data = *((ETYPE *)reg);                                \
+    cpu_##STSUF##_data_ra(env, addr, data, retaddr);             \
+}                                                                \
+                                                                 \
+static inline QEMU_ALWAYS_INLINE                                 \
+void NAME##_host(void *vd, uint32_t byte_off, void *host)        \
+{                                                                \
+    uint8_t *reg = ((uint8_t *)vd + byte_off);                   \
+    ETYPE val = *(ETYPE *)(reg);                                 \
+    STSUF##_p(host, val);                                        \
 }
 
 GEN_VEXT_ST_ELEM(ste_b, uint8_t,  H1, stb)
@@ -604,7 +608,7 @@ GEN_VEXT_ST_STRIDE(vsse64_v, int64_t, ste_d_tlb)
  */
 
 /* unmasked unit-stride load and store operation */
-static void
+static inline QEMU_ALWAYS_INLINE void
 vext_ldst_us(void *vd, target_ulong base, CPURISCVState *env, uint32_t desc,
              vext_ldst_elem_fn_tlb *ldst_tlb,
              vext_ldst_elem_fn_host *ldst_host, uint32_t log2_esz,
@@ -1006,7 +1010,7 @@
GEN_VEXT_LDFF(vle64ff_v, int64_t, lde_d_tlb)
 /*
  * load and store whole register instructions
  */
-static void
+static inline QEMU_ALWAYS_INLINE void
 vext_ldst_whole(void *vd, target_ulong base, CPURISCVState *env, uint32_t desc,
                 vext_ldst_elem_fn_tlb *ldst_tlb,
                 vext_ldst_elem_fn_host *ldst_host, uint32_t log2_esz,