From patchwork Tue Feb 20 04:15:10 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandre Oliva X-Patchwork-Id: 1901190 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; secure) header.d=adacore.com header.i=@adacore.com header.a=rsa-sha256 header.s=google header.b=kwkQx0qj; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4Tf5gK4hRqz1yP6 for ; Tue, 20 Feb 2024 15:15:56 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id D46543858C31 for ; Tue, 20 Feb 2024 04:15:54 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail-pf1-x42c.google.com (mail-pf1-x42c.google.com [IPv6:2607:f8b0:4864:20::42c]) by sourceware.org (Postfix) with ESMTPS id 414763858CD1 for ; Tue, 20 Feb 2024 04:15:27 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 414763858CD1 Authentication-Results: sourceware.org; dmarc=pass (p=quarantine dis=none) header.from=adacore.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=adacore.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 414763858CD1 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2607:f8b0:4864:20::42c ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1708402530; cv=none; b=N4CFJNUixuP7ekL0hPBljOEqBRud9N/VLdRUk2uEG3WOonG8tWlGaJ36zr0/YGqW08ETNljrNmmLY9bMfbQP9kx7Ey7VpFhGB4HjXJDyAnfxELgYJcKTmziJG5vemAxwvlRIs2Fq1loRcHTrYhUQjt6GyXA8xWrK26J8EcpUx0Y= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1708402530; c=relaxed/simple; bh=EWdhiQjlbfzuXJNDM/IM82X7S9JcVH4X6HHhRVsPu5A=; h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version; b=w1/RcfHrb9zSfO3kipyeIomreopOnjBJvKt33kmEGLXuKZbfg8fcFCP5rmWhG1ntjoMqEA5/LMt62mhN79soCnv/u8DSIs9WG+5ULwN3+4JgOUa8vKFKgppHzHg3hSFqb35LNk06v5L6QOBxbqGrZ0wRAtUKdb7act1m4RlUCcc= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-pf1-x42c.google.com with SMTP id d2e1a72fcca58-6d9f94b9186so4086555b3a.0 for ; Mon, 19 Feb 2024 20:15:27 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=adacore.com; s=google; t=1708402526; x=1709007326; darn=gcc.gnu.org; h=mime-version:user-agent:message-id:date:organization:subject:cc:to :from:from:to:cc:subject:date:message-id:reply-to; bh=yvdbi8SSlEoxRRaT9gdqVK6HpVodZ7w58iwRnpiX7gI=; b=kwkQx0qjwMrR3wDYxoL1My3Qk2UqmDrVjBaZdC4EPHGmQsYbxmrgkyWsuPtoyfQ5hj +3LUrTv2Xhi1++fvNPcnI5tf+Ihl8zViBjCJYj1LqfqTkG+AwO+me02DktoTCSuSsbSy Bx6EdKH06rFDdLO1atHbQMlb089CIFPpEiozNDhrmYwXKNe5Q5rq2theVwJybhBwB+pf B3kcmJxkFGW1oRkS7Lxn/oZyNLtcyqQoLjg/BD9gWV/cqgqUPWeEgYrd7vgo08HYzTEC Gg6VHPjDA4uIpDfyt7VZcc7nuJzYrCpkPgCMNndE5VD9iz/Mao6AYr3jwAZLgCDT4cu8 LLdA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708402526; x=1709007326; h=mime-version:user-agent:message-id:date:organization:subject:cc:to :from:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=yvdbi8SSlEoxRRaT9gdqVK6HpVodZ7w58iwRnpiX7gI=; b=pNA7rt1J5mNM/EEWO21uQxko4H1DBxsZvG/uHIPOG9861SRhnX854FKJ8UiJrDngOP ely/9cYkEzyg28T/5nGKaeEMcBcebkpfneDZtWNolVJ5nth26HJPcoQEIYjezzCa7Ykd cBQe2Cop/q0DDLAYOCeE4rovsaBNcQ7Cfnqp+ywzDE4y3iFPX2NE923RvdeiXWbxQH1h 7ts1w67hb5JyUFaCZMs8fya20wjFaYRu4aPJ9AzrI/zzDwL3mOpVAjye0BxCVr45KE7b 1eIUCARyioy16IzENkGZwQiIhEbeBzFh7K0aPst0nHtz+RjoaSHthdl+MkT3egopci/Y DOmQ== X-Gm-Message-State: AOJu0YxOsgw8KSQn5T46MpvobZBotKY395Co63hrOb+OZ3BkK2YlEnZY B0Q/wHhr6QJABfLMoz0M7tfoJ26s2wyCWYn85B1dKzVa5c9cNg3gV8MRNYClvwThL2ko7m2VuNc 0wA== X-Google-Smtp-Source: AGHT+IFVwbcE31RpfbNxJBBJOaIkeTUMuS/vu3nzOhfDPx8KPBYvn7l0SbpnQB2c4VY0VhquGnZVIQ== X-Received: by 2002:a05:6a00:2e24:b0:6e1:4836:629c with SMTP id fc36-20020a056a002e2400b006e14836629cmr12870376pfb.11.1708402526020; Mon, 19 Feb 2024 20:15:26 -0800 (PST) Received: from free.home ([2804:7f1:218a:c88b:e868:4eaf:8258:c30b]) by smtp.gmail.com with ESMTPSA id y12-20020aa79e0c000000b006e3e72a4f87sm4110702pfq.0.2024.02.19.20.15.25 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 19 Feb 2024 20:15:25 -0800 (PST) Received: from livre (livre.home [172.31.160.2]) by free.home (8.15.2/8.15.2) with ESMTPS id 41K4FAui005770 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NOT); Tue, 20 Feb 2024 01:15:10 -0300 From: Alexandre Oliva To: gcc-patches@gcc.gnu.org Cc: Kito Cheng , Palmer Dabbelt , Andrew Waterman , Jim Wilson , Lehua Ding , Ju-Zhe Zhong Subject: [PATCH] RISC-V: Revert the convert from vmv.s.x to vmv.v.i Organization: Free thinker, does not speak for AdaCore Date: Tue, 20 Feb 2024 01:15:10 -0300 Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux) MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 X-Spam-Status: No, score=-11.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_ASCII_DIVIDERS, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE, WEIRD_QUOTING autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org This backport is the first of two required for the pr111935 testcase, already backported to gcc-13, to pass on riscv64-elf and riscv32-elf. The V_VLS mode iterator, used in the original patch, is not available in gcc-13, and I thought that would be too much to backport (and maybe so are these two patches, WDYT?), so I changed it to V, to match the preexisting gcc-13 pattern. Regstrapped on x86_64-linux-gnu, along with other backports, and tested manually on riscv64-elf. Ok to install? From: Lehua Ding Hi, This patch revert the convert from vmv.s.x to vmv.v.i and add new pattern optimize the special case when the scalar operand is zero. Currently, the broadcast pattern where the scalar operand is a imm will be converted to vmv.v.i from vmv.s.x and the mask operand will be converted from 00..01 to 11..11. There are some advantages and disadvantages before and after the conversion after discussing with Juzhe offline and we chose not to do this transform. Before: Advantages: The vsetvli info required by vmv.s.x has better compatibility since vmv.s.x only required SEW and VLEN be zero or one. That mean there is more opportunities to combine with other vsetlv infos in vsetvl pass. Disadvantages: For non-zero scalar imm, one more `li rd, imm` instruction will be needed. After: Advantages: No need `li rd, imm` instruction since vmv.v.i support imm operand. Disadvantages: Like before's advantages. Worse compatibility leads to more vsetvl instrunctions need. Consider the bellow C code and asm after autovec. there is an extra insn (vsetivli zero, 1, e32, m1, ta, ma) after converted vmv.s.x to vmv.v.i. ``` int foo1(int* restrict a, int* restrict b, int *restrict c, int n) { int sum = 0; for (int i = 0; i < n; i++) sum += a[i] * b[i]; return sum; } ``` asm (Before): ``` foo1: ble a3,zero,.L7 vsetvli a2,zero,e32,m1,ta,ma vmv.v.i v1,0 .L6: vsetvli a5,a3,e32,m1,tu,ma slli a4,a5,2 sub a3,a3,a5 vle32.v v2,0(a0) vle32.v v3,0(a1) add a0,a0,a4 add a1,a1,a4 vmacc.vv v1,v3,v2 bne a3,zero,.L6 vsetvli a2,zero,e32,m1,ta,ma vmv.s.x v2,zero vredsum.vs v1,v1,v2 vmv.x.s a0,v1 ret .L7: li a0,0 ret ``` asm (After): ``` foo1: ble a3,zero,.L4 vsetvli a2,zero,e32,m1,ta,ma vmv.v.i v1,0 .L3: vsetvli a5,a3,e32,m1,tu,ma slli a4,a5,2 sub a3,a3,a5 vle32.v v2,0(a0) vle32.v v3,0(a1) add a0,a0,a4 add a1,a1,a4 vmacc.vv v1,v3,v2 bne a3,zero,.L3 vsetivli zero,1,e32,m1,ta,ma vmv.v.i v2,0 vsetvli a2,zero,e32,m1,ta,ma vredsum.vs v1,v1,v2 vmv.x.s a0,v1 ret .L4: li a0,0 ret ``` Best, Lehua Co-Authored-By: Ju-Zhe Zhong gcc/ChangeLog: * config/riscv/predicates.md (vector_const_0_operand): New. * config/riscv/vector.md (*pred_broadcast_zero): Ditto. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/scalar_move-5.c: Update. * gcc.target/riscv/rvv/base/scalar_move-6.c: Ditto. (cherry picked from commit 86d80395cf3c8832b669135b1ca7ea8258790c19) --- gcc/config/riscv/predicates.md | 4 ++ gcc/config/riscv/vector.md | 43 ++++++++++++++------ .../gcc.target/riscv/rvv/base/scalar_move-5.c | 20 ++++++++- .../gcc.target/riscv/rvv/base/scalar_move-6.c | 22 ++++++++-- 4 files changed, 70 insertions(+), 19 deletions(-) diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md index 8654dbc594354..1707c80cba256 100644 --- a/gcc/config/riscv/predicates.md +++ b/gcc/config/riscv/predicates.md @@ -276,6 +276,10 @@ (define_predicate "reg_or_int_operand" (ior (match_operand 0 "register_operand") (match_operand 0 "const_int_operand"))) +(define_predicate "vector_const_0_operand" + (and (match_code "const_vector") + (match_test "satisfies_constraint_Wc0 (op)"))) + (define_predicate "vector_move_operand" (ior (match_operand 0 "nonimmediate_operand") (and (match_code "const_vector") diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md index db3a972832aea..fb0caab8da360 100644 --- a/gcc/config/riscv/vector.md +++ b/gcc/config/riscv/vector.md @@ -1217,23 +1217,24 @@ (define_expand "@pred_broadcast" (match_operand:V 2 "vector_merge_operand")))] "TARGET_VECTOR" { - /* Handle vmv.s.x instruction which has memory scalar. */ - if (satisfies_constraint_Wdm (operands[3]) || riscv_vector::simm5_p (operands[3]) - || rtx_equal_p (operands[3], CONST0_RTX (mode))) + /* Handle vmv.s.x instruction (Wb1 mask) which has memory scalar. */ + if (satisfies_constraint_Wdm (operands[3])) { if (satisfies_constraint_Wb1 (operands[1])) - { - // Case 1: vmv.s.x (TA) ==> vlse.v (TA) - if (satisfies_constraint_vu (operands[2])) - operands[1] = CONSTM1_RTX (mode); - else if (GET_MODE_BITSIZE (mode) > GET_MODE_BITSIZE (Pmode)) - { - // Case 2: vmv.s.x (TU) ==> andi vl + vlse.v (TU) in RV32 system. + { + /* Case 1: vmv.s.x (TA, x == memory) ==> vlse.v (TA) */ + if (satisfies_constraint_vu (operands[2])) + operands[1] = CONSTM1_RTX (mode); + else if (GET_MODE_BITSIZE (mode) > GET_MODE_BITSIZE (Pmode)) + { + /* Case 2: vmv.s.x (TU, x == memory) ==> + vl = 0 or 1; + vlse.v (TU) in RV32 system */ operands[4] = riscv_vector::gen_avl_for_scalar_move (operands[4]); operands[1] = CONSTM1_RTX (mode); } - else - operands[3] = force_reg (mode, operands[3]); + else + /* Case 3: load x (memory) to register. */ + operands[3] = force_reg (mode, operands[3]); } } else if (GET_MODE_BITSIZE (mode) > GET_MODE_BITSIZE (Pmode) @@ -1348,6 +1349,24 @@ (define_insn "*pred_broadcast_extended_scalar" [(set_attr "type" "vimov,vimov,vimovxv,vimovxv") (set_attr "mode" "")]) +(define_insn "*pred_broadcast_zero" + [(set (match_operand:V 0 "register_operand" "=vr, vr") + (if_then_else:V + (unspec: + [(match_operand: 1 "vector_least_significant_set_mask_operand" "Wb1, Wb1") + (match_operand 4 "vector_length_operand" " rK, rK") + (match_operand 5 "const_int_operand" " i, i") + (match_operand 6 "const_int_operand" " i, i") + (match_operand 7 "const_int_operand" " i, i") + (reg:SI VL_REGNUM) + (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE) + (match_operand:V 3 "vector_const_0_operand" "Wc0, Wc0") + (match_operand:V 2 "vector_merge_operand" " vu, 0")))] + "TARGET_VECTOR" + "vmv.s.x\t%0,zero" + [(set_attr "type" "vimovxv,vimovxv") + (set_attr "mode" "")]) + ;; ------------------------------------------------------------------------------- ;; ---- Predicated Strided loads/stores ;; ------------------------------------------------------------------------------- diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-5.c b/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-5.c index db6800c89781b..2e897a4896fec 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-5.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-5.c @@ -121,7 +121,7 @@ void foo8 (void *base, void *out, size_t vl, double x) /* ** foo9: ** ... -** vmv.v.i\tv[0-9]+,\s*-15 +** vmv.s.x\tv[0-9]+,\s*[a-x0-9]+ ** ... ** ret */ @@ -150,7 +150,7 @@ void foo10 (void *base, void *out, size_t vl) /* ** foo11: ** ... -** vmv.v.i\tv[0-9]+,\s*0 +** vmv.s.x\tv[0-9]+,\s*zero ** ... ** ret */ @@ -164,7 +164,7 @@ void foo11 (void *base, void *out, size_t vl) /* ** foo12: ** ... -** vfmv.s.f\tv[0-9]+,\s*[a-x0-9]+ +** vmv.s.x\tv[0-9]+,\s*zero ** ... ** ret */ @@ -174,3 +174,17 @@ void foo12 (void *base, void *out, size_t vl) vfloat64m2_t v = __riscv_vfmv_s_f_f64m2_tu (merge, 0, vl); *(vfloat64m2_t*)out = v; } + +/* +** foo13: +** ... +** vfmv.s.f\tv[0-9]+,\s*[a-x0-9]+ +** ... +** ret +*/ +void foo13 (void *base, void *out, size_t vl) +{ + vfloat64m2_t merge = *(vfloat64m2_t*) (base + 200); + vfloat64m2_t v = __riscv_vfmv_s_f_f64m2_tu (merge, 0.2, vl); + *(vfloat64m2_t*)out = v; +} diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-6.c b/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-6.c index f27f85cdb5866..326cfd8e2ff4b 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-6.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-6.c @@ -119,7 +119,7 @@ void foo8 (void *base, void *out, size_t vl, double x) /* ** foo9: ** ... -** vmv.v.i\tv[0-9]+,\s*-15 +** vmv.s.x\tv[0-9]+,\s*[a-x0-9]+ ** ... ** ret */ @@ -133,7 +133,7 @@ void foo9 (void *base, void *out, size_t vl) /* ** foo10: ** ... -** vmv.v.i\tv[0-9]+,\s*-15 +** vmv.s.x\tv[0-9]+,\s*[a-x0-9]+ ** ... */ void foo10 (void *base, void *out, size_t vl) @@ -147,7 +147,7 @@ void foo10 (void *base, void *out, size_t vl) /* ** foo11: ** ... -** vmv.v.i\tv[0-9]+,\s*0 +** vmv.s.x\tv[0-9]+,\s*zero ** ... ** ret */ @@ -161,7 +161,7 @@ void foo11 (void *base, void *out, size_t vl) /* ** foo12: ** ... -** vmv.v.i\tv[0-9]+,\s*0 +** vmv.s.x\tv[0-9]+,\s*zero ** ... ** ret */ @@ -172,6 +172,20 @@ void foo12 (void *base, void *out, size_t vl) *(vfloat64m2_t*)out = v; } +/* +** foo12_1: +** ... +** vfmv.s.f\tv[0-9]+,\s*[a-x0-9]+ +** ... +** ret +*/ +void foo12_1 (void *base, void *out, size_t vl) +{ + vfloat64m2_t merge = *(vfloat64m2_t*) (base + 200); + vfloat64m2_t v = __riscv_vfmv_s_f_f64m2_tu (merge, 0.2, vl); + *(vfloat64m2_t*)out = v; +} + /* ** foo13: ** ...