From patchwork Tue Mar 5 17:52:30 2024
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 1908408
From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Subject: [pushed] aarch64: Remove SME2.1 forms of LUTI2/4
Date: Tue, 05 Mar 2024 17:52:30 +0000

I was over-eager when adding support for strided SME2 instructions
and accidentally included forms of LUTI2 and LUTI4 that are only
available with SME2.1, not SME2.  This patch removes them for now.

We're planning to add proper support for SME2.1 in the GCC 15
timeframe.

Sorry for the blunder :(

Tested on aarch64-linux-gnu & pushed.

Richard

gcc/
    * config/aarch64/aarch64.md (stride_type): Remove luti_consecutive
    and luti_strided.
    * config/aarch64/aarch64-sme.md (@aarch64_sme_lut): Remove
    stride_type attribute.
    (@aarch64_sme_lut_strided2): Delete.
    (@aarch64_sme_lut_strided4): Likewise.
    * config/aarch64/aarch64-early-ra.cc (is_stride_candidate)
    (early_ra::maybe_convert_to_strided_access): Remove support for
    strided LUTI2 and LUTI4.

gcc/testsuite/
    * gcc.target/aarch64/sme/strided_1.c (test5): Remove.
---
 gcc/config/aarch64/aarch64-early-ra.cc        | 20 +-----
 gcc/config/aarch64/aarch64-sme.md             | 70 -------------------
 gcc/config/aarch64/aarch64.md                 |  3 +-
 .../gcc.target/aarch64/sme/strided_1.c        | 55 ---------------
 4 files changed, 3 insertions(+), 145 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-early-ra.cc b/gcc/config/aarch64/aarch64-early-ra.cc
index 8530b0ae41e..1e2c823cb2e 100644
--- a/gcc/config/aarch64/aarch64-early-ra.cc
+++ b/gcc/config/aarch64/aarch64-early-ra.cc
@@ -1060,8 +1060,7 @@ is_stride_candidate (rtx_insn *insn)
     return false;
 
   auto stride_type = get_attr_stride_type (insn);
-  return (stride_type == STRIDE_TYPE_LUTI_CONSECUTIVE
-          || stride_type == STRIDE_TYPE_LD1_CONSECUTIVE
+  return (stride_type == STRIDE_TYPE_LD1_CONSECUTIVE
           || stride_type == STRIDE_TYPE_ST1_CONSECUTIVE);
 }
 
@@ -3212,8 +3211,7 @@ early_ra::maybe_convert_to_strided_access (rtx_insn *insn)
   auto stride_type = get_attr_stride_type (insn);
   rtx pat = PATTERN (insn);
   rtx op;
-  if (stride_type == STRIDE_TYPE_LUTI_CONSECUTIVE
-      || stride_type == STRIDE_TYPE_LD1_CONSECUTIVE)
+  if (stride_type == STRIDE_TYPE_LD1_CONSECUTIVE)
     op = SET_DEST (pat);
   else if (stride_type == STRIDE_TYPE_ST1_CONSECUTIVE)
     op = XVECEXP (SET_SRC (pat), 0, 1);
@@ -3263,20 +3261,6 @@ early_ra::maybe_convert_to_strided_access (rtx_insn *insn)
         XVECEXP (SET_SRC (pat), 0, XVECLEN (SET_SRC (pat), 0) - 1)
           = *recog_data.dup_loc[0];
     }
-  else if (stride_type == STRIDE_TYPE_LUTI_CONSECUTIVE)
-    {
-      auto bits = INTVAL (XVECEXP (SET_SRC (pat), 0, 4));
-      if (range.count == 2)
-        pat = gen_aarch64_sme_lut_strided2 (bits, single_mode,
-                                            regs[0], regs[1],
-                                            recog_data.operand[1],
-                                            recog_data.operand[2]);
-      else
-        pat = gen_aarch64_sme_lut_strided4 (bits, single_mode,
-                                            regs[0], regs[1], regs[2], regs[3],
-                                            recog_data.operand[1],
-                                            recog_data.operand[2]);
-    }
   else
     gcc_unreachable ();
   PATTERN (insn) = pat;
diff --git a/gcc/config/aarch64/aarch64-sme.md b/gcc/config/aarch64/aarch64-sme.md
index c95d4aa696c..78ad2fc699f 100644
--- a/gcc/config/aarch64/aarch64-sme.md
+++ b/gcc/config/aarch64/aarch64-sme.md
@@ -1939,74 +1939,4 @@ (define_insn "@aarch64_sme_lut"
   "TARGET_STREAMING_SME2
    && !( == 4 && == 4 && == 8)"
   "luti\t%0, zt0, %1[%2]"
-  [(set_attr "stride_type" "luti_consecutive")]
-)
-
-(define_insn "@aarch64_sme_lut_strided2"
-  [(set (match_operand:SVE_FULL_BHS 0 "aarch64_simd_register" "=Uwd")
-        (unspec:SVE_FULL_BHS
-          [(reg:V8DI ZT0_REGNUM)
-           (reg:DI SME_STATE_REGNUM)
-           (match_operand:VNx16QI 2 "register_operand" "w")
-           (match_operand:DI 3 "const_int_operand")
-           (const_int LUTI_BITS)
-           (const_int 0)]
-          UNSPEC_SME_LUTI))
-   (set (match_operand:SVE_FULL_BHS 1 "aarch64_simd_register" "=w")
-        (unspec:SVE_FULL_BHS
-          [(reg:V8DI ZT0_REGNUM)
-           (reg:DI SME_STATE_REGNUM)
-           (match_dup 2)
-           (match_dup 3)
-           (const_int LUTI_BITS)
-           (const_int 1)]
-          UNSPEC_SME_LUTI))]
-  "TARGET_STREAMING_SME2
-   && aarch64_strided_registers_p (operands, 2, 8)"
-  "luti\t{%0., %1.}, zt0, %2[%3]"
-  [(set_attr "stride_type" "luti_strided")]
-)
-
-(define_insn "@aarch64_sme_lut_strided4"
-  [(set (match_operand:SVE_FULL_BHS 0 "aarch64_simd_register" "=Uwt")
-        (unspec:SVE_FULL_BHS
-          [(reg:V8DI ZT0_REGNUM)
-           (reg:DI SME_STATE_REGNUM)
-           (match_operand:VNx16QI 4 "register_operand" "w")
-           (match_operand:DI 5 "const_int_operand")
-           (const_int LUTI_BITS)
-           (const_int 0)]
-          UNSPEC_SME_LUTI))
-   (set (match_operand:SVE_FULL_BHS 1 "aarch64_simd_register" "=w")
-        (unspec:SVE_FULL_BHS
-          [(reg:V8DI ZT0_REGNUM)
-           (reg:DI SME_STATE_REGNUM)
-           (match_dup 4)
-           (match_dup 5)
-           (const_int LUTI_BITS)
-           (const_int 1)]
-          UNSPEC_SME_LUTI))
-   (set (match_operand:SVE_FULL_BHS 2 "aarch64_simd_register" "=w")
-        (unspec:SVE_FULL_BHS
-          [(reg:V8DI ZT0_REGNUM)
-           (reg:DI SME_STATE_REGNUM)
-           (match_dup 4)
-           (match_dup 5)
-           (const_int LUTI_BITS)
-           (const_int 2)]
-          UNSPEC_SME_LUTI))
-   (set (match_operand:SVE_FULL_BHS 3 "aarch64_simd_register" "=w")
-        (unspec:SVE_FULL_BHS
-          [(reg:V8DI ZT0_REGNUM)
-           (reg:DI SME_STATE_REGNUM)
-           (match_dup 4)
-           (match_dup 5)
-           (const_int LUTI_BITS)
-           (const_int 3)]
-          UNSPEC_SME_LUTI))]
-  "TARGET_STREAMING_SME2
-   && !( == 4 && == 8)
-   && aarch64_strided_registers_p (operands, 4, 4)"
-  "luti\t{%0., %1., %2., %3.}, zt0, %4[%5]"
-  [(set_attr "stride_type" "luti_strided")]
 )
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 33fbe1b2e8d..7d51d923bf6 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -553,8 +553,7 @@ (define_attr "speculation_barrier" "true,false" (const_string "false"))
 ;; The RTL mapping therefore applies at LD1 granularity, rather than
 ;; being broken down into individual types of load.
 (define_attr "stride_type"
-  "none,ld1_consecutive,ld1_strided,st1_consecutive,st1_strided,
-   luti_consecutive,luti_strided"
+  "none,ld1_consecutive,ld1_strided,st1_consecutive,st1_strided"
   (const_string "none"))
 
 ;; Attribute used to identify load pair and store pair instructions.
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/strided_1.c b/gcc/testsuite/gcc.target/aarch64/sme/strided_1.c
index 3620fff3668..73aac0683ea 100644
--- a/gcc/testsuite/gcc.target/aarch64/sme/strided_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sme/strided_1.c
@@ -180,61 +180,6 @@ void test4(int32_t *dest, int32_t *src) __arm_streaming
                      svget4(l2, 3), svget4(l3, 3)));
 }
 
-/*
-** test5:
-** ptrue [^\n]+
-** ld1b [^\n]+
-** ld1b [^\n]+
-** ptrue ([^\n]+)\.s
-** ld1w [^\n]+, \1/z, \[x0\]
-** luti4 {z16\.s, z20\.s, z24\.s, z28\.s}, zt0, z[0-9]+\[0\]
-** luti4 {z17\.s, z21\.s, z25\.s, z29\.s}, zt0, z[0-9]+\[1\]
-** luti4 {z18\.s, z22\.s, z26\.s, z30\.s}, zt0, z[0-9]+\[0\]
-** luti4 {z19\.s, z23\.s, z27\.s, z31\.s}, zt0, z[0-9]+\[1\]
-** uclamp {z16\.s - z19\.s}, z[0-9]+\.s, z[0-9]+\.s
-** uclamp {z20\.s - z23\.s}, z[0-9]+\.s, z[0-9]+\.s
-** uclamp {z24\.s - z27\.s}, z[0-9]+\.s, z[0-9]+\.s
-** uclamp {z28\.s - z31\.s}, z[0-9]+\.s, z[0-9]+\.s
-** st1w {z16\.s - z19\.s}, \1, \[x0\]
-** st1w {z20\.s - z23\.s}, \1, \[x0, #4, mul vl\]
-** st1w {z24\.s - z27\.s}, \1, \[x0, #8, mul vl\]
-** st1w {z28\.s - z31\.s}, \1, \[x0, #12, mul vl\]
-** ret
-*/
-void test5(uint32_t *dest, uint8_t *indices)
-  __arm_streaming __arm_preserves("za") __arm_inout("zt0")
-{
-  svuint8_t indices1 = svld1_vnum(svptrue_b8(), indices, 0);
-  svuint8_t indices2 = svld1_vnum(svptrue_b8(), indices, 2);
-
-  svcount_t pg = svptrue_c32();
-  svuint32x4_t bounds = svld1_x4(pg, dest);
-
-  svuint32x4_t x0 = svluti4_lane_zt_u32_x4(0, indices1, 0);
-  svuint32x4_t x1 = svluti4_lane_zt_u32_x4(0, indices1, 1);
-  svuint32x4_t x2 = svluti4_lane_zt_u32_x4(0, indices2, 0);
-  svuint32x4_t x3 = svluti4_lane_zt_u32_x4(0, indices2, 1);
-
-  svuint32x4_t y0 = svcreate4(svget4(x0, 0), svget4(x1, 0),
-                              svget4(x2, 0), svget4(x3, 0));
-  svuint32x4_t y1 = svcreate4(svget4(x0, 1), svget4(x1, 1),
-                              svget4(x2, 1), svget4(x3, 1));
-  svuint32x4_t y2 = svcreate4(svget4(x0, 2), svget4(x1, 2),
-                              svget4(x2, 2), svget4(x3, 2));
-  svuint32x4_t y3 = svcreate4(svget4(x0, 3), svget4(x1, 3),
-                              svget4(x2, 3), svget4(x3, 3));
-
-  y0 = svclamp(y0, svget4(bounds, 0), svget4(bounds, 1));
-  y1 = svclamp(y1, svget4(bounds, 2), svget4(bounds, 3));
-  y2 = svclamp(y2, svget4(bounds, 0), svget4(bounds, 1));
-  y3 = svclamp(y3, svget4(bounds, 2), svget4(bounds, 3));
-
-  svst1_vnum(pg, dest, 0, y0);
-  svst1_vnum(pg, dest, 4, y1);
-  svst1_vnum(pg, dest, 8, y2);
-  svst1_vnum(pg, dest, 12, y3);
-}
-
 /*
 ** test6:
 ** ptrue [^\n]+