From patchwork Tue May 16 10:53:32 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 1781890
To: gcc-patches@gcc.gnu.org
Mail-Followup-To: gcc-patches@gcc.gnu.org, kyrylo.tkachov@arm.com, richard.sandiford@arm.com
Cc: kyrylo.tkachov@arm.com
Subject: [PATCH] aarch64: Allow moves after tied-register intrinsics (2nd edition)
Date: Tue, 16 May 2023 11:53:32 +0100
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
List-Id: Gcc-patches mailing list
X-Patchwork-Original-From: Richard Sandiford via Gcc-patches
From: Richard Sandiford
Reply-To: Richard Sandiford

I missed these two in g:4ff89f10ca0d41f9cfa76 because I was testing
on a system that didn't support big-endian compilation.  Testing on
aarch64_be-elf shows no other related failures (although the overall
results are worse than for little-endian).

Tested on aarch64_be-elf & pushed.

Richard


gcc/testsuite/
	* gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c: Allow moves to
	occur after the intrinsic instruction, rather than requiring them
	to happen before.
	* gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c: Likewise.
---
 .../gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c  | 10 ++++++++++
 .../gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c | 10 ++++++++++
 2 files changed, 20 insertions(+)

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c
index ae0a953f7b4..9975edb8fdb 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c
@@ -70,8 +70,13 @@ float32x4_t ufooq_lane(float32x4_t r, bfloat16x8_t x, bfloat16x4_t y)
 
 /*
 **ufoo_untied:
+** (
 ** mov v0.8b, v1.8b
 ** bfdot v0.2s, (v2.4h, v3.4h|v3.4h, v2.4h)
+** |
+** bfdot v1.2s, (v2.4h, v3.4h|v3.4h, v2.4h)
+** mov v0.8b, v1.8b
+** )
 ** ret
 */
 float32x2_t ufoo_untied(float32x4_t unused, float32x2_t r, bfloat16x4_t x, bfloat16x4_t y)
@@ -81,8 +86,13 @@ float32x2_t ufoo_untied(float32x4_t unused, float32x2_t r, bfloat16x4_t x, bfloa
 
 /*
 **ufooq_lane_untied:
+** (
 ** mov v0.16b, v1.16b
 ** bfdot v0.4s, v2.8h, v3.2h\[1\]
+** |
+** bfdot v1.4s, v2.8h, v3.2h\[1\]
+** mov v0.16b, v1.16b
+** )
 ** ret
 */
 float32x4_t ufooq_lane_untied(float32x4_t unused, float32x4_t r, bfloat16x8_t x, bfloat16x4_t y)
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c
index 61c7c51f5ec..76787f6bedd 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c
@@ -115,8 +115,13 @@ int32x4_t sfooq_laneq (int32x4_t r, int8x16_t x, uint8x16_t y)
 
 /*
 **ufoo_untied:
+** (
 ** mov v0\.8b, v1\.8b
 ** usdot v0\.2s, v2\.8b, v3\.8b
+** |
+** usdot v1\.2s, v2\.8b, v3\.8b
+** mov v0\.8b, v1\.8b
+** )
 ** ret
 */
 int32x2_t ufoo_untied (int32x2_t unused, int32x2_t r, uint8x8_t x, int8x8_t y)
@@ -126,8 +131,13 @@ int32x2_t ufoo_untied (int32x2_t unused, int32x2_t r, uint8x8_t x, int8x8_t y)
 
 /*
 **ufooq_laneq_untied:
+** (
 ** mov v0\.16b, v1\.16b
 ** usdot v0\.4s, v2\.16b, v3\.4b\[3\]
+** |
+** usdot v1\.4s, v2\.16b, v3\.4b\[3\]
+** mov v0\.16b, v1\.16b
+** )
 ** ret
 */
 int32x4_t ufooq_laneq_untied (int32x2_t unused, int32x4_t r, uint8x16_t x, int8x16_t y)
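
For readers unfamiliar with the "** (", "** |", "** )" lines added above, here is a rough Python sketch of how such an alternation could accept either instruction order. It is only an illustration, not the testsuite's actual Tcl implementation: the names body_to_regex and body_matches are hypothetical, the whitespace handling is a simplifying assumption, and the assembly strings are made-up examples of the two orderings for the vdot-3-2.c ufoo_untied pattern.

```python
import re

def body_to_regex(lines):
    """Build one regex from simplified expected-body lines (hypothetical helper).

    "(", "|" and ")" on their own open, separate and close a group of
    alternatives; every other line is a regex for one line of assembly.
    """
    parts = []
    for line in lines:
        text = line.strip()
        if text == "(":
            parts.append("(?:")
        elif text == "|":
            parts.append("|")
        elif text == ")":
            parts.append(")")
        else:
            # Assumption: any run of whitespace in the pattern matches any
            # run of whitespace (tabs or spaces) in the assembly.
            parts.append(r"\s*" + r"\s+".join(text.split()) + r"\n")
    return re.compile("".join(parts))

def body_matches(lines, asm):
    """Return True if the whole assembly body matches the expected lines."""
    return body_to_regex(lines).fullmatch(asm) is not None

# Expected body for ufoo_untied in vdot-3-2.c, as changed by the patch.
EXPECTED = [
    "(",
    r"mov v0\.8b, v1\.8b",
    r"usdot v0\.2s, v2\.8b, v3\.8b",
    "|",
    r"usdot v1\.2s, v2\.8b, v3\.8b",
    r"mov v0\.8b, v1\.8b",
    ")",
    "ret",
]

# Illustrative compiler outputs (not taken from a real build).
MOVE_FIRST = "\tmov\tv0.8b, v1.8b\n\tusdot\tv0.2s, v2.8b, v3.8b\n\tret\n"
MOVE_LAST = "\tusdot\tv1.2s, v2.8b, v3.8b\n\tmov\tv0.8b, v1.8b\n\tret\n"
NO_MOVE = "\tusdot\tv0.2s, v2.8b, v3.8b\n\tret\n"

if __name__ == "__main__":
    for name, asm in [("move first", MOVE_FIRST),
                      ("move last", MOVE_LAST),
                      ("no move", NO_MOVE)]:
        print(name, body_matches(EXPECTED, asm))
```

Both orderings are accepted, while a body with no move at all is still rejected, which is the intent of the ChangeLog entries: the move may come after the intrinsic instruction but is not optional.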