From patchwork Thu Jun 13 09:36:18 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Sandiford X-Patchwork-Id: 1947316 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4W0HNs5fzWz20Xd for ; Thu, 13 Jun 2024 19:36:45 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id D609A3882108 for ; Thu, 13 Jun 2024 09:36:43 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id 16B6A388204B for ; Thu, 13 Jun 2024 09:36:21 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 16B6A388204B Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 16B6A388204B Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=217.140.110.172 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1718271383; cv=none; b=cMDg+2rHv7Rg0/M95qU4Z8sIsh4WMwSdYYFCgRwQSUWRVrdAWxvXgv/9HS8qkKGwg3pBTINw5aVC9w8o5G/pkJWeeNaHll2M8U3Fd2ivJIHsqmk3mTTPIzcZSABjnm4ea21bPTvea5LFZcoqAKl7Ko4epCbwQKJaWVwvwLybzDE= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1718271383; c=relaxed/simple; bh=s0yXXBevh4lYgvyas/oKU2LvyFRtHpUEoW5l034r/UY=; h=From:To:Subject:Date:Message-ID:MIME-Version; b=MMQtdL5QqYrFtob/KQ76giXqJ4VMF6u7F6sdFfhi9olvAPwvAMMv5uosMd4YJZhwQrgA3TUIusD68Qn/SchzqyUYumRRTDMBgA+gTaiZmuUjUm5ANnutv/sfGN9Odfjr/tgr6dRa2KSaqnpixxGwotXr3Ow97QjcgCIPL23qupw= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 51F5F1063; Thu, 13 Jun 2024 02:36:45 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.110.72]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 33F5A3F5A1; Thu, 13 Jun 2024 02:36:20 -0700 (PDT) From: Richard Sandiford To: gcc-patches@gcc.gnu.org Mail-Followup-To: gcc-patches@gcc.gnu.org, rguenther@suse.de, jlaw@ventanamicro.com, richard.sandiford@arm.com Cc: rguenther@suse.de, jlaw@ventanamicro.com Subject: [PATCH] aarch64: Fix invalid nested subregs [PR115464] Date: Thu, 13 Jun 2024 10:36:18 +0100 Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 X-Spam-Status: No, score=-19.9 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_NONE, KAM_DMARC_STATUS, KAM_LAZY_DOMAIN_SECURITY, KAM_SHORT, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org The testcase extracts one arm_neon.h vector from a pair (one subreg) and then reinterprets the result as an SVE vector (another subreg). Each subreg makes sense individually, but we can't fold them together into a single subreg: it's 32 bytes -> 16 bytes -> 16*N bytes, but the interpretation of 32 bytes -> 16*N bytes depends on whether N==1 or N>1. Since the second subreg makes sense individually, simplify_subreg should bail out rather than ICE on it. simplify_gen_subreg will then do the same (because it already checks validate_subreg). This leaves simplify_gen_subreg returning null, requiring the caller to take appropriate action. I think this is relatively likely to occur elsewhere, so the patch adds a helper for forcing a subreg, allowing a temporary pseudo to be created where necessary. I'll follow up by using force_subreg in more places. This patch is intended to be a minimal backportable fix for the PR. Bootstrapped & regression tested on aarch64-linux-gnu. OK for trunk and GCC 14 branch? Richard gcc/ PR target/115464 * simplify-rtx.cc (simplify_context::simplify_subreg): Don't try to fold two subregs together if their relationship isn't known at compile time. * explow.h (force_subreg): Declare. * explow.cc (force_subreg): New function. * config/aarch64/aarch64-sve-builtins-base.cc (svset_neonq_impl::expand): Use it instead of simplify_gen_subreg. gcc/testsuite/ PR target/115464 * gcc.target/aarch64/sve/acle/general/pr115464.c: New test. --- gcc/config/aarch64/aarch64-sve-builtins-base.cc | 2 +- gcc/explow.cc | 15 +++++++++++++++ gcc/explow.h | 2 ++ gcc/simplify-rtx.cc | 5 +++++ .../aarch64/sve/acle/general/pr115464.c | 13 +++++++++++++ 5 files changed, 36 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr115464.c diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc index 0d2edf3f19e..c9182594bc1 100644 --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc @@ -1174,7 +1174,7 @@ public: Advanced SIMD argument as an SVE vector. */ if (!BYTES_BIG_ENDIAN && is_undef (CALL_EXPR_ARG (e.call_expr, 0))) - return simplify_gen_subreg (mode, e.args[1], GET_MODE (e.args[1]), 0); + return force_subreg (mode, e.args[1], GET_MODE (e.args[1]), 0); rtx_vector_builder builder (VNx16BImode, 16, 2); for (unsigned int i = 0; i < 16; i++) diff --git a/gcc/explow.cc b/gcc/explow.cc index 8e5f6b8e680..f6843398c4b 100644 --- a/gcc/explow.cc +++ b/gcc/explow.cc @@ -745,6 +745,21 @@ force_reg (machine_mode mode, rtx x) return temp; } +/* Like simplify_gen_subreg, but force OP into a new register if the + subreg cannot be formed directly. */ + +rtx +force_subreg (machine_mode outermode, rtx op, + machine_mode innermode, poly_uint64 byte) +{ + rtx x = simplify_gen_subreg (outermode, op, innermode, byte); + if (x) + return x; + + op = copy_to_mode_reg (innermode, op); + return simplify_gen_subreg (outermode, op, innermode, byte); +} + /* If X is a memory ref, copy its contents to a new temp reg and return that reg. Otherwise, return X. */ diff --git a/gcc/explow.h b/gcc/explow.h index 16aa02cfb68..cbd1fcb7eb3 100644 --- a/gcc/explow.h +++ b/gcc/explow.h @@ -42,6 +42,8 @@ extern rtx copy_to_suggested_reg (rtx, rtx, machine_mode); Args are mode (in case value is a constant) and the value. */ extern rtx force_reg (machine_mode, rtx); +extern rtx force_subreg (machine_mode, rtx, machine_mode, poly_uint64); + /* Return given rtx, copied into a new temp reg if it was in memory. */ extern rtx force_not_mem (rtx); diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc index 9bc3ef9ad9f..b6bb7e1f9e9 100644 --- a/gcc/simplify-rtx.cc +++ b/gcc/simplify-rtx.cc @@ -7735,6 +7735,11 @@ simplify_context::simplify_subreg (machine_mode outermode, rtx op, poly_uint64 innermostsize = GET_MODE_SIZE (innermostmode); rtx newx; + /* Make sure that the relationship between the two subregs is + known at compile time. */ + if (!ordered_p (outersize, innermostsize)) + return NULL_RTX; + if (outermode == innermostmode && known_eq (byte, 0U) && known_eq (SUBREG_BYTE (op), 0)) diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr115464.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr115464.c new file mode 100644 index 00000000000..d728d1325ed --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr115464.c @@ -0,0 +1,13 @@ +/* { dg-options "-O2" } */ + +#include +#include +#include + +svuint16_t +convolve4_4_x (uint16x8x2_t permute_tbl) +{ + return svset_neonq_u16 (svundef_u16 (), permute_tbl.val[1]); +} + +/* { dg-final { scan-assembler {\tmov\tz0\.d, z1\.d\n} } } */