From patchwork Wed Jul 10 15:32:39 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 1958900
From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Mail-Followup-To: gcc-patches@gcc.gnu.org, richard.sandiford@arm.com
Subject: [PATCH] recog: Handle some mode-changing hardreg propagations
Date: Wed, 10 Jul 2024 16:32:39 +0100
User-Agent: Gnus/5.13 (Gnus v5.13)
List-Id: Gcc-patches mailing list

insn_propagation would previously only replace (reg:M H) with X
for some hard register H if the uses of H were also in mode M.
This patch extends it to handle simple mode punning too.

The original motivation was to try to get rid of the execution
frequency test in aarch64_split_simd_shift_p, but doing that is
follow-up work.

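To make the new condition concrete, here is a toy C++ model of the
check.  It is only a sketch and not GCC code: HardReg, can_change_mode
and replacement_for are hypothetical names invented for this
illustration, modes are plain strings, and the lowpart subreg is
modelled as a string wrapper.  The real code can also give up when
simplify_subreg returns null, which the sketch does not model.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Toy stand-in for a hard register reference such as (reg:V2DI 32).
// Hypothetical type; GCC represents this as an rtx.
struct HardReg {
  unsigned regno;    // hard register number
  std::string mode;  // machine mode, e.g. "V2DI"
  unsigned nregs;    // number of hard registers occupied in this mode
};

// Hypothetical stand-in for REG_CAN_CHANGE_MODE_P.  Assumption: every
// mode change is allowed; a real target may forbid some.
static bool can_change_mode(unsigned /*regno*/, const std::string & /*from*/,
                            const std::string & /*to*/) {
  return true;
}

// Return the value that should replace use X of register FROM, whose
// known value is TO, or nullopt if the propagation must be rejected.
std::optional<std::string> replacement_for(const HardReg &x,
                                           const HardReg &from,
                                           const std::string &to) {
  // Registers are now matched by number rather than by full equality.
  if (x.regno != from.regno)
    return std::nullopt;

  // Same mode: the pre-existing case, substitute TO directly.
  if (x.mode == from.mode)
    return to;

  // Different mode: only handle simple punning, where both modes
  // occupy the same number of hard registers and the target allows
  // the mode change.
  if (x.nregs != from.nregs || !can_change_mode(x.regno, from.mode, x.mode))
    return std::nullopt;

  // Model "the lowpart of TO in X's mode" as a textual wrapper.
  return "(lowpart:" + x.mode + " " + to + ")";
}
```

In this model a V2DF use of a register that was set in V2DI mode is
rewritten to the V2DF lowpart of the propagated value, whereas a use
that spans a different number of hard registers is left alone.
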
I tried this on at least one target per CPU directory (as for the
late-combine patches) and it seems to be a small win for all of them.

The patch includes a couple of updates to the ia32 results.
In pr105033.c, foo3 replaced:

	vmovq	8(%esp), %xmm1
	vpunpcklqdq	%xmm1, %xmm0, %xmm0

with:

	vmovhps	8(%esp), %xmm0, %xmm0

In vect-bfloat16-2b.c, 5 of the vec_extract_v32bf_* routines
(specifically the ones with nonzero even indices) replaced
things like:

	movl	28(%esp), %eax
	vmovd	%eax, %xmm0

with:

	vpinsrw	$0, 28(%esp), %xmm0, %xmm0

(These functions return a bf16, and so only the low 16 bits matter.)

Bootstrapped & regression-tested on aarch64-linux-gnu and
x86_64-linux-gnu.  OK to install?

Richard


gcc/
	* recog.cc (insn_propagation::apply_to_rvalue_1): Handle simple
	cases of hardreg propagation in which the register is set and
	used in different modes.

gcc/testsuite/
	* gcc.target/i386/pr105033.c: Expect vmovhps for the ia32 version
	of foo.
	* gcc.target/i386/vect-bfloat16-2b.c: Expect more vpinsrws.
---
 gcc/recog.cc                                  | 31 +++++++++++++++----
 gcc/testsuite/gcc.target/i386/pr105033.c      |  4 ++-
 .../gcc.target/i386/vect-bfloat16-2b.c        |  2 +-
 3 files changed, 29 insertions(+), 8 deletions(-)

diff --git a/gcc/recog.cc b/gcc/recog.cc
index 56370e40e01..36507f3f57c 100644
--- a/gcc/recog.cc
+++ b/gcc/recog.cc
@@ -1055,7 +1055,11 @@ insn_propagation::apply_to_rvalue_1 (rtx *loc)
   machine_mode mode = GET_MODE (x);
 
   auto old_num_changes = num_validated_changes ();
-  if (from && GET_CODE (x) == GET_CODE (from) && rtx_equal_p (x, from))
+  if (from
+      && GET_CODE (x) == GET_CODE (from)
+      && (REG_P (x)
+	  ? REGNO (x) == REGNO (from)
+	  : rtx_equal_p (x, from)))
     {
       /* Don't replace register asms in asm statements; we mustn't
 	 change the user's register allocation.  */
@@ -1065,11 +1069,26 @@ insn_propagation::apply_to_rvalue_1 (rtx *loc)
 	  && asm_noperands (PATTERN (insn)) > 0)
 	return false;
 
+      rtx newval = to;
+      if (GET_MODE (x) != GET_MODE (from))
+	{
+	  gcc_assert (REG_P (x) && HARD_REGISTER_P (x));
+	  if (REG_NREGS (x) != REG_NREGS (from)
+	      || !REG_CAN_CHANGE_MODE_P (REGNO (x), GET_MODE (from),
+					 GET_MODE (x)))
+	    return false;
+	  newval = simplify_subreg (GET_MODE (x), to, GET_MODE (from),
+				    subreg_lowpart_offset (GET_MODE (x),
+							   GET_MODE (from)));
+	  if (!newval)
+	    return false;
+	}
+
       if (should_unshare)
-	validate_unshare_change (insn, loc, to, 1);
+	validate_unshare_change (insn, loc, newval, 1);
       else
-	validate_change (insn, loc, to, 1);
-      if (mem_depth && !REG_P (to) && !CONSTANT_P (to))
+	validate_change (insn, loc, newval, 1);
+      if (mem_depth && !REG_P (newval) && !CONSTANT_P (newval))
 	{
 	  /* We're substituting into an address, but TO will have the
 	     form expected outside an address.  Canonicalize it if
@@ -1083,9 +1102,9 @@ insn_propagation::apply_to_rvalue_1 (rtx *loc)
 	    {
 	      /* TO is owned by someone else, so create a copy and
 		 return TO to its original form.  */
-	      rtx to = copy_rtx (*loc);
+	      newval = copy_rtx (*loc);
 	      cancel_changes (old_num_changes);
-	      validate_change (insn, loc, to, 1);
+	      validate_change (insn, loc, newval, 1);
 	    }
 	}
       num_replacements += 1;
diff --git a/gcc/testsuite/gcc.target/i386/pr105033.c b/gcc/testsuite/gcc.target/i386/pr105033.c
index ab05e3b3bc8..10e39783464 100644
--- a/gcc/testsuite/gcc.target/i386/pr105033.c
+++ b/gcc/testsuite/gcc.target/i386/pr105033.c
@@ -1,6 +1,8 @@
 /* { dg-do compile } */
 /* { dg-options "-march=sapphirerapids -O2" } */
-/* { dg-final { scan-assembler-times {vpunpcklqdq[ \t]+} 3 } } */
+/* { dg-final { scan-assembler-times {vpunpcklqdq[ \t]+} 3 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times {vpunpcklqdq[ \t]+} 2 { target ia32 } } } */
+/* { dg-final { scan-assembler-times {vmovhps[ \t]+} 1 { target ia32 } } } */
 /* { dg-final { scan-assembler-not {vpermi2[wb][ \t]+} } } */
 
 typedef _Float16 v8hf __attribute__((vector_size (16)));
diff --git a/gcc/testsuite/gcc.target/i386/vect-bfloat16-2b.c b/gcc/testsuite/gcc.target/i386/vect-bfloat16-2b.c
index 29bf601d537..0d1e14d6eb6 100644
--- a/gcc/testsuite/gcc.target/i386/vect-bfloat16-2b.c
+++ b/gcc/testsuite/gcc.target/i386/vect-bfloat16-2b.c
@@ -17,6 +17,6 @@
 /* { dg-final { scan-assembler-times "vpbroadcastw" 6 { target ia32 } } } */
 /* { dg-final { scan-assembler-times "vpblendw" 3 { target ia32 } } } */
 
-/* { dg-final { scan-assembler-times "vpinsrw" 63 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "vpinsrw" 68 { target ia32 } } } */
 
 /* { dg-final { scan-assembler-times "vpblendd" 3 } } */