aarch64: Rework uxtl->zip optimisation [PR113196]

g:f26f92b534f9 implemented unsigned extensions using ZIPs rather than
UXTL{,2}, since the former has a higher throughput than the latter on
many cores.  The optimisation worked by lowering directly to ZIP during
expand, so that the zero input could be hoisted and shared.
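The equivalence behind the optimisation can be shown with a minimal C model of the 16B->8H case. Interleaving the low eight bytes of the input with a zero vector gives exactly the UXTL result. This is not GCC code, and `vreg`, `zip1_16b`, `h_lane` and `zip1_with_zero_matches_uxtl` are hypothetical names. The model assumes architectural lane numbering, where byte lane 0 is the least significant byte of the register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: a 128-bit Advanced SIMD register as 16 byte lanes,
   lane 0 being the least significant byte (architectural numbering).  */
typedef struct { uint8_t b[16]; } vreg;

/* ZIP1 Vd.16B, Vn.16B, Vm.16B: interleave the low halves of N and M.  */
static vreg zip1_16b (vreg n, vreg m)
{
  vreg d;
  for (int i = 0; i < 8; i++)
    {
      d.b[2 * i] = n.b[i];
      d.b[2 * i + 1] = m.b[i];
    }
  return d;
}

/* Read 16-bit lane I: byte lane 2I supplies the low 8 bits and
   byte lane 2I+1 the high 8 bits.  */
static uint16_t h_lane (vreg v, int i)
{
  assert (i >= 0 && i < 8);
  return (uint16_t) (v.b[2 * i] | (v.b[2 * i + 1] << 8));
}

/* Check that ZIP1 with a zero second operand matches
   UXTL Vd.8H, Vn.8B, i.e. zero-extension of the low eight bytes.  */
static bool zip1_with_zero_matches_uxtl (vreg n)
{
  vreg zero;
  memset (&zero, 0, sizeof zero);
  vreg d = zip1_16b (n, zero);
  for (int i = 0; i < 8; i++)
    if (h_lane (d, i) != (uint16_t) n.b[i])
      return false;
  return true;
}
```

Because the zero operand is the same for every such ZIP, a single MOVI can feed many of them. That sharing is what makes the ZIP form attractive.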

However, changing to ZIP means that zero extensions no longer benefit
from some existing combine patterns.  The patch included new patterns
for UADDW and USUBW, but the PR shows that other patterns were affected
as well.

This patch instead introduces the ZIPs during a pre-reload split
and forcibly hoists the zero move to the outermost scope.  This has
the disadvantage of executing the move even for a shrink-wrapped
function, which I suppose could be a problem if it causes a kernel
to trap and enable Advanced SIMD unnecessarily.  In other circumstances,
an unused move shouldn't affect things much.

Also, the RA should be able to rematerialise the move at an
appropriate point if necessary, such as if there is an intervening
call.  uxtl-combine-13.c contains a test for this.

The patch then tries to ensure that the post-RA late-combine pass
can recombine zeros and ZIPs back into UXTLs if there wasn't
sufficient use of the zero to make it worthwhile.  The cut-off
used by the patch is that 1 UXTL is better than 1 MOVI + 1 ZIP,
but that 1 MOVI + 2 ZIPs are better than 2 UXTLs (assuming all
instructions have equal execution frequency).  Any other uses of the
shared zero would count in its favour too; it's not limited to ZIPs.
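A small cost model can reproduce this cut-off. It is a sketch built on a few assumptions. A MOVI and an ordinary ZIP each cost one instruction. A ZIP that could be a UXTL{,2} costs half an instruction more, matching the aarch64_rtx_costs change in the ChangeLog. Recombination happens only when it gives a strict saving. Costs are counted in half-instruction units, and `prefer_uxtl` and the cost constants are illustrative names, not GCC's.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical costs in half-instruction units.  */
enum { MOVI_COST = 2, ZIP_COST = 2, UXTL_COST = 3 };

/* Return true if turning NZIPS zero-extending ZIPs back into NZIPS
   UXTLs is a strict saving.  If the shared zero has OTHER_USES, the
   MOVI must stay, so removing the ZIPs does not save its cost.  */
static bool prefer_uxtl (int nzips, int other_uses)
{
  assert (nzips > 0 && other_uses >= 0);
  int movi_removable = other_uses == 0 ? MOVI_COST : 0;
  int old_cost = movi_removable + nzips * ZIP_COST;
  int new_cost = nzips * UXTL_COST;
  return new_cost < old_cost;
}
```

With these numbers, one ZIP gives 3 < 4, so the UXTL wins. Two ZIPs tie at 6 vs 6, so the MOVI + ZIPs form stays. Any other use of the zero also tips the balance towards keeping the ZIPs.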

In order to do that, the patch relaxes the ZIP patterns so that
the inputs can have any mode.  This allows the V4SI zero to be
propagated into any kind of ZIP, rather than just V4SI ones.
I think that's logically consistent, since it's the mode of
the unspec that ultimately determines the mode of the operation.
(And we don't need to be overly defensive about which modes are
acceptable, since ZIPs are only generated by code that knows/ought
to know what it's doing.)

Also, the original optimisation contained a big-endian correction
that I don't think is needed/correct.  Even on big-endian targets,
we want the ZIP to take the low half of an element from the input
vector and the high half from the zero vector.  And the patterns
map directly to the underlying Advanced SIMD instructions: the use
of unspecs means that there's no need to adjust for the difference
between GCC and Arm lane numbering.
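The endianness argument can be sketched in the same hypothetical style for the ZIP2/UXTL2 case. Here lanes are defined only by bit position within the register, following the architectural numbering. The model therefore never consults the host's byte order. It assumes, as the paragraph above says, that the ZIP patterns map directly onto the architectural instructions. `vreg`, `get_lane8`, `set_lane8`, `get_lane16` and `zip2_16b` are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: a 128-bit register as two 64-bit halves.  Byte
   lane K occupies bits 8*(K%8)..8*(K%8)+7 of half K/8, so lanes are
   defined by bit position and memory endianness plays no part.  */
typedef struct { uint64_t half[2]; } vreg;

static unsigned get_lane8 (vreg v, int k)
{
  assert (k >= 0 && k < 16);
  return (unsigned) (v.half[k / 8] >> (8 * (k % 8))) & 0xff;
}

static void set_lane8 (vreg *v, int k, unsigned x)
{
  assert (k >= 0 && k < 16);
  int s = 8 * (k % 8);
  v->half[k / 8] = (v->half[k / 8] & ~((uint64_t) 0xff << s))
                   | ((uint64_t) (x & 0xff) << s);
}

/* 16-bit lane I occupies bits 16*(I%4)..16*(I%4)+15 of half I/4.  */
static unsigned get_lane16 (vreg v, int i)
{
  assert (i >= 0 && i < 8);
  return (unsigned) (v.half[i / 4] >> (16 * (i % 4))) & 0xffff;
}

/* ZIP2 Vd.16B, Vn.16B, Vm.16B: interleave the high halves of N and M,
   putting N's byte in the low half of each 16-bit element.  */
static vreg zip2_16b (vreg n, vreg m)
{
  vreg d = { { 0, 0 } };
  for (int i = 0; i < 8; i++)
    {
      set_lane8 (&d, 2 * i, get_lane8 (n, 8 + i));
      set_lane8 (&d, 2 * i + 1, get_lane8 (m, 8 + i));
    }
  return d;
}

/* Check that ZIP2 with a zero second operand matches
   UXTL2 Vd.8H, Vn.16B, i.e. zero-extension of byte lanes 8..15.  */
static bool zip2_with_zero_matches_uxtl2 (vreg n)
{
  vreg zero = { { 0, 0 } };
  vreg d = zip2_16b (n, zero);
  for (int i = 0; i < 8; i++)
    if (get_lane16 (d, i) != get_lane8 (n, 8 + i))
      return false;
  return true;
}
```

Nothing in the model depends on whether the target is big- or little-endian. This reflects the point above: the input should supply the low half of each element and the zero vector the high half, with no lane-order correction.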

Tested on aarch64-linux-gnu and aarch64_be-elf (fixing some execution
failures for the latter).  The patch depends on the late-combine pass
and on the FUNCTION_BEG patch that I just posted.  I'll commit once
those are in, if there are no objections.

Richard

gcc/
	PR target/113196
	* config/aarch64/aarch64.h (machine_function::advsimd_zero_insn):
	New member variable.
	* config/aarch64/iterators.md (Vnarrowq2): New mode attribute.
	* config/aarch64/predicates.md (aarch64_any_register_operand):
	Accept subregs too.
	* config/aarch64/aarch64-simd.md
	(aarch64_<PERMUTE:perm_insn><mode><vczle><vczbe>): Change the
	input operand predicates to aarch64_any_register_operand.
	(vec_unpacku_hi_<mode>, vec_unpacks_hi_<mode>): Recombine into...
	(vec_unpack<su>_hi_<mode>): ...this.  Move the generation of
	zip2 for zero-extends to...
	(aarch64_simd_vec_unpack<su>_hi_<mode>): ...a split of this
	instruction.  Fix big-endian handling.
	(*aarch64_zip2_uxtl2): New pattern.
	(vec_unpacku_lo_<mode>, vec_unpacks_lo_<mode>): Recombine into...
	(vec_unpack<su>_lo_<mode>): ...this.  Move the generation of
	zip1 for zero-extends to...
	(<optab><Vnarrowq><mode>2): ...a split of this instruction.
	Fix big-endian handling.
	(*aarch64_zip1_uxtl): New pattern.
	(aarch64_usubw<mode>_lo_zip, aarch64_uaddw<mode>_lo_zip): Delete.
	(aarch64_usubw<mode>_hi_zip, aarch64_uaddw<mode>_hi_zip): Likewise.
	* config/aarch64/aarch64.cc (aarch64_rtx_costs): Recognize ZIP1s
	and ZIP2s that can be implemented using UXTL{,2}.  Make them
	half an instruction more expensive than a normal zip.
	(aarch64_get_shareable_reg): New function.
	(aarch64_gen_shareable_zero): Use it.

gcc/testsuite/
	PR target/113196
	* gcc.target/aarch64/pr103350-1.c: Disable split1.
	* gcc.target/aarch64/pr103350-2.c: Likewise.
	* gcc.target/aarch64/simd/vmovl_high_1.c: Remove double include.
	Expect uxtl2 rather than zip2.
	* gcc.target/aarch64/vect_mixed_sizes_8.c: Expect zip1 rather
	than uxtl.
	* gcc.target/aarch64/vect_mixed_sizes_9.c: Likewise.
	* gcc.target/aarch64/vect_mixed_sizes_10.c: Likewise.
	* gcc.target/aarch64/uxtl-combine-7.c: New test.
	* gcc.target/aarch64/uxtl-combine-8.c: Likewise.
	* gcc.target/aarch64/uxtl-combine-9.c: Likewise.
	* gcc.target/aarch64/uxtl-combine-10.c: Likewise.
	* gcc.target/aarch64/uxtl-combine-11.c: Likewise.
	* gcc.target/aarch64/uxtl-combine-12.c: Likewise.
	* gcc.target/aarch64/uxtl-combine-13.c: Likewise.
---
 gcc/config/aarch64/aarch64-simd.md            | 157 +++++++-----------
 gcc/config/aarch64/aarch64.cc                 |  47 +++++-
 gcc/config/aarch64/aarch64.h                  |   6 +
 gcc/config/aarch64/iterators.md               |   2 +
 gcc/config/aarch64/predicates.md              |   4 +-
 gcc/testsuite/gcc.target/aarch64/pr103350-1.c |   2 +-
 gcc/testsuite/gcc.target/aarch64/pr103350-2.c |   2 +-
 .../gcc.target/aarch64/simd/vmovl_high_1.c    |   8 +-
 .../gcc.target/aarch64/uxtl-combine-10.c      |  24 +++
 .../gcc.target/aarch64/uxtl-combine-11.c      | 127 ++++++++++++++
 .../gcc.target/aarch64/uxtl-combine-12.c      | 130 +++++++++++++++
 .../gcc.target/aarch64/uxtl-combine-13.c      |  26 +++
 .../gcc.target/aarch64/uxtl-combine-7.c       | 136 +++++++++++++++
 .../gcc.target/aarch64/uxtl-combine-8.c       | 136 +++++++++++++++
 .../gcc.target/aarch64/uxtl-combine-9.c       |  32 ++++
 .../gcc.target/aarch64/vect_mixed_sizes_10.c  |   2 +-
 .../gcc.target/aarch64/vect_mixed_sizes_8.c   |   2 +-
 .../gcc.target/aarch64/vect_mixed_sizes_9.c   |   2 +-
 18 files changed, 732 insertions(+), 113 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/uxtl-combine-10.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/uxtl-combine-11.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/uxtl-combine-12.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/uxtl-combine-13.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/uxtl-combine-7.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/uxtl-combine-8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/uxtl-combine-9.c

From: Richard Sandiford <richard.sandiford@arm.com>
Date: Fri, 05 Jan 2024 16:30:48 +0000
Message-ID: <mptbk9z4u9z.fsf@arm.com>