@@ -55,6 +55,7 @@ static const struct default_options aarch_option_optimization_table[] =
{ OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 },
/* Enable redundant extension instructions removal at -O2 and higher. */
{ OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 },
+ { OPT_LEVELS_2_PLUS, OPT_mearly_ra_, NULL, AARCH64_EARLY_RA_ALL },
#if (TARGET_DEFAULT_ASYNC_UNWIND_TABLES == 1)
{ OPT_LEVELS_ALL, OPT_fasynchronous_unwind_tables, NULL, 1 },
{ OPT_LEVELS_ALL, OPT_funwind_tables, NULL, 1},
@@ -349,7 +349,7 @@ aarch64*-*-*)
c_target_objs="aarch64-c.o"
cxx_target_objs="aarch64-c.o"
d_target_objs="aarch64-d.o"
- extra_objs="aarch64-builtins.o aarch-common.o aarch64-sve-builtins.o aarch64-sve-builtins-shapes.o aarch64-sve-builtins-base.o aarch64-sve-builtins-sve2.o aarch64-sve-builtins-sme.o cortex-a57-fma-steering.o aarch64-speculation.o falkor-tag-collision-avoidance.o aarch-bti-insert.o aarch64-cc-fusion.o"
+ extra_objs="aarch64-builtins.o aarch-common.o aarch64-sve-builtins.o aarch64-sve-builtins-shapes.o aarch64-sve-builtins-base.o aarch64-sve-builtins-sve2.o aarch64-sve-builtins-sme.o cortex-a57-fma-steering.o aarch64-speculation.o falkor-tag-collision-avoidance.o aarch-bti-insert.o aarch64-cc-fusion.o aarch64-early-ra.o"
target_gtfiles="\$(srcdir)/config/aarch64/aarch64-builtins.cc \$(srcdir)/config/aarch64/aarch64-sve-builtins.h \$(srcdir)/config/aarch64/aarch64-sve-builtins.cc"
target_has_targetm_common=yes
;;
new file mode 100644
@@ -0,0 +1,3423 @@
+// Early register allocation pass.
+// Copyright (C) 2023 Free Software Foundation, Inc.
+//
+// This file is part of GCC.
+//
+// GCC is free software; you can redistribute it and/or modify it under
+// the terms of the GNU General Public License as published by the Free
+// Software Foundation; either version 3, or (at your option) any later
+// version.
+//
+// GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+// WARRANTY; without even the implied warranty of MERCHANTABILITY or
+// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+// for more details.
+//
+// You should have received a copy of the GNU General Public License
+// along with GCC; see the file COPYING3. If not see
+// <http://www.gnu.org/licenses/>.
+
+// This pass implements a simple form of early register allocation.
+// It is restricted to FP/SIMD registers, and it only allocates
+// a region of FP/SIMD usage if it can do so without any spilling.
+// It punts on anything too complicated, leaving it to the real
+// register allocator.
+//
+// There are two main purposes:
+//
+// (1) The pass runs before scheduling. It therefore has a chance to
+//     bag a spill-free allocation, if there is one, before scheduling
+//     moves things around.
+//
+// (2) The pass can make use of strided register operations, such as the
+//     strided forms of LD1 and ST1 in SME2.
+//
+// The allocator works at the level of individual FPRs, rather than whole
+// pseudo registers. It is mostly intended to help optimize ACLE code.
+//
+// The pass is very simplistic. There are many things that could be improved.
+
+#define IN_TARGET_CODE 1
+
+#define INCLUDE_ALGORITHM
+#define INCLUDE_FUNCTIONAL
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "rtl.h"
+#include "df.h"
+#include "rtl-ssa.h"
+#include "tree-pass.h"
+#include "target.h"
+#include "expr.h"
+#include "cfgrtl.h"
+#include "print-rtl.h"
+#include "insn-attr.h"
+#include "insn-opinit.h"
+#include "reload.h"
+
+template<typename T>
+class simple_iterator : public wrapper_iterator<T>
+{
+public:
+ using wrapper_iterator<T>::wrapper_iterator;
+
+ simple_iterator &operator-- () { --this->m_contents; return *this; }
+ simple_iterator operator-- (int) { return this->m_contents--; }
+ simple_iterator &operator++ () { ++this->m_contents; return *this; }
+ simple_iterator operator++ (int) { return this->m_contents++; }
+};
+
+using namespace rtl_ssa;
+
+namespace {
+const pass_data pass_data_early_ra =
+{
+ RTL_PASS, // type
+ "early_ra", // name
+ OPTGROUP_NONE, // optinfo_flags
+ TV_NONE, // tv_id
+ 0, // properties_required
+ 0, // properties_provided
+ 0, // properties_destroyed
+ 0, // todo_flags_start
+ TODO_df_finish, // todo_flags_finish
+};
+
+using allocno_iterator = simple_iterator<unsigned int>;
+
+// Class that represents one run of the pass.
+class early_ra
+{
+public:
+ early_ra (function *fn);
+ ~early_ra ();
+ void execute ();
+
+private:
+ static_assert (MAX_RECOG_OPERANDS <= 32, "Operand mask is 32 bits");
+ using operand_mask = uint32_t;
+
+ // Points in the function are represented using "program points".
+ // The program points are allocated in reverse order, with smaller
+ // numbers indicating later points. These special values indicate
+ // the start and end of a region.
+ static constexpr unsigned int START_OF_REGION = ~0U;
+ static constexpr unsigned int END_OF_REGION = 0U;
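  //
  // For example, when a block containing insns I1; I2; I3 is walked
  // backwards, the references in I3 are given smaller program points
  // than those in I2, and I2's are smaller than I1's. An allocno's
  // live range [START_POINT, END_POINT] therefore has
  // START_POINT >= END_POINT.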
+
+ // An invalid allocno index, used to represent no allocno.
+ static constexpr unsigned int INVALID_ALLOCNO = ~0U;
+
+ // Enumerates the single FPR sizes that matter for register allocation.
+ // Anything smaller than 64 bits is treated as FPR_D.
+ enum fpr_size_info
+ {
+ FPR_D,
+ FPR_Q,
+ FPR_Z
+ };
+
+ // A live range for an FPR, containing program points [START_POINT,
+ // END_POINT]. If ALLOCNO is not INVALID_ALLOCNO, the FPR is known
+ // to be equal to ALLOCNO for the duration of the live range.
+ struct fpr_range_info
+ {
+ unsigned int start_point;
+ unsigned int end_point;
+ unsigned int allocno;
+ };
+
+ // Flags used in pseudo_reg_info.
+ //
+  // Whether the pseudo register occurs in at least one instruction
+  // alternative that matches (respectively) V0-V7, V0-V15, V0-V31
+  // or a non-FP register.
+ static constexpr unsigned int ALLOWS_FPR8 = 1U << 0;
+ static constexpr unsigned int ALLOWS_FPR16 = 1U << 1;
+ static constexpr unsigned int ALLOWS_FPR32 = 1U << 2;
+ static constexpr unsigned int ALLOWS_NONFPR = 1U << 3;
+ //
+ // Likewise whether the register occurs in an instruction that requires
+ // the associated register type.
+ static constexpr unsigned int NEEDS_FPR8 = 1U << 4;
+ static constexpr unsigned int NEEDS_FPR16 = 1U << 5;
+ static constexpr unsigned int NEEDS_FPR32 = 1U << 6;
+ static constexpr unsigned int NEEDS_NONFPR = 1U << 7;
+ //
+ // Whether the pseudo register is copied to or from a hard FP register.
+ static constexpr unsigned int HAS_FPR_COPY = 1U << 8;
+ //
+ // Whether the pseudo register is copied to or from a hard non-FP register.
+ static constexpr unsigned int HAS_NONFPR_COPY = 1U << 9;
+ //
+ // Whether the pseudo register is used as a multi-register vector operand
+ // to an instruction that supports strided accesses, and whether it is used
+ // as a multi-register vector operand in some other non-move instruction.
+ static constexpr unsigned int HAS_FLEXIBLE_STRIDE = 1U << 10;
+ static constexpr unsigned int HAS_FIXED_STRIDE = 1U << 11;
+
+ // Flags that should be propagated across moves between pseudo registers.
+ static constexpr unsigned int PSEUDO_COPY_FLAGS = ~(HAS_FLEXIBLE_STRIDE
+ | HAS_FIXED_STRIDE);
+
+ // Information about a copy between two registers.
+ struct reg_copy_info
+ {
+ // The two registers, in order.
+ unsigned int regnos[2];
+
+ // Index I gives the index of the next reg_copy_info involving REGNOS[I],
+ // or 0 if none.
+ unsigned int next_copies[2];
+ };
+
+ // Information about a pseudo register.
+ struct pseudo_reg_info
+ {
+ // Flags describing how the register is used, defined above.
+ unsigned int flags : 16;
+
+ // The mode of the pseudo register, cached for convenience.
+ machine_mode mode : 16;
+
+ // The index of the first copy, or 0 if none.
+ unsigned int first_copy;
+ };
+
+ // Information about a group of allocnos that have a fixed offset
+ // relative to each other. The allocnos in each group must be allocated
+ // together.
+ //
+ // Allocnos that can share the same hard register are eventually
+ // chained together. These chains represent edges on a graph of
+ // allocnos, such that two allocnos joined by an edge use the same FPR.
+ // These chains are formed between individual allocnos rather than
+ // whole groups, although the system is required to be self-consistent.
+ // Each clique in the graph has at least one "full-width" allocno group
+ // that has one allocno for every FPR that needs to be allocated to
+ // the clique.
+ //
+ // One group of allocnos is chosen as the "color representative" of
+ // each clique in the graph. This group will be a full-width group.
+ struct allocno_info;
+ struct allocno_group_info
+ {
+ array_slice<unsigned int> chain_heads ();
+ array_slice<allocno_info> allocnos ();
+ allocno_group_info *color_rep ();
+ allocno_info *allocno (unsigned int);
+
+ // The color representative of the containing clique.
+ allocno_group_info *m_color_rep;
+
+    // The pseudo register associated with this allocno group,
+    // or INVALID_REGNUM if none.
+ unsigned int regno;
+
+ // The offset of the first allocno (and thus this group) from the start
+ // of color_rep.
+ unsigned int color_rep_offset : 8;
+
+ // The number of allocnos in the group, and thus the number of FPRs
+ // that need to be allocated.
+ unsigned int size : 8;
+
+ // The gap between FPRs in the group. This is normally 1, but can be
+ // higher if we've decided to use strided multi-register accesses.
+ unsigned int stride : 4;
+
+ // Used temporarily while deciding which allocnos should have non-unit
+ // strides; see find_strided_accesses for details.
+ int consecutive_pref : 4;
+ int strided_polarity : 2;
+
+ // The largest size of FPR needed by references to the allocno group.
+ fpr_size_info fpr_size : 2;
+
+ // True if all non-move accesses can be converted to strided form.
+ unsigned int has_flexible_stride : 1;
+
+ // True if we've assigned a color index to this group.
+ unsigned int has_color : 1;
+
+ // The mask of FPRs that would make valid choices for the first allocno,
+ // taking the requirements of all the allocnos in the group into account.
+ unsigned int fpr_candidates;
+
+ // The index of the color that has been assigned to the containing clique.
+ unsigned int color;
+ };
+
+ // Represents a single FPR-sized quantity that needs to be allocated.
+ // Each allocno is identified by index (for compactness).
+ //
+ // Quantities that span multiple FPRs are assigned groups of consecutive
+ // allocnos. Quantities that occupy a single FPR are assigned their own
+ // group.
+ struct allocno_info
+ {
+ allocno_group_info *group ();
+
+ // The allocno's unique identifier.
+ unsigned int id;
+
+ // The offset of this allocno into the containing group.
+ unsigned int offset : 8;
+
+ // The number of allocnos in the containing group.
+ unsigned int group_size : 8;
+
+ // If the allocno has an affinity with at least one hard register
+ // (so that choosing that hard register would avoid a copy), this is
+ // the number of one such hard register, otherwise it is
+ // FIRST_PSEUDO_REGISTER.
+ unsigned int hard_regno : 8;
+
+ // Set to 1 if the allocno has a single definition or 2 if it has more.
+ unsigned int num_defs : 2;
+
+ // True if, at START_POINT, another allocno is copied to this one.
+ // See callers of record_copy for what counts as a copy.
+ unsigned int is_copy_dest : 1;
+
+ // True if, at START_POINT, another allocno is copied to this one,
+ // and if the allocnos at both ends of the copy chain have an affinity
+ // with the same hard register.
+ unsigned int is_strong_copy_dest : 1;
+
+ // True if, at END_POINT, this allocno is copied to another one,
+ // and both allocnos have an affinity with the same hard register.
+ unsigned int is_strong_copy_src : 1;
+
+ // True if the allocno is subject to an earlyclobber at END_POINT,
+ // so that it cannot be tied to the destination of the instruction.
+ unsigned int is_earlyclobbered : 1;
+
+ // The inclusive range of program points spanned by the allocno.
+ // START_POINT >= END_POINT.
+ unsigned int start_point;
+ unsigned int end_point;
+
+ // If, at END_POINT, this allocno is copied to another allocno, this
+ // is the index of that allocno, otherwise it is INVALID_ALLOCNO.
+ // See callers of record_copy for what counts as a copy.
+ unsigned int copy_dest;
+
+ // If this field is not INVALID_ALLOCNO, this allocno is known to be
+ // equivalent to EQUIV_ALLOCNO for the whole of this allocno's lifetime.
+ unsigned int equiv_allocno;
+
+ // The next chained allocno in program order (i.e. at lower program
+ // points), or INVALID_ALLOCNO if none.
+ unsigned int chain_next;
+
+ // The previous chained allocno in program order (i.e. at higher
+ // program points), or INVALID_ALLOCNO if none.
+ unsigned int chain_prev;
+ };
+
+ // Information about a full allocno group or a subgroup of it.
+ // The subgroup can be empty to indicate "none".
+ struct allocno_subgroup
+ {
+ array_slice<allocno_info> allocnos ();
+ allocno_info *allocno (unsigned int);
+
+ // True if a subgroup is present.
+ operator bool () const { return count; }
+
+ // The containing group.
+ allocno_group_info *group;
+
+ // The offset of the subgroup from the start of GROUP.
+ unsigned int start;
+
+ // The number of allocnos in the subgroup.
+ unsigned int count;
+ };
+
+ // Represents information about a copy between an allocno and an FPR.
+ // This establishes an affinity between the allocno and the FPR.
+ struct allocno_copy_info
+ {
+ // The allocno involved in the copy.
+ unsigned int allocno;
+
+ // The FPR involved in the copy, relative to V0_REGNUM.
+ unsigned int fpr : 16;
+
+ // A measure of how strong the affinity between the allocno and FPR is.
+ unsigned int weight : 16;
+ };
+
+ // Information about a possible allocno chain.
+ struct chain_candidate_info
+ {
+ // The candidate target allocno.
+ allocno_info *allocno;
+
+ // A rating of the candidate (higher is better).
+ int score;
+ };
+
+ // Information about an allocno color.
+ struct color_info
+ {
+ // The color's unique identifier.
+ int id;
+
+ // The allocated hard register, when known.
+ unsigned int hard_regno;
+
+ // The clique's representative group.
+ allocno_group_info *group;
+
+ // Weights in favor of choosing each FPR as the first register for GROUP.
+ int8_t fpr_preferences[32];
+ };
+
+ template<typename T, typename... Ts>
+ T *region_allocate (Ts...);
+
+ allocno_info *chain_prev (allocno_info *);
+ allocno_info *chain_next (allocno_info *);
+
+ void dump_pseudo_regs ();
+ void dump_fpr_ranges ();
+ void dump_copies ();
+ void dump_allocnos ();
+ void dump_colors ();
+
+ iterator_range<allocno_iterator> get_group_allocnos (unsigned int);
+
+ void preprocess_move (rtx, rtx);
+ void process_pseudo_reg_constraints (rtx_insn *);
+ void preprocess_insns ();
+
+ int fpr_preference (unsigned int);
+ void propagate_pseudo_reg_info ();
+
+ void choose_fpr_pseudos ();
+
+ void start_new_region ();
+
+ allocno_group_info *create_allocno_group (unsigned int, unsigned int);
+ allocno_subgroup get_allocno_subgroup (rtx);
+ void record_fpr_use (unsigned int);
+ void record_fpr_def (unsigned int);
+ void record_allocno_use (allocno_info *);
+ void record_allocno_def (allocno_info *);
+ void record_copy (rtx, rtx, bool = false);
+ void record_constraints (rtx_insn *);
+ void record_artificial_refs (unsigned int);
+ void record_insn_refs (rtx_insn *);
+
+ bool consider_strong_copy_src_chain (allocno_info *);
+ int strided_polarity_pref (allocno_info *, allocno_info *);
+ void find_strided_accesses ();
+
+ template<unsigned int allocno_info::*field>
+ static int cmp_increasing (const void *, const void *);
+ bool is_chain_candidate (allocno_info *, allocno_info *);
+ int rate_chain (allocno_info *, allocno_info *);
+ static int cmp_chain_candidates (const void *, const void *);
+ void chain_allocnos (unsigned int &, unsigned int &);
+ void set_single_color_rep (allocno_info *, allocno_group_info *,
+ unsigned int);
+ void set_color_rep (allocno_group_info *, allocno_group_info *,
+ unsigned int);
+ bool try_to_chain_allocnos (allocno_info *, allocno_info *);
+ void create_color (allocno_group_info *);
+ void form_chains ();
+
+ bool fpr_conflicts_with_allocno_p (unsigned int, allocno_info *);
+ bool call_in_range_p (unsigned int, unsigned int, unsigned int);
+ unsigned int partial_fpr_clobbers (unsigned int, fpr_size_info);
+
+ void process_copies ();
+
+ static int cmp_decreasing_size (const void *, const void *);
+ void allocate_colors ();
+ allocno_info *find_independent_subchain (allocno_info *);
+ color_info *find_oldest_color (unsigned int, unsigned int);
+ void broaden_colors ();
+ void finalize_allocation ();
+
+ bool replace_regs (df_ref);
+ int try_enforce_constraints (rtx_insn *, vec<std::pair<int, int>> &);
+ void enforce_constraints (rtx_insn *);
+ bool maybe_convert_to_strided_access (rtx_insn *);
+ void apply_allocation ();
+
+ void process_region ();
+ bool is_dead_insn (rtx_insn *);
+ void process_block (basic_block, bool);
+ void process_blocks ();
+
+ // ----------------------------------------------------------------------
+
+ // The function we're operating on.
+ function *m_fn;
+
+ // Information about each pseudo register, indexed by REGNO.
+ auto_vec<pseudo_reg_info> m_pseudo_regs;
+
+ // All recorded register copies.
+ auto_vec<reg_copy_info> m_pseudo_reg_copies;
+
+ // The set of pseudos that we've decided to allocate an FPR to.
+ auto_bitmap m_fpr_pseudos;
+
+ // ----------------------------------------------------------------------
+
+ // An obstack for allocating information that is referenced by the member
+ // variables below.
+ obstack m_region_obstack;
+ void *m_region_alloc_start;
+
+ // ----------------------------------------------------------------------
+
+ // The basic block that we're currently processing.
+ basic_block m_current_bb;
+
+ // The program point that we're currently processing (described above).
+ unsigned int m_current_point;
+
+ // The set of allocnos that are currently live.
+ auto_bitmap m_live_allocnos;
+
+ // The set of FPRs that are currently live.
+ unsigned int m_live_fprs;
+
+ // ----------------------------------------------------------------------
+
+ // A mask of the FPRs that have already been allocated.
+ unsigned int m_allocated_fprs;
+
+ // A mask of the FPRs that must be at least partially preserved by the
+ // current function.
+ unsigned int m_call_preserved_fprs;
+
+ // True if we haven't yet failed to allocate the current region.
+ bool m_allocation_successful;
+
+ // A map from pseudo registers to the first allocno in their associated
+ // allocno groups.
+ hash_map<int_hash<unsigned int, INVALID_REGNUM>,
+ allocno_group_info *> m_regno_to_group;
+
+ // All recorded copies between allocnos and FPRs.
+ auto_vec<allocno_copy_info> m_allocno_copies;
+
+ // All allocnos, by index.
+ auto_vec<allocno_info *> m_allocnos;
+
+ // All allocnos, by increasing START_POINT.
+ auto_vec<allocno_info *> m_sorted_allocnos;
+
+ // All colors, by index.
+ auto_vec<color_info *> m_colors;
+
+ // The instruction ranges that make up the current region,
+ // as half-open ranges [LAST, FIRST).
+ auto_vec<std::pair<rtx_insn *, rtx_insn *>> m_insn_ranges;
+
+ // The live ranges of each FPR, in order of increasing program point.
+ auto_vec<fpr_range_info> m_fpr_ranges[32];
+
+ // For each function call id, a list of program points at which a call
+ // to such a function is made. Each list is in order of increasing
+ // program point.
+ auto_vec<unsigned int> m_call_points[NUM_ABI_IDS];
+
+ // A list of instructions that can be removed if allocation succeeds.
+ auto_vec<rtx_insn *> m_dead_insns;
+};
+
+// Return true if PAT is something that would typically be treated as a move.
+static inline bool
+is_move_set (rtx pat)
+{
+ if (GET_CODE (pat) != SET)
+ return false;
+
+ rtx dest = SET_DEST (pat);
+ if (SUBREG_P (dest))
+ dest = SUBREG_REG (dest);
+ if (!OBJECT_P (dest))
+ return false;
+
+ rtx src = SET_SRC (pat);
+ if (SUBREG_P (src))
+ src = SUBREG_REG (src);
+ if (!OBJECT_P (src) && !CONSTANT_P (src))
+ return false;
+
+ return true;
+}
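+
+// For instance, is_move_set accepts a plain register-to-register set
+// such as
+//
+//   (set (reg:V4SI 92) (reg:V4SI 93))
+//
+// but rejects something like (set (reg:V4SI 92) (plus ...)), since
+// a PLUS source is neither an object nor a constant.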
+
+// Return true if operand OP is likely to match OP_ALT after register
+// allocation.
+static bool
+likely_operand_match_p (const operand_alternative &op_alt, rtx op)
+{
+ // Empty constraints match everything.
+ const char *constraint = op_alt.constraint;
+ if (constraint[0] == 0 || constraint[0] == ',')
+ return true;
+
+ for (;;)
+ {
+ char c = *constraint;
+ int len = CONSTRAINT_LEN (c, constraint);
+ if (c == 0 || c == ',')
+ break;
+
+ if (c == 'X')
+ return true;
+
+ auto cn = lookup_constraint (constraint);
+ if (REG_P (op) || SUBREG_P (op))
+ {
+ if (insn_extra_register_constraint (cn))
+ return true;
+ }
+ else if (MEM_P (op))
+ {
+ if (insn_extra_memory_constraint (cn))
+ return true;
+ }
+ else
+ {
+ if (!insn_extra_memory_constraint (cn)
+ && constraint_satisfied_p (op, cn))
+ return true;
+ }
+
+ constraint += len;
+ }
+
+ if (op_alt.matches >= 0)
+ {
+ rtx other = recog_data.operand[op_alt.matches];
+ if ((REG_P (other) || SUBREG_P (other))
+ && (REG_P (op) || SUBREG_P (op)))
+ return true;
+ }
+ return false;
+}
+
+// Return true if the operands of the current instruction are likely to
+// match OP_ALT.
+static bool
+likely_alternative_match_p (const operand_alternative *op_alt)
+{
+ for (int i = 0; i < recog_data.n_operands; ++i)
+ if (!likely_operand_match_p (op_alt[i], recog_data.operand[i]))
+ return false;
+ return true;
+}
+
+// Return a measure of how disparaged OP_ALT is: the sum of the reject
+// weights of its operands.
+static int
+count_rejects (const operand_alternative *op_alt)
+{
+ int reject = 0;
+ for (int opno = 0; opno < recog_data.n_operands; ++opno)
+ reject += op_alt[opno].reject;
+ return reject;
+}
+
+// Allocate a T from the region obstack.
+template<typename T, typename... Ts>
+inline T *
+early_ra::region_allocate (Ts... args)
+{
+ static_assert (std::is_trivially_destructible<T>::value,
+ "destructor won't be called");
+ void *addr = obstack_alloc (&m_region_obstack, sizeof (T));
+ return new (addr) T (std::forward<Ts> (args)...);
+}
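+
+// For example, assuming a color_info is to live on the region obstack:
+//
+//   auto *color = region_allocate<color_info> ();
+//
+// The static_assert above matters because obstack_free never runs the
+// destructors of individual objects allocated in this way.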
+
+early_ra::early_ra (function *fn) : m_fn (fn), m_live_fprs (0)
+{
+ gcc_obstack_init (&m_region_obstack);
+ m_region_alloc_start = obstack_alloc (&m_region_obstack, 0);
+ bitmap_tree_view (m_live_allocnos);
+}
+
+early_ra::~early_ra ()
+{
+ obstack_free (&m_region_obstack, nullptr);
+}
+
+// Return an array that, for each allocno A in the group, contains the index
+// of the allocno at the head of A's chain (that is, the one with the highest
+// START_POINT). The index is INVALID_ALLOCNO if the chain is empty.
+inline array_slice<unsigned int>
+early_ra::allocno_group_info::chain_heads ()
+{
+ auto *start = reinterpret_cast<unsigned int *> (this + 1);
+ return { start, size };
+}
+
+// Return the array of allocnos in the group.
+inline array_slice<early_ra::allocno_info>
+early_ra::allocno_group_info::allocnos ()
+{
+ gcc_checking_assert (regno != INVALID_REGNUM);
+ auto *chain_end = reinterpret_cast<unsigned int *> (this + 1) + size;
+ auto *allocno_start = reinterpret_cast<allocno_info *> (chain_end);
+ return { allocno_start, size };
+}
+
+// Return the group's color representative.
+inline early_ra::allocno_group_info *
+early_ra::allocno_group_info::color_rep ()
+{
+ gcc_checking_assert (m_color_rep->m_color_rep == m_color_rep);
+ return m_color_rep;
+}
+
+// Return the group that contains the allocno.
+inline early_ra::allocno_group_info *
+early_ra::allocno_info::group ()
+{
+ auto *chain_end = reinterpret_cast<unsigned int *> (this - offset);
+ return reinterpret_cast<allocno_group_info *> (chain_end - group_size) - 1;
+}
+
+// Return the allocnos in the subgroup.
+inline array_slice<early_ra::allocno_info>
+early_ra::allocno_subgroup::allocnos ()
+{
+ if (!count)
+ return {};
+ return { &group->allocnos ()[start], count };
+}
+
+// Return allocno I in the subgroup, with 0 being the first.
+inline early_ra::allocno_info *
+early_ra::allocno_subgroup::allocno (unsigned int i)
+{
+ return &group->allocnos ()[start + i];
+}
+
+// Return the previous (earlier) allocno in ALLOCNO's chain, or null if none.
+inline early_ra::allocno_info *
+early_ra::chain_prev (allocno_info *allocno)
+{
+ if (allocno->chain_prev != INVALID_ALLOCNO)
+ return m_allocnos[allocno->chain_prev];
+ return nullptr;
+}
+
+// Return the next (later) allocno in ALLOCNO's chain, or null if none.
+inline early_ra::allocno_info *
+early_ra::chain_next (allocno_info *allocno)
+{
+ if (allocno->chain_next != INVALID_ALLOCNO)
+ return m_allocnos[allocno->chain_next];
+ return nullptr;
+}
+
+// Dump the information in m_pseudo_regs.
+void
+early_ra::dump_pseudo_regs ()
+{
+ fprintf (dump_file, "\nPseudos:\n");
+ fprintf (dump_file, " %6s %6s %6s %6s %6s %6s %8s %s\n",
+ "Id", "FPR8", "FPR16", "FPR32", "NONFPR", "Stride",
+ "FPRness", "Copies");
+ pseudo_reg_info unused_reg = {};
+ for (unsigned int regno = FIRST_PSEUDO_REGISTER;
+ regno < m_pseudo_regs.length (); ++regno)
+ {
+      const auto &reg = m_pseudo_regs[regno];
+ if (memcmp (®, &unused_reg, sizeof (reg)) == 0)
+ continue;
+
+ fprintf (dump_file, " %6d %6s %6s %6s %6s %6s %8d", regno,
+ reg.flags & NEEDS_FPR8 ? "Req"
+ : reg.flags & ALLOWS_FPR8 ? "OK" : "-",
+ reg.flags & NEEDS_FPR16 ? "Req"
+ : reg.flags & ALLOWS_FPR16 ? "OK" : "-",
+ reg.flags & NEEDS_FPR32 ? "Req"
+ : reg.flags & ALLOWS_FPR32 ? "OK" : "-",
+ reg.flags & NEEDS_NONFPR ? "Req"
+ : reg.flags & ALLOWS_NONFPR ? "OK" : "-",
+ ~reg.flags & HAS_FLEXIBLE_STRIDE ? "-"
+ : reg.flags & HAS_FIXED_STRIDE ? "Some" : "All",
+ fpr_preference (regno));
+ if (reg.flags & HAS_FPR_COPY)
+ fprintf (dump_file, " FPR");
+ if (reg.flags & HAS_NONFPR_COPY)
+ fprintf (dump_file, " Non-FPR");
+ unsigned int copyi = reg.first_copy;
+ while (copyi)
+ {
+	  const auto &copy = m_pseudo_reg_copies[copyi];
+ if (copy.regnos[0] == regno)
+ {
+ fprintf (dump_file, " r%d", copy.regnos[1]);
+ copyi = copy.next_copies[0];
+ }
+ else
+ {
+ fprintf (dump_file, " r%d", copy.regnos[0]);
+ copyi = copy.next_copies[1];
+ }
+ }
+ fprintf (dump_file, "\n");
+ }
+}
+
+// Dump the information in m_fpr_ranges.
+void
+early_ra::dump_fpr_ranges ()
+{
+ fprintf (dump_file, "\nFPR live ranges:\n");
+ for (unsigned int fpr = 0; fpr < 32; ++fpr)
+ {
+ auto &intervals = m_fpr_ranges[fpr];
+ if (intervals.is_empty ())
+ continue;
+
+ fprintf (dump_file, " %2d", fpr);
+ for (unsigned int i = 0; i < intervals.length (); ++i)
+ {
+ auto &interval = intervals[i];
+ if (i && (i % 4) == 0)
+ fprintf (dump_file, "\n ");
+ fprintf (dump_file, " [ %6d %6d ]", interval.start_point,
+ interval.end_point);
+ }
+ fprintf (dump_file, "\n");
+ }
+}
+
+// Dump the information in m_allocno_copies.
+void
+early_ra::dump_copies ()
+{
+ fprintf (dump_file, "\nCopies:\n");
+ fprintf (dump_file, " %8s %3s %6s\n",
+ "Allocno", "FPR", "Weight");
+  for (const auto &copy : m_allocno_copies)
+ fprintf (dump_file, " %8d %3d %6d\n", copy.allocno,
+ copy.fpr, copy.weight);
+}
+
+// Dump the information in m_allocnos.
+void
+early_ra::dump_allocnos ()
+{
+ char buffer[sizeof ("r[:]") + 3 * 3 * sizeof (int) + 1];
+ fprintf (dump_file, "\nAllocno groups:\n");
+ fprintf (dump_file,
+ " %12s %12s %4s %6s %8s %s\n",
+ "Ids", "Regno", "Size", "Stride", "Cands", "Heads");
+ for (unsigned int ai = 0; ai < m_allocnos.length (); ++ai)
+ {
+ auto *allocno = m_allocnos[ai];
+ if (allocno->offset != 0)
+ continue;
+ auto *group = allocno->group ();
+ snprintf (buffer, sizeof (buffer), "[%d:%d]", allocno->id,
+ allocno->id + group->size - 1);
+ fprintf (dump_file, " %12s", buffer);
+ snprintf (buffer, sizeof (buffer), "r%d[0:%d]", group->regno,
+ group->size - 1);
+ fprintf (dump_file, " %12s %4s %6d %08x", buffer,
+ group->fpr_size == FPR_D ? "D"
+ : group->fpr_size == FPR_Q ? "Q" : "Z",
+ group->stride,
+ group->fpr_candidates);
+ for (auto head : group->chain_heads ())
+ if (head == INVALID_ALLOCNO)
+ fprintf (dump_file, " -");
+ else
+ fprintf (dump_file, " %d", head);
+ fprintf (dump_file, "\n");
+ }
+
+ fprintf (dump_file, "\nAllocno chains:\n");
+ fprintf (dump_file, " %5s %12s %12s %5s %5s %5s %5s\n",
+ "Id", "Regno", "Range ", "Src", "Dest", "Equiv", "FPR");
+ for (unsigned int ai = 0; ai < m_allocnos.length (); ++ai)
+ {
+ auto *allocno = m_allocnos[ai];
+ if (allocno->chain_prev != INVALID_ALLOCNO)
+ continue;
+ const char *prefix = "=>";
+ for (;;)
+ {
+ auto *group = allocno->group ();
+ fprintf (dump_file, " %2s", prefix);
+ fprintf (dump_file, " %5d", allocno->id);
+ snprintf (buffer, sizeof (buffer), "r%d[%d]", group->regno,
+ allocno->offset);
+ fprintf (dump_file, " %12s", buffer);
+ snprintf (buffer, sizeof (buffer), "[%d,%d]",
+ allocno->start_point, allocno->end_point);
+ fprintf (dump_file, " %11s%s %5s", buffer,
+ allocno->is_earlyclobbered ? "*" : " ",
+ allocno->is_strong_copy_dest ? "Strong"
+ : allocno->is_copy_dest ? "Yes" : "-");
+ if (allocno->copy_dest == INVALID_ALLOCNO)
+ fprintf (dump_file, " %5s", "-");
+ else
+ fprintf (dump_file, " %5d", allocno->copy_dest);
+ if (allocno->equiv_allocno != INVALID_ALLOCNO)
+ fprintf (dump_file, " %5d", allocno->equiv_allocno);
+ else
+ fprintf (dump_file, " %5s", "-");
+ if (allocno->hard_regno == FIRST_PSEUDO_REGISTER)
+ fprintf (dump_file, " %5s", "-");
+ else
+ fprintf (dump_file, " %5s", reg_names[allocno->hard_regno]);
+ fprintf (dump_file, "\n");
+ if (allocno->chain_next == INVALID_ALLOCNO)
+ break;
+ allocno = m_allocnos[allocno->chain_next];
+ prefix = "";
+ }
+ }
+}
+
+// Dump the information in m_colors.
+void
+early_ra::dump_colors ()
+{
+ fprintf (dump_file, "\nColors:\n");
+ for (unsigned int i = 0; i < m_colors.length (); ++i)
+ {
+ auto *color = m_colors[i];
+ if (!color->group)
+ continue;
+
+ fprintf (dump_file, " color %d:\n", i);
+ fprintf (dump_file, " chains:\n");
+ auto heads = color->group->chain_heads ();
+ for (unsigned int i = 0; i < color->group->size; ++i)
+ {
+ fprintf (dump_file, " %2d:", i);
+ auto ai = heads[i];
+ while (ai != INVALID_ALLOCNO)
+ {
+ auto *allocno = m_allocnos[ai];
+ fprintf (dump_file, " r%d[%d]", allocno->group ()->regno,
+ allocno->offset);
+ ai = allocno->chain_next;
+ }
+ fprintf (dump_file, "\n");
+ }
+ fprintf (dump_file, " FPR candidates:");
+ for (unsigned int fpr = 0; fpr < 32; ++fpr)
+ fprintf (dump_file, "%s%c", fpr % 8 ? "" : " ",
+ color->group->fpr_candidates & (1U << fpr) ? 'Y' : '-');
+ fprintf (dump_file, "\n");
+ fprintf (dump_file, " FPR preferences:");
+ for (unsigned int fpr = 0; fpr < 32; ++fpr)
+ if (color->fpr_preferences[fpr])
+ fprintf (dump_file, " %d(%d)", fpr, color->fpr_preferences[fpr]);
+ fprintf (dump_file, "\n");
+ }
+}
+
+// Record any necessary information about a move from SRC to DEST.
+void
+early_ra::preprocess_move (rtx dest, rtx src)
+{
+ if (SUBREG_P (dest))
+ dest = SUBREG_REG (dest);
+ if (!REG_P (dest))
+ return;
+
+ if (SUBREG_P (src))
+ src = SUBREG_REG (src);
+ if (!REG_P (src))
+ return;
+
+ // Sort the registers by increasing REGNO.
+ rtx regs[] = { dest, src };
+ if (REGNO (dest) > REGNO (src))
+ std::swap (regs[0], regs[1]);
+ unsigned int regno0 = REGNO (regs[0]);
+ unsigned int regno1 = REGNO (regs[1]);
+
+ // Ignore moves between hard registers.
+ if (HARD_REGISTER_NUM_P (regno1))
+ return;
+
+ // For moves between hard registers and pseudos, just record the type
+ // of hard register involved.
+  auto &reg1 = m_pseudo_regs[regno1];
+ reg1.mode = GET_MODE (regs[1]);
+ if (HARD_REGISTER_NUM_P (regno0))
+ {
+ reg1.flags |= (FP_REGNUM_P (regno0) ? HAS_FPR_COPY : HAS_NONFPR_COPY);
+ return;
+ }
+
+ // Record a move between two pseudo registers.
+  auto &reg0 = m_pseudo_regs[regno0];
+ reg0.mode = GET_MODE (regs[0]);
+
+ reg_copy_info copy;
+ copy.regnos[0] = regno0;
+ copy.regnos[1] = regno1;
+ copy.next_copies[0] = reg0.first_copy;
+ copy.next_copies[1] = reg1.first_copy;
+
+ reg0.first_copy = reg1.first_copy = m_pseudo_reg_copies.length ();
+ m_pseudo_reg_copies.safe_push (copy);
+}
+
+// Return true if INSN has a multi-vector operand and if that operand
+// could be converted to strided form.
+static bool
+is_stride_candidate (rtx_insn *insn)
+{
+ if (recog_memoized (insn) < 0)
+ return false;
+
+ auto stride_type = get_attr_stride_type (insn);
+ return (stride_type == STRIDE_TYPE_LUTI_CONSECUTIVE
+ || stride_type == STRIDE_TYPE_LD1_CONSECUTIVE
+ || stride_type == STRIDE_TYPE_ST1_CONSECUTIVE);
+}
+
+// Go through the constraints of INSN, which has already been extracted,
+// and record any relevant information about pseudo registers.
+void
+early_ra::process_pseudo_reg_constraints (rtx_insn *insn)
+{
+ extract_insn (insn);
+ preprocess_constraints (insn);
+
+ // Flags that describe any multi-register vector operands.
+ unsigned int insn_flags = (is_stride_candidate (insn)
+ ? HAS_FLEXIBLE_STRIDE
+ : HAS_FIXED_STRIDE);
+
+ auto alts = get_preferred_alternatives (insn);
+
+ int operand_matches[MAX_RECOG_OPERANDS];
+ unsigned int operand_flags[MAX_RECOG_OPERANDS];
+ for (int i = 0; i < recog_data.n_operands; ++i)
+ {
+ operand_matches[i] = -1;
+ operand_flags[i] = 0;
+ }
+
+ // Extract information from the constraints, considering all plausible
+ // alternatives.
+ for (int altno = 0; altno < recog_data.n_alternatives; ++altno)
+ {
+ if (!(alts & ALTERNATIVE_BIT (altno)))
+ continue;
+
+ auto *op_alt = &recog_op_alt[altno * recog_data.n_operands];
+ if (!likely_alternative_match_p (op_alt))
+ continue;
+
+ // Use SRC_OPNO's constraints to derive information about DEST_OPNO.
+ auto record_operand = [&](int src_opno, int dest_opno)
+ {
+ int matches = op_alt[src_opno].matches;
+ if (matches >= 0)
+ operand_matches[dest_opno] = matches;
+
+ auto cl = alternative_class (op_alt, src_opno);
+ if (cl != NO_REGS)
+ {
+ if (reg_class_subset_p (cl, FP_REGS))
+ operand_flags[dest_opno] |= ALLOWS_FPR32;
+ if (reg_class_subset_p (cl, FP_LO_REGS))
+ operand_flags[dest_opno] |= ALLOWS_FPR16;
+ if (reg_class_subset_p (cl, FP_LO8_REGS))
+ operand_flags[dest_opno] |= ALLOWS_FPR8;
+ if (!reg_classes_intersect_p (cl, FP_REGS))
+ operand_flags[dest_opno] |= ALLOWS_NONFPR;
+ }
+ };
+
+ for (int i = 0; i < recog_data.n_operands; ++i)
+ {
+ record_operand (i, i);
+ if (recog_data.constraints[i][0] == '%')
+ {
+ record_operand (i, i + 1);
+ record_operand (i + 1, i);
+ }
+ }
+ }
+
+ // Process the information we collected above.
+ for (int i = 0; i < recog_data.n_operands; ++i)
+ {
+ rtx op = recog_data.operand[i];
+ machine_mode orig_mode = GET_MODE (op);
+ if (SUBREG_P (op))
+ op = SUBREG_REG (op);
+
+ // Record the accumulated information in m_pseudo_regs.
+ if (REG_P (op) && !HARD_REGISTER_P (op))
+ {
+ // The flags so far just describe what at least one alternative
+ // would accept. Calculate the associated NEEDS_* information.
+ auto flags = operand_flags[i];
+ if (!(flags & ALLOWS_FPR32) && (flags & ALLOWS_NONFPR))
+ flags |= NEEDS_NONFPR;
+ else if ((flags & ALLOWS_FPR32) && !(flags & ALLOWS_NONFPR))
+ {
+ if (flags & ALLOWS_FPR8)
+ flags |= NEEDS_FPR8;
+ if (flags & ALLOWS_FPR16)
+ flags |= NEEDS_FPR16;
+ flags |= NEEDS_FPR32;
+ }
+
+ // Look for multi-register vector operands.
+ if (VECTOR_MODE_P (orig_mode)
+ && targetm.hard_regno_mode_ok (V0_REGNUM, orig_mode)
+ && hard_regno_nregs (V0_REGNUM, orig_mode) > 1)
+ flags |= insn_flags;
+
+ m_pseudo_regs[REGNO (op)].flags |= flags;
+ m_pseudo_regs[REGNO (op)].mode = GET_MODE (op);
+ }
+
+ // Treat matching constraints as equivalent to moves.
+ if (operand_matches[i] >= 0)
+ preprocess_move (recog_data.operand[operand_matches[i]], op);
+ }
+}
+
+// Make one pass through the instructions, collecting information that
+// will be needed later.
+void
+early_ra::preprocess_insns ()
+{
+ m_pseudo_regs.safe_grow_cleared (max_reg_num ());
+ m_pseudo_reg_copies.safe_push (reg_copy_info ());
+ for (rtx_insn *insn = get_insns (); insn; insn = NEXT_INSN (insn))
+ {
+ if (!NONDEBUG_INSN_P (insn))
+ continue;
+
+ if (GET_CODE (PATTERN (insn)) == USE
+ || GET_CODE (PATTERN (insn)) == CLOBBER)
+ continue;
+
+ rtx set = single_set (insn);
+ if (set && is_move_set (set))
+ preprocess_move (SET_DEST (set), SET_SRC (set));
+ else
+ process_pseudo_reg_constraints (insn);
+ }
+}
+
+// Return a signed integer that says (roughly) how strong an affinity
+// pseudo register REGNO has with FPRs. A positive value indicates
+// that we should try to allocate an FPR, a negative value indicates
+// that we shouldn't, and 0 indicates neutrality.
+int
+early_ra::fpr_preference (unsigned int regno)
+{
+ auto mode = m_pseudo_regs[regno].mode;
+ auto flags = m_pseudo_regs[regno].flags;
+ if (mode == VOIDmode || !targetm.hard_regno_mode_ok (V0_REGNUM, mode))
+ return -3;
+ else if (flags & HAS_FLEXIBLE_STRIDE)
+ return 3;
+ else if (flags & NEEDS_FPR32)
+ return 2;
+ else if (!(flags & ALLOWS_FPR32))
+ return -2;
+ else if ((flags & HAS_FPR_COPY) && !(flags & HAS_NONFPR_COPY))
+ return 1;
+ else if ((flags & HAS_NONFPR_COPY) && !(flags & HAS_FPR_COPY))
+ return -1;
+ else
+ return 0;
+}
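+
+// For example: a pseudo whose mode no FPR can hold scores -3; a pseudo
+// used by an instruction that requires an FPR scores 2 or more; and a
+// pseudo whose only register affinity is a copy from a non-FP register
+// scores -1, provided some alternative would still allow an FPR.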
+
+// Propagate information about pseudo-registers along copy edges,
+// where doing so doesn't create conflicting FPR preferences.
+void
+early_ra::propagate_pseudo_reg_info ()
+{
+ struct stack_entry { unsigned int regno, copyi; };
+
+ auto_vec<stack_entry, 32> stack;
+ for (unsigned int i = FIRST_PSEUDO_REGISTER;
+ i < m_pseudo_regs.length (); ++i)
+ {
+ auto start = m_pseudo_regs[i].first_copy;
+ if (!start)
+ continue;
+
+ stack.quick_push ({ i, start });
+ while (!stack.is_empty ())
+ {
+ auto entry = stack.pop ();
+	  auto &copy = m_pseudo_reg_copies[entry.copyi];
+ auto src_regno = entry.regno;
+ auto dest_regno = (src_regno == copy.regnos[1]
+ ? copy.regnos[0]
+ : copy.regnos[1]);
+ auto next_copyi = (src_regno == copy.regnos[1]
+ ? copy.next_copies[1]
+ : copy.next_copies[0]);
+ if (next_copyi)
+ stack.safe_push ({ src_regno, next_copyi });
+
+ auto &src_reg = m_pseudo_regs[src_regno];
+ auto &dest_reg = m_pseudo_regs[dest_regno];
+
+ if (src_reg.flags & ~dest_reg.flags & PSEUDO_COPY_FLAGS)
+ {
+ auto src_preference = fpr_preference (src_regno);
+ auto dest_preference = fpr_preference (dest_regno);
+ if ((src_preference >= 0 && dest_preference >= 0)
+ || (src_preference <= 0 && dest_preference <= 0))
+ {
+ dest_reg.flags |= (src_reg.flags & PSEUDO_COPY_FLAGS);
+ stack.safe_push ({ dest_regno, dest_reg.first_copy });
+ }
+ }
+ }
+ }
+}
+
+// Decide which pseudos should be allocated an FPR, setting m_fpr_pseudos
+// accordingly.
+void
+early_ra::choose_fpr_pseudos ()
+{
+ for (unsigned int i = FIRST_PSEUDO_REGISTER;
+ i < m_pseudo_regs.length (); ++i)
+ if (fpr_preference (i) > 0)
+ bitmap_set_bit (m_fpr_pseudos, i);
+}
+
+// Clear out information about the previous CFG region (if any)
+// and set up the data for a new region.
+void
+early_ra::start_new_region ()
+{
+ obstack_free (&m_region_obstack, m_region_alloc_start);
+ m_regno_to_group.empty ();
+ m_allocno_copies.truncate (0);
+ m_allocnos.truncate (0);
+ m_sorted_allocnos.truncate (0);
+ m_colors.truncate (0);
+ m_insn_ranges.truncate (0);
+ for (auto &fpr_ranges : m_fpr_ranges)
+ fpr_ranges.truncate (0);
+ for (auto &call_points : m_call_points)
+ call_points.truncate (0);
+ gcc_assert (bitmap_empty_p (m_live_allocnos) && m_live_fprs == 0);
+ m_dead_insns.truncate (0);
+ m_allocated_fprs = 0;
+ m_call_preserved_fprs = 0;
+ m_allocation_successful = true;
+}
+
+// Create and return an allocno group of size SIZE for register REGNO.
+// REGNO can be INVALID_REGNUM if the group just exists to allow
+// other groups to be chained together, and does not have any new
+// allocnos of its own.
+early_ra::allocno_group_info *
+early_ra::create_allocno_group (unsigned int regno, unsigned int size)
+{
+ static_assert (alignof (unsigned int) == alignof (allocno_info),
+ "allocno_info alignment");
+ unsigned int num_allocnos = (regno != INVALID_REGNUM ? size : 0);
+
+ // Allocate an allocno_group_info, followed by an array of chain heads,
+ // followed by the allocnos themselves.
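+  //
+  // The resulting layout, which chain_heads () and allocnos () rely on, is:
+  //
+  //   +--------------------+--------------------------+------------------+
+  //   | allocno_group_info | unsigned int heads[SIZE] | allocno_info [N] |
+  //   +--------------------+--------------------------+------------------+
+  //
+  // where N is SIZE if the group is for a pseudo register and 0 otherwise.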
+ size_t alloc_size = (sizeof (allocno_group_info)
+ + size * sizeof (unsigned int)
+ + num_allocnos * sizeof (allocno_info));
+ void *data = obstack_alloc (&m_region_obstack, alloc_size);
+
+ // Initialize the group.
+ auto *group = reinterpret_cast<allocno_group_info *> (data);
+ memset (group, 0, sizeof (*group));
+ group->m_color_rep = group;
+ group->regno = regno;
+ group->size = size;
+ group->stride = 1;
+ group->fpr_size = FPR_D;
+ group->fpr_candidates = ~0U;
+
+ // Initialize the chain heads.
+ auto heads = group->chain_heads ();
+ for (unsigned int i = 0; i < heads.size (); ++i)
+ heads[i] = (i < num_allocnos ? m_allocnos.length () + i : INVALID_ALLOCNO);
+
+ // Initialize the allocnos.
+ if (num_allocnos > 0)
+ {
+ auto allocnos = group->allocnos ();
+ memset (allocnos.begin (), 0, num_allocnos * sizeof (allocno_info));
+ for (unsigned int i = 0; i < num_allocnos; ++i)
+ {
+ auto *allocno = &allocnos[i];
+ allocno->id = m_allocnos.length ();
+ allocno->offset = i;
+ allocno->group_size = size;
+ allocno->hard_regno = FIRST_PSEUDO_REGISTER;
+ allocno->start_point = END_OF_REGION;
+ allocno->end_point = START_OF_REGION;
+ allocno->copy_dest = INVALID_ALLOCNO;
+ allocno->equiv_allocno = INVALID_ALLOCNO;
+ allocno->chain_next = INVALID_ALLOCNO;
+ allocno->chain_prev = INVALID_ALLOCNO;
+ m_allocnos.safe_push (allocno);
+ }
+ }
+ return group;
+}
+
+// If REG refers to a pseudo register that might be allocated to FPRs,
+// return the associated range of allocnos, creating new ones if necessary.
+// Return an empty range otherwise.
+early_ra::allocno_subgroup
+early_ra::get_allocno_subgroup (rtx reg)
+{
+ if (GET_CODE (reg) == SUBREG)
+ {
+ allocno_subgroup inner = get_allocno_subgroup (SUBREG_REG (reg));
+ if (!inner)
+ return {};
+
+ if (!targetm.modes_tieable_p (GET_MODE (SUBREG_REG (reg)),
+ GET_MODE (reg)))
+ {
+ m_allocation_successful = false;
+ return {};
+ }
+
+ subreg_info info;
+ subreg_get_info (V0_REGNUM, GET_MODE (SUBREG_REG (reg)),
+ SUBREG_BYTE (reg), GET_MODE (reg), &info);
+ if (!info.representable_p)
+ {
+ m_allocation_successful = false;
+ return {};
+ }
+
+ inner.start += info.offset;
+ inner.count = info.nregs;
+ return inner;
+ }
+
+ if (!REG_P (reg) || HARD_REGISTER_P (reg))
+ return {};
+
+ unsigned int regno = REGNO (reg);
+ if (fpr_preference (regno) <= 0)
+ return {};
+
+ unsigned int count = hard_regno_nregs (V0_REGNUM, GET_MODE (reg));
+ bool existed;
+ auto &entry = m_regno_to_group.get_or_insert (regno, &existed);
+ if (!existed)
+ {
+ auto *group = create_allocno_group (regno, count);
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ {
+ auto allocnos = group->allocnos ();
+ fprintf (dump_file, "Creating allocnos [%d:%d] for r%d\n",
+ allocnos.front ().id, allocnos.back ().id, regno);
+ }
+
+ auto reg_bits = GET_MODE_BITSIZE (GET_MODE (reg));
+ auto fpr_bits = exact_div (reg_bits, count);
+ auto flags = m_pseudo_regs[regno].flags;
+
+ // Punt for now if there is a choice to be made between using an
+ // FPR and a non-FPR.
+ if ((flags & NEEDS_NONFPR)
+ || ((flags & ALLOWS_NONFPR)
+ && !FLOAT_MODE_P (GET_MODE (reg))
+ && !VECTOR_MODE_P (GET_MODE (reg))))
+ m_allocation_successful = false;
+
+ if (flags & ALLOWS_FPR8)
+ group->fpr_candidates &= 0xff;
+ else if (flags & ALLOWS_FPR16)
+ group->fpr_candidates &= 0xffff;
+ group->fpr_candidates &= ~0U >> (count - 1);
+
+ group->has_flexible_stride = ((flags & HAS_FLEXIBLE_STRIDE) != 0
+ && (flags & HAS_FIXED_STRIDE) == 0);
+
+ group->fpr_size = (maybe_gt (fpr_bits, 128) ? FPR_Z
+ : maybe_gt (fpr_bits, 64) ? FPR_Q : FPR_D);
+
+ entry = group;
+ }
+ return { entry, 0, count };
+}
+
+// Record a use of FPR REGNO at the current program point, as part of
+// a backwards walk over a block.
+void
+early_ra::record_fpr_use (unsigned int regno)
+{
+ gcc_assert (IN_RANGE (regno, V0_REGNUM, V31_REGNUM));
+ unsigned int offset = regno - V0_REGNUM;
+ if (!(m_live_fprs & (1U << offset)))
+ {
+ m_fpr_ranges[offset].safe_push ({ START_OF_REGION, m_current_point,
+ INVALID_ALLOCNO });
+ m_live_fprs |= 1U << offset;
+ }
+}
+
+// Record a definition of FPR REGNO at the current program point, as part of
+// a backwards walk over a block.
+void
+early_ra::record_fpr_def (unsigned int regno)
+{
+ gcc_assert (IN_RANGE (regno, V0_REGNUM, V31_REGNUM));
+ unsigned int offset = regno - V0_REGNUM;
+
+ // The definition completes the current live range. If the result
+ // of the definition is used, the live range extends to the last use.
+ // Otherwise the live range is just a momentary blip at the current point.
+ auto &ranges = m_fpr_ranges[offset];
+ if (m_live_fprs & (1U << offset))
+ {
+ ranges.last ().start_point = m_current_point;
+ m_live_fprs &= ~(1U << offset);
+ }
+ else
+ ranges.safe_push ({ m_current_point, m_current_point, INVALID_ALLOCNO });
+}
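+
+// For example, suppose a backwards walk reaches a use of v0 when
+// m_current_point is 10: record_fpr_use opens the provisional range
+// { START_OF_REGION, 10 }. If the walk later reaches v0's definition
+// when m_current_point is 14, record_fpr_def trims the range to
+// { 14, 10 }.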
+
+// Record a use of allocno ALLOCNO at the current program point, as part
+// of a backwards walk over a block.
+void
+early_ra::record_allocno_use (allocno_info *allocno)
+{
+ bitmap_set_bit (m_live_allocnos, allocno->id);
+ if (allocno->end_point > m_current_point)
+ allocno->end_point = m_current_point;
+ allocno->start_point = m_current_point;
+ allocno->is_copy_dest = false;
+ allocno->is_strong_copy_dest = false;
+}
+
+// Record a definition of allocno ALLOCNO at the current program point,
+// as part of a backwards walk over a block. The allocno is known
+// to be live.
+void
+early_ra::record_allocno_def (allocno_info *allocno)
+{
+ allocno->start_point = m_current_point;
+ allocno->num_defs = MIN (allocno->num_defs + 1, 2);
+ gcc_checking_assert (!allocno->is_copy_dest
+ && !allocno->is_strong_copy_dest);
+ if (!bitmap_clear_bit (m_live_allocnos, allocno->id))
+ gcc_unreachable ();
+}
+
+// Record any relevant allocno-related information for an actual or imagined
+// copy from SRC to DEST. FROM_MOVE_P is true if the copy was an explicit
+// move instruction, false if it represents one way of satisfying the previous
+// instruction's constraints.
+void
+early_ra::record_copy (rtx dest, rtx src, bool from_move_p)
+{
+ auto dest_range = get_allocno_subgroup (dest);
+ auto src_range = get_allocno_subgroup (src);
+ if (from_move_p
+ && dest_range
+ && REG_P (src)
+ && FP_REGNUM_P (REGNO (src)))
+ {
+ // A copy from an FPR to an allocno group.
+ unsigned int fpr = REGNO (src) - V0_REGNUM;
+ m_allocno_copies.safe_push ({ dest_range.allocno (0)->id, fpr,
+ dest_range.count });
+
+ // If the allocno at the other end of the chain of copies from DEST
+ // has a copy to the same FPR, record that all intervening copy chains
+ // could become "strong" ones. This indicates that picking the FPR
+ // avoids a copy at both ends.
+ unsigned int hard_regno = REGNO (src);
+ for (auto &dest_allocno : dest_range.allocnos ())
+ if (dest_allocno.hard_regno == hard_regno++)
+ dest_allocno.is_strong_copy_src = true;
+ }
+  else if (src_range
+	   && REG_P (dest)
+	   && FP_REGNUM_P (REGNO (dest)))
+ {
+ // A copy from an allocno group to an FPR.
+ unsigned int fpr = REGNO (dest) - V0_REGNUM;
+ m_allocno_copies.safe_push ({ src_range.allocno (0)->id, fpr,
+ src_range.count });
+ for (auto &src_allocno : src_range.allocnos ())
+ {
+ // If the copy comes from a move, see whether the destination
+ // FPR is known to be equal to the source allocno for the FPR's
+ // last live range.
+ if (from_move_p && src_allocno.num_defs == 0)
+ {
+ auto &last_range = m_fpr_ranges[fpr].last ();
+ if (last_range.end_point >= src_allocno.end_point)
+ last_range.allocno = src_allocno.id;
+ }
+ src_allocno.hard_regno = V0_REGNUM + fpr;
+ fpr += 1;
+ }
+ }
+ else if (src_range && dest_range)
+ {
+ // A copy between two allocno groups. We can only have a mismatched
+ // number of FPRs for imaginary, non-move copies. In that case
+ // the matching happens on the common lowparts.
+ gcc_assert (!from_move_p || src_range.count == dest_range.count);
+ unsigned int count = std::min (src_range.count, dest_range.count);
+ if (WORDS_BIG_ENDIAN)
+ {
+ src_range.start += src_range.count - count;
+ dest_range.start += dest_range.count - count;
+ }
+ src_range.count = count;
+ dest_range.count = count;
+
+ // Ignore (imaginary non-move) copies if the destination is still live.
+ for (auto &dest_allocno : dest_range.allocnos ())
+ if (bitmap_bit_p (m_live_allocnos, dest_allocno.id))
+ return;
+
+ for (unsigned int i = 0; i < src_range.count; ++i)
+ {
+ auto *dest_allocno = dest_range.allocno (i);
+ auto *src_allocno = src_range.allocno (i);
+ if (src_allocno->end_point > dest_allocno->start_point)
+ {
+ gcc_assert (src_allocno->copy_dest == INVALID_ALLOCNO
+ || src_allocno->copy_dest == dest_allocno->id);
+ src_allocno->copy_dest = dest_allocno->id;
+ src_allocno->hard_regno = dest_allocno->hard_regno;
+ dest_allocno->is_copy_dest = 1;
+ }
+ else if (from_move_p
+ && src_allocno->end_point <= dest_allocno->end_point
+ && src_allocno->num_defs == 0
+ && dest_allocno->num_defs == 1)
+ dest_allocno->equiv_allocno = src_allocno->id;
+ }
+ }
+}
+
+// Record any relevant allocno-related information about the constraints
+// on INSN, which has already been extracted.
+void
+early_ra::record_constraints (rtx_insn *insn)
+{
+ preprocess_constraints (insn);
+
+ int operand_matches[MAX_RECOG_OPERANDS];
+ for (int i = 0; i < recog_data.n_operands; ++i)
+ operand_matches[i] = -1;
+
+ auto alts = get_preferred_alternatives (insn);
+ bool any_ok = recog_data.n_alternatives == 0;
+
+ // The set of output operands that are earlyclobber in at least one
+ // alternative.
+ operand_mask earlyclobber_operands = 0;
+
+ // The set of output operands that are matched to inputs in at least
+ // one alternative.
+ operand_mask matched_operands = 0;
+
+ // The set of output operands that are not matched to inputs in at least
+ // one alternative.
+ operand_mask unmatched_operands = 0;
+
+ // The set of input operands that are matched to outputs in at least one
+ // alternative, or that overlap with such an input if the output is not
+ // earlyclobber. The latter part of the condition copes with things
+ // like y = x * x, where the first x is tied to the destination, and where
+ // y is not earlyclobber.
+ operand_mask matches_operands = 0;
+
+ for (int altno = 0; altno < recog_data.n_alternatives; ++altno)
+ {
+ if (!(alts & ALTERNATIVE_BIT (altno)))
+ continue;
+
+ auto *op_alt = &recog_op_alt[altno * recog_data.n_operands];
+ if (!likely_alternative_match_p (op_alt))
+ continue;
+
+ any_ok = true;
+
+ // Update the information for operand DEST_OPNO based on the constraint
+ // information for operand SRC_OPNO. The numbers can be different for
+ // swapped commutative operands.
+ auto record_operand = [&](int src_opno, int dest_opno)
+ {
+ int matches = op_alt[src_opno].matches;
+ // A matched earlyclobber cannot be used if the same operand value
+ // occurs in an unmatched operand. E.g. for y = x * x, a matched
+ // earlyclobber on the first input does not cover the second input.
+ if (matches >= 0)
+ {
+ rtx op = recog_data.operand[dest_opno];
+ operand_mask overlaps = 0;
+ for (int i = 0; i < recog_data.n_operands; ++i)
+ if (i != dest_opno
+ && !recog_data.is_operator[i]
+ && recog_data.operand_type[i] != OP_OUT
+ && reg_overlap_mentioned_p (op, recog_data.operand[i]))
+ overlaps |= 1U << i;
+ if (!op_alt[matches].earlyclobber || overlaps == 0)
+ {
+ operand_matches[dest_opno] = matches;
+ matches_operands |= (1U << dest_opno) | overlaps;
+ }
+ }
+ };
+
+ auto reject = count_rejects (op_alt);
+ for (int opno = 0; opno < recog_data.n_operands; ++opno)
+ {
+ operand_mask op_mask = operand_mask (1) << opno;
+
+ if (recog_data.operand_type[opno] != OP_IN)
+ {
+ if (reject == 0 && op_alt[opno].matched >= 0)
+ matched_operands |= op_mask;
+ else
+ unmatched_operands |= op_mask;
+ }
+
+ if (op_alt[opno].earlyclobber)
+ earlyclobber_operands |= op_mask;
+
+ // Punt for now on scratches. If we wanted to handle them,
+ // we'd need to create allocnos for them, like IRA does.
+ rtx op = recog_data.operand[opno];
+ if (GET_CODE (op) == SCRATCH
+ && reg_classes_intersect_p (op_alt[opno].cl, FP_REGS))
+ m_allocation_successful = false;
+
+ // Record filter information, which applies to the first register
+ // in the operand.
+ if (auto filters = alternative_register_filters (op_alt, opno))
+ if (auto range = get_allocno_subgroup (recog_data.operand[opno]))
+ for (unsigned int fpr = range.start; fpr < 32; ++fpr)
+ if (!test_register_filters (filters, fpr))
+ range.group->fpr_candidates &= ~(1U << (fpr - range.start));
+
+ if (reject == 0)
+ {
+ // Record possible matched operands.
+ record_operand (opno, opno);
+ if (recog_data.constraints[opno][0] == '%')
+ {
+ record_operand (opno, opno + 1);
+ record_operand (opno + 1, opno);
+ }
+ }
+ }
+ }
+
+ if (!any_ok)
+ {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file, " -- no match\n");
+ m_allocation_successful = false;
+ }
+
+ // Record if there is an output operand that is never earlyclobber and never
+ // matched to an input. See the comment below for how this is used.
+ rtx dest_op = NULL_RTX;
+ for (int opno = 0; opno < recog_data.n_operands; ++opno)
+ {
+ auto op_mask = operand_mask (1) << opno;
+ if (recog_data.operand_type[opno] == OP_OUT
+ && (earlyclobber_operands & op_mask) == 0
+ && (matched_operands & op_mask) == 0)
+ {
+ dest_op = recog_data.operand[opno];
+ break;
+ }
+ }
+
+ for (int opno = 0; opno < recog_data.n_operands; ++opno)
+ {
+ auto op_mask = operand_mask (1) << opno;
+ rtx op = recog_data.operand[opno];
+ int matches = operand_matches[opno];
+
+ // Punt for now on operands that already have a fixed choice of
+ // register, since we don't have IRA's ability to find an alternative.
+ // It's better if earlier passes don't create this kind of situation.
+ if (REG_P (op) && FP_REGNUM_P (REGNO (op)))
+ m_allocation_successful = false;
+
+ // Treat input operands as being earlyclobbered if an output is
+ // sometimes earlyclobber and if the input never matches an output.
+ // Do the same if there is an output that is always matched to an
+ // input, and if this operand doesn't match that input. In both
+ // cases, tying the input and the output would lead to an impossible
+ // combination (or at least one that is difficult to reload).
+ if (recog_data.operand_type[opno] != OP_OUT
+ && ((earlyclobber_operands && matches < 0)
+ || ((matched_operands & ~unmatched_operands)
+ && !(matches_operands & op_mask))))
+ for (auto &allocno : get_allocno_subgroup (op).allocnos ())
+ if (allocno.end_point + 1 == m_current_point)
+ allocno.is_earlyclobbered = true;
+
+ // Create copies between operands that can be tied. This (deliberately)
+ // might add several copies to the same destination register; later code
+ // can then choose between them based on other criteria.
+ //
+ // If there is an output operand that is never matched or earlyclobber,
+ // and an input operand that never matches an output operand, create
+ // a tentative copy between them. This allows hard register preferences
+ // to be transmitted along the copy chains.
+ if (matches >= 0)
+ record_copy (recog_data.operand[matches], op);
+ else if (dest_op && recog_data.operand_type[opno] == OP_IN)
+ record_copy (dest_op, op);
+ }
+}
+
+// If FLAGS is DF_REF_AT_TOP, model the artificial uses and defs at the
+// start of the current basic block, otherwise model the artificial uses
+// and defs at the end of the basic block. This is done as part of a
+// backwards walk, so defs should be processed before uses.
+void
+early_ra::record_artificial_refs (unsigned int flags)
+{
+ df_ref ref;
+
+ FOR_EACH_ARTIFICIAL_DEF (ref, m_current_bb->index)
+ if ((DF_REF_FLAGS (ref) & DF_REF_AT_TOP) == flags
+ && IN_RANGE (DF_REF_REGNO (ref), V0_REGNUM, V31_REGNUM))
+ record_fpr_def (DF_REF_REGNO (ref));
+ m_current_point += 1;
+
+ FOR_EACH_ARTIFICIAL_USE (ref, m_current_bb->index)
+ if ((DF_REF_FLAGS (ref) & DF_REF_AT_TOP) == flags
+ && IN_RANGE (DF_REF_REGNO (ref), V0_REGNUM, V31_REGNUM))
+ record_fpr_use (DF_REF_REGNO (ref));
+ m_current_point += 1;
+}
+
+// Model the register references in INSN as part of a backwards walk.
+void
+early_ra::record_insn_refs (rtx_insn *insn)
+{
+ df_ref ref;
+
+ // Record all definitions, excluding partial call clobbers.
+ FOR_EACH_INSN_DEF (ref, insn)
+ if (IN_RANGE (DF_REF_REGNO (ref), V0_REGNUM, V31_REGNUM))
+ record_fpr_def (DF_REF_REGNO (ref));
+ else
+ {
+ auto range = get_allocno_subgroup (DF_REF_REG (ref));
+ for (auto &allocno : range.allocnos ())
+ {
+ // If the destination is unused, record a momentary blip
+ // in its live range.
+ if (!bitmap_bit_p (m_live_allocnos, allocno.id))
+ record_allocno_use (&allocno);
+ record_allocno_def (&allocno);
+ }
+ }
+ m_current_point += 1;
+
+ // Model the call made by a call insn as a separate phase in the
+ // evaluation of the insn. Any partial call clobbers happen at that
+ // point, rather than in the definition or use phase of the insn.
+ if (auto *call_insn = dyn_cast<rtx_call_insn *> (insn))
+ {
+ function_abi abi = insn_callee_abi (call_insn);
+ m_call_points[abi.id ()].safe_push (m_current_point);
+ m_current_point += 1;
+ }
+
+ // Record all uses. We can ignore READ_MODIFY_WRITE uses of plain subregs,
+ // since we track the FPR-sized parts of them individually.
+ FOR_EACH_INSN_USE (ref, insn)
+ if (IN_RANGE (DF_REF_REGNO (ref), V0_REGNUM, V31_REGNUM))
+ record_fpr_use (DF_REF_REGNO (ref));
+ else if (!DF_REF_FLAGS_IS_SET (ref, DF_REF_READ_WRITE)
+ || DF_REF_FLAGS_IS_SET (ref, DF_REF_STRICT_LOW_PART)
+ || DF_REF_FLAGS_IS_SET (ref, DF_REF_ZERO_EXTRACT))
+ {
+ auto range = get_allocno_subgroup (DF_REF_REG (ref));
+ for (auto &allocno : range.allocnos ())
+ record_allocno_use (&allocno);
+ }
+ m_current_point += 1;
+}
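+
+// For illustration: the walk is backwards, so later instructions receive
+// smaller program points. If m_current_point is 10 when a call insn is
+// reached, the insn is modelled as three phases:
+//
+// point 10: the definitions made by the insn
+// point 11: the call itself (partial call clobbers)
+// point 12: the uses made by the insn
+//
+// One consequence is that a live range's start_point (its definition)
+// is numerically >= its end_point (its last use).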
+
+// ALLOCNO->is_strong_copy_src is true. See whether ALLOCNO heads a
+// natural chain that has an affinity with the same hard register at
+// both ends.
+bool
+early_ra::consider_strong_copy_src_chain (allocno_info *allocno)
+{
+ auto *src_allocno = allocno;
+ while (src_allocno->copy_dest != INVALID_ALLOCNO)
+ {
+ auto *dest_allocno = m_allocnos[src_allocno->copy_dest];
+ if (dest_allocno->start_point > src_allocno->end_point
+ || dest_allocno->hard_regno != src_allocno->hard_regno)
+ return false;
+ gcc_checking_assert (dest_allocno->is_copy_dest);
+ src_allocno = dest_allocno;
+ }
+
+ while (allocno->copy_dest != INVALID_ALLOCNO)
+ {
+ allocno->is_strong_copy_src = 1;
+ allocno = m_allocnos[allocno->copy_dest];
+ allocno->is_strong_copy_dest = 1;
+ }
+ return true;
+}
+
+// ALLOCNO1 and ALLOCNO2 are linked in some way, and might end up being
+// chained together. See whether chaining them requires the containing
+// groups to have the same stride, or whether it requires them to have
+// different strides. Return 1 if they should have the same stride,
+// -1 if they should have different strides, or 0 if it doesn't matter.
+int
+early_ra::strided_polarity_pref (allocno_info *allocno1,
+ allocno_info *allocno2)
+{
+ if (allocno1->offset + 1 < allocno1->group_size
+ && allocno2->offset + 1 < allocno2->group_size)
+ {
+ if (is_chain_candidate (allocno1 + 1, allocno2 + 1))
+ return 1;
+ else
+ return -1;
+ }
+
+ if (allocno1->offset > 0 && allocno2->offset > 0)
+ {
+ if (is_chain_candidate (allocno1 - 1, allocno2 - 1))
+ return 1;
+ else
+ return -1;
+ }
+
+ return 0;
+}
+
+// Decide which groups should be strided. Also complete "strong" copy chains.
+void
+early_ra::find_strided_accesses ()
+{
+ // This function forms a graph of allocnos, linked by equivalences and
+ // natural copy chains. It temporarily uses chain_next to record the
+ // reverse of equivalence edges (equiv_allocno) and chain_prev to record
+ // the reverse of copy edges (copy_dest).
+ unsigned int allocno_info::*links[] = {
+ &allocno_info::chain_next,
+ &allocno_info::chain_prev,
+ &allocno_info::copy_dest,
+ &allocno_info::equiv_allocno
+ };
+
+ // Set up the temporary reverse edges. Check for strong copy chains.
+ for (unsigned int i = m_allocnos.length (); i-- > 0; )
+ {
+ auto *allocno1 = m_allocnos[i];
+ if (allocno1->copy_dest != INVALID_ALLOCNO)
+ m_allocnos[allocno1->copy_dest]->chain_prev = allocno1->id;
+ if (allocno1->equiv_allocno != INVALID_ALLOCNO)
+ m_allocnos[allocno1->equiv_allocno]->chain_next = allocno1->id;
+
+ if (allocno1->is_strong_copy_src
+ && (allocno1->is_copy_dest
+ || !consider_strong_copy_src_chain (allocno1)))
+ allocno1->is_strong_copy_src = false;
+ }
+
+ // Partition the graph into cliques based on edges that have the following
+ // properties:
+ //
+ // - the edge joins two allocnos whose groups have a free choice between
+ // consecutive and strided allocations.
+ //
+ // - the two groups have a relative strided polarity preference (that is
+ // they should make the same choice between consecutive and strided,
+ // or they should make opposite choices).
+ //
+ // Assign relative polarities to each group connected in this way.
+ //
+ // The aim is to discover natural move-free striding choices, which will
+ // often exist in carefully written ACLE code.
+ unsigned int num_edges = m_allocnos.length () * ARRAY_SIZE (links);
+ auto_sbitmap visited_edges (num_edges);
+ bitmap_clear (visited_edges);
+
+ auto_vec<unsigned int, 32> worklist;
+ for (unsigned int i = 0; i < num_edges; ++i)
+ {
+ if (!bitmap_set_bit (visited_edges, i))
+ continue;
+ worklist.quick_push (i);
+ while (!worklist.is_empty ())
+ {
+ auto ei = worklist.pop ();
+ auto *allocno1 = m_allocnos[ei / ARRAY_SIZE (links)];
+ auto ai2 = allocno1->*links[ei % ARRAY_SIZE (links)];
+ if (ai2 == INVALID_ALLOCNO)
+ continue;
+
+ auto *allocno2 = m_allocnos[ai2];
+ auto *group1 = allocno1->group ();
+ auto *group2 = allocno2->group ();
+ if (!group1->has_flexible_stride || !group2->has_flexible_stride)
+ continue;
+
+ int pref = strided_polarity_pref (allocno1, allocno2);
+ if (pref == 0)
+ continue;
+
+ for (auto *group : { group1, group2 })
+ for (auto &allocno : group->allocnos ())
+ for (unsigned int j = 0; j < ARRAY_SIZE (links); ++j)
+ if (bitmap_set_bit (visited_edges, allocno.id * 4 + j))
+ worklist.safe_push (allocno.id * 4 + j);
+
+ if (group1->strided_polarity)
+ group2->strided_polarity = group1->strided_polarity * pref;
+ else if (group2->strided_polarity)
+ group1->strided_polarity = group2->strided_polarity * pref;
+ else
+ {
+ group1->strided_polarity = 1;
+ group2->strided_polarity = pref;
+ }
+ }
+ }
+
+ // Now look for edges between allocnos in multi-register groups where:
+ //
+ // - the two groups have a relative strided polarity preference (as above).
+ //
+ // - one group (G1) has a free choice between consecutive and strided
+ // allocations.
+ //
+ // - the other group (G2) must use consecutive allocations.
+ //
+ // Update G1's individual preference for strided or consecutive allocations
+ // based on G2. If the previous loop chose a polarity for G1, work out
+ // whether it is better for polarity 1 or -1 to correspond to consecutive
+ // allocation.
+ int consecutive_pref = 0;
+ for (unsigned int i = m_allocnos.length (); i-- > 0; )
+ {
+ auto *allocno1 = m_allocnos[i];
+ for (auto link : links)
+ {
+ auto ai2 = allocno1->*link;
+ if (ai2 == INVALID_ALLOCNO)
+ continue;
+
+ auto *allocno2 = m_allocnos[ai2];
+ auto *group1 = allocno1->group ();
+ auto *group2 = allocno2->group ();
+ if (group1->has_flexible_stride == group2->has_flexible_stride)
+ continue;
+
+ int pref = strided_polarity_pref (allocno1, allocno2);
+ if (pref == 0)
+ continue;
+
+ auto *group = (group1->has_flexible_stride ? group1 : group2);
+ consecutive_pref += group->strided_polarity * pref;
+ group->consecutive_pref += pref;
+ }
+ }
+
+ // If it doesn't matter whether polarity 1 or -1 corresponds to consecutive
+ // allocation, arbitrarily pick 1.
+ if (consecutive_pref == 0)
+ consecutive_pref = 1;
+
+ // Record which multi-register groups should use strided allocations.
+ // Clear out the temporary edges.
+ for (unsigned int ai = 0; ai < m_allocnos.length (); ++ai)
+ {
+ auto *allocno = m_allocnos[ai];
+ allocno->chain_prev = INVALID_ALLOCNO;
+ allocno->chain_next = INVALID_ALLOCNO;
+
+ if (allocno->offset != 0)
+ continue;
+
+ auto *group = allocno->group ();
+ if (!group->has_flexible_stride)
+ continue;
+
+ bool make_strided = (group->strided_polarity
+ ? (consecutive_pref * group->strided_polarity) < 0
+ : group->consecutive_pref < 0);
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file, "Allocno [%d:%d]: strided polarity %d,"
+ " consecutive pref %d, %s\n",
+ allocno->id, allocno->id + group->size - 1,
+ group->strided_polarity, group->consecutive_pref,
+ make_strided ? "making strided" : "keeping consecutive");
+ if (!make_strided)
+ continue;
+
+ // 2-register groups have a stride of 8 FPRs and must start in
+ // registers matching the mask 0x00ff00ff. 4-register groups have
+ // a stride of 4 FPRs and must start in registers matching the
+ // mask 0x000f000f.
+ group->stride = group->size == 2 ? 8 : 4;
+ gcc_checking_assert (group->fpr_candidates
+ == (group->size == 2 ? 0x55555555 : 0x11111111));
+ group->fpr_candidates = (group->size == 2 ? 0xff00ff : 0xf000f);
+ }
+}
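+
+// For reference: 0x00ff00ff has bits 0-7 and 16-23 set, so a strided
+// 2-register group can start at Z0-Z7 or Z16-Z23 and uses pairs such as
+// { Z3, Z11 }. 0x000f000f has bits 0-3 and 16-19 set, so a strided
+// 4-register group can start at Z0-Z3 or Z16-Z19 and uses quadruples
+// such as { Z17, Z21, Z25, Z29 }. These are the register ranges that
+// the SME2 strided forms of LD1, ST1 and LUTI accept.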
+
+// Compare the allocnos at *ALLOCNO1_PTR and *ALLOCNO2_PTR and return a <=>
+// result that puts allocnos in order of increasing FIELD.
+template<unsigned int early_ra::allocno_info::*field>
+int
+early_ra::cmp_increasing (const void *allocno1_ptr, const void *allocno2_ptr)
+{
+ auto *allocno1 = *(allocno_info *const *) allocno1_ptr;
+ auto *allocno2 = *(allocno_info *const *) allocno2_ptr;
+
+ if (allocno1->*field != allocno2->*field)
+ return allocno1->*field < allocno2->*field ? -1 : 1;
+ return (allocno1->id < allocno2->id ? -1
+ : allocno1->id == allocno2->id ? 0 : 1);
+}
+
+// Return true if we should consider chaining ALLOCNO1 onto the head
+// of ALLOCNO2. This is just a local test of the two allocnos; it doesn't
+// guarantee that chaining them would give a self-consistent system.
+bool
+early_ra::is_chain_candidate (allocno_info *allocno1, allocno_info *allocno2)
+{
+ if (allocno1->equiv_allocno != INVALID_ALLOCNO)
+ allocno1 = m_allocnos[allocno1->equiv_allocno];
+
+ if (allocno2->start_point >= allocno1->end_point
+ && allocno2->equiv_allocno != allocno1->id)
+ return false;
+
+ if (allocno2->is_strong_copy_dest)
+ {
+ if (!allocno1->is_strong_copy_src
+ || allocno1->copy_dest != allocno2->id)
+ return false;
+ }
+ else if (allocno2->is_copy_dest)
+ {
+ if (allocno1->copy_dest != allocno2->id)
+ return false;
+ }
+ else if (allocno1->is_earlyclobbered)
+ {
+ if (allocno1->end_point == allocno2->start_point + 1)
+ return false;
+ }
+
+ return true;
+}
+
+// We're trying to chain allocno ALLOCNO1 to a later allocno.
+// Rate how good a choice ALLOCNO2 would be, with higher being better.
+int
+early_ra::rate_chain (allocno_info *allocno1, allocno_info *allocno2)
+{
+ int score = 0;
+ if (allocno2->is_strong_copy_dest)
+ score += 256;
+ else if (allocno2->is_copy_dest)
+ score += 128;
+
+ // Prefer well-aligned matches.
+ auto *group1 = allocno1->group ();
+ auto *group2 = allocno2->group ();
+ if (group1->stride == 1 && group2->stride == 1)
+ {
+ unsigned int min_size = std::min (group1->color_rep ()->size,
+ group2->color_rep ()->size);
+ if ((group1->color_rep_offset + allocno1->offset) % min_size
+ == (group2->color_rep_offset + allocno2->offset) % min_size)
+ score += min_size;
+ else
+ score -= min_size;
+ }
+ return score;
+}
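+
+// A hypothetical example of the alignment scoring above: if both groups
+// have stride 1, allocno1's color representative has size 4 and
+// allocno2's has size 2, then min_size is 2, so a pair of allocnos whose
+// FPR offsets have the same parity scores +2 and a mixed-parity pair
+// scores -2, on top of any copy-destination bonus.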
+
+// Sort the chain_candidate_infos at ARG1 and ARG2 in order of decreasing
+// score.
+int
+early_ra::cmp_chain_candidates (const void *arg1, const void *arg2)
+{
+ auto &candidate1 = *(const chain_candidate_info *) arg1;
+ auto &candidate2 = *(const chain_candidate_info *) arg2;
+ if (candidate1.score != candidate2.score)
+ return candidate1.score > candidate2.score ? -1 : 1;
+
+ // Prefer to increase the gap between uses of the allocated register,
+ // to give the scheduler more freedom.
+ auto *allocno1 = candidate1.allocno;
+ auto *allocno2 = candidate2.allocno;
+ if (allocno1->start_point != allocno2->start_point)
+ return allocno1->start_point < allocno2->start_point ? -1 : 1;
+
+ if (allocno1 != allocno2)
+ return allocno1->id < allocno2->id ? -1 : 1;
+
+ return 0;
+}
+
+// Join the chains of allocnos that start at HEADI1 and HEADI2.
+// HEADI1 is either empty or a single allocno.
+void
+early_ra::chain_allocnos (unsigned int &headi1, unsigned int &headi2)
+{
+ if (headi1 == INVALID_ALLOCNO)
+ headi1 = headi2;
+ else if (headi2 == INVALID_ALLOCNO)
+ headi2 = headi1;
+ else
+ {
+ auto *head1 = m_allocnos[headi1];
+ auto *head2 = m_allocnos[headi2];
+ gcc_checking_assert (head1->chain_next == INVALID_ALLOCNO
+ && head1->chain_prev == INVALID_ALLOCNO
+ && head2->chain_prev == INVALID_ALLOCNO);
+
+ if (head1->equiv_allocno != INVALID_ALLOCNO
+ && m_allocnos[head1->equiv_allocno]->copy_dest == headi2)
+ {
+ head1->is_copy_dest = head2->is_copy_dest;
+ head1->is_strong_copy_dest = head2->is_strong_copy_dest;
+ m_allocnos[head1->equiv_allocno]->copy_dest = headi1;
+ }
+ head1->chain_next = headi2;
+ head2->chain_prev = headi1;
+
+ headi2 = headi1;
+ }
+}
+
+// Set the color representative of ALLOCNO's group to REP, such that ALLOCNO
+// ends up being at allocno offset REP_OFFSET from the start of REP.
+void
+early_ra::set_single_color_rep (allocno_info *allocno, allocno_group_info *rep,
+ unsigned int rep_offset)
+{
+ auto *group = allocno->group ();
+ if (group->m_color_rep == rep)
+ return;
+
+ group->m_color_rep = rep;
+ gcc_checking_assert (multiple_p (group->stride, rep->stride));
+ unsigned int factor = group->stride / rep->stride;
+ gcc_checking_assert (rep_offset >= allocno->offset * factor);
+ group->color_rep_offset = rep_offset - allocno->offset * factor;
+ rep->fpr_size = std::max (rep->fpr_size, group->fpr_size);
+ rep->fpr_candidates &= (group->fpr_candidates
+ >> (group->color_rep_offset * rep->stride));
+}
+
+// REP1 and REP2 are color representatives. Change REP1's color representative
+// to REP2, with REP1 starting at allocno offset REP2_OFFSET into REP2.
+void
+early_ra::set_color_rep (allocno_group_info *rep1, allocno_group_info *rep2,
+ unsigned int rep2_offset)
+{
+ gcc_checking_assert (rep1 != rep2
+ && rep2->m_color_rep == rep2
+ && multiple_p (rep1->stride, rep2->stride));
+
+ auto heads1 = rep1->chain_heads ();
+ auto heads2 = rep2->chain_heads ();
+ for (unsigned int i1 = 0; i1 < heads1.size (); ++i1)
+ if (heads1[i1] != INVALID_ALLOCNO)
+ {
+ unsigned int i2 = rep2_offset + i1 * rep1->stride / rep2->stride;
+ if (heads2[i2] == INVALID_ALLOCNO)
+ heads2[i2] = heads1[i1];
+ else
+ gcc_checking_assert (heads2[i2] == heads1[i1]);
+ set_single_color_rep (m_allocnos[heads1[i1]], rep2, i2);
+ }
+}
+
+// Try to chain ALLOCNO1 to the head of the chain starting at ALLOCNO2.
+// Return true on success.
+bool
+early_ra::try_to_chain_allocnos (allocno_info *allocno1,
+ allocno_info *allocno2)
+{
+ auto *group1 = allocno1->group ()->color_rep ();
+ auto *group2 = allocno2->group ()->color_rep ();
+
+ // Avoid trying to tie different subgroups of the same group. This can
+ // happen if the parts of a register are defined and used piecemeal.
+ if (group1 == group2)
+ return false;
+
+ // The stride (in FPRs) between allocnos of each color representative.
+ auto fpr_stride1 = group1->stride;
+ auto fpr_stride2 = group2->stride;
+
+ // The offset (in FPRs) of each allocno group from its color representative.
+ auto fpr_offset1 = allocno1->group ()->color_rep_offset * fpr_stride1;
+ auto fpr_offset2 = allocno2->group ()->color_rep_offset * fpr_stride2;
+
+ // The offset (in FPRs) of each allocno from its color representative.
+ fpr_offset1 += allocno1->offset * allocno1->group ()->stride;
+ fpr_offset2 += allocno2->offset * allocno2->group ()->stride;
+
+ // The FPR overlap is in multiples of the larger stride.
+ auto max_fpr_stride = std::max (fpr_stride1, fpr_stride2);
+ auto min_fpr_offset = std::min (fpr_offset1, fpr_offset2);
+ auto fpr_overlap_offset = ROUND_DOWN (min_fpr_offset, max_fpr_stride);
+
+ // The offset (in FPRs) of the start of the overlapping region from
+ // each color representative.
+ fpr_offset1 -= fpr_overlap_offset;
+ fpr_offset2 -= fpr_overlap_offset;
+
+ // The number of FPRs in each color representative after the start
+ // of the overlapping region.
+ auto fpr_after1 = (group1->size - 1) * fpr_stride1 - fpr_offset1;
+ auto fpr_after2 = (group2->size - 1) * fpr_stride2 - fpr_offset2;
+
+ auto min_fpr_after = std::min (fpr_after1, fpr_after2);
+
+ // The number of overlapping allocnos.
+ auto allocno_overlap_size = min_fpr_after / max_fpr_stride + 1;
+
+ // The offset (in allocnos) of the overlapping region from the start
+ // of each color representative.
+ auto allocno_offset1 = fpr_offset1 / fpr_stride1;
+ auto allocno_offset2 = fpr_offset2 / fpr_stride2;
+
+ // The stride (in allocnos) between overlapping allocnos.
+ auto allocno_stride1 = max_fpr_stride / fpr_stride1;
+ auto allocno_stride2 = max_fpr_stride / fpr_stride2;
+
+ // Reject combinations that are impossible to allocate.
+ auto fprs1 = group1->fpr_candidates;
+ auto fprs2 = group2->fpr_candidates;
+ if (fpr_offset1 > fpr_offset2)
+ fprs2 >>= (fpr_offset1 - fpr_offset2);
+ else
+ fprs1 >>= (fpr_offset2 - fpr_offset1);
+ if ((fprs1 & fprs2) == 0)
+ {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file, " - cannot chain %d->%d, no FPRs in common"
+ " (%08x@%d and %08x@%d)\n", allocno1->id, allocno2->id,
+ group1->fpr_candidates, fpr_offset1,
+ group2->fpr_candidates, fpr_offset2);
+ return false;
+ }
+
+ // Check whether the chain can be formed.
+ auto heads1 = group1->chain_heads ();
+ auto heads2 = group2->chain_heads ();
+ for (unsigned int i = 0; i < allocno_overlap_size; ++i)
+ {
+ auto headi1 = heads1[allocno_offset1 + i * allocno_stride1];
+ auto headi2 = heads2[allocno_offset2 + i * allocno_stride2];
+ if (headi1 != INVALID_ALLOCNO && headi2 != INVALID_ALLOCNO)
+ {
+ auto *head1 = m_allocnos[headi1];
+ auto *head2 = m_allocnos[headi2];
+ if (head1->chain_next != INVALID_ALLOCNO)
+ return false;
+ if (head2->equiv_allocno != head1->id
+ && head1->end_point <= head2->start_point)
+ return false;
+ }
+ }
+
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ {
+ fprintf (dump_file, " - chaining allocnos [");
+ for (unsigned int i = 0; i < allocno_overlap_size; ++i)
+ fprintf (dump_file, "%s%d", i ? "," : "",
+ heads1[allocno_offset1 + i * allocno_stride1]);
+ fprintf (dump_file, "] and [");
+ for (unsigned int i = 0; i < allocno_overlap_size; ++i)
+ fprintf (dump_file, "%s%d", i ? "," : "",
+ heads2[allocno_offset2 + i * allocno_stride2]);
+ fprintf (dump_file, "]\n");
+ }
+
+ // Chain the allocnos, updating the chain heads.
+ for (unsigned int i = 0; i < allocno_overlap_size; ++i)
+ chain_allocnos (heads1[allocno_offset1 + i * allocno_stride1],
+ heads2[allocno_offset2 + i * allocno_stride2]);
+
+ // Pick a color representative for the merged groups.
+ allocno_group_info *new_rep;
+ if (allocno_offset1 == 0
+ && group1->size == allocno_overlap_size * allocno_stride1
+ && multiple_p (fpr_stride1, fpr_stride2))
+ {
+ // The first group fits within the second.
+ set_color_rep (group1, group2, allocno_offset2);
+ new_rep = group2;
+ }
+ else if (allocno_offset2 == 0
+ && group2->size == allocno_overlap_size * allocno_stride2
+ && multiple_p (fpr_stride2, fpr_stride1))
+ {
+ // The second group fits within the first.
+ set_color_rep (group2, group1, allocno_offset1);
+ new_rep = group1;
+ }
+ else
+ {
+ // We need a new group that is big enough to span both groups.
+ // The new group always has an FPR stride of 1.
+ auto max_fpr_offset = std::max (fpr_offset1, fpr_offset2);
+ auto max_fpr_after = std::max (fpr_after1, fpr_after2);
+ auto new_size = max_fpr_offset + max_fpr_after + 1;
+ new_rep = create_allocno_group (INVALID_REGNUM, new_size);
+
+ set_color_rep (group1, new_rep, max_fpr_offset - fpr_offset1);
+ set_color_rep (group2, new_rep, max_fpr_offset - fpr_offset2);
+ }
+
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ {
+ fprintf (dump_file, " - new frontier [");
+ auto new_heads = new_rep->chain_heads ();
+ for (unsigned int i = 0; i < new_heads.size (); ++i)
+ {
+ if (i)
+ fprintf (dump_file, ",");
+ if (new_heads[i] == INVALID_ALLOCNO)
+ fprintf (dump_file, "-");
+ else
+ fprintf (dump_file, "%d", new_heads[i]);
+ }
+ fprintf (dump_file, "]\n");
+ }
+
+ return true;
+}
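+
+// Worked example (illustrative only): suppose both color representatives
+// have stride 1, group1 has size 2 with allocno1 at offset 0, and group2
+// has size 4 with allocno2 at offset 2. Then:
+//
+// fpr_offset1 = 0, fpr_offset2 = 2, fpr_overlap_offset = 0
+// fpr_after1 = 1, fpr_after2 = 1, allocno_overlap_size = 2
+//
+// so heads1[0] is chained to heads2[2] and heads1[1] to heads2[3].
+// group1 fits within group2, so group2 becomes the color representative
+// of both groups, with group1 at allocno offset 2.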
+
+// Create a color_info for color representative GROUP.
+void
+early_ra::create_color (allocno_group_info *group)
+{
+ auto *color = region_allocate<color_info> ();
+ color->id = m_colors.length ();
+ color->hard_regno = FIRST_PSEUDO_REGISTER;
+ color->group = group;
+
+ gcc_checking_assert (group->m_color_rep == group);
+ group->has_color = true;
+ group->color = m_colors.length ();
+
+ m_colors.safe_push (color);
+}
+
+// Form allocnos into chains. Create colors for each resulting clique.
+void
+early_ra::form_chains ()
+{
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file, "\nChaining allocnos:\n");
+
+ // Perform (modified) interval graph coloring. First sort by
+ // increasing start point.
+ m_sorted_allocnos.reserve (m_allocnos.length ());
+ m_sorted_allocnos.splice (m_allocnos);
+ m_sorted_allocnos.qsort (cmp_increasing<&allocno_info::start_point>);
+
+ // During this phase, color representatives are only correct for
+ // unprocessed allocno groups (where the color representative is
+ // the group itself) and for groups that contain a current chain head.
+ unsigned int ti = 0;
+ auto_vec<chain_candidate_info> candidates;
+ for (unsigned int hi = 0; hi < m_sorted_allocnos.length (); ++hi)
+ {
+ auto *allocno1 = m_sorted_allocnos[hi];
+ if (allocno1->chain_next != INVALID_ALLOCNO)
+ continue;
+
+ // Record conflicts with direct uses of FPR hard registers.
+ auto *group1 = allocno1->group ();
+ for (unsigned int fpr = allocno1->offset; fpr < 32; ++fpr)
+ if (fpr_conflicts_with_allocno_p (fpr, allocno1))
+ group1->fpr_candidates &= ~(1U << (fpr - allocno1->offset));
+
+ // Record conflicts due to partially call-clobbered registers.
+ // (Full clobbers are handled by the previous loop.)
+ for (unsigned int abi_id = 0; abi_id < NUM_ABI_IDS; ++abi_id)
+ if (call_in_range_p (abi_id, allocno1->start_point,
+ allocno1->end_point))
+ {
+ auto fprs = partial_fpr_clobbers (abi_id, group1->fpr_size);
+ group1->fpr_candidates &= ~fprs >> allocno1->offset;
+ }
+
+ // Find earlier allocnos (in processing order) that could be chained
+ // to this one.
+ candidates.truncate (0);
+ for (unsigned int sci = ti; sci < hi; ++sci)
+ {
+ auto *allocno2 = m_sorted_allocnos[sci];
+ if (allocno2->chain_prev == INVALID_ALLOCNO)
+ {
+ if (!is_chain_candidate (allocno1, allocno2))
+ continue;
+ chain_candidate_info candidate;
+ candidate.allocno = allocno2;
+ candidate.score = rate_chain (allocno1, allocno2);
+ candidates.safe_push (candidate);
+ }
+ else if (sci == ti)
+ ++ti;
+ }
+
+ // Sort the candidates by decreasing score.
+ candidates.qsort (cmp_chain_candidates);
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ {
+ fprintf (dump_file, " Chain candidates for %d:", allocno1->id);
+ for (auto &candidate : candidates)
+ fprintf (dump_file, " %d(%d)", candidate.allocno->id,
+ candidate.score);
+ fprintf (dump_file, "\n");
+ }
+
+ // Pick the first candidate that works.
+ for (auto &candidate : candidates)
+ if (try_to_chain_allocnos (allocno1, candidate.allocno))
+ break;
+ }
+
+ // Create color_infos for each group. Make sure that each group's
+ // color representative is up to date.
+ for (unsigned int hi = m_sorted_allocnos.length (); hi-- > 0; )
+ {
+ auto *allocno = m_sorted_allocnos[hi];
+ auto *rep = allocno->group ()->color_rep ();
+ if (rep->has_color)
+ continue;
+
+ create_color (rep);
+ auto heads = rep->chain_heads ();
+ for (unsigned int i = 0; i < heads.size (); ++i)
+ {
+ unsigned int ai = heads[i];
+ while (ai != INVALID_ALLOCNO)
+ {
+ allocno = m_allocnos[ai];
+ set_single_color_rep (allocno, rep, i * rep->stride);
+ ai = allocno->chain_next;
+ }
+ }
+ }
+}
+
+// Return true if the given FPR (starting at 0) conflicts with allocno
+// ALLOCNO.
+bool
+early_ra::fpr_conflicts_with_allocno_p (unsigned int fpr,
+ allocno_info *allocno)
+{
+ auto &ranges = m_fpr_ranges[fpr];
+ unsigned int start_i = 0;
+ unsigned int end_i = ranges.length ();
+ while (start_i < end_i)
+ {
+ unsigned int mid_i = (start_i + end_i) / 2;
+ auto &range = ranges[mid_i];
+ if (allocno->end_point > range.start_point)
+ start_i = mid_i + 1;
+ else if (allocno->start_point < range.end_point)
+ end_i = mid_i;
+ else
+ {
+ if (range.allocno != allocno->id)
+ return true;
+ // The FPR is equivalent to ALLOCNO for this particular range.
+ // See whether ALLOCNO conflicts with a neighboring range.
+ if (mid_i > 0
+ && ranges[mid_i - 1].start_point >= allocno->end_point)
+ return true;
+ if (mid_i + 1 < ranges.length ()
+ && ranges[mid_i + 1].end_point <= allocno->start_point)
+ return true;
+ return false;
+ }
+ }
+ return false;
+}
+
+// Return true if there is a call with ABI identifier ABI_ID in the inclusive
+// program point range [START_POINT, END_POINT].
+bool
+early_ra::call_in_range_p (unsigned int abi_id, unsigned int start_point,
+ unsigned int end_point)
+{
+ auto &points = m_call_points[abi_id];
+ unsigned int start_i = 0;
+ unsigned int end_i = points.length ();
+ while (start_i < end_i)
+ {
+ unsigned int mid_i = (start_i + end_i) / 2;
+ auto point = points[mid_i];
+ if (end_point > point)
+ start_i = mid_i + 1;
+ else if (start_point < point)
+ end_i = mid_i;
+ else
+ return true;
+ }
+ return false;
+}
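+
+// Note that, because points are assigned during a backwards walk,
+// m_call_points[abi_id] is sorted in increasing order and a range's
+// START_POINT is numerically >= its END_POINT, so the binary search
+// above looks for a call point P with END_POINT <= P <= START_POINT.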
+
+// Return the set of FPRs for which a value of size SIZE will be clobbered
+// by a call to a function with ABI identifier ABI_ID, but would not be
+// for some smaller size. The set therefore excludes FPRs that are
+// fully-clobbered, like V0 in the base ABI.
+unsigned int
+early_ra::partial_fpr_clobbers (unsigned int abi_id, fpr_size_info size)
+{
+ auto &abi = function_abis[abi_id];
+ unsigned int clobbers = 0;
+ machine_mode mode = (size == FPR_D ? V8QImode
+ : size == FPR_Q ? V16QImode : VNx16QImode);
+ for (unsigned int regno = V0_REGNUM; regno <= V31_REGNUM; ++regno)
+ if (!abi.clobbers_full_reg_p (regno)
+ && abi.clobbers_reg_p (mode, regno))
+ clobbers |= 1U << (regno - V0_REGNUM);
+ return clobbers;
+}
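+
+// For example, under the base AAPCS64 ABI, only the low 64 bits of
+// V8-V15 are preserved across calls, so for FPR_Q the result is
+// 0x0000ff00 (V8-V15): Q-sized values in those registers are clobbered,
+// but D-sized values are not. For FPR_D the result is 0, since every
+// FPR either fully preserves a D-sized value or is fully clobbered.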
+
+// Process copies between pseudo registers and hard registers and update
+// the FPR preferences for the associated colors.
+void
+early_ra::process_copies ()
+{
+ for (auto &copy : m_allocno_copies)
+ {
+ auto *allocno = m_allocnos[copy.allocno];
+ auto *group = allocno->group ();
+ auto offset = group->color_rep_offset + allocno->offset;
+ if (offset > copy.fpr)
+ continue;
+
+ unsigned int fpr = copy.fpr - offset;
+ auto *color = m_colors[group->color_rep ()->color];
+ color->fpr_preferences[fpr] = MIN (color->fpr_preferences[fpr]
+ + copy.weight, 127);
+ }
+}
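+
+// The MIN with 127 above saturates the accumulated weight, on the
+// assumption that fpr_preferences holds narrow (8-bit) signed counts
+// that must not be allowed to wrap around.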
+
+// Compare the colors at *COLOR1_PTR and *COLOR2_PTR and return a <=>
+// result that puts colors in order of decreasing size.
+int
+early_ra::cmp_decreasing_size (const void *color1_ptr, const void *color2_ptr)
+{
+ auto *color1 = *(color_info *const *) color1_ptr;
+ auto *color2 = *(color_info *const *) color2_ptr;
+
+ if (color1->group->size != color2->group->size)
+ return color1->group->size > color2->group->size ? -1 : 1;
+ return (color1->id < color2->id ? -1
+ : color1->id == color2->id ? 0 : 1);
+}
+
+// Allocate a register to each color. If we run out of registers,
+// give up on doing a full allocation of the FPR-based pseudos in the
+// region.
+void
+early_ra::allocate_colors ()
+{
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file, "\nAllocating registers:\n");
+
+ auto_vec<color_info *> sorted_colors;
+ sorted_colors.safe_splice (m_colors);
+ sorted_colors.qsort (cmp_decreasing_size);
+
+ for (unsigned int i = 0; i < 32; ++i)
+ if (!crtl->abi->clobbers_full_reg_p (V0_REGNUM + i))
+ m_call_preserved_fprs |= 1U << i;
+
+ for (auto *color : sorted_colors)
+ {
+ unsigned int candidates = color->group->fpr_candidates;
+ for (unsigned int i = 0; i < color->group->size; ++i)
+ candidates &= ~(m_allocated_fprs >> i);
+ unsigned int best = INVALID_REGNUM;
+ int best_weight = 0;
+ for (unsigned int fpr = 0; fpr <= 32U - color->group->size; ++fpr)
+ {
+ if ((candidates & (1U << fpr)) == 0)
+ continue;
+ int weight = color->fpr_preferences[fpr];
+ // Account for registers that the current function must preserve.
+ for (unsigned int i = 0; i < color->group->size; ++i)
+ if (m_call_preserved_fprs & (1U << (fpr + i)))
+ weight -= 1;
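+ // Use "<=" so that, on a tie, the highest-numbered acceptable
+ // FPR is chosen.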
+ if (best == INVALID_REGNUM || best_weight <= weight)
+ {
+ best = fpr;
+ best_weight = weight;
+ }
+ }
+
+ if (best == INVALID_REGNUM)
+ {
+ m_allocation_successful = false;
+ return;
+ }
+
+ color->hard_regno = best + V0_REGNUM;
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file, " Allocating [v%d:v%d] to color %d\n",
+ best, best + color->group->size - 1, color->id);
+ m_allocated_fprs |= ((1U << color->group->size) - 1) << best;
+ }
+}
+
+// See if ALLOCNO ends a subchain of single-FPR allocnos that can be split
+// off without affecting the rest of the chain, and without introducing
+// any moves. Return the start of the subchain if so (which might be
+// ALLOCNO itself), otherwise return null.
+early_ra::allocno_info *
+early_ra::find_independent_subchain (allocno_info *allocno)
+{
+ // Make sure ALLOCNO ends a natural subchain.
+ if (auto *next_allocno = chain_next (allocno))
+ if (next_allocno->start_point + 1 >= allocno->end_point)
+ return nullptr;
+
+ // Check the allocnos in the purported subchain and find the other end.
+ for (;;)
+ {
+ auto *group = allocno->group ();
+ if (group->m_color_rep == group)
+ return nullptr;
+ if (group->size != 1)
+ return nullptr;
+
+ auto *prev_allocno = chain_prev (allocno);
+ if (!prev_allocno || allocno->start_point + 1 < prev_allocno->end_point)
+ return allocno;
+
+ allocno = prev_allocno;
+ }
+}
+
+// Search the colors starting at index FIRST_COLOR whose FPRs do not belong
+// to FPR_CONFLICTS. Return the first such color that has no group. If all
+// such colors have groups, instead return the color with the latest
+// (smallest) start point.
+early_ra::color_info *
+early_ra::find_oldest_color (unsigned int first_color,
+ unsigned int fpr_conflicts)
+{
+ color_info *best = nullptr;
+ unsigned int best_start_point = ~0U;
+ for (unsigned int ci = first_color; ci < m_colors.length (); ++ci)
+ {
+ auto *color = m_colors[ci];
+ if (fpr_conflicts & (1U << (color->hard_regno - V0_REGNUM)))
+ continue;
+ if (!color->group)
+ return color;
+ auto chain_head = color->group->chain_heads ()[0];
+ auto start_point = m_allocnos[chain_head]->start_point;
+ if (!best || best_start_point > start_point)
+ {
+ best = color;
+ best_start_point = start_point;
+ }
+ }
+ return best;
+}
+
+// If there are some spare FPRs that can be reused without introducing saves,
+// restores, or moves, use them to "broaden" the allocation, in order to give
+// the scheduler more freedom. This is particularly useful for forming LDPs
+// and STPs.
+void
+early_ra::broaden_colors ()
+{
+ // Create dummy colors for every leftover FPR that can be used cheaply.
+ unsigned int first_color = m_colors.length ();
+ for (unsigned int fpr = 0; fpr < 32; ++fpr)
+ if (((m_allocated_fprs | m_call_preserved_fprs) & (1U << fpr)) == 0)
+ {
+ auto *color = region_allocate<color_info> ();
+ color->id = m_colors.length ();
+ color->hard_regno = V0_REGNUM + fpr;
+ color->group = nullptr;
+ m_colors.safe_push (color);
+ }
+
+ // Exit early if there are no spare FPRs.
+ if (first_color == m_colors.length ())
+ return;
+
+ // Go through the allocnos in order, seeing if there is a subchain of
+ // single-FPR allocnos that can be split off from the containing clique.
+ // Allocate such subchains to the new colors on an oldest-first basis.
+ for (auto *allocno : m_sorted_allocnos)
+ if (auto *start_allocno = find_independent_subchain (allocno))
+ {
+ unsigned int fpr_conflicts = 0;
+ auto *member = allocno;
+ for (;;)
+ {
+ fpr_conflicts |= ~member->group ()->fpr_candidates;
+ if (member == start_allocno)
+ break;
+ member = m_allocnos[member->chain_prev];
+ }
+
+ auto *color = find_oldest_color (first_color, fpr_conflicts);
+ if (!color)
+ continue;
+
+ if (!color->group)
+ {
+ auto *group = allocno->group ();
+ color->group = group;
+ group->color = color->id;
+ group->chain_heads ()[0] = INVALID_ALLOCNO;
+ }
+ else
+ {
+ auto chain_head = color->group->chain_heads ()[0];
+ auto start_point = m_allocnos[chain_head]->start_point;
+ if (start_point >= allocno->end_point)
+ // Allocating to COLOR isn't viable, and it was the best
+ // option available.
+ continue;
+
+ auto *next_allocno = chain_next (allocno);
+ if (!next_allocno || next_allocno->start_point <= start_point)
+ // The current allocation gives at least as much scheduling
+ // freedom as COLOR would.
+ continue;
+ }
+
+ // Unlink the chain.
+ if (auto *next_allocno = chain_next (allocno))
+ next_allocno->chain_prev = start_allocno->chain_prev;
+ if (auto *prev_allocno = chain_prev (start_allocno))
+ prev_allocno->chain_next = allocno->chain_next;
+
+ // Make the subchain use COLOR.
+ allocno->chain_next = color->group->chain_heads ()[0];
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file, "Moving to optional color %d (register %s):",
+ color->id, reg_names[color->hard_regno]);
+ for (;;)
+ {
+ auto *group = allocno->group ();
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file, " r%d", group->regno);
+ group->m_color_rep = color->group;
+ group->color_rep_offset = 0;
+ if (allocno == start_allocno)
+ break;
+ allocno = m_allocnos[allocno->chain_prev];
+ }
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file, "\n");
+ color->group->chain_heads ()[0] = start_allocno->id;
+ }
+}
+
+// Record the final choice of hard register for each allocno.
+void
+early_ra::finalize_allocation ()
+{
+ for (auto *allocno : m_allocnos)
+ {
+ auto *group = allocno->group ();
+ auto *rep = group->color_rep ();
+ auto rep_regno = m_colors[rep->color]->hard_regno;
+ auto group_regno = rep_regno + group->color_rep_offset;
+ allocno->hard_regno = group_regno + allocno->offset * group->stride;
+ }
+}
+
+// Replace any allocno references in REFS with the allocated register.
+bool
+early_ra::replace_regs (df_ref refs)
+{
+ bool changed = false;
+ for (df_ref ref = refs; ref; ref = DF_REF_NEXT_LOC (ref))
+ {
+ auto range = get_allocno_subgroup (DF_REF_REG (ref));
+ if (!range)
+ continue;
+
+ auto new_regno = range.allocno (0)->hard_regno;
+ *DF_REF_LOC (ref) = gen_rtx_REG (GET_MODE (DF_REF_REG (ref)), new_regno);
+ changed = true;
+ }
+ return changed;
+}
+
+// Try to make INSN match its FPR-related constraints. If this needs
+// a source operand (SRC) to be copied to a destination operand (DEST)
+// before INSN, add the associated (DEST, SRC) pairs to MOVES.
+//
+// Return -1 on failure, otherwise return a ?/!-style reject count.
+// The reject count doesn't model the moves, just the internal alternative
+// preferences.
+int
+early_ra::try_enforce_constraints (rtx_insn *insn,
+ vec<std::pair<int, int>> &moves)
+{
+ if (!constrain_operands (0, get_preferred_alternatives (insn)))
+ return -1;
+
+ // Pick the alternative with the lowest cost.
+ int best = -1;
+ auto alts = get_preferred_alternatives (insn);
+ for (int altno = 0; altno < recog_data.n_alternatives; ++altno)
+ {
+ if (!(alts & ALTERNATIVE_BIT (altno)))
+ continue;
+
+ auto *op_alt = &recog_op_alt[altno * recog_data.n_operands];
+ if (!likely_alternative_match_p (op_alt))
+ continue;
+
+ auto_vec<std::pair<int, int>, 4> new_moves;
+ for (int opno = 0; opno < recog_data.n_operands; ++opno)
+ {
+ rtx op = recog_data.operand[opno];
+ if (REG_P (op)
+ && FP_REGNUM_P (REGNO (op))
+ && op_alt[opno].matched >= 0)
+ {
+ rtx old_src = recog_data.operand[op_alt[opno].matched];
+ if (!operands_match_p (op, old_src))
+ {
+ for (int i = 0; i < recog_data.n_operands; ++i)
+ if (i != opno)
+ {
+ rtx other = recog_data.operand[i];
+ if (reg_overlap_mentioned_p (op, other))
+ {
+ old_src = NULL_RTX;
+ break;
+ }
+ }
+ if (!old_src)
+ continue;
+ new_moves.safe_push ({ opno, op_alt[opno].matched });
+ }
+ }
+ }
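+ // Cost the alternative by its reject count plus a fixed penalty
+ // (7 here) for each copy that would have to be emitted.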
+ int cost = count_rejects (op_alt) + new_moves.length () * 7;
+ if (best < 0 || cost < best)
+ {
+ best = cost;
+ moves.truncate (0);
+ moves.safe_splice (new_moves);
+ }
+ }
+ return best;
+}
+
+// Make INSN match its FPR-related constraints.
+void
+early_ra::enforce_constraints (rtx_insn *insn)
+{
+ extract_insn (insn);
+ preprocess_constraints (insn);
+
+ // First try with the operands as they are.
+ auto_vec<std::pair<int, int>, 4> moves;
+ int cost = try_enforce_constraints (insn, moves);
+
+ // Next try taking advantage of commutativity.
+ for (int opno = 0; opno < recog_data.n_operands - 1; ++opno)
+ if (recog_data.constraints[opno][0] == '%')
+ {
+ std::swap (*recog_data.operand_loc[opno],
+ *recog_data.operand_loc[opno + 1]);
+ std::swap (recog_data.operand[opno],
+ recog_data.operand[opno + 1]);
+ auto_vec<std::pair<int, int>, 4> swapped_moves;
+ int swapped_cost = try_enforce_constraints (insn, swapped_moves);
+ if (swapped_cost >= 0 && (cost < 0 || swapped_cost < cost))
+ {
+ cost = swapped_cost;
+ moves.truncate (0);
+ moves.safe_splice (swapped_moves);
+ }
+ else
+ {
+ std::swap (*recog_data.operand_loc[opno],
+ *recog_data.operand_loc[opno + 1]);
+ std::swap (recog_data.operand[opno],
+ recog_data.operand[opno + 1]);
+ }
+ }
+
+ // The allocation should ensure that there is at least one valid combination.
+ // It's too late to back out now if not.
+ gcc_assert (cost >= 0);
+ for (int i = 0; i < recog_data.n_dups; ++i)
+ {
+ int dup_of = recog_data.dup_num[i];
+ rtx new_op = *recog_data.operand_loc[dup_of];
+ if (new_op != recog_data.operand[dup_of])
+ *recog_data.dup_loc[i] = copy_rtx (new_op);
+ }
+ for (auto move : moves)
+ {
+ int dest_opno = move.first;
+ int src_opno = move.second;
+ rtx dest = recog_data.operand[dest_opno];
+ rtx old_src = recog_data.operand[src_opno];
+ rtx new_src = lowpart_subreg (GET_MODE (old_src), dest, GET_MODE (dest));
+ emit_insn_before (gen_move_insn (new_src, old_src), insn);
+ *recog_data.operand_loc[src_opno] = new_src;
+ }
+}
+
+// See whether INSN is an instruction that operates on multi-register
+// vectors, and whether we have decided to make it use strided rather
+// than consecutive accesses. Update the pattern and return true if so.
+bool
+early_ra::maybe_convert_to_strided_access (rtx_insn *insn)
+{
+ if (!NONJUMP_INSN_P (insn) || recog_memoized (insn) < 0)
+ return false;
+
+ auto stride_type = get_attr_stride_type (insn);
+ rtx pat = PATTERN (insn);
+ rtx op;
+ if (stride_type == STRIDE_TYPE_LUTI_CONSECUTIVE
+ || stride_type == STRIDE_TYPE_LD1_CONSECUTIVE)
+ op = SET_DEST (pat);
+ else if (stride_type == STRIDE_TYPE_ST1_CONSECUTIVE)
+ op = XVECEXP (SET_SRC (pat), 0, 1);
+ else
+ return false;
+
+ auto range = get_allocno_subgroup (op);
+ if (!range || range.group->stride == 1)
+ return false;
+
+ gcc_assert (range.start == 0 && range.count == range.group->size);
+ auto elt_mode = GET_MODE_INNER (GET_MODE (op));
+ auto single_mode = aarch64_full_sve_mode (elt_mode).require ();
+ auto_vec<rtx, 4> regs;
+ for (unsigned int i = 0; i < range.count; ++i)
+ regs.quick_push (gen_rtx_REG (single_mode, range.allocno (i)->hard_regno));
+
+ extract_insn (insn);
+ if (stride_type == STRIDE_TYPE_LD1_CONSECUTIVE)
+ {
+ auto unspec = XINT (SET_SRC (pat), 1);
+ if (range.count == 2)
+ pat = gen_aarch64_strided2 (unspec, GET_MODE (op), regs[0], regs[1],
+ recog_data.operand[1],
+ recog_data.operand[2]);
+ else
+ pat = gen_aarch64_strided4 (unspec, GET_MODE (op),
+ regs[0], regs[1], regs[2], regs[3],
+ recog_data.operand[1],
+ recog_data.operand[2]);
+ }
+ else if (stride_type == STRIDE_TYPE_ST1_CONSECUTIVE)
+ {
+ auto unspec = XINT (SET_SRC (pat), 1);
+ if (range.count == 2)
+ pat = gen_aarch64_strided2 (unspec, GET_MODE (op),
+ recog_data.operand[0],
+ recog_data.operand[2], regs[0], regs[1]);
+ else
+ pat = gen_aarch64_strided4 (unspec, GET_MODE (op),
+ recog_data.operand[0],
+ recog_data.operand[2],
+ regs[0], regs[1], regs[2], regs[3]);
+ // Ensure correct sharing for the source memory.
+ //
+ // ??? Why doesn't the generator get this right?
+ XVECEXP (SET_SRC (pat), 0, XVECLEN (SET_SRC (pat), 0) - 1)
+ = *recog_data.dup_loc[0];
+ }
+ else if (stride_type == STRIDE_TYPE_LUTI_CONSECUTIVE)
+ {
+ auto bits = INTVAL (XVECEXP (SET_SRC (pat), 0, 4));
+ if (range.count == 2)
+ pat = gen_aarch64_sme_lut_strided2 (bits, single_mode,
+ regs[0], regs[1],
+ recog_data.operand[1],
+ recog_data.operand[2]);
+ else
+ pat = gen_aarch64_sme_lut_strided4 (bits, single_mode,
+ regs[0], regs[1], regs[2], regs[3],
+ recog_data.operand[1],
+ recog_data.operand[2]);
+ }
+ else
+ gcc_unreachable ();
+ PATTERN (insn) = pat;
+ INSN_CODE (insn) = -1;
+ df_insn_rescan (insn);
+ return true;
+}
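+
+// Illustrative example: if a two-register LD1W group was given a stride
+// of 8 and its parts were allocated to Z0 and Z8, the consecutive form
+//
+// ld1w {z0.s - z1.s}, pn8/z, [x0]
+//
+// is rewritten to the strided form
+//
+// ld1w {z0.s, z8.s}, pn8/z, [x0]
+//
+// (register and predicate numbers are made up for the example).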
+
+// We've successfully allocated the current region. Apply the allocation
+// to the instructions.
+void
+early_ra::apply_allocation ()
+{
+ rtx_insn *prev;
+ for (auto insn_range : m_insn_ranges)
+ for (rtx_insn *insn = insn_range.first;
+ insn != insn_range.second;
+ insn = prev)
+ {
+ prev = PREV_INSN (insn);
+ if (!INSN_P (insn))
+ continue;
+
+ bool changed = maybe_convert_to_strided_access (insn);
+ changed |= replace_regs (DF_INSN_DEFS (insn));
+ changed |= replace_regs (DF_INSN_USES (insn));
+ if (changed && NONDEBUG_INSN_P (insn))
+ {
+ if (GET_CODE (PATTERN (insn)) != USE
+ && GET_CODE (PATTERN (insn)) != CLOBBER
+ && !is_move_set (PATTERN (insn)))
+ enforce_constraints (insn);
+
+ // A REG_EQUIV note establishes an equivalence throughout
+ // the function, but here we're reusing hard registers for
+ // multiple pseudo registers. We also no longer need REG_EQUIV
+ // notes that record potential spill locations, since we've
+ // allocated the pseudo register without spilling.
+ rtx *ptr = &REG_NOTES (insn);
+ while (*ptr)
+ if (REG_NOTE_KIND (*ptr) == REG_EQUIV)
+ *ptr = XEXP (*ptr, 1);
+ else
+ ptr = &XEXP (*ptr, 1);
+ }
+ changed |= replace_regs (DF_INSN_EQ_USES (insn));
+ if (changed)
+ df_insn_rescan (insn);
+ }
+
+ for (auto *insn : m_dead_insns)
+ delete_insn (insn);
+}
+
+// Try to allocate the current region. Update the instructions if successful.
+void
+early_ra::process_region ()
+{
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ {
+ dump_fpr_ranges ();
+ dump_copies ();
+ dump_allocnos ();
+ }
+
+ find_strided_accesses ();
+
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ dump_allocnos ();
+
+ form_chains ();
+
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ dump_allocnos ();
+
+ process_copies ();
+
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ dump_colors ();
+
+ allocate_colors ();
+ if (!m_allocation_successful)
+ return;
+
+ broaden_colors ();
+ finalize_allocation ();
+
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ {
+ fprintf (dump_file, "\nAllocation successful\nFinal allocation:\n");
+ dump_allocnos ();
+ dump_colors ();
+ }
+
+ apply_allocation ();
+}
+
+// Return true if INSN would become dead if we successfully allocate the
+// current region.
+bool
+early_ra::is_dead_insn (rtx_insn *insn)
+{
+ rtx set = single_set (insn);
+ if (!set)
+ return false;
+
+ rtx dest = SET_DEST (set);
+ auto dest_range = get_allocno_subgroup (dest);
+ if (!dest_range)
+ return false;
+
+ for (auto &allocno : dest_range.allocnos ())
+ if (bitmap_bit_p (m_live_allocnos, allocno.id))
+ return false;
+
+ if (side_effects_p (set))
+ return false;
+
+ return true;
+}
+
+// Build up information about block BB. IS_ISOLATED is true if the
+// block is not part of a larger region.
+void
+early_ra::process_block (basic_block bb, bool is_isolated)
+{
+ m_current_bb = bb;
+
+ // Process live-out FPRs.
+ bitmap live_out = df_get_live_out (bb);
+ for (unsigned int regno = V0_REGNUM; regno <= V31_REGNUM; ++regno)
+ if (bitmap_bit_p (live_out, regno))
+ record_fpr_use (regno);
+
+ // Process live-out allocnos. We don't track individual FPR liveness
+ // across block boundaries, so we have to assume that the whole pseudo
+ // register is live.
+ bitmap_iterator bi;
+ unsigned int regno;
+ EXECUTE_IF_AND_IN_BITMAP (df_get_live_out (bb), m_fpr_pseudos,
+ FIRST_PSEUDO_REGISTER, regno, bi)
+ {
+ auto range = get_allocno_subgroup (regno_reg_rtx[regno]);
+ for (auto &allocno : range.allocnos ())
+ record_allocno_use (&allocno);
+ }
+
+ m_current_point += 1;
+
+ record_artificial_refs (0);
+
+ bool is_first = true;
+ rtx_insn *start_insn = BB_END (bb);
+ rtx_insn *insn;
+ FOR_BB_INSNS_REVERSE (bb, insn)
+ {
+ if (!NONDEBUG_INSN_P (insn))
+ continue;
+
+ // CLOBBERs are used to prevent pseudos from being upwards exposed.
+ // We can ignore them if allocation is successful.
+ if (GET_CODE (PATTERN (insn)) == CLOBBER)
+ {
+ if (get_allocno_subgroup (XEXP (PATTERN (insn), 0)))
+ m_dead_insns.safe_push (insn);
+ continue;
+ }
+
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ {
+ if (is_first)
+ fprintf (dump_file, "\nBlock %d:\n", bb->index);
+ fprintf (dump_file, "%6d:", m_current_point);
+ pretty_printer rtl_slim_pp;
+ rtl_slim_pp.buffer->stream = dump_file;
+ print_insn (&rtl_slim_pp, insn, 1);
+ pp_flush (&rtl_slim_pp);
+ fprintf (dump_file, "\n");
+ }
+ is_first = false;
+
+ if (is_dead_insn (insn))
+ {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file, "%14s -- dead\n", "");
+ m_dead_insns.safe_push (insn);
+ }
+ else
+ {
+ record_insn_refs (insn);
+ rtx pat = PATTERN (insn);
+ if (is_move_set (pat))
+ record_copy (SET_DEST (pat), SET_SRC (pat), true);
+ else
+ {
+ extract_insn (insn);
+ record_constraints (insn);
+ }
+ }
+
+ // See whether we have a complete region, with no remaining live
+ // allocnos.
+ if (is_isolated
+ && bitmap_empty_p (m_live_allocnos)
+ && m_live_fprs == 0
+ && m_allocation_successful
+ && !m_allocnos.is_empty ())
+ {
+ rtx_insn *prev_insn = PREV_INSN (insn);
+ m_insn_ranges.safe_push ({ start_insn, prev_insn });
+ process_region ();
+ start_new_region ();
+ is_first = true;
+ start_insn = prev_insn;
+ }
+ }
+ m_insn_ranges.safe_push ({ start_insn, BB_HEAD (bb) });
+
+ record_artificial_refs (DF_REF_AT_TOP);
+
+ // Process live-in FPRs.
+ bitmap live_in = df_get_live_in (bb);
+ for (unsigned int regno = V0_REGNUM; regno <= V31_REGNUM; ++regno)
+ if (bitmap_bit_p (live_in, regno)
+ && (m_live_fprs & (1U << (regno - V0_REGNUM))))
+ record_fpr_def (regno);
+
+ // Process live-in allocnos.
+ EXECUTE_IF_AND_IN_BITMAP (live_in, m_fpr_pseudos,
+ FIRST_PSEUDO_REGISTER, regno, bi)
+ {
+ auto range = get_allocno_subgroup (regno_reg_rtx[regno]);
+ for (auto &allocno : range.allocnos ())
+ if (bitmap_bit_p (m_live_allocnos, allocno.id))
+ record_allocno_def (&allocno);
+ }
+
+ m_current_point += 1;
+
+ bitmap_clear (m_live_allocnos);
+ m_live_fprs = 0;
+}
+
+// Divide the function into regions, such that no edges into or out
+// of a region have live "FPR pseudos".
+void
+early_ra::process_blocks ()
+{
+ auto_sbitmap visited (last_basic_block_for_fn (m_fn));
+ auto_sbitmap fpr_pseudos_live_out (last_basic_block_for_fn (m_fn));
+ auto_sbitmap fpr_pseudos_live_in (last_basic_block_for_fn (m_fn));
+
+ bitmap_clear (visited);
+ bitmap_clear (fpr_pseudos_live_out);
+ bitmap_clear (fpr_pseudos_live_in);
+
+ // Record which blocks have live FPR pseudos on entry and exit.
+ basic_block bb;
+ FOR_EACH_BB_FN (bb, m_fn)
+ {
+ if (bitmap_intersect_p (df_get_live_out (bb), m_fpr_pseudos))
+ bitmap_set_bit (fpr_pseudos_live_out, bb->index);
+ if (bitmap_intersect_p (df_get_live_in (bb), m_fpr_pseudos))
+ bitmap_set_bit (fpr_pseudos_live_in, bb->index);
+ }
+
+ struct stack_node { edge_iterator ei; basic_block bb; };
+
+ auto_vec<stack_node, 32> stack;
+ auto_vec<basic_block, 32> region;
+
+ // Go through the function in reverse postorder and process the region
+ // containing each block.
+ unsigned int n_blocks = df_get_n_blocks (DF_FORWARD);
+ int *order = df_get_postorder (DF_FORWARD);
+ for (unsigned int bbi = 0; bbi < n_blocks; ++bbi)
+ {
+ basic_block bb = BASIC_BLOCK_FOR_FN (m_fn, order[bbi]);
+ if (bb->index < NUM_FIXED_BLOCKS)
+ continue;
+
+ if (!bitmap_set_bit (visited, bb->index))
+ continue;
+
+ // Process forward edges before backward edges (so push backward
+ // edges first). Build the region in an approximation of reverse
+ // program order.
+ if (bitmap_bit_p (fpr_pseudos_live_in, bb->index))
+ stack.quick_push ({ ei_start (bb->preds), nullptr });
+ if (bitmap_bit_p (fpr_pseudos_live_out, bb->index))
+ stack.quick_push ({ ei_start (bb->succs), bb });
+ else
+ region.safe_push (bb);
+ while (!stack.is_empty ())
+ {
+ auto &node = stack.last ();
+ if (ei_end_p (node.ei))
+ {
+ if (node.bb)
+ region.safe_push (node.bb);
+ stack.pop ();
+ continue;
+ }
+ edge e = ei_edge (node.ei);
+ if (node.bb)
+ {
+ // A forward edge from a node that has not yet been added
+ // to region.
+ if (bitmap_bit_p (fpr_pseudos_live_in, e->dest->index)
+ && bitmap_set_bit (visited, e->dest->index))
+ {
+ stack.safe_push ({ ei_start (e->dest->preds), nullptr });
+ if (bitmap_bit_p (fpr_pseudos_live_out, e->dest->index))
+ stack.safe_push ({ ei_start (e->dest->succs), e->dest });
+ else
+ region.safe_push (e->dest);
+ }
+ else
+ ei_next (&node.ei);
+ }
+ else
+ {
+ // A backward edge from a node that has already been added
+ // to the region.
+ if (bitmap_bit_p (fpr_pseudos_live_out, e->src->index)
+ && bitmap_set_bit (visited, e->src->index))
+ {
+ if (bitmap_bit_p (fpr_pseudos_live_in, e->src->index))
+ stack.safe_push ({ ei_start (e->src->preds), nullptr });
+ stack.safe_push ({ ei_start (e->src->succs), e->src });
+ }
+ else
+ ei_next (&node.ei);
+ }
+ }
+
+ m_current_point = 2;
+ start_new_region ();
+
+ if (region.is_empty ())
+ process_block (bb, true);
+ else
+ {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ {
+ fprintf (dump_file, "\nRegion (from %d):", bb->index);
+ for (unsigned int j = 0; j < region.length (); ++j)
+ fprintf (dump_file, " %d", region[j]->index);
+ fprintf (dump_file, "\n");
+ }
+ for (unsigned int j = 0; j < region.length (); ++j)
+ {
+ basic_block bb = region[j];
+ bool is_isolated
+ = ((j == 0 && !bitmap_bit_p (fpr_pseudos_live_out, bb->index))
+ || (j == region.length () - 1
+ && !bitmap_bit_p (fpr_pseudos_live_in, bb->index)));
+ process_block (bb, is_isolated);
+ }
+ }
+ region.truncate (0);
+
+ if (!m_allocnos.is_empty () && m_allocation_successful)
+ process_region ();
+ }
+}
+
+// Run the pass on the current function.
+void
+early_ra::execute ()
+{
+ df_analyze ();
+
+ preprocess_insns ();
+ propagate_pseudo_reg_info ();
+ choose_fpr_pseudos ();
+ if (bitmap_empty_p (m_fpr_pseudos))
+ return;
+
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ dump_pseudo_regs ();
+
+ process_blocks ();
+ df_verify ();
+}
+
+class pass_early_ra : public rtl_opt_pass
+{
+public:
+ pass_early_ra (gcc::context *ctxt)
+ : rtl_opt_pass (pass_data_early_ra, ctxt)
+ {}
+
+ // opt_pass methods:
+ virtual bool gate (function *);
+ virtual unsigned int execute (function *);
+};
+
+bool
+pass_early_ra::gate (function *)
+{
+ // Require a vector ISA to be enabled.
+ if (!TARGET_SIMD && !TARGET_SVE)
+ return false;
+
+ if (aarch64_early_ra == AARCH64_EARLY_RA_NONE)
+ return false;
+
+ if (aarch64_early_ra == AARCH64_EARLY_RA_STRIDED
+ && !TARGET_STREAMING_SME2)
+ return false;
+
+ return true;
+}
+
+unsigned int
+pass_early_ra::execute (function *fn)
+{
+ early_ra (fn).execute ();
+ return 0;
+}
+
+} // end namespace
+
+// Create a new early-ra pass instance.
+
+rtl_opt_pass *
+make_pass_aarch64_early_ra (gcc::context *ctxt)
+{
+ return new pass_early_ra (ctxt);
+}
@@ -120,4 +120,15 @@ enum aarch64_ldp_stp_policy {
AARCH64_LDP_STP_POLICY_NEVER
};
+/* An enum specifying when the early-ra pass should be run:
+ - AARCH64_EARLY_RA_ALL: for all functions
+ - AARCH64_EARLY_RA_STRIDED: for functions that have access to strided
+ multi-register instructions
+ - AARCH64_EARLY_RA_NONE: for no functions. */
+enum aarch64_early_ra_scope {
+ AARCH64_EARLY_RA_ALL,
+ AARCH64_EARLY_RA_STRIDED,
+ AARCH64_EARLY_RA_NONE
+};
+
#endif
@@ -18,6 +18,7 @@
along with GCC; see the file COPYING3. If not see
<http://www.gnu.org/licenses/>. */
+INSERT_PASS_BEFORE (pass_sched, 1, pass_aarch64_early_ra);
INSERT_PASS_AFTER (pass_regrename, 1, pass_fma_steering);
INSERT_PASS_BEFORE (pass_reorder_blocks, 1, pass_track_speculation);
INSERT_PASS_BEFORE (pass_late_thread_prologue_and_epilogue, 1, pass_switch_pstate_sm);
@@ -891,6 +891,7 @@ bool aarch64_sme_ldr_vnum_offset_p (rtx, rtx);
rtx aarch64_simd_vect_par_cnst_half (machine_mode, int, bool);
rtx aarch64_gen_stepped_int_parallel (unsigned int, int, int);
bool aarch64_stepped_int_parallel_p (rtx, int);
+bool aarch64_strided_registers_p (rtx *, unsigned int, unsigned int);
rtx aarch64_tls_get_addr (void);
unsigned aarch64_debugger_regno (unsigned);
unsigned aarch64_trampoline_size (void);
@@ -1063,6 +1064,7 @@ void aarch64_get_all_extension_candidates (auto_vec<const char *> *candidates);
std::string aarch64_get_extension_string_for_isa_flags (aarch64_feature_flags,
aarch64_feature_flags);
+rtl_opt_pass *make_pass_aarch64_early_ra (gcc::context *);
rtl_opt_pass *make_pass_fma_steering (gcc::context *);
rtl_opt_pass *make_pass_track_speculation (gcc::context *);
rtl_opt_pass *make_pass_tag_collision_avoidance (gcc::context *);
@@ -1981,4 +1981,74 @@ (define_insn "@aarch64_sme_lut<LUTI_BITS><mode>"
"TARGET_STREAMING_SME2
&& !(<LUTI_BITS> == 4 && <vector_count> == 4 && <elem_bits> == 8)"
"luti<LUTI_BITS>\t%0, zt0, %1[%2]"
+ [(set_attr "stride_type" "luti_consecutive")]
+)
+
+(define_insn "@aarch64_sme_lut<LUTI_BITS><mode>_strided2"
+ [(set (match_operand:SVE_FULL_BHS 0 "aarch64_simd_register" "=Uwd")
+ (unspec:SVE_FULL_BHS
+ [(reg:V8DI ZT0_REGNUM)
+ (reg:DI SME_STATE_REGNUM)
+ (match_operand:VNx16QI 2 "register_operand" "w")
+ (match_operand:DI 3 "const_int_operand")
+ (const_int LUTI_BITS)
+ (const_int 0)]
+ UNSPEC_SME_LUTI))
+ (set (match_operand:SVE_FULL_BHS 1 "aarch64_simd_register" "=w")
+ (unspec:SVE_FULL_BHS
+ [(reg:V8DI ZT0_REGNUM)
+ (reg:DI SME_STATE_REGNUM)
+ (match_dup 2)
+ (match_dup 3)
+ (const_int LUTI_BITS)
+ (const_int 1)]
+ UNSPEC_SME_LUTI))]
+ "TARGET_STREAMING_SME2
+ && aarch64_strided_registers_p (operands, 2, 8)"
+ "luti<LUTI_BITS>\t{%0.<Vetype>, %1.<Vetype>}, zt0, %2[%3]"
+ [(set_attr "stride_type" "luti_strided")]
+)
+
+(define_insn "@aarch64_sme_lut<LUTI_BITS><mode>_strided4"
+ [(set (match_operand:SVE_FULL_BHS 0 "aarch64_simd_register" "=Uwt")
+ (unspec:SVE_FULL_BHS
+ [(reg:V8DI ZT0_REGNUM)
+ (reg:DI SME_STATE_REGNUM)
+ (match_operand:VNx16QI 4 "register_operand" "w")
+ (match_operand:DI 5 "const_int_operand")
+ (const_int LUTI_BITS)
+ (const_int 0)]
+ UNSPEC_SME_LUTI))
+ (set (match_operand:SVE_FULL_BHS 1 "aarch64_simd_register" "=w")
+ (unspec:SVE_FULL_BHS
+ [(reg:V8DI ZT0_REGNUM)
+ (reg:DI SME_STATE_REGNUM)
+ (match_dup 4)
+ (match_dup 5)
+ (const_int LUTI_BITS)
+ (const_int 1)]
+ UNSPEC_SME_LUTI))
+ (set (match_operand:SVE_FULL_BHS 2 "aarch64_simd_register" "=w")
+ (unspec:SVE_FULL_BHS
+ [(reg:V8DI ZT0_REGNUM)
+ (reg:DI SME_STATE_REGNUM)
+ (match_dup 4)
+ (match_dup 5)
+ (const_int LUTI_BITS)
+ (const_int 2)]
+ UNSPEC_SME_LUTI))
+ (set (match_operand:SVE_FULL_BHS 3 "aarch64_simd_register" "=w")
+ (unspec:SVE_FULL_BHS
+ [(reg:V8DI ZT0_REGNUM)
+ (reg:DI SME_STATE_REGNUM)
+ (match_dup 4)
+ (match_dup 5)
+ (const_int LUTI_BITS)
+ (const_int 3)]
+ UNSPEC_SME_LUTI))]
+ "TARGET_STREAMING_SME2
+ && !(<LUTI_BITS> == 4 && <elem_bits> == 8)
+ && aarch64_strided_registers_p (operands, 4, 4)"
+ "luti<LUTI_BITS>\t{%0.<Vetype>, %1.<Vetype>, %2.<Vetype>, %3.<Vetype>}, zt0, %4[%5]"
+ [(set_attr "stride_type" "luti_strided")]
)
@@ -1305,7 +1305,7 @@ public:
icode = convert_optab_handler (maskload_optab,
e.vector_mode (0), e.gp_mode (0));
else
- icode = code_for_aarch64_ld1 (e.tuple_mode (0));
+ icode = code_for_aarch64 (UNSPEC_LD1_COUNT, e.tuple_mode (0));
return e.use_contiguous_load_insn (icode);
}
};
@@ -1605,7 +1605,10 @@ public:
rtx
expand (function_expander &e) const override
{
- insn_code icode = code_for_aarch64_ldnt1 (e.tuple_mode (0));
+ insn_code icode = (e.vectors_per_tuple () == 1
+ ? code_for_aarch64_ldnt1 (e.vector_mode (0))
+ : code_for_aarch64 (UNSPEC_LDNT1_COUNT,
+ e.tuple_mode (0)));
return e.use_contiguous_load_insn (icode);
}
};
@@ -2415,7 +2418,7 @@ public:
icode = convert_optab_handler (maskstore_optab,
e.vector_mode (0), e.gp_mode (0));
else
- icode = code_for_aarch64_st1 (e.tuple_mode (0));
+ icode = code_for_aarch64 (UNSPEC_ST1_COUNT, e.tuple_mode (0));
return e.use_contiguous_store_insn (icode);
}
};
@@ -2533,7 +2536,10 @@ public:
rtx
expand (function_expander &e) const override
{
- insn_code icode = code_for_aarch64_stnt1 (e.tuple_mode (0));
+ insn_code icode = (e.vectors_per_tuple () == 1
+ ? code_for_aarch64_stnt1 (e.vector_mode (0))
+ : code_for_aarch64 (UNSPEC_STNT1_COUNT,
+ e.tuple_mode (0)));
return e.use_contiguous_store_insn (icode);
}
};
@@ -1277,17 +1277,6 @@ (define_insn "maskload<mode><vpred>"
"ld1<Vesize>\t%0.<Vctype>, %2/z, %1"
)
-;; Predicated LD1 (multi), with a count as predicate.
-(define_insn "@aarch64_ld1<mode>"
- [(set (match_operand:SVE_FULLx24 0 "aligned_register_operand" "=Uw<vector_count>")
- (unspec:SVE_FULLx24
- [(match_operand:VNx16BI 2 "register_operand" "Uph")
- (match_operand:SVE_FULLx24 1 "memory_operand" "m")]
- UNSPEC_LD1_SVE_COUNT))]
- "TARGET_SME2 && TARGET_STREAMING"
- "ld1<Vesize>\t%0, %K2/z, %1"
-)
-
;; Unpredicated LD[234].
(define_expand "vec_load_lanes<mode><vsingle>"
[(set (match_operand:SVE_STRUCT 0 "register_operand")
@@ -1430,17 +1419,6 @@ (define_insn "@aarch64_ldnt1<mode>"
"ldnt1<Vesize>\t%0.<Vetype>, %2/z, %1"
)
-;; Predicated contiguous non-temporal load (multi).
-(define_insn "@aarch64_ldnt1<mode>"
- [(set (match_operand:SVE_FULLx24 0 "aligned_register_operand" "=Uw<vector_count>")
- (unspec:SVE_FULLx24
- [(match_operand:VNx16BI 2 "register_operand" "Uph")
- (match_operand:SVE_FULLx24 1 "memory_operand" "m")]
- UNSPEC_LDNT1_SVE_COUNT))]
- "TARGET_SVE"
- "ldnt1<Vesize>\t%0, %K2/z, %1"
-)
-
;; -------------------------------------------------------------------------
;; ---- Normal gather loads
;; -------------------------------------------------------------------------
@@ -2263,17 +2241,6 @@ (define_insn "maskstore<mode><vpred>"
"st1<Vesize>\t%1.<Vctype>, %2, %0"
)
-(define_insn "@aarch64_st1<mode>"
- [(set (match_operand:SVE_FULLx24 0 "memory_operand" "+m")
- (unspec:SVE_FULLx24
- [(match_operand:VNx16BI 2 "register_operand" "Uph")
- (match_operand:SVE_FULLx24 1 "aligned_register_operand" "Uw<vector_count>")
- (match_dup 0)]
- UNSPEC_ST1_SVE_COUNT))]
- "TARGET_SME2 && TARGET_STREAMING"
- "st1<Vesize>\t%1, %K2, %0"
-)
-
;; Unpredicated ST[234]. This is always a full update, so the dependence
;; on the old value of the memory location (via (match_dup 0)) is redundant.
;; There doesn't seem to be any obvious benefit to treating the all-true
@@ -2373,17 +2340,6 @@ (define_insn "@aarch64_stnt1<mode>"
"stnt1<Vesize>\t%1.<Vetype>, %2, %0"
)
-(define_insn "@aarch64_stnt1<mode>"
- [(set (match_operand:SVE_FULLx24 0 "memory_operand" "+m")
- (unspec:SVE_FULLx24
- [(match_operand:VNx16BI 2 "register_operand" "Uph")
- (match_operand:SVE_FULLx24 1 "aligned_register_operand" "Uw<vector_count>")
- (match_dup 0)]
- UNSPEC_STNT1_SVE_COUNT))]
- "TARGET_SME2 && TARGET_STREAMING"
- "stnt1<Vesize>\t%1, %K2, %0"
-)
-
;; -------------------------------------------------------------------------
;; ---- Normal scatter stores
;; -------------------------------------------------------------------------
@@ -21,8 +21,12 @@
;; The file is organised into the following sections (search for the full
;; line):
;;
-;; == Moves
+;; == Loads
+;; ---- Multi-register loads predicated by a counter
;; ---- Non-temporal gather loads
+;;
+;; == Stores
+;; ---- Multi-register stores predicated by a counter
;; ---- Non-temporal scatter stores
;;
;; == Predicate manipulation
@@ -112,9 +116,85 @@
;; ---- Optional SM4 extensions
;; =========================================================================
-;; == Moves
+;; == Loads
;; =========================================================================
+;; -------------------------------------------------------------------------
+;; ---- Multi-register loads predicated by a counter
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - LD1B
+;; - LD1D
+;; - LD1H
+;; - LD1W
+;; - LDNT1B
+;; - LDNT1D
+;; - LDNT1H
+;; - LDNT1W
+;; -------------------------------------------------------------------------
+
+;; Predicated LD1 (multi), with a count as predicate.
+(define_insn "@aarch64_<optab><mode>"
+ [(set (match_operand:SVE_FULLx24 0 "aligned_register_operand" "=Uw<vector_count>")
+ (unspec:SVE_FULLx24
+ [(match_operand:VNx16BI 2 "register_operand" "Uph")
+ (match_operand:SVE_FULLx24 1 "memory_operand" "m")]
+ LD1_COUNT))]
+ "TARGET_STREAMING_SME2"
+ "<optab><Vesize>\t%0, %K2/z, %1"
+ [(set_attr "stride_type" "ld1_consecutive")]
+)
+
+(define_insn "@aarch64_<optab><mode>_strided2"
+ [(set (match_operand:<VSINGLE> 0 "aarch64_simd_register" "=Uwd")
+ (unspec:<VSINGLE>
+ [(match_operand:VNx16BI 3 "register_operand" "Uph")
+ (match_operand:SVE_FULLx2 2 "memory_operand" "m")
+ (const_int 0)]
+ LD1_COUNT))
+ (set (match_operand:<VSINGLE> 1 "aarch64_simd_register" "=w")
+ (unspec:<VSINGLE>
+ [(match_dup 3)
+ (match_dup 2)
+ (const_int 1)]
+ LD1_COUNT))]
+ "TARGET_STREAMING_SME2
+ && aarch64_strided_registers_p (operands, 2, 8)"
+ "<optab><Vesize>\t{%0.<Vetype>, %1.<Vetype>}, %K3/z, %2"
+ [(set_attr "stride_type" "ld1_strided")]
+)
+
+(define_insn "@aarch64_<optab><mode>_strided4"
+ [(set (match_operand:<VSINGLE> 0 "aarch64_simd_register" "=Uwt")
+ (unspec:<VSINGLE>
+ [(match_operand:VNx16BI 5 "register_operand" "Uph")
+ (match_operand:SVE_FULLx4 4 "memory_operand" "m")
+ (const_int 0)]
+ LD1_COUNT))
+ (set (match_operand:<VSINGLE> 1 "aarch64_simd_register" "=w")
+ (unspec:<VSINGLE>
+ [(match_dup 5)
+ (match_dup 4)
+ (const_int 1)]
+ LD1_COUNT))
+ (set (match_operand:<VSINGLE> 2 "aarch64_simd_register" "=w")
+ (unspec:<VSINGLE>
+ [(match_dup 5)
+ (match_dup 4)
+ (const_int 2)]
+ LD1_COUNT))
+ (set (match_operand:<VSINGLE> 3 "aarch64_simd_register" "=w")
+ (unspec:<VSINGLE>
+ [(match_dup 5)
+ (match_dup 4)
+ (const_int 3)]
+ LD1_COUNT))]
+ "TARGET_STREAMING_SME2
+ && aarch64_strided_registers_p (operands, 4, 4)"
+ "<optab><Vesize>\t{%0.<Vetype>, %1.<Vetype>, %2.<Vetype>, %3.<Vetype>}, %K5/z, %4"
+ [(set_attr "stride_type" "ld1_strided")]
+)
+
;; -------------------------------------------------------------------------
;; ---- Non-temporal gather loads
;; -------------------------------------------------------------------------
@@ -171,6 +251,66 @@ (define_insn_and_rewrite "@aarch64_gather_ldnt_<ANY_EXTEND:optab><SVE_FULL_SDI:m
}
)
+;; =========================================================================
+;; == Stores
+;; =========================================================================
+
+;; -------------------------------------------------------------------------
+;; ---- Multi-register stores predicated by a counter
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - ST1B
+;; - ST1D
+;; - ST1H
+;; - ST1W
+;; - STNT1B
+;; - STNT1D
+;; - STNT1H
+;; - STNT1W
+;; -------------------------------------------------------------------------
+
+(define_insn "@aarch64_<optab><mode>"
+ [(set (match_operand:SVE_FULLx24 0 "memory_operand" "+m")
+ (unspec:SVE_FULLx24
+ [(match_operand:VNx16BI 2 "register_operand" "Uph")
+ (match_operand:SVE_FULLx24 1 "aligned_register_operand" "Uw<vector_count>")
+ (match_dup 0)]
+ ST1_COUNT))]
+ "TARGET_STREAMING_SME2"
+ "<optab><Vesize>\t%1, %K2, %0"
+ [(set_attr "stride_type" "st1_consecutive")]
+)
+
+(define_insn "@aarch64_<optab><mode>_strided2"
+ [(set (match_operand:SVE_FULLx24 0 "memory_operand" "+m")
+ (unspec:SVE_FULLx24
+ [(match_operand:VNx16BI 1 "register_operand" "Uph")
+ (match_operand:<VSINGLE> 2 "aarch64_simd_register" "Uwd")
+ (match_operand:<VSINGLE> 3 "aarch64_simd_register" "w")
+ (match_dup 0)]
+ ST1_COUNT))]
+ "TARGET_STREAMING_SME2
+ && aarch64_strided_registers_p (operands + 2, 2, 8)"
+ "<optab><Vesize>\t{%2.<Vetype>, %3.<Vetype>}, %K1, %0"
+ [(set_attr "stride_type" "st1_strided")]
+)
+
+(define_insn "@aarch64_<optab><mode>_strided4"
+ [(set (match_operand:SVE_FULLx24 0 "memory_operand" "+m")
+ (unspec:SVE_FULLx24
+ [(match_operand:VNx16BI 1 "register_operand" "Uph")
+ (match_operand:<VSINGLE> 2 "aarch64_simd_register" "Uwt")
+ (match_operand:<VSINGLE> 3 "aarch64_simd_register" "w")
+ (match_operand:<VSINGLE> 4 "aarch64_simd_register" "w")
+ (match_operand:<VSINGLE> 5 "aarch64_simd_register" "w")
+ (match_dup 0)]
+ ST1_COUNT))]
+ "TARGET_STREAMING_SME2
+ && aarch64_strided_registers_p (operands + 2, 4, 4)"
+ "<optab><Vesize>\t{%2.<Vetype>, %3.<Vetype>, %4.<Vetype>, %5.<Vetype>}, %K1, %0"
+ [(set_attr "stride_type" "st1_strided")]
+)
+
;; -------------------------------------------------------------------------
;; ---- Non-temporal scatter stores
;; -------------------------------------------------------------------------
@@ -22113,6 +22113,19 @@ aarch64_stepped_int_parallel_p (rtx op, int step)
return true;
}
+/* Return true if OPERANDS[0] to OPERANDS[NUM_OPERANDS - 1] form a
+ sequence of strided registers, with the stride equal to STRIDE.
+ The operands are already known to be FPRs. */
+bool
+aarch64_strided_registers_p (rtx *operands, unsigned int num_operands,
+ unsigned int stride)
+{
+ for (unsigned int i = 1; i < num_operands; ++i)
+ if (REGNO (operands[i]) != REGNO (operands[0]) + i * stride)
+ return false;
+ return true;
+}
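+
+/* Illustrative only (a sketch, not part of the patch): with the real
+   helpers gen_rtx_REG, V0_REGNUM and VNx4SImode,
+
+     rtx ops[4] = { gen_rtx_REG (VNx4SImode, V0_REGNUM + 16),
+                    gen_rtx_REG (VNx4SImode, V0_REGNUM + 20),
+                    gen_rtx_REG (VNx4SImode, V0_REGNUM + 24),
+                    gen_rtx_REG (VNx4SImode, V0_REGNUM + 28) };
+
+   describes z16, z20, z24, z28 and satisfies
+   aarch64_strided_registers_p (ops, 4, 4); breaking the stride in any
+   element makes it return false.  */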
+
/* Bounds-check lanes. Ensure OPERAND lies between LOW (inclusive) and
HIGH (exclusive). */
void
@@ -290,13 +290,9 @@ (define_c_enum "unspec" [
UNSPEC_NZCV
UNSPEC_XPACLRI
UNSPEC_LD1_SVE
- UNSPEC_LD1_SVE_COUNT
UNSPEC_ST1_SVE
- UNSPEC_ST1_SVE_COUNT
UNSPEC_LDNT1_SVE
- UNSPEC_LDNT1_SVE_COUNT
UNSPEC_STNT1_SVE
- UNSPEC_STNT1_SVE_COUNT
UNSPEC_LD1RQ
UNSPEC_LD1_GATHER
UNSPEC_LDFF1_GATHER
@@ -531,6 +527,26 @@ (define_attr "predicated" "yes,no" (const_string "no"))
;; may choose to hold the tracking state encoded in SP.
(define_attr "speculation_barrier" "true,false" (const_string "false"))
+;; This attribute is attached to multi-register instructions that have
+;; two forms: one in which the registers are consecutive and one in
+;; which they are strided. The consecutive and strided forms have
+;; different define_insns, with different operands. The mapping between
+;; the RTL of the consecutive form and the RTL of the strided form varies
+;; from one type of instruction to another.
+;;
+;; The attribute gives two pieces of information:
+;; - does the current instruction have consecutive or strided registers?
+;; - what kind of RTL rewrite is needed to move between forms?
+;;
+;; For example, all consecutive LD*1 instructions have the same basic
+;; RTL structure. The same applies to all strided LD*1 instructions.
+;; The RTL mapping therefore applies at LD1 granularity, rather than
+;; being broken down into individual types of load.
+(define_attr "stride_type"
+ "none,ld1_consecutive,ld1_strided,st1_consecutive,st1_strided,
+ luti_consecutive,luti_strided"
+ (const_string "none"))
+
;; -------------------------------------------------------------------
;; Pipeline descriptions and scheduling
;; -------------------------------------------------------------------
@@ -237,6 +237,24 @@ Enable the division approximation. Enabling this reduces
precision of division results to about 16 bits for
single precision and to 32 bits for double precision.
+Enum
+Name(early_ra_scope) Type(enum aarch64_early_ra_scope)
+
+EnumValue
+Enum(early_ra_scope) String(all) Value(AARCH64_EARLY_RA_ALL)
+
+EnumValue
+Enum(early_ra_scope) String(strided) Value(AARCH64_EARLY_RA_STRIDED)
+
+EnumValue
+Enum(early_ra_scope) String(none) Value(AARCH64_EARLY_RA_NONE)
+
+mearly-ra=
+Target RejectNegative Joined Enum(early_ra_scope) Var(aarch64_early_ra) Init(AARCH64_EARLY_RA_NONE) Save
+Specify when to enable an early register allocation pass. The possibilities
+are: all functions, functions that have access to strided multi-register
+instructions, and no functions.
+
Enum
Name(sve_vector_bits) Type(enum aarch64_sve_vector_bits_enum)
The possible SVE vector lengths:
@@ -56,6 +56,14 @@ (define_register_constraint "Uw4" "FP_REGS"
"4-tuple-aligned floating point and SIMD vector registers."
"regno % 4 == 0")
+(define_register_constraint "Uwd" "FP_REGS"
+ "@internal The first register in a tuple of 2 strided FPRs."
+ "(regno & 0x8) == 0")
+
+(define_register_constraint "Uwt" "FP_REGS"
+ "@internal The first register in a tuple of 4 strided FPRs."
+ "(regno & 0xc) == 0")
+
(define_register_constraint "Upa" "PR_REGS"
"SVE predicate registers p0 - p15.")
@@ -926,6 +926,8 @@ (define_c_enum "unspec"
UNSPEC_FMLSLT ; Used in aarch64-sve2.md.
UNSPEC_HISTCNT ; Used in aarch64-sve2.md.
UNSPEC_HISTSEG ; Used in aarch64-sve2.md.
+ UNSPEC_LD1_COUNT ; Used in aarch64-sve2.md.
+ UNSPEC_LDNT1_COUNT ; Used in aarch64-sve2.md.
UNSPEC_MATCH ; Used in aarch64-sve2.md.
UNSPEC_NMATCH ; Used in aarch64-sve2.md.
UNSPEC_PEXT ; Used in aarch64-sve2.md.
@@ -994,6 +996,8 @@ (define_c_enum "unspec"
UNSPEC_SSUBLTB ; Used in aarch64-sve2.md.
UNSPEC_SSUBWB ; Used in aarch64-sve2.md.
UNSPEC_SSUBWT ; Used in aarch64-sve2.md.
+ UNSPEC_ST1_COUNT ; Used in aarch64-sve2.md.
+ UNSPEC_STNT1_COUNT ; Used in aarch64-sve2.md.
UNSPEC_SUBHNB ; Used in aarch64-sve2.md.
UNSPEC_SUBHNT ; Used in aarch64-sve2.md.
UNSPEC_TBL2 ; Used in aarch64-sve2.md.
@@ -3163,6 +3167,10 @@ (define_int_iterator SVE_PRED_LOAD [UNSPEC_PRED_X UNSPEC_LD1_SVE])
(define_int_attr pred_load [(UNSPEC_PRED_X "_x") (UNSPEC_LD1_SVE "")])
+(define_int_iterator LD1_COUNT [UNSPEC_LD1_COUNT UNSPEC_LDNT1_COUNT])
+
+(define_int_iterator ST1_COUNT [UNSPEC_ST1_COUNT UNSPEC_STNT1_COUNT])
+
(define_int_iterator SVE2_U32_UNARY [UNSPEC_URECPE UNSPEC_RSQRTE])
(define_int_iterator SVE2_INT_UNARY_NARROWB [UNSPEC_SQXTNB
@@ -3578,6 +3586,8 @@ (define_int_attr optab [(UNSPEC_ANDF "and")
(UNSPEC_FEXPA "fexpa")
(UNSPEC_FTSMUL "ftsmul")
(UNSPEC_FTSSEL "ftssel")
+ (UNSPEC_LD1_COUNT "ld1")
+ (UNSPEC_LDNT1_COUNT "ldnt1")
(UNSPEC_PMULLB "pmullb")
(UNSPEC_PMULLB_PAIR "pmullb_pair")
(UNSPEC_PMULLT "pmullt")
@@ -3641,6 +3651,8 @@ (define_int_attr optab [(UNSPEC_ANDF "and")
(UNSPEC_SQRDCMLAH90 "sqrdcmlah90")
(UNSPEC_SQRDCMLAH180 "sqrdcmlah180")
(UNSPEC_SQRDCMLAH270 "sqrdcmlah270")
+ (UNSPEC_ST1_COUNT "st1")
+ (UNSPEC_STNT1_COUNT "stnt1")
(UNSPEC_TRN1Q "trn1q")
(UNSPEC_TRN2Q "trn2q")
(UNSPEC_UMATMUL "umatmul")
@@ -194,6 +194,12 @@ aarch64-cc-fusion.o: $(srcdir)/config/aarch64/aarch64-cc-fusion.cc \
$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
$(srcdir)/config/aarch64/aarch64-cc-fusion.cc
+aarch64-early-ra.o: $(srcdir)/config/aarch64/aarch64-early-ra.cc \
+ $(CONFIG_H) $(SYSTEM_H) $(CORETYPES_H) $(BACKEND_H) $(RTL_H) $(DF_H) \
+ $(RTL_SSA_H) tree-pass.h
+ $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
+ $(srcdir)/config/aarch64/aarch64-early-ra.cc
+
comma=,
MULTILIB_OPTIONS = $(subst $(comma),/, $(patsubst %, mabi=%, $(subst $(comma),$(comma)mabi=,$(TM_MULTILIB_CONFIG))))
MULTILIB_DIRNAMES = $(subst $(comma), ,$(TM_MULTILIB_CONFIG))
@@ -21224,6 +21224,21 @@ Enable compiler hardening against straight line speculation (SLS).
In addition, @samp{-mharden-sls=all} enables all SLS hardening while
@samp{-mharden-sls=none} disables all SLS hardening.
+@opindex mearly-ra
+@item -mearly-ra=@var{scope}
+Determine when to enable an early register allocation pass. This pass runs
+before instruction scheduling and tries to find a spill-free allocation of
+floating-point and vector code. It also tries to make use of strided
+multi-register instructions, such as SME2's strided LD1 and ST1.
+
+The possible values of @var{scope} are: @samp{all}, which runs the pass on
+all functions; @samp{strided}, which runs the pass on functions that have
+access to strided multi-register instructions; and @samp{none}, which
+disables the pass.
+
+@option{-mearly-ra=all} is the default for @option{-O2} and above, and for
+@option{-Os}. @option{-mearly-ra=none} is the default otherwise.
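+
+For example, a command line such as the following (with a hypothetical
+@file{kernel.c}) restricts the pass to functions that can use the strided
+instructions:
+
+@smallexample
+gcc -O2 -march=armv9-a+sme2 -mearly-ra=strided -c kernel.c
+@end smallexample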
+
@opindex msve-vector-bits
@item -msve-vector-bits=@var{bits}
Specify the number of bits in an SVE vector register. This option only has
@@ -125,8 +125,9 @@ CONS4_FN (2, float);
/*
** cons4_4_float:
-** ins v([0-9]+)\.s.*
-** ...
+** ins v[0-9]+\.s[^\n]+
+** ins v[0-9]+\.s[^\n]+
** zip1 v([0-9]+)\.4s, [^\n]+
** stp q\1, q\1, \[x0\]
** stp q\1, q\1, \[x0, #?32\]
** ret
new file mode 100644
@@ -0,0 +1,59 @@
+/* { dg-options "-Ofast" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#include <arm_neon.h>
+#include <stddef.h>
+#include <stdbool.h>
+
+/*
+** simple_gemm:
+** ...
+** tbnz [^\n]+
+** ld1 [^\n]+
+** fadd [^\n]+
+** fadd [^\n]+
+** st1 [^\n]+
+** ret
+*/
+void simple_gemm(
+ float* restrict out,
+ float const* restrict a,
+ float const* restrict b,
+ size_t k, bool zero_out
+) {
+ register float32x4x2_t o0;
+ o0.val[0] = vdupq_n_f32(0.0f);
+ o0.val[1] = vdupq_n_f32(0.0f);
+
+ // begin dot
+ {
+ register float32x4_t a0;
+ register float32x4x2_t b0;
+
+ while (k >= 1) {
+ b0 = vld1q_f32_x2(b);
+ a0 = vdupq_n_f32(a[0]);
+
+ o0.val[0] = vfmaq_f32(o0.val[0], a0, b0.val[0]);
+ o0.val[1] = vfmaq_f32(o0.val[1], a0, b0.val[1]);
+
+ b += 8;
+ a += 1;
+ k -= 1;
+ }
+ } // end dot
+
+ // begin writeback
+ {
+ if (!zero_out) {
+ register float32x4x2_t t0;
+ t0 = vld1q_f32_x2(out);
+
+ o0.val[0] = vaddq_f32(o0.val[0], t0.val[0]);
+ o0.val[1] = vaddq_f32(o0.val[1], t0.val[1]);
+ }
+
+ // TODO: both clang and gcc generate a redundant mov because of bad register allocation.
+ vst1q_f32_x2(out, o0);
+ } // end writeback
+}
new file mode 100644
@@ -0,0 +1,14 @@
+/* { dg-options "-Ofast" } */
+
+#include <arm_neon.h>
+
+int16x8x3_t bsl(const uint16x8x3_t *check, const int16x8x3_t *in1,
+ const int16x8x3_t *in2) {
+ int16x8x3_t out;
+ for (uint32_t j = 0; j < 3; j++) {
+ out.val[j] = vbslq_s16(check->val[j], in1->val[j], in2->val[j]);
+ }
+ return out;
+}
+
+/* { dg-final { scan-assembler-not {\tmov\t} } } */
new file mode 100644
@@ -0,0 +1,253 @@
+// { dg-options "-O2 -fno-schedule-insns -fno-schedule-insns2" }
+// { dg-final { check-function-bodies "**" "" } }
+
+#include <arm_sme.h>
+
+#pragma GCC target "+sme2"
+
+// This file deliberately contains nonsense code.
+
+/*
+** test1:
+** ptrue [^\n]+
+** ld1w [^\n]+
+** ld1w [^\n]+
+** ld1w [^\n]+
+** ld1w [^\n]+
+** st1w [^\n]+
+** st1w [^\n]+
+** st1w [^\n]+
+** st1w [^\n]+
+** ret
+*/
+void test1(int32_t *dest, int32_t *src) __arm_streaming
+{
+ svcount_t pg = svptrue_c32();
+ svint32x4_t l0 = svld1_vnum_x4(pg, src, 0);
+ svint32x4_t l1 = svld1_vnum_x4(pg, src, 4);
+ svint32x4_t l2 = svld1_vnum_x4(pg, src, 8);
+ svint32x4_t l3 = svld1_vnum_x4(pg, src, 12);
+ svst1_vnum(pg, dest, 0,
+ svcreate4(svget4(l0, 0), svget4(l1, 0),
+ svget4(l2, 0), svget4(l3, 0)));
+ svst1_vnum(pg, dest, 4,
+ svcreate4(svget4(l0, 1), svget4(l1, 1),
+ svget4(l2, 1), svget4(l3, 1)));
+ svst1_vnum(pg, dest, 8,
+ svcreate4(svget4(l0, 2), svget4(l1, 2),
+ svget4(l2, 2), svget4(l3, 2)));
+ svst1_vnum(pg, dest, 12,
+ svcreate4(svget4(l0, 3), svget4(l1, 3),
+ svget4(l2, 3), svget4(l3, 3)));
+}
+
+/*
+** test2:
+** ptrue [^\n]+
+** ld1w [^\n]+
+** ld1w [^\n]+
+** ld1w [^\n]+
+** ld1w [^\n]+
+** st1w [^\n]+
+** st1w [^\n]+
+** st1w [^\n]+
+** st1w [^\n]+
+** st1w [^\n]+
+** st1w [^\n]+
+** st1w [^\n]+
+** st1w [^\n]+
+** ret
+*/
+void test2(int32_t *dest, int32_t *src) __arm_streaming
+{
+ svcount_t pg = svptrue_c32();
+ svint32x4_t l0 = svld1_vnum_x4(pg, src, 0);
+ svint32x4_t l1 = svld1_vnum_x4(pg, src, 4);
+ svint32x4_t l2 = svld1_vnum_x4(pg, src, 8);
+ svint32x4_t l3 = svld1_vnum_x4(pg, src, 12);
+ svst1_vnum(pg, dest, 0,
+ svcreate4(svget4(l0, 0), svget4(l1, 0),
+ svget4(l2, 0), svget4(l3, 0)));
+ svst1_vnum(pg, dest, 4,
+ svcreate4(svget4(l0, 1), svget4(l1, 1),
+ svget4(l2, 1), svget4(l3, 1)));
+ svst1_vnum(pg, dest, 8,
+ svcreate4(svget4(l0, 2), svget4(l1, 2),
+ svget4(l2, 2), svget4(l3, 2)));
+ svst1_vnum(pg, dest, 12,
+ svcreate4(svget4(l0, 3), svget4(l1, 3),
+ svget4(l2, 3), svget4(l3, 3)));
+ svst1_vnum(pg, dest, 16,
+ svcreate4(svget4(l0, 0), svget4(l1, 0),
+ svget4(l2, 0), svget4(l3, 0)));
+ svst1_vnum(pg, dest, 20,
+ svcreate4(svget4(l0, 1), svget4(l1, 1),
+ svget4(l2, 1), svget4(l3, 1)));
+ svst1_vnum(pg, dest, 24,
+ svcreate4(svget4(l0, 2), svget4(l1, 2),
+ svget4(l2, 2), svget4(l3, 2)));
+ svst1_vnum(pg, dest, 28,
+ svcreate4(svget4(l0, 3), svget4(l1, 3),
+ svget4(l2, 3), svget4(l3, 3)));
+}
+
+/*
+** test3:
+** ptrue ([^\n]+)\.s
+** ld1w {z16\.s - z19\.s}, \1/z, \[x1\]
+** ld1w {z20\.s - z23\.s}, \1/z, \[x1, #4, mul vl\]
+** ld1w {z24\.s - z27\.s}, \1/z, \[x1, #8, mul vl\]
+** ld1w {z28\.s - z31\.s}, \1/z, \[x1, #12, mul vl\]
+** sclamp [^\n]+
+** st1w {z19\.s, z23\.s, z27\.s, z31\.s}, \1, \[x0, #12, mul vl\]
+** st1w {z16\.s, z20\.s, z24\.s, z28\.s}, \1, \[x0\]
+** st1w {z17\.s, z21\.s, z25\.s, z29\.s}, \1, \[x0, #4, mul vl\]
+** st1w {z18\.s, z22\.s, z26\.s, z30\.s}, \1, \[x0, #8, mul vl\]
+** ret
+*/
+void test3(int32_t *dest, int32_t *src) __arm_streaming
+{
+ svcount_t pg = svptrue_c32();
+ svint32x4_t al0 = svld1_vnum_x4(pg, src, 0);
+ svint32x4_t l1 = svld1_vnum_x4(pg, src, 4);
+ svint32x4_t l2 = svld1_vnum_x4(pg, src, 8);
+ svint32x4_t l3 = svld1_vnum_x4(pg, src, 12);
+ svint32x4_t l0 = svclamp(al0, svget4(l3, 0), svget4(l3, 1));
+ svst1_vnum(pg, dest, 12,
+ svcreate4(svget4(l0, 3), svget4(l1, 3),
+ svget4(l2, 3), svget4(l3, 3)));
+ svst1_vnum(pg, dest, 0,
+ svcreate4(svget4(l0, 0), svget4(l1, 0),
+ svget4(l2, 0), svget4(l3, 0)));
+ svst1_vnum(pg, dest, 4,
+ svcreate4(svget4(l0, 1), svget4(l1, 1),
+ svget4(l2, 1), svget4(l3, 1)));
+ svst1_vnum(pg, dest, 8,
+ svcreate4(svget4(l0, 2), svget4(l1, 2),
+ svget4(l2, 2), svget4(l3, 2)));
+}
+
+/*
+** test4:
+** ptrue ([^\n]+)\.s
+** ld1w {z16\.s - z19\.s}, \1/z, \[x1\]
+** ld1w {z20\.s - z23\.s}, \1/z, \[x1, #4, mul vl\]
+** ld1w {z24\.s - z27\.s}, \1/z, \[x1, #8, mul vl\]
+** ld1w {z28\.s - z31\.s}, \1/z, \[x1, #12, mul vl\]
+** sclamp [^\n]+
+** st1w {z16\.s, z20\.s, z24\.s, z28\.s}, \1, \[x0\]
+** st1w {z17\.s, z21\.s, z25\.s, z29\.s}, \1, \[x0, #4, mul vl\]
+** st1w {z18\.s, z22\.s, z26\.s, z30\.s}, \1, \[x0, #8, mul vl\]
+** st1w {z19\.s, z23\.s, z27\.s, z31\.s}, \1, \[x0, #12, mul vl\]
+** st1w {z16\.s, z20\.s, z24\.s, z28\.s}, \1, \[x0, #16, mul vl\]
+** st1w {z17\.s, z21\.s, z25\.s, z29\.s}, \1, \[x0, #20, mul vl\]
+** st1w {z18\.s, z22\.s, z26\.s, z30\.s}, \1, \[x0, #24, mul vl\]
+** st1w {z19\.s, z23\.s, z27\.s, z31\.s}, \1, \[x0, #28, mul vl\]
+** ...
+** ret
+*/
+void test4(int32_t *dest, int32_t *src) __arm_streaming
+{
+ svcount_t pg = svptrue_c32();
+ svint32x4_t l0 = svld1_vnum_x4(pg, src, 0);
+ svint32x4_t l1 = svld1_vnum_x4(pg, src, 4);
+ svint32x4_t l2 = svld1_vnum_x4(pg, src, 8);
+ svint32x4_t l3 = svld1_vnum_x4(pg, src, 12);
+ l0 = svclamp(l0, svget4(l3, 0), svget4(l3, 1));
+ svst1_vnum(pg, dest, 0,
+ svcreate4(svget4(l0, 0), svget4(l1, 0),
+ svget4(l2, 0), svget4(l3, 0)));
+ svst1_vnum(pg, dest, 4,
+ svcreate4(svget4(l0, 1), svget4(l1, 1),
+ svget4(l2, 1), svget4(l3, 1)));
+ svst1_vnum(pg, dest, 8,
+ svcreate4(svget4(l0, 2), svget4(l1, 2),
+ svget4(l2, 2), svget4(l3, 2)));
+ svst1_vnum(pg, dest, 12,
+ svcreate4(svget4(l0, 3), svget4(l1, 3),
+ svget4(l2, 3), svget4(l3, 3)));
+ svst1_vnum(pg, dest, 16,
+ svcreate4(svget4(l0, 0), svget4(l1, 0),
+ svget4(l2, 0), svget4(l3, 0)));
+ svst1_vnum(pg, dest, 20,
+ svcreate4(svget4(l0, 1), svget4(l1, 1),
+ svget4(l2, 1), svget4(l3, 1)));
+ svst1_vnum(pg, dest, 24,
+ svcreate4(svget4(l0, 2), svget4(l1, 2),
+ svget4(l2, 2), svget4(l3, 2)));
+ svst1_vnum(pg, dest, 28,
+ svcreate4(svget4(l0, 3), svget4(l1, 3),
+ svget4(l2, 3), svget4(l3, 3)));
+}
+
+/*
+** test5:
+** ptrue [^\n]+
+** ld1b [^\n]+
+** ld1b [^\n]+
+** ptrue ([^\n]+)\.s
+** ld1w [^\n]+, \1/z, \[x0\]
+** luti4 {z16\.s, z20\.s, z24\.s, z28\.s}, zt0, z[0-9]+\[0\]
+** luti4 {z17\.s, z21\.s, z25\.s, z29\.s}, zt0, z[0-9]+\[1\]
+** luti4 {z18\.s, z22\.s, z26\.s, z30\.s}, zt0, z[0-9]+\[0\]
+** luti4 {z19\.s, z23\.s, z27\.s, z31\.s}, zt0, z[0-9]+\[1\]
+** uclamp {z16\.s - z19\.s}, z[0-9]+\.s, z[0-9]+\.s
+** uclamp {z20\.s - z23\.s}, z[0-9]+\.s, z[0-9]+\.s
+** uclamp {z24\.s - z27\.s}, z[0-9]+\.s, z[0-9]+\.s
+** uclamp {z28\.s - z31\.s}, z[0-9]+\.s, z[0-9]+\.s
+** st1w {z16\.s - z19\.s}, \1, \[x0\]
+** st1w {z20\.s - z23\.s}, \1, \[x0, #4, mul vl\]
+** st1w {z24\.s - z27\.s}, \1, \[x0, #8, mul vl\]
+** st1w {z28\.s - z31\.s}, \1, \[x0, #12, mul vl\]
+** ret
+*/
+void test5(uint32_t *dest, uint8_t *indices)
+ __arm_streaming __arm_preserves("za") __arm_inout("zt0")
+{
+ svuint8_t indices1 = svld1_vnum(svptrue_b8(), indices, 0);
+ svuint8_t indices2 = svld1_vnum(svptrue_b8(), indices, 2);
+
+ svcount_t pg = svptrue_c32();
+ svuint32x4_t bounds = svld1_x4(pg, dest);
+
+ svuint32x4_t x0 = svluti4_lane_zt_u32_x4(0, indices1, 0);
+ svuint32x4_t x1 = svluti4_lane_zt_u32_x4(0, indices1, 1);
+ svuint32x4_t x2 = svluti4_lane_zt_u32_x4(0, indices2, 0);
+ svuint32x4_t x3 = svluti4_lane_zt_u32_x4(0, indices2, 1);
+
+ svuint32x4_t y0 = svcreate4(svget4(x0, 0), svget4(x1, 0),
+ svget4(x2, 0), svget4(x3, 0));
+ svuint32x4_t y1 = svcreate4(svget4(x0, 1), svget4(x1, 1),
+ svget4(x2, 1), svget4(x3, 1));
+ svuint32x4_t y2 = svcreate4(svget4(x0, 2), svget4(x1, 2),
+ svget4(x2, 2), svget4(x3, 2));
+ svuint32x4_t y3 = svcreate4(svget4(x0, 3), svget4(x1, 3),
+ svget4(x2, 3), svget4(x3, 3));
+
+ y0 = svclamp(y0, svget4(bounds, 0), svget4(bounds, 1));
+ y1 = svclamp(y1, svget4(bounds, 2), svget4(bounds, 3));
+ y2 = svclamp(y2, svget4(bounds, 0), svget4(bounds, 1));
+ y3 = svclamp(y3, svget4(bounds, 2), svget4(bounds, 3));
+
+ svst1_vnum(pg, dest, 0, y0);
+ svst1_vnum(pg, dest, 4, y1);
+ svst1_vnum(pg, dest, 8, y2);
+ svst1_vnum(pg, dest, 12, y3);
+}
+
+/*
+** test6:
+** ptrue [^\n]+
+** ld1h [^\n]+
+** sclamp [^\n]+
+** st1h [^\n]+
+** ret
+*/
+void test6(int16_t *ptr)
+ __arm_streaming __arm_preserves("za") __arm_inout("zt0")
+{
+ svcount_t pg = svptrue_c16();
+ svint16x4_t x0 = svld1_x4(pg, ptr);
+ x0 = svclamp(x0, svget4(x0, 0), svget4(x0, 3));
+ svst1(pg, ptr, x0);
+}
new file mode 100644
@@ -0,0 +1,28 @@
+/* { dg-options "-Ofast" } */
+
+#include "arm_sve.h"
+
+void coalesce (svbool_t pg, int64_t *base, int n, int64_t *in1, int64_t *in2, int64_t *out)
+{
+ svint64x4_t result = svld4_s64 (pg, base);
+ svint64_t v0 = svget4_s64(result, 0);
+ svint64_t v1 = svget4_s64(result, 1);
+ svint64_t v2 = svget4_s64(result, 2);
+ svint64_t v3 = svget4_s64(result, 3);
+
+ for (int i = 0; i < n; i += 1)
+ {
+ svint64_t v18 = svld1_s64(pg, in1);
+ svint64_t v19 = svld1_s64(pg, in2);
+ v0 = svmad_s64_z(pg, v0, v18, v19);
+ v1 = svmad_s64_z(pg, v1, v18, v19);
+ v2 = svmad_s64_z(pg, v2, v18, v19);
+ v3 = svmad_s64_z(pg, v3, v18, v19);
+ }
+ svst1_s64(pg, out + 0, v0);
+ svst1_s64(pg, out + 1, v1);
+ svst1_s64(pg, out + 2, v2);
+ svst1_s64(pg, out + 3, v3);
+}
+
+/* { dg-final { scan-assembler-not {\tmov\tz} } } */
@@ -71,9 +71,9 @@ DO_IMMEDIATE_OPS (0, int64_t, 0);
DO_IMMEDIATE_OPS (5, int64_t, 5);
DO_IMMEDIATE_OPS (63, int64_t, 63);
-/* { dg-final { scan-assembler-times {\tasr\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, z[0-9]+\.s\n} 2 } } */
-/* { dg-final { scan-assembler-times {\tlsr\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, z[0-9]+\.s\n} 2 } } */
-/* { dg-final { scan-assembler-times {\tlsl\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, z[0-9]+\.s\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tasrr?\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, z[0-9]+\.s\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tlsrr?\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, z[0-9]+\.s\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tlslr?\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, z[0-9]+\.s\n} 2 } } */
/* { dg-final { scan-assembler-times {\tasrr?\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, z[0-9]+\.d\n} 2 } } */
/* { dg-final { scan-assembler-times {\tlsrr?\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, z[0-9]+\.d\n} 2 } } */