Message ID: 20240620133418.350772-7-richard.sandiford@arm.com
State: New
Series: Add a late-combine pass
On Thu, 2024-06-20 at 14:34 +0100, Richard Sandiford wrote:
>
> I tried compiling at least one target per CPU directory and comparing
> the assembly output for parts of the GCC testsuite.  This is just a way
> of getting a flavour of how the pass performs; it obviously isn't a
> meaningful benchmark.  All targets seemed to improve on average:
>
> [full per-target results table snipped; it is quoted in full in the
> next message]

Since you have already briefly compared some of the code, can you share
those cases which get worse and might require some potential follow-up
patches?

Best regards,
Oleg Endo
On Thu, Jun 20, 2024 at 3:37 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> This patch adds a combine pass that runs late in the pipeline.
> There are two instances: one between combine and split1, and one
> after postreload.
>
> The pass currently has a single objective: remove definitions by
> substituting into all uses.  The pre-RA version tries to restrict
> itself to cases that are likely to have a neutral or beneficial
> effect on register pressure.
>
> The patch fixes PR106594.  It also fixes a few FAILs and XFAILs
> in the aarch64 test results, mostly due to making proper use of
> MOVPRFX in cases where we didn't previously.
>
> This is just a first step.  I'm hoping that the pass could be
> used for other combine-related optimisations in future.  In particular,
> the post-RA version doesn't need to restrict itself to cases where all
> uses are substitutable, since it doesn't have to worry about register
> pressure.  If we did that, and if we extended it to handle multi-register
> REGs, the pass might be a viable replacement for regcprop, which in
> turn might reduce the cost of having a post-RA instance of the new pass.
>
> On most targets, the pass is enabled by default at -O2 and above.
> However, it has a tendency to undo x86's STV and RPAD passes,
> by folding the more complex post-STV/RPAD form back into the
> simpler pre-pass form.
>
> Also, running a pass after register allocation means that we can
> now match define_insn_and_splits that were previously only matched
> before register allocation.  This trips things like:
>
>   (define_insn_and_split "..."
>     [...pattern...]
>     "...cond..."
>     "#"
>     "&& 1"
>     [...pattern...]
>     {
>       ...unconditional use of gen_reg_rtx ()...;
>     }
>
> because matching and splitting after RA will call gen_reg_rtx when
> pseudos are no longer allowed.  rs6000 has several instances of this.
>
> xtensa has a variation in which the split condition is:
>
>   "&& can_create_pseudo_p ()"
>
> The failure then is that, if we match after RA, we'll never be
> able to split the instruction.
>
> The patch therefore disables the pass by default on i386, rs6000
> and xtensa.  Hopefully we can fix those ports later (if their
> maintainers want).  It seems easier to add the pass first, though,
> to make it easier to test any such fixes.
>
> gcc.target/aarch64/bitfield-bitint-abi-align{16,8}.c would need
> quite a few updates for the late-combine output.  That might be
> worth doing, but it seems too complex to do as part of this patch.
>
> I tried compiling at least one target per CPU directory and comparing
> the assembly output for parts of the GCC testsuite.  This is just a way
> of getting a flavour of how the pass performs; it obviously isn't a
> meaningful benchmark.  All targets seemed to improve on average:
>
>   Target                 Tests   Good    Bad   %Good    Delta  Median
>   ======                 =====   ====    ===   =====    =====  ======
>   aarch64-linux-gnu       2215   1975    240  89.16%    -4159      -1
>   aarch64_be-linux-gnu    1569   1483     86  94.52%   -10117      -1
>   alpha-linux-gnu         1454   1370     84  94.22%    -9502      -1
>   amdgcn-amdhsa           5122   4671    451  91.19%   -35737      -1
>   arc-elf                 2166   1932    234  89.20%   -37742      -1
>   arm-linux-gnueabi       1953   1661    292  85.05%   -12415      -1
>   arm-linux-gnueabihf     1834   1549    285  84.46%   -11137      -1
>   avr-elf                 4789   4330    459  90.42%  -441276      -4
>   bfin-elf                2795   2394    401  85.65%   -19252      -1
>   bpf-elf                 3122   2928    194  93.79%    -8785      -1
>   c6x-elf                 2227   1929    298  86.62%   -17339      -1
>   cris-elf                3464   3270    194  94.40%   -23263      -2
>   csky-elf                2915   2591    324  88.89%   -22146      -1
>   epiphany-elf            2399   2304     95  96.04%   -28698      -2
>   fr30-elf                7712   7299    413  94.64%   -99830      -2
>   frv-linux-gnu           3332   2877    455  86.34%   -25108      -1
>   ft32-elf                2775   2667    108  96.11%   -25029      -1
>   h8300-elf               3176   2862    314  90.11%   -29305      -2
>   hppa64-hp-hpux11.23     4287   4247     40  99.07%   -45963      -2
>   ia64-linux-gnu          2343   1946    397  83.06%    -9907      -2
>   iq2000-elf              9684   9637     47  99.51%  -126557      -2
>   lm32-elf                2681   2608     73  97.28%   -59884      -3
>   loongarch64-linux-gnu   1303   1218     85  93.48%   -13375      -2
>   m32r-elf                1626   1517    109  93.30%    -9323      -2
>   m68k-linux-gnu          3022   2620    402  86.70%   -21531      -1
>   mcore-elf               2315   2085    230  90.06%   -24160      -1
>   microblaze-elf          2782   2585    197  92.92%   -16530      -1
>   mipsel-linux-gnu        1958   1827    131  93.31%   -15462      -1
>   mipsisa64-linux-gnu     1655   1488    167  89.91%   -16592      -2
>   mmix                    4914   4814    100  97.96%   -63021      -1
>   mn10300-elf             3639   3320    319  91.23%   -34752      -2
>   moxie-rtems             3497   3252    245  92.99%   -87305      -3
>   msp430-elf              4353   3876    477  89.04%   -23780      -1
>   nds32le-elf             3042   2780    262  91.39%   -27320      -1
>   nios2-linux-gnu         1683   1355    328  80.51%    -8065      -1
>   nvptx-none              2114   1781    333  84.25%   -12589      -2
>   or1k-elf                3045   2699    346  88.64%   -14328      -2
>   pdp11                   4515   4146    369  91.83%   -26047      -2
>   pru-elf                 1585   1245    340  78.55%    -5225      -1
>   riscv32-elf             2122   2000    122  94.25%  -101162      -2
>   riscv64-elf             1841   1726    115  93.75%   -49997      -2
>   rl78-elf                2823   2530    293  89.62%   -40742      -4
>   rx-elf                  2614   2480    134  94.87%   -18863      -1
>   s390-linux-gnu          1591   1393    198  87.55%   -16696      -1
>   s390x-linux-gnu         2015   1879    136  93.25%   -21134      -1
>   sh-linux-gnu            1870   1507    363  80.59%    -9491      -1
>   sparc-linux-gnu         1123   1075     48  95.73%   -14503      -1
>   sparc-wrs-vxworks       1121   1073     48  95.72%   -14578      -1
>   sparc64-linux-gnu       1096   1021     75  93.16%   -15003      -1
>   v850-elf                1897   1728    169  91.09%   -11078      -1
>   vax-netbsdelf           3035   2995     40  98.68%   -27642      -1
>   visium-elf              1392   1106    286  79.45%    -7984      -2
>   xstormy16-elf           2577   2071    506  80.36%   -13061      -1

I wonder if you can amend doc/passes.texi, specifically noting differences
between fwprop, combine and late-combine?

> gcc/
>         PR rtl-optimization/106594
>         * Makefile.in (OBJS): Add late-combine.o.
>         * common.opt (flate-combine-instructions): New option.
>         * doc/invoke.texi: Document it.
>         * opts.cc (default_options_table): Enable it by default at -O2
>         and above.
>         * tree-pass.h (make_pass_late_combine): Declare.
>         * late-combine.cc: New file.
>         * passes.def: Add two instances of late_combine.
>         * config/i386/i386-options.cc (ix86_override_options_after_change):
>         Disable late-combine by default.
>         * config/rs6000/rs6000.cc (rs6000_option_override_internal): Likewise.
>         * config/xtensa/xtensa.cc (xtensa_option_override): Likewise.
>
> gcc/testsuite/
>         PR rtl-optimization/106594
>         * gcc.dg/ira-shrinkwrap-prep-1.c: Restrict XFAIL to non-aarch64
>         targets.
>         * gcc.dg/ira-shrinkwrap-prep-2.c: Likewise.
>         * gcc.dg/stack-check-4.c: Add -fno-shrink-wrap.
>         * gcc.target/aarch64/bitfield-bitint-abi-align16.c: Add
>         -fno-late-combine-instructions.
>         * gcc.target/aarch64/bitfield-bitint-abi-align8.c: Likewise.
>         * gcc.target/aarch64/sve/cond_asrd_3.c: Remove XFAILs.
>         * gcc.target/aarch64/sve/cond_convert_3.c: Likewise.
>         * gcc.target/aarch64/sve/cond_fabd_5.c: Likewise.
>         * gcc.target/aarch64/sve/cond_convert_6.c: Expect the MOVPRFX /Zs
>         described in the comment.
>         * gcc.target/aarch64/sve/cond_unary_4.c: Likewise.
>         * gcc.target/aarch64/pr106594_1.c: New test.
> ---
>  gcc/Makefile.in                               |   1 +
>  gcc/common.opt                                |   5 +
>  gcc/config/i386/i386-options.cc               |   4 +
>  gcc/config/rs6000/rs6000.cc                   |   8 +
>  gcc/config/xtensa/xtensa.cc                   |  11 +
>  gcc/doc/invoke.texi                           |  11 +-
>  gcc/late-combine.cc                           | 747 ++++++++++++++++++
>  gcc/opts.cc                                   |   1 +
>  gcc/passes.def                                |   2 +
>  gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-1.c  |   2 +-
>  gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-2.c  |   2 +-
>  gcc/testsuite/gcc.dg/stack-check-4.c          |   2 +-
>  .../aarch64/bitfield-bitint-abi-align16.c     |   2 +-
>  .../aarch64/bitfield-bitint-abi-align8.c      |   2 +-
>  gcc/testsuite/gcc.target/aarch64/pr106594_1.c |  20 +
>  .../gcc.target/aarch64/sve/cond_asrd_3.c      |  10 +-
>  .../gcc.target/aarch64/sve/cond_convert_3.c   |   8 +-
>  .../gcc.target/aarch64/sve/cond_convert_6.c   |   8 +-
>  .../gcc.target/aarch64/sve/cond_fabd_5.c      |  11 +-
>  .../gcc.target/aarch64/sve/cond_unary_4.c     |  13 +-
>  gcc/tree-pass.h                               |   1 +
>  21 files changed, 834 insertions(+), 37 deletions(-)
>  create mode 100644 gcc/late-combine.cc
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/pr106594_1.c
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index f5adb647d3f..5e29ddb5690 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1574,6 +1574,7 @@ OBJS = \
>         ira-lives.o \
>         jump.o \
>         langhooks.o \
> +       late-combine.o \
>         lcm.o \
>         lists.o \
>         loop-doloop.o \
> diff --git a/gcc/common.opt b/gcc/common.opt
> index f2bc47fdc5e..327230967ea 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -1796,6 +1796,11 @@ Common Var(flag_large_source_files) Init(0)
>  Improve GCC's ability to track column numbers in large source files,
>  at the expense of slower compilation.
>
> +flate-combine-instructions
> +Common Var(flag_late_combine_instructions) Optimization Init(0)
> +Run two instruction combination passes late in the pass pipeline;
> +one before register allocation and one after.
> +
>  floop-parallelize-all
>  Common Var(flag_loop_parallelize_all) Optimization
>  Mark all loops as parallel.
> diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
> index f2cecc0e254..4620bf8e9e6 100644
> --- a/gcc/config/i386/i386-options.cc
> +++ b/gcc/config/i386/i386-options.cc
> @@ -1942,6 +1942,10 @@ ix86_override_options_after_change (void)
>        flag_cunroll_grow_size = flag_peel_loops || optimize >= 3;
>      }
>
> +  /* Late combine tends to undo some of the effects of STV and RPAD,
> +     by combining instructions back to their original form.  */
> +  if (!OPTION_SET_P (flag_late_combine_instructions))
> +    flag_late_combine_instructions = 0;
>  }
>
>  /* Clear stack slot assignments remembered from previous functions.
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index e4dc629ddcc..f39b8909925 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -4768,6 +4768,14 @@ rs6000_option_override_internal (bool global_init_p)
>         targetm.expand_builtin_va_start = NULL;
>      }
>
> +  /* One of the late-combine passes runs after register allocation
> +     and can match define_insn_and_splits that were previously used
> +     only before register allocation.  Some of those define_insn_and_splits
> +     use gen_reg_rtx unconditionally.  Disable late-combine by default
> +     until the define_insn_and_splits are fixed.  */
> +  if (!OPTION_SET_P (flag_late_combine_instructions))
> +    flag_late_combine_instructions = 0;
> +
>    rs6000_override_options_after_change ();
>
>    /* If not explicitly specified via option, decide whether to generate indexed
> diff --git a/gcc/config/xtensa/xtensa.cc b/gcc/config/xtensa/xtensa.cc
> index 45dc1be3ff5..308dc62e0f8 100644
> --- a/gcc/config/xtensa/xtensa.cc
> +++ b/gcc/config/xtensa/xtensa.cc
> @@ -59,6 +59,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-pass.h"
>  #include "print-rtl.h"
>  #include <math.h>
> +#include "opts.h"
>
>  /* This file should be included last.  */
>  #include "target-def.h"
> @@ -2916,6 +2917,16 @@ xtensa_option_override (void)
>        flag_reorder_blocks_and_partition = 0;
>        flag_reorder_blocks = 1;
>      }
> +
> +  /* One of the late-combine passes runs after register allocation
> +     and can match define_insn_and_splits that were previously used
> +     only before register allocation.  Some of those define_insn_and_splits
> +     require the split to take place, but have a split condition of
> +     can_create_pseudo_p, and so matching after RA will give an
> +     unsplittable instruction.  Disable late-combine by default until
> +     the define_insn_and_splits are fixed.  */
> +  if (!OPTION_SET_P (flag_late_combine_instructions))
> +    flag_late_combine_instructions = 0;
>  }
>
>  /* Implement TARGET_HARD_REGNO_NREGS.  */
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 5d7a87fde86..3b8c427d509 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -575,7 +575,7 @@ Objective-C and Objective-C++ Dialects}.
>  -fipa-bit-cp -fipa-vrp -fipa-pta -fipa-profile -fipa-pure-const
>  -fipa-reference -fipa-reference-addressable
>  -fipa-stack-alignment -fipa-icf -fira-algorithm=@var{algorithm}
> --flive-patching=@var{level}
> +-flate-combine-instructions -flive-patching=@var{level}
>  -fira-region=@var{region} -fira-hoist-pressure
>  -fira-loop-pressure -fno-ira-share-save-slots
>  -fno-ira-share-spill-slots
> @@ -13675,6 +13675,15 @@ equivalences that are found only by GCC and equivalences found only by Gold.
>
>  This flag is enabled by default at @option{-O2} and @option{-Os}.
>
> +@opindex flate-combine-instructions
> +@item -flate-combine-instructions
> +Enable two instruction combination passes that run relatively late in the
> +compilation process.  One of the passes runs before register allocation and
> +the other after register allocation.  The main aim of the passes is to
> +substitute definitions into all uses.
> +
> +Most targets enable this flag by default at @option{-O2} and @option{-Os}.
> +
>  @opindex flive-patching
>  @item -flive-patching=@var{level}
>  Control GCC's optimizations to produce output suitable for live-patching.
> diff --git a/gcc/late-combine.cc b/gcc/late-combine.cc
> new file mode 100644
> index 00000000000..22a1d81d38e
> --- /dev/null
> +++ b/gcc/late-combine.cc
> @@ -0,0 +1,747 @@
> +// Late-stage instruction combination pass.
> +// Copyright (C) 2023-2024 Free Software Foundation, Inc.
> +//
> +// This file is part of GCC.
> +//
> +// GCC is free software; you can redistribute it and/or modify it under
> +// the terms of the GNU General Public License as published by the Free
> +// Software Foundation; either version 3, or (at your option) any later
> +// version.
> +//
> +// GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +// WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +// FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +// for more details.
> +//
> +// You should have received a copy of the GNU General Public License
> +// along with GCC; see the file COPYING3.  If not see
> +// <http://www.gnu.org/licenses/>.
> +
> +// The current purpose of this pass is to substitute definitions into
> +// all uses, so that the definition can be removed.  However, it could
> +// be extended to handle other combination-related optimizations in future.
> +//
> +// The pass can run before or after register allocation.  When running
> +// before register allocation, it tries to avoid cases that are likely
> +// to increase register pressure.  For the same reason, it avoids moving
> +// instructions around, even if doing so would allow an optimization to
> +// succeed.  These limitations are removed when running after register
> +// allocation.
> +
> +#define INCLUDE_ALGORITHM
> +#define INCLUDE_FUNCTIONAL
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "backend.h"
> +#include "rtl.h"
> +#include "df.h"
> +#include "rtl-ssa.h"
> +#include "print-rtl.h"
> +#include "tree-pass.h"
> +#include "cfgcleanup.h"
> +#include "target.h"
> +
> +using namespace rtl_ssa;
> +
> +namespace {
> +const pass_data pass_data_late_combine =
> +{
> +  RTL_PASS, // type
> +  "late_combine", // name
> +  OPTGROUP_NONE, // optinfo_flags
> +  TV_NONE, // tv_id
> +  0, // properties_required
> +  0, // properties_provided
> +  0, // properties_destroyed
> +  0, // todo_flags_start
> +  TODO_df_finish, // todo_flags_finish
> +};
> +
> +// Represents an attempt to substitute a single-set definition into all
> +// uses of the definition.
> +class insn_combination
> +{
> +public:
> +  insn_combination (set_info *, rtx, rtx);
> +  bool run ();
> +  array_slice<insn_change *const> use_changes () const;
> +
> +private:
> +  use_array get_new_uses (use_info *);
> +  bool substitute_nondebug_use (use_info *);
> +  bool substitute_nondebug_uses (set_info *);
> +  bool try_to_preserve_debug_info (insn_change &, use_info *);
> +  void substitute_debug_use (use_info *);
> +  bool substitute_note (insn_info *, rtx, bool);
> +  void substitute_notes (insn_info *, bool);
> +  void substitute_note_uses (use_info *);
> +  void substitute_optional_uses (set_info *);
> +
> +  // Represents the state of the function's RTL at the start of this
> +  // combination attempt.
> +  insn_change_watermark m_rtl_watermark;
> +
> +  // Represents the rtl-ssa state at the start of this combination attempt.
> +  obstack_watermark m_attempt;
> +
> +  // The instruction that contains the definition, and that we're trying
> +  // to delete.
> +  insn_info *m_def_insn;
> +
> +  // The definition itself.
> +  set_info *m_def;
> +
> +  // The destination and source of the single set that defines m_def.
> +  // The destination is known to be a plain REG.
> +  rtx m_dest;
> +  rtx m_src;
> +
> +  // Contains the full list of changes that we want to make, in reverse
> +  // postorder.
> +  auto_vec<insn_change *> m_nondebug_changes;
> +};
> +
> +// Class that represents one run of the pass.
> +class late_combine
> +{
> +public:
> +  unsigned int execute (function *);
> +
> +private:
> +  rtx optimizable_set (insn_info *);
> +  bool check_register_pressure (insn_info *, rtx);
> +  bool check_uses (set_info *, rtx);
> +  bool combine_into_uses (insn_info *, insn_info *);
> +
> +  auto_vec<insn_info *> m_worklist;
> +};
> +
> +insn_combination::insn_combination (set_info *def, rtx dest, rtx src)
> +  : m_rtl_watermark (),
> +    m_attempt (crtl->ssa->new_change_attempt ()),
> +    m_def_insn (def->insn ()),
> +    m_def (def),
> +    m_dest (dest),
> +    m_src (src),
> +    m_nondebug_changes ()
> +{
> +}
> +
> +array_slice<insn_change *const>
> +insn_combination::use_changes () const
> +{
> +  return { m_nondebug_changes.address () + 1,
> +           m_nondebug_changes.length () - 1 };
> +}
> +
> +// USE is a direct or indirect use of m_def.  Return the list of uses
> +// that would be needed after substituting m_def into the instruction.
> +// The returned list is marked as invalid if USE's insn and m_def_insn
> +// use different definitions for the same resource (register or memory).
> +use_array
> +insn_combination::get_new_uses (use_info *use)
> +{
> +  auto *def = use->def ();
> +  auto *use_insn = use->insn ();
> +
> +  use_array new_uses = use_insn->uses ();
> +  new_uses = remove_uses_of_def (m_attempt, new_uses, def);
> +  new_uses = merge_access_arrays (m_attempt, m_def_insn->uses (), new_uses);
> +  if (new_uses.is_valid () && use->ebb () != m_def->ebb ())
> +    new_uses = crtl->ssa->make_uses_available (m_attempt, new_uses, use->bb (),
> +                                               use_insn->is_debug_insn ());
> +  return new_uses;
> +}
> +
> +// Start the process of trying to replace USE by substitution, given that
> +// USE occurs in a non-debug instruction.  Check:
> +//
> +// - that the substitution can be represented in RTL
> +//
> +// - that each use of a resource (register or memory) within the new
> +//   instruction has a consistent definition
> +//
> +// - that the new instruction is a recognized pattern
> +//
> +// - that the instruction can be placed somewhere that makes all definitions
> +//   and uses valid, and that permits any new hard-register clobbers added
> +//   during the recognition process
> +//
> +// Return true on success.
> +bool
> +insn_combination::substitute_nondebug_use (use_info *use)
> +{
> +  insn_info *use_insn = use->insn ();
> +  rtx_insn *use_rtl = use_insn->rtl ();
> +
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +    dump_insn_slim (dump_file, use->insn ()->rtl ());
> +
> +  // Check that we can change the instruction pattern.  Leave recognition
> +  // of the result till later.
> +  insn_propagation prop (use_rtl, m_dest, m_src);
> +  if (!prop.apply_to_pattern (&PATTERN (use_rtl))
> +      || prop.num_replacements == 0)
> +    {
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +        fprintf (dump_file, "-- RTL substitution failed\n");
> +      return false;
> +    }
> +
> +  use_array new_uses = get_new_uses (use);
> +  if (!new_uses.is_valid ())
> +    {
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +        fprintf (dump_file, "-- could not prove that all sources"
> +                 " are available\n");
> +      return false;
> +    }
> +
> +  // Create a tentative change for the use.
> +  auto *where = XOBNEW (m_attempt, insn_change);
> +  auto *use_change = new (where) insn_change (use_insn);
> +  m_nondebug_changes.safe_push (use_change);
> +  use_change->new_uses = new_uses;
> +
> +  struct local_ignore : ignore_nothing
> +  {
> +    local_ignore (const set_info *def, const insn_info *use_insn)
> +      : m_def (def), m_use_insn (use_insn) {}
> +
> +    // We don't limit the number of insns per optimization, so ignoring all
> +    // insns for all insns would lead to quadratic complexity.  Just ignore
> +    // the use and definition, which should be enough for most purposes.
> +    bool
> +    should_ignore_insn (const insn_info *insn)
> +    {
> +      return insn == m_def->insn () || insn == m_use_insn;
> +    }
> +
> +    // Ignore the definition that we're removing, and all uses of it.
> +    bool should_ignore_def (const def_info *def) { return def == m_def; }
> +
> +    const set_info *m_def;
> +    const insn_info *m_use_insn;
> +  };
> +
> +  auto ignore = local_ignore (m_def, use_insn);
> +
> +  // Moving instructions before register allocation could increase
> +  // register pressure.  Only try moving them after RA.
> +  if (reload_completed && can_move_insn_p (use_insn))
> +    use_change->move_range = { use_insn->bb ()->head_insn (),
> +                               use_insn->ebb ()->last_bb ()->end_insn () };
> +  if (!restrict_movement (*use_change, ignore))
> +    {
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +        fprintf (dump_file, "-- cannot satisfy all definitions and uses"
> +                 " in insn %d\n", INSN_UID (use_insn->rtl ()));
> +      return false;
> +    }
> +
> +  if (!recog (m_attempt, *use_change, ignore))
> +    return false;
> +
> +  return true;
> +}
> +
> +// Apply substitute_nondebug_use to all direct and indirect uses of DEF.
> +// There will be at most one level of indirection.
> +bool
> +insn_combination::substitute_nondebug_uses (set_info *def)
> +{
> +  for (use_info *use : def->nondebug_insn_uses ())
> +    if (!use->is_live_out_use ()
> +        && !use->only_occurs_in_notes ()
> +        && !substitute_nondebug_use (use))
> +      return false;
> +
> +  for (use_info *use : def->phi_uses ())
> +    if (!substitute_nondebug_uses (use->phi ()))
> +      return false;
> +
> +  return true;
> +}
> +
> +// USE_CHANGE.insn () is a debug instruction that uses m_def.  Try to
> +// substitute the definition into the instruction and try to describe
> +// the result in USE_CHANGE.  Return true on success.  Failure means that
> +// the instruction must be reset instead.
> +bool
> +insn_combination::try_to_preserve_debug_info (insn_change &use_change,
> +                                              use_info *use)
> +{
> +  // Punt on unsimplified subregs of hard registers.  In that case,
> +  // propagation can succeed and create a wider reg than the one we
> +  // started with.
> +  if (HARD_REGISTER_NUM_P (use->regno ())
> +      && use->includes_subregs ())
> +    return false;
> +
> +  insn_info *use_insn = use_change.insn ();
> +  rtx_insn *use_rtl = use_insn->rtl ();
> +
> +  use_change.new_uses = get_new_uses (use);
> +  if (!use_change.new_uses.is_valid ()
> +      || !restrict_movement (use_change))
> +    return false;
> +
> +  insn_propagation prop (use_rtl, m_dest, m_src);
> +  return prop.apply_to_pattern (&INSN_VAR_LOCATION_LOC (use_rtl));
> +}
> +
> +// USE_INSN is a debug instruction that uses m_def.  Update it to reflect
> +// the fact that m_def is going to disappear.  Try to preserve the source
> +// value if possible, but reset the instruction if not.
> +void
> +insn_combination::substitute_debug_use (use_info *use)
> +{
> +  auto *use_insn = use->insn ();
> +  rtx_insn *use_rtl = use_insn->rtl ();
> +
> +  auto use_change = insn_change (use_insn);
> +  if (!try_to_preserve_debug_info (use_change, use))
> +    {
> +      use_change.new_uses = {};
> +      use_change.move_range = use_change.insn ();
> +      INSN_VAR_LOCATION_LOC (use_rtl) = gen_rtx_UNKNOWN_VAR_LOC ();
> +    }
> +  insn_change *changes[] = { &use_change };
> +  crtl->ssa->change_insns (changes);
> +}
> +
> +// NOTE is a reg note of USE_INSN, which previously used m_def.  Update
> +// the note to reflect the fact that m_def is going to disappear.  Return
> +// true on success, or false if the note must be deleted.
> +//
> +// CAN_PROPAGATE is true if m_dest can be replaced with m_use.
> +bool
> +insn_combination::substitute_note (insn_info *use_insn, rtx note,
> +                                   bool can_propagate)
> +{
> +  if (REG_NOTE_KIND (note) == REG_EQUAL
> +      || REG_NOTE_KIND (note) == REG_EQUIV)
> +    {
> +      insn_propagation prop (use_insn->rtl (), m_dest, m_src);
> +      return (prop.apply_to_rvalue (&XEXP (note, 0))
> +              && (can_propagate || prop.num_replacements == 0));
> +    }
> +  return true;
> +}
> +
> +// Update USE_INSN's notes after deciding to go ahead with the optimization.
> +// CAN_PROPAGATE is true if m_dest can be replaced with m_use.
> +void
> +insn_combination::substitute_notes (insn_info *use_insn, bool can_propagate)
> +{
> +  rtx_insn *use_rtl = use_insn->rtl ();
> +  rtx *ptr = &REG_NOTES (use_rtl);
> +  while (rtx note = *ptr)
> +    {
> +      if (substitute_note (use_insn, note, can_propagate))
> +        ptr = &XEXP (note, 1);
> +      else
> +        *ptr = XEXP (note, 1);
> +    }
> +}
> +
> +// We've decided to go ahead with the substitution.  Update all REG_NOTES
> +// involving USE.
> +void
> +insn_combination::substitute_note_uses (use_info *use)
> +{
> +  insn_info *use_insn = use->insn ();
> +
> +  bool can_propagate = true;
> +  if (use->only_occurs_in_notes ())
> +    {
> +      // The only uses are in notes.  Try to keep the note if we can,
> +      // but removing it is better than aborting the optimization.
> +      insn_change use_change (use_insn);
> +      use_change.new_uses = get_new_uses (use);
> +      if (!use_change.new_uses.is_valid ()
> +          || !restrict_movement (use_change))
> +        {
> +          use_change.move_range = use_insn;
> +          use_change.new_uses = remove_uses_of_def (m_attempt,
> +                                                    use_insn->uses (),
> +                                                    use->def ());
> +          can_propagate = false;
> +        }
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +        {
> +          fprintf (dump_file, "%s notes in:\n",
> +                   can_propagate ? "updating" : "removing");
> +          dump_insn_slim (dump_file, use_insn->rtl ());
> +        }
> +      substitute_notes (use_insn, can_propagate);
> +      insn_change *changes[] = { &use_change };
> +      crtl->ssa->change_insns (changes);
> +    }
> +  else
> +    // We've already decided to update the insn's pattern and know that m_src
> +    // will be available at the insn's new location.  Now update its notes.
> +    substitute_notes (use_insn, can_propagate);
> +}
> +
> +// We've decided to go ahead with the substitution and we've dealt with
> +// all uses that occur in the patterns of non-debug insns.  Update all
> +// other uses for the fact that m_def is about to disappear.
> +void
> +insn_combination::substitute_optional_uses (set_info *def)
> +{
> +  if (auto insn_uses = def->all_insn_uses ())
> +    {
> +      use_info *use = *insn_uses.begin ();
> +      while (use)
> +        {
> +          use_info *next_use = use->next_any_insn_use ();
> +          if (use->is_in_debug_insn ())
> +            substitute_debug_use (use);
> +          else if (!use->is_live_out_use ())
> +            substitute_note_uses (use);
> +          use = next_use;
> +        }
> +    }
> +  for (use_info *use : def->phi_uses ())
> +    substitute_optional_uses (use->phi ());
> +}
> +
> +// Try to perform the substitution.  Return true on success.
> +bool
> +insn_combination::run ()
> +{
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +    {
> +      fprintf (dump_file, "\ntrying to combine definition of r%d in:\n",
> +               m_def->regno ());
> +      dump_insn_slim (dump_file, m_def_insn->rtl ());
> +      fprintf (dump_file, "into:\n");
> +    }
> +
> +  auto def_change = insn_change::delete_insn (m_def_insn);
> +  m_nondebug_changes.safe_push (&def_change);
> +
> +  if (!substitute_nondebug_uses (m_def)
> +      || !changes_are_worthwhile (m_nondebug_changes)
> +      || !crtl->ssa->verify_insn_changes (m_nondebug_changes))
> +    return false;
> +
> +  substitute_optional_uses (m_def);
> +
> +  confirm_change_group ();
> +  crtl->ssa->change_insns (m_nondebug_changes);
> +  return true;
> +}
> +
> +// See whether INSN is a single_set that we can optimize.  Return the
> +// set if so, otherwise return null.
> +rtx
> +late_combine::optimizable_set (insn_info *insn)
> +{
> +  if (!insn->can_be_optimized ()
> +      || insn->is_asm ()
> +      || insn->is_call ()
> +      || insn->has_volatile_refs ()
> +      || insn->has_pre_post_modify ()
> +      || !can_move_insn_p (insn))
> +    return NULL_RTX;
> +
> +  return single_set (insn->rtl ());
> +}
> +
> +// Suppose that we can replace all uses of SET_DEST (SET) with SET_SRC (SET),
> +// where SET occurs in INSN.  Return true if doing so is not likely to
> +// increase register pressure.
> +bool
> +late_combine::check_register_pressure (insn_info *insn, rtx set)
> +{
> +  // Plain register-to-register moves do not establish a register class
> +  // preference and have no well-defined effect on the register allocator.
> +  // If changes in register class are needed, the register allocator is
> +  // in the best position to place those changes.  If no change in
> +  // register class is needed, then the optimization reduces register
> +  // pressure if SET_SRC (set) was already live at uses, otherwise the
> +  // optimization is pressure-neutral.
> +  rtx src = SET_SRC (set);
> +  if (REG_P (src))
> +    return true;
> +
> +  // On the same basis, substituting a SET_SRC that contains a single
> +  // pseudo register either reduces pressure or is pressure-neutral,
> +  // subject to the constraints below.  We would need to do more
> +  // analysis for SET_SRCs that use more than one pseudo register.
> +  unsigned int nregs = 0;
> +  for (auto *use : insn->uses ())
> +    if (use->is_reg ()
> +        && !HARD_REGISTER_NUM_P (use->regno ())
> +        && !use->only_occurs_in_notes ())
> +      if (++nregs > 1)
> +        return false;
> +
> +  // If there are no pseudo registers in SET_SRC then the optimization
> +  // should improve register pressure.
> +  if (nregs == 0)
> +    return true;
> +
> +  // We'd be substituting (set (reg R1) SRC) where SRC is known to
> +  // contain a single pseudo register R2.  Assume for simplicity that
> +  // each new use of R2 would need to be in the same class C as the
> +  // current use of R2.  If, for a realistic allocation, C is a
> +  // non-strict superset of R1's register class, the effect on
> +  // register pressure should be positive or neutral.  If instead
> +  // R1 occupies a different register class from R2, or if R1 has
> +  // more allocation freedom than R2, then there's a higher risk that
> +  // the effect on register pressure could be negative.
> +  //
> +  // First use constrain_operands to get the most likely choice of
> +  // alternative.  For simplicity, just handle the case where the
> +  // output operand is operand 0.
> +  extract_insn (insn->rtl ());
> +  rtx dest = SET_DEST (set);
> +  if (recog_data.n_operands == 0
> +      || recog_data.operand[0] != dest)
> +    return false;
> +
> +  if (!constrain_operands (0, get_enabled_alternatives (insn->rtl ())))
> +    return false;
> +
> +  preprocess_constraints (insn->rtl ());
> +  auto *alt = which_op_alt ();
> +  auto dest_class = alt[0].cl;
> +
> +  // Check operands 1 and above.
> +  auto check_src = [&] (unsigned int i)
> +    {
> +      if (recog_data.is_operator[i])
> +        return true;
> +
> +      rtx op = recog_data.operand[i];
> +      if (CONSTANT_P (op))
> +        return true;
> +
> +      if (SUBREG_P (op))
> +        op = SUBREG_REG (op);
> +      if (REG_P (op))
> +        {
> +          // Ignore hard registers.  We've already rejected uses of non-fixed
> +          // hard registers in the SET_SRC.
> +          if (HARD_REGISTER_P (op))
> +            return true;
> +
> +          // Make sure that the source operand's class is at least as
> +          // permissive as the destination operand's class.
> +          auto src_class = alternative_class (alt, i);
> +          if (!reg_class_subset_p (dest_class, src_class))
> +            return false;
> +
> +          // Make sure that the source operand occupies no more hard
> +          // registers than the destination operand.  This mostly matters
> +          // for subregs.
> +          if (targetm.class_max_nregs (dest_class, GET_MODE (dest))
> +              < targetm.class_max_nregs (src_class, GET_MODE (op)))
> +            return false;
> +
> +          return true;
> +        }
> +      return false;
> +    };
> +  for (int i = 1; i < recog_data.n_operands; ++i)
> +    if (recog_data.operand_type[i] != OP_OUT && !check_src (i))
> +      return false;
> +
> +  return true;
> +}
> +
> +// Check uses of DEF to see whether there is anything obvious that
> +// prevents the substitution of SET into uses of DEF.
> +bool
> +late_combine::check_uses (set_info *def, rtx set)
> +{
> +  use_info *prev_use = nullptr;
> +  for (use_info *use : def->nondebug_insn_uses ())
> +    {
> +      insn_info *use_insn = use->insn ();
> +
> +      if (use->is_live_out_use ())
> +        continue;
> +      if (use->only_occurs_in_notes ())
> +        continue;
> +
> +      // We cannot replace all uses if the value is live on exit.
> +      if (use->is_artificial ())
> +        return false;
> +
> +      // Avoid increasing the complexity of instructions that
> +      // reference allocatable hard registers.
> +      if (!REG_P (SET_SRC (set))
> +          && !reload_completed
> +          && (accesses_include_nonfixed_hard_registers (use_insn->uses ())
> +              || accesses_include_nonfixed_hard_registers (use_insn->defs ())))
> +        return false;
> +
> +      // Don't substitute into a non-local goto, since it can then be
> +      // treated as a jump to local label, e.g. in shorten_branches.
> +      // ??? But this shouldn't be necessary.
> +      if (use_insn->is_jump ()
> +          && find_reg_note (use_insn->rtl (), REG_NON_LOCAL_GOTO, NULL_RTX))
> +        return false;
> +
> +      // Reject cases where one of the uses is a function argument.
> +      // The combine attempt should fail anyway, but this is a common
> +      // case that is easy to check early.
> +      if (use_insn->is_call ()
> +          && HARD_REGISTER_P (SET_DEST (set))
> +          && find_reg_fusage (use_insn->rtl (), USE, SET_DEST (set)))
> +        return false;
> +
> +      // We'll keep the uses in their original order, even if we move
> +      // them relative to other instructions.  Make sure that non-final
> +      // uses do not change any values that occur in the SET_SRC.
> +      if (prev_use && prev_use->ebb () == use->ebb ())
> +        {
> +          def_info *ultimate_def = look_through_degenerate_phi (def);
> +          if (insn_clobbers_resources (prev_use->insn (),
> +                                       ultimate_def->insn ()->uses ()))
> +            return false;
> +        }
> +
> +      prev_use = use;
> +    }
> +
> +  for (use_info *use : def->phi_uses ())
> +    if (!use->phi ()->is_degenerate ()
> +        || !check_uses (use->phi (), set))
> +      return false;
> +
> +  return true;
> +}
> +
> +// Try to remove INSN by substituting a definition into all uses.
> +// If the optimization moves any instructions before CURSOR, add those
> +// instructions to the end of m_worklist.
> +bool
> +late_combine::combine_into_uses (insn_info *insn, insn_info *cursor)
> +{
> +  // For simplicity, don't try to handle sets of multiple hard registers.
> +  // And for correctness, don't remove any assignments to the stack or
> +  // frame pointers, since that would implicitly change the set of valid
> +  // memory locations between this assignment and the next.
> +  //
> +  // Removing assignments to the hard frame pointer would invalidate
> +  // backtraces.
> +  set_info *def = single_set_info (insn);
> +  if (!def
> +      || !def->is_reg ()
> +      || def->regno () == STACK_POINTER_REGNUM
> +      || def->regno () == FRAME_POINTER_REGNUM
> +      || def->regno () == HARD_FRAME_POINTER_REGNUM)
> +    return false;
> +
> +  rtx set = optimizable_set (insn);
> +  if (!set)
> +    return false;
> +
> +  // For simplicity, don't try to handle subreg destinations.
> +  rtx dest = SET_DEST (set);
> +  if (!REG_P (dest) || def->regno () != REGNO (dest))
> +    return false;
> +
> +  // Don't prolong the live ranges of allocatable hard registers, or put
> +  // them into more complicated instructions.  Failing to prevent this
> +  // could lead to spill failures, or at least to worse register allocation.
> +  if (!reload_completed
> +      && accesses_include_nonfixed_hard_registers (insn->uses ()))
> +    return false;
> +
> +  if (!reload_completed && !check_register_pressure (insn, set))
> +    return false;
> +
> +  if (!check_uses (def, set))
> +    return false;
> +
> +  insn_combination combination (def, SET_DEST (set), SET_SRC (set));
> +  if (!combination.run ())
> +    return false;
> +
> +  for (auto *use_change : combination.use_changes ())
> +    if (*use_change->insn () < *cursor)
> +      m_worklist.safe_push (use_change->insn ());
> +    else
> +      break;
> +  return true;
> +}
> +
> +// Run the pass on function FN.
> +unsigned int
> +late_combine::execute (function *fn)
> +{
> +  // Initialization.
> +  calculate_dominance_info (CDI_DOMINATORS);
> +  df_analyze ();
> +  crtl->ssa = new rtl_ssa::function_info (fn);
> +  // Don't allow memory_operand to match volatile MEMs.
> +  init_recog_no_volatile ();
> +
> +  insn_info *insn = *crtl->ssa->nondebug_insns ().begin ();
> +  while (insn)
> +    {
> +      if (!insn->is_artificial ())
> +        {
> +          insn_info *prev = insn->prev_nondebug_insn ();
> +          if (combine_into_uses (insn, prev))
> +            {
> +              // Any instructions that get added to the worklist were
> +              // previously after PREV.  Thus if we were able to move
> +              // an instruction X before PREV during one combination,
> +              // X cannot depend on any instructions that we move before
> +              // PREV during subsequent combinations.  This means that
> +              // the worklist should be free of backwards dependencies,
> +              // even if it isn't necessarily in RPO.
> +              for (unsigned int i = 0; i < m_worklist.length (); ++i)
> +                combine_into_uses (m_worklist[i], prev);
> +              m_worklist.truncate (0);
> +              insn = prev;
> +            }
> +        }
> +      insn = insn->next_nondebug_insn ();
> +    }
> +
> +  // Finalization.
> +  if (crtl->ssa->perform_pending_updates ())
> +    cleanup_cfg (0);
> +  // Make the recognizer allow volatile MEMs again.
> +  init_recog ();
> +  free_dominance_info (CDI_DOMINATORS);
> +  return 0;
> +}
> +
> +class pass_late_combine : public rtl_opt_pass
> +{
> +public:
> +  pass_late_combine (gcc::context *ctxt)
> +    : rtl_opt_pass (pass_data_late_combine, ctxt)
> +  {}
> +
> +  // opt_pass methods:
> +  opt_pass *clone () override { return new pass_late_combine (m_ctxt); }
> +  bool gate (function *) override { return flag_late_combine_instructions; }
> +  unsigned int execute (function *) override;
> +};
> +
> +unsigned int
> +pass_late_combine::execute (function *fn)
> +{
> +  return late_combine ().execute (fn);
> +}
> +
> +} // end namespace
> +
> +// Create a new late-combine pass instance.
> +
> +rtl_opt_pass *
> +make_pass_late_combine (gcc::context *ctxt)
> +{
> +  return new pass_late_combine (ctxt);
> +}
> diff --git a/gcc/opts.cc b/gcc/opts.cc
> index 1b1b46455af..915bce88fd6 100644
> --- a/gcc/opts.cc
> +++ b/gcc/opts.cc
> @@ -664,6 +664,7 @@ static const struct default_options default_options_table[] =
>        VECT_COST_MODEL_VERY_CHEAP },
>      { OPT_LEVELS_2_PLUS, OPT_finline_functions, NULL, 1 },
>      { OPT_LEVELS_2_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },
> +    { OPT_LEVELS_2_PLUS, OPT_flate_combine_instructions, NULL, 1 },
>
>      /* -O2 and above optimizations, but not -Os or -Og.  */
>      { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_falign_functions, NULL, 1 },
> diff --git a/gcc/passes.def b/gcc/passes.def
> index 041229e47a6..13c9dc34ddf 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -493,6 +493,7 @@ along with GCC; see the file COPYING3.  If not see
>        NEXT_PASS (pass_initialize_regs);
>        NEXT_PASS (pass_ud_rtl_dce);
>        NEXT_PASS (pass_combine);
> +      NEXT_PASS (pass_late_combine);
>        NEXT_PASS (pass_if_after_combine);
>        NEXT_PASS (pass_jump_after_combine);
>        NEXT_PASS (pass_partition_blocks);
> @@ -512,6 +513,7 @@ along with GCC; see the file COPYING3.  If not see
>        NEXT_PASS (pass_postreload);
>        PUSH_INSERT_PASSES_WITHIN (pass_postreload)
>        NEXT_PASS (pass_postreload_cse);
> +      NEXT_PASS (pass_late_combine);
>        NEXT_PASS (pass_gcse2);
>        NEXT_PASS (pass_split_after_reload);
>        NEXT_PASS (pass_ree);
> diff --git a/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-1.c b/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-1.c
> index f290b9ccbdc..a95637abbe5 100644
> --- a/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-1.c
> +++ b/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-1.c
> @@ -25,5 +25,5 @@ bar (long a)
>  }
>
>  /* { dg-final { scan-rtl-dump "Will split live ranges of parameters" "ira" } } */
> -/* { dg-final { scan-rtl-dump "Split live-range of register" "ira" { xfail *-*-* } } } */
> +/* { dg-final { scan-rtl-dump "Split live-range of register" "ira" { xfail { ! aarch64*-*-* } } } } */
>  /* { dg-final { scan-rtl-dump "Performing shrink-wrapping" "pro_and_epilogue" { xfail powerpc*-*-* } } } */
> diff --git a/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-2.c b/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-2.c
> index 6212c95585d..0690e036eaa 100644
> --- a/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-2.c
> +++ b/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-2.c
> @@ -30,6 +30,6 @@ bar (long a)
>  }
>
>  /* { dg-final { scan-rtl-dump "Will split live ranges of parameters" "ira" } } */
> -/* { dg-final { scan-rtl-dump "Split live-range of register" "ira" { xfail *-*-* } } } */
> +/* { dg-final { scan-rtl-dump "Split live-range of register" "ira" { xfail { ! aarch64*-*-* } } } } */
>  /* XFAIL due to PR70681.  */
>  /* { dg-final { scan-rtl-dump "Performing shrink-wrapping" "pro_and_epilogue" { xfail arm*-*-* powerpc*-*-* } } } */
> diff --git a/gcc/testsuite/gcc.dg/stack-check-4.c b/gcc/testsuite/gcc.dg/stack-check-4.c
> index b0c5c61972f..052d2abc2f1 100644
> --- a/gcc/testsuite/gcc.dg/stack-check-4.c
> +++ b/gcc/testsuite/gcc.dg/stack-check-4.c
> @@ -20,7 +20,7 @@
>     scan for.  We scan for both the positive and negative cases.  */
>
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -fstack-clash-protection -fdump-rtl-pro_and_epilogue -fno-optimize-sibling-calls" } */
> +/* { dg-options "-O2 -fstack-clash-protection -fdump-rtl-pro_and_epilogue -fno-optimize-sibling-calls -fno-shrink-wrap" } */
>  /* { dg-require-effective-target supports_stack_clash_protection } */
>
>  extern void arf (char *);
> diff --git a/gcc/testsuite/gcc.target/aarch64/bitfield-bitint-abi-align16.c b/gcc/testsuite/gcc.target/aarch64/bitfield-bitint-abi-align16.c
> index 4a228b0a1ce..c29a230a771 100644
> --- a/gcc/testsuite/gcc.target/aarch64/bitfield-bitint-abi-align16.c
> +++ b/gcc/testsuite/gcc.target/aarch64/bitfield-bitint-abi-align16.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile { target bitint } } */
> -/* { dg-additional-options "-std=c23 -O2 -fno-stack-protector -save-temps -fno-schedule-insns -fno-schedule-insns2" } */
> +/* { dg-additional-options "-std=c23 -O2 -fno-stack-protector -save-temps -fno-schedule-insns -fno-schedule-insns2 -fno-late-combine-instructions" } */
>  /* { dg-final { check-function-bodies "**" "" "" } } */
>
>  #define ALIGN 16
> diff --git a/gcc/testsuite/gcc.target/aarch64/bitfield-bitint-abi-align8.c b/gcc/testsuite/gcc.target/aarch64/bitfield-bitint-abi-align8.c
> index e7f773640f0..13ffbf416ca 100644
> --- a/gcc/testsuite/gcc.target/aarch64/bitfield-bitint-abi-align8.c
> +++ b/gcc/testsuite/gcc.target/aarch64/bitfield-bitint-abi-align8.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile { target bitint } } */
> -/* { dg-additional-options "-std=c23 -O2 -fno-stack-protector -save-temps -fno-schedule-insns -fno-schedule-insns2" } */
> +/* { dg-additional-options "-std=c23 -O2 -fno-stack-protector -save-temps -fno-schedule-insns -fno-schedule-insns2 -fno-late-combine-instructions" } */
>  /* { dg-final { check-function-bodies "**" "" "" } } */
>
>  #define ALIGN 8
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr106594_1.c b/gcc/testsuite/gcc.target/aarch64/pr106594_1.c
> new file mode 100644
> index 00000000000..71bcafcb44f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/pr106594_1.c
> @@ -0,0 +1,20 @@
> +/* { dg-options "-O2" } */
> +
> +extern const int constellation_64qam[64];
> +
> +void foo(int nbits,
> +         const char *p_src,
> +         int *p_dst) {
> +
> +  while (nbits > 0U) {
> +    char first = *p_src++;
> +
> +    char index1 = ((first & 0x3) << 4) | (first >> 4);
> +
> +    *p_dst++ = constellation_64qam[index1];
> +
> +    nbits--;
> +  }
> +}
> +
> +/* { dg-final { scan-assembler {(?n)\tldr\t.*\[x[0-9]+, w[0-9]+, sxtw #?2\]} } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_3.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_3.c
> index 0d620a30d5d..b537c6154a3 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_3.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_3.c
> @@ -27,9 +27,9 @@ TEST_ALL (DEF_LOOP)
>  /* { dg-final { scan-assembler-times {\tasrd\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, #4\n} 2 } } */
>  /* { dg-final { scan-assembler-times {\tasrd\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, #4\n} 1 } } */
>
> -/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.b, p[0-7]/z, z[0-9]+\.b\n} 3 { xfail *-*-* } } } */
> -/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.h, p[0-7]/z, z[0-9]+\.h\n} 2 { xfail *-*-* } } } */
> -/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.s, p[0-7]/z, z[0-9]+\.s\n} 1 { xfail *-*-* } } } */
> +/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.b, p[0-7]/z, z[0-9]+\.b\n} 3 } } */
> +/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.h, p[0-7]/z, z[0-9]+\.h\n} 2 } } */
> +/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.s, p[0-7]/z, z[0-9]+\.s\n} 1 } } */
>
> -/* { dg-final { scan-assembler-not {\tmov\tz} { xfail *-*-* } } } */
> -/* { dg-final { scan-assembler-not {\tsel\t} { xfail *-*-* } } } */
> +/* { dg-final { scan-assembler-not {\tmov\tz} } } */
> +/* { dg-final { scan-assembler-not {\tsel\t} } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_3.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_3.c
> index a294effd4a9..cff806c278d 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_3.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_3.c
> @@ -30,11 +30,9 @@ TEST_ALL (DEF_LOOP)
>  /* { dg-final { scan-assembler-times {\tscvtf\tz[0-9]+\.d, p[0-7]/m,} 1 } } */
>  /* { dg-final { scan-assembler-times {\tucvtf\tz[0-9]+\.d, p[0-7]/m,} 1 } } */
>
> -/* Really we should be able to use MOVPRFX /z here, but at the moment
> -   we're relying on combine to merge a SEL and an arithmetic operation,
> -   and the SEL doesn't allow the "false" value to be zero when the "true"
> -   value is a register.  */
> -/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+, z[0-9]+\n} 6 } } */
> +/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.h, p[0-7]/z,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.s, p[0-7]/z,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.d, p[0-7]/z,} 2 } } */
>
>  /* { dg-final { scan-assembler-not {\tmov\tz[^\n]*z} } } */
>  /* { dg-final { scan-assembler-not {\tsel\t} } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_6.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_6.c
> index 6541a2ea49d..abf0a2e832f 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_6.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_6.c
> @@ -30,11 +30,9 @@ TEST_ALL (DEF_LOOP)
>  /* { dg-final { scan-assembler-times {\tfcvtzs\tz[0-9]+\.d, p[0-7]/m,} 1 } } */
>  /* { dg-final { scan-assembler-times {\tfcvtzu\tz[0-9]+\.d, p[0-7]/m,} 1 } } */
>
> -/* Really we should be able to use MOVPRFX /z here, but at the moment
> -   we're relying on combine to merge a SEL and an arithmetic operation,
> -   and the SEL doesn't allow the "false" value to be zero when the "true"
> -   value is a register.  */
> -/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+, z[0-9]+\n} 6 } } */
> +/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.h, p[0-7]/z,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.s, p[0-7]/z,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.d, p[0-7]/z,} 2 } } */
>
>  /* { dg-final { scan-assembler-not {\tmov\tz[^\n]*z} } } */
>  /* { dg-final { scan-assembler-not {\tsel\t} } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_fabd_5.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_fabd_5.c
> index e66477b3bce..401201b315a 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/cond_fabd_5.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_fabd_5.c
> @@ -24,12 +24,9 @@ TEST_ALL (DEF_LOOP)
>  /* { dg-final { scan-assembler-times {\tfabd\tz[0-9]+\.s, p[0-7]/m,} 1 } } */
>  /* { dg-final { scan-assembler-times {\tfabd\tz[0-9]+\.d, p[0-7]/m,} 1 } } */
>
> -/* Really we should be able to use MOVPRFX /Z here, but at the moment
> -   we're relying on combine to merge a SEL and an arithmetic operation,
> -   and the SEL doesn't allow zero operands.  */
> -/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.h, p[0-7]/z, z[0-9]+\.h\n} 1 { xfail *-*-* } } } */
> -/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.s, p[0-7]/z, z[0-9]+\.s\n} 1 { xfail *-*-* } } } */
> -/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.d, p[0-7]/z, z[0-9]+\.d\n} 1 { xfail *-*-* } } } */
> +/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.h, p[0-7]/z, z[0-9]+\.h\n} 1 } } */
> +/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.s, p[0-7]/z, z[0-9]+\.s\n} 1 } } */
> +/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.d, p[0-7]/z, z[0-9]+\.d\n} 1 } } */
>
>  /* { dg-final { scan-assembler-not {\tmov\tz[^,]*z} } } */
> -/* { dg-final { scan-assembler-not {\tsel\t} { xfail *-*-* } } } */
> +/* { dg-final { scan-assembler-not {\tsel\t} } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_4.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_4.c
> index a491f899088..cbb957bffa4 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_4.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_4.c
> @@ -52,15 +52,10 @@ TEST_ALL (DEF_LOOP)
>  /* { dg-final { scan-assembler-times {\tfneg\tz[0-9]+\.s, p[0-7]/m,} 1 } } */
>  /* { dg-final { scan-assembler-times {\tfneg\tz[0-9]+\.d, p[0-7]/m,} 1 } } */
>
> -/* Really we should be able to use MOVPRFX /z here, but at the moment
> -   we're relying on combine to merge a SEL and an arithmetic operation,
> -   and the SEL doesn't allow the "false" value to be zero when the "true"
> -   value is a register.  */
> -/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+, z[0-9]+\n} 7 } } */
> -/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.b, p[0-7]/z, z[0-9]+\.b} 1 } } */
> -/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.h, p[0-7]/z, z[0-9]+\.h} 2 } } */
> -/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.s, p[0-7]/z, z[0-9]+\.s} 2 } } */
> -/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.d, p[0-7]/z, z[0-9]+\.d} 2 } } */
> +/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.b, p[0-7]/z, z[0-9]+\.b} 2 } } */
> +/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.h, p[0-7]/z, z[0-9]+\.h} 4 } } */
> +/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.s, p[0-7]/z, z[0-9]+\.s} 4 } } */
> +/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.d, p[0-7]/z, z[0-9]+\.d} 4 } } */
>
>  /* { dg-final { scan-assembler-not {\tmov\tz[^\n]*z} } } */
>  /* { dg-final { scan-assembler-not {\tsel\t} } } */
> diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> index edebb2be245..38902b1b01b 100644
> --- a/gcc/tree-pass.h
> +++ b/gcc/tree-pass.h
> @@ -615,6 +615,7 @@ extern rtl_opt_pass *make_pass_branch_prob (gcc::context *ctxt);
>  extern rtl_opt_pass *make_pass_value_profile_transformations (gcc::context
>                                                                *ctxt);
>  extern rtl_opt_pass *make_pass_postreload_cse (gcc::context *ctxt);
> +extern rtl_opt_pass *make_pass_late_combine (gcc::context *ctxt);
>  extern rtl_opt_pass *make_pass_gcse2 (gcc::context *ctxt);
>  extern rtl_opt_pass *make_pass_split_after_reload (gcc::context *ctxt);
>  extern rtl_opt_pass *make_pass_thread_prologue_and_epilogue (gcc::context
>                                                               *ctxt);
> --
> 2.25.1
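To make the transformation concrete, here is a schematic before/after
sketch of the kind of substitution the pass performs, modelled on the
pr106594_1.c test above.  The RTL is illustrative only: the register
numbers, modes and exact address form are invented rather than taken
from a real dump.

    ;; Before: insn A defines r101, and insn B is its only use.
    ;;
    ;;   A: (set (reg:DI 101) (sign_extend:DI (reg:SI 100)))
    ;;   B: (set (reg:SI 102)
    ;;           (mem:SI (plus:DI (mult:DI (reg:DI 101) (const_int 4))
    ;;                            (reg:DI 103))))
    ;;
    ;; late-combine substitutes A's source into B and, because every
    ;; use of r101 accepted the substitution, deletes A:
    ;;
    ;;   B: (set (reg:SI 102)
    ;;           (mem:SI (plus:DI (mult:DI (sign_extend:DI (reg:SI 100))
    ;;                                     (const_int 4))
    ;;                            (reg:DI 103))))
    ;;
    ;; On AArch64 the combined B can match a single extending load such
    ;; as "ldr w2, [x3, w0, sxtw #2]", which is what the new test's
    ;; scan-assembler pattern looks for.

The pre-RA instance would only attempt this after check_register_pressure
and check_uses agree that the substitution is safe and unlikely to
increase pressure.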
Oleg Endo <oleg.endo@t-online.de> writes:
> On Thu, 2024-06-20 at 14:34 +0100, Richard Sandiford wrote:
>>
>> I tried compiling at least one target per CPU directory and comparing
>> the assembly output for parts of the GCC testsuite.  This is just a way
>> of getting a flavour of how the pass performs; it obviously isn't a
>> meaningful benchmark.  All targets seemed to improve on average:
>>
>> [full per-target results table snipped]
>
> Since you have already briefly compared some of the code, can you share
> those cases which get worse and might require some potential follow-up
> patches?

I think a lot of them are unpredictable secondary effects, such as on
register allocation, tail merging potential, and so on.  For sh, it also
includes whether delay slots are filled with useful work, or whether
they get a nop.  (Instruction combination tends to create more complex
instructions, so there will be fewer 2-byte instructions to act as delay
slot candidates.)
Also, this kind of combination can decrease the number of instructions
but increase the constant pool size.  The figures take that into account.
(The comparison is a bit ad hoc, though, since I wasn't dedicated enough
to try to build a full source->executable toolchain for each target. :))

To give one example, the effect on gcc.c-torture/compile/20040727-1.c is:

@@ -6,18 +6,21 @@
 	.global	GC_dirty_init
 	.type	GC_dirty_init, @function
 GC_dirty_init:
-	mov.l	.L2,r4
-	mov	r4,r6
-	mov	r4,r5
-	add	#-64,r5
-	mov.l	.L3,r0
+	mov.l	.L2,r6
+	mov.l	.L3,r5
+	mov.l	.L4,r4
+	mov.l	.L5,r0
 	jmp	@r0
-	add	#-128,r4
-.L4:
+	nop
+.L6:
 	.align 2
 .L2:
 	.long	GC_old_exc_ports+132
 .L3:
+	.long	GC_old_exc_ports+68
+.L4:
+	.long	GC_old_exc_ports+4
+.L5:
 	.long	task_get_exception_ports
 	.size	GC_dirty_init, .-GC_dirty_init
 	.local	GC_old_exc_ports

Thanks,
Richard
Richard Biener <richard.guenther@gmail.com> writes:
> [...]
> I wonder if you can amend doc/passes.texi, specifically noting differences
> between fwprop, combine and late-combine?

Ooh, we have a doc/passes.texi? :)  Somehow missed that.

How about the patch below?

Thanks,
Richard


diff --git a/gcc/doc/passes.texi b/gcc/doc/passes.texi
index 5746d3ec636..4ac7a2306a1 100644
--- a/gcc/doc/passes.texi
+++ b/gcc/doc/passes.texi
@@ -991,6 +991,25 @@ RTL expressions for the instructions by substitution, simplifies the
 result using algebra, and then attempts to match the result against
 the machine description.  The code is located in @file{combine.cc}.
 
+@item Late instruction combination
+
+This pass attempts to do further instruction combination, on top of
+that performed by @file{combine.cc}.  Its current purpose is to
+substitute definitions into all uses simultaneously, so that the
+definition can be removed.  This differs from the forward propagation
+pass, whose purpose is instead to simplify individual uses on the
+assumption that the definition will remain.  It differs from
+@file{combine.cc} in that there is no hard-coded limit on the number
+of instructions that can be combined at once.  It also differs from
+@file{combine.cc} in that it can move instructions, where necessary.
+
+However, the pass is not in principle limited to this form of
+combination.  It is intended to be a home for other, future
+combination approaches as well.
+
+The pass runs twice, once before register allocation and once after
+register allocation.  The code is located in @file{late-combine.cc}.
+
 @item Mode switching optimization
 
 This pass looks for instructions that require the processor to be in a
On Fri, Jun 21, 2024 at 10:21 AM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Richard Biener <richard.guenther@gmail.com> writes:
> > [...]
> > I wonder if you can amend doc/passes.texi, specifically noting differences
> > between fwprop, combine and late-combine?
>
> Ooh, we have a doc/passes.texi? :)  Somehow missed that.

Yeah, I also usually forget this.

> How about the patch below?

Thanks - looks good to me.

Richard.

> Thanks,
> Richard
>
> [...]
On 6/20/24 7:34 AM, Richard Sandiford wrote:
> This patch adds a combine pass that runs late in the pipeline.
> There are two instances: one between combine and split1, and one
> after postreload.
>
> The pass currently has a single objective: remove definitions by
> substituting into all uses.  The pre-RA version tries to restrict
> itself to cases that are likely to have a neutral or beneficial
> effect on register pressure.
I would expect this to fix a problem we've seen on RISC-V as well.
Essentially we have A, B and C.  We want to combine A->B and A->C,
generating B' and C', and eliminate A.  This shows up in the xz loop.

>
> On most targets, the pass is enabled by default at -O2 and above.
> However, it has a tendency to undo x86's STV and RPAD passes,
> by folding the more complex post-STV/RPAD form back into the
> simpler pre-pass form.
IIRC the limited enablement was one of the things folks were unhappy
about in the gcc-14 cycle.  Good to see that addressed.

>
> Also, running a pass after register allocation means that we can
> now match define_insn_and_splits that were previously only matched
> before register allocation.  This trips things like:
>
>   (define_insn_and_split "..."
>     [...pattern...]
>     "...cond..."
>     "#"
>     "&& 1"
>     [...pattern...]
>   {
>     ...unconditional use of gen_reg_rtx ()...;
>   }
>
> because matching and splitting after RA will call gen_reg_rtx when
> pseudos are no longer allowed.  rs6000 has several instances of this.
Interesting.  I suspect ppc won't be the only affected port.  This is
somewhat worrisome.

>
> xtensa has a variation in which the split condition is:
>
>   "&& can_create_pseudo_p ()"
>
> The failure then is that, if we match after RA, we'll never be
> able to split the instruction.
>
> The patch therefore disables the pass by default on i386, rs6000
> and xtensa.  Hopefully we can fix those ports later (if their
> maintainers want).  It seems easier to add the pass first, though,
> to make it easier to test any such fixes.
I suspect it'll be a "does this make code better on the port, then
let's fix the port so it can be used consistently" kind of scenario.

Given the data you've presented I strongly suspect it would make the
code better on the xtensa, so hopefully Max will do the gruntwork on
that one.

>
> gcc/
>   PR rtl-optimization/106594
>   * Makefile.in (OBJS): Add late-combine.o.
>   * common.opt (flate-combine-instructions): New option.
>   * doc/invoke.texi: Document it.
>   * opts.cc (default_options_table): Enable it by default at -O2
>   and above.
>   * tree-pass.h (make_pass_late_combine): Declare.
>   * late-combine.cc: New file.
>   * passes.def: Add two instances of late_combine.
>   * config/i386/i386-options.cc (ix86_override_options_after_change):
>   Disable late-combine by default.
>   * config/rs6000/rs6000.cc (rs6000_option_override_internal): Likewise.
>   * config/xtensa/xtensa.cc (xtensa_option_override): Likewise.
>
> gcc/testsuite/
>   PR rtl-optimization/106594
>   * gcc.dg/ira-shrinkwrap-prep-1.c: Restrict XFAIL to non-aarch64
>   targets.
>   * gcc.dg/ira-shrinkwrap-prep-2.c: Likewise.
>   * gcc.dg/stack-check-4.c: Add -fno-shrink-wrap.
>   * gcc.target/aarch64/bitfield-bitint-abi-align16.c: Add
>   -fno-late-combine-instructions.
>   * gcc.target/aarch64/bitfield-bitint-abi-align8.c: Likewise.
>   * gcc.target/aarch64/sve/cond_asrd_3.c: Remove XFAILs.
>   * gcc.target/aarch64/sve/cond_convert_3.c: Likewise.
>   * gcc.target/aarch64/sve/cond_fabd_5.c: Likewise.
>   * gcc.target/aarch64/sve/cond_convert_6.c: Expect the MOVPRFX /Zs
>   described in the comment.
>   * gcc.target/aarch64/sve/cond_unary_4.c: Likewise.
>   * gcc.target/aarch64/pr106594_1.c: New test.
> ---
OK.  Obviously we'll need to keep an eye on testing state after this
patch.  I do expect fallout from the splitter issue noted above, but
IMHO those are port problems for the port maintainers to sort out.

Jeff
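To make the A/B/C scenario described above concrete, here is a minimal
pseudo-RTL sketch (register numbers and operations invented purely for
illustration).  One definition feeds two uses; late-combine substitutes
the definition into both uses at once and then deletes it:

	;; Before:
	;; A: (set (reg 100) (ashift (reg 101) (const_int 2)))
	;; B: (set (reg 102) (plus (reg 100) (reg 103)))
	;; C: (set (reg 104) (plus (reg 100) (reg 105)))

	;; After substituting A into both B and C (assuming each result
	;; still matches a pattern, e.g. a shift-and-add instruction),
	;; A can be removed:
	;; B': (set (reg 102) (plus (ashift (reg 101) (const_int 2)) (reg 103)))
	;; C': (set (reg 104) (plus (ashift (reg 101) (const_int 2)) (reg 105)))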
Hi!

On 2024/06/20 22:34, Richard Sandiford wrote:
> This patch adds a combine pass that runs late in the pipeline.
> [...]
>
> Also, running a pass after register allocation means that we can
> now match define_insn_and_splits that were previously only matched
> before register allocation.  This trips things like:
>
>   (define_insn_and_split "..."
>     [...pattern...]
>     "...cond..."
>     "#"
>     "&& 1"
>     [...pattern...]
>   {
>     ...unconditional use of gen_reg_rtx ()...;
>   }
>
> because matching and splitting after RA will call gen_reg_rtx when
> pseudos are no longer allowed.  rs6000 has several instances of this.

xtensa also has something like that.

> xtensa has a variation in which the split condition is:
>
>   "&& can_create_pseudo_p ()"
>
> The failure then is that, if we match after RA, we'll never be
> able to split the instruction.

To be honest, I'm confused by the possibility of adding a split pattern
application opportunity that depends on the optimization options after
Rel... ah, LRA and before the existing rtl-split2.

Because I just recently submitted a patch that I expected would reliably
(i.e. regardless of optimization options, etc.) apply the split pattern
first in the rtl-split2 pass after RA, and it was merged.

> [...]
> ** snip **

To be more frank, once a split pattern is defined, it is applied by
the existing five split paths and possibly by combiners.  In most cases,
it is enough to apply it in one of these places, or that is what the
pattern creator intended.

Wouldn't applying the split pattern indiscriminately in various places
be a waste of execution resources and bring about unexpected and
undesired results?

I think we need some way to properly control the application of the
split pattern, perhaps some predicate function.
Takayuki 'January June' Suwa <jjsuwa_sys3175@yahoo.co.jp> writes:
> On 2024/06/20 22:34, Richard Sandiford wrote:
> [...]
>
> To be more frank, once a split pattern is defined, it is applied by
> the existing five split paths and possibly by combiners.  In most cases,
> it is enough to apply it in one of these places, or that is what the
> pattern creator intended.
>
> Wouldn't applying the split pattern indiscriminately in various places
> be a waste of execution resources and bring about unexpected and undesired
> results?
>
> I think we need some way to properly control the application of the split
> pattern, perhaps some predicate function.

The problem is more the define_insn part of the define_insn_and_split,
rather than the define_split part.  The number and location of the split
passes is the same: anything matched by rtl-late_combine1 will be split by
rtl-split1 and anything matched by rtl-late_combine2 will be split by
rtl-split2.  (If the split condition allows it, of course.)
But more things can be matched by rtl-late_combine2 than are matched by
other post-RA passes like rtl-postreload.  And that's what causes the
issue.  If:

  (define_insn_and_split "..."
    [...pattern...]
    "...cond..."
    "#"
    "&& 1"
    [...pattern...]
  {
    ...unconditional use of gen_reg_rtx ()...;
  }

is matched by rtl-late_combine2, the split will be done by rtl-split2.
But the split will ICE, because it isn't valid to call gen_reg_rtx after
register allocation.

Similarly, if:

  (define_insn_and_split "..."
    [...pattern...]
    "...cond..."
    "#"
    "&& can_create_pseudo_p ()"
    [...pattern...]
  {
    ...unconditional use of gen_reg_rtx ()...;
  }

is matched by rtl-late_combine2, the can_create_pseudo_p condition will
be false in rtl-split2, and in all subsequent split passes.  So we'll
still have the unsplit instruction during final, which will ICE because
it doesn't have a valid means of implementing the "#".

The traditional (and IMO correct) way to handle this is to make the
pattern reserve the temporary registers that it needs, using
match_scratches.  rs6000 has many examples of this.  E.g.:

(define_insn_and_split "@ieee_128bit_vsx_neg<mode>2"
  [(set (match_operand:IEEE128 0 "register_operand" "=wa")
	(neg:IEEE128 (match_operand:IEEE128 1 "register_operand" "wa")))
   (clobber (match_scratch:V16QI 2 "=v"))]
  "TARGET_FLOAT128_TYPE && !TARGET_FLOAT128_HW"
  "#"
  "&& 1"
  [(parallel [(set (match_dup 0)
		   (neg:IEEE128 (match_dup 1)))
	      (use (match_dup 2))])]
{
  if (GET_CODE (operands[2]) == SCRATCH)
    operands[2] = gen_reg_rtx (V16QImode);

  emit_insn (gen_ieee_128bit_negative_zero (operands[2]));
}
  [(set_attr "length" "8")
   (set_attr "type" "vecsimple")])

Before RA, this is just:

  (set ...)
  (clobber (scratch:V16QI))

and the split creates a new register.  After RA, operand 2 provides
the required temporary register:

  (set ...)
  (clobber (reg:V16QI TMP))

Another approach is to add can_create_pseudo_p () to the define_insn
condition (rather than the split condition).  But IMO that's an ICE
trap, since insns that have already been matched & accepted shouldn't
suddenly become invalid if recog is reattempted later.

Thanks,
Richard
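To make the contrast concrete, here is a purely hypothetical sketch of
the two styles; the name, mode and unspec below are invented, and only
the shape of the fix matters:

	;; Problematic: can be matched by rtl-late_combine2, but can then
	;; never be split, because can_create_pseudo_p () stays false.
	(define_insn_and_split "*hypothetical_op"
	  [(set (match_operand:SI 0 "register_operand" "=r")
		(unspec:SI [(match_operand:SI 1 "register_operand" "r")]
			   UNSPEC_HYPOTHETICAL))]
	  ""
	  "#"
	  "&& can_create_pseudo_p ()"
	  [...split pattern...]
	{
	  rtx tmp = gen_reg_rtx (SImode);
	  ...use tmp in the split...;
	})

	;; Fixed: the insn itself reserves the temporary, so the split
	;; works before RA (via gen_reg_rtx) and after RA (via the
	;; allocated scratch register).
	(define_insn_and_split "*hypothetical_op"
	  [(set (match_operand:SI 0 "register_operand" "=r")
		(unspec:SI [(match_operand:SI 1 "register_operand" "r")]
			   UNSPEC_HYPOTHETICAL))
	   (clobber (match_scratch:SI 2 "=&r"))]
	  ""
	  "#"
	  "&& 1"
	  [...split pattern...]
	{
	  if (GET_CODE (operands[2]) == SCRATCH)
	    operands[2] = gen_reg_rtx (SImode);
	  ...use operands[2] in the split...;
	})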
Hi!

On 2024/06/23 1:49, Richard Sandiford wrote:
> Takayuki 'January June' Suwa <jjsuwa_sys3175@yahoo.co.jp> writes:
> [...]
>
> The traditional (and IMO correct) way to handle this is to make the
> pattern reserve the temporary registers that it needs, using
> match_scratches.  rs6000 has many examples of this.
> [...]
>
> Another approach is to add can_create_pseudo_p () to the define_insn
> condition (rather than the split condition).  But IMO that's an ICE
> trap, since insns that have already been matched & accepted shouldn't
> suddenly become invalid if recog is reattempted later.

Ah, I see.  I understand the standard idiom for synchronizing RA between
define_insn and split conditions (I guess).

However, as I wrote before, it is true that split paths are useful for
other purposes as well.  If this is unacceptable practice, then an
alternative solution would be to add a target-specific path (in my case,
after postreload and before rtl-late_combine2).  That is obviously a very
involved process.
On Sat, Jun 22, 2024 at 6:50 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Takayuki 'January June' Suwa <jjsuwa_sys3175@yahoo.co.jp> writes:
> [...]
>
> The traditional (and IMO correct) way to handle this is to make the
> pattern reserve the temporary registers that it needs, using
> match_scratches.  rs6000 has many examples of this.
> [...]
>
> Another approach is to add can_create_pseudo_p () to the define_insn
> condition (rather than the split condition).  But IMO that's an ICE
> trap, since insns that have already been matched & accepted shouldn't
> suddenly become invalid if recog is reattempted later.

What about splitting immediately in late-combine?  Wouldn't that possibly
allow more combinations to immediately happen?

Richard.

> Thanks,
> Richard
>
Richard Biener <richard.guenther@gmail.com> writes:
> On Sat, Jun 22, 2024 at 6:50 PM Richard Sandiford
>> [...]
>> Another approach is to add can_create_pseudo_p () to the define_insn
>> condition (rather than the split condition).  But IMO that's an ICE
>> trap, since insns that have already been matched & accepted shouldn't
>> suddenly become invalid if recog is reattempted later.
>
> What about splitting immediately in late-combine?  Wouldn't that possibly
> allow more combinations to immediately happen?

It would be difficult to guarantee termination.  Often the split
instructions can be immediately recombined back to the original
instruction.  Even if we guard against that happening directly,
it'd be difficult to prove that it can't happen indirectly.

We might also run into issues like PR101523.

Combine uses define_splits (without define_insns) for 3->2 combinations,
but the current late-combine optimisation is kind-of 1/N+1->1 x N.

Personally, I think we should allow targets to use the .md file to
define match.pd-style simplification rules involving unspecs, but there
were objections to that when I last suggested it.

Thanks,
Richard
On Mon, Jun 24, 2024 at 10:03 AM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Richard Biener <richard.guenther@gmail.com> writes:
> > [...]
> > What about splitting immediately in late-combine?  Wouldn't that possibly
> > allow more combinations to immediately happen?
>
> It would be difficult to guarantee termination.  Often the split
> instructions can be immediately recombined back to the original
> instruction.  Even if we guard against that happening directly,
> it'd be difficult to prove that it can't happen indirectly.
>
> We might also run into issues like PR101523.
>
> Combine uses define_splits (without define_insns) for 3->2 combinations,
> but the current late-combine optimisation is kind-of 1/N+1->1 x N.
>
> Personally, I think we should allow targets to use the .md file to
> define match.pd-style simplification rules involving unspecs, but there
> were objections to that when I last suggested it.

Isn't that what basically "combine-helper" patterns do to some extent?

Richard.

>
> Thanks,
> Richard
Richard Biener <richard.guenther@gmail.com> writes:
> On Mon, Jun 24, 2024 at 10:03 AM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>> [...]
>> Personally, I think we should allow targets to use the .md file to
>> define match.pd-style simplification rules involving unspecs, but there
>> were objections to that when I last suggested it.
>
> Isn't that what basically "combine-helper" patterns do to some extent?

Partly, but:

(1) It's a big hammer.  It means we add all the overhead of a define_insn
for something that is only meant to survive between one pass and the next.

(2) Unlike match.pd, it isn't designed to be applied iteratively.
There is no attempt even in theory to ensure that match helper
-> split -> match helper -> split -> ... would terminate.

(3) It operates at the level of complete instructions, including e.g.
destinations of sets.  The kind of rule I had in mind would be aimed
at arithmetic simplification, and would operate at the simplify-rtx.cc
level.

That is, if simplify_foo failed to apply a target-independent rule,
it could fall back on an automatically generated target-specific rule,
with the requirement/understanding that these rules really should be
target-specific.  One easy way of enforcing that is to say that
at least one side of a production rule must involve an unspec.

Richard
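For illustration only, such a rule might be written along these lines
(completely hypothetical syntax; no define_simplify construct exists
today, and UNSPEC_FOO is an invented placeholder):

	;; Hypothetical target-specific simplification rule in the .md
	;; file: collapse a nested application of an idempotent unspec.
	;; At least one side involves an unspec, as suggested above.
	(define_simplify
	  (unspec:SI [(unspec:SI [(match_operand:SI 0)] UNSPEC_FOO)]
		     UNSPEC_FOO)
	  (unspec:SI [(match_dup 0)] UNSPEC_FOO))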
On Mon, Jun 24, 2024 at 1:34 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Richard Biener <richard.guenther@gmail.com> writes:
> > [...]
> > Isn't that what basically "combine-helper" patterns do to some extent?
>
> Partly, but:
> [...]
>
> That is, if simplify_foo failed to apply a target-independent rule,
> it could fall back on an automatically generated target-specific rule,
> with the requirement/understanding that these rules really should be
> target-specific.  One easy way of enforcing that is to say that
> at least one side of a production rule must involve an unspec.

OK, that makes sense.  I did think of having something like match.pd
generate simplify-rtx.cc.  It probably has different constraints, so
simply translating tree codes to rtx codes and re-using match.pd
patterns isn't going to work well.

Richard.

> Richard
>
Hi!

On 2024-06-20T14:34:18+0100, Richard Sandiford <richard.sandiford@arm.com> wrote:
> This patch adds a combine pass that runs late in the pipeline.
> [...]

Nice!

> The patch [...] disables the pass by default on i386, rs6000
> and xtensa.

Like here:

> --- a/gcc/config/i386/i386-options.cc
> +++ b/gcc/config/i386/i386-options.cc
> @@ -1942,6 +1942,10 @@ ix86_override_options_after_change (void)
>        flag_cunroll_grow_size = flag_peel_loops || optimize >= 3;
>      }
>
> +  /* Late combine tends to undo some of the effects of STV and RPAD,
> +     by combining instructions back to their original form.  */
> +  if (!OPTION_SET_P (flag_late_combine_instructions))
> +    flag_late_combine_instructions = 0;
>  }

..., I think also here:

> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -4768,6 +4768,14 @@ rs6000_option_override_internal (bool global_init_p)
>  	targetm.expand_builtin_va_start = NULL;
>      }
>
> +  /* One of the late-combine passes runs after register allocation
> +     and can match define_insn_and_splits that were previously used
> +     only before register allocation.  Some of those define_insn_and_splits
> +     use gen_reg_rtx unconditionally.  Disable late-combine by default
> +     until the define_insn_and_splits are fixed.  */
> +  if (!OPTION_SET_P (flag_late_combine_instructions))
> +    flag_late_combine_instructions = 0;
> +
>    rs6000_override_options_after_change ();

..., this needs to be done in 'rs6000_override_options_after_change'
instead of 'rs6000_option_override_internal', to address the PRs under
discussion.  I'm testing such a patch.

Regards,
Thomas
Thomas Schwinge <tschwinge@baylibre.com> writes:
> Hi!
>
> On 2024-06-20T14:34:18+0100, Richard Sandiford <richard.sandiford@arm.com> wrote:
>> This patch adds a combine pass that runs late in the pipeline.
>> [...]
>
> Nice!
>
> [...]
>
> ..., this needs to be done in 'rs6000_override_options_after_change'
> instead of 'rs6000_option_override_internal', to address the PRs under
> discussion.  I'm testing such a patch.

Oops!  Sorry about that, and thanks for tracking it down.

Richard
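For reference, the fix Thomas describes would presumably look something
like the following untested sketch, which simply moves the existing
default into rs6000_override_options_after_change (signature and
surrounding code abbreviated):

	--- a/gcc/config/rs6000/rs6000.cc
	+++ b/gcc/config/rs6000/rs6000.cc
	 void
	 rs6000_override_options_after_change (void)
	 {
	   ...
	+  /* Disable late-combine by default until the port's
	+     define_insn_and_splits stop calling gen_reg_rtx
	+     unconditionally in their split code.  */
	+  if (!OPTION_SET_P (flag_late_combine_instructions))
	+    flag_late_combine_instructions = 0;
	 }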
diff --git a/gcc/Makefile.in b/gcc/Makefile.in index f5adb647d3f..5e29ddb5690 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -1574,6 +1574,7 @@ OBJS = \ ira-lives.o \ jump.o \ langhooks.o \ + late-combine.o \ lcm.o \ lists.o \ loop-doloop.o \ diff --git a/gcc/common.opt b/gcc/common.opt index f2bc47fdc5e..327230967ea 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -1796,6 +1796,11 @@ Common Var(flag_large_source_files) Init(0) Improve GCC's ability to track column numbers in large source files, at the expense of slower compilation. +flate-combine-instructions +Common Var(flag_late_combine_instructions) Optimization Init(0) +Run two instruction combination passes late in the pass pipeline; +one before register allocation and one after. + floop-parallelize-all Common Var(flag_loop_parallelize_all) Optimization Mark all loops as parallel. diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc index f2cecc0e254..4620bf8e9e6 100644 --- a/gcc/config/i386/i386-options.cc +++ b/gcc/config/i386/i386-options.cc @@ -1942,6 +1942,10 @@ ix86_override_options_after_change (void) flag_cunroll_grow_size = flag_peel_loops || optimize >= 3; } + /* Late combine tends to undo some of the effects of STV and RPAD, + by combining instructions back to their original form. */ + if (!OPTION_SET_P (flag_late_combine_instructions)) + flag_late_combine_instructions = 0; } /* Clear stack slot assignments remembered from previous functions. diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index e4dc629ddcc..f39b8909925 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -4768,6 +4768,14 @@ rs6000_option_override_internal (bool global_init_p) targetm.expand_builtin_va_start = NULL; } + /* One of the late-combine passes runs after register allocation + and can match define_insn_and_splits that were previously used + only before register allocation. Some of those define_insn_and_splits + use gen_reg_rtx unconditionally. Disable late-combine by default + until the define_insn_and_splits are fixed. */ + if (!OPTION_SET_P (flag_late_combine_instructions)) + flag_late_combine_instructions = 0; + rs6000_override_options_after_change (); /* If not explicitly specified via option, decide whether to generate indexed diff --git a/gcc/config/xtensa/xtensa.cc b/gcc/config/xtensa/xtensa.cc index 45dc1be3ff5..308dc62e0f8 100644 --- a/gcc/config/xtensa/xtensa.cc +++ b/gcc/config/xtensa/xtensa.cc @@ -59,6 +59,7 @@ along with GCC; see the file COPYING3. If not see #include "tree-pass.h" #include "print-rtl.h" #include <math.h> +#include "opts.h" /* This file should be included last. */ #include "target-def.h" @@ -2916,6 +2917,16 @@ xtensa_option_override (void) flag_reorder_blocks_and_partition = 0; flag_reorder_blocks = 1; } + + /* One of the late-combine passes runs after register allocation + and can match define_insn_and_splits that were previously used + only before register allocation. Some of those define_insn_and_splits + require the split to take place, but have a split condition of + can_create_pseudo_p, and so matching after RA will give an + unsplittable instruction. Disable late-combine by default until + the define_insn_and_splits are fixed. */ + if (!OPTION_SET_P (flag_late_combine_instructions)) + flag_late_combine_instructions = 0; } /* Implement TARGET_HARD_REGNO_NREGS. 
*/ diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 5d7a87fde86..3b8c427d509 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -575,7 +575,7 @@ Objective-C and Objective-C++ Dialects}. -fipa-bit-cp -fipa-vrp -fipa-pta -fipa-profile -fipa-pure-const -fipa-reference -fipa-reference-addressable -fipa-stack-alignment -fipa-icf -fira-algorithm=@var{algorithm} --flive-patching=@var{level} +-flate-combine-instructions -flive-patching=@var{level} -fira-region=@var{region} -fira-hoist-pressure -fira-loop-pressure -fno-ira-share-save-slots -fno-ira-share-spill-slots @@ -13675,6 +13675,15 @@ equivalences that are found only by GCC and equivalences found only by Gold. This flag is enabled by default at @option{-O2} and @option{-Os}. +@opindex flate-combine-instructions +@item -flate-combine-instructions +Enable two instruction combination passes that run relatively late in the +compilation process. One of the passes runs before register allocation and +the other after register allocation. The main aim of the passes is to +substitute definitions into all uses. + +Most targets enable this flag by default at @option{-O2} and @option{-Os}. + @opindex flive-patching @item -flive-patching=@var{level} Control GCC's optimizations to produce output suitable for live-patching. diff --git a/gcc/late-combine.cc b/gcc/late-combine.cc new file mode 100644 index 00000000000..22a1d81d38e --- /dev/null +++ b/gcc/late-combine.cc @@ -0,0 +1,747 @@ +// Late-stage instruction combination pass. +// Copyright (C) 2023-2024 Free Software Foundation, Inc. +// +// This file is part of GCC. +// +// GCC is free software; you can redistribute it and/or modify it under +// the terms of the GNU General Public License as published by the Free +// Software Foundation; either version 3, or (at your option) any later +// version. +// +// GCC is distributed in the hope that it will be useful, but WITHOUT ANY +// WARRANTY; without even the implied warranty of MERCHANTABILITY or +// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +// for more details. +// +// You should have received a copy of the GNU General Public License +// along with GCC; see the file COPYING3. If not see +// <http://www.gnu.org/licenses/>. + +// The current purpose of this pass is to substitute definitions into +// all uses, so that the definition can be removed. However, it could +// be extended to handle other combination-related optimizations in future. +// +// The pass can run before or after register allocation. When running +// before register allocation, it tries to avoid cases that are likely +// to increase register pressure. For the same reason, it avoids moving +// instructions around, even if doing so would allow an optimization to +// succeed. These limitations are removed when running after register +// allocation. 
+ +#define INCLUDE_ALGORITHM +#define INCLUDE_FUNCTIONAL +#include "config.h" +#include "system.h" +#include "coretypes.h" +#include "backend.h" +#include "rtl.h" +#include "df.h" +#include "rtl-ssa.h" +#include "print-rtl.h" +#include "tree-pass.h" +#include "cfgcleanup.h" +#include "target.h" + +using namespace rtl_ssa; + +namespace { +const pass_data pass_data_late_combine = +{ + RTL_PASS, // type + "late_combine", // name + OPTGROUP_NONE, // optinfo_flags + TV_NONE, // tv_id + 0, // properties_required + 0, // properties_provided + 0, // properties_destroyed + 0, // todo_flags_start + TODO_df_finish, // todo_flags_finish +}; + +// Represents an attempt to substitute a single-set definition into all +// uses of the definition. +class insn_combination +{ +public: + insn_combination (set_info *, rtx, rtx); + bool run (); + array_slice<insn_change *const> use_changes () const; + +private: + use_array get_new_uses (use_info *); + bool substitute_nondebug_use (use_info *); + bool substitute_nondebug_uses (set_info *); + bool try_to_preserve_debug_info (insn_change &, use_info *); + void substitute_debug_use (use_info *); + bool substitute_note (insn_info *, rtx, bool); + void substitute_notes (insn_info *, bool); + void substitute_note_uses (use_info *); + void substitute_optional_uses (set_info *); + + // Represents the state of the function's RTL at the start of this + // combination attempt. + insn_change_watermark m_rtl_watermark; + + // Represents the rtl-ssa state at the start of this combination attempt. + obstack_watermark m_attempt; + + // The instruction that contains the definition, and that we're trying + // to delete. + insn_info *m_def_insn; + + // The definition itself. + set_info *m_def; + + // The destination and source of the single set that defines m_def. + // The destination is known to be a plain REG. + rtx m_dest; + rtx m_src; + + // Contains the full list of changes that we want to make, in reverse + // postorder. + auto_vec<insn_change *> m_nondebug_changes; +}; + +// Class that represents one run of the pass. +class late_combine +{ +public: + unsigned int execute (function *); + +private: + rtx optimizable_set (insn_info *); + bool check_register_pressure (insn_info *, rtx); + bool check_uses (set_info *, rtx); + bool combine_into_uses (insn_info *, insn_info *); + + auto_vec<insn_info *> m_worklist; +}; + +insn_combination::insn_combination (set_info *def, rtx dest, rtx src) + : m_rtl_watermark (), + m_attempt (crtl->ssa->new_change_attempt ()), + m_def_insn (def->insn ()), + m_def (def), + m_dest (dest), + m_src (src), + m_nondebug_changes () +{ +} + +array_slice<insn_change *const> +insn_combination::use_changes () const +{ + return { m_nondebug_changes.address () + 1, + m_nondebug_changes.length () - 1 }; +} + +// USE is a direct or indirect use of m_def. Return the list of uses +// that would be needed after substituting m_def into the instruction. +// The returned list is marked as invalid if USE's insn and m_def_insn +// use different definitions for the same resource (register or memory). 
+use_array +insn_combination::get_new_uses (use_info *use) +{ + auto *def = use->def (); + auto *use_insn = use->insn (); + + use_array new_uses = use_insn->uses (); + new_uses = remove_uses_of_def (m_attempt, new_uses, def); + new_uses = merge_access_arrays (m_attempt, m_def_insn->uses (), new_uses); + if (new_uses.is_valid () && use->ebb () != m_def->ebb ()) + new_uses = crtl->ssa->make_uses_available (m_attempt, new_uses, use->bb (), + use_insn->is_debug_insn ()); + return new_uses; +} + +// Start the process of trying to replace USE by substitution, given that +// USE occurs in a non-debug instruction. Check: +// +// - that the substitution can be represented in RTL +// +// - that each use of a resource (register or memory) within the new +// instruction has a consistent definition +// +// - that the new instruction is a recognized pattern +// +// - that the instruction can be placed somewhere that makes all definitions +// and uses valid, and that permits any new hard-register clobbers added +// during the recognition process +// +// Return true on success. +bool +insn_combination::substitute_nondebug_use (use_info *use) +{ + insn_info *use_insn = use->insn (); + rtx_insn *use_rtl = use_insn->rtl (); + + if (dump_file && (dump_flags & TDF_DETAILS)) + dump_insn_slim (dump_file, use->insn ()->rtl ()); + + // Check that we can change the instruction pattern. Leave recognition + // of the result till later. + insn_propagation prop (use_rtl, m_dest, m_src); + if (!prop.apply_to_pattern (&PATTERN (use_rtl)) + || prop.num_replacements == 0) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "-- RTL substitution failed\n"); + return false; + } + + use_array new_uses = get_new_uses (use); + if (!new_uses.is_valid ()) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "-- could not prove that all sources" + " are available\n"); + return false; + } + + // Create a tentative change for the use. + auto *where = XOBNEW (m_attempt, insn_change); + auto *use_change = new (where) insn_change (use_insn); + m_nondebug_changes.safe_push (use_change); + use_change->new_uses = new_uses; + + struct local_ignore : ignore_nothing + { + local_ignore (const set_info *def, const insn_info *use_insn) + : m_def (def), m_use_insn (use_insn) {} + + // We don't limit the number of insns per optimization, so ignoring all + // insns for all insns would lead to quadratic complexity. Just ignore + // the use and definition, which should be enough for most purposes. + bool + should_ignore_insn (const insn_info *insn) + { + return insn == m_def->insn () || insn == m_use_insn; + } + + // Ignore the definition that we're removing, and all uses of it. + bool should_ignore_def (const def_info *def) { return def == m_def; } + + const set_info *m_def; + const insn_info *m_use_insn; + }; + + auto ignore = local_ignore (m_def, use_insn); + + // Moving instructions before register allocation could increase + // register pressure. Only try moving them after RA. 
+  if (reload_completed && can_move_insn_p (use_insn))
+    use_change->move_range = { use_insn->bb ()->head_insn (),
+                               use_insn->ebb ()->last_bb ()->end_insn () };
+  if (!restrict_movement (*use_change, ignore))
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+        fprintf (dump_file, "-- cannot satisfy all definitions and uses"
+                 " in insn %d\n", INSN_UID (use_insn->rtl ()));
+      return false;
+    }
+
+  if (!recog (m_attempt, *use_change, ignore))
+    return false;
+
+  return true;
+}
+
+// Apply substitute_nondebug_use to all direct and indirect uses of DEF.
+// There will be at most one level of indirection.
+bool
+insn_combination::substitute_nondebug_uses (set_info *def)
+{
+  for (use_info *use : def->nondebug_insn_uses ())
+    if (!use->is_live_out_use ()
+        && !use->only_occurs_in_notes ()
+        && !substitute_nondebug_use (use))
+      return false;
+
+  for (use_info *use : def->phi_uses ())
+    if (!substitute_nondebug_uses (use->phi ()))
+      return false;
+
+  return true;
+}
+
+// USE_CHANGE.insn () is a debug instruction that uses m_def.  Try to
+// substitute the definition into the instruction and try to describe
+// the result in USE_CHANGE.  Return true on success.  Failure means that
+// the instruction must be reset instead.
+bool
+insn_combination::try_to_preserve_debug_info (insn_change &use_change,
+                                              use_info *use)
+{
+  // Punt on unsimplified subregs of hard registers.  In that case,
+  // propagation can succeed and create a wider reg than the one we
+  // started with.
+  if (HARD_REGISTER_NUM_P (use->regno ())
+      && use->includes_subregs ())
+    return false;
+
+  insn_info *use_insn = use_change.insn ();
+  rtx_insn *use_rtl = use_insn->rtl ();
+
+  use_change.new_uses = get_new_uses (use);
+  if (!use_change.new_uses.is_valid ()
+      || !restrict_movement (use_change))
+    return false;
+
+  insn_propagation prop (use_rtl, m_dest, m_src);
+  return prop.apply_to_pattern (&INSN_VAR_LOCATION_LOC (use_rtl));
+}
+
+// USE_INSN is a debug instruction that uses m_def.  Update it to reflect
+// the fact that m_def is going to disappear.  Try to preserve the source
+// value if possible, but reset the instruction if not.
+void
+insn_combination::substitute_debug_use (use_info *use)
+{
+  auto *use_insn = use->insn ();
+  rtx_insn *use_rtl = use_insn->rtl ();
+
+  auto use_change = insn_change (use_insn);
+  if (!try_to_preserve_debug_info (use_change, use))
+    {
+      use_change.new_uses = {};
+      use_change.move_range = use_change.insn ();
+      INSN_VAR_LOCATION_LOC (use_rtl) = gen_rtx_UNKNOWN_VAR_LOC ();
+    }
+  insn_change *changes[] = { &use_change };
+  crtl->ssa->change_insns (changes);
+}
+
+// NOTE is a reg note of USE_INSN, which previously used m_def.  Update
+// the note to reflect the fact that m_def is going to disappear.  Return
+// true on success, or false if the note must be deleted.
+//
+// CAN_PROPAGATE is true if m_dest can be replaced with m_src.
+bool
+insn_combination::substitute_note (insn_info *use_insn, rtx note,
+                                   bool can_propagate)
+{
+  if (REG_NOTE_KIND (note) == REG_EQUAL
+      || REG_NOTE_KIND (note) == REG_EQUIV)
+    {
+      insn_propagation prop (use_insn->rtl (), m_dest, m_src);
+      return (prop.apply_to_rvalue (&XEXP (note, 0))
+              && (can_propagate || prop.num_replacements == 0));
+    }
+  return true;
+}
+
+// Update USE_INSN's notes after deciding to go ahead with the optimization.
+// CAN_PROPAGATE is true if m_dest can be replaced with m_src.
+void
+insn_combination::substitute_notes (insn_info *use_insn, bool can_propagate)
+{
+  rtx_insn *use_rtl = use_insn->rtl ();
+  rtx *ptr = &REG_NOTES (use_rtl);
+  while (rtx note = *ptr)
+    {
+      if (substitute_note (use_insn, note, can_propagate))
+        ptr = &XEXP (note, 1);
+      else
+        *ptr = XEXP (note, 1);
+    }
+}
+
+// We've decided to go ahead with the substitution.  Update all REG_NOTES
+// involving USE.
+void
+insn_combination::substitute_note_uses (use_info *use)
+{
+  insn_info *use_insn = use->insn ();
+
+  bool can_propagate = true;
+  if (use->only_occurs_in_notes ())
+    {
+      // The only uses are in notes.  Try to keep the note if we can,
+      // but removing it is better than aborting the optimization.
+      insn_change use_change (use_insn);
+      use_change.new_uses = get_new_uses (use);
+      if (!use_change.new_uses.is_valid ()
+          || !restrict_movement (use_change))
+        {
+          use_change.move_range = use_insn;
+          use_change.new_uses = remove_uses_of_def (m_attempt,
+                                                    use_insn->uses (),
+                                                    use->def ());
+          can_propagate = false;
+        }
+      if (dump_file && (dump_flags & TDF_DETAILS))
+        {
+          fprintf (dump_file, "%s notes in:\n",
+                   can_propagate ? "updating" : "removing");
+          dump_insn_slim (dump_file, use_insn->rtl ());
+        }
+      substitute_notes (use_insn, can_propagate);
+      insn_change *changes[] = { &use_change };
+      crtl->ssa->change_insns (changes);
+    }
+  else
+    // We've already decided to update the insn's pattern and know that m_src
+    // will be available at the insn's new location.  Now update its notes.
+    substitute_notes (use_insn, can_propagate);
+}
+
+// We've decided to go ahead with the substitution and we've dealt with
+// all uses that occur in the patterns of non-debug insns.  Update all
+// other uses for the fact that m_def is about to disappear.
+void
+insn_combination::substitute_optional_uses (set_info *def)
+{
+  if (auto insn_uses = def->all_insn_uses ())
+    {
+      use_info *use = *insn_uses.begin ();
+      while (use)
+        {
+          use_info *next_use = use->next_any_insn_use ();
+          if (use->is_in_debug_insn ())
+            substitute_debug_use (use);
+          else if (!use->is_live_out_use ())
+            substitute_note_uses (use);
+          use = next_use;
+        }
+    }
+  for (use_info *use : def->phi_uses ())
+    substitute_optional_uses (use->phi ());
+}
+
+// Try to perform the substitution.  Return true on success.
+bool
+insn_combination::run ()
+{
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "\ntrying to combine definition of r%d in:\n",
+               m_def->regno ());
+      dump_insn_slim (dump_file, m_def_insn->rtl ());
+      fprintf (dump_file, "into:\n");
+    }
+
+  auto def_change = insn_change::delete_insn (m_def_insn);
+  m_nondebug_changes.safe_push (&def_change);
+
+  if (!substitute_nondebug_uses (m_def)
+      || !changes_are_worthwhile (m_nondebug_changes)
+      || !crtl->ssa->verify_insn_changes (m_nondebug_changes))
+    return false;
+
+  substitute_optional_uses (m_def);
+
+  confirm_change_group ();
+  crtl->ssa->change_insns (m_nondebug_changes);
+  return true;
+}
+
+// See whether INSN is a single_set that we can optimize.  Return the
+// set if so, otherwise return null.
+rtx
+late_combine::optimizable_set (insn_info *insn)
+{
+  if (!insn->can_be_optimized ()
+      || insn->is_asm ()
+      || insn->is_call ()
+      || insn->has_volatile_refs ()
+      || insn->has_pre_post_modify ()
+      || !can_move_insn_p (insn))
+    return NULL_RTX;
+
+  return single_set (insn->rtl ());
+}
+
+// Suppose that we can replace all uses of SET_DEST (SET) with SET_SRC (SET),
+// where SET occurs in INSN.  Return true if doing so is not likely to
+// increase register pressure.
+bool
+late_combine::check_register_pressure (insn_info *insn, rtx set)
+{
+  // Plain register-to-register moves do not establish a register class
+  // preference and have no well-defined effect on the register allocator.
+  // If changes in register class are needed, the register allocator is
+  // in the best position to place those changes.  If no change in
+  // register class is needed, then the optimization reduces register
+  // pressure if SET_SRC (set) was already live at uses, otherwise the
+  // optimization is pressure-neutral.
+  rtx src = SET_SRC (set);
+  if (REG_P (src))
+    return true;
+
+  // On the same basis, substituting a SET_SRC that contains a single
+  // pseudo register either reduces pressure or is pressure-neutral,
+  // subject to the constraints below.  We would need to do more
+  // analysis for SET_SRCs that use more than one pseudo register.
+  unsigned int nregs = 0;
+  for (auto *use : insn->uses ())
+    if (use->is_reg ()
+        && !HARD_REGISTER_NUM_P (use->regno ())
+        && !use->only_occurs_in_notes ())
+      if (++nregs > 1)
+        return false;
+
+  // If there are no pseudo registers in SET_SRC then the optimization
+  // should improve register pressure.
+  if (nregs == 0)
+    return true;
+
+  // We'd be substituting (set (reg R1) SRC) where SRC is known to
+  // contain a single pseudo register R2.  Assume for simplicity that
+  // each new use of R2 would need to be in the same class C as the
+  // current use of R2.  If, for a realistic allocation, C is a
+  // non-strict superset of R1's register class, the effect on
+  // register pressure should be positive or neutral.  If instead
+  // R1 occupies a different register class from R2, or if R1 has
+  // more allocation freedom than R2, then there's a higher risk that
+  // the effect on register pressure could be negative.
+  //
+  // First use constrain_operands to get the most likely choice of
+  // alternative.  For simplicity, just handle the case where the
+  // output operand is operand 0.
+  extract_insn (insn->rtl ());
+  rtx dest = SET_DEST (set);
+  if (recog_data.n_operands == 0
+      || recog_data.operand[0] != dest)
+    return false;
+
+  if (!constrain_operands (0, get_enabled_alternatives (insn->rtl ())))
+    return false;
+
+  preprocess_constraints (insn->rtl ());
+  auto *alt = which_op_alt ();
+  auto dest_class = alt[0].cl;
+
+  // Check operands 1 and above.
+  auto check_src = [&] (unsigned int i)
+    {
+      if (recog_data.is_operator[i])
+        return true;
+
+      rtx op = recog_data.operand[i];
+      if (CONSTANT_P (op))
+        return true;
+
+      if (SUBREG_P (op))
+        op = SUBREG_REG (op);
+      if (REG_P (op))
+        {
+          // Ignore hard registers.  We've already rejected uses of non-fixed
+          // hard registers in the SET_SRC.
+          if (HARD_REGISTER_P (op))
+            return true;
+
+          // Make sure that the source operand's class is at least as
+          // permissive as the destination operand's class.
+          auto src_class = alternative_class (alt, i);
+          if (!reg_class_subset_p (dest_class, src_class))
+            return false;
+
+          // Make sure that the source operand occupies no more hard
+          // registers than the destination operand.  This mostly matters
+          // for subregs.
+          if (targetm.class_max_nregs (dest_class, GET_MODE (dest))
+              < targetm.class_max_nregs (src_class, GET_MODE (op)))
+            return false;
+
+          return true;
+        }
+      return false;
+    };
+  for (int i = 1; i < recog_data.n_operands; ++i)
+    if (recog_data.operand_type[i] != OP_OUT && !check_src (i))
+      return false;
+
+  return true;
+}
+
+// Check uses of DEF to see whether there is anything obvious that
+// prevents the substitution of SET into uses of DEF.
+bool
+late_combine::check_uses (set_info *def, rtx set)
+{
+  use_info *prev_use = nullptr;
+  for (use_info *use : def->nondebug_insn_uses ())
+    {
+      insn_info *use_insn = use->insn ();
+
+      if (use->is_live_out_use ())
+        continue;
+      if (use->only_occurs_in_notes ())
+        continue;
+
+      // We cannot replace all uses if the value is live on exit.
+      if (use->is_artificial ())
+        return false;
+
+      // Avoid increasing the complexity of instructions that
+      // reference allocatable hard registers.
+      if (!REG_P (SET_SRC (set))
+          && !reload_completed
+          && (accesses_include_nonfixed_hard_registers (use_insn->uses ())
+              || accesses_include_nonfixed_hard_registers (use_insn->defs ())))
+        return false;
+
+      // Don't substitute into a non-local goto, since it can then be
+      // treated as a jump to local label, e.g. in shorten_branches.
+      // ??? But this shouldn't be necessary.
+      if (use_insn->is_jump ()
+          && find_reg_note (use_insn->rtl (), REG_NON_LOCAL_GOTO, NULL_RTX))
+        return false;
+
+      // Reject cases where one of the uses is a function argument.
+      // The combine attempt should fail anyway, but this is a common
+      // case that is easy to check early.
+      if (use_insn->is_call ()
+          && HARD_REGISTER_P (SET_DEST (set))
+          && find_reg_fusage (use_insn->rtl (), USE, SET_DEST (set)))
+        return false;
+
+      // We'll keep the uses in their original order, even if we move
+      // them relative to other instructions.  Make sure that non-final
+      // uses do not change any values that occur in the SET_SRC.
+      if (prev_use && prev_use->ebb () == use->ebb ())
+        {
+          def_info *ultimate_def = look_through_degenerate_phi (def);
+          if (insn_clobbers_resources (prev_use->insn (),
+                                       ultimate_def->insn ()->uses ()))
+            return false;
+        }
+
+      prev_use = use;
+    }
+
+  for (use_info *use : def->phi_uses ())
+    if (!use->phi ()->is_degenerate ()
+        || !check_uses (use->phi (), set))
+      return false;
+
+  return true;
+}
+
+// Try to remove INSN by substituting a definition into all uses.
+// If the optimization moves any instructions before CURSOR, add those
+// instructions to the end of m_worklist.
+bool
+late_combine::combine_into_uses (insn_info *insn, insn_info *cursor)
+{
+  // For simplicity, don't try to handle sets of multiple hard registers.
+  // And for correctness, don't remove any assignments to the stack or
+  // frame pointers, since that would implicitly change the set of valid
+  // memory locations between this assignment and the next.
+  //
+  // Removing assignments to the hard frame pointer would invalidate
+  // backtraces.
+  set_info *def = single_set_info (insn);
+  if (!def
+      || !def->is_reg ()
+      || def->regno () == STACK_POINTER_REGNUM
+      || def->regno () == FRAME_POINTER_REGNUM
+      || def->regno () == HARD_FRAME_POINTER_REGNUM)
+    return false;
+
+  rtx set = optimizable_set (insn);
+  if (!set)
+    return false;
+
+  // For simplicity, don't try to handle subreg destinations.
+  rtx dest = SET_DEST (set);
+  if (!REG_P (dest) || def->regno () != REGNO (dest))
+    return false;
+
+  // Don't prolong the live ranges of allocatable hard registers, or put
+  // them into more complicated instructions.  Failing to prevent this
+  // could lead to spill failures, or at least to worse register allocation.
+  if (!reload_completed
+      && accesses_include_nonfixed_hard_registers (insn->uses ()))
+    return false;
+
+  if (!reload_completed && !check_register_pressure (insn, set))
+    return false;
+
+  if (!check_uses (def, set))
+    return false;
+
+  insn_combination combination (def, SET_DEST (set), SET_SRC (set));
+  if (!combination.run ())
+    return false;
+
+  for (auto *use_change : combination.use_changes ())
+    if (*use_change->insn () < *cursor)
+      m_worklist.safe_push (use_change->insn ());
+    else
+      break;
+  return true;
+}
+
+// Run the pass on function FN.
+unsigned int
+late_combine::execute (function *fn)
+{
+  // Initialization.
+  calculate_dominance_info (CDI_DOMINATORS);
+  df_analyze ();
+  crtl->ssa = new rtl_ssa::function_info (fn);
+  // Don't allow memory_operand to match volatile MEMs.
+  init_recog_no_volatile ();
+
+  insn_info *insn = *crtl->ssa->nondebug_insns ().begin ();
+  while (insn)
+    {
+      if (!insn->is_artificial ())
+        {
+          insn_info *prev = insn->prev_nondebug_insn ();
+          if (combine_into_uses (insn, prev))
+            {
+              // Any instructions that get added to the worklist were
+              // previously after PREV.  Thus if we were able to move
+              // an instruction X before PREV during one combination,
+              // X cannot depend on any instructions that we move before
+              // PREV during subsequent combinations.  This means that
+              // the worklist should be free of backwards dependencies,
+              // even if it isn't necessarily in RPO.
+              for (unsigned int i = 0; i < m_worklist.length (); ++i)
+                combine_into_uses (m_worklist[i], prev);
+              m_worklist.truncate (0);
+              insn = prev;
+            }
+        }
+      insn = insn->next_nondebug_insn ();
+    }
+
+  // Finalization.
+  if (crtl->ssa->perform_pending_updates ())
+    cleanup_cfg (0);
+  // Make the recognizer allow volatile MEMs again.
+  init_recog ();
+  free_dominance_info (CDI_DOMINATORS);
+  return 0;
+}
+
+class pass_late_combine : public rtl_opt_pass
+{
+public:
+  pass_late_combine (gcc::context *ctxt)
+    : rtl_opt_pass (pass_data_late_combine, ctxt)
+  {}
+
+  // opt_pass methods:
+  opt_pass *clone () override { return new pass_late_combine (m_ctxt); }
+  bool gate (function *) override { return flag_late_combine_instructions; }
+  unsigned int execute (function *) override;
+};
+
+unsigned int
+pass_late_combine::execute (function *fn)
+{
+  return late_combine ().execute (fn);
+}
+
+} // end namespace
+
+// Create a new late-combine pass instance.
+
+rtl_opt_pass *
+make_pass_late_combine (gcc::context *ctxt)
+{
+  return new pass_late_combine (ctxt);
+}
diff --git a/gcc/opts.cc b/gcc/opts.cc
index 1b1b46455af..915bce88fd6 100644
--- a/gcc/opts.cc
+++ b/gcc/opts.cc
@@ -664,6 +664,7 @@ static const struct default_options default_options_table[] =
       VECT_COST_MODEL_VERY_CHEAP },
     { OPT_LEVELS_2_PLUS, OPT_finline_functions, NULL, 1 },
     { OPT_LEVELS_2_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },
+    { OPT_LEVELS_2_PLUS, OPT_flate_combine_instructions, NULL, 1 },
 
     /* -O2 and above optimizations, but not -Os or -Og.  */
     { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_falign_functions, NULL, 1 },
diff --git a/gcc/passes.def b/gcc/passes.def
index 041229e47a6..13c9dc34ddf 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -493,6 +493,7 @@ along with GCC; see the file COPYING3.
If not see NEXT_PASS (pass_initialize_regs); NEXT_PASS (pass_ud_rtl_dce); NEXT_PASS (pass_combine); + NEXT_PASS (pass_late_combine); NEXT_PASS (pass_if_after_combine); NEXT_PASS (pass_jump_after_combine); NEXT_PASS (pass_partition_blocks); @@ -512,6 +513,7 @@ along with GCC; see the file COPYING3. If not see NEXT_PASS (pass_postreload); PUSH_INSERT_PASSES_WITHIN (pass_postreload) NEXT_PASS (pass_postreload_cse); + NEXT_PASS (pass_late_combine); NEXT_PASS (pass_gcse2); NEXT_PASS (pass_split_after_reload); NEXT_PASS (pass_ree); diff --git a/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-1.c b/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-1.c index f290b9ccbdc..a95637abbe5 100644 --- a/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-1.c +++ b/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-1.c @@ -25,5 +25,5 @@ bar (long a) } /* { dg-final { scan-rtl-dump "Will split live ranges of parameters" "ira" } } */ -/* { dg-final { scan-rtl-dump "Split live-range of register" "ira" { xfail *-*-* } } } */ +/* { dg-final { scan-rtl-dump "Split live-range of register" "ira" { xfail { ! aarch64*-*-* } } } } */ /* { dg-final { scan-rtl-dump "Performing shrink-wrapping" "pro_and_epilogue" { xfail powerpc*-*-* } } } */ diff --git a/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-2.c b/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-2.c index 6212c95585d..0690e036eaa 100644 --- a/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-2.c +++ b/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-2.c @@ -30,6 +30,6 @@ bar (long a) } /* { dg-final { scan-rtl-dump "Will split live ranges of parameters" "ira" } } */ -/* { dg-final { scan-rtl-dump "Split live-range of register" "ira" { xfail *-*-* } } } */ +/* { dg-final { scan-rtl-dump "Split live-range of register" "ira" { xfail { ! aarch64*-*-* } } } } */ /* XFAIL due to PR70681. */ /* { dg-final { scan-rtl-dump "Performing shrink-wrapping" "pro_and_epilogue" { xfail arm*-*-* powerpc*-*-* } } } */ diff --git a/gcc/testsuite/gcc.dg/stack-check-4.c b/gcc/testsuite/gcc.dg/stack-check-4.c index b0c5c61972f..052d2abc2f1 100644 --- a/gcc/testsuite/gcc.dg/stack-check-4.c +++ b/gcc/testsuite/gcc.dg/stack-check-4.c @@ -20,7 +20,7 @@ scan for. We scan for both the positive and negative cases. 
*/ /* { dg-do compile } */ -/* { dg-options "-O2 -fstack-clash-protection -fdump-rtl-pro_and_epilogue -fno-optimize-sibling-calls" } */ +/* { dg-options "-O2 -fstack-clash-protection -fdump-rtl-pro_and_epilogue -fno-optimize-sibling-calls -fno-shrink-wrap" } */ /* { dg-require-effective-target supports_stack_clash_protection } */ extern void arf (char *); diff --git a/gcc/testsuite/gcc.target/aarch64/bitfield-bitint-abi-align16.c b/gcc/testsuite/gcc.target/aarch64/bitfield-bitint-abi-align16.c index 4a228b0a1ce..c29a230a771 100644 --- a/gcc/testsuite/gcc.target/aarch64/bitfield-bitint-abi-align16.c +++ b/gcc/testsuite/gcc.target/aarch64/bitfield-bitint-abi-align16.c @@ -1,5 +1,5 @@ /* { dg-do compile { target bitint } } */ -/* { dg-additional-options "-std=c23 -O2 -fno-stack-protector -save-temps -fno-schedule-insns -fno-schedule-insns2" } */ +/* { dg-additional-options "-std=c23 -O2 -fno-stack-protector -save-temps -fno-schedule-insns -fno-schedule-insns2 -fno-late-combine-instructions" } */ /* { dg-final { check-function-bodies "**" "" "" } } */ #define ALIGN 16 diff --git a/gcc/testsuite/gcc.target/aarch64/bitfield-bitint-abi-align8.c b/gcc/testsuite/gcc.target/aarch64/bitfield-bitint-abi-align8.c index e7f773640f0..13ffbf416ca 100644 --- a/gcc/testsuite/gcc.target/aarch64/bitfield-bitint-abi-align8.c +++ b/gcc/testsuite/gcc.target/aarch64/bitfield-bitint-abi-align8.c @@ -1,5 +1,5 @@ /* { dg-do compile { target bitint } } */ -/* { dg-additional-options "-std=c23 -O2 -fno-stack-protector -save-temps -fno-schedule-insns -fno-schedule-insns2" } */ +/* { dg-additional-options "-std=c23 -O2 -fno-stack-protector -save-temps -fno-schedule-insns -fno-schedule-insns2 -fno-late-combine-instructions" } */ /* { dg-final { check-function-bodies "**" "" "" } } */ #define ALIGN 8 diff --git a/gcc/testsuite/gcc.target/aarch64/pr106594_1.c b/gcc/testsuite/gcc.target/aarch64/pr106594_1.c new file mode 100644 index 00000000000..71bcafcb44f --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/pr106594_1.c @@ -0,0 +1,20 @@ +/* { dg-options "-O2" } */ + +extern const int constellation_64qam[64]; + +void foo(int nbits, + const char *p_src, + int *p_dst) { + + while (nbits > 0U) { + char first = *p_src++; + + char index1 = ((first & 0x3) << 4) | (first >> 4); + + *p_dst++ = constellation_64qam[index1]; + + nbits--; + } +} + +/* { dg-final { scan-assembler {(?n)\tldr\t.*\[x[0-9]+, w[0-9]+, sxtw #?2\]} } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_3.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_3.c index 0d620a30d5d..b537c6154a3 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_3.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_3.c @@ -27,9 +27,9 @@ TEST_ALL (DEF_LOOP) /* { dg-final { scan-assembler-times {\tasrd\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, #4\n} 2 } } */ /* { dg-final { scan-assembler-times {\tasrd\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, #4\n} 1 } } */ -/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.b, p[0-7]/z, z[0-9]+\.b\n} 3 { xfail *-*-* } } } */ -/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.h, p[0-7]/z, z[0-9]+\.h\n} 2 { xfail *-*-* } } } */ -/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.s, p[0-7]/z, z[0-9]+\.s\n} 1 { xfail *-*-* } } } */ +/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.b, p[0-7]/z, z[0-9]+\.b\n} 3 } } */ +/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.h, p[0-7]/z, z[0-9]+\.h\n} 2 } } */ +/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.s, p[0-7]/z, z[0-9]+\.s\n} 1 } } */ -/* { 
dg-final { scan-assembler-not {\tmov\tz} { xfail *-*-* } } } */ -/* { dg-final { scan-assembler-not {\tsel\t} { xfail *-*-* } } } */ +/* { dg-final { scan-assembler-not {\tmov\tz} } } */ +/* { dg-final { scan-assembler-not {\tsel\t} } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_3.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_3.c index a294effd4a9..cff806c278d 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_3.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_3.c @@ -30,11 +30,9 @@ TEST_ALL (DEF_LOOP) /* { dg-final { scan-assembler-times {\tscvtf\tz[0-9]+\.d, p[0-7]/m,} 1 } } */ /* { dg-final { scan-assembler-times {\tucvtf\tz[0-9]+\.d, p[0-7]/m,} 1 } } */ -/* Really we should be able to use MOVPRFX /z here, but at the moment - we're relying on combine to merge a SEL and an arithmetic operation, - and the SEL doesn't allow the "false" value to be zero when the "true" - value is a register. */ -/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+, z[0-9]+\n} 6 } } */ +/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.h, p[0-7]/z,} 2 } } */ +/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.s, p[0-7]/z,} 2 } } */ +/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.d, p[0-7]/z,} 2 } } */ /* { dg-final { scan-assembler-not {\tmov\tz[^\n]*z} } } */ /* { dg-final { scan-assembler-not {\tsel\t} } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_6.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_6.c index 6541a2ea49d..abf0a2e832f 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_6.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_6.c @@ -30,11 +30,9 @@ TEST_ALL (DEF_LOOP) /* { dg-final { scan-assembler-times {\tfcvtzs\tz[0-9]+\.d, p[0-7]/m,} 1 } } */ /* { dg-final { scan-assembler-times {\tfcvtzu\tz[0-9]+\.d, p[0-7]/m,} 1 } } */ -/* Really we should be able to use MOVPRFX /z here, but at the moment - we're relying on combine to merge a SEL and an arithmetic operation, - and the SEL doesn't allow the "false" value to be zero when the "true" - value is a register. */ -/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+, z[0-9]+\n} 6 } } */ +/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.h, p[0-7]/z,} 2 } } */ +/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.s, p[0-7]/z,} 2 } } */ +/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.d, p[0-7]/z,} 2 } } */ /* { dg-final { scan-assembler-not {\tmov\tz[^\n]*z} } } */ /* { dg-final { scan-assembler-not {\tsel\t} } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_fabd_5.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_fabd_5.c index e66477b3bce..401201b315a 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/cond_fabd_5.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_fabd_5.c @@ -24,12 +24,9 @@ TEST_ALL (DEF_LOOP) /* { dg-final { scan-assembler-times {\tfabd\tz[0-9]+\.s, p[0-7]/m,} 1 } } */ /* { dg-final { scan-assembler-times {\tfabd\tz[0-9]+\.d, p[0-7]/m,} 1 } } */ -/* Really we should be able to use MOVPRFX /Z here, but at the moment - we're relying on combine to merge a SEL and an arithmetic operation, - and the SEL doesn't allow zero operands. 
*/ -/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.h, p[0-7]/z, z[0-9]+\.h\n} 1 { xfail *-*-* } } } */ -/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.s, p[0-7]/z, z[0-9]+\.s\n} 1 { xfail *-*-* } } } */ -/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.d, p[0-7]/z, z[0-9]+\.d\n} 1 { xfail *-*-* } } } */ +/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.h, p[0-7]/z, z[0-9]+\.h\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.s, p[0-7]/z, z[0-9]+\.s\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.d, p[0-7]/z, z[0-9]+\.d\n} 1 } } */ /* { dg-final { scan-assembler-not {\tmov\tz[^,]*z} } } */ -/* { dg-final { scan-assembler-not {\tsel\t} { xfail *-*-* } } } */ +/* { dg-final { scan-assembler-not {\tsel\t} } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_4.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_4.c index a491f899088..cbb957bffa4 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_4.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_4.c @@ -52,15 +52,10 @@ TEST_ALL (DEF_LOOP) /* { dg-final { scan-assembler-times {\tfneg\tz[0-9]+\.s, p[0-7]/m,} 1 } } */ /* { dg-final { scan-assembler-times {\tfneg\tz[0-9]+\.d, p[0-7]/m,} 1 } } */ -/* Really we should be able to use MOVPRFX /z here, but at the moment - we're relying on combine to merge a SEL and an arithmetic operation, - and the SEL doesn't allow the "false" value to be zero when the "true" - value is a register. */ -/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+, z[0-9]+\n} 7 } } */ -/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.b, p[0-7]/z, z[0-9]+\.b} 1 } } */ -/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.h, p[0-7]/z, z[0-9]+\.h} 2 } } */ -/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.s, p[0-7]/z, z[0-9]+\.s} 2 } } */ -/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.d, p[0-7]/z, z[0-9]+\.d} 2 } } */ +/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.b, p[0-7]/z, z[0-9]+\.b} 2 } } */ +/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.h, p[0-7]/z, z[0-9]+\.h} 4 } } */ +/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.s, p[0-7]/z, z[0-9]+\.s} 4 } } */ +/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.d, p[0-7]/z, z[0-9]+\.d} 4 } } */ /* { dg-final { scan-assembler-not {\tmov\tz[^\n]*z} } } */ /* { dg-final { scan-assembler-not {\tsel\t} } } */ diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h index edebb2be245..38902b1b01b 100644 --- a/gcc/tree-pass.h +++ b/gcc/tree-pass.h @@ -615,6 +615,7 @@ extern rtl_opt_pass *make_pass_branch_prob (gcc::context *ctxt); extern rtl_opt_pass *make_pass_value_profile_transformations (gcc::context *ctxt); extern rtl_opt_pass *make_pass_postreload_cse (gcc::context *ctxt); +extern rtl_opt_pass *make_pass_late_combine (gcc::context *ctxt); extern rtl_opt_pass *make_pass_gcse2 (gcc::context *ctxt); extern rtl_opt_pass *make_pass_split_after_reload (gcc::context *ctxt); extern rtl_opt_pass *make_pass_thread_prologue_and_epilogue (gcc::context