From patchwork Thu Jun 20 13:34:12 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 1950175
From: Richard Sandiford
To: jlaw@ventanamicro.com, gcc-patches@gcc.gnu.org
Cc: Richard Sandiford
Subject: [PATCH 0/6] Add a late-combine pass
Date: Thu, 20 Jun 2024 14:34:12 +0100
Message-Id: <20240620133418.350772-1-richard.sandiford@arm.com>

This series is a resubmission of the late-combine work.  I've fixed
some bugs that Jeff's cross-target CI found last time and some others
that I hit since then.  I've also removed a source of quadraticness
(oops!).  Doing that in turn drove some tweaks to the rtl-ssa scan
routines.

The complexity of the new pass should be amortised O(n1 log(n2)),
where n1 is the total number of input operands in the function and
n2 is the number of instructions.  The log(n2) component comes from
searching call clobbers and is very much a worst case.  We therefore
shouldn't need a --param to limit the optimisation.

I think the main comment from last time was that we should enable
the pass by default on most targets.  If there is a known reason why
the pass doesn't work on a particular target, we should default to
off for that specific target and file a bug to track the problem.

The only targets that I know need to be handled in this way are
i386, rs6000 and xtensa.  See the covering note in the last patch
for details.  If the series is OK, I'll file PRs for those targets
after pushing the patches.

Tested on aarch64-linux-gnu and x86_64-linux-gnu (somewhat of a
token gesture given the default-off for x86_64).  Also tested by
compiling one target per CPU directory and comparing the assembly
output for parts of the GCC testsuite.  This is just a way of getting
a flavour of how the pass performs; it obviously isn't a meaningful
benchmark.  All targets seemed to improve on average, as described
in the covering note to the last patch.

The original motivation for the pass was to fix things like PR106594.
However, it also helps to reclaim some of the optimisations that were
lost in r15-268.  Please let me know if there are some cases that the
pass fails to reclaim.

The series depends on Gui Haochen's insn_cost fix.

OK to install?

Thanks to Jeff for the help with testing the series.

Richard

Richard Sandiford (6):
  rtl-ssa: Rework _ignoring interfaces
  rtl-ssa: Don't cost no-op moves
  iq2000: Fix test and branch instructions
  sh: Make *minus_plus_one work after RA
  xstormy16: Fix xs_hi_nonmemory_operand
  Add a late-combine pass [PR106594]

 gcc/Makefile.in                               |   1 +
 gcc/common.opt                                |   5 +
 gcc/config/aarch64/aarch64-cc-fusion.cc       |   4 +-
 gcc/config/i386/i386-options.cc               |   4 +
 gcc/config/iq2000/iq2000.cc                   |   2 +-
 gcc/config/iq2000/iq2000.md                   |   4 +-
 gcc/config/rs6000/rs6000.cc                   |   8 +
 gcc/config/sh/sh.md                           |   6 +-
 gcc/config/stormy16/predicates.md             |   2 +-
 gcc/config/xtensa/xtensa.cc                   |  11 +
 gcc/doc/invoke.texi                           |  11 +-
 gcc/doc/rtl.texi                              |  14 +-
 gcc/late-combine.cc                           | 747 ++++++++++++++++++
 gcc/opts.cc                                   |   1 +
 gcc/pair-fusion.cc                            |  34 +-
 gcc/passes.def                                |   2 +
 gcc/rtl-ssa.h                                 |   1 +
 gcc/rtl-ssa/access-utils.h                    | 145 ++--
 gcc/rtl-ssa/change-utils.h                    |  67 +-
 gcc/rtl-ssa/changes.cc                        |   6 +-
 gcc/rtl-ssa/changes.h                         |  13 -
 gcc/rtl-ssa/functions.h                       |  16 +-
 gcc/rtl-ssa/insn-utils.h                      |   8 -
 gcc/rtl-ssa/insns.cc                          |   7 +-
 gcc/rtl-ssa/insns.h                           |  12 -
 gcc/rtl-ssa/member-fns.inl                    |  35 +-
 gcc/rtl-ssa/movement.h                        | 118 ++-
 gcc/rtl-ssa/predicates.h                      |  58 ++
 gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-1.c  |   2 +-
 gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-2.c  |   2 +-
 gcc/testsuite/gcc.dg/stack-check-4.c          |   2 +-
 .../aarch64/bitfield-bitint-abi-align16.c     |   2 +-
 .../aarch64/bitfield-bitint-abi-align8.c      |   2 +-
 gcc/testsuite/gcc.target/aarch64/pr106594_1.c |  20 +
 .../gcc.target/aarch64/sve/cond_asrd_3.c      |  10 +-
 .../gcc.target/aarch64/sve/cond_convert_3.c   |   8 +-
 .../gcc.target/aarch64/sve/cond_convert_6.c   |   8 +-
 .../gcc.target/aarch64/sve/cond_fabd_5.c      |  11 +-
 .../gcc.target/aarch64/sve/cond_unary_4.c     |  13 +-
 gcc/tree-pass.h                               |   1 +
 40 files changed, 1127 insertions(+), 296 deletions(-)
 create mode 100644 gcc/late-combine.cc
 create mode 100644 gcc/rtl-ssa/predicates.h
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr106594_1.c
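
As a rough illustration of the complexity claim in the cover letter,
here is a minimal C++ sketch of where a bound of the shape
O(n1 log(n2)) can come from.  It is not code from the series:
CallPoints, Operand, call_between and count_movable are hypothetical
names, and the sorted vector of call positions is only a stand-in for
however the pass and rtl-ssa actually record call clobbers.  The only
assumption is that each of the n1 input operands triggers at most one
ordered search over at most n2 call sites.

```cpp
// Hypothetical sketch only; none of these names exist in GCC.
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Positions of call instructions, kept sorted in ascending order.
// A function with n2 instructions has at most n2 entries here.
using CallPoints = std::vector<int>;

// One input operand: where its value is defined and where it is used.
struct Operand
{
  int def_insn;
  int use_insn;
};

// Return true if a call lies strictly between DEF and USE.
// This is a single binary search, so it costs O(log n2).
bool
call_between (const CallPoints &calls, int def, int use)
{
  assert (def <= use);
  auto it = std::upper_bound (calls.begin (), calls.end (), def);
  return it != calls.end () && *it < use;
}

// Count the operands whose definition could be forwarded to the use
// without crossing a call.  One query per operand gives
// O(n1 log n2) overall, where n1 is the number of operands.
std::size_t
count_movable (const CallPoints &calls, const std::vector<Operand> &operands)
{
  std::size_t movable = 0;
  for (const Operand &op : operands)
    if (!call_between (calls, op.def_insn, op.use_insn))
      ++movable;
  return movable;
}
```

Each call_between query is one binary search, so the log(n2) factor
only shows up when a function contains many calls; with few calls the
scan is close to linear in the number of operands.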