[v2,1/2] RISC-V: avoid LUI based const materialization ... [part of PR/106265]

Apologies for the delay in getting this out. Needed to fix one ICE
with glibc build and fresh round of testing: both testsuite and SPEC
runs (which are similar to v1 in terms of Cactu gains, but some more minor
regressions elsewhere gcc). Again those seem so small that IMHO this
should still go in.

I'll investigate those next as well as an existing weirdnes in glibc tempnam
which I spotted during the debugging.

Changes since v1 [1]
 - Tighten the main conditition to avoid stack regs as destination
   (to avoid making them potentially unaligned with -2047 addend:
    this might be OK execution/ABI wise, but undesirable/ugly still
    specially when coming from compiler codegen).
 - Ensure that first alternative is always split
 - Remove "&& 1" from split condition. That was tripping up glibc build
   with illegal operands `add s0, s0, 2048`.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647877.html

---

... if the constant can be represented as sum of two S12 values.
The two S12 values could instead be fused with subsequent ADD insn.
The helps
 - avoid an additional LUI insn
 - side benefits of not clobbering a reg

e.g.
                            w/o patch             w/ patch
long                  |                     |
plus(unsigned long i) |	li	a5,4096     |
{                     |	addi	a5,a5,-2032 | addi a0, a0, 2047
   return i + 2064;   |	add	a0,a0,a5    | addi a0, a0, 17
}                     | 	ret         | ret

NOTE: In theory not having const in a standalone reg might seem less
      CSE friendly, but for workloads in consideration these mat are
      from very late LRA reloads and follow on GCSE is not doing much
      currently.

The real benefit however is seen in base+offset computation for array
accesses and especially for stack accesses which are finalized late in
optim pipeline, during LRA register allocation. Often the finalized
offsets trigger LRA reloads resulting in mind boggling repetition of
exact same insn sequence including LUI based constant materialization.

This shaves off 290 billion dynamic instrustions (QEMU icounts) in
SPEC 2017 Cactu benchmark which is over 10% of workload. In the rest of
suite, there additional 10 billion shaved, with both gains and losses
in indiv workloads as is usual with compiler changes.

 500.perlbench_r-0 |  1,214,534,029,025 | 1,212,887,959,387 |
 500.perlbench_r-1 |    740,383,419,739 |   739,280,308,163 |
 500.perlbench_r-2 |    692,074,638,817 |   691,118,734,547 |
 502.gcc_r-0       |    190,820,141,435 |   190,857,065,988 |
 502.gcc_r-1       |    225,747,660,839 |   225,809,444,357 | <- -0.02%
 502.gcc_r-2       |    220,370,089,641 |   220,406,367,876 | <- -0.03%
 502.gcc_r-3       |    179,111,460,458 |   179,135,609,723 | <- -0.02%
 502.gcc_r-4       |    219,301,546,340 |   219,320,416,956 | <- -0.01%
 503.bwaves_r-0    |    278,733,324,691 |   278,733,323,575 | <- -0.01%
 503.bwaves_r-1    |    442,397,521,282 |   442,397,519,616 |
 503.bwaves_r-2    |    344,112,218,206 |   344,112,216,760 |
 503.bwaves_r-3    |    417,561,469,153 |   417,561,467,597 |
 505.mcf_r         |    669,319,257,525 |   669,318,763,084 |
 507.cactuBSSN_r   |  2,852,767,394,456 | 2,564,736,063,742 | <+ 10.10%
 508.namd_r        |  1,855,884,342,110 | 1,855,881,110,934 |
 510.parest_r      |  1,654,525,521,053 | 1,654,402,859,174 |
 511.povray_r      |  2,990,146,655,619 | 2,990,060,324,589 |
 519.lbm_r         |  1,158,337,294,525 | 1,158,337,294,529 |
 520.omnetpp_r     |  1,021,765,791,283 | 1,026,165,661,394 |
 521.wrf_r         |  1,715,955,652,503 | 1,714,352,737,385 |
 523.xalancbmk_r   |    849,846,008,075 |   849,836,851,752 |
 525.x264_r-0      |    277,801,762,763 |   277,488,776,427 |
 525.x264_r-1      |    927,281,789,540 |   926,751,516,742 |
 525.x264_r-2      |    915,352,631,375 |   914,667,785,953 |
 526.blender_r     |  1,652,839,180,887 | 1,653,260,825,512 |
 527.cam4_r        |  1,487,053,494,925 | 1,484,526,670,770 |
 531.deepsjeng_r   |  1,641,969,526,837 | 1,642,126,598,866 |
 538.imagick_r     |  2,098,016,546,691 | 2,097,997,929,125 |
 541.leela_r       |  1,983,557,323,877 | 1,983,531,314,526 |
 544.nab_r         |  1,516,061,611,233 | 1,516,061,407,715 |
 548.exchange2_r   |  2,072,594,330,215 | 2,072,591,648,318 |
 549.fotonik3d_r   |  1,001,499,307,366 | 1,001,478,944,189 |
 554.roms_r        |  1,028,799,739,111 | 1,028,780,904,061 |
 557.xz_r-0        |    363,827,039,684 |   363,057,014,260 |
 557.xz_r-1        |    906,649,112,601 |   905,928,888,732 |
 557.xz_r-2        |    509,023,898,187 |   508,140,356,932 |
 997.specrand_fr   |        402,535,577 |       403,052,561 |
 999.specrand_ir   |        402,535,577 |       403,052,561 |

This should still be considered damage control as the real/deeper fix
would be to reduce number of LRA reloads or CSE/anchor those during
LRA constraint sub-pass (re)runs (thats a different PR/114729.

Implementation Details (for posterity)
--------------------------------------
 - basic idea is to have a splitter selected via a new predicate for constant
   being possible sum of two S12 and provide the transform.
   This is however a 2 -> 2 transform which combine can't handle.
   So we specify it using a define_insn_and_split.

 - the initial loose "i" constraint caused LRA to accept invalid insns thus
   needing a tighter new constraint as well.

 - An additional fallback alternate with catch-all "r" register
   constraint also needed to allow any "reloads" that LRA might
   require for ADDI with const larger than S12.

Testing
--------
This is testsuite clean (rv64 only).
I'll rely on post-commit CI multlib run for any possible fallout for
other setups such as rv32.

|                                               |         gcc |          g++ |     gfortran |
| rv64imafdc_zba_zbb_zbs_zicond/  lp64d/ medlow |  41 /    17 |    8 /     3 |    7 /     2 |
| rv64imafdc_zba_zbb_zbs_zicond/  lp64d/ medlow |  41 /    17 |    8 /     3 |    7 /     2 |

I also threw this into a buildroot run, it obviously boots Linux to
userspace. bloat-o-meter on glibc and kernel show overall decrease in
staic instruction counts with some minor spot increases.
These are generally in the category of

 - LUI + ADDI are 2 byte each vs. two ADD being 4 byte each.
 - Knock on effects due to inlining changes.
 - Sometimes the slightly shorter 2-insn seq in a mult-exit function
   can cause in-place epilogue duplication (vs. a jump back).
   This is slightly larger but more efficient in execution.
In summary nothing to fret about.

| linux/scripts/bloat-o-meter build-gcc-240131/target/lib/libc.so.6 \
         build-gcc-240131-new-splitter-1-variant/target/lib/libc.so.6
|
| add/remove: 0/0 grow/shrink: 21/49 up/down: 520/-3056 (-2536)
| Function                                     old     new   delta
| getnameinfo                                 2756    2892    +136
...
| tempnam                                      136     144      +8
| padzero                                      276     284      +8
...
| __GI___register_printf_specifier             284     280      -4
| __EI_xdr_array                               468     464      -4
| try_file_lock                                268     260      -8
| pthread_create@GLIBC_2                      3520    3508     -12
| __pthread_create_2_1                        3520    3508     -12
...
| _nss_files_setnetgrent                       932     904     -28
| _nss_dns_gethostbyaddr2_r                   1524    1480     -44
| build_trtable                               3312    3208    -104
| printf_positional                          25000   22580   -2420
| Total: Before=2107999, After=2105463, chg -0.12%

gcc/ChangeLog:
	* config/riscv/riscv.h: New macros to check for sum of two S12
	range.
	* config/riscv/constraints.md: New constraint.
	* config/riscv/predicates.md: New Predicate.
	* config/riscv/riscv.md: New splitter.
	* config/riscv/riscv.cc (riscv_reg_frame_related): New helper.
	* config/riscv/riscv-protos.h: New helper prototype.

gcc/testsuite/ChangeLog:
	* gcc.target/riscv/sum-of-two-s12-const-1.c: New test: checks
	for new patterns output.
	* gcc.target/riscv/sum-of-two-s12-const-2.c: Ditto.
	* gcc.target/riscv/sum-of-two-s12-const-3.c: New test: should not
	ICE.

Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
---
 gcc/config/riscv/constraints.md               |  6 +++
 gcc/config/riscv/predicates.md                |  6 +++
 gcc/config/riscv/riscv-protos.h               |  1 +
 gcc/config/riscv/riscv.cc                     | 11 +++++
 gcc/config/riscv/riscv.h                      | 15 +++++++
 gcc/config/riscv/riscv.md                     | 40 +++++++++++++++++
 .../gcc.target/riscv/sum-of-two-s12-const-1.c | 45 +++++++++++++++++++
 .../gcc.target/riscv/sum-of-two-s12-const-2.c | 15 +++++++
 .../gcc.target/riscv/sum-of-two-s12-const-3.c | 22 +++++++++
 9 files changed, 161 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/sum-of-two-s12-const-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sum-of-two-s12-const-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sum-of-two-s12-const-3.c

Message ID	20240513184932.662109-2-vineetg@rivosinc.com
State	New
Headers	show Return-Path: <gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org> X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=rivosinc-com.20230601.gappssmtp.com header.i=@rivosinc-com.20230601.gappssmtp.com header.a=rsa-sha256 header.s=20230601 header.b=WFVezGC8; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4VdT7k2xKqz1ymg for <incoming@patchwork.ozlabs.org>; Tue, 14 May 2024 04:50:10 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id A132D3870C28 for <incoming@patchwork.ozlabs.org>; Mon, 13 May 2024 18:50:08 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail-pf1-x42f.google.com (mail-pf1-x42f.google.com [IPv6:2607:f8b0:4864:20::42f]) by sourceware.org (Postfix) with ESMTPS id E79353870895 for <gcc-patches@gcc.gnu.org>; Mon, 13 May 2024 18:49:38 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org E79353870895 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=rivosinc.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=rivosinc.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org E79353870895 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2607:f8b0:4864:20::42f ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1715626182; cv=none; b=ZoUX/8RK3PmFUKmZc86YEbxOKIP3joGGuwdYooxA4/i3fkSnCBROOIyEZfQKk9D++A+2QLncYPZKZeEs8ypR0NeLdCi4lHhB/4x8TT2GyW0suvz5tRvR1gp9T5Rv3JGZ8DnynVt+b9/HV+f7uqT3DB/QCcbXlNs11+swExY7KrA= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1715626182; c=relaxed/simple; bh=7KhLu1RdeyBaXvyvgwlrEMf2Oo7OGgVxOgNzPHB5I8k=; h=DKIM-Signature:From:To:Subject:Date:Message-Id:MIME-Version; b=YwWmWMdxAoaxTu0Hl+Ir7ZlD1CqIuAfuiVj241M4JAvGCbgmV7jPn3HCl2mqu8QyhBELcNlB99ueRhqD0Syhtn3vVpzlGyIJDkroq95cw0oQzWLXKQihgeh6WqKGo3mzdeG8f8SwQHK24fwuWJ3BeD4+lGzH5avUaD7GIOpT7+g= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-pf1-x42f.google.com with SMTP id d2e1a72fcca58-6f666caf9abso159348b3a.1 for <gcc-patches@gcc.gnu.org>; Mon, 13 May 2024 11:49:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=rivosinc-com.20230601.gappssmtp.com; s=20230601; t=1715626178; x=1716230978; darn=gcc.gnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=MvQKNCzJ+qvnIFZ1hK2EJ/foOsjgKFL8cavrfBBQYmw=; b=WFVezGC8HglhH8RsVPQKNsAfYXYXlzmBBzRRY9B+EbTBWLXqUMj9iz8kczFtu4Ozod yrvuxV3k5JuZxloI1bXQRZUEO2KgQ6iqBcKmVPAfRBHOOO0NRbb8vKNGh3q7v8I6kKts qSIMfyLW/bKM8vD4ckEH6yMyItweyf4bLj3TKxYFsaDVsTzmAjPF1xwQGd7lX/GkrEc5 DgM+1MLe5gUyOkN793tVRhmoYTLkD+J99Iq0/mEUt14DHxyOjrSLqyPv+4ZUEvc8A6cp /qXLKrbeBJGAMmxAg/n5hi84O3oOvvxMvXMI6pyvBqoCZRD+lZskcp+I5xhWnWh3X8x7 zwBw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1715626178; x=1716230978; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=MvQKNCzJ+qvnIFZ1hK2EJ/foOsjgKFL8cavrfBBQYmw=; b=DwRwGw4R9p+YNyyRIrjpp1ulZQP5mkqMnJ+w7YnZUGekXT/ReXwKXwmWs1hwhvYggW HhcNW/CiB3G7/TBmIl3A4k8faohbh8KvFyZNUiwJFfYQnojytqtKR6cGImlmD0w0mW7l aEudq4PyTGYHEld/PjiA1fjzi6/vZvuKpMB+3CiSzbdFgcSYVp4uqQdziNJSi006adEh izeFK7z5TOQ5dHKc1fwnpZHzGTq60DqbDFOcYTeQW9BHlwyJUoh01I/OrMJqjGHxHKa3 /V7hRsW8XLLACjtYaXLvNAk2QWifNRWCrOpUTAksrWpd/y6BD/ppRsI/NLrtTNqjGm3u ONxA== X-Gm-Message-State: AOJu0YzstSEE7VvxeYghML8ZYMvvxOO8QYXlYfZbLoD2G8tX98O53nsH 6C8GCuiO4NrIG/I9R6+qz3B4PwIORGbk25C77DtEHc500iyyG3M7TrbSOGqjJhH3QKvnY+uJ40L N05o= X-Google-Smtp-Source: AGHT+IFnDrnho7Kum1Z08kgXoMgzoL4Tpdubey6wwl0NmG95nYtVUJ/RYeI4SqHeU1A4KpAdhsD6jw== X-Received: by 2002:a05:6a21:3d85:b0:1a7:52fa:7d6b with SMTP id adf61e73a8af0-1afde197988mr10038965637.43.1715626177330; Mon, 13 May 2024 11:49:37 -0700 (PDT) Received: from vineet-framework.hq.rivosinc.com ([50.145.13.30]) by smtp.gmail.com with ESMTPSA id d2e1a72fcca58-6f4d2a66585sm7944336b3a.10.2024.05.13.11.49.36 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 13 May 2024 11:49:37 -0700 (PDT) From: Vineet Gupta <vineetg@rivosinc.com> To: gcc-patches@gcc.gnu.org Cc: Jeff Law <jeffreyalaw@gmail.com>, kito.cheng@gmail.com, Palmer Dabbelt <palmer@rivosinc.com>, =?utf-8?q?Christoph_M=C3=BCllner?= <christoph.muellner@vrull.eu>, Robin Dapp <rdapp.gcc@gmail.com>, gnu-toolchain@rivosinc.com, Vineet Gupta <vineetg@rivosinc.com> Subject: [PATCH v2 1/2] RISC-V: avoid LUI based const materialization ... [part of PR/106265] Date: Mon, 13 May 2024 11:49:31 -0700 Message-Id: <20240513184932.662109-2-vineetg@rivosinc.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240513184932.662109-1-vineetg@rivosinc.com> References: <20240513184932.662109-1-vineetg@rivosinc.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-9.9 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, GIT_PATCH_0, KAM_SHORT, LIKELY_SPAM_BODY, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org> List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe> List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/> List-Post: <mailto:gcc-patches@gcc.gnu.org> List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help> List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe> Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org
Series	RISC-V improve stack/array access by constant mat tweak \| expand [v2,0/2] RISC-V improve stack/array access by constant mat tweak [v2,1/2] RISC-V: avoid LUI based const materialization ... [part of PR/106265] [v2,2/2] RISC-V: avoid LUI based const mat in prologue/epilogue expansion [PR/105733]

[v2,1/2] RISC-V: avoid LUI based const materialization ... [part of PR/106265]

Commit Message

Comments

Patch