diff mbox series

[RFC] expr: don't clear SUBREG_PROMOTED_VAR_P flag for a promoted subreg [target/111466]

Message ID 20230928214341.257862-1-vineetg@rivosinc.com
State New
Headers show
Series [RFC] expr: don't clear SUBREG_PROMOTED_VAR_P flag for a promoted subreg [target/111466] | expand

Commit Message

Vineet Gupta Sept. 28, 2023, 9:43 p.m. UTC
RISC-V suffers from extraneous sign extensions, despite/given the ABI
guarantee that 32-bit quantities are sign-extended into 64-bit registers,
meaning incoming SI function args need not be explicitly sign extended
(so do SI return values as most ALU insns implicitly sign-extend too.)

Existing REE doesn't seem to handle this well and there are various ideas
floating around to smarten REE about it.

RISC-V also seems to correctly implement middle-end hook PROMOTE_MODE
etc.

Another approach would be to prevent EXPAND from generating the
sign_extend in the first place which this patch tries to do.

The hunk being removed was introduced way back in 1994 as
   5069803972 ("expand_expr, case CONVERT_EXPR .. clear the promotion flag")

This survived full testsuite run for RISC-V rv64gc with surprisingly no
fallouts: test results before/after are exactly same.

|                               | # of unexpected case / # of unique unexpected case
|                               |          gcc |          g++ |     gfortran |
| rv64imafdc_zba_zbb_zbs_zicond/|  264 /    87 |    5 /     2 |   72 /    12 |
|    lp64d/medlow

Granted for something so old to have survived, there must be a valid
reason. Unfortunately the original change didn't have additional
commentary or a test case. That is not to say it can't/won't possibly
break things on other arches/ABIs, hence the RFC for someone to scream
that this is just bonkers, don't do this :-)

I've explicitly CC'ed Jakub and Roger who have last touched subreg
promoted notes in expr.cc for insight and/or screaming ;-)

Thanks to Robin for narrowing this down in an amazing debugging session
@ GNU Cauldron.

```
foo2:
	sext.w	a6,a1             <-- this goes away
	beq	a1,zero,.L4
	li	a5,0
	li	a0,0
.L3:
	addw	a4,a2,a5
	addw	a5,a3,a5
	addw	a0,a4,a0
	bltu	a5,a6,.L3
	ret
.L4:
	li	a0,0
	ret
```

Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
Co-developed-by: Robin Dapp <rdapp.gcc@gmail.com>
---
 gcc/expr.cc                               |  7 -------
 gcc/testsuite/gcc.target/riscv/pr111466.c | 15 +++++++++++++++
 2 files changed, 15 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr111466.c

Comments

Jeff Law Sept. 29, 2023, 3:17 a.m. UTC | #1
On 9/28/23 15:43, Vineet Gupta wrote:
> RISC-V suffers from extraneous sign extensions, despite/given the ABI
> guarantee that 32-bit quantities are sign-extended into 64-bit registers,
> meaning incoming SI function args need not be explicitly sign extended
> (so do SI return values as most ALU insns implicitly sign-extend too.)
> 
> Existing REE doesn't seem to handle this well and there are various ideas
> floating around to smarten REE about it.
> 
> RISC-V also seems to correctly implement middle-end hook PROMOTE_MODE
> etc.
> 
> Another approach would be to prevent EXPAND from generating the
> sign_extend in the first place which this patch tries to do.
> 
> The hunk being removed was introduced way back in 1994 as
>     5069803972 ("expand_expr, case CONVERT_EXPR .. clear the promotion flag")
> 
> This survived full testsuite run for RISC-V rv64gc with surprisingly no
> fallouts: test results before/after are exactly same.
> 
> |                               | # of unexpected case / # of unique unexpected case
> |                               |          gcc |          g++ |     gfortran |
> | rv64imafdc_zba_zbb_zbs_zicond/|  264 /    87 |    5 /     2 |   72 /    12 |
> |    lp64d/medlow
> 
> Granted for something so old to have survived, there must be a valid
> reason. Unfortunately the original change didn't have additional
> commentary or a test case. That is not to say it can't/won't possibly
> break things on other arches/ABIs, hence the RFC for someone to scream
> that this is just bonkers, don't do this :-)
> 
> I've explicitly CC'ed Jakub and Roger who have last touched subreg
> promoted notes in expr.cc for insight and/or screaming ;-)
> 
> Thanks to Robin for narrowing this down in an amazing debugging session
> @ GNU Cauldron.
So I scoured my old gcc2 archives to see if there was anything that 
might hint as to why this was changed.  Sadly (but not unexpectedly), 
nothing.  The relevant ChangeLog entry is;

> 
> Fri Jul  8 11:46:50 1994  Richard Kenner  (kenner@vlsi1.ultra.nyu.edu)
> 
>         * varasm.c (record_constant_rtx, force_const_mem): Ensure everything
>         is in saveable_obstack, not current_obstack.
> 
>         * combine.c (force_to_mode): OP_MODE must be MODE if MODE and
>         mode of X are of different classes.
>         (nonzero_bits, num_sign_bit_copies): Say nothing known for
>         floating-point modes.
> 
>         * function.c (instantiate_virtual_regs_1, case SET):
>         If DEST is virtual_stack_vars_rtx, replace with hardware
>         frame pointer.
> 
>         * expr.c (expand_expr, case CONVERT_EXPR): If changing signedness
>         and we have a promoted SUBREG, clear the promotion flag.
> 
>         * c-decl.c (finish_decl): Put RTL and other stuff in
>         permanent_obstack if DECL is.
> 
>         * combine.c (gen_unary): Add new arg, OP0_MODE.
>         All callers changed.

So standard practice back then was to re-use the header and have a blank 
line between conceptual changes if the same author made a series of 
changes.  So it's reasonable to assume the expr.c change was considered 
independent of the other changes.

At that particular time I think Kenner was mostly focused on the alpha 
and ppc ports, but I think he was also still poking around with romp and 
a29k.  I think romp is an unlikely target for this because it didn't 
promote modes and it wasn't even building for several months 
(April->late July).

PPC and a29k were both 32 bit ports and while they did promotions, I 
would hazard a guess the alpha was actually more sensitive to this 
stuff.  Which suggests a possible path forward.

I can bootstrap & regression test alpha using QEMU user mode emulation. 
So we might be able to trigger something that way.  It'll take some 
time, but might prove fruitful.


Jeff
Vineet Gupta Sept. 29, 2023, 3:49 a.m. UTC | #2
On 9/28/23 20:17, Jeff Law wrote:
> I can bootstrap & regression test alpha using QEMU user mode 
> emulation. So we might be able to trigger something that way. It'll 
> take some time, but might prove fruitful. 

That would be awesome. It's not like this this is burning or anything 
and one of the things in the long tail of things we need to do in this area.

Thx,
-Vineet
Roger Sayle Sept. 29, 2023, 10:40 a.m. UTC | #3
I agree that this looks dubious.  Normally, if the middle-end/optimizers
wish to reuse a SUBREG in a context where the flags are not valid, it
should create a new one with the desired flags, rather than "mutate"
an existing (and possibly shared) RTX.

I wonder if creating a new SUBREG here also fixes your problem?
I'm not sure that clearing SUBREG_PROMOTED_VAR_P is needed
at all, but given its motivation has been lost to history, it would
good to have a plan B, if Jeff's alpha testing uncovers a subtle issue.

Roger
--

> -----Original Message-----
> From: Vineet Gupta <vineetg@rivosinc.com>
> Sent: 28 September 2023 22:44
> To: gcc-patches@gcc.gnu.org; Robin Dapp <rdapp.gcc@gmail.com>
> Cc: kito.cheng@gmail.com; Jeff Law <jeffreyalaw@gmail.com>; Palmer Dabbelt
> <palmer@rivosinc.com>; gnu-toolchain@rivosinc.com; Roger Sayle
> <roger@nextmovesoftware.com>; Jakub Jelinek <jakub@redhat.com>; Jivan
> Hakobyan <jivanhakobyan9@gmail.com>; Vineet Gupta <vineetg@rivosinc.com>
> Subject: [RFC] expr: don't clear SUBREG_PROMOTED_VAR_P flag for a promoted
> subreg [target/111466]
> 
> RISC-V suffers from extraneous sign extensions, despite/given the ABI
guarantee
> that 32-bit quantities are sign-extended into 64-bit registers, meaning
incoming SI
> function args need not be explicitly sign extended (so do SI return values
as most
> ALU insns implicitly sign-extend too.)
> 
> Existing REE doesn't seem to handle this well and there are various ideas
floating
> around to smarten REE about it.
> 
> RISC-V also seems to correctly implement middle-end hook PROMOTE_MODE
> etc.
> 
> Another approach would be to prevent EXPAND from generating the
sign_extend
> in the first place which this patch tries to do.
> 
> The hunk being removed was introduced way back in 1994 as
>    5069803972 ("expand_expr, case CONVERT_EXPR .. clear the promotion
flag")
> 
> This survived full testsuite run for RISC-V rv64gc with surprisingly no
> fallouts: test results before/after are exactly same.
> 
> |                               | # of unexpected case / # of unique
unexpected case
> |                               |          gcc |          g++ |
gfortran |
> | rv64imafdc_zba_zbb_zbs_zicond/|  264 /    87 |    5 /     2 |   72 /
12 |
> |    lp64d/medlow
> 
> Granted for something so old to have survived, there must be a valid
reason.
> Unfortunately the original change didn't have additional commentary or a
test
> case. That is not to say it can't/won't possibly break things on other
arches/ABIs,
> hence the RFC for someone to scream that this is just bonkers, don't do
this :-)
> 
> I've explicitly CC'ed Jakub and Roger who have last touched subreg
promoted
> notes in expr.cc for insight and/or screaming ;-)
> 
> Thanks to Robin for narrowing this down in an amazing debugging session @
GNU
> Cauldron.
> 
> ```
> foo2:
> 	sext.w	a6,a1             <-- this goes away
> 	beq	a1,zero,.L4
> 	li	a5,0
> 	li	a0,0
> .L3:
> 	addw	a4,a2,a5
> 	addw	a5,a3,a5
> 	addw	a0,a4,a0
> 	bltu	a5,a6,.L3
> 	ret
> .L4:
> 	li	a0,0
> 	ret
> ```
> 
> Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
> Co-developed-by: Robin Dapp <rdapp.gcc@gmail.com>
> ---
>  gcc/expr.cc                               |  7 -------
>  gcc/testsuite/gcc.target/riscv/pr111466.c | 15 +++++++++++++++
>  2 files changed, 15 insertions(+), 7 deletions(-)  create mode 100644
> gcc/testsuite/gcc.target/riscv/pr111466.c
> 
> diff --git a/gcc/expr.cc b/gcc/expr.cc
> index 308ddc09e631..d259c6e53385 100644
> --- a/gcc/expr.cc
> +++ b/gcc/expr.cc
> @@ -9332,13 +9332,6 @@ expand_expr_real_2 (sepops ops, rtx target,
> machine_mode tmode,
>  	  op0 = expand_expr (treeop0, target, VOIDmode,
>  			     modifier);
> 
> -	  /* If the signedness of the conversion differs and OP0 is
> -	     a promoted SUBREG, clear that indication since we now
> -	     have to do the proper extension.  */
> -	  if (TYPE_UNSIGNED (TREE_TYPE (treeop0)) != unsignedp
> -	      && GET_CODE (op0) == SUBREG)
> -	    SUBREG_PROMOTED_VAR_P (op0) = 0;
> -
>  	  return REDUCE_BIT_FIELD (op0);
>  	}
> 
> diff --git a/gcc/testsuite/gcc.target/riscv/pr111466.c
> b/gcc/testsuite/gcc.target/riscv/pr111466.c
> new file mode 100644
> index 000000000000..007792466a51
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr111466.c
> @@ -0,0 +1,15 @@
> +/* Simplified varaint of gcc.target/riscv/zba-adduw.c.  */
> +
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gc_zba_zbs -mabi=lp64" } */
> +/* { dg-skip-if "" { *-*-* } { "-O0" } } */
> +
> +int foo2(int unused, int n, unsigned y, unsigned delta){
> +  int s = 0;
> +  unsigned int x = 0;
> +  for (;x<n;x +=delta)
> +    s += x+y;
> +  return s;
> +}
> +
> +/* { dg-final { scan-assembler "\msext\M" } } */
> --
> 2.34.1
Jeff Law Sept. 29, 2023, 12:14 p.m. UTC | #4
On 9/28/23 21:49, Vineet Gupta wrote:
> 
> On 9/28/23 20:17, Jeff Law wrote:
>> I can bootstrap & regression test alpha using QEMU user mode 
>> emulation. So we might be able to trigger something that way. It'll 
>> take some time, but might prove fruitful. 
> 
> That would be awesome. It's not like this this is burning or anything 
> and one of the things in the long tail of things we need to do in this 
> area.
Absolutely true.  In fact, I doubt this particular quirk is all that 
important in the extension removal space.  But we've got enough data to 
do a bit of an experiment.  While it takes a long time to run, it 
doesn't require any kind of regular human intervention.  Better to fire 
it off now while it's still fresh in our minds.  If we wait a week or 
two I'll have forgotten everything.

Jeff
Jeff Law Sept. 29, 2023, 1:43 p.m. UTC | #5
On 9/29/23 04:40, Roger Sayle wrote:
> 
> I agree that this looks dubious.  Normally, if the middle-end/optimizers
> wish to reuse a SUBREG in a context where the flags are not valid, it
> should create a new one with the desired flags, rather than "mutate"
> an existing (and possibly shared) RTX.
SUBREGs aren't shared, though I don't think that changes any of your 
conclusions.


jeff
Vineet Gupta Oct. 3, 2023, 1:29 a.m. UTC | #6
On 9/29/23 05:14, Jeff Law wrote:
>
>
> On 9/28/23 21:49, Vineet Gupta wrote:
>>
>> On 9/28/23 20:17, Jeff Law wrote:
>>> I can bootstrap & regression test alpha using QEMU user mode 
>>> emulation. So we might be able to trigger something that way. It'll 
>>> take some time, but might prove fruitful. 
>>
>> That would be awesome. It's not like this this is burning or anything 
>> and one of the things in the long tail of things we need to do in 
>> this area.
> Absolutely true.  In fact, I doubt this particular quirk is all that 
> important in the extension removal space.  But we've got enough data 
> to do a bit of an experiment.  While it takes a long time to run, it 
> doesn't require any kind of regular human intervention.  Better to 
> fire it off now while it's still fresh in our minds.  If we wait a 
> week or two I'll have forgotten everything.

Just as a data-point, the SPEC QEMU icount on RISC-V with this patch 
seems promising (+ve is better, lesser icount)

Baseline        subreg promoted
                                                 note preserved   %
benchmark       workload #
500.perlbench_r 0               1222524143251   1222717055541 -0.02%
                 1               741422677286    740573609468 0.11%
                 2               694157786227    693787219643 0.05%
502.gcc_r       0               189519277112    188986824061 0.28%  <--
                 1               224763075520    224225499491 0.24%
                 2               216802546093    216426186662 0.17%
                 3               179634122120    179165923074 0.26%  <--
                 4               222757886491    222343753217 0.19%
503.bwaves_r    0               309660270217    309640234863 0.01%
                 1               488871452301    488838894844 0.01%
                 2               381243468081    381218065515 0.01%
                 3               463786236849    463756755469 0.01%
505.mcf_r       0               675010417323    675014630665 0.00%
507.cactuBSSN_r 0               2856874668987   2856874679135 0.00%
508.namd_r      0               1877527849689   1877508676900 0.00%
510.parest_r    0               3479719575579   3479104891751 0.02%
511.povray_r    0               3028749801452   3030037805612 -0.04%
519.lbm_r       0               1172421269180   1172421185445 0.00%
520.omnetpp_r   0               1014838628822   1007680353306 0.71%  <--
521.wrf_r       0               3897783090826   3897162379983 0.02%
523.xalancbmk_r 0               1062577057227   1062508198843 0.01%
525.x264_r      0               451836043634    449289002218 0.56%  <--
                 1               1761617424590   1758009904369 0.20%  <--
                 2               1686037465791   1682815274048 0.19%  <--
526.blender_r   0               1660559350538   1660534452552 0.00%
527.cam4_r      0               2141572063843   2141254936488 0.01%
531.deepsjeng_r 0               1605692153702   1603021256993 0.17%
538.imagick_r   0               3401602661013   3401602660827 0.00%
541.leela_r     0               1989286081019   1987365526160 0.10%
544.nab_r       0               1537038165879   1528865609530 0.53%  <--
548.exchange2_r 0               2050220774922   2048840906692 0.07%
549.fotonik3d_r 0               2231807908394   2231807600916 0.00%
554.roms_r      0               2612969430882   2611529873683 0.06%
557.xz_r        0               367967125022    367760820700 0.06%
                 1               961163448926    961025548415 0.01%
                 2               522156857947    521939109911 0.04%
997.specrand_fr 0               413618578       413604068 0.00%
999.specrand_ir 0               413618578       413604068 0.00%
Jeff Law Oct. 4, 2023, 3:29 p.m. UTC | #7
On 9/28/23 15:43, Vineet Gupta wrote:
> RISC-V suffers from extraneous sign extensions, despite/given the ABI
> guarantee that 32-bit quantities are sign-extended into 64-bit registers,
> meaning incoming SI function args need not be explicitly sign extended
> (so do SI return values as most ALU insns implicitly sign-extend too.)
> 
> Existing REE doesn't seem to handle this well and there are various ideas
> floating around to smarten REE about it.
> 
> RISC-V also seems to correctly implement middle-end hook PROMOTE_MODE
> etc.
> 
> Another approach would be to prevent EXPAND from generating the
> sign_extend in the first place which this patch tries to do.
> 
> The hunk being removed was introduced way back in 1994 as
>     5069803972 ("expand_expr, case CONVERT_EXPR .. clear the promotion flag")
> 
> This survived full testsuite run for RISC-V rv64gc with surprisingly no
> fallouts: test results before/after are exactly same.
> 
> |                               | # of unexpected case / # of unique unexpected case
> |                               |          gcc |          g++ |     gfortran |
> | rv64imafdc_zba_zbb_zbs_zicond/|  264 /    87 |    5 /     2 |   72 /    12 |
> |    lp64d/medlow
> 
> Granted for something so old to have survived, there must be a valid
> reason. Unfortunately the original change didn't have additional
> commentary or a test case. That is not to say it can't/won't possibly
> break things on other arches/ABIs, hence the RFC for someone to scream
> that this is just bonkers, don't do this :-)
> 
> I've explicitly CC'ed Jakub and Roger who have last touched subreg
> promoted notes in expr.cc for insight and/or screaming ;-)
> 
> Thanks to Robin for narrowing this down in an amazing debugging session
> @ GNU Cauldron.
[ ... ]
So the data for Alpha was -- no change.  I also put the patch into my 
tester in the hopes that some target, any target would show a difference 
in test results.  It's churned about halfway through the embedded 
targets so far with no regressions.

Given the data in your followup message, this clearly looks valuable and 
so far we don't have any evidence Kenner's old change is useful or 
necessary anymore.

I'm not (yet) comfortable committing this patch, I think the easy 
avenues to getting a case where it's needed are not likely to prove 
fruitful.  So next steps here are for me to spend a bit more time trying 
to understand what cases Kenner was trying to handle.

Jeff
Jeff Law Oct. 4, 2023, 5:59 p.m. UTC | #8
On 9/28/23 15:43, Vineet Gupta wrote:
> RISC-V suffers from extraneous sign extensions, despite/given the ABI
> guarantee that 32-bit quantities are sign-extended into 64-bit registers,
> meaning incoming SI function args need not be explicitly sign extended
> (so do SI return values as most ALU insns implicitly sign-extend too.)
> 
> Existing REE doesn't seem to handle this well and there are various ideas
> floating around to smarten REE about it.
> 
> RISC-V also seems to correctly implement middle-end hook PROMOTE_MODE
> etc.
> 
> Another approach would be to prevent EXPAND from generating the
> sign_extend in the first place which this patch tries to do.
> 
> The hunk being removed was introduced way back in 1994 as
>     5069803972 ("expand_expr, case CONVERT_EXPR .. clear the promotion flag")
> 
> This survived full testsuite run for RISC-V rv64gc with surprisingly no
> fallouts: test results before/after are exactly same.
> 
> |                               | # of unexpected case / # of unique unexpected case
> |                               |          gcc |          g++ |     gfortran |
> | rv64imafdc_zba_zbb_zbs_zicond/|  264 /    87 |    5 /     2 |   72 /    12 |
> |    lp64d/medlow
> 
> Granted for something so old to have survived, there must be a valid
> reason. Unfortunately the original change didn't have additional
> commentary or a test case. That is not to say it can't/won't possibly
> break things on other arches/ABIs, hence the RFC for someone to scream
> that this is just bonkers, don't do this :-)
> 
> I've explicitly CC'ed Jakub and Roger who have last touched subreg
> promoted notes in expr.cc for insight and/or screaming ;-)
> 
> Thanks to Robin for narrowing this down in an amazing debugging session
> @ GNU Cauldron.
> 
> ```
> foo2:
> 	sext.w	a6,a1             <-- this goes away
> 	beq	a1,zero,.L4
> 	li	a5,0
> 	li	a0,0
> .L3:
> 	addw	a4,a2,a5
> 	addw	a5,a3,a5
> 	addw	a0,a4,a0
> 	bltu	a5,a6,.L3
> 	ret
> .L4:
> 	li	a0,0
> 	ret
> ```
> 
> Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
> Co-developed-by: Robin Dapp <rdapp.gcc@gmail.com>
> ---
>   gcc/expr.cc                               |  7 -------
>   gcc/testsuite/gcc.target/riscv/pr111466.c | 15 +++++++++++++++
>   2 files changed, 15 insertions(+), 7 deletions(-)
>   create mode 100644 gcc/testsuite/gcc.target/riscv/pr111466.c
So mcore-elf is showing something interesting.  With that hunk of Kenner 
code removed, it actually has a few failing tests that flip to passes.

> Tests that now work, but didn't before (11 tests):
> 
> mcore-sim: gcc.c-torture/execute/pr109986.c   -O1  execution test
> mcore-sim: gcc.c-torture/execute/pr109986.c   -O2  execution test
> mcore-sim: gcc.c-torture/execute/pr109986.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
> mcore-sim: gcc.c-torture/execute/pr109986.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
> mcore-sim: gcc.c-torture/execute/pr109986.c   -O3 -g  execution test
> mcore-sim: gcc.c-torture/execute/pr109986.c   -Os  execution test
> mcore-sim: gcc.c-torture/execute/pr84524.c   -O2  execution test
> mcore-sim: gcc.c-torture/execute/pr84524.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
> mcore-sim: gcc.c-torture/execute/pr84524.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
> mcore-sim: gcc.dg/tree-ssa/pr84436-5.c execution test
> mcore-sim: gcc.dg/tree-ssa/pr84436-5.c execution test

So that's a really interesting result.   If further analysis doesn't 
point the finger at a simulator bug or something like that, then we'll 
have strong evidence that Kenner's change is actively harmful from a 
correctness standpoint.  That would change the calculus here significantly.

Sadly, mcore-elf doesn't have a working gdb IIRC (don't ask how I know 
that!), so I'm going to have to analyze this further with less efficient 
techniques.  BUt definitely interesting news/results.

Jeff
Vineet Gupta Oct. 4, 2023, 6:14 p.m. UTC | #9
On 10/4/23 10:59, Jeff Law wrote:
>
>
> On 9/28/23 15:43, Vineet Gupta wrote:
>> RISC-V suffers from extraneous sign extensions, despite/given the ABI
>> guarantee that 32-bit quantities are sign-extended into 64-bit 
>> registers,
>> meaning incoming SI function args need not be explicitly sign extended
>> (so do SI return values as most ALU insns implicitly sign-extend too.)
>>
>> Existing REE doesn't seem to handle this well and there are various 
>> ideas
>> floating around to smarten REE about it.
>>
>> RISC-V also seems to correctly implement middle-end hook PROMOTE_MODE
>> etc.
>>
>> Another approach would be to prevent EXPAND from generating the
>> sign_extend in the first place which this patch tries to do.
>>
>> The hunk being removed was introduced way back in 1994 as
>>     5069803972 ("expand_expr, case CONVERT_EXPR .. clear the 
>> promotion flag")
>>
>> This survived full testsuite run for RISC-V rv64gc with surprisingly no
>> fallouts: test results before/after are exactly same.
>>
>> |                               | # of unexpected case / # of unique 
>> unexpected case
>> |                               |          gcc |          g++ |     
>> gfortran |
>> | rv64imafdc_zba_zbb_zbs_zicond/|  264 /    87 |    5 /     2 |   72 
>> /    12 |
>> |    lp64d/medlow
>>
>> Granted for something so old to have survived, there must be a valid
>> reason. Unfortunately the original change didn't have additional
>> commentary or a test case. That is not to say it can't/won't possibly
>> break things on other arches/ABIs, hence the RFC for someone to scream
>> that this is just bonkers, don't do this :-)
>>
>> I've explicitly CC'ed Jakub and Roger who have last touched subreg
>> promoted notes in expr.cc for insight and/or screaming ;-)
>>
>> Thanks to Robin for narrowing this down in an amazing debugging session
>> @ GNU Cauldron.
>>
>> ```
>> foo2:
>>     sext.w    a6,a1             <-- this goes away
>>     beq    a1,zero,.L4
>>     li    a5,0
>>     li    a0,0
>> .L3:
>>     addw    a4,a2,a5
>>     addw    a5,a3,a5
>>     addw    a0,a4,a0
>>     bltu    a5,a6,.L3
>>     ret
>> .L4:
>>     li    a0,0
>>     ret
>> ```
>>
>> Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
>> Co-developed-by: Robin Dapp <rdapp.gcc@gmail.com>
>> ---
>>   gcc/expr.cc                               |  7 -------
>>   gcc/testsuite/gcc.target/riscv/pr111466.c | 15 +++++++++++++++
>>   2 files changed, 15 insertions(+), 7 deletions(-)
>>   create mode 100644 gcc/testsuite/gcc.target/riscv/pr111466.c
> So mcore-elf is showing something interesting.  With that hunk of 
> Kenner code removed, it actually has a few failing tests that flip to 
> passes.
>
>> Tests that now work, but didn't before (11 tests):
>>
>> mcore-sim: gcc.c-torture/execute/pr109986.c   -O1  execution test
>> mcore-sim: gcc.c-torture/execute/pr109986.c   -O2  execution test
>> mcore-sim: gcc.c-torture/execute/pr109986.c   -O2 -flto 
>> -fno-use-linker-plugin -flto-partition=none  execution test
>> mcore-sim: gcc.c-torture/execute/pr109986.c   -O2 -flto 
>> -fuse-linker-plugin -fno-fat-lto-objects  execution test
>> mcore-sim: gcc.c-torture/execute/pr109986.c   -O3 -g  execution test
>> mcore-sim: gcc.c-torture/execute/pr109986.c   -Os  execution test
>> mcore-sim: gcc.c-torture/execute/pr84524.c   -O2  execution test
>> mcore-sim: gcc.c-torture/execute/pr84524.c   -O2 -flto 
>> -fno-use-linker-plugin -flto-partition=none  execution test
>> mcore-sim: gcc.c-torture/execute/pr84524.c   -O2 -flto 
>> -fuse-linker-plugin -fno-fat-lto-objects  execution test
>> mcore-sim: gcc.dg/tree-ssa/pr84436-5.c execution test
>> mcore-sim: gcc.dg/tree-ssa/pr84436-5.c execution test
>
> So that's a really interesting result.   If further analysis doesn't 
> point the finger at a simulator bug or something like that, then we'll 
> have strong evidence that Kenner's change is actively harmful from a 
> correctness standpoint.  That would change the calculus here 
> significantly.
>
> Sadly, mcore-elf doesn't have a working gdb IIRC (don't ask how I know 
> that!), so I'm going to have to analyze this further with less 
> efficient techniques.  BUt definitely interesting news/results.

Really interesting. Can't thank you enough for spending time in chasing 
this down.

-Vineet
Jeff Law Oct. 4, 2023, 8:11 p.m. UTC | #10
On 10/4/23 12:14, Vineet Gupta wrote:
> On 10/4/23 10:59, Jeff Law wrote:
>>
>>
>> On 9/28/23 15:43, Vineet Gupta wrote:
>>> RISC-V suffers from extraneous sign extensions, despite/given the ABI
>>> guarantee that 32-bit quantities are sign-extended into 64-bit 
>>> registers,
>>> meaning incoming SI function args need not be explicitly sign extended
>>> (so do SI return values as most ALU insns implicitly sign-extend too.)
>>>
>>> Existing REE doesn't seem to handle this well and there are various 
>>> ideas
>>> floating around to smarten REE about it.
>>>
>>> RISC-V also seems to correctly implement middle-end hook PROMOTE_MODE
>>> etc.
>>>
>>> Another approach would be to prevent EXPAND from generating the
>>> sign_extend in the first place which this patch tries to do.
>>>
>>> The hunk being removed was introduced way back in 1994 as
>>>     5069803972 ("expand_expr, case CONVERT_EXPR .. clear the 
>>> promotion flag")
>>>
>>> This survived full testsuite run for RISC-V rv64gc with surprisingly no
>>> fallouts: test results before/after are exactly same.
>>>
>>> |                               | # of unexpected case / # of unique 
>>> unexpected case
>>> |                               |          gcc |          g++ | 
>>> gfortran |
>>> | rv64imafdc_zba_zbb_zbs_zicond/|  264 /    87 |    5 /     2 |   72 
>>> /    12 |
>>> |    lp64d/medlow
>>>
>>> Granted for something so old to have survived, there must be a valid
>>> reason. Unfortunately the original change didn't have additional
>>> commentary or a test case. That is not to say it can't/won't possibly
>>> break things on other arches/ABIs, hence the RFC for someone to scream
>>> that this is just bonkers, don't do this :-)
>>>
>>> I've explicitly CC'ed Jakub and Roger who have last touched subreg
>>> promoted notes in expr.cc for insight and/or screaming ;-)
>>>
>>> Thanks to Robin for narrowing this down in an amazing debugging session
>>> @ GNU Cauldron.
>>>
>>> ```
>>> foo2:
>>>     sext.w    a6,a1             <-- this goes away
>>>     beq    a1,zero,.L4
>>>     li    a5,0
>>>     li    a0,0
>>> .L3:
>>>     addw    a4,a2,a5
>>>     addw    a5,a3,a5
>>>     addw    a0,a4,a0
>>>     bltu    a5,a6,.L3
>>>     ret
>>> .L4:
>>>     li    a0,0
>>>     ret
>>> ```
>>>
>>> Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
>>> Co-developed-by: Robin Dapp <rdapp.gcc@gmail.com>
>>> ---
>>>   gcc/expr.cc                               |  7 -------
>>>   gcc/testsuite/gcc.target/riscv/pr111466.c | 15 +++++++++++++++
>>>   2 files changed, 15 insertions(+), 7 deletions(-)
>>>   create mode 100644 gcc/testsuite/gcc.target/riscv/pr111466.c
>> So mcore-elf is showing something interesting.  With that hunk of 
>> Kenner code removed, it actually has a few failing tests that flip to 
>> passes.
>>
>>> Tests that now work, but didn't before (11 tests):
>>>
>>> mcore-sim: gcc.c-torture/execute/pr109986.c   -O1  execution test
>>> mcore-sim: gcc.c-torture/execute/pr109986.c   -O2  execution test
>>> mcore-sim: gcc.c-torture/execute/pr109986.c   -O2 -flto 
>>> -fno-use-linker-plugin -flto-partition=none  execution test
>>> mcore-sim: gcc.c-torture/execute/pr109986.c   -O2 -flto 
>>> -fuse-linker-plugin -fno-fat-lto-objects  execution test
>>> mcore-sim: gcc.c-torture/execute/pr109986.c   -O3 -g  execution test
>>> mcore-sim: gcc.c-torture/execute/pr109986.c   -Os  execution test
>>> mcore-sim: gcc.c-torture/execute/pr84524.c   -O2  execution test
>>> mcore-sim: gcc.c-torture/execute/pr84524.c   -O2 -flto 
>>> -fno-use-linker-plugin -flto-partition=none  execution test
>>> mcore-sim: gcc.c-torture/execute/pr84524.c   -O2 -flto 
>>> -fuse-linker-plugin -fno-fat-lto-objects  execution test
>>> mcore-sim: gcc.dg/tree-ssa/pr84436-5.c execution test
>>> mcore-sim: gcc.dg/tree-ssa/pr84436-5.c execution test
>>
>> So that's a really interesting result.   If further analysis doesn't 
>> point the finger at a simulator bug or something like that, then we'll 
>> have strong evidence that Kenner's change is actively harmful from a 
>> correctness standpoint.  That would change the calculus here 
>> significantly.
>>
>> Sadly, mcore-elf doesn't have a working gdb IIRC (don't ask how I know 
>> that!), so I'm going to have to analyze this further with less 
>> efficient techniques.  BUt definitely interesting news/results.
> 
> Really interesting. Can't thank you enough for spending time in chasing 
> this down.
Turns out this is a simulator bug.  The simulator assumed that "long" 
types were 32 bits and implemented sextb as x <<= 24; x >>= 24;  This is 
a fairly common error.  If "long" is 64 bits on the host, then that 
approach doesn't work well.



Jeff
Jeff Law Oct. 5, 2023, 4:49 a.m. UTC | #11
On 9/28/23 15:43, Vineet Gupta wrote:
> RISC-V suffers from extraneous sign extensions, despite/given the ABI
> guarantee that 32-bit quantities are sign-extended into 64-bit registers,
> meaning incoming SI function args need not be explicitly sign extended
> (so do SI return values as most ALU insns implicitly sign-extend too.)
> 
> Existing REE doesn't seem to handle this well and there are various ideas
> floating around to smarten REE about it.
> 
> RISC-V also seems to correctly implement middle-end hook PROMOTE_MODE
> etc.
> 
> Another approach would be to prevent EXPAND from generating the
> sign_extend in the first place which this patch tries to do.
> 
> The hunk being removed was introduced way back in 1994 as
>     5069803972 ("expand_expr, case CONVERT_EXPR .. clear the promotion flag")
> 
> This survived full testsuite run for RISC-V rv64gc with surprisingly no
> fallouts: test results before/after are exactly same.
> 
> |                               | # of unexpected case / # of unique unexpected case
> |                               |          gcc |          g++ |     gfortran |
> | rv64imafdc_zba_zbb_zbs_zicond/|  264 /    87 |    5 /     2 |   72 /    12 |
> |    lp64d/medlow
> 
> Granted for something so old to have survived, there must be a valid
> reason. Unfortunately the original change didn't have additional
> commentary or a test case. That is not to say it can't/won't possibly
> break things on other arches/ABIs, hence the RFC for someone to scream
> that this is just bonkers, don't do this :-)
> 
> I've explicitly CC'ed Jakub and Roger who have last touched subreg
> promoted notes in expr.cc for insight and/or screaming ;-)
> 
> Thanks to Robin for narrowing this down in an amazing debugging session
> @ GNU Cauldron.
So I think Kenner's code is trying to prevent having a value in a SUBREG 
that is inconsistent with the SUBREG_PROMOTED* flag bits.  But I think 
it's been unnecessary since Matz's rewrite in 2009.

The key change in Matz's work is that when the target is a promoted 
subreg expression we pass NULL down to expand_expr_real_2 which forces 
that routine to expand operand 0 (the incoming PARM_DECL object) into a 
temporary object (in this case another promoted subreg expression). 
It's that temporary object that Kenner's change clears the promoted 
subreg state on.

After expand_expr_real_returns, we call convert_move to move the data 
from that temporary object into the actual destination.  That routine 
(and its children) seem to be doing the right things WRT promoted subregs.


Prior to Matz's change I'm pretty sure we would do expansion directly 
into the destination (we'd generate appropriate tree nodes, then 
ultimately pass things off to store_expr which would pass down the final 
target to expand_expr).  Meaning that if the signedness differed then we 
potentially needed to reset the subreg promotion state on the 
destination object to ensure we were conservatively correct, hence 
Kenner's change.


I'm largely convinced we can drop this code.  I'm going to give it a few 
days to run some of the emulated native targets (they're long running 
jobs, so they only fire once a week).    But my plan is to move forward 
with the patch relatively soon.

jeff
Robin Dapp Oct. 5, 2023, 1:33 p.m. UTC | #12
> So I think Kenner's code is trying to prevent having a value in a
> SUBREG that is inconsistent with the SUBREG_PROMOTED* flag bits.  But
> I think it's been unnecessary since Matz's rewrite in 2009.

I couldn't really tell what the rewrite does entirely so I tried creating
a case where we would require the SUBREG_PROMOTED_VAR but couldn't come
up with any.  At least for the most common path through expr I believe
I know why:

So our case is when we have an SI subreg from a DI reg that is originally
a sign-extended SI.  Then we NOP-convert the SI subreg from signed to
unsigned.  We only perform implicit sign extensions therefore we can
omit the implicit zero-extension case here.
The way the result of the signed->unsigned conversion is used determines
whether we can use SUBREG_PROMOTED_VAR.  There are two possibilities
(1) and (2).

 void foo (int bar)
 {
    unsigned int u = (unsigned int) bar;


(1) unsigned long long ul = (unsigned long long) u;

As long as the result is used unsigned, we will always perform a zero
extension no matter the "Kenner hunk" (because whether the subreg has
SRP_SIGNED or !SUBREG_PROMOTED_VAR does not change the need for a
zero_extend).


(2) long long l = (long long) u;

SUBREG_PROMOTED is checked by the following in convert_move:

  scalar_int_mode to_int_mode;
  if (GET_CODE (from) == SUBREG
      && SUBREG_PROMOTED_VAR_P (from)
      && is_a <scalar_int_mode> (to_mode, &to_int_mode)
      && (GET_MODE_PRECISION (subreg_promoted_mode (from))
	  >= GET_MODE_PRECISION (to_int_mode))
      && SUBREG_CHECK_PROMOTED_SIGN (from, unsignedp))

The SUBREG_CHECK_PROMOTED_SIGN (from, unsignedp) is decisive
as far as I can tell.  unsignedp = 1 comes from treeop0 so our
"from" (i.e. unsigned int u).
With the "Kenner hunk" SUBREG_PROMOTED_VAR is unset, so we don't
strip the extension.  Without it, SUBREG_PROMOTED_VAR () == SRP_SIGNED
which is != unsignedp, so no stripping either.

Now there are several other paths that would need auditing as well
but at least this one is safe.  An interesting test target would be
a backend that does implicit zero extensions but as we haven't seen
fallout so far chances to find a trigger are slim.

Does that make sense?

Regards
 Robin
Richard Kenner Oct. 5, 2023, 2:56 p.m. UTC | #13
> At that particular time I think Kenner was mostly focused on the alpha 
> and ppc ports, but I think he was also still poking around with romp and 
> a29k.  I think romp is an unlikely target for this because it didn't 
> promote modes and it wasn't even building for several months 
> (April->late July).

Obviously, I have no recollection of that change at all.

In July of 1994, I don't believe I was actively working on much in the
way of ports, though I could be misremembering.  My guess is that this
change was to fix some bug, but I'm a bit mystified why I'd have
batched so many different changes together in one ChangeLog entry like
that.  I think you're correct that it was most likely the Alpha that
showed the bug.
Jeff Law Oct. 5, 2023, 3:04 p.m. UTC | #14
On 10/5/23 08:56, Richard Kenner wrote:
>> At that particular time I think Kenner was mostly focused on the alpha
>> and ppc ports, but I think he was also still poking around with romp and
>> a29k.  I think romp is an unlikely target for this because it didn't
>> promote modes and it wasn't even building for several months
>> (April->late July).
> 
> Obviously, I have no recollection of that change at all.
That's the assumption I made :-)

> 
> In July of 1994, I don't believe I was actively working on much in the
> way of ports, though I could be misremembering.  My guess is that this
> change was to fix some bug, but I'm a bit mystified why I'd have
> batched so many different changes together in one ChangeLog entry like
> that.  I think you're correct that it was most likely the Alpha that
> showed the bug.
The alpha was a combination of my memory and reviewing patches/email 
messages in that time span.

I agree this was almost certainly meant to be a bugfix and I suspect the 
bug was expanding directly into a promoted subreg target and ending up 
with inconsistency between the value and the promoted subreg state bits.

Jeff
Jeff Law Oct. 5, 2023, 4:42 p.m. UTC | #15
On 10/5/23 07:33, Robin Dapp wrote:
>> So I think Kenner's code is trying to prevent having a value in a
>> SUBREG that is inconsistent with the SUBREG_PROMOTED* flag bits.  But
>> I think it's been unnecessary since Matz's rewrite in 2009.
> 
> I couldn't really tell what the rewrite does entirely so I tried creating
> a case where we would require the SUBREG_PROMOTED_VAR but couldn't come
> up with any.  At least for the most common path through expr I believe
> I know why:
> 
> So our case is when we have an SI subreg from a DI reg that is originally
> a sign-extended SI.  Then we NOP-convert the SI subreg from signed to
> unsigned.  We only perform implicit sign extensions therefore we can
> omit the implicit zero-extension case here.
Right.  The extension into bits 32..63, whether it be zero or sign 
extension is essentially a don't care.  It's there because of 
PROMOTE_MODE forcing most operations to 64 bits to match the hardware, 
even if the upper 32 bits aren't ever relevant.





> The way the result of the signed->unsigned conversion is used determines
> whether we can use SUBREG_PROMOTED_VAR.  There are two possibilities
> (1) and (2).
> 
>   void foo (int bar)
>   {
>      unsigned int u = (unsigned int) bar;
> 
> 
> (1) unsigned long long ul = (unsigned long long) u;
> 
> As long as the result is used unsigned, we will always perform a zero
> extension no matter the "Kenner hunk" (because whether the subreg has
> SRP_SIGNED or !SUBREG_PROMOTED_VAR does not change the need for a
> zero_extend).
Right.

> 
> 
> (2) long long l = (long long) u;
> 
> SUBREG_PROMOTED is checked by the following in convert_move:
> 
>    scalar_int_mode to_int_mode;
>    if (GET_CODE (from) == SUBREG
>        && SUBREG_PROMOTED_VAR_P (from)
>        && is_a <scalar_int_mode> (to_mode, &to_int_mode)
>        && (GET_MODE_PRECISION (subreg_promoted_mode (from))
> 	  >= GET_MODE_PRECISION (to_int_mode))
>        && SUBREG_CHECK_PROMOTED_SIGN (from, unsignedp))
> 
> The SUBREG_CHECK_PROMOTED_SIGN (from, unsignedp) is decisive
> as far as I can tell.
Right.   We have already ensured the modes are either the same size or 
the PARM_DECL's mode is wider than the local VAR_DECL's mode.  So the 
check that FROM has the same promotion property as UNSIGNEDP is going to 
be decisive.

   unsignedp = 1 comes from treeop0 so our
Correct.  It comes from the TREE_TYPE (treeop0) where treeop0 is the 
incoming PARM_DECL.

> "from" (i.e. unsigned int u).
> With the "Kenner hunk" SUBREG_PROMOTED_VAR is unset, so we don't
> strip the extension.  Without it, SUBREG_PROMOTED_VAR () == SRP_SIGNED
> which is != unsignedp, so no stripping either.
Correct.  The Kenner hunk wipes SUBREG_PROMOTED_VAR, meaning the 
promotion state of the object is unknown.

> 
> Now there are several other paths that would need auditing as well
> but at least this one is safe.  An interesting test target would be
> a backend that does implicit zero extensions but as we haven't seen
> fallout so far chances to find a trigger are slim.
I did some testing of the other paths yesterday, but didn't include them 
in my message.

First, if the PARM_DECL is a narrower type than the local VAR_DECL, then 
the path we're considering changing doesn't get used because the modes 
have different sizes.   Thus we need not worry about this case.

If the PARM_DECL is wider than the local VAR_DECL, then we downsize to 
the same size as the VAR_DECL via a SUBREG and it behaves the same as 
the Vineet's original when the sizes are the same, but they differ in 
signedness.  So if we conclude the same size cases are OK, then the case 
where the PARM_DECL is wider than the VAR_DECL, we're going to be safe 
as well.


Jeff
Hans-Peter Nilsson Oct. 12, 2023, 2:37 a.m. UTC | #16
> From: Vineet Gupta <vineetg@rivosinc.com>
> Date: Thu, 28 Sep 2023 14:43:41 -0700

Please forgive my daftness, but...

> ```
> foo2:
> 	sext.w	a6,a1             <-- this goes away
> 	beq	a1,zero,.L4
> 	li	a5,0
> 	li	a0,0
> .L3:
> 	addw	a4,a2,a5
> 	addw	a5,a3,a5
> 	addw	a0,a4,a0
> 	bltu	a5,a6,.L3
> 	ret
> .L4:
> 	li	a0,0
> 	ret
> ```

...if your patch gets rid of that sign-extension above...

> diff --git a/gcc/testsuite/gcc.target/riscv/pr111466.c b/gcc/testsuite/gcc.target/riscv/pr111466.c
> new file mode 100644
> index 000000000000..007792466a51
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr111466.c
> @@ -0,0 +1,15 @@
> +/* Simplified varaint of gcc.target/riscv/zba-adduw.c.  */
> +
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gc_zba_zbs -mabi=lp64" } */
> +/* { dg-skip-if "" { *-*-* } { "-O0" } } */
> +
> +int foo2(int unused, int n, unsigned y, unsigned delta){
> +  int s = 0;
> +  unsigned int x = 0;
> +  for (;x<n;x +=delta)
> +    s += x+y;
> +  return s;
> +}
> +
> +/* { dg-final { scan-assembler "\msext\M" } } */

...then why test for the presence of a sign-extension
instruction in the test-case?

IOW, shouldn't that be a scan-assember-not?

(What am I missing?)

brgds, H-P
PS. sorry I missed the Cauldron this year.  Hope to see you all next year!
Vineet Gupta Oct. 12, 2023, 4:11 p.m. UTC | #17
On 10/11/23 19:37, Hans-Peter Nilsson wrote:
>> ```
>> foo2:
>> 	sext.w	a6,a1             <-- this goes away
>> 	beq	a1,zero,.L4
>> 	li	a5,0
>> 	li	a0,0
>> .L3:
>> 	addw	a4,a2,a5
>> 	addw	a5,a3,a5
>> 	addw	a0,a4,a0
>> 	bltu	a5,a6,.L3
>> 	ret
>> .L4:
>> 	li	a0,0
>> 	ret
>> ```
> ...if your patch gets rid of that sign-extension above...
>
>> diff --git a/gcc/testsuite/gcc.target/riscv/pr111466.c b/gcc/testsuite/gcc.target/riscv/pr111466.c
>> new file mode 100644
>> index 000000000000..007792466a51
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/pr111466.c
>> @@ -0,0 +1,15 @@
>> +/* Simplified varaint of gcc.target/riscv/zba-adduw.c.  */
>> +
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gc_zba_zbs -mabi=lp64" } */
>> +/* { dg-skip-if "" { *-*-* } { "-O0" } } */
>> +
>> +int foo2(int unused, int n, unsigned y, unsigned delta){
>> +  int s = 0;
>> +  unsigned int x = 0;
>> +  for (;x<n;x +=delta)
>> +    s += x+y;
>> +  return s;
>> +}
>> +
>> +/* { dg-final { scan-assembler "\msext\M" } } */
> ...then why test for the presence of a sign-extension
> instruction in the test-case?
>
> IOW, shouldn't that be a scan-assember-not?

Yes indeed.

> (What am I missing?)

Nothing deep really, just a snafu on my side. I'll fix it in v2.

Thx,
-Vineet

> brgds, H-P
> PS. sorry I missed the Cauldron this year.  Hope to see you all next year!

Looking fwd to.

Thx,
-Vineet
Jeff Law Oct. 17, 2023, 4:07 a.m. UTC | #18
On 9/28/23 15:43, Vineet Gupta wrote:
> RISC-V suffers from extraneous sign extensions, despite/given the ABI
> guarantee that 32-bit quantities are sign-extended into 64-bit registers,
> meaning incoming SI function args need not be explicitly sign extended
> (so do SI return values as most ALU insns implicitly sign-extend too.)
> 
> Existing REE doesn't seem to handle this well and there are various ideas
> floating around to smarten REE about it.
> 
> RISC-V also seems to correctly implement middle-end hook PROMOTE_MODE
> etc.
> 
> Another approach would be to prevent EXPAND from generating the
> sign_extend in the first place which this patch tries to do.
> 
> The hunk being removed was introduced way back in 1994 as
>     5069803972 ("expand_expr, case CONVERT_EXPR .. clear the promotion flag")
> 
> This survived full testsuite run for RISC-V rv64gc with surprisingly no
> fallouts: test results before/after are exactly same.
> 
> |                               | # of unexpected case / # of unique unexpected case
> |                               |          gcc |          g++ |     gfortran |
> | rv64imafdc_zba_zbb_zbs_zicond/|  264 /    87 |    5 /     2 |   72 /    12 |
> |    lp64d/medlow
> 
> Granted for something so old to have survived, there must be a valid
> reason. Unfortunately the original change didn't have additional
> commentary or a test case. That is not to say it can't/won't possibly
> break things on other arches/ABIs, hence the RFC for someone to scream
> that this is just bonkers, don't do this :-)
> 
> I've explicitly CC'ed Jakub and Roger who have last touched subreg
> promoted notes in expr.cc for insight and/or screaming ;-)
> 
> Thanks to Robin for narrowing this down in an amazing debugging session
> @ GNU Cauldron.
> 
> ```
> foo2:
> 	sext.w	a6,a1             <-- this goes away
> 	beq	a1,zero,.L4
> 	li	a5,0
> 	li	a0,0
> .L3:
> 	addw	a4,a2,a5
> 	addw	a5,a3,a5
> 	addw	a0,a4,a0
> 	bltu	a5,a6,.L3
> 	ret
> .L4:
> 	li	a0,0
> 	ret
> ```
> 
> Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
> Co-developed-by: Robin Dapp <rdapp.gcc@gmail.com>
> ---
>   gcc/expr.cc                               |  7 -------
>   gcc/testsuite/gcc.target/riscv/pr111466.c | 15 +++++++++++++++
>   2 files changed, 15 insertions(+), 7 deletions(-)
>   create mode 100644 gcc/testsuite/gcc.target/riscv/pr111466.c
I created a ChangeLog and pushed this after a final bootstrap/comparison 
run on x86_64.

As I've noted before, this has been running across the various targets 
in my tester for quite a while with no issues.  Additionally Robin and 
myself have dug into various paths through expr_expr_real_2 and we're 
reasonably confident this is safe (about as much as we can be given the 
lack of information about the original patch).

My strong suspicion is that Michael M. made this code obsolete when he 
last revamped the gimple/ssa->RTL expansion path.

Thanks for your patience Vineet.  It's been a long road.


Jivan is diving into Joern's work.  It shows significant promise, though 
we are seeing some very weird behavior on perlbench.

jeff
Vineet Gupta Oct. 17, 2023, 6:06 p.m. UTC | #19
On 10/16/23 21:07, Jeff Law wrote:
>
>
> On 9/28/23 15:43, Vineet Gupta wrote:
>> RISC-V suffers from extraneous sign extensions, despite/given the ABI
>> guarantee that 32-bit quantities are sign-extended into 64-bit 
>> registers,
>> meaning incoming SI function args need not be explicitly sign extended
>> (so do SI return values as most ALU insns implicitly sign-extend too.)
>>
>> [...]
>> ---
>>   gcc/expr.cc                               |  7 -------
>>   gcc/testsuite/gcc.target/riscv/pr111466.c | 15 +++++++++++++++
>>   2 files changed, 15 insertions(+), 7 deletions(-)
>>   create mode 100644 gcc/testsuite/gcc.target/riscv/pr111466.c
> I created a ChangeLog and pushed this after a final 
> bootstrap/comparison run on x86_64.

Awesome.

> As I've noted before, this has been running across the various targets 
> in my tester for quite a while with no issues. Additionally Robin and 
> myself have dug into various paths through expr_expr_real_2 and we're 
> reasonably confident this is safe (about as much as we can be given 
> the lack of information about the original patch).
>
> My strong suspicion is that Michael M. made this code obsolete when he 
> last revamped the gimple/ssa->RTL expansion path.
>
> Thanks for your patience Vineet.  It's been a long road.

All the thanks to you for verifying this across targets and deep analysis.
There was a little snafu on my part in the test for which I'll post a fixup.


> Jivan is diving into Joern's work.  It shows significant promise, 
> though we are seeing some very weird behavior on perlbench.

That's great to hear. I was away from sign extension work much of last 
week. Back on it now. The prev example I was chasing 
(gcc.c-torture/compile/20040401-1.c) turned out to be a dead end as it 
has explicit casts and such so not an ideal case. I'm sifting through 
the logs and looking for better tests, there's a ton of them so I'm sure 
there's bunch more we can do at expand time to eliminate extensions 
early which as you mentioned is better in general, to have to "undo" 
less in later passes.

Thx,
-Vineet
diff mbox series

Patch

diff --git a/gcc/expr.cc b/gcc/expr.cc
index 308ddc09e631..d259c6e53385 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -9332,13 +9332,6 @@  expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 	  op0 = expand_expr (treeop0, target, VOIDmode,
 			     modifier);
 
-	  /* If the signedness of the conversion differs and OP0 is
-	     a promoted SUBREG, clear that indication since we now
-	     have to do the proper extension.  */
-	  if (TYPE_UNSIGNED (TREE_TYPE (treeop0)) != unsignedp
-	      && GET_CODE (op0) == SUBREG)
-	    SUBREG_PROMOTED_VAR_P (op0) = 0;
-
 	  return REDUCE_BIT_FIELD (op0);
 	}
 
diff --git a/gcc/testsuite/gcc.target/riscv/pr111466.c b/gcc/testsuite/gcc.target/riscv/pr111466.c
new file mode 100644
index 000000000000..007792466a51
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr111466.c
@@ -0,0 +1,15 @@ 
+/* Simplified varaint of gcc.target/riscv/zba-adduw.c.  */
+
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zba_zbs -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+int foo2(int unused, int n, unsigned y, unsigned delta){
+  int s = 0;
+  unsigned int x = 0;
+  for (;x<n;x +=delta)
+    s += x+y;
+  return s;
+}
+
+/* { dg-final { scan-assembler "\msext\M" } } */