diff mbox series

PR 91865: Avoid ZERO_EXTEND of ZERO_EXTEND in make_compound_operation.

Message ID 01f701d9feeb$ce8654e0$6b92fea0$@nextmovesoftware.com
State New
Headers show
Series PR 91865: Avoid ZERO_EXTEND of ZERO_EXTEND in make_compound_operation. | expand

Commit Message

Roger Sayle Oct. 14, 2023, 10:14 p.m. UTC
This patch is my proposed solution to PR rtl-optimization/91865.
Normally RTX simplification canonicalizes a ZERO_EXTEND of a ZERO_EXTEND
to a single ZERO_EXTEND, but as shown in this PR it is possible for
combine's make_compound_operation to unintentionally generate a
non-canonical ZERO_EXTEND of a ZERO_EXTEND, which is unlikely to be
matched by the backend.

For the new test case:

const int table[2] = {1, 2};
int foo (char i) { return table[i]; }

compiling with -O2 -mlarge on msp430 we currently see:

Trying 2 -> 7:
    2: r25:HI=zero_extend(R12:QI)
      REG_DEAD R12:QI
    7: r28:PSI=sign_extend(r25:HI)#0
      REG_DEAD r25:HI
Failed to match this instruction:
(set (reg:PSI 28 [ iD.1772 ])
    (zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ]))))

which results in the following code:

foo:    AND     #0xff, R12
        RLAM.A #4, R12 { RRAM.A #4, R12
        RLAM.A  #1, R12
        MOVX.W  table(R12), R12
        RETA

With this patch, we now see:

Trying 2 -> 7:
    2: r25:HI=zero_extend(R12:QI)
      REG_DEAD R12:QI
    7: r28:PSI=sign_extend(r25:HI)#0
      REG_DEAD r25:HI
Successfully matched this instruction:
(set (reg:PSI 28 [ iD.1772 ])
    (zero_extend:PSI (reg:QI 12 R12 [ iD.1772 ])))
allowing combination of insns 2 and 7
original costs 4 + 8 = 12
replacement cost 8

foo:    MOV.B   R12, R12
        RLAM.A  #1, R12
        MOVX.W  table(R12), R12
        RETA


This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?

2023-10-14  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
        PR rtl-optimization/91865
        * combine.cc (make_compound_operation): Avoid creating a
        ZERO_EXTEND of a ZERO_EXTEND.

gcc/testsuite/ChangeLog
        PR rtl-optimization/91865
        * gcc.target/msp430/pr91865.c: New test case.


Thanks in advance,
Roger
--

Comments

Jeff Law Oct. 14, 2023, 11:02 p.m. UTC | #1
On 10/14/23 16:14, Roger Sayle wrote:
> 
> This patch is my proposed solution to PR rtl-optimization/91865.
> Normally RTX simplification canonicalizes a ZERO_EXTEND of a ZERO_EXTEND
> to a single ZERO_EXTEND, but as shown in this PR it is possible for
> combine's make_compound_operation to unintentionally generate a
> non-canonical ZERO_EXTEND of a ZERO_EXTEND, which is unlikely to be
> matched by the backend.
> 
> For the new test case:
> 
> const int table[2] = {1, 2};
> int foo (char i) { return table[i]; }
> 
> compiling with -O2 -mlarge on msp430 we currently see:
> 
> Trying 2 -> 7:
>      2: r25:HI=zero_extend(R12:QI)
>        REG_DEAD R12:QI
>      7: r28:PSI=sign_extend(r25:HI)#0
>        REG_DEAD r25:HI
> Failed to match this instruction:
> (set (reg:PSI 28 [ iD.1772 ])
>      (zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ]))))
> 
> which results in the following code:
> 
> foo:    AND     #0xff, R12
>          RLAM.A #4, R12 { RRAM.A #4, R12
>          RLAM.A  #1, R12
>          MOVX.W  table(R12), R12
>          RETA
> 
> With this patch, we now see:
> 
> Trying 2 -> 7:
>      2: r25:HI=zero_extend(R12:QI)
>        REG_DEAD R12:QI
>      7: r28:PSI=sign_extend(r25:HI)#0
>        REG_DEAD r25:HI
> Successfully matched this instruction:
> (set (reg:PSI 28 [ iD.1772 ])
>      (zero_extend:PSI (reg:QI 12 R12 [ iD.1772 ])))
> allowing combination of insns 2 and 7
> original costs 4 + 8 = 12
> replacement cost 8
> 
> foo:    MOV.B   R12, R12
>          RLAM.A  #1, R12
>          MOVX.W  table(R12), R12
>          RETA
> 
> 
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
> 
> 2023-10-14  Roger Sayle  <roger@nextmovesoftware.com>
> 
> gcc/ChangeLog
>          PR rtl-optimization/91865
>          * combine.cc (make_compound_operation): Avoid creating a
>          ZERO_EXTEND of a ZERO_EXTEND.
> 
> gcc/testsuite/ChangeLog
>          PR rtl-optimization/91865
>          * gcc.target/msp430/pr91865.c: New test case.
Neither an ACK or NAK at this point.

The bug report includes a patch from Segher which purports to fix this 
in simplify-rtx.  Any thoughts on Segher's approach and whether or not 
it should be considered?

The BZ also indicates that removal of 2 patterns from msp430.md would 
solve this too (though it may cause regressions elsewhere?).  Any 
thoughts on that approach as well?

Thanks,
jeff
Roger Sayle Oct. 15, 2023, 9:49 a.m. UTC | #2
Hi Jeff,
Thanks for the speedy review(s).

> From: Jeff Law <jeffreyalaw@gmail.com>
> Sent: 15 October 2023 00:03
> To: Roger Sayle <roger@nextmovesoftware.com>; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] PR 91865: Avoid ZERO_EXTEND of ZERO_EXTEND in
> make_compound_operation.
> 
> On 10/14/23 16:14, Roger Sayle wrote:
> >
> > This patch is my proposed solution to PR rtl-optimization/91865.
> > Normally RTX simplification canonicalizes a ZERO_EXTEND of a
> > ZERO_EXTEND to a single ZERO_EXTEND, but as shown in this PR it is
> > possible for combine's make_compound_operation to unintentionally
> > generate a non-canonical ZERO_EXTEND of a ZERO_EXTEND, which is
> > unlikely to be matched by the backend.
> >
> > For the new test case:
> >
> > const int table[2] = {1, 2};
> > int foo (char i) { return table[i]; }
> >
> > compiling with -O2 -mlarge on msp430 we currently see:
> >
> > Trying 2 -> 7:
> >      2: r25:HI=zero_extend(R12:QI)
> >        REG_DEAD R12:QI
> >      7: r28:PSI=sign_extend(r25:HI)#0
> >        REG_DEAD r25:HI
> > Failed to match this instruction:
> > (set (reg:PSI 28 [ iD.1772 ])
> >      (zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ]))))
> >
> > which results in the following code:
> >
> > foo:    AND     #0xff, R12
> >          RLAM.A #4, R12 { RRAM.A #4, R12
> >          RLAM.A  #1, R12
> >          MOVX.W  table(R12), R12
> >          RETA
> >
> > With this patch, we now see:
> >
> > Trying 2 -> 7:
> >      2: r25:HI=zero_extend(R12:QI)
> >        REG_DEAD R12:QI
> >      7: r28:PSI=sign_extend(r25:HI)#0
> >        REG_DEAD r25:HI
> > Successfully matched this instruction:
> > (set (reg:PSI 28 [ iD.1772 ])
> >      (zero_extend:PSI (reg:QI 12 R12 [ iD.1772 ]))) allowing
> > combination of insns 2 and 7 original costs 4 + 8 = 12 replacement
> > cost 8
> >
> > foo:    MOV.B   R12, R12
> >          RLAM.A  #1, R12
> >          MOVX.W  table(R12), R12
> >          RETA
> >
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32}
> > with no new failures.  Ok for mainline?
> >
> > 2023-10-14  Roger Sayle  <roger@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> >          PR rtl-optimization/91865
> >          * combine.cc (make_compound_operation): Avoid creating a
> >          ZERO_EXTEND of a ZERO_EXTEND.
> >
> > gcc/testsuite/ChangeLog
> >          PR rtl-optimization/91865
> >          * gcc.target/msp430/pr91865.c: New test case.
> Neither an ACK or NAK at this point.
> 
> The bug report includes a patch from Segher which purports to fix this in simplify-
> rtx.  Any thoughts on Segher's approach and whether or not it should be
> considered?
> 
> The BZ also indicates that removal of 2 patterns from msp430.md would solve this
> too (though it may cause regressions elsewhere?).  Any thoughts on that approach
> as well?
> 

Great questions.  I believe Segher's proposed patch (in comment #4) was an
msp430-specific proof-of-concept workaround rather than intended to be fix.
Eliminating a ZERO_EXTEND simply by changing the mode of a hard register
is not a solution that'll work on many platforms (and therefore not really suitable
for target-independent middle-end code in the RTL optimizers).

For example, zero_extend:TI (and:QI (reg:QI hard_r1) (const_int 0x0f)) can't
universally be reduced to (and:TI (reg:TI hard_r1) (const_int 0x0f)).  Notice that
Segher's code doesn't check TARGET_HARD_REGNO_MODE_OK or 
TARGET_MODES_TIEABLE_P or any of the other backend hooks necessary
to confirm such a transformation is safe/possible.

Secondly, the hard register aspect is a bit of a red herring.  This work-around
fixes the issue in the original BZ description, but not the slightly modified test
case in comment #2 (with a global variable).  This doesn't have a hard register,
but does have the dubious ZERO_EXTEND/SIGN_EXTEND of a ZERO_EXTEND.

The underlying issue, which is applicable to all targets, is that combine.cc's
make_compound_operation is expected to reverse the local transformations
made by expand_compound_operation.  Hence, if an RTL expression is
canonical going into expand_compound_operation, it is expected (hoped)
to be canonical (and equivalent) coming out of make_compound_operation.

Hence, rather than be a MSP430 specific issue, no target should expect (or
be expected to see) a ZERO_EXTEND of a ZERO_EXTEND, or a SIGN_EXTEND
of a ZERO_EXTEND in the RTL stream.  Much like a binary operator with two
CONST_INT operands, or a shift by zero, it's something the middle-end might
reasonably be expected to clean-up. [Yeah, I know... 😊]

However, if it isn't considered a problem that make_compound_operand and
simplify_rtx generate different forms of the same expression, requiring the
backend to match multiple patterns (some for the combine pass, and others
for other RTL passes), there is a simple fix for MSP430; Just add a pattern 
that matches (the slightly odd):

> > (set (reg:PSI 28 [ iD.1772 ])
> >      (zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ]))))

As a rule of thumb, if the missed optimization bug report includes combine's
diagnostic "Failed to match this instruction:", things can be improved by adding
a pattern (often a define_insn_and_split) that matches the shown RTL.

In this case the perhaps reasonable assumption is that the above would/should
(normally) match the backend's existing (zero_extend:PSI (reg:QI ...)) insn pattern.
Or that's my understanding of why this PR is classified as rtl-optimization and
not target.

Finally, my patch shouldn't block a (updated corrected) variant of Segher's patch
or other solution to PR 91865.  The more (safe) solutions the merrier.

Thanks again.
Jeff Law Oct. 16, 2023, 7:27 p.m. UTC | #3
On 10/15/23 03:49, Roger Sayle wrote:
> 
> Hi Jeff,
> Thanks for the speedy review(s).
> 
>> From: Jeff Law <jeffreyalaw@gmail.com>
>> Sent: 15 October 2023 00:03
>> To: Roger Sayle <roger@nextmovesoftware.com>; gcc-patches@gcc.gnu.org
>> Subject: Re: [PATCH] PR 91865: Avoid ZERO_EXTEND of ZERO_EXTEND in
>> make_compound_operation.
>>
>> On 10/14/23 16:14, Roger Sayle wrote:
>>>
>>> This patch is my proposed solution to PR rtl-optimization/91865.
>>> Normally RTX simplification canonicalizes a ZERO_EXTEND of a
>>> ZERO_EXTEND to a single ZERO_EXTEND, but as shown in this PR it is
>>> possible for combine's make_compound_operation to unintentionally
>>> generate a non-canonical ZERO_EXTEND of a ZERO_EXTEND, which is
>>> unlikely to be matched by the backend.
>>>
>>> For the new test case:
>>>
>>> const int table[2] = {1, 2};
>>> int foo (char i) { return table[i]; }
>>>
>>> compiling with -O2 -mlarge on msp430 we currently see:
>>>
>>> Trying 2 -> 7:
>>>       2: r25:HI=zero_extend(R12:QI)
>>>         REG_DEAD R12:QI
>>>       7: r28:PSI=sign_extend(r25:HI)#0
>>>         REG_DEAD r25:HI
>>> Failed to match this instruction:
>>> (set (reg:PSI 28 [ iD.1772 ])
>>>       (zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ]))))
>>>
>>> which results in the following code:
>>>
>>> foo:    AND     #0xff, R12
>>>           RLAM.A #4, R12 { RRAM.A #4, R12
>>>           RLAM.A  #1, R12
>>>           MOVX.W  table(R12), R12
>>>           RETA
>>>
>>> With this patch, we now see:
>>>
>>> Trying 2 -> 7:
>>>       2: r25:HI=zero_extend(R12:QI)
>>>         REG_DEAD R12:QI
>>>       7: r28:PSI=sign_extend(r25:HI)#0
>>>         REG_DEAD r25:HI
>>> Successfully matched this instruction:
>>> (set (reg:PSI 28 [ iD.1772 ])
>>>       (zero_extend:PSI (reg:QI 12 R12 [ iD.1772 ]))) allowing
>>> combination of insns 2 and 7 original costs 4 + 8 = 12 replacement
>>> cost 8
>>>
>>> foo:    MOV.B   R12, R12
>>>           RLAM.A  #1, R12
>>>           MOVX.W  table(R12), R12
>>>           RETA
>>>
>>>
>>> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
>>> and make -k check, both with and without --target_board=unix{-m32}
>>> with no new failures.  Ok for mainline?
>>>
>>> 2023-10-14  Roger Sayle  <roger@nextmovesoftware.com>
>>>
>>> gcc/ChangeLog
>>>           PR rtl-optimization/91865
>>>           * combine.cc (make_compound_operation): Avoid creating a
>>>           ZERO_EXTEND of a ZERO_EXTEND.
>>>
>>> gcc/testsuite/ChangeLog
>>>           PR rtl-optimization/91865
>>>           * gcc.target/msp430/pr91865.c: New test case.
>> Neither an ACK or NAK at this point.
>>
>> The bug report includes a patch from Segher which purports to fix this in simplify-
>> rtx.  Any thoughts on Segher's approach and whether or not it should be
>> considered?
>>
>> The BZ also indicates that removal of 2 patterns from msp430.md would solve this
>> too (though it may cause regressions elsewhere?).  Any thoughts on that approach
>> as well?
>>
> 
> Great questions.  I believe Segher's proposed patch (in comment #4) was an
> msp430-specific proof-of-concept workaround rather than intended to be fix.
> Eliminating a ZERO_EXTEND simply by changing the mode of a hard register
> is not a solution that'll work on many platforms (and therefore not really suitable
> for target-independent middle-end code in the RTL optimizers).
Thanks.  I didn't really look at Segher's patch, so thanks for digging 
into it.  Certainly just flipping the mode of the hard register isn't 
correct.


> 
> The underlying issue, which is applicable to all targets, is that combine.cc's
> make_compound_operation is expected to reverse the local transformations
> made by expand_compound_operation.  Hence, if an RTL expression is
> canonical going into expand_compound_operation, it is expected (hoped)
> to be canonical (and equivalent) coming out of make_compound_operation.
In theory, correct.


> 
> Hence, rather than be a MSP430 specific issue, no target should expect (or
> be expected to see) a ZERO_EXTEND of a ZERO_EXTEND, or a SIGN_EXTEND
> of a ZERO_EXTEND in the RTL stream.  Much like a binary operator with two
> CONST_INT operands, or a shift by zero, it's something the middle-end might
> reasonably be expected to clean-up. [Yeah, I know... 😊]
Agreed.



> 
>>> (set (reg:PSI 28 [ iD.1772 ])
>>>       (zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ]))))
> 
> As a rule of thumb, if the missed optimization bug report includes combine's
> diagnostic "Failed to match this instruction:", things can be improved by adding
> a pattern (often a define_insn_and_split) that matches the shown RTL.
Yes, but we also need to ponder if that's the right way to fix any given 
problem.  Sometimes we're going to be better off simplifying elsewhere 
in the pipeline.  I think we can agree this is one of the cases where 
matching the RTL in the backend is not the desired approach.

Occasionally things like those two patterns show up for various reasons. 
  Hopefully they can be removed :-)  Though the first looks awful close 
to something I did for the mn102 (not to be confused with the mn103) 
eons ago.  Partial modes aren't exactly handled well.

> 
> In this case the perhaps reasonable assumption is that the above would/should
> (normally) match the backend's existing (zero_extend:PSI (reg:QI ...)) insn pattern.
> Or that's my understanding of why this PR is classified as rtl-optimization and
> not target.
I wouldn't put a lot of faith in the classification ;-)


> 
> Finally, my patch shouldn't block a (updated corrected) variant of Segher's patch
> or other solution to PR 91865.  The more (safe) solutions the merrier.
Generally agreed.

jeff
Richard Biener Oct. 17, 2023, 8:43 a.m. UTC | #4
On Mon, Oct 16, 2023 at 9:27 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
> On 10/15/23 03:49, Roger Sayle wrote:
> >
> > Hi Jeff,
> > Thanks for the speedy review(s).
> >
> >> From: Jeff Law <jeffreyalaw@gmail.com>
> >> Sent: 15 October 2023 00:03
> >> To: Roger Sayle <roger@nextmovesoftware.com>; gcc-patches@gcc.gnu.org
> >> Subject: Re: [PATCH] PR 91865: Avoid ZERO_EXTEND of ZERO_EXTEND in
> >> make_compound_operation.
> >>
> >> On 10/14/23 16:14, Roger Sayle wrote:
> >>>
> >>> This patch is my proposed solution to PR rtl-optimization/91865.
> >>> Normally RTX simplification canonicalizes a ZERO_EXTEND of a
> >>> ZERO_EXTEND to a single ZERO_EXTEND, but as shown in this PR it is
> >>> possible for combine's make_compound_operation to unintentionally
> >>> generate a non-canonical ZERO_EXTEND of a ZERO_EXTEND, which is
> >>> unlikely to be matched by the backend.
> >>>
> >>> For the new test case:
> >>>
> >>> const int table[2] = {1, 2};
> >>> int foo (char i) { return table[i]; }
> >>>
> >>> compiling with -O2 -mlarge on msp430 we currently see:
> >>>
> >>> Trying 2 -> 7:
> >>>       2: r25:HI=zero_extend(R12:QI)
> >>>         REG_DEAD R12:QI
> >>>       7: r28:PSI=sign_extend(r25:HI)#0
> >>>         REG_DEAD r25:HI
> >>> Failed to match this instruction:
> >>> (set (reg:PSI 28 [ iD.1772 ])
> >>>       (zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ]))))
> >>>
> >>> which results in the following code:
> >>>
> >>> foo:    AND     #0xff, R12
> >>>           RLAM.A #4, R12 { RRAM.A #4, R12
> >>>           RLAM.A  #1, R12
> >>>           MOVX.W  table(R12), R12
> >>>           RETA
> >>>
> >>> With this patch, we now see:
> >>>
> >>> Trying 2 -> 7:
> >>>       2: r25:HI=zero_extend(R12:QI)
> >>>         REG_DEAD R12:QI
> >>>       7: r28:PSI=sign_extend(r25:HI)#0
> >>>         REG_DEAD r25:HI
> >>> Successfully matched this instruction:
> >>> (set (reg:PSI 28 [ iD.1772 ])
> >>>       (zero_extend:PSI (reg:QI 12 R12 [ iD.1772 ]))) allowing
> >>> combination of insns 2 and 7 original costs 4 + 8 = 12 replacement
> >>> cost 8
> >>>
> >>> foo:    MOV.B   R12, R12
> >>>           RLAM.A  #1, R12
> >>>           MOVX.W  table(R12), R12
> >>>           RETA
> >>>
> >>>
> >>> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> >>> and make -k check, both with and without --target_board=unix{-m32}
> >>> with no new failures.  Ok for mainline?
> >>>
> >>> 2023-10-14  Roger Sayle  <roger@nextmovesoftware.com>
> >>>
> >>> gcc/ChangeLog
> >>>           PR rtl-optimization/91865
> >>>           * combine.cc (make_compound_operation): Avoid creating a
> >>>           ZERO_EXTEND of a ZERO_EXTEND.
> >>>
> >>> gcc/testsuite/ChangeLog
> >>>           PR rtl-optimization/91865
> >>>           * gcc.target/msp430/pr91865.c: New test case.
> >> Neither an ACK or NAK at this point.
> >>
> >> The bug report includes a patch from Segher which purports to fix this in simplify-
> >> rtx.  Any thoughts on Segher's approach and whether or not it should be
> >> considered?
> >>
> >> The BZ also indicates that removal of 2 patterns from msp430.md would solve this
> >> too (though it may cause regressions elsewhere?).  Any thoughts on that approach
> >> as well?
> >>
> >
> > Great questions.  I believe Segher's proposed patch (in comment #4) was an
> > msp430-specific proof-of-concept workaround rather than intended to be fix.
> > Eliminating a ZERO_EXTEND simply by changing the mode of a hard register
> > is not a solution that'll work on many platforms (and therefore not really suitable
> > for target-independent middle-end code in the RTL optimizers).
> Thanks.  I didn't really look at Segher's patch, so thanks for digging
> into it.  Certainly just flipping the mode of the hard register isn't
> correct.
>
>
> >
> > The underlying issue, which is applicable to all targets, is that combine.cc's
> > make_compound_operation is expected to reverse the local transformations
> > made by expand_compound_operation.  Hence, if an RTL expression is
> > canonical going into expand_compound_operation, it is expected (hoped)
> > to be canonical (and equivalent) coming out of make_compound_operation.
> In theory, correct.
>
>
> >
> > Hence, rather than be a MSP430 specific issue, no target should expect (or
> > be expected to see) a ZERO_EXTEND of a ZERO_EXTEND, or a SIGN_EXTEND
> > of a ZERO_EXTEND in the RTL stream.  Much like a binary operator with two
> > CONST_INT operands, or a shift by zero, it's something the middle-end might
> > reasonably be expected to clean-up. [Yeah, I know... 😊]
> Agreed.
>
>
>
> >
> >>> (set (reg:PSI 28 [ iD.1772 ])
> >>>       (zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ]))))
> >
> > As a rule of thumb, if the missed optimization bug report includes combine's
> > diagnostic "Failed to match this instruction:", things can be improved by adding
> > a pattern (often a define_insn_and_split) that matches the shown RTL.
> Yes, but we also need to ponder if that's the right way to fix any given
> problem.  Sometimes we're going to be better off simplifying elsewhere
> in the pipeline.  I think we can agree this is one of the cases where
> matching the RTL in the backend is not the desired approach.
>
> Occasionally things like those two patterns show up for various reasons.
>   Hopefully they can be removed :-)  Though the first looks awful close
> to something I did for the mn102 (not to be confused with the mn103)
> eons ago.  Partial modes aren't exactly handled well.
>
> >
> > In this case the perhaps reasonable assumption is that the above would/should
> > (normally) match the backend's existing (zero_extend:PSI (reg:QI ...)) insn pattern.
> > Or that's my understanding of why this PR is classified as rtl-optimization and
> > not target.
> I wouldn't put a lot of faith in the classification ;-)
>
>
> >
> > Finally, my patch shouldn't block a (updated corrected) variant of Segher's patch
> > or other solution to PR 91865.  The more (safe) solutions the merrier.
> Generally agreed.

Looking at the patch I wonder whether handling (zero_extend (zero_extend ..))
shouldn't be done by using simplify_unary_operation instead of
simplify_const_unary_operation
here?  If that's by design then I agree the patch looks reasonable (albeit ugly)
as long as the reverse still works.

But you probably need Seghers ack here.

Thanks,
Richard.


> jeff
Jeff Law Oct. 19, 2023, 3:20 p.m. UTC | #5
On 10/14/23 16:14, Roger Sayle wrote:
> 
> This patch is my proposed solution to PR rtl-optimization/91865.
> Normally RTX simplification canonicalizes a ZERO_EXTEND of a ZERO_EXTEND
> to a single ZERO_EXTEND, but as shown in this PR it is possible for
> combine's make_compound_operation to unintentionally generate a
> non-canonical ZERO_EXTEND of a ZERO_EXTEND, which is unlikely to be
> matched by the backend.
> 
> For the new test case:
> 
> const int table[2] = {1, 2};
> int foo (char i) { return table[i]; }
> 
> compiling with -O2 -mlarge on msp430 we currently see:
> 
> Trying 2 -> 7:
>      2: r25:HI=zero_extend(R12:QI)
>        REG_DEAD R12:QI
>      7: r28:PSI=sign_extend(r25:HI)#0
>        REG_DEAD r25:HI
> Failed to match this instruction:
> (set (reg:PSI 28 [ iD.1772 ])
>      (zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ]))))
> 
> which results in the following code:
> 
> foo:    AND     #0xff, R12
>          RLAM.A #4, R12 { RRAM.A #4, R12
>          RLAM.A  #1, R12
>          MOVX.W  table(R12), R12
>          RETA
> 
> With this patch, we now see:
> 
> Trying 2 -> 7:
>      2: r25:HI=zero_extend(R12:QI)
>        REG_DEAD R12:QI
>      7: r28:PSI=sign_extend(r25:HI)#0
>        REG_DEAD r25:HI
> Successfully matched this instruction:
> (set (reg:PSI 28 [ iD.1772 ])
>      (zero_extend:PSI (reg:QI 12 R12 [ iD.1772 ])))
> allowing combination of insns 2 and 7
> original costs 4 + 8 = 12
> replacement cost 8
> 
> foo:    MOV.B   R12, R12
>          RLAM.A  #1, R12
>          MOVX.W  table(R12), R12
>          RETA
> 
> 
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
> 
> 2023-10-14  Roger Sayle  <roger@nextmovesoftware.com>
> 
> gcc/ChangeLog
>          PR rtl-optimization/91865
>          * combine.cc (make_compound_operation): Avoid creating a
>          ZERO_EXTEND of a ZERO_EXTEND.
Final question.  Is there a reasonable expectation that we could get a 
similar situation with sign extensions?   If so we probably ought to try 
and handle both.

OK with the obvious change to handle nested sign extensions if you think 
it's useful to do so.  And OK as-is if you don't think handling nested 
sign extensions is useful.

jeff
diff mbox series

Patch

diff --git a/gcc/combine.cc b/gcc/combine.cc
index 360aa2f25e6..f47ff596782 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -8453,6 +8453,9 @@  make_compound_operation (rtx x, enum rtx_code in_code)
 					    new_rtx, GET_MODE (XEXP (x, 0)));
       if (tem)
 	return tem;
+      /* Avoid creating a ZERO_EXTEND of a ZERO_EXTEND.  */
+      if (GET_CODE (new_rtx) == ZERO_EXTEND)
+	new_rtx = XEXP (new_rtx, 0);
       SUBST (XEXP (x, 0), new_rtx);
       return x;
     }
diff --git a/gcc/testsuite/gcc.target/msp430/pr91865.c b/gcc/testsuite/gcc.target/msp430/pr91865.c
new file mode 100644
index 00000000000..8cc21c8b9e8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/msp430/pr91865.c
@@ -0,0 +1,8 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -mlarge" } */
+
+const int table[2] = {1, 2};
+int foo (char i) { return table[i]; }
+
+/* { dg-final { scan-assembler-not "AND" } } */
+/* { dg-final { scan-assembler-not "RRAM" } } */