Message ID | 20231019040519.2655598-1-pinskia@gmail.com |
---|---|
State | New |
Series | aarch64: [PR110986] Emit csinv again for `a ? ~b : b` |
Andrew Pinski <pinskia@gmail.com> writes:
> After r14-3110-g7fb65f10285, the canonical form for
> `a ? ~b : b` changed to be `-(a) ^ b` that means
> for aarch64 we need to add a few new insn patterns
> to be able to catch this and change it to be
> what is the canonical form for the aarch64 backend.
> A secondary pattern was needed to support a zero_extended
> form too; this adds a testcase for all 3 cases.

From the comment in the patch, it sounds like we don't really have
a target-independent canonical form.  That is, we can't just rewrite
the old pattern to use the new form.  It would be nice if there were
a canonical form, but I won't push it.

> Bootstrapped and tested on aarch64-linux-gnu with no regressions.
>
> 	PR target/110986
>
> gcc/ChangeLog:
>
> 	* config/aarch64/aarch64.md (*cmov<mode>_insn_insv): New pattern.
> 	(*cmov_uxtw_insn_insv): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/aarch64/cond_op-1.c: New test.
> ---
>  gcc/config/aarch64/aarch64.md                | 46 ++++++++++++++++++++
>  gcc/testsuite/gcc.target/aarch64/cond_op-1.c | 20 +++++++++
>  2 files changed, 66 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/cond_op-1.c
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 32c7adc8928..59cd0415937 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -4413,6 +4413,52 @@ (define_insn "*csinv3_uxtw_insn3"
>    [(set_attr "type" "csel")]
>  )
>
> +;; There are two canonical forms for `cmp ? ~a : a`.
> +;; This is the second form and is here to help combine.
> +;; Support `-(cmp) ^ a` into `cmp ? ~a : a`
> +;; The second pattern is to support the zero extend'ed version.
> +
> +(define_insn_and_split "*cmov<mode>_insn_insv"
> +  [(set (match_operand:GPI 0 "register_operand" "=r")
> +        (xor:GPI
> +         (neg:GPI
> +          (match_operator:GPI 1 "aarch64_comparison_operator"
> +           [(match_operand 2 "cc_register" "") (const_int 0)]))
> +         (match_operand:GPI 3 "general_operand" "r")))]
> +  "can_create_pseudo_p ()"
> +  "#"
> +  "&& true"

IMO this is an ICE trap, since it hard-codes the assumption that there
will be a split pass after the last pre-LRA call to recog.  I think we
should just provide the asm directly instead.

Looks good otherwise, thanks.

Richard

> +  [(set (match_dup 0)
> +        (if_then_else:GPI (match_dup 1)
> +                          (not:GPI (match_dup 3))
> +                          (match_dup 3)))]
> +  {
> +    operands[3] = force_reg (<MODE>mode, operands[3]);
> +  }
> +  [(set_attr "type" "csel")]
> +)
> +
> +(define_insn_and_split "*cmov_uxtw_insn_insv"
> +  [(set (match_operand:DI 0 "register_operand" "=r")
> +        (zero_extend:DI
> +         (xor:SI
> +          (neg:SI
> +           (match_operator:SI 1 "aarch64_comparison_operator"
> +            [(match_operand 2 "cc_register" "") (const_int 0)]))
> +          (match_operand:SI 3 "general_operand" "r"))))]
> +  "can_create_pseudo_p ()"
> +  "#"
> +  "&& true"
> +  [(set (match_dup 0)
> +        (if_then_else:DI (match_dup 1)
> +                         (zero_extend:DI (not:SI (match_dup 3)))
> +                         (zero_extend:DI (match_dup 3))))]
> +  {
> +    operands[3] = force_reg (SImode, operands[3]);
> +  }
> +  [(set_attr "type" "csel")]
> +)
> +
>  ;; If X can be loaded by a single CNT[BHWD] instruction,
>  ;;
>  ;;    A = UMAX (B, X)
> diff --git a/gcc/testsuite/gcc.target/aarch64/cond_op-1.c b/gcc/testsuite/gcc.target/aarch64/cond_op-1.c
> new file mode 100644
> index 00000000000..e6c7821127e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/cond_op-1.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +/* PR target/110986 */
> +
> +
> +long long full(unsigned a, unsigned b)
> +{
> +  return a ? ~b : b;
> +}
> +unsigned fuu(unsigned a, unsigned b)
> +{
> +  return a ? ~b : b;
> +}
> +long long fllll(unsigned long long a, unsigned long long b)
> +{
> +  return a ? ~b : b;
> +}
> +
> +/* { dg-final { scan-assembler-times "csinv\tw\[0-9\]*" 2 } } */
> +/* { dg-final { scan-assembler-times "csinv\tx\[0-9\]*" 1 } } */
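
For readers following the thread, the two forms being discussed (`a ? ~b : b` and `-(a) ^ b`) can be compared at the C level. The sketch below is only an illustration and is not part of the patch: the helper names `select_form`, `xor_form` and `forms_agree` are made up here, and the code assumes the comparison result is 0 or 1, so that negating it yields an all-zeros or all-ones mask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Source-level form: pick ~b when cond is nonzero, b otherwise. */
static uint64_t select_form(int cond, uint64_t b)
{
    return cond ? ~b : b;
}

/* XOR form: the negated 0/1 comparison result is either all-zeros or
   all-ones, so the XOR leaves b unchanged or inverts every bit. */
static uint64_t xor_form(int cond, uint64_t b)
{
    uint64_t mask = (uint64_t)0 - (uint64_t)(cond != 0);
    return mask ^ b;
}

/* Hypothetical self-check (not from the patch): asserts that both forms
   agree on a few sample values and returns 1 when every check passes. */
static int forms_agree(void)
{
    static const uint64_t samples[] = {
        0, 1, 0x7fffffffu, 0xffffffffu, 0x123456789abcdef0ull, UINT64_MAX
    };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        assert(select_form(0, samples[i]) == xor_form(0, samples[i]));
        assert(select_form(1, samples[i]) == xor_form(1, samples[i]));
    }
    return 1;
}
```

Calling `forms_agree()` from a small driver exercises both branches of the condition. This only shows that the two expressions compute the same value; which of them the compiler treats as canonical is the subject of the thread.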
On 20/10/2023 13:13, Richard Sandiford wrote:
>> +(define_insn_and_split "*cmov<mode>_insn_insv"
>> +  [(set (match_operand:GPI 0 "register_operand" "=r")
>> +        (xor:GPI
>> +         (neg:GPI
>> +          (match_operator:GPI 1 "aarch64_comparison_operator"
>> +           [(match_operand 2 "cc_register" "") (const_int 0)]))
>> +         (match_operand:GPI 3 "general_operand" "r")))]
>> +  "can_create_pseudo_p ()"
>> +  "#"
>> +  "&& true"
>
> IMO this is an ICE trap, since it hard-codes the assumption that there
> will be a split pass after the last pre-LRA call to recog.  I think we
> should just provide the asm directly instead.

So why not add

  (clobber (match_operand:GPI 4 "register_operand" "=&r"))

to the insn, then you'll always get the scratch needed and the need to
check can_create_pseudo_p goes away.

R.
Richard Earnshaw <Richard.Earnshaw@foss.arm.com> writes:
> On 20/10/2023 13:13, Richard Sandiford wrote:
>>> +(define_insn_and_split "*cmov<mode>_insn_insv"
>>> +  [(set (match_operand:GPI 0 "register_operand" "=r")
>>> +        (xor:GPI
>>> +         (neg:GPI
>>> +          (match_operator:GPI 1 "aarch64_comparison_operator"
>>> +           [(match_operand 2 "cc_register" "") (const_int 0)]))
>>> +         (match_operand:GPI 3 "general_operand" "r")))]
>>> +  "can_create_pseudo_p ()"
>>> +  "#"
>>> +  "&& true"
>>
>> IMO this is an ICE trap, since it hard-codes the assumption that there
>> will be a split pass after the last pre-LRA call to recog.  I think we
>> should just provide the asm directly instead.
>
> So why not add
>
>   (clobber (match_operand:GPI 4 "register_operand" "=&r"))
>
> to the insn, then you'll always get the scratch needed and the need to
> check can_create_pseudo_p goes away.

I think the "general_operand" "r" works in terms of ensuring that the
source is a GPR, so we shouldn't need a separate clobber.

Our off-list discussion made me realise that my concern above wasn't
very clear.  In principle, it should be possible for any pass to clear
INSN_CODE and then rerecognise the pattern using recog.  So I think it's
wrong (or at least dangerous) for insns to require can_create_pseudo_p.
It means that an insn starts out valid and suddenly becomes invalid
halfway through RTL compilation.

But looking at it again, the patch seems correct with just the
can_create_pseudo_p conditions removed.  The constraints seem to satisfy
what csinv requires, and the force_reg should be a no-op after RA.

So the patch is OK with just the can_create_pseudo_p tests removed.

Sorry for the run-around, and thanks for pushing back.

Richard
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 32c7adc8928..59cd0415937 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4413,6 +4413,52 @@ (define_insn "*csinv3_uxtw_insn3"
   [(set_attr "type" "csel")]
 )
 
+;; There are two canonical forms for `cmp ? ~a : a`.
+;; This is the second form and is here to help combine.
+;; Support `-(cmp) ^ a` into `cmp ? ~a : a`
+;; The second pattern is to support the zero extend'ed version.
+
+(define_insn_and_split "*cmov<mode>_insn_insv"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+        (xor:GPI
+         (neg:GPI
+          (match_operator:GPI 1 "aarch64_comparison_operator"
+           [(match_operand 2 "cc_register" "") (const_int 0)]))
+         (match_operand:GPI 3 "general_operand" "r")))]
+  "can_create_pseudo_p ()"
+  "#"
+  "&& true"
+  [(set (match_dup 0)
+        (if_then_else:GPI (match_dup 1)
+                          (not:GPI (match_dup 3))
+                          (match_dup 3)))]
+  {
+    operands[3] = force_reg (<MODE>mode, operands[3]);
+  }
+  [(set_attr "type" "csel")]
+)
+
+(define_insn_and_split "*cmov_uxtw_insn_insv"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+        (zero_extend:DI
+         (xor:SI
+          (neg:SI
+           (match_operator:SI 1 "aarch64_comparison_operator"
+            [(match_operand 2 "cc_register" "") (const_int 0)]))
+          (match_operand:SI 3 "general_operand" "r"))))]
+  "can_create_pseudo_p ()"
+  "#"
+  "&& true"
+  [(set (match_dup 0)
+        (if_then_else:DI (match_dup 1)
+                         (zero_extend:DI (not:SI (match_dup 3)))
+                         (zero_extend:DI (match_dup 3))))]
+  {
+    operands[3] = force_reg (SImode, operands[3]);
+  }
+  [(set_attr "type" "csel")]
+)
+
 ;; If X can be loaded by a single CNT[BHWD] instruction,
 ;;
 ;;    A = UMAX (B, X)
diff --git a/gcc/testsuite/gcc.target/aarch64/cond_op-1.c b/gcc/testsuite/gcc.target/aarch64/cond_op-1.c
new file mode 100644
index 00000000000..e6c7821127e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cond_op-1.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* PR target/110986 */
+
+
+long long full(unsigned a, unsigned b)
+{
+  return a ? ~b : b;
+}
+unsigned fuu(unsigned a, unsigned b)
+{
+  return a ? ~b : b;
+}
+long long fllll(unsigned long long a, unsigned long long b)
+{
+  return a ? ~b : b;
+}
+
+/* { dg-final { scan-assembler-times "csinv\tw\[0-9\]*" 2 } } */
+/* { dg-final { scan-assembler-times "csinv\tx\[0-9\]*" 1 } } */
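
The second pattern in the patch covers the case where the 32-bit result is zero-extended to 64 bits, as in `full()` from the test above. A minimal C sketch of that equivalence follows. It is an illustration under the same assumption of a 0/1 comparison result, and the names `select_then_extend`, `xor_then_extend` and `extended_forms_agree` are hypothetical, not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors full(): a 32-bit select whose result is widened to 64 bits.
   Both arms are unsigned 32-bit values, so the widening zero-extends. */
static uint64_t select_then_extend(uint32_t a, uint32_t b)
{
    return a ? (uint64_t)(uint32_t)~b : (uint64_t)b;
}

/* 32-bit XOR form: build the mask and XOR in 32 bits, then zero-extend. */
static uint64_t xor_then_extend(uint32_t a, uint32_t b)
{
    uint32_t mask = (uint32_t)0 - (uint32_t)(a != 0);
    return (uint64_t)(uint32_t)(mask ^ b);
}

/* Hypothetical self-check (not from the patch): asserts agreement on a
   few sample values and returns 1 when every check passes. */
static int extended_forms_agree(void)
{
    static const uint32_t samples[] = { 0, 1, 0x7fffffffu, 0x80000000u, UINT32_MAX };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        assert(select_then_extend(0, samples[i]) == xor_then_extend(0, samples[i]));
        assert(select_then_extend(7, samples[i]) == xor_then_extend(7, samples[i]));
    }
    return 1;
}
```

In both helpers the upper 32 bits of the result stay clear, which is the property the `zero_extend:DI` wrapper expresses in the pattern.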