Message ID: VI1PR08MB4029BD6710802D23EA73FF24EAAD0@VI1PR08MB4029.eurprd08.prod.outlook.com
State: New
Series: aarch64: prefer using csinv, csneg in zero extend contexts
Thanks for doing this.

Alex Coplan <Alex.Coplan@arm.com> writes:
> Hello,
>
> The attached patch adds an optimization to the AArch64 backend to catch
> additional cases where we can use csinv and csneg.
>
> Given the C code:
>
> unsigned long long inv(unsigned a, unsigned b, unsigned c)
> {
>   return a ? b : ~c;
> }
>
> Prior to this patch, AArch64 GCC at -O2 generates:
>
> inv:
>         cmp     w0, 0
>         mvn     w2, w2
>         csel    w0, w1, w2, ne
>         ret
>
> and after applying the patch, we get:
>
> inv:
>         cmp     w0, 0
>         csinv   w0, w1, w2, ne
>         ret
>
> The new pattern also catches the optimization for the symmetric case where the body of inv reads a ? ~b : c.

Yeah. The thing that surprised me was that the non-extending form has the operator in the "then" arm of the if_then_else:

(define_insn "*csinv3<mode>_insn"
  [(set (match_operand:GPI 0 "register_operand" "=r")
	(if_then_else:GPI
	  (match_operand 1 "aarch64_comparison_operation" "")
	  (not:GPI (match_operand:GPI 2 "register_operand" "r"))
	  (match_operand:GPI 3 "aarch64_reg_or_zero" "rZ")))]
  ""
  "csinv\\t%<w>0, %<w>3, %<w>2, %M1"
  [(set_attr "type" "csel")]
)

whereas the new pattern has it in the "else" arm. I think for the non-extending form, having the operator in the "then" arm really is canonical and close to guaranteed, since that's how ifcvt processes half diamonds.

But when both values are zero-extended, ifcvt has to convert a full diamond, and I'm not sure we can rely on the two sides coming out in a particular order. I think the two ?: orders above work with one pattern thanks to simplifications that happen before entering gimple. If instead the operator is split out into a separate statement:

unsigned long long
inv1(int a, unsigned b, unsigned c)
{
  unsigned d = ~c;
  return a ? b : d;
}

then we go back to using the less optimal CSEL sequence.

So I think it would be worth having a second pattern for the opposite order. Alternatively, we could add a new rtl canonicalisation rule to force the if_then_else operands to end up in a particular order, but that's more complex and is likely to affect other targets.

> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index c7c4d1dd519..2f7367c0b1a 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -4390,6 +4390,19 @@
>    [(set_attr "type" "csel")]
>  )
>
> +(define_insn "*csinv3_uxtw_insn"
> +  [(set (match_operand:DI 0 "register_operand" "=r")
> +	(if_then_else:DI
> +	  (match_operand 1 "aarch64_comparison_operation" "")
> +	  (zero_extend:DI
> +	    (match_operand:SI 2 "aarch64_reg_or_zero" "rZ"))
> +	  (zero_extend:DI
> +	    (NEG_NOT:SI (match_operand:SI 3 "register_operand" "r")))))]
> +  ""
> +  "cs<neg_not_cs>\\t%w0, %w2, %w3, %m1"
> +  [(set_attr "type" "csel")]
> +)
> +

The operand to a zero_extend can never be a const_int, so operand 2 should just be a register_operand with an "r" constraint.

At the risk of feature creep :-) a useful third pattern could be to combine a zero-extended operator result with an existing DImode value. In that case, the existing DImode value really can be "rZ" and should always be in the "else" arm of the if_then_else. E.g.:

unsigned long long
f(int a, unsigned long b, unsigned c)
{
  return a ? b : ~c;
}

unsigned long long
g(int a, unsigned c)
{
  return a ? 0 : ~c;
}

But that's definitely something that can be left for later.

Thanks,
Richard
Hi Richard,

Many thanks for the detailed review. I've attached an updated patch based on your comments (bootstrapped and regtested on aarch64-linux).

> -----Original Message-----
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: 29 April 2020 17:16
> To: Alex Coplan <Alex.Coplan@arm.com>
> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw <Richard.Earnshaw@arm.com>;
> Marcus Shawcroft <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov
> <Kyrylo.Tkachov@arm.com>; nd <nd@arm.com>
> Subject: Re: [PATCH] aarch64: prefer using csinv, csneg in zero extend contexts
>
> Yeah. The thing that surprised me was that the non-extending form
> has the operator in the "then" arm of the if_then_else:
>
> (define_insn "*csinv3<mode>_insn"
>   [(set (match_operand:GPI 0 "register_operand" "=r")
> 	(if_then_else:GPI
> 	  (match_operand 1 "aarch64_comparison_operation" "")
> 	  (not:GPI (match_operand:GPI 2 "register_operand" "r"))
> 	  (match_operand:GPI 3 "aarch64_reg_or_zero" "rZ")))]
>   ""
>   "csinv\\t%<w>0, %<w>3, %<w>2, %M1"
>   [(set_attr "type" "csel")]
> )
>
> whereas the new pattern has it in the "else" arm. I think for the
> non-extending form, having the operator in the "then" arm really is
> canonical and close to guaranteed, since that's how ifcvt processes
> half diamonds.
>
> But when both values are zero-extended, ifcvt has to convert a full
> diamond, and I'm not sure we can rely on the two sides coming out in
> a particular order. I think the two ?: orders above work with one pattern
> thanks to simplifications that happen before entering gimple. If instead
> the operator is split out into a separate statement:
>
> unsigned long long
> inv1(int a, unsigned b, unsigned c)
> {
>   unsigned d = ~c;
>   return a ? b : d;
> }
>
> then we go back to using the less optimal CSEL sequence.
>
> So I think it would be worth having a second pattern for the opposite
> order.

Agreed. It did seem a bit magical that we would get the other case for free. I added this function to the tests and observed that we were getting the less optimal CSEL sequence until I added the rule with the NEG_NOT on the other arm.

> > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> > index c7c4d1dd519..2f7367c0b1a 100644
> > --- a/gcc/config/aarch64/aarch64.md
> > +++ b/gcc/config/aarch64/aarch64.md
> > @@ -4390,6 +4390,19 @@
> >    [(set_attr "type" "csel")]
> >  )
> >
> > +(define_insn "*csinv3_uxtw_insn"
> > +  [(set (match_operand:DI 0 "register_operand" "=r")
> > +	(if_then_else:DI
> > +	  (match_operand 1 "aarch64_comparison_operation" "")
> > +	  (zero_extend:DI
> > +	    (match_operand:SI 2 "aarch64_reg_or_zero" "rZ"))
> > +	  (zero_extend:DI
> > +	    (NEG_NOT:SI (match_operand:SI 3 "register_operand" "r")))))]
> > +  ""
> > +  "cs<neg_not_cs>\\t%w0, %w2, %w3, %m1"
> > +  [(set_attr "type" "csel")]
> > +)
> > +
>
> The operand to a zero_extend can never be a const_int, so operand 2
> should just be a register_operand with an "r" constraint.

OK, makes sense. I've fixed this in the updated patch.

> At the risk of feature creep :-) a useful third pattern could be
> to combine a zero-extended operator result with an existing DImode value.
> In that case, the existing DImode value really can be "rZ" and should
> always be in the "else" arm of the if_then_else. E.g.:
>
> unsigned long long
> f(int a, unsigned long b, unsigned c)
> {
>   return a ? b : ~c;
> }
>
> unsigned long long
> g(int a, unsigned c)
> {
>   return a ? 0 : ~c;
> }
>
> But that's definitely something that can be left for later.

Hm. What sequence would you like to see for the function f here with the argument already in DImode? I don't think we can do it with just a CMP + CSINV like in the other test cases because you end up wanting to access b as x1 and c as w2, which is not allowed.
Indeed, clang generates a MVN + CSEL sequence where the CSEL operates on the 64-bit registers:

f:
        mvn     w8, w2
        cmp     w0, #0
        csel    x0, x8, x1, eq
        ret

However, I agree that the restricted case where the else arm is (const_int 0) is a worthwhile optimization (that wasn't caught by the previous patterns due to const_ints never being zero_extended, as you mentioned). I've added a pattern and tests for this case in the updated patch.

Many thanks,
Alex

---

gcc/ChangeLog:

2020-04-30  Alex Coplan  <alex.coplan@arm.com>

	* config/aarch64/aarch64.md (*csinv3_uxtw_insn1): New.
	(*csinv3_uxtw_insn2): New.
	(*csinv3_uxtw_insn3): New.
	* config/aarch64/iterators.md (neg_not_cs): New.

gcc/testsuite/ChangeLog:

2020-04-30  Alex Coplan  <alex.coplan@arm.com>

	* gcc.target/aarch64/csinv-neg.c: New test.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index c7c4d1dd519..37d651724ad 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4390,6 +4390,45 @@
   [(set_attr "type" "csel")]
 )
 
+(define_insn "*csinv3_uxtw_insn1"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+	(if_then_else:DI
+	  (match_operand 1 "aarch64_comparison_operation" "")
+	  (zero_extend:DI
+	    (match_operand:SI 2 "register_operand" "r"))
+	  (zero_extend:DI
+	    (NEG_NOT:SI (match_operand:SI 3 "register_operand" "r")))))]
+  ""
+  "cs<neg_not_cs>\\t%w0, %w2, %w3, %m1"
+  [(set_attr "type" "csel")]
+)
+
+(define_insn "*csinv3_uxtw_insn2"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+	(if_then_else:DI
+	  (match_operand 1 "aarch64_comparison_operation" "")
+	  (zero_extend:DI
+	    (NEG_NOT:SI (match_operand:SI 2 "register_operand" "r")))
+	  (zero_extend:DI
+	    (match_operand:SI 3 "register_operand" "r"))))]
+  ""
+  "cs<neg_not_cs>\\t%w0, %w3, %w2, %M1"
+  [(set_attr "type" "csel")]
+)
+
+(define_insn "*csinv3_uxtw_insn3"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+	(if_then_else:DI
+	  (match_operand 1 "aarch64_comparison_operation" "")
+	  (zero_extend:DI
+	    (NEG_NOT:SI (match_operand:SI 2 "register_operand" "r")))
+	  (const_int 0)))]
+  ""
+  "cs<neg_not_cs>\\t%w0, wzr, %w2, %M1"
+  [(set_attr "type" "csel")]
+)
+
+
 ;; If X can be loaded by a single CNT[BHWD] instruction,
 ;;
 ;; A = UMAX (B, X)
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 8e434389e59..a568cf21b99 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -1932,6 +1932,9 @@
 ;; Operation names for negate and bitwise complement.
 (define_code_attr neg_not_op [(neg "neg") (not "not")])
 
+;; csinv, csneg insn suffixes.
+(define_code_attr neg_not_cs [(neg "neg") (not "inv")])
+
 ;; Similar, but when the second operand is inverted.
 (define_code_attr nlogical [(and "bic") (ior "orn") (xor "eon")])
diff --git a/gcc/testsuite/gcc.target/aarch64/csinv-neg.c b/gcc/testsuite/gcc.target/aarch64/csinv-neg.c
new file mode 100644
index 00000000000..cc64b4094d7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/csinv-neg.c
@@ -0,0 +1,104 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+/*
+** inv1:
+**	cmp	w0, 0
+**	csinv	w0, w1, w2, ne
+**	ret
+*/
+unsigned long long
+inv1(unsigned a, unsigned b, unsigned c)
+{
+  return a ? b : ~c;
+}
+
+/*
+** inv1_local:
+**	cmp	w0, 0
+**	csinv	w0, w1, w2, ne
+**	ret
+*/
+unsigned long long
+inv1_local(unsigned a, unsigned b, unsigned c)
+{
+  unsigned d = ~c;
+  return a ? b : d;
+}
+
+/*
+** inv_zero1:
+**	cmp	w0, 0
+**	csinv	w0, wzr, w1, ne
+**	ret
+*/
+unsigned long long
+inv_zero1(unsigned a, unsigned b)
+{
+  return a ? 0 : ~b;
+}
+
+/*
+** inv_zero2:
+**	cmp	w0, 0
+**	csinv	w0, wzr, w1, eq
+**	ret
+*/
+unsigned long long
+inv_zero2(unsigned a, unsigned b)
+{
+  return a ? ~b : 0;
+}
+
+
+/*
+** inv2:
+**	cmp	w0, 0
+**	csinv	w0, w2, w1, eq
+**	ret
+*/
+unsigned long long
+inv2(unsigned a, unsigned b, unsigned c)
+{
+  return a ? ~b : c;
+}
+
+/*
+** inv2_local:
+**	cmp	w0, 0
+**	csinv	w0, w2, w1, eq
+**	ret
+*/
+unsigned long long
+inv2_local(unsigned a, unsigned b, unsigned c)
+{
+  unsigned d = ~b;
+  return a ? d : c;
+}
+
+/*
+** neg1:
+**	cmp	w0, 0
+**	csneg	w0, w1, w2, ne
+**	ret
+*/
+unsigned long long
+neg1(unsigned a, unsigned b, unsigned c)
+{
+  return a ? b : -c;
+}
+
+
+/*
+** neg2:
+**	cmp	w0, 0
+**	csneg	w0, w2, w1, eq
+**	ret
+*/
+unsigned long long
+neg2(unsigned a, unsigned b, unsigned c)
+{
+  return a ? -b : c;
+}
+
+/* { dg-final { check-function-bodies "**" "" "" } } */
Alex Coplan <Alex.Coplan@arm.com> writes:
>> At the risk of feature creep :-) a useful third pattern could be
>> to combine a zero-extended operator result with an existing DImode value.
>> In that case, the existing DImode value really can be "rZ" and should
>> always be in the "else" arm of the if_then_else. E.g.:
>>
>> unsigned long long
>> f(int a, unsigned long b, unsigned c)
>> {
>>   return a ? b : ~c;
>> }
>>
>> unsigned long long
>> g(int a, unsigned c)
>> {
>>   return a ? 0 : ~c;
>> }
>>
>> But that's definitely something that can be left for later.
>
> Hm. What sequence would you like to see for the function f here with the
> argument already in DImode? I don't think we can do it with just a CMP + CSINV
> like in the other test cases because you end up wanting to access b as x1 and c
> as w2 which is not allowed.

Yeah, I was hoping for something like...

> Indeed, clang generates a MVN + CSEL sequence where the CSEL operates on the
> 64-bit registers:
>
> f:
>         mvn     w8, w2
>         cmp     w0, #0
>         csel    x0, x8, x1, eq
>         ret

...this rather than the 4-insn (+ret) sequence that we currently generate. So it would have been a define_insn_and_split that handles the zero case directly but splits into the "optimal" two-instruction sequence for registers.

But I guess the underlying problem is instead that we don't have a pattern for (zero_extend:DI (not:SI ...)). Adding that would definitely be a better fix.

> However, I agree that the restricted case where the else arm is (const_int 0) is
> a worthwhile optimization (that wasn't caught by the previous patterns due to
> const_ints never being zero_extended as you mentioned). I've added a pattern and
> tests for this case in the updated patch.

Looks good, just a couple of minor things:

> 2020-04-30  Alex Coplan  <alex.coplan@arm.com>
>
> 	* config/aarch64/aarch64.md (*csinv3_uxtw_insn1): New.
> 		(*csinv3_uxtw_insn2): New.
> 		(*csinv3_uxtw_insn3): New.

ChangeLog trivia, but these last two lines should only be indented by a tab. Hopefully one day we'll finally ditch this format and stop having to quibble over such minor formatting stuff...

> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index c7c4d1dd519..37d651724ad 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -4390,6 +4390,45 @@
>    [(set_attr "type" "csel")]
>  )
>
> +(define_insn "*csinv3_uxtw_insn1"
> +  [(set (match_operand:DI 0 "register_operand" "=r")
> +	(if_then_else:DI
> +	  (match_operand 1 "aarch64_comparison_operation" "")
> +	  (zero_extend:DI
> +	    (match_operand:SI 2 "register_operand" "r"))
> +	  (zero_extend:DI
> +	    (NEG_NOT:SI (match_operand:SI 3 "register_operand" "r")))))]
> +  ""
> +  "cs<neg_not_cs>\\t%w0, %w2, %w3, %m1"
> +  [(set_attr "type" "csel")]
> +)
> +
> +(define_insn "*csinv3_uxtw_insn2"
> +  [(set (match_operand:DI 0 "register_operand" "=r")
> +	(if_then_else:DI
> +	  (match_operand 1 "aarch64_comparison_operation" "")
> +	  (zero_extend:DI
> +	    (NEG_NOT:SI (match_operand:SI 2 "register_operand" "r")))
> +	  (zero_extend:DI
> +	    (match_operand:SI 3 "register_operand" "r"))))]
> +  ""
> +  "cs<neg_not_cs>\\t%w0, %w3, %w2, %M1"
> +  [(set_attr "type" "csel")]
> +)
> +
> +(define_insn "*csinv3_uxtw_insn3"
> +  [(set (match_operand:DI 0 "register_operand" "=r")
> +	(if_then_else:DI
> +	  (match_operand 1 "aarch64_comparison_operation" "")
> +	  (zero_extend:DI
> +	    (NEG_NOT:SI (match_operand:SI 2 "register_operand" "r")))
> +	  (const_int 0)))]
> +  ""
> +  "cs<neg_not_cs>\\t%w0, wzr, %w2, %M1"
> +  [(set_attr "type" "csel")]
> +)
> +
> +

Usually there's just one blank line between patterns, even if the patterns aren't naturally grouped.

No need to repost just for that. I'll push with those changes once stage 1 opens.

Thanks,
Richard
> -----Original Message-----
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: 30 April 2020 15:13
> To: Alex Coplan <Alex.Coplan@arm.com>
> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw <Richard.Earnshaw@arm.com>;
> Marcus Shawcroft <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov
> <Kyrylo.Tkachov@arm.com>; nd <nd@arm.com>
> Subject: Re: [PATCH] aarch64: prefer using csinv, csneg in zero extend contexts
>
> Yeah, I was hoping for something like...
>
> > Indeed, clang generates a MVN + CSEL sequence where the CSEL operates on the
> > 64-bit registers:
> >
> > f:
> >         mvn     w8, w2
> >         cmp     w0, #0
> >         csel    x0, x8, x1, eq
> >         ret
>
> ...this rather than the 4-insn (+ret) sequence that we currently
> generate. So it would have been a define_insn_and_split that handles
> the zero case directly but splits into the "optimal" two-instruction
> sequence for registers.
>
> But I guess the underlying problem is instead that we don't have
> a pattern for (zero_extend:DI (not:SI ...)). Adding that would
> definitely be a better fix.

Yes. I sent a patch for this very fix which Kyrill is going to commit once stage 1 opens: https://gcc.gnu.org/pipermail/gcc-patches/2020-April/544365.html

I tried compiling this function with that patch applied and we get:

f:
        cmp     w0, 0
        mvn     w2, w2
        csel    x0, x2, x1, eq
        ret

which is good.

> ChangeLog trivia, but these last two lines should only be indented by a tab.

Ah, thanks. Noted for next time.

> Hopefully one day we'll finally ditch this format and stop having to
> quibble over such minor formatting stuff...

That would be good!

> > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> > index c7c4d1dd519..37d651724ad 100644
> > --- a/gcc/config/aarch64/aarch64.md
> > +++ b/gcc/config/aarch64/aarch64.md
> > @@ -4390,6 +4390,45 @@
> >    [(set_attr "type" "csel")]
> >  )
> >
> > +(define_insn "*csinv3_uxtw_insn1"
> > +  [(set (match_operand:DI 0 "register_operand" "=r")
> > +	(if_then_else:DI
> > +	  (match_operand 1 "aarch64_comparison_operation" "")
> > +	  (zero_extend:DI
> > +	    (match_operand:SI 2 "register_operand" "r"))
> > +	  (zero_extend:DI
> > +	    (NEG_NOT:SI (match_operand:SI 3 "register_operand" "r")))))]
> > +  ""
> > +  "cs<neg_not_cs>\\t%w0, %w2, %w3, %m1"
> > +  [(set_attr "type" "csel")]
> > +)
> > +
> > +(define_insn "*csinv3_uxtw_insn2"
> > +  [(set (match_operand:DI 0 "register_operand" "=r")
> > +	(if_then_else:DI
> > +	  (match_operand 1 "aarch64_comparison_operation" "")
> > +	  (zero_extend:DI
> > +	    (NEG_NOT:SI (match_operand:SI 2 "register_operand" "r")))
> > +	  (zero_extend:DI
> > +	    (match_operand:SI 3 "register_operand" "r"))))]
> > +  ""
> > +  "cs<neg_not_cs>\\t%w0, %w3, %w2, %M1"
> > +  [(set_attr "type" "csel")]
> > +)
> > +
> > +(define_insn "*csinv3_uxtw_insn3"
> > +  [(set (match_operand:DI 0 "register_operand" "=r")
> > +	(if_then_else:DI
> > +	  (match_operand 1 "aarch64_comparison_operation" "")
> > +	  (zero_extend:DI
> > +	    (NEG_NOT:SI (match_operand:SI 2 "register_operand" "r")))
> > +	  (const_int 0)))]
> > +  ""
> > +  "cs<neg_not_cs>\\t%w0, wzr, %w2, %M1"
> > +  [(set_attr "type" "csel")]
> > +)
> > +
> > +
>
> Usually there's just one blank line between patterns, even if the
> patterns aren't naturally grouped.

Ok, good to know.

> No need to repost just for that. I'll push with those changes once
> stage 1 opens.

Great!

Thanks,
Alex
Alex Coplan <Alex.Coplan@arm.com> writes:
>> -----Original Message-----
>> From: Richard Sandiford <richard.sandiford@arm.com>
>> Sent: 30 April 2020 15:13
>> To: Alex Coplan <Alex.Coplan@arm.com>
>> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw <Richard.Earnshaw@arm.com>;
>> Marcus Shawcroft <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov
>> <Kyrylo.Tkachov@arm.com>; nd <nd@arm.com>
>> Subject: Re: [PATCH] aarch64: prefer using csinv, csneg in zero extend contexts
>>
>> Yeah, I was hoping for something like...
>>
>> > Indeed, clang generates a MVN + CSEL sequence where the CSEL operates on the
>> > 64-bit registers:
>> >
>> > f:
>> >         mvn     w8, w2
>> >         cmp     w0, #0
>> >         csel    x0, x8, x1, eq
>> >         ret
>>
>> ...this rather than the 4-insn (+ret) sequence that we currently
>> generate. So it would have been a define_insn_and_split that handles
>> the zero case directly but splits into the "optimal" two-instruction
>> sequence for registers.
>>
>> But I guess the underlying problem is instead that we don't have
>> a pattern for (zero_extend:DI (not:SI ...)). Adding that would
>> definitely be a better fix.
>
> Yes. I sent a patch for this very fix which Kyrill is going to commit once stage
> 1 opens: https://gcc.gnu.org/pipermail/gcc-patches/2020-April/544365.html

Sorry, missed that. It looks like that patch hinders this one though. Trying it with current master (where that patch is applied), I get:

FAIL: gcc.target/aarch64/csinv-neg.c check-function-bodies inv_zero1
FAIL: gcc.target/aarch64/csinv-neg.c check-function-bodies inv_zero2

It looks like a costs issue:

Trying 27 -> 18:
   27: r99:DI=zero_extend(~r101:SI)
      REG_DEAD r101:SI
   18: x0:DI={(cc:CC==0)?r99:DI:0}
      REG_DEAD cc:CC
      REG_DEAD r99:DI
Successfully matched this instruction:
(set (reg/i:DI 0 x0)
    (if_then_else:DI (eq (reg:CC 66 cc)
            (const_int 0 [0]))
        (zero_extend:DI (not:SI (reg:SI 101)))
        (const_int 0 [0])))
rejecting combination of insns 27 and 18
original costs 4 + 4 = 8
replacement cost 12

I guess we'll need to teach aarch64_if_then_else_costs about the costs of the new insns.

Thanks,
Richard
> -----Original Message-----
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: 06 May 2020 11:28
> To: Alex Coplan <Alex.Coplan@arm.com>
> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw <Richard.Earnshaw@arm.com>;
> Marcus Shawcroft <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov
> <Kyrylo.Tkachov@arm.com>; nd <nd@arm.com>
> Subject: Re: [PATCH] aarch64: prefer using csinv, csneg in zero extend contexts
>
> Alex Coplan <Alex.Coplan@arm.com> writes:
> >> -----Original Message-----
> >> From: Richard Sandiford <richard.sandiford@arm.com>
> >> Sent: 30 April 2020 15:13
> >> To: Alex Coplan <Alex.Coplan@arm.com>
> >> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw <Richard.Earnshaw@arm.com>;
> >> Marcus Shawcroft <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov
> >> <Kyrylo.Tkachov@arm.com>; nd <nd@arm.com>
> >> Subject: Re: [PATCH] aarch64: prefer using csinv, csneg in zero extend contexts
> >>
> >> Yeah, I was hoping for something like...
> >>
> >> > Indeed, clang generates a MVN + CSEL sequence where the CSEL operates on the
> >> > 64-bit registers:
> >> >
> >> > f:
> >> >         mvn     w8, w2
> >> >         cmp     w0, #0
> >> >         csel    x0, x8, x1, eq
> >> >         ret
> >>
> >> ...this rather than the 4-insn (+ret) sequence that we currently
> >> generate. So it would have been a define_insn_and_split that handles
> >> the zero case directly but splits into the "optimal" two-instruction
> >> sequence for registers.
> >>
> >> But I guess the underlying problem is instead that we don't have
> >> a pattern for (zero_extend:DI (not:SI ...)). Adding that would
> >> definitely be a better fix.
> >
> > Yes. I sent a patch for this very fix which Kyrill is going to commit
> > once stage 1 opens:
> > https://gcc.gnu.org/pipermail/gcc-patches/2020-April/544365.html
>
> Sorry, missed that.
>
> It looks like that patch hinders this one though. Trying it with
> current master (where that patch is applied), I get:
>
> FAIL: gcc.target/aarch64/csinv-neg.c check-function-bodies inv_zero1
> FAIL: gcc.target/aarch64/csinv-neg.c check-function-bodies inv_zero2
>
> It looks like a costs issue:
>
> Trying 27 -> 18:
>    27: r99:DI=zero_extend(~r101:SI)
>       REG_DEAD r101:SI
>    18: x0:DI={(cc:CC==0)?r99:DI:0}
>       REG_DEAD cc:CC
>       REG_DEAD r99:DI
> Successfully matched this instruction:
> (set (reg/i:DI 0 x0)
>     (if_then_else:DI (eq (reg:CC 66 cc)
>             (const_int 0 [0]))
>         (zero_extend:DI (not:SI (reg:SI 101)))
>         (const_int 0 [0])))
> rejecting combination of insns 27 and 18
> original costs 4 + 4 = 8
> replacement cost 12
>
> I guess we'll need to teach aarch64_if_then_else_costs about the costs
> of the new insns.

Ah, thanks for catching this. I've attached an updated patch which fixes the costs issue here. With the new patch, all the test cases in csinv-neg.c now pass. In addition, I've done a bootstrap and regtest on aarch64-linux with no new failures.

Do you think we need to add cases to aarch64_if_then_else_costs for the other new insns, or is the attached patch OK for master?

Thanks,
Alex

---

gcc/ChangeLog:

2020-05-07  Alex Coplan  <alex.coplan@arm.com>

	* config/aarch64/aarch64.c (aarch64_if_then_else_costs): Add case to correctly
	calculate cost for new pattern (*csinv3_uxtw_insn3).
	* config/aarch64/aarch64.md (*csinv3_uxtw_insn1): New.
	(*csinv3_uxtw_insn2): New.
	(*csinv3_uxtw_insn3): New.
	* config/aarch64/iterators.md (neg_not_cs): New.

gcc/testsuite/ChangeLog:

2020-05-07  Alex Coplan  <alex.coplan@arm.com>

	* gcc.target/aarch64/csinv-neg.c: New test.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index e92c7e69fcb..efb3da7a7fc 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -11695,6 +11695,15 @@ aarch64_if_then_else_costs (rtx op0, rtx op1, rtx op2, int *cost, bool speed)
 	  op1 = XEXP (op1, 0);
 	  op2 = XEXP (op2, 0);
 	}
+      else if (GET_CODE (op1) == ZERO_EXTEND && op2 == const0_rtx)
+	{
+	  inner = XEXP (op1, 0);
+	  if (GET_CODE (inner) == NEG || GET_CODE (inner) == NOT)
+	    {
+	      /* CSINV/NEG with zero extend + const 0 (*csinv3_uxtw_insn3).  */
+	      op1 = XEXP (inner, 0);
+	    }
+	}
 
       *cost += rtx_cost (op1, VOIDmode, IF_THEN_ELSE, 1, speed);
       *cost += rtx_cost (op2, VOIDmode, IF_THEN_ELSE, 2, speed);
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index ff15505d455..b2cfd015530 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4391,6 +4391,44 @@
   [(set_attr "type" "csel")]
 )
 
+(define_insn "*csinv3_uxtw_insn1"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+	(if_then_else:DI
+	  (match_operand 1 "aarch64_comparison_operation" "")
+	  (zero_extend:DI
+	    (match_operand:SI 2 "register_operand" "r"))
+	  (zero_extend:DI
+	    (NEG_NOT:SI (match_operand:SI 3 "register_operand" "r")))))]
+  ""
+  "cs<neg_not_cs>\\t%w0, %w2, %w3, %m1"
+  [(set_attr "type" "csel")]
+)
+
+(define_insn "*csinv3_uxtw_insn2"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+	(if_then_else:DI
+	  (match_operand 1 "aarch64_comparison_operation" "")
+	  (zero_extend:DI
+	    (NEG_NOT:SI (match_operand:SI 2 "register_operand" "r")))
+	  (zero_extend:DI
+	    (match_operand:SI 3 "register_operand" "r"))))]
+  ""
+  "cs<neg_not_cs>\\t%w0, %w3, %w2, %M1"
+  [(set_attr "type" "csel")]
+)
+
+(define_insn "*csinv3_uxtw_insn3"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+	(if_then_else:DI
+	  (match_operand 1 "aarch64_comparison_operation" "")
+	  (zero_extend:DI
+	    (NEG_NOT:SI (match_operand:SI 2 "register_operand" "r")))
+	  (const_int 0)))]
+  ""
+  "cs<neg_not_cs>\\t%w0, wzr, %w2, %M1"
+  [(set_attr "type" "csel")]
+)
+
 ;; If X can be loaded by a single CNT[BHWD] instruction,
 ;;
 ;; A = UMAX (B, X)
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 8e434389e59..a568cf21b99 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -1932,6 +1932,9 @@
 ;; Operation names for negate and bitwise complement.
 (define_code_attr neg_not_op [(neg "neg") (not "not")])
 
+;; csinv, csneg insn suffixes.
+(define_code_attr neg_not_cs [(neg "neg") (not "inv")])
+
 ;; Similar, but when the second operand is inverted.
 (define_code_attr nlogical [(and "bic") (ior "orn") (xor "eon")])
diff --git a/gcc/testsuite/gcc.target/aarch64/csinv-neg.c b/gcc/testsuite/gcc.target/aarch64/csinv-neg.c
new file mode 100644
index 00000000000..cc64b4094d7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/csinv-neg.c
@@ -0,0 +1,104 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+/*
+** inv1:
+**	cmp	w0, 0
+**	csinv	w0, w1, w2, ne
+**	ret
+*/
+unsigned long long
+inv1(unsigned a, unsigned b, unsigned c)
+{
+  return a ? b : ~c;
+}
+
+/*
+** inv1_local:
+**	cmp	w0, 0
+**	csinv	w0, w1, w2, ne
+**	ret
+*/
+unsigned long long
+inv1_local(unsigned a, unsigned b, unsigned c)
+{
+  unsigned d = ~c;
+  return a ? b : d;
+}
+
+/*
+** inv_zero1:
+**	cmp	w0, 0
+**	csinv	w0, wzr, w1, ne
+**	ret
+*/
+unsigned long long
+inv_zero1(unsigned a, unsigned b)
+{
+  return a ? 0 : ~b;
+}
+
+/*
+** inv_zero2:
+**	cmp	w0, 0
+**	csinv	w0, wzr, w1, eq
+**	ret
+*/
+unsigned long long
+inv_zero2(unsigned a, unsigned b)
+{
+  return a ? ~b : 0;
+}
+
+
+/*
+** inv2:
+**	cmp	w0, 0
+**	csinv	w0, w2, w1, eq
+**	ret
+*/
+unsigned long long
+inv2(unsigned a, unsigned b, unsigned c)
+{
+  return a ? ~b : c;
+}
+
+/*
+** inv2_local:
+**	cmp	w0, 0
+**	csinv	w0, w2, w1, eq
+**	ret
+*/
+unsigned long long
+inv2_local(unsigned a, unsigned b, unsigned c)
+{
+  unsigned d = ~b;
+  return a ? d : c;
+}
+
+/*
+** neg1:
+**	cmp	w0, 0
+**	csneg	w0, w1, w2, ne
+**	ret
+*/
+unsigned long long
+neg1(unsigned a, unsigned b, unsigned c)
+{
+  return a ? b : -c;
+}
+
+
+/*
+** neg2:
+**	cmp	w0, 0
+**	csneg	w0, w2, w1, eq
+**	ret
+*/
+unsigned long long
+neg2(unsigned a, unsigned b, unsigned c)
+{
+  return a ? -b : c;
+}
+
+/* { dg-final { check-function-bodies "**" "" "" } } */
Alex Coplan <Alex.Coplan@arm.com> writes:
>> -----Original Message-----
>> From: Richard Sandiford <richard.sandiford@arm.com>
>> Sent: 06 May 2020 11:28
>> To: Alex Coplan <Alex.Coplan@arm.com>
>> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw <Richard.Earnshaw@arm.com>;
>> Marcus Shawcroft <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov
>> <Kyrylo.Tkachov@arm.com>; nd <nd@arm.com>
>> Subject: Re: [PATCH] aarch64: prefer using csinv, csneg in zero extend contexts
>>
>> It looks like that patch hinders this one though. Trying it with
>> current master (where that patch is applied), I get:
>>
>> FAIL: gcc.target/aarch64/csinv-neg.c check-function-bodies inv_zero1
>> FAIL: gcc.target/aarch64/csinv-neg.c check-function-bodies inv_zero2
>>
>> It looks like a costs issue:
>>
>> Trying 27 -> 18:
>>    27: r99:DI=zero_extend(~r101:SI)
>>       REG_DEAD r101:SI
>>    18: x0:DI={(cc:CC==0)?r99:DI:0}
>>       REG_DEAD cc:CC
>>       REG_DEAD r99:DI
>> Successfully matched this instruction:
>> (set (reg/i:DI 0 x0)
>>     (if_then_else:DI (eq (reg:CC 66 cc)
>>             (const_int 0 [0]))
>>         (zero_extend:DI (not:SI (reg:SI 101)))
>>         (const_int 0 [0])))
>> rejecting combination of insns 27 and 18
>> original costs 4 + 4 = 8
>> replacement cost 12
>>
>> I guess we'll need to teach aarch64_if_then_else_costs about the costs
>> of the new insns.
>
> Ah, thanks for catching this. I've attached an updated patch which fixes the
> costs issue here. With the new patch, all the test cases in csinv-neg.c now
> pass. In addition, I've done a bootstrap and regtest on aarch64-linux with no
> new failures.
>
> Do you think we need to add cases to aarch64_if_then_else_costs for the other
> new insns, or is the attached patch OK for master?

Looks good as-is, thanks. Just a couple of very minor nits:

> 2020-05-07  Alex Coplan  <alex.coplan@arm.com>
>
> 	* config/aarch64/aarch64.c (aarch64_if_then_else_costs): Add case to correctly
> 	calculate cost for new pattern (*csinv3_uxtw_insn3).

ChangeLog lines follow the 80-character limit.

> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index e92c7e69fcb..efb3da7a7fc 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -11695,6 +11695,15 @@ aarch64_if_then_else_costs (rtx op0, rtx op1, rtx op2, int *cost, bool speed)
>  	  op1 = XEXP (op1, 0);
>  	  op2 = XEXP (op2, 0);
>  	}
> +      else if (GET_CODE (op1) == ZERO_EXTEND && op2 == const0_rtx)
> +	{
> +	  inner = XEXP (op1, 0);
> +	  if (GET_CODE (inner) == NEG || GET_CODE (inner) == NOT)
> +	    {
> +	      /* CSINV/NEG with zero extend + const 0 (*csinv3_uxtw_insn3).  */
> +	      op1 = XEXP (inner, 0);
> +	    }

GCC style is to avoid { ... } around single statements, unless it's needed to avoid an ambiguity. If we did use { ... }, the block should be indented by two further spaces, so that the "{" is under the space after "if".

Pushed with those changes, thanks.

Richard
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index c7c4d1dd519..2f7367c0b1a 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4390,6 +4390,19 @@
   [(set_attr "type" "csel")]
 )
 
+(define_insn "*csinv3_uxtw_insn"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+	(if_then_else:DI
+	  (match_operand 1 "aarch64_comparison_operation" "")
+	  (zero_extend:DI
+	    (match_operand:SI 2 "aarch64_reg_or_zero" "rZ"))
+	  (zero_extend:DI
+	    (NEG_NOT:SI (match_operand:SI 3 "register_operand" "r")))))]
+  ""
+  "cs<neg_not_cs>\\t%w0, %w2, %w3, %m1"
+  [(set_attr "type" "csel")]
+)
+
 ;; If X can be loaded by a single CNT[BHWD] instruction,
 ;;
 ;; A = UMAX (B, X)
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 8e434389e59..a568cf21b99 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -1932,6 +1932,9 @@
 ;; Operation names for negate and bitwise complement.
 (define_code_attr neg_not_op [(neg "neg") (not "not")])
 
+;; csinv, csneg insn suffixes.
+(define_code_attr neg_not_cs [(neg "neg") (not "inv")])
+
 ;; Similar, but when the second operand is inverted.
 (define_code_attr nlogical [(and "bic") (ior "orn") (xor "eon")])
diff --git a/gcc/testsuite/gcc.target/aarch64/csinv-neg.c b/gcc/testsuite/gcc.target/aarch64/csinv-neg.c
new file mode 100644
index 00000000000..4636f3e0906
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/csinv-neg.c
@@ -0,0 +1,54 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+/*
+** inv1:
+**	cmp	w0, 0
+**	csinv	w0, w1, w2, ne
+**	ret
+*/
+unsigned long long
+inv1(unsigned a, unsigned b, unsigned c)
+{
+  return a ? b : ~c;
+}
+
+/*
+** inv2:
+**	cmp	w0, 0
+**	csinv	w0, w2, w1, eq
+**	ret
+*/
+unsigned long long
+inv2(unsigned a, unsigned b, unsigned c)
+{
+  return a ? ~b : c;
+}
+
+/*
+** neg1:
+**	cmp	w0, 0
+**	csneg	w0, w1, w2, ne
+**	ret
+*/
+unsigned long long
+neg1(unsigned a, unsigned b, unsigned c)
+{
+  return a ? b : -c;
+}
+
+
+/*
+** neg2:
+**	cmp	w0, 0
+**	csneg	w0, w2, w1, eq
+**	ret
+*/
+unsigned long long
+neg2(unsigned a, unsigned b, unsigned c)
+{
+  return a ? -b : c;
+}
+
+
+/* { dg-final { check-function-bodies "**" "" "" } } */