
aarch64: prefer using csinv, csneg in zero extend contexts

Message ID VI1PR08MB4029BD6710802D23EA73FF24EAAD0@VI1PR08MB4029.eurprd08.prod.outlook.com
State New
Series aarch64: prefer using csinv, csneg in zero extend contexts

Commit Message

Alex Coplan April 29, 2020, 7:30 a.m. UTC
Hello,

The attached patch adds an optimization to the AArch64 backend to catch
additional cases where we can use csinv and csneg.

Given the C code:

unsigned long long inv(unsigned a, unsigned b, unsigned c)
{
  return a ? b : ~c;
}

Prior to this patch, AArch64 GCC at -O2 generates:

inv:
        cmp     w0, 0
        mvn     w2, w2
        csel    w0, w1, w2, ne
        ret

and after applying the patch, we get:

inv:
        cmp     w0, 0
        csinv   w0, w1, w2, ne
        ret

The new pattern also catches the optimization for the symmetric case, where
the function body reads a ? ~b : c.

Similarly, with the following code:

unsigned long long neg(unsigned a, unsigned b, unsigned c)
{
  return a ? b : -c;
}

GCC at -O2 previously gave:

neg:
        cmp     w0, 0
        neg     w2, w2
        csel    w0, w1, w2, ne
        ret

but now gives:

neg:
        cmp     w0, 0
        csneg   w0, w1, w2, ne
        ret

with the corresponding code for the symmetric case as above.
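For reference, the two conditional instructions involved behave roughly as follows; this is an illustrative C model (the function names are mine), not the architectural definition:

```c
/* csinv d, n, m, cond  =>  d = cond ? n : ~m
   csneg d, n, m, cond  =>  d = cond ? n : -m  */
static unsigned csinv_model(int cond, unsigned n, unsigned m)
{
  return cond ? n : ~m;
}

static unsigned csneg_model(int cond, unsigned n, unsigned m)
{
  return cond ? n : -m;
}
```

Writing the result to a W register implicitly zero-extends it into the corresponding X register, which is what makes the single-instruction form valid in these zero-extend contexts.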

Testing:
 - New regression test which checks all four of these cases.
 - Full bootstrap and regression on aarch64-linux.

Thanks,
Alex

---

gcc/ChangeLog:

2020-04-24  Alex Coplan  <alex.coplan@arm.com>

        * config/aarch64/aarch64.md (*csinv3_uxtw_insn): New.
        * config/aarch64/iterators.md (neg_not_cs): New.

gcc/testsuite/ChangeLog:

2020-04-22  Alex Coplan  <alex.coplan@arm.com>

        * gcc.target/aarch64/csinv-neg.c: New test.

Comments

Richard Sandiford April 29, 2020, 4:15 p.m. UTC | #1
Thanks for doing this.

Alex Coplan <Alex.Coplan@arm.com> writes:
> Hello,
>
> The attached patch adds an optimization to the AArch64 backend to catch
> additional cases where we can use csinv and csneg.
>
> Given the C code:
>
> unsigned long long inv(unsigned a, unsigned b, unsigned c)
> {
>   return a ? b : ~c;
> }
>
> Prior to this patch, AArch64 GCC at -O2 generates:
>
> inv:
>         cmp     w0, 0
>         mvn     w2, w2
>         csel    w0, w1, w2, ne
>         ret
>
> and after applying the patch, we get:
>
> inv:
>         cmp     w0, 0
>         csinv   w0, w1, w2, ne
>         ret
>
> The new pattern also catches the optimization for the symmetric case, where
> the function body reads a ? ~b : c.

Yeah.  The thing that surprised me was that the non-extending form
has the operator in the "then" arm of the if_then_else:

(define_insn "*csinv3<mode>_insn"
  [(set (match_operand:GPI 0 "register_operand" "=r")
        (if_then_else:GPI
	  (match_operand 1 "aarch64_comparison_operation" "")
	  (not:GPI (match_operand:GPI 2 "register_operand" "r"))
	  (match_operand:GPI 3 "aarch64_reg_or_zero" "rZ")))]
  ""
  "csinv\\t%<w>0, %<w>3, %<w>2, %M1"
  [(set_attr "type" "csel")]
)

whereas the new pattern has it in the "else" arm.  I think for the
non-extending form, having the operator in the "then" arm really is
canonical and close to guaranteed, since that's how ifcvt processes
half diamonds.

But when both values are zero-extended, ifcvt has to convert a full
diamond, and I'm not sure we can rely on the two sides coming out in
a particular order.  I think the two ?: orders above work with one pattern
thanks to simplifications that happen before entering gimple.  If instead
the operator is split out into a separate statement:

unsigned long long
inv1(int a, unsigned b, unsigned c)
{
  unsigned d = ~c;
  return a ? b : d;
}

then we go back to using the less optimal CSEL sequence.

So I think it would be worth having a second pattern for the opposite order.
Alternatively, we could add a new rtl canonicalisation rule to force the
if_then_else operands to end up in a particular order, but that's more
complex and is likely to affect other targets.

> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index c7c4d1dd519..2f7367c0b1a 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -4390,6 +4390,19 @@
>    [(set_attr "type" "csel")]
>  )
>  
> +(define_insn "*csinv3_uxtw_insn"
> +  [(set (match_operand:DI 0 "register_operand" "=r")
> +        (if_then_else:DI
> +          (match_operand 1 "aarch64_comparison_operation" "")
> +          (zero_extend:DI
> +            (match_operand:SI 2 "aarch64_reg_or_zero" "rZ"))
> +          (zero_extend:DI
> +            (NEG_NOT:SI (match_operand:SI 3 "register_operand" "r")))))]
> +  ""
> +  "cs<neg_not_cs>\\t%w0, %w2, %w3, %m1"
> +  [(set_attr "type" "csel")]
> +)
> +

The operand to a zero_extend can never be a const_int, so operand 2
should just be a register_operand with an "r" constraint.

At the risk of feature creep :-) a useful third pattern could be
to combine a zero-extended operator result with an existing DImode value.
In that case, the existing DImode value really can be "rZ" and should
always be in the "else" arm of the if_then_else.  E.g.:

unsigned long long
f(int a, unsigned long b, unsigned c)
{
  return a ? b : ~c;
}

unsigned long long
g(int a, unsigned c)
{
  return a ? 0 : ~c;
}

But that's definitely something that can be left for later.

Thanks,
Richard
Alex Coplan April 30, 2020, 1:27 p.m. UTC | #2
Hi Richard,

Many thanks for the detailed review. I've attached an updated patch based on
your comments (bootstrapped and regtested on aarch64-linux).

> -----Original Message-----
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: 29 April 2020 17:16
> To: Alex Coplan <Alex.Coplan@arm.com>
> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw <Richard.Earnshaw@arm.com>;
> Marcus Shawcroft <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov
> <Kyrylo.Tkachov@arm.com>; nd <nd@arm.com>
> Subject: Re: [PATCH] aarch64: prefer using csinv, csneg in zero extend
> contexts
> 
> Yeah.  The thing that surprised me was that the non-extending form
> has the operator in the "then" arm of the if_then_else:
> 
> (define_insn "*csinv3<mode>_insn"
>   [(set (match_operand:GPI 0 "register_operand" "=r")
>         (if_then_else:GPI
> 	  (match_operand 1 "aarch64_comparison_operation" "")
> 	  (not:GPI (match_operand:GPI 2 "register_operand" "r"))
> 	  (match_operand:GPI 3 "aarch64_reg_or_zero" "rZ")))]
>   ""
>   "csinv\\t%<w>0, %<w>3, %<w>2, %M1"
>   [(set_attr "type" "csel")]
> )
> 
> whereas the new pattern has it in the "else" arm.  I think for the
> non-extending form, having the operator in the "then" arm really is
> canonical and close to guaranteed, since that's how ifcvt processes
> half diamonds.
> 
> But when both values are zero-extended, ifcvt has to convert a full
> diamond, and I'm not sure we can rely on the two sides coming out in
> a particular order.  I think the two ?: orders above work with one
> pattern
> thanks to simplifications that happen before entering gimple.  If instead
> the operator is split out into a separate statement:
> 
> unsigned long long
> inv1(int a, unsigned b, unsigned c)
> {
>   unsigned d = ~c;
>   return a ? b : d;
> }
> 
> then we go back to using the less optimal CSEL sequence.
> 
> So I think it would be worth having a second pattern for the opposite
> order.

Agreed. It did seem a bit magical that we would get the other case for free. I
added this function to the tests and observed that we were getting the less
optimal CSEL sequence until I added the rule with the NEG_NOT on the other arm.

> > diff --git a/gcc/config/aarch64/aarch64.md
> b/gcc/config/aarch64/aarch64.md
> > index c7c4d1dd519..2f7367c0b1a 100644
> > --- a/gcc/config/aarch64/aarch64.md
> > +++ b/gcc/config/aarch64/aarch64.md
> > @@ -4390,6 +4390,19 @@
> >    [(set_attr "type" "csel")]
> >  )
> >
> > +(define_insn "*csinv3_uxtw_insn"
> > +  [(set (match_operand:DI 0 "register_operand" "=r")
> > +        (if_then_else:DI
> > +          (match_operand 1 "aarch64_comparison_operation" "")
> > +          (zero_extend:DI
> > +            (match_operand:SI 2 "aarch64_reg_or_zero" "rZ"))
> > +          (zero_extend:DI
> > +            (NEG_NOT:SI (match_operand:SI 3 "register_operand"
> "r")))))]
> > +  ""
> > +  "cs<neg_not_cs>\\t%w0, %w2, %w3, %m1"
> > +  [(set_attr "type" "csel")]
> > +)
> > +
> 
> The operand to a zero_extend can never be a const_int, so operand 2
> should just be a register_operand with an "r" constraint.

OK, makes sense. I've fixed this in the updated patch.

> At the risk of feature creep :-) a useful third pattern could be
> to combine a zero-extended operator result with an existing DImode value.
> In that case, the existing DImode value really can be "rZ" and should
> always be in the "else" arm of the if_then_else.  E.g.:
> 
> unsigned long long
> f(int a, unsigned long b, unsigned c)
> {
>   return a ? b : ~c;
> }
> 
> unsigned long long
> g(int a, unsigned c)
> {
>   return a ? 0 : ~c;
> }
> 
> But that's definitely something that can be left for later.

Hm. What sequence would you like to see for the function f here with the
argument already in DImode? I don't think we can do it with just a CMP + CSINV
like in the other test cases because you end up wanting to access b as x1 and c
as w2 which is not allowed.

Indeed, clang generates a MVN + CSEL sequence where the CSEL operates on the
64-bit registers:

f:
        mvn     w8, w2
        cmp     w0, #0
        csel    x0, x8, x1, eq
        ret

However, I agree that the restricted case where the else arm is (const_int 0) is
a worthwhile optimization (that wasn't caught by the previous patterns due to
const_ints never being zero_extended as you mentioned). I've added a pattern and
tests for this case in the updated patch.
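
Concretely, the restricted zero-arm case covers code like the following (mirroring the inv_zero tests added in the updated patch below):

```c
/* Each of these can now be compiled to a cmp + csinv pair, with wzr
   supplying the constant-zero arm of the conditional.  */
unsigned long long
inv_zero1 (unsigned a, unsigned b)
{
  return a ? 0 : ~b;
}

unsigned long long
inv_zero2 (unsigned a, unsigned b)
{
  return a ? ~b : 0;
}
```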

Many thanks,
Alex

---

gcc/ChangeLog:

2020-04-30  Alex Coplan  <alex.coplan@arm.com>

	* config/aarch64/aarch64.md (*csinv3_uxtw_insn1): New.
	  (*csinv3_uxtw_insn2): New.
	  (*csinv3_uxtw_insn3): New.
	* config/aarch64/iterators.md (neg_not_cs): New.

gcc/testsuite/ChangeLog:

2020-04-30  Alex Coplan  <alex.coplan@arm.com>

	* gcc.target/aarch64/csinv-neg.c: New test.
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index c7c4d1dd519..37d651724ad 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4390,6 +4390,45 @@
   [(set_attr "type" "csel")]
 )
 
+(define_insn "*csinv3_uxtw_insn1"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+        (if_then_else:DI
+          (match_operand 1 "aarch64_comparison_operation" "")
+          (zero_extend:DI
+            (match_operand:SI 2 "register_operand" "r"))
+          (zero_extend:DI
+            (NEG_NOT:SI (match_operand:SI 3 "register_operand" "r")))))]
+  ""
+  "cs<neg_not_cs>\\t%w0, %w2, %w3, %m1"
+  [(set_attr "type" "csel")]
+)
+
+(define_insn "*csinv3_uxtw_insn2"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+        (if_then_else:DI
+          (match_operand 1 "aarch64_comparison_operation" "")
+          (zero_extend:DI
+            (NEG_NOT:SI (match_operand:SI 2 "register_operand" "r")))
+          (zero_extend:DI
+            (match_operand:SI 3 "register_operand" "r"))))]
+  ""
+  "cs<neg_not_cs>\\t%w0, %w3, %w2, %M1"
+  [(set_attr "type" "csel")]
+)
+
+(define_insn "*csinv3_uxtw_insn3"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+        (if_then_else:DI
+          (match_operand 1 "aarch64_comparison_operation" "")
+          (zero_extend:DI
+            (NEG_NOT:SI (match_operand:SI 2 "register_operand" "r")))
+          (const_int 0)))]
+  ""
+  "cs<neg_not_cs>\\t%w0, wzr, %w2, %M1"
+  [(set_attr "type" "csel")]
+)
+
+
 ;; If X can be loaded by a single CNT[BHWD] instruction,
 ;;
 ;;    A = UMAX (B, X)
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 8e434389e59..a568cf21b99 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -1932,6 +1932,9 @@
 ;; Operation names for negate and bitwise complement.
 (define_code_attr neg_not_op [(neg "neg") (not "not")])
 
+;; csinv, csneg insn suffixes.
+(define_code_attr neg_not_cs [(neg "neg") (not "inv")])
+
 ;; Similar, but when the second operand is inverted.
 (define_code_attr nlogical [(and "bic") (ior "orn") (xor "eon")])
 
diff --git a/gcc/testsuite/gcc.target/aarch64/csinv-neg.c b/gcc/testsuite/gcc.target/aarch64/csinv-neg.c
new file mode 100644
index 00000000000..cc64b4094d7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/csinv-neg.c
@@ -0,0 +1,104 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+/*
+** inv1:
+**	cmp	w0, 0
+**	csinv	w0, w1, w2, ne
+**	ret
+*/
+unsigned long long
+inv1(unsigned a, unsigned b, unsigned c)
+{
+  return a ? b : ~c;
+}
+
+/*
+** inv1_local:
+**	cmp	w0, 0
+**	csinv	w0, w1, w2, ne
+**	ret
+*/
+unsigned long long
+inv1_local(unsigned a, unsigned b, unsigned c)
+{
+  unsigned d = ~c;
+  return a ? b : d;
+}
+
+/*
+** inv_zero1:
+**	cmp	w0, 0
+**	csinv	w0, wzr, w1, ne
+**	ret
+*/
+unsigned long long
+inv_zero1(unsigned a, unsigned b)
+{
+  return a ? 0 : ~b;
+}
+
+/*
+** inv_zero2:
+**	cmp	w0, 0
+**	csinv	w0, wzr, w1, eq
+**	ret
+*/
+unsigned long long
+inv_zero2(unsigned a, unsigned b)
+{
+  return a ? ~b : 0;
+}
+
+
+/*
+** inv2:
+**	cmp	w0, 0
+**	csinv	w0, w2, w1, eq
+**	ret
+*/
+unsigned long long
+inv2(unsigned a, unsigned b, unsigned c)
+{
+  return a ? ~b : c;
+}
+
+/*
+** inv2_local:
+**	cmp	w0, 0
+**	csinv	w0, w2, w1, eq
+**	ret
+*/
+unsigned long long
+inv2_local(unsigned a, unsigned b, unsigned c)
+{
+  unsigned d = ~b;
+  return a ? d : c;
+}
+
+/*
+** neg1:
+**	cmp	w0, 0
+**	csneg	w0, w1, w2, ne
+**	ret
+*/
+unsigned long long
+neg1(unsigned a, unsigned b, unsigned c)
+{
+  return a ? b : -c;
+}
+
+
+/*
+** neg2:
+**	cmp	w0, 0
+**	csneg	w0, w2, w1, eq
+**	ret
+*/
+unsigned long long
+neg2(unsigned a, unsigned b, unsigned c)
+{
+  return a ? -b : c;
+}
+
+/* { dg-final { check-function-bodies "**" "" "" } } */
Richard Sandiford April 30, 2020, 2:12 p.m. UTC | #3
Alex Coplan <Alex.Coplan@arm.com> writes:
>> At the risk of feature creep :-) a useful third pattern could be
>> to combine a zero-extended operator result with an existing DImode value.
>> In that case, the existing DImode value really can be "rZ" and should
>> always be in the "else" arm of the if_then_else.  E.g.:
>> 
>> unsigned long long
>> f(int a, unsigned long b, unsigned c)
>> {
>>   return a ? b : ~c;
>> }
>> 
>> unsigned long long
>> g(int a, unsigned c)
>> {
>>   return a ? 0 : ~c;
>> }
>> 
>> But that's definitely something that can be left for later.
>
> Hm. What sequence would you like to see for the function f here with the
> argument already in DImode? I don't think we can do it with just a CMP + CSINV
> like in the other test cases because you end up wanting to access b as x1 and c
> as w2 which is not allowed.

Yeah, I was hoping for something like...

> Indeed, clang generates a MVN + CSEL sequence where the CSEL operates on the
> 64-bit registers:
>
> f:
>         mvn     w8, w2
>         cmp     w0, #0
>         csel    x0, x8, x1, eq
>         ret

...this rather than the 4-insn (+ret) sequence that we currently
generate.  So it would have been a define_insn_and_split that handles
the zero case directly but splits into the "optimal" two-instruction
sequence for registers.

But I guess the underlying problem is instead that we don't have
a pattern for (zero_extend:DI (not:SI ...)).  Adding that would
definitely be a better fix.
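Such a pattern might look something like the following sketch (illustrative only — this is not the patch that was actually posted, and the pattern name, constraint, and type attribute are my assumptions):

```lisp
(define_insn "*one_cmpl_zero_extend_sketch"
  [(set (match_operand:DI 0 "register_operand" "=r")
	(zero_extend:DI
	  (not:SI (match_operand:SI 1 "register_operand" "r"))))]
  ""
  "mvn\\t%w0, %w1"
  [(set_attr "type" "logic_reg")]
)
```

This works because MVN writing a W register already zeroes the upper 32 bits of the destination, so no separate extend instruction is needed.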

> However, I agree that the restricted case where the else arm is (const_int 0) is
> a worthwhile optimization (that wasn't caught by the previous patterns due to
> const_ints never being zero_extended as you mentioned). I've added a pattern and
> tests for this case in the updated patch.

Looks good, just a couple of minor things:

> 2020-04-30  Alex Coplan  <alex.coplan@arm.com>
>
> 	* config/aarch64/aarch64.md (*csinv3_uxtw_insn1): New.
> 	  (*csinv3_uxtw_insn2): New.
> 	  (*csinv3_uxtw_insn3): New.

ChangeLog trivia, but these last two lines should only be indented by a tab.

Hopefully one day we'll finally ditch this format and stop having to
quibble over such minor formatting stuff...

> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index c7c4d1dd519..37d651724ad 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -4390,6 +4390,45 @@
>    [(set_attr "type" "csel")]
>  )
>  
> +(define_insn "*csinv3_uxtw_insn1"
> +  [(set (match_operand:DI 0 "register_operand" "=r")
> +        (if_then_else:DI
> +          (match_operand 1 "aarch64_comparison_operation" "")
> +          (zero_extend:DI
> +            (match_operand:SI 2 "register_operand" "r"))
> +          (zero_extend:DI
> +            (NEG_NOT:SI (match_operand:SI 3 "register_operand" "r")))))]
> +  ""
> +  "cs<neg_not_cs>\\t%w0, %w2, %w3, %m1"
> +  [(set_attr "type" "csel")]
> +)
> +
> +(define_insn "*csinv3_uxtw_insn2"
> +  [(set (match_operand:DI 0 "register_operand" "=r")
> +        (if_then_else:DI
> +          (match_operand 1 "aarch64_comparison_operation" "")
> +          (zero_extend:DI
> +            (NEG_NOT:SI (match_operand:SI 2 "register_operand" "r")))
> +          (zero_extend:DI
> +            (match_operand:SI 3 "register_operand" "r"))))]
> +  ""
> +  "cs<neg_not_cs>\\t%w0, %w3, %w2, %M1"
> +  [(set_attr "type" "csel")]
> +)
> +
> +(define_insn "*csinv3_uxtw_insn3"
> +  [(set (match_operand:DI 0 "register_operand" "=r")
> +        (if_then_else:DI
> +          (match_operand 1 "aarch64_comparison_operation" "")
> +          (zero_extend:DI
> +            (NEG_NOT:SI (match_operand:SI 2 "register_operand" "r")))
> +          (const_int 0)))]
> +  ""
> +  "cs<neg_not_cs>\\t%w0, wzr, %w2, %M1"
> +  [(set_attr "type" "csel")]
> +)
> +
> +

Usually there's just one blank line between patterns, even if the
patterns aren't naturally grouped.

No need to repost just for that.  I'll push with those changes once
stage 1 opens.

Thanks,
Richard
Alex Coplan April 30, 2020, 2:49 p.m. UTC | #4
> -----Original Message-----
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: 30 April 2020 15:13
> To: Alex Coplan <Alex.Coplan@arm.com>
> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw <Richard.Earnshaw@arm.com>;
> Marcus Shawcroft <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov
> <Kyrylo.Tkachov@arm.com>; nd <nd@arm.com>
> Subject: Re: [PATCH] aarch64: prefer using csinv, csneg in zero extend contexts
> 
> Yeah, I was hoping for something like...
> 
> > Indeed, clang generates a MVN + CSEL sequence where the CSEL operates on the
> > 64-bit registers:
> >
> > f:
> >         mvn     w8, w2
> >         cmp     w0, #0
> >         csel    x0, x8, x1, eq
> >         ret
> 
> ...this rather than the 4-insn (+ret) sequence that we currently
> generate.  So it would have been a define_insn_and_split that handles
> the zero case directly but splits into the "optimal" two-instruction
> sequence for registers.
> 
> But I guess the underlying problem is instead that we don't have
> a pattern for (zero_extend:DI (not:SI ...)).  Adding that would
> definitely be a better fix.

Yes. I sent a patch for this very fix which Kyrill is going to commit once stage
1 opens: https://gcc.gnu.org/pipermail/gcc-patches/2020-April/544365.html

I tried compiling this function with that patch applied and we get:

f:
        cmp     w0, 0
        mvn     w2, w2
        csel    x0, x2, x1, eq
        ret

which is good.

> ChangeLog trivia, but these last two lines should only be indented by a tab.

Ah, thanks. Noted for next time.

> 
> Hopefully one day we'll finally ditch this format and stop having to
> quibble over such minor formatting stuff...

That would be good!

> 
> > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> > index c7c4d1dd519..37d651724ad 100644
> > --- a/gcc/config/aarch64/aarch64.md
> > +++ b/gcc/config/aarch64/aarch64.md
> > @@ -4390,6 +4390,45 @@
> >    [(set_attr "type" "csel")]
> >  )
> >
> > +(define_insn "*csinv3_uxtw_insn1"
> > +  [(set (match_operand:DI 0 "register_operand" "=r")
> > +        (if_then_else:DI
> > +          (match_operand 1 "aarch64_comparison_operation" "")
> > +          (zero_extend:DI
> > +            (match_operand:SI 2 "register_operand" "r"))
> > +          (zero_extend:DI
> > +            (NEG_NOT:SI (match_operand:SI 3 "register_operand" "r")))))]
> > +  ""
> > +  "cs<neg_not_cs>\\t%w0, %w2, %w3, %m1"
> > +  [(set_attr "type" "csel")]
> > +)
> > +
> > +(define_insn "*csinv3_uxtw_insn2"
> > +  [(set (match_operand:DI 0 "register_operand" "=r")
> > +        (if_then_else:DI
> > +          (match_operand 1 "aarch64_comparison_operation" "")
> > +          (zero_extend:DI
> > +            (NEG_NOT:SI (match_operand:SI 2 "register_operand" "r")))
> > +          (zero_extend:DI
> > +            (match_operand:SI 3 "register_operand" "r"))))]
> > +  ""
> > +  "cs<neg_not_cs>\\t%w0, %w3, %w2, %M1"
> > +  [(set_attr "type" "csel")]
> > +)
> > +
> > +(define_insn "*csinv3_uxtw_insn3"
> > +  [(set (match_operand:DI 0 "register_operand" "=r")
> > +        (if_then_else:DI
> > +          (match_operand 1 "aarch64_comparison_operation" "")
> > +          (zero_extend:DI
> > +            (NEG_NOT:SI (match_operand:SI 2 "register_operand" "r")))
> > +          (const_int 0)))]
> > +  ""
> > +  "cs<neg_not_cs>\\t%w0, wzr, %w2, %M1"
> > +  [(set_attr "type" "csel")]
> > +)
> > +
> > +
> 
> Usually there's just one blank line between patterns, even if the
> patterns aren't naturally grouped.

Ok, good to know.

> 
> No need to repost just for that.  I'll push with those changes once
> stage 1 opens.

Great!

Thanks,
Alex
Richard Sandiford May 6, 2020, 10:28 a.m. UTC | #5
Alex Coplan <Alex.Coplan@arm.com> writes:
>> -----Original Message-----
>> From: Richard Sandiford <richard.sandiford@arm.com>
>> Sent: 30 April 2020 15:13
>> To: Alex Coplan <Alex.Coplan@arm.com>
>> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw <Richard.Earnshaw@arm.com>;
>> Marcus Shawcroft <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov
>> <Kyrylo.Tkachov@arm.com>; nd <nd@arm.com>
>> Subject: Re: [PATCH] aarch64: prefer using csinv, csneg in zero extend contexts
>> 
>> Yeah, I was hoping for something like...
>> 
>> > Indeed, clang generates a MVN + CSEL sequence where the CSEL operates on the
>> > 64-bit registers:
>> >
>> > f:
>> >         mvn     w8, w2
>> >         cmp     w0, #0
>> >         csel    x0, x8, x1, eq
>> >         ret
>> 
>> ...this rather than the 4-insn (+ret) sequence that we currently
>> generate.  So it would have been a define_insn_and_split that handles
>> the zero case directly but splits into the "optimal" two-instruction
>> sequence for registers.
>> 
>> But I guess the underlying problem is instead that we don't have
>> a pattern for (zero_extend:DI (not:SI ...)).  Adding that would
>> definitely be a better fix.
>
> Yes. I sent a patch for this very fix which Kyrill is going to commit once stage
> 1 opens: https://gcc.gnu.org/pipermail/gcc-patches/2020-April/544365.html

Sorry, missed that.

It looks like that patch hinders this one though.  Trying it with
current master (where that patch is applied), I get:

FAIL: gcc.target/aarch64/csinv-neg.c check-function-bodies inv_zero1
FAIL: gcc.target/aarch64/csinv-neg.c check-function-bodies inv_zero2

It looks like a costs issue:

Trying 27 -> 18:
   27: r99:DI=zero_extend(~r101:SI)
      REG_DEAD r101:SI
   18: x0:DI={(cc:CC==0)?r99:DI:0}
      REG_DEAD cc:CC
      REG_DEAD r99:DI
Successfully matched this instruction:
(set (reg/i:DI 0 x0)
    (if_then_else:DI (eq (reg:CC 66 cc)
            (const_int 0 [0]))
        (zero_extend:DI (not:SI (reg:SI 101)))
        (const_int 0 [0])))
rejecting combination of insns 27 and 18
original costs 4 + 4 = 8
replacement cost 12

I guess we'll need to teach aarch64_if_then_else_costs about the costs
of the new insns.

Thanks,
Richard
Alex Coplan May 7, 2020, 3:51 p.m. UTC | #6
> -----Original Message-----
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: 06 May 2020 11:28
> To: Alex Coplan <Alex.Coplan@arm.com>
> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw <Richard.Earnshaw@arm.com>;
> Marcus Shawcroft <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov
> <Kyrylo.Tkachov@arm.com>; nd <nd@arm.com>
> Subject: Re: [PATCH] aarch64: prefer using csinv, csneg in zero extend
> contexts
>
> Alex Coplan <Alex.Coplan@arm.com> writes:
> >> -----Original Message-----
> >> From: Richard Sandiford <richard.sandiford@arm.com>
> >> Sent: 30 April 2020 15:13
> >> To: Alex Coplan <Alex.Coplan@arm.com>
> >> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw
> <Richard.Earnshaw@arm.com>;
> >> Marcus Shawcroft <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov
> >> <Kyrylo.Tkachov@arm.com>; nd <nd@arm.com>
> >> Subject: Re: [PATCH] aarch64: prefer using csinv, csneg in zero extend
> contexts
> >>
> >> Yeah, I was hoping for something like...
> >>
> >> > Indeed, clang generates a MVN + CSEL sequence where the CSEL
> operates on the
> >> > 64-bit registers:
> >> >
> >> > f:
> >> >         mvn     w8, w2
> >> >         cmp     w0, #0
> >> >         csel    x0, x8, x1, eq
> >> >         ret
> >>
> >> ...this rather than the 4-insn (+ret) sequence that we currently
> >> generate.  So it would have been a define_insn_and_split that handles
> >> the zero case directly but splits into the "optimal" two-instruction
> >> sequence for registers.
> >>
> >> But I guess the underlying problem is instead that we don't have
> >> a pattern for (zero_extend:DI (not:SI ...)).  Adding that would
> >> definitely be a better fix.
> >
> > Yes. I sent a patch for this very fix which Kyrill is going to commit
> once stage
> > 1 opens: https://gcc.gnu.org/pipermail/gcc-patches/2020-
> April/544365.html
>
> Sorry, missed that.
>
> It looks like that patch hinders this one though.  Trying it with
> current master (where that patch is applied), I get:
>
> FAIL: gcc.target/aarch64/csinv-neg.c check-function-bodies inv_zero1
> FAIL: gcc.target/aarch64/csinv-neg.c check-function-bodies inv_zero2
>
> It looks like a costs issue:
>
> Trying 27 -> 18:
>    27: r99:DI=zero_extend(~r101:SI)
>       REG_DEAD r101:SI
>    18: x0:DI={(cc:CC==0)?r99:DI:0}
>       REG_DEAD cc:CC
>       REG_DEAD r99:DI
> Successfully matched this instruction:
> (set (reg/i:DI 0 x0)
>     (if_then_else:DI (eq (reg:CC 66 cc)
>             (const_int 0 [0]))
>         (zero_extend:DI (not:SI (reg:SI 101)))
>         (const_int 0 [0])))
> rejecting combination of insns 27 and 18
> original costs 4 + 4 = 8
> replacement cost 12
>
> I guess we'll need to teach aarch64_if_then_else_costs about the costs
> of the new insns.

Ah, thanks for catching this. I've attached an updated patch which fixes the
costs issue here. With the new patch, all the test cases in csinv-neg.c now
pass. In addition, I've done a bootstrap and regtest on aarch64-linux with no
new failures.

Do you think we need to add cases to aarch64_if_then_else_costs for the other
new insns, or is the attached patch OK for master?

Thanks,
Alex

---

gcc/ChangeLog:

2020-05-07  Alex Coplan  <alex.coplan@arm.com>

        * config/aarch64/aarch64.c (aarch64_if_then_else_costs): Add case
        to correctly calculate the cost of the new pattern
        (*csinv3_uxtw_insn3).
        * config/aarch64/aarch64.md (*csinv3_uxtw_insn1): New.
        (*csinv3_uxtw_insn2): New.
        (*csinv3_uxtw_insn3): New.
        * config/aarch64/iterators.md (neg_not_cs): New.

gcc/testsuite/ChangeLog:

2020-05-07  Alex Coplan  <alex.coplan@arm.com>

        * gcc.target/aarch64/csinv-neg.c: New test.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index e92c7e69fcb..efb3da7a7fc 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -11695,6 +11695,15 @@ aarch64_if_then_else_costs (rtx op0, rtx op1, rtx op2, int *cost, bool speed)
 	  op1 = XEXP (op1, 0);
 	  op2 = XEXP (op2, 0);
 	}
+      else if (GET_CODE (op1) == ZERO_EXTEND && op2 == const0_rtx)
+	{
+	  inner = XEXP (op1, 0);
+	  if (GET_CODE (inner) == NEG || GET_CODE (inner) == NOT)
+	  {
+	    /* CSINV/NEG with zero extend + const 0 (*csinv3_uxtw_insn3).  */
+	    op1 = XEXP (inner, 0);
+	  }
+	}
 
       *cost += rtx_cost (op1, VOIDmode, IF_THEN_ELSE, 1, speed);
       *cost += rtx_cost (op2, VOIDmode, IF_THEN_ELSE, 2, speed);
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index ff15505d455..b2cfd015530 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4391,6 +4391,44 @@
   [(set_attr "type" "csel")]
 )
 
+(define_insn "*csinv3_uxtw_insn1"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+	(if_then_else:DI
+	  (match_operand 1 "aarch64_comparison_operation" "")
+	  (zero_extend:DI
+	    (match_operand:SI 2 "register_operand" "r"))
+	  (zero_extend:DI
+	    (NEG_NOT:SI (match_operand:SI 3 "register_operand" "r")))))]
+  ""
+  "cs<neg_not_cs>\\t%w0, %w2, %w3, %m1"
+  [(set_attr "type" "csel")]
+)
+
+(define_insn "*csinv3_uxtw_insn2"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+	(if_then_else:DI
+	  (match_operand 1 "aarch64_comparison_operation" "")
+	  (zero_extend:DI
+	    (NEG_NOT:SI (match_operand:SI 2 "register_operand" "r")))
+	  (zero_extend:DI
+	    (match_operand:SI 3 "register_operand" "r"))))]
+  ""
+  "cs<neg_not_cs>\\t%w0, %w3, %w2, %M1"
+  [(set_attr "type" "csel")]
+)
+
+(define_insn "*csinv3_uxtw_insn3"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+	(if_then_else:DI
+	  (match_operand 1 "aarch64_comparison_operation" "")
+	  (zero_extend:DI
+	    (NEG_NOT:SI (match_operand:SI 2 "register_operand" "r")))
+	  (const_int 0)))]
+  ""
+  "cs<neg_not_cs>\\t%w0, wzr, %w2, %M1"
+  [(set_attr "type" "csel")]
+)
+
 ;; If X can be loaded by a single CNT[BHWD] instruction,
 ;;
 ;;    A = UMAX (B, X)
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 8e434389e59..a568cf21b99 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -1932,6 +1932,9 @@
 ;; Operation names for negate and bitwise complement.
 (define_code_attr neg_not_op [(neg "neg") (not "not")])
 
+;; csinv, csneg insn suffixes.
+(define_code_attr neg_not_cs [(neg "neg") (not "inv")])
+
 ;; Similar, but when the second operand is inverted.
 (define_code_attr nlogical [(and "bic") (ior "orn") (xor "eon")])
 
diff --git a/gcc/testsuite/gcc.target/aarch64/csinv-neg.c b/gcc/testsuite/gcc.target/aarch64/csinv-neg.c
new file mode 100644
index 00000000000..cc64b4094d7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/csinv-neg.c
@@ -0,0 +1,104 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+/*
+** inv1:
+**	cmp	w0, 0
+**	csinv	w0, w1, w2, ne
+**	ret
+*/
+unsigned long long
+inv1(unsigned a, unsigned b, unsigned c)
+{
+  return a ? b : ~c;
+}
+
+/*
+** inv1_local:
+**	cmp	w0, 0
+**	csinv	w0, w1, w2, ne
+**	ret
+*/
+unsigned long long
+inv1_local(unsigned a, unsigned b, unsigned c)
+{
+  unsigned d = ~c;
+  return a ? b : d;
+}
+
+/*
+** inv_zero1:
+**	cmp	w0, 0
+**	csinv	w0, wzr, w1, ne
+**	ret
+*/
+unsigned long long
+inv_zero1(unsigned a, unsigned b)
+{
+  return a ? 0 : ~b;
+}
+
+/*
+** inv_zero2:
+**	cmp	w0, 0
+**	csinv	w0, wzr, w1, eq
+**	ret
+*/
+unsigned long long
+inv_zero2(unsigned a, unsigned b)
+{
+  return a ? ~b : 0;
+}
+
+
+/*
+** inv2:
+**	cmp	w0, 0
+**	csinv	w0, w2, w1, eq
+**	ret
+*/
+unsigned long long
+inv2(unsigned a, unsigned b, unsigned c)
+{
+  return a ? ~b : c;
+}
+
+/*
+** inv2_local:
+**	cmp	w0, 0
+**	csinv	w0, w2, w1, eq
+**	ret
+*/
+unsigned long long
+inv2_local(unsigned a, unsigned b, unsigned c)
+{
+  unsigned d = ~b;
+  return a ? d : c;
+}
+
+/*
+** neg1:
+**	cmp	w0, 0
+**	csneg	w0, w1, w2, ne
+**	ret
+*/
+unsigned long long
+neg1(unsigned a, unsigned b, unsigned c)
+{
+  return a ? b : -c;
+}
+
+
+/*
+** neg2:
+**	cmp	w0, 0
+**	csneg	w0, w2, w1, eq
+**	ret
+*/
+unsigned long long
+neg2(unsigned a, unsigned b, unsigned c)
+{
+  return a ? -b : c;
+}
+
+/* { dg-final { check-function-bodies "**" "" "" } } */
Richard Sandiford May 11, 2020, 2:25 p.m. UTC | #7
Alex Coplan <Alex.Coplan@arm.com> writes:
>> -----Original Message-----
>> From: Richard Sandiford <richard.sandiford@arm.com>
>> Sent: 06 May 2020 11:28
>> To: Alex Coplan <Alex.Coplan@arm.com>
>> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw <Richard.Earnshaw@arm.com>;
>> Marcus Shawcroft <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov
>> <Kyrylo.Tkachov@arm.com>; nd <nd@arm.com>
>> Subject: Re: [PATCH] aarch64: prefer using csinv, csneg in zero extend
>> contexts
>>
>> Alex Coplan <Alex.Coplan@arm.com> writes:
>> >> -----Original Message-----
>> >> From: Richard Sandiford <richard.sandiford@arm.com>
>> >> Sent: 30 April 2020 15:13
>> >> To: Alex Coplan <Alex.Coplan@arm.com>
>> >> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw
>> <Richard.Earnshaw@arm.com>;
>> >> Marcus Shawcroft <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov
>> >> <Kyrylo.Tkachov@arm.com>; nd <nd@arm.com>
>> >> Subject: Re: [PATCH] aarch64: prefer using csinv, csneg in zero extend
>> contexts
>> >>
>> >> Yeah, I was hoping for something like...
>> >>
>> >> > Indeed, clang generates a MVN + CSEL sequence where the CSEL
>> operates on the
>> >> > 64-bit registers:
>> >> >
>> >> > f:
>> >> >         mvn     w8, w2
>> >> >         cmp     w0, #0
>> >> >         csel    x0, x8, x1, eq
>> >> >         ret
>> >>
>> >> ...this rather than the 4-insn (+ret) sequence that we currently
>> >> generate.  So it would have been a define_insn_and_split that handles
>> >> the zero case directly but splits into the "optimal" two-instruction
>> >> sequence for registers.
>> >>
>> >> But I guess the underlying problem is instead that we don't have
>> >> a pattern for (zero_extend:DI (not:SI ...)).  Adding that would
>> >> definitely be a better fix.
>> >
>> > Yes. I sent a patch for this very fix which Kyrill is going to commit
>> once stage
>> > 1 opens: https://gcc.gnu.org/pipermail/gcc-patches/2020-
>> April/544365.html
>>
>> Sorry, missed that.
>>
>> It looks like that patch hinders this one though.  Trying it with
>> current master (where that patch is applied), I get:
>>
>> FAIL: gcc.target/aarch64/csinv-neg.c check-function-bodies inv_zero1
>> FAIL: gcc.target/aarch64/csinv-neg.c check-function-bodies inv_zero2
>>
>> It looks like a costs issue:
>>
>> Trying 27 -> 18:
>>    27: r99:DI=zero_extend(~r101:SI)
>>       REG_DEAD r101:SI
>>    18: x0:DI={(cc:CC==0)?r99:DI:0}
>>       REG_DEAD cc:CC
>>       REG_DEAD r99:DI
>> Successfully matched this instruction:
>> (set (reg/i:DI 0 x0)
>>     (if_then_else:DI (eq (reg:CC 66 cc)
>>             (const_int 0 [0]))
>>         (zero_extend:DI (not:SI (reg:SI 101)))
>>         (const_int 0 [0])))
>> rejecting combination of insns 27 and 18
>> original costs 4 + 4 = 8
>> replacement cost 12
>>
>> I guess we'll need to teach aarch64_if_then_else_costs about the costs
>> of the new insns.
>
> Ah, thanks for catching this. I've attached an updated patch which fixes the
> costs issue here. With the new patch, all the test cases in csinv-neg.c now
> pass. In addition, I've done a bootstrap and regtest on aarch64-linux with no
> new failures.
>
> Do you think we need to add cases to aarch64_if_then_else_costs for the other
> new insns, or is the attached patch OK for master?

Looks good as-is, thanks.  Just a couple of very minor nits:

> 2020-05-07  Alex Coplan  <alex.coplan@arm.com>
>
>         * config/aarch64/aarch64.c (aarch64_if_then_else_costs): Add case to correctly
>         calculate cost for new pattern (*csinv3_uxtw_insn3).

ChangeLog lines follow the 80-character limit.

> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index e92c7e69fcb..efb3da7a7fc 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -11695,6 +11695,15 @@ aarch64_if_then_else_costs (rtx op0, rtx op1, rtx op2, int *cost, bool speed)
>  	  op1 = XEXP (op1, 0);
>  	  op2 = XEXP (op2, 0);
>  	}
> +      else if (GET_CODE (op1) == ZERO_EXTEND && op2 == const0_rtx)
> +	{
> +	  inner = XEXP (op1, 0);
> +	  if (GET_CODE (inner) == NEG || GET_CODE (inner) == NOT)
> +	  {
> +	    /* CSINV/NEG with zero extend + const 0 (*csinv3_uxtw_insn3).  */
> +	    op1 = XEXP (inner, 0);
> +	  }

GCC style is to avoid { ... } around single statements, unless it's
needed to avoid an ambiguity.  If we did use { ... }, the block
should be indented by two further spaces, so that the "{" is
under the space after "if".

Pushed with those changes, thanks.

Richard
Patch

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index c7c4d1dd519..2f7367c0b1a 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4390,6 +4390,19 @@ 
   [(set_attr "type" "csel")]
 )
 
+(define_insn "*csinv3_uxtw_insn"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+        (if_then_else:DI
+          (match_operand 1 "aarch64_comparison_operation" "")
+          (zero_extend:DI
+            (match_operand:SI 2 "aarch64_reg_or_zero" "rZ"))
+          (zero_extend:DI
+            (NEG_NOT:SI (match_operand:SI 3 "register_operand" "r")))))]
+  ""
+  "cs<neg_not_cs>\\t%w0, %w2, %w3, %m1"
+  [(set_attr "type" "csel")]
+)
+
 ;; If X can be loaded by a single CNT[BHWD] instruction,
 ;;
 ;;    A = UMAX (B, X)
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 8e434389e59..a568cf21b99 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -1932,6 +1932,9 @@ 
 ;; Operation names for negate and bitwise complement.
 (define_code_attr neg_not_op [(neg "neg") (not "not")])
 
+;; csinv, csneg insn suffixes.
+(define_code_attr neg_not_cs [(neg "neg") (not "inv")])
+
 ;; Similar, but when the second operand is inverted.
 (define_code_attr nlogical [(and "bic") (ior "orn") (xor "eon")])
 
diff --git a/gcc/testsuite/gcc.target/aarch64/csinv-neg.c b/gcc/testsuite/gcc.target/aarch64/csinv-neg.c
new file mode 100644
index 00000000000..4636f3e0906
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/csinv-neg.c
@@ -0,0 +1,54 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+/*
+** inv1:
+**	cmp	w0, 0
+**	csinv	w0, w1, w2, ne
+**	ret
+*/
+unsigned long long
+inv1(unsigned a, unsigned b, unsigned c)
+{
+  return a ? b : ~c;
+}
+
+/*
+** inv2:
+**	cmp	w0, 0
+**	csinv	w0, w2, w1, eq
+**	ret
+*/
+unsigned long long
+inv2(unsigned a, unsigned b, unsigned c)
+{
+  return a ? ~b : c;
+}
+
+/*
+** neg1:
+**	cmp	w0, 0
+**	csneg	w0, w1, w2, ne
+**	ret
+*/
+unsigned long long
+neg1(unsigned a, unsigned b, unsigned c)
+{
+  return a ? b : -c;
+}
+
+
+/*
+** neg2:
+**	cmp	w0, 0
+**	csneg	w0, w2, w1, eq
+**	ret
+*/
+unsigned long long
+neg2(unsigned a, unsigned b, unsigned c)
+{
+  return a ? -b : c;
+}
+
+
+/* { dg-final { check-function-bodies "**" "" "" } } */