diff mbox series

[2/3,APX,CCMP] Adjust startegy for selecting ccmp candidates

Message ID 20240515082054.3934069-3-hongyu.wang@intel.com
State New
Headers show
Series Support Intel APX CCMP | expand

Commit Message

Hongyu Wang May 15, 2024, 8:20 a.m. UTC
For general ccmp scenario, the tree sequence is like

_1 = (a < b)
_2 = (c < d)
_3 = _1 & _2

current ccmp expanding will try to swap compare order for _1 and _2,
compare the cost/cost2 between compare _1 and _2 first, then return the
sequence with lower cost.

For x86 ccmp, we don't support FP compare as ccmp operand, but we
support fp comi + int ccmp sequence. With current cost comparison
model, the fp comi + int ccmp can never be generated since it doesn't
check whether expand_ccmp_next returns available result and the rtl
cost for the empty ccmp sequence is always smaller.

Check the expand_ccmp_next result ret and ret2, returns the valid one
before cost comparison.

gcc/ChangeLog:

	* ccmp.cc (expand_ccmp_expr_1): Check ret and ret2 of
	expand_ccmp_next, returns the valid one first before
	comparing cost.
---
 gcc/ccmp.cc | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

Comments

Hongyu Wang May 15, 2024, 8:25 a.m. UTC | #1
CC'd Richard for ccmp part as previously it is added only for aarch64.
The original logic will not interrupted since if
aarch64_gen_ccmp_first succeeded, aarch64_gen_ccmp_next will also
success, the cmp/fcmp and ccmp/fccmp supports all GPI/GPF, and the
prepare_operand will fixup the input that cmp supports but ccmp not,
so ret/ret2 will all be valid when comparing cost.
Thanks in advance.

Hongyu Wang <hongyu.wang@intel.com> 于2024年5月15日周三 16:22写道:
>
> For general ccmp scenario, the tree sequence is like
>
> _1 = (a < b)
> _2 = (c < d)
> _3 = _1 & _2
>
> current ccmp expanding will try to swap compare order for _1 and _2,
> compare the cost/cost2 between compare _1 and _2 first, then return the
> sequence with lower cost.
>
> For x86 ccmp, we don't support FP compare as ccmp operand, but we
> support fp comi + int ccmp sequence. With current cost comparison
> model, the fp comi + int ccmp can never be generated since it doesn't
> check whether expand_ccmp_next returns available result and the rtl
> cost for the empty ccmp sequence is always smaller.
>
> Check the expand_ccmp_next result ret and ret2, returns the valid one
> before cost comparison.
>
> gcc/ChangeLog:
>
>         * ccmp.cc (expand_ccmp_expr_1): Check ret and ret2 of
>         expand_ccmp_next, returns the valid one first before
>         comparing cost.
> ---
>  gcc/ccmp.cc | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/ccmp.cc b/gcc/ccmp.cc
> index 7cb525addf4..4b424220068 100644
> --- a/gcc/ccmp.cc
> +++ b/gcc/ccmp.cc
> @@ -247,7 +247,17 @@ expand_ccmp_expr_1 (gimple *g, rtx_insn **prep_seq, rtx_insn **gen_seq)
>               cost2 = seq_cost (prep_seq_2, speed_p);
>               cost2 += seq_cost (gen_seq_2, speed_p);
>             }
> -         if (cost2 < cost1)
> +
> +         /* For x86 target the ccmp does not support fp operands, but
> +            have fcomi insn that can produce eflags and then do int
> +            ccmp. So if one of the op is fp compare, ret1 or ret2 can
> +            fail, and the cost of the corresponding empty seq will
> +            always be smaller, then the NULL sequence will be returned.
> +            Add check for ret and ret2, returns the available one if
> +            the other is NULL.  */
> +         if ((!ret && ret2)
> +             || (!(ret && !ret2)
> +                 && cost2 < cost1))
>             {
>               *prep_seq = prep_seq_2;
>               *gen_seq = gen_seq_2;
> --
> 2.31.1
>
Hongyu Wang May 23, 2024, 8:27 a.m. UTC | #2
Gently ping for this :)
Hi Richard, Is it OK to adopt the ccmp change? Or did you know who can
help to review this part?
Thanks.

Hongyu Wang <wwwhhhyyy333@gmail.com> 于2024年5月15日周三 16:25写道:
>
> CC'd Richard for ccmp part as previously it is added only for aarch64.
> The original logic will not interrupted since if
> aarch64_gen_ccmp_first succeeded, aarch64_gen_ccmp_next will also
> success, the cmp/fcmp and ccmp/fccmp supports all GPI/GPF, and the
> prepare_operand will fixup the input that cmp supports but ccmp not,
> so ret/ret2 will all be valid when comparing cost.
> Thanks in advance.
>
> Hongyu Wang <hongyu.wang@intel.com> 于2024年5月15日周三 16:22写道:
> >
> > For general ccmp scenario, the tree sequence is like
> >
> > _1 = (a < b)
> > _2 = (c < d)
> > _3 = _1 & _2
> >
> > current ccmp expanding will try to swap compare order for _1 and _2,
> > compare the cost/cost2 between compare _1 and _2 first, then return the
> > sequence with lower cost.
> >
> > For x86 ccmp, we don't support FP compare as ccmp operand, but we
> > support fp comi + int ccmp sequence. With current cost comparison
> > model, the fp comi + int ccmp can never be generated since it doesn't
> > check whether expand_ccmp_next returns available result and the rtl
> > cost for the empty ccmp sequence is always smaller.
> >
> > Check the expand_ccmp_next result ret and ret2, returns the valid one
> > before cost comparison.
> >
> > gcc/ChangeLog:
> >
> >         * ccmp.cc (expand_ccmp_expr_1): Check ret and ret2 of
> >         expand_ccmp_next, returns the valid one first before
> >         comparing cost.
> > ---
> >  gcc/ccmp.cc | 12 +++++++++++-
> >  1 file changed, 11 insertions(+), 1 deletion(-)
> >
> > diff --git a/gcc/ccmp.cc b/gcc/ccmp.cc
> > index 7cb525addf4..4b424220068 100644
> > --- a/gcc/ccmp.cc
> > +++ b/gcc/ccmp.cc
> > @@ -247,7 +247,17 @@ expand_ccmp_expr_1 (gimple *g, rtx_insn **prep_seq, rtx_insn **gen_seq)
> >               cost2 = seq_cost (prep_seq_2, speed_p);
> >               cost2 += seq_cost (gen_seq_2, speed_p);
> >             }
> > -         if (cost2 < cost1)
> > +
> > +         /* For x86 target the ccmp does not support fp operands, but
> > +            have fcomi insn that can produce eflags and then do int
> > +            ccmp. So if one of the op is fp compare, ret1 or ret2 can
> > +            fail, and the cost of the corresponding empty seq will
> > +            always be smaller, then the NULL sequence will be returned.
> > +            Add check for ret and ret2, returns the available one if
> > +            the other is NULL.  */
> > +         if ((!ret && ret2)
> > +             || (!(ret && !ret2)
> > +                 && cost2 < cost1))
> >             {
> >               *prep_seq = prep_seq_2;
> >               *gen_seq = gen_seq_2;
> > --
> > 2.31.1
> >
Hongyu Wang May 30, 2024, 3:11 a.m. UTC | #3
Gently ping :)
Hi Richard, Is it OK to adopt the ccmp change? Or did you know who can
help to review this part?
Thanks.


Hongyu Wang <wwwhhhyyy333@gmail.com> 于2024年5月23日周四 16:27写道:

>
> Gently ping for this :)
> Hi Richard, Is it OK to adopt the ccmp change? Or did you know who can
> help to review this part?
> Thanks.
>
> Hongyu Wang <wwwhhhyyy333@gmail.com> 于2024年5月15日周三 16:25写道:
> >
> > CC'd Richard for ccmp part as previously it is added only for aarch64.
> > The original logic will not interrupted since if
> > aarch64_gen_ccmp_first succeeded, aarch64_gen_ccmp_next will also
> > success, the cmp/fcmp and ccmp/fccmp supports all GPI/GPF, and the
> > prepare_operand will fixup the input that cmp supports but ccmp not,
> > so ret/ret2 will all be valid when comparing cost.
> > Thanks in advance.
> >
> > Hongyu Wang <hongyu.wang@intel.com> 于2024年5月15日周三 16:22写道:
> > >
> > > For general ccmp scenario, the tree sequence is like
> > >
> > > _1 = (a < b)
> > > _2 = (c < d)
> > > _3 = _1 & _2
> > >
> > > current ccmp expanding will try to swap compare order for _1 and _2,
> > > compare the cost/cost2 between compare _1 and _2 first, then return the
> > > sequence with lower cost.
> > >
> > > For x86 ccmp, we don't support FP compare as ccmp operand, but we
> > > support fp comi + int ccmp sequence. With current cost comparison
> > > model, the fp comi + int ccmp can never be generated since it doesn't
> > > check whether expand_ccmp_next returns available result and the rtl
> > > cost for the empty ccmp sequence is always smaller.
> > >
> > > Check the expand_ccmp_next result ret and ret2, returns the valid one
> > > before cost comparison.
> > >
> > > gcc/ChangeLog:
> > >
> > >         * ccmp.cc (expand_ccmp_expr_1): Check ret and ret2 of
> > >         expand_ccmp_next, returns the valid one first before
> > >         comparing cost.
> > > ---
> > >  gcc/ccmp.cc | 12 +++++++++++-
> > >  1 file changed, 11 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/gcc/ccmp.cc b/gcc/ccmp.cc
> > > index 7cb525addf4..4b424220068 100644
> > > --- a/gcc/ccmp.cc
> > > +++ b/gcc/ccmp.cc
> > > @@ -247,7 +247,17 @@ expand_ccmp_expr_1 (gimple *g, rtx_insn **prep_seq, rtx_insn **gen_seq)
> > >               cost2 = seq_cost (prep_seq_2, speed_p);
> > >               cost2 += seq_cost (gen_seq_2, speed_p);
> > >             }
> > > -         if (cost2 < cost1)
> > > +
> > > +         /* For x86 target the ccmp does not support fp operands, but
> > > +            have fcomi insn that can produce eflags and then do int
> > > +            ccmp. So if one of the op is fp compare, ret1 or ret2 can
> > > +            fail, and the cost of the corresponding empty seq will
> > > +            always be smaller, then the NULL sequence will be returned.
> > > +            Add check for ret and ret2, returns the available one if
> > > +            the other is NULL.  */
> > > +         if ((!ret && ret2)
> > > +             || (!(ret && !ret2)
> > > +                 && cost2 < cost1))
> > >             {
> > >               *prep_seq = prep_seq_2;
> > >               *gen_seq = gen_seq_2;
> > > --
> > > 2.31.1
> > >
Richard Sandiford June 5, 2024, 9:21 a.m. UTC | #4
Hongyu Wang <wwwhhhyyy333@gmail.com> writes:
> CC'd Richard for ccmp part as previously it is added only for aarch64.
> The original logic will not interrupted since if
> aarch64_gen_ccmp_first succeeded, aarch64_gen_ccmp_next will also
> success, the cmp/fcmp and ccmp/fccmp supports all GPI/GPF, and the
> prepare_operand will fixup the input that cmp supports but ccmp not,
> so ret/ret2 will all be valid when comparing cost.
> Thanks in advance.

Sorry for the slow review.

> Hongyu Wang <hongyu.wang@intel.com> 于2024年5月15日周三 16:22写道:
>>
>> For general ccmp scenario, the tree sequence is like
>>
>> _1 = (a < b)
>> _2 = (c < d)
>> _3 = _1 & _2
>>
>> current ccmp expanding will try to swap compare order for _1 and _2,
>> compare the cost/cost2 between compare _1 and _2 first, then return the
>> sequence with lower cost.
>>
>> For x86 ccmp, we don't support FP compare as ccmp operand, but we
>> support fp comi + int ccmp sequence. With current cost comparison
>> model, the fp comi + int ccmp can never be generated since it doesn't
>> check whether expand_ccmp_next returns available result and the rtl
>> cost for the empty ccmp sequence is always smaller.
>>
>> Check the expand_ccmp_next result ret and ret2, returns the valid one
>> before cost comparison.
>>
>> gcc/ChangeLog:
>>
>>         * ccmp.cc (expand_ccmp_expr_1): Check ret and ret2 of
>>         expand_ccmp_next, returns the valid one first before
>>         comparing cost.
>> ---
>>  gcc/ccmp.cc | 12 +++++++++++-
>>  1 file changed, 11 insertions(+), 1 deletion(-)
>>
>> diff --git a/gcc/ccmp.cc b/gcc/ccmp.cc
>> index 7cb525addf4..4b424220068 100644
>> --- a/gcc/ccmp.cc
>> +++ b/gcc/ccmp.cc
>> @@ -247,7 +247,17 @@ expand_ccmp_expr_1 (gimple *g, rtx_insn **prep_seq, rtx_insn **gen_seq)
>>               cost2 = seq_cost (prep_seq_2, speed_p);
>>               cost2 += seq_cost (gen_seq_2, speed_p);
>>             }
>> -         if (cost2 < cost1)
>> +
>> +         /* For x86 target the ccmp does not support fp operands, but
>> +            have fcomi insn that can produce eflags and then do int
>> +            ccmp. So if one of the op is fp compare, ret1 or ret2 can
>> +            fail, and the cost of the corresponding empty seq will
>> +            always be smaller, then the NULL sequence will be returned.
>> +            Add check for ret and ret2, returns the available one if
>> +            the other is NULL.  */

I think the more fundamental point is that the cost of a failed
expansion isn't meaningful.  So how about:

	  /* It's possible that one expansion succeeds and the other fails.
	     For example, x86 has int ccmp but not fp ccmp, and so a combined
	     fp and int comparison must be ordered such that the fp comparison
	     happens first.  The costs are not meaningful for failed
	     expansions.  */

>> +         if ((!ret && ret2)
>> +             || (!(ret && !ret2)
>> +                 && cost2 < cost1))

I think this simplifies to:

	  if (ret2 && (!ret1 || cost2 < cost1))

OK with those changes, thanks.

Richard

>>             {
>>               *prep_seq = prep_seq_2;
>>               *gen_seq = gen_seq_2;
>> --
>> 2.31.1
>>
Hongyu Wang June 6, 2024, 7:13 a.m. UTC | #5
Thanks, this is the patch I'm going to check-in.

For general ccmp scenario, the tree sequence is like

_1 = (a < b)
_2 = (c < d)
_3 = _1 & _2

current ccmp expanding will try to swap compare order for _1 and _2,
compare the expansion cost/cost2 for expanding _1 or _2 first, then
return the sequence with lower cost.

It is possible that one expansion succeeds and the other fails.
For example, x86 has int ccmp but not fp ccmp, so a combined fp and
int comparison must be ordered such that the fp comparison happens
first.  The costs are not meaningful for failed expansions.

Check the expand_ccmp_next result ret and ret2, returns the valid one
before cost comparison.

gcc/ChangeLog:

* ccmp.cc (expand_ccmp_expr_1): Check ret and ret2 of
  expand_ccmp_next, returns the valid one first instead of
  comparing cost.
---
 gcc/ccmp.cc | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/gcc/ccmp.cc b/gcc/ccmp.cc
index 7cb525addf4..4d50708d986 100644
--- a/gcc/ccmp.cc
+++ b/gcc/ccmp.cc
@@ -247,7 +247,15 @@ expand_ccmp_expr_1 (gimple *g, rtx_insn
**prep_seq, rtx_insn **gen_seq)
        cost2 = seq_cost (prep_seq_2, speed_p);
        cost2 += seq_cost (gen_seq_2, speed_p);
      }
-   if (cost2 < cost1)
+
+   /* It's possible that one expansion succeeds and the other
+      fails.
+      For example, x86 has int ccmp but not fp ccmp, and so a
+      combined fp and int comparison must be ordered such that
+      the fp comparison happens first. The costs are not
+      meaningful for failed expansions.  */
+
+   if (ret2 && (!ret || cost2 < cost1))
      {
        *prep_seq = prep_seq_2;
        *gen_seq = gen_seq_2;
--
2.31.1

Richard Sandiford <richard.sandiford@arm.com> 于2024年6月5日周三 17:21写道:

>
> Hongyu Wang <wwwhhhyyy333@gmail.com> writes:
> > CC'd Richard for ccmp part as previously it is added only for aarch64.
> > The original logic will not interrupted since if
> > aarch64_gen_ccmp_first succeeded, aarch64_gen_ccmp_next will also
> > success, the cmp/fcmp and ccmp/fccmp supports all GPI/GPF, and the
> > prepare_operand will fixup the input that cmp supports but ccmp not,
> > so ret/ret2 will all be valid when comparing cost.
> > Thanks in advance.
>
> Sorry for the slow review.
>
> > Hongyu Wang <hongyu.wang@intel.com> 于2024年5月15日周三 16:22写道:
> >>
> >> For general ccmp scenario, the tree sequence is like
> >>
> >> _1 = (a < b)
> >> _2 = (c < d)
> >> _3 = _1 & _2
> >>
> >> current ccmp expanding will try to swap compare order for _1 and _2,
> >> compare the cost/cost2 between compare _1 and _2 first, then return the
> >> sequence with lower cost.
> >>
> >> For x86 ccmp, we don't support FP compare as ccmp operand, but we
> >> support fp comi + int ccmp sequence. With current cost comparison
> >> model, the fp comi + int ccmp can never be generated since it doesn't
> >> check whether expand_ccmp_next returns available result and the rtl
> >> cost for the empty ccmp sequence is always smaller.
> >>
> >> Check the expand_ccmp_next result ret and ret2, returns the valid one
> >> before cost comparison.
> >>
> >> gcc/ChangeLog:
> >>
> >>         * ccmp.cc (expand_ccmp_expr_1): Check ret and ret2 of
> >>         expand_ccmp_next, returns the valid one first before
> >>         comparing cost.
> >> ---
> >>  gcc/ccmp.cc | 12 +++++++++++-
> >>  1 file changed, 11 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/gcc/ccmp.cc b/gcc/ccmp.cc
> >> index 7cb525addf4..4b424220068 100644
> >> --- a/gcc/ccmp.cc
> >> +++ b/gcc/ccmp.cc
> >> @@ -247,7 +247,17 @@ expand_ccmp_expr_1 (gimple *g, rtx_insn **prep_seq, rtx_insn **gen_seq)
> >>               cost2 = seq_cost (prep_seq_2, speed_p);
> >>               cost2 += seq_cost (gen_seq_2, speed_p);
> >>             }
> >> -         if (cost2 < cost1)
> >> +
> >> +         /* For x86 target the ccmp does not support fp operands, but
> >> +            have fcomi insn that can produce eflags and then do int
> >> +            ccmp. So if one of the op is fp compare, ret1 or ret2 can
> >> +            fail, and the cost of the corresponding empty seq will
> >> +            always be smaller, then the NULL sequence will be returned.
> >> +            Add check for ret and ret2, returns the available one if
> >> +            the other is NULL.  */
>
> I think the more fundamental point is that the cost of a failed
> expansion isn't meaningful.  So how about:
>
>           /* It's possible that one expansion succeeds and the other fails.
>              For example, x86 has int ccmp but not fp ccmp, and so a combined
>              fp and int comparison must be ordered such that the fp comparison
>              happens first.  The costs are not meaningful for failed
>              expansions.  */
>
> >> +         if ((!ret && ret2)
> >> +             || (!(ret && !ret2)
> >> +                 && cost2 < cost1))
>
> I think this simplifies to:
>
>           if (ret2 && (!ret1 || cost2 < cost1))
>
> OK with those changes, thanks.
>
> Richard
>
> >>             {
> >>               *prep_seq = prep_seq_2;
> >>               *gen_seq = gen_seq_2;
> >> --
> >> 2.31.1
> >>
diff mbox series

Patch

diff --git a/gcc/ccmp.cc b/gcc/ccmp.cc
index 7cb525addf4..4b424220068 100644
--- a/gcc/ccmp.cc
+++ b/gcc/ccmp.cc
@@ -247,7 +247,17 @@  expand_ccmp_expr_1 (gimple *g, rtx_insn **prep_seq, rtx_insn **gen_seq)
 	      cost2 = seq_cost (prep_seq_2, speed_p);
 	      cost2 += seq_cost (gen_seq_2, speed_p);
 	    }
-	  if (cost2 < cost1)
+
+	  /* For x86 target the ccmp does not support fp operands, but
+	     have fcomi insn that can produce eflags and then do int
+	     ccmp. So if one of the op is fp compare, ret1 or ret2 can
+	     fail, and the cost of the corresponding empty seq will
+	     always be smaller, then the NULL sequence will be returned.
+	     Add check for ret and ret2, returns the available one if
+	     the other is NULL.  */
+	  if ((!ret && ret2)
+	      || (!(ret && !ret2)
+		  && cost2 < cost1))
 	    {
 	      *prep_seq = prep_seq_2;
 	      *gen_seq = gen_seq_2;