diff mbox series

vect: Add extraction cost for slp reduc

Message ID f3937a02-3b41-8187-1110-5a20a8f1b9d7@linux.ibm.com
State New
Headers show
Series vect: Add extraction cost for slp reduc | expand

Commit Message

Kewen.Lin Aug. 16, 2021, 6:03 a.m. UTC
Hi,

IIUC, the function vectorizable_bb_reduc_epilogue missed to
consider the cost to extract the final value from the vector
for reduc operations.  This patch is to add one time of
vec_to_scalar cost for extracting.

Bootstrapped & regtested on powerpc64le-linux-gnu P9.
The testing on x86_64 and aarch64 is ongoing.

Is it ok for trunk?

BR,
Kewen
-----
gcc/ChangeLog:

	* tree-vect-slp.c (vectorizable_bb_reduc_epilogue): Add the cost for
	value extraction.

Comments

Richard Biener Aug. 16, 2021, 6:49 a.m. UTC | #1
On Mon, Aug 16, 2021 at 8:03 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>
> Hi,
>
> IIUC, the function vectorizable_bb_reduc_epilogue missed to
> consider the cost to extract the final value from the vector
> for reduc operations.  This patch is to add one time of
> vec_to_scalar cost for extracting.
>
> Bootstrapped & regtested on powerpc64le-linux-gnu P9.
> The testing on x86_64 and aarch64 is ongoing.
>
> Is it ok for trunk?

There's no such instruction necessary, the way the costing works
the result is in lane zero already.  Note the optabs are defined
to reduce to a scalar already.  So if your arch implements those and
requires such move then the backend costing needs to handle that.

That said, ideally we'd simply cost the IFN_REDUC_* in the backend
but for BB reductions we don't actually build a SLP node with such
representative stmt to pass down (yet).

I guess you're running into a integer reduction where there's
a vector -> gpr move missing in costing?  I suppose costing
vec_to_scalar works for that but in the end we should maybe
find a way to cost the IFN_REDUC_* ...

Richard.

> BR,
> Kewen
> -----
> gcc/ChangeLog:
>
>         * tree-vect-slp.c (vectorizable_bb_reduc_epilogue): Add the cost for
>         value extraction.
>
> diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
> index b9d88c2d943..841a0872afa 100644
> --- a/gcc/tree-vect-slp.c
> +++ b/gcc/tree-vect-slp.c
> @@ -4845,12 +4845,14 @@ vectorizable_bb_reduc_epilogue (slp_instance instance,
>      return false;
>
>    /* There's no way to cost a horizontal vector reduction via REDUC_FN so
> -     cost log2 vector operations plus shuffles.  */
> +     cost log2 vector operations plus shuffles and one extraction.  */
>    unsigned steps = floor_log2 (vect_nunits_for_cost (vectype));
>    record_stmt_cost (cost_vec, steps, vector_stmt, instance->root_stmts[0],
>                     vectype, 0, vect_body);
>    record_stmt_cost (cost_vec, steps, vec_perm, instance->root_stmts[0],
>                     vectype, 0, vect_body);
> +  record_stmt_cost (cost_vec, 1, vec_to_scalar, instance->root_stmts[0],
> +                   vectype, 0, vect_body);
>    return true;
>  }
Kewen.Lin Aug. 16, 2021, 7:15 a.m. UTC | #2
Hi Richi,

Thanks for the comments!

on 2021/8/16 下午2:49, Richard Biener wrote:
> On Mon, Aug 16, 2021 at 8:03 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>
>> Hi,
>>
>> IIUC, the function vectorizable_bb_reduc_epilogue missed to
>> consider the cost to extract the final value from the vector
>> for reduc operations.  This patch is to add one time of
>> vec_to_scalar cost for extracting.
>>
>> Bootstrapped & regtested on powerpc64le-linux-gnu P9.
>> The testing on x86_64 and aarch64 is ongoing.
>>
>> Is it ok for trunk?
> 
> There's no such instruction necessary, the way the costing works
> the result is in lane zero already.  Note the optabs are defined
> to reduce to a scalar already.  So if your arch implements those and
> requires such move then the backend costing needs to handle that.
> 

Yes, these reduc_<operation>_scal_<mode> should have made the
operand[0] as the final scalar result.

> That said, ideally we'd simply cost the IFN_REDUC_* in the backend
> but for BB reductions we don't actually build a SLP node with such
> representative stmt to pass down (yet).
> 

OK, thanks for the explanation.  It explains why we cost the 
IFN_REDUC_* as one vect_stmt in loop vect but cost it as
conservative (shuffle and reduc_op) as possible here.

> I guess you're running into a integer reduction where there's
> a vector -> gpr move missing in costing?  I suppose costing
> vec_to_scalar works for that but in the end we should maybe
> find a way to cost the IFN_REDUC_* ...

Yeah, it's a reduction on plus, initially I wanted to adjust backend
costing for various IFN_REDUC* (since for some variants Power has more
than one instructions for them), then I noticed we cost the reduction
as shuffle and reduc_op during SLP for now, I guess it's good to get
vec_to_scalar considered here for consistency?  Then it can be removed
together when we have a better modeling in the end? 

BR,
Kewen

> 
> Richard.
> 
>> BR,
>> Kewen
>> -----
>> gcc/ChangeLog:
>>
>>         * tree-vect-slp.c (vectorizable_bb_reduc_epilogue): Add the cost for
>>         value extraction.
>>
>> diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
>> index b9d88c2d943..841a0872afa 100644
>> --- a/gcc/tree-vect-slp.c
>> +++ b/gcc/tree-vect-slp.c
>> @@ -4845,12 +4845,14 @@ vectorizable_bb_reduc_epilogue (slp_instance instance,
>>      return false;
>>
>>    /* There's no way to cost a horizontal vector reduction via REDUC_FN so
>> -     cost log2 vector operations plus shuffles.  */
>> +     cost log2 vector operations plus shuffles and one extraction.  */
>>    unsigned steps = floor_log2 (vect_nunits_for_cost (vectype));
>>    record_stmt_cost (cost_vec, steps, vector_stmt, instance->root_stmts[0],
>>                     vectype, 0, vect_body);
>>    record_stmt_cost (cost_vec, steps, vec_perm, instance->root_stmts[0],
>>                     vectype, 0, vect_body);
>> +  record_stmt_cost (cost_vec, 1, vec_to_scalar, instance->root_stmts[0],
>> +                   vectype, 0, vect_body);
>>    return true;
>>  }
Richard Biener Aug. 16, 2021, 1:27 p.m. UTC | #3
On Mon, Aug 16, 2021 at 9:16 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>
> Hi Richi,
>
> Thanks for the comments!
>
> on 2021/8/16 下午2:49, Richard Biener wrote:
> > On Mon, Aug 16, 2021 at 8:03 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
> >>
> >> Hi,
> >>
> >> IIUC, the function vectorizable_bb_reduc_epilogue missed to
> >> consider the cost to extract the final value from the vector
> >> for reduc operations.  This patch is to add one time of
> >> vec_to_scalar cost for extracting.
> >>
> >> Bootstrapped & regtested on powerpc64le-linux-gnu P9.
> >> The testing on x86_64 and aarch64 is ongoing.
> >>
> >> Is it ok for trunk?
> >
> > There's no such instruction necessary, the way the costing works
> > the result is in lane zero already.  Note the optabs are defined
> > to reduce to a scalar already.  So if your arch implements those and
> > requires such move then the backend costing needs to handle that.
> >
>
> Yes, these reduc_<operation>_scal_<mode> should have made the
> operand[0] as the final scalar result.
>
> > That said, ideally we'd simply cost the IFN_REDUC_* in the backend
> > but for BB reductions we don't actually build a SLP node with such
> > representative stmt to pass down (yet).
> >
>
> OK, thanks for the explanation.  It explains why we cost the
> IFN_REDUC_* as one vect_stmt in loop vect but cost it as
> conservative (shuffle and reduc_op) as possible here.
>
> > I guess you're running into a integer reduction where there's
> > a vector -> gpr move missing in costing?  I suppose costing
> > vec_to_scalar works for that but in the end we should maybe
> > find a way to cost the IFN_REDUC_* ...
>
> Yeah, it's a reduction on plus, initially I wanted to adjust backend
> costing for various IFN_REDUC* (since for some variants Power has more
> than one instructions for them), then I noticed we cost the reduction
> as shuffle and reduc_op during SLP for now, I guess it's good to get
> vec_to_scalar considered here for consistency?  Then it can be removed
> together when we have a better modeling in the end?

Yeah, I guess that works for now.

Thanks,
Richard.

> BR,
> Kewen
>
> >
> > Richard.
> >
> >> BR,
> >> Kewen
> >> -----
> >> gcc/ChangeLog:
> >>
> >>         * tree-vect-slp.c (vectorizable_bb_reduc_epilogue): Add the cost for
> >>         value extraction.
> >>
> >> diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
> >> index b9d88c2d943..841a0872afa 100644
> >> --- a/gcc/tree-vect-slp.c
> >> +++ b/gcc/tree-vect-slp.c
> >> @@ -4845,12 +4845,14 @@ vectorizable_bb_reduc_epilogue (slp_instance instance,
> >>      return false;
> >>
> >>    /* There's no way to cost a horizontal vector reduction via REDUC_FN so
> >> -     cost log2 vector operations plus shuffles.  */
> >> +     cost log2 vector operations plus shuffles and one extraction.  */
> >>    unsigned steps = floor_log2 (vect_nunits_for_cost (vectype));
> >>    record_stmt_cost (cost_vec, steps, vector_stmt, instance->root_stmts[0],
> >>                     vectype, 0, vect_body);
> >>    record_stmt_cost (cost_vec, steps, vec_perm, instance->root_stmts[0],
> >>                     vectype, 0, vect_body);
> >> +  record_stmt_cost (cost_vec, 1, vec_to_scalar, instance->root_stmts[0],
> >> +                   vectype, 0, vect_body);
> >>    return true;
> >>  }
>
>
diff mbox series

Patch

diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index b9d88c2d943..841a0872afa 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -4845,12 +4845,14 @@  vectorizable_bb_reduc_epilogue (slp_instance instance,
     return false;

   /* There's no way to cost a horizontal vector reduction via REDUC_FN so
-     cost log2 vector operations plus shuffles.  */
+     cost log2 vector operations plus shuffles and one extraction.  */
   unsigned steps = floor_log2 (vect_nunits_for_cost (vectype));
   record_stmt_cost (cost_vec, steps, vector_stmt, instance->root_stmts[0],
 		    vectype, 0, vect_body);
   record_stmt_cost (cost_vec, steps, vec_perm, instance->root_stmts[0],
 		    vectype, 0, vect_body);
+  record_stmt_cost (cost_vec, 1, vec_to_scalar, instance->root_stmts[0],
+		    vectype, 0, vect_body);
   return true;
 }