Message ID | f3937a02-3b41-8187-1110-5a20a8f1b9d7@linux.ibm.com |
---|---|
State | New |
Headers | show |
Series | vect: Add extraction cost for slp reduc | expand |
On Mon, Aug 16, 2021 at 8:03 AM Kewen.Lin <linkw@linux.ibm.com> wrote: > > Hi, > > IIUC, the function vectorizable_bb_reduc_epilogue missed to > consider the cost to extract the final value from the vector > for reduc operations. This patch is to add one time of > vec_to_scalar cost for extracting. > > Bootstrapped & regtested on powerpc64le-linux-gnu P9. > The testing on x86_64 and aarch64 is ongoing. > > Is it ok for trunk? There's no such instruction necessary, the way the costing works the result is in lane zero already. Note the optabs are defined to reduce to a scalar already. So if your arch implements those and requires such move then the backend costing needs to handle that. That said, ideally we'd simply cost the IFN_REDUC_* in the backend but for BB reductions we don't actually build a SLP node with such representative stmt to pass down (yet). I guess you're running into a integer reduction where there's a vector -> gpr move missing in costing? I suppose costing vec_to_scalar works for that but in the end we should maybe find a way to cost the IFN_REDUC_* ... Richard. > BR, > Kewen > ----- > gcc/ChangeLog: > > * tree-vect-slp.c (vectorizable_bb_reduc_epilogue): Add the cost for > value extraction. > > diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c > index b9d88c2d943..841a0872afa 100644 > --- a/gcc/tree-vect-slp.c > +++ b/gcc/tree-vect-slp.c > @@ -4845,12 +4845,14 @@ vectorizable_bb_reduc_epilogue (slp_instance instance, > return false; > > /* There's no way to cost a horizontal vector reduction via REDUC_FN so > - cost log2 vector operations plus shuffles. */ > + cost log2 vector operations plus shuffles and one extraction. */ > unsigned steps = floor_log2 (vect_nunits_for_cost (vectype)); > record_stmt_cost (cost_vec, steps, vector_stmt, instance->root_stmts[0], > vectype, 0, vect_body); > record_stmt_cost (cost_vec, steps, vec_perm, instance->root_stmts[0], > vectype, 0, vect_body); > + record_stmt_cost (cost_vec, 1, vec_to_scalar, instance->root_stmts[0], > + vectype, 0, vect_body); > return true; > }
Hi Richi, Thanks for the comments! on 2021/8/16 下午2:49, Richard Biener wrote: > On Mon, Aug 16, 2021 at 8:03 AM Kewen.Lin <linkw@linux.ibm.com> wrote: >> >> Hi, >> >> IIUC, the function vectorizable_bb_reduc_epilogue missed to >> consider the cost to extract the final value from the vector >> for reduc operations. This patch is to add one time of >> vec_to_scalar cost for extracting. >> >> Bootstrapped & regtested on powerpc64le-linux-gnu P9. >> The testing on x86_64 and aarch64 is ongoing. >> >> Is it ok for trunk? > > There's no such instruction necessary, the way the costing works > the result is in lane zero already. Note the optabs are defined > to reduce to a scalar already. So if your arch implements those and > requires such move then the backend costing needs to handle that. > Yes, these reduc_<operation>_scal_<mode> should have made the operand[0] as the final scalar result. > That said, ideally we'd simply cost the IFN_REDUC_* in the backend > but for BB reductions we don't actually build a SLP node with such > representative stmt to pass down (yet). > OK, thanks for the explanation. It explains why we cost the IFN_REDUC_* as one vect_stmt in loop vect but cost it as conservative (shuffle and reduc_op) as possible here. > I guess you're running into a integer reduction where there's > a vector -> gpr move missing in costing? I suppose costing > vec_to_scalar works for that but in the end we should maybe > find a way to cost the IFN_REDUC_* ... Yeah, it's a reduction on plus, initially I wanted to adjust backend costing for various IFN_REDUC* (since for some variants Power has more than one instructions for them), then I noticed we cost the reduction as shuffle and reduc_op during SLP for now, I guess it's good to get vec_to_scalar considered here for consistency? Then it can be removed together when we have a better modeling in the end? BR, Kewen > > Richard. > >> BR, >> Kewen >> ----- >> gcc/ChangeLog: >> >> * tree-vect-slp.c (vectorizable_bb_reduc_epilogue): Add the cost for >> value extraction. >> >> diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c >> index b9d88c2d943..841a0872afa 100644 >> --- a/gcc/tree-vect-slp.c >> +++ b/gcc/tree-vect-slp.c >> @@ -4845,12 +4845,14 @@ vectorizable_bb_reduc_epilogue (slp_instance instance, >> return false; >> >> /* There's no way to cost a horizontal vector reduction via REDUC_FN so >> - cost log2 vector operations plus shuffles. */ >> + cost log2 vector operations plus shuffles and one extraction. */ >> unsigned steps = floor_log2 (vect_nunits_for_cost (vectype)); >> record_stmt_cost (cost_vec, steps, vector_stmt, instance->root_stmts[0], >> vectype, 0, vect_body); >> record_stmt_cost (cost_vec, steps, vec_perm, instance->root_stmts[0], >> vectype, 0, vect_body); >> + record_stmt_cost (cost_vec, 1, vec_to_scalar, instance->root_stmts[0], >> + vectype, 0, vect_body); >> return true; >> }
On Mon, Aug 16, 2021 at 9:16 AM Kewen.Lin <linkw@linux.ibm.com> wrote: > > Hi Richi, > > Thanks for the comments! > > on 2021/8/16 下午2:49, Richard Biener wrote: > > On Mon, Aug 16, 2021 at 8:03 AM Kewen.Lin <linkw@linux.ibm.com> wrote: > >> > >> Hi, > >> > >> IIUC, the function vectorizable_bb_reduc_epilogue missed to > >> consider the cost to extract the final value from the vector > >> for reduc operations. This patch is to add one time of > >> vec_to_scalar cost for extracting. > >> > >> Bootstrapped & regtested on powerpc64le-linux-gnu P9. > >> The testing on x86_64 and aarch64 is ongoing. > >> > >> Is it ok for trunk? > > > > There's no such instruction necessary, the way the costing works > > the result is in lane zero already. Note the optabs are defined > > to reduce to a scalar already. So if your arch implements those and > > requires such move then the backend costing needs to handle that. > > > > Yes, these reduc_<operation>_scal_<mode> should have made the > operand[0] as the final scalar result. > > > That said, ideally we'd simply cost the IFN_REDUC_* in the backend > > but for BB reductions we don't actually build a SLP node with such > > representative stmt to pass down (yet). > > > > OK, thanks for the explanation. It explains why we cost the > IFN_REDUC_* as one vect_stmt in loop vect but cost it as > conservative (shuffle and reduc_op) as possible here. > > > I guess you're running into a integer reduction where there's > > a vector -> gpr move missing in costing? I suppose costing > > vec_to_scalar works for that but in the end we should maybe > > find a way to cost the IFN_REDUC_* ... > > Yeah, it's a reduction on plus, initially I wanted to adjust backend > costing for various IFN_REDUC* (since for some variants Power has more > than one instructions for them), then I noticed we cost the reduction > as shuffle and reduc_op during SLP for now, I guess it's good to get > vec_to_scalar considered here for consistency? Then it can be removed > together when we have a better modeling in the end? Yeah, I guess that works for now. Thanks, Richard. > BR, > Kewen > > > > > Richard. > > > >> BR, > >> Kewen > >> ----- > >> gcc/ChangeLog: > >> > >> * tree-vect-slp.c (vectorizable_bb_reduc_epilogue): Add the cost for > >> value extraction. > >> > >> diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c > >> index b9d88c2d943..841a0872afa 100644 > >> --- a/gcc/tree-vect-slp.c > >> +++ b/gcc/tree-vect-slp.c > >> @@ -4845,12 +4845,14 @@ vectorizable_bb_reduc_epilogue (slp_instance instance, > >> return false; > >> > >> /* There's no way to cost a horizontal vector reduction via REDUC_FN so > >> - cost log2 vector operations plus shuffles. */ > >> + cost log2 vector operations plus shuffles and one extraction. */ > >> unsigned steps = floor_log2 (vect_nunits_for_cost (vectype)); > >> record_stmt_cost (cost_vec, steps, vector_stmt, instance->root_stmts[0], > >> vectype, 0, vect_body); > >> record_stmt_cost (cost_vec, steps, vec_perm, instance->root_stmts[0], > >> vectype, 0, vect_body); > >> + record_stmt_cost (cost_vec, 1, vec_to_scalar, instance->root_stmts[0], > >> + vectype, 0, vect_body); > >> return true; > >> } > >
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c index b9d88c2d943..841a0872afa 100644 --- a/gcc/tree-vect-slp.c +++ b/gcc/tree-vect-slp.c @@ -4845,12 +4845,14 @@ vectorizable_bb_reduc_epilogue (slp_instance instance, return false; /* There's no way to cost a horizontal vector reduction via REDUC_FN so - cost log2 vector operations plus shuffles. */ + cost log2 vector operations plus shuffles and one extraction. */ unsigned steps = floor_log2 (vect_nunits_for_cost (vectype)); record_stmt_cost (cost_vec, steps, vector_stmt, instance->root_stmts[0], vectype, 0, vect_body); record_stmt_cost (cost_vec, steps, vec_perm, instance->root_stmts[0], vectype, 0, vect_body); + record_stmt_cost (cost_vec, 1, vec_to_scalar, instance->root_stmts[0], + vectype, 0, vect_body); return true; }