Message ID | 32ad22b3-1fa7-47f6-b8e9-877cea9aee00@linux.ibm.com |
---|---|
State | New |
Headers | show |
Series | [tree-optimization,predcom] Improve unroll factor for predictive commoning | expand |
On Thu, Jul 11, 2024 at 10:30 AM Ajit Agarwal <aagarwa1@linux.ibm.com> wrote: > > Hello All: > > Unroll factor is determined with max distance across loop iterations. > The logic for determining the loop unroll factor is based on > data dependency across loop iterations. > > The max distance across loop iterations is the unrolling factor > that helps in predictive commoning. The old comment in the code says > - /* The best unroll factor for this chain is equal to the number of > - temporary variables that we create for it. */ why is that wrong and why is the max dependence distance more correct? Do you have a testcase that shows how this makes a (positive) difference? Richard. > Bootstrapped and regtested on powerpc64-linux-gnu. > > Thanks & Regards > Ajit > > tree-optimization, predcom: Improve unroll factor for predictive commoning > > Unroll factor is determined with max distance across loop iterations. > The logic for determining the loop unroll factor is based on > data dependency across loop iterations. > > The max distance across loop iterations is the unrolling factor > that helps in predictive commoning. > > 2024-07-11 Ajit Kumar Agarwal <aagarwa1@linux.ibm.com> > > gcc/ChangeLog: > > * tree-predcom.cc: Change in determining unroll factor with > data dependence across loop iterations. > --- > gcc/tree-predcom.cc | 51 ++++++++++++++++++++++++++++++++++----------- > 1 file changed, 39 insertions(+), 12 deletions(-) > > diff --git a/gcc/tree-predcom.cc b/gcc/tree-predcom.cc > index 9844fee1e97..029b02f5990 100644 > --- a/gcc/tree-predcom.cc > +++ b/gcc/tree-predcom.cc > @@ -409,6 +409,7 @@ public: > /* Perform the predictive commoning optimization for chains, make this > public for being called in callback execute_pred_commoning_cbck. */ > void execute_pred_commoning (bitmap tmp_vars); > + unsigned determine_unroll_factor (const vec<chain_p> &chains); > > private: > /* The pointer to the given loop. */ > @@ -2400,13 +2401,46 @@ pcom_worker::execute_pred_commoning_chain (chain_p chain, > copies as possible. CHAINS is the list of chains that will be > optimized. */ > > -static unsigned > -determine_unroll_factor (const vec<chain_p> &chains) > +unsigned > +pcom_worker::determine_unroll_factor (const vec<chain_p> &chains) > { > chain_p chain; > - unsigned factor = 1, af, nfactor, i; > + unsigned factor = 1, i; > unsigned max = param_max_unroll_times; > + struct data_dependence_relation *ddr; > + unsigned nfactor = 0; > + int nzfactor = 0; > + > + /* Best unroll factor is the maximum distance across loop > + iterations. */ > + FOR_EACH_VEC_ELT (m_dependences, i, ddr) > + { > + for (unsigned j = 0; j < DDR_NUM_DIST_VECTS (ddr); j++) > + { > + lambda_vector vec = DDR_DIST_VECT (ddr, j); > + widest_int distance = vec[j]; > + unsigned offset = distance.to_uhwi (); > + if (offset == 0) > + continue; > + > + int dist = offset - nzfactor; > + if (dist == 0) > + continue; > > + if (nfactor == 0) > + { > + nfactor = offset; > + nzfactor = offset; > + } > + else if (dist <= nzfactor) > + nfactor = offset; > + > + if (nfactor > 0 && nfactor <= max) > + factor = nfactor; > + } > + } > + > + int max_use = 0; > FOR_EACH_VEC_ELT (chains, i, chain) > { > if (chain->type == CT_INVARIANT) > @@ -2427,17 +2461,10 @@ determine_unroll_factor (const vec<chain_p> &chains) > continue; > } > > - /* The best unroll factor for this chain is equal to the number of > - temporary variables that we create for it. */ > - af = chain->length; > if (chain->has_max_use_after) > - af++; > - > - nfactor = factor * af / gcd (factor, af); > - if (nfactor <= max) > - factor = nfactor; > + max_use++; > } > - > + factor += max_use; > return factor; > } > > -- > 2.43.5 >
Hello Richard: On 11/07/24 2:21 pm, Richard Biener wrote: > On Thu, Jul 11, 2024 at 10:30 AM Ajit Agarwal <aagarwa1@linux.ibm.com> wrote: >> >> Hello All: >> >> Unroll factor is determined with max distance across loop iterations. >> The logic for determining the loop unroll factor is based on >> data dependency across loop iterations. >> >> The max distance across loop iterations is the unrolling factor >> that helps in predictive commoning. > > The old comment in the code says > >> - /* The best unroll factor for this chain is equal to the number of >> - temporary variables that we create for it. */ > > why is that wrong and why is the max dependence distance more correct? > > Do you have a testcase that shows how this makes a (positive) difference? > There is nothing wrong in the existing implementation of unroll factor for predictive commoning. But with max dependence distance we get performance improvement with spec 2017 benchmarks (INT) of 0.01% (Geomean) with and without changes. Improvement in benchmarks with max dependence distance changes. I have used the following flags: -O2 -funroll-loops --param max-unroll-times=8 -fpredictive-commoning -fno-tree-pre With above flags I ran with and without changes. There is no degradation with spec 2017 (FP benchmarks). Because in predictive commoning we reuse values computed in earlier iterations of a loop in the later ones, max distance is the better choice. > Richard. > Thanks & Regards Ajit >> Bootstrapped and regtested on powerpc64-linux-gnu. >> >> Thanks & Regards >> Ajit >> >> tree-optimization, predcom: Improve unroll factor for predictive commoning >> >> Unroll factor is determined with max distance across loop iterations. >> The logic for determining the loop unroll factor is based on >> data dependency across loop iterations. >> >> The max distance across loop iterations is the unrolling factor >> that helps in predictive commoning. >> >> 2024-07-11 Ajit Kumar Agarwal <aagarwa1@linux.ibm.com> >> >> gcc/ChangeLog: >> >> * tree-predcom.cc: Change in determining unroll factor with >> data dependence across loop iterations. >> --- >> gcc/tree-predcom.cc | 51 ++++++++++++++++++++++++++++++++++----------- >> 1 file changed, 39 insertions(+), 12 deletions(-) >> >> diff --git a/gcc/tree-predcom.cc b/gcc/tree-predcom.cc >> index 9844fee1e97..029b02f5990 100644 >> --- a/gcc/tree-predcom.cc >> +++ b/gcc/tree-predcom.cc >> @@ -409,6 +409,7 @@ public: >> /* Perform the predictive commoning optimization for chains, make this >> public for being called in callback execute_pred_commoning_cbck. */ >> void execute_pred_commoning (bitmap tmp_vars); >> + unsigned determine_unroll_factor (const vec<chain_p> &chains); >> >> private: >> /* The pointer to the given loop. */ >> @@ -2400,13 +2401,46 @@ pcom_worker::execute_pred_commoning_chain (chain_p chain, >> copies as possible. CHAINS is the list of chains that will be >> optimized. */ >> >> -static unsigned >> -determine_unroll_factor (const vec<chain_p> &chains) >> +unsigned >> +pcom_worker::determine_unroll_factor (const vec<chain_p> &chains) >> { >> chain_p chain; >> - unsigned factor = 1, af, nfactor, i; >> + unsigned factor = 1, i; >> unsigned max = param_max_unroll_times; >> + struct data_dependence_relation *ddr; >> + unsigned nfactor = 0; >> + int nzfactor = 0; >> + >> + /* Best unroll factor is the maximum distance across loop >> + iterations. */ >> + FOR_EACH_VEC_ELT (m_dependences, i, ddr) >> + { >> + for (unsigned j = 0; j < DDR_NUM_DIST_VECTS (ddr); j++) >> + { >> + lambda_vector vec = DDR_DIST_VECT (ddr, j); >> + widest_int distance = vec[j]; >> + unsigned offset = distance.to_uhwi (); >> + if (offset == 0) >> + continue; >> + >> + int dist = offset - nzfactor; >> + if (dist == 0) >> + continue; >> >> + if (nfactor == 0) >> + { >> + nfactor = offset; >> + nzfactor = offset; >> + } >> + else if (dist <= nzfactor) >> + nfactor = offset; >> + >> + if (nfactor > 0 && nfactor <= max) >> + factor = nfactor; >> + } >> + } >> + >> + int max_use = 0; >> FOR_EACH_VEC_ELT (chains, i, chain) >> { >> if (chain->type == CT_INVARIANT) >> @@ -2427,17 +2461,10 @@ determine_unroll_factor (const vec<chain_p> &chains) >> continue; >> } >> >> - /* The best unroll factor for this chain is equal to the number of >> - temporary variables that we create for it. */ >> - af = chain->length; >> if (chain->has_max_use_after) >> - af++; >> - >> - nfactor = factor * af / gcd (factor, af); >> - if (nfactor <= max) >> - factor = nfactor; >> + max_use++; >> } >> - >> + factor += max_use; >> return factor; >> } >> >> -- >> 2.43.5 >>
On Fri, Jul 12, 2024 at 12:09 PM Ajit Agarwal <aagarwa1@linux.ibm.com> wrote: > > Hello Richard: > > On 11/07/24 2:21 pm, Richard Biener wrote: > > On Thu, Jul 11, 2024 at 10:30 AM Ajit Agarwal <aagarwa1@linux.ibm.com> wrote: > >> > >> Hello All: > >> > >> Unroll factor is determined with max distance across loop iterations. > >> The logic for determining the loop unroll factor is based on > >> data dependency across loop iterations. > >> > >> The max distance across loop iterations is the unrolling factor > >> that helps in predictive commoning. > > > > The old comment in the code says > > > >> - /* The best unroll factor for this chain is equal to the number of > >> - temporary variables that we create for it. */ > > > > why is that wrong and why is the max dependence distance more correct? > > > > Do you have a testcase that shows how this makes a (positive) difference? > > > > There is nothing wrong in the existing implementation of unroll > factor for predictive commoning. > > But with max dependence distance we get performance improvement > with spec 2017 benchmarks (INT) of 0.01% (Geomean) with and without > changes. Improvement in benchmarks with max dependence distance > changes. > > I have used the following flags: > -O2 -funroll-loops --param max-unroll-times=8 -fpredictive-commoning -fno-tree-pre > > With above flags I ran with and without changes. A 0.01% geomean improvement is noise. Why did you disable PRE? > There is no degradation with spec 2017 (FP benchmarks). > > Because in predictive commoning we reuse values computed in > earlier iterations of a loop in the later ones, max distance is the > better choice. The re-use distance is the same though. So your change merely increases the unroll factor? Or can you explain why there is more re-use with your change. Richard. > > Richard. > > > > Thanks & Regards > Ajit > > >> Bootstrapped and regtested on powerpc64-linux-gnu. > >> > >> Thanks & Regards > >> Ajit > >> > >> tree-optimization, predcom: Improve unroll factor for predictive commoning > >> > >> Unroll factor is determined with max distance across loop iterations. > >> The logic for determining the loop unroll factor is based on > >> data dependency across loop iterations. > >> > >> The max distance across loop iterations is the unrolling factor > >> that helps in predictive commoning. > >> > >> 2024-07-11 Ajit Kumar Agarwal <aagarwa1@linux.ibm.com> > >> > >> gcc/ChangeLog: > >> > >> * tree-predcom.cc: Change in determining unroll factor with > >> data dependence across loop iterations. > >> --- > >> gcc/tree-predcom.cc | 51 ++++++++++++++++++++++++++++++++++----------- > >> 1 file changed, 39 insertions(+), 12 deletions(-) > >> > >> diff --git a/gcc/tree-predcom.cc b/gcc/tree-predcom.cc > >> index 9844fee1e97..029b02f5990 100644 > >> --- a/gcc/tree-predcom.cc > >> +++ b/gcc/tree-predcom.cc > >> @@ -409,6 +409,7 @@ public: > >> /* Perform the predictive commoning optimization for chains, make this > >> public for being called in callback execute_pred_commoning_cbck. */ > >> void execute_pred_commoning (bitmap tmp_vars); > >> + unsigned determine_unroll_factor (const vec<chain_p> &chains); > >> > >> private: > >> /* The pointer to the given loop. */ > >> @@ -2400,13 +2401,46 @@ pcom_worker::execute_pred_commoning_chain (chain_p chain, > >> copies as possible. CHAINS is the list of chains that will be > >> optimized. */ > >> > >> -static unsigned > >> -determine_unroll_factor (const vec<chain_p> &chains) > >> +unsigned > >> +pcom_worker::determine_unroll_factor (const vec<chain_p> &chains) > >> { > >> chain_p chain; > >> - unsigned factor = 1, af, nfactor, i; > >> + unsigned factor = 1, i; > >> unsigned max = param_max_unroll_times; > >> + struct data_dependence_relation *ddr; > >> + unsigned nfactor = 0; > >> + int nzfactor = 0; > >> + > >> + /* Best unroll factor is the maximum distance across loop > >> + iterations. */ > >> + FOR_EACH_VEC_ELT (m_dependences, i, ddr) > >> + { > >> + for (unsigned j = 0; j < DDR_NUM_DIST_VECTS (ddr); j++) > >> + { > >> + lambda_vector vec = DDR_DIST_VECT (ddr, j); > >> + widest_int distance = vec[j]; > >> + unsigned offset = distance.to_uhwi (); > >> + if (offset == 0) > >> + continue; > >> + > >> + int dist = offset - nzfactor; > >> + if (dist == 0) > >> + continue; > >> > >> + if (nfactor == 0) > >> + { > >> + nfactor = offset; > >> + nzfactor = offset; > >> + } > >> + else if (dist <= nzfactor) > >> + nfactor = offset; > >> + > >> + if (nfactor > 0 && nfactor <= max) > >> + factor = nfactor; > >> + } > >> + } > >> + > >> + int max_use = 0; > >> FOR_EACH_VEC_ELT (chains, i, chain) > >> { > >> if (chain->type == CT_INVARIANT) > >> @@ -2427,17 +2461,10 @@ determine_unroll_factor (const vec<chain_p> &chains) > >> continue; > >> } > >> > >> - /* The best unroll factor for this chain is equal to the number of > >> - temporary variables that we create for it. */ > >> - af = chain->length; > >> if (chain->has_max_use_after) > >> - af++; > >> - > >> - nfactor = factor * af / gcd (factor, af); > >> - if (nfactor <= max) > >> - factor = nfactor; > >> + max_use++; > >> } > >> - > >> + factor += max_use; > >> return factor; > >> } > >> > >> -- > >> 2.43.5 > >>
Hello Richard: On 12/07/24 6:20 pm, Richard Biener wrote: > On Fri, Jul 12, 2024 at 12:09 PM Ajit Agarwal <aagarwa1@linux.ibm.com> wrote: >> >> Hello Richard: >> >> On 11/07/24 2:21 pm, Richard Biener wrote: >>> On Thu, Jul 11, 2024 at 10:30 AM Ajit Agarwal <aagarwa1@linux.ibm.com> wrote: >>>> >>>> Hello All: >>>> >>>> Unroll factor is determined with max distance across loop iterations. >>>> The logic for determining the loop unroll factor is based on >>>> data dependency across loop iterations. >>>> >>>> The max distance across loop iterations is the unrolling factor >>>> that helps in predictive commoning. >>> >>> The old comment in the code says >>> >>>> - /* The best unroll factor for this chain is equal to the number of >>>> - temporary variables that we create for it. */ >>> >>> why is that wrong and why is the max dependence distance more correct? >>> >>> Do you have a testcase that shows how this makes a (positive) difference? >>> >> >> There is nothing wrong in the existing implementation of unroll >> factor for predictive commoning. >> >> But with max dependence distance we get performance improvement >> with spec 2017 benchmarks (INT) of 0.01% (Geomean) with and without >> changes. Improvement in benchmarks with max dependence distance >> changes. >> >> I have used the following flags: >> -O2 -funroll-loops --param max-unroll-times=8 -fpredictive-commoning -fno-tree-pre >> >> With above flags I ran with and without changes. > > A 0.01% geomean improvement is noise. Why did you disable PRE? > I have changed the flags used. Now I change the flags to -O3 and -fpredictive-commoning -funroll-loops. I am measuring the performance with the above flags for spec 2017 benchmarks and would let you know the performance by Monday. >> There is no degradation with spec 2017 (FP benchmarks). >> >> Because in predictive commoning we reuse values computed in >> earlier iterations of a loop in the later ones, max distance is the >> better choice. > > The re-use distance is the same though. So your change merely increases > the unroll factor? Or can you explain why there is more re-use with > your change. > With -O3 flag and -fpredictive-commoning -funroll-loops in spec 2017 benchmarks many benchmarks increases the unroll factor with my changes in predictive commoning determine_unroll_factor. I have traversed the data dependence relation vector and get the distance and the max data dependence distance is the unroll factor. > Richard. > Thanks & Regards Ajit >>> Richard. >>> >> >> Thanks & Regards >> Ajit >> >>>> Bootstrapped and regtested on powerpc64-linux-gnu. >>>> >>>> Thanks & Regards >>>> Ajit >>>> >>>> tree-optimization, predcom: Improve unroll factor for predictive commoning >>>> >>>> Unroll factor is determined with max distance across loop iterations. >>>> The logic for determining the loop unroll factor is based on >>>> data dependency across loop iterations. >>>> >>>> The max distance across loop iterations is the unrolling factor >>>> that helps in predictive commoning. >>>> >>>> 2024-07-11 Ajit Kumar Agarwal <aagarwa1@linux.ibm.com> >>>> >>>> gcc/ChangeLog: >>>> >>>> * tree-predcom.cc: Change in determining unroll factor with >>>> data dependence across loop iterations. >>>> --- >>>> gcc/tree-predcom.cc | 51 ++++++++++++++++++++++++++++++++++----------- >>>> 1 file changed, 39 insertions(+), 12 deletions(-) >>>> >>>> diff --git a/gcc/tree-predcom.cc b/gcc/tree-predcom.cc >>>> index 9844fee1e97..029b02f5990 100644 >>>> --- a/gcc/tree-predcom.cc >>>> +++ b/gcc/tree-predcom.cc >>>> @@ -409,6 +409,7 @@ public: >>>> /* Perform the predictive commoning optimization for chains, make this >>>> public for being called in callback execute_pred_commoning_cbck. */ >>>> void execute_pred_commoning (bitmap tmp_vars); >>>> + unsigned determine_unroll_factor (const vec<chain_p> &chains); >>>> >>>> private: >>>> /* The pointer to the given loop. */ >>>> @@ -2400,13 +2401,46 @@ pcom_worker::execute_pred_commoning_chain (chain_p chain, >>>> copies as possible. CHAINS is the list of chains that will be >>>> optimized. */ >>>> >>>> -static unsigned >>>> -determine_unroll_factor (const vec<chain_p> &chains) >>>> +unsigned >>>> +pcom_worker::determine_unroll_factor (const vec<chain_p> &chains) >>>> { >>>> chain_p chain; >>>> - unsigned factor = 1, af, nfactor, i; >>>> + unsigned factor = 1, i; >>>> unsigned max = param_max_unroll_times; >>>> + struct data_dependence_relation *ddr; >>>> + unsigned nfactor = 0; >>>> + int nzfactor = 0; >>>> + >>>> + /* Best unroll factor is the maximum distance across loop >>>> + iterations. */ >>>> + FOR_EACH_VEC_ELT (m_dependences, i, ddr) >>>> + { >>>> + for (unsigned j = 0; j < DDR_NUM_DIST_VECTS (ddr); j++) >>>> + { >>>> + lambda_vector vec = DDR_DIST_VECT (ddr, j); >>>> + widest_int distance = vec[j]; >>>> + unsigned offset = distance.to_uhwi (); >>>> + if (offset == 0) >>>> + continue; >>>> + >>>> + int dist = offset - nzfactor; >>>> + if (dist == 0) >>>> + continue; >>>> >>>> + if (nfactor == 0) >>>> + { >>>> + nfactor = offset; >>>> + nzfactor = offset; >>>> + } >>>> + else if (dist <= nzfactor) >>>> + nfactor = offset; >>>> + >>>> + if (nfactor > 0 && nfactor <= max) >>>> + factor = nfactor; >>>> + } >>>> + } >>>> + >>>> + int max_use = 0; >>>> FOR_EACH_VEC_ELT (chains, i, chain) >>>> { >>>> if (chain->type == CT_INVARIANT) >>>> @@ -2427,17 +2461,10 @@ determine_unroll_factor (const vec<chain_p> &chains) >>>> continue; >>>> } >>>> >>>> - /* The best unroll factor for this chain is equal to the number of >>>> - temporary variables that we create for it. */ >>>> - af = chain->length; >>>> if (chain->has_max_use_after) >>>> - af++; >>>> - >>>> - nfactor = factor * af / gcd (factor, af); >>>> - if (nfactor <= max) >>>> - factor = nfactor; >>>> + max_use++; >>>> } >>>> - >>>> + factor += max_use; >>>> return factor; >>>> } >>>> >>>> -- >>>> 2.43.5 >>>>
Hello Richard: On 12/07/24 6:20 pm, Richard Biener wrote: > On Fri, Jul 12, 2024 at 12:09 PM Ajit Agarwal <aagarwa1@linux.ibm.com> wrote: >> >> Hello Richard: >> >> On 11/07/24 2:21 pm, Richard Biener wrote: >>> On Thu, Jul 11, 2024 at 10:30 AM Ajit Agarwal <aagarwa1@linux.ibm.com> wrote: >>>> >>>> Hello All: >>>> >>>> Unroll factor is determined with max distance across loop iterations. >>>> The logic for determining the loop unroll factor is based on >>>> data dependency across loop iterations. >>>> >>>> The max distance across loop iterations is the unrolling factor >>>> that helps in predictive commoning. >>> >>> The old comment in the code says >>> >>>> - /* The best unroll factor for this chain is equal to the number of >>>> - temporary variables that we create for it. */ >>> >>> why is that wrong and why is the max dependence distance more correct? >>> >>> Do you have a testcase that shows how this makes a (positive) difference? >>> >> >> There is nothing wrong in the existing implementation of unroll >> factor for predictive commoning. >> >> But with max dependence distance we get performance improvement >> with spec 2017 benchmarks (INT) of 0.01% (Geomean) with and without >> changes. Improvement in benchmarks with max dependence distance >> changes. >> >> I have used the following flags: >> -O2 -funroll-loops --param max-unroll-times=8 -fpredictive-commoning -fno-tree-pre >> >> With above flags I ran with and without changes. > > A 0.01% geomean improvement is noise. Why did you disable PRE? > I have changed the flags. Now I changed the flags to -O3 -fpredictive-commoning -funroll-loops. With these flags I am measuring the performance with spec 2017 benchmarks. Would let you know the results by Monday. >> There is no degradation with spec 2017 (FP benchmarks). >> >> Because in predictive commoning we reuse values computed in >> earlier iterations of a loop in the later ones, max distance is the >> better choice. > > The re-use distance is the same though. So your change merely increases > the unroll factor? Or can you explain why there is more re-use with > your change. > With -O3 -fpredictive-commoning -funroll-loops many spec 2017 benchmarks increases unroll factor with my changes. I am traversing data dependence relation vector, get the distance and max distance is the unroll factor. > Richard. > Thanks & Regards Ajit >>> Richard. >>> >> >> Thanks & Regards >> Ajit >> >>>> Bootstrapped and regtested on powerpc64-linux-gnu. >>>> >>>> Thanks & Regards >>>> Ajit >>>> >>>> tree-optimization, predcom: Improve unroll factor for predictive commoning >>>> >>>> Unroll factor is determined with max distance across loop iterations. >>>> The logic for determining the loop unroll factor is based on >>>> data dependency across loop iterations. >>>> >>>> The max distance across loop iterations is the unrolling factor >>>> that helps in predictive commoning. >>>> >>>> 2024-07-11 Ajit Kumar Agarwal <aagarwa1@linux.ibm.com> >>>> >>>> gcc/ChangeLog: >>>> >>>> * tree-predcom.cc: Change in determining unroll factor with >>>> data dependence across loop iterations. >>>> --- >>>> gcc/tree-predcom.cc | 51 ++++++++++++++++++++++++++++++++++----------- >>>> 1 file changed, 39 insertions(+), 12 deletions(-) >>>> >>>> diff --git a/gcc/tree-predcom.cc b/gcc/tree-predcom.cc >>>> index 9844fee1e97..029b02f5990 100644 >>>> --- a/gcc/tree-predcom.cc >>>> +++ b/gcc/tree-predcom.cc >>>> @@ -409,6 +409,7 @@ public: >>>> /* Perform the predictive commoning optimization for chains, make this >>>> public for being called in callback execute_pred_commoning_cbck. */ >>>> void execute_pred_commoning (bitmap tmp_vars); >>>> + unsigned determine_unroll_factor (const vec<chain_p> &chains); >>>> >>>> private: >>>> /* The pointer to the given loop. */ >>>> @@ -2400,13 +2401,46 @@ pcom_worker::execute_pred_commoning_chain (chain_p chain, >>>> copies as possible. CHAINS is the list of chains that will be >>>> optimized. */ >>>> >>>> -static unsigned >>>> -determine_unroll_factor (const vec<chain_p> &chains) >>>> +unsigned >>>> +pcom_worker::determine_unroll_factor (const vec<chain_p> &chains) >>>> { >>>> chain_p chain; >>>> - unsigned factor = 1, af, nfactor, i; >>>> + unsigned factor = 1, i; >>>> unsigned max = param_max_unroll_times; >>>> + struct data_dependence_relation *ddr; >>>> + unsigned nfactor = 0; >>>> + int nzfactor = 0; >>>> + >>>> + /* Best unroll factor is the maximum distance across loop >>>> + iterations. */ >>>> + FOR_EACH_VEC_ELT (m_dependences, i, ddr) >>>> + { >>>> + for (unsigned j = 0; j < DDR_NUM_DIST_VECTS (ddr); j++) >>>> + { >>>> + lambda_vector vec = DDR_DIST_VECT (ddr, j); >>>> + widest_int distance = vec[j]; >>>> + unsigned offset = distance.to_uhwi (); >>>> + if (offset == 0) >>>> + continue; >>>> + >>>> + int dist = offset - nzfactor; >>>> + if (dist == 0) >>>> + continue; >>>> >>>> + if (nfactor == 0) >>>> + { >>>> + nfactor = offset; >>>> + nzfactor = offset; >>>> + } >>>> + else if (dist <= nzfactor) >>>> + nfactor = offset; >>>> + >>>> + if (nfactor > 0 && nfactor <= max) >>>> + factor = nfactor; >>>> + } >>>> + } >>>> + >>>> + int max_use = 0; >>>> FOR_EACH_VEC_ELT (chains, i, chain) >>>> { >>>> if (chain->type == CT_INVARIANT) >>>> @@ -2427,17 +2461,10 @@ determine_unroll_factor (const vec<chain_p> &chains) >>>> continue; >>>> } >>>> >>>> - /* The best unroll factor for this chain is equal to the number of >>>> - temporary variables that we create for it. */ >>>> - af = chain->length; >>>> if (chain->has_max_use_after) >>>> - af++; >>>> - >>>> - nfactor = factor * af / gcd (factor, af); >>>> - if (nfactor <= max) >>>> - factor = nfactor; >>>> + max_use++; >>>> } >>>> - >>>> + factor += max_use; >>>> return factor; >>>> } >>>> >>>> -- >>>> 2.43.5 >>>>
Hello Richard: On 13/07/24 8:16 pm, Ajit Agarwal wrote: > Hello Richard: > > On 12/07/24 6:20 pm, Richard Biener wrote: >> On Fri, Jul 12, 2024 at 12:09 PM Ajit Agarwal <aagarwa1@linux.ibm.com> wrote: >>> >>> Hello Richard: >>> >>> On 11/07/24 2:21 pm, Richard Biener wrote: >>>> On Thu, Jul 11, 2024 at 10:30 AM Ajit Agarwal <aagarwa1@linux.ibm.com> wrote: >>>>> >>>>> Hello All: >>>>> >>>>> Unroll factor is determined with max distance across loop iterations. >>>>> The logic for determining the loop unroll factor is based on >>>>> data dependency across loop iterations. >>>>> >>>>> The max distance across loop iterations is the unrolling factor >>>>> that helps in predictive commoning. >>>> >>>> The old comment in the code says >>>> >>>>> - /* The best unroll factor for this chain is equal to the number of >>>>> - temporary variables that we create for it. */ >>>> >>>> why is that wrong and why is the max dependence distance more correct? >>>> >>>> Do you have a testcase that shows how this makes a (positive) difference? >>>> >>> >>> There is nothing wrong in the existing implementation of unroll >>> factor for predictive commoning. >>> >>> But with max dependence distance we get performance improvement >>> with spec 2017 benchmarks (INT) of 0.01% (Geomean) with and without >>> changes. Improvement in benchmarks with max dependence distance >>> changes. >>> >>> I have used the following flags: >>> -O2 -funroll-loops --param max-unroll-times=8 -fpredictive-commoning -fno-tree-pre >>> >>> With above flags I ran with and without changes. >> >> A 0.01% geomean improvement is noise. Why did you disable PRE? >> > > I have changed the flags. Now I changed the flags to -O3 > -fpredictive-commoning -funroll-loops. With these flags > I am measuring the performance with spec 2017 benchmarks. > Would let you know the results by Monday. > With the changes in this patch, 500.perlbench_r gave gain of 2.56% (spec 2017 INT) benchmarks. There is no degradation with other benchmarks. Thanks & Regards Ajit >>> There is no degradation with spec 2017 (FP benchmarks). >>> >>> Because in predictive commoning we reuse values computed in >>> earlier iterations of a loop in the later ones, max distance is the >>> better choice. >> >> The re-use distance is the same though. So your change merely increases >> the unroll factor? Or can you explain why there is more re-use with >> your change. >> > > With -O3 -fpredictive-commoning -funroll-loops many spec 2017 benchmarks > increases unroll factor with my changes. > > I am traversing data dependence relation vector, get the distance > and max distance is the unroll factor. > >> Richard. >> > Thanks & Regards > Ajit >>>> Richard. >>>> >>> >>> Thanks & Regards >>> Ajit >>> >>>>> Bootstrapped and regtested on powerpc64-linux-gnu. >>>>> >>>>> Thanks & Regards >>>>> Ajit >>>>> >>>>> tree-optimization, predcom: Improve unroll factor for predictive commoning >>>>> >>>>> Unroll factor is determined with max distance across loop iterations. >>>>> The logic for determining the loop unroll factor is based on >>>>> data dependency across loop iterations. >>>>> >>>>> The max distance across loop iterations is the unrolling factor >>>>> that helps in predictive commoning. >>>>> >>>>> 2024-07-11 Ajit Kumar Agarwal <aagarwa1@linux.ibm.com> >>>>> >>>>> gcc/ChangeLog: >>>>> >>>>> * tree-predcom.cc: Change in determining unroll factor with >>>>> data dependence across loop iterations. >>>>> --- >>>>> gcc/tree-predcom.cc | 51 ++++++++++++++++++++++++++++++++++----------- >>>>> 1 file changed, 39 insertions(+), 12 deletions(-) >>>>> >>>>> diff --git a/gcc/tree-predcom.cc b/gcc/tree-predcom.cc >>>>> index 9844fee1e97..029b02f5990 100644 >>>>> --- a/gcc/tree-predcom.cc >>>>> +++ b/gcc/tree-predcom.cc >>>>> @@ -409,6 +409,7 @@ public: >>>>> /* Perform the predictive commoning optimization for chains, make this >>>>> public for being called in callback execute_pred_commoning_cbck. */ >>>>> void execute_pred_commoning (bitmap tmp_vars); >>>>> + unsigned determine_unroll_factor (const vec<chain_p> &chains); >>>>> >>>>> private: >>>>> /* The pointer to the given loop. */ >>>>> @@ -2400,13 +2401,46 @@ pcom_worker::execute_pred_commoning_chain (chain_p chain, >>>>> copies as possible. CHAINS is the list of chains that will be >>>>> optimized. */ >>>>> >>>>> -static unsigned >>>>> -determine_unroll_factor (const vec<chain_p> &chains) >>>>> +unsigned >>>>> +pcom_worker::determine_unroll_factor (const vec<chain_p> &chains) >>>>> { >>>>> chain_p chain; >>>>> - unsigned factor = 1, af, nfactor, i; >>>>> + unsigned factor = 1, i; >>>>> unsigned max = param_max_unroll_times; >>>>> + struct data_dependence_relation *ddr; >>>>> + unsigned nfactor = 0; >>>>> + int nzfactor = 0; >>>>> + >>>>> + /* Best unroll factor is the maximum distance across loop >>>>> + iterations. */ >>>>> + FOR_EACH_VEC_ELT (m_dependences, i, ddr) >>>>> + { >>>>> + for (unsigned j = 0; j < DDR_NUM_DIST_VECTS (ddr); j++) >>>>> + { >>>>> + lambda_vector vec = DDR_DIST_VECT (ddr, j); >>>>> + widest_int distance = vec[j]; >>>>> + unsigned offset = distance.to_uhwi (); >>>>> + if (offset == 0) >>>>> + continue; >>>>> + >>>>> + int dist = offset - nzfactor; >>>>> + if (dist == 0) >>>>> + continue; >>>>> >>>>> + if (nfactor == 0) >>>>> + { >>>>> + nfactor = offset; >>>>> + nzfactor = offset; >>>>> + } >>>>> + else if (dist <= nzfactor) >>>>> + nfactor = offset; >>>>> + >>>>> + if (nfactor > 0 && nfactor <= max) >>>>> + factor = nfactor; >>>>> + } >>>>> + } >>>>> + >>>>> + int max_use = 0; >>>>> FOR_EACH_VEC_ELT (chains, i, chain) >>>>> { >>>>> if (chain->type == CT_INVARIANT) >>>>> @@ -2427,17 +2461,10 @@ determine_unroll_factor (const vec<chain_p> &chains) >>>>> continue; >>>>> } >>>>> >>>>> - /* The best unroll factor for this chain is equal to the number of >>>>> - temporary variables that we create for it. */ >>>>> - af = chain->length; >>>>> if (chain->has_max_use_after) >>>>> - af++; >>>>> - >>>>> - nfactor = factor * af / gcd (factor, af); >>>>> - if (nfactor <= max) >>>>> - factor = nfactor; >>>>> + max_use++; >>>>> } >>>>> - >>>>> + factor += max_use; >>>>> return factor; >>>>> } >>>>> >>>>> -- >>>>> 2.43.5 >>>>>
diff --git a/gcc/tree-predcom.cc b/gcc/tree-predcom.cc index 9844fee1e97..029b02f5990 100644 --- a/gcc/tree-predcom.cc +++ b/gcc/tree-predcom.cc @@ -409,6 +409,7 @@ public: /* Perform the predictive commoning optimization for chains, make this public for being called in callback execute_pred_commoning_cbck. */ void execute_pred_commoning (bitmap tmp_vars); + unsigned determine_unroll_factor (const vec<chain_p> &chains); private: /* The pointer to the given loop. */ @@ -2400,13 +2401,46 @@ pcom_worker::execute_pred_commoning_chain (chain_p chain, copies as possible. CHAINS is the list of chains that will be optimized. */ -static unsigned -determine_unroll_factor (const vec<chain_p> &chains) +unsigned +pcom_worker::determine_unroll_factor (const vec<chain_p> &chains) { chain_p chain; - unsigned factor = 1, af, nfactor, i; + unsigned factor = 1, i; unsigned max = param_max_unroll_times; + struct data_dependence_relation *ddr; + unsigned nfactor = 0; + int nzfactor = 0; + + /* Best unroll factor is the maximum distance across loop + iterations. */ + FOR_EACH_VEC_ELT (m_dependences, i, ddr) + { + for (unsigned j = 0; j < DDR_NUM_DIST_VECTS (ddr); j++) + { + lambda_vector vec = DDR_DIST_VECT (ddr, j); + widest_int distance = vec[j]; + unsigned offset = distance.to_uhwi (); + if (offset == 0) + continue; + + int dist = offset - nzfactor; + if (dist == 0) + continue; + if (nfactor == 0) + { + nfactor = offset; + nzfactor = offset; + } + else if (dist <= nzfactor) + nfactor = offset; + + if (nfactor > 0 && nfactor <= max) + factor = nfactor; + } + } + + int max_use = 0; FOR_EACH_VEC_ELT (chains, i, chain) { if (chain->type == CT_INVARIANT) @@ -2427,17 +2461,10 @@ determine_unroll_factor (const vec<chain_p> &chains) continue; } - /* The best unroll factor for this chain is equal to the number of - temporary variables that we create for it. */ - af = chain->length; if (chain->has_max_use_after) - af++; - - nfactor = factor * af / gcd (factor, af); - if (nfactor <= max) - factor = nfactor; + max_use++; } - + factor += max_use; return factor; }