Message ID | 61ac669c-7293-f53a-20c7-158b5a813cee@linux.ibm.com |
---|---|
State | New |
Headers | show |
Series | Make loops_list support an optional loop_p root | expand |
On 7/23/21 2:41 AM, Kewen.Lin wrote: > on 2021/7/22 下午8:56, Richard Biener wrote: >> On Tue, Jul 20, 2021 at 4:37 >> PM Kewen.Lin <linkw@linux.ibm.com> wrote: >>> >>> Hi, >>> >>> This v2 has addressed some review comments/suggestions: >>> >>> - Use "!=" instead of "<" in function operator!= (const Iter &rhs) >>> - Add new CTOR loops_list (struct loops *loops, unsigned flags) >>> to support loop hierarchy tree rather than just a function, >>> and adjust to use loops* accordingly. >> >> I actually meant struct loop *, not struct loops * ;) At the point >> we pondered to make loop invariant motion work on single >> loop nests we gave up not only but also because it iterates >> over the loop nest but all the iterators only ever can process >> all loops, not say, all loops inside a specific 'loop' (and >> including that 'loop' if LI_INCLUDE_ROOT). So the >> CTOR would take the 'root' of the loop tree as argument. >> >> I see that doesn't trivially fit how loops_list works, at least >> not for LI_ONLY_INNERMOST. But I guess FROM_INNERMOST >> could be adjusted to do ONLY_INNERMOST as well? >> > > > Thanks for the clarification! I just realized that the previous > version with struct loops* is problematic, all traversal is > still bounded with outer_loop == NULL. I think what you expect > is to respect the given loop_p root boundary. Since we just > record the loops' nums, I think we still need the function* fn? > So I add one optional argument loop_p root and update the > visiting codes accordingly. Before this change, the previous > visiting uses the outer_loop == NULL as the termination condition, > it perfectly includes the root itself, but with this given root, > we have to use it as the termination condition to avoid to iterate > onto its possible existing next. > > For LI_ONLY_INNERMOST, I was thinking whether we can use the > code like: > > struct loops *fn_loops = loops_for_fn (fn)->larray; > for (i = 0; vec_safe_iterate (fn_loops, i, &aloop); i++) > if (aloop != NULL > && aloop->inner == NULL > && flow_loop_nested_p (tree_root, aloop)) > this->to_visit.quick_push (aloop->num); > > it has the stable bound, but if the given root only has several > child loops, it can be much worse if there are many loops in fn. > It seems impossible to predict the given root loop hierarchy size, > maybe we can still use the original linear searching for the case > loops_for_fn (fn) == root? But since this visiting seems not so > performance critical, I chose to share the code originally used > for FROM_INNERMOST, hope it can have better readability and > maintainability. I might be mixing up the two patches (they both seem to touch the same functions), but in this one the loops_list ctor looks like a sizeable function with at least one loop. Since the ctor is used in the initialization of each of the many range-for loops, that could result in inlining of a lot of these calls and so quite a bit code bloat. Unless this is necessary for efficiency (not my area) I would recommend to consider defining the loops_list ctor out-of-line in some .c or .cc file. (Also, if you agree with the rationale, I'd replace loop_p with loop * in the new code.) Thanks Martin > > Bootstrapped and regtested on powerpc64le-linux-gnu P9, > x86_64-redhat-linux and aarch64-linux-gnu, also > bootstrapped on ppc64le P9 with bootstrap-O3 config. > > Does the attached patch meet what you expect? > > BR, > Kewen > ----- > gcc/ChangeLog: > > * cfgloop.h (loops_list::loops_list): Add one optional argument root > and adjust accordingly. >
on 2021/7/24 上午12:26, Martin Sebor wrote: > On 7/23/21 2:41 AM, Kewen.Lin wrote: >> on 2021/7/22 下午8:56, Richard Biener wrote: >>> On Tue, Jul 20, 2021 at 4:37 >>> PM Kewen.Lin <linkw@linux.ibm.com> wrote: >>>> >>>> Hi, >>>> >>>> This v2 has addressed some review comments/suggestions: >>>> >>>> - Use "!=" instead of "<" in function operator!= (const Iter &rhs) >>>> - Add new CTOR loops_list (struct loops *loops, unsigned flags) >>>> to support loop hierarchy tree rather than just a function, >>>> and adjust to use loops* accordingly. >>> >>> I actually meant struct loop *, not struct loops * ;) At the point >>> we pondered to make loop invariant motion work on single >>> loop nests we gave up not only but also because it iterates >>> over the loop nest but all the iterators only ever can process >>> all loops, not say, all loops inside a specific 'loop' (and >>> including that 'loop' if LI_INCLUDE_ROOT). So the >>> CTOR would take the 'root' of the loop tree as argument. >>> >>> I see that doesn't trivially fit how loops_list works, at least >>> not for LI_ONLY_INNERMOST. But I guess FROM_INNERMOST >>> could be adjusted to do ONLY_INNERMOST as well? >>> >> >> >> Thanks for the clarification! I just realized that the previous >> version with struct loops* is problematic, all traversal is >> still bounded with outer_loop == NULL. I think what you expect >> is to respect the given loop_p root boundary. Since we just >> record the loops' nums, I think we still need the function* fn? >> So I add one optional argument loop_p root and update the >> visiting codes accordingly. Before this change, the previous >> visiting uses the outer_loop == NULL as the termination condition, >> it perfectly includes the root itself, but with this given root, >> we have to use it as the termination condition to avoid to iterate >> onto its possible existing next. >> >> For LI_ONLY_INNERMOST, I was thinking whether we can use the >> code like: >> >> struct loops *fn_loops = loops_for_fn (fn)->larray; >> for (i = 0; vec_safe_iterate (fn_loops, i, &aloop); i++) >> if (aloop != NULL >> && aloop->inner == NULL >> && flow_loop_nested_p (tree_root, aloop)) >> this->to_visit.quick_push (aloop->num); >> >> it has the stable bound, but if the given root only has several >> child loops, it can be much worse if there are many loops in fn. >> It seems impossible to predict the given root loop hierarchy size, >> maybe we can still use the original linear searching for the case >> loops_for_fn (fn) == root? But since this visiting seems not so >> performance critical, I chose to share the code originally used >> for FROM_INNERMOST, hope it can have better readability and >> maintainability. > > I might be mixing up the two patches (they both seem to touch > the same functions), but in this one the loops_list ctor looks > like a sizeable function with at least one loop. Since the ctor > is used in the initialization of each of the many range-for loops, > that could result in inlining of a lot of these calls and so quite > a bit code bloat. Unless this is necessary for efficiency (not > my area) I would recommend to consider defining the loops_list > ctor out-of-line in some .c or .cc file. > Yeah, they touch the same functions. Good point on the code bloat, I'm not sure the historical reason here, it needs Richi's input. :) > (Also, if you agree with the rationale, I'd replace loop_p with > loop * in the new code.) > Oh, thanks for the reminder, will update it. BR, Kewen > Thanks > Martin > >> >> Bootstrapped and regtested on powerpc64le-linux-gnu P9, >> x86_64-redhat-linux and aarch64-linux-gnu, also >> bootstrapped on ppc64le P9 with bootstrap-O3 config. >> >> Does the attached patch meet what you expect? >> >> BR, >> Kewen >> ----- >> gcc/ChangeLog: >> >> * cfgloop.h (loops_list::loops_list): Add one optional argument root >> and adjust accordingly. >> >
On Fri, Jul 23, 2021 at 10:41 AM Kewen.Lin <linkw@linux.ibm.com> wrote: > > on 2021/7/22 下午8:56, Richard Biener wrote: > > On Tue, Jul 20, 2021 at 4:37 > > PM Kewen.Lin <linkw@linux.ibm.com> wrote: > >> > >> Hi, > >> > >> This v2 has addressed some review comments/suggestions: > >> > >> - Use "!=" instead of "<" in function operator!= (const Iter &rhs) > >> - Add new CTOR loops_list (struct loops *loops, unsigned flags) > >> to support loop hierarchy tree rather than just a function, > >> and adjust to use loops* accordingly. > > > > I actually meant struct loop *, not struct loops * ;) At the point > > we pondered to make loop invariant motion work on single > > loop nests we gave up not only but also because it iterates > > over the loop nest but all the iterators only ever can process > > all loops, not say, all loops inside a specific 'loop' (and > > including that 'loop' if LI_INCLUDE_ROOT). So the > > CTOR would take the 'root' of the loop tree as argument. > > > > I see that doesn't trivially fit how loops_list works, at least > > not for LI_ONLY_INNERMOST. But I guess FROM_INNERMOST > > could be adjusted to do ONLY_INNERMOST as well? > > > > > Thanks for the clarification! I just realized that the previous > version with struct loops* is problematic, all traversal is > still bounded with outer_loop == NULL. I think what you expect > is to respect the given loop_p root boundary. Since we just > record the loops' nums, I think we still need the function* fn? Would it simplify things if we recorded the actual loop *? There's still the to_visit reserve which needs a bound on the number of loops for efficiency reasons. > So I add one optional argument loop_p root and update the > visiting codes accordingly. Before this change, the previous > visiting uses the outer_loop == NULL as the termination condition, > it perfectly includes the root itself, but with this given root, > we have to use it as the termination condition to avoid to iterate > onto its possible existing next. > > For LI_ONLY_INNERMOST, I was thinking whether we can use the > code like: > > struct loops *fn_loops = loops_for_fn (fn)->larray; > for (i = 0; vec_safe_iterate (fn_loops, i, &aloop); i++) > if (aloop != NULL > && aloop->inner == NULL > && flow_loop_nested_p (tree_root, aloop)) > this->to_visit.quick_push (aloop->num); > > it has the stable bound, but if the given root only has several > child loops, it can be much worse if there are many loops in fn. > It seems impossible to predict the given root loop hierarchy size, > maybe we can still use the original linear searching for the case > loops_for_fn (fn) == root? But since this visiting seems not so > performance critical, I chose to share the code originally used > for FROM_INNERMOST, hope it can have better readability and > maintainability. I was indeed looking for something that has execution/storage bound on the subtree we're interested in. If we pull the CTOR out-of-line we can probably keep the linear search for LI_ONLY_INNERMOST when looking at the whole loop tree. It just seemed to me that we can eventually re-use a single loop tree walker for all orders, just adjusting the places we push. > > Bootstrapped and regtested on powerpc64le-linux-gnu P9, > x86_64-redhat-linux and aarch64-linux-gnu, also > bootstrapped on ppc64le P9 with bootstrap-O3 config. > > Does the attached patch meet what you expect? So yeah, it's probably close to what is sensible. Not sure whether optimizing the loops for the !only_push_innermost_p case is important - if we manage to produce a single walker with conditionals based on 'flags' then IPA-CP should produce optimal clones as well I guess. Richard. > > BR, > Kewen > ----- > gcc/ChangeLog: > > * cfgloop.h (loops_list::loops_list): Add one optional argument root > and adjust accordingly.
diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h index 741df44ea51..f7148df1758 100644 --- a/gcc/cfgloop.h +++ b/gcc/cfgloop.h @@ -669,13 +669,15 @@ as_const (T &t) } /* A list for visiting loops, which contains the loop numbers instead of - the loop pointers. The scope is restricted in function FN and the - visiting order is specified by FLAGS. */ + the loop pointers. If the loop ROOT is offered (non-null), the visiting + will start from it, otherwise it would start from loops_for_fn (FN) + instead. The scope is restricted in function FN and the visiting order + is specified by FLAGS. */ class loops_list { public: - loops_list (function *fn, unsigned flags); + loops_list (function *fn, unsigned flags, loop_p root = nullptr); template <typename T> class Iter { @@ -782,71 +784,94 @@ loops_list::Iter<T>::fill_curr_loop () } /* Set up the loops list to visit according to the specified - function scope FN and iterating order FLAGS. */ + function scope FN and iterating order FLAGS. If ROOT is + not null, the visiting would start from it, otherwise it + will start from tree_root of loops_for_fn (FN). */ -inline loops_list::loops_list (function *fn, unsigned flags) +inline loops_list::loops_list (function *fn, unsigned flags, loop_p root) { class loop *aloop; - unsigned i; int mn; + struct loops *loops = loops_for_fn (fn); + gcc_assert (!root || loops); + this->fn = fn; - if (!loops_for_fn (fn)) + if (!loops) return; + loop_p tree_root = root ? root : loops->tree_root; + this->to_visit.reserve_exact (number_of_loops (fn)); - mn = (flags & LI_INCLUDE_ROOT) ? 0 : 1; + mn = (flags & LI_INCLUDE_ROOT) ? -1 : tree_root->num; - if (flags & LI_ONLY_INNERMOST) - { - for (i = 0; vec_safe_iterate (loops_for_fn (fn)->larray, i, &aloop); i++) - if (aloop != NULL - && aloop->inner == NULL - && aloop->num >= mn) + /* The helper function for LI_FROM_INNERMOST and LI_ONLY_INNERMOST + visiting, ONLY_PUSH_INNERMOST_P indicates whether only push + the innermost loop, it's true for LI_ONLY_INNERMOST visiting + while false for LI_FROM_INNERMOST visiting. */ + auto visit_from_innermost = [&] (bool only_push_innermost_p) + { + /* Push the loops to LI->TO_VISIT in postorder. */ + + /* Early handle tree_root without any inner loops, make later + processing simpler, that is the while loop can only care + about loops which aren't possible to be tree_root. */ + if (!tree_root->inner) + { + if (tree_root->num != mn) + this->to_visit.quick_push (tree_root->num); + return; + } + + for (aloop = tree_root; + aloop->inner != NULL; + aloop = aloop->inner) + continue; + + while (1) + { + gcc_assert (aloop != tree_root); + if (!only_push_innermost_p || aloop->inner == NULL) this->to_visit.quick_push (aloop->num); - } - else if (flags & LI_FROM_INNERMOST) - { - /* Push the loops to LI->TO_VISIT in postorder. */ - for (aloop = loops_for_fn (fn)->tree_root; - aloop->inner != NULL; - aloop = aloop->inner) - continue; - while (1) - { - if (aloop->num >= mn) - this->to_visit.quick_push (aloop->num); + if (aloop->next) + { + for (aloop = aloop->next; + aloop->inner != NULL; + aloop = aloop->inner) + continue; + } + else if (loop_outer (aloop) == tree_root) + break; + else + aloop = loop_outer (aloop); + } + + /* Reconsider tree_root since the previous loop doesn't handle it. */ + if (!only_push_innermost_p && tree_root->num != mn) + this->to_visit.quick_push (tree_root->num); + }; - if (aloop->next) - { - for (aloop = aloop->next; - aloop->inner != NULL; - aloop = aloop->inner) - continue; - } - else if (!loop_outer (aloop)) - break; - else - aloop = loop_outer (aloop); - } - } + if (flags & LI_ONLY_INNERMOST) + visit_from_innermost (true); + else if (flags & LI_FROM_INNERMOST) + visit_from_innermost (false); else { /* Push the loops to LI->TO_VISIT in preorder. */ - aloop = loops_for_fn (fn)->tree_root; + aloop = tree_root; while (1) { - if (aloop->num >= mn) + if (aloop->num != mn) this->to_visit.quick_push (aloop->num); if (aloop->inner != NULL) aloop = aloop->inner; else { - while (aloop != NULL && aloop->next == NULL) + while (aloop != tree_root && aloop->next == NULL) aloop = loop_outer (aloop); - if (aloop == NULL) + if (aloop == tree_root) break; aloop = aloop->next; }