Message ID | 16cb11c5-c46e-b0ae-2813-52f141414a41@redhat.com |
---|---|
State | New |
Series | tree-optimization/104530 - proposed re-evaluation. |
On Tue, Feb 22, 2022 at 5:42 PM Andrew MacLeod via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> This patch simply leverages the existing computation machinery to
> re-evaluate values dependent on a newly found non-null value
>
> Ranger associates a monotonically increasing temporal value with every
> def as it is defined.  When that value is used, we check if any of the
> values used in the definition have been updated, making the current
> cached global value stale.  This makes the evaluation lazy, if there are
> no more uses, we will never re-evaluate.
>
> When an ssa-name is marked non-null it does not change the global value,
> and thus will not invalidate any global values.  This patch marks any
> definitions in the block which are dependent on the non-null value as
> stale.  This will cause them to be re-evaluated when they are next used.
>
> Imports: b.0_1  d.3_7
> Exports: b.0_1  _2  _3  d.3_7  _8
>           _2 : b.0_1(I)
>           _3 : b.0_1(I)  _2
>           _8 : b.0_1(I)  _2  _3  d.3_7(I)
>
>      b.0_1 = b;
>      _2 = b.0_1 == 0B;
>      _3 = (int) _2;
>      c = _3;
>      _5 = *b.0_1;        <<-- from this point b.0_1 is [+1, +INF]
>      a = _5;
>      d.3_7 = d;
>      _8 = _3 % d.3_7;
>      if (_8 != 0)
>
> When _5 is defined and b.0_1 becomes non-null, we mark the dependent
> names that are exports and defined in this block as stale: _2, _3
> and _8.
>
> When _8 is being calculated, _3 is stale, which causes it to be
> recomputed.  It is dependent on _2, also stale, so that is also
> recomputed, and we end up with
>
>    _2 == [0, 0]
>    _3 == [0, 0]
>    and _8 = [0, 0]
> And then we can fold away the condition.
>
> The side effect is that _2 and _3 are globally changed to be [0, 0], but
> this is OK because it is the definition block, so it dominates all other
> uses of these names, and they should be [0,0] upon exit anyway.  The
> previous patch ensures that the global values written to
> SSA_NAME_RANGE_INFO are the correct [0,1] for both _2 and _3.
>
> The patch would have been even smaller if I already had a mark_stale
> method.  I thought there was one, but I guess it never made it in from
> lack of need at the time.  The only other tweak was to make the value
> stale if the dependent value was the same as the definitions.
>
> This bootstraps on x86_64-pc-linux-gnu with no regressions.  Re-running
> to ensure.

@@ -1475,6 +1488,15 @@ ranger_cache::update_to_nonnull (basic_block bb, tree name)
            {
              r.set_nonzero (type);
              m_on_entry.set_bb_range (name, bb, r);
+             // Mark consumers of name stale so they can be recomputed.
+             if (m_gori.is_import_p (name, bb) || m_gori.is_export_p (name, bb))
+               {
+                 tree x;
+                 FOR_EACH_GORI_EXPORT_NAME (m_gori, bb, x)
+                   if (m_gori.in_chain_p (name, x)
+                       && gimple_bb (SSA_NAME_DEF_STMT (x)) == bb)
+                     m_temporal->set_stale (x);
+               }
            }

So if we have a BB that exports N names and each of those is updated to
nonnull, this is going to be quadratic?  It also looks like the gimple_bb
check is cheaper than the bitmap test done in in_chain_p.

What comes to my mind is why we need to mark "consumers"?  Can't consumers
check the defs of their uses when they look at their timestamp?  This whole
set_stale thing doesn't seem to be transitive anyway, consider:

  _1 = ...

<bb>
  _2 = _1 + ..;

<bb>
  _3 = _2 + ...;

so when _1 is updated to non-null we mark _2 as stale but _3 should
also be stale, no?  When we visit _3 before eventually getting to _2
(to see whether it updates, and thus to know more precisely whether it
makes _3 stale) we won't re-evaluate it?

That said, the change looks somewhat ad-hoc to get to 1-level deep
second-level opportunities?

Richard.

>
> OK for trunk? or defer to stage 1?
> Andrew

On 2/23/22 02:33, Richard Biener wrote:
> On Tue, Feb 22, 2022 at 5:42 PM Andrew MacLeod via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>> This patch simply leverages the existing computation machinery to
>> re-evaluate values dependent on a newly found non-null value
>>
>> Ranger associates a monotonically increasing temporal value with every
>> def as it is defined.  When that value is used, we check if any of the
>> values used in the definition have been updated, making the current
>> cached global value stale.  This makes the evaluation lazy, if there are
>> no more uses, we will never re-evaluate.
>>
>> When an ssa-name is marked non-null it does not change the global value,
>> and thus will not invalidate any global values.  This patch marks any
>> definitions in the block which are dependent on the non-null value as
>> stale.  This will cause them to be re-evaluated when they are next used.
>>
>> Imports: b.0_1  d.3_7
>> Exports: b.0_1  _2  _3  d.3_7  _8
>>           _2 : b.0_1(I)
>>           _3 : b.0_1(I)  _2
>>           _8 : b.0_1(I)  _2  _3  d.3_7(I)
>>
>>      b.0_1 = b;
>>      _2 = b.0_1 == 0B;
>>      _3 = (int) _2;
>>      c = _3;
>>      _5 = *b.0_1;        <<-- from this point b.0_1 is [+1, +INF]
>>      a = _5;
>>      d.3_7 = d;
>>      _8 = _3 % d.3_7;
>>      if (_8 != 0)
>>
>> When _5 is defined and b.0_1 becomes non-null, we mark the dependent
>> names that are exports and defined in this block as stale: _2, _3
>> and _8.
>>
>> When _8 is being calculated, _3 is stale, which causes it to be
>> recomputed.  It is dependent on _2, also stale, so that is also
>> recomputed, and we end up with
>>
>>    _2 == [0, 0]
>>    _3 == [0, 0]
>>    and _8 = [0, 0]
>> And then we can fold away the condition.
>>
>> The side effect is that _2 and _3 are globally changed to be [0, 0], but
>> this is OK because it is the definition block, so it dominates all other
>> uses of these names, and they should be [0,0] upon exit anyway.  The
>> previous patch ensures that the global values written to
>> SSA_NAME_RANGE_INFO are the correct [0,1] for both _2 and _3.
>>
>> The patch would have been even smaller if I already had a mark_stale
>> method.  I thought there was one, but I guess it never made it in from
>> lack of need at the time.  The only other tweak was to make the value
>> stale if the dependent value was the same as the definitions.
>>
>> This bootstraps on x86_64-pc-linux-gnu with no regressions.  Re-running
>> to ensure.
> @@ -1475,6 +1488,15 @@ ranger_cache::update_to_nonnull (basic_block bb, tree name)
>             {
>               r.set_nonzero (type);
>               m_on_entry.set_bb_range (name, bb, r);
> +             // Mark consumers of name stale so they can be recomputed.
> +             if (m_gori.is_import_p (name, bb) || m_gori.is_export_p (name, bb))
> +               {
> +                 tree x;
> +                 FOR_EACH_GORI_EXPORT_NAME (m_gori, bb, x)
> +                   if (m_gori.in_chain_p (name, x)
> +                       && gimple_bb (SSA_NAME_DEF_STMT (x)) == bb)
> +                     m_temporal->set_stale (x);
> +               }
>             }
>
> So if we have a BB that exports N names and each of those is updated to
> nonnull, this is going to be quadratic?  It also looks like the gimple_bb
> check is cheaper than the bitmap test done in in_chain_p.
>
> What comes to my mind is why we need to mark "consumers"?  Can't consumers
> check the defs of their uses when they look at their timestamp?  This whole
> set_stale thing doesn't seem to be

They do.  The timestamps only look at direct uses.  Any use of _2
should look at the def and notice it is stale relative to b.0_1
automatically.

We miss the opportunity in the example which uses _3 to compute _8.
_3 is directly dependent on _2, whose def is not stale relative to _3,
so we miss the transitive staleness via b.0_1.

This marks all the consumers whose calculation is derived from the now
non-null value as stale.  Within the block, it is fully transitive and
anything potentially derived from NAME will be recalculated if it is
used.  In old EVRP terms, it would be like updating the current value
vector for any ssa-names derived from NAME when it becomes non-null,
except it is done lazily.

> transitive anyway, consider:
>
>   _1 = ...
>
> <bb>
>   _2 = _1 + ..;
>
> <bb>
>   _3 = _2 + ...;
>
> so when _1 is updated to non-null we mark _2 as stale but _3 should
> also be stale, no?  When we visit _3 before eventually getting to _2
> (to see whether it updates, and thus to know more precisely whether it
> makes _3 stale) we won't re-evaluate it?
>
> That said, the change looks somewhat ad-hoc to get to 1-level deep
> second-level opportunities?

The patch applies only to dom-walks, and primarily targets definitions
in the current block that we have already seen and that we now know are
stale.  It is one approach to applying non-null later in the same block
without resorting to much of an algorithmic change.

It's not really intended to affect anything cross-block, as that is
handled differently via the GORI engine.  It would provide better
on-exit ranges in the definition block for some of the names involved.

That said, I'm not crazy about putting anything else into this release
anyway, so if the regression isn't serious enough, then I'd simply wait
for the revamp of side-effects in the next release to deal with it.

Andrew

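The point about timestamps only looking at direct uses can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `ToyCache`, `define`, `build_chain_then_update_b` and `check_direct_only` are hypothetical names invented for illustration and are not part of GCC. The model also assumes that the base name `b` receives a newer timestamp, which is what makes `_2` look stale on its own.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model only: ToyCache and everything in it are hypothetical names,
// not the real ranger_cache / temporal_cache classes.
struct ToyCache {
  std::map<std::string, unsigned> stamp;                 // last definition time
  std::map<std::string, std::vector<std::string>> deps;  // direct operands only
  unsigned now = 0;

  // Record a (re)definition of NAME together with its direct operands.
  void define(const std::string &name, std::vector<std::string> operands = {}) {
    deps[name] = std::move(operands);
    stamp[name] = ++now;
  }

  // Direct-use check in the pre-patch style: NAME is current unless one of
  // its *direct* operands carries a newer timestamp.
  bool current_p(const std::string &name) const {
    const unsigned ts = stamp.at(name);
    for (const std::string &d : deps.at(name))
      if (ts < stamp.at(d))
        return false;
    return true;
  }
};

// Build the chain b -> _2 -> _3, then give b a newer timestamp than both.
inline ToyCache build_chain_then_update_b() {
  ToyCache c;
  c.define("b");           // time 1
  c.define("_2", {"b"});   // time 2
  c.define("_3", {"_2"});  // time 3
  c.define("b");           // time 4: b now has a newer value
  return c;
}

// _2 notices that b is newer; _3 only looks at _2 and still appears current.
inline void check_direct_only() {
  const ToyCache c = build_chain_then_update_b();
  assert(!c.current_p("_2"));
  assert(c.current_p("_3"));
}
```

In this model `_3` is only re-evaluated if something forces `_2` to be recomputed first, which is the gap the in-block marking of dependent exports is meant to close.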
On 2/22/2022 9:40 AM, Andrew MacLeod via Gcc-patches wrote:
> This patch simply leverages the existing computation machinery to
> re-evaluate values dependent on a newly found non-null value
>
> Ranger associates a monotonically increasing temporal value with every
> def as it is defined.  When that value is used, we check if any of the
> values used in the definition have been updated, making the current
> cached global value stale.  This makes the evaluation lazy, if there
> are no more uses, we will never re-evaluate.
>
> When an ssa-name is marked non-null it does not change the global
> value, and thus will not invalidate any global values.  This patch
> marks any definitions in the block which are dependent on the non-null
> value as stale.  This will cause them to be re-evaluated when they are
> next used.
>
> Imports: b.0_1  d.3_7
> Exports: b.0_1  _2  _3  d.3_7  _8
>           _2 : b.0_1(I)
>           _3 : b.0_1(I)  _2
>           _8 : b.0_1(I)  _2  _3  d.3_7(I)
>
>      b.0_1 = b;
>      _2 = b.0_1 == 0B;
>      _3 = (int) _2;
>      c = _3;
>      _5 = *b.0_1;        <<-- from this point b.0_1 is [+1, +INF]
>      a = _5;
>      d.3_7 = d;
>      _8 = _3 % d.3_7;
>      if (_8 != 0)
>
> When _5 is defined and b.0_1 becomes non-null, we mark the dependent
> names that are exports and defined in this block as stale: _2, _3
> and _8.
>
> When _8 is being calculated, _3 is stale, which causes it to be
> recomputed.  It is dependent on _2, also stale, so that is also
> recomputed, and we end up with
>
>    _2 == [0, 0]
>    _3 == [0, 0]
>    and _8 = [0, 0]
> And then we can fold away the condition.
>
> The side effect is that _2 and _3 are globally changed to be [0, 0],
> but this is OK because it is the definition block, so it dominates all
> other uses of these names, and they should be [0,0] upon exit anyway.
> The previous patch ensures that the global values written to
> SSA_NAME_RANGE_INFO are the correct [0,1] for both _2 and _3.
>
> The patch would have been even smaller if I already had a mark_stale
> method.  I thought there was one, but I guess it never made it in
> from lack of need at the time.  The only other tweak was to make the
> value stale if the dependent value was the same as the definitions.
>
> This bootstraps on x86_64-pc-linux-gnu with no regressions.  Re-running
> to ensure.
>
> OK for trunk? or defer to stage 1?

Seems reasonable now that we're in stage1.  Obviously, given the time
between the original posting and now, you should probably bootstrap and
regression test it again.

Jeff

From a7e4e5f04899817cacc3ebe5cc3ff2d489489309 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod <amacleod@redhat.com>
Date: Tue, 22 Feb 2022 09:58:00 -0500
Subject: [PATCH 2/2] Mark defs dependent on non-null stale.

When a name is marked as non-null, find all exports from the block, and
mark their timestamp as stale.  Any following use of the name will
trigger a recomputation using the new non-null range.

        PR tree-optimization/104530
        gcc/
        * gimple-range-cache.cc (temporal_cache::set_stale): New.
        (temporal_cache::current_p): Identical timestamp is not current.
        (ranger_cache::update_to_nonnull): Mark any export defined in this
        block stale if it is dependent on this name.

        gcc/testsuite/
        * gcc.dg/pr104530.c: New.
---
 gcc/gimple-range-cache.cc       | 26 ++++++++++++++++++++++++--
 gcc/testsuite/gcc.dg/pr104530.c | 17 +++++++++++++++++
 2 files changed, 41 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr104530.c

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index 613135266a4..debc93767a9 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -696,6 +696,7 @@ public:
   bool current_p (tree name, tree dep1, tree dep2) const;
   void set_timestamp (tree name);
   void set_always_current (tree name);
+  void set_stale (tree name);
 private:
   unsigned temporal_value (unsigned ssa) const;
 
@@ -740,9 +741,9 @@ temporal_cache::current_p (tree name, tree dep1, tree dep2) const
   // Any non-registered dependencies will have a value of 0 and thus be older.
   // Return true if time is newer than either dependent.
 
-  if (dep1 && ts < temporal_value (SSA_NAME_VERSION (dep1)))
+  if (dep1 && ts <= temporal_value (SSA_NAME_VERSION (dep1)))
     return false;
-  if (dep2 && ts < temporal_value (SSA_NAME_VERSION (dep2)))
+  if (dep2 && ts <= temporal_value (SSA_NAME_VERSION (dep2)))
     return false;
 
   return true;
@@ -759,6 +760,18 @@ temporal_cache::set_timestamp (tree name)
   m_timestamp[v] = ++m_current_time;
 }
 
+// Mark a NAME as stale by marking the timestamp as oldest, unless it is
+// already "always current".
+
+inline void
+temporal_cache::set_stale (tree name)
+{
+  unsigned v = SSA_NAME_VERSION (name);
+  if (v >= m_timestamp.length () || m_timestamp[v] == 0)
+    return;
+  m_timestamp[v] = 1;
+}
+
 // Set the timestamp to 0, marking it as "always up to date".
 
 inline void
@@ -1475,6 +1488,15 @@ ranger_cache::update_to_nonnull (basic_block bb, tree name)
            {
              r.set_nonzero (type);
              m_on_entry.set_bb_range (name, bb, r);
+             // Mark consumers of name stale so they can be recomputed.
+             if (m_gori.is_import_p (name, bb) || m_gori.is_export_p (name, bb))
+               {
+                 tree x;
+                 FOR_EACH_GORI_EXPORT_NAME (m_gori, bb, x)
+                   if (m_gori.in_chain_p (name, x)
+                       && gimple_bb (SSA_NAME_DEF_STMT (x)) == bb)
+                     m_temporal->set_stale (x);
+               }
            }
        }
 }
diff --git a/gcc/testsuite/gcc.dg/pr104530.c b/gcc/testsuite/gcc.dg/pr104530.c
new file mode 100644
index 00000000000..9adedc5e5f9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr104530.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp" } */
+
+void foo(void);
+
+static int a, *b = &a, c, d = 1;
+
+int main() {
+  c = 0 == b;
+  a = *b;
+  if (c % d)
+    for (; d; --d)
+      foo();
+  b = 0;
+}
+
+/* { dg-final { scan-tree-dump-not "foo" "evrp" } } */
-- 
2.17.2
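The interaction between `set_stale` and the change from `<` to `<=` in `current_p` may be easier to see in a standalone model. The following is a minimal sketch, not the GCC implementation: `ToyTemporalCache` and `check_equal_stamps` are hypothetical names, plain indices stand in for `SSA_NAME_VERSION`, and the sketch assumes the clock starts at 1 so that a stamp of 1 is older than anything `set_timestamp` hands out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for the timestamp table; indices play the role of
// SSA_NAME_VERSION. ToyTemporalCache is a hypothetical name, not GCC code.
class ToyTemporalCache {
public:
  // Assumption: the clock starts at 1, so every stamp handed out by
  // set_timestamp is at least 2 and the value 1 is older than all of them.
  explicit ToyTemporalCache(std::size_t n)
      : m_timestamp(n, 0), m_current_time(1) {}

  void set_timestamp(std::size_t v) { m_timestamp[v] = ++m_current_time; }
  void set_always_current(std::size_t v) { m_timestamp[v] = 0; }

  // Same shape as the patch's set_stale: leave "always current" (0) alone,
  // otherwise drop the stamp to the oldest non-zero value.
  void set_stale(std::size_t v) {
    if (v >= m_timestamp.size() || m_timestamp[v] == 0)
      return;
    m_timestamp[v] = 1;
  }

  // equal_is_stale selects the patched comparison (<=) over the old one (<).
  bool current_p(std::size_t name, std::size_t dep, bool equal_is_stale) const {
    const unsigned ts = m_timestamp[name];
    if (ts == 0)
      return true;  // treated as always current in this toy
    const unsigned dts = m_timestamp[dep];
    return equal_is_stale ? !(ts <= dts) : !(ts < dts);
  }

private:
  std::vector<unsigned> m_timestamp;
  unsigned m_current_time;
};

// Mark _2 and _3 stale and compare the old and patched comparisons.
inline void check_equal_stamps() {
  ToyTemporalCache c(3);             // 0: b.0_1, 1: _2, 2: _3
  c.set_timestamp(0);                // stamp 2
  c.set_timestamp(1);                // stamp 3
  c.set_timestamp(2);                // stamp 4
  c.set_stale(1);                    // _2 -> 1
  c.set_stale(2);                    // _3 -> 1
  assert(c.current_p(2, 1, false));  // old rule: 1 < 1 is false, _3 looks current
  assert(!c.current_p(2, 1, true));  // patched rule: 1 <= 1, so _3 is stale
  assert(!c.current_p(1, 0, true));  // _2 (1) is older than b.0_1 (2)
}
```

Because every name marked stale ends up with the same stamp, a strict `<` cannot tell a stale name apart from its equally stale operand. That appears to be why the patch treats an identical timestamp as not current.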