Message ID | 155ced1e-67fb-7cfb-f94a-138e2b2ead35@redhat.com
---|---
State | New
Series | [1/2] Split return functionality of get_non_stale_global_range.
On Tue, Nov 23, 2021 at 6:04 PM Andrew MacLeod via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>
> This is the second patch in the series.
>
> Ranger uses its own API to recursively satisfy dependencies.  When
> range_of_stmt is called on _1482 = _1154 + _1177; it picks up the
> ranges of _1154 and _1177 from its cache.  If those statements have not
> been seen yet, it recursively calls range_of_stmt on each one to resolve
> the answer.  Each main API call can trigger up to 5 other calls to get
> to the next API point:
>
>   gimple_ranger::fold_range_internal (...)
>   gimple_ranger::range_of_stmt (_1154, ...)
>   gimple_ranger::range_of_expr (_1154, ...)
>   fold_using_range::range_of_range_op (...)
>   fold_using_range::fold_stmt (...)
>   gimple_ranger::fold_range_internal (...)
>   gimple_ranger::range_of_stmt (_1482, ...)
>
> For a normal forward walk, values tend to already be in the cache, but
> when we try to answer a range_on_edge question on a back edge, it can
> trigger a very long series of queries.  I spent some time analyzing
> these patterns, and found that regardless of which API entry point was
> used, range_of_stmt is ultimately invoked in a predictable order to
> initialize the cache values.
>
> This patch implements a dependency resolver which range_of_stmt uses
> when it is called on something which does not have a cache entry yet
> (hence the disambiguation between a temporal failure and the lack of a
> cache entry in the previous patch).
>
> It looks at each operand, and if that operand does not have a cache
> entry, pushes it on a stack.  Names are popped from the stack, and
> fold_using_range() is invoked once all the operands have been
> resolved.  When we do get to call fold_using_range::fold_stmt(), we are
> sure the operands are cached and the value will simply be calculated.
> This is ultimately the exact series of events that would have happened
> had the main API been used... except we no longer involve the call
> stack for each one.
>
> Well, mostly :-).  For this fix, we only do this with operands of stmts
> which have a range-ops handler, meaning we do not use the API for
> anything range-ops understands.  We will still use the main API for
> resolving PHIs and other statements as they are encountered.  We could
> do this for PHIs as well, but for the most part it was the chains of
> stmts within a block that were causing the vast majority of the issue.
> If we later discover that large chains of PHIs are causing problems as
> well, I can easily add them here too.  I avoided them this time because
> there is extra overhead involved in traversing all the PHI arguments
> extra times.  Sticking with range-ops limits us to 2 operands to check,
> and the overhead is very minimal.
>
> I have tested this with PHIs as well, and we could just include them
> up front.  The overhead is more than doubled, but the increased compile
> time of a VRP pass is still under 1%.
>
> Bootstrapped on x86_64-pc-linux-gnu with no regressions.  OK?

OK.

Richard.

> Andrew
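The stack-based walk described above (push uncached operands, pop and fold a name once its operands are cached, using a marker entry to know when) can be sketched outside of GCC as follows. This is a minimal, hypothetical model: the `resolver` struct, the string names, and the sum-based `eval` are all invented for illustration. The real code works on SSA names, range-ops handlers, and the ranger's global cache, and it defers folding the initial request to the caller; here an empty string plays the role of the NULL_TREE marker and everything, including the root, is folded in the loop.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy model of marker-based, depth-first dependency resolution using an
// explicit stack instead of recursion.  The dependency graph is assumed
// to be acyclic.
struct resolver
{
  std::map<std::string, std::vector<std::string>> deps;  // name -> operands
  std::map<std::string, int> cache;                      // "global cache"

  // Fold NAME from its already-cached operands: a leaf gets 1, anything
  // else the sum of its operand values.
  int eval (const std::string &name)
  {
    int v = deps[name].empty () ? 1 : 0;
    for (const std::string &op : deps[name])
      v += cache[op];
    return v;
  }

  int resolve (const std::string &ssa)
  {
    std::vector<std::string> stack;
    stack.push_back (ssa);
    while (!stack.empty ())
      {
        std::string name = stack.back ();
        // The empty string is the marker: the next name on the stack now
        // has all of its operands cached, so fold and cache it.
        if (name.empty ())
          {
            stack.pop_back ();        // pop the marker
            name = stack.back ();
            stack.pop_back ();        // pop the name
            cache[name] = eval (name);
            continue;
          }
        // Already resolved by an earlier pop?  Nothing to do.
        if (cache.count (name))
          {
            stack.pop_back ();
            continue;
          }
        // Push the marker, then every operand with no cache entry yet.
        stack.push_back (std::string ());
        for (const std::string &op : deps[name])
          if (!cache.count (op))
            stack.push_back (op);
      }
    return cache[ssa];
  }
};
```

With a diamond dependency graph (a depends on b and c, both of which depend on d), `resolve ("a")` folds d, c, b, and then a, visiting every name without a single recursive call, which is the point of the patch.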
On 11/24/21 04:17, Richard Biener wrote:
> OK.
>
> Richard.

committed.
From 28d1fea6e6c0c0368dbc04e895aaa0a6b47c19da Mon Sep 17 00:00:00 2001
From: Andrew MacLeod <amacleod@redhat.com>
Date: Mon, 22 Nov 2021 14:39:41 -0500
Subject: [PATCH 3/3] Directly resolve range_of_stmt dependencies.

All ranger API entries eventually call range_of_stmt to ensure there is
an initial global value to work with.  This can cause very deep call
chains when satisfied via the normal API.  Instead, push any
dependencies onto a stack and evaluate them in a depth first manner,
mirroring what would have happened via the normal API calls.

	PR tree-optimization/103231
	gcc/
	* gimple-range.cc (gimple_ranger::gimple_ranger): Create stmt stack.
	(gimple_ranger::~gimple_ranger): Delete stmt stack.
	(gimple_ranger::range_of_stmt): Process dependencies if they have
	no global cache entry.
	(gimple_ranger::prefill_name): New.
	(gimple_ranger::prefill_stmt_dependencies): New.
	* gimple-range.h (class gimple_ranger): Add prototypes.
---
 gcc/gimple-range.cc | 107 +++++++++++++++++++++++++++++++++++++++++++-
 gcc/gimple-range.h  |   4 ++
 2 files changed, 109 insertions(+), 2 deletions(-)

diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
index e3ab3a8bb48..178a470a419 100644
--- a/gcc/gimple-range.cc
+++ b/gcc/gimple-range.cc
@@ -46,6 +46,9 @@ gimple_ranger::gimple_ranger () :
   m_oracle = m_cache.oracle ();
   if (dump_file && (param_ranger_debug & RANGER_DEBUG_TRACE))
     tracer.enable_trace ();
+  m_stmt_list.create (0);
+  m_stmt_list.safe_grow (num_ssa_names);
+  m_stmt_list.truncate (0);
 
   // Ensure the not_executable flag is clear everywhere.
   if (flag_checking)
@@ -61,6 +64,11 @@ gimple_ranger::gimple_ranger () :
     }
 }
 
+gimple_ranger::~gimple_ranger ()
+{
+  m_stmt_list.release ();
+}
+
 bool
 gimple_ranger::range_of_expr (irange &r, tree expr, gimple *stmt)
 {
@@ -284,9 +292,10 @@ gimple_ranger::range_of_stmt (irange &r, gimple *s, tree name)
   else
     {
       bool current;
-      // Check if the stmt has already been processed, and is not stale.
+      // Check if the stmt has already been processed.
       if (m_cache.get_global_range (r, name, current))
 	{
+	  // If it isn't stale, use this cached value.
 	  if (current)
 	    {
 	      if (idx)
@@ -294,7 +303,10 @@ gimple_ranger::range_of_stmt (irange &r, gimple *s, tree name)
 	      return true;
 	    }
 	}
-      // Otherwise calculate a new value.
+      else
+	prefill_stmt_dependencies (name);
+
+      // Calculate a new value.
       int_range_max tmp;
       fold_range_internal (tmp, s, name);
@@ -311,6 +323,97 @@ gimple_ranger::range_of_stmt (irange &r, gimple *s, tree name)
   return res;
 }
 
+
+// Check if NAME is a dependency that needs resolving, and push it on the
+// stack if so.  R is a scratch range.
+
+inline void
+gimple_ranger::prefill_name (irange &r, tree name)
+{
+  if (!gimple_range_ssa_p (name))
+    return;
+  gimple *stmt = SSA_NAME_DEF_STMT (name);
+  if (!gimple_range_handler (stmt))
+    return;
+
+  bool current;
+  // If this op has not been processed yet, then push it on the stack.
+  if (!m_cache.get_global_range (r, name, current))
+    m_stmt_list.safe_push (name);
+}
+
+// This routine will seed the global cache with most of the dependencies of
+// NAME.  This prevents excessive call depth through the normal API.
+
+void
+gimple_ranger::prefill_stmt_dependencies (tree ssa)
+{
+  if (SSA_NAME_IS_DEFAULT_DEF (ssa))
+    return;
+
+  int_range_max r;
+  unsigned idx;
+  gimple *stmt = SSA_NAME_DEF_STMT (ssa);
+  gcc_checking_assert (stmt && gimple_bb (stmt));
+
+  // Only pre-process range-ops.
+  if (!gimple_range_handler (stmt))
+    return;
+
+  // Mark where on the stack we are starting.
+  unsigned start = m_stmt_list.length ();
+  m_stmt_list.safe_push (ssa);
+
+  idx = tracer.header ("ROS dependence fill\n");
+
+  // Loop until back at the start point.
+  while (m_stmt_list.length () > start)
+    {
+      tree name = m_stmt_list.last ();
+      // NULL is a marker which indicates the next name in the stack has now
+      // been fully resolved, so we can fold it.
+      if (!name)
+	{
+	  // Pop the NULL, then pop the name.
+	  m_stmt_list.pop ();
+	  name = m_stmt_list.pop ();
+	  // Don't fold initial request, it will be calculated upon return.
+	  if (m_stmt_list.length () > start)
+	    {
+	      // Fold and save the value for NAME.
+	      stmt = SSA_NAME_DEF_STMT (name);
+	      fold_range_internal (r, stmt, name);
+	      m_cache.set_global_range (name, r);
+	    }
+	  continue;
+	}
+
+      // Add marker indicating previous NAME in list should be folded
+      // when we get to this NULL.
+      m_stmt_list.safe_push (NULL_TREE);
+      stmt = SSA_NAME_DEF_STMT (name);
+
+      if (idx)
+	{
+	  tracer.print (idx, "ROS dep fill (");
+	  print_generic_expr (dump_file, name, TDF_SLIM);
+	  fputs (") at stmt ", dump_file);
+	  print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+	}
+
+      gcc_checking_assert (gimple_range_handler (stmt));
+      tree op = gimple_range_operand2 (stmt);
+      if (op)
+	prefill_name (r, op);
+      op = gimple_range_operand1 (stmt);
+      if (op)
+	prefill_name (r, op);
+    }
+  if (idx)
+    tracer.trailer (idx, "ROS ", false, ssa, r);
+}
+
+
 // This routine will invoke the gimple fold_stmt routine, providing context to
 // range_of_expr calls via a private internal API.

diff --git a/gcc/gimple-range.h b/gcc/gimple-range.h
index 615496ec9b8..c2923c58b27 100644
--- a/gcc/gimple-range.h
+++ b/gcc/gimple-range.h
@@ -47,6 +47,7 @@ class gimple_ranger : public range_query
 {
 public:
   gimple_ranger ();
+  ~gimple_ranger ();
   virtual bool range_of_stmt (irange &r, gimple *, tree name = NULL) OVERRIDE;
   virtual bool range_of_expr (irange &r, tree name, gimple * = NULL) OVERRIDE;
   virtual bool range_on_edge (irange &r, edge e, tree name) OVERRIDE;
@@ -61,9 +62,12 @@ public:
   bool fold_stmt (gimple_stmt_iterator *gsi, tree (*) (tree));
 protected:
   bool fold_range_internal (irange &r, gimple *s, tree name);
+  void prefill_name (irange &r, tree name);
+  void prefill_stmt_dependencies (tree ssa);
   ranger_cache m_cache;
   range_tracer tracer;
   basic_block current_bb;
+  vec<tree> m_stmt_list;
 };
 
 /* Create a new ranger instance and associate it with a function.
-- 
2.17.2