Message ID | 69507e99-2f0a-1187-2701-ccfb240c77b3@redhat.com
State      | New
Series     | [1/2] - Add BB option for outgoing_edge_range_p.
On Fri, Dec 3, 2021 at 9:42 PM Andrew MacLeod via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> When a request is made for the range of an ssa_name at some location,
> the first thing we do is invoke range_of_stmt() to ensure we have
> looked at the definition and have an evaluation for the name at a
> global level.  I recently added a patch which dramatically reduces the
> call-stack requirements for that call.
>
> Once we have confirmed the definition range has been set, a call is
> made for the range-on-entry to the block of the use.  This is
> performed by the cache, which proceeds to walk the CFG predecessors
> looking for blocks where ranges are created (exported), existing
> range-on-entry cache hits, or the definition block.  Once this list of
> predecessors has been created, a forward walk is done, pushing the
> ranges through the successor edges of all the blocks that were visited
> in the initial walk.
>
> If the use is far from the definition, we end up filling a lot of the
> same value on these paths.  Uses which are far from a range-modifying
> statement also push the same value into a lot of blocks.
>
> This patch tries to address at least some of these inefficiencies.  It
> recognizes that:
>
> First, if there is no range-modifying stmt between this use and the
> last range we saw in a dominating block, we can just use the value
> from the dominating block and not fill in all the cache entries
> between here and there.  This is the biggest win.
>
> Second, if there is a range-modifying statement at the end of some
> block, we will have to do the appropriate cache walk to this point,
> but it's possible the range-on-entry to THAT block might be able to
> use a dominating range, and we can prevent the walk from going any
> further than this block.
>
> Combined, this should prevent a lot of unnecessary ranges from being
> plugged into the cache.
>
> ie, to visualize:
>
> bb4:
>   a = foo()
> <..>
> bb60:
>   if (a < 30)
> <...>
> bb110:
>   g = a + 10
>
> If the first place we ask for a is in bb110, we walk the CFG from 110
> all the way back to bb4, on all paths leading back, then fill all
> those cache entries.  With this patch:
>
> a) If bb60 does not dominate bb110, the request will scan the
> dominators, arrive at the definition block, have seen no range
> modifier, and simply set the on-entry range for 110 to the range of a.
> Done.
>
> b) If bb60 does dominate bb110, we have no idea which edge out of 60
> dominates it, so we will revert to the existing cache algorithm.
> Before doing so, it checks and determines that there are no modifiers
> between bb60 and the def in bb4, and so sets the on-entry cache for
> bb60 to the range of a.  Now when we do the cache-fill walk, it only
> has to go back as far as bb60 instead of all the way to bb4.
>
> Otherwise we just revert to what we do now (likewise if dominators are
> not available).  I have yet to see a case where we miss something we
> used to get, but that does not mean there isn't one :-).
>
> The cumulative performance impact of compiling a set of 390 GCC source
> files at -O2 (measured via callgrind) is pretty decent.  Negative
> numbers are a compile-time decrease; thus -10% is 10% faster
> compilation time.
>
> EVRP                   : %change from trunk is -26.31% (!)
> VRP2                   : %change from trunk is -9.53%
> thread_jumps_full      : %change from trunk is -15.8%
> Total compilation time : %change from trunk is -1.06%
>
> So it's not insignificant.
>
> Risk would be very low, unless dominators are screwed up mid-pass...
> but then the relation code would be screwed up too.
>
> Bootstrapped on x86_64-pc-linux-gnu with no regressions.  OK?

OK.

Wow - I didn't realize we have this backwards CFG walk w/o dominators ...
I wonder if you can add a counter to visualize places we end up using
this path.  I suggest adding

  statistics_counter_event (cfun, "slow range query", 1);

or so, and seeing with -fdump-statistics-stats which passes eventually
trigger that (I suspect none?).

Thanks,
Richard.
On 12/6/21 02:27, Richard Biener wrote:
> On Fri, Dec 3, 2021 at 9:42 PM Andrew MacLeod via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>> [...]
>>
>> Bootstrapped on x86_64-pc-linux-gnu with no regressions.  OK?
> OK.

Committed.

> Wow - I didn't realize we have this backwards CFG walk w/o dominators ...
> I wonder if you can add a counter to visualize places we end up using
> this path.

Well, it only does the fill now when there is range info located on an
outgoing edge of the dominator.  It's still used, just on a much smaller
portion of the graph.

We could do even better if we knew whether only one edge was involved, ie:

bb3:
  a = foo()
  if (a > 4)
    blah1;    <- bb4
  else
    blah2;    <- bb5
bb6:
  = a;

The use of a in bb6 will look to bb3 as the dominator, but if it knew
that both edges are used in that dominance relation (ie, neither
outgoing edge dominates bb6), it wouldn't have to calculate a range for
'a'.

But as it stands, it can't really tell the above situation from:

bb3:
  a = foo()
  if (a > 4)
    = a       <- bb4

In this case, only the one edge is used, and we DO need to calculate a
range for a; the edge from 3->4 does dominate bb4.

In both cases, bb3 is the dominator, and bb3 can generate an outgoing
range, but only in one case do we need to calculate a range.

What would be really useful would be to be able to tell if an edge
dominates a block :-)  I have thoughts on this for the next release, and
may overhaul the cache... but I don't think there is any such facility
in the dominator subsystem?

Andrew
On Mon, Dec 6, 2021 at 7:39 PM Andrew MacLeod <amacleod@redhat.com> wrote:
> [...]
>
> In both cases, bb3 is the dominator, and bb3 can generate an outgoing
> range, but only in one case do we need to calculate a range.
>
> What would be really useful would be to be able to tell if an edge
> dominates a block :-)
>
> I have thoughts on this for next release, and may overhaul the cache...
> but I don't think there is any such facility in the dominator subsystem?

Well, dominance of an edge is dominance of the edge destination if the
destination has only a single non-backedge.  If the destination has more
than one non-backedge then the edge does not dominate anything.  So to
generally answer dominance queries for edges you have to have backedges
marked.

Richard.
On 12/7/21 02:12, Richard Biener wrote:
> [...]
>
> Well, dominance of an edge is dominance of the edge destination if the
> destination has only a single non-backedge.  If the destination has
> more than one non-backedge then the edge does not dominate anything.
> So to generally answer dominance queries for edges you have to have
> backedges marked.

Are back edges not always marked if dominance information is available?
Or does one need to do something to ensure they are up to date?

And if an edge is marked as DFS_BACK, is that guaranteed to always be
true, regardless of dominance info, assuming no CFG manipulation is
going on?

Thanks,
Andrew
On Tue, Dec 7, 2021 at 3:16 PM Andrew MacLeod <amacleod@redhat.com> wrote:
> [...]
>
> Are back edges not always marked if dominance information is available?

no

> Or does one need to do something to ensure they are up to date?

mark_dfs_back_edges ()

> And if an edge is marked as DFS_BACK, is that guaranteed to always be
> true, regardless of dominance info, assuming no CFG manipulation is
> going on?

Well, it's guaranteed after the above call.  CFG manipulations do not
keep the flag up-to-date.

Richard.
From ea2f90151dcaeea2b5c372f900e1eef735269e18 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod <amacleod@redhat.com>
Date: Fri, 3 Dec 2021 11:02:19 -0500
Subject: [PATCH 2/2] Use dominators to reduce cache-filling.

Before walking the CFG and filling all cache entries, check if the same
information is available in a dominator.

	* gimple-range-cache.cc (ranger_cache::fill_block_cache): Check
	for a range from dominators before filling the cache.
	(ranger_cache::range_from_dom): New.
---
 gcc/gimple-range-cache.cc | 73 +++++++++++++++++++++++++++++++++++++++
 gcc/gimple-range-cache.h  |  1 +
 2 files changed, 74 insertions(+)

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index fe31e9462aa..47e95ec23be 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -1312,6 +1312,20 @@ ranger_cache::fill_block_cache (tree name, basic_block bb, basic_block def_bb)
       fprintf (dump_file, " : ");
     }

+  // If there are dominators, check if a dominator can supply the range.
+  if (dom_info_available_p (CDI_DOMINATORS)
+      && range_from_dom (block_result, name, bb))
+    {
+      m_on_entry.set_bb_range (name, bb, block_result);
+      if (DEBUG_RANGE_CACHE)
+	{
+	  fprintf (dump_file, "Filled from dominator! : ");
+	  block_result.dump (dump_file);
+	  fprintf (dump_file, "\n");
+	}
+      return;
+    }
+
   while (m_workback.length () > 0)
     {
       basic_block node = m_workback.pop ();
@@ -1394,3 +1408,62 @@ ranger_cache::fill_block_cache (tree name, basic_block bb, basic_block def_bb)
     fprintf (dump_file, " Propagation update done.\n");
 }
+
+// Check to see if we can simply get the range from the dominator.
+
+bool
+ranger_cache::range_from_dom (irange &r, tree name, basic_block bb)
+{
+  gcc_checking_assert (dom_info_available_p (CDI_DOMINATORS));
+
+  // Search back to the definition block or entry block.
+  basic_block def_bb = gimple_bb (SSA_NAME_DEF_STMT (name));
+  if (def_bb == NULL)
+    def_bb = ENTRY_BLOCK_PTR_FOR_FN (cfun);
+
+  // Flag if we encounter a block with non-null set.
+  bool non_null = false;
+  for (bb = get_immediate_dominator (CDI_DOMINATORS, bb);
+       bb && bb != def_bb;
+       bb = get_immediate_dominator (CDI_DOMINATORS, bb))
+    {
+      // If there is an outgoing range, the on-entry value won't work.
+      if (m_gori.has_edge_range_p (name, bb))
+	{
+	  // Check if we can seed this block with a dominator value.
+	  // This will prevent the cache from being filled back further
+	  // than this.
+	  if (bb != def_bb && range_from_dom (r, name, bb))
+	    m_on_entry.set_bb_range (name, bb, r);
+	  return false;
+	}
+
+      // Flag if we see a non-null reference during this walk.
+      if (m_non_null.non_null_deref_p (name, bb, false))
+	non_null = true;
+
+      // If range-on-entry is set in this block, it can be used.
+      if (m_on_entry.get_bb_range (r, name, bb))
+	{
+	  // Apply non-null if appropriate.
+	  if (r.varying_p () && non_null)
+	    {
+	      gcc_checking_assert (POINTER_TYPE_P (TREE_TYPE (name)));
+	      r.set_nonzero (TREE_TYPE (name));
+	    }
+	  return true;
+	}
+    }
+  // If this is the def block, and NAME is an export, then this value
+  // cannot be used.
+  if (bb == def_bb && m_gori.has_edge_range_p (name, bb))
+    return false;
+
+  // Otherwise choose the global value and use it.
+  get_global_range (r, name);
+  if (r.varying_p () && non_null)
+    {
+      gcc_checking_assert (POINTER_TYPE_P (TREE_TYPE (name)));
+      r.set_nonzero (TREE_TYPE (name));
+    }
+  return true;
+}
diff --git a/gcc/gimple-range-cache.h b/gcc/gimple-range-cache.h
index eb7a875c46b..2c52a0b6ce3 100644
--- a/gcc/gimple-range-cache.h
+++ b/gcc/gimple-range-cache.h
@@ -98,6 +98,7 @@ public:
   virtual bool range_of_expr (irange &r, tree name, gimple *stmt);
   virtual bool range_on_edge (irange &r, edge e, tree expr);
   bool block_range (irange &r, basic_block bb, tree name, bool calc = true);
+  bool range_from_dom (irange &r, tree name, basic_block bb);
   bool get_global_range (irange &r, tree name) const;
   bool get_global_range (irange &r, tree name, bool &current_p);
-- 
2.17.2