| Message ID | orlfz82koi.fsf@lxoliva.fsfla.org |
|---|---|
| State | New |
| Series | preserve more debug stmts in gimple jump threading |
On Wed, May 15, 2019 at 10:20 AM Alexandre Oliva <aoliva@redhat.com> wrote:
>
> Gimple jump threading does not duplicate forwarder blocks that might
> be present before or after the second copied block.

Empty forwarder blocks I presume?

> This silently
> drops debug binds and markers that might be present in them.  This
> patch attempts to preserve them.
>
> For blocks after either copied block, we attempt to append debug stmts
> to the copied block, if it does not end with a block-ending stmt.
> Failing that, for blocks between both copied blocks, we prepend its
> debug stmts to the copy of the second block.
>
> If everything fails, we still drop debug stmts on the floor, though
> preexisting code consolidates debug binds in the block that threading
> flows into, so only markers are really lost.  We can't do much better
> than that without conditional binds and markers, or debug stmts in
> edges, or somesuch.
>
> If we append debug stmts to a reusable template block, we copy it
> after splitting out the debug stmts, and before putting them back.
>
> Regstrapped on x86_64-linux-gnu and i686-linux-gnu.  Ok to install?

I wonder if not copying empty forwarders is premature optimization in
that (hopefully) CFG cleanup is run after jump threading anyways?
If we'd copy the blocks the patch would be moot?

Richard.

>
> for gcc/ChangeLog
>
>         * tree-ssa-threadupdate.c (struct ssa_local_info_t): Add
>         field template_last_to_copy.
>         (ssa_create_duplicates): Set it, and use it.  Attempt to
>         preserve more debug stmts.
> ---
>  gcc/tree-ssa-threadupdate.c | 68 +++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 68 insertions(+)
>
> diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
> index bccef879db03..d63154317746 100644
> --- a/gcc/tree-ssa-threadupdate.c
> +++ b/gcc/tree-ssa-threadupdate.c
> @@ -235,6 +235,12 @@ struct ssa_local_info_t
>       and sharing a template for that block is considerably more difficult.  */
>    basic_block template_block;
>
> +  /* If we append debug stmts to the template block after creating it,
> +     this iterator won't be the last one in the block, and further
> +     copies of the template block shouldn't get debug stmts after
> +     it.  */
> +  gimple_stmt_iterator template_last_to_copy;
> +
>    /* Blocks duplicated for the thread.  */
>    bitmap duplicate_blocks;
>
> @@ -1124,6 +1130,8 @@ ssa_create_duplicates (struct redirection_data **slot,
>        create_block_for_threading ((*path)[1]->e->src, rd, 0,
>                                    &local_info->duplicate_blocks);
>        local_info->template_block = rd->dup_blocks[0];
> +      local_info->template_last_to_copy
> +        = gsi_last_bb (local_info->template_block);
>
>        /* We do not create any outgoing edges for the template.  We will
>           take care of that in a later traversal.  That way we do not
> @@ -1131,14 +1139,74 @@ ssa_create_duplicates (struct redirection_data **slot,
>      }
>    else
>      {
> +      gimple_seq seq = NULL;
> +      if (gsi_stmt (local_info->template_last_to_copy)
> +          != gsi_stmt (gsi_last_bb (local_info->template_block)))
> +        seq = gsi_split_seq_after (local_info->template_last_to_copy);
>        create_block_for_threading (local_info->template_block, rd, 0,
>                                    &local_info->duplicate_blocks);
> +      if (seq)
> +        gsi_insert_seq_after (&local_info->template_last_to_copy,
> +                              seq, GSI_SAME_STMT);
>
>        /* Go ahead and wire up outgoing edges and update PHIs for the duplicate
>           block.  */
>        ssa_fix_duplicate_block_edges (rd, local_info);
>      }
>
> +  if (MAY_HAVE_DEBUG_STMTS)
> +    {
> +      /* Copy debug stmts from each NO_COPY src block to the block
> +         that would have been its predecessor, if we can append to it
> +         (we can't add stmts after a block-ending stmt), or prepending
> +         to the duplicate of the successor, if there is one.  If
> +         there's no duplicate successor, we'll mostly drop the blocks
> +         on the floor; propagate_threaded_block_debug_into, called
> +         elsewhere, will consolidate and preserve the effects of the
> +         binds, but none of the markers.  */
> +      gimple_stmt_iterator copy_to = gsi_last_bb (rd->dup_blocks[0]);
> +      if (!gsi_end_p (copy_to))
> +        {
> +          if (stmt_ends_bb_p (gsi_stmt (copy_to)))
> +            {
> +              if (rd->dup_blocks[1])
> +                copy_to = gsi_after_labels (rd->dup_blocks[1]);
> +              else
> +                copy_to = gsi_none ();
> +            }
> +          else
> +            gsi_next (&copy_to);
> +        }
> +      for (unsigned int i = 2, j = 0; i < path->length (); i++)
> +        if ((*path)[i]->type == EDGE_NO_COPY_SRC_BLOCK
> +            && gsi_bb (copy_to))
> +          {
> +            for (gimple_stmt_iterator gsi = gsi_start_bb ((*path)[i]->e->src);
> +                 !gsi_end_p (gsi); gsi_next (&gsi))
> +              {
> +                if (!is_gimple_debug (gsi_stmt (gsi)))
> +                  continue;
> +                gimple *stmt = gsi_stmt (gsi);
> +                gimple *copy = gimple_copy (stmt);
> +                gsi_insert_before (&copy_to, copy, GSI_SAME_STMT);
> +              }
> +          }
> +        else if ((*path)[i]->type == EDGE_COPY_SRC_BLOCK
> +                 || (*path)[i]->type == EDGE_COPY_SRC_JOINER_BLOCK)
> +          {
> +            j++;
> +            gcc_assert (j < 2);
> +            copy_to = gsi_last_bb (rd->dup_blocks[j]);
> +            if (!gsi_end_p (copy_to))
> +              {
> +                if (stmt_ends_bb_p (gsi_stmt (copy_to)))
> +                  copy_to = gsi_none ();
> +                else
> +                  gsi_next (&copy_to);
> +              }
> +          }
> +    }
> +
>    /* Keep walking the hash table.  */
>    return 1;
>  }
>
>
> --
> Alexandre Oliva, freedom fighter  he/him  https://FSFLA.org/blogs/lxo
> Be the change, be Free!              FSF Latin America board member
> GNU Toolchain Engineer                     Free Software Evangelist
> Hay que enGNUrecerse, pero sin perder la terGNUra jamás - Che GNUevara
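As a hypothetical illustration of the CFG shape under discussion (made-up
code, not a testcase from the patch), compiling something like the
following with -O2 -g can give jump threading a path through near-empty
blocks that carry only debug binds and markers for `t`:

  /* Hypothetical example: threading the second test against the first
     duplicates blocks along the path; before the patch, debug stmts
     sitting in non-copied forwarder blocks on that path were dropped.  */
  int
  f (int a)
  {
    int t = 0;
    if (a > 0)
      t = a;      /* the debug bind for t may land in a forwarder block */
    if (t != 0)   /* threadable: t's value is known along each path */
      return t * 2;
    return -1;
  }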
On May 15, 2019, Richard Biener <richard.guenther@gmail.com> wrote:

> On Wed, May 15, 2019 at 10:20 AM Alexandre Oliva <aoliva@redhat.com> wrote:
>>
>> Gimple jump threading does not duplicate forwarder blocks that might
>> be present before or after the copied block.

> Empty forwarder blocks I presume?

Empty except for debug stmts and possibly a final conditional jump that,
in the threading path, becomes unconditional.

> I wonder if not copying empty forwarders is premature optimization
> in that (hopefully) CFG cleanup is run after jump threading anyways?

I don't think it's so much of an optimization as it is an extension.
The code as it is can only really handle copying the initial block and
another subsequent block.  Threading through other "empty" blocks
didn't require changes as significant as copying them would have.

If we were to change that, then we'd be extending jump threading to
actually copy multiple blocks, empty or not.  Not necessarily a bad
thing, but there are comments in the code suggesting that hasn't really
been tried.

> If we'd copy the blocks the patch would be moot?

Yes, but with a caveat.

It's true that cfgcleanup would pretty much end up having the same or
similar effect, consolidating debug stmts from empty blocks into
successor or predecessor if possible and dropping them otherwise.

What it would not do is to consolidate binds from the now-split paths
into the confluence, like propagate_threaded_block_debug_into does.  The
proposed patch does not change that, and it shouldn't: the confluence
binds stand on their own (*), very much like the binds we add after PHI
nodes when entering SSA.

That reduces the loss when we don't have a place to hold the debug
stmts, and it helps bind resolution in var-tracking even when the
incoming paths seem to have diverged in where they hold a variable, at
least when the binds don't end up being reset for this very reason even
when var-tracking dataflow could have figured out a common location
among the incoming blocks.

(*) though memory references in them might become wrong if they're
before the second block and the second block changes the memory
locations, it occurs to me now, without actually going back and checking
whether the consolidation pays any attention to that.

We shouldn't assume that just because we're now copying all intermediate
blocks, debug info and all, the confluence binds cease to be useful.
Rather, we should probably consider introducing them in other passes
that split paths and introduce new confluences.
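To make the confluence-bind point concrete, a hypothetical sketch
(invented function, not from the patch or an actual dump):

  /* After threading splits the paths through the test, `v` may be held
     in different locations (or be unavailable) on the two incoming
     paths.  A consolidating bind at the join -- conceptually a GIMPLE
     stmt like "# DEBUG v => v_5" right after the PHI for v -- stands on
     its own, so var-tracking can still resolve `v` there.  */
  int
  g (int a, int b)
  {
    int v;
    if (a)
      v = b + 1;
    else
      v = b - 1;
    return v * a;   /* paths re-join here: the confluence */
  }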
On Wed, May 15, 2019 at 11:03 PM Alexandre Oliva <aoliva@redhat.com> wrote:
>
> On May 15, 2019, Richard Biener <richard.guenther@gmail.com> wrote:
>
> > On Wed, May 15, 2019 at 10:20 AM Alexandre Oliva <aoliva@redhat.com> wrote:
> >>
> >> Gimple jump threading does not duplicate forwarder blocks that might
> >> be present before or after the copied block.
>
> > Empty forwarder blocks I presume?
>
> Empty except for debug stmts and possibly a final conditional jump that,
> in the threading path, becomes unconditional.
>
> > I wonder if not copying empty forwarders is premature optimization
> > in that (hopefully) CFG cleanup is run after jump threading anyways?
>
> I don't think it's so much of an optimization as it is an extension.
> The code as it is can only really handle copying the initial block and
> another subsequent block.  Threading through other "empty" blocks
> didn't require changes as significant as copying them would have.
>
> If we were to change that, then we'd be extending jump threading to
> actually copy multiple blocks, empty or not.  Not necessarily a bad
> thing, but there are comments in the code suggesting that hasn't really
> been tried.

Oh, I see - I thought we'd extended the code to copy an arbitrary-length
path by now ...

> > If we'd copy the blocks the patch would be moot?
>
> Yes, but with a caveat.
>
> It's true that cfgcleanup would pretty much end up having the same or
> similar effect, consolidating debug stmts from empty blocks into
> successor or predecessor if possible and dropping them otherwise.
>
> What it would not do is to consolidate binds from the now-split paths
> into the confluence, like propagate_threaded_block_debug_into does.  The
> proposed patch does not change that, and it shouldn't: the confluence
> binds stand on their own (*), very much like the binds we add after PHI
> nodes when entering SSA.

Ah, I missed that, so yes, it looks more powerful.

> That reduces the loss when we don't have a place to hold the debug
> stmts, and it helps bind resolution in var-tracking even when the
> incoming paths seem to have diverged in where they hold a variable, at
> least when the binds don't end up being reset for this very reason even
> when var-tracking dataflow could have figured out a common location
> among the incoming blocks.
>
> (*) though memory references in them might become wrong if they're
> before the second block and the second block changes the memory
> locations, it occurs to me now, without actually going back and checking
> whether the consolidation pays any attention to that.
>
> We shouldn't assume that just because we're now copying all intermediate
> blocks, debug info and all, the confluence binds cease to be useful.
> Rather, we should probably consider introducing them in other passes
> that split paths and introduce new confluences.

Indeed.

Richard.

> --
> Alexandre Oliva, freedom fighter  he/him  https://FSFLA.org/blogs/lxo
> Be the change, be Free!              FSF Latin America board member
> GNU Toolchain Engineer                     Free Software Evangelist
> Hay que enGNUrecerse, pero sin perder la terGNUra jamás - Che GNUevara
On 5/15/19 3:03 PM, Alexandre Oliva wrote:
> On May 15, 2019, Richard Biener <richard.guenther@gmail.com> wrote:
>
>> On Wed, May 15, 2019 at 10:20 AM Alexandre Oliva <aoliva@redhat.com> wrote:
>>>
>>> Gimple jump threading does not duplicate forwarder blocks that might
>>> be present before or after the copied block.
>
>> Empty forwarder blocks I presume?
>
> Empty except for debug stmts and possibly a final conditional jump that,
> in the threading path, becomes unconditional.
Right.  The tree-ssa-threadupdate code all pre-dates the SEME copier,
which is a *much* better way to handle duplicating the region.

Initially we allowed only one block with side effects in the jump
threading path.  That's all we really knew how to do correctly.  We
extended that to ignore forwarders at some point since they didn't need
to be copied -- you just need to know where the forwarder block will go.

We later added the ability to copy a second block with side effects in
the jump threading path.

But I'd really like to just remove tree-ssa-threadupdate.c.  It's
horribly convoluted due to old requirements.  I'm confident we could use
the SEME copier to handle all the existing cases in a much simpler
fashion.

Jeff
On Thu, May 16, 2019 at 6:14 PM Jeff Law <law@redhat.com> wrote:
>
> On 5/15/19 3:03 PM, Alexandre Oliva wrote:
> > On May 15, 2019, Richard Biener <richard.guenther@gmail.com> wrote:
> >
> >> On Wed, May 15, 2019 at 10:20 AM Alexandre Oliva <aoliva@redhat.com> wrote:
> >>>
> >>> Gimple jump threading does not duplicate forwarder blocks that might
> >>> be present before or after the copied block.
> >
> >> Empty forwarder blocks I presume?
> >
> > Empty except for debug stmts and possibly a final conditional jump that,
> > in the threading path, becomes unconditional.
> Right.  The tree-ssa-threadupdate code all pre-dates the SEME copier,
> which is a *much* better way to handle duplicating the region.
>
> Initially we allowed only one block with side effects in the jump
> threading path.  That's all we really knew how to do correctly.  We
> extended that to ignore forwarders at some point since they didn't need
> to be copied -- you just need to know where the forwarder block will go.
>
> We later added the ability to copy a second block with side effects in
> the jump threading path.
>
> But I'd really like to just remove tree-ssa-threadupdate.c.  It's
> horribly convoluted due to old requirements.  I'm confident we could use
> the SEME copier to handle all the existing cases in a much simpler
> fashion.

Not sure if that's the best infrastructure to use (it cannot copy a path
crossing a backedge).  tracer does the duplicating incrementally, for
example.  Technically the duplication isn't difficult, but some
simplification on the fly would be nice (like actually merging the
blocks and propagating out PHIs and constants).  Eventually I'll find
cycles to implement something like this in a greedy fashion for loop
unrolling.

Richard.

>
>
> Jeff
On 5/16/19 12:46 PM, Richard Biener wrote:
> On Thu, May 16, 2019 at 6:14 PM Jeff Law <law@redhat.com> wrote:
>>
>> On 5/15/19 3:03 PM, Alexandre Oliva wrote:
>>> On May 15, 2019, Richard Biener <richard.guenther@gmail.com> wrote:
>>>
>>>> On Wed, May 15, 2019 at 10:20 AM Alexandre Oliva <aoliva@redhat.com> wrote:
>>>>>
>>>>> Gimple jump threading does not duplicate forwarder blocks that might
>>>>> be present before or after the copied block.
>>>
>>>> Empty forwarder blocks I presume?
>>>
>>> Empty except for debug stmts and possibly a final conditional jump that,
>>> in the threading path, becomes unconditional.
>> Right.  The tree-ssa-threadupdate code all pre-dates the SEME copier,
>> which is a *much* better way to handle duplicating the region.
>>
>> Initially we allowed only one block with side effects in the jump
>> threading path.  That's all we really knew how to do correctly.  We
>> extended that to ignore forwarders at some point since they didn't need
>> to be copied -- you just need to know where the forwarder block will go.
>>
>> We later added the ability to copy a second block with side effects in
>> the jump threading path.
>>
>> But I'd really like to just remove tree-ssa-threadupdate.c.  It's
>> horribly convoluted due to old requirements.  I'm confident we could use
>> the SEME copier to handle all the existing cases in a much simpler
>> fashion.
>
> Not sure if that's the best infrastructure to use (it cannot copy a path
> crossing a backedge).  tracer does the duplicating incrementally, for
> example.  Technically the duplication isn't difficult, but some
> simplification on the fly would be nice (like actually merging the
> blocks and propagating out PHIs and constants).
We don't need to worry about backedges in this code anymore.  And yes,
some simplification would be helpful.  In fact using the SEME copier
actually helps with that because it requires a bit more structure.

The biggest downside I see with moving to the SEME copier here would be
that when we have multiple incoming edges that thread to the same
outgoing edge, the current copier will create a single duplicate.  We'd
likely end up with multiple duplicates using the SEME copier.

Jeff
On 5/15/19 2:20 AM, Alexandre Oliva wrote:
> Gimple jump threading does not duplicate forwarder blocks that might
> be present before or after the second copied block.  This silently
> drops debug binds and markers that might be present in them.  This
> patch attempts to preserve them.
>
> For blocks after either copied block, we attempt to append debug stmts
> to the copied block, if it does not end with a block-ending stmt.
> Failing that, for blocks between both copied blocks, we prepend its
> debug stmts to the copy of the second block.
>
> If everything fails, we still drop debug stmts on the floor, though
> preexisting code consolidates debug binds in the block that threading
> flows into, so only markers are really lost.  We can't do much better
> than that without conditional binds and markers, or debug stmts in
> edges, or somesuch.
>
> If we append debug stmts to a reusable template block, we copy it
> after splitting out the debug stmts, and before putting them back.
>
> Regstrapped on x86_64-linux-gnu and i686-linux-gnu.  Ok to install?
>
>
> for gcc/ChangeLog
>
>         * tree-ssa-threadupdate.c (struct ssa_local_info_t): Add
>         field template_last_to_copy.
>         (ssa_create_duplicates): Set it, and use it.  Attempt to
>         preserve more debug stmts.
OK.  Presumably creating a reliable testcase was painful?

jeff

> ---
On May 17, 2019, Jeff Law <law@redhat.com> wrote:
> OK. Presumably creating a reliable testcase was painful?
Heh, that it might even be possible didn't cross my mind.  I was happy
enough to be able to exercise and inspect the before&after IR for most
of the new code paths in the patch in a GCC bootstrap, by placing
gcc_assert(gcc_stop_here_*||getenv("GCC_STOP_HERE_*")) at various
places, so that I could stop to inspect, skip them in a debugger or in a
full run to look at the compiler dumps. That alone took much longer
than I had to complete it :-(
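
For reference, a minimal sketch of that debugging aid, with made-up
names (gcc_stop_here_foo and GCC_STOP_HERE_FOO are placeholders, not
identifiers in the tree); it assumes GCC's internal gcc_assert and
<stdlib.h>'s getenv:

  /* The assert fails whenever the instrumented path is reached, unless
     GCC_STOP_HERE_FOO is set in the environment (letting a full run
     complete so the dumps can be inspected) or the volatile flag is
     flipped from a debugger to step past a particular hit.  */
  static volatile int gcc_stop_here_foo;

    /* ... at the code path of interest ...  */
    gcc_assert (gcc_stop_here_foo || getenv ("GCC_STOP_HERE_FOO"));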
diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index bccef879db03..d63154317746 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -235,6 +235,12 @@ struct ssa_local_info_t
      and sharing a template for that block is considerably more difficult.  */
   basic_block template_block;
 
+  /* If we append debug stmts to the template block after creating it,
+     this iterator won't be the last one in the block, and further
+     copies of the template block shouldn't get debug stmts after
+     it.  */
+  gimple_stmt_iterator template_last_to_copy;
+
   /* Blocks duplicated for the thread.  */
   bitmap duplicate_blocks;
 
@@ -1124,6 +1130,8 @@ ssa_create_duplicates (struct redirection_data **slot,
       create_block_for_threading ((*path)[1]->e->src, rd, 0,
                                   &local_info->duplicate_blocks);
       local_info->template_block = rd->dup_blocks[0];
+      local_info->template_last_to_copy
+        = gsi_last_bb (local_info->template_block);
 
       /* We do not create any outgoing edges for the template.  We will
          take care of that in a later traversal.  That way we do not
@@ -1131,14 +1139,74 @@ ssa_create_duplicates (struct redirection_data **slot,
     }
   else
     {
+      gimple_seq seq = NULL;
+      if (gsi_stmt (local_info->template_last_to_copy)
+          != gsi_stmt (gsi_last_bb (local_info->template_block)))
+        seq = gsi_split_seq_after (local_info->template_last_to_copy);
       create_block_for_threading (local_info->template_block, rd, 0,
                                   &local_info->duplicate_blocks);
+      if (seq)
+        gsi_insert_seq_after (&local_info->template_last_to_copy,
+                              seq, GSI_SAME_STMT);
 
       /* Go ahead and wire up outgoing edges and update PHIs for the duplicate
          block.  */
       ssa_fix_duplicate_block_edges (rd, local_info);
     }
 
+  if (MAY_HAVE_DEBUG_STMTS)
+    {
+      /* Copy debug stmts from each NO_COPY src block to the block
+         that would have been its predecessor, if we can append to it
+         (we can't add stmts after a block-ending stmt), or prepending
+         to the duplicate of the successor, if there is one.  If
+         there's no duplicate successor, we'll mostly drop the blocks
+         on the floor; propagate_threaded_block_debug_into, called
+         elsewhere, will consolidate and preserve the effects of the
+         binds, but none of the markers.  */
+      gimple_stmt_iterator copy_to = gsi_last_bb (rd->dup_blocks[0]);
+      if (!gsi_end_p (copy_to))
+        {
+          if (stmt_ends_bb_p (gsi_stmt (copy_to)))
+            {
+              if (rd->dup_blocks[1])
+                copy_to = gsi_after_labels (rd->dup_blocks[1]);
+              else
+                copy_to = gsi_none ();
+            }
+          else
+            gsi_next (&copy_to);
+        }
+      for (unsigned int i = 2, j = 0; i < path->length (); i++)
+        if ((*path)[i]->type == EDGE_NO_COPY_SRC_BLOCK
+            && gsi_bb (copy_to))
+          {
+            for (gimple_stmt_iterator gsi = gsi_start_bb ((*path)[i]->e->src);
+                 !gsi_end_p (gsi); gsi_next (&gsi))
+              {
+                if (!is_gimple_debug (gsi_stmt (gsi)))
+                  continue;
+                gimple *stmt = gsi_stmt (gsi);
+                gimple *copy = gimple_copy (stmt);
+                gsi_insert_before (&copy_to, copy, GSI_SAME_STMT);
+              }
+          }
+        else if ((*path)[i]->type == EDGE_COPY_SRC_BLOCK
+                 || (*path)[i]->type == EDGE_COPY_SRC_JOINER_BLOCK)
+          {
+            j++;
+            gcc_assert (j < 2);
+            copy_to = gsi_last_bb (rd->dup_blocks[j]);
+            if (!gsi_end_p (copy_to))
+              {
+                if (stmt_ends_bb_p (gsi_stmt (copy_to)))
+                  copy_to = gsi_none ();
+                else
+                  gsi_next (&copy_to);
+              }
+          }
+    }
+
   /* Keep walking the hash table.  */
   return 1;
 }