Message ID: ZJAJMPiUB4oxqZBR@kam.mff.cuni.cz
State: New
Series: Do not account __builtin_unreachable guards in inliner
On Mon, Jun 19, 2023 at 9:52 AM Jan Hubicka via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hi,
> this was suggested earlier somewhere, but I cannot find the thread.
> C++ has the assume attribute that expands into
>   if (conditional)
>     __builtin_unreachable ()
> We do not want to account the conditional in inline heuristics since
> we know that it is going to be optimized out.
>
> Bootstrapped/regtested x86_64-linux, will commit it later today if
> there are no complaints.

I think we also had the request to not account the condition-feeding
stmts (if they only feed it and have no side effects).  libstdc++ has
complex range comparisons here.  Also ...

> gcc/ChangeLog:
>
>         * ipa-fnsummary.cc (builtin_unreachable_bb_p): New function.
>         (analyze_function_body): Do not account conditionals guarding
>         builtin_unreachable calls.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.dg/ipa/fnsummary-1.c: New test.
>
> diff --git a/gcc/ipa-fnsummary.cc b/gcc/ipa-fnsummary.cc
> index a5f5a50c8a5..987da29ec34 100644
> --- a/gcc/ipa-fnsummary.cc
> +++ b/gcc/ipa-fnsummary.cc
> @@ -2649,6 +2649,54 @@ points_to_possible_sra_candidate_p (tree t)
>    return false;
>  }
>
> +/* Return true if BB is builtin_unreachable.
> +   We skip empty basic blocks, debug statements, clobbers and predicts.
> +   CACHE is used to memoize already analyzed blocks.  */
> +
> +static bool
> +builtin_unreachable_bb_p (basic_block bb, vec<unsigned char> &cache)
> +{
> +  if (cache[bb->index])
> +    return cache[bb->index] - 1;
> +  gimple_stmt_iterator si;
> +  auto_vec <basic_block, 4> visited_bbs;
> +  bool ret = false;
> +  while (true)
> +    {
> +      bool empty_bb = true;
> +      visited_bbs.safe_push (bb);
> +      cache[bb->index] = 3;
> +      for (si = gsi_start_nondebug_bb (bb);
> +           !gsi_end_p (si) && empty_bb;
> +           gsi_next_nondebug (&si))
> +        {
> +          if (gimple_code (gsi_stmt (si)) != GIMPLE_PREDICT
> +              && !gimple_clobber_p (gsi_stmt (si))
> +              && !gimple_nop_p (gsi_stmt (si)))
> +            {
> +              empty_bb = false;
> +              break;
> +            }
> +        }
> +      if (!empty_bb)
> +        break;
> +      else
> +        bb = single_succ_edge (bb)->dest;
> +      if (cache[bb->index])
> +        {
> +          ret = cache[bb->index] == 3 ? false : cache[bb->index] - 1;
> +          goto done;
> +        }
> +    }
> +  if (gimple_call_builtin_p (gsi_stmt (si), BUILT_IN_UNREACHABLE)
> +      || gimple_call_builtin_p (gsi_stmt (si), BUILT_IN_UNREACHABLE_TRAP))

... we do code generate BUILT_IN_UNREACHABLE_TRAP, no?

> +    ret = true;
> +done:
> +  for (basic_block vbb:visited_bbs)
> +    cache[vbb->index] = (unsigned char)ret + 1;
> +  return ret;
> +}
> +
>  /* Analyze function body for NODE.
>     EARLY indicates run from early optimization pipeline.  */
>
> @@ -2743,6 +2791,8 @@ analyze_function_body (struct cgraph_node *node, bool early)
>    const profile_count entry_count = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count;
>    order = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
>    nblocks = pre_and_rev_post_order_compute (NULL, order, false);
> +  auto_vec<unsigned char, 10> cache;
> +  cache.safe_grow_cleared (last_basic_block_for_fn (cfun));

A sbitmap with two bits per entry would be more space efficient here.  bitmap
has bitmap_set_aligned_chunk and bitmap_get_aligned_chunk for convenience,
adding the corresponding to sbitmap.h would likely ease use there as well.

>    for (n = 0; n < nblocks; n++)
>      {
>        bb = BASIC_BLOCK_FOR_FN (cfun, order[n]);
> @@ -2901,6 +2951,24 @@ analyze_function_body (struct cgraph_node *node, bool early)
>              }
>          }
>
> +      /* Conditionals guarding __builtin_unreachable will be
> +         optimized out.  */
> +      if (gimple_code (stmt) == GIMPLE_COND)
> +        {
> +          edge_iterator ei;
> +          edge e;
> +          FOR_EACH_EDGE (e, ei, bb->succs)
> +            if (builtin_unreachable_bb_p (e->dest, cache))
> +              {
> +                if (dump_file)
> +                  fprintf (dump_file,
> +                           "\t\tConditional guarding __builtin_unreachable; ignored\n");
> +                this_time = 0;
> +                this_size = 0;
> +                break;
> +              }
> +        }
> +
>        /* TODO: When conditional jump or switch is known to be constant, but
>           we did not translate it into the predicates, we really can account
>           just maximum of the possible paths.  */
> diff --git a/gcc/testsuite/gcc.dg/ipa/fnsummary-1.c b/gcc/testsuite/gcc.dg/ipa/fnsummary-1.c
> new file mode 100644
> index 00000000000..a0ece0c300b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/ipa/fnsummary-1.c
> @@ -0,0 +1,8 @@
> +/* { dg-options "-O2 -fdump-ipa-fnsummary" } */
> +int
> +test(int a)
> +{
> +  if (a > 10)
> +    __builtin_unreachable ();
> +}
> +/* { dg-final { scan-ipa-dump "Conditional guarding __builtin_unreachable" "fnsummary" } } */
> On Mon, Jun 19, 2023 at 9:52 AM Jan Hubicka via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > Hi,
> > this was suggested earlier somewhere, but I cannot find the thread.
> > C++ has the assume attribute that expands into
> >   if (conditional)
> >     __builtin_unreachable ()
> > We do not want to account the conditional in inline heuristics since
> > we know that it is going to be optimized out.
> >
> > Bootstrapped/regtested x86_64-linux, will commit it later today if
> > there are no complaints.
>
> I think we also had the request to not account the condition-feeding
> stmts (if they only feed it and have no side effects).  libstdc++ has
> complex range comparisons here.  Also ...

I was thinking of this: it depends on how smart we want to get.
We also have dead conditionals guarding clobbers, predicts and other
stuff.  In general we can use the mark phase of cd-dce, telling it to
ignore those statements, and then use its result in the analysis.

Also the question is how much we can rely on the middle-end optimizing
out unreachables.  For example:

int a;
int b[3];
test()
{
  if (a > 0)
    {
      for (int i = 0; i < 3; i++)
        b[i] = 0;
      __builtin_unreachable ();
    }
}

IMO this can be optimized to an empty function.  I believe we used to
have code in tree-cfgcleanup to remove statements just before
__builtin_unreachable which cannot terminate execution, but perhaps it
existed only in my local tree?

We could also perhaps declare unreachable NOVOPs which would make DSE
remove the stores.

> ... we do code generate BUILT_IN_UNREACHABLE_TRAP, no?

You are right.  I tested it with -funreachable-traps but it does not do
what I expected; I need -fsanitize=unreachable -fsanitize-trap=unreachable.

Also if I try to call it by hand I get:

jan@localhost:/tmp> gcc t.c -S -O2 -funreachable-traps -fdump-tree-all-details -fsanitize=unreachable -fsanitize-trap=unreachable
t.c: In function ‘test’:
t.c:9:13: warning: implicit declaration of function ‘__builtin_unreachable_trap’; did you mean ‘__builtin_unreachable trap’? [-Wimplicit-function-declaration]
    9 |             __builtin_unreachable_trap ();
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
      |             __builtin_unreachable trap

Which is not as helpful as it is trying to be.

> > +    ret = true;
> > +done:
> > +  for (basic_block vbb:visited_bbs)
> > +    cache[vbb->index] = (unsigned char)ret + 1;
> > +  return ret;
> > +}
> > +
> >  /* Analyze function body for NODE.
> >     EARLY indicates run from early optimization pipeline.  */
> >
> > @@ -2743,6 +2791,8 @@ analyze_function_body (struct cgraph_node *node, bool early)
> >    const profile_count entry_count = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count;
> >    order = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
> >    nblocks = pre_and_rev_post_order_compute (NULL, order, false);
> > +  auto_vec<unsigned char, 10> cache;
> > +  cache.safe_grow_cleared (last_basic_block_for_fn (cfun));
>
> A sbitmap with two bits per entry would be more space efficient here.  bitmap
> has bitmap_set_aligned_chunk and bitmap_get_aligned_chunk for convenience,
> adding the corresponding to sbitmap.h would likely ease use there as well.

I did not know about the chunk API which is certainly nice :)
sbitmap will always allocate, while here we stay on the stack for small
functions, and I am not sure the extra bit operations would not offset
the extra memset, but overall I think it is all in noise.

Honza
On Mon, Jun 19, 2023 at 12:15 PM Jan Hubicka <hubicka@ucw.cz> wrote:
>
> > On Mon, Jun 19, 2023 at 9:52 AM Jan Hubicka via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > Hi,
> > > this was suggested earlier somewhere, but I cannot find the thread.
> > > C++ has the assume attribute that expands into
> > >   if (conditional)
> > >     __builtin_unreachable ()
> > > We do not want to account the conditional in inline heuristics since
> > > we know that it is going to be optimized out.
> > >
> > > Bootstrapped/regtested x86_64-linux, will commit it later today if
> > > there are no complaints.
> >
> > I think we also had the request to not account the condition-feeding
> > stmts (if they only feed it and have no side effects).  libstdc++ has
> > complex range comparisons here.  Also ...
>
> I was thinking of this: it depends on how smart we want to get.
> We also have dead conditionals guarding clobbers, predicts and other
> stuff.  In general we can use the mark phase of cd-dce, telling it to
> ignore those statements, and then use its result in the analysis.

Hmm, possible but a bit heavy-handed.  There's simple_dce_from_worklist
which might be a way to do this (of course we cannot use that 1:1).  Also
then consider

  a = a + 1;
  if (a > 10)
    __builtin_unreachable ();
  if (a < 5)
    __builtin_unreachable ();

and a has more than one use but both are going away.  So indeed a
more global analysis would be needed to get the full benefit.

> Also the question is how much we can rely on the middle-end optimizing
> out unreachables.  For example:
>
> int a;
> int b[3];
> test()
> {
>   if (a > 0)
>     {
>       for (int i = 0; i < 3; i++)
>         b[i] = 0;
>       __builtin_unreachable ();
>     }
> }
>
> IMO this can be optimized to an empty function.  I believe we used to
> have code in tree-cfgcleanup to remove statements just before
> __builtin_unreachable which cannot terminate execution, but perhaps it
> existed only in my local tree?

I think we rely on DCE/DSE here and explicit unreachable () pruning after
VRP picked up things (I think it simply gets the secondary effect optimizing
the condition it created the range for in the first pass).

DSE is apparently not able to kill the stores, I will fix that.  I think
DCE can, but only for non-aliased stores.

> We could also perhaps declare unreachable NOVOPs which would make DSE
> remove the stores.

But only because of a bug in DSE ... it also removes them if that
__builtin_unreachable () is GIMPLE_RESX.

> > ... we do code generate BUILT_IN_UNREACHABLE_TRAP, no?
>
> You are right.  I tested it with -funreachable-traps but it does not do
> what I expected; I need -fsanitize=unreachable -fsanitize-trap=unreachable.
>
> Also if I try to call it by hand I get:
>
> jan@localhost:/tmp> gcc t.c -S -O2 -funreachable-traps -fdump-tree-all-details -fsanitize=unreachable -fsanitize-trap=unreachable
> t.c: In function ‘test’:
> t.c:9:13: warning: implicit declaration of function ‘__builtin_unreachable_trap’; did you mean ‘__builtin_unreachable trap’? [-Wimplicit-function-declaration]
>     9 |             __builtin_unreachable_trap ();
>       |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
>       |             __builtin_unreachable trap
>
> Which is not as helpful as it is trying to be.
>
> > > +    ret = true;
> > > +done:
> > > +  for (basic_block vbb:visited_bbs)
> > > +    cache[vbb->index] = (unsigned char)ret + 1;
> > > +  return ret;
> > > +}
> > > +
> > >  /* Analyze function body for NODE.
> > >     EARLY indicates run from early optimization pipeline.  */
> > >
> > > @@ -2743,6 +2791,8 @@ analyze_function_body (struct cgraph_node *node, bool early)
> > >    const profile_count entry_count = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count;
> > >    order = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
> > >    nblocks = pre_and_rev_post_order_compute (NULL, order, false);
> > > +  auto_vec<unsigned char, 10> cache;
> > > +  cache.safe_grow_cleared (last_basic_block_for_fn (cfun));
> >
> > A sbitmap with two bits per entry would be more space efficient here.  bitmap
> > has bitmap_set_aligned_chunk and bitmap_get_aligned_chunk for convenience,
> > adding the corresponding to sbitmap.h would likely ease use there as well.
>
> I did not know about the chunk API which is certainly nice :)
> sbitmap will always allocate, while here we stay on the stack for small
> functions, and I am not sure the extra bit operations would not offset
> the extra memset, but overall I think it is all in noise.

Ah, yeah.

> Honza
On Mon, Jun 19, 2023 at 1:30 PM Richard Biener <richard.guenther@gmail.com> wrote:
>
> On Mon, Jun 19, 2023 at 12:15 PM Jan Hubicka <hubicka@ucw.cz> wrote:
> >
> > > On Mon, Jun 19, 2023 at 9:52 AM Jan Hubicka via Gcc-patches
> > > <gcc-patches@gcc.gnu.org> wrote:
> > > >
> > > > Hi,
> > > > this was suggested earlier somewhere, but I cannot find the thread.
> > > > C++ has the assume attribute that expands into
> > > >   if (conditional)
> > > >     __builtin_unreachable ()
> > > > We do not want to account the conditional in inline heuristics since
> > > > we know that it is going to be optimized out.
> > > >
> > > > Bootstrapped/regtested x86_64-linux, will commit it later today if
> > > > there are no complaints.
> > >
> > > I think we also had the request to not account the condition-feeding
> > > stmts (if they only feed it and have no side effects).  libstdc++ has
> > > complex range comparisons here.  Also ...
> >
> > I was thinking of this: it depends on how smart we want to get.
> > We also have dead conditionals guarding clobbers, predicts and other
> > stuff.  In general we can use the mark phase of cd-dce, telling it to
> > ignore those statements, and then use its result in the analysis.
>
> Hmm, possible but a bit heavy-handed.  There's simple_dce_from_worklist
> which might be a way to do this (of course we cannot use that 1:1).  Also
> then consider
>
>   a = a + 1;
>   if (a > 10)
>     __builtin_unreachable ();
>   if (a < 5)
>     __builtin_unreachable ();
>
> and a has more than one use but both are going away.  So indeed a
> more global analysis would be needed to get the full benefit.
>
> > Also the question is how much we can rely on the middle-end optimizing
> > out unreachables.  For example:
> >
> > int a;
> > int b[3];
> > test()
> > {
> >   if (a > 0)
> >     {
> >       for (int i = 0; i < 3; i++)
> >         b[i] = 0;
> >       __builtin_unreachable ();
> >     }
> > }
> >
> > IMO this can be optimized to an empty function.  I believe we used to
> > have code in tree-cfgcleanup to remove statements just before
> > __builtin_unreachable which cannot terminate execution, but perhaps it
> > existed only in my local tree?
>
> I think we rely on DCE/DSE here and explicit unreachable () pruning after
> VRP picked up things (I think it simply gets the secondary effect optimizing
> the condition it created the range for in the first pass).
>
> DSE is apparently not able to kill the stores, I will fix that.  I
> think DCE can, but only for non-aliased stores.
>
> > We could also perhaps declare unreachable NOVOPs which would make DSE
> > remove the stores.
>
> But only because of a bug in DSE ... it also removes them if that
> __builtin_unreachable () is GIMPLE_RESX.

Oh, and __builtin_unreachable is already 'const' and thus without any VOPs.

The issue in DSE is that DSE will not run into __builtin_unreachable
because it has no VOPs.  Instead DSE relies on eventually seeing a VUSE
for all paths leaving a function; it doesn't have a way to consider
__builtin_unreachable killing all memory (it would need a VDEF for that).
It might be possible to record which virtual operands are live at BBs
without successors (but the VUSE on returns was an attempt to avoid the
need for that).

So there's no easy way to fix DSE here.

> > > > ... we do code generate BUILT_IN_UNREACHABLE_TRAP, no?
> >
> > You are right.  I tested it with -funreachable-traps but it does not do
> > what I expected; I need -fsanitize=unreachable -fsanitize-trap=unreachable.
> >
> > Also if I try to call it by hand I get:
> >
> > jan@localhost:/tmp> gcc t.c -S -O2 -funreachable-traps -fdump-tree-all-details -fsanitize=unreachable -fsanitize-trap=unreachable
> > t.c: In function ‘test’:
> > t.c:9:13: warning: implicit declaration of function ‘__builtin_unreachable_trap’; did you mean ‘__builtin_unreachable trap’? [-Wimplicit-function-declaration]
> >     9 |             __builtin_unreachable_trap ();
> >       |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
> >       |             __builtin_unreachable trap
> >
> > Which is not as helpful as it is trying to be.
> >
> > > > +    ret = true;
> > > > +done:
> > > > +  for (basic_block vbb:visited_bbs)
> > > > +    cache[vbb->index] = (unsigned char)ret + 1;
> > > > +  return ret;
> > > > +}
> > > > +
> > > >  /* Analyze function body for NODE.
> > > >     EARLY indicates run from early optimization pipeline.  */
> > > >
> > > > @@ -2743,6 +2791,8 @@ analyze_function_body (struct cgraph_node *node, bool early)
> > > >    const profile_count entry_count = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count;
> > > >    order = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
> > > >    nblocks = pre_and_rev_post_order_compute (NULL, order, false);
> > > > +  auto_vec<unsigned char, 10> cache;
> > > > +  cache.safe_grow_cleared (last_basic_block_for_fn (cfun));
> > >
> > > A sbitmap with two bits per entry would be more space efficient here.  bitmap
> > > has bitmap_set_aligned_chunk and bitmap_get_aligned_chunk for convenience,
> > > adding the corresponding to sbitmap.h would likely ease use there as well.
> >
> > I did not know about the chunk API which is certainly nice :)
> > sbitmap will always allocate, while here we stay on the stack for small
> > functions, and I am not sure the extra bit operations would not offset
> > the extra memset, but overall I think it is all in noise.
>
> Ah, yeah.
>
> > Honza
> On Mon, Jun 19, 2023 at 12:15 PM Jan Hubicka <hubicka@ucw.cz> wrote:
> >
> > > On Mon, Jun 19, 2023 at 9:52 AM Jan Hubicka via Gcc-patches
> > > <gcc-patches@gcc.gnu.org> wrote:
> > > >
> > > > Hi,
> > > > this was suggested earlier somewhere, but I cannot find the thread.
> > > > C++ has the assume attribute that expands into
> > > >   if (conditional)
> > > >     __builtin_unreachable ()
> > > > We do not want to account the conditional in inline heuristics since
> > > > we know that it is going to be optimized out.
> > > >
> > > > Bootstrapped/regtested x86_64-linux, will commit it later today if
> > > > there are no complaints.
> > >
> > > I think we also had the request to not account the condition-feeding
> > > stmts (if they only feed it and have no side effects).  libstdc++ has
> > > complex range comparisons here.  Also ...
> >
> > I was thinking of this: it depends on how smart we want to get.
> > We also have dead conditionals guarding clobbers, predicts and other
> > stuff.  In general we can use the mark phase of cd-dce, telling it to
> > ignore those statements, and then use its result in the analysis.
>
> Hmm, possible but a bit heavy-handed.  There's simple_dce_from_worklist
> which might be a way to do this (of course we cannot use that 1:1).  Also
> then consider
>
>   a = a + 1;
>   if (a > 10)
>     __builtin_unreachable ();
>   if (a < 5)
>     __builtin_unreachable ();
>
> and a has more than one use but both are going away.  So indeed a
> more global analysis would be needed to get the full benefit.

I was looking into simple_dce_from_worklist and if I understand it
right, it simply walks the list of SSA names which probably lost some
uses by the consuming pass.  If they have zero non-debug uses and the
defining statement has no side effects, then they are removed.

I think this is not really fitting the bill here, since the example
above is likely to be common, and also if we want one assignment feeding
a conditional optimized out, we probably want to handle the case with
multiple assignments.

What about

 1) walk the function body and see if there are conditionals we know
    will be optimized out (at the beginning those can only be those
    which have one arm reaching __builtin_unreachable),
 2) if there are none, just proceed with fnsummary construction,
 3) if there were some, do a non-cd-dce mark stage which will skip those
    dead conditionals identified in 1 and proceed to fnsummary
    construction with an additional bitmap of marked stmts.

This should be cheaper than unconditionally doing cd-dce and should
handle common cases?

Honza
On Fri, Jun 23, 2023 at 12:11 PM Jan Hubicka <hubicka@ucw.cz> wrote:
>
> > On Mon, Jun 19, 2023 at 12:15 PM Jan Hubicka <hubicka@ucw.cz> wrote:
> > >
> > > > On Mon, Jun 19, 2023 at 9:52 AM Jan Hubicka via Gcc-patches
> > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > >
> > > > > Hi,
> > > > > this was suggested earlier somewhere, but I cannot find the thread.
> > > > > C++ has the assume attribute that expands into
> > > > >   if (conditional)
> > > > >     __builtin_unreachable ()
> > > > > We do not want to account the conditional in inline heuristics since
> > > > > we know that it is going to be optimized out.
> > > > >
> > > > > Bootstrapped/regtested x86_64-linux, will commit it later today if
> > > > > there are no complaints.
> > > >
> > > > I think we also had the request to not account the condition-feeding
> > > > stmts (if they only feed it and have no side effects).  libstdc++ has
> > > > complex range comparisons here.  Also ...
> > >
> > > I was thinking of this: it depends on how smart we want to get.
> > > We also have dead conditionals guarding clobbers, predicts and other
> > > stuff.  In general we can use the mark phase of cd-dce, telling it to
> > > ignore those statements, and then use its result in the analysis.
> >
> > Hmm, possible but a bit heavy-handed.  There's simple_dce_from_worklist
> > which might be a way to do this (of course we cannot use that 1:1).  Also
> > then consider
> >
> >   a = a + 1;
> >   if (a > 10)
> >     __builtin_unreachable ();
> >   if (a < 5)
> >     __builtin_unreachable ();
> >
> > and a has more than one use but both are going away.  So indeed a
> > more global analysis would be needed to get the full benefit.
>
> I was looking into simple_dce_from_worklist and if I understand it
> right, it simply walks the list of SSA names which probably lost some
> uses by the consuming pass.  If they have zero non-debug uses and the
> defining statement has no side effects, then they are removed.
>
> I think this is not really fitting the bill here, since the example
> above is likely to be common, and also if we want one assignment feeding
> a conditional optimized out, we probably want to handle the case with
> multiple assignments.
>
> What about
>
>  1) walk the function body and see if there are conditionals we know
>     will be optimized out (at the beginning those can only be those
>     which have one arm reaching __builtin_unreachable),
>  2) if there are none, just proceed with fnsummary construction,
>  3) if there were some, do a non-cd-dce mark stage which will skip those
>     dead conditionals identified in 1 and proceed to fnsummary
>     construction with an additional bitmap of marked stmts.

So you need to feed it with extra info on the optimized out stmts because
as-is it will not remove __builtin_unreachable ().  That means you're
doing the find_obviously_necessary_stmts manually, skipping the
conditional and all stmts it controls to the __builtin_unreachable () path?

I also think you want something cheaper than the non-cd-dce mark, and you
also don't want to bother with stores/loads?

Also, when you only do this conditionally, how do you plan to use the
result?

Richard.

> This should be cheaper than unconditionally doing cd-dce and should
> handle common cases?
> Honza
>
> So you need to feed it with extra info on the optimized out stmts because
> as-is it will not remove __builtin_unreachable ().  That means you're

My plan was to add an entry point to tree-ssa-dce that will take a set
of stmts declared dead by external force and will do the usual mark
stage, bypassing mark_stmt_if_necessary if the stmt is in the set of
dead ones.

> doing the find_obviously_necessary_stmts manually, skipping the
> conditional and all stmts it controls to the __builtin_unreachable () path?
>
> I also think you want something cheaper than the non-cd-dce mark, and you
> also don't want to bother with stores/loads?

You are probably right.  cd-dce marking became a bit of a monster and I
do not want to care about memory.  One can add an extra flag to avoid
processing of memory, but the code I would re-use is quite small.  I can
do my own mark&sweep just considering PHIs, pre-identified conditionals
and basic gimple_assigns with no side effects as possibly unnecessary
stmts.  I can completely ignore debug stmts.  So it should be one pass
through the statements to populate the worklist & a simple walk of the
SSA graph to propagate it.

> Also, when you only do this conditionally, how do you plan to use the
> result?

Well, the analysis is a loop that walks all basic blocks and then all
stmts.  I can keep track of whether the computation of live stmts was
done, and in that case query the flag, assuming it is true otherwise.

Honza
diff --git a/gcc/ipa-fnsummary.cc b/gcc/ipa-fnsummary.cc
index a5f5a50c8a5..987da29ec34 100644
--- a/gcc/ipa-fnsummary.cc
+++ b/gcc/ipa-fnsummary.cc
@@ -2649,6 +2649,54 @@ points_to_possible_sra_candidate_p (tree t)
   return false;
 }
 
+/* Return true if BB is builtin_unreachable.
+   We skip empty basic blocks, debug statements, clobbers and predicts.
+   CACHE is used to memoize already analyzed blocks.  */
+
+static bool
+builtin_unreachable_bb_p (basic_block bb, vec<unsigned char> &cache)
+{
+  if (cache[bb->index])
+    return cache[bb->index] - 1;
+  gimple_stmt_iterator si;
+  auto_vec <basic_block, 4> visited_bbs;
+  bool ret = false;
+  while (true)
+    {
+      bool empty_bb = true;
+      visited_bbs.safe_push (bb);
+      cache[bb->index] = 3;
+      for (si = gsi_start_nondebug_bb (bb);
+           !gsi_end_p (si) && empty_bb;
+           gsi_next_nondebug (&si))
+        {
+          if (gimple_code (gsi_stmt (si)) != GIMPLE_PREDICT
+              && !gimple_clobber_p (gsi_stmt (si))
+              && !gimple_nop_p (gsi_stmt (si)))
+            {
+              empty_bb = false;
+              break;
+            }
+        }
+      if (!empty_bb)
+        break;
+      else
+        bb = single_succ_edge (bb)->dest;
+      if (cache[bb->index])
+        {
+          ret = cache[bb->index] == 3 ? false : cache[bb->index] - 1;
+          goto done;
+        }
+    }
+  if (gimple_call_builtin_p (gsi_stmt (si), BUILT_IN_UNREACHABLE)
+      || gimple_call_builtin_p (gsi_stmt (si), BUILT_IN_UNREACHABLE_TRAP))
+    ret = true;
+done:
+  for (basic_block vbb:visited_bbs)
+    cache[vbb->index] = (unsigned char)ret + 1;
+  return ret;
+}
+
 /* Analyze function body for NODE.
    EARLY indicates run from early optimization pipeline.  */
 
@@ -2743,6 +2791,8 @@ analyze_function_body (struct cgraph_node *node, bool early)
   const profile_count entry_count = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count;
   order = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
   nblocks = pre_and_rev_post_order_compute (NULL, order, false);
+  auto_vec<unsigned char, 10> cache;
+  cache.safe_grow_cleared (last_basic_block_for_fn (cfun));
   for (n = 0; n < nblocks; n++)
     {
       bb = BASIC_BLOCK_FOR_FN (cfun, order[n]);
@@ -2901,6 +2951,24 @@ analyze_function_body (struct cgraph_node *node, bool early)
             }
         }
 
+      /* Conditionals guarding __builtin_unreachable will be
+         optimized out.  */
+      if (gimple_code (stmt) == GIMPLE_COND)
+        {
+          edge_iterator ei;
+          edge e;
+          FOR_EACH_EDGE (e, ei, bb->succs)
+            if (builtin_unreachable_bb_p (e->dest, cache))
+              {
+                if (dump_file)
+                  fprintf (dump_file,
+                           "\t\tConditional guarding __builtin_unreachable; ignored\n");
+                this_time = 0;
+                this_size = 0;
+                break;
+              }
+        }
+
       /* TODO: When conditional jump or switch is known to be constant, but
          we did not translate it into the predicates, we really can account
          just maximum of the possible paths.  */
diff --git a/gcc/testsuite/gcc.dg/ipa/fnsummary-1.c b/gcc/testsuite/gcc.dg/ipa/fnsummary-1.c
new file mode 100644
index 00000000000..a0ece0c300b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/fnsummary-1.c
@@ -0,0 +1,8 @@
+/* { dg-options "-O2 -fdump-ipa-fnsummary" } */
+int
+test(int a)
+{
+  if (a > 10)
+    __builtin_unreachable ();
+}
+/* { dg-final { scan-ipa-dump "Conditional guarding __builtin_unreachable" "fnsummary" } } */