Message ID | 0e195e6c-5e41-34c8-dd6c-b6722845a3a3@redhat.com |
---|---|
State | New |
Headers | show |
Series | [committed,PR,rtl-optimization/87761] Limited iteration in regcprop to pick up secondary opportunities | expand |
Hi Jeff, On Sun, Mar 24, 2019 at 09:20:07AM -0600, Jeff Law wrote: > + PR rtl-optimization/87761 > + * regcprop.c (copyprop_hardreg_forward_1): Check may_trap_p on SET, > + not INSN. Also check RTX_FRAME_RELATED_P. Queue insns for DF rescan > + as needed. Why the RTX_FRAME_RELATED_P addition? You didn't explain it I think. Is it just a bugfix, or something this patch exposed, something that couldn't happen before (or was harmless) and now isn't anymore? Or should this part be backported :-) Segher
On 3/24/19 11:11 AM, Segher Boessenkool wrote: > Hi Jeff, > > On Sun, Mar 24, 2019 at 09:20:07AM -0600, Jeff Law wrote: >> + PR rtl-optimization/87761 >> + * regcprop.c (copyprop_hardreg_forward_1): Check may_trap_p on SET, >> + not INSN. Also check RTX_FRAME_RELATED_P. Queue insns for DF rescan >> + as needed. > > Why the RTX_FRAME_RELATED_P addition? You didn't explain it I think. Is it > just a bugfix, or something this patch exposed, something that couldn't > happen before (or was harmless) and now isn't anymore? It's just a bugfix -- one of the embedded targets needed it, I can't offhand remember which -- we had a frame related insn with a REG_UNUSED note which the code deleted causing a fault in the dwarf2cfi bits. Thank goodness for Jenkins scripts which will build and test all those silly *-elf targets :-) > > Or should this part be backported :-) Shouldn't be needed as the code to remove insns with REG_UNUSED notes is new for gcc-9. jeff
On Sun, Mar 24, 2019 at 09:20:07AM -0600, Jeff Law wrote: > However, I'm increasingly of the opinion that MIPS targets need to drop > off the priority platform list. Given the trajectory I see for MIPS > based processors in industry, it's really hard to justify spending this > much time on them, particularly for low priority code quality issues. Besides what has been discussed on IRC for the PR89826 fix, that we really need a df_analyze before processing the first block, because otherwise we can't rely on the REG_UNUSED notes in the IL, I see some other issues, but I admit I don't know much about df nor regcprop. 1) the df_analyze () after every (successful) processing of a basic block is IMHO way too expensive, I would be very surprised if df_analyze () isn't quadratic in number of basic blocks and so one could construct testcases with millions of basic blocks and at least one regcprop change in each bb and get at cubic complexity (correct me if I'm wrong, and I'm aware of the 95% bbs you said won't have any changes at all) 2) another thing I've noticed today, we queue up the debug insn changes, but if we iterate the second time for any bb, we overwrite and thus lose any of the debug insn queued changes from the first iteration, so those changes aren't applied (wrong-debug?) 3) as mentioned on IRC, the order of tests in copyprop_hardreg_forward_1 conditional doesn't start from the cheapest to most expensive one So, do we want something like the following untested patch (only tested that the PR89826 testcase is fixed on ARM, but df_analyze right after df_set_flags is all that is needed to cure that. I've outlined two hunks of code into separate functions, so that I can call them from two different spots. And, I don't see why in the debug processing loop we should test the visited bitmap, isn't it guaranteed that it is set on all the bbs after going through the first FOR_EACH_BB_FN loop? On the other side, I thought it might be beneficial not to do the debug FOR_EACH_BB_FN loop(s) if there aren't any debug insn changes queued (not sure how often would that happen, if not often enough, that part can be dropped). --- gcc/regcprop.c.jj 2019-03-25 11:02:55.826998948 +0100 +++ gcc/regcprop.c 2019-03-27 15:15:23.505651565 +0100 @@ -800,9 +800,9 @@ copyprop_hardreg_forward_1 (basic_block /* Detect obviously dead sets (via REG_UNUSED notes) and remove them. */ if (set - && !may_trap_p (set) - && !RTX_FRAME_RELATED_P (insn) && INSN_P (insn) + && !RTX_FRAME_RELATED_P (insn) + && !may_trap_p (set) && find_reg_note (insn, REG_UNUSED, SET_DEST (set)) && !side_effects_p (SET_SRC (set)) && !side_effects_p (SET_DEST (set))) @@ -1282,6 +1282,66 @@ public: }; // class pass_cprop_hardreg +static bool +cprop_hardreg_bb (basic_block bb, struct value_data *all_vd, sbitmap visited) +{ + bitmap_set_bit (visited, bb->index); + + /* If a block has a single predecessor, that we've already + processed, begin with the value data that was live at + the end of the predecessor block. */ + /* ??? Ought to use more intelligent queuing of blocks. */ + if (single_pred_p (bb) + && bitmap_bit_p (visited, single_pred (bb)->index) + && ! (single_pred_edge (bb)->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))) + { + all_vd[bb->index] = all_vd[single_pred (bb)->index]; + if (all_vd[bb->index].n_debug_insn_changes) + { + unsigned int regno; + + for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++) + { + if (all_vd[bb->index].e[regno].debug_insn_changes) + { + all_vd[bb->index].e[regno].debug_insn_changes = NULL; + if (--all_vd[bb->index].n_debug_insn_changes == 0) + break; + } + } + } + } + else + init_value_data (all_vd + bb->index); + + return copyprop_hardreg_forward_1 (bb, all_vd + bb->index); +} + +static void +cprop_hardreg_debug (function *fun, struct value_data *all_vd) +{ + basic_block bb; + + FOR_EACH_BB_FN (bb, fun) + if (all_vd[bb->index].n_debug_insn_changes) + { + unsigned int regno; + bitmap live; + + live = df_get_live_out (bb); + for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++) + if (all_vd[bb->index].e[regno].debug_insn_changes) + { + if (REGNO_REG_SET_P (live, regno)) + apply_debug_insn_changes (all_vd + bb->index, regno); + if (all_vd[bb->index].n_debug_insn_changes == 0) + break; + } + } + + queued_debug_insn_change_pool.release (); +} + unsigned int pass_cprop_hardreg::execute (function *fun) { @@ -1293,6 +1353,9 @@ pass_cprop_hardreg::execute (function *f auto_sbitmap visited (last_basic_block_for_fn (fun)); bitmap_clear (visited); + auto_vec<int> redo; + bool any_debug_changes = false; + df_note_add_problem (); /* It is tempting to set DF_LR_RUN_DCE, but DCE may choose to delete @@ -1305,71 +1368,42 @@ pass_cprop_hardreg::execute (function *f data structures appropriately. */ df_set_flags (DF_DEFER_INSN_RESCAN); + df_analyze (); + FOR_EACH_BB_FN (bb, fun) { - bitmap_set_bit (visited, bb->index); - - for (int pass = 0; pass < 2; pass++) - { - /* If a block has a single predecessor, that we've already - processed, begin with the value data that was live at - the end of the predecessor block. */ - /* ??? Ought to use more intelligent queuing of blocks. */ - if (single_pred_p (bb) - && bitmap_bit_p (visited, single_pred (bb)->index) - && ! (single_pred_edge (bb)->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))) - { - all_vd[bb->index] = all_vd[single_pred (bb)->index]; - if (all_vd[bb->index].n_debug_insn_changes) - { - unsigned int regno; - - for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++) - { - if (all_vd[bb->index].e[regno].debug_insn_changes) - { - all_vd[bb->index].e[regno].debug_insn_changes = NULL; - if (--all_vd[bb->index].n_debug_insn_changes == 0) - break; - } - } - } - } - else - init_value_data (all_vd + bb->index); - - /* If we were unable to propagate, then break the loop. */ - if (!copyprop_hardreg_forward_1 (bb, all_vd + bb->index)) - break; - df_analyze (); - } + if (cprop_hardreg_bb (bb, all_vd, visited)) + redo.safe_push (bb->index); + if (all_vd[bb->index].n_debug_insn_changes) + any_debug_changes = true; } /* We must call df_analyze here unconditionally to ensure that the REG_UNUSED and REG_DEAD notes are consistent with and without -g. */ df_analyze (); - if (MAY_HAVE_DEBUG_BIND_INSNS) + if (MAY_HAVE_DEBUG_BIND_INSNS && any_debug_changes) + cprop_hardreg_debug (fun, all_vd); + + /* Second pass if we've changed anything, only for the bbs where we have + changed anything though. */ + if (!redo.is_empty ()) { - FOR_EACH_BB_FN (bb, fun) - if (bitmap_bit_p (visited, bb->index) - && all_vd[bb->index].n_debug_insn_changes) - { - unsigned int regno; - bitmap live; + unsigned int i; + int index; - live = df_get_live_out (bb); - for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++) - if (all_vd[bb->index].e[regno].debug_insn_changes) - { - if (REGNO_REG_SET_P (live, regno)) - apply_debug_insn_changes (all_vd + bb->index, regno); - if (all_vd[bb->index].n_debug_insn_changes == 0) - break; - } - } + any_debug_changes = false; + FOR_EACH_VEC_ELT (redo, i, index) + { + bb = BASIC_BLOCK_FOR_FN (fun, index); + cprop_hardreg_bb (bb, all_vd, visited); + if (all_vd[bb->index].n_debug_insn_changes) + any_debug_changes = true; + } - queued_debug_insn_change_pool.release (); + df_analyze (); + if (MAY_HAVE_DEBUG_BIND_INSNS && any_debug_changes) + cprop_hardreg_debug (fun, all_vd); } free (all_vd); Jakub
On 3/27/19 8:36 AM, Jakub Jelinek wrote: > On Sun, Mar 24, 2019 at 09:20:07AM -0600, Jeff Law wrote: >> However, I'm increasingly of the opinion that MIPS targets need to drop >> off the priority platform list. Given the trajectory I see for MIPS >> based processors in industry, it's really hard to justify spending this >> much time on them, particularly for low priority code quality issues. > > Besides what has been discussed on IRC for the PR89826 fix, that we really > need a df_analyze before processing the first block, because otherwise we > can't rely on the REG_UNUSED notes in the IL, I see some other issues, but I > admit I don't know much about df nor regcprop. RIght. I plan to commit that today along with the test reordering you pointed out. > > 1) the df_analyze () after every (successful) processing of a basic block > is IMHO way too expensive, I would be very surprised if df_analyze () isn't > quadratic in number of basic blocks and so one could construct testcases > with millions of basic blocks and at least one regcprop change in each bb > and get at cubic complexity (correct me if I'm wrong, and I'm aware of the > 95% bbs you said won't have any changes at all) I'm going to look this further today. > > 2) another thing I've noticed today, we queue up the debug insn changes, > but if we iterate the second time for any bb, we overwrite and thus lose any > of the debug insn queued changes from the first iteration, so those changes > aren't applied (wrong-debug?) This is actually what worries me, both with the current bits and with any bits that change to a worklist of blocks to reprocess. As is I think we preserve the debug stuff across the iterations which should keep debug info correct. Managing that in a world where we queue up the iteration step is going to be tougher. Queuing the iteration step feels like gcc-10 to me, but we'll see if it falls out easier than expected. > > 3) as mentioned on IRC, the order of tests in copyprop_hardreg_forward_1 > conditional doesn't start from the cheapest to most expensive one That's going to be fixed in today's commit since it's been tested. Jeff
On Wed, Mar 27, 2019 at 09:24:46AM -0600, Jeff Law wrote: > > 2) another thing I've noticed today, we queue up the debug insn changes, > > but if we iterate the second time for any bb, we overwrite and thus lose any > > of the debug insn queued changes from the first iteration, so those changes > > aren't applied (wrong-debug?) > This is actually what worries me, both with the current bits and with > any bits that change to a worklist of blocks to reprocess. As is I > think we preserve the debug stuff across the iterations which should > keep debug info correct. Managing that in a world where we queue up the > iteration step is going to be tougher. The patch I've posted would do df_analyze, all bbs once, then df_analyze, process queued debug changes, and if anything needs to be repeated (just once), redo the bbs from worklist and repeat the above. That seems to me the easiest way to get rid of the per-bb df_analyze calls as well as fixing the debug stuff. Because otherwise, we'd need to somehow merge the queued debug insns from the first and second pass and figure out how to deal with that. Jakub
On 3/27/19 9:24 AM, Jeff Law wrote: > On 3/27/19 8:36 AM, Jakub Jelinek wrote: >> On Sun, Mar 24, 2019 at 09:20:07AM -0600, Jeff Law wrote: >>> However, I'm increasingly of the opinion that MIPS targets need to drop >>> off the priority platform list. Given the trajectory I see for MIPS >>> based processors in industry, it's really hard to justify spending this >>> much time on them, particularly for low priority code quality issues. >> >> Besides what has been discussed on IRC for the PR89826 fix, that we really >> need a df_analyze before processing the first block, because otherwise we >> can't rely on the REG_UNUSED notes in the IL, I see some other issues, but I >> admit I don't know much about df nor regcprop. > RIght. I plan to commit that today along with the test reordering you > pointed out. And the actual patch committed after the usual bootstrapping & regression test on x86_64 plus testing on various *-elf platforms, mips64-linux-gnu, mips64el-linux-gnu and others :-) Jeff diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 07a1333cee0..f8a353d16c0 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,13 @@ +2019-03-27 Jeff Law <law@redhat.com> + + + PR rtl-optimization/87761 + PR rtl-optimization/89826 + * regcprop.c (copyprop_hardreg_forward_1): Move may_trap_p test + slightly later. + (pass_cprop_hardreg::execute): Call df_analyze after adding the + note problem to get REG_DEAD/REG_UNUSED notes updated. + 2019-03-27 Richard Biener <rguenther@suse.de> PR tree-optimization/89463 diff --git a/gcc/regcprop.c b/gcc/regcprop.c index 8ca523ffe23..3efe21f377c 100644 --- a/gcc/regcprop.c +++ b/gcc/regcprop.c @@ -800,9 +800,9 @@ copyprop_hardreg_forward_1 (basic_block bb, struct value_data *vd) /* Detect obviously dead sets (via REG_UNUSED notes) and remove them. */ if (set - && !may_trap_p (set) && !RTX_FRAME_RELATED_P (insn) && INSN_P (insn) + && !may_trap_p (set) && find_reg_note (insn, REG_UNUSED, SET_DEST (set)) && !side_effects_p (SET_SRC (set)) && !side_effects_p (SET_DEST (set))) @@ -1293,7 +1293,10 @@ pass_cprop_hardreg::execute (function *fun) auto_sbitmap visited (last_basic_block_for_fn (fun)); bitmap_clear (visited); + /* We need accurate notes. Earlier passes such as if-conversion may + leave notes in an inconsistent state. */ df_note_add_problem (); + df_analyze (); /* It is tempting to set DF_LR_RUN_DCE, but DCE may choose to delete an insn and this pass would not have visibility into the removal. diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 0679cb72e52..b2c649cfb3e 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,9 @@ +2019-03-26 Jeff Law <law@redhat.com> + + PR rtl-optimization/87761 + PR rtl-optimization/89826 + * gcc.c-torture/execute/pr89826.c: New test. + 2019-03-27 Richard Biener <rguenther@suse.de> * gcc.dg/torture/20190327-1.c: New testcase. diff --git a/gcc/testsuite/gcc.c-torture/execute/pr89826.c b/gcc/testsuite/gcc.c-torture/execute/pr89826.c new file mode 100644 index 00000000000..091644849d3 --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/execute/pr89826.c @@ -0,0 +1,21 @@ +typedef unsigned int u32; +typedef unsigned long long u64; +u64 a; +u32 b; + +u64 +foo (u32 d) +{ + a -= d ? 0 : ~a; + return a + b; +} + +int +main (void) +{ + u64 x = foo (2); + if (x != 0) + __builtin_abort(); + return 0; +} +
On Wed, Mar 27, 2019 at 10:20:17AM -0600, Jeff Law wrote: > --- a/gcc/regcprop.c > +++ b/gcc/regcprop.c > @@ -800,9 +800,9 @@ copyprop_hardreg_forward_1 (basic_block bb, struct value_data *vd) > > /* Detect obviously dead sets (via REG_UNUSED notes) and remove them. */ > if (set > - && !may_trap_p (set) > && !RTX_FRAME_RELATED_P (insn) > && INSN_P (insn) > + && !may_trap_p (set) One more nitpick, I think the && INSN (insn) test is redundant there. A few lines above this we have: if (!NONDEBUG_INSN_P (insn)) { ... break; or continue; } // code that doesn't change insn, at most delete_insn + break/continue and as #define NONDEBUG_INSN_P(X) (INSN_P (X) && !DEBUG_INSN_P (X)) INSN_P (insn) must be always true. Jakub
On Wed, Mar 27, 2019 at 4:26 PM Jeff Law <law@redhat.com> wrote: > > On 3/27/19 8:36 AM, Jakub Jelinek wrote: > > On Sun, Mar 24, 2019 at 09:20:07AM -0600, Jeff Law wrote: > >> However, I'm increasingly of the opinion that MIPS targets need to drop > >> off the priority platform list. Given the trajectory I see for MIPS > >> based processors in industry, it's really hard to justify spending this > >> much time on them, particularly for low priority code quality issues. > > > > Besides what has been discussed on IRC for the PR89826 fix, that we really > > need a df_analyze before processing the first block, because otherwise we > > can't rely on the REG_UNUSED notes in the IL, I see some other issues, but I > > admit I don't know much about df nor regcprop. > RIght. I plan to commit that today along with the test reordering you > pointed out. > > > > > 1) the df_analyze () after every (successful) processing of a basic block > > is IMHO way too expensive, I would be very surprised if df_analyze () isn't > > quadratic in number of basic blocks and so one could construct testcases > > with millions of basic blocks and at least one regcprop change in each bb > > and get at cubic complexity (correct me if I'm wrong, and I'm aware of the > > 95% bbs you said won't have any changes at all) > I'm going to look this further today. Look at https://gcc.opensuse.org/gcc-old/c++bench-czerny/random/random-performance-latest and you'll see multiple testcases with 'hard reg cprop' >10% compile-time. It's indeed a hog for no good reason. Richard. > > > > 2) another thing I've noticed today, we queue up the debug insn changes, > > but if we iterate the second time for any bb, we overwrite and thus lose any > > of the debug insn queued changes from the first iteration, so those changes > > aren't applied (wrong-debug?) > This is actually what worries me, both with the current bits and with > any bits that change to a worklist of blocks to reprocess. As is I > think we preserve the debug stuff across the iterations which should > keep debug info correct. Managing that in a world where we queue up the > iteration step is going to be tougher. > > Queuing the iteration step feels like gcc-10 to me, but we'll see if it > falls out easier than expected. > > > > > 3) as mentioned on IRC, the order of tests in copyprop_hardreg_forward_1 > > conditional doesn't start from the cheapest to most expensive one > That's going to be fixed in today's commit since it's been tested. > > Jeff
On Thu, Mar 28, 2019 at 09:55:46AM +0100, Richard Biener wrote: > On Wed, Mar 27, 2019 at 4:26 PM Jeff Law <law@redhat.com> wrote: > > > > On 3/27/19 8:36 AM, Jakub Jelinek wrote: > > > On Sun, Mar 24, 2019 at 09:20:07AM -0600, Jeff Law wrote: > > >> However, I'm increasingly of the opinion that MIPS targets need to drop > > >> off the priority platform list. Given the trajectory I see for MIPS > > >> based processors in industry, it's really hard to justify spending this > > >> much time on them, particularly for low priority code quality issues. > > > > > > Besides what has been discussed on IRC for the PR89826 fix, that we really > > > need a df_analyze before processing the first block, because otherwise we > > > can't rely on the REG_UNUSED notes in the IL, I see some other issues, but I > > > admit I don't know much about df nor regcprop. > > RIght. I plan to commit that today along with the test reordering you > > pointed out. > > > > > > > > 1) the df_analyze () after every (successful) processing of a basic block > > > is IMHO way too expensive, I would be very surprised if df_analyze () isn't > > > quadratic in number of basic blocks and so one could construct testcases > > > with millions of basic blocks and at least one regcprop change in each bb > > > and get at cubic complexity (correct me if I'm wrong, and I'm aware of the > > > 95% bbs you said won't have any changes at all) > > I'm going to look this further today. > > Look at https://gcc.opensuse.org/gcc-old/c++bench-czerny/random/random-performance-latest > and you'll see multiple testcases with 'hard reg cprop' >10% compile-time. > It's indeed a hog for no good reason. I've tried https://gcc.gnu.org/bugzilla/show_bug.cgi?id=28071#c1 in --enable-checking=yes,rtl,extra bootstrapped cc1 at -O2, without and with the patch. The important times in -ftime-report with vanilla trunk: phase opt and generate : 250.76 (100%) 2.00 ( 96%) 253.36 (100%) 768860 kB ( 99%) df live regs : 19.95 ( 8%) 0.03 ( 1%) 19.39 ( 8%) 0 kB ( 0%) df live&initialized regs : 20.29 ( 8%) 0.05 ( 2%) 19.73 ( 8%) 0 kB ( 0%) df reg dead/unused notes : 158.66 ( 63%) 0.02 ( 1%) 160.12 ( 63%) 4665 kB ( 1%) hard reg cprop : 21.03 ( 8%) 0.01 ( 0%) 21.39 ( 8%) 509 kB ( 0%) TOTAL : 250.85 2.09 253.57 776940 kB (ignoring everything <2% in the first % column). Configure with --enable-checking=release to disable checks. With the https://gcc.gnu.org/ml/gcc-patches/2019-03/msg01335.html patch the same testcase with -O2 -ftime-report results in identical assembly, but: phase opt and generate : 28.92 (100%) 1.82 ( 95%) 30.85 ( 99%) 768882 kB ( 99%) CFG verifier : 1.66 ( 6%) 0.02 ( 1%) 1.69 ( 5%) 0 kB ( 0%) df live regs : 0.63 ( 2%) 0.00 ( 0%) 0.61 ( 2%) 0 kB ( 0%) df live&initialized regs : 1.01 ( 3%) 0.03 ( 2%) 1.00 ( 3%) 0 kB ( 0%) df must-initialized regs : 1.51 ( 5%) 0.93 ( 48%) 2.46 ( 8%) 0 kB ( 0%) tree SSA verifier : 2.79 ( 10%) 0.01 ( 1%) 2.78 ( 9%) 0 kB ( 0%) tree STMT verifier : 2.00 ( 7%) 0.00 ( 0%) 1.99 ( 6%) 0 kB ( 0%) dominance computation : 0.61 ( 2%) 0.00 ( 0%) 0.59 ( 2%) 0 kB ( 0%) out of ssa : 0.61 ( 2%) 0.04 ( 2%) 0.65 ( 2%) 1 kB ( 0%) loop init : 0.58 ( 2%) 0.00 ( 0%) 0.63 ( 2%) 38 kB ( 0%) combiner : 0.44 ( 2%) 0.02 ( 1%) 0.47 ( 2%) 17926 kB ( 2%) integrated RA : 2.24 ( 8%) 0.08 ( 4%) 2.35 ( 8%) 205177 kB ( 26%) LRA non-specific : 1.46 ( 5%) 0.05 ( 3%) 1.50 ( 5%) 19172 kB ( 2%) LRA create live ranges : 1.23 ( 4%) 0.00 ( 0%) 1.23 ( 4%) 2589 kB ( 0%) reload CSE regs : 0.54 ( 2%) 0.00 ( 0%) 0.51 ( 2%) 8456 kB ( 1%) scheduling 2 : 0.73 ( 3%) 0.09 ( 5%) 0.81 ( 3%) 2715 kB ( 0%) verify RTL sharing : 1.19 ( 4%) 0.00 ( 0%) 1.15 ( 4%) 0 kB ( 0%) TOTAL : 29.02 1.92 31.07 776962 kB So 8.5x usr time speedup with that patch. Jakub
On 3/28/19 2:55 AM, Richard Biener wrote: > On Wed, Mar 27, 2019 at 4:26 PM Jeff Law <law@redhat.com> wrote: >> >> On 3/27/19 8:36 AM, Jakub Jelinek wrote: >>> On Sun, Mar 24, 2019 at 09:20:07AM -0600, Jeff Law wrote: >>>> However, I'm increasingly of the opinion that MIPS targets need to drop >>>> off the priority platform list. Given the trajectory I see for MIPS >>>> based processors in industry, it's really hard to justify spending this >>>> much time on them, particularly for low priority code quality issues. >>> >>> Besides what has been discussed on IRC for the PR89826 fix, that we really >>> need a df_analyze before processing the first block, because otherwise we >>> can't rely on the REG_UNUSED notes in the IL, I see some other issues, but I >>> admit I don't know much about df nor regcprop. >> RIght. I plan to commit that today along with the test reordering you >> pointed out. >> >>> >>> 1) the df_analyze () after every (successful) processing of a basic block >>> is IMHO way too expensive, I would be very surprised if df_analyze () isn't >>> quadratic in number of basic blocks and so one could construct testcases >>> with millions of basic blocks and at least one regcprop change in each bb >>> and get at cubic complexity (correct me if I'm wrong, and I'm aware of the >>> 95% bbs you said won't have any changes at all) >> I'm going to look this further today. > > Look at https://gcc.opensuse.org/gcc-old/c++bench-czerny/random/random-performance-latest > and you'll see multiple testcases with 'hard reg cprop' >10% compile-time. > It's indeed a hog for no good reason. I stand corrected. That's really a surprise. I looked at today's report and don't see anything over 1%, so Jakub's improvements really helped.
diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 6b9f1c4a2b6..4a5b122eedd 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,14 @@ +2019-03-26 Jeff Law <law@redhat.com> + + PR rtl-optimization/87761 + * regcprop.c (copyprop_hardreg_forward_1): Check may_trap_p on SET, + not INSN. Also check RTX_FRAME_RELATED_P. Queue insns for DF rescan + as needed. + (pass_cprop_hardreg::execute): Add df note problem and defer insn + rescans. Reprocess blocks as needed, calling df_analyze before + reprocessing. Always call df_analyze before fixing up debug bind + insns. + 2019-03-23 Segher Boessenkool <segher@kernel.crashing.org> * config/rs6000/xmmintrin.h (_mm_movemask_pi8): Implement for 32-bit @@ -8,7 +19,7 @@ * config/aarch64/aarch64.md (zero_extendsidi2_aarch64): Fix type attrribute for uxtw. -2019-02-26 Jeff Law <law@redhat.com> +2019-03-26 Jeff Law <law@redhat.com> PR rtl-optimization/87761 * config/mips/mips-protos.h (mips_split_move): Add new argument. diff --git a/gcc/regcprop.c b/gcc/regcprop.c index 926df40f954..8ca523ffe23 100644 --- a/gcc/regcprop.c +++ b/gcc/regcprop.c @@ -800,8 +800,9 @@ copyprop_hardreg_forward_1 (basic_block bb, struct value_data *vd) /* Detect obviously dead sets (via REG_UNUSED notes) and remove them. */ if (set + && !may_trap_p (set) + && !RTX_FRAME_RELATED_P (insn) && INSN_P (insn) - && !may_trap_p (insn) && find_reg_note (insn, REG_UNUSED, SET_DEST (set)) && !side_effects_p (SET_SRC (set)) && !side_effects_p (SET_DEST (set))) @@ -1034,6 +1035,7 @@ copyprop_hardreg_forward_1 (basic_block bb, struct value_data *vd) DEBUG_INSNs can be applied. */ if (vd->n_debug_insn_changes) note_uses (&PATTERN (insn), cprop_find_used_regs, vd); + df_insn_rescan (insn); } ksvd.vd = vd; @@ -1112,7 +1114,10 @@ copyprop_hardreg_forward_1 (basic_block bb, struct value_data *vd) /* Notice copies. */ if (copy_p) - copy_value (SET_DEST (set), SET_SRC (set), vd); + { + df_insn_rescan (insn); + copy_value (SET_DEST (set), SET_SRC (set), vd); + } } if (insn == BB_END (bb)) @@ -1282,47 +1287,68 @@ pass_cprop_hardreg::execute (function *fun) { struct value_data *all_vd; basic_block bb; - bool analyze_called = false; all_vd = XNEWVEC (struct value_data, last_basic_block_for_fn (fun)); auto_sbitmap visited (last_basic_block_for_fn (fun)); bitmap_clear (visited); + df_note_add_problem (); + + /* It is tempting to set DF_LR_RUN_DCE, but DCE may choose to delete + an insn and this pass would not have visibility into the removal. + This pass would then potentially use the source of that + INSN for propagation purposes, generating invalid code. + + So we just ask for updated notes and handle trivial deletions + within this pass where we can update this passes internal + data structures appropriately. */ + df_set_flags (DF_DEFER_INSN_RESCAN); + FOR_EACH_BB_FN (bb, fun) { bitmap_set_bit (visited, bb->index); - /* If a block has a single predecessor, that we've already - processed, begin with the value data that was live at - the end of the predecessor block. */ - /* ??? Ought to use more intelligent queuing of blocks. */ - if (single_pred_p (bb) - && bitmap_bit_p (visited, single_pred (bb)->index) - && ! (single_pred_edge (bb)->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))) + for (int pass = 0; pass < 2; pass++) { - all_vd[bb->index] = all_vd[single_pred (bb)->index]; - if (all_vd[bb->index].n_debug_insn_changes) + /* If a block has a single predecessor, that we've already + processed, begin with the value data that was live at + the end of the predecessor block. */ + /* ??? Ought to use more intelligent queuing of blocks. */ + if (single_pred_p (bb) + && bitmap_bit_p (visited, single_pred (bb)->index) + && ! (single_pred_edge (bb)->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))) { - unsigned int regno; - - for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++) + all_vd[bb->index] = all_vd[single_pred (bb)->index]; + if (all_vd[bb->index].n_debug_insn_changes) { - if (all_vd[bb->index].e[regno].debug_insn_changes) + unsigned int regno; + + for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++) { - all_vd[bb->index].e[regno].debug_insn_changes = NULL; - if (--all_vd[bb->index].n_debug_insn_changes == 0) - break; + if (all_vd[bb->index].e[regno].debug_insn_changes) + { + all_vd[bb->index].e[regno].debug_insn_changes = NULL; + if (--all_vd[bb->index].n_debug_insn_changes == 0) + break; + } } } } - } - else - init_value_data (all_vd + bb->index); + else + init_value_data (all_vd + bb->index); - copyprop_hardreg_forward_1 (bb, all_vd + bb->index); + /* If we were unable to propagate, then break the loop. */ + if (!copyprop_hardreg_forward_1 (bb, all_vd + bb->index)) + break; + df_analyze (); + } } + /* We must call df_analyze here unconditionally to ensure that the + REG_UNUSED and REG_DEAD notes are consistent with and without -g. */ + df_analyze (); + if (MAY_HAVE_DEBUG_BIND_INSNS) { FOR_EACH_BB_FN (bb, fun) @@ -1332,11 +1358,6 @@ pass_cprop_hardreg::execute (function *fun) unsigned int regno; bitmap live; - if (!analyze_called) - { - df_analyze (); - analyze_called = true; - } live = df_get_live_out (bb); for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++) if (all_vd[bb->index].e[regno].debug_insn_changes)