Message ID | D4C76825A6780047854A11E93CDE84D004C176842F@SAUSEXMBP01.amd.com |
---|---|
State | New |
Headers | show |
On Tue, Jan 4, 2011 at 11:30 PM, Fang, Changpeng <Changpeng.Fang@amd.com> wrote: > Hi, > > Thanks, Jack. Hopefully the 6% protein gain (for -m64) is not just noise. > > I updated the patch based on the current trunk (REV 168477). > > Is it OK to commit now? The patch doesn't contain a testcase nor a reference to a bug so isn't appropriate for stage4 (or even stage3). Also using BB flags for this doesn't sound quite right as you'd have to make sure CFG manipulations properly copy/unset this flag (which is likely the reason for the polyhedron regressions seen). That said, I don't like the patch too much anyway, it looks like a hack. The proper fix is to finally go the way of preserving loop information and putting such flag in the loop information (or well, just manually setting the upper bound for the number of iterations therein). Please postpone this for stage1 of GCC 4.7. Thanks. Richard. > Thanks, > > Changpeng > > > > ________________________________________ > From: Jack Howarth [howarth@bromo.med.uc.edu] > Sent: Monday, January 03, 2011 9:04 PM > To: Fang, Changpeng > Cc: Zdenek Dvorak; Richard Guenther; Xinliang David Li; gcc-patches@gcc.gnu.org > Subject: Re: [PATCH, Loop optimizer]: Add logic to disable certain loop optimizations on pre-/post-loops > > On Fri, Dec 17, 2010 at 01:14:49AM -0600, Fang, Changpeng wrote: >> Hi, Jack: >> >> Thanks for the testing. >> >> This patch is not supposed to slow down a program by 10% (rnflow and test_fpu). >> It would be helpful if you can provide analysis why they are slowed down. > > Changpeng, > The corrected merge against gcc trunk of... > > Index: gcc/basic-block.h > =================================================================== > --- gcc/basic-block.h (revision 168437) > +++ gcc/basic-block.h (working copy) > @@ -247,11 +247,14 @@ > Only used in cfgcleanup.c. */ > BB_NONTHREADABLE_BLOCK = 1 << 11, > > + /* Set on blocks that are headers of non-rolling loops. */ > + BB_HEADER_OF_NONROLLING_LOOP = 1 << 12, > + > /* Set on blocks that were modified in some way. This bit is set in > df_set_bb_dirty, but not cleared by df_analyze, so it can be used > to test whether a block has been modified prior to a df_analyze > call. */ > - BB_MODIFIED = 1 << 12 > + BB_MODIFIED = 1 << 13 > }; > > /* Dummy flag for convenience in the hot/cold partitioning code. */ > > for the proposed patch from http://gcc.gnu.org/ml/gcc-patches/2010-12/msg01344.html > eliminated the performance regressions on x86_64-apple-darwin10. I now get... > > Compile Command : gfortran -ffast-math -funroll-loops -O3 %n.f90 -o %n > > Execution Time > -m32 > stock patched %increase > ac 10.59 10.59 0.0 > aermod 19.49 19.13 -1.8 > air 6.07 6.07 0.0 > capacita 44.60 44.61 0.0 > channel 1.98 1.98 0.0 > doduc 31.19 31.31 0.4 > fatigue 9.90 10.29 3.9 > gas_dyn 4.72 4.71 -0.2 > induct 13.93 13.93 0.0 > linpk 15.50 15.49 -0.1 > mdbx 11.28 11.26 -0.2 > nf 27.62 27.58 -0.1 > protein 38.70 38.60 -0.3 > rnflow 24.68 24.68 0.0 > test_fpu 10.13 10.13 0.0 > tfft 1.92 1.92 0.0 > > Geometric Mean 12.06 12.08 0.2 > Execution Time > > -m64 > stock patched %increase > ac 8.80 8.80 0.0 > aermod 17.34 17.17 -1.0 > air 5.48 5.52 0.7 > capacita 32.38 32.50 0.4 > channel 1.84 1.84 0.0 > doduc 26.50 26.52 0.1 > fatigue 8.35 8.33 -0.2 > gas_dyn 4.30 4.29 -0.2 > induct 12.83 12.83 0.0 > linpk 15.49 15.49 0.0 > mdbx 11.23 11.22 -0.1 > nf 30.21 30.16 -0.2 > protein 34.13 32.07 -6.0 > rnflow 23.18 23.19 0.0 > test_fpu 8.04 8.02 -0.2 > tfft 1.87 1.86 -0.5 > > Geometric Mean 10.87 10.82 -0.5 > Execution Time >
On Tue, Jan 04, 2011 at 04:30:38PM -0600, Fang, Changpeng wrote: > Hi, > > Thanks, Jack. Hopefully the 6% protein gain (for -m64) is not just noise. > > I updated the patch based on the current trunk (REV 168477). > > Is it OK to commit now? I can't approve it but the benchmarks now seem reasonable with the patch (although you might want to investigate why fatigue runs 3.9% slower at -m32). > > Thanks, > > Changpeng > > > > ________________________________________ > From: Jack Howarth [howarth@bromo.med.uc.edu] > Sent: Monday, January 03, 2011 9:04 PM > To: Fang, Changpeng > Cc: Zdenek Dvorak; Richard Guenther; Xinliang David Li; gcc-patches@gcc.gnu.org > Subject: Re: [PATCH, Loop optimizer]: Add logic to disable certain loop optimizations on pre-/post-loops > > On Fri, Dec 17, 2010 at 01:14:49AM -0600, Fang, Changpeng wrote: > > Hi, Jack: > > > > Thanks for the testing. > > > > This patch is not supposed to slow down a program by 10% (rnflow and test_fpu). > > It would be helpful if you can provide analysis why they are slowed down. > > Changpeng, > The corrected merge against gcc trunk of... > > Index: gcc/basic-block.h > =================================================================== > --- gcc/basic-block.h (revision 168437) > +++ gcc/basic-block.h (working copy) > @@ -247,11 +247,14 @@ > Only used in cfgcleanup.c. */ > BB_NONTHREADABLE_BLOCK = 1 << 11, > > + /* Set on blocks that are headers of non-rolling loops. */ > + BB_HEADER_OF_NONROLLING_LOOP = 1 << 12, > + > /* Set on blocks that were modified in some way. This bit is set in > df_set_bb_dirty, but not cleared by df_analyze, so it can be used > to test whether a block has been modified prior to a df_analyze > call. */ > - BB_MODIFIED = 1 << 12 > + BB_MODIFIED = 1 << 13 > }; > > /* Dummy flag for convenience in the hot/cold partitioning code. */ > > for the proposed patch from http://gcc.gnu.org/ml/gcc-patches/2010-12/msg01344.html > eliminated the performance regressions on x86_64-apple-darwin10. I now get... > > Compile Command : gfortran -ffast-math -funroll-loops -O3 %n.f90 -o %n > > Execution Time > -m32 > stock patched %increase > ac 10.59 10.59 0.0 > aermod 19.49 19.13 -1.8 > air 6.07 6.07 0.0 > capacita 44.60 44.61 0.0 > channel 1.98 1.98 0.0 > doduc 31.19 31.31 0.4 > fatigue 9.90 10.29 3.9 > gas_dyn 4.72 4.71 -0.2 > induct 13.93 13.93 0.0 > linpk 15.50 15.49 -0.1 > mdbx 11.28 11.26 -0.2 > nf 27.62 27.58 -0.1 > protein 38.70 38.60 -0.3 > rnflow 24.68 24.68 0.0 > test_fpu 10.13 10.13 0.0 > tfft 1.92 1.92 0.0 > > Geometric Mean 12.06 12.08 0.2 > Execution Time > > -m64 > stock patched %increase > ac 8.80 8.80 0.0 > aermod 17.34 17.17 -1.0 > air 5.48 5.52 0.7 > capacita 32.38 32.50 0.4 > channel 1.84 1.84 0.0 > doduc 26.50 26.52 0.1 > fatigue 8.35 8.33 -0.2 > gas_dyn 4.30 4.29 -0.2 > induct 12.83 12.83 0.0 > linpk 15.49 15.49 0.0 > mdbx 11.23 11.22 -0.1 > nf 30.21 30.16 -0.2 > protein 34.13 32.07 -6.0 > rnflow 23.18 23.19 0.0 > test_fpu 8.04 8.02 -0.2 > tfft 1.87 1.86 -0.5 > > Geometric Mean 10.87 10.82 -0.5 > Execution Time Content-Description: 0001-Consider-a-loop-not-hot-if-it-rolls-only-a-few-times.patch > From 7d10fcfe379e3fe4387cf192fbbdbea7daff0637 Mon Sep 17 00:00:00 2001 > From: Changpeng Fang <chfang@houghton.(none)> > Date: Tue, 4 Jan 2011 14:36:50 -0800 > Subject: [PATCH] Consider a loop not hot if it rolls only a few times (non-rolling) > > * basic-block.h (bb_flags): Add a new flag BB_HEADER_OF_NONROLLING_LOOP. > * cfg.c (clear_bb_flags): Keep BB_HEADER_OF_NONROLLING marker. > * cfgloop.h (mark_non_rolling_loop): New function declaration. > (non_rolling_loop_p): New function declaration. > * predict.c (optimize_loop_for_size_p): Return true if the loop was marked > NON-ROLLING. (optimize_loop_for_speed_p): Return false if the loop was > marked NON-ROLLING. > * tree-ssa-loop-manip.c (tree_transform_and_unroll_loop): Mark the > non-rolling loop. > * tree-ssa-loop-niter.c (mark_non_rolling_loop): Implement the new > function. (non_rolling_loop_p): Implement the new function. > * tree-vect-loop-manip.c (vect_do_peeling_for_loop_bound): Mark the > non-rolling loop. (vect_do_peeling_for_alignment): Mark the non-rolling > loop. > --- > gcc/basic-block.h | 5 ++++- > gcc/cfg.c | 7 ++++--- > gcc/cfgloop.h | 2 ++ > gcc/predict.c | 6 ++++++ > gcc/tree-ssa-loop-manip.c | 3 +++ > gcc/tree-ssa-loop-niter.c | 20 ++++++++++++++++++++ > gcc/tree-vect-loop-manip.c | 8 ++++++++ > 7 files changed, 47 insertions(+), 4 deletions(-) > > diff --git a/gcc/basic-block.h b/gcc/basic-block.h > index 3594eea..081c175 100644 > --- a/gcc/basic-block.h > +++ b/gcc/basic-block.h > @@ -251,7 +251,10 @@ enum bb_flags > df_set_bb_dirty, but not cleared by df_analyze, so it can be used > to test whether a block has been modified prior to a df_analyze > call. */ > - BB_MODIFIED = 1 << 12 > + BB_MODIFIED = 1 << 12, > + > + /* Set on blocks that are headers of non-rolling loops. */ > + BB_HEADER_OF_NONROLLING_LOOP = 1 << 13 > }; > > /* Dummy flag for convenience in the hot/cold partitioning code. */ > diff --git a/gcc/cfg.c b/gcc/cfg.c > index c8ef799..e59a637 100644 > --- a/gcc/cfg.c > +++ b/gcc/cfg.c > @@ -425,8 +425,8 @@ redirect_edge_pred (edge e, basic_block new_pred) > connect_src (e); > } > > -/* Clear all basic block flags, with the exception of partitioning and > - setjmp_target. */ > +/* Clear all basic block flags, with the exception of partitioning, > + setjmp_target, and the non-rolling loop marker. */ > void > clear_bb_flags (void) > { > @@ -434,7 +434,8 @@ clear_bb_flags (void) > > FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR, NULL, next_bb) > bb->flags = (BB_PARTITION (bb) > - | (bb->flags & (BB_DISABLE_SCHEDULE + BB_RTL + BB_NON_LOCAL_GOTO_TARGET))); > + | (bb->flags & (BB_DISABLE_SCHEDULE + BB_RTL + BB_NON_LOCAL_GOTO_TARGET > + + BB_HEADER_OF_NONROLLING_LOOP))); > } > > /* Check the consistency of profile information. We can't do that > diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h > index f7bb134..0f48115 100644 > --- a/gcc/cfgloop.h > +++ b/gcc/cfgloop.h > @@ -279,6 +279,8 @@ extern rtx doloop_condition_get (rtx); > void estimate_numbers_of_iterations_loop (struct loop *, bool); > HOST_WIDE_INT estimated_loop_iterations_int (struct loop *, bool); > bool estimated_loop_iterations (struct loop *, bool, double_int *); > +void mark_non_rolling_loop (struct loop *); > +bool non_rolling_loop_p (struct loop *); > > /* Loop manipulation. */ > extern bool can_duplicate_loop_p (const struct loop *loop); > diff --git a/gcc/predict.c b/gcc/predict.c > index a86708a..34d7dff 100644 > --- a/gcc/predict.c > +++ b/gcc/predict.c > @@ -280,6 +280,9 @@ optimize_insn_for_speed_p (void) > bool > optimize_loop_for_size_p (struct loop *loop) > { > + /* Loops marked NON-ROLLING are not likely to be hot. */ > + if (non_rolling_loop_p (loop)) > + return true; > return optimize_bb_for_size_p (loop->header); > } > > @@ -288,6 +291,9 @@ optimize_loop_for_size_p (struct loop *loop) > bool > optimize_loop_for_speed_p (struct loop *loop) > { > + /* Loops marked NON-ROLLING are not likely to be hot. */ > + if (non_rolling_loop_p (loop)) > + return false; > return optimize_bb_for_speed_p (loop->header); > } > > diff --git a/gcc/tree-ssa-loop-manip.c b/gcc/tree-ssa-loop-manip.c > index 8176ed8..3ff0790 100644 > --- a/gcc/tree-ssa-loop-manip.c > +++ b/gcc/tree-ssa-loop-manip.c > @@ -931,6 +931,9 @@ tree_transform_and_unroll_loop (struct loop *loop, unsigned factor, > gcc_assert (new_loop != NULL); > update_ssa (TODO_update_ssa); > > + /* NEW_LOOP is a non-rolling loop. */ > + mark_non_rolling_loop (new_loop); > + > /* Determine the probability of the exit edge of the unrolled loop. */ > new_est_niter = est_niter / factor; > > diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c > index ee85f6f..ec108c2 100644 > --- a/gcc/tree-ssa-loop-niter.c > +++ b/gcc/tree-ssa-loop-niter.c > @@ -3011,6 +3011,26 @@ estimate_numbers_of_iterations (bool use_undefined_p) > fold_undefer_and_ignore_overflow_warnings (); > } > > +/* Mark LOOP as a non-rolling loop. */ > + > +void > +mark_non_rolling_loop (struct loop *loop) > +{ > + gcc_assert (loop && loop->header); > + loop->header->flags |= BB_HEADER_OF_NONROLLING_LOOP; > +} > + > +/* Return true if LOOP is a non-rolling loop. */ > + > +bool > +non_rolling_loop_p (struct loop *loop) > +{ > + int masked_flags; > + gcc_assert (loop && loop->header); > + masked_flags = (loop->header->flags & BB_HEADER_OF_NONROLLING_LOOP); > + return (masked_flags != 0); > +} > + > /* Returns true if statement S1 dominates statement S2. */ > > bool > diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c > index 28b75f1..9bbe68b 100644 > --- a/gcc/tree-vect-loop-manip.c > +++ b/gcc/tree-vect-loop-manip.c > @@ -1938,6 +1938,10 @@ vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo, tree *ratio, > cond_expr, cond_expr_stmt_list); > gcc_assert (new_loop); > gcc_assert (loop_num == loop->num); > + > + /* NEW_LOOP is a non-rolling loop. */ > + mark_non_rolling_loop (new_loop); > + > #ifdef ENABLE_CHECKING > slpeel_verify_cfg_after_peeling (loop, new_loop); > #endif > @@ -2191,6 +2195,10 @@ vect_do_peeling_for_alignment (loop_vec_info loop_vinfo) > th, true, NULL_TREE, NULL); > > gcc_assert (new_loop); > + > + /* NEW_LOOP is a non-rolling loop. */ > + mark_non_rolling_loop (new_loop); > + > #ifdef ENABLE_CHECKING > slpeel_verify_cfg_after_peeling (new_loop, loop); > #endif > -- > 1.6.3.3 >
From 7d10fcfe379e3fe4387cf192fbbdbea7daff0637 Mon Sep 17 00:00:00 2001 From: Changpeng Fang <chfang@houghton.(none)> Date: Tue, 4 Jan 2011 14:36:50 -0800 Subject: [PATCH] Consider a loop not hot if it rolls only a few times (non-rolling) * basic-block.h (bb_flags): Add a new flag BB_HEADER_OF_NONROLLING_LOOP. * cfg.c (clear_bb_flags): Keep BB_HEADER_OF_NONROLLING marker. * cfgloop.h (mark_non_rolling_loop): New function declaration. (non_rolling_loop_p): New function declaration. * predict.c (optimize_loop_for_size_p): Return true if the loop was marked NON-ROLLING. (optimize_loop_for_speed_p): Return false if the loop was marked NON-ROLLING. * tree-ssa-loop-manip.c (tree_transform_and_unroll_loop): Mark the non-rolling loop. * tree-ssa-loop-niter.c (mark_non_rolling_loop): Implement the new function. (non_rolling_loop_p): Implement the new function. * tree-vect-loop-manip.c (vect_do_peeling_for_loop_bound): Mark the non-rolling loop. (vect_do_peeling_for_alignment): Mark the non-rolling loop. --- gcc/basic-block.h | 5 ++++- gcc/cfg.c | 7 ++++--- gcc/cfgloop.h | 2 ++ gcc/predict.c | 6 ++++++ gcc/tree-ssa-loop-manip.c | 3 +++ gcc/tree-ssa-loop-niter.c | 20 ++++++++++++++++++++ gcc/tree-vect-loop-manip.c | 8 ++++++++ 7 files changed, 47 insertions(+), 4 deletions(-) diff --git a/gcc/basic-block.h b/gcc/basic-block.h index 3594eea..081c175 100644 --- a/gcc/basic-block.h +++ b/gcc/basic-block.h @@ -251,7 +251,10 @@ enum bb_flags df_set_bb_dirty, but not cleared by df_analyze, so it can be used to test whether a block has been modified prior to a df_analyze call. */ - BB_MODIFIED = 1 << 12 + BB_MODIFIED = 1 << 12, + + /* Set on blocks that are headers of non-rolling loops. */ + BB_HEADER_OF_NONROLLING_LOOP = 1 << 13 }; /* Dummy flag for convenience in the hot/cold partitioning code. */ diff --git a/gcc/cfg.c b/gcc/cfg.c index c8ef799..e59a637 100644 --- a/gcc/cfg.c +++ b/gcc/cfg.c @@ -425,8 +425,8 @@ redirect_edge_pred (edge e, basic_block new_pred) connect_src (e); } -/* Clear all basic block flags, with the exception of partitioning and - setjmp_target. */ +/* Clear all basic block flags, with the exception of partitioning, + setjmp_target, and the non-rolling loop marker. */ void clear_bb_flags (void) { @@ -434,7 +434,8 @@ clear_bb_flags (void) FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR, NULL, next_bb) bb->flags = (BB_PARTITION (bb) - | (bb->flags & (BB_DISABLE_SCHEDULE + BB_RTL + BB_NON_LOCAL_GOTO_TARGET))); + | (bb->flags & (BB_DISABLE_SCHEDULE + BB_RTL + BB_NON_LOCAL_GOTO_TARGET + + BB_HEADER_OF_NONROLLING_LOOP))); } /* Check the consistency of profile information. We can't do that diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h index f7bb134..0f48115 100644 --- a/gcc/cfgloop.h +++ b/gcc/cfgloop.h @@ -279,6 +279,8 @@ extern rtx doloop_condition_get (rtx); void estimate_numbers_of_iterations_loop (struct loop *, bool); HOST_WIDE_INT estimated_loop_iterations_int (struct loop *, bool); bool estimated_loop_iterations (struct loop *, bool, double_int *); +void mark_non_rolling_loop (struct loop *); +bool non_rolling_loop_p (struct loop *); /* Loop manipulation. */ extern bool can_duplicate_loop_p (const struct loop *loop); diff --git a/gcc/predict.c b/gcc/predict.c index a86708a..34d7dff 100644 --- a/gcc/predict.c +++ b/gcc/predict.c @@ -280,6 +280,9 @@ optimize_insn_for_speed_p (void) bool optimize_loop_for_size_p (struct loop *loop) { + /* Loops marked NON-ROLLING are not likely to be hot. */ + if (non_rolling_loop_p (loop)) + return true; return optimize_bb_for_size_p (loop->header); } @@ -288,6 +291,9 @@ optimize_loop_for_size_p (struct loop *loop) bool optimize_loop_for_speed_p (struct loop *loop) { + /* Loops marked NON-ROLLING are not likely to be hot. */ + if (non_rolling_loop_p (loop)) + return false; return optimize_bb_for_speed_p (loop->header); } diff --git a/gcc/tree-ssa-loop-manip.c b/gcc/tree-ssa-loop-manip.c index 8176ed8..3ff0790 100644 --- a/gcc/tree-ssa-loop-manip.c +++ b/gcc/tree-ssa-loop-manip.c @@ -931,6 +931,9 @@ tree_transform_and_unroll_loop (struct loop *loop, unsigned factor, gcc_assert (new_loop != NULL); update_ssa (TODO_update_ssa); + /* NEW_LOOP is a non-rolling loop. */ + mark_non_rolling_loop (new_loop); + /* Determine the probability of the exit edge of the unrolled loop. */ new_est_niter = est_niter / factor; diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c index ee85f6f..ec108c2 100644 --- a/gcc/tree-ssa-loop-niter.c +++ b/gcc/tree-ssa-loop-niter.c @@ -3011,6 +3011,26 @@ estimate_numbers_of_iterations (bool use_undefined_p) fold_undefer_and_ignore_overflow_warnings (); } +/* Mark LOOP as a non-rolling loop. */ + +void +mark_non_rolling_loop (struct loop *loop) +{ + gcc_assert (loop && loop->header); + loop->header->flags |= BB_HEADER_OF_NONROLLING_LOOP; +} + +/* Return true if LOOP is a non-rolling loop. */ + +bool +non_rolling_loop_p (struct loop *loop) +{ + int masked_flags; + gcc_assert (loop && loop->header); + masked_flags = (loop->header->flags & BB_HEADER_OF_NONROLLING_LOOP); + return (masked_flags != 0); +} + /* Returns true if statement S1 dominates statement S2. */ bool diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c index 28b75f1..9bbe68b 100644 --- a/gcc/tree-vect-loop-manip.c +++ b/gcc/tree-vect-loop-manip.c @@ -1938,6 +1938,10 @@ vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo, tree *ratio, cond_expr, cond_expr_stmt_list); gcc_assert (new_loop); gcc_assert (loop_num == loop->num); + + /* NEW_LOOP is a non-rolling loop. */ + mark_non_rolling_loop (new_loop); + #ifdef ENABLE_CHECKING slpeel_verify_cfg_after_peeling (loop, new_loop); #endif @@ -2191,6 +2195,10 @@ vect_do_peeling_for_alignment (loop_vec_info loop_vinfo) th, true, NULL_TREE, NULL); gcc_assert (new_loop); + + /* NEW_LOOP is a non-rolling loop. */ + mark_non_rolling_loop (new_loop); + #ifdef ENABLE_CHECKING slpeel_verify_cfg_after_peeling (new_loop, loop); #endif -- 1.6.3.3