[google,gcc-4_8] Tree Loop Unrolling - Relax code size increase with -O2

Message ID CAAs8HmwDmZty3qst-o8pvxbmY=UhR+ozLYJxHA4W6_U_J8mPbg@mail.gmail.com
State New

Commit Message

Sriraman Tallam Jan. 21, 2014, 9:49 p.m. UTC
Hi,

     Currently, the tree unrolling pass (cunroll) does not allow any code
size growth at -O2.  Code size growth is permitted only with -O3 or
-funroll-loops/-fpeel-loops.  I have created a patch to allow limited
code size growth at -O2.  With -funroll-loops the maximum allowed code
growth is 400 unrolled insns; the patch sets it to 200 unrolled insns
at -O2.  This patch improves an image processing benchmark by 20% and
most other benchmarks by 1-2%.  The code size increase is <1% for all
benchmarks except the image processing benchmark, which grows by 6%.
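
As a rough illustration of the limits described above (this is not code
from the patch), the choice of budget could be modeled as follows.  The
struct and function names are hypothetical, and returning 0 below -O2 is
a simplification standing in for "no code size growth allowed".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the option state the patch consults.  */
struct unroll_opts
{
  int optimize;           /* -O level */
  bool unroll_loops;      /* -funroll-loops */
  bool unroll_all_loops;  /* -funroll-all-loops */
  bool peel_loops;        /* -fpeel-loops */
};

/* Return the unrolled-insn budget for complete unrolling.
   BASE_LIMIT is the budget used when unrolling is requested explicitly.
   A result of 0 models "no growth allowed" (a simplification).  */
static unsigned
effective_peel_limit (const struct unroll_opts *o, unsigned base_limit)
{
  assert (o != NULL);

  bool explicit_flags
    = o->unroll_loops || o->unroll_all_loops || o->peel_loops;

  if (explicit_flags || o->optimize >= 3)
    return base_limit;      /* full budget */
  if (o->optimize == 2)
    return base_limit / 2;  /* half budget at plain -O2 */
  return 0;                 /* below -O2: no growth */
}
```

With a base limit of 400, as quoted above, plain -O2 yields 200, while
-O2 -funroll-loops and -O3 both keep the full 400.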

     I am working on getting this patch reviewed for trunk. Here is
the discussion on this:
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg02643.html  I have
incorporated the comments on making the patch simpler. I will
follow up on the trunk patch by also getting data on limiting
complete peeling at -O2.

Is this ok for the google branch?

Thanks
Sri

Comments

Xinliang David Li Jan. 21, 2014, 10:49 p.m. UTC | #1
I think it might be better to introduce a new parameter for the max
peel insns at O2 (e.g., call it MAX_O2_COMPLETELY_PEEL_INSN or
MAX_DEFAULT_...), and use the same logic as in your patch to override
the MAX_COMPLETELY_PEELED_INSN parameter at O2.

That way, we don't need a hard-coded factor of 2.
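
A minimal sketch of this suggestion, assuming a separate O2 parameter
exists: every name below is hypothetical and only models the idea.
sketch_maybe_set imitates what maybe_set_param_value does as I understand
it, namely setting a value implicitly only when the user has not set it
explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical parameter table, not GCC's real one.  */
enum sketch_param
{
  SKETCH_MAX_COMPLETELY_PEELED_INSNS,
  SKETCH_MAX_O2_COMPLETELY_PEELED_INSNS,
  SKETCH_PARAM_COUNT
};

struct sketch_params
{
  int value[SKETCH_PARAM_COUNT];
  bool set_by_user[SKETCH_PARAM_COUNT];
};

/* Set WHICH to V unless the user set it explicitly.  */
static void
sketch_maybe_set (struct sketch_params *p, enum sketch_param which, int v)
{
  assert (p != NULL);
  if (!p->set_by_user[which])
    p->value[which] = v;
}

/* At plain -O2, take the limit from the dedicated O2 parameter
   instead of dividing the general limit by a fixed factor.  */
static void
sketch_apply_o2_limit (struct sketch_params *p, int optimize,
                       bool any_unroll_flag)
{
  assert (p != NULL);
  if (optimize == 2 && !any_unroll_flag)
    sketch_maybe_set (p, SKETCH_MAX_COMPLETELY_PEELED_INSNS,
                      p->value[SKETCH_MAX_O2_COMPLETELY_PEELED_INSNS]);
}
```

The O2 budget then becomes tunable on its own, and an explicit user
setting of the general limit is left untouched.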

In the longer run, we really need better cost/benefit analysis, but
that is independent.

David


Patch

Index: opts.c
===================================================================
--- opts.c	(revision 206638)
+++ opts.c	(working copy)
@@ -855,6 +855,19 @@  finish_options (struct gcc_options *opts, struct g
             0, opts->x_param_values, opts_set->x_param_values);
     }
 
+  /* Halve PARAM_MAX_COMPLETELY_PEELED_INSNS at -O2 unless -funroll-loops,
+     -funroll-all-loops, or -fpeel-loops is set.  */
+  if (optimize == 2 && !opts->x_flag_unroll_loops && !opts->x_flag_peel_loops
+      && !opts->x_flag_unroll_all_loops)
+
+    {
+      unsigned HOST_WIDE_INT max_completely_peeled_insns
+	= (PARAM_VALUE (PARAM_MAX_COMPLETELY_PEELED_INSNS) / 2);
+      maybe_set_param_value (PARAM_MAX_COMPLETELY_PEELED_INSNS,
+			     max_completely_peeled_insns,
+			     opts->x_param_values, opts_set->x_param_values);
+    }
+
   /* Set PARAM_MAX_STORES_TO_SINK to 0 if either vectorization or if-conversion
      is disabled.  */
   if ((!opts->x_flag_tree_loop_vectorize && !opts->x_flag_tree_slp_vectorize)
Index: tree-ssa-loop.c
===================================================================
--- tree-ssa-loop.c	(revision 206638)
+++ tree-ssa-loop.c	(working copy)
@@ -467,7 +467,7 @@  tree_complete_unroll (void)
 
   return tree_unroll_loops_completely (flag_unroll_loops
 				       || flag_peel_loops
-				       || optimize >= 3, true);
+				       || optimize >= 2, true);
 }
 
 static bool