
Implement three-level optimize_for_size predicates

Message ID 20201026142259.GA66596@kam.mff.cuni.cz

Commit Message

Jan Hubicka Oct. 26, 2020, 2:22 p.m. UTC
Hi,
this patch implements three-state optimize_for_size predicates: with -Os,
and with profile feedback for never-executed code, they return OPTIMIZE_SIZE_MAX,
while in cases where we decide to optimize for size based on branch prediction
logic they return OPTIMIZE_SIZE_BALANCED.

The idea is that in places where we merely guess that code is unlikely, we do
not want to do extreme size optimizations that lead to many-fold slowdowns
(using idiv rather than a few shifts, or using rep-based inlined stringops).

I will update the RTL handling code to also support this with BB granularity
(which we currently do not).  We should also make the cold attribute lead to
OPTIMIZE_SIZE_MAX, I would say.

LLVM has -Os and -Oz levels, where -Oz is our -Os and LLVM's -Os would
correspond to OPTIMIZE_SIZE_BALANCED.  I wonder if we want to expose this on
the command line somehow?  For me it would definitely be useful for testing;
I am not sure how much a "weaker" -Os is desired in practice.
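
The three levels are ordered, so existing boolean-style checks keep working while callers that care can compare against a specific level.  A minimal self-contained sketch (not GCC code; the enum mirrors the patch, the consumer function is hypothetical) of the idiv-versus-shifts decision described above:

```cpp
#include <cassert>
#include <cstring>

/* Mirror of the enum this patch adds to coretypes.h.  */
enum optimize_size_level
{
  OPTIMIZE_SIZE_NO,        /* Do not optimize for size.  */
  OPTIMIZE_SIZE_BALANCED,  /* Optimize for size but not at extreme cost.  */
  OPTIMIZE_SIZE_MAX        /* Optimize for size as much as possible.  */
};

/* Hypothetical consumer: only at OPTIMIZE_SIZE_MAX do we accept the
   many-fold idiv slowdown; at OPTIMIZE_SIZE_BALANCED we keep the cheap
   shift sequence for division by a suitable constant.  */
static const char *
divide_strategy (enum optimize_size_level level)
{
  if (level == OPTIMIZE_SIZE_MAX)
    return "idiv";            /* smallest code, slowest */
  return "shift-sequence";    /* a few shifts/adds, fast */
}
```

An old-style boolean check such as `if (optimize_insn_for_size_p ())` still behaves as before, since OPTIMIZE_SIZE_NO is the only zero value.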

Bootstrapped/regtested x86_64-linux, I will commit it later today if
there are no comments.

H.J., can you please update your patch on stringops?

Honza

gcc/ChangeLog:

2020-10-26  Jan Hubicka  <hubicka@ucw.cz>

	* cgraph.h (cgraph_node::optimize_for_size_p): Return
	optimize_size_level.
	(cgraph_node::optimize_for_size_p): Update.
	* coretypes.h (enum optimize_size_level): New enum.
	* predict.c (unlikely_executed_edge_p): Microoptimize.
	(optimize_function_for_size_p): Return optimize_size_level.
	(optimize_bb_for_size_p): Likewise.
	(optimize_edge_for_size_p): Likewise.
	(optimize_insn_for_size_p): Likewise.
	(optimize_loop_nest_for_size_p): Likewise.
	* predict.h (optimize_function_for_size_p): Update declaration.
	(optimize_bb_for_size_p): Update declaration.
	(optimize_edge_for_size_p): Update declaration.
	(optimize_insn_for_size_p): Update declaration.
	(optimize_loop_for_size_p): Update declaration.
	(optimize_loop_nest_for_size_p): Update declaration.

Comments

H.J. Lu Oct. 26, 2020, 2:26 p.m. UTC | #1
On Mon, Oct 26, 2020 at 7:23 AM Jan Hubicka <hubicka@ucw.cz> wrote:
>
> Hi,
> this patch implements three-state optimize_for_size predicates: with -Os,
> and with profile feedback for never-executed code, they return OPTIMIZE_SIZE_MAX,
> while in cases where we decide to optimize for size based on branch prediction
> logic they return OPTIMIZE_SIZE_BALANCED.
>
> The idea is that in places where we merely guess that code is unlikely, we do
> not want to do extreme size optimizations that lead to many-fold slowdowns
> (using idiv rather than a few shifts, or using rep-based inlined stringops).
>
> I will update the RTL handling code to also support this with BB granularity
> (which we currently do not).  We should also make the cold attribute lead to
> OPTIMIZE_SIZE_MAX, I would say.
>
> LLVM has -Os and -Oz levels, where -Oz is our -Os and LLVM's -Os would
> correspond to OPTIMIZE_SIZE_BALANCED.  I wonder if we want to expose this on
> the command line somehow?  For me it would definitely be useful for testing;
> I am not sure how much a "weaker" -Os is desired in practice.
>
> Bootstrapped/regtested x86_64-linux, I will commit it later today if
> there are no comments.
>
> H.J., can you please update your patch on stringops?
>

Please go ahead.  My patches should be orthogonal to yours.
Martin Liška Oct. 26, 2020, 2:28 p.m. UTC | #2
On 10/26/20 3:22 PM, Jan Hubicka wrote:
> Hi,
> this patch implements three-state optimize_for_size predicates: with -Os,
> and with profile feedback for never-executed code, they return OPTIMIZE_SIZE_MAX,
> while in cases where we decide to optimize for size based on branch prediction
> logic they return OPTIMIZE_SIZE_BALLANCED.

Hello.

Do we want to somehow correspond to -fprofile-partial-training? Or is the
-fprofile-partial-training option basically dead with your new levels?

Thanks,
Martin
Jan Hubicka Oct. 26, 2020, 2:29 p.m. UTC | #3
> On 10/26/20 3:22 PM, Jan Hubicka wrote:
> > Hi,
> > this patch implements three-state optimize_for_size predicates: with -Os,
> > and with profile feedback for never-executed code, they return OPTIMIZE_SIZE_MAX,
> > while in cases where we decide to optimize for size based on branch prediction
> > logic they return OPTIMIZE_SIZE_BALANCED.
> 
> Hello.
> 
> Do we want to somehow correspond to -fprofile-partial-training? Or is the
> -fprofile-partial-training option basically dead with your new levels?

partial-training will set counts to guessed 0 instead of absolute 0, so
we will use OPTIMIZE_SIZE_BALANCED instead of MAX for things that were
not executed.
We will still use MAX for portions with optimize_size and for code detected
by safe heuristics, like code just before abort.
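
As a toy model (not GCC's actual profile_count class), the distinction being drawn is between a count that is zero from measured profile data and one that is only a guessed zero, as -fprofile-partial-training produces:

```cpp
#include <cassert>

enum optimize_size_level
{
  OPTIMIZE_SIZE_NO,
  OPTIMIZE_SIZE_BALANCED,
  OPTIMIZE_SIZE_MAX
};

/* Toy stand-in for GCC's profile_count: we only model whether the
   count is zero and whether its quality is guessed or measured.  */
struct toy_count
{
  bool is_zero;
  bool is_guessed;
};

/* Measured ("absolute") zero count => aggressive size optimization;
   guessed zero (as produced by -fprofile-partial-training) => only
   the balanced level; nonzero count => no size optimization.  */
static enum optimize_size_level
size_level_for_count (struct toy_count c)
{
  if (c.is_zero && !c.is_guessed)
    return OPTIMIZE_SIZE_MAX;
  if (c.is_zero)
    return OPTIMIZE_SIZE_BALANCED;
  return OPTIMIZE_SIZE_NO;
}
```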

Honza
> 
> Thanks,
> Martin
Jan Hubicka Oct. 26, 2020, 2:36 p.m. UTC | #4
> On Mon, Oct 26, 2020 at 7:23 AM Jan Hubicka <hubicka@ucw.cz> wrote:
> >
> > Hi,
> > this patch implements three-state optimize_for_size predicates: with -Os,
> > and with profile feedback for never-executed code, they return OPTIMIZE_SIZE_MAX,
> > while in cases where we decide to optimize for size based on branch prediction
> > logic they return OPTIMIZE_SIZE_BALANCED.
> >
> > The idea is that in places where we merely guess that code is unlikely, we do
> > not want to do extreme size optimizations that lead to many-fold slowdowns
> > (using idiv rather than a few shifts, or using rep-based inlined stringops).
> >
> > I will update the RTL handling code to also support this with BB granularity
> > (which we currently do not).  We should also make the cold attribute lead to
> > OPTIMIZE_SIZE_MAX, I would say.
> >
> > LLVM has -Os and -Oz levels, where -Oz is our -Os and LLVM's -Os would
> > correspond to OPTIMIZE_SIZE_BALANCED.  I wonder if we want to expose this on
> > the command line somehow?  For me it would definitely be useful for testing;
> > I am not sure how much a "weaker" -Os is desired in practice.
> >
> > Bootstrapped/regtested x86_64-linux, I will commit it later today if
> > there are no comments.
> >
> > H.J., can you please update your patch on stringops?
> >
> 
> Please ahead.  My patches should be orthogonal to yours.

For example, you had a patch that limited "rep cmpsb" expansion to
-minline-all-stringops.  Now the condition could be
-minline-all-stringops || optimize_insn_for_size () == OPTIMIZE_SIZE_MAX,
since it is still a useful size optimization.
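
The suggested gate can be sketched as a tiny predicate (a self-contained illustration, not the actual i386 backend code):

```cpp
#include <cassert>

enum optimize_size_level
{
  OPTIMIZE_SIZE_NO,
  OPTIMIZE_SIZE_BALANCED,
  OPTIMIZE_SIZE_MAX
};

/* Sketch of the suggested condition: expand "rep cmpsb" either when
   the user forces inlining of all string operations, or when we are
   optimizing for size as aggressively as possible.  The balanced
   level keeps the out-of-line call for speed.  */
static bool
allow_rep_cmpsb (bool minline_all_stringops, enum optimize_size_level level)
{
  return minline_all_stringops || level == OPTIMIZE_SIZE_MAX;
}
```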

I am not sure if you had other changes of this nature?  (It is a bit hard
to grep the compiler for things like this, and I would like to get these
organized into optimize_size levels now.)

Honza
> 
> -- 
> H.J.
H.J. Lu Oct. 26, 2020, 3:06 p.m. UTC | #5
On Mon, Oct 26, 2020 at 7:36 AM Jan Hubicka <hubicka@ucw.cz> wrote:
>
> > On Mon, Oct 26, 2020 at 7:23 AM Jan Hubicka <hubicka@ucw.cz> wrote:
> > >
> > > Hi,
> > > this patch implements three-state optimize_for_size predicates: with -Os,
> > > and with profile feedback for never-executed code, they return OPTIMIZE_SIZE_MAX,
> > > while in cases where we decide to optimize for size based on branch prediction
> > > logic they return OPTIMIZE_SIZE_BALANCED.
> > >
> > > The idea is that in places where we merely guess that code is unlikely, we do
> > > not want to do extreme size optimizations that lead to many-fold slowdowns
> > > (using idiv rather than a few shifts, or using rep-based inlined stringops).
> > >
> > > I will update the RTL handling code to also support this with BB granularity
> > > (which we currently do not).  We should also make the cold attribute lead to
> > > OPTIMIZE_SIZE_MAX, I would say.
> > >
> > > LLVM has -Os and -Oz levels, where -Oz is our -Os and LLVM's -Os would
> > > correspond to OPTIMIZE_SIZE_BALANCED.  I wonder if we want to expose this on
> > > the command line somehow?  For me it would definitely be useful for testing;
> > > I am not sure how much a "weaker" -Os is desired in practice.
> > >
> > > Bootstrapped/regtested x86_64-linux, I will commit it later today if
> > > there are no comments.
> > >
> > > H.J., can you please update your patch on stringops?
> > >
> >
> > Please ahead.  My patches should be orthogonal to yours.
>
> For example you had patch that limited "rep cmpsb" expansion for
> -minline-all-stringops.  Now the conditions could be
> -minline-all-stringops || optimize_insn_for_size () == OPTIMIZE_SIZE_MAX
> since it is still useful size optimization.
>
> I am not sure if you had other changes of this nature? (It is bit hard
> to grep compiler for things like this and I would like to get these
> organized to optimize_size levels now).

Shouldn't it apply to all functions inlined by -minline-all-stringops?
Jan Hubicka Oct. 26, 2020, 5:14 p.m. UTC | #6
> >
> > For example, you had a patch that limited "rep cmpsb" expansion to
> > -minline-all-stringops.  Now the condition could be
> > -minline-all-stringops || optimize_insn_for_size () == OPTIMIZE_SIZE_MAX,
> > since it is still a useful size optimization.
> >
> > I am not sure if you had other changes of this nature?  (It is a bit hard
> > to grep the compiler for things like this, and I would like to get these
> > organized into optimize_size levels now.)
> 
> Shouldn't it apply to all functions inlined by -minline-all-stringops?

I think we handle the other cases; for code optimized for size we go for
the ix86_size_memcpy and ix86_size_memset tables, which say to inline
everything with rep movsb.  We do not inline strlen since the way it is
implemented gets too long (a short inline version would be welcome).

I will look through the backend, but if you are aware of more checks like
the one in ix86_expand_cmpstrn_or_cmpmem which disable size optimization
even at -Os, let me know.  They are not that easy to find...

Honza
> 
> 
> -- 
> H.J.
H.J. Lu Oct. 26, 2020, 5:25 p.m. UTC | #7
On Mon, Oct 26, 2020 at 10:14 AM Jan Hubicka <hubicka@ucw.cz> wrote:
>
> > >
> > > For example, you had a patch that limited "rep cmpsb" expansion to
> > > -minline-all-stringops.  Now the condition could be
> > > -minline-all-stringops || optimize_insn_for_size () == OPTIMIZE_SIZE_MAX,
> > > since it is still a useful size optimization.
> > >
> > > I am not sure if you had other changes of this nature?  (It is a bit hard
> > > to grep the compiler for things like this, and I would like to get these
> > > organized into optimize_size levels now.)
> >
> > Shouldn't it apply to all functions inlined by -minline-all-stringops?
>
> I think we handle the other cases; for code optimized for size we go for
> the ix86_size_memcpy and ix86_size_memset tables, which say to inline
> everything with rep movsb.  We do not inline strlen since the way it is
> implemented gets too long (a short inline version would be welcome).
>
> I will look through the backend, but if you are aware of more checks like
> the one in ix86_expand_cmpstrn_or_cmpmem which disable size optimization
> even at -Os, let me know.  They are not that easy to find...
>

[hjl@gnu-cfl-2 gcc]$ cat /tmp/x.c
int
func (char *d, unsigned int l)
{
  return __builtin_strncmp (d, "foo", l) ? 1 : 2;
}
[hjl@gnu-cfl-2 gcc]$ gcc -c -Os  /tmp/x.c
[hjl@gnu-cfl-2 gcc]$ nm x.o
0000000000000000 T func
                 U strncmp
[hjl@gnu-cfl-2 gcc]$ size x.o
   text    data     bss     dec     hex filename
    138       0       0     138      8a x.o
[hjl@gnu-cfl-2 gcc]$ gcc -c -O2  /tmp/x.c
[hjl@gnu-cfl-2 gcc]$ size x.o
   text    data     bss     dec     hex filename
    146       0       0     146      92 x.o
[hjl@gnu-cfl-2 gcc]$ nm x.o
0000000000000000 T func
[hjl@gnu-cfl-2 gcc]$

-Os shouldn't inline strncmp.
Jan Hubicka Oct. 26, 2020, 6:13 p.m. UTC | #8
> On Mon, Oct 26, 2020 at 10:14 AM Jan Hubicka <hubicka@ucw.cz> wrote:
> >
> > > >
> > > > For example, you had a patch that limited "rep cmpsb" expansion to
> > > > -minline-all-stringops.  Now the condition could be
> > > > -minline-all-stringops || optimize_insn_for_size () == OPTIMIZE_SIZE_MAX,
> > > > since it is still a useful size optimization.
> > > >
> > > > I am not sure if you had other changes of this nature?  (It is a bit hard
> > > > to grep the compiler for things like this, and I would like to get these
> > > > organized into optimize_size levels now.)
> > >
> > > Shouldn't it apply to all functions inlined by -minline-all-stringops?
> >
> > I think we handle the other cases; for code optimized for size we go for
> > the ix86_size_memcpy and ix86_size_memset tables, which say to inline
> > everything with rep movsb.  We do not inline strlen since the way it is
> > implemented gets too long (a short inline version would be welcome).
> >
> > I will look through the backend, but if you are aware of more checks like
> > the one in ix86_expand_cmpstrn_or_cmpmem which disable size optimization
> > even at -Os, let me know.  They are not that easy to find...
> >
> 
> [hjl@gnu-cfl-2 gcc]$ cat /tmp/x.c
> int
> func (char *d, unsigned int l)
> {
>   return __builtin_strncmp (d, "foo", l) ? 1 : 2;
> }
> [hjl@gnu-cfl-2 gcc]$ gcc -c -Os  /tmp/x.c
> [hjl@gnu-cfl-2 gcc]$ nm x.o
> 0000000000000000 T func
>                  U strncmp
> [hjl@gnu-cfl-2 gcc]$ size x.o
>    text    data     bss     dec     hex filename
>     138       0       0     138      8a x.o
> [hjl@gnu-cfl-2 gcc]$ gcc -c -O2  /tmp/x.c
> [hjl@gnu-cfl-2 gcc]$ size x.o
>    text    data     bss     dec     hex filename
>     146       0       0     146      92 x.o
> [hjl@gnu-cfl-2 gcc]$ nm x.o
> 0000000000000000 T func
> [hjl@gnu-cfl-2 gcc]$
> 
> -Os shouldn't inline strncmp.
Interesting, I would expect cmpsb to still win.  Well, this makes it
easier.  Sorry for delaying the patches - somehow I got them connected
with the -Os refactoring.

Honza
> 
> -- 
> H.J.

Patch

diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 65e4646efcd..fb3ad95e064 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -1279,7 +1279,7 @@  struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : public symtab_node
   bool check_calls_comdat_local_p ();
 
   /* Return true if function should be optimized for size.  */
-  bool optimize_for_size_p (void);
+  enum optimize_size_level optimize_for_size_p (void);
 
   /* Dump the callgraph to file F.  */
   static void dump_cgraph (FILE *f);
@@ -3315,15 +3315,17 @@  cgraph_node::mark_force_output (void)
 
 /* Return true if function should be optimized for size.  */
 
-inline bool
+inline enum optimize_size_level
 cgraph_node::optimize_for_size_p (void)
 {
   if (opt_for_fn (decl, optimize_size))
-    return true;
+    return OPTIMIZE_SIZE_MAX;
+  if (count == profile_count::zero ())
+    return OPTIMIZE_SIZE_MAX;
   if (frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED)
-    return true;
+    return OPTIMIZE_SIZE_BALANCED;
   else
-    return false;
+    return OPTIMIZE_SIZE_NO;
 }
 
 /* Return symtab_node for NODE or create one if it is not present
diff --git a/gcc/coretypes.h b/gcc/coretypes.h
index 81a1b594dcd..da178b6a9f6 100644
--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -444,6 +444,18 @@  enum excess_precision_type
   EXCESS_PRECISION_TYPE_FAST
 };
 
+/* Level of size optimization.  */
+
+enum optimize_size_level
+{
+  /* Do not optimize for size.  */
+  OPTIMIZE_SIZE_NO,
+  /* Optimize for size but not at extreme performance costs.  */
+  OPTIMIZE_SIZE_BALANCED,
+  /* Optimize for size as much as possible.  */
+  OPTIMIZE_SIZE_MAX
+};
+
 /* Support for user-provided GGC and PCH markers.  The first parameter
    is a pointer to a pointer, the second a cookie.  */
 typedef void (*gt_pointer_operator) (void *, void *);
diff --git a/gcc/predict.c b/gcc/predict.c
index 5983889209f..361c4019eec 100644
--- a/gcc/predict.c
+++ b/gcc/predict.c
@@ -243,7 +243,7 @@  probably_never_executed_bb_p (struct function *fun, const_basic_block bb)
 static bool
 unlikely_executed_edge_p (edge e)
 {
-  return (e->count () == profile_count::zero ()
+  return (e->src->count == profile_count::zero ()
 	  || e->probability == profile_probability::never ())
 	 || (e->flags & (EDGE_EH | EDGE_FAKE));
 }
@@ -260,13 +260,15 @@  probably_never_executed_edge_p (struct function *fun, edge e)
 
 /* Return true if function FUN should always be optimized for size.  */
 
-bool
+optimize_size_level
 optimize_function_for_size_p (struct function *fun)
 {
   if (!fun || !fun->decl)
-    return optimize_size;
+    return optimize_size ? OPTIMIZE_SIZE_MAX : OPTIMIZE_SIZE_NO;
   cgraph_node *n = cgraph_node::get (fun->decl);
-  return n && n->optimize_for_size_p ();
+  if (n)
+    return n->optimize_for_size_p ();
+  return OPTIMIZE_SIZE_NO;
 }
 
 /* Return true if function FUN should always be optimized for speed.  */
@@ -289,11 +291,16 @@  function_optimization_type (struct function *fun)
 
 /* Return TRUE if basic block BB should be optimized for size.  */
 
-bool
+optimize_size_level
 optimize_bb_for_size_p (const_basic_block bb)
 {
-  return (optimize_function_for_size_p (cfun)
-	  || (bb && !maybe_hot_bb_p (cfun, bb)));
+  enum optimize_size_level ret = optimize_function_for_size_p (cfun);
+
+  if (bb && ret < OPTIMIZE_SIZE_MAX && bb->count == profile_count::zero ())
+    ret = OPTIMIZE_SIZE_MAX;
+  if (bb && ret < OPTIMIZE_SIZE_BALANCED && !maybe_hot_bb_p (cfun, bb))
+    ret = OPTIMIZE_SIZE_BALANCED;
+  return ret;
 }
 
 /* Return TRUE if basic block BB should be optimized for speed.  */
@@ -316,10 +323,16 @@  bb_optimization_type (const_basic_block bb)
 
 /* Return TRUE if edge E should be optimized for size.  */
 
-bool
+optimize_size_level
 optimize_edge_for_size_p (edge e)
 {
-  return optimize_function_for_size_p (cfun) || !maybe_hot_edge_p (e);
+  enum optimize_size_level ret = optimize_function_for_size_p (cfun);
+
+  if (ret < OPTIMIZE_SIZE_MAX && unlikely_executed_edge_p (e))
+    ret = OPTIMIZE_SIZE_MAX;
+  if (ret < OPTIMIZE_SIZE_BALANCED && !maybe_hot_edge_p (e))
+    ret = OPTIMIZE_SIZE_BALANCED;
+  return ret;
 }
 
 /* Return TRUE if edge E should be optimized for speed.  */
@@ -332,10 +345,13 @@  optimize_edge_for_speed_p (edge e)
 
 /* Return TRUE if the current function is optimized for size.  */
 
-bool
+optimize_size_level
 optimize_insn_for_size_p (void)
 {
-  return optimize_function_for_size_p (cfun) || !crtl->maybe_hot_insn_p;
+  enum optimize_size_level ret = optimize_function_for_size_p (cfun);
+  if (ret < OPTIMIZE_SIZE_BALANCED && !crtl->maybe_hot_insn_p)
+    ret = OPTIMIZE_SIZE_BALANCED;
+  return ret;
 }
 
 /* Return TRUE if the current function is optimized for speed.  */
@@ -348,7 +364,7 @@  optimize_insn_for_speed_p (void)
 
 /* Return TRUE if LOOP should be optimized for size.  */
 
-bool
+optimize_size_level
 optimize_loop_for_size_p (class loop *loop)
 {
   return optimize_bb_for_size_p (loop->header);
@@ -392,10 +408,31 @@  optimize_loop_nest_for_speed_p (class loop *loop)
 
 /* Return TRUE if nest rooted at LOOP should be optimized for size.  */
 
-bool
+optimize_size_level
 optimize_loop_nest_for_size_p (class loop *loop)
 {
-  return !optimize_loop_nest_for_speed_p (loop);
+  enum optimize_size_level ret = optimize_loop_for_size_p (loop);
+  class loop *l = loop;
+
+  l = loop->inner;
+  while (l && l != loop)
+    {
+      if (ret == OPTIMIZE_SIZE_NO)
+	break;
+      ret = MIN (optimize_loop_for_size_p (l), ret);
+      if (l->inner)
+        l = l->inner;
+      else if (l->next)
+        l = l->next;
+      else
+        {
+	  while (l != loop && !l->next)
+	    l = loop_outer (l);
+	  if (l != loop)
+	    l = l->next;
+	}
+    }
+  return ret;
 }
 
 /* Return true if edge E is likely to be well predictable by branch
diff --git a/gcc/predict.h b/gcc/predict.h
index 274597e002f..b64d2098ab0 100644
--- a/gcc/predict.h
+++ b/gcc/predict.h
@@ -58,20 +58,20 @@  extern bool maybe_hot_bb_p (struct function *, const_basic_block);
 extern bool maybe_hot_edge_p (edge);
 extern bool probably_never_executed_bb_p (struct function *, const_basic_block);
 extern bool probably_never_executed_edge_p (struct function *, edge);
-extern bool optimize_function_for_size_p (struct function *);
+extern enum optimize_size_level optimize_function_for_size_p (struct function *);
 extern bool optimize_function_for_speed_p (struct function *);
 extern optimization_type function_optimization_type (struct function *);
-extern bool optimize_bb_for_size_p (const_basic_block);
+extern enum optimize_size_level optimize_bb_for_size_p (const_basic_block);
 extern bool optimize_bb_for_speed_p (const_basic_block);
 extern optimization_type bb_optimization_type (const_basic_block);
-extern bool optimize_edge_for_size_p (edge);
+extern enum optimize_size_level optimize_edge_for_size_p (edge);
 extern bool optimize_edge_for_speed_p (edge);
-extern bool optimize_insn_for_size_p (void);
+extern enum optimize_size_level optimize_insn_for_size_p (void);
 extern bool optimize_insn_for_speed_p (void);
-extern bool optimize_loop_for_size_p (class loop *);
+extern enum optimize_size_level optimize_loop_for_size_p (class loop *);
 extern bool optimize_loop_for_speed_p (class loop *);
 extern bool optimize_loop_nest_for_speed_p (class loop *);
-extern bool optimize_loop_nest_for_size_p (class loop *);
+extern enum optimize_size_level optimize_loop_nest_for_size_p (class loop *);
 extern bool predictable_edge_p (edge);
 extern void rtl_profile_for_bb (basic_block);
 extern void rtl_profile_for_edge (edge);