Message ID: 20201026142259.GA66596@kam.mff.cuni.cz
State:      New
Series:     Implement three-level optimize_for_size predicates
On Mon, Oct 26, 2020 at 7:23 AM Jan Hubicka <hubicka@ucw.cz> wrote:
>
> Hi,
> this patch implements the three-level optimize_for_size predicates: with -Os,
> and with profile feedback for never-executed code, they return
> OPTIMIZE_SIZE_MAX, while in cases where we decide to optimize for size based
> on branch prediction logic they return OPTIMIZE_SIZE_BALANCED.
>
> The idea is that in places where we merely guess that code is unlikely, we do
> not want to do extreme size optimizations that lead to many-fold slowdowns
> (using idiv rather than a few shifts, or using rep-based inlined stringops).
>
> I will update the RTL handling code to also support this with BB granularity
> (which we don't do currently).  We should also make attribute cold lead to
> OPTIMIZE_SIZE_MAX, I would say.
>
> LLVM has -Os and -Oz levels, where -Oz is our -Os and LLVM's -Os would
> correspond to OPTIMIZE_SIZE_BALANCED.  I wonder if we want to export this to
> the command line somehow?  For me it would definitely be useful for testing;
> I am not sure how much a "weaker" -Os is desired in practice.
>
> Bootstrapped/regtested x86_64-linux; I will commit it later today if there
> are no comments.
>
> H.J., can you please update your patch on stringops?

Please go ahead.  My patches should be orthogonal to yours.
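To make the intended split concrete, here is a minimal sketch of how a consumer of the new predicate could distinguish the levels for the division example mentioned above.  The three expand_* helpers are hypothetical names used only for illustration; they are not functions from the patch:

/* Hypothetical expander for division by a constant, choosing by size
   level.  OPTIMIZE_SIZE_MAX picks the smallest code even if it is many
   times slower; OPTIMIZE_SIZE_BALANCED stays compact but avoids the
   pathological case.  */
switch (optimize_insn_for_size_p ())
  {
  case OPTIMIZE_SIZE_MAX:
    expand_with_idiv ();      /* one instruction, but very slow */
    break;
  case OPTIMIZE_SIZE_BALANCED:
    expand_with_shifts ();    /* a few shifts: slightly larger, far faster */
    break;
  case OPTIMIZE_SIZE_NO:
    expand_for_speed ();
    break;
  }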
On 10/26/20 3:22 PM, Jan Hubicka wrote:
> Hi,
> this patch implements the three-level optimize_for_size predicates: with -Os,
> and with profile feedback for never-executed code, they return
> OPTIMIZE_SIZE_MAX, while in cases where we decide to optimize for size based
> on branch prediction logic they return OPTIMIZE_SIZE_BALANCED.

Hello.

Do we want this to somehow correspond to -fprofile-partial-training?  Or is
the -fprofile-partial-training option basically dead with your new levels?

Thanks,
Martin
> On 10/26/20 3:22 PM, Jan Hubicka wrote:
> > Hi,
> > this patch implements the three-level optimize_for_size predicates: with
> > -Os, and with profile feedback for never-executed code, they return
> > OPTIMIZE_SIZE_MAX, while in cases where we decide to optimize for size
> > based on branch prediction logic they return OPTIMIZE_SIZE_BALANCED.
>
> Hello.
>
> Do we want this to somehow correspond to -fprofile-partial-training?  Or is
> the -fprofile-partial-training option basically dead with your new levels?

partial-training will set counts to guessed 0 instead of absolute 0, so we
will use OPTIMIZE_SIZE_BALANCED instead of MAX for things that were not
executed.  We will still use MAX for portions with optimize_size and for code
detected by safe heuristics, like code just before abort.

Honza

> Thanks,
> Martin
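The distinction Honza describes is visible directly in the patch's cgraph_node::optimize_for_size_p (the full diff appears at the end of the thread).  Only a precise zero count compares equal to profile_count::zero (), so a guessed-zero count produced by -fprofile-partial-training skips the MAX case and reaches BALANCED through the frequency check.  A simplified restatement of that control flow, with comments added for the partial-training case:

/* Simplified from cgraph_node::optimize_for_size_p in the patch.  */
if (opt_for_fn (decl, optimize_size))
  return OPTIMIZE_SIZE_MAX;
if (count == profile_count::zero ())
  /* Absolute 0: the training run proved this is never executed.  */
  return OPTIMIZE_SIZE_MAX;
if (frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED)
  /* Guessed 0 (-fprofile-partial-training) lands here instead.  */
  return OPTIMIZE_SIZE_BALANCED;
return OPTIMIZE_SIZE_NO;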
> On Mon, Oct 26, 2020 at 7:23 AM Jan Hubicka <hubicka@ucw.cz> wrote:
> > [...]
> > H.J., can you please update your patch on stringops?
>
> Please go ahead.  My patches should be orthogonal to yours.

For example, you had a patch that limited "rep cmpsb" expansion to
-minline-all-stringops.  Now the condition could be
-minline-all-stringops || optimize_insn_for_size_p () == OPTIMIZE_SIZE_MAX,
since it is still a useful size optimization.

I am not sure if you had other changes of this nature?  (It is a bit hard to
grep the compiler for things like this, and I would like to get them
organized into optimize_size levels now.)

Honza
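A sketch of the guard Honza is proposing for the size-optimized path of the cmpstrn expander.  This shows only the shape of the condition; the real expander has additional checks, and (per the follow-up below) the library call actually wins on size, so treat this as illustrative rather than final code:

/* Bail out to the strncmp library call only at the balanced size level;
   expand inline when optimizing for speed, when the user passed
   -minline-all-stringops, or at OPTIMIZE_SIZE_MAX.  */
if (optimize_insn_for_size_p ()                       /* any size level */
    && !TARGET_INLINE_ALL_STRINGOPS
    && optimize_insn_for_size_p () != OPTIMIZE_SIZE_MAX)
  return false;  /* balanced -Os: prefer the library call */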
On Mon, Oct 26, 2020 at 7:36 AM Jan Hubicka <hubicka@ucw.cz> wrote:
> > [...]
>
> For example, you had a patch that limited "rep cmpsb" expansion to
> -minline-all-stringops.  Now the condition could be
> -minline-all-stringops || optimize_insn_for_size_p () == OPTIMIZE_SIZE_MAX,
> since it is still a useful size optimization.
>
> I am not sure if you had other changes of this nature?  (It is a bit hard to
> grep the compiler for things like this, and I would like to get them
> organized into optimize_size levels now.)

Shouldn't it apply to all functions inlined by -minline-all-stringops?
> > For example, you had a patch that limited "rep cmpsb" expansion to
> > -minline-all-stringops.  Now the condition could be
> > -minline-all-stringops || optimize_insn_for_size_p () == OPTIMIZE_SIZE_MAX,
> > since it is still a useful size optimization.
> >
> > I am not sure if you had other changes of this nature?  (It is a bit hard
> > to grep the compiler for things like this, and I would like to get them
> > organized into optimize_size levels now.)
>
> Shouldn't it apply to all functions inlined by -minline-all-stringops?

I think we handle the other cases: for code optimized for size we go for the
ix86_size_memcpy and ix86_size_memset tables, which say to inline everything
with rep movsb.  We do not inline strlen, since the way it is implemented it
gets too long (a short inline version would be welcome).

I will look through the backend, but if you are aware of more checks like the
one in ix86_expand_cmpstrn_or_cmpmem which disable size optimization even at
-Os, let me know.  They are not that easy to find...

Honza
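For reference, the -Os stringop strategy tables Honza mentions look roughly like this in the i386 backend (quoted from memory, so field details may differ slightly): a single rep-prefixed one-byte algorithm with no size limit, i.e. inline every memcpy/memset with rep movsb/stosb:

static stringop_algs ix86_size_memcpy[2] = {
  {rep_prefix_1_byte, {{-1, rep_prefix_1_byte, false}}},
  {rep_prefix_1_byte, {{-1, rep_prefix_1_byte, false}}}};
static stringop_algs ix86_size_memset[2] = {
  {rep_prefix_1_byte, {{-1, rep_prefix_1_byte, false}}},
  {rep_prefix_1_byte, {{-1, rep_prefix_1_byte, false}}}};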
On Mon, Oct 26, 2020 at 10:14 AM Jan Hubicka <hubicka@ucw.cz> wrote:
>
> I think we handle the other cases: for code optimized for size we go for the
> ix86_size_memcpy and ix86_size_memset tables, which say to inline everything
> with rep movsb.  We do not inline strlen, since the way it is implemented it
> gets too long (a short inline version would be welcome).
>
> I will look through the backend, but if you are aware of more checks like
> the one in ix86_expand_cmpstrn_or_cmpmem which disable size optimization
> even at -Os, let me know.  They are not that easy to find...

[hjl@gnu-cfl-2 gcc]$ cat /tmp/x.c
int
func (char *d, unsigned int l)
{
  return __builtin_strncmp (d, "foo", l) ? 1 : 2;
}
[hjl@gnu-cfl-2 gcc]$ gcc -c -Os /tmp/x.c
[hjl@gnu-cfl-2 gcc]$ nm x.o
0000000000000000 T func
                 U strncmp
[hjl@gnu-cfl-2 gcc]$ size x.o
   text    data     bss     dec     hex filename
    138       0       0     138      8a x.o
[hjl@gnu-cfl-2 gcc]$ gcc -c -O2 /tmp/x.c
[hjl@gnu-cfl-2 gcc]$ size x.o
   text    data     bss     dec     hex filename
    146       0       0     146      92 x.o
[hjl@gnu-cfl-2 gcc]$ nm x.o
0000000000000000 T func
[hjl@gnu-cfl-2 gcc]$

-Os shouldn't inline strncmp.
> On Mon, Oct 26, 2020 at 10:14 AM Jan Hubicka <hubicka@ucw.cz> wrote:
> > [...]
> > I will look through the backend, but if you are aware of more checks like
> > the one in ix86_expand_cmpstrn_or_cmpmem which disable size optimization
> > even at -Os, let me know.  They are not that easy to find...
>
> [test case showing that -Os emits a call to strncmp (138 bytes) while -O2
> inlines it (146 bytes)]
>
> -Os shouldn't inline strncmp.

Interesting, I would expect cmpsb to still win.  Well, this makes it easier.
Sorry for delaying the patches - somehow I got them connected with the -Os
refactoring.

Honza
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 65e4646efcd..fb3ad95e064 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -1279,7 +1279,7 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : public symtab_node
   bool check_calls_comdat_local_p ();
 
   /* Return true if function should be optimized for size.  */
-  bool optimize_for_size_p (void);
+  enum optimize_size_level optimize_for_size_p (void);
 
   /* Dump the callgraph to file F.  */
   static void dump_cgraph (FILE *f);
@@ -3315,15 +3315,17 @@ cgraph_node::mark_force_output (void)
 
 /* Return true if function should be optimized for size.  */
 
-inline bool
+inline enum optimize_size_level
 cgraph_node::optimize_for_size_p (void)
 {
   if (opt_for_fn (decl, optimize_size))
-    return true;
+    return OPTIMIZE_SIZE_MAX;
+  if (count == profile_count::zero ())
+    return OPTIMIZE_SIZE_MAX;
   if (frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED)
-    return true;
+    return OPTIMIZE_SIZE_BALANCED;
   else
-    return false;
+    return OPTIMIZE_SIZE_NO;
 }
 
 /* Return symtab_node for NODE or create one if it is not present
diff --git a/gcc/coretypes.h b/gcc/coretypes.h
index 81a1b594dcd..da178b6a9f6 100644
--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -444,6 +444,18 @@ enum excess_precision_type
   EXCESS_PRECISION_TYPE_FAST
 };
 
+/* Level of size optimization.  */
+
+enum optimize_size_level
+{
+  /* Do not optimize for size.  */
+  OPTIMIZE_SIZE_NO,
+  /* Optimize for size but not at extreme performance costs.  */
+  OPTIMIZE_SIZE_BALANCED,
+  /* Optimize for size as much as possible.  */
+  OPTIMIZE_SIZE_MAX
+};
+
 /* Support for user-provided GGC and PCH markers.  The first parameter
    is a pointer to a pointer, the second a cookie.  */
 typedef void (*gt_pointer_operator) (void *, void *);
diff --git a/gcc/predict.c b/gcc/predict.c
index 5983889209f..361c4019eec 100644
--- a/gcc/predict.c
+++ b/gcc/predict.c
@@ -243,7 +243,7 @@ probably_never_executed_bb_p (struct function *fun, const_basic_block bb)
 static bool
 unlikely_executed_edge_p (edge e)
 {
-  return (e->count () == profile_count::zero ()
+  return (e->src->count == profile_count::zero ()
	  || e->probability == profile_probability::never ())
	 || (e->flags & (EDGE_EH | EDGE_FAKE));
 }
@@ -260,13 +260,15 @@ probably_never_executed_edge_p (struct function *fun, edge e)
 
 /* Return true if function FUN should always be optimized for size.  */
 
-bool
+optimize_size_level
 optimize_function_for_size_p (struct function *fun)
 {
   if (!fun || !fun->decl)
-    return optimize_size;
+    return optimize_size ? OPTIMIZE_SIZE_MAX : OPTIMIZE_SIZE_NO;
   cgraph_node *n = cgraph_node::get (fun->decl);
-  return n && n->optimize_for_size_p ();
+  if (n)
+    return n->optimize_for_size_p ();
+  return OPTIMIZE_SIZE_NO;
 }
 
 /* Return true if function FUN should always be optimized for speed.  */
@@ -289,11 +291,16 @@ function_optimization_type (struct function *fun)
 
 /* Return TRUE if basic block BB should be optimized for size.  */
 
-bool
+optimize_size_level
 optimize_bb_for_size_p (const_basic_block bb)
 {
-  return (optimize_function_for_size_p (cfun)
-	  || (bb && !maybe_hot_bb_p (cfun, bb)));
+  enum optimize_size_level ret = optimize_function_for_size_p (cfun);
+
+  if (bb && ret < OPTIMIZE_SIZE_MAX && bb->count == profile_count::zero ())
+    ret = OPTIMIZE_SIZE_MAX;
+  if (bb && ret < OPTIMIZE_SIZE_BALANCED && !maybe_hot_bb_p (cfun, bb))
+    ret = OPTIMIZE_SIZE_BALANCED;
+  return ret;
 }
 
 /* Return TRUE if basic block BB should be optimized for speed.  */
@@ -316,10 +323,16 @@ bb_optimization_type (const_basic_block bb)
 
 /* Return TRUE if edge E should be optimized for size.  */
 
-bool
+optimize_size_level
 optimize_edge_for_size_p (edge e)
 {
-  return optimize_function_for_size_p (cfun) || !maybe_hot_edge_p (e);
+  enum optimize_size_level ret = optimize_function_for_size_p (cfun);
+
+  if (ret < OPTIMIZE_SIZE_MAX && unlikely_executed_edge_p (e))
+    ret = OPTIMIZE_SIZE_MAX;
+  if (ret < OPTIMIZE_SIZE_BALANCED && !maybe_hot_edge_p (e))
+    ret = OPTIMIZE_SIZE_BALANCED;
+  return ret;
 }
 
 /* Return TRUE if edge E should be optimized for speed.  */
@@ -332,10 +345,13 @@ optimize_edge_for_speed_p (edge e)
 
 /* Return TRUE if the current function is optimized for size.  */
 
-bool
+optimize_size_level
 optimize_insn_for_size_p (void)
 {
-  return optimize_function_for_size_p (cfun) || !crtl->maybe_hot_insn_p;
+  enum optimize_size_level ret = optimize_function_for_size_p (cfun);
+  if (ret < OPTIMIZE_SIZE_BALANCED && !crtl->maybe_hot_insn_p)
+    ret = OPTIMIZE_SIZE_BALANCED;
+  return ret;
 }
 
 /* Return TRUE if the current function is optimized for speed.  */
@@ -348,7 +364,7 @@ optimize_insn_for_speed_p (void)
 
 /* Return TRUE if LOOP should be optimized for size.  */
 
-bool
+optimize_size_level
 optimize_loop_for_size_p (class loop *loop)
 {
   return optimize_bb_for_size_p (loop->header);
@@ -392,10 +408,31 @@ optimize_loop_nest_for_speed_p (class loop *loop)
 
 /* Return TRUE if nest rooted at LOOP should be optimized for size.  */
 
-bool
+optimize_size_level
 optimize_loop_nest_for_size_p (class loop *loop)
 {
-  return !optimize_loop_nest_for_speed_p (loop);
+  enum optimize_size_level ret = optimize_loop_for_size_p (loop);
+  class loop *l = loop;
+
+  l = loop->inner;
+  while (l && l != loop)
+    {
+      if (ret == OPTIMIZE_SIZE_NO)
+	break;
+      ret = MIN (optimize_loop_for_size_p (l), ret);
+      if (l->inner)
+	l = l->inner;
+      else if (l->next)
+	l = l->next;
+      else
+	{
+	  while (l != loop && !l->next)
+	    l = loop_outer (l);
+	  if (l != loop)
+	    l = l->next;
+	}
+    }
+  return ret;
 }
 
 /* Return true if edge E is likely to be well predictable by branch
diff --git a/gcc/predict.h b/gcc/predict.h
index 274597e002f..b64d2098ab0 100644
--- a/gcc/predict.h
+++ b/gcc/predict.h
@@ -58,20 +58,20 @@ extern bool maybe_hot_bb_p (struct function *, const_basic_block);
 extern bool maybe_hot_edge_p (edge);
 extern bool probably_never_executed_bb_p (struct function *, const_basic_block);
 extern bool probably_never_executed_edge_p (struct function *, edge);
-extern bool optimize_function_for_size_p (struct function *);
+extern enum optimize_size_level optimize_function_for_size_p (struct function *);
 extern bool optimize_function_for_speed_p (struct function *);
 extern optimization_type function_optimization_type (struct function *);
-extern bool optimize_bb_for_size_p (const_basic_block);
+extern enum optimize_size_level optimize_bb_for_size_p (const_basic_block);
 extern bool optimize_bb_for_speed_p (const_basic_block);
 extern optimization_type bb_optimization_type (const_basic_block);
-extern bool optimize_edge_for_size_p (edge);
+extern enum optimize_size_level optimize_edge_for_size_p (edge);
 extern bool optimize_edge_for_speed_p (edge);
-extern bool optimize_insn_for_size_p (void);
+extern enum optimize_size_level optimize_insn_for_size_p (void);
 extern bool optimize_insn_for_speed_p (void);
-extern bool optimize_loop_for_size_p (class loop *);
+extern enum optimize_size_level optimize_loop_for_size_p (class loop *);
 extern bool optimize_loop_for_speed_p (class loop *);
 extern bool optimize_loop_nest_for_speed_p (class loop *);
-extern bool optimize_loop_nest_for_size_p (class loop *);
+extern enum optimize_size_level optimize_loop_nest_for_size_p (class loop *);
 extern bool predictable_edge_p (edge);
 extern void rtl_profile_for_bb (basic_block);
 extern void rtl_profile_for_edge (edge);
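One property of the new interface worth noting (an observation from the enum definition above, not stated in the patch): because OPTIMIZE_SIZE_NO is the first enumerator and therefore 0, existing call sites that treat these predicates as booleans keep their old meaning, while new call sites can compare against a specific level.  A sketch, with the two do_*_tweak helpers being hypothetical names for illustration:

/* Old boolean-style call sites still behave as before, since any
   nonzero level converts to true.  */
if (optimize_function_for_size_p (cfun))
  do_generic_size_tweak ();   /* runs for BALANCED and MAX alike */

/* New call sites can require the extreme level explicitly.  */
if (optimize_insn_for_size_p () == OPTIMIZE_SIZE_MAX)
  do_extreme_size_tweak ();   /* only -Os or proven never-executed code */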