Message ID | CAO2gOZW3v87az_UJ2LTmnGaRnsdbBKNxEak3YzC=ByhgBJTgug@mail.gmail.com |
---|---|
State | New |
Headers | show |
auto profile info is not available yet in early inlining, why would this change make any difference? Can you just reset the max_iters to a higher value for autoFDO? David On Sun, Jun 2, 2013 at 6:21 PM, Dehao Chen <dehao@google.com> wrote: > The patch was committed to google-4_8, but it causes problem because > einline sets PARAM_EARLY_INLINING_INSNS = 11. This will cause > recursive inlining at einline stage (e.g. main->foo, foo->bar, > bar->foo) when autofdo is enabled. > > The following patch can fix the problem by doing more targetted early inlining: > > Index: gcc/predict.c > =================================================================== > --- gcc/predict.c (revision 199593) > +++ gcc/predict.c (working copy) > @@ -175,6 +175,8 @@ cgraph_maybe_hot_edge_p (struct cgraph_edge *edge) > && !maybe_hot_count_p (NULL, > edge->count)) > return false; > + if (flag_auto_profile) > + return false; > if (edge->caller->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED > || (edge->callee > && edge->callee->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED)) > > Performance testing on-going... > > Dehao > > On Wed, May 29, 2013 at 3:44 PM, Dehao Chen <dehao@google.com> wrote: >> OK, I'll commit the early inline part. >> >> Dehao >> >> On Wed, May 29, 2013 at 10:00 AM, Xinliang David Li <davidxl@google.com> wrote: >>> The early inlining part is ok. The tracer optimization should be >>> revisited -- we should have more fine grain control on it (for >>> instance, based on FDO summary -- but that should be common to >>> FDO/LIPO). >>> >>> David >>> >>> On Wed, May 29, 2013 at 9:39 AM, Dehao Chen <dehao@google.com> wrote: >>>> In gcc4-8, the max einline iterations are restricted to 1. For >>>> AutoFDO, this is bad because early inline is not size restricted. This >>>> patch allows einline to do multiple iterations in AutoFDO. It also >>>> enables tracer optimization in AutoFDO. >>>> >>>> Bootstrapped and passed regression test. >>>> >>>> OK for googel-4_8? >>>> >>>> Thanks, >>>> Dehao >>>> >>>> Index: gcc/ipa-inline.c >>>> =================================================================== >>>> --- gcc/ipa-inline.c (revision 199416) >>>> +++ gcc/ipa-inline.c (working copy) >>>> @@ -2161,7 +2161,8 @@ early_inliner (void) >>>> { >>>> /* We iterate incremental inlining to get trivial cases of indirect >>>> inlining. */ >>>> - while (iterations < PARAM_VALUE (PARAM_EARLY_INLINER_MAX_ITERATIONS) >>>> + while ((flag_auto_profile >>>> + || iterations < PARAM_VALUE (PARAM_EARLY_INLINER_MAX_ITERATIONS)) >>>> && early_inline_small_functions (node)) >>>> { >>>> timevar_push (TV_INTEGRATION); >>>> Index: gcc/opts.c >>>> =================================================================== >>>> --- gcc/opts.c (revision 199416) >>>> +++ gcc/opts.c (working copy) >>>> @@ -1644,6 +1644,8 @@ common_handle_option (struct gcc_options *opts, >>>> opts->x_flag_peel_loops = value; >>>> if (!opts_set->x_flag_value_profile_transformations) >>>> opts->x_flag_value_profile_transformations = value; >>>> + if (!opts_set->x_flag_tracer) >>>> + opts->x_flag_tracer = value; >>>> if (!opts_set->x_flag_inline_functions) >>>> opts->x_flag_inline_functions = value; >>>> if (!opts_set->x_flag_ipa_cp)
On Sun, Jun 2, 2013 at 7:14 PM, Xinliang David Li <davidxl@google.com> wrote: > > auto profile info is not available yet in early inlining, why would > this change make any difference? Because the check of PARAM_EARLY_INLINING_INSNS is after the check of cgraph_maybe_hot_edge_p in early inline. If cgraph_maybe_hot_edge_p fails, the early inline will not happen even if growth is less than PARAM_EARLY_INLINING_INSNS. > > Can you just reset the max_iters to a > higher value for autoFDO? We could do that, but it could still lead to some code bloat because recursive inlines can happen for at most, say 10, iterations. Dehao > > David > > On Sun, Jun 2, 2013 at 6:21 PM, Dehao Chen <dehao@google.com> wrote: > > The patch was committed to google-4_8, but it causes problem because > > einline sets PARAM_EARLY_INLINING_INSNS = 11. This will cause > > recursive inlining at einline stage (e.g. main->foo, foo->bar, > > bar->foo) when autofdo is enabled. > > > > The following patch can fix the problem by doing more targetted early inlining: > > > > Index: gcc/predict.c > > =================================================================== > > --- gcc/predict.c (revision 199593) > > +++ gcc/predict.c (working copy) > > @@ -175,6 +175,8 @@ cgraph_maybe_hot_edge_p (struct cgraph_edge *edge) > > && !maybe_hot_count_p (NULL, > > edge->count)) > > return false; > > + if (flag_auto_profile) > > + return false; > > if (edge->caller->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED > > || (edge->callee > > && edge->callee->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED)) > > > > Performance testing on-going... > > > > Dehao > > > > On Wed, May 29, 2013 at 3:44 PM, Dehao Chen <dehao@google.com> wrote: > >> OK, I'll commit the early inline part. > >> > >> Dehao > >> > >> On Wed, May 29, 2013 at 10:00 AM, Xinliang David Li <davidxl@google.com> wrote: > >>> The early inlining part is ok. The tracer optimization should be > >>> revisited -- we should have more fine grain control on it (for > >>> instance, based on FDO summary -- but that should be common to > >>> FDO/LIPO). > >>> > >>> David > >>> > >>> On Wed, May 29, 2013 at 9:39 AM, Dehao Chen <dehao@google.com> wrote: > >>>> In gcc4-8, the max einline iterations are restricted to 1. For > >>>> AutoFDO, this is bad because early inline is not size restricted. This > >>>> patch allows einline to do multiple iterations in AutoFDO. It also > >>>> enables tracer optimization in AutoFDO. > >>>> > >>>> Bootstrapped and passed regression test. > >>>> > >>>> OK for googel-4_8? > >>>> > >>>> Thanks, > >>>> Dehao > >>>> > >>>> Index: gcc/ipa-inline.c > >>>> =================================================================== > >>>> --- gcc/ipa-inline.c (revision 199416) > >>>> +++ gcc/ipa-inline.c (working copy) > >>>> @@ -2161,7 +2161,8 @@ early_inliner (void) > >>>> { > >>>> /* We iterate incremental inlining to get trivial cases of indirect > >>>> inlining. */ > >>>> - while (iterations < PARAM_VALUE (PARAM_EARLY_INLINER_MAX_ITERATIONS) > >>>> + while ((flag_auto_profile > >>>> + || iterations < PARAM_VALUE (PARAM_EARLY_INLINER_MAX_ITERATIONS)) > >>>> && early_inline_small_functions (node)) > >>>> { > >>>> timevar_push (TV_INTEGRATION); > >>>> Index: gcc/opts.c > >>>> =================================================================== > >>>> --- gcc/opts.c (revision 199416) > >>>> +++ gcc/opts.c (working copy) > >>>> @@ -1644,6 +1644,8 @@ common_handle_option (struct gcc_options *opts, > >>>> opts->x_flag_peel_loops = value; > >>>> if (!opts_set->x_flag_value_profile_transformations) > >>>> opts->x_flag_value_profile_transformations = value; > >>>> + if (!opts_set->x_flag_tracer) > >>>> + opts->x_flag_tracer = value; > >>>> if (!opts_set->x_flag_inline_functions) > >>>> opts->x_flag_inline_functions = value; > >>>> if (!opts_set->x_flag_ipa_cp)
If the purpose of the fix is to filter early inlinings with code growth in autoFDO, the proposed fix is the wrong way to do -- it changes the meaning of cgraph_maybe_hot_edge_p. David On Sun, Jun 2, 2013 at 7:25 PM, Dehao Chen <dehao@google.com> wrote: > On Sun, Jun 2, 2013 at 7:14 PM, Xinliang David Li <davidxl@google.com> wrote: >> >> auto profile info is not available yet in early inlining, why would >> this change make any difference? > > Because the check of PARAM_EARLY_INLINING_INSNS is after the check of > cgraph_maybe_hot_edge_p in early inline. If > cgraph_maybe_hot_edge_p fails, the early inline will not happen even > if growth is less than PARAM_EARLY_INLINING_INSNS. > >> >> Can you just reset the max_iters to a >> higher value for autoFDO? > > We could do that, but it could still lead to some code bloat because > recursive inlines can happen for at most, say 10, iterations. > > Dehao > >> >> David >> >> On Sun, Jun 2, 2013 at 6:21 PM, Dehao Chen <dehao@google.com> wrote: >> > The patch was committed to google-4_8, but it causes problem because >> > einline sets PARAM_EARLY_INLINING_INSNS = 11. This will cause >> > recursive inlining at einline stage (e.g. main->foo, foo->bar, >> > bar->foo) when autofdo is enabled. >> > >> > The following patch can fix the problem by doing more targetted early inlining: >> > >> > Index: gcc/predict.c >> > =================================================================== >> > --- gcc/predict.c (revision 199593) >> > +++ gcc/predict.c (working copy) >> > @@ -175,6 +175,8 @@ cgraph_maybe_hot_edge_p (struct cgraph_edge *edge) >> > && !maybe_hot_count_p (NULL, >> > edge->count)) >> > return false; >> > + if (flag_auto_profile) >> > + return false; >> > if (edge->caller->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED >> > || (edge->callee >> > && edge->callee->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED)) >> > >> > Performance testing on-going... >> > >> > Dehao >> > >> > On Wed, May 29, 2013 at 3:44 PM, Dehao Chen <dehao@google.com> wrote: >> >> OK, I'll commit the early inline part. >> >> >> >> Dehao >> >> >> >> On Wed, May 29, 2013 at 10:00 AM, Xinliang David Li <davidxl@google.com> wrote: >> >>> The early inlining part is ok. The tracer optimization should be >> >>> revisited -- we should have more fine grain control on it (for >> >>> instance, based on FDO summary -- but that should be common to >> >>> FDO/LIPO). >> >>> >> >>> David >> >>> >> >>> On Wed, May 29, 2013 at 9:39 AM, Dehao Chen <dehao@google.com> wrote: >> >>>> In gcc4-8, the max einline iterations are restricted to 1. For >> >>>> AutoFDO, this is bad because early inline is not size restricted. This >> >>>> patch allows einline to do multiple iterations in AutoFDO. It also >> >>>> enables tracer optimization in AutoFDO. >> >>>> >> >>>> Bootstrapped and passed regression test. >> >>>> >> >>>> OK for googel-4_8? >> >>>> >> >>>> Thanks, >> >>>> Dehao >> >>>> >> >>>> Index: gcc/ipa-inline.c >> >>>> =================================================================== >> >>>> --- gcc/ipa-inline.c (revision 199416) >> >>>> +++ gcc/ipa-inline.c (working copy) >> >>>> @@ -2161,7 +2161,8 @@ early_inliner (void) >> >>>> { >> >>>> /* We iterate incremental inlining to get trivial cases of indirect >> >>>> inlining. */ >> >>>> - while (iterations < PARAM_VALUE (PARAM_EARLY_INLINER_MAX_ITERATIONS) >> >>>> + while ((flag_auto_profile >> >>>> + || iterations < PARAM_VALUE (PARAM_EARLY_INLINER_MAX_ITERATIONS)) >> >>>> && early_inline_small_functions (node)) >> >>>> { >> >>>> timevar_push (TV_INTEGRATION); >> >>>> Index: gcc/opts.c >> >>>> =================================================================== >> >>>> --- gcc/opts.c (revision 199416) >> >>>> +++ gcc/opts.c (working copy) >> >>>> @@ -1644,6 +1644,8 @@ common_handle_option (struct gcc_options *opts, >> >>>> opts->x_flag_peel_loops = value; >> >>>> if (!opts_set->x_flag_value_profile_transformations) >> >>>> opts->x_flag_value_profile_transformations = value; >> >>>> + if (!opts_set->x_flag_tracer) >> >>>> + opts->x_flag_tracer = value; >> >>>> if (!opts_set->x_flag_inline_functions) >> >>>> opts->x_flag_inline_functions = value; >> >>>> if (!opts_set->x_flag_ipa_cp)
Index: gcc/predict.c =================================================================== --- gcc/predict.c (revision 199593) +++ gcc/predict.c (working copy) @@ -175,6 +175,8 @@ cgraph_maybe_hot_edge_p (struct cgraph_edge *edge) && !maybe_hot_count_p (NULL, edge->count)) return false; + if (flag_auto_profile) + return false; if (edge->caller->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED || (edge->callee && edge->callee->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED))