Message ID | CAO2gOZV8=MEHb_Vz6+ZNrUGe0dsyfG3F1hYd+mr2gyFHSCeG1Q@mail.gmail.com |
---|---|
State | New |
Headers | show |
Is there a need to have 10 iterations of early inline for autofdo? David On Thu, Nov 13, 2014 at 2:25 PM, Dehao Chen <dehao@google.com> wrote: > In AutoFDO, we increase einline iterations. This could lead to > extensive code bloat if we have recursive calls like: > > dtor() { > destroy(node); > } > > destroy(node) { > destroy(left) > destroy(right) > } > > In this case, the size growth will be around 8 which is smaller than > threshold (11). However, if we allow this to happen for 2 iterations, > it will expand the size by 1024X. To fix this problem, we want to set > a much smaller threshold in the AutoFDO case. This is because AutoFDO > do not not rely on aggressive einline to gain more profile context. > > And also, in AutoFDO pass, after we processed a function, we need to > recompute inline parameters because rebuild_cgraph_edges will zero out > all inline parameters. > > The patch is attached below, bootstrapped and perf test on-going. OK > for google-4_9? > > Thanks, > Dehao > > Index: gcc/auto-profile.c > =================================================================== > --- gcc/auto-profile.c (revision 217523) > +++ gcc/auto-profile.c (working copy) > @@ -1771,6 +1771,7 @@ auto_profile (void) > free_dominance_info (CDI_DOMINATORS); > free_dominance_info (CDI_POST_DOMINATORS); > rebuild_cgraph_edges (); > + compute_inline_parameters (cgraph_get_node > (current_function_decl), true); > pop_cfun (); > } > > Index: gcc/opts.c > =================================================================== > --- gcc/opts.c (revision 217523) > +++ gcc/opts.c (working copy) > @@ -1853,6 +1853,12 @@ common_handle_option (struct gcc_options *opts, > maybe_set_param_value ( > PARAM_EARLY_INLINER_MAX_ITERATIONS, 10, > opts->x_param_values, opts_set->x_param_values); > + maybe_set_param_value ( > + PARAM_EARLY_INLINING_INSNS, 4, > + opts->x_param_values, opts_set->x_param_values); > + maybe_set_param_value ( > + PARAM_EARLY_INLINING_INSNS_COMDAT, 4, > + opts->x_param_values, opts_set->x_param_values); > value = true; > /* No break here - do -fauto-profile processing. */ > case OPT_fauto_profile:
After inline summary is recomputed, the large code growth problem will also be better controlled, right? David On Thu, Nov 13, 2014 at 2:48 PM, Xinliang David Li <davidxl@google.com> wrote: > Is there a need to have 10 iterations of early inline for autofdo? > > David > > On Thu, Nov 13, 2014 at 2:25 PM, Dehao Chen <dehao@google.com> wrote: >> In AutoFDO, we increase einline iterations. This could lead to >> extensive code bloat if we have recursive calls like: >> >> dtor() { >> destroy(node); >> } >> >> destroy(node) { >> destroy(left) >> destroy(right) >> } >> >> In this case, the size growth will be around 8 which is smaller than >> threshold (11). However, if we allow this to happen for 2 iterations, >> it will expand the size by 1024X. To fix this problem, we want to set >> a much smaller threshold in the AutoFDO case. This is because AutoFDO >> do not not rely on aggressive einline to gain more profile context. >> >> And also, in AutoFDO pass, after we processed a function, we need to >> recompute inline parameters because rebuild_cgraph_edges will zero out >> all inline parameters. >> >> The patch is attached below, bootstrapped and perf test on-going. OK >> for google-4_9? >> >> Thanks, >> Dehao >> >> Index: gcc/auto-profile.c >> =================================================================== >> --- gcc/auto-profile.c (revision 217523) >> +++ gcc/auto-profile.c (working copy) >> @@ -1771,6 +1771,7 @@ auto_profile (void) >> free_dominance_info (CDI_DOMINATORS); >> free_dominance_info (CDI_POST_DOMINATORS); >> rebuild_cgraph_edges (); >> + compute_inline_parameters (cgraph_get_node >> (current_function_decl), true); >> pop_cfun (); >> } >> >> Index: gcc/opts.c >> =================================================================== >> --- gcc/opts.c (revision 217523) >> +++ gcc/opts.c (working copy) >> @@ -1853,6 +1853,12 @@ common_handle_option (struct gcc_options *opts, >> maybe_set_param_value ( >> PARAM_EARLY_INLINER_MAX_ITERATIONS, 10, >> opts->x_param_values, opts_set->x_param_values); >> + maybe_set_param_value ( >> + PARAM_EARLY_INLINING_INSNS, 4, >> + opts->x_param_values, opts_set->x_param_values); >> + maybe_set_param_value ( >> + PARAM_EARLY_INLINING_INSNS_COMDAT, 4, >> + opts->x_param_values, opts_set->x_param_values); >> value = true; >> /* No break here - do -fauto-profile processing. */ >> case OPT_fauto_profile:
IIRC, AutoFDO the actual iteration for AutoFDO is mostly <3. But it should not harm to set max iter as 10. On Thu, Nov 13, 2014 at 2:51 PM, Xinliang David Li <davidxl@google.com> wrote: > After inline summary is recomputed, the large code growth problem will > also be better controlled, right? For this case, recomputing inline summary does not help because the code was bloated in first einline phase. Dehao > > David > > On Thu, Nov 13, 2014 at 2:48 PM, Xinliang David Li <davidxl@google.com> wrote: >> Is there a need to have 10 iterations of early inline for autofdo? >> >> David >> >> On Thu, Nov 13, 2014 at 2:25 PM, Dehao Chen <dehao@google.com> wrote: >>> In AutoFDO, we increase einline iterations. This could lead to >>> extensive code bloat if we have recursive calls like: >>> >>> dtor() { >>> destroy(node); >>> } >>> >>> destroy(node) { >>> destroy(left) >>> destroy(right) >>> } >>> >>> In this case, the size growth will be around 8 which is smaller than >>> threshold (11). However, if we allow this to happen for 2 iterations, >>> it will expand the size by 1024X. To fix this problem, we want to set >>> a much smaller threshold in the AutoFDO case. This is because AutoFDO >>> do not not rely on aggressive einline to gain more profile context. >>> >>> And also, in AutoFDO pass, after we processed a function, we need to >>> recompute inline parameters because rebuild_cgraph_edges will zero out >>> all inline parameters. >>> >>> The patch is attached below, bootstrapped and perf test on-going. OK >>> for google-4_9? >>> >>> Thanks, >>> Dehao >>> >>> Index: gcc/auto-profile.c >>> =================================================================== >>> --- gcc/auto-profile.c (revision 217523) >>> +++ gcc/auto-profile.c (working copy) >>> @@ -1771,6 +1771,7 @@ auto_profile (void) >>> free_dominance_info (CDI_DOMINATORS); >>> free_dominance_info (CDI_POST_DOMINATORS); >>> rebuild_cgraph_edges (); >>> + compute_inline_parameters (cgraph_get_node >>> (current_function_decl), true); >>> pop_cfun (); >>> } >>> >>> Index: gcc/opts.c >>> =================================================================== >>> --- gcc/opts.c (revision 217523) >>> +++ gcc/opts.c (working copy) >>> @@ -1853,6 +1853,12 @@ common_handle_option (struct gcc_options *opts, >>> maybe_set_param_value ( >>> PARAM_EARLY_INLINER_MAX_ITERATIONS, 10, >>> opts->x_param_values, opts_set->x_param_values); >>> + maybe_set_param_value ( >>> + PARAM_EARLY_INLINING_INSNS, 4, >>> + opts->x_param_values, opts_set->x_param_values); >>> + maybe_set_param_value ( >>> + PARAM_EARLY_INLINING_INSNS_COMDAT, 4, >>> + opts->x_param_values, opts_set->x_param_values); >>> value = true; >>> /* No break here - do -fauto-profile processing. */ >>> case OPT_fauto_profile:
On Thu, Nov 13, 2014 at 2:57 PM, Dehao Chen <dehao@google.com> wrote: > IIRC, AutoFDO the actual iteration for AutoFDO is mostly <3. But it > should not harm to set max iter as 10. > > On Thu, Nov 13, 2014 at 2:51 PM, Xinliang David Li <davidxl@google.com> wrote: >> After inline summary is recomputed, the large code growth problem will >> also be better controlled, right? > > For this case, recomputing inline summary does not help because the > code was bloated in first einline phase. For recursive inlining, the inline summary for the cloned edges need to be updated to prevent the growth? david > > Dehao > >> >> David >> >> On Thu, Nov 13, 2014 at 2:48 PM, Xinliang David Li <davidxl@google.com> wrote: >>> Is there a need to have 10 iterations of early inline for autofdo? >>> >>> David >>> >>> On Thu, Nov 13, 2014 at 2:25 PM, Dehao Chen <dehao@google.com> wrote: >>>> In AutoFDO, we increase einline iterations. This could lead to >>>> extensive code bloat if we have recursive calls like: >>>> >>>> dtor() { >>>> destroy(node); >>>> } >>>> >>>> destroy(node) { >>>> destroy(left) >>>> destroy(right) >>>> } >>>> >>>> In this case, the size growth will be around 8 which is smaller than >>>> threshold (11). However, if we allow this to happen for 2 iterations, >>>> it will expand the size by 1024X. To fix this problem, we want to set >>>> a much smaller threshold in the AutoFDO case. This is because AutoFDO >>>> do not not rely on aggressive einline to gain more profile context. >>>> >>>> And also, in AutoFDO pass, after we processed a function, we need to >>>> recompute inline parameters because rebuild_cgraph_edges will zero out >>>> all inline parameters. >>>> >>>> The patch is attached below, bootstrapped and perf test on-going. OK >>>> for google-4_9? >>>> >>>> Thanks, >>>> Dehao >>>> >>>> Index: gcc/auto-profile.c >>>> =================================================================== >>>> --- gcc/auto-profile.c (revision 217523) >>>> +++ gcc/auto-profile.c (working copy) >>>> @@ -1771,6 +1771,7 @@ auto_profile (void) >>>> free_dominance_info (CDI_DOMINATORS); >>>> free_dominance_info (CDI_POST_DOMINATORS); >>>> rebuild_cgraph_edges (); >>>> + compute_inline_parameters (cgraph_get_node >>>> (current_function_decl), true); >>>> pop_cfun (); >>>> } >>>> >>>> Index: gcc/opts.c >>>> =================================================================== >>>> --- gcc/opts.c (revision 217523) >>>> +++ gcc/opts.c (working copy) >>>> @@ -1853,6 +1853,12 @@ common_handle_option (struct gcc_options *opts, >>>> maybe_set_param_value ( >>>> PARAM_EARLY_INLINER_MAX_ITERATIONS, 10, >>>> opts->x_param_values, opts_set->x_param_values); >>>> + maybe_set_param_value ( >>>> + PARAM_EARLY_INLINING_INSNS, 4, >>>> + opts->x_param_values, opts_set->x_param_values); >>>> + maybe_set_param_value ( >>>> + PARAM_EARLY_INLINING_INSNS_COMDAT, 4, >>>> + opts->x_param_values, opts_set->x_param_values); >>>> value = true; >>>> /* No break here - do -fauto-profile processing. */ >>>> case OPT_fauto_profile:
We do not do sophisticated recursive call detection in einline phase. It only happens in ipa-inline phase. Dehao On Thu, Nov 13, 2014 at 3:18 PM, Xinliang David Li <davidxl@google.com> wrote: > On Thu, Nov 13, 2014 at 2:57 PM, Dehao Chen <dehao@google.com> wrote: >> IIRC, AutoFDO the actual iteration for AutoFDO is mostly <3. But it >> should not harm to set max iter as 10. >> >> On Thu, Nov 13, 2014 at 2:51 PM, Xinliang David Li <davidxl@google.com> wrote: >>> After inline summary is recomputed, the large code growth problem will >>> also be better controlled, right? >> >> For this case, recomputing inline summary does not help because the >> code was bloated in first einline phase. > > For recursive inlining, the inline summary for the cloned edges need > to be updated to prevent the growth? > > david > >> >> Dehao >> >>> >>> David >>> >>> On Thu, Nov 13, 2014 at 2:48 PM, Xinliang David Li <davidxl@google.com> wrote: >>>> Is there a need to have 10 iterations of early inline for autofdo? >>>> >>>> David >>>> >>>> On Thu, Nov 13, 2014 at 2:25 PM, Dehao Chen <dehao@google.com> wrote: >>>>> In AutoFDO, we increase einline iterations. This could lead to >>>>> extensive code bloat if we have recursive calls like: >>>>> >>>>> dtor() { >>>>> destroy(node); >>>>> } >>>>> >>>>> destroy(node) { >>>>> destroy(left) >>>>> destroy(right) >>>>> } >>>>> >>>>> In this case, the size growth will be around 8 which is smaller than >>>>> threshold (11). However, if we allow this to happen for 2 iterations, >>>>> it will expand the size by 1024X. To fix this problem, we want to set >>>>> a much smaller threshold in the AutoFDO case. This is because AutoFDO >>>>> do not not rely on aggressive einline to gain more profile context. >>>>> >>>>> And also, in AutoFDO pass, after we processed a function, we need to >>>>> recompute inline parameters because rebuild_cgraph_edges will zero out >>>>> all inline parameters. >>>>> >>>>> The patch is attached below, bootstrapped and perf test on-going. OK >>>>> for google-4_9? >>>>> >>>>> Thanks, >>>>> Dehao >>>>> >>>>> Index: gcc/auto-profile.c >>>>> =================================================================== >>>>> --- gcc/auto-profile.c (revision 217523) >>>>> +++ gcc/auto-profile.c (working copy) >>>>> @@ -1771,6 +1771,7 @@ auto_profile (void) >>>>> free_dominance_info (CDI_DOMINATORS); >>>>> free_dominance_info (CDI_POST_DOMINATORS); >>>>> rebuild_cgraph_edges (); >>>>> + compute_inline_parameters (cgraph_get_node >>>>> (current_function_decl), true); >>>>> pop_cfun (); >>>>> } >>>>> >>>>> Index: gcc/opts.c >>>>> =================================================================== >>>>> --- gcc/opts.c (revision 217523) >>>>> +++ gcc/opts.c (working copy) >>>>> @@ -1853,6 +1853,12 @@ common_handle_option (struct gcc_options *opts, >>>>> maybe_set_param_value ( >>>>> PARAM_EARLY_INLINER_MAX_ITERATIONS, 10, >>>>> opts->x_param_values, opts_set->x_param_values); >>>>> + maybe_set_param_value ( >>>>> + PARAM_EARLY_INLINING_INSNS, 4, >>>>> + opts->x_param_values, opts_set->x_param_values); >>>>> + maybe_set_param_value ( >>>>> + PARAM_EARLY_INLINING_INSNS_COMDAT, 4, >>>>> + opts->x_param_values, opts_set->x_param_values); >>>>> value = true; >>>>> /* No break here - do -fauto-profile processing. */ >>>>> case OPT_fauto_profile:
Index: gcc/auto-profile.c =================================================================== --- gcc/auto-profile.c (revision 217523) +++ gcc/auto-profile.c (working copy) @@ -1771,6 +1771,7 @@ auto_profile (void) free_dominance_info (CDI_DOMINATORS); free_dominance_info (CDI_POST_DOMINATORS); rebuild_cgraph_edges (); + compute_inline_parameters (cgraph_get_node (current_function_decl), true); pop_cfun (); } Index: gcc/opts.c =================================================================== --- gcc/opts.c (revision 217523) +++ gcc/opts.c (working copy) @@ -1853,6 +1853,12 @@ common_handle_option (struct gcc_options *opts, maybe_set_param_value ( PARAM_EARLY_INLINER_MAX_ITERATIONS, 10, opts->x_param_values, opts_set->x_param_values); + maybe_set_param_value ( + PARAM_EARLY_INLINING_INSNS, 4, + opts->x_param_values, opts_set->x_param_values); + maybe_set_param_value ( + PARAM_EARLY_INLINING_INSNS_COMDAT, 4, + opts->x_param_values, opts_set->x_param_values); value = true; /* No break here - do -fauto-profile processing. */ case OPT_fauto_profile: