Message ID | CAO2gOZWLrtyNNZyz0q3aqF+bxay30+Y4hhCsbYzTTQcooTKbnw@mail.gmail.com |
---|---|
State | New |
Headers | show |
Is it sufficient to check if the final caller is defined in primary module? Note that in some cases, doing profile update 'speculatively' (without your fix) can be more precise (assuming the inlining gets materialized in a different compilation), but in general it might be better to be conservative in profile update (as in your patch) -- the downside is you may end up with less aggressive size optimization. David On Mon, Oct 28, 2013 at 3:51 PM, Dehao Chen <dehao@google.com> wrote: > This patch changes to no update callee count if caller count is not a > resolved node (in LIPO mode) during AutoFDO compilation. This is > because AutoFDO will have the same edge counts for all unresolved > nodes. Original update method will lead to multi-update of the callee. > > Bootstrapped and testing on going. > > OK for google-4_8 if test is OK? > > Thanks, > Dehao > > Index: gcc/ipa-inline-transform.c > =================================================================== > --- gcc/ipa-inline-transform.c (revision 204131) > +++ gcc/ipa-inline-transform.c (working copy) > @@ -169,6 +169,15 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d > else > { > struct cgraph_node *n; > + if (flag_auto_profile && L_IPO_COMP_MODE > + && cgraph_pre_profiling_inlining_done) > + { > + struct cgraph_node *caller = e->caller; > + if (caller->global.inlined_to) > + caller = caller->global.inlined_to; > + if (cgraph_lipo_get_resolved_node (caller->symbol.decl) != caller) > + update_original = false; > + } > n = cgraph_clone_node (e->callee, e->callee->symbol.decl, > e->count, e->frequency, > update_original, vNULL, true);
On Mon, Oct 28, 2013 at 9:49 PM, Xinliang David Li <davidxl@google.com> wrote: > Is it sufficient to check if the final caller is defined in primary module? This might not be sufficient because the final caller may come from comdat of aux-modules (not defined in the primary module). > > Note that in some cases, doing profile update 'speculatively' (without > your fix) can be more precise (assuming the inlining gets materialized > in a different compilation), but in general it might be better to be > conservative in profile update (as in your patch) -- the downside is > you may end up with less aggressive size optimization. This is actually no speculatively update profile, but double update. E.g. comdat_foo()-->bar() with count 100 and comdat_foo() has been defined in 2 aux-modules. Then there will be two edges from comdat_foo()-->bar(), each of which has the count 100. But bar()'s entry count is only 100 (assume comdat_foo is the only caller). Then if we update bar() twice when inline these two edges, the second update will be wrong. Dehao > > David > > On Mon, Oct 28, 2013 at 3:51 PM, Dehao Chen <dehao@google.com> wrote: >> This patch changes to no update callee count if caller count is not a >> resolved node (in LIPO mode) during AutoFDO compilation. This is >> because AutoFDO will have the same edge counts for all unresolved >> nodes. Original update method will lead to multi-update of the callee. >> >> Bootstrapped and testing on going. >> >> OK for google-4_8 if test is OK? >> >> Thanks, >> Dehao >> >> Index: gcc/ipa-inline-transform.c >> =================================================================== >> --- gcc/ipa-inline-transform.c (revision 204131) >> +++ gcc/ipa-inline-transform.c (working copy) >> @@ -169,6 +169,15 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d >> else >> { >> struct cgraph_node *n; >> + if (flag_auto_profile && L_IPO_COMP_MODE >> + && cgraph_pre_profiling_inlining_done) >> + { >> + struct cgraph_node *caller = e->caller; >> + if (caller->global.inlined_to) >> + caller = caller->global.inlined_to; >> + if (cgraph_lipo_get_resolved_node (caller->symbol.decl) != caller) >> + update_original = false; >> + } >> n = cgraph_clone_node (e->callee, e->callee->symbol.decl, >> e->count, e->frequency, >> update_original, vNULL, true);
The situation you described is worse -- hopefully it will be addressed in the next version of lipo. The change is ok. David On Tue, Oct 29, 2013 at 10:08 AM, Dehao Chen <dehao@google.com> wrote: > On Mon, Oct 28, 2013 at 9:49 PM, Xinliang David Li <davidxl@google.com> wrote: >> Is it sufficient to check if the final caller is defined in primary module? > > This might not be sufficient because the final caller may come from > comdat of aux-modules (not defined in the primary module). > >> >> Note that in some cases, doing profile update 'speculatively' (without >> your fix) can be more precise (assuming the inlining gets materialized >> in a different compilation), but in general it might be better to be >> conservative in profile update (as in your patch) -- the downside is >> you may end up with less aggressive size optimization. > > This is actually no speculatively update profile, but double update. > > E.g. comdat_foo()-->bar() with count 100 > > and comdat_foo() has been defined in 2 aux-modules. Then there will be > two edges from comdat_foo()-->bar(), each of which has the count 100. > But bar()'s entry count is only 100 (assume comdat_foo is the only > caller). Then if we update bar() twice when inline these two edges, > the second update will be wrong. > > Dehao > >> >> David >> >> On Mon, Oct 28, 2013 at 3:51 PM, Dehao Chen <dehao@google.com> wrote: >>> This patch changes to no update callee count if caller count is not a >>> resolved node (in LIPO mode) during AutoFDO compilation. This is >>> because AutoFDO will have the same edge counts for all unresolved >>> nodes. Original update method will lead to multi-update of the callee. >>> >>> Bootstrapped and testing on going. >>> >>> OK for google-4_8 if test is OK? >>> >>> Thanks, >>> Dehao >>> >>> Index: gcc/ipa-inline-transform.c >>> =================================================================== >>> --- gcc/ipa-inline-transform.c (revision 204131) >>> +++ gcc/ipa-inline-transform.c (working copy) >>> @@ -169,6 +169,15 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d >>> else >>> { >>> struct cgraph_node *n; >>> + if (flag_auto_profile && L_IPO_COMP_MODE >>> + && cgraph_pre_profiling_inlining_done) >>> + { >>> + struct cgraph_node *caller = e->caller; >>> + if (caller->global.inlined_to) >>> + caller = caller->global.inlined_to; >>> + if (cgraph_lipo_get_resolved_node (caller->symbol.decl) != caller) >>> + update_original = false; >>> + } >>> n = cgraph_clone_node (e->callee, e->callee->symbol.decl, >>> e->count, e->frequency, >>> update_original, vNULL, true);
Index: gcc/ipa-inline-transform.c =================================================================== --- gcc/ipa-inline-transform.c (revision 204131) +++ gcc/ipa-inline-transform.c (working copy) @@ -169,6 +169,15 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d else { struct cgraph_node *n; + if (flag_auto_profile && L_IPO_COMP_MODE + && cgraph_pre_profiling_inlining_done) + { + struct cgraph_node *caller = e->caller; + if (caller->global.inlined_to) + caller = caller->global.inlined_to; + if (cgraph_lipo_get_resolved_node (caller->symbol.decl) != caller) + update_original = false; + } n = cgraph_clone_node (e->callee, e->callee->symbol.decl, e->count, e->frequency, update_original, vNULL, true);