Message ID | CAAe5K+WwdWZyp96ODECAxPsu-nf97akiKzremnhPbiYGS2E_bw@mail.gmail.com |
---|---|
State | New |
Headers | show |
> On Sep 24, 2015, at 10:16 AM, Teresa Johnson <tejohnson@google.com> wrote: > > This patch unsets -freorder-blocks-and-partition when -fprofile-use > is not specified. Function splitting was not actually being performed > in that case, as probably_never_executed_bb_p does not distinguish > any basic blocks as being cold vs hot when there is no profile data. > > Leaving it enabled, however, causes the assembly code generator to create > (empty) cold sections and labels, leading to unnecessary size overhead. > > Bootstrapped and tested on x86-64-unknown-linux-gnu. Ok for trunk? This might be ok for now but there is a notion to enable it for non profile case. Thanks, Andrew > > Thanks, > Teresa > > 2015-09-24 Teresa Johnson <tejohnson@google.com> > > * opts.c (finish_options): Unset -freorder-blocks-and-partition > if not using profile. > > Index: opts.c > =================================================================== > --- opts.c (revision 228062) > +++ opts.c (working copy) > @@ -821,7 +821,17 @@ finish_options (struct gcc_options *opts, struct g > opts->x_flag_reorder_blocks = 1; > } > > + /* Disable -freorder-blocks-and-partition when -fprofile-use is not in > + effect. Function splitting was not actually being performed in that case, > + as probably_never_executed_bb_p does not distinguish any basic blocks as > + being cold vs hot when there is no profile data. Leaving it enabled, > + however, causes the assembly code generator to create (empty) cold > + sections and labels, leading to unnecessary size overhead. */ > if (opts->x_flag_reorder_blocks_and_partition > + && !opts_set->x_flag_profile_use) > + opts->x_flag_reorder_blocks_and_partition = 0; > + > + if (opts->x_flag_reorder_blocks_and_partition > && !opts_set->x_flag_reorder_functions) > opts->x_flag_reorder_functions = 1; > > > -- > Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
On 09/24/2015 07:16 PM, Teresa Johnson wrote: > This patch unsets -freorder-blocks-and-partition when -fprofile-use > is not specified. Function splitting was not actually being performed > in that case, as probably_never_executed_bb_p does not distinguish > any basic blocks as being cold vs hot when there is no profile data. > > Leaving it enabled, however, causes the assembly code generator to create > (empty) cold sections and labels, leading to unnecessary size overhead. > > Bootstrapped and tested on x86-64-unknown-linux-gnu. Ok for trunk? > > Thanks, > Teresa > > 2015-09-24 Teresa Johnson <tejohnson@google.com> > > * opts.c (finish_options): Unset -freorder-blocks-and-partition > if not using profile. Hmm, I'd noticed I was getting that enabled by default. It looks like you added this default with: 2013-11-19 Teresa Johnson <tejohnson@google.com> * common/config/i386/i386-common.c: Enable -freorder-blocks-and-partition at -O2 and up for x86. * doc/invoke.texi: Update -freorder-blocks-and-partition default. * opts.c (finish_options): Only warn if -freorder-blocks-and- partition was set on command line. (Note that this ChangeLog entry should have mentioned ix86_option_optimization_table as the variable that was changed). What has changed between then and now? Also, do we not use estimates/heuristics when not using a profile? Bernd
On Fri, Sep 25, 2015 at 6:58 AM, Bernd Schmidt <bschmidt@redhat.com> wrote: > On 09/24/2015 07:16 PM, Teresa Johnson wrote: >> >> This patch unsets -freorder-blocks-and-partition when -fprofile-use >> is not specified. Function splitting was not actually being performed >> in that case, as probably_never_executed_bb_p does not distinguish >> any basic blocks as being cold vs hot when there is no profile data. >> >> Leaving it enabled, however, causes the assembly code generator to create >> (empty) cold sections and labels, leading to unnecessary size overhead. >> >> Bootstrapped and tested on x86-64-unknown-linux-gnu. Ok for trunk? >> >> Thanks, >> Teresa >> >> 2015-09-24 Teresa Johnson <tejohnson@google.com> >> >> * opts.c (finish_options): Unset -freorder-blocks-and-partition >> if not using profile. > > > Hmm, I'd noticed I was getting that enabled by default. It looks like you > added this default with: > > 2013-11-19 Teresa Johnson <tejohnson@google.com> > > * common/config/i386/i386-common.c: Enable > -freorder-blocks-and-partition at -O2 and up for x86. > * doc/invoke.texi: Update -freorder-blocks-and-partition default. > * opts.c (finish_options): Only warn if -freorder-blocks-and- > partition was set on command line. > > (Note that this ChangeLog entry should have mentioned > ix86_option_optimization_table as the variable that was changed). Yeah, looks like I accidentally left the function name out of the i386-common.c entry. I can fix the ChangeLog entry. > > What has changed between then and now? Also, do we not use > estimates/heuristics when not using a profile? Nothing has changed - splitting effectively never kicked in without a profile. Honza and I had discussed the idea that static profile heuristics could eventually be used to distinguish hot vs cold bbs, but that hasn't happened. What I didn't notice until recently was the size increase in the .o files from varasm adding in unnecessary sections and labels when this option was on. Unless and until static heuristics are used to distinguish cold bbs in probably_never_executed_bb_p, I don't think it makes sense to do anything finer grained that just disabling the option. Teresa > > > Bernd
On 09/25/2015 04:14 PM, Teresa Johnson wrote: > Nothing has changed - splitting effectively never kicked in without a > profile. Honza and I had discussed the idea that static profile > heuristics could eventually be used to distinguish hot vs cold bbs, > but that hasn't happened. What I didn't notice until recently was the > size increase in the .o files from varasm adding in unnecessary > sections and labels when this option was on. Unless and until static > heuristics are used to distinguish cold bbs in > probably_never_executed_bb_p, I don't think it makes sense to do > anything finer grained that just disabling the option. Thanks for the explanation. Your patch is ok then. Bernd
> > What has changed between then and now? Also, do we not use > > estimates/heuristics when not using a profile? > > Nothing has changed - splitting effectively never kicked in without a > profile. Honza and I had discussed the idea that static profile > heuristics could eventually be used to distinguish hot vs cold bbs, Yep, the basic idea was to move EH clenaups to the cold section for start. The cleanup code can get surprisingly large for some C++ apps. > but that hasn't happened. What I didn't notice until recently was the > size increase in the .o files from varasm adding in unnecessary > sections and labels when this option was on. Unless and until static Perhaps we also may just teach varasm to not output those when function is not split. I am happy with the patch as it is because it is pointless to run the code when no splitting happens. Honza > heuristics are used to distinguish cold bbs in > probably_never_executed_bb_p, I don't think it makes sense to do > anything finer grained that just disabling the option. > > Teresa > > > > > > > Bernd > > > > -- > Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
On Fri, Sep 25, 2015 at 7:34 AM, Jan Hubicka <hubicka@ucw.cz> wrote: >> > What has changed between then and now? Also, do we not use >> > estimates/heuristics when not using a profile? >> >> Nothing has changed - splitting effectively never kicked in without a >> profile. Honza and I had discussed the idea that static profile >> heuristics could eventually be used to distinguish hot vs cold bbs, > > Yep, the basic idea was to move EH clenaups to the cold section for start. The > cleanup code can get surprisingly large for some C++ apps. > >> but that hasn't happened. What I didn't notice until recently was the >> size increase in the .o files from varasm adding in unnecessary >> sections and labels when this option was on. Unless and until static > > Perhaps we also may just teach varasm to not output those when function is not > split. I am happy with the patch as it is because it is pointless to run the > code when no splitting happens. Right, that will need to happen if splitting is tuned for static frequencies. For now I committed this patch. I also fixed the old change log entry that Bernd pointed out. Thanks, Teresa > > Honza >> heuristics are used to distinguish cold bbs in >> probably_never_executed_bb_p, I don't think it makes sense to do >> anything finer grained that just disabling the option. >> >> Teresa >> >> > >> > >> > Bernd >> >> >> >> -- >> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
Teresa Johnson <tejohnson@google.com> writes: > This patch unsets -freorder-blocks-and-partition when -fprofile-use > is not specified. Function splitting was not actually being performed > in that case, as probably_never_executed_bb_p does not distinguish > any basic blocks as being cold vs hot when there is no profile data. Actually I'm experimenting with a patch to fix that by allowing function splitting even without profile feed back. See PR66890 which has the patch. I would prefer to keep and fix it. -Andi
Woops, we crossed wires. I just committed this patch. Would you like me to revert it? Teresa On Fri, Sep 25, 2015 at 9:57 AM, Andi Kleen <andi@firstfloor.org> wrote: > Teresa Johnson <tejohnson@google.com> writes: > >> This patch unsets -freorder-blocks-and-partition when -fprofile-use >> is not specified. Function splitting was not actually being performed >> in that case, as probably_never_executed_bb_p does not distinguish >> any basic blocks as being cold vs hot when there is no profile data. > > Actually I'm experimenting with a patch to fix that by allowing > function splitting even without profile feed back. See PR66890 > which has the patch. I would prefer to keep and fix it. > > -Andi > > > -- > ak@linux.intel.com -- Speaking for myself only
On 09/25/2015 10:58 AM, Teresa Johnson wrote: > Woops, we crossed wires. I just committed this patch. Would you like > me to revert it? Leave it. If Andi can include a reversion if he can pound his work around 66890 into submission. But I think it'd need some of the varasm.c work Jan hinted at. jeff
On Fri, Sep 25, 2015 at 11:01:47AM -0600, Jeff Law wrote: > On 09/25/2015 10:58 AM, Teresa Johnson wrote: > >Woops, we crossed wires. I just committed this patch. Would you like > >me to revert it? > Leave it. If Andi can include a reversion if he can pound his work > around 66890 into submission. The patch is stable, was just gathering more data. > But I think it'd need some of the > varasm.c work Jan hinted at. The varasm work should be only needed if no function splitting is done, right? With my patch most functions do function splitting. -Andi
On September 25, 2015 7:59:07 PM GMT+02:00, Andi Kleen <andi@firstfloor.org> wrote: >On Fri, Sep 25, 2015 at 11:01:47AM -0600, Jeff Law wrote: >> On 09/25/2015 10:58 AM, Teresa Johnson wrote: >> >Woops, we crossed wires. I just committed this patch. Would you like >> >me to revert it? >> Leave it. If Andi can include a reversion if he can pound his work >> around 66890 into submission. > >The patch is stable, was just gathering more data. +@item partition-cold-min-freq +When using doing function partitioning and there is no profile information +consider edges below this frequency cold. Setting to zero disables +any function splitting without profile information. Trouble parsing "using doing", nice hack otherwise. It's obviously a pity Google does its own thing and doesn't really push a balance between auto and such simple improvements. /me likes yours fwiw.. Cheers, > >> But I think it'd need some of the >> varasm.c work Jan hinted at. > >The varasm work should be only needed if no function splitting is done, >right? With my patch most functions do function splitting. > >-Andi
> +@item partition-cold-min-freq > +When using doing function partitioning and there is no profile information > +consider edges below this frequency cold. Setting to zero disables > +any function splitting without profile information. > > Trouble parsing "using doing", nice hack otherwise. Thanks. I removed the extra word. Any additional testing welcome. -Andi
Index: opts.c =================================================================== --- opts.c (revision 228062) +++ opts.c (working copy) @@ -821,7 +821,17 @@ finish_options (struct gcc_options *opts, struct g opts->x_flag_reorder_blocks = 1; } + /* Disable -freorder-blocks-and-partition when -fprofile-use is not in + effect. Function splitting was not actually being performed in that case, + as probably_never_executed_bb_p does not distinguish any basic blocks as + being cold vs hot when there is no profile data. Leaving it enabled, + however, causes the assembly code generator to create (empty) cold + sections and labels, leading to unnecessary size overhead. */ if (opts->x_flag_reorder_blocks_and_partition + && !opts_set->x_flag_profile_use) + opts->x_flag_reorder_blocks_and_partition = 0; + + if (opts->x_flag_reorder_blocks_and_partition && !opts_set->x_flag_reorder_functions) opts->x_flag_reorder_functions = 1;