diff mbox

Disable -fno-reorder-blocks-and-partition if no -fprofile-use to avoid unnecessary overhead

Message ID CAAe5K+WwdWZyp96ODECAxPsu-nf97akiKzremnhPbiYGS2E_bw@mail.gmail.com
State New
Headers show

Commit Message

Teresa Johnson Sept. 24, 2015, 5:16 p.m. UTC
This patch unsets -freorder-blocks-and-partition when -fprofile-use
is not specified. Function splitting was not actually being performed
in that case, as probably_never_executed_bb_p does not distinguish
any basic blocks as being cold vs hot when there is no profile data.

Leaving it enabled, however, causes the assembly code generator to create
(empty) cold sections and labels, leading to unnecessary size overhead.

Bootstrapped and tested on x86-64-unknown-linux-gnu. Ok for trunk?

Thanks,
Teresa

2015-09-24  Teresa Johnson  <tejohnson@google.com>

        * opts.c (finish_options): Unset -freorder-blocks-and-partition
        if not using profile.

Comments

Andrew Pinski Sept. 24, 2015, 5:22 p.m. UTC | #1
> On Sep 24, 2015, at 10:16 AM, Teresa Johnson <tejohnson@google.com> wrote:
> 
> This patch unsets -freorder-blocks-and-partition when -fprofile-use
> is not specified. Function splitting was not actually being performed
> in that case, as probably_never_executed_bb_p does not distinguish
> any basic blocks as being cold vs hot when there is no profile data.
> 
> Leaving it enabled, however, causes the assembly code generator to create
> (empty) cold sections and labels, leading to unnecessary size overhead.
> 
> Bootstrapped and tested on x86-64-unknown-linux-gnu. Ok for trunk?

This might be ok for now but there is a notion to enable it for non profile case. 

Thanks,
Andrew




> 
> Thanks,
> Teresa
> 
> 2015-09-24  Teresa Johnson  <tejohnson@google.com>
> 
>        * opts.c (finish_options): Unset -freorder-blocks-and-partition
>        if not using profile.
> 
> Index: opts.c
> ===================================================================
> --- opts.c      (revision 228062)
> +++ opts.c      (working copy)
> @@ -821,7 +821,17 @@ finish_options (struct gcc_options *opts, struct g
>       opts->x_flag_reorder_blocks = 1;
>     }
> 
> +  /* Disable -freorder-blocks-and-partition when -fprofile-use is not in
> +     effect. Function splitting was not actually being performed in that case,
> +     as probably_never_executed_bb_p does not distinguish any basic blocks as
> +     being cold vs hot when there is no profile data. Leaving it enabled,
> +     however, causes the assembly code generator to create (empty) cold
> +     sections and labels, leading to unnecessary size overhead.  */
>   if (opts->x_flag_reorder_blocks_and_partition
> +      && !opts_set->x_flag_profile_use)
> +    opts->x_flag_reorder_blocks_and_partition = 0;
> +
> +  if (opts->x_flag_reorder_blocks_and_partition
>       && !opts_set->x_flag_reorder_functions)
>     opts->x_flag_reorder_functions = 1;
> 
> 
> -- 
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
Bernd Schmidt Sept. 25, 2015, 1:58 p.m. UTC | #2
On 09/24/2015 07:16 PM, Teresa Johnson wrote:
> This patch unsets -freorder-blocks-and-partition when -fprofile-use
> is not specified. Function splitting was not actually being performed
> in that case, as probably_never_executed_bb_p does not distinguish
> any basic blocks as being cold vs hot when there is no profile data.
>
> Leaving it enabled, however, causes the assembly code generator to create
> (empty) cold sections and labels, leading to unnecessary size overhead.
>
> Bootstrapped and tested on x86-64-unknown-linux-gnu. Ok for trunk?
>
> Thanks,
> Teresa
>
> 2015-09-24  Teresa Johnson  <tejohnson@google.com>
>
>          * opts.c (finish_options): Unset -freorder-blocks-and-partition
>          if not using profile.

Hmm, I'd noticed I was getting that enabled by default. It looks like 
you added this default with:

2013-11-19  Teresa Johnson  <tejohnson@google.com>

        * common/config/i386/i386-common.c: Enable
        -freorder-blocks-and-partition at -O2 and up for x86.
        * doc/invoke.texi: Update -freorder-blocks-and-partition default.
        * opts.c (finish_options): Only warn if -freorder-blocks-and-
        partition was set on command line.

(Note that this ChangeLog entry should have mentioned 
ix86_option_optimization_table as the variable that was changed).

What has changed between then and now? Also, do we not use 
estimates/heuristics when not using a profile?


Bernd
Teresa Johnson Sept. 25, 2015, 2:14 p.m. UTC | #3
On Fri, Sep 25, 2015 at 6:58 AM, Bernd Schmidt <bschmidt@redhat.com> wrote:
> On 09/24/2015 07:16 PM, Teresa Johnson wrote:
>>
>> This patch unsets -freorder-blocks-and-partition when -fprofile-use
>> is not specified. Function splitting was not actually being performed
>> in that case, as probably_never_executed_bb_p does not distinguish
>> any basic blocks as being cold vs hot when there is no profile data.
>>
>> Leaving it enabled, however, causes the assembly code generator to create
>> (empty) cold sections and labels, leading to unnecessary size overhead.
>>
>> Bootstrapped and tested on x86-64-unknown-linux-gnu. Ok for trunk?
>>
>> Thanks,
>> Teresa
>>
>> 2015-09-24  Teresa Johnson  <tejohnson@google.com>
>>
>>          * opts.c (finish_options): Unset -freorder-blocks-and-partition
>>          if not using profile.
>
>
> Hmm, I'd noticed I was getting that enabled by default. It looks like you
> added this default with:
>
> 2013-11-19  Teresa Johnson  <tejohnson@google.com>
>
>        * common/config/i386/i386-common.c: Enable
>        -freorder-blocks-and-partition at -O2 and up for x86.
>        * doc/invoke.texi: Update -freorder-blocks-and-partition default.
>        * opts.c (finish_options): Only warn if -freorder-blocks-and-
>        partition was set on command line.
>
> (Note that this ChangeLog entry should have mentioned
> ix86_option_optimization_table as the variable that was changed).

Yeah, looks like I accidentally left the function name out of the
i386-common.c entry. I can fix the ChangeLog entry.

>
> What has changed between then and now? Also, do we not use
> estimates/heuristics when not using a profile?

Nothing has changed - splitting effectively never kicked in without a
profile. Honza and I had discussed the idea that static profile
heuristics could eventually be used to distinguish hot vs cold bbs,
but that hasn't happened. What I didn't notice until recently was the
size increase in the .o files from varasm adding in unnecessary
sections and labels when this option was on. Unless and until static
heuristics are used to distinguish cold bbs in
probably_never_executed_bb_p, I don't think it makes sense to do
anything finer grained that just disabling the option.

Teresa

>
>
> Bernd
Bernd Schmidt Sept. 25, 2015, 2:16 p.m. UTC | #4
On 09/25/2015 04:14 PM, Teresa Johnson wrote:
> Nothing has changed - splitting effectively never kicked in without a
> profile. Honza and I had discussed the idea that static profile
> heuristics could eventually be used to distinguish hot vs cold bbs,
> but that hasn't happened. What I didn't notice until recently was the
> size increase in the .o files from varasm adding in unnecessary
> sections and labels when this option was on. Unless and until static
> heuristics are used to distinguish cold bbs in
> probably_never_executed_bb_p, I don't think it makes sense to do
> anything finer grained that just disabling the option.

Thanks for the explanation. Your patch is ok then.


Bernd
Jan Hubicka Sept. 25, 2015, 2:34 p.m. UTC | #5
> > What has changed between then and now? Also, do we not use
> > estimates/heuristics when not using a profile?
> 
> Nothing has changed - splitting effectively never kicked in without a
> profile. Honza and I had discussed the idea that static profile
> heuristics could eventually be used to distinguish hot vs cold bbs,

Yep, the basic idea was to move EH clenaups to the cold section for start.  The
cleanup code can get surprisingly large for some C++ apps.

> but that hasn't happened. What I didn't notice until recently was the
> size increase in the .o files from varasm adding in unnecessary
> sections and labels when this option was on. Unless and until static

Perhaps we also may just teach varasm to not output those when function is not
split.  I am happy with the patch as it is because it is pointless to run the
code when no splitting happens.

Honza
> heuristics are used to distinguish cold bbs in
> probably_never_executed_bb_p, I don't think it makes sense to do
> anything finer grained that just disabling the option.
> 
> Teresa
> 
> >
> >
> > Bernd
> 
> 
> 
> -- 
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
Teresa Johnson Sept. 25, 2015, 4:56 p.m. UTC | #6
On Fri, Sep 25, 2015 at 7:34 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> > What has changed between then and now? Also, do we not use
>> > estimates/heuristics when not using a profile?
>>
>> Nothing has changed - splitting effectively never kicked in without a
>> profile. Honza and I had discussed the idea that static profile
>> heuristics could eventually be used to distinguish hot vs cold bbs,
>
> Yep, the basic idea was to move EH clenaups to the cold section for start.  The
> cleanup code can get surprisingly large for some C++ apps.
>
>> but that hasn't happened. What I didn't notice until recently was the
>> size increase in the .o files from varasm adding in unnecessary
>> sections and labels when this option was on. Unless and until static
>
> Perhaps we also may just teach varasm to not output those when function is not
> split.  I am happy with the patch as it is because it is pointless to run the
> code when no splitting happens.

Right, that will need to happen if splitting is tuned for static
frequencies. For now I committed this patch.

I also fixed the old change log entry that Bernd pointed out.

Thanks,
Teresa

>
> Honza
>> heuristics are used to distinguish cold bbs in
>> probably_never_executed_bb_p, I don't think it makes sense to do
>> anything finer grained that just disabling the option.
>>
>> Teresa
>>
>> >
>> >
>> > Bernd
>>
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
Andi Kleen Sept. 25, 2015, 4:57 p.m. UTC | #7
Teresa Johnson <tejohnson@google.com> writes:

> This patch unsets -freorder-blocks-and-partition when -fprofile-use
> is not specified. Function splitting was not actually being performed
> in that case, as probably_never_executed_bb_p does not distinguish
> any basic blocks as being cold vs hot when there is no profile data.

Actually I'm experimenting with a patch to fix that by allowing
function splitting even without profile feed back. See PR66890
which has the patch. I would prefer to keep and fix it.

-Andi
Teresa Johnson Sept. 25, 2015, 4:58 p.m. UTC | #8
Woops, we crossed wires. I just committed this patch. Would you like
me to revert it?

Teresa

On Fri, Sep 25, 2015 at 9:57 AM, Andi Kleen <andi@firstfloor.org> wrote:
> Teresa Johnson <tejohnson@google.com> writes:
>
>> This patch unsets -freorder-blocks-and-partition when -fprofile-use
>> is not specified. Function splitting was not actually being performed
>> in that case, as probably_never_executed_bb_p does not distinguish
>> any basic blocks as being cold vs hot when there is no profile data.
>
> Actually I'm experimenting with a patch to fix that by allowing
> function splitting even without profile feed back. See PR66890
> which has the patch. I would prefer to keep and fix it.
>
> -Andi
>
>
> --
> ak@linux.intel.com -- Speaking for myself only
Jeff Law Sept. 25, 2015, 5:01 p.m. UTC | #9
On 09/25/2015 10:58 AM, Teresa Johnson wrote:
> Woops, we crossed wires. I just committed this patch. Would you like
> me to revert it?
Leave it.  If Andi can include a reversion if he can pound his work 
around 66890 into submission.  But I think it'd need some of the 
varasm.c work Jan hinted at.

jeff
Andi Kleen Sept. 25, 2015, 5:59 p.m. UTC | #10
On Fri, Sep 25, 2015 at 11:01:47AM -0600, Jeff Law wrote:
> On 09/25/2015 10:58 AM, Teresa Johnson wrote:
> >Woops, we crossed wires. I just committed this patch. Would you like
> >me to revert it?
> Leave it.  If Andi can include a reversion if he can pound his work
> around 66890 into submission. 

The patch is stable, was just gathering more data.

> But I think it'd need some of the
> varasm.c work Jan hinted at.

The varasm work should be only needed if no function splitting is done,
right? With my patch most functions do function splitting.

-Andi
Bernhard Reutner-Fischer Sept. 26, 2015, 8:52 p.m. UTC | #11
On September 25, 2015 7:59:07 PM GMT+02:00, Andi Kleen <andi@firstfloor.org> wrote:
>On Fri, Sep 25, 2015 at 11:01:47AM -0600, Jeff Law wrote:
>> On 09/25/2015 10:58 AM, Teresa Johnson wrote:
>> >Woops, we crossed wires. I just committed this patch. Would you like
>> >me to revert it?
>> Leave it.  If Andi can include a reversion if he can pound his work
>> around 66890 into submission. 
>
>The patch is stable, was just gathering more data.

+@item partition-cold-min-freq 
+When using doing function partitioning and there is no profile information 
+consider edges below this frequency cold. Setting to zero disables 
+any function splitting without profile information.

Trouble parsing "using doing", nice hack otherwise.
It's obviously a pity Google does its own thing and doesn't really push a balance between auto and such simple improvements. /me likes yours fwiw..

Cheers,

>
>> But I think it'd need some of the
>> varasm.c work Jan hinted at.
>
>The varasm work should be only needed if no function splitting is done,
>right? With my patch most functions do function splitting.
>
>-Andi
Andi Kleen Sept. 26, 2015, 10:02 p.m. UTC | #12
> +@item partition-cold-min-freq 
> +When using doing function partitioning and there is no profile information 
> +consider edges below this frequency cold. Setting to zero disables 
> +any function splitting without profile information.
> 
> Trouble parsing "using doing", nice hack otherwise.

Thanks. I removed the extra word.

Any additional testing welcome.

-Andi
diff mbox

Patch

Index: opts.c
===================================================================
--- opts.c      (revision 228062)
+++ opts.c      (working copy)
@@ -821,7 +821,17 @@  finish_options (struct gcc_options *opts, struct g
       opts->x_flag_reorder_blocks = 1;
     }

+  /* Disable -freorder-blocks-and-partition when -fprofile-use is not in
+     effect. Function splitting was not actually being performed in that case,
+     as probably_never_executed_bb_p does not distinguish any basic blocks as
+     being cold vs hot when there is no profile data. Leaving it enabled,
+     however, causes the assembly code generator to create (empty) cold
+     sections and labels, leading to unnecessary size overhead.  */
   if (opts->x_flag_reorder_blocks_and_partition
+      && !opts_set->x_flag_profile_use)
+    opts->x_flag_reorder_blocks_and_partition = 0;
+
+  if (opts->x_flag_reorder_blocks_and_partition
       && !opts_set->x_flag_reorder_functions)
     opts->x_flag_reorder_functions = 1;