diff mbox

FDO and source changes

Message ID CAAkRFZ+_sgKGZj07qbkvcw3Emm3Dzmhr0LtLdvnx4nTw2grG=g@mail.gmail.com
State New
Headers show

Commit Message

Xinliang David Li July 16, 2014, 8:32 p.m. UTC
Instrumentation based FDO is designed to work when the source files
that are used to generate the instr binary match exactly with the
sources in profile-use compile. It is known historically that using
stale profile (due to source changes, not gcda format change) can lead
to lots of mismatch warnings and even worse -- compiler ICEs.  This is
due to two reasons:
1) the profile lookup for each function is based on funcdef_no which
can change when function definition order is changed or new functions
are inserted in the middle of a source
2) the indirect call target id may change due to source changes:
before GCC4.9, the id uses cgraph uid which is as bad as funcdef_no.
Attributing wrong IC target to the indirect call site is the main
cause of compiler ICE (we have signature match check, but bad target
can leak through result in problem later). Starting from gcc49, the
indirect target profiling uses profile_id which is stable for public
functions.

This patch introduces a new parameter for FDO to determine whether to
use internal id or assembler name based external id for profile
lookup. When the external id is used, GCC FDO will become very
tolerant to simple source changes.

Note that autoFDO solves this problem but it is currently limited to
Intel platforms with LBR support.

(Tested with SPEC, SPEC06 and large internal benchmarks. No performance impact).

Ok for trunk?

thanks,

David

Comments

Jan Hubicka July 16, 2014, 11:42 p.m. UTC | #1
> Instrumentation based FDO is designed to work when the source files
> that are used to generate the instr binary match exactly with the
> sources in profile-use compile. It is known historically that using
> stale profile (due to source changes, not gcda format change) can lead
> to lots of mismatch warnings and even worse -- compiler ICEs.  This is
> due to two reasons:
> 1) the profile lookup for each function is based on funcdef_no which
> can change when function definition order is changed or new functions
> are inserted in the middle of a source
> 2) the indirect call target id may change due to source changes:
> before GCC4.9, the id uses cgraph uid which is as bad as funcdef_no.
> Attributing wrong IC target to the indirect call site is the main
> cause of compiler ICE (we have signature match check, but bad target
> can leak through result in problem later). Starting from gcc49, the
> indirect target profiling uses profile_id which is stable for public
> functions.

We should not ICE however when the targets gets wrong. There is some basic
type checking on the place, do you have testcase where we still ICE?
> 
> This patch introduces a new parameter for FDO to determine whether to
> use internal id or assembler name based external id for profile
> lookup. When the external id is used, GCC FDO will become very
> tolerant to simple source changes.
> 
> Note that autoFDO solves this problem but it is currently limited to
> Intel platforms with LBR support.
> 
> (Tested with SPEC, SPEC06 and large internal benchmarks. No performance impact).
> 
> Ok for trunk?

I wonder if there are any downsides for using this always?
We still compare checksums so we should warn user that profile is out of date,
so I would consistently switch from funcdef_no to profile_id...

> Index: coverage.c
> ===================================================================
> --- coverage.c	(revision 212682)
> +++ coverage.c	(working copy)
> @@ -54,6 +54,7 @@ along with GCC; see the file COPYING3.
>  #include "intl.h"
>  #include "filenames.h"
>  #include "target.h"
> +#include "params.h"
>  
>  #include "gcov-io.h"
>  #include "gcov-io.c"
> @@ -369,8 +370,10 @@ get_coverage_counts (unsigned counter, u
>                           da_file_name);
>        return NULL;
>      }
> -
> -  elt.ident = current_function_funcdef_no + 1;
> +  if (PARAM_VALUE (PARAM_PROFILE_FUNC_INTERNAL_ID))
> +    elt.ident = current_function_funcdef_no + 1;
> +  else
> +    elt.ident = coverage_compute_profile_id (cgraph_get_node (cfun->decl));
>    elt.ctr = counter;
>    entry = counts_hash->find (&elt);
>    if (!entry || !entry->summary.num)
> @@ -416,7 +419,8 @@ get_coverage_counts (unsigned counter, u
>      }
>    else if (entry->lineno_checksum != lineno_checksum)
>      {
> -      warning (0, "source locations for function %qE have changed,"
> +      warning (OPT_Wcoverage_mismatch,
> +               "source locations for function %qE have changed,"
>  	       " the profile data may be out of date",
>  	       DECL_ASSEMBLER_NAME (current_function_decl));
>      }
> @@ -581,12 +585,13 @@ coverage_compute_profile_id (struct cgra
>      {
>        expanded_location xloc
>  	= expand_location (DECL_SOURCE_LOCATION (n->decl));
> +      bool use_name_only = (PARAM_VALUE (PARAM_PROFILE_FUNC_INTERNAL_ID) == 0);
>  
> -      chksum = xloc.line;
> +      chksum = (use_name_only ? 0 : xloc.line);
>        chksum = coverage_checksum_string (chksum, xloc.file);
>        chksum = coverage_checksum_string
>  	(chksum, IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (n->decl)));
> -      if (first_global_object_name)
> +      if (!use_name_only && first_global_object_name)

I think this will cause troubles with static functions and LTO indirect call
optimization.  We really want to make two static functions with same name to have
different IDs when they come from different units.

I see why you do not like first_global_object_name because changing it would cause
all functions from that unit to drop the profiles. Perhaps we can combine function name
and compilation unit (gcov file) name?

Honza

>  	chksum = coverage_checksum_string
>  	  (chksum, first_global_object_name);
>        chksum = coverage_checksum_string
> @@ -645,7 +650,12 @@ coverage_begin_function (unsigned lineno
>  
>    /* Announce function */
>    offset = gcov_write_tag (GCOV_TAG_FUNCTION);
> -  gcov_write_unsigned (current_function_funcdef_no + 1);
> +  if (PARAM_VALUE (PARAM_PROFILE_FUNC_INTERNAL_ID))
> +    gcov_write_unsigned (current_function_funcdef_no + 1);
> +  else
> +    gcov_write_unsigned (coverage_compute_profile_id (
> +       cgraph_get_node (current_function_decl)));
> +
>    gcov_write_unsigned (lineno_checksum);
>    gcov_write_unsigned (cfg_checksum);
>    gcov_write_string (IDENTIFIER_POINTER
> @@ -682,8 +692,13 @@ coverage_end_function (unsigned lineno_c
>        if (!DECL_EXTERNAL (current_function_decl))
>  	{
>  	  item = ggc_alloc<coverage_data> ();
> -	  
> -	  item->ident = current_function_funcdef_no + 1;
> +
> +          if (PARAM_VALUE (PARAM_PROFILE_FUNC_INTERNAL_ID))
> +	    item->ident = current_function_funcdef_no + 1;
> +          else
> +            item->ident = coverage_compute_profile_id (
> +               cgraph_get_node (cfun->decl));
> +
>  	  item->lineno_checksum = lineno_checksum;
>  	  item->cfg_checksum = cfg_checksum;
>
Xinliang David Li July 17, 2014, 5:13 p.m. UTC | #2
On Wed, Jul 16, 2014 at 4:42 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> Instrumentation based FDO is designed to work when the source files
>> that are used to generate the instr binary match exactly with the
>> sources in profile-use compile. It is known historically that using
>> stale profile (due to source changes, not gcda format change) can lead
>> to lots of mismatch warnings and even worse -- compiler ICEs.  This is
>> due to two reasons:
>> 1) the profile lookup for each function is based on funcdef_no which
>> can change when function definition order is changed or new functions
>> are inserted in the middle of a source
>> 2) the indirect call target id may change due to source changes:
>> before GCC4.9, the id uses cgraph uid which is as bad as funcdef_no.
>> Attributing wrong IC target to the indirect call site is the main
>> cause of compiler ICE (we have signature match check, but bad target
>> can leak through result in problem later). Starting from gcc49, the
>> indirect target profiling uses profile_id which is stable for public
>> functions.
>
> We should not ICE however when the targets gets wrong. There is some basic
> type checking on the place, do you have testcase where we still ICE?

I don't have test cases at hand (when it happened, it was not
considered important to fix due to the stale profile used by the
user). The basic check may have some holes - e.g. related to vararg
functions.


>>
>> This patch introduces a new parameter for FDO to determine whether to
>> use internal id or assembler name based external id for profile
>> lookup. When the external id is used, GCC FDO will become very
>> tolerant to simple source changes.
>>
>> Note that autoFDO solves this problem but it is currently limited to
>> Intel platforms with LBR support.
>>
>> (Tested with SPEC, SPEC06 and large internal benchmarks. No performance impact).
>>
>> Ok for trunk?
>
> I wonder if there are any downsides for using this always?
> We still compare checksums so we should warn user that profile is out of date,
> so I would consistently switch from funcdef_no to profile_id...

I wonder about the same. The tests show no downside.

>
>> Index: coverage.c
>> ===================================================================
>> --- coverage.c        (revision 212682)
>> +++ coverage.c        (working copy)
>> @@ -54,6 +54,7 @@ along with GCC; see the file COPYING3.
>>  #include "intl.h"
>>  #include "filenames.h"
>>  #include "target.h"
>> +#include "params.h"
>>
>>  #include "gcov-io.h"
>>  #include "gcov-io.c"
>> @@ -369,8 +370,10 @@ get_coverage_counts (unsigned counter, u
>>                           da_file_name);
>>        return NULL;
>>      }
>> -
>> -  elt.ident = current_function_funcdef_no + 1;
>> +  if (PARAM_VALUE (PARAM_PROFILE_FUNC_INTERNAL_ID))
>> +    elt.ident = current_function_funcdef_no + 1;
>> +  else
>> +    elt.ident = coverage_compute_profile_id (cgraph_get_node (cfun->decl));
>>    elt.ctr = counter;
>>    entry = counts_hash->find (&elt);
>>    if (!entry || !entry->summary.num)
>> @@ -416,7 +419,8 @@ get_coverage_counts (unsigned counter, u
>>      }
>>    else if (entry->lineno_checksum != lineno_checksum)
>>      {
>> -      warning (0, "source locations for function %qE have changed,"
>> +      warning (OPT_Wcoverage_mismatch,
>> +               "source locations for function %qE have changed,"
>>              " the profile data may be out of date",
>>              DECL_ASSEMBLER_NAME (current_function_decl));
>>      }
>> @@ -581,12 +585,13 @@ coverage_compute_profile_id (struct cgra
>>      {
>>        expanded_location xloc
>>       = expand_location (DECL_SOURCE_LOCATION (n->decl));
>> +      bool use_name_only = (PARAM_VALUE (PARAM_PROFILE_FUNC_INTERNAL_ID) == 0);
>>
>> -      chksum = xloc.line;
>> +      chksum = (use_name_only ? 0 : xloc.line);
>>        chksum = coverage_checksum_string (chksum, xloc.file);
>>        chksum = coverage_checksum_string
>>       (chksum, IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (n->decl)));
>> -      if (first_global_object_name)
>> +      if (!use_name_only && first_global_object_name)
>
> I think this will cause troubles with static functions and LTO indirect call
> optimization.  We really want to make two static functions with same name to have
> different IDs when they come from different units.
>
> I see why you do not like first_global_object_name because changing it would cause
> all functions from that unit to drop the profiles. Perhaps we can combine function name
> and compilation unit (gcov file) name?

that is a good idea -- it will also solve the LTO problem you mentioned above.

Will update the patch.

David

>
> Honza
>
>>       chksum = coverage_checksum_string
>>         (chksum, first_global_object_name);
>>        chksum = coverage_checksum_string
>> @@ -645,7 +650,12 @@ coverage_begin_function (unsigned lineno
>>
>>    /* Announce function */
>>    offset = gcov_write_tag (GCOV_TAG_FUNCTION);
>> -  gcov_write_unsigned (current_function_funcdef_no + 1);
>> +  if (PARAM_VALUE (PARAM_PROFILE_FUNC_INTERNAL_ID))
>> +    gcov_write_unsigned (current_function_funcdef_no + 1);
>> +  else
>> +    gcov_write_unsigned (coverage_compute_profile_id (
>> +       cgraph_get_node (current_function_decl)));
>> +
>>    gcov_write_unsigned (lineno_checksum);
>>    gcov_write_unsigned (cfg_checksum);
>>    gcov_write_string (IDENTIFIER_POINTER
>> @@ -682,8 +692,13 @@ coverage_end_function (unsigned lineno_c
>>        if (!DECL_EXTERNAL (current_function_decl))
>>       {
>>         item = ggc_alloc<coverage_data> ();
>> -
>> -       item->ident = current_function_funcdef_no + 1;
>> +
>> +          if (PARAM_VALUE (PARAM_PROFILE_FUNC_INTERNAL_ID))
>> +         item->ident = current_function_funcdef_no + 1;
>> +          else
>> +            item->ident = coverage_compute_profile_id (
>> +               cgraph_get_node (cfun->decl));
>> +
>>         item->lineno_checksum = lineno_checksum;
>>         item->cfg_checksum = cfg_checksum;
>>
>
Xinliang David Li July 17, 2014, 5:17 p.m. UTC | #3
>>
>> I see why you do not like first_global_object_name because changing it would cause
>> all functions from that unit to drop the profiles. Perhaps we can combine function name
>> and compilation unit (gcov file) name?
>
> that is a good idea -- it will also solve the LTO problem you mentioned above.
>
> Will update the patch.

It already does this (similarly):

    chksum = coverage_checksum_string
        (chksum, aux_base_name);


The static function defined in the same header will have different
'aux_base_name' depending on the including module.

David


>
> David
>
>>
>> Honza
>>
>>>       chksum = coverage_checksum_string
>>>         (chksum, first_global_object_name);
>>>        chksum = coverage_checksum_string
>>> @@ -645,7 +650,12 @@ coverage_begin_function (unsigned lineno
>>>
>>>    /* Announce function */
>>>    offset = gcov_write_tag (GCOV_TAG_FUNCTION);
>>> -  gcov_write_unsigned (current_function_funcdef_no + 1);
>>> +  if (PARAM_VALUE (PARAM_PROFILE_FUNC_INTERNAL_ID))
>>> +    gcov_write_unsigned (current_function_funcdef_no + 1);
>>> +  else
>>> +    gcov_write_unsigned (coverage_compute_profile_id (
>>> +       cgraph_get_node (current_function_decl)));
>>> +
>>>    gcov_write_unsigned (lineno_checksum);
>>>    gcov_write_unsigned (cfg_checksum);
>>>    gcov_write_string (IDENTIFIER_POINTER
>>> @@ -682,8 +692,13 @@ coverage_end_function (unsigned lineno_c
>>>        if (!DECL_EXTERNAL (current_function_decl))
>>>       {
>>>         item = ggc_alloc<coverage_data> ();
>>> -
>>> -       item->ident = current_function_funcdef_no + 1;
>>> +
>>> +          if (PARAM_VALUE (PARAM_PROFILE_FUNC_INTERNAL_ID))
>>> +         item->ident = current_function_funcdef_no + 1;
>>> +          else
>>> +            item->ident = coverage_compute_profile_id (
>>> +               cgraph_get_node (cfun->decl));
>>> +
>>>         item->lineno_checksum = lineno_checksum;
>>>         item->cfg_checksum = cfg_checksum;
>>>
>>
Xinliang David Li July 18, 2014, 7:16 p.m. UTC | #4
Any other concern of the patch? I don't really like the name of the
parameter myself. Do you have better suggestions?

thanks,

David

On Thu, Jul 17, 2014 at 10:17 AM, Xinliang David Li <davidxl@google.com> wrote:
>>>
>>> I see why you do not like first_global_object_name because changing it would cause
>>> all functions from that unit to drop the profiles. Perhaps we can combine function name
>>> and compilation unit (gcov file) name?
>>
>> that is a good idea -- it will also solve the LTO problem you mentioned above.
>>
>> Will update the patch.
>
> It already does this (similarly):
>
>     chksum = coverage_checksum_string
>         (chksum, aux_base_name);
>
>
> The static function defined in the same header will have different
> 'aux_base_name' depending on the including module.
>
> David
>
>
>>
>> David
>>
>>>
>>> Honza
>>>
>>>>       chksum = coverage_checksum_string
>>>>         (chksum, first_global_object_name);
>>>>        chksum = coverage_checksum_string
>>>> @@ -645,7 +650,12 @@ coverage_begin_function (unsigned lineno
>>>>
>>>>    /* Announce function */
>>>>    offset = gcov_write_tag (GCOV_TAG_FUNCTION);
>>>> -  gcov_write_unsigned (current_function_funcdef_no + 1);
>>>> +  if (PARAM_VALUE (PARAM_PROFILE_FUNC_INTERNAL_ID))
>>>> +    gcov_write_unsigned (current_function_funcdef_no + 1);
>>>> +  else
>>>> +    gcov_write_unsigned (coverage_compute_profile_id (
>>>> +       cgraph_get_node (current_function_decl)));
>>>> +
>>>>    gcov_write_unsigned (lineno_checksum);
>>>>    gcov_write_unsigned (cfg_checksum);
>>>>    gcov_write_string (IDENTIFIER_POINTER
>>>> @@ -682,8 +692,13 @@ coverage_end_function (unsigned lineno_c
>>>>        if (!DECL_EXTERNAL (current_function_decl))
>>>>       {
>>>>         item = ggc_alloc<coverage_data> ();
>>>> -
>>>> -       item->ident = current_function_funcdef_no + 1;
>>>> +
>>>> +          if (PARAM_VALUE (PARAM_PROFILE_FUNC_INTERNAL_ID))
>>>> +         item->ident = current_function_funcdef_no + 1;
>>>> +          else
>>>> +            item->ident = coverage_compute_profile_id (
>>>> +               cgraph_get_node (cfun->decl));
>>>> +
>>>>         item->lineno_checksum = lineno_checksum;
>>>>         item->cfg_checksum = cfg_checksum;
>>>>
>>>
Jeff Law July 23, 2014, 3:14 a.m. UTC | #5
On 07/16/14 14:32, Xinliang David Li wrote:
> Instrumentation based FDO is designed to work when the source files
> that are used to generate the instr binary match exactly with the
> sources in profile-use compile. It is known historically that using
> stale profile (due to source changes, not gcda format change) can lead
> to lots of mismatch warnings and even worse -- compiler ICEs.  This is
> due to two reasons:
> 1) the profile lookup for each function is based on funcdef_no which
> can change when function definition order is changed or new functions
> are inserted in the middle of a source
> 2) the indirect call target id may change due to source changes:
> before GCC4.9, the id uses cgraph uid which is as bad as funcdef_no.
> Attributing wrong IC target to the indirect call site is the main
> cause of compiler ICE (we have signature match check, but bad target
> can leak through result in problem later). Starting from gcc49, the
> indirect target profiling uses profile_id which is stable for public
> functions.
>
> This patch introduces a new parameter for FDO to determine whether to
> use internal id or assembler name based external id for profile
> lookup. When the external id is used, GCC FDO will become very
> tolerant to simple source changes.
>
> Note that autoFDO solves this problem but it is currently limited to
> Intel platforms with LBR support.
>
> (Tested with SPEC, SPEC06 and large internal benchmarks. No performance impact).
>
> Ok for trunk?
Is there a good reason why we would want to ever use the internal id? 
Is it because the internal id shows up in the profile data for 
preexisting files?

Given that we need both, why is this a param vs a regular -f option? 
Shouldn't the default be to use the external id?

BTW, thanks for working on this.  I've certainly got customers that want 
to see the FDO data be more tolerant of changes.

Heff
Xinliang David Li July 23, 2014, 5:42 p.m. UTC | #6
On Tue, Jul 22, 2014 at 8:14 PM, Jeff Law <law@redhat.com> wrote:
> On 07/16/14 14:32, Xinliang David Li wrote:
>>
>> Instrumentation based FDO is designed to work when the source files
>> that are used to generate the instr binary match exactly with the
>> sources in profile-use compile. It is known historically that using
>> stale profile (due to source changes, not gcda format change) can lead
>> to lots of mismatch warnings and even worse -- compiler ICEs.  This is
>> due to two reasons:
>> 1) the profile lookup for each function is based on funcdef_no which
>> can change when function definition order is changed or new functions
>> are inserted in the middle of a source
>> 2) the indirect call target id may change due to source changes:
>> before GCC4.9, the id uses cgraph uid which is as bad as funcdef_no.
>> Attributing wrong IC target to the indirect call site is the main
>> cause of compiler ICE (we have signature match check, but bad target
>> can leak through result in problem later). Starting from gcc49, the
>> indirect target profiling uses profile_id which is stable for public
>> functions.
>>
>> This patch introduces a new parameter for FDO to determine whether to
>> use internal id or assembler name based external id for profile
>> lookup. When the external id is used, GCC FDO will become very
>> tolerant to simple source changes.
>>
>> Note that autoFDO solves this problem but it is currently limited to
>> Intel platforms with LBR support.
>>
>> (Tested with SPEC, SPEC06 and large internal benchmarks. No performance
>> impact).
>>
>> Ok for trunk?
>
> Is there a good reason why we would want to ever use the internal id? Is it
> because the internal id shows up in the profile data for preexisting files?

I don't think existing profile data matter.

For perfect fresh profile, using external id has the chance of
collision. I have tested with a C++ symbol file with about 750k unique
symbol names, using crc32 based id yields 71 collisions --- the rate
is ~0.009%.

>
> Given that we need both, why is this a param vs a regular -f option?
> Shouldn't the default be to use the external id?
>

I am open to both. I have not seen evidence that id collision causes
trouble even though in theory it can.

thanks,

David


> BTW, thanks for working on this.  I've certainly got customers that want to
> see the FDO data be more tolerant of changes.
>
> Heff
>
Yi Yang July 23, 2014, 5:52 p.m. UTC | #7
It's worth noting that merely changing the hash function from crc32 to
something that's 64 bit long is enough to make sure collisions does
not happen. Maybe it's worth the trouble?

On Wed, Jul 23, 2014 at 10:42 AM, Xinliang David Li <davidxl@google.com> wrote:
>
> On Tue, Jul 22, 2014 at 8:14 PM, Jeff Law <law@redhat.com> wrote:
> > On 07/16/14 14:32, Xinliang David Li wrote:
> >>
> >> Instrumentation based FDO is designed to work when the source files
> >> that are used to generate the instr binary match exactly with the
> >> sources in profile-use compile. It is known historically that using
> >> stale profile (due to source changes, not gcda format change) can lead
> >> to lots of mismatch warnings and even worse -- compiler ICEs.  This is
> >> due to two reasons:
> >> 1) the profile lookup for each function is based on funcdef_no which
> >> can change when function definition order is changed or new functions
> >> are inserted in the middle of a source
> >> 2) the indirect call target id may change due to source changes:
> >> before GCC4.9, the id uses cgraph uid which is as bad as funcdef_no.
> >> Attributing wrong IC target to the indirect call site is the main
> >> cause of compiler ICE (we have signature match check, but bad target
> >> can leak through result in problem later). Starting from gcc49, the
> >> indirect target profiling uses profile_id which is stable for public
> >> functions.
> >>
> >> This patch introduces a new parameter for FDO to determine whether to
> >> use internal id or assembler name based external id for profile
> >> lookup. When the external id is used, GCC FDO will become very
> >> tolerant to simple source changes.
> >>
> >> Note that autoFDO solves this problem but it is currently limited to
> >> Intel platforms with LBR support.
> >>
> >> (Tested with SPEC, SPEC06 and large internal benchmarks. No performance
> >> impact).
> >>
> >> Ok for trunk?
> >
> > Is there a good reason why we would want to ever use the internal id? Is it
> > because the internal id shows up in the profile data for preexisting files?
>
> I don't think existing profile data matter.
>
> For perfect fresh profile, using external id has the chance of
> collision. I have tested with a C++ symbol file with about 750k unique
> symbol names, using crc32 based id yields 71 collisions --- the rate
> is ~0.009%.
>
> >
> > Given that we need both, why is this a param vs a regular -f option?
> > Shouldn't the default be to use the external id?
> >
>
> I am open to both. I have not seen evidence that id collision causes
> trouble even though in theory it can.
>
> thanks,
>
> David
>
>
> > BTW, thanks for working on this.  I've certainly got customers that want to
> > see the FDO data be more tolerant of changes.
> >
> > Heff
> >
Xinliang David Li July 23, 2014, 6 p.m. UTC | #8
yes -- I am thinking about this and other better hash functions.
Changing it to 64bit will cause profile format change and increase in
profile data size.

David

On Wed, Jul 23, 2014 at 10:52 AM, Yi Yang <ahyangyi@google.com> wrote:
> It's worth noting that merely changing the hash function from crc32 to
> something that's 64 bit long is enough to make sure collisions does
> not happen. Maybe it's worth the trouble?
>
> On Wed, Jul 23, 2014 at 10:42 AM, Xinliang David Li <davidxl@google.com> wrote:
>>
>> On Tue, Jul 22, 2014 at 8:14 PM, Jeff Law <law@redhat.com> wrote:
>> > On 07/16/14 14:32, Xinliang David Li wrote:
>> >>
>> >> Instrumentation based FDO is designed to work when the source files
>> >> that are used to generate the instr binary match exactly with the
>> >> sources in profile-use compile. It is known historically that using
>> >> stale profile (due to source changes, not gcda format change) can lead
>> >> to lots of mismatch warnings and even worse -- compiler ICEs.  This is
>> >> due to two reasons:
>> >> 1) the profile lookup for each function is based on funcdef_no which
>> >> can change when function definition order is changed or new functions
>> >> are inserted in the middle of a source
>> >> 2) the indirect call target id may change due to source changes:
>> >> before GCC4.9, the id uses cgraph uid which is as bad as funcdef_no.
>> >> Attributing wrong IC target to the indirect call site is the main
>> >> cause of compiler ICE (we have signature match check, but bad target
>> >> can leak through result in problem later). Starting from gcc49, the
>> >> indirect target profiling uses profile_id which is stable for public
>> >> functions.
>> >>
>> >> This patch introduces a new parameter for FDO to determine whether to
>> >> use internal id or assembler name based external id for profile
>> >> lookup. When the external id is used, GCC FDO will become very
>> >> tolerant to simple source changes.
>> >>
>> >> Note that autoFDO solves this problem but it is currently limited to
>> >> Intel platforms with LBR support.
>> >>
>> >> (Tested with SPEC, SPEC06 and large internal benchmarks. No performance
>> >> impact).
>> >>
>> >> Ok for trunk?
>> >
>> > Is there a good reason why we would want to ever use the internal id? Is it
>> > because the internal id shows up in the profile data for preexisting files?
>>
>> I don't think existing profile data matter.
>>
>> For perfect fresh profile, using external id has the chance of
>> collision. I have tested with a C++ symbol file with about 750k unique
>> symbol names, using crc32 based id yields 71 collisions --- the rate
>> is ~0.009%.
>>
>> >
>> > Given that we need both, why is this a param vs a regular -f option?
>> > Shouldn't the default be to use the external id?
>> >
>>
>> I am open to both. I have not seen evidence that id collision causes
>> trouble even though in theory it can.
>>
>> thanks,
>>
>> David
>>
>>
>> > BTW, thanks for working on this.  I've certainly got customers that want to
>> > see the FDO data be more tolerant of changes.
>> >
>> > Heff
>> >
Jan Hubicka July 23, 2014, 6:06 p.m. UTC | #9
> 
> I don't think existing profile data matter.
> 
> For perfect fresh profile, using external id has the chance of
> collision. I have tested with a C++ symbol file with about 750k unique
> symbol names, using crc32 based id yields 71 collisions --- the rate
> is ~0.009%.

init_node_map will resolve profile_id conflicts within single unit, so this is
not causing any troubles (naturally the profile will read wose if one of the
two conflicting symbols disappears, but that is an minor issue).
So i think we want to go for profile_id by default.

Only what I am concerned about is to make C static functions that have same name in
different translation units to not clash in profile_id or indirect call optimization
will be behaving poorly. That is main reason why we mix in global symbol name.

As mentioned in the original review, I think using gcov file name should be good enough...

Honza
> 
> >
> > Given that we need both, why is this a param vs a regular -f option?
> > Shouldn't the default be to use the external id?
> >
> 
> I am open to both. I have not seen evidence that id collision causes
> trouble even though in theory it can.
> 
> thanks,
> 
> David
> 
> 
> > BTW, thanks for working on this.  I've certainly got customers that want to
> > see the FDO data be more tolerant of changes.
> >
> > Heff
> >
diff mbox

Patch

Index: params.def
===================================================================
--- params.def	(revision 212682)
+++ params.def	(working copy)
@@ -869,6 +869,14 @@  DEFPARAM (PARAM_LOOP_INVARIANT_MAX_BBS_I
 	  "Max basic blocks number in loop for loop invariant motion",
 	  10000, 0, 0)
 
+/* When the parameter is 1, use the internal function id
+   to look up for profile data. Otherwise, use a more stable
+   external id based on assembler name and source location. */
+DEFPARAM (PARAM_PROFILE_FUNC_INTERNAL_ID,
+         "profile-func-internal-id",
+         "use internal function id in profile lookup",
+          1, 0, 1)
+
 /* Avoid SLP vectorization of large basic blocks.  */
 DEFPARAM (PARAM_SLP_MAX_INSNS_IN_BB,
           "slp-max-insns-in-bb",
Index: ChangeLog
===================================================================
--- ChangeLog	(revision 212682)
+++ ChangeLog	(working copy)
@@ -1,3 +1,10 @@ 
+2014-07-16  Xinliang David Li  <davidxl@google.com>
+
+	* params.def: New parameter.
+	* coverage.c (get_coverage_counts): Check new flag.
+	(coverage_compute_profile_id): Check new flag.
+	(coverage_begin_function): Check new flag.
+
 2014-07-16  Dodji Seketeli  <dodji@redhat.com>
 
 	Support location tracking for built-in macro tokens
Index: testsuite/g++.dg/tree-prof/reorder.C
===================================================================
--- testsuite/g++.dg/tree-prof/reorder.C	(revision 0)
+++ testsuite/g++.dg/tree-prof/reorder.C	(revision 0)
@@ -0,0 +1,48 @@ 
+/* { dg-options "-O2 -fno-devirtualize --param=profile-func-internal-id=0 -fdump-ipa-profile -Wno-coverage-mismatch -Wno-attributes" } */
+
+#ifdef _PROFILE_USE
+#include "reorder_class1.h"
+#include "reorder_class2.h"
+#else
+#include "reorder_class2.h"
+#include "reorder_class1.h"
+#endif
+
+int g;
+static __attribute__((always_inline))
+void test1 (A *tc)
+{
+  int i;
+  for (i = 0; i < 1000; i++)
+     g += tc->foo(); 
+   if (g<100) g++;
+}
+
+static __attribute__((always_inline))
+void test2 (B *tc)
+{
+  int i;
+  for (i = 0; i < 1000; i++)
+     g += tc->foo();
+}
+
+
+#ifdef _PROFILE_USE
+__attribute__((noinline)) void test_a(A *ap) { test1 (ap); }
+__attribute__((noinline)) void test_b(B *bp) { test2 (bp); }
+#else
+__attribute__((noinline)) void test_b(B *bp) { test2 (bp); }
+__attribute__((noinline)) void test_a(A *ap) { test1 (ap); }
+#endif
+
+int main()
+{
+  A* ap = new A();
+  B* bp = new B();
+
+  test_a(ap);
+  test_b(bp);
+}
+
+/* { dg-final-use { scan-ipa-dump-times "Indirect call -> direct call" 2 "profile" } } */
+
Index: testsuite/g++.dg/tree-prof/tree-prof.exp
===================================================================
--- testsuite/g++.dg/tree-prof/tree-prof.exp	(revision 212682)
+++ testsuite/g++.dg/tree-prof/tree-prof.exp	(working copy)
@@ -42,8 +42,8 @@  set PROFOPT_OPTIONS [list {}]
 # These are globals used by profopt-execute.  The first is options
 # needed to generate profile data, the second is options to use the
 # profile data.
-set profile_option "-fprofile-generate"
-set feedback_option "-fprofile-use"
+set profile_option "-fprofile-generate -D_PROFILE_GENERATE"
+set feedback_option "-fprofile-use -D_PROFILE_USE"
 
 foreach src [lsort [glob -nocomplain $srcdir/$subdir/*.C]] {
     # If we're only testing specific files and this isn't one of them, skip it.
Index: testsuite/g++.dg/tree-prof/reorder_class1.h
===================================================================
--- testsuite/g++.dg/tree-prof/reorder_class1.h	(revision 0)
+++ testsuite/g++.dg/tree-prof/reorder_class1.h	(revision 0)
@@ -0,0 +1,11 @@ 
+struct A {
+  virtual int foo();
+};
+
+int A::foo()
+{
+  return 1;
+}
+
+
+
Index: testsuite/g++.dg/tree-prof/reorder_class2.h
===================================================================
--- testsuite/g++.dg/tree-prof/reorder_class2.h	(revision 0)
+++ testsuite/g++.dg/tree-prof/reorder_class2.h	(revision 0)
@@ -0,0 +1,12 @@ 
+
+struct B {
+  virtual int foo();
+};
+
+int B::foo()
+{
+  return 2;
+}
+
+
+
Index: testsuite/g++.dg/tree-prof/morefunc.C
===================================================================
--- testsuite/g++.dg/tree-prof/morefunc.C	(revision 0)
+++ testsuite/g++.dg/tree-prof/morefunc.C	(revision 0)
@@ -0,0 +1,55 @@ 
+/* { dg-options "-O2 -fno-devirtualize --param=profile-func-internal-id=0 -fdump-ipa-profile -Wno-attributes -Wno-coverage-mismatch" } */
+#include "reorder_class1.h"
+#include "reorder_class2.h"
+
+int g;
+
+#ifdef _PROFILE_USE
+/* Another function not existing
+ * in profile-gen  */
+
+__attribute__((noinline)) void
+new_func (int i)
+{
+   g += i;
+}
+#endif
+
+static __attribute__((always_inline))
+void test1 (A *tc)
+{
+  int i;
+  for (i = 0; i < 1000; i++)
+     g += tc->foo(); 
+   if (g<100) g++;
+}
+
+static __attribute__((always_inline))
+void test2 (B *tc)
+{
+  int i;
+  for (i = 0; i < 1000; i++)
+     g += tc->foo();
+}
+
+
+__attribute__((noinline)) void test_a(A *ap) { test1 (ap); }
+__attribute__((noinline)) void test_b(B *bp) { test2 (bp); }
+
+
+int main()
+{
+  A* ap = new A();
+  B* bp = new B();
+
+  test_a(ap);
+  test_b(bp);
+
+#ifdef _PROFILE_USE
+  new_func(10);
+#endif
+
+}
+
+/* { dg-final-use { scan-ipa-dump-times "Indirect call -> direct call" 2 "profile" } } */
+
Index: testsuite/ChangeLog
===================================================================
--- testsuite/ChangeLog	(revision 212682)
+++ testsuite/ChangeLog	(working copy)
@@ -1,3 +1,11 @@ 
+2014-07-16  Xinliang David Li  <davidxl@google.com>
+
+	* g++.dg/tree-prof/tree-prof.exp: Define macros.
+	* g++.dg/tree-prof/reorder_class1.h: New file.
+	* g++.dg/tree-prof/reorder_class2.h: New file.
+	* g++.dg/tree-prof/reorder.C: New test.
+	* g++.dg/tree-prof/morefunc.C: New test.
+
 2014-07-16  Arnaud Charlet  <charlet@adacore.com>
 
 	* gnat.db/specs/alignment2.ads, gnat.db/specs/size_clause1.ads,
Index: coverage.c
===================================================================
--- coverage.c	(revision 212682)
+++ coverage.c	(working copy)
@@ -54,6 +54,7 @@  along with GCC; see the file COPYING3.
 #include "intl.h"
 #include "filenames.h"
 #include "target.h"
+#include "params.h"
 
 #include "gcov-io.h"
 #include "gcov-io.c"
@@ -369,8 +370,10 @@  get_coverage_counts (unsigned counter, u
                          da_file_name);
       return NULL;
     }
-
-  elt.ident = current_function_funcdef_no + 1;
+  if (PARAM_VALUE (PARAM_PROFILE_FUNC_INTERNAL_ID))
+    elt.ident = current_function_funcdef_no + 1;
+  else
+    elt.ident = coverage_compute_profile_id (cgraph_get_node (cfun->decl));
   elt.ctr = counter;
   entry = counts_hash->find (&elt);
   if (!entry || !entry->summary.num)
@@ -416,7 +419,8 @@  get_coverage_counts (unsigned counter, u
     }
   else if (entry->lineno_checksum != lineno_checksum)
     {
-      warning (0, "source locations for function %qE have changed,"
+      warning (OPT_Wcoverage_mismatch,
+               "source locations for function %qE have changed,"
 	       " the profile data may be out of date",
 	       DECL_ASSEMBLER_NAME (current_function_decl));
     }
@@ -581,12 +585,13 @@  coverage_compute_profile_id (struct cgra
     {
       expanded_location xloc
 	= expand_location (DECL_SOURCE_LOCATION (n->decl));
+      bool use_name_only = (PARAM_VALUE (PARAM_PROFILE_FUNC_INTERNAL_ID) == 0);
 
-      chksum = xloc.line;
+      chksum = (use_name_only ? 0 : xloc.line);
       chksum = coverage_checksum_string (chksum, xloc.file);
       chksum = coverage_checksum_string
 	(chksum, IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (n->decl)));
-      if (first_global_object_name)
+      if (!use_name_only && first_global_object_name)
 	chksum = coverage_checksum_string
 	  (chksum, first_global_object_name);
       chksum = coverage_checksum_string
@@ -645,7 +650,12 @@  coverage_begin_function (unsigned lineno
 
   /* Announce function */
   offset = gcov_write_tag (GCOV_TAG_FUNCTION);
-  gcov_write_unsigned (current_function_funcdef_no + 1);
+  if (PARAM_VALUE (PARAM_PROFILE_FUNC_INTERNAL_ID))
+    gcov_write_unsigned (current_function_funcdef_no + 1);
+  else
+    gcov_write_unsigned (coverage_compute_profile_id (
+       cgraph_get_node (current_function_decl)));
+
   gcov_write_unsigned (lineno_checksum);
   gcov_write_unsigned (cfg_checksum);
   gcov_write_string (IDENTIFIER_POINTER
@@ -682,8 +692,13 @@  coverage_end_function (unsigned lineno_c
       if (!DECL_EXTERNAL (current_function_decl))
 	{
 	  item = ggc_alloc<coverage_data> ();
-	  
-	  item->ident = current_function_funcdef_no + 1;
+
+          if (PARAM_VALUE (PARAM_PROFILE_FUNC_INTERNAL_ID))
+	    item->ident = current_function_funcdef_no + 1;
+          else
+            item->ident = coverage_compute_profile_id (
+               cgraph_get_node (cfun->decl));
+
 	  item->lineno_checksum = lineno_checksum;
 	  item->cfg_checksum = cfg_checksum;