| Message ID | CAAe5K+VWqnrXetiq3kUE6Tbb45FOm-6ANh9HoJ4s25O1vJk33g@mail.gmail.com |
|---|---|
| State | New |
| Headers | show |
> 2013-09-29 Teresa Johnson <tejohnson@google.com> > > * bb-reorder.c (find_rarely_executed_basic_blocks_and_crossing_edges): > Treat profile insanities conservatively. > * predict.c (probably_never_executed): New function. Treat profile > insanities conservatively. > (probably_never_executed_bb_p): Invoke probably_never_executed. > (probably_never_executed_edge_p): Invoke probably_never_executed. > > Index: bb-reorder.c > =================================================================== > --- bb-reorder.c (revision 202947) > +++ bb-reorder.c (working copy) > @@ -1564,8 +1564,25 @@ find_rarely_executed_basic_blocks_and_crossing_edg > /* Mark which partition (hot/cold) each basic block belongs in. */ > FOR_EACH_BB (bb) > { > + bool cold_bb = false; whitespace here > if (probably_never_executed_bb_p (cfun, bb)) > { > + /* Handle profile insanities created by upstream optimizations > + by also checking the incoming edge weights. If there is a non-cold > + incoming edge, conservatively prevent this block from being split > + into the cold section. */ > + cold_bb = true; > + FOR_EACH_EDGE (e, ei, bb->preds) > + { > + if (!probably_never_executed_edge_p (cfun, e)) > + { > + cold_bb = false; > + break; > + } > + } You can probably elimnate the extra braces. So we won't propagate deeper in the CFG, right? This change is OK. > + } > + if (cold_bb) > + { > BB_SET_PARTITION (bb, BB_COLD_PARTITION); > cold_bb_count++; > } > Index: predict.c > =================================================================== > --- predict.c (revision 202947) > +++ predict.c (working copy) > @@ -226,26 +226,26 @@ maybe_hot_edge_p (edge e) > } > > > -/* Return true in case BB is probably never executed. */ > > -bool > -probably_never_executed_bb_p (struct function *fun, const_basic_block bb) > +/* Return true if profile COUNT and FREQUENCY, or function FUN static > + node frequency reflects never being executed. 
*/ > + > +static bool > +probably_never_executed (struct function *fun, > + gcov_type count, int frequency) > { > gcc_checking_assert (fun); > if (profile_status_for_function (fun) == PROFILE_READ) > { > - if ((bb->count * 4 + profile_info->runs / 2) / profile_info->runs > 0) > + if ((count * 4 + profile_info->runs / 2) / profile_info->runs > 0) > return false; > - if (!bb->frequency) > - return true; > - if (!ENTRY_BLOCK_PTR->frequency) > - return false; > - if (ENTRY_BLOCK_PTR->count && ENTRY_BLOCK_PTR->count < REG_BR_PROB_BASE) > - { > - return (RDIV (bb->frequency * ENTRY_BLOCK_PTR->count, > - ENTRY_BLOCK_PTR->frequency) > - < REG_BR_PROB_BASE / 4); > - } > + // If this is a profiled function (entry bb non-zero count), then base > + // the coldness decision on the frequency. This will handle cases where > + // counts are not updated properly during optimizations or expansion. > + if (ENTRY_BLOCK_PTR->count) > + return frequency == 0; > + // Unprofiled function, frequencies statically assigned. All bbs are > + // treated as cold. I would avoid combining C and C++ comments in the function. Did you get some data on how many basic blocks we now consider hot? The previous implemntation consdered block as never executed when frequencies indicates that it is executed in at most 1/4th of invocations of program. You essentially chnage to 1/10000. The first seems bit too high given the way we distribute probabilities in dojump and firends, second looks too low. The change introducing probably_never_executed with the current logic is OK. We may want to fine tune the ratio. Honza > return true; > } > if ((!profile_info || !flag_branch_probabilities) > @@ -256,19 +256,21 @@ maybe_hot_edge_p (edge e) > } > > > +/* Return true in case BB is probably never executed. 
*/ > + > +bool > +probably_never_executed_bb_p (struct function *fun, const_basic_block bb) > +{ > + return probably_never_executed (fun, bb->count, bb->frequency); > +} > + > + > /* Return true in case edge E is probably never executed. */ > > bool > probably_never_executed_edge_p (struct function *fun, edge e) > { > - gcc_checking_assert (fun); > - if (profile_info && flag_branch_probabilities) > - return ((e->count + profile_info->runs / 2) / profile_info->runs) == 0; > - if ((!profile_info || !flag_branch_probabilities) > - && (cgraph_get_node (fun->decl)->frequency > - == NODE_FREQUENCY_UNLIKELY_EXECUTED)) > - return true; > - return false; > + return probably_never_executed (fun, e->count, EDGE_FREQUENCY (e)); > } > > /* Return true if NODE should be optimized for size. */
On Wed, Oct 2, 2013 at 9:19 AM, Jan Hubicka <hubicka@ucw.cz> wrote: >> 2013-09-29 Teresa Johnson <tejohnson@google.com> >> >> * bb-reorder.c (find_rarely_executed_basic_blocks_and_crossing_edges): >> Treat profile insanities conservatively. >> * predict.c (probably_never_executed): New function. Treat profile >> insanities conservatively. >> (probably_never_executed_bb_p): Invoke probably_never_executed. >> (probably_never_executed_edge_p): Invoke probably_never_executed. >> >> Index: bb-reorder.c >> =================================================================== >> --- bb-reorder.c (revision 202947) >> +++ bb-reorder.c (working copy) >> @@ -1564,8 +1564,25 @@ find_rarely_executed_basic_blocks_and_crossing_edg >> /* Mark which partition (hot/cold) each basic block belongs in. */ >> FOR_EACH_BB (bb) >> { >> + bool cold_bb = false; > > whitespace here meaning add a line of whitespace? Ok, done. > >> if (probably_never_executed_bb_p (cfun, bb)) >> { >> + /* Handle profile insanities created by upstream optimizations >> + by also checking the incoming edge weights. If there is a non-cold >> + incoming edge, conservatively prevent this block from being split >> + into the cold section. */ >> + cold_bb = true; >> + FOR_EACH_EDGE (e, ei, bb->preds) >> + { >> + if (!probably_never_executed_edge_p (cfun, e)) >> + { >> + cold_bb = false; >> + break; >> + } >> + } > > You can probably elimnate the extra braces. > So we won't propagate deeper in the CFG, right? Done. > > This change is OK. > >> + } >> + if (cold_bb) >> + { >> BB_SET_PARTITION (bb, BB_COLD_PARTITION); >> cold_bb_count++; >> } >> Index: predict.c >> =================================================================== >> --- predict.c (revision 202947) >> +++ predict.c (working copy) >> @@ -226,26 +226,26 @@ maybe_hot_edge_p (edge e) >> } >> >> >> -/* Return true in case BB is probably never executed. 
*/ >> >> -bool >> -probably_never_executed_bb_p (struct function *fun, const_basic_block bb) >> +/* Return true if profile COUNT and FREQUENCY, or function FUN static >> + node frequency reflects never being executed. */ >> + >> +static bool >> +probably_never_executed (struct function *fun, >> + gcov_type count, int frequency) >> { >> gcc_checking_assert (fun); >> if (profile_status_for_function (fun) == PROFILE_READ) >> { >> - if ((bb->count * 4 + profile_info->runs / 2) / profile_info->runs > 0) >> + if ((count * 4 + profile_info->runs / 2) / profile_info->runs > 0) >> return false; >> - if (!bb->frequency) >> - return true; >> - if (!ENTRY_BLOCK_PTR->frequency) >> - return false; >> - if (ENTRY_BLOCK_PTR->count && ENTRY_BLOCK_PTR->count < REG_BR_PROB_BASE) >> - { >> - return (RDIV (bb->frequency * ENTRY_BLOCK_PTR->count, >> - ENTRY_BLOCK_PTR->frequency) >> - < REG_BR_PROB_BASE / 4); >> - } >> + // If this is a profiled function (entry bb non-zero count), then base >> + // the coldness decision on the frequency. This will handle cases where >> + // counts are not updated properly during optimizations or expansion. >> + if (ENTRY_BLOCK_PTR->count) >> + return frequency == 0; >> + // Unprofiled function, frequencies statically assigned. All bbs are >> + // treated as cold. > > I would avoid combining C and C++ comments in the function. Fixed. > Did you get some data on how many basic blocks we now consider hot? No, I can do that. > > The previous implemntation consdered block as never executed when frequencies > indicates that it is executed in at most 1/4th of invocations of program. > You essentially chnage to 1/10000. The first seems bit too high given the > way we distribute probabilities in dojump and firends, second looks too low. But why do we want to consider blocks as "probably never executed" when the frequency suggests they are sometimes executed? 
AFAICT, there are 2 main callers of this routine: 1) function splitting in bb-layout 2) function cgraph node weight Where #2 will affect optimization of the function for size and also function layout by the linker. I would argue that for function splitting, we really want to know when it is probably *never* executed - i.e. completely cold, since the cost of jumping back and forth to the cold section is likely to be high. I am not sure for #2 what the right ratio is. For function layout, we may also want to place only really cold *never* executed functions into the cold section, but I am less sure about optimization for size. Perhaps we really need two different interfaces to test for different levels of coldness: probably_never_executed() -> returns true when there is profile information for the function and the bb has 0 count and 0 frequency. -> invoked from bb-reorder.cc to drive function splitting -> may want to consider invoking this as an additional check before putting function into unlikely text section in the future. possibly_never_executed() -> essentially the existing logic in probably_never_executed_bb_p -> invoked when marking the cgraph node > > The change introducing probably_never_executed with the current logic is OK. Ok, I will commit the two approved parts for now (the change to bb-reorder.c and the addition of probably_never_executed that uses the existing logic from probably_never_executed_bb_p. Thanks, Teresa > We may want to fine tune the ratio. > > Honza >> return true; >> } >> if ((!profile_info || !flag_branch_probabilities) >> @@ -256,19 +256,21 @@ maybe_hot_edge_p (edge e) >> } >> >> >> +/* Return true in case BB is probably never executed. */ >> + >> +bool >> +probably_never_executed_bb_p (struct function *fun, const_basic_block bb) >> +{ >> + return probably_never_executed (fun, bb->count, bb->frequency); >> +} >> + >> + >> /* Return true in case edge E is probably never executed. 
*/ >> >> bool >> probably_never_executed_edge_p (struct function *fun, edge e) >> { >> - gcc_checking_assert (fun); >> - if (profile_info && flag_branch_probabilities) >> - return ((e->count + profile_info->runs / 2) / profile_info->runs) == 0; >> - if ((!profile_info || !flag_branch_probabilities) >> - && (cgraph_get_node (fun->decl)->frequency >> - == NODE_FREQUENCY_UNLIKELY_EXECUTED)) >> - return true; >> - return false; >> + return probably_never_executed (fun, e->count, EDGE_FREQUENCY (e)); >> } >> >> /* Return true if NODE should be optimized for size. */
> But why do we want to consider blocks as "probably never executed" > when the frequency suggests they are sometimes executed? Well, "probably never executed" is meant to refer to one run. If you have something like code handling fatal errors, you probably still want to have it in the cold section even if the user may have trained the program on a testsuite that triggers them once or twice per thousand runs. We may just make the predicate more strict, but let's do that incrementally so we know how much things change. I am somewhat concerned that we are not that effective on breaking out cold code, so -fprofile-use does not lead to as significant code size reductions as the theory would suggest; perhaps I am just overly conservative about this. Getting the splitting to work reliably is definitely going to be a win. > Perhaps we really need two different interfaces to test for different > levels of coldness: > > probably_never_executed() > -> returns true when there is profile information for the function > and the bb has 0 count and 0 frequency. > -> invoked from bb-reorder.cc to drive function splitting > -> may want to consider invoking this as an additional check before > putting function into unlikely text section in the future. > > possibly_never_executed() > -> essentially the existing logic in probably_never_executed_bb_p > -> invoked when marking the cgraph node Perhaps... The advantage of the hot/normal/cold split is that it is easy to understand, but if necessary (i.e., it becomes impossible to tune well) we may add more stages... Honza
On Wed, Oct 2, 2013 at 9:19 AM, Jan Hubicka <hubicka@ucw.cz> wrote: >> 2013-09-29 Teresa Johnson <tejohnson@google.com> >> >> * bb-reorder.c (find_rarely_executed_basic_blocks_and_crossing_edges): >> Treat profile insanities conservatively. >> * predict.c (probably_never_executed): New function. Treat profile >> insanities conservatively. >> (probably_never_executed_bb_p): Invoke probably_never_executed. >> (probably_never_executed_edge_p): Invoke probably_never_executed. >> >> Index: bb-reorder.c >> =================================================================== >> --- bb-reorder.c (revision 202947) >> +++ bb-reorder.c (working copy) >> @@ -1564,8 +1564,25 @@ find_rarely_executed_basic_blocks_and_crossing_edg >> /* Mark which partition (hot/cold) each basic block belongs in. */ >> FOR_EACH_BB (bb) >> { >> + bool cold_bb = false; > > whitespace here > >> if (probably_never_executed_bb_p (cfun, bb)) >> { >> + /* Handle profile insanities created by upstream optimizations >> + by also checking the incoming edge weights. If there is a non-cold >> + incoming edge, conservatively prevent this block from being split >> + into the cold section. */ >> + cold_bb = true; >> + FOR_EACH_EDGE (e, ei, bb->preds) >> + { >> + if (!probably_never_executed_edge_p (cfun, e)) >> + { >> + cold_bb = false; >> + break; >> + } >> + } > > You can probably elimnate the extra braces. > So we won't propagate deeper in the CFG, right? > > This change is OK. > >> + } >> + if (cold_bb) >> + { >> BB_SET_PARTITION (bb, BB_COLD_PARTITION); >> cold_bb_count++; >> } >> Index: predict.c >> =================================================================== >> --- predict.c (revision 202947) >> +++ predict.c (working copy) >> @@ -226,26 +226,26 @@ maybe_hot_edge_p (edge e) >> } >> >> >> -/* Return true in case BB is probably never executed. 
*/ >> >> -bool >> -probably_never_executed_bb_p (struct function *fun, const_basic_block bb) >> +/* Return true if profile COUNT and FREQUENCY, or function FUN static >> + node frequency reflects never being executed. */ >> + >> +static bool >> +probably_never_executed (struct function *fun, >> + gcov_type count, int frequency) >> { >> gcc_checking_assert (fun); >> if (profile_status_for_function (fun) == PROFILE_READ) >> { >> - if ((bb->count * 4 + profile_info->runs / 2) / profile_info->runs > 0) >> + if ((count * 4 + profile_info->runs / 2) / profile_info->runs > 0) >> return false; >> - if (!bb->frequency) >> - return true; >> - if (!ENTRY_BLOCK_PTR->frequency) >> - return false; >> - if (ENTRY_BLOCK_PTR->count && ENTRY_BLOCK_PTR->count < REG_BR_PROB_BASE) >> - { >> - return (RDIV (bb->frequency * ENTRY_BLOCK_PTR->count, >> - ENTRY_BLOCK_PTR->frequency) >> - < REG_BR_PROB_BASE / 4); >> - } >> + // If this is a profiled function (entry bb non-zero count), then base >> + // the coldness decision on the frequency. This will handle cases where >> + // counts are not updated properly during optimizations or expansion. >> + if (ENTRY_BLOCK_PTR->count) >> + return frequency == 0; >> + // Unprofiled function, frequencies statically assigned. All bbs are >> + // treated as cold. > > I would avoid combining C and C++ comments in the function. > Did you get some data on how many basic blocks we now consider hot? > > The previous implemntation consdered block as never executed when frequencies > indicates that it is executed in at most 1/4th of invocations of program. > You essentially chnage to 1/10000. The first seems bit too high given the > way we distribute probabilities in dojump and firends, second looks too low. Actually, I don't think the current code is detecting when the frequencies indicate it was executed 1/4 time. 
The current code takes a ratio of the entry block count, and compares it to REG_BR_PROB_BASE/4, which seems like the wrong comparison for profile counts. Shouldn't this be something like: gcov_type computed_count = RDIV (frequency * ENTRY_BLOCK_PTR->count, ENTRY_BLOCK_PTR->frequency) if ((computed_count * 4 + profile_info->runs / 2) / profile_info->runs > 0) return false; return true; i.e. do the same check we do for bb->count above. And the check guarding this is looking for ENTRY_BLOCK_PTR->count < REG_BR_PROB_BASE, which also doesn't seem right, although we need to ensure that we don't overflow when multiplying by frequency. Teresa > > The change introducing probably_never_executed with the current logic is OK. > We may want to fine tune the ratio. > > Honza >> return true; >> } >> if ((!profile_info || !flag_branch_probabilities) >> @@ -256,19 +256,21 @@ maybe_hot_edge_p (edge e) >> } >> >> >> +/* Return true in case BB is probably never executed. */ >> + >> +bool >> +probably_never_executed_bb_p (struct function *fun, const_basic_block bb) >> +{ >> + return probably_never_executed (fun, bb->count, bb->frequency); >> +} >> + >> + >> /* Return true in case edge E is probably never executed. */ >> >> bool >> probably_never_executed_edge_p (struct function *fun, edge e) >> { >> - gcc_checking_assert (fun); >> - if (profile_info && flag_branch_probabilities) >> - return ((e->count + profile_info->runs / 2) / profile_info->runs) == 0; >> - if ((!profile_info || !flag_branch_probabilities) >> - && (cgraph_get_node (fun->decl)->frequency >> - == NODE_FREQUENCY_UNLIKELY_EXECUTED)) >> - return true; >> - return false; >> + return probably_never_executed (fun, e->count, EDGE_FREQUENCY (e)); >> } >> >> /* Return true if NODE should be optimized for size. */
On Thu, Oct 3, 2013 at 6:41 AM, Teresa Johnson <tejohnson@google.com> wrote: > On Wed, Oct 2, 2013 at 9:19 AM, Jan Hubicka <hubicka@ucw.cz> wrote: >>> 2013-09-29 Teresa Johnson <tejohnson@google.com> >>> >>> * bb-reorder.c (find_rarely_executed_basic_blocks_and_crossing_edges): >>> Treat profile insanities conservatively. >>> * predict.c (probably_never_executed): New function. Treat profile >>> insanities conservatively. >>> (probably_never_executed_bb_p): Invoke probably_never_executed. >>> (probably_never_executed_edge_p): Invoke probably_never_executed. >>> >>> Index: bb-reorder.c >>> =================================================================== >>> --- bb-reorder.c (revision 202947) >>> +++ bb-reorder.c (working copy) >>> @@ -1564,8 +1564,25 @@ find_rarely_executed_basic_blocks_and_crossing_edg >>> /* Mark which partition (hot/cold) each basic block belongs in. */ >>> FOR_EACH_BB (bb) >>> { >>> + bool cold_bb = false; >> >> whitespace here >> >>> if (probably_never_executed_bb_p (cfun, bb)) >>> { >>> + /* Handle profile insanities created by upstream optimizations >>> + by also checking the incoming edge weights. If there is a non-cold >>> + incoming edge, conservatively prevent this block from being split >>> + into the cold section. */ >>> + cold_bb = true; >>> + FOR_EACH_EDGE (e, ei, bb->preds) >>> + { >>> + if (!probably_never_executed_edge_p (cfun, e)) >>> + { >>> + cold_bb = false; >>> + break; >>> + } >>> + } >> >> You can probably elimnate the extra braces. >> So we won't propagate deeper in the CFG, right? >> >> This change is OK. >> >>> + } >>> + if (cold_bb) >>> + { >>> BB_SET_PARTITION (bb, BB_COLD_PARTITION); >>> cold_bb_count++; >>> } >>> Index: predict.c >>> =================================================================== >>> --- predict.c (revision 202947) >>> +++ predict.c (working copy) >>> @@ -226,26 +226,26 @@ maybe_hot_edge_p (edge e) >>> } >>> >>> >>> -/* Return true in case BB is probably never executed. 
*/ >>> >>> -bool >>> -probably_never_executed_bb_p (struct function *fun, const_basic_block bb) >>> +/* Return true if profile COUNT and FREQUENCY, or function FUN static >>> + node frequency reflects never being executed. */ >>> + >>> +static bool >>> +probably_never_executed (struct function *fun, >>> + gcov_type count, int frequency) >>> { >>> gcc_checking_assert (fun); >>> if (profile_status_for_function (fun) == PROFILE_READ) >>> { >>> - if ((bb->count * 4 + profile_info->runs / 2) / profile_info->runs > 0) >>> + if ((count * 4 + profile_info->runs / 2) / profile_info->runs > 0) >>> return false; >>> - if (!bb->frequency) >>> - return true; >>> - if (!ENTRY_BLOCK_PTR->frequency) >>> - return false; >>> - if (ENTRY_BLOCK_PTR->count && ENTRY_BLOCK_PTR->count < REG_BR_PROB_BASE) >>> - { >>> - return (RDIV (bb->frequency * ENTRY_BLOCK_PTR->count, >>> - ENTRY_BLOCK_PTR->frequency) >>> - < REG_BR_PROB_BASE / 4); >>> - } >>> + // If this is a profiled function (entry bb non-zero count), then base >>> + // the coldness decision on the frequency. This will handle cases where >>> + // counts are not updated properly during optimizations or expansion. >>> + if (ENTRY_BLOCK_PTR->count) >>> + return frequency == 0; >>> + // Unprofiled function, frequencies statically assigned. All bbs are >>> + // treated as cold. >> >> I would avoid combining C and C++ comments in the function. >> Did you get some data on how many basic blocks we now consider hot? >> >> The previous implemntation consdered block as never executed when frequencies >> indicates that it is executed in at most 1/4th of invocations of program. >> You essentially chnage to 1/10000. The first seems bit too high given the >> way we distribute probabilities in dojump and firends, second looks too low. > > Actually, I don't think the current code is detecting when the > frequencies indicate it was executed 1/4 time. 
The current code takes > a ratio of the entry block count, and compares it to > REG_BR_PROB_BASE/4, which seems like the wrong comparison for profile > counts. Shouldn't this be something like: > > gcov_type computed_count = RDIV (frequency * ENTRY_BLOCK_PTR->count, > ENTRY_BLOCK_PTR->frequency) > if ((computed_count * 4 + profile_info->runs / 2) / profile_info->runs > 0) > return false; > return true; > > i.e. do the same check we do for bb->count above. And the check > guarding this is looking for ENTRY_BLOCK_PTR->count < > REG_BR_PROB_BASE, which also doesn't seem right, although we need to > ensure that we don't overflow when multiplying by frequency. I have a variant of this change that works well, and has the proper overflow checking. This gets rid of 13 failures when there is 1 train run. Changing the required ratio to 1/100 training runs instead of 1/4 reduces the single train run failures by another 5, since smaller frequencies are handled even when the count has been truncated to 0. I found a couple more failures with 1 train run were due to inlining profile update issues, which I have a fix for. At that point, the failures are the same between the 1 train run and 100 train run cases. After that, there are 2 remaining failures (both with 1 train and 100 train runs), that go away when I disable loop unrolling, that I haven't looked at yet. I'll send a patch with the above changes though hopefully tonight. What do you think of changing the required execution count to profile run ratio to 1/100? Teresa > > Teresa > >> >> The change introducing probably_never_executed with the current logic is OK. >> We may want to fine tune the ratio. >> >> Honza >>> return true; >>> } >>> if ((!profile_info || !flag_branch_probabilities) >>> @@ -256,19 +256,21 @@ maybe_hot_edge_p (edge e) >>> } >>> >>> >>> +/* Return true in case BB is probably never executed. 
*/ >>> + >>> +bool >>> +probably_never_executed_bb_p (struct function *fun, const_basic_block bb) >>> +{ >>> + return probably_never_executed (fun, bb->count, bb->frequency); >>> +} >>> + >>> + >>> /* Return true in case edge E is probably never executed. */ >>> >>> bool >>> probably_never_executed_edge_p (struct function *fun, edge e) >>> { >>> - gcc_checking_assert (fun); >>> - if (profile_info && flag_branch_probabilities) >>> - return ((e->count + profile_info->runs / 2) / profile_info->runs) == 0; >>> - if ((!profile_info || !flag_branch_probabilities) >>> - && (cgraph_get_node (fun->decl)->frequency >>> - == NODE_FREQUENCY_UNLIKELY_EXECUTED)) >>> - return true; >>> - return false; >>> + return probably_never_executed (fun, e->count, EDGE_FREQUENCY (e)); >>> } >>> >>> /* Return true if NODE should be optimized for size. */ > > > > -- > Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
Index: bb-reorder.c =================================================================== --- bb-reorder.c (revision 202947) +++ bb-reorder.c (working copy) @@ -1564,8 +1564,25 @@ find_rarely_executed_basic_blocks_and_crossing_edg /* Mark which partition (hot/cold) each basic block belongs in. */ FOR_EACH_BB (bb) { + bool cold_bb = false; if (probably_never_executed_bb_p (cfun, bb)) { + /* Handle profile insanities created by upstream optimizations + by also checking the incoming edge weights. If there is a non-cold + incoming edge, conservatively prevent this block from being split + into the cold section. */ + cold_bb = true; + FOR_EACH_EDGE (e, ei, bb->preds) + { + if (!probably_never_executed_edge_p (cfun, e)) + { + cold_bb = false; + break; + } + } + } + if (cold_bb) + { BB_SET_PARTITION (bb, BB_COLD_PARTITION); cold_bb_count++; } Index: predict.c =================================================================== --- predict.c (revision 202947) +++ predict.c (working copy) @@ -226,26 +226,26 @@ maybe_hot_edge_p (edge e) } -/* Return true in case BB is probably never executed. */ -bool -probably_never_executed_bb_p (struct function *fun, const_basic_block bb) +/* Return true if profile COUNT and FREQUENCY, or function FUN static + node frequency reflects never being executed. 
*/ + +static bool +probably_never_executed (struct function *fun, + gcov_type count, int frequency) { gcc_checking_assert (fun); if (profile_status_for_function (fun) == PROFILE_READ) { - if ((bb->count * 4 + profile_info->runs / 2) / profile_info->runs > 0) + if ((count * 4 + profile_info->runs / 2) / profile_info->runs > 0) return false; - if (!bb->frequency) - return true; - if (!ENTRY_BLOCK_PTR->frequency) - return false; - if (ENTRY_BLOCK_PTR->count && ENTRY_BLOCK_PTR->count < REG_BR_PROB_BASE) - { - return (RDIV (bb->frequency * ENTRY_BLOCK_PTR->count, - ENTRY_BLOCK_PTR->frequency) - < REG_BR_PROB_BASE / 4); - } + // If this is a profiled function (entry bb non-zero count), then base + // the coldness decision on the frequency. This will handle cases where + // counts are not updated properly during optimizations or expansion. + if (ENTRY_BLOCK_PTR->count) + return frequency == 0; + // Unprofiled function, frequencies statically assigned. All bbs are + // treated as cold. return true; } if ((!profile_info || !flag_branch_probabilities) @@ -256,19 +256,21 @@ maybe_hot_edge_p (edge e) } +/* Return true in case BB is probably never executed. */ + +bool +probably_never_executed_bb_p (struct function *fun, const_basic_block bb) +{ + return probably_never_executed (fun, bb->count, bb->frequency); +} + + /* Return true in case edge E is probably never executed. */ bool probably_never_executed_edge_p (struct function *fun, edge e) { - gcc_checking_assert (fun); - if (profile_info && flag_branch_probabilities) - return ((e->count + profile_info->runs / 2) / profile_info->runs) == 0; - if ((!profile_info || !flag_branch_probabilities) - && (cgraph_get_node (fun->decl)->frequency - == NODE_FREQUENCY_UNLIKELY_EXECUTED)) - return true; - return false; + return probably_never_executed (fun, e->count, EDGE_FREQUENCY (e)); } /* Return true if NODE should be optimized for size. */