[gomp4,committed,PR65443] Add transform_to_exit_first_loop_al

[ was: Re: [PING][PATCH][PR65443] Add transform_to_exit_first_loop_alt ]

On 14/05/15 13:42, Tom de Vries wrote:
> On 20-04-15 14:25, Richard Biener wrote:
>> On Wed, 15 Apr 2015, Tom de Vries wrote:
>>
>>> On 03-04-15 14:39, Tom de Vries wrote:
>>>> On 27-03-15 15:10, Tom de Vries wrote:
>>>>> Hi,
>>>>>
>>>>> this patch fixes PR65443, a todo in the parloops pass for function
>>>>> transform_to_exit_first_loop:
>>>>> ...
>>>>>      TODO: the common case is that latch of the loop is empty and
>>>>> immediately
>>>>>      follows the loop exit.  In this case, it would be better not
>>>>> to copy
>>>>> the
>>>>>      body of the loop, but only move the entry of the loop directly
>>>>> before
>>>>> the
>>>>>      exit check and increase the number of iterations of the loop
>>>>> by one.
>>>>>      This may need some additional preconditioning in case NIT = ~0.
>>>>> ...
>>>>>
>>>>> The current implementation of transform_to_exit_first_loop
>>>>> transforms the
>>>>> loop
>>>>> by moving and duplicating the loop body. This patch transforms the
>>>>> loop
>>>>> into the
>>>>> same normal form, but without the duplication, if that's possible
>>>>> (determined by
>>>>> try_transform_to_exit_first_loop_alt).
>>>>>
>>>>> The actual transformation, done by transform_to_exit_first_loop_alt
>>>>> transforms
>>>>> this loop:
>>>>> ...
>>>>>        bb preheader:
>>>>>        ...
>>>>>        goto <bb header>
>>>>>
>>>>>        <bb header>:
>>>>>        ...
>>>>>        if (ivtmp < n)
>>>>>          goto <bb latch>;
>>>>>        else
>>>>>          goto <bb exit>;
>>>>>
>>>>>        <bb latch>:
>>>>>        ivtmp = ivtmp + inc;
>>>>>        goto <bb header>
>>>>> ...
>>>>>
>>>>> into this one:
>>>>> ...
>>>>>        bb preheader:
>>>>>        ...
>>>>>        goto <bb newheader>
>>>>>
>>>>>        <bb header>:
>>>>>        ...
>>>>>        goto <bb latch>;
>>>>>
>>>>>        <bb newheader>:
>>>>>        if (ivtmp < n + 1)
>>>>>          goto <bb header>;
>>>>>        else
>>>>>          goto <bb exit>;
>>>>>
>>>>>        <bb latch>:
>>>>>        ivtmp = ivtmp + inc;
>>>>>        goto <bb newheader>
>>>>> ...
>>>>>
>>>>
>>>> Updated patch, now using redirect_edge_var_map and flush_pending_stmts.
>>>>
>>>> Bootstrapped and reg-tested on x86_64.
>>>>
>>>> OK for stage1 trunk?
>>>>
>>>
>>> Ping.
>>
>
> Hi Richard,
>
> thanks for the review.
>
>> +static void
>> +replace_imm_uses (tree val, imm_use_iterator *imm_iter)
>> +{
>> +  use_operand_p use_p;
>> +
>> +  FOR_EACH_IMM_USE_ON_STMT (use_p, *imm_iter)
>> +    SET_USE (use_p, val);
>>
>> Use propagate_value.  Why this odd interface passing a imm_iter?!
>>
>
> The replace_imm_uses function is common code factored out of
> replace_uses_in_bb_by and replace_uses_in_bbs_by. I'm not sure what is
> odd about the interface of the replace_imm_uses function, but I've
> removed the function.
>
> I tried using propagate_value, but that one doesn't like virtuals.
>
>> In fact most of the "repair SSA" in transform_to_exit_first_loop_alt
>> looks too ad-hoc to me ... it also looks somewhat familiar to other
>> code ...
>>
>> Unfortunately the comment before the function isn't in SSA form
>> so it's hard to follow the transform.
>>
>
> I've added the ssa bits in the transformation comment before the
> function, and updated variable names and comments in the function.
>
>> I consider the parloops code bitrotten, no repair possible, so
>> I might be convinced to not care about new spaghetti in there...
>>
>> +  /* Fix up loop structure.  TODO: Check whether this is sufficient.  */
>> +  loop->header = new_header;
>> +
>>
>> no, surely not.  Number of iterations (and estimates) are off
>> after the transform
>
> The loop is cancelled at the end of gen_parallel_loop, and the
> estimations are removed.  We only seem to be using a limited set of the
> loop data until the loop is cancelled. Updated comment accordingly.
>
>> and loop->latch might also need updating.
>>
>
> That one actually stays the same. Updated comment accordingly.
>
>> "Safest" is to simply schedule a fixup (but you'll lose any
>> loop annotation in that process).
>>
>
> Since the loop is cancelled, AFAIU we don't need that, but... You're
> referring to the annotations mentioned in
> replace_loop_annotate_in_block? Losing the annotations seems to be a
> generic problem of scheduling such a fixup, not particular to this
> patch. Do you have a concern related to the patch and these annotations?
>
>> +  /* Figure out whether nit + 1 overflows.  */
>> +  if (TREE_CODE (nit) == INTEGER_CST)
>> +    {
>> +      if (!tree_int_cst_equal (nit, TYPE_MAXVAL (nit_type)))
>>
>> in case nit_type is a pointer type TYPE_MAXVAL will be NULL I think.
>>
>
> We've done ivcanon just before this point, so AFAIU we can assume we're
> dealing with an unsigned integer.
>
>> Is the whole exercise for performance?
>
> I'm trying to fix the todo in the code for parloops, which is about
> performance, though I don't expect a large gain.
>
> OTOH, my main focus is a related oacc kernels issue. For an oacc kernels
> region, the entire loop is enclosed in a kernels region. Peeling of the
> last iteration means I have to guard that iteration to run on only one
> thread, which currently isn't done, so in that sense it's a correctness
> issue as well.
>
>> In that case using an
>> entirely new, unsigned IV, that runs from 0 to niter should be easiest
>> and
>
> Introducting such an IV would mean that we don't have to update the IV,
> but still we have to connect the new IV to the uses of the old IV.
>
> The current special handling of the IV variable is actually not that
> elaborate:
> ...
> +      /* Replace ivtmp_a with ivtmp_c in condition 'if (ivtmp_a < n)'.  */
> +      replace_uses_in_bb_by (res_a, res_c, new_header);
> ...
> So I'm not sure it's easier.
>
> just run-time guard that niter == +INF case?
>
> That doesn't work out nicely for the oacc kernels case. I need to know
> compile-time wheter I can parallelize or not.
>
>> For the graphite case, can't you make graphite generate the
>> loops exit-first in the first place?
>>
>
> Perhaps, but that doesn't make a difference for the oacc kernels case.
>
>> The testcases are just correctness ones?  That is, there isn't
>> any testcase that checks whether the new code is exercised?
>> (no extra debugging dumping?)
>>
>
> There are 3 new test patterns, each with a libgomp/testsuite/libgomp.c
> and a gcc/testsuite/gcc.dg variant, so 6 new test-cases in total. The
> gcc/testsuite/gcc.dg variant checks that the new code is exercised. The
> libgomp/testsuite/libgomp.c variant checks that the new code generates
> correct code.
>

Committed to gomp-4_0-branch.

Thanks,
- Tom

[gomp4,committed,PR65443] Add transform_to_exit_first_loop_al

Commit Message

Patch