Message ID | 55B20F2D.3060403@mentor.com |
---|---|
State | New |
Headers | show |
On Fri, Jul 24, 2015 at 12:10 PM, Tom de Vries <Tom_deVries@mentor.com> wrote: > On 20/07/15 15:04, Tom de Vries wrote: >> >> On 16/07/15 12:15, Richard Biener wrote: >>> >>> On Thu, Jul 16, 2015 at 11:39 AM, Tom de Vries >>> <Tom_deVries@mentor.com> wrote: >>>> >>>> On 16/07/15 10:44, Richard Biener wrote: >>>>> >>>>> >>>>> On Wed, Jul 15, 2015 at 9:36 PM, Tom de Vries <Tom_deVries@mentor.com> >>>>> wrote: >>>>>> >>>>>> >>>>>> Hi, >>>>>> >>>>>> I. >>>>>> >>>>>> In openmp expansion of loops, we do some effort to try to create >>>>>> matching >>>>>> loops in the loop state of the child function, f.i.in >>>>>> expand_omp_for_generic: >>>>>> ... >>>>>> struct loop *outer_loop; >>>>>> if (seq_loop) >>>>>> outer_loop = l0_bb->loop_father; >>>>>> else >>>>>> { >>>>>> outer_loop = alloc_loop (); >>>>>> outer_loop->header = l0_bb; >>>>>> outer_loop->latch = l2_bb; >>>>>> add_loop (outer_loop, l0_bb->loop_father); >>>>>> } >>>>>> >>>>>> if (!gimple_omp_for_combined_p (fd->for_stmt)) >>>>>> { >>>>>> struct loop *loop = alloc_loop (); >>>>>> loop->header = l1_bb; >>>>>> /* The loop may have multiple latches. */ >>>>>> add_loop (loop, outer_loop); >>>>>> } >>>>>> ... >>>>>> >>>>>> And if that doesn't work out, we try to mark the loop state for >>>>>> fixup, in >>>>>> expand_omp_taskreg and expand_omp_target: >>>>>> ... >>>>>> /* When the OMP expansion process cannot guarantee an >>>>>> up-to-date >>>>>> loop tree arrange for the child function to fixup >>>>>> loops. */ >>>>>> if (loops_state_satisfies_p (LOOPS_NEED_FIXUP)) >>>>>> child_cfun->x_current_loops->state |= LOOPS_NEED_FIXUP; >>>>>> ... >>>>>> >>>>>> and expand_omp_for: >>>>>> ... >>>>>> else >>>>>> /* If there isn't a continue then this is a degerate case where >>>>>> the introduction of abnormal edges during lowering will >>>>>> prevent >>>>>> original loops from being detected. Fix that up. */ >>>>>> loops_state_set (LOOPS_NEED_FIXUP); >>>>>> ... >>>>>> >>>>>> However, loops are fixed up anyway, because the first pass we execute >>>>>> with >>>>>> the new child function is pass_fixup_cfg. >>>>>> >>>>>> The new child function contains a function call to >>>>>> __builtin_omp_get_num_threads, which is marked with ECF_CONST, so >>>>>> execute_fixup_cfg marks the function for TODO_cleanup_cfg, and >>>>>> subsequently >>>>>> the loops with LOOPS_NEED_FIXUP. >>>>>> >>>>>> >>>>>> II. >>>>>> >>>>>> This patch adds a verification that at the end of the omp-expand >>>>>> processing >>>>>> of the child function, either the loop structure is ok, or marked for >>>>>> fixup. >>>>>> >>>>>> This verfication triggered a failure in parloops. When an outer >>>>>> loop is >>>>>> being parallelized, both the outer and inner loop are cancelled. Then >>>>>> during >>>>>> omp-expansion, we create a loop in the loop state for the outer >>>>>> loop (the >>>>>> one that is transformed), but not for the inner, which causes the >>>>>> verification failure: >>>>>> ... >>>>>> outer-1.c:11:3: error: loop with header 5 not in loop tree >>>>>> ... >>>>>> >>>>>> [ I ran into this verification failure with an openacc kernels >>>>>> testcase >>>>>> on >>>>>> the gomp-4_0-branch, where parloops is called additionally from a >>>>>> different >>>>>> location, and pass_fixup_cfg is not the first pass that the child >>>>>> function >>>>>> is processed by. ] >>>>>> >>>>>> The patch contains a bit that makes sure that the loop state of the >>>>>> child >>>>>> function is marked for fixup in parloops. The bit is non-trival >>>>>> since it >>>>>> create a loop state and sets the fixup flag on the loop state, but >>>>>> postpones >>>>>> the init_loops_structure call till move_sese_region_to_fn, where it >>>>>> can >>>>>> succeed. >>>>>> >>>>>> >> >> <SNIP> >> >>> Can we fix the root-cause of the issue instead? That is, build a >>> valid loop >>> structure in the first place? >>> >> >> This patch manages to keep the loop structure, that is, to not cancel >> the loop tree in parloops, and guarantee a valid loop structure at the >> end of parloops. >> >> The transformation to insert the omp_for invalidates the loop state >> properties LOOPS_HAVE_RECORDED_EXITS and LOOPS_HAVE_SIMPLE_LATCHES, so >> we drop those in parloops. >> >> In expand_omp_for_static_nochunk, we detect the existing loop struct of >> the omp_for, and keep it. >> >> Then by calling pass_tree_loop_init after pass_expand_omp_ssa, we get >> the loop state properties LOOPS_HAVE_RECORDED_EXITS and >> LOOPS_HAVE_SIMPLE_LATCHES back. >> > > This updated patch tries a more minimal approach. > > Rather than dropping property LOOPS_HAVE_RECORDED_EXITS, we record the new > exit instead. > > And rather than adding pass_tree_loop_init after pass_expand_omp_ssa, we > just set LOOPS_HAVE_SIMPLE_LATCHES back at the end of pass_expand_omp_ssa. > > Bootstrapped and reg-tested on x86_64. > > OK for trunk? I wonder about the need to clear LOOPS_HAVE_SIMPLE_LATCHES (and esp. turning that back on in execute_expand_omp). The parloops code lacks comments and the /* Prepare cfg. */ part looks twisty to me - but I don't see why it should be difficult to preserve simple latches as well - is this just because we insert the GIMPLE_OMP_CONTINUE in it? If execute_expand_omp is not performed in a loop pipeline where things expect simple latches (well, obviously it isn't) then why set LOOPS_HAVE_SIMPLE_LATCHES here at all? If somebody needs it it will just request simple latches. So if the GIMPLE_OMP_CONTINUE part is correct then I'd prefer to keep LOOPS_HAVE_SIMPLE_LATCHES unset in execute_expand_omp. Ok with that change. Thanks, Richard. > Thanks, > - Tom >
Don't cancel loop tree in parloops 2015-07-20 Tom de Vries <tom@codesourcery.com> PR tree-optimization/66846 * omp-low.c (expand_omp_taskreg) [ENABLE_CHECKING]: Call verify_loop_structure for child_cfun if !LOOPS_NEED_FIXUP. (expand_omp_target) [ENABLE_CHECKING]: Same. (execute_expand_omp): Reinstate LOOPS_HAVE_SIMPLE_LATCHES if in ssa. [ENABLE_CHECKING]: Call verify_loop_structure for cfun if !LOOPS_NEED_FIXUP. (expand_omp_for_static_nochunk): Handle case that omp_for already has its own loop struct. * tree-parloops.c (create_parallel_loop): Add comment about breaking LOOPS_HAVE_SIMPLE_LATCHES. Record new exit. (gen_parallel_loop): Remove call to cancel_loop_tree. (parallelize_loops): Skip loops that are inner loops of parallelized loops. (pass_parallelize_loops::execute): Clear LOOPS_HAVE_SIMPLE_LATCHES on loop state. [ENABLE_CHECKING]: Call verify_loop_structure. --- gcc/omp-low.c | 27 ++++++++++++++++++++++++++- gcc/tree-parloops.c | 32 ++++++++++++++++++++++++++++---- 2 files changed, 54 insertions(+), 5 deletions(-) diff --git a/gcc/omp-low.c b/gcc/omp-low.c index 3135606..ce83abf 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -5604,6 +5604,10 @@ expand_omp_taskreg (struct omp_region *region) } if (gimple_in_ssa_p (cfun)) update_ssa (TODO_update_ssa); +#ifdef ENABLE_CHECKING + if (!loops_state_satisfies_p (LOOPS_NEED_FIXUP)) + verify_loop_structure (); +#endif pop_cfun (); } @@ -6843,9 +6847,17 @@ expand_omp_for_static_nochunk (struct omp_region *region, set_immediate_dominator (CDI_DOMINATORS, fin_bb, recompute_dominator (CDI_DOMINATORS, fin_bb)); + struct loop *loop = body_bb->loop_father; + if (loop != entry_bb->loop_father) + { + gcc_assert (loop->header == body_bb); + gcc_assert (broken_loop || loop->latch == region->cont); + return; + } + if (!broken_loop && !gimple_omp_for_combined_p (fd->for_stmt)) { - struct loop *loop = alloc_loop (); + loop = alloc_loop (); loop->header = body_bb; if (collapse_bb == NULL) loop->latch = cont_bb; @@ -8984,6 +8996,10 @@ expand_omp_target (struct omp_region *region) if (changed) cleanup_tree_cfg (); } +#ifdef ENABLE_CHECKING + if (!loops_state_satisfies_p (LOOPS_NEED_FIXUP)) + verify_loop_structure (); +#endif pop_cfun (); } @@ -9492,6 +9508,15 @@ execute_expand_omp (void) expand_omp (root_omp_region); + /* We dropped this property in parloops because of the omp_for. Now that the + omp_for has been expanded, reinstate it. */ + if (gimple_in_ssa_p (cfun)) + loops_state_set (LOOPS_HAVE_SIMPLE_LATCHES); + +#ifdef ENABLE_CHECKING + if (!loops_state_satisfies_p (LOOPS_NEED_FIXUP)) + verify_loop_structure (); +#endif cleanup_tree_cfg (); free_omp_regions (); diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c index ec41834..1899052 100644 --- a/gcc/tree-parloops.c +++ b/gcc/tree-parloops.c @@ -2045,7 +2045,14 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data, guard = make_edge (for_bb, ex_bb, 0); single_succ_edge (loop->latch)->flags = 0; + + /* After creating this edge, the latch has two successors, so + LOOPS_HAVE_SIMPLE_LATCHES is no longer valid. We'll update the loop state + as such at the end of the pass, since it's not needed earlier, and doing it + earlier will invalidate info for loops we still need to process. */ end = make_edge (loop->latch, ex_bb, EDGE_FALLTHRU); + rescan_loop_exit (end, true, false); + for (gphi_iterator gpi = gsi_start_phis (ex_bb); !gsi_end_p (gpi); gsi_next (&gpi)) { @@ -2282,10 +2289,6 @@ gen_parallel_loop (struct loop *loop, scev_reset (); - /* Cancel the loop (it is simpler to do it here rather than to teach the - expander to do it). */ - cancel_loop_tree (loop); - /* Free loop bound estimations that could contain references to removed statements. */ FOR_EACH_LOOP (loop, 0) @@ -2521,6 +2524,7 @@ parallelize_loops (void) unsigned n_threads = flag_tree_parallelize_loops; bool changed = false; struct loop *loop; + struct loop *skip_loop = NULL; struct tree_niter_desc niter_desc; struct obstack parloop_obstack; HOST_WIDE_INT estimated; @@ -2538,6 +2542,19 @@ parallelize_loops (void) FOR_EACH_LOOP (loop, 0) { + if (loop == skip_loop) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, + "Skipping loop %d as inner loop of parallelized loop\n", + loop->num); + + skip_loop = loop->inner; + continue; + } + else + skip_loop = NULL; + reduction_list.empty (); if (dump_file && (dump_flags & TDF_DETAILS)) { @@ -2597,6 +2614,7 @@ parallelize_loops (void) continue; changed = true; + skip_loop = loop->inner; if (dump_file && (dump_flags & TDF_DETAILS)) { if (loop->inner) @@ -2663,6 +2681,12 @@ pass_parallelize_loops::execute (function *fun) if (parallelize_loops ()) { fun->curr_properties &= ~(PROP_gimple_eomp); + + loops_state_clear (LOOPS_HAVE_SIMPLE_LATCHES); +#ifdef ENABLE_CHECKING + verify_loop_structure (); +#endif + return TODO_update_ssa; } -- 1.9.1