From patchwork Wed Jul 29 11:38:44 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tom de Vries X-Patchwork-Id: 501669 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 1AEEE1402CD for ; Wed, 29 Jul 2015 21:39:30 +1000 (AEST) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b=sqFlri3K; dkim-atps=neutral DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :message-id:date:from:mime-version:to:cc:subject:references :in-reply-to:content-type; q=dns; s=default; b=ms4WPttl225HvVbCm mfmqqVDqhu1XDG8zKfTSMyxxveO/fgPoW4VIvoL4Kf8+rWlLnF8DdqqZQityA4Kg LnA2ixZ+fQ3maK5/aURHQCIPMlYAeKwo/+5y57vajejJZAoI5DjwVtj57l8olwnw ITqoxwLgqZBgVTldz4/hk8AnHI= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :message-id:date:from:mime-version:to:cc:subject:references :in-reply-to:content-type; s=default; bh=eUNmncEx/J4xrhWcT4aeDTv YKng=; b=sqFlri3Kj7xbuWaoq2TUtCDDq4ohMSvIqR2wjjeY7ojUVbFNthXusy+ Rn2zxgpM8wdNN1KHrfkk/TCmFTQsDqpaLr2AcCTqwaDIMzHJmkyVSfyCNPm4cdhc A7OVPrxJbF9qu9PEgVUNH+sqv4z30mHwRnMXyMX9/7sl/QQZwQis= Received: (qmail 100424 invoked by alias); 29 Jul 2015 11:39:21 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 100411 invoked by uid 89); 29 Jul 2015 11:39:21 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.7 required=5.0 tests=AWL, BAYES_00, RP_MATCHES_RCVD, SPF_PASS autolearn=ham version=3.3.2 X-HELO: fencepost.gnu.org Received: from fencepost.gnu.org (HELO fencepost.gnu.org) (208.118.235.10) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-SHA encrypted) ESMTPS; Wed, 29 Jul 2015 11:39:18 +0000 Received: from eggs.gnu.org ([2001:4830:134:3::10]:38715) by fencepost.gnu.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:256) (Exim 4.82) (envelope-from ) id 1ZKPhc-0003VE-2V for gcc-patches@gnu.org; Wed, 29 Jul 2015 07:39:16 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ZKPhV-0004q4-VW for gcc-patches@gnu.org; Wed, 29 Jul 2015 07:39:15 -0400 Received: from relay1.mentorg.com ([192.94.38.131]:48532) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZKPhV-0004pd-LA for gcc-patches@gnu.org; Wed, 29 Jul 2015 07:39:09 -0400 Received: from nat-ies.mentorg.com ([192.94.31.2] helo=SVR-IES-FEM-01.mgc.mentorg.com) by relay1.mentorg.com with esmtp id 1ZKPhT-0006ye-Na from Tom_deVries@mentor.com ; Wed, 29 Jul 2015 04:39:08 -0700 Received: from [127.0.0.1] (137.202.0.76) by SVR-IES-FEM-01.mgc.mentorg.com (137.202.0.104) with Microsoft SMTP Server id 14.3.224.2; Wed, 29 Jul 2015 12:39:05 +0100 Message-ID: <55B8BB44.2090703@mentor.com> Date: Wed, 29 Jul 2015 13:38:44 +0200 From: Tom de Vries User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: Richard Biener CC: "gcc-patches@gnu.org" Subject: Re: [PATCH, PR66846] Mark inner loop for fixup in parloops References: <55A6B656.8080701@mentor.com> <55A77BEE.1050906@mentor.com> <55ACF1DA.7000102@mentor.com> <55B20F2D.3060403@mentor.com> In-Reply-To: X-detected-operating-system: by eggs.gnu.org: Windows NT kernel [generic] [fuzzy] X-Received-From: 192.94.38.131 On 28/07/15 12:11, Richard Biener wrote: > On Fri, Jul 24, 2015 at 12:10 PM, Tom de Vries wrote: >> On 20/07/15 15:04, Tom de Vries wrote: >>> >>> On 16/07/15 12:15, Richard Biener wrote: >>>> >>>> On Thu, Jul 16, 2015 at 11:39 AM, Tom de Vries >>>> wrote: >>>>> >>>>> On 16/07/15 10:44, Richard Biener wrote: >>>>>> >>>>>> >>>>>> On Wed, Jul 15, 2015 at 9:36 PM, Tom de Vries >>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I. >>>>>>> >>>>>>> In openmp expansion of loops, we do some effort to try to create >>>>>>> matching >>>>>>> loops in the loop state of the child function, f.i.in >>>>>>> expand_omp_for_generic: >>>>>>> ... >>>>>>> struct loop *outer_loop; >>>>>>> if (seq_loop) >>>>>>> outer_loop = l0_bb->loop_father; >>>>>>> else >>>>>>> { >>>>>>> outer_loop = alloc_loop (); >>>>>>> outer_loop->header = l0_bb; >>>>>>> outer_loop->latch = l2_bb; >>>>>>> add_loop (outer_loop, l0_bb->loop_father); >>>>>>> } >>>>>>> >>>>>>> if (!gimple_omp_for_combined_p (fd->for_stmt)) >>>>>>> { >>>>>>> struct loop *loop = alloc_loop (); >>>>>>> loop->header = l1_bb; >>>>>>> /* The loop may have multiple latches. */ >>>>>>> add_loop (loop, outer_loop); >>>>>>> } >>>>>>> ... >>>>>>> >>>>>>> And if that doesn't work out, we try to mark the loop state for >>>>>>> fixup, in >>>>>>> expand_omp_taskreg and expand_omp_target: >>>>>>> ... >>>>>>> /* When the OMP expansion process cannot guarantee an >>>>>>> up-to-date >>>>>>> loop tree arrange for the child function to fixup >>>>>>> loops. */ >>>>>>> if (loops_state_satisfies_p (LOOPS_NEED_FIXUP)) >>>>>>> child_cfun->x_current_loops->state |= LOOPS_NEED_FIXUP; >>>>>>> ... >>>>>>> >>>>>>> and expand_omp_for: >>>>>>> ... >>>>>>> else >>>>>>> /* If there isn't a continue then this is a degerate case where >>>>>>> the introduction of abnormal edges during lowering will >>>>>>> prevent >>>>>>> original loops from being detected. Fix that up. */ >>>>>>> loops_state_set (LOOPS_NEED_FIXUP); >>>>>>> ... >>>>>>> >>>>>>> However, loops are fixed up anyway, because the first pass we execute >>>>>>> with >>>>>>> the new child function is pass_fixup_cfg. >>>>>>> >>>>>>> The new child function contains a function call to >>>>>>> __builtin_omp_get_num_threads, which is marked with ECF_CONST, so >>>>>>> execute_fixup_cfg marks the function for TODO_cleanup_cfg, and >>>>>>> subsequently >>>>>>> the loops with LOOPS_NEED_FIXUP. >>>>>>> >>>>>>> >>>>>>> II. >>>>>>> >>>>>>> This patch adds a verification that at the end of the omp-expand >>>>>>> processing >>>>>>> of the child function, either the loop structure is ok, or marked for >>>>>>> fixup. >>>>>>> >>>>>>> This verfication triggered a failure in parloops. When an outer >>>>>>> loop is >>>>>>> being parallelized, both the outer and inner loop are cancelled. Then >>>>>>> during >>>>>>> omp-expansion, we create a loop in the loop state for the outer >>>>>>> loop (the >>>>>>> one that is transformed), but not for the inner, which causes the >>>>>>> verification failure: >>>>>>> ... >>>>>>> outer-1.c:11:3: error: loop with header 5 not in loop tree >>>>>>> ... >>>>>>> >>>>>>> [ I ran into this verification failure with an openacc kernels >>>>>>> testcase >>>>>>> on >>>>>>> the gomp-4_0-branch, where parloops is called additionally from a >>>>>>> different >>>>>>> location, and pass_fixup_cfg is not the first pass that the child >>>>>>> function >>>>>>> is processed by. ] >>>>>>> >>>>>>> The patch contains a bit that makes sure that the loop state of the >>>>>>> child >>>>>>> function is marked for fixup in parloops. The bit is non-trival >>>>>>> since it >>>>>>> create a loop state and sets the fixup flag on the loop state, but >>>>>>> postpones >>>>>>> the init_loops_structure call till move_sese_region_to_fn, where it >>>>>>> can >>>>>>> succeed. >>>>>>> >>>>>>> >>> >>> >>> >>>> Can we fix the root-cause of the issue instead? That is, build a >>>> valid loop >>>> structure in the first place? >>>> >>> >>> This patch manages to keep the loop structure, that is, to not cancel >>> the loop tree in parloops, and guarantee a valid loop structure at the >>> end of parloops. >>> >>> The transformation to insert the omp_for invalidates the loop state >>> properties LOOPS_HAVE_RECORDED_EXITS and LOOPS_HAVE_SIMPLE_LATCHES, so >>> we drop those in parloops. >>> >>> In expand_omp_for_static_nochunk, we detect the existing loop struct of >>> the omp_for, and keep it. >>> >>> Then by calling pass_tree_loop_init after pass_expand_omp_ssa, we get >>> the loop state properties LOOPS_HAVE_RECORDED_EXITS and >>> LOOPS_HAVE_SIMPLE_LATCHES back. >>> >> >> This updated patch tries a more minimal approach. >> >> Rather than dropping property LOOPS_HAVE_RECORDED_EXITS, we record the new >> exit instead. >> >> And rather than adding pass_tree_loop_init after pass_expand_omp_ssa, we >> just set LOOPS_HAVE_SIMPLE_LATCHES back at the end of pass_expand_omp_ssa. >> >> Bootstrapped and reg-tested on x86_64. >> >> OK for trunk? > > I wonder about the need to clear LOOPS_HAVE_SIMPLE_LATCHES (and esp. turning > that back on in execute_expand_omp). The parloops code lacks comments and > the /* Prepare cfg. */ part looks twisty to me - but I don't see why > it should be difficult > to preserve simple latches as well - is this just because we insert > the GIMPLE_OMP_CONTINUE > in it? > It's not difficult to do. It's just that the omp-for that is generated via the normal route (omp-annotated source code) doesn't have this simple latch, and parloops just mimics that cfg shape. Attached updated patch preserves simple latches in parloops. We need some minor adjustments in omp-expand to handle that. > If execute_expand_omp is not performed in a loop pipeline where things > expect simple latches > (well, obviously it isn't) then why set LOOPS_HAVE_SIMPLE_LATCHES here > at all? If somebody > needs it it will just request simple latches. > I've looked into this. The (first?) pass that gives me trouble is pass_iv_optimize. It doesn't request simple latches, but it does need them. > So if the GIMPLE_OMP_CONTINUE part is correct then I'd prefer to keep > LOOPS_HAVE_SIMPLE_LATCHES > unset in execute_expand_omp. > > Ok with that change. > OK with this approach of keeping LOOPS_HAVE_SIMPLE_LATCHES in parloops (if bootstrap and reg-test succeeds)? Thanks, - Tom Don't cancel loop tree in parloops 2015-07-28 Tom de Vries PR tree-optimization/66846 * omp-low.c (expand_omp_taskreg) [ENABLE_CHECKING]: Call verify_loop_structure for child_cfun if !LOOPS_NEED_FIXUP. (expand_omp_target) [ENABLE_CHECKING]: Same. (execute_expand_omp) [ENABLE_CHECKING]: Call verify_loop_structure for cfun if !LOOPS_NEED_FIXUP. (expand_omp_for_static_nochunk): Handle simple latch bb. Handle case that omp_for already has its own loop struct. * tree-parloops.c (create_phi_for_local_result) (create_call_for_reduction): Handle simple latch bb. (create_parallel_loop): Add simple latch bb to preserve LOOPS_HAVE_SIMPLE_LATCHES. Record new exit. Handle simple latch bb. (gen_parallel_loop): Remove call to cancel_loop_tree. (parallelize_loops): Skip loops that are inner loops of parallelized loops. (pass_parallelize_loops::execute) [ENABLE_CHECKING]: Call verify_loop_structure. --- gcc/omp-low.c | 32 ++++++++++++++++++++++++++++++-- gcc/tree-parloops.c | 49 ++++++++++++++++++++++++++++++++++++------------- 2 files changed, 66 insertions(+), 15 deletions(-) diff --git a/gcc/omp-low.c b/gcc/omp-low.c index 3135606..0f5c0f1 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -5604,6 +5604,10 @@ expand_omp_taskreg (struct omp_region *region) } if (gimple_in_ssa_p (cfun)) update_ssa (TODO_update_ssa); +#ifdef ENABLE_CHECKING + if (!loops_state_satisfies_p (LOOPS_NEED_FIXUP)) + verify_loop_structure (); +#endif pop_cfun (); } @@ -6535,7 +6539,8 @@ expand_omp_for_static_nochunk (struct omp_region *region, body_bb = single_succ (seq_start_bb); if (!broken_loop) { - gcc_assert (BRANCH_EDGE (cont_bb)->dest == body_bb); + gcc_assert (BRANCH_EDGE (cont_bb)->dest == body_bb + || single_succ (BRANCH_EDGE (cont_bb)->dest) == body_bb); gcc_assert (EDGE_COUNT (cont_bb->succs) == 2); } exit_bb = region->exit; @@ -6818,6 +6823,11 @@ expand_omp_for_static_nochunk (struct omp_region *region, if (!broken_loop) { ep = find_edge (cont_bb, body_bb); + if (ep == NULL) + { + ep = BRANCH_EDGE (cont_bb); + gcc_assert (single_succ (ep->dest) == body_bb); + } if (gimple_omp_for_combined_p (fd->for_stmt)) { remove_edge (ep); @@ -6843,9 +6853,19 @@ expand_omp_for_static_nochunk (struct omp_region *region, set_immediate_dominator (CDI_DOMINATORS, fin_bb, recompute_dominator (CDI_DOMINATORS, fin_bb)); + struct loop *loop = body_bb->loop_father; + if (loop != entry_bb->loop_father) + { + gcc_assert (loop->header == body_bb); + gcc_assert (broken_loop + || loop->latch == region->cont + || single_pred (loop->latch) == region->cont); + return; + } + if (!broken_loop && !gimple_omp_for_combined_p (fd->for_stmt)) { - struct loop *loop = alloc_loop (); + loop = alloc_loop (); loop->header = body_bb; if (collapse_bb == NULL) loop->latch = cont_bb; @@ -8984,6 +9004,10 @@ expand_omp_target (struct omp_region *region) if (changed) cleanup_tree_cfg (); } +#ifdef ENABLE_CHECKING + if (!loops_state_satisfies_p (LOOPS_NEED_FIXUP)) + verify_loop_structure (); +#endif pop_cfun (); } @@ -9492,6 +9516,10 @@ execute_expand_omp (void) expand_omp (root_omp_region); +#ifdef ENABLE_CHECKING + if (!loops_state_satisfies_p (LOOPS_NEED_FIXUP)) + verify_loop_structure (); +#endif cleanup_tree_cfg (); free_omp_regions (); diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c index b06265c..d017479 100644 --- a/gcc/tree-parloops.c +++ b/gcc/tree-parloops.c @@ -1032,21 +1032,22 @@ create_phi_for_local_result (reduction_info **slot, struct loop *loop) struct reduction_info *const reduc = *slot; edge e; gphi *new_phi; - basic_block store_bb; + basic_block store_bb, continue_bb; tree local_res; source_location locus; /* STORE_BB is the block where the phi should be stored. It is the destination of the loop exit. (Find the fallthru edge from GIMPLE_OMP_CONTINUE). */ - store_bb = FALLTHRU_EDGE (loop->latch)->dest; + continue_bb = single_pred (loop->latch); + store_bb = FALLTHRU_EDGE (continue_bb)->dest; /* STORE_BB has two predecessors. One coming from the loop (the reduction's result is computed at the loop), and another coming from a block preceding the loop, when no iterations are executed (the initial value should be taken). */ - if (EDGE_PRED (store_bb, 0) == FALLTHRU_EDGE (loop->latch)) + if (EDGE_PRED (store_bb, 0) == FALLTHRU_EDGE (continue_bb)) e = EDGE_PRED (store_bb, 1); else e = EDGE_PRED (store_bb, 0); @@ -1055,7 +1056,7 @@ create_phi_for_local_result (reduction_info **slot, struct loop *loop) locus = gimple_location (reduc->reduc_stmt); new_phi = create_phi_node (local_res, store_bb); add_phi_arg (new_phi, reduc->init, e, locus); - add_phi_arg (new_phi, lhs, FALLTHRU_EDGE (loop->latch), locus); + add_phi_arg (new_phi, lhs, FALLTHRU_EDGE (continue_bb), locus); reduc->new_phi = new_phi; return 1; @@ -1134,7 +1135,8 @@ create_call_for_reduction (struct loop *loop, { reduction_list->traverse (loop); /* Find the fallthru edge from GIMPLE_OMP_CONTINUE. */ - ld_st_data->load_bb = FALLTHRU_EDGE (loop->latch)->dest; + basic_block continue_bb = single_pred (loop->latch); + ld_st_data->load_bb = FALLTHRU_EDGE (continue_bb)->dest; reduction_list ->traverse (ld_st_data); } @@ -1981,7 +1983,7 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data, tree new_data, unsigned n_threads, location_t loc) { gimple_stmt_iterator gsi; - basic_block bb, paral_bb, for_bb, ex_bb; + basic_block bb, paral_bb, for_bb, ex_bb, continue_bb; tree t, param; gomp_parallel *omp_par_stmt; gimple omp_return_stmt1, omp_return_stmt2; @@ -2052,8 +2054,12 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data, gcc_assert (exit == single_dom_exit (loop)); guard = make_edge (for_bb, ex_bb, 0); - single_succ_edge (loop->latch)->flags = 0; - end = make_edge (loop->latch, ex_bb, EDGE_FALLTHRU); + /* Split the latch edge, so LOOPS_HAVE_SIMPLE_LATCHES is still valid. */ + loop->latch = split_edge (single_succ_edge (loop->latch)); + single_pred_edge (loop->latch)->flags = 0; + end = make_edge (single_pred (loop->latch), ex_bb, EDGE_FALLTHRU); + rescan_loop_exit (end, true, false); + for (gphi_iterator gpi = gsi_start_phis (ex_bb); !gsi_end_p (gpi); gsi_next (&gpi)) { @@ -2102,7 +2108,8 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data, SSA_NAME_DEF_STMT (initvar) = for_stmt; /* Emit GIMPLE_OMP_CONTINUE. */ - gsi = gsi_last_bb (loop->latch); + continue_bb = single_pred (loop->latch); + gsi = gsi_last_bb (continue_bb); omp_cont_stmt = gimple_build_omp_continue (cvar_next, cvar); gimple_set_location (omp_cont_stmt, loc); gsi_insert_after (&gsi, omp_cont_stmt, GSI_NEW_STMT); @@ -2298,10 +2305,6 @@ gen_parallel_loop (struct loop *loop, scev_reset (); - /* Cancel the loop (it is simpler to do it here rather than to teach the - expander to do it). */ - cancel_loop_tree (loop); - /* Free loop bound estimations that could contain references to removed statements. */ FOR_EACH_LOOP (loop, 0) @@ -2587,6 +2590,7 @@ parallelize_loops (void) unsigned n_threads = flag_tree_parallelize_loops; bool changed = false; struct loop *loop; + struct loop *skip_loop = NULL; struct tree_niter_desc niter_desc; struct obstack parloop_obstack; HOST_WIDE_INT estimated; @@ -2604,6 +2608,19 @@ parallelize_loops (void) FOR_EACH_LOOP (loop, 0) { + if (loop == skip_loop) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, + "Skipping loop %d as inner loop of parallelized loop\n", + loop->num); + + skip_loop = loop->inner; + continue; + } + else + skip_loop = NULL; + reduction_list.empty (); if (dump_file && (dump_flags & TDF_DETAILS)) { @@ -2663,6 +2680,7 @@ parallelize_loops (void) continue; changed = true; + skip_loop = loop->inner; if (dump_file && (dump_flags & TDF_DETAILS)) { if (loop->inner) @@ -2729,6 +2747,11 @@ pass_parallelize_loops::execute (function *fun) if (parallelize_loops ()) { fun->curr_properties &= ~(PROP_gimple_eomp); + +#ifdef ENABLE_CHECKING + verify_loop_structure (); +#endif + return TODO_update_ssa; } -- 1.9.1