From patchwork Fri Jul 24 10:10:53 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tom de Vries X-Patchwork-Id: 499652 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id BE3C9140B04 for ; Fri, 24 Jul 2015 20:11:29 +1000 (AEST) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b=KiMI9rh+; dkim-atps=neutral DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :message-id:date:from:mime-version:to:cc:subject:references :in-reply-to:content-type; q=dns; s=default; b=g4FfF8wATEtsMmygN iKztwIikqXwHwmllCQcBTjNsg3KlL+ohCULFX4X9DCRayuL2CiwYSRhXRSs2txQj BZhUbt3kDdsAJ7y8AcNgFgKSxKpIuopbOo53gzfuYYeWV/FW5kZAjHk3vkV+Xd8b ithvFo35PqYCg5ye4zxmupivUo= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :message-id:date:from:mime-version:to:cc:subject:references :in-reply-to:content-type; s=default; bh=8KJMVdTM1o+rtVHl5bWJf2W Ulkg=; b=KiMI9rh+R5Z7XUROxzibBSpmP4GO39Mpf5qIZYNMsQhoEFbo8d9H70N ShvVY+j9S62aLfxOVMh4kbHq87oL4N4+zgw8BVr9g/D61TvT1NCJhaMfts+TgvqV DaloGhsU38NlEQfXP+qoZ5qZNTstSpuaET476VL1O2yIRubk17mI= Received: (qmail 75260 invoked by alias); 24 Jul 2015 10:11:21 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 75244 invoked by uid 89); 24 Jul 2015 10:11:20 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.6 required=5.0 tests=AWL, BAYES_00, RP_MATCHES_RCVD, SPF_PASS autolearn=ham version=3.3.2 X-HELO: fencepost.gnu.org Received: from fencepost.gnu.org (HELO fencepost.gnu.org) (208.118.235.10) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-SHA encrypted) ESMTPS; Fri, 24 Jul 2015 10:11:18 +0000 Received: from eggs.gnu.org ([2001:4830:134:3::10]:36097) by fencepost.gnu.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:256) (Exim 4.82) (envelope-from ) id 1ZIZwi-0001s6-Dw for gcc-patches@gnu.org; Fri, 24 Jul 2015 06:11:16 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ZIZwe-0006F4-D3 for gcc-patches@gnu.org; Fri, 24 Jul 2015 06:11:15 -0400 Received: from relay1.mentorg.com ([192.94.38.131]:35783) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZIZwe-0006D6-4F for gcc-patches@gnu.org; Fri, 24 Jul 2015 06:11:12 -0400 Received: from nat-ies.mentorg.com ([192.94.31.2] helo=SVR-IES-FEM-01.mgc.mentorg.com) by relay1.mentorg.com with esmtp id 1ZIZwb-0006LL-DT from Tom_deVries@mentor.com ; Fri, 24 Jul 2015 03:11:09 -0700 Received: from [127.0.0.1] (137.202.0.76) by SVR-IES-FEM-01.mgc.mentorg.com (137.202.0.104) with Microsoft SMTP Server id 14.3.224.2; Fri, 24 Jul 2015 11:11:07 +0100 Message-ID: <55B20F2D.3060403@mentor.com> Date: Fri, 24 Jul 2015 12:10:53 +0200 From: Tom de Vries User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: Richard Biener CC: "gcc-patches@gnu.org" Subject: Re: [PATCH, PR66846] Mark inner loop for fixup in parloops References: <55A6B656.8080701@mentor.com> <55A77BEE.1050906@mentor.com> <55ACF1DA.7000102@mentor.com> In-Reply-To: <55ACF1DA.7000102@mentor.com> X-detected-operating-system: by eggs.gnu.org: Windows NT kernel [generic] [fuzzy] X-Received-From: 192.94.38.131 On 20/07/15 15:04, Tom de Vries wrote: > On 16/07/15 12:15, Richard Biener wrote: >> On Thu, Jul 16, 2015 at 11:39 AM, Tom de Vries >> wrote: >>> On 16/07/15 10:44, Richard Biener wrote: >>>> >>>> On Wed, Jul 15, 2015 at 9:36 PM, Tom de Vries >>>> wrote: >>>>> >>>>> Hi, >>>>> >>>>> I. >>>>> >>>>> In openmp expansion of loops, we do some effort to try to create >>>>> matching >>>>> loops in the loop state of the child function, f.i.in >>>>> expand_omp_for_generic: >>>>> ... >>>>> struct loop *outer_loop; >>>>> if (seq_loop) >>>>> outer_loop = l0_bb->loop_father; >>>>> else >>>>> { >>>>> outer_loop = alloc_loop (); >>>>> outer_loop->header = l0_bb; >>>>> outer_loop->latch = l2_bb; >>>>> add_loop (outer_loop, l0_bb->loop_father); >>>>> } >>>>> >>>>> if (!gimple_omp_for_combined_p (fd->for_stmt)) >>>>> { >>>>> struct loop *loop = alloc_loop (); >>>>> loop->header = l1_bb; >>>>> /* The loop may have multiple latches. */ >>>>> add_loop (loop, outer_loop); >>>>> } >>>>> ... >>>>> >>>>> And if that doesn't work out, we try to mark the loop state for >>>>> fixup, in >>>>> expand_omp_taskreg and expand_omp_target: >>>>> ... >>>>> /* When the OMP expansion process cannot guarantee an >>>>> up-to-date >>>>> loop tree arrange for the child function to fixup >>>>> loops. */ >>>>> if (loops_state_satisfies_p (LOOPS_NEED_FIXUP)) >>>>> child_cfun->x_current_loops->state |= LOOPS_NEED_FIXUP; >>>>> ... >>>>> >>>>> and expand_omp_for: >>>>> ... >>>>> else >>>>> /* If there isn't a continue then this is a degerate case where >>>>> the introduction of abnormal edges during lowering will >>>>> prevent >>>>> original loops from being detected. Fix that up. */ >>>>> loops_state_set (LOOPS_NEED_FIXUP); >>>>> ... >>>>> >>>>> However, loops are fixed up anyway, because the first pass we execute >>>>> with >>>>> the new child function is pass_fixup_cfg. >>>>> >>>>> The new child function contains a function call to >>>>> __builtin_omp_get_num_threads, which is marked with ECF_CONST, so >>>>> execute_fixup_cfg marks the function for TODO_cleanup_cfg, and >>>>> subsequently >>>>> the loops with LOOPS_NEED_FIXUP. >>>>> >>>>> >>>>> II. >>>>> >>>>> This patch adds a verification that at the end of the omp-expand >>>>> processing >>>>> of the child function, either the loop structure is ok, or marked for >>>>> fixup. >>>>> >>>>> This verfication triggered a failure in parloops. When an outer >>>>> loop is >>>>> being parallelized, both the outer and inner loop are cancelled. Then >>>>> during >>>>> omp-expansion, we create a loop in the loop state for the outer >>>>> loop (the >>>>> one that is transformed), but not for the inner, which causes the >>>>> verification failure: >>>>> ... >>>>> outer-1.c:11:3: error: loop with header 5 not in loop tree >>>>> ... >>>>> >>>>> [ I ran into this verification failure with an openacc kernels >>>>> testcase >>>>> on >>>>> the gomp-4_0-branch, where parloops is called additionally from a >>>>> different >>>>> location, and pass_fixup_cfg is not the first pass that the child >>>>> function >>>>> is processed by. ] >>>>> >>>>> The patch contains a bit that makes sure that the loop state of the >>>>> child >>>>> function is marked for fixup in parloops. The bit is non-trival >>>>> since it >>>>> create a loop state and sets the fixup flag on the loop state, but >>>>> postpones >>>>> the init_loops_structure call till move_sese_region_to_fn, where it >>>>> can >>>>> succeed. >>>>> >>>>> > > > >> Can we fix the root-cause of the issue instead? That is, build a >> valid loop >> structure in the first place? >> > > This patch manages to keep the loop structure, that is, to not cancel > the loop tree in parloops, and guarantee a valid loop structure at the > end of parloops. > > The transformation to insert the omp_for invalidates the loop state > properties LOOPS_HAVE_RECORDED_EXITS and LOOPS_HAVE_SIMPLE_LATCHES, so > we drop those in parloops. > > In expand_omp_for_static_nochunk, we detect the existing loop struct of > the omp_for, and keep it. > > Then by calling pass_tree_loop_init after pass_expand_omp_ssa, we get > the loop state properties LOOPS_HAVE_RECORDED_EXITS and > LOOPS_HAVE_SIMPLE_LATCHES back. > This updated patch tries a more minimal approach. Rather than dropping property LOOPS_HAVE_RECORDED_EXITS, we record the new exit instead. And rather than adding pass_tree_loop_init after pass_expand_omp_ssa, we just set LOOPS_HAVE_SIMPLE_LATCHES back at the end of pass_expand_omp_ssa. Bootstrapped and reg-tested on x86_64. OK for trunk? Thanks, - Tom Don't cancel loop tree in parloops 2015-07-20 Tom de Vries PR tree-optimization/66846 * omp-low.c (expand_omp_taskreg) [ENABLE_CHECKING]: Call verify_loop_structure for child_cfun if !LOOPS_NEED_FIXUP. (expand_omp_target) [ENABLE_CHECKING]: Same. (execute_expand_omp): Reinstate LOOPS_HAVE_SIMPLE_LATCHES if in ssa. [ENABLE_CHECKING]: Call verify_loop_structure for cfun if !LOOPS_NEED_FIXUP. (expand_omp_for_static_nochunk): Handle case that omp_for already has its own loop struct. * tree-parloops.c (create_parallel_loop): Add comment about breaking LOOPS_HAVE_SIMPLE_LATCHES. Record new exit. (gen_parallel_loop): Remove call to cancel_loop_tree. (parallelize_loops): Skip loops that are inner loops of parallelized loops. (pass_parallelize_loops::execute): Clear LOOPS_HAVE_SIMPLE_LATCHES on loop state. [ENABLE_CHECKING]: Call verify_loop_structure. --- gcc/omp-low.c | 27 ++++++++++++++++++++++++++- gcc/tree-parloops.c | 32 ++++++++++++++++++++++++++++---- 2 files changed, 54 insertions(+), 5 deletions(-) diff --git a/gcc/omp-low.c b/gcc/omp-low.c index 3135606..ce83abf 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -5604,6 +5604,10 @@ expand_omp_taskreg (struct omp_region *region) } if (gimple_in_ssa_p (cfun)) update_ssa (TODO_update_ssa); +#ifdef ENABLE_CHECKING + if (!loops_state_satisfies_p (LOOPS_NEED_FIXUP)) + verify_loop_structure (); +#endif pop_cfun (); } @@ -6843,9 +6847,17 @@ expand_omp_for_static_nochunk (struct omp_region *region, set_immediate_dominator (CDI_DOMINATORS, fin_bb, recompute_dominator (CDI_DOMINATORS, fin_bb)); + struct loop *loop = body_bb->loop_father; + if (loop != entry_bb->loop_father) + { + gcc_assert (loop->header == body_bb); + gcc_assert (broken_loop || loop->latch == region->cont); + return; + } + if (!broken_loop && !gimple_omp_for_combined_p (fd->for_stmt)) { - struct loop *loop = alloc_loop (); + loop = alloc_loop (); loop->header = body_bb; if (collapse_bb == NULL) loop->latch = cont_bb; @@ -8984,6 +8996,10 @@ expand_omp_target (struct omp_region *region) if (changed) cleanup_tree_cfg (); } +#ifdef ENABLE_CHECKING + if (!loops_state_satisfies_p (LOOPS_NEED_FIXUP)) + verify_loop_structure (); +#endif pop_cfun (); } @@ -9492,6 +9508,15 @@ execute_expand_omp (void) expand_omp (root_omp_region); + /* We dropped this property in parloops because of the omp_for. Now that the + omp_for has been expanded, reinstate it. */ + if (gimple_in_ssa_p (cfun)) + loops_state_set (LOOPS_HAVE_SIMPLE_LATCHES); + +#ifdef ENABLE_CHECKING + if (!loops_state_satisfies_p (LOOPS_NEED_FIXUP)) + verify_loop_structure (); +#endif cleanup_tree_cfg (); free_omp_regions (); diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c index ec41834..1899052 100644 --- a/gcc/tree-parloops.c +++ b/gcc/tree-parloops.c @@ -2045,7 +2045,14 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data, guard = make_edge (for_bb, ex_bb, 0); single_succ_edge (loop->latch)->flags = 0; + + /* After creating this edge, the latch has two successors, so + LOOPS_HAVE_SIMPLE_LATCHES is no longer valid. We'll update the loop state + as such at the end of the pass, since it's not needed earlier, and doing it + earlier will invalidate info for loops we still need to process. */ end = make_edge (loop->latch, ex_bb, EDGE_FALLTHRU); + rescan_loop_exit (end, true, false); + for (gphi_iterator gpi = gsi_start_phis (ex_bb); !gsi_end_p (gpi); gsi_next (&gpi)) { @@ -2282,10 +2289,6 @@ gen_parallel_loop (struct loop *loop, scev_reset (); - /* Cancel the loop (it is simpler to do it here rather than to teach the - expander to do it). */ - cancel_loop_tree (loop); - /* Free loop bound estimations that could contain references to removed statements. */ FOR_EACH_LOOP (loop, 0) @@ -2521,6 +2524,7 @@ parallelize_loops (void) unsigned n_threads = flag_tree_parallelize_loops; bool changed = false; struct loop *loop; + struct loop *skip_loop = NULL; struct tree_niter_desc niter_desc; struct obstack parloop_obstack; HOST_WIDE_INT estimated; @@ -2538,6 +2542,19 @@ parallelize_loops (void) FOR_EACH_LOOP (loop, 0) { + if (loop == skip_loop) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, + "Skipping loop %d as inner loop of parallelized loop\n", + loop->num); + + skip_loop = loop->inner; + continue; + } + else + skip_loop = NULL; + reduction_list.empty (); if (dump_file && (dump_flags & TDF_DETAILS)) { @@ -2597,6 +2614,7 @@ parallelize_loops (void) continue; changed = true; + skip_loop = loop->inner; if (dump_file && (dump_flags & TDF_DETAILS)) { if (loop->inner) @@ -2663,6 +2681,12 @@ pass_parallelize_loops::execute (function *fun) if (parallelize_loops ()) { fun->curr_properties &= ~(PROP_gimple_eomp); + + loops_state_clear (LOOPS_HAVE_SIMPLE_LATCHES); +#ifdef ENABLE_CHECKING + verify_loop_structure (); +#endif + return TODO_update_ssa; } -- 1.9.1