From patchwork Mon Oct 12 17:26:47 2015
X-Patchwork-Submitter: Tom de Vries
X-Patchwork-Id: 529264
Subject: [committed, gomp4, 2/3] Handle sequential code in kernels region
To: "gcc-patches@gnu.org"
CC: Jakub Jelinek
From: Tom de Vries
Message-ID: <561BED57.7030400@mentor.com>
In-Reply-To: <561BEA02.6010808@mentor.com>
Date: Mon, 12 Oct 2015 19:26:47 +0200

On 12/10/15 19:12, Tom de Vries wrote:
> Hi,
>
> I've committed the following patch series.
>
>      1  Add get_bbs_in_oacc_kernels_region
>      2  Handle sequential code in kernels region
>      3  Handle sequential code in kernels region - Testcases
>
> The patch series adds detection of whether sequential code (that is,
> code in the oacc kernels region before and after the loop that is to be
> parallelized) is safe to execute in parallel.
>
> Bootstrapped and reg-tested on x86_64.
>
> I'll post the patches individually, in reply to this email.

This patch makes parloops check, for each non-loop stmt in the oacc
kernels region, that it is not a load aliasing with a store anywhere in
the region, and vice versa.  The exceptions are loads and stores for
reductions, which are later on transformed into an atomic update.

Thanks,
- Tom

Handle sequential code in kernels region

2015-10-12  Tom de Vries

	* omp-low.c (lower_omp_for): Don't call lower_oacc_head_tail for
	oacc kernels regions.
	* tree-parloops.c (try_create_reduction_list): Initialize keep_res
	field.
	(dead_load_p, ref_conflicts_with_region, oacc_entry_exit_ok_1)
	(oacc_entry_exit_ok): New functions.
	(parallelize_loops): Call oacc_entry_exit_ok.
---
 gcc/omp-low.c       |   3 +-
 gcc/tree-parloops.c | 244 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 246 insertions(+), 1 deletion(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index f6e0247..e700dd1 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -11949,7 +11949,8 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
   /* Once lowered, extract the bounds and clauses.  */
   extract_omp_for_data (stmt, &fd, NULL);
 
-  if (is_gimple_omp_oacc (ctx->stmt))
+  if (is_gimple_omp_oacc (ctx->stmt)
+      && !ctx_in_oacc_kernels_region (ctx))
     lower_oacc_head_tail (gimple_location (stmt),
 			  gimple_omp_for_clauses (stmt),
 			  &oacc_head, &oacc_tail, ctx);
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index 4b67793..d4eb32a 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -58,6 +58,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "cgraph.h"
 #include "tree-ssa.h"
 #include "params.h"
+#include "tree-ssa-alias.h"
+#include "tree-eh.h"
 
 /* This pass tries to distribute iterations of loops into several threads.
    The implementation is straightforward -- for each loop we test whether its
@@ -2672,6 +2674,7 @@ try_create_reduction_list (loop_p loop,
 		     " FAILED: it is not a part of reduction.\n");
 	  return false;
 	}
+      red->keep_res = phi;
       if (dump_file && (dump_flags & TDF_DETAILS))
 	{
 	  fprintf (dump_file, "reduction phi is ");
@@ -2764,6 +2767,239 @@ try_create_reduction_list (loop_p loop,
   return true;
 }
 
+/* Return true if STMT is a load of which the result is unused, and can be
+   safely deleted.  */
+
+static bool
+dead_load_p (gimple *stmt)
+{
+  if (!gimple_assign_load_p (stmt))
+    return false;
+
+  tree lhs = gimple_assign_lhs (stmt);
+  return (TREE_CODE (lhs) == SSA_NAME
+	  && has_zero_uses (lhs)
+	  && !gimple_has_side_effects (stmt)
+	  && !stmt_could_throw_p (stmt));
+}
+
+static bool
+ref_conflicts_with_region (gimple_stmt_iterator gsi, ao_ref *ref,
+			   bool ref_is_store, vec<basic_block> region_bbs,
+			   unsigned int i, gimple *skip_stmt)
+{
+  basic_block bb = region_bbs[i];
+  gsi_next (&gsi);
+
+  while (true)
+    {
+      for (; !gsi_end_p (gsi);
+	   gsi_next (&gsi))
+	{
+	  gimple *stmt = gsi_stmt (gsi);
+	  if (stmt == skip_stmt)
+	    {
+	      if (dump_file)
+		{
+		  fprintf (dump_file, "skipping reduction store: ");
+		  print_gimple_stmt (dump_file, stmt, 0, 0);
+		}
+	      continue;
+	    }
+
+	  if (!gimple_vdef (stmt)
+	      && !gimple_vuse (stmt))
+	    continue;
+
+	  if (ref_is_store)
+	    {
+	      if (dead_load_p (stmt))
+		{
+		  if (dump_file)
+		    {
+		      fprintf (dump_file, "skipping dead load: ");
+		      print_gimple_stmt (dump_file, stmt, 0, 0);
+		    }
+		  continue;
+		}
+
+	      if (ref_maybe_used_by_stmt_p (stmt, ref))
+		{
+		  if (dump_file)
+		    {
+		      fprintf (dump_file, "Stmt ");
+		      print_gimple_stmt (dump_file, stmt, 0, 0);
+		    }
+		  return true;
+		}
+	    }
+	  else
+	    {
+	      if (stmt_may_clobber_ref_p_1 (stmt, ref))
+		{
+		  if (dump_file)
+		    {
+		      fprintf (dump_file, "Stmt ");
+		      print_gimple_stmt (dump_file, stmt, 0, 0);
+		    }
+		  return true;
+		}
+	    }
+	}
+      i++;
+      if (i == region_bbs.length ())
+	break;
+      bb = region_bbs[i];
+      gsi = gsi_start_bb (bb);
+    }
+
+  return false;
+}
+
+static bool
+oacc_entry_exit_ok_1 (bitmap in_loop_bbs, vec<basic_block> region_bbs,
+		      tree omp_data_i,
+		      reduction_info_table_type *reduction_list)
+{
+  unsigned i;
+  basic_block bb;
+  FOR_EACH_VEC_ELT (region_bbs, i, bb)
+    {
+      if (bitmap_bit_p (in_loop_bbs, bb->index))
+	continue;
+
+      gimple_stmt_iterator gsi;
+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
+	   gsi_next (&gsi))
+	{
+	  gimple *stmt = gsi_stmt (gsi);
+	  gimple *skip_stmt = NULL;
+
+	  if (is_gimple_debug (stmt)
+	      || gimple_code (stmt) == GIMPLE_COND)
+	    continue;
+
+	  ao_ref ref;
+	  bool ref_is_store = false;
+	  if (gimple_assign_load_p (stmt))
+	    {
+	      tree rhs = gimple_assign_rhs1 (stmt);
+	      tree base = get_base_address (rhs);
+	      if (TREE_CODE (base) == MEM_REF
+		  && operand_equal_p (TREE_OPERAND (base, 0), omp_data_i, 0))
+		continue;
+
+	      /* By testing for dead loads (here and in
+		 ref_conflicts_with_region), we avoid having to run pass_dce
+		 before pass_parallelize_loops_oacc_kernels.  */
+	      if (dead_load_p (stmt))
+		{
+		  if (dump_file)
+		    {
+		      fprintf (dump_file, "skipping dead load: ");
+		      print_gimple_stmt (dump_file, stmt, 0, 0);
+		    }
+		  continue;
+		}
+
+	      tree lhs = gimple_assign_lhs (stmt);
+	      if (TREE_CODE (lhs) == SSA_NAME
+		  && has_single_use (lhs))
+		{
+		  use_operand_p use_p;
+		  gimple *use_stmt;
+		  single_imm_use (lhs, &use_p, &use_stmt);
+		  if (gimple_code (use_stmt) == GIMPLE_PHI)
+		    {
+		      struct reduction_info *red;
+		      red = reduction_phi (reduction_list, use_stmt);
+		      tree val = PHI_RESULT (red->keep_res);
+		      if (has_single_use (val))
+			{
+			  single_imm_use (val, &use_p, &use_stmt);
+			  if (gimple_store_p (use_stmt))
+			    {
+			      skip_stmt = use_stmt;
+			      if (dump_file)
+				{
+				  fprintf (dump_file, "found reduction load: ");
+				  print_gimple_stmt (dump_file, stmt, 0, 0);
+				}
+			    }
+			}
+		    }
+		}
+
+	      ao_ref_init (&ref, rhs);
+	    }
+	  else if (gimple_store_p (stmt))
+	    {
+	      ao_ref_init (&ref, gimple_assign_lhs (stmt));
+	      ref_is_store = true;
+	    }
+	  else if (gimple_code (stmt) == GIMPLE_OMP_RETURN)
+	    continue;
+	  else if (gimple_stmt_omp_data_i_init_p (stmt))
+	    continue;
+	  else if (!gimple_has_side_effects (stmt)
+		   && !gimple_could_trap_p (stmt)
+		   && !stmt_could_throw_p (stmt)
+		   && !gimple_vdef (stmt)
+		   && !gimple_vuse (stmt))
+	    continue;
+	  else
+	    {
+	      if (dump_file)
+		{
+		  fprintf (dump_file, "Unhandled stmt in entry/exit: ");
+		  print_gimple_stmt (dump_file, stmt, 0, 0);
+		}
+	      return false;
+	    }
+
+	  if (ref_conflicts_with_region (gsi, &ref, ref_is_store, region_bbs,
+					 i, skip_stmt))
+	    {
+	      if (dump_file)
+		{
+		  fprintf (dump_file, "conflicts with entry/exit stmt: ");
+		  print_gimple_stmt (dump_file, stmt, 0, 0);
+		}
+	      return false;
+	    }
+	}
+    }
+
+  return true;
+}
+
+static bool
+oacc_entry_exit_ok (struct loop *loop, basic_block region_entry,
+		    reduction_info_table_type *reduction_list)
+{
+  basic_block *loop_bbs = get_loop_body_in_dom_order (loop);
+  basic_block region_exit
+    = get_oacc_kernels_region_exit (single_succ (region_entry));
+  vec<basic_block> region_bbs
+    = get_bbs_in_oacc_kernels_region (region_entry, region_exit);
+  tree omp_data_i = get_omp_data_i (region_entry);
+  gcc_assert (omp_data_i != NULL_TREE);
+
+  bitmap in_loop_bbs = BITMAP_ALLOC (NULL);
+  bitmap_clear (in_loop_bbs);
+  for (unsigned int i = 0; i < loop->num_nodes; i++)
+    bitmap_set_bit (in_loop_bbs, loop_bbs[i]->index);
+
+  bool res = oacc_entry_exit_ok_1 (in_loop_bbs, region_bbs, omp_data_i,
+				   reduction_list);
+
+  free (loop_bbs);
+
+  BITMAP_FREE (in_loop_bbs);
+
+  return res;
+}
+
 /* Detect parallel loops and generate parallel code using libgomp
    primitives.  Returns true if some loop was parallelized, false
    otherwise.  */
@@ -2901,6 +3137,14 @@ parallelize_loops (bool oacc_kernels_p)
 	  continue;
 	}
 
+      if (oacc_kernels_p
+	  && !oacc_entry_exit_ok (loop, region_entry, &reduction_list))
+	{
+	  if (dump_file)
+	    fprintf (dump_file, "entry/exit not ok: FAILED\n");
+	  continue;
+	}
+
       changed = true;
       /* Skip inner loop(s) of parallelized loop.  */
       skip_loop = loop->inner;
-- 
1.9.1