From patchwork Mon Jun 29 20:17:30 2020
X-Patchwork-Submitter: Julian Brown
X-Patchwork-Id: 1319166
From: Julian Brown
Subject: [PATCH] [og10] OpenACC: Remove unnecessary barriers (gimple worker partitioning/broadcast)
Date: Mon, 29 Jun 2020 13:17:30 -0700
Message-ID: <20200629201730.896-1-julian@codesourcery.com>
Cc: Jakub Jelinek, Andrew Stubbs

This is an optimisation for middle-end worker-partitioning support (used
to support multiple workers on AMD GCN). At present, barriers may be
emitted in cases where they aren't needed and cannot be optimised away.
This patch stops the extraneous barriers from being emitted in the first
place.

One exception to the above (where the barrier is still needed) is for
predicated blocks of code that perform a write to gang-private shared
memory from one worker. We must execute a barrier before other workers
read that shared memory location.

OK for og10 branch?
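To make the rule above concrete, here is a minimal standalone sketch of the decision. It is not code from the patch: the function name needs_barrier and its two parameters are made up for illustration. In the patch the corresponding test lives in worker_single_copy; the sketch assumes, following the ChangeLog wording, that the first half of that test means "this block broadcasts something".

```cpp
// Illustrative only: a simplified model of the barrier decision for a
// worker-single (predicated) block.  The names below are hypothetical
// and do not exist in GCC.
constexpr bool
needs_barrier (bool broadcasts_values, bool writes_gangprivate)
{
  // A barrier is required if other workers will read broadcast data,
  // or if the single active worker wrote gang-private shared memory
  // that other workers may read afterwards.
  return broadcasts_values || writes_gangprivate;
}

// The barrier can be dropped only when neither condition holds.
static_assert (!needs_barrier (false, false), "no broadcast, no write");
static_assert (needs_barrier (true, false), "broadcast needs a barrier");
static_assert (needs_barrier (false, true), "gang-private write needs one");
```

The checks are compile-time only, so the sketch can be dropped into any C++11 translation unit without a main function.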
Julian

ChangeLog

        gcc/
        * config/gcn/gcn.c (gimple.h): Include.
        (gcn_fork_join): Emit barrier for worker-level joins.
        * omp-sese.c (find_local_vars_to_propagate): Add writes_gangprivate
        bitmap parameter. Set bit for blocks containing gang-private variable
        writes.
        (worker_single_simple): Don't emit barrier after predicated block.
        (worker_single_copy): Don't emit barrier if we're not broadcasting
        anything and the block contains no gang-private writes.
        (neuter_worker_single): Don't predicate blocks that only contain
        NOPs or internal marker functions. Pass has_gangprivate_write
        argument to worker_single_copy.
        (oacc_do_neutering): Add writes_gangprivate bitmap handling.
---
 gcc/config/gcn/gcn.c |   9 +++-
 gcc/omp-sese.c       | 115 +++++++++++++++++++++++++++++++++----------
 2 files changed, 97 insertions(+), 27 deletions(-)

diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c
index bf996b461547..35b2ef5e752b 100644
--- a/gcc/config/gcn/gcn.c
+++ b/gcc/config/gcn/gcn.c
@@ -50,6 +50,7 @@
 #include "varasm.h"
 #include "intl.h"
 #include "rtl-iter.h"
+#include "gimple.h"
 
 /* This file should be included last. */
 #include "target-def.h"
@@ -4898,9 +4899,15 @@ gcn_oacc_dim_pos (int dim)
 /* Implement TARGET_GOACC_FORK_JOIN. */
 
 static bool
-gcn_fork_join (gcall *ARG_UNUSED (call), const int *ARG_UNUSED (dims),
+gcn_fork_join (gcall *call, const int *ARG_UNUSED (dims),
                bool ARG_UNUSED (is_fork))
 {
+  tree arg = gimple_call_arg (call, 2);
+  unsigned axis = TREE_INT_CST_LOW (arg);
+
+  if (!is_fork && axis == GOMP_DIM_WORKER && dims[axis] != 1)
+    return true;
+
   return false;
 }
 
diff --git a/gcc/omp-sese.c b/gcc/omp-sese.c
index 4dd3417066c6..80697358efec 100644
--- a/gcc/omp-sese.c
+++ b/gcc/omp-sese.c
@@ -768,16 +768,19 @@ static void
 find_local_vars_to_propagate (parallel_g *par, unsigned outer_mask,
                               hash_set *partitioned_var_uses,
                               hash_set *gangprivate_vars,
+                              bitmap writes_gangprivate,
                               vec *prop_set)
 {
   unsigned mask = outer_mask | par->mask;
 
   if (par->inner)
     find_local_vars_to_propagate (par->inner, mask, partitioned_var_uses,
-                                  gangprivate_vars, prop_set);
+                                  gangprivate_vars, writes_gangprivate,
+                                  prop_set);
   if (par->next)
     find_local_vars_to_propagate (par->next, outer_mask, partitioned_var_uses,
-                                  gangprivate_vars, prop_set);
+                                  gangprivate_vars, writes_gangprivate,
+                                  prop_set);
 
   if (!(mask & GOMP_DIM_MASK (GOMP_DIM_WORKER)))
     {
@@ -798,8 +801,7 @@ find_local_vars_to_propagate (parallel_g *par, unsigned outer_mask,
                   if (!VAR_P (var)
                       || is_global_var (var)
                       || AGGREGATE_TYPE_P (TREE_TYPE (var))
-                      || !partitioned_var_uses->contains (var)
-                      || gangprivate_vars->contains (var))
+                      || !partitioned_var_uses->contains (var))
                     continue;
 
                   if (stmt_may_clobber_ref_p (stmt, var))
@@ -813,6 +815,14 @@ find_local_vars_to_propagate (parallel_g *par, unsigned outer_mask,
                           fprintf (dump_file, "\n");
                         }
 
+                      if (gangprivate_vars->contains (var))
+                        {
+                          /* If we write a gang-private variable, we want a
+                             barrier at the end of the block. */
+                          bitmap_set_bit (writes_gangprivate, block->index);
+                          continue;
+                        }
+
                       if (!(*prop_set)[block->index])
                         (*prop_set)[block->index] = new propagation_set;
 
@@ -924,14 +934,6 @@ worker_single_simple (basic_block from, basic_block to,
             }
         }
     }
-
-  gsi = gsi_start_bb (skip_block);
-
-  decl = builtin_decl_explicit (BUILT_IN_GOACC_BARRIER);
-  gimple *acc_bar = gimple_build_call (decl, 0);
-
-  gsi_insert_before (&gsi, acc_bar, GSI_SAME_STMT);
-  update_stmt (acc_bar);
 }
 
 /* This is a copied and renamed omp-low.c:omp_build_component_ref. */
@@ -1009,7 +1011,7 @@ worker_single_copy (basic_block from, basic_block to,
                     hash_set *def_escapes_block,
                     hash_set *worker_partitioned_uses,
                     tree record_type, unsigned HOST_WIDE_INT placement,
-                    bool isolate_broadcasts)
+                    bool isolate_broadcasts, bool has_gangprivate_write)
 {
   /* If we only have virtual defs, we'll have no record type, but we still want
      to emit single_copy_start and (particularly) single_copy_end to act as
@@ -1090,14 +1092,19 @@ worker_single_copy (basic_block from, basic_block to,
   edge ef = make_edge (from, barrier_block, EDGE_FALSE_VALUE);
   ef->probability = et->probability.invert ();
 
-  decl = builtin_decl_explicit (BUILT_IN_GOACC_BARRIER);
-  gimple *acc_bar = gimple_build_call (decl, 0);
-
   gimple_stmt_iterator bar_gsi = gsi_start_bb (barrier_block);
-  gsi_insert_before (&bar_gsi, acc_bar, GSI_NEW_STMT);
-
   cond = gimple_build_cond (NE_EXPR, recv_tmp, zero_ptr, NULL_TREE, NULL_TREE);
-  gsi_insert_after (&bar_gsi, cond, GSI_NEW_STMT);
+
+  if (record_type != char_type_node || has_gangprivate_write)
+    {
+      decl = builtin_decl_explicit (BUILT_IN_GOACC_BARRIER);
+      gimple *acc_bar = gimple_build_call (decl, 0);
+
+      gsi_insert_before (&bar_gsi, acc_bar, GSI_NEW_STMT);
+      gsi_insert_after (&bar_gsi, cond, GSI_NEW_STMT);
+    }
+  else
+    gsi_insert_before (&bar_gsi, cond, GSI_NEW_STMT);
 
   edge et2 = split_block (barrier_block, cond);
   et2->flags &= ~EDGE_FALLTHRU;
@@ -1259,7 +1266,8 @@ neuter_worker_single (parallel_g *par, unsigned outer_mask,
                       bitmap worker_single, bitmap vector_single,
                       vec *prop_set,
                       hash_set *partitioned_var_uses,
-                      blk_offset_map_t *blk_offset_map)
+                      blk_offset_map_t *blk_offset_map,
+                      bitmap writes_gangprivate)
 {
   unsigned mask = outer_mask | par->mask;
 
@@ -1345,19 +1353,69 @@ neuter_worker_single (parallel_g *par, unsigned outer_mask,
                 (*prop_set)[block->index] = 0;
               }
 
-            tree record_type = (tree) block->aux;
+            bool only_marker_fns = true;
+            bool join_block = false;
+
+            for (gimple_stmt_iterator gsi = gsi_start_bb (block);
+                 !gsi_end_p (gsi);
+                 gsi_next (&gsi))
+              {
+                gimple *stmt = gsi_stmt (gsi);
+                if (gimple_code (stmt) == GIMPLE_CALL
+                    && gimple_call_internal_p (stmt, IFN_UNIQUE))
+                  {
+                    enum ifn_unique_kind k = ((enum ifn_unique_kind)
+                      TREE_INT_CST_LOW (gimple_call_arg (stmt, 0)));
+                    if (k != IFN_UNIQUE_OACC_PRIVATE
+                        && k != IFN_UNIQUE_OACC_JOIN
+                        && k != IFN_UNIQUE_OACC_FORK
+                        && k != IFN_UNIQUE_OACC_HEAD_MARK
+                        && k != IFN_UNIQUE_OACC_TAIL_MARK)
+                      only_marker_fns = false;
+                    else if (k == IFN_UNIQUE_OACC_JOIN)
+                      /* The JOIN marker is special in that it *cannot* be
+                         predicated for worker zero, because it may be lowered
+                         to a barrier instruction and all workers must typically
+                         execute that barrier. We shouldn't be doing any
+                         broadcasts from the join block anyway. */
+                      join_block = true;
+                  }
+                else if (gimple_code (stmt) == GIMPLE_CALL
+                         && gimple_call_internal_p (stmt, IFN_GOACC_LOOP))
+                  /* Empty. */;
+                else if (gimple_nop_p (stmt))
+                  /* Empty. */;
+                else
+                  only_marker_fns = false;
+              }
+
+            /* We can skip predicating this block for worker zero if the only
+               thing it contains is marker functions that will be removed in the
+               oaccdevlow pass anyway.
+               Don't do this if the block has (any) phi nodes, because those
+               might define SSA names that need broadcasting.
+               TODO: We might be able to skip transforming blocks that only
+               contain some other trivial statements too. */
+            if (only_marker_fns && !phi_nodes (block))
+              continue;
+
+            gcc_assert (!join_block);
 
             if (has_defs)
               {
+                tree record_type = (tree) block->aux;
                 auto off_rngalloc = blk_offset_map->get (block);
                 gcc_assert (!record_type || off_rngalloc);
                 unsigned HOST_WIDE_INT offset
                   = off_rngalloc ? off_rngalloc->first : 0;
                 bool range_allocated
                   = off_rngalloc ? off_rngalloc->second : true;
+                bool has_gangprivate_write
+                  = bitmap_bit_p (writes_gangprivate, block->index);
                 worker_single_copy (block, block, &def_escapes_block,
                                     &worker_partitioned_uses, record_type,
-                                    offset, !range_allocated);
+                                    offset, !range_allocated,
+                                    has_gangprivate_write);
               }
             else
               worker_single_simple (block, block, &def_escapes_block);
@@ -1394,10 +1452,12 @@ neuter_worker_single (parallel_g *par, unsigned outer_mask,
 
   if (par->inner)
     neuter_worker_single (par->inner, mask, worker_single, vector_single,
-                          prop_set, partitioned_var_uses, blk_offset_map);
+                          prop_set, partitioned_var_uses, blk_offset_map,
+                          writes_gangprivate);
   if (par->next)
     neuter_worker_single (par->next, outer_mask, worker_single, vector_single,
-                          prop_set, partitioned_var_uses, blk_offset_map);
+                          prop_set, partitioned_var_uses, blk_offset_map,
+                          writes_gangprivate);
 }
 
 
@@ -1595,11 +1655,13 @@ oacc_do_neutering (unsigned HOST_WIDE_INT bounds_lo,
 
   hash_set partitioned_var_uses;
   hash_set gangprivate_vars;
+  auto_bitmap writes_gangprivate;
 
   find_gangprivate_vars (&gangprivate_vars);
   find_partitioned_var_uses (par, mask, &partitioned_var_uses);
   find_local_vars_to_propagate (par, mask, &partitioned_var_uses,
-                                &gangprivate_vars, &prop_set);
+                                &gangprivate_vars, writes_gangprivate,
+                                &prop_set);
 
   FOR_ALL_BB_FN (bb, cfun)
     {
@@ -1747,7 +1809,8 @@ oacc_do_neutering (unsigned HOST_WIDE_INT bounds_lo,
   sbitmap_vector_free (reachable);
 
   neuter_worker_single (par, mask, worker_single, vector_single, &prop_set,
-                        &partitioned_var_uses, &blk_offset_map);
+                        &partitioned_var_uses, &blk_offset_map,
+                        writes_gangprivate);
 
   prop_set.release ();