From: Julian Brown
Subject: [PATCH] [og10] vect: Add target hook to prefer gather/scatter instructions
Date: Wed, 13 Jan 2021 15:48:42 -0800
Message-ID: <20210113234842.71133-2-julian@codesourcery.com>
X-Patchwork-Id: 1426064
List-Id: Gcc-patches mailing list

For AMD GCN, the instructions available for loading/storing vectors are
always scatter/gather operations (i.e. there are separate addresses for
each vector lane), so the current heuristic to avoid gather/scatter
operations with too many elements in get_group_load_store_type is
counterproductive. Avoiding such operations in that function can
subsequently lead to a missed vectorization opportunity whereby later
analyses in the vectorizer try to use a very wide array type which is
not available on this target, and thus it bails out.

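As a rough illustration of "separate addresses for each vector lane",
the following C sketch models a contiguous vector load next to a gather
load. It is not GCN code: LANES, load_contiguous and load_gather are
hypothetical names chosen for this example, and the lane count is
arbitrary.

```c
#include <stddef.h>

/* Illustrative lane count only; not the width of any real target.  */
#define LANES 4

/* Contiguous load: a single base address, and lane I reads the
   element at base[I].  */
static void
load_contiguous (int *dst, const int *base)
{
  for (size_t lane = 0; lane < LANES; lane++)
    dst[lane] = base[lane];
}

/* Gather load: each lane supplies its own offset, so the elements
   read need not be adjacent in memory.  */
static void
load_gather (int *dst, const int *base, const size_t *offsets)
{
  for (size_t lane = 0; lane < LANES; lane++)
    dst[lane] = base[offsets[lane]];
}
```

On a target whose only vector loads have the second form, a contiguous
load is simply a gather whose offsets happen to be 0, 1, 2, ..., so
avoiding gathers does not buy a cheaper instruction.
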
The attached patch adds a target hook to override the "single_element_p"
heuristic in that function, and activates it for GCN. This allows much
better code to be generated for affected loops.

Tested with offloading to AMD GCN. I will apply to the og10 branch
shortly.

Julian

2021-01-13  Julian Brown

	gcc/
	* doc/tm.texi.in (TARGET_VECTORIZE_PREFER_GATHER_SCATTER): Add
	documentation hook.
	* doc/tm.texi: Regenerate.
	* target.def (prefer_gather_scatter): Add target hook under vectorizer.
	* tree-vect-stmts.c (get_group_load_store_type): Optionally prefer
	gather/scatter instructions to scalar/elementwise fallback.
	* config/gcn/gcn.c (TARGET_VECTORIZE_PREFER_GATHER_SCATTER): Define
	hook.
---
 gcc/config/gcn/gcn.c  | 2 ++
 gcc/doc/tm.texi       | 5 +++++
 gcc/doc/tm.texi.in    | 2 ++
 gcc/target.def        | 8 ++++++++
 gcc/tree-vect-stmts.c | 9 +++++++--
 5 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c
index ee9f00558305..ea88b5e91244 100644
--- a/gcc/config/gcn/gcn.c
+++ b/gcc/config/gcn/gcn.c
@@ -6501,6 +6501,8 @@ gcn_dwarf_register_span (rtx rtl)
   gcn_vector_alignment_reachable
 #undef TARGET_VECTOR_MODE_SUPPORTED_P
 #define TARGET_VECTOR_MODE_SUPPORTED_P gcn_vector_mode_supported_p
+#undef TARGET_VECTORIZE_PREFER_GATHER_SCATTER
+#define TARGET_VECTORIZE_PREFER_GATHER_SCATTER true
 
 struct gcc_target targetm = TARGET_INITIALIZER;
 
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 581b7b51eeb0..bd0b2eea477a 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6122,6 +6122,11 @@ The default is @code{NULL_TREE} which means to not vectorize scatter
 stores.
 @end deftypefn
 
+@deftypevr {Target Hook} bool TARGET_VECTORIZE_PREFER_GATHER_SCATTER
+This hook is set to TRUE if gather loads or scatter stores are cheaper on
+this target than a sequence of elementwise loads or stores.
+@end deftypevr
+
 @deftypefn {Target Hook} int TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN (struct cgraph_node *@var{}, struct cgraph_simd_clone *@var{}, @var{tree}, @var{int})
 This hook should set @var{vecsize_mangle}, @var{vecsize_int}, @var{vecsize_float}
 fields in @var{simd_clone} structure pointed by @var{clone_info} argument and also
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index afa19d4ac63c..c0883e5da82c 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4195,6 +4195,8 @@ address; but often a machine-dependent strategy can generate better code.
 
 @hook TARGET_VECTORIZE_BUILTIN_SCATTER
 
+@hook TARGET_VECTORIZE_PREFER_GATHER_SCATTER
+
 @hook TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN
 
 @hook TARGET_SIMD_CLONE_ADJUST
diff --git a/gcc/target.def b/gcc/target.def
index 00421f3a6acd..0b34ab5c3d52 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -2027,6 +2027,14 @@ all zeros. GCC can then try to branch around the instruction instead.",
  (unsigned ifn),
  default_empty_mask_is_expensive)
 
+/* Prefer gather/scatter loads/stores to e.g. elementwise accesses if\n\
+we cannot use a contiguous access. */
+DEFHOOKPOD
+(prefer_gather_scatter,
+ "This hook is set to TRUE if gather loads or scatter stores are cheaper on\n\
+this target than a sequence of elementwise loads or stores.",
+ bool, false)
+
 /* Target builtin that implements vector gather operation. */
 DEFHOOK
 (builtin_gather,
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 9ace345fc5e2..e117d3d16afc 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -2444,9 +2444,14 @@ get_group_load_store_type (stmt_vec_info stmt_info, tree vectype, bool slp,
 	     it probably isn't a win to use separate strided accesses based
 	     on nearby locations. Or, even if it's a win over scalar code,
 	     it might not be a win over vectorizing at a lower VF, if that
-	     allows us to use contiguous accesses. */
+	     allows us to use contiguous accesses.
+
+	     On some targets (e.g. AMD GCN), always use gather/scatter accesses
+	     here since those are the only types of vector loads/stores available,
+	     and the fallback case of using elementwise accesses is very
+	     inefficient. */
 	  if (*memory_access_type == VMAT_ELEMENTWISE
-	      && single_element_p
+	      && (targetm.vectorize.prefer_gather_scatter || single_element_p)
 	      && loop_vinfo
 	      && vect_use_strided_gather_scatters_p (stmt_info, loop_vinfo,
 						     masked_p, gs_info))
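
For reference, the effect of the changed condition can be modelled in
isolation. This is a standalone sketch under stated assumptions, not
vectorizer code: struct access_query and use_gather_scatter_p are
hypothetical names, and each boolean field stands in for the
corresponding term of the condition in get_group_load_store_type. It
assumes, as the surrounding comment suggests, that a true result means
the elementwise access is replaced by a gather/scatter access.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the terms tested in the patched condition.  */
struct access_query
{
  bool elementwise_fallback;  /* *memory_access_type == VMAT_ELEMENTWISE */
  bool single_element_p;      /* the existing heuristic */
  bool in_loop;               /* loop_vinfo is non-null */
  bool strided_gs_usable;     /* vect_use_strided_gather_scatters_p (...) */
};

/* Return true if the elementwise fallback should be replaced by a
   gather/scatter access.  PREFER_GATHER_SCATTER models the new
   targetm.vectorize.prefer_gather_scatter value.  */
static bool
use_gather_scatter_p (const struct access_query *q,
                      bool prefer_gather_scatter)
{
  return (q->elementwise_fallback
          && (prefer_gather_scatter || q->single_element_p)
          && q->in_loop
          && q->strided_gs_usable);
}
```

With the hook left at its default of false the result depends on
single_element_p exactly as before; with the hook set to true (as for
GCN) that term no longer blocks the gather/scatter path, while the
remaining checks still apply.
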