From patchwork Fri Jan 15 08:11:11 2021
From: "Kewen.Lin"
To: GCC Patches
Cc: Richard Sandiford, Bill Schmidt, Segher Boessenkool
Subject: [PATCH] vect: Use factored nloads for load cost modeling [PR82255]
Date: Fri, 15 Jan 2021 16:11:11 +0800
Message-ID: <9ea04d21-11b2-2b78-756f-f421df03fc8b@linux.ibm.com>
Hi,

This patch follows Richard's suggestion in the thread discussion [1]: it
factors out the nloads computation for strided accesses in
vectorizable_load, so that cost estimation and code generation obtain
consistent information.

By the way, the reason I didn't try to save the information into
stmt_info during the analysis phase and then fetch it in the transform
phase is that the information is only needed for strided SLP loads, and
re-computing it looks cheap enough to be acceptable.

Bootstrapped/regtested on powerpc64le-linux-gnu P9, x86_64-redhat-linux
and aarch64-linux-gnu.

Is it OK for trunk, or does it belong to next stage 1?

BR,
Kewen

[1] https://gcc.gnu.org/legacy-ml/gcc-patches/2017-09/msg01433.html

gcc/ChangeLog:

	PR tree-optimization/82255
	* tree-vect-stmts.c (vector_vector_composition_type): Adjust
	function location.
	(struct strided_load_info): New structure.
	(vect_get_strided_load_info): New function factored out from...
	(vectorizable_load): ...this.  Call function
	vect_get_strided_load_info accordingly.
	(vect_model_load_cost): Call function vect_get_strided_load_info.

gcc/testsuite/ChangeLog:

2021-01-15  Bill Schmidt
	    Kewen Lin

	PR tree-optimization/82255
	* gcc.dg/vect/costmodel/ppc/costmodel-pr82255.c: New test.
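For illustration only (not part of the patch): below is a minimal
standalone sketch of the nloads/lnel decision that the new helper
encodes.  The names sketch_strided_slp and pieces_ok are invented for
this sketch; pieces_ok stands in for what vector_vector_composition_type
actually checks in the real code.

#include <stdio.h>

struct sketch_load_info
{
  int nloads;	/* Number of loads required.  */
  int lnel;	/* Number of vector elements advanced per load.  */
};

/* CONST_NUNITS is the number of elements per vector, GROUP_SIZE is the
   SLP group size, and PIECES_OK says whether the target can compose the
   vector from GROUP_SIZE-element pieces.  */

static struct sketch_load_info
sketch_strided_slp (int const_nunits, int group_size, int pieces_ok)
{
  struct sketch_load_info info;
  if (group_size < const_nunits && pieces_ok)
    {
      /* Load const_nunits / group_size pieces and compose them.  */
      info.nloads = const_nunits / group_size;
      info.lnel = group_size;
    }
  else if (group_size < const_nunits)
    {
      /* Element-wise fallback: one load per vector element.  */
      info.nloads = const_nunits;
      info.lnel = 1;
    }
  else
    {
      /* The group covers a whole vector: a single full-vector load,
	 so no vec_construct is needed (the PR82255 situation).  */
      info.nloads = 1;
      info.lnel = const_nunits;
    }
  return info;
}

int
main (void)
{
  struct sketch_load_info a = sketch_strided_slp (16, 4, 1);
  struct sketch_load_info b = sketch_strided_slp (16, 16, 0);
  printf ("nloads=%d lnel=%d\n", a.nloads, a.lnel);	/* nloads=4 lnel=4 */
  printf ("nloads=%d lnel=%d\n", b.nloads, b.lnel);	/* nloads=1 lnel=16 */
  return 0;
}

The cost model change below then only records a vec_construct cost when
nloads > 1, matching what the transform phase will actually emit.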
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-pr82255.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-pr82255.c
new file mode 100644
index 00000000000..aaeefc39595
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-pr82255.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+
+
+/* PR82255: Ensure we don't require a vec_construct cost when we aren't
+   going to generate a strided load.  */
+
+extern int abs (int __x) __attribute__ ((__nothrow__, __leaf__))
+__attribute__ ((__const__));
+
+static int
+foo (unsigned char *w, int i, unsigned char *x, int j)
+{
+  int tot = 0;
+  for (int a = 0; a < 16; a++)
+    {
+#pragma GCC unroll 16
+      for (int b = 0; b < 16; b++)
+	tot += abs (w[b] - x[b]);
+      w += i;
+      x += j;
+    }
+  return tot;
+}
+
+void
+bar (unsigned char *w, unsigned char *x, int i, int *result)
+{
+  *result = foo (w, 16, x, i);
+}
+
+/* { dg-final { scan-tree-dump-times "vec_construct" 0 "vect" } } */
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 068e4982303..d1cbc55a676 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -897,6 +897,146 @@ cfun_returns (tree decl)
   return false;
 }
 
+/* Function VECTOR_VECTOR_COMPOSITION_TYPE
+
+   This function returns a vector type which can be composed with NELTS
+   pieces, whose type is recorded in PTYPE.  VTYPE should be a vector
+   type, and has the same vector size as the return vector.  It first
+   checks whether the target supports a pieces-size vector mode for
+   construction; if not, it further checks a pieces-size scalar mode for
+   construction.  It returns NULL_TREE if it fails to find an available
+   composition.
+
+   For example, for (vtype=V16QI, nelts=4), we can probably get:
+     - V16QI with PTYPE V4QI.
+     - V4SI with PTYPE SI.
+     - NULL_TREE.  */
+
+static tree
+vector_vector_composition_type (tree vtype, poly_uint64 nelts, tree *ptype)
+{
+  gcc_assert (VECTOR_TYPE_P (vtype));
+  gcc_assert (known_gt (nelts, 0U));
+
+  machine_mode vmode = TYPE_MODE (vtype);
+  if (!VECTOR_MODE_P (vmode))
+    return NULL_TREE;
+
+  poly_uint64 vbsize = GET_MODE_BITSIZE (vmode);
+  unsigned int pbsize;
+  if (constant_multiple_p (vbsize, nelts, &pbsize))
+    {
+      /* First check if vec_init optab supports construction from
+	 vector pieces directly.  */
+      scalar_mode elmode = SCALAR_TYPE_MODE (TREE_TYPE (vtype));
+      poly_uint64 inelts = pbsize / GET_MODE_BITSIZE (elmode);
+      machine_mode rmode;
+      if (related_vector_mode (vmode, elmode, inelts).exists (&rmode)
+	  && (convert_optab_handler (vec_init_optab, vmode, rmode)
+	      != CODE_FOR_nothing))
+	{
+	  *ptype = build_vector_type (TREE_TYPE (vtype), inelts);
+	  return vtype;
+	}
+
+      /* Otherwise check if there exists an integer type of the same piece
+	 size and if vec_init optab supports construction from it directly.  */
+      if (int_mode_for_size (pbsize, 0).exists (&elmode)
+	  && related_vector_mode (vmode, elmode, nelts).exists (&rmode)
+	  && (convert_optab_handler (vec_init_optab, rmode, elmode)
+	      != CODE_FOR_nothing))
+	{
+	  *ptype = build_nonstandard_integer_type (pbsize, 1);
+	  return build_vector_type (*ptype, nelts);
+	}
+    }
+
+  return NULL_TREE;
+}
+
+/* Hold information for VMAT_ELEMENTWISE or VMAT_STRIDED_SLP strided
+   loads in function vectorizable_load.  */
+struct strided_load_info {
+  /* Number of loads required.  */
+  int nloads;
+  /* Number of vector units advanced for each load.  */
+  int lnel;
+  /* Access type for each load.  */
+  tree ltype;
+  /* Vector type used for possible vector construction.  */
+  tree lvectype;
+};
+
+/* Determine how we perform VMAT_ELEMENTWISE or VMAT_STRIDED_SLP loads,
+   considering the number of vector units, the group size, and target
+   support for vector construction; return the strided_load_info
+   information.  ACCESS_TYPE indicates the memory access type, VECTYPE
+   indicates the vector type, and GROUP_SIZE indicates the group size
+   for grouped access.  */
+
+static strided_load_info
+vect_get_strided_load_info (vect_memory_access_type access_type, tree vectype,
+			    int group_size)
+{
+  int nloads, lnel;
+  tree ltype, lvectype;
+
+  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
+  /* Checked by get_load_store_type.  */
+  unsigned int const_nunits = nunits.to_constant ();
+
+  if (access_type == VMAT_STRIDED_SLP)
+    {
+      if ((unsigned int) group_size < const_nunits)
+	{
+	  /* First check if vec_init optab supports construction from vector
+	     elts directly.  Otherwise avoid emitting a constructor of
+	     vector elements by performing the loads using an integer type
+	     of the same size, constructing a vector of those and then
+	     re-interpreting it as the original vector type.  This avoids a
+	     huge runtime penalty due to the general inability to perform
+	     store forwarding from smaller stores to a larger load.  */
+	  tree ptype;
+	  unsigned int nelts = const_nunits / group_size;
+	  tree vtype = vector_vector_composition_type (vectype, nelts, &ptype);
+	  if (vtype != NULL_TREE)
+	    {
+	      nloads = nelts;
+	      lnel = group_size;
+	      ltype = ptype;
+	      lvectype = vtype;
+	    }
+	  else
+	    {
+	      nloads = const_nunits;
+	      lnel = 1;
+	      ltype = TREE_TYPE (vectype);
+	      lvectype = vectype;
+	    }
+	}
+      else
+	{
+	  nloads = 1;
+	  lnel = const_nunits;
+	  ltype = vectype;
+	  lvectype = vectype;
+	}
+      ltype = build_aligned_type (ltype, TYPE_ALIGN (TREE_TYPE (vectype)));
+    }
+  else
+    {
+      gcc_assert (access_type == VMAT_ELEMENTWISE);
+      nloads = const_nunits;
+      lnel = 1;
+      /* Load vector(1) scalar_type if it's 1 element-wise vectype.  */
+      if (nloads == 1)
+	ltype = vectype;
+      else
+	ltype = TREE_TYPE (vectype);
+      lvectype = vectype;
+    }
+
+  return {nloads, lnel, ltype, lvectype};
+}
+
 /* Function vect_model_store_cost
 
    Models cost for stores.  In the case of grouped accesses, one access
@@ -1157,8 +1297,22 @@ vect_model_load_cost (vec_info *vinfo,
 			cost_vec, cost_vec, true);
   if (memory_access_type == VMAT_ELEMENTWISE
       || memory_access_type == VMAT_STRIDED_SLP)
-    inside_cost += record_stmt_cost (cost_vec, ncopies, vec_construct,
-				     stmt_info, 0, vect_body);
+    {
+      tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+      int group_size = 1;
+      if (slp_node && grouped_access_p)
+	{
+	  first_stmt_info = DR_GROUP_FIRST_ELEMENT (stmt_info);
+	  group_size = DR_GROUP_SIZE (first_stmt_info);
+	}
+      strided_load_info
+	info = vect_get_strided_load_info (memory_access_type,
+					   vectype, group_size);
+      /* Only record vec_construct when the number of loads is > 1.  */
+      if (info.nloads > 1)
+	inside_cost += record_stmt_cost (cost_vec, ncopies, vec_construct,
+					 stmt_info, 0, vect_body);
+    }
 
   if (dump_enabled_p ())
     dump_printf_loc (MSG_NOTE, vect_location,
@@ -1996,62 +2150,6 @@ vect_get_store_rhs (stmt_vec_info stmt_info)
   gcc_unreachable ();
 }
 
-/* Function VECTOR_VECTOR_COMPOSITION_TYPE
-
-   This function returns a vector type which can be composed with NETLS pieces,
-   whose type is recorded in PTYPE.  VTYPE should be a vector type, and has the
-   same vector size as the return vector.  It checks target whether supports
-   pieces-size vector mode for construction firstly, if target fails to, check
-   pieces-size scalar mode for construction further.  It returns NULL_TREE if
-   fails to find the available composition.
-
-   For example, for (vtype=V16QI, nelts=4), we can probably get:
-     - V16QI with PTYPE V4QI.
-     - V4SI with PTYPE SI.
-     - NULL_TREE.  */
-
-static tree
-vector_vector_composition_type (tree vtype, poly_uint64 nelts, tree *ptype)
-{
-  gcc_assert (VECTOR_TYPE_P (vtype));
-  gcc_assert (known_gt (nelts, 0U));
-
-  machine_mode vmode = TYPE_MODE (vtype);
-  if (!VECTOR_MODE_P (vmode))
-    return NULL_TREE;
-
-  poly_uint64 vbsize = GET_MODE_BITSIZE (vmode);
-  unsigned int pbsize;
-  if (constant_multiple_p (vbsize, nelts, &pbsize))
-    {
-      /* First check if vec_init optab supports construction from
-	 vector pieces directly.  */
-      scalar_mode elmode = SCALAR_TYPE_MODE (TREE_TYPE (vtype));
-      poly_uint64 inelts = pbsize / GET_MODE_BITSIZE (elmode);
-      machine_mode rmode;
-      if (related_vector_mode (vmode, elmode, inelts).exists (&rmode)
-	  && (convert_optab_handler (vec_init_optab, vmode, rmode)
-	      != CODE_FOR_nothing))
-	{
-	  *ptype = build_vector_type (TREE_TYPE (vtype), inelts);
-	  return vtype;
-	}
-
-      /* Otherwise check if exists an integer type of the same piece size and
-	 if vec_init optab supports construction from it directly.  */
-      if (int_mode_for_size (pbsize, 0).exists (&elmode)
-	  && related_vector_mode (vmode, elmode, nelts).exists (&rmode)
-	  && (convert_optab_handler (vec_init_optab, rmode, elmode)
-	      != CODE_FOR_nothing))
-	{
-	  *ptype = build_nonstandard_integer_type (pbsize, 1);
-	  return build_vector_type (*ptype, nelts);
-	}
-    }
-
-  return NULL_TREE;
-}
-
 /* A subroutine of get_load_store_type, with a subset of the same
    arguments.  Handle the case where STMT_INFO is part of a grouped load
    or store.
@@ -8745,49 +8843,7 @@ vectorizable_load (vec_info *vinfo,
 	stride_step = cse_and_gimplify_to_preheader (loop_vinfo, stride_step);
 
-      running_off = offvar;
-      alias_off = build_int_cst (ref_type, 0);
-      int nloads = const_nunits;
-      int lnel = 1;
-      tree ltype = TREE_TYPE (vectype);
-      tree lvectype = vectype;
       auto_vec<tree> dr_chain;
-      if (memory_access_type == VMAT_STRIDED_SLP)
-	{
-	  if (group_size < const_nunits)
-	    {
-	      /* First check if vec_init optab supports construction from vector
-		 elts directly.  Otherwise avoid emitting a constructor of
-		 vector elements by performing the loads using an integer type
-		 of the same size, constructing a vector of those and then
-		 re-interpreting it as the original vector type.  This avoids a
-		 huge runtime penalty due to the general inability to perform
-		 store forwarding from smaller stores to a larger load.  */
-	      tree ptype;
-	      tree vtype
-		= vector_vector_composition_type (vectype,
-						  const_nunits / group_size,
-						  &ptype);
-	      if (vtype != NULL_TREE)
-		{
-		  nloads = const_nunits / group_size;
-		  lnel = group_size;
-		  lvectype = vtype;
-		  ltype = ptype;
-		}
-	    }
-	  else
-	    {
-	      nloads = 1;
-	      lnel = const_nunits;
-	      ltype = vectype;
-	    }
-	  ltype = build_aligned_type (ltype, TYPE_ALIGN (TREE_TYPE (vectype)));
-	}
-      /* Load vector(1) scalar_type if it's 1 element-wise vectype.  */
-      else if (nloads == 1)
-	ltype = vectype;
-
       if (slp)
 	{
 	  /* For SLP permutation support we need to load the whole group,
@@ -8804,6 +8860,13 @@ vectorizable_load (vec_info *vinfo,
 	  else
 	    ncopies = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
 	}
+
+      running_off = offvar;
+      alias_off = build_int_cst (ref_type, 0);
+      strided_load_info
+	sload_info = vect_get_strided_load_info (memory_access_type, vectype, group_size);
+      int nloads = sload_info.nloads;
+      tree ltype = sload_info.ltype;
       unsigned int group_el = 0;
       unsigned HOST_WIDE_INT
 	elsz = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (vectype)));
@@ -8824,7 +8887,7 @@ vectorizable_load (vec_info *vinfo,
 		CONSTRUCTOR_APPEND_ELT (v, NULL_TREE,
 					gimple_assign_lhs (new_stmt));
 
-	      group_el += lnel;
+	      group_el += sload_info.lnel;
 	      if (! slp
 		  || group_el == group_size)
 		{
@@ -8839,11 +8902,11 @@ vectorizable_load (vec_info *vinfo,
 	    }
 	  if (nloads > 1)
 	    {
-	      tree vec_inv = build_constructor (lvectype, v);
-	      new_temp = vect_init_vector (vinfo, stmt_info,
-					   vec_inv, lvectype, gsi);
+	      tree vec_inv = build_constructor (sload_info.lvectype, v);
+	      new_temp = vect_init_vector (vinfo, stmt_info, vec_inv,
+					   sload_info.lvectype, gsi);
 	      new_stmt = SSA_NAME_DEF_STMT (new_temp);
-	      if (lvectype != vectype)
+	      if (sload_info.lvectype != vectype)
 		{
 		  new_stmt = gimple_build_assign (make_ssa_name (vectype),
 						  VIEW_CONVERT_EXPR,