From patchwork Thu Sep 14 03:11:50 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Kewen.Lin" X-Patchwork-Id: 1833960 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=MFt9Llkt; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4RmMpp43k0z1yhd for ; Thu, 14 Sep 2023 13:13:38 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 7FFD43852763 for ; Thu, 14 Sep 2023 03:13:36 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 7FFD43852763 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1694661216; bh=pFu4B7Y/fwvGUPyoPOxg6dny9rtausyBqCWslV2S1+4=; h=To:Cc:Subject:Date:In-Reply-To:References:List-Id: List-Unsubscribe:List-Archive:List-Post:List-Help:List-Subscribe: From:Reply-To:From; b=MFt9LlktUdKy2CghomGbbxbsHUVzKuPqS6y3uCQKI9n5HZY9pO0ctfFPvS+pMXPOp qjY0xDJVCFtyvn/BgeCw07Ie9vunV+C7dKIzngJVO4Da121rRIfhuhX3KzJF5ygt7l PWSQtDZJJCDIN0z3wO14f41lZxI/X0MU+DE2+gPA= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id 8088D3858C5F for ; Thu, 14 Sep 2023 03:12:10 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 8088D3858C5F Received: from pps.filterd (m0353727.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 38E38tnq005534; Thu, 14 Sep 2023 03:12:07 GMT Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3t3sd5ru58-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 14 Sep 2023 03:12:06 +0000 Received: from m0353727.ppops.net (m0353727.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 38E3BvkM015388; Thu, 14 Sep 2023 03:12:06 GMT Received: from ppma22.wdc07v.mail.ibm.com (5c.69.3da9.ip4.static.sl-reverse.com [169.61.105.92]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3t3sd5ru4u-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 14 Sep 2023 03:12:06 +0000 Received: from pps.filterd (ppma22.wdc07v.mail.ibm.com [127.0.0.1]) by ppma22.wdc07v.mail.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 38E1WMBF012069; Thu, 14 Sep 2023 03:12:05 GMT Received: from smtprelay04.fra02v.mail.ibm.com ([9.218.2.228]) by ppma22.wdc07v.mail.ibm.com (PPS) with ESMTPS id 3t13e002hh-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 14 Sep 2023 03:12:04 +0000 Received: from smtpav06.fra02v.mail.ibm.com (smtpav06.fra02v.mail.ibm.com [10.20.54.105]) by smtprelay04.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 38E3C2qZ45220532 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 14 Sep 2023 03:12:03 GMT Received: from smtpav06.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 978F920049; Thu, 14 Sep 2023 03:12:02 +0000 (GMT) Received: from smtpav06.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id D9A9120040; Thu, 14 Sep 2023 03:12:01 +0000 (GMT) Received: from trout.aus.stglabs.ibm.com (unknown [9.40.194.100]) by smtpav06.fra02v.mail.ibm.com (Postfix) with ESMTP; Thu, 14 Sep 2023 03:12:01 +0000 (GMT) To: gcc-patches@gcc.gnu.org Cc: richard.guenther@gmail.com, richard.sandiford@arm.com Subject: [PATCH 01/10] vect: Ensure vect store is supported for some VMAT_ELEMENTWISE case Date: Wed, 13 Sep 2023 22:11:50 -0500 Message-Id: X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: rAt9XSGBWAHWWkYhun0-IP1eGt5n3NsJ X-Proofpoint-GUID: e98G5SqHweh76m49-6joZhdEmPJ81a6P X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.267,Aquarius:18.0.980,Hydra:6.0.601,FMLib:17.11.176.26 definitions=2023-09-13_19,2023-09-13_01,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 bulkscore=0 impostorscore=0 mlxscore=0 priorityscore=1501 clxscore=1015 malwarescore=0 mlxlogscore=999 spamscore=0 lowpriorityscore=0 suspectscore=0 adultscore=0 phishscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2308100000 definitions=main-2309140025 X-Spam-Status: No, score=-12.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Kewen Lin via Gcc-patches From: "Kewen.Lin" Reply-To: Kewen Lin Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" When making/testing patches to move costing next to the transform code for vectorizable_store, some ICEs got exposed when I further refined the costing handlings on VMAT_ELEMENTWISE. The apparent cause is triggering the assertion in rs6000 specific function for costing rs6000_builtin_vectorization_cost: if (TARGET_ALTIVEC) /* Misaligned stores are not supported. */ gcc_unreachable (); I used vect_get_store_cost instead of the original way by record_stmt_cost with scalar_store for costing, that is to use one unaligned_store instead, it matches what we use in transforming, it's a vector store as below: else if (group_size >= const_nunits && group_size % const_nunits == 0) { nstores = 1; lnel = const_nunits; ltype = vectype; lvectype = vectype; } So IMHO it's more consistent with vector store instead of scalar store, with the given compilation option -mno-allow-movmisalign, the misaligned vector store is unexpected to be used in vectorizer, but why it's still adopted? In the current implementation of function get_group_load_store_type, we always set alignment support scheme as dr_unaligned_supported for VMAT_ELEMENTWISE, it is true if we always adopt scalar stores, but as the above code shows, we could use vector stores for some cases, so we should use the correct alignment support scheme for it. This patch is to ensure the vector store is supported by further checking with vect_supportable_dr_alignment. The ICEs got exposed with patches moving costing next to the transform but they haven't been landed, the test coverage would be there once they get landed. The affected test cases are: - gcc.dg/vect/slp-45.c - gcc.dg/vect/vect-alias-check-{10,11,12}.c btw, I tried to make some correctness test case, but I realized that -mno-allow-movmisalign is mainly for noting movmisalign optab and it doesn't guard for the actual hw vector memory access insns, so I failed to make it unless I also altered some conditions for them as it. gcc/ChangeLog: * tree-vect-stmts.cc (vectorizable_store): Ensure the generated vector store for some case of VMAT_ELEMENTWISE is supported. --- gcc/tree-vect-stmts.cc | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index cd7c1090d88..a5caaf0bca2 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -8558,10 +8558,18 @@ vectorizable_store (vec_info *vinfo, else if (group_size >= const_nunits && group_size % const_nunits == 0) { - nstores = 1; - lnel = const_nunits; - ltype = vectype; - lvectype = vectype; + int mis_align = dr_misalignment (first_dr_info, vectype); + dr_alignment_support dr_align + = vect_supportable_dr_alignment (vinfo, dr_info, vectype, + mis_align); + if (dr_align == dr_aligned + || dr_align == dr_unaligned_supported) + { + nstores = 1; + lnel = const_nunits; + ltype = vectype; + lvectype = vectype; + } } ltype = build_aligned_type (ltype, TYPE_ALIGN (elem_type)); ncopies = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node); From patchwork Thu Sep 14 03:11:51 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Kewen.Lin" X-Patchwork-Id: 1833954 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=HhI5ARmV; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4RmMnc0cQhz1yhd for ; Thu, 14 Sep 2023 13:12:34 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 703803858032 for ; Thu, 14 Sep 2023 03:12:32 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 703803858032 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1694661152; bh=hBnjte7FFsy6RRoK9snmkszsO0pbNv4vGDg463ngIMw=; h=To:Cc:Subject:Date:In-Reply-To:References:List-Id: List-Unsubscribe:List-Archive:List-Post:List-Help:List-Subscribe: From:Reply-To:From; b=HhI5ARmVbmAp03l93Js8ryuBdNcYwV8YmCXFwnrJw/iGQ07Mv3YEoEtZFL8YsHzpY wtW5jYNnrVpyoUBrT4APOi5Yoeqv99Ivd+9WbL2bRFTu9Q42n9iaNxOB2xpXZsOJhJ rd9Sr5EBkR04j8JLfjj1apN3NL3XNzlEPjBJytF8= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id EC26C3858C2B for ; Thu, 14 Sep 2023 03:12:08 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org EC26C3858C2B Received: from pps.filterd (m0356516.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 38E38tU5028540; Thu, 14 Sep 2023 03:12:07 GMT Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3t3sq3rar4-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 14 Sep 2023 03:12:06 +0000 Received: from m0356516.ppops.net (m0356516.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 38E3ALtU002343; Thu, 14 Sep 2023 03:12:06 GMT Received: from ppma12.dal12v.mail.ibm.com (dc.9e.1632.ip4.static.sl-reverse.com [50.22.158.220]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3t3sq3raqy-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 14 Sep 2023 03:12:06 +0000 Received: from pps.filterd (ppma12.dal12v.mail.ibm.com [127.0.0.1]) by ppma12.dal12v.mail.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 38E18D4F024088; Thu, 14 Sep 2023 03:12:05 GMT Received: from smtprelay05.fra02v.mail.ibm.com ([9.218.2.225]) by ppma12.dal12v.mail.ibm.com (PPS) with ESMTPS id 3t131tg7j6-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 14 Sep 2023 03:12:05 +0000 Received: from smtpav06.fra02v.mail.ibm.com (smtpav06.fra02v.mail.ibm.com [10.20.54.105]) by smtprelay05.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 38E3C3hq16056960 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 14 Sep 2023 03:12:03 GMT Received: from smtpav06.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id B0D5E20040; Thu, 14 Sep 2023 03:12:03 +0000 (GMT) Received: from smtpav06.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id E89932004B; Thu, 14 Sep 2023 03:12:02 +0000 (GMT) Received: from trout.aus.stglabs.ibm.com (unknown [9.40.194.100]) by smtpav06.fra02v.mail.ibm.com (Postfix) with ESMTP; Thu, 14 Sep 2023 03:12:02 +0000 (GMT) To: gcc-patches@gcc.gnu.org Cc: richard.guenther@gmail.com, richard.sandiford@arm.com Subject: [PATCH 02/10] vect: Move vect_model_store_cost next to the transform in vectorizable_store Date: Wed, 13 Sep 2023 22:11:51 -0500 Message-Id: <1539ec7d34af4e38467420b3aed342d708a64a48.1694657494.git.linkw@linux.ibm.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-GUID: DF7KICfyTP2M0VtLkqkznXEtcQJoW0n4 X-Proofpoint-ORIG-GUID: cpsy1P73M4FF_amiWGjShHfXRIcpDQe7 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.267,Aquarius:18.0.980,Hydra:6.0.601,FMLib:17.11.176.26 definitions=2023-09-13_19,2023-09-13_01,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 mlxscore=0 suspectscore=0 adultscore=0 spamscore=0 clxscore=1015 malwarescore=0 phishscore=0 bulkscore=0 lowpriorityscore=0 priorityscore=1501 impostorscore=0 mlxlogscore=973 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2308100000 definitions=main-2309140025 X-Spam-Status: No, score=-12.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Kewen Lin via Gcc-patches From: "Kewen.Lin" Reply-To: Kewen Lin Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" This patch is an initial patch to move costing next to the transform, it still adopts vect_model_store_cost for costing but moves and duplicates it down according to the handlings of different vect_memory_access_types or some special handling need, hope it can make the subsequent patches easy to review. This patch should not have any functional changes. gcc/ChangeLog: * tree-vect-stmts.cc (vectorizable_store): Move and duplicate the call to vect_model_store_cost down to some different transform paths according to the handlings of different vect_memory_access_types or some special handling need. --- gcc/tree-vect-stmts.cc | 79 ++++++++++++++++++++++++++++++++---------- 1 file changed, 60 insertions(+), 19 deletions(-) diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index a5caaf0bca2..36f7c5b9f4b 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -8372,7 +8372,8 @@ vectorizable_store (vec_info *vinfo, return false; } - if (!vec_stmt) /* transformation not required. */ + bool costing_p = !vec_stmt; + if (costing_p) /* transformation not required. */ { STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) = memory_access_type; @@ -8401,11 +8402,6 @@ vectorizable_store (vec_info *vinfo, "Vectorizing an unaligned access.\n"); STMT_VINFO_TYPE (stmt_info) = store_vec_info_type; - vect_model_store_cost (vinfo, stmt_info, ncopies, - memory_access_type, &gs_info, - alignment_support_scheme, - misalignment, vls_type, slp_node, cost_vec); - return true; } gcc_assert (memory_access_type == STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info)); @@ -8415,12 +8411,27 @@ vectorizable_store (vec_info *vinfo, if (memory_access_type == VMAT_GATHER_SCATTER && gs_info.decl) { - vect_build_scatter_store_calls (vinfo, stmt_info, gsi, vec_stmt, - &gs_info, mask); + if (costing_p) + vect_model_store_cost (vinfo, stmt_info, ncopies, memory_access_type, + &gs_info, alignment_support_scheme, misalignment, + vls_type, slp_node, cost_vec); + else + vect_build_scatter_store_calls (vinfo, stmt_info, gsi, vec_stmt, + &gs_info, mask); return true; } else if (STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) >= 3) - return vectorizable_scan_store (vinfo, stmt_info, gsi, vec_stmt, ncopies); + { + gcc_assert (memory_access_type == VMAT_CONTIGUOUS); + if (costing_p) + { + vect_model_store_cost (vinfo, stmt_info, ncopies, memory_access_type, + &gs_info, alignment_support_scheme, + misalignment, vls_type, slp_node, cost_vec); + return true; + } + return vectorizable_scan_store (vinfo, stmt_info, gsi, vec_stmt, ncopies); + } if (grouped_store) { @@ -8449,13 +8460,21 @@ vectorizable_store (vec_info *vinfo, else ref_type = reference_alias_ptr_type (DR_REF (first_dr_info->dr)); - if (dump_enabled_p ()) - dump_printf_loc (MSG_NOTE, vect_location, - "transform store. ncopies = %d\n", ncopies); + if (!costing_p && dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, "transform store. ncopies = %d\n", + ncopies); if (memory_access_type == VMAT_ELEMENTWISE || memory_access_type == VMAT_STRIDED_SLP) { + if (costing_p) + { + vect_model_store_cost (vinfo, stmt_info, ncopies, memory_access_type, + &gs_info, alignment_support_scheme, + misalignment, vls_type, slp_node, cost_vec); + return true; + } + gimple_stmt_iterator incr_gsi; bool insert_after; gimple *incr; @@ -8718,8 +8737,9 @@ vectorizable_store (vec_info *vinfo, else if (memory_access_type == VMAT_GATHER_SCATTER) { aggr_type = elem_type; - vect_get_strided_load_store_ops (stmt_info, loop_vinfo, gsi, &gs_info, - &bump, &vec_offset, loop_lens); + if (!costing_p) + vect_get_strided_load_store_ops (stmt_info, loop_vinfo, gsi, &gs_info, + &bump, &vec_offset, loop_lens); } else { @@ -8731,7 +8751,7 @@ vectorizable_store (vec_info *vinfo, memory_access_type, loop_lens); } - if (mask) + if (mask && !costing_p) LOOP_VINFO_HAS_MASK_STORE (loop_vinfo) = true; /* In case the vectorization factor (VF) is bigger than the number @@ -8782,6 +8802,13 @@ vectorizable_store (vec_info *vinfo, if (memory_access_type == VMAT_LOAD_STORE_LANES) { gcc_assert (!slp && grouped_store); + if (costing_p) + { + vect_model_store_cost (vinfo, stmt_info, ncopies, memory_access_type, + &gs_info, alignment_support_scheme, + misalignment, vls_type, slp_node, cost_vec); + return true; + } for (j = 0; j < ncopies; j++) { gimple *new_stmt; @@ -8927,6 +8954,13 @@ vectorizable_store (vec_info *vinfo, if (memory_access_type == VMAT_GATHER_SCATTER) { gcc_assert (!slp && !grouped_store); + if (costing_p) + { + vect_model_store_cost (vinfo, stmt_info, ncopies, memory_access_type, + &gs_info, alignment_support_scheme, + misalignment, vls_type, slp_node, cost_vec); + return true; + } auto_vec vec_offsets; for (j = 0; j < ncopies; j++) { @@ -9091,7 +9125,7 @@ vectorizable_store (vec_info *vinfo, for (j = 0; j < ncopies; j++) { gimple *new_stmt; - if (j == 0) + if (j == 0 && !costing_p) { if (slp) { @@ -9158,7 +9192,7 @@ vectorizable_store (vec_info *vinfo, offset, &dummy, gsi, &ptr_incr, simd_lane_access_p, bump); } - else + else if (!costing_p) { gcc_assert (!LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo)); /* DR_CHAIN is then used as an input to vect_permute_store_chain(). @@ -9179,7 +9213,7 @@ vectorizable_store (vec_info *vinfo, } new_stmt = NULL; - if (grouped_store) + if (!costing_p && grouped_store) /* Permute. */ vect_permute_store_chain (vinfo, dr_chain, group_size, stmt_info, gsi, &result_chain); @@ -9187,6 +9221,8 @@ vectorizable_store (vec_info *vinfo, stmt_vec_info next_stmt_info = first_stmt_info; for (i = 0; i < vec_num; i++) { + if (costing_p) + continue; unsigned misalign; unsigned HOST_WIDE_INT align; @@ -9361,7 +9397,7 @@ vectorizable_store (vec_info *vinfo, if (!next_stmt_info) break; } - if (!slp) + if (!slp && !costing_p) { if (j == 0) *vec_stmt = new_stmt; @@ -9369,6 +9405,11 @@ vectorizable_store (vec_info *vinfo, } } + if (costing_p) + vect_model_store_cost (vinfo, stmt_info, ncopies, memory_access_type, + &gs_info, alignment_support_scheme, misalignment, + vls_type, slp_node, cost_vec); + return true; } From patchwork Thu Sep 14 03:11:52 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Kewen.Lin" X-Patchwork-Id: 1833964 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=sv9G2X1C; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4RmMqx70sxz1yhd for ; Thu, 14 Sep 2023 13:14:37 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id F291B382DC75 for ; Thu, 14 Sep 2023 03:14:35 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org F291B382DC75 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1694661276; bh=uL6bd/ub+xytwJBUOufIlU+oclxP38BlDoWzO935S6I=; h=To:Cc:Subject:Date:In-Reply-To:References:List-Id: List-Unsubscribe:List-Archive:List-Post:List-Help:List-Subscribe: From:Reply-To:From; b=sv9G2X1C/C9VR3y+3mRJLRMyaXoq7nu8MyKTB0Va9O936Y/B1AX2N148pV06DvVRJ VCxlfosiGLQVJ0beTcF5Q7aZVh89zJIEtsktFlsBVmS8Wp7mAzrTFestc+WIIgCFPY 5VbAS+NphHkhQpt7m2SVxyXS/AbSCkbKTNeiEwpY= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id 6BB023858C3A for ; Thu, 14 Sep 2023 03:12:11 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 6BB023858C3A Received: from pps.filterd (m0353726.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 38E37MKY022234; Thu, 14 Sep 2023 03:12:08 GMT Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3t3ssm0c0v-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 14 Sep 2023 03:12:07 +0000 Received: from m0353726.ppops.net (m0353726.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 38E38CMv027867; Thu, 14 Sep 2023 03:12:07 GMT Received: from ppma12.dal12v.mail.ibm.com (dc.9e.1632.ip4.static.sl-reverse.com [50.22.158.220]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3t3ssm0c0m-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 14 Sep 2023 03:12:07 +0000 Received: from pps.filterd (ppma12.dal12v.mail.ibm.com [127.0.0.1]) by ppma12.dal12v.mail.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 38E198SP024021; Thu, 14 Sep 2023 03:12:06 GMT Received: from smtprelay06.fra02v.mail.ibm.com ([9.218.2.230]) by ppma12.dal12v.mail.ibm.com (PPS) with ESMTPS id 3t131tg7j8-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 14 Sep 2023 03:12:06 +0000 Received: from smtpav06.fra02v.mail.ibm.com (smtpav06.fra02v.mail.ibm.com [10.20.54.105]) by smtprelay06.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 38E3C4G344368432 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 14 Sep 2023 03:12:04 GMT Received: from smtpav06.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id C038220049; Thu, 14 Sep 2023 03:12:04 +0000 (GMT) Received: from smtpav06.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 0546720040; Thu, 14 Sep 2023 03:12:04 +0000 (GMT) Received: from trout.aus.stglabs.ibm.com (unknown [9.40.194.100]) by smtpav06.fra02v.mail.ibm.com (Postfix) with ESMTP; Thu, 14 Sep 2023 03:12:03 +0000 (GMT) To: gcc-patches@gcc.gnu.org Cc: richard.guenther@gmail.com, richard.sandiford@arm.com Subject: [PATCH 03/10] vect: Adjust vectorizable_store costing on VMAT_GATHER_SCATTER Date: Wed, 13 Sep 2023 22:11:52 -0500 Message-Id: <8abc6ddb4683d9058ffb48eb54f3a717e655efb4.1694657494.git.linkw@linux.ibm.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: Sxgm0zRklnKuXp0jRp_qzNmrYjYLBWx2 X-Proofpoint-GUID: LCQS5XHNY2je7-0FR9EsxVbOMutU0xQs X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.267,Aquarius:18.0.980,Hydra:6.0.601,FMLib:17.11.176.26 definitions=2023-09-13_19,2023-09-13_01,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 malwarescore=0 priorityscore=1501 mlxscore=0 bulkscore=0 spamscore=0 impostorscore=0 lowpriorityscore=0 suspectscore=0 clxscore=1015 mlxlogscore=999 phishscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2308100000 definitions=main-2309140025 X-Spam-Status: No, score=-12.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Kewen Lin via Gcc-patches From: "Kewen.Lin" Reply-To: Kewen Lin Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" This patch adjusts the cost handling on VMAT_GATHER_SCATTER in function vectorizable_store (all three cases), then we won't depend on vect_model_load_store for its costing any more. This patch shouldn't have any functional changes. gcc/ChangeLog: * tree-vect-stmts.cc (vect_model_store_cost): Assert it won't get VMAT_GATHER_SCATTER any more, remove VMAT_GATHER_SCATTER related handlings and the related parameter gs_info. (vect_build_scatter_store_calls): Add the handlings on costing with one more argument cost_vec. (vectorizable_store): Adjust the cost handling on VMAT_GATHER_SCATTER without calling vect_model_store_cost any more. --- gcc/tree-vect-stmts.cc | 188 ++++++++++++++++++++++++++--------------- 1 file changed, 118 insertions(+), 70 deletions(-) diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index 36f7c5b9f4b..3f908242fee 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -959,12 +959,12 @@ cfun_returns (tree decl) static void vect_model_store_cost (vec_info *vinfo, stmt_vec_info stmt_info, int ncopies, vect_memory_access_type memory_access_type, - gather_scatter_info *gs_info, dr_alignment_support alignment_support_scheme, int misalignment, vec_load_store_type vls_type, slp_tree slp_node, stmt_vector_for_cost *cost_vec) { + gcc_assert (memory_access_type != VMAT_GATHER_SCATTER); unsigned int inside_cost = 0, prologue_cost = 0; stmt_vec_info first_stmt_info = stmt_info; bool grouped_access_p = STMT_VINFO_GROUPED_ACCESS (stmt_info); @@ -1012,18 +1012,9 @@ vect_model_store_cost (vec_info *vinfo, stmt_vec_info stmt_info, int ncopies, tree vectype = STMT_VINFO_VECTYPE (stmt_info); /* Costs of the stores. */ - if (memory_access_type == VMAT_ELEMENTWISE - || memory_access_type == VMAT_GATHER_SCATTER) + if (memory_access_type == VMAT_ELEMENTWISE) { unsigned int assumed_nunits = vect_nunits_for_cost (vectype); - if (memory_access_type == VMAT_GATHER_SCATTER - && gs_info->ifn == IFN_LAST && !gs_info->decl) - /* For emulated scatter N offset vector element extracts - (we assume the scalar scaling and ptr + offset add is consumed by - the load). */ - inside_cost += record_stmt_cost (cost_vec, ncopies * assumed_nunits, - vec_to_scalar, stmt_info, 0, - vect_body); /* N scalar stores plus extracting the elements. */ inside_cost += record_stmt_cost (cost_vec, ncopies * assumed_nunits, @@ -1034,9 +1025,7 @@ vect_model_store_cost (vec_info *vinfo, stmt_vec_info stmt_info, int ncopies, misalignment, &inside_cost, cost_vec); if (memory_access_type == VMAT_ELEMENTWISE - || memory_access_type == VMAT_STRIDED_SLP - || (memory_access_type == VMAT_GATHER_SCATTER - && gs_info->ifn == IFN_LAST && !gs_info->decl)) + || memory_access_type == VMAT_STRIDED_SLP) { /* N scalar stores plus extracting the elements. */ unsigned int assumed_nunits = vect_nunits_for_cost (vectype); @@ -2999,7 +2988,8 @@ vect_build_gather_load_calls (vec_info *vinfo, stmt_vec_info stmt_info, static void vect_build_scatter_store_calls (vec_info *vinfo, stmt_vec_info stmt_info, gimple_stmt_iterator *gsi, gimple **vec_stmt, - gather_scatter_info *gs_info, tree mask) + gather_scatter_info *gs_info, tree mask, + stmt_vector_for_cost *cost_vec) { loop_vec_info loop_vinfo = dyn_cast (vinfo); tree vectype = STMT_VINFO_VECTYPE (stmt_info); @@ -3009,6 +2999,30 @@ vect_build_scatter_store_calls (vec_info *vinfo, stmt_vec_info stmt_info, poly_uint64 scatter_off_nunits = TYPE_VECTOR_SUBPARTS (gs_info->offset_vectype); + /* FIXME: Keep the previous costing way in vect_model_store_cost by + costing N scalar stores, but it should be tweaked to use target + specific costs on related scatter store calls. */ + if (cost_vec) + { + tree op = vect_get_store_rhs (stmt_info); + enum vect_def_type dt; + gcc_assert (vect_is_simple_use (op, vinfo, &dt)); + unsigned int inside_cost, prologue_cost = 0; + if (dt == vect_constant_def || dt == vect_external_def) + prologue_cost += record_stmt_cost (cost_vec, 1, scalar_to_vec, + stmt_info, 0, vect_prologue); + unsigned int assumed_nunits = vect_nunits_for_cost (vectype); + inside_cost = record_stmt_cost (cost_vec, ncopies * assumed_nunits, + scalar_store, stmt_info, 0, vect_body); + + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "vect_model_store_cost: inside_cost = %d, " + "prologue_cost = %d .\n", + inside_cost, prologue_cost); + return; + } + tree perm_mask = NULL_TREE, mask_halfvectype = NULL_TREE; if (known_eq (nunits, scatter_off_nunits)) modifier = NONE; @@ -8411,13 +8425,8 @@ vectorizable_store (vec_info *vinfo, if (memory_access_type == VMAT_GATHER_SCATTER && gs_info.decl) { - if (costing_p) - vect_model_store_cost (vinfo, stmt_info, ncopies, memory_access_type, - &gs_info, alignment_support_scheme, misalignment, - vls_type, slp_node, cost_vec); - else - vect_build_scatter_store_calls (vinfo, stmt_info, gsi, vec_stmt, - &gs_info, mask); + vect_build_scatter_store_calls (vinfo, stmt_info, gsi, vec_stmt, &gs_info, + mask, cost_vec); return true; } else if (STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) >= 3) @@ -8426,8 +8435,8 @@ vectorizable_store (vec_info *vinfo, if (costing_p) { vect_model_store_cost (vinfo, stmt_info, ncopies, memory_access_type, - &gs_info, alignment_support_scheme, - misalignment, vls_type, slp_node, cost_vec); + alignment_support_scheme, misalignment, + vls_type, slp_node, cost_vec); return true; } return vectorizable_scan_store (vinfo, stmt_info, gsi, vec_stmt, ncopies); @@ -8470,8 +8479,8 @@ vectorizable_store (vec_info *vinfo, if (costing_p) { vect_model_store_cost (vinfo, stmt_info, ncopies, memory_access_type, - &gs_info, alignment_support_scheme, - misalignment, vls_type, slp_node, cost_vec); + alignment_support_scheme, misalignment, + vls_type, slp_node, cost_vec); return true; } @@ -8805,8 +8814,8 @@ vectorizable_store (vec_info *vinfo, if (costing_p) { vect_model_store_cost (vinfo, stmt_info, ncopies, memory_access_type, - &gs_info, alignment_support_scheme, - misalignment, vls_type, slp_node, cost_vec); + alignment_support_scheme, misalignment, + vls_type, slp_node, cost_vec); return true; } for (j = 0; j < ncopies; j++) @@ -8954,49 +8963,50 @@ vectorizable_store (vec_info *vinfo, if (memory_access_type == VMAT_GATHER_SCATTER) { gcc_assert (!slp && !grouped_store); - if (costing_p) - { - vect_model_store_cost (vinfo, stmt_info, ncopies, memory_access_type, - &gs_info, alignment_support_scheme, - misalignment, vls_type, slp_node, cost_vec); - return true; - } auto_vec vec_offsets; + unsigned int inside_cost = 0, prologue_cost = 0; for (j = 0; j < ncopies; j++) { gimple *new_stmt; if (j == 0) { - /* Since the store is not grouped, DR_GROUP_SIZE is 1, and - DR_CHAIN is of size 1. */ - gcc_assert (group_size == 1); - op = vect_get_store_rhs (first_stmt_info); - vect_get_vec_defs_for_operand (vinfo, first_stmt_info, ncopies, - op, gvec_oprnds[0]); - vec_oprnd = (*gvec_oprnds[0])[0]; - dr_chain.quick_push (vec_oprnd); - if (mask) + if (costing_p && vls_type == VLS_STORE_INVARIANT) + prologue_cost += record_stmt_cost (cost_vec, 1, scalar_to_vec, + stmt_info, 0, vect_prologue); + else if (!costing_p) { - vect_get_vec_defs_for_operand (vinfo, stmt_info, ncopies, - mask, &vec_masks, - mask_vectype); - vec_mask = vec_masks[0]; - } + /* Since the store is not grouped, DR_GROUP_SIZE is 1, and + DR_CHAIN is of size 1. */ + gcc_assert (group_size == 1); + op = vect_get_store_rhs (first_stmt_info); + vect_get_vec_defs_for_operand (vinfo, first_stmt_info, + ncopies, op, gvec_oprnds[0]); + vec_oprnd = (*gvec_oprnds[0])[0]; + dr_chain.quick_push (vec_oprnd); + if (mask) + { + vect_get_vec_defs_for_operand (vinfo, stmt_info, ncopies, + mask, &vec_masks, + mask_vectype); + vec_mask = vec_masks[0]; + } - /* We should have catched mismatched types earlier. */ - gcc_assert (useless_type_conversion_p (vectype, - TREE_TYPE (vec_oprnd))); - if (STMT_VINFO_GATHER_SCATTER_P (stmt_info)) - vect_get_gather_scatter_ops (loop_vinfo, loop, stmt_info, - slp_node, &gs_info, &dataref_ptr, - &vec_offsets); - else - dataref_ptr - = vect_create_data_ref_ptr (vinfo, first_stmt_info, aggr_type, - NULL, offset, &dummy, gsi, - &ptr_incr, false, bump); + /* We should have catched mismatched types earlier. */ + gcc_assert ( + useless_type_conversion_p (vectype, TREE_TYPE (vec_oprnd))); + if (STMT_VINFO_GATHER_SCATTER_P (stmt_info)) + vect_get_gather_scatter_ops (loop_vinfo, loop, stmt_info, + slp_node, &gs_info, + &dataref_ptr, &vec_offsets); + else + dataref_ptr + = vect_create_data_ref_ptr (vinfo, first_stmt_info, + aggr_type, NULL, offset, + &dummy, gsi, &ptr_incr, false, + bump); + } } - else + else if (!costing_p) { gcc_assert (!LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo)); vec_oprnd = (*gvec_oprnds[0])[j]; @@ -9013,15 +9023,27 @@ vectorizable_store (vec_info *vinfo, tree final_mask = NULL_TREE; tree final_len = NULL_TREE; tree bias = NULL_TREE; - if (loop_masks) - final_mask = vect_get_loop_mask (loop_vinfo, gsi, loop_masks, - ncopies, vectype, j); - if (vec_mask) - final_mask = prepare_vec_mask (loop_vinfo, mask_vectype, final_mask, - vec_mask, gsi); + if (!costing_p) + { + if (loop_masks) + final_mask = vect_get_loop_mask (loop_vinfo, gsi, loop_masks, + ncopies, vectype, j); + if (vec_mask) + final_mask = prepare_vec_mask (loop_vinfo, mask_vectype, + final_mask, vec_mask, gsi); + } if (gs_info.ifn != IFN_LAST) { + if (costing_p) + { + unsigned int cnunits = vect_nunits_for_cost (vectype); + inside_cost + += record_stmt_cost (cost_vec, cnunits, scalar_store, + stmt_info, 0, vect_body); + continue; + } + if (STMT_VINFO_GATHER_SCATTER_P (stmt_info)) vec_offset = vec_offsets[j]; tree scale = size_int (gs_info.scale); @@ -9067,6 +9089,25 @@ vectorizable_store (vec_info *vinfo, { /* Emulated scatter. */ gcc_assert (!final_mask); + if (costing_p) + { + unsigned int cnunits = vect_nunits_for_cost (vectype); + /* For emulated scatter N offset vector element extracts + (we assume the scalar scaling and ptr + offset add is + consumed by the load). */ + inside_cost + += record_stmt_cost (cost_vec, cnunits, vec_to_scalar, + stmt_info, 0, vect_body); + /* N scalar stores plus extracting the elements. */ + inside_cost + += record_stmt_cost (cost_vec, cnunits, vec_to_scalar, + stmt_info, 0, vect_body); + inside_cost + += record_stmt_cost (cost_vec, cnunits, scalar_store, + stmt_info, 0, vect_body); + continue; + } + unsigned HOST_WIDE_INT const_nunits = nunits.to_constant (); unsigned HOST_WIDE_INT const_offset_nunits = TYPE_VECTOR_SUBPARTS (gs_info.offset_vectype).to_constant (); @@ -9117,6 +9158,13 @@ vectorizable_store (vec_info *vinfo, *vec_stmt = new_stmt; STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt); } + + if (costing_p && dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "vect_model_store_cost: inside_cost = %d, " + "prologue_cost = %d .\n", + inside_cost, prologue_cost); + return true; } @@ -9407,8 +9455,8 @@ vectorizable_store (vec_info *vinfo, if (costing_p) vect_model_store_cost (vinfo, stmt_info, ncopies, memory_access_type, - &gs_info, alignment_support_scheme, misalignment, - vls_type, slp_node, cost_vec); + alignment_support_scheme, misalignment, vls_type, + slp_node, cost_vec); return true; } From patchwork Thu Sep 14 03:11:53 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Kewen.Lin" X-Patchwork-Id: 1833956 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=ul+x9qYy; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (ip-8-43-85-97.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4RmMnk1qypz1yhd for ; Thu, 14 Sep 2023 13:12:42 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 021163853D00 for ; Thu, 14 Sep 2023 03:12:40 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 021163853D00 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1694661160; bh=mTMQbMfxroKSjB8WIwwg2xt6fQit19HwdedlWzdn2uo=; h=To:Cc:Subject:Date:In-Reply-To:References:List-Id: List-Unsubscribe:List-Archive:List-Post:List-Help:List-Subscribe: From:Reply-To:From; b=ul+x9qYyI4RjwhMlarWq8HpttkqfyqoCcppYvqn711sTELszyn79CHDE9eGlvdG6+ WgDTN4rFOWNNK8KYPl61bYbhiTLfiRtdHPMzmAU7xDWYE16TnxluA7RGJvSM2xKtxK y1dcC0AplR7R0NK4za6D8rk29cBQS6X0ab83fEwU= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id 09A0D3857C45 for ; Thu, 14 Sep 2023 03:12:13 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 09A0D3857C45 Received: from pps.filterd (m0353728.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 38E36bLC031967; Thu, 14 Sep 2023 03:12:10 GMT Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3t3s7f11dc-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 14 Sep 2023 03:12:10 +0000 Received: from m0353728.ppops.net (m0353728.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 38E36qIb001496; Thu, 14 Sep 2023 03:12:09 GMT Received: from ppma21.wdc07v.mail.ibm.com (5b.69.3da9.ip4.static.sl-reverse.com [169.61.105.91]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3t3s7f11cw-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 14 Sep 2023 03:12:09 +0000 Received: from pps.filterd (ppma21.wdc07v.mail.ibm.com [127.0.0.1]) by ppma21.wdc07v.mail.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 38E2RMS3023152; Thu, 14 Sep 2023 03:12:08 GMT Received: from smtprelay07.fra02v.mail.ibm.com ([9.218.2.229]) by ppma21.wdc07v.mail.ibm.com (PPS) with ESMTPS id 3t141nyum1-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 14 Sep 2023 03:12:07 +0000 Received: from smtpav06.fra02v.mail.ibm.com (smtpav06.fra02v.mail.ibm.com [10.20.54.105]) by smtprelay07.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 38E3C5XC60817728 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 14 Sep 2023 03:12:06 GMT Received: from smtpav06.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id CF18220049; Thu, 14 Sep 2023 03:12:05 +0000 (GMT) Received: from smtpav06.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 11B6120040; Thu, 14 Sep 2023 03:12:05 +0000 (GMT) Received: from trout.aus.stglabs.ibm.com (unknown [9.40.194.100]) by smtpav06.fra02v.mail.ibm.com (Postfix) with ESMTP; Thu, 14 Sep 2023 03:12:04 +0000 (GMT) To: gcc-patches@gcc.gnu.org Cc: richard.guenther@gmail.com, richard.sandiford@arm.com Subject: [PATCH 04/10] vect: Simplify costing on vectorizable_scan_store Date: Wed, 13 Sep 2023 22:11:53 -0500 Message-Id: <308240b9aff98d1edc15bcba7a2f015e42cdc371.1694657494.git.linkw@linux.ibm.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: lY6jVRNfy0m73rHMalfNnBD31Cy1ETJc X-Proofpoint-GUID: fly-IviufND_3jxUv2hNdlsiOlEbObef X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.267,Aquarius:18.0.980,Hydra:6.0.601,FMLib:17.11.176.26 definitions=2023-09-13_19,2023-09-13_01,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 suspectscore=0 adultscore=0 clxscore=1015 lowpriorityscore=0 mlxscore=0 priorityscore=1501 impostorscore=0 spamscore=0 mlxlogscore=999 phishscore=0 bulkscore=0 malwarescore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2308100000 definitions=main-2309140025 X-Spam-Status: No, score=-12.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Kewen Lin via Gcc-patches From: "Kewen.Lin" Reply-To: Kewen Lin Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" This patch is to simplify the costing on the case vectorizable_scan_store without calling function vect_model_store_cost any more. I considered if moving the costing into function vectorizable_scan_store is a good idea, for doing that, we have to pass several variables down which are only used for costing, and for now we just want to keep the costing as the previous, haven't tried to make this costing consistent with what the transforming does, so I think we can leave it for now. gcc/ChangeLog: * tree-vect-stmts.cc (vectorizable_store): Adjust costing on vectorizable_scan_store without calling vect_model_store_cost any more. --- gcc/tree-vect-stmts.cc | 18 +++++++++++++++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index 3f908242fee..048c14d291c 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -8432,11 +8432,23 @@ vectorizable_store (vec_info *vinfo, else if (STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) >= 3) { gcc_assert (memory_access_type == VMAT_CONTIGUOUS); + gcc_assert (!slp); if (costing_p) { - vect_model_store_cost (vinfo, stmt_info, ncopies, memory_access_type, - alignment_support_scheme, misalignment, - vls_type, slp_node, cost_vec); + unsigned int inside_cost = 0, prologue_cost = 0; + if (vls_type == VLS_STORE_INVARIANT) + prologue_cost += record_stmt_cost (cost_vec, 1, scalar_to_vec, + stmt_info, 0, vect_prologue); + vect_get_store_cost (vinfo, stmt_info, ncopies, + alignment_support_scheme, misalignment, + &inside_cost, cost_vec); + + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "vect_model_store_cost: inside_cost = %d, " + "prologue_cost = %d .\n", + inside_cost, prologue_cost); + return true; } return vectorizable_scan_store (vinfo, stmt_info, gsi, vec_stmt, ncopies); From patchwork Thu Sep 14 03:11:54 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Kewen.Lin" X-Patchwork-Id: 1833958 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=DmZSwdJi; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (ip-8-43-85-97.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4RmMp23S3Hz1ypL for ; Thu, 14 Sep 2023 13:12:58 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 836C6385773F for ; Thu, 14 Sep 2023 03:12:56 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 836C6385773F DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1694661176; bh=2gtAuGwjK5byV+aXTzqLrKTXQsQlUooJHhtV5KqlxNU=; h=To:Cc:Subject:Date:In-Reply-To:References:List-Id: List-Unsubscribe:List-Archive:List-Post:List-Help:List-Subscribe: From:Reply-To:From; b=DmZSwdJiHWdq1y0IX+PzVVldWJae4LAKxWJucB5staz7hGnx98rfoQYWB/r1g5mNj OcbdTWYfDz+JwDcJHYF7ZbVC22oj36Eb4AQfOPhvaJrc5YnEMfsjAaI6vu1bktYzxh qIzNh/zEaktrR9fqF/hp3rRWVsbEHLcek0hYV4TE= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id CE8FA385842C for ; Thu, 14 Sep 2023 03:12:13 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org CE8FA385842C Received: from pps.filterd (m0353729.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 38E36h4Y011974; Thu, 14 Sep 2023 03:12:11 GMT Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3t3st2rdc9-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 14 Sep 2023 03:12:10 +0000 Received: from m0353729.ppops.net (m0353729.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 38E3779o015668; Thu, 14 Sep 2023 03:12:09 GMT Received: from ppma11.dal12v.mail.ibm.com (db.9e.1632.ip4.static.sl-reverse.com [50.22.158.219]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3t3st2rdby-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 14 Sep 2023 03:12:09 +0000 Received: from pps.filterd (ppma11.dal12v.mail.ibm.com [127.0.0.1]) by ppma11.dal12v.mail.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 38E16WU8011980; Thu, 14 Sep 2023 03:12:08 GMT Received: from smtprelay03.fra02v.mail.ibm.com ([9.218.2.224]) by ppma11.dal12v.mail.ibm.com (PPS) with ESMTPS id 3t15r27b1v-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 14 Sep 2023 03:12:08 +0000 Received: from smtpav06.fra02v.mail.ibm.com (smtpav06.fra02v.mail.ibm.com [10.20.54.105]) by smtprelay03.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 38E3C78l22610664 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 14 Sep 2023 03:12:07 GMT Received: from smtpav06.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id EC5F620049; Thu, 14 Sep 2023 03:12:06 +0000 (GMT) Received: from smtpav06.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 2E48320040; Thu, 14 Sep 2023 03:12:06 +0000 (GMT) Received: from trout.aus.stglabs.ibm.com (unknown [9.40.194.100]) by smtpav06.fra02v.mail.ibm.com (Postfix) with ESMTP; Thu, 14 Sep 2023 03:12:05 +0000 (GMT) To: gcc-patches@gcc.gnu.org Cc: richard.guenther@gmail.com, richard.sandiford@arm.com Subject: [PATCH 05/10] vect: Adjust vectorizable_store costing on VMAT_ELEMENTWISE and VMAT_STRIDED_SLP Date: Wed, 13 Sep 2023 22:11:54 -0500 Message-Id: <2adef8b10433859b6642282b03a11df33c732d11.1694657494.git.linkw@linux.ibm.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-GUID: yeA2gKqtTBGEbibTmtBvE8-nb3XC44uw X-Proofpoint-ORIG-GUID: XYGBd0vX17AYdtgKu1StvDPTSNgs5i43 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.267,Aquarius:18.0.980,Hydra:6.0.601,FMLib:17.11.176.26 definitions=2023-09-13_19,2023-09-13_01,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 clxscore=1015 malwarescore=0 bulkscore=0 mlxscore=0 lowpriorityscore=0 spamscore=0 impostorscore=0 phishscore=0 priorityscore=1501 adultscore=0 suspectscore=0 mlxlogscore=999 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2308100000 definitions=main-2309140025 X-Spam-Status: No, score=-12.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Kewen Lin via Gcc-patches From: "Kewen.Lin" Reply-To: Kewen Lin Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" This patch adjusts the cost handling on VMAT_ELEMENTWISE and VMAT_STRIDED_SLP in function vectorizable_store. We don't call function vect_model_store_cost for them any more. Like what we improved for PR82255 on load side, this change helps us to get rid of unnecessary vec_to_scalar costing for some case with VMAT_STRIDED_SLP. One typical test case gcc.dg/vect/costmodel/ppc/costmodel-vect-store-1.c has been associated. And it helps some cases with some inconsistent costing too. Besides, this also special-cases the interleaving stores for these two affected memory access types, since for the interleaving stores the whole chain is vectorized when the last store in the chain is reached, the other stores in the group would be skipped. To keep consistent with this and follows the transforming handlings like iterating the whole group, it only costs for the first store in the group. Ideally we can only cost for the last one but it's not trivial and using the first one is actually equivalent. gcc/ChangeLog: * tree-vect-stmts.cc (vect_model_store_cost): Assert it won't get VMAT_ELEMENTWISE and VMAT_STRIDED_SLP any more, and remove their related handlings. (vectorizable_store): Adjust the cost handling on VMAT_ELEMENTWISE and VMAT_STRIDED_SLP without calling vect_model_store_cost. gcc/testsuite/ChangeLog: * gcc.dg/vect/costmodel/ppc/costmodel-vect-store-1.c: New test. --- .../costmodel/ppc/costmodel-vect-store-1.c | 23 +++ gcc/tree-vect-stmts.cc | 160 +++++++++++------- 2 files changed, 120 insertions(+), 63 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-store-1.c diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-store-1.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-store-1.c new file mode 100644 index 00000000000..ab5f3301492 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-store-1.c @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_int } */ +/* { dg-additional-options "-O3" } + +/* This test case is partially extracted from case + gcc.dg/vect/vect-avg-16.c, it's to verify we don't + cost a store with vec_to_scalar when we shouldn't. */ + +void +test (signed char *restrict a, signed char *restrict b, signed char *restrict c, + int n) +{ + for (int j = 0; j < n; ++j) + { + for (int i = 0; i < 16; ++i) + a[i] = (b[i] + c[i]) >> 1; + a += 20; + b += 20; + c += 20; + } +} + +/* { dg-final { scan-tree-dump-times "vec_to_scalar" 0 "vect" } } */ diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index 048c14d291c..3d01168080a 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -964,7 +964,9 @@ vect_model_store_cost (vec_info *vinfo, stmt_vec_info stmt_info, int ncopies, vec_load_store_type vls_type, slp_tree slp_node, stmt_vector_for_cost *cost_vec) { - gcc_assert (memory_access_type != VMAT_GATHER_SCATTER); + gcc_assert (memory_access_type != VMAT_GATHER_SCATTER + && memory_access_type != VMAT_ELEMENTWISE + && memory_access_type != VMAT_STRIDED_SLP); unsigned int inside_cost = 0, prologue_cost = 0; stmt_vec_info first_stmt_info = stmt_info; bool grouped_access_p = STMT_VINFO_GROUPED_ACCESS (stmt_info); @@ -1010,29 +1012,9 @@ vect_model_store_cost (vec_info *vinfo, stmt_vec_info stmt_info, int ncopies, group_size); } - tree vectype = STMT_VINFO_VECTYPE (stmt_info); /* Costs of the stores. */ - if (memory_access_type == VMAT_ELEMENTWISE) - { - unsigned int assumed_nunits = vect_nunits_for_cost (vectype); - /* N scalar stores plus extracting the elements. */ - inside_cost += record_stmt_cost (cost_vec, - ncopies * assumed_nunits, - scalar_store, stmt_info, 0, vect_body); - } - else - vect_get_store_cost (vinfo, stmt_info, ncopies, alignment_support_scheme, - misalignment, &inside_cost, cost_vec); - - if (memory_access_type == VMAT_ELEMENTWISE - || memory_access_type == VMAT_STRIDED_SLP) - { - /* N scalar stores plus extracting the elements. */ - unsigned int assumed_nunits = vect_nunits_for_cost (vectype); - inside_cost += record_stmt_cost (cost_vec, - ncopies * assumed_nunits, - vec_to_scalar, stmt_info, 0, vect_body); - } + vect_get_store_cost (vinfo, stmt_info, ncopies, alignment_support_scheme, + misalignment, &inside_cost, cost_vec); /* When vectorizing a store into the function result assign a penalty if the function returns in a multi-register location. @@ -8416,6 +8398,18 @@ vectorizable_store (vec_info *vinfo, "Vectorizing an unaligned access.\n"); STMT_VINFO_TYPE (stmt_info) = store_vec_info_type; + + /* As function vect_transform_stmt shows, for interleaving stores + the whole chain is vectorized when the last store in the chain + is reached, the other stores in the group are skipped. So we + want to only cost the last one here, but it's not trivial to + get the last, as it's equivalent to use the first one for + costing, use the first one instead. */ + if (grouped_store + && !slp + && first_stmt_info != stmt_info + && memory_access_type == VMAT_ELEMENTWISE) + return true; } gcc_assert (memory_access_type == STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info)); @@ -8488,14 +8482,7 @@ vectorizable_store (vec_info *vinfo, if (memory_access_type == VMAT_ELEMENTWISE || memory_access_type == VMAT_STRIDED_SLP) { - if (costing_p) - { - vect_model_store_cost (vinfo, stmt_info, ncopies, memory_access_type, - alignment_support_scheme, misalignment, - vls_type, slp_node, cost_vec); - return true; - } - + unsigned inside_cost = 0, prologue_cost = 0; gimple_stmt_iterator incr_gsi; bool insert_after; gimple *incr; @@ -8503,7 +8490,7 @@ vectorizable_store (vec_info *vinfo, tree ivstep; tree running_off; tree stride_base, stride_step, alias_off; - tree vec_oprnd; + tree vec_oprnd = NULL_TREE; tree dr_offset; unsigned int g; /* Checked by get_load_store_type. */ @@ -8609,26 +8596,30 @@ vectorizable_store (vec_info *vinfo, lnel = const_nunits; ltype = vectype; lvectype = vectype; + alignment_support_scheme = dr_align; + misalignment = mis_align; } } ltype = build_aligned_type (ltype, TYPE_ALIGN (elem_type)); ncopies = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node); } - ivstep = stride_step; - ivstep = fold_build2 (MULT_EXPR, TREE_TYPE (ivstep), ivstep, - build_int_cst (TREE_TYPE (ivstep), vf)); + if (!costing_p) + { + ivstep = stride_step; + ivstep = fold_build2 (MULT_EXPR, TREE_TYPE (ivstep), ivstep, + build_int_cst (TREE_TYPE (ivstep), vf)); - standard_iv_increment_position (loop, &incr_gsi, &insert_after); + standard_iv_increment_position (loop, &incr_gsi, &insert_after); - stride_base = cse_and_gimplify_to_preheader (loop_vinfo, stride_base); - ivstep = cse_and_gimplify_to_preheader (loop_vinfo, ivstep); - create_iv (stride_base, PLUS_EXPR, ivstep, NULL, - loop, &incr_gsi, insert_after, - &offvar, NULL); - incr = gsi_stmt (incr_gsi); + stride_base = cse_and_gimplify_to_preheader (loop_vinfo, stride_base); + ivstep = cse_and_gimplify_to_preheader (loop_vinfo, ivstep); + create_iv (stride_base, PLUS_EXPR, ivstep, NULL, loop, &incr_gsi, + insert_after, &offvar, NULL); + incr = gsi_stmt (incr_gsi); - stride_step = cse_and_gimplify_to_preheader (loop_vinfo, stride_step); + stride_step = cse_and_gimplify_to_preheader (loop_vinfo, stride_step); + } alias_off = build_int_cst (ref_type, 0); stmt_vec_info next_stmt_info = first_stmt_info; @@ -8636,39 +8627,76 @@ vectorizable_store (vec_info *vinfo, for (g = 0; g < group_size; g++) { running_off = offvar; - if (g) + if (!costing_p) { - tree size = TYPE_SIZE_UNIT (ltype); - tree pos = fold_build2 (MULT_EXPR, sizetype, size_int (g), - size); - tree newoff = copy_ssa_name (running_off, NULL); - incr = gimple_build_assign (newoff, POINTER_PLUS_EXPR, - running_off, pos); - vect_finish_stmt_generation (vinfo, stmt_info, incr, gsi); - running_off = newoff; + if (g) + { + tree size = TYPE_SIZE_UNIT (ltype); + tree pos + = fold_build2 (MULT_EXPR, sizetype, size_int (g), size); + tree newoff = copy_ssa_name (running_off, NULL); + incr = gimple_build_assign (newoff, POINTER_PLUS_EXPR, + running_off, pos); + vect_finish_stmt_generation (vinfo, stmt_info, incr, gsi); + running_off = newoff; + } } if (!slp) op = vect_get_store_rhs (next_stmt_info); - vect_get_vec_defs (vinfo, next_stmt_info, slp_node, ncopies, - op, &vec_oprnds); + if (!costing_p) + vect_get_vec_defs (vinfo, next_stmt_info, slp_node, ncopies, op, + &vec_oprnds); + else if (!slp) + { + enum vect_def_type cdt; + gcc_assert (vect_is_simple_use (op, vinfo, &cdt)); + if (cdt == vect_constant_def || cdt == vect_external_def) + prologue_cost += record_stmt_cost (cost_vec, 1, scalar_to_vec, + stmt_info, 0, vect_prologue); + } unsigned int group_el = 0; unsigned HOST_WIDE_INT elsz = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (vectype))); for (j = 0; j < ncopies; j++) { - vec_oprnd = vec_oprnds[j]; - /* Pun the vector to extract from if necessary. */ - if (lvectype != vectype) + if (!costing_p) { - tree tem = make_ssa_name (lvectype); - gimple *pun - = gimple_build_assign (tem, build1 (VIEW_CONVERT_EXPR, - lvectype, vec_oprnd)); - vect_finish_stmt_generation (vinfo, stmt_info, pun, gsi); - vec_oprnd = tem; + vec_oprnd = vec_oprnds[j]; + /* Pun the vector to extract from if necessary. */ + if (lvectype != vectype) + { + tree tem = make_ssa_name (lvectype); + tree cvt + = build1 (VIEW_CONVERT_EXPR, lvectype, vec_oprnd); + gimple *pun = gimple_build_assign (tem, cvt); + vect_finish_stmt_generation (vinfo, stmt_info, pun, gsi); + vec_oprnd = tem; + } } for (i = 0; i < nstores; i++) { + if (costing_p) + { + /* Only need vector extracting when there are more + than one stores. */ + if (nstores > 1) + inside_cost + += record_stmt_cost (cost_vec, 1, vec_to_scalar, + stmt_info, 0, vect_body); + /* Take a single lane vector type store as scalar + store to avoid ICE like 110776. */ + if (VECTOR_TYPE_P (ltype) + && known_ne (TYPE_VECTOR_SUBPARTS (ltype), 1U)) + vect_get_store_cost (vinfo, stmt_info, 1, + alignment_support_scheme, + misalignment, &inside_cost, + cost_vec); + else + inside_cost + += record_stmt_cost (cost_vec, 1, scalar_store, + stmt_info, 0, vect_body); + continue; + } tree newref, newoff; gimple *incr, *assign; tree size = TYPE_SIZE (ltype); @@ -8719,6 +8747,12 @@ vectorizable_store (vec_info *vinfo, break; } + if (costing_p && dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "vect_model_store_cost: inside_cost = %d, " + "prologue_cost = %d .\n", + inside_cost, prologue_cost); + return true; } From patchwork Thu Sep 14 03:11:55 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Kewen.Lin" X-Patchwork-Id: 1833957 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=lLWHFr3U; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (ip-8-43-85-97.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4RmMp15lWwz1yhd for ; Thu, 14 Sep 2023 13:12:57 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id CCEA4385277B for ; Thu, 14 Sep 2023 03:12:55 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org CCEA4385277B DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1694661175; bh=OCN7nnYwV6qFjUL82UyY/RtJ08VxPJHNZxK9lc4zI8U=; h=To:Cc:Subject:Date:In-Reply-To:References:List-Id: List-Unsubscribe:List-Archive:List-Post:List-Help:List-Subscribe: From:Reply-To:From; b=lLWHFr3U3VR2mrf7GYAxfbLOFNi/WkwSfBtjwNTMfuGm4bO5x3mjCIDz3fhK+EjYv MhMOZSahWeBzy1ESCL1LA0xoCOLPB/b5glyKH6d6enyjaS9rxwx/eBU9N/672TjVrV +0yb7rsiL6dy+Ey6Ejty6CJK2H8w62vQ9lDtw0iI= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id C2E073858291 for ; Thu, 14 Sep 2023 03:12:14 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org C2E073858291 Received: from pps.filterd (m0353728.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 38E36bte032012; Thu, 14 Sep 2023 03:12:11 GMT Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3t3s7f11dn-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 14 Sep 2023 03:12:11 +0000 Received: from m0353728.ppops.net (m0353728.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 38E37Icu005936; Thu, 14 Sep 2023 03:12:10 GMT Received: from ppma12.dal12v.mail.ibm.com (dc.9e.1632.ip4.static.sl-reverse.com [50.22.158.220]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3t3s7f11dg-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 14 Sep 2023 03:12:10 +0000 Received: from pps.filterd (ppma12.dal12v.mail.ibm.com [127.0.0.1]) by ppma12.dal12v.mail.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 38E1PXuk024034; Thu, 14 Sep 2023 03:12:09 GMT Received: from smtprelay01.fra02v.mail.ibm.com ([9.218.2.227]) by ppma12.dal12v.mail.ibm.com (PPS) with ESMTPS id 3t131tg7jn-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 14 Sep 2023 03:12:09 +0000 Received: from smtpav06.fra02v.mail.ibm.com (smtpav06.fra02v.mail.ibm.com [10.20.54.105]) by smtprelay01.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 38E3C88f18350734 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 14 Sep 2023 03:12:08 GMT Received: from smtpav06.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 062C52004E; Thu, 14 Sep 2023 03:12:08 +0000 (GMT) Received: from smtpav06.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 3DEAB20049; Thu, 14 Sep 2023 03:12:07 +0000 (GMT) Received: from trout.aus.stglabs.ibm.com (unknown [9.40.194.100]) by smtpav06.fra02v.mail.ibm.com (Postfix) with ESMTP; Thu, 14 Sep 2023 03:12:07 +0000 (GMT) To: gcc-patches@gcc.gnu.org Cc: richard.guenther@gmail.com, richard.sandiford@arm.com Subject: [PATCH 06/10] vect: Adjust vectorizable_store costing on VMAT_LOAD_STORE_LANES Date: Wed, 13 Sep 2023 22:11:55 -0500 Message-Id: <048c90cf62145799aa31e3ca4edd6f7adc911a6c.1694657494.git.linkw@linux.ibm.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: sAgY-3QliisiifjQwavUUxRNuyqibSgB X-Proofpoint-GUID: GCN-3zzrF0nHHfa6wWoBU74RLbO6TKOI X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.267,Aquarius:18.0.980,Hydra:6.0.601,FMLib:17.11.176.26 definitions=2023-09-13_19,2023-09-13_01,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 suspectscore=0 adultscore=0 clxscore=1015 lowpriorityscore=0 mlxscore=0 priorityscore=1501 impostorscore=0 spamscore=0 mlxlogscore=837 phishscore=0 bulkscore=0 malwarescore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2308100000 definitions=main-2309140025 X-Spam-Status: No, score=-12.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Kewen Lin via Gcc-patches From: "Kewen.Lin" Reply-To: Kewen Lin Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" This patch adjusts the cost handling on VMAT_LOAD_STORE_LANES in function vectorizable_store. We don't call function vect_model_store_cost for it any more. It's the case of interleaving stores, so it skips all stmts excepting for first_stmt_info, consider the whole group when costing first_stmt_info. This patch shouldn't have any functional changes. gcc/ChangeLog: * tree-vect-stmts.cc (vect_model_store_cost): Assert it will never get VMAT_LOAD_STORE_LANES. (vectorizable_store): Adjust the cost handling on VMAT_LOAD_STORE_LANES without calling vect_model_store_cost. Factor out new lambda function update_prologue_cost. --- gcc/tree-vect-stmts.cc | 110 ++++++++++++++++++++++++++++------------- 1 file changed, 75 insertions(+), 35 deletions(-) diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index 3d01168080a..fbd16b8a487 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -966,7 +966,8 @@ vect_model_store_cost (vec_info *vinfo, stmt_vec_info stmt_info, int ncopies, { gcc_assert (memory_access_type != VMAT_GATHER_SCATTER && memory_access_type != VMAT_ELEMENTWISE - && memory_access_type != VMAT_STRIDED_SLP); + && memory_access_type != VMAT_STRIDED_SLP + && memory_access_type != VMAT_LOAD_STORE_LANES); unsigned int inside_cost = 0, prologue_cost = 0; stmt_vec_info first_stmt_info = stmt_info; bool grouped_access_p = STMT_VINFO_GROUPED_ACCESS (stmt_info); @@ -8408,7 +8409,8 @@ vectorizable_store (vec_info *vinfo, if (grouped_store && !slp && first_stmt_info != stmt_info - && memory_access_type == VMAT_ELEMENTWISE) + && (memory_access_type == VMAT_ELEMENTWISE + || memory_access_type == VMAT_LOAD_STORE_LANES)) return true; } gcc_assert (memory_access_type == STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info)); @@ -8479,6 +8481,31 @@ vectorizable_store (vec_info *vinfo, dump_printf_loc (MSG_NOTE, vect_location, "transform store. ncopies = %d\n", ncopies); + /* Check if we need to update prologue cost for invariant, + and update it accordingly if so. If it's not for + interleaving store, we can just check vls_type; but if + it's for interleaving store, need to check the def_type + of the stored value since the current vls_type is just + for first_stmt_info. */ + auto update_prologue_cost = [&](unsigned *prologue_cost, tree store_rhs) + { + gcc_assert (costing_p); + if (slp) + return; + if (grouped_store) + { + gcc_assert (store_rhs); + enum vect_def_type cdt; + gcc_assert (vect_is_simple_use (store_rhs, vinfo, &cdt)); + if (cdt != vect_constant_def && cdt != vect_external_def) + return; + } + else if (vls_type != VLS_STORE_INVARIANT) + return; + *prologue_cost += record_stmt_cost (cost_vec, 1, scalar_to_vec, stmt_info, + 0, vect_prologue); + }; + if (memory_access_type == VMAT_ELEMENTWISE || memory_access_type == VMAT_STRIDED_SLP) { @@ -8646,14 +8673,8 @@ vectorizable_store (vec_info *vinfo, if (!costing_p) vect_get_vec_defs (vinfo, next_stmt_info, slp_node, ncopies, op, &vec_oprnds); - else if (!slp) - { - enum vect_def_type cdt; - gcc_assert (vect_is_simple_use (op, vinfo, &cdt)); - if (cdt == vect_constant_def || cdt == vect_external_def) - prologue_cost += record_stmt_cost (cost_vec, 1, scalar_to_vec, - stmt_info, 0, vect_prologue); - } + else + update_prologue_cost (&prologue_cost, op); unsigned int group_el = 0; unsigned HOST_WIDE_INT elsz = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (vectype))); @@ -8857,13 +8878,7 @@ vectorizable_store (vec_info *vinfo, if (memory_access_type == VMAT_LOAD_STORE_LANES) { gcc_assert (!slp && grouped_store); - if (costing_p) - { - vect_model_store_cost (vinfo, stmt_info, ncopies, memory_access_type, - alignment_support_scheme, misalignment, - vls_type, slp_node, cost_vec); - return true; - } + unsigned inside_cost = 0, prologue_cost = 0; for (j = 0; j < ncopies; j++) { gimple *new_stmt; @@ -8879,29 +8894,39 @@ vectorizable_store (vec_info *vinfo, DR_GROUP_SIZE is the exact number of stmts in the chain. Therefore, NEXT_STMT_INFO can't be NULL_TREE. */ op = vect_get_store_rhs (next_stmt_info); - vect_get_vec_defs_for_operand (vinfo, next_stmt_info, ncopies, - op, gvec_oprnds[i]); - vec_oprnd = (*gvec_oprnds[i])[0]; - dr_chain.quick_push (vec_oprnd); + if (costing_p) + update_prologue_cost (&prologue_cost, op); + else + { + vect_get_vec_defs_for_operand (vinfo, next_stmt_info, + ncopies, op, + gvec_oprnds[i]); + vec_oprnd = (*gvec_oprnds[i])[0]; + dr_chain.quick_push (vec_oprnd); + } next_stmt_info = DR_GROUP_NEXT_ELEMENT (next_stmt_info); } - if (mask) + + if (!costing_p) { - vect_get_vec_defs_for_operand (vinfo, stmt_info, ncopies, - mask, &vec_masks, - mask_vectype); - vec_mask = vec_masks[0]; - } + if (mask) + { + vect_get_vec_defs_for_operand (vinfo, stmt_info, ncopies, + mask, &vec_masks, + mask_vectype); + vec_mask = vec_masks[0]; + } - /* We should have catched mismatched types earlier. */ - gcc_assert ( - useless_type_conversion_p (vectype, TREE_TYPE (vec_oprnd))); - dataref_ptr - = vect_create_data_ref_ptr (vinfo, first_stmt_info, aggr_type, - NULL, offset, &dummy, gsi, - &ptr_incr, false, bump); + /* We should have catched mismatched types earlier. */ + gcc_assert ( + useless_type_conversion_p (vectype, TREE_TYPE (vec_oprnd))); + dataref_ptr + = vect_create_data_ref_ptr (vinfo, first_stmt_info, + aggr_type, NULL, offset, &dummy, + gsi, &ptr_incr, false, bump); + } } - else + else if (!costing_p) { gcc_assert (!LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo)); /* DR_CHAIN is then used as an input to @@ -8917,6 +8942,15 @@ vectorizable_store (vec_info *vinfo, stmt_info, bump); } + if (costing_p) + { + for (i = 0; i < vec_num; i++) + vect_get_store_cost (vinfo, stmt_info, 1, + alignment_support_scheme, misalignment, + &inside_cost, cost_vec); + continue; + } + /* Get an array into which we can store the individual vectors. */ tree vec_array = create_vector_array (vectype, vec_num); @@ -9003,6 +9037,12 @@ vectorizable_store (vec_info *vinfo, STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt); } + if (costing_p && dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "vect_model_store_cost: inside_cost = %d, " + "prologue_cost = %d .\n", + inside_cost, prologue_cost); + return true; } From patchwork Thu Sep 14 03:11:56 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Kewen.Lin" X-Patchwork-Id: 1833962 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=fTHxmSNS; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4RmMqC2ZdKz1yhd for ; Thu, 14 Sep 2023 13:13:59 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 63A34382EF0C for ; Thu, 14 Sep 2023 03:13:57 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 63A34382EF0C DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1694661237; bh=yYh0FyzWyXBm7mpdnmIpfTcS3TOiit4V6OIoPGWJJ5U=; h=To:Cc:Subject:Date:In-Reply-To:References:List-Id: List-Unsubscribe:List-Archive:List-Post:List-Help:List-Subscribe: From:Reply-To:From; b=fTHxmSNSFbkdXJ9OXX9aUGgUSl2s3e4dpMEM5ponQvYmqPc7GFykatEbfhFk81Yo0 ueHz5Zb9l9AL+mprg907gJkkUEAad2aMCKuGx5DPLITP3xNvCSLc47PmkYlQAR82zh rGD0CJxIJV1sCMTLJ3ty34jcD3tMEvDyHMZpfGKc= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id 098BE3858C2F for ; Thu, 14 Sep 2023 03:12:15 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 098BE3858C2F Received: from pps.filterd (m0356516.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 38E394do028954; Thu, 14 Sep 2023 03:12:12 GMT Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3t3sq3rat7-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 14 Sep 2023 03:12:12 +0000 Received: from m0356516.ppops.net (m0356516.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 38E3CBSW006555; Thu, 14 Sep 2023 03:12:12 GMT Received: from ppma23.wdc07v.mail.ibm.com (5d.69.3da9.ip4.static.sl-reverse.com [169.61.105.93]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3t3sq3rasu-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 14 Sep 2023 03:12:11 +0000 Received: from pps.filterd (ppma23.wdc07v.mail.ibm.com [127.0.0.1]) by ppma23.wdc07v.mail.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 38E37pTk002779; Thu, 14 Sep 2023 03:12:11 GMT Received: from smtprelay04.fra02v.mail.ibm.com ([9.218.2.228]) by ppma23.wdc07v.mail.ibm.com (PPS) with ESMTPS id 3t14hm7q3s-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 14 Sep 2023 03:12:11 +0000 Received: from smtpav06.fra02v.mail.ibm.com (smtpav06.fra02v.mail.ibm.com [10.20.54.105]) by smtprelay04.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 38E3C9IE33161942 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 14 Sep 2023 03:12:09 GMT Received: from smtpav06.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 158DE20040; Thu, 14 Sep 2023 03:12:09 +0000 (GMT) Received: from smtpav06.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 4EBF820049; Thu, 14 Sep 2023 03:12:08 +0000 (GMT) Received: from trout.aus.stglabs.ibm.com (unknown [9.40.194.100]) by smtpav06.fra02v.mail.ibm.com (Postfix) with ESMTP; Thu, 14 Sep 2023 03:12:08 +0000 (GMT) To: gcc-patches@gcc.gnu.org Cc: richard.guenther@gmail.com, richard.sandiford@arm.com Subject: [PATCH 07/10] vect: Adjust vectorizable_store costing on VMAT_CONTIGUOUS_PERMUTE Date: Wed, 13 Sep 2023 22:11:56 -0500 Message-Id: <03074b183ea6c016691e6174a331de1443bdf326.1694657494.git.linkw@linux.ibm.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-GUID: A2JfLfMfARBZUkf4s45XnWoCAw9YMSnj X-Proofpoint-ORIG-GUID: aqmzHGyxKzfHsu3um06hgj3w4Z32xtXO X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.267,Aquarius:18.0.980,Hydra:6.0.601,FMLib:17.11.176.26 definitions=2023-09-13_19,2023-09-13_01,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 mlxscore=0 suspectscore=0 adultscore=0 spamscore=0 clxscore=1015 malwarescore=0 phishscore=0 bulkscore=0 lowpriorityscore=0 priorityscore=1501 impostorscore=0 mlxlogscore=626 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2308100000 definitions=main-2309140025 X-Spam-Status: No, score=-12.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Kewen Lin via Gcc-patches From: "Kewen.Lin" Reply-To: Kewen Lin Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" This patch adjusts the cost handling on VMAT_CONTIGUOUS_PERMUTE in function vectorizable_store. We don't call function vect_model_store_cost for it any more. It's the case of interleaving stores, so it skips all stmts excepting for first_stmt_info, consider the whole group when costing first_stmt_info. This patch shouldn't have any functional changes. gcc/ChangeLog: * tree-vect-stmts.cc (vect_model_store_cost): Assert it will never get VMAT_CONTIGUOUS_PERMUTE and remove VMAT_CONTIGUOUS_PERMUTE related handlings. (vectorizable_store): Adjust the cost handling on VMAT_CONTIGUOUS_PERMUTE without calling vect_model_store_cost. --- gcc/tree-vect-stmts.cc | 128 ++++++++++++++++++++++++----------------- 1 file changed, 74 insertions(+), 54 deletions(-) diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index fbd16b8a487..e3ba8077091 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -967,10 +967,10 @@ vect_model_store_cost (vec_info *vinfo, stmt_vec_info stmt_info, int ncopies, gcc_assert (memory_access_type != VMAT_GATHER_SCATTER && memory_access_type != VMAT_ELEMENTWISE && memory_access_type != VMAT_STRIDED_SLP - && memory_access_type != VMAT_LOAD_STORE_LANES); + && memory_access_type != VMAT_LOAD_STORE_LANES + && memory_access_type != VMAT_CONTIGUOUS_PERMUTE); + unsigned int inside_cost = 0, prologue_cost = 0; - stmt_vec_info first_stmt_info = stmt_info; - bool grouped_access_p = STMT_VINFO_GROUPED_ACCESS (stmt_info); /* ??? Somehow we need to fix this at the callers. */ if (slp_node) @@ -983,35 +983,6 @@ vect_model_store_cost (vec_info *vinfo, stmt_vec_info stmt_info, int ncopies, stmt_info, 0, vect_prologue); } - /* Grouped stores update all elements in the group at once, - so we want the DR for the first statement. */ - if (!slp_node && grouped_access_p) - first_stmt_info = DR_GROUP_FIRST_ELEMENT (stmt_info); - - /* True if we should include any once-per-group costs as well as - the cost of the statement itself. For SLP we only get called - once per group anyhow. */ - bool first_stmt_p = (first_stmt_info == stmt_info); - - /* We assume that the cost of a single store-lanes instruction is - equivalent to the cost of DR_GROUP_SIZE separate stores. If a grouped - access is instead being provided by a permute-and-store operation, - include the cost of the permutes. */ - if (first_stmt_p - && memory_access_type == VMAT_CONTIGUOUS_PERMUTE) - { - /* Uses a high and low interleave or shuffle operations for each - needed permute. */ - int group_size = DR_GROUP_SIZE (first_stmt_info); - int nstmts = ncopies * ceil_log2 (group_size) * group_size; - inside_cost = record_stmt_cost (cost_vec, nstmts, vec_perm, - stmt_info, 0, vect_body); - - if (dump_enabled_p ()) - dump_printf_loc (MSG_NOTE, vect_location, - "vect_model_store_cost: strided group_size = %d .\n", - group_size); - } /* Costs of the stores. */ vect_get_store_cost (vinfo, stmt_info, ncopies, alignment_support_scheme, @@ -8408,9 +8379,7 @@ vectorizable_store (vec_info *vinfo, costing, use the first one instead. */ if (grouped_store && !slp - && first_stmt_info != stmt_info - && (memory_access_type == VMAT_ELEMENTWISE - || memory_access_type == VMAT_LOAD_STORE_LANES)) + && first_stmt_info != stmt_info) return true; } gcc_assert (memory_access_type == STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info)); @@ -9254,14 +9223,15 @@ vectorizable_store (vec_info *vinfo, return true; } + unsigned inside_cost = 0, prologue_cost = 0; auto_vec result_chain (group_size); auto_vec vec_oprnds; for (j = 0; j < ncopies; j++) { gimple *new_stmt; - if (j == 0 && !costing_p) + if (j == 0) { - if (slp) + if (slp && !costing_p) { /* Get vectorized arguments for SLP_NODE. */ vect_get_vec_defs (vinfo, stmt_info, slp_node, 1, op, @@ -9287,13 +9257,20 @@ vectorizable_store (vec_info *vinfo, that there is no interleaving, DR_GROUP_SIZE is 1, and only one iteration of the loop will be executed. */ op = vect_get_store_rhs (next_stmt_info); - vect_get_vec_defs_for_operand (vinfo, next_stmt_info, ncopies, - op, gvec_oprnds[i]); - vec_oprnd = (*gvec_oprnds[i])[0]; - dr_chain.quick_push (vec_oprnd); + if (costing_p + && memory_access_type == VMAT_CONTIGUOUS_PERMUTE) + update_prologue_cost (&prologue_cost, op); + else if (!costing_p) + { + vect_get_vec_defs_for_operand (vinfo, next_stmt_info, + ncopies, op, + gvec_oprnds[i]); + vec_oprnd = (*gvec_oprnds[i])[0]; + dr_chain.quick_push (vec_oprnd); + } next_stmt_info = DR_GROUP_NEXT_ELEMENT (next_stmt_info); } - if (mask) + if (mask && !costing_p) { vect_get_vec_defs_for_operand (vinfo, stmt_info, ncopies, mask, &vec_masks, @@ -9303,11 +9280,13 @@ vectorizable_store (vec_info *vinfo, } /* We should have catched mismatched types earlier. */ - gcc_assert (useless_type_conversion_p (vectype, - TREE_TYPE (vec_oprnd))); + gcc_assert (costing_p + || useless_type_conversion_p (vectype, + TREE_TYPE (vec_oprnd))); bool simd_lane_access_p = STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) != 0; - if (simd_lane_access_p + if (!costing_p + && simd_lane_access_p && !loop_masks && TREE_CODE (DR_BASE_ADDRESS (first_dr_info->dr)) == ADDR_EXPR && VAR_P (TREE_OPERAND (DR_BASE_ADDRESS (first_dr_info->dr), 0)) @@ -9319,7 +9298,7 @@ vectorizable_store (vec_info *vinfo, dataref_ptr = unshare_expr (DR_BASE_ADDRESS (first_dr_info->dr)); dataref_offset = build_int_cst (ref_type, 0); } - else + else if (!costing_p) dataref_ptr = vect_create_data_ref_ptr (vinfo, first_stmt_info, aggr_type, simd_lane_access_p ? loop : NULL, @@ -9347,16 +9326,46 @@ vectorizable_store (vec_info *vinfo, } new_stmt = NULL; - if (!costing_p && grouped_store) - /* Permute. */ - vect_permute_store_chain (vinfo, dr_chain, group_size, stmt_info, gsi, - &result_chain); + if (grouped_store) + { + /* Permute. */ + gcc_assert (memory_access_type == VMAT_CONTIGUOUS_PERMUTE); + if (costing_p) + { + int group_size = DR_GROUP_SIZE (first_stmt_info); + int nstmts = ceil_log2 (group_size) * group_size; + inside_cost += record_stmt_cost (cost_vec, nstmts, vec_perm, + stmt_info, 0, vect_body); + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "vect_model_store_cost: " + "strided group_size = %d .\n", + group_size); + } + else + vect_permute_store_chain (vinfo, dr_chain, group_size, stmt_info, + gsi, &result_chain); + } stmt_vec_info next_stmt_info = first_stmt_info; for (i = 0; i < vec_num; i++) { if (costing_p) - continue; + { + if (memory_access_type == VMAT_CONTIGUOUS_PERMUTE) + vect_get_store_cost (vinfo, stmt_info, 1, + alignment_support_scheme, misalignment, + &inside_cost, cost_vec); + + if (!slp) + { + next_stmt_info = DR_GROUP_NEXT_ELEMENT (next_stmt_info); + if (!next_stmt_info) + break; + } + + continue; + } unsigned misalign; unsigned HOST_WIDE_INT align; @@ -9540,9 +9549,20 @@ vectorizable_store (vec_info *vinfo, } if (costing_p) - vect_model_store_cost (vinfo, stmt_info, ncopies, memory_access_type, - alignment_support_scheme, misalignment, vls_type, - slp_node, cost_vec); + { + if (memory_access_type == VMAT_CONTIGUOUS_PERMUTE) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "vect_model_store_cost: inside_cost = %d, " + "prologue_cost = %d .\n", + inside_cost, prologue_cost); + } + else + vect_model_store_cost (vinfo, stmt_info, ncopies, memory_access_type, + alignment_support_scheme, misalignment, vls_type, + slp_node, cost_vec); + } return true; } From patchwork Thu Sep 14 03:11:57 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Kewen.Lin" X-Patchwork-Id: 1833961 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=Zcg3lmRi; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4RmMpw2z8fz1yhd for ; Thu, 14 Sep 2023 13:13:44 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 71851388265C for ; Thu, 14 Sep 2023 03:13:42 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 71851388265C DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1694661222; bh=aM6exNuyL4wstzLrtxZ3GzqDd9LIaDzDSWBtBTG+skU=; h=To:Cc:Subject:Date:In-Reply-To:References:List-Id: List-Unsubscribe:List-Archive:List-Post:List-Help:List-Subscribe: From:Reply-To:From; b=Zcg3lmRiit7rjAm2f1x9IeoXJVKqbUAtVNsksjbS2h/tLyZCEteypBOmgKLN2xoFy r/YmZcF19kpBoXYbkfIliWUEIAY7ZU+N3AaZ8DN+Yt2kZlBXswkUEIP03+LqlJavqi 0fUlHpdZ23EwJCTGok3zqc1kKD98MHO1Kq69Iu6E= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id 1377B3857342 for ; Thu, 14 Sep 2023 03:12:17 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 1377B3857342 Received: from pps.filterd (m0353725.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 38E38UG1009801; Thu, 14 Sep 2023 03:12:13 GMT Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3t3smp8ff6-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 14 Sep 2023 03:12:13 +0000 Received: from m0353725.ppops.net (m0353725.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 38E3AKYx017675; Thu, 14 Sep 2023 03:12:12 GMT Received: from ppma23.wdc07v.mail.ibm.com (5d.69.3da9.ip4.static.sl-reverse.com [169.61.105.93]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3t3smp8fey-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 14 Sep 2023 03:12:12 +0000 Received: from pps.filterd (ppma23.wdc07v.mail.ibm.com [127.0.0.1]) by ppma23.wdc07v.mail.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 38E37MhL002755; Thu, 14 Sep 2023 03:12:12 GMT Received: from smtprelay05.fra02v.mail.ibm.com ([9.218.2.225]) by ppma23.wdc07v.mail.ibm.com (PPS) with ESMTPS id 3t14hm7q3x-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 14 Sep 2023 03:12:12 +0000 Received: from smtpav06.fra02v.mail.ibm.com (smtpav06.fra02v.mail.ibm.com [10.20.54.105]) by smtprelay05.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 38E3CA8B24445606 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 14 Sep 2023 03:12:10 GMT Received: from smtpav06.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 27DB42004B; Thu, 14 Sep 2023 03:12:10 +0000 (GMT) Received: from smtpav06.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 5DAC320040; Thu, 14 Sep 2023 03:12:09 +0000 (GMT) Received: from trout.aus.stglabs.ibm.com (unknown [9.40.194.100]) by smtpav06.fra02v.mail.ibm.com (Postfix) with ESMTP; Thu, 14 Sep 2023 03:12:09 +0000 (GMT) To: gcc-patches@gcc.gnu.org Cc: richard.guenther@gmail.com, richard.sandiford@arm.com Subject: [PATCH/RFC 08/10] aarch64: Don't use CEIL for vector_store in aarch64_stp_sequence_cost Date: Wed, 13 Sep 2023 22:11:57 -0500 Message-Id: X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: ghYjvqLm2oR4ipJNVh3E3aSbxyYV8aSp X-Proofpoint-GUID: 91DDTUS805zRxtpW6HV-A0AIV2XXbB_Y X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.267,Aquarius:18.0.980,Hydra:6.0.601,FMLib:17.11.176.26 definitions=2023-09-13_19,2023-09-13_01,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 suspectscore=0 bulkscore=0 clxscore=1015 spamscore=0 priorityscore=1501 impostorscore=0 mlxlogscore=613 adultscore=0 malwarescore=0 mlxscore=0 phishscore=0 lowpriorityscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2308100000 definitions=main-2309140025 X-Spam-Status: No, score=-12.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Kewen Lin via Gcc-patches From: "Kewen.Lin" Reply-To: Kewen Lin Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" This costing adjustment patch series exposes one issue in aarch64 specific costing adjustment for STP sequence. It causes the below test cases to fail: - gcc/testsuite/gcc.target/aarch64/ldp_stp_15.c - gcc/testsuite/gcc.target/aarch64/ldp_stp_16.c - gcc/testsuite/gcc.target/aarch64/ldp_stp_17.c - gcc/testsuite/gcc.target/aarch64/ldp_stp_18.c Take the below function extracted from ldp_stp_15.c as example: void dup_8_int32_t (int32_t *x, int32_t val) { for (int i = 0; i < 8; ++i) x[i] = val; } Without my patch series, during slp1 it gets: val_8(D) 2 times unaligned_store (misalign -1) costs 2 in body node 0x10008c85e38 1 times scalar_to_vec costs 1 in prologue then the final vector cost is 3. With my patch series, during slp1 it gets: val_8(D) 1 times unaligned_store (misalign -1) costs 1 in body val_8(D) 1 times unaligned_store (misalign -1) costs 1 in body node 0x10004cc5d88 1 times scalar_to_vec costs 1 in prologue but the final vector cost is 17. The unaligned_store count is actually unchanged, but the final vector costs become different, it's because the below aarch64 special handling makes the different costs: /* Apply the heuristic described above m_stp_sequence_cost. */ if (m_stp_sequence_cost != ~0U) { uint64_t cost = aarch64_stp_sequence_cost (count, kind, stmt_info, vectype); m_stp_sequence_cost = MIN (m._stp_sequence_cost + cost, ~0U); } For the former, since the count is 2, function aarch64_stp_sequence_cost returns 2 as "CEIL (count, 2) * 2". While for the latter, it's separated into twice calls with count 1, aarch64_stp_sequence_cost returns 2 for each time, so it returns 4 in total. For this case, the stmt with scalar_to_vec also contributes 4 to m_stp_sequence_cost, then the final m_stp_sequence_cost are 6 (2+4) vs. 8 (4+4). Considering scalar_costs->m_stp_sequence_cost is 8 and below checking and re-assigning: else if (m_stp_sequence_cost >= scalar_costs->m_stp_sequence_cost) m_costs[vect_body] = 2 * scalar_costs->total_cost (); For the former, the body cost of vector isn't changed; but for the latter, the body cost of vector is double of scalar cost which is 8 for this case, then it becomes 16 which is bigger than what we expect. I'm not sure why it adopts CEIL for the return value for case unaligned_store in function aarch64_stp_sequence_cost, but I tried to modify it with "return count;" (as it can get back to previous cost), there is no failures exposed in regression testing. I expected that if the previous unaligned_store count is even, this adjustment doesn't change anything, if it's odd, the adjustment may reduce it by one, but I'd guess it would be few. Besides, as the comments for m_stp_sequence_cost, the current handlings seems temporary, maybe a tweak like this can be accepted, so I posted this RFC/PATCH to request comments. this one line change is considered. gcc/ChangeLog: * config/aarch64/aarch64.cc (aarch64_stp_sequence_cost): Return count directly instead of the adjusted value computed with CEIL. --- gcc/config/aarch64/aarch64.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index 37d414021ca..9fb4fbd883d 100644 --- a/gcc/config/aarch64/aarch64.cc +++ b/gcc/config/aarch64/aarch64.cc @@ -17051,7 +17051,7 @@ aarch64_stp_sequence_cost (unsigned int count, vect_cost_for_stmt kind, if (!aarch64_aligned_constant_offset_p (stmt_info, size)) return count * 2; } - return CEIL (count, 2) * 2; + return count; case scalar_store: if (stmt_info && STMT_VINFO_DATA_REF (stmt_info)) From patchwork Thu Sep 14 03:11:58 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Kewen.Lin" X-Patchwork-Id: 1833965 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=Z/dbu1RH; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4RmMrK2FkHz1yhd for ; Thu, 14 Sep 2023 13:14:57 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 53509385483F for ; Thu, 14 Sep 2023 03:14:55 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 53509385483F DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1694661295; bh=6l68TugUHT9n0p00kuhrHFYdDmKgcJC1VWtgmGj7kuc=; h=To:Cc:Subject:Date:In-Reply-To:References:List-Id: List-Unsubscribe:List-Archive:List-Post:List-Help:List-Subscribe: From:Reply-To:From; b=Z/dbu1RHU98KIYZ7bYTqt8IORCE67Ep94N7hhKh5M1Oi6kHU/BeapWpRsBbz9Ww5B g452X9sGEb05ar7oHyxzOF+HagPsfneafNcuRMc0NclIRX2w174kA1XCDY1BtHK2D/ IjgGAXAcMWqNwiWHGDHzsO/ChV4D6j/KKnUyFrXk= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id 1E111385842A for ; Thu, 14 Sep 2023 03:12:18 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 1E111385842A Received: from pps.filterd (m0360083.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 38E39boq021124; Thu, 14 Sep 2023 03:12:14 GMT Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3t3s628wmc-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 14 Sep 2023 03:12:14 +0000 Received: from m0360083.ppops.net (m0360083.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 38E3AIjV023165; Thu, 14 Sep 2023 03:12:14 GMT Received: from ppma13.dal12v.mail.ibm.com (dd.9e.1632.ip4.static.sl-reverse.com [50.22.158.221]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3t3s628wm4-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 14 Sep 2023 03:12:13 +0000 Received: from pps.filterd (ppma13.dal12v.mail.ibm.com [127.0.0.1]) by ppma13.dal12v.mail.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 38E0S9kT002352; Thu, 14 Sep 2023 03:12:13 GMT Received: from smtprelay06.fra02v.mail.ibm.com ([9.218.2.230]) by ppma13.dal12v.mail.ibm.com (PPS) with ESMTPS id 3t158kffj1-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 14 Sep 2023 03:12:12 +0000 Received: from smtpav06.fra02v.mail.ibm.com (smtpav06.fra02v.mail.ibm.com [10.20.54.105]) by smtprelay06.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 38E3CBx044827092 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 14 Sep 2023 03:12:11 GMT Received: from smtpav06.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 392CA20040; Thu, 14 Sep 2023 03:12:11 +0000 (GMT) Received: from smtpav06.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 6EFCB2004B; Thu, 14 Sep 2023 03:12:10 +0000 (GMT) Received: from trout.aus.stglabs.ibm.com (unknown [9.40.194.100]) by smtpav06.fra02v.mail.ibm.com (Postfix) with ESMTP; Thu, 14 Sep 2023 03:12:10 +0000 (GMT) To: gcc-patches@gcc.gnu.org Cc: richard.guenther@gmail.com, richard.sandiford@arm.com Subject: [PATCH 09/10] vect: Get rid of vect_model_store_cost Date: Wed, 13 Sep 2023 22:11:58 -0500 Message-Id: X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-GUID: _ljovXMqTxoxOOpH2eryTEMea9vxCyYr X-Proofpoint-ORIG-GUID: W3DzvZd6zhSwjPuWNtOU-mxbGdJGKxcq X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.267,Aquarius:18.0.980,Hydra:6.0.601,FMLib:17.11.176.26 definitions=2023-09-13_19,2023-09-13_01,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 mlxlogscore=999 spamscore=0 mlxscore=0 bulkscore=0 suspectscore=0 lowpriorityscore=0 malwarescore=0 priorityscore=1501 impostorscore=0 clxscore=1015 phishscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2308100000 definitions=main-2309140025 X-Spam-Status: No, score=-12.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Kewen Lin via Gcc-patches From: "Kewen.Lin" Reply-To: Kewen Lin Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" This patch is to eventually get rid of vect_model_store_cost, it adjusts the costing for the remaining memory access types VMAT_CONTIGUOUS{, _DOWN, _REVERSE} by moving costing close to the transform code. Note that in vect_model_store_cost, there is one special handling for vectorizing a store into the function result, since it's extra penalty and the transform part doesn't have it, this patch keep it alone. gcc/ChangeLog: * tree-vect-stmts.cc (vect_model_store_cost): Remove. (vectorizable_store): Adjust the costing for the remaining memory access types VMAT_CONTIGUOUS{, _DOWN, _REVERSE}. --- gcc/tree-vect-stmts.cc | 137 +++++++++++++---------------------------- 1 file changed, 44 insertions(+), 93 deletions(-) diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index e3ba8077091..3d451c80bca 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -951,81 +951,6 @@ cfun_returns (tree decl) return false; } -/* Function vect_model_store_cost - - Models cost for stores. In the case of grouped accesses, one access - has the overhead of the grouped access attributed to it. */ - -static void -vect_model_store_cost (vec_info *vinfo, stmt_vec_info stmt_info, int ncopies, - vect_memory_access_type memory_access_type, - dr_alignment_support alignment_support_scheme, - int misalignment, - vec_load_store_type vls_type, slp_tree slp_node, - stmt_vector_for_cost *cost_vec) -{ - gcc_assert (memory_access_type != VMAT_GATHER_SCATTER - && memory_access_type != VMAT_ELEMENTWISE - && memory_access_type != VMAT_STRIDED_SLP - && memory_access_type != VMAT_LOAD_STORE_LANES - && memory_access_type != VMAT_CONTIGUOUS_PERMUTE); - - unsigned int inside_cost = 0, prologue_cost = 0; - - /* ??? Somehow we need to fix this at the callers. */ - if (slp_node) - ncopies = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node); - - if (vls_type == VLS_STORE_INVARIANT) - { - if (!slp_node) - prologue_cost += record_stmt_cost (cost_vec, 1, scalar_to_vec, - stmt_info, 0, vect_prologue); - } - - - /* Costs of the stores. */ - vect_get_store_cost (vinfo, stmt_info, ncopies, alignment_support_scheme, - misalignment, &inside_cost, cost_vec); - - /* When vectorizing a store into the function result assign - a penalty if the function returns in a multi-register location. - In this case we assume we'll end up with having to spill the - vector result and do piecewise loads as a conservative estimate. */ - tree base = get_base_address (STMT_VINFO_DATA_REF (stmt_info)->ref); - if (base - && (TREE_CODE (base) == RESULT_DECL - || (DECL_P (base) && cfun_returns (base))) - && !aggregate_value_p (base, cfun->decl)) - { - rtx reg = hard_function_value (TREE_TYPE (base), cfun->decl, 0, 1); - /* ??? Handle PARALLEL in some way. */ - if (REG_P (reg)) - { - int nregs = hard_regno_nregs (REGNO (reg), GET_MODE (reg)); - /* Assume that a single reg-reg move is possible and cheap, - do not account for vector to gp register move cost. */ - if (nregs > 1) - { - /* Spill. */ - prologue_cost += record_stmt_cost (cost_vec, ncopies, - vector_store, - stmt_info, 0, vect_epilogue); - /* Loads. */ - prologue_cost += record_stmt_cost (cost_vec, ncopies * nregs, - scalar_load, - stmt_info, 0, vect_epilogue); - } - } - } - - if (dump_enabled_p ()) - dump_printf_loc (MSG_NOTE, vect_location, - "vect_model_store_cost: inside_cost = %d, " - "prologue_cost = %d .\n", inside_cost, prologue_cost); -} - - /* Calculate cost of DR's memory access. */ void vect_get_store_cost (vec_info *, stmt_vec_info stmt_info, int ncopies, @@ -9223,6 +9148,11 @@ vectorizable_store (vec_info *vinfo, return true; } + gcc_assert (memory_access_type == VMAT_CONTIGUOUS + || memory_access_type == VMAT_CONTIGUOUS_DOWN + || memory_access_type == VMAT_CONTIGUOUS_PERMUTE + || memory_access_type == VMAT_CONTIGUOUS_REVERSE); + unsigned inside_cost = 0, prologue_cost = 0; auto_vec result_chain (group_size); auto_vec vec_oprnds; @@ -9257,10 +9187,9 @@ vectorizable_store (vec_info *vinfo, that there is no interleaving, DR_GROUP_SIZE is 1, and only one iteration of the loop will be executed. */ op = vect_get_store_rhs (next_stmt_info); - if (costing_p - && memory_access_type == VMAT_CONTIGUOUS_PERMUTE) + if (costing_p) update_prologue_cost (&prologue_cost, op); - else if (!costing_p) + else { vect_get_vec_defs_for_operand (vinfo, next_stmt_info, ncopies, op, @@ -9352,10 +9281,9 @@ vectorizable_store (vec_info *vinfo, { if (costing_p) { - if (memory_access_type == VMAT_CONTIGUOUS_PERMUTE) - vect_get_store_cost (vinfo, stmt_info, 1, - alignment_support_scheme, misalignment, - &inside_cost, cost_vec); + vect_get_store_cost (vinfo, stmt_info, 1, + alignment_support_scheme, misalignment, + &inside_cost, cost_vec); if (!slp) { @@ -9550,18 +9478,41 @@ vectorizable_store (vec_info *vinfo, if (costing_p) { - if (memory_access_type == VMAT_CONTIGUOUS_PERMUTE) - { - if (dump_enabled_p ()) - dump_printf_loc (MSG_NOTE, vect_location, - "vect_model_store_cost: inside_cost = %d, " - "prologue_cost = %d .\n", - inside_cost, prologue_cost); + /* When vectorizing a store into the function result assign + a penalty if the function returns in a multi-register location. + In this case we assume we'll end up with having to spill the + vector result and do piecewise loads as a conservative estimate. */ + tree base = get_base_address (STMT_VINFO_DATA_REF (stmt_info)->ref); + if (base + && (TREE_CODE (base) == RESULT_DECL + || (DECL_P (base) && cfun_returns (base))) + && !aggregate_value_p (base, cfun->decl)) + { + rtx reg = hard_function_value (TREE_TYPE (base), cfun->decl, 0, 1); + /* ??? Handle PARALLEL in some way. */ + if (REG_P (reg)) + { + int nregs = hard_regno_nregs (REGNO (reg), GET_MODE (reg)); + /* Assume that a single reg-reg move is possible and cheap, + do not account for vector to gp register move cost. */ + if (nregs > 1) + { + /* Spill. */ + prologue_cost + += record_stmt_cost (cost_vec, ncopies, vector_store, + stmt_info, 0, vect_epilogue); + /* Loads. */ + prologue_cost + += record_stmt_cost (cost_vec, ncopies * nregs, scalar_load, + stmt_info, 0, vect_epilogue); + } + } } - else - vect_model_store_cost (vinfo, stmt_info, ncopies, memory_access_type, - alignment_support_scheme, misalignment, vls_type, - slp_node, cost_vec); + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "vect_model_store_cost: inside_cost = %d, " + "prologue_cost = %d .\n", + inside_cost, prologue_cost); } return true; From patchwork Thu Sep 14 03:11:59 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Kewen.Lin" X-Patchwork-Id: 1833967 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=Y4/dlEbo; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4RmMs44B3Gz1yhn for ; Thu, 14 Sep 2023 13:15:36 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 9660D3882648 for ; Thu, 14 Sep 2023 03:15:34 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 9660D3882648 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1694661334; bh=p3m/dVg9mRJnfmR1+w8o2tq3treyr0bLr54ohXJdDqM=; h=To:Cc:Subject:Date:In-Reply-To:References:List-Id: List-Unsubscribe:List-Archive:List-Post:List-Help:List-Subscribe: From:Reply-To:From; b=Y4/dlEboeCVvVQnXCr1NS8uxVnnqnDwCdtw23HgqEbI4eDEDgAs1a+Bsdd9qDCyi9 6iG96+B4D5uoaIHb011QWc7wQVdu+L/PGET4Zy+o+tty6KKXXuVwJkgyhQQ/HvLjDq x+NzkA7uppGR8TN47iBRTRwTVNXvdTP7CyUG4TGg= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id 6E23D385843E for ; Thu, 14 Sep 2023 03:12:17 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 6E23D385843E Received: from pps.filterd (m0353723.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 38E374NM013837; Thu, 14 Sep 2023 03:12:15 GMT Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3t3rabj99e-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 14 Sep 2023 03:12:15 +0000 Received: from m0353723.ppops.net (m0353723.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 38E370Qk012491; Thu, 14 Sep 2023 03:12:14 GMT Received: from ppma12.dal12v.mail.ibm.com (dc.9e.1632.ip4.static.sl-reverse.com [50.22.158.220]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3t3rabj996-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 14 Sep 2023 03:12:14 +0000 Received: from pps.filterd (ppma12.dal12v.mail.ibm.com [127.0.0.1]) by ppma12.dal12v.mail.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 38E1vxH6024080; Thu, 14 Sep 2023 03:12:14 GMT Received: from smtprelay07.fra02v.mail.ibm.com ([9.218.2.229]) by ppma12.dal12v.mail.ibm.com (PPS) with ESMTPS id 3t131tg7k3-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 14 Sep 2023 03:12:13 +0000 Received: from smtpav06.fra02v.mail.ibm.com (smtpav06.fra02v.mail.ibm.com [10.20.54.105]) by smtprelay07.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 38E3CCoL62194174 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 14 Sep 2023 03:12:12 GMT Received: from smtpav06.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 490FD2004B; Thu, 14 Sep 2023 03:12:12 +0000 (GMT) Received: from smtpav06.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 8015220040; Thu, 14 Sep 2023 03:12:11 +0000 (GMT) Received: from trout.aus.stglabs.ibm.com (unknown [9.40.194.100]) by smtpav06.fra02v.mail.ibm.com (Postfix) with ESMTP; Thu, 14 Sep 2023 03:12:11 +0000 (GMT) To: gcc-patches@gcc.gnu.org Cc: richard.guenther@gmail.com, richard.sandiford@arm.com Subject: [PATCH 10/10] vect: Consider vec_perm costing for VMAT_CONTIGUOUS_REVERSE Date: Wed, 13 Sep 2023 22:11:59 -0500 Message-Id: <7514680ad7b9b859a054ca1a59356f58b5ac9089.1694657495.git.linkw@linux.ibm.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-GUID: qyDYQG2df7uWknIsrpRbxOg3KonIyCDk X-Proofpoint-ORIG-GUID: rg6aGb3L8l6jg6xlMeJNyVfJVxWS22Mq X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.267,Aquarius:18.0.980,Hydra:6.0.601,FMLib:17.11.176.26 definitions=2023-09-13_19,2023-09-13_01,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 adultscore=0 mlxscore=0 mlxlogscore=999 malwarescore=0 bulkscore=0 priorityscore=1501 clxscore=1015 phishscore=0 lowpriorityscore=0 suspectscore=0 impostorscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2308100000 definitions=main-2309140025 X-Spam-Status: No, score=-12.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Kewen Lin via Gcc-patches From: "Kewen.Lin" Reply-To: Kewen Lin Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" For VMAT_CONTIGUOUS_REVERSE, the transform code in function vectorizable_store generates a VEC_PERM_EXPR stmt before storing, but it's never considered in costing. This patch is to make it consider vec_perm in costing, it adjusts the order of transform code a bit to make it easy to early return for costing_p. gcc/ChangeLog: * tree-vect-stmts.cc (vectorizable_store): Consider generated VEC_PERM_EXPR stmt for VMAT_CONTIGUOUS_REVERSE in costing as vec_perm. gcc/testsuite/ChangeLog: * gcc.dg/vect/costmodel/ppc/costmodel-vect-store-2.c: New test. --- .../costmodel/ppc/costmodel-vect-store-2.c | 29 +++++++++ gcc/tree-vect-stmts.cc | 63 +++++++++++-------- 2 files changed, 65 insertions(+), 27 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-store-2.c diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-store-2.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-store-2.c new file mode 100644 index 00000000000..72b67cf9040 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-store-2.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_int } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-additional-options "-mvsx" } */ + +/* Verify we do cost the required vec_perm. */ + +int +foo (int *a, int *b, int len) +{ + int i; + int *a1 = a; + int *a0 = a1 - 4; + for (i = 0; i < len; i++) + { + *b = *a0 + *a1; + b--; + a0++; + a1++; + } + return 0; +} + +/* The reason why it doesn't check the exact count is that + we can get more than 1 vec_perm when it's compiled with + partial vector capability like Power10 (retrying for + the epilogue) or it's complied without unaligned vector + memory access support (realign). */ +/* { dg-final { scan-tree-dump {\mvec_perm\M} "vect" } } */ diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index 3d451c80bca..ce925cc1d53 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -9279,6 +9279,40 @@ vectorizable_store (vec_info *vinfo, stmt_vec_info next_stmt_info = first_stmt_info; for (i = 0; i < vec_num; i++) { + if (!costing_p) + { + if (slp) + vec_oprnd = vec_oprnds[i]; + else if (grouped_store) + /* For grouped stores vectorized defs are interleaved in + vect_permute_store_chain(). */ + vec_oprnd = result_chain[i]; + } + + if (memory_access_type == VMAT_CONTIGUOUS_REVERSE) + { + if (costing_p) + inside_cost += record_stmt_cost (cost_vec, 1, vec_perm, + stmt_info, 0, vect_body); + else + { + tree perm_mask = perm_mask_for_reverse (vectype); + tree perm_dest = vect_create_destination_var ( + vect_get_store_rhs (stmt_info), vectype); + tree new_temp = make_ssa_name (perm_dest); + + /* Generate the permute statement. */ + gimple *perm_stmt + = gimple_build_assign (new_temp, VEC_PERM_EXPR, vec_oprnd, + vec_oprnd, perm_mask); + vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt, + gsi); + + perm_stmt = SSA_NAME_DEF_STMT (new_temp); + vec_oprnd = new_temp; + } + } + if (costing_p) { vect_get_store_cost (vinfo, stmt_info, 1, @@ -9294,8 +9328,6 @@ vectorizable_store (vec_info *vinfo, continue; } - unsigned misalign; - unsigned HOST_WIDE_INT align; tree final_mask = NULL_TREE; tree final_len = NULL_TREE; @@ -9315,13 +9347,8 @@ vectorizable_store (vec_info *vinfo, dataref_ptr = bump_vector_ptr (vinfo, dataref_ptr, ptr_incr, gsi, stmt_info, bump); - if (slp) - vec_oprnd = vec_oprnds[i]; - else if (grouped_store) - /* For grouped stores vectorized defs are interleaved in - vect_permute_store_chain(). */ - vec_oprnd = result_chain[i]; - + unsigned misalign; + unsigned HOST_WIDE_INT align; align = known_alignment (DR_TARGET_ALIGNMENT (first_dr_info)); if (alignment_support_scheme == dr_aligned) misalign = 0; @@ -9338,24 +9365,6 @@ vectorizable_store (vec_info *vinfo, misalign); align = least_bit_hwi (misalign | align); - if (memory_access_type == VMAT_CONTIGUOUS_REVERSE) - { - tree perm_mask = perm_mask_for_reverse (vectype); - tree perm_dest - = vect_create_destination_var (vect_get_store_rhs (stmt_info), - vectype); - tree new_temp = make_ssa_name (perm_dest); - - /* Generate the permute statement. */ - gimple *perm_stmt - = gimple_build_assign (new_temp, VEC_PERM_EXPR, vec_oprnd, - vec_oprnd, perm_mask); - vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt, gsi); - - perm_stmt = SSA_NAME_DEF_STMT (new_temp); - vec_oprnd = new_temp; - } - /* Compute IFN when LOOP_LENS or final_mask valid. */ machine_mode vmode = TYPE_MODE (vectype); machine_mode new_vmode = vmode;