From patchwork Thu Sep 14 03:11:57 2023
X-Patchwork-Submitter: "Kewen.Lin"
X-Patchwork-Id: 1833961
From: "Kewen.Lin"
To: gcc-patches@gcc.gnu.org
Cc: richard.guenther@gmail.com, richard.sandiford@arm.com
Subject: [PATCH/RFC 08/10] aarch64: Don't use CEIL for vector_store in aarch64_stp_sequence_cost
Date: Wed, 13 Sep 2023 22:11:57 -0500

This costing adjustment patch series exposes an issue in the
aarch64-specific costing adjustment for STP sequences.
It causes the following test cases to fail:

  - gcc/testsuite/gcc.target/aarch64/ldp_stp_15.c
  - gcc/testsuite/gcc.target/aarch64/ldp_stp_16.c
  - gcc/testsuite/gcc.target/aarch64/ldp_stp_17.c
  - gcc/testsuite/gcc.target/aarch64/ldp_stp_18.c

Take the following function, extracted from ldp_stp_15.c, as an
example:

  void dup_8_int32_t (int32_t *x, int32_t val)
  {
    for (int i = 0; i < 8; ++i)
      x[i] = val;
  }

Without my patch series, slp1 reports:

  val_8(D) 2 times unaligned_store (misalign -1) costs 2 in body
  node 0x10008c85e38 1 times scalar_to_vec costs 1 in prologue

and the final vector cost is 3.

With my patch series, slp1 reports:

  val_8(D) 1 times unaligned_store (misalign -1) costs 1 in body
  val_8(D) 1 times unaligned_store (misalign -1) costs 1 in body
  node 0x10004cc5d88 1 times scalar_to_vec costs 1 in prologue

but the final vector cost is 17.  The total unaligned_store count is
unchanged, yet the final vector costs differ because of the following
aarch64-specific handling:

  /* Apply the heuristic described above m_stp_sequence_cost.  */
  if (m_stp_sequence_cost != ~0U)
    {
      uint64_t cost = aarch64_stp_sequence_cost (count, kind,
                                                 stmt_info, vectype);
      m_stp_sequence_cost = MIN (m_stp_sequence_cost + cost, ~0U);
    }

In the former case the count is 2, so aarch64_stp_sequence_cost
returns 2 from "CEIL (count, 2) * 2".  In the latter case the store is
costed in two separate calls with count 1; aarch64_stp_sequence_cost
returns 2 each time, for a total of 4.  The stmt with scalar_to_vec
also contributes 4 to m_stp_sequence_cost, so the final
m_stp_sequence_cost is 6 (2+4) vs. 8 (4+4).
Given that scalar_costs->m_stp_sequence_cost is 8, consider the
following check and re-assignment:

  else if (m_stp_sequence_cost >= scalar_costs->m_stp_sequence_cost)
    m_costs[vect_body] = 2 * scalar_costs->total_cost ();

In the former case (6 < 8) the vector body cost is unchanged.  In the
latter case (8 >= 8) the vector body cost is set to double the scalar
cost, which is 8 here, so it becomes 16, which is higher than we
expect.

I'm not sure why CEIL is used for the return value in the
unaligned_store case of aarch64_stp_sequence_cost, but I tried
replacing it with "return count;" (which restores the previous cost),
and regression testing exposed no failures.  I expect that if the
previous unaligned_store count is even, this adjustment changes
nothing; if it is odd, the adjustment may reduce the cost by one, but
I'd guess such cases are few.

Besides, according to the comments on m_stp_sequence_cost, the current
handling seems to be temporary, so maybe a tweak like this can be
accepted.  I'm posting this RFC/PATCH to request comments; only this
one-line change is considered.

gcc/ChangeLog:

	* config/aarch64/aarch64.cc (aarch64_stp_sequence_cost): Return
	count directly instead of the adjusted value computed with CEIL.
---
 gcc/config/aarch64/aarch64.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 37d414021ca..9fb4fbd883d 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -17051,7 +17051,7 @@ aarch64_stp_sequence_cost (unsigned int count, vect_cost_for_stmt kind,
 	  if (!aarch64_aligned_constant_offset_p (stmt_info, size))
 	    return count * 2;
 	}
-      return CEIL (count, 2) * 2;
+      return count;
 
     case scalar_store:
       if (stmt_info && STMT_VINFO_DATA_REF (stmt_info))