From: HAO CHEN GUI
Date: Thu, 25 Jul 2024 14:09:08 +0800
Message-ID: <0da5f7f3-2cb4-41b9-b7ac-3c88354af377@linux.ibm.com>
To: gcc-patches
Cc: Segher Boessenkool, David, "Kewen.Lin", Peter Bergner
Subject: [PATCHv3, rs6000] Optimize vector construction with two vector doubleword loads [PR103568]

Hi,
  This patch optimizes vector construction from two doubleword loads. On
Power10 it generates a better insn sequence, because "xxlor" has lower
latency than "mtvsrdd".

  Compared with the previous version, the main change is the addition of
new patterns for little-endian (LE) targets. Also, the lxsd[x] patterns
are now guarded by TARGET_POWER10, because before ISA 3.1 doubleword 1 of
the lxsd[x] target register is undefined.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653180.html

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Optimize vector construction with two vector doubleword loads

When constructing a vector from two doublewords in memory, GCC
originally generated:

	ld 10,0(3)
	ld 9,0(4)
	mtvsrdd 34,9,10

An optimal sequence on Power10 is:

	lxsd 0,0(4)
	lxvrdx 1,0,3
	xxlor 34,1,32

This patch implements the optimization with new insn_and_split patterns
that combine can match and that are then split into the three insns
above.

gcc/
	PR target/103568
	* config/rs6000/vsx.md (lxsd__be, lxsd__le, lxvrdx__be,
	lxvrdx__le): New insn patterns.
	(*vsx_concat_mem__be, *vsx_concat_mem__le): New insn_and_split
	patterns.

gcc/testsuite/
	PR target/103568
	* gcc.target/powerpc/pr103568.c: New test.
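The split works because each of the two loads zeroes the doubleword that the other one fills, so a plain OR merges them. The C sketch below is a toy model for illustration only, not GCC code: the type `vsr_t` and the helpers `model_lxsd`, `model_lxsd_isa30`, `model_lxvrdx`, `model_xxlor`, `elem` and `concat_mem` are hypothetical names, and the instruction semantics are paraphrased assumptions (lxsd[x] writes doubleword 0 and, as of ISA 3.1, zeroes doubleword 1; lxvrdx writes the rightmost doubleword and zeroes the rest).

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of one 128-bit VSX register.  dw[0] is doubleword 0
   (leftmost, most significant) and dw[1] is doubleword 1, following the
   ISA's big-endian numbering.  */
typedef struct
{
  uint64_t dw[2];
} vsr_t;

/* lxsd/lxsdx with ISA 3.1 semantics (assumed): the doubleword goes to
   dw[0] and dw[1] is zeroed.  */
static vsr_t
model_lxsd (uint64_t mem)
{
  vsr_t r = { { mem, 0 } };
  return r;
}

/* Pre-ISA 3.1 lxsd: dw[1] is undefined; modelled here as whatever stale
   bits the register already held.  */
static vsr_t
model_lxsd_isa30 (uint64_t mem, uint64_t stale)
{
  vsr_t r = { { mem, stale } };
  return r;
}

/* lxvrdx (assumed): the doubleword goes to the rightmost doubleword
   (dw[1]) and the rest of the register is zeroed.  */
static vsr_t
model_lxvrdx (uint64_t mem)
{
  vsr_t r = { { 0, mem } };
  return r;
}

/* xxlor: bitwise OR of two registers.  */
static vsr_t
model_xxlor (vsr_t x, vsr_t y)
{
  vsr_t r = { { x.dw[0] | y.dw[0], x.dw[1] | y.dw[1] } };
  return r;
}

/* Element I of a two-element vector as the C program sees it: on BE,
   element 0 is dw[0]; on LE, element 0 is dw[1].  */
static uint64_t
elem (vsr_t v, int i, int big_endian)
{
  assert (i == 0 || i == 1);
  return big_endian ? v.dw[i] : v.dw[1 - i];
}

/* Build {*a, *b} with the intended sequence: on BE, lxsd loads *a and
   lxvrdx loads *b; on LE the roles are swapped, so lxsd loads *b.  */
static vsr_t
concat_mem (const uint64_t *a, const uint64_t *b, int big_endian)
{
  vsr_t t1 = model_lxsd (big_endian ? *a : *b);
  vsr_t t2 = model_lxvrdx (big_endian ? *b : *a);
  return model_xxlor (t1, t2);
}
```

With `model_lxsd_isa30` in place of `model_lxsd`, any stale bits in doubleword 1 survive the xxlor and corrupt one element, which is why the new lxsd[x] patterns require TARGET_POWER10.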
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index f135fa079bd..9182d824d25 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -1395,6 +1395,49 @@ (define_insn "vsx_ld_elemrev_v2di"
   "lxvd2x %x0,%y1"
   [(set_attr "type" "vecload")])
 
+;; Before ISA3.1 the dword1 of lxsd[x] is undefined.
+(define_insn "lxsd__be"
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=v,wa")
+	(vec_concat:VSX_D
+	  (match_operand: 1 "memory_operand" "wY,Z")
+	  (match_operand: 2 "zero_constant" "j,j")))]
+  "TARGET_POWER10 && BYTES_BIG_ENDIAN"
+  "@
+   lxsd %0,%1
+   lxsdx %x0,%y1"
+  [(set_attr "type" "vecload,vecload")
+   (set_attr "prefixed" "yes,no")])
+
+(define_insn "lxsd__le"
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=v,wa")
+	(vec_concat:VSX_D
+	  (match_operand: 1 "zero_constant" "j,j")
+	  (match_operand: 2 "memory_operand" "wY,Z")))]
+  "TARGET_POWER10 && !BYTES_BIG_ENDIAN"
+  "@
+   lxsd %0,%2
+   lxsdx %x0,%y2"
+  [(set_attr "type" "vecload,vecload")
+   (set_attr "prefixed" "yes,no")])
+
+(define_insn "lxvrdx__be"
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
+	(vec_concat:VSX_D
+	  (match_operand: 1 "zero_constant" "j")
+	  (match_operand: 2 "memory_operand" "Z")))]
+  "TARGET_POWER10 && BYTES_BIG_ENDIAN"
+  "lxvrdx %x0,%y2"
+  [(set_attr "type" "vecload")])
+
+(define_insn "lxvrdx__le"
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
+	(vec_concat:VSX_D
+	  (match_operand: 1 "memory_operand" "Z")
+	  (match_operand: 2 "zero_constant" "j")))]
+  "TARGET_POWER10 && !BYTES_BIG_ENDIAN"
+  "lxvrdx %x0,%y1"
+  [(set_attr "type" "vecload")])
+
 (define_insn "vsx_ld_elemrev_v1ti"
   [(set (match_operand:V1TI 0 "vsx_register_operand" "=wa")
 	(vec_select:V1TI
@@ -3063,6 +3106,48 @@ (define_insn "vsx_concat_"
 }
   [(set_attr "type" "vecperm,vecmove")])
 
+(define_insn_and_split "*vsx_concat_mem__be"
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=v,wa")
+	(vec_concat:VSX_D
+	  (match_operand: 1 "memory_operand" "wY,Z")
+	  (match_operand: 2 "memory_operand" "Z,Z")))]
+  "TARGET_POWER10 && BYTES_BIG_ENDIAN
+   && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  rtx tmp1 = gen_reg_rtx (mode);
+  rtx tmp2 = gen_reg_rtx (mode);
+  emit_insn (gen_lxsd__be (tmp1, operands[1],
+			       CONST0_RTX (mode)));
+  emit_insn (gen_lxvrdx__be (tmp2, CONST0_RTX (mode),
+				 operands[2]));
+  emit_insn (gen_ior3 (operands[0], tmp1, tmp2));
+  DONE;
+})
+
+(define_insn_and_split "*vsx_concat_mem__le"
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=v,wa")
+	(vec_concat:VSX_D
+	  (match_operand: 1 "memory_operand" "Z,Z")
+	  (match_operand: 2 "memory_operand" "wY,Z")))]
+  "TARGET_POWER10 && !BYTES_BIG_ENDIAN
+   && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  rtx tmp1 = gen_reg_rtx (mode);
+  rtx tmp2 = gen_reg_rtx (mode);
+  emit_insn (gen_lxsd__le (tmp1, CONST0_RTX (mode),
+			       operands[2]));
+  emit_insn (gen_lxvrdx__le (tmp2, operands[1],
+				 CONST0_RTX (mode)));
+  emit_insn (gen_ior3 (operands[0], tmp1, tmp2));
+  DONE;
+})
+
 ;; Combiner patterns to allow creating XXPERMDI's to access either double
 ;; word element in a vector register.
 (define_insn "*vsx_concat__1"
diff --git a/gcc/testsuite/gcc.target/powerpc/pr103568.c b/gcc/testsuite/gcc.target/powerpc/pr103568.c
new file mode 100644
index 00000000000..106fad7c8c0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr103568.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+vector double test (double *a, double *b)
+{
+  return (vector double) {*a, *b};
+}
+
+vector long long test1 (long long *a, long long *b)
+{
+  return (vector long long) {*a, *b};
+}
+
+/* { dg-final { scan-assembler-times {\mp?lxsd} 2 } } */
+/* { dg-final { scan-assembler-times {\mlxvrdx\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxlor\M} 2 } } */
+
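The new test is compile-only, and its scan-assembler-times counts would still be satisfied if the two loads fed the wrong vector elements, so it cannot catch an element-order mistake in either split. A run-time check along the following lines could complement it. This is a hedged sketch, not part of the patch: `build_v2df`, `build_v2di` and `check_pr103568` are hypothetical names, the testsuite directives in the comment are assumptions, and it uses GCC's generic `vector_size` types instead of `vector double`/`vector long long` so that it also builds off PowerPC (assuming they expand through the same V2DF/V2DI construction path there).

```c
#include <assert.h>

/* Hypothetical run-time variant of pr103568.c; in the testsuite it might
   carry { dg-do run }, { dg-require-effective-target power10_hw } and
   the same dg-options as the compile test.  */
typedef double v2df __attribute__ ((vector_size (16)));
typedef long long v2di __attribute__ ((vector_size (16)));

/* noipa keeps the two loads inside these functions so that the
   vec_concat of two memory operands is what gets expanded.  */
__attribute__ ((noipa)) v2df
build_v2df (double *a, double *b)
{
  return (v2df) { *a, *b };
}

__attribute__ ((noipa)) v2di
build_v2di (long long *a, long long *b)
{
  return (v2di) { *a, *b };
}

/* Element 0 must come from the first pointer and element 1 from the
   second, on both BE and LE.  */
void
check_pr103568 (void)
{
  double da = 1.5, db = -2.25;
  long long la = 0x0123456789abcdefLL, lb = -42;
  v2df vd = build_v2df (&da, &db);
  v2di vl = build_v2di (&la, &lb);
  assert (vd[0] == da && vd[1] == db);
  assert (vl[0] == la && vl[1] == lb);
}
```

A dg-do run test would simply call `check_pr103568 ()` from `main`; running it on both BE and LE Power10 hardware exercises both new insn_and_split patterns.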