From patchwork Thu Oct 31 17:44:14 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Pengxuan Zheng <quic_pzheng@quicinc.com>
X-Patchwork-Id: 2004755
Return-Path: <gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@legolas.ozlabs.org
Authentication-Results: legolas.ozlabs.org;
	dkim=pass (2048-bit key;
 unprotected) header.d=quicinc.com header.i=@quicinc.com header.a=rsa-sha256
 header.s=qcppdkim1 header.b=DcReAApO;
	dkim-atps=neutral
Authentication-Results: legolas.ozlabs.org;
 spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org
 (client-ip=8.43.85.97; helo=server2.sourceware.org;
 envelope-from=gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org;
 receiver=patchwork.ozlabs.org)
Received: from server2.sourceware.org (server2.sourceware.org [8.43.85.97])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384)
	(No client certificate requested)
	by legolas.ozlabs.org (Postfix) with ESMTPS id 4XfWbY3SgRz1xwF
	for <incoming@patchwork.ozlabs.org>; Fri,  1 Nov 2024 04:44:55 +1100 (AEDT)
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 7CD5A3857348
	for <incoming@patchwork.ozlabs.org>; Thu, 31 Oct 2024 17:44:53 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from mx0b-0031df01.pphosted.com (mx0b-0031df01.pphosted.com
 [205.220.180.131])
 by sourceware.org (Postfix) with ESMTPS id 0C5B23858CDB
 for <gcc-patches@gcc.gnu.org>; Thu, 31 Oct 2024 17:44:32 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 0C5B23858CDB
Authentication-Results: sourceware.org;
 dmarc=pass (p=none dis=none) header.from=quicinc.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=quicinc.com
ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 0C5B23858CDB
Authentication-Results: server2.sourceware.org;
 arc=none smtp.remote-ip=205.220.180.131
ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1730396674; cv=none;
 b=f+vg4kxvs6dPFfVH1TL7ZCftIqgDWleKYAkCkUBna/pmBoseqDrZeoe5eS1Qr9J7cQXwK69fK48bJYxOQYvqA6f9F3PXLXVcHkTogLiRTdufYwYZIIl3Td2V+IKHQoIN2x874P7YMwz+uGgfGENT9jetMvOyNd2LPMrgIIoUBVM=
ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key;
 t=1730396674; c=relaxed/simple;
 bh=QhJPCo6zzDAkEIwFS/pYliSITNwodJZXdJtH9m/5AcY=;
 h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version;
 b=IUENoZy3siKQN8O28r5jpNzBPp09pNYdrdccL0IwQ1NdOANqQWYkxUG0VLEG7boJDw0vqCTshn6vjYEgwMFTZ8ccjDjuyfZWPWxOOQd2HLNjHv6dwcgWUcGV2g3U1+O0+VA59YWO9i6S7Y1zawPgwHg6Sq0Te1vXIt7TWiiut/I=
ARC-Authentication-Results: i=1; server2.sourceware.org
Received: from pps.filterd (m0279872.ppops.net [127.0.0.1])
 by mx0a-0031df01.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id
 49V8aZpO006585
 for <gcc-patches@gcc.gnu.org>; Thu, 31 Oct 2024 17:44:31 GMT
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=quicinc.com; h=
 cc:content-type:date:from:message-id:mime-version:subject:to; s=
 qcppdkim1; bh=0AxCRpy04S0M27zBici7yBoBONuIe9V1KbgezL8pfRs=; b=Dc
 ReAApOJVjt0lTBp7V7TzdJ1yW0AQ7+1hHNMIRJQ/6b1LTQymRnRkWRVVqEYG6rUG
 jPThglkUydAlyjtEQ03yRhqxcnCDEk6dHvOaEStv68ADuUwtPhZsaewkVSh/Z/pF
 eYWMf39mG9WLLvorGekk+1qy6UiMzCJ2/h1L/9AEXviqtBpXaZrsAHb5Gt/0+kbt
 5QTeF5vCT9AhIE8tW+bOqsoyZIo6SSc02LeobaBEIRLdto+vbRgK/cU5hCe3Qpcy
 ARSIGlRepQa+D02EVhzmkiZ5F55horh+tVT9sc0U3WmV21CYVruOmx9hpOij8i/7
 oP3tTAOhUjBj9dIcfZKQ==
Received: from nalasppmta04.qualcomm.com (Global_NAT1.qualcomm.com
 [129.46.96.20])
 by mx0a-0031df01.pphosted.com (PPS) with ESMTPS id 42ku65baxt-1
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT)
 for <gcc-patches@gcc.gnu.org>; Thu, 31 Oct 2024 17:44:31 +0000 (GMT)
Received: from nalasex01c.na.qualcomm.com (nalasex01c.na.qualcomm.com
 [10.47.97.35])
 by NALASPPMTA04.qualcomm.com (8.18.1.2/8.18.1.2) with ESMTPS id
 49VHiUMO030057
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT)
 for <gcc-patches@gcc.gnu.org>; Thu, 31 Oct 2024 17:44:30 GMT
Received: from hu-pzheng-lv.qualcomm.com (10.49.16.6) by
 nalasex01c.na.qualcomm.com (10.47.97.35) with Microsoft SMTP Server
 (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id
 15.2.1544.9; Thu, 31 Oct 2024 10:44:29 -0700
From: Pengxuan Zheng <quic_pzheng@quicinc.com>
To: <gcc-patches@gcc.gnu.org>
CC: Pengxuan Zheng <quic_pzheng@quicinc.com>
Subject: [PATCH] aarch64: Recognize vector permute patterns suitable for FMOV
 [PR100165]
Date: Thu, 31 Oct 2024 10:44:14 -0700
Message-ID: <20241031174414.3514-1-quic_pzheng@quicinc.com>
X-Mailer: git-send-email 2.17.1
MIME-Version: 1.0
X-Originating-IP: [10.49.16.6]
X-ClientProxiedBy: nalasex01b.na.qualcomm.com (10.47.209.197) To
 nalasex01c.na.qualcomm.com (10.47.97.35)
X-QCInternal: smtphost
X-Proofpoint-Virus-Version: vendor=nai engine=6200 definitions=5800
 signatures=585085
X-Proofpoint-GUID: EK00WtPZa6iMnqmwoElcxGNfe6vz-6rQ
X-Proofpoint-ORIG-GUID: EK00WtPZa6iMnqmwoElcxGNfe6vz-6rQ
X-Proofpoint-Virus-Version: vendor=baseguard
 engine=ICAP:2.0.293,Aquarius:18.0.1039,Hydra:6.0.680,FMLib:17.12.60.29
 definitions=2024-09-06_09,2024-09-06_01,2024-09-02_01
X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0
 mlxlogscore=999
 suspectscore=0 bulkscore=0 lowpriorityscore=0 mlxscore=0
 priorityscore=1501 impostorscore=0 phishscore=0 clxscore=1015
 malwarescore=0 spamscore=0 adultscore=0 classifier=spam adjust=0
 reason=mlx scancount=1 engine=8.19.0-2409260000
 definitions=main-2410310134
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.30
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
Errors-To: gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org

This patch expands certain vector permutes as a single FMOV instruction.  It
applies when one of the input vectors is a vector of all zeros and the result
of the permute is the non-zero input vector with its upper lane set to zero
and its lower lane unchanged.
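
As an illustration only, the selector condition described above can be modelled
in plain C.  This is a sketch that mirrors the checks in aarch64_evpc_fmov for
a two-element permute; the helper name fmov_selector_p and its parameters are
hypothetical and are not part of the patch.  Indices 0-1 select from the first
input and 2-3 from the second, as with a two-input __builtin_shuffle.

```c
#include <stdbool.h>

/* Hypothetical model, not GCC code.  Return true if the two-element
   selector {elt0, elt1} keeps one lane of the non-zero input and takes
   the other lane from the all-zeros input, which is the shape the patch
   maps to FMOV.  ZERO_OP0 is true when the first input is the zero
   vector; otherwise the second input is assumed to be the zero vector.  */
static bool
fmov_selector_p (bool zero_op0, bool big_endian, int elt0, int elt1)
{
  /* First selector index belonging to each input.  */
  int nonzero_base = zero_op0 ? 2 : 0;
  int zero_base = zero_op0 ? 0 : 2;

  /* On little-endian, output lane 0 must be lane 0 of the non-zero input.
     On big-endian, output lane 1 must be lane 1 of the non-zero input.  */
  int kept = big_endian ? elt1 : elt0;
  int kept_lane = big_endian ? 1 : 0;

  /* The remaining output lane may come from any lane of the zero input.  */
  int cleared = big_endian ? elt0 : elt1;

  if (kept != nonzero_base + kept_lane)
    return false;

  return cleared == zero_base || cleared == zero_base + 1;
}
```

For example, the little-endian selector { 0, 3 } with a zero second input is
accepted, while { 1, 2 } is rejected because output lane 0 would not come from
lane 0 of the non-zero input.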

The patch also propagates zero_op0_p and zero_op1_p in aarch64_evpc_reencode.
aarch64_evpc_fmov uses them to check whether the input vectors are valid
candidates.

	PR target/100165

gcc/ChangeLog:

	* config/aarch64/aarch64-simd.md (aarch64_simd_vec_set_zero_fmov<mode>):
	New define_insn.
	* config/aarch64/aarch64.cc (aarch64_evpc_reencode): Copy zero_op0_p and
	zero_op1_p.
	(aarch64_evpc_fmov): New function.
	(aarch64_expand_vec_perm_const_1): Add call to aarch64_evpc_fmov.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/vec-set-zero.c: Update test accordingly.
	* gcc.target/aarch64/fmov.c: New test.
	* gcc.target/aarch64/fmov-be.c: New test.

Signed-off-by: Pengxuan Zheng <quic_pzheng@quicinc.com>
---
 gcc/config/aarch64/aarch64-simd.md            |  14 +++
 gcc/config/aarch64/aarch64.cc                 |  74 +++++++++++-
 gcc/testsuite/gcc.target/aarch64/fmov-be.c    |  74 ++++++++++++
 gcc/testsuite/gcc.target/aarch64/fmov.c       | 110 ++++++++++++++++++
 .../gcc.target/aarch64/vec-set-zero.c         |   6 +-
 5 files changed, 275 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/fmov-be.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/fmov.c

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index e456f693d2f..543126948e7 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1190,6 +1190,20 @@ (define_insn "aarch64_simd_vec_set<mode>"
   [(set_attr "type" "neon_ins<q>, neon_from_gp<q>, neon_load1_one_lane<q>")]
 )
 
+(define_insn "aarch64_simd_vec_set_zero_fmov<mode>"
+  [(set (match_operand:VP_2E 0 "register_operand" "=w")
+	(vec_merge:VP_2E
+	    (match_operand:VP_2E 1 "aarch64_simd_imm_zero" "Dz")
+	    (match_operand:VP_2E 3 "register_operand" "w")
+	    (match_operand:SI 2 "immediate_operand" "i")))]
+  "TARGET_SIMD
+   && (ENDIAN_LANE_N (<nunits>, exact_log2 (INTVAL (operands[2]))) == 1)"
+  {
+    return "fmov\\t%<Vetype>0, %<Vetype>3";
+  }
+  [(set_attr "type" "fmov")]
+)
+
 (define_insn "aarch64_simd_vec_set_zero<mode>"
   [(set (match_operand:VALL_F16 0 "register_operand" "=w")
 	(vec_merge:VALL_F16
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index a6cc00e74ab..64756920eda 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -25950,6 +25950,8 @@ aarch64_evpc_reencode (struct expand_vec_perm_d *d)
   newd.target = d->target ? gen_lowpart (new_mode, d->target) : NULL;
   newd.op0 = d->op0 ? gen_lowpart (new_mode, d->op0) : NULL;
   newd.op1 = d->op1 ? gen_lowpart (new_mode, d->op1) : NULL;
+  newd.zero_op0_p = d->zero_op0_p;
+  newd.zero_op1_p = d->zero_op1_p;
   newd.testing_p = d->testing_p;
   newd.one_vector_p = d->one_vector_p;
 
@@ -26434,6 +26436,74 @@ aarch64_evpc_ins (struct expand_vec_perm_d *d)
   return true;
 }
 
+/* Recognize patterns suitable for the FMOV instructions.  */
+static bool
+aarch64_evpc_fmov (struct expand_vec_perm_d *d)
+{
+  machine_mode mode = d->vmode;
+  unsigned HOST_WIDE_INT nelt;
+
+  if (d->vec_flags != VEC_ADVSIMD)
+    return false;
+
+  /* to_constant is safe since this routine is specific to Advanced SIMD
+     vectors.  */
+  nelt = d->perm.length ().to_constant ();
+
+  /* Either d->op0 or d->op1 should be a vector of all zeros.  */
+  if (nelt != 2 || d->one_vector_p || (!d->zero_op0_p && !d->zero_op1_p))
+    return false;
+
+  HOST_WIDE_INT elt0, elt1;
+  rtx in0 = d->op0;
+  rtx in1 = d->op1;
+
+  if (!d->perm[0].is_constant (&elt0))
+    return false;
+
+  if (!d->perm[1].is_constant (&elt1))
+    return false;
+
+  if (!BYTES_BIG_ENDIAN)
+    {
+      /* Lane 0 of the output vector should come from lane 0 of the non-zero
+	 vector.  */
+      if (elt0 != (d->zero_op0_p ? 2 : 0))
+	return false;
+
+      /* Lane 1 of the output vector should come from any lane of the zero
+	 vector.  */
+      if (elt1 != (d->zero_op0_p ? 0 : 2) && elt1 != (d->zero_op0_p ? 1 : 3))
+	return false;
+    }
+  else
+    {
+      /* Lane 0 of the output vector should come from any lane of the zero
+	 vector.  */
+      if (elt0 != (d->zero_op0_p ? 0 : 2) && elt0 != (d->zero_op0_p ? 1 : 3))
+	return false;
+
+      /* Lane 1 of the output vector should come from lane 1 of the non-zero
+	 vector.  */
+      if (elt1 != (d->zero_op0_p ? 3 : 1))
+	return false;
+    }
+
+  if (d->testing_p)
+    return true;
+
+  insn_code icode = code_for_aarch64_simd_vec_copy_lane (mode);
+  expand_operand ops[5];
+  create_output_operand (&ops[0], d->target, mode);
+  create_input_operand (&ops[1], d->zero_op0_p ? in1 : in0, mode);
+  create_integer_operand (&ops[2], BYTES_BIG_ENDIAN ? 1 : 2);
+  create_input_operand (&ops[3], d->zero_op0_p ? in0 : in1, mode);
+  create_integer_operand (&ops[4], 0);
+  expand_insn (icode, 5, ops);
+
+  return true;
+}
+
 static bool
 aarch64_expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
 {
@@ -26457,7 +26527,9 @@ aarch64_expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
     {
       if (d->vmode == d->op_mode)
 	{
-	  if (aarch64_evpc_rev_local (d))
+	  if (aarch64_evpc_fmov (d))
+	    return true;
+	  else if (aarch64_evpc_rev_local (d))
 	    return true;
 	  else if (aarch64_evpc_rev_global (d))
 	    return true;
diff --git a/gcc/testsuite/gcc.target/aarch64/fmov-be.c b/gcc/testsuite/gcc.target/aarch64/fmov-be.c
new file mode 100644
index 00000000000..f864d474a7a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/fmov-be.c
@@ -0,0 +1,74 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mbig-endian" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+typedef int v2si __attribute__ ((vector_size (8)));
+typedef long v2di __attribute__ ((vector_size (16)));
+typedef int v4si __attribute__ ((vector_size (16)));
+typedef float v4sf __attribute__ ((vector_size (16)));
+
+/*
+** f_v2si:
+**	fmov	s0, s0
+**	ret
+*/
+v2si
+f_v2si (v2si x)
+{
+  return __builtin_shuffle (x, (v2si){ 0, 0 }, (v2si){ 3, 1 });
+}
+
+/*
+** g_v2si:
+**	fmov	s0, s0
+**	ret
+*/
+v2si
+g_v2si (v2si x)
+{
+  return __builtin_shuffle ((v2si){ 0, 0 }, x, (v2si){ 0, 3 });
+}
+
+/*
+** f_v2di:
+**	fmov	d0, d0
+**	ret
+*/
+v2di
+f_v2di (v2di x)
+{
+  return __builtin_shuffle (x, (v2di){ 0, 0 }, (v2di){ 2, 1 });
+}
+
+/*
+** g_v2di:
+**	fmov	d0, d0
+**	ret
+*/
+v2di
+g_v2di (v2di x)
+{
+  return __builtin_shuffle ((v2di){ 0, 0 }, x, (v2di){ 0, 3 });
+}
+
+/*
+** f_v4si:
+**	fmov	d0, d0
+**	ret
+*/
+v4si
+f_v4si (v4si x)
+{
+  return __builtin_shuffle (x, (v4si){ 0, 0, 0, 0 }, (v4si){ 6, 7, 2, 3 });
+}
+
+/*
+** g_v4si:
+**	fmov	d0, d0
+**	ret
+*/
+v4si
+g_v4si (v4si x)
+{
+  return __builtin_shuffle ((v4si){ 0, 0, 0, 0 }, x, (v4si){ 2, 3, 6, 7 });
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/fmov.c b/gcc/testsuite/gcc.target/aarch64/fmov.c
new file mode 100644
index 00000000000..f40ec56c5f8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/fmov.c
@@ -0,0 +1,110 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+typedef int v2si __attribute__ ((vector_size (8)));
+typedef float v2sf __attribute__ ((vector_size (8)));
+typedef long v2di __attribute__ ((vector_size (16)));
+typedef double v2df __attribute__ ((vector_size (16)));
+typedef int v4si __attribute__ ((vector_size (16)));
+typedef float v4sf __attribute__ ((vector_size (16)));
+
+
+/*
+** f_v2si:
+**	fmov	s0, s0
+**	ret
+*/
+v2si
+f_v2si (v2si x)
+{
+  return __builtin_shuffle (x, (v2si){ 0, 0 }, (v2si){ 0, 3 });
+}
+
+/*
+** g_v2si:
+**	fmov	s0, s0
+**	ret
+*/
+v2si
+g_v2si (v2si x)
+{
+  return __builtin_shuffle ((v2si){ 0, 0 }, x, (v2si){ 2, 0 });
+}
+
+/*
+** f_v2sf:
+**	fmov	s0, s0
+**	ret
+*/
+v2sf
+f_v2sf (v2sf x)
+{
+  return __builtin_shuffle (x, (v2sf){ 0, 0 }, (v2si){ 0, 2 });
+}
+
+/*
+** f_v2di:
+**	fmov	d0, d0
+**	ret
+*/
+v2di
+f_v2di (v2di x)
+{
+  return __builtin_shuffle (x, (v2di){ 0, 0 }, (v2di){ 0, 3 });
+}
+
+/*
+** g_v2di:
+**	fmov	d0, d0
+**	ret
+*/
+v2di
+g_v2di (v2di x)
+{
+  return __builtin_shuffle ((v2di){ 0, 0 }, x, (v2di){ 2, 1 });
+}
+
+/*
+** f_v2df:
+**	fmov	d0, d0
+**	ret
+*/
+v2df
+f_v2df (v2df x)
+{
+  return __builtin_shuffle (x, (v2df){ 0, 0 }, (v2di){ 0, 2 });
+}
+
+/*
+** f_v4si:
+**	fmov	d0, d0
+**	ret
+*/
+v4si
+f_v4si (v4si x)
+{
+  return __builtin_shuffle (x, (v4si){ 0, 0, 0, 0 }, (v4si){ 0, 1, 4, 5 });
+}
+
+/*
+** g_v4si:
+**	fmov	d0, d0
+**	ret
+*/
+v4si
+g_v4si (v4si x)
+{
+  return __builtin_shuffle ((v4si){ 0, 0, 0, 0 }, x, (v4si){ 4, 5, 2, 3 });
+}
+
+/*
+** f_v4sf:
+**	fmov	d0, d0
+**	ret
+*/
+v4sf
+f_v4sf (v4sf x)
+{
+  return __builtin_shuffle (x, (v4sf){ 0, 0, 0, 0 }, (v4si){ 0, 1, 6, 7 });
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/vec-set-zero.c b/gcc/testsuite/gcc.target/aarch64/vec-set-zero.c
index b34b902cf27..9040839931f 100644
--- a/gcc/testsuite/gcc.target/aarch64/vec-set-zero.c
+++ b/gcc/testsuite/gcc.target/aarch64/vec-set-zero.c
@@ -28,8 +28,10 @@ FOO(float64x2_t)
 
 /* { dg-final { scan-assembler-times {ins\tv[0-9]+\.b\[1\], wzr} 2 { target aarch64_little_endian } } } */
 /* { dg-final { scan-assembler-times {ins\tv[0-9]+\.h\[1\], wzr} 4 { target aarch64_little_endian } } } */
-/* { dg-final { scan-assembler-times {ins\tv[0-9]+\.s\[1\], wzr} 4 { target aarch64_little_endian } } } */
-/* { dg-final { scan-assembler-times {ins\tv[0-9]+\.d\[1\], xzr} 2 { target aarch64_little_endian } } } */
+/* { dg-final { scan-assembler-times {ins\tv[0-9]+\.s\[1\], wzr} 2 { target aarch64_little_endian } } } */
+/* { dg-final { scan-assembler-times {ins\tv[0-9]+\.d\[1\], xzr} 0 { target aarch64_little_endian } } } */
+/* { dg-final { scan-assembler-times {fmov\ts0, s0} 2 { target aarch64_little_endian } } } */
+/* { dg-final { scan-assembler-times {fmov\td0, d0} 2 { target aarch64_little_endian } } } */
 
 /* { dg-final { scan-assembler-times {ins\tv[0-9]+\.b\[6\], wzr} 1 { target aarch64_big_endian } } } */
 /* { dg-final { scan-assembler-times {ins\tv[0-9]+\.b\[14\], wzr} 1 { target aarch64_big_endian } } } */