From patchwork Tue May 21 07:13:34 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Jiang, Haochen" <haochen.jiang@intel.com>
X-Patchwork-Id: 1937234
From: Haochen Jiang <haochen.jiang@intel.com>
To: gcc-patches@gcc.gnu.org
Cc: hongtao.liu@intel.com,
	ubizjak@gmail.com
Subject: [PATCH v2] i386: Disable ix86_expand_vecop_qihi2 when
 !TARGET_AVX512BW
Date: Tue, 21 May 2024 15:13:34 +0800
Message-Id: <20240521071334.1450276-1-haochen.jiang@intel.com>
X-Mailer: git-send-email 2.31.1
In-Reply-To: 
 <CAFULd4YnZBs+Ov2efbr4tfu2YUeO2T1qkrs+Kvh7a7LqADZPSA@mail.gmail.com>
References: 
 <CAFULd4YnZBs+Ov2efbr4tfu2YUeO2T1qkrs+Kvh7a7LqADZPSA@mail.gmail.com>
MIME-Version: 1.0

Hi all,

This is the v2 patch to fix PR115069. The new testcase has passed.

Changes in v2:
  - Added a testcase.
  - Changed the comment for the early exit.

Thanks,
Haochen

Without AVX512BW, vpmovwb is not available, so ix86_expand_vecop_qihi2
has to merge the results back with a generic permutation, which ends up
as vpermq.  Since vpermq is slow, return early in that case and fall
back to ix86_expand_vecop_qihi.

gcc/ChangeLog:

	PR target/115069
	* config/i386/i386-expand.cc (ix86_expand_vecop_qihi2):
	Do not enable the optimization when AVX512BW is not enabled.

gcc/testsuite/ChangeLog:

	PR target/115069
	* gcc.target/i386/pr115069.c: New.
---
 gcc/config/i386/i386-expand.cc           |  7 +++
 gcc/testsuite/gcc.target/i386/pr115069.c | 78 ++++++++++++++++++++++++
 2 files changed, 85 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr115069.c
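
For reference, a reduced sketch of the kind of code affected, not part of
the patch.  It isolates one of the byte-vector operations from the OPS
macro in the new testcase (the multiply) and checks it against a scalar
reference.  The names v16qu, mul_bytes, mul_lane_ref and check_mul_bytes
are made up for illustration, and unsigned lanes are used here (the
testcase uses int8_t) so that wraparound is well defined.  No claim is
made about the exact instructions a given compiler version emits for it.

```c
#include <assert.h>
#include <stdint.h>

/* 16 x 8-bit lanes, i.e. V16QImode on x86.  */
typedef uint8_t v16qu __attribute__ ((vector_size (16)));

/* x86 has no packed 8-bit multiply, so the compiler has to widen the
   lanes to 16 bits, multiply, and narrow the products back to bytes.  */
__attribute__ ((noinline)) v16qu
mul_bytes (v16qu a, v16qu b)
{
  return a * b;
}

/* Scalar reference for one lane: multiply and keep the low 8 bits.  */
uint8_t
mul_lane_ref (uint8_t a, uint8_t b)
{
  return (uint8_t) (a * b);
}

/* Compare every lane of the vector result with the scalar reference.  */
void
check_mul_bytes (void)
{
  v16qu a = { 0 };
  v16qu b = { 0 };

  for (int i = 0; i < 16; i++)
    {
      a[i] = (uint8_t) (17 * i + 3);
      b[i] = (uint8_t) (i + 1);
    }

  v16qu r = mul_bytes (a, b);

  for (int i = 0; i < 16; i++)
    assert (r[i] == mul_lane_ref (a[i], b[i]));
}
```

Compiling this with -O2 -mavx2 -S and searching the output for vpermq is
one way to observe which narrowing sequence was chosen; calling
check_mul_bytes from a small driver confirms the result is the same
either way.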

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index a6132911e6a..f7939761879 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -24323,6 +24323,13 @@ ix86_expand_vecop_qihi2 (enum rtx_code code, rtx dest, rtx op1, rtx op2)
   bool op2vec = GET_MODE_CLASS (GET_MODE (op2)) == MODE_VECTOR_INT;
   bool uns_p = code != ASHIFTRT;
 
+  /* Without VPMOVWB (provided by the AVX512BW ISA), the expansion uses a
+     generic permutation to merge the data back into the right place.  This
+     permutation results in VPERMQ, which is slow, so it is better to fall
+     back to ix86_expand_vecop_qihi.  */
+  if (!TARGET_AVX512BW)
+    return false;
+
   if ((qimode == V16QImode && !TARGET_AVX2)
       || (qimode == V32QImode && (!TARGET_AVX512BW || !TARGET_EVEX512))
       /* There are no V64HImode instructions.  */
diff --git a/gcc/testsuite/gcc.target/i386/pr115069.c b/gcc/testsuite/gcc.target/i386/pr115069.c
new file mode 100644
index 00000000000..c4b48b602ef
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr115069.c
@@ -0,0 +1,78 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx2" } */
+/* { dg-final { scan-assembler-not "vpermq" } } */
+
+#include <stdio.h>
+#include <unistd.h>
+#include <stdlib.h>
+#include <inttypes.h>
+
+typedef int8_t  stress_vint8_t  __attribute__ ((vector_size (16)));
+
+#define OPS(a, b, c, s, v23, v3) \
+do {				\
+	a += b;			\
+	a |= b;			\
+	a -= b;			\
+	a &= ~b;		\
+	a *= c;			\
+	a = ~a;			\
+	a *= s;			\
+	a ^= c;			\
+	a <<= 1;		\
+	b >>= 1;		\
+	b += c;			\
+	a %= v23;		\
+	c /= v3;		\
+	b = b ^ c;		\
+	c = b ^ c;		\
+	b = b ^ c;		\
+} while (0)
+
+volatile uint8_t csum8_put;
+
+void stress_vecmath(void)
+{
+	const stress_vint8_t v23_8 = {
+		0x17, 0x17, 0x17, 0x17, 0x17, 0x17, 0x17, 0x17,
+		0x17, 0x17, 0x17, 0x17, 0x17, 0x17, 0x17, 0x17
+	};
+	};
+	const stress_vint8_t v3_8 = {
+		0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03,
+		0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03
+	};
+	stress_vint8_t a8 = {
+		0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
+	};
+	stress_vint8_t b8 = {
+		0x01, 0x23, 0x45, 0x67, 0x89, 0xab, 0xcd, 0xef,
+		0x0f, 0x1e, 0x2d, 0x3c, 0x4b, 0x5a, 0x69, 0x78
+	};
+	stress_vint8_t c8 = {
+		0x01, 0x02, 0x03, 0x02, 0x01, 0x02, 0x03, 0x02,
+		0x03, 0x02, 0x01, 0x02, 0x03, 0x02, 0x01, 0x02
+	};
+	stress_vint8_t s8 = {
+		0x01, 0x01, 0x01, 0x01, 0x02, 0x02, 0x02, 0x02,
+		0x01, 0x01, 0x02, 0x02, 0x01, 0x01, 0x02, 0x02,
+	};
+	const uint8_t csum8_val =  (uint8_t)0x1b;
+	int i;
+	uint8_t csum8;
+
+	for (i = 1000; i; i--) {
+		OPS(a8, b8, c8, s8, v23_8, v3_8);
+		OPS(a8, b8, c8, s8, v23_8, v3_8);
+		OPS(a8, b8, c8, s8, v23_8, v3_8);
+		OPS(a8, b8, c8, s8, v23_8, v3_8);
+		OPS(a8, b8, c8, s8, v23_8, v3_8);
+		OPS(a8, b8, c8, s8, v23_8, v3_8);
+	}
+
+	csum8 = a8[0]  ^ a8[1]  ^ a8[2]  ^ a8[3]  ^
+		a8[4]  ^ a8[5]  ^ a8[6]  ^ a8[7]  ^
+		a8[8]  ^ a8[9]  ^ a8[10] ^ a8[11] ^
+		a8[12] ^ a8[13] ^ a8[14] ^ a8[15];
+	csum8_put = csum8;
+}