From patchwork Thu Jun 13 00:44:40 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: liuhongt
X-Patchwork-Id: 1947159
From: liuhongt
To: gcc-patches@gcc.gnu.org
Cc: crazylht@gmail.com, hjl.tools@gmail.com
Subject: [PATCH] Adjust ix86_rtx_costs for pternlog_operand_p.
Date: Thu, 13 Jun 2024 08:44:40 +0800
Message-Id: <20240613004440.335650-1-hongtao.liu@intel.com>
X-Mailer: git-send-email 2.31.1
List-Id: Gcc-patches mailing list

r15-1100-gec985bc97a0157 improves the handling of ternlog instructions,
so GCC can now recognize many variants of pternlog_operand.  This patch
adjusts rtx_costs accordingly, so that pass_combine can generate more
optimal vpternlog instructions.

For example, in avx512f-vpternlogd-3.c, with the patch, two vpternlog
instructions are combined into one:

<       vpternlogd      $168, %zmm1, %zmm0, %zmm2
<       vpternlogd      $0x55, %zmm2, %zmm2, %zmm2
>       vpternlogd      $87, %zmm1, %zmm0, %zmm2

<       vpand   %xmm0, %xmm1, %xmm0
<       vpternlogd      $0x55, %zmm0, %zmm0, %zmm0
>       vpternlogd      $63, %zmm1, %zmm0, %zmm1
>       vmovdqa %xmm1, %xmm0

<       vpternlogd      $188, %zmm2, %zmm0, %zmm1
<       vpternlogd      $0x55, %zmm1, %zmm1, %zmm1
>       vpternlogd      $37, %zmm0, %zmm2, %zmm1

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ready to push to trunk.

gcc/ChangeLog:

	* config/i386/i386.cc (ix86_rtx_costs): Adjust rtx_cost for
	pternlog_operand under AVX512, also adjust VEC_DUPLICATE
	accordingly since vec_dup:mem can't be that cheap.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx2-pr98461.c: Scan either notl or
	vpternlog.
	* gcc.target/i386/avx512f-pr96891-3.c: Also scan for inversed
	condition.
	* gcc.target/i386/avx512f-vpternlogd-3.c: Adjust vpternlog
	number to 673.
	* gcc.target/i386/avx512f-vpternlogd-4.c: Ditto.
	* gcc.target/i386/avx512f-vpternlogd-5.c: Ditto.
	* gcc.target/i386/sse2-v1ti-vne.c: Add -mno-avx512f.
---
 gcc/config/i386/i386.cc                       | 39 ++++++++++++++++++-
 gcc/testsuite/gcc.target/i386/avx2-pr98461.c  |  2 +-
 .../gcc.target/i386/avx512f-pr96891-3.c       |  2 +-
 .../gcc.target/i386/avx512f-vpternlogd-3.c    |  2 +-
 .../gcc.target/i386/avx512f-vpternlogd-4.c    |  2 +-
 .../gcc.target/i386/avx512f-vpternlogd-5.c    |  2 +-
 gcc/testsuite/gcc.target/i386/sse2-v1ti-vne.c |  2 +-
 7 files changed, 44 insertions(+), 7 deletions(-)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 173db213d14..9fb1ae575dd 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -21571,6 +21571,31 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
     = speed ? ix86_tune_cost : &ix86_size_cost;
   int src_cost;
 
+  /* Handling different vternlog variants.  */
+  if ((GET_MODE_SIZE (mode) == 64
+       ? (TARGET_AVX512F && TARGET_EVEX512)
+       : (TARGET_AVX512VL
+          || (TARGET_AVX512F && TARGET_EVEX512 && !TARGET_PREFER_AVX256)))
+      && GET_MODE_SIZE (mode) >= 16
+      && outer_code_i == SET
+      && ternlog_operand (x, mode))
+    {
+      rtx args[3];
+
+      args[0] = NULL_RTX;
+      args[1] = NULL_RTX;
+      args[2] = NULL_RTX;
+      int idx = ix86_ternlog_idx (x, args);
+      gcc_assert (idx >= 0);
+
+      *total = cost->sse_op;
+      for (int i = 0; i != 3; i++)
+        if (args[i])
+          *total += rtx_cost (args[i], GET_MODE (args[i]), UNSPEC, i, speed);
+      return true;
+    }
+
+
   switch (code)
     {
     case SET:
@@ -22233,6 +22258,9 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
       else if (XINT (x, 1) == UNSPEC_VTERNLOG)
         {
           *total = cost->sse_op;
+          *total += rtx_cost (XVECEXP (x, 0, 0), mode, code, 0, speed);
+          *total += rtx_cost (XVECEXP (x, 0, 1), mode, code, 1, speed);
+          *total += rtx_cost (XVECEXP (x, 0, 2), mode, code, 2, speed);
           return true;
         }
       else if (XINT (x, 1) == UNSPEC_PTEST)
@@ -22260,12 +22288,21 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
 
     case VEC_SELECT:
     case VEC_CONCAT:
-    case VEC_DUPLICATE:
       /* ??? Assume all of these vector manipulation patterns are
          recognizable.  In which case they all pretty much have the
          same cost.  */
       *total = cost->sse_op;
       return true;
+    case VEC_DUPLICATE:
+      *total = rtx_cost (XEXP (x, 0),
+                         GET_MODE (XEXP (x, 0)),
+                         VEC_DUPLICATE, 0, speed);
+      /* It's broadcast instruction, not embedded broadcasting.  */
+      if (outer_code == SET)
+        *total += cost->sse_op;
+
+      return true;
+
     case VEC_MERGE:
       mask = XEXP (x, 2);
       /* This is masked instruction, assume the same cost,
diff --git a/gcc/testsuite/gcc.target/i386/avx2-pr98461.c b/gcc/testsuite/gcc.target/i386/avx2-pr98461.c
index 15f49b864da..225f2ab00e5 100644
--- a/gcc/testsuite/gcc.target/i386/avx2-pr98461.c
+++ b/gcc/testsuite/gcc.target/i386/avx2-pr98461.c
@@ -2,7 +2,7 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mavx2 -masm=att" } */
 /* { dg-final { scan-assembler-times "\tvpmovmskb\t" 6 } } */
-/* { dg-final { scan-assembler-times "\tnotl\t" 6 } } */
+/* { dg-final { scan-assembler-times "\t(?:notl|vpternlog\[dq\])\t" 6 } } */
 /* { dg-final { scan-assembler-not "\tvpcmpeq" } } */
 /* { dg-final { scan-assembler-not "\tvpxor" } } */
 /* { dg-final { scan-assembler-not "\tvpandn" } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-pr96891-3.c b/gcc/testsuite/gcc.target/i386/avx512f-pr96891-3.c
index 06db7521305..5b260818cb3 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-pr96891-3.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-pr96891-3.c
@@ -3,7 +3,7 @@
 /* { dg-final { scan-assembler-not {not[bwlqd]\]} } } */
 /* { dg-final { scan-assembler-times {(?n)vpcmp[bwdq][ \t]*\$5} 4} } */
 /* { dg-final { scan-assembler-times {(?n)vpcmp[bwdq][ \t]*\$6} 4} } */
-/* { dg-final { scan-assembler-times {(?n)vpcmp[bwdq][ \t]*\$7} 4} } */
+/* { dg-final { scan-assembler-times {(?n)vpcmp[bwdq][ \t]*\$[37]} 4} } */
 /* { dg-final { scan-assembler-times {(?n)vcmpp[sd][ \t]*\$5} 2} } */
 /* { dg-final { scan-assembler-times {(?n)vcmpp[sd][ \t]*\$6} 2} } */
 /* { dg-final { scan-assembler-times {(?n)vcmpp[sd][ \t]*\$7} 2} } */
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vpternlogd-3.c b/gcc/testsuite/gcc.target/i386/avx512f-vpternlogd-3.c
index fc66a9f5572..9ed4680346b 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-vpternlogd-3.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vpternlogd-3.c
@@ -952,4 +952,4 @@ V foo_254_3(V a, V b, V c) { return (c|b)|a; }
 
 V foo_255_1(V a, V b, V c) { return (V){~0,~0,~0,~0}; }
 
-/* { dg-final { scan-assembler-times "vpternlogd\[ \\t\]" 694 } } */
+/* { dg-final { scan-assembler-times "vpternlogd\[ \\t\]" 673 } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vpternlogd-4.c b/gcc/testsuite/gcc.target/i386/avx512f-vpternlogd-4.c
index 14296508cac..eb39ffc2564 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-vpternlogd-4.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vpternlogd-4.c
@@ -952,4 +952,4 @@ V foo_254_3(V a, V b, V c) { return (c|b)|a; }
 
 V foo_255_1(V a, V b, V c) { return (V){~0,~0,~0,~0}; }
 
-/* { dg-final { scan-assembler-times "vpternlogd\[ \\t\]" 694 } } */
+/* { dg-final { scan-assembler-times "vpternlogd\[ \\t\]" 673 } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vpternlogd-5.c b/gcc/testsuite/gcc.target/i386/avx512f-vpternlogd-5.c
index 3dbd9545283..85de5b02ce6 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-vpternlogd-5.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vpternlogd-5.c
@@ -952,4 +952,4 @@ V foo_254_3(V a, V b, V c) { return (c|b)|a; }
 
 V foo_255_1(V a, V b, V c) { return (V){~0,~0,~0,~0}; }
 
-/* { dg-final { scan-assembler-times "vpternlogd\[ \\t\]" 679 } } */
+/* { dg-final { scan-assembler-times "vpternlogd\[ \\t\]" 673 } } */
diff --git a/gcc/testsuite/gcc.target/i386/sse2-v1ti-vne.c b/gcc/testsuite/gcc.target/i386/sse2-v1ti-vne.c
index 767b0e4b3ac..2394cff39f2 100644
--- a/gcc/testsuite/gcc.target/i386/sse2-v1ti-vne.c
+++ b/gcc/testsuite/gcc.target/i386/sse2-v1ti-vne.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target int128 } } */
-/* { dg-options "-O2 -msse2" } */
+/* { dg-options "-O2 -msse2 -mno-avx512f" } */
 typedef unsigned __int128 uv1ti __attribute__ ((__vector_size__ (16)));
 typedef unsigned long long uv2di __attribute__ ((__vector_size__ (16)));
 typedef unsigned int uv4si __attribute__ ((__vector_size__ (16)));