From patchwork Fri Jun 21 02:55:43 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: liuhongt
X-Patchwork-Id: 1950524
From: liuhongt
To: gcc-patches@gcc.gnu.org
Cc: crazylht@gmail.com, hjl.tools@gmail.com
Subject: [PATCH] [match.pd] Optimize a < 0 ? -1 : 0 to (signed)a >> 31.
Date: Fri, 21 Jun 2024 10:55:43 +0800
Message-Id: <20240621025543.2470827-1-hongtao.liu@intel.com>
X-Mailer: git-send-email 2.31.1

Try to optimize x < 0 ? -1 : 0 into (signed) x >> 31
and x < 0 ? 1 : 0 into (unsigned) x >> 31.  The shift count is the
element precision minus one; 31 is the 32-bit case.

Move the optimization done in ix86_expand_int_vcond to match.pd.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}, aarch64-linux-gnu.
Ok for trunk?

gcc/ChangeLog:

	PR target/114189
	* match.pd: Simplify a < 0 ? -1 : 0 to (signed) a >> 31 and
	a < 0 ? 1 : 0 to (unsigned) a >> 31 for vector integer type.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx2-pr115517.c: New test.
	* gcc.target/i386/avx512-pr115517.c: New test.
	* g++.target/i386/avx2-pr115517.C: New test.
	* g++.target/i386/avx512-pr115517.C: New test.
	* g++.dg/tree-ssa/pr88152-1.C: Adjust testcase.
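
For illustration only, the per-lane identity behind the two patterns can be sketched in scalar C for a 32-bit element. This is not part of the patch: the function names below are hypothetical, and the sketch assumes that >> on a negative signed value is an arithmetic shift (which GCC guarantees, although ISO C leaves it implementation-defined).

```c
#include <assert.h>
#include <stdint.h>

/* Select form of the first pattern: x < 0 ? -1 : 0 for one lane.  */
static int32_t
mask_by_select (int32_t x)
{
  return x < 0 ? -1 : 0;
}

/* Shift form: an arithmetic right shift by (precision - 1) replicates
   the sign bit into every bit.  Assumes arithmetic >> on signed values.  */
static int32_t
mask_by_shift (int32_t x)
{
  return x >> 31;
}

/* Select form of the second pattern: x < 0 ? 1 : 0 for one lane.  */
static int32_t
flag_by_select (int32_t x)
{
  return x < 0 ? 1 : 0;
}

/* Shift form: a logical right shift on the unsigned view moves the
   sign bit down to bit 0, then the result is viewed as signed again.  */
static int32_t
flag_by_shift (int32_t x)
{
  return (int32_t) ((uint32_t) x >> 31);
}

/* Return nonzero when both shift forms agree with their select forms.  */
static int
lanes_agree (int32_t x)
{
  return mask_by_select (x) == mask_by_shift (x)
	 && flag_by_select (x) == flag_by_shift (x);
}

/* Hypothetical helper: abort if the identity fails for X.  */
static void
check_lane (int32_t x)
{
  assert (lanes_agree (x));
}
```

Calling check_lane on values such as INT32_MIN, -1, 0 and INT32_MAX exercises the boundary cases; the vector patterns in match.pd apply the same reasoning to each element, with 31 replaced by the element precision minus one.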
---
 gcc/match.pd                                  | 28 ++++++++
 gcc/testsuite/g++.dg/tree-ssa/pr88152-1.C     |  2 +-
 gcc/testsuite/g++.target/i386/avx2-pr115517.C | 60 ++++++++++++++++
 .../g++.target/i386/avx512-pr115517.C         | 70 +++++++++++++++++++
 gcc/testsuite/gcc.target/i386/avx2-pr115517.c | 33 +++++++++
 .../gcc.target/i386/avx512-pr115517.c         | 70 +++++++++++++++++++
 6 files changed, 262 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.target/i386/avx2-pr115517.C
 create mode 100644 gcc/testsuite/g++.target/i386/avx512-pr115517.C
 create mode 100644 gcc/testsuite/gcc.target/i386/avx2-pr115517.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512-pr115517.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 3d0689c9312..41dd90493e7 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -5927,6 +5927,34 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (if (VECTOR_INTEGER_TYPE_P (type)
        && target_supports_op_p (type, MINMAX, optab_vector))
    (minmax @0 @1))))
+
+/* Try to optimize x < 0 ? -1 : 0 into (signed) x >> 31
+   and x < 0 ? 1 : 0 into (unsigned) x >> 31.  */
+(simplify
+  (vec_cond (lt @0 integer_zerop) integer_all_onesp integer_zerop)
+   (if (VECTOR_INTEGER_TYPE_P (type)
+        && !TYPE_UNSIGNED (type)
+        && target_supports_op_p (type, RSHIFT_EXPR, optab_scalar))
+    (with
+      {
+        unsigned int prec = element_precision (type);
+      }
+    (rshift @0 { build_int_cst (integer_type_node, prec - 1);}))))
+
+(simplify
+  (vec_cond (lt @0 integer_zerop) integer_onep integer_zerop)
+   (if (VECTOR_INTEGER_TYPE_P (type)
+        && !TYPE_UNSIGNED (type)
+        && target_supports_op_p (unsigned_type_for (type),
+                                 RSHIFT_EXPR, optab_scalar))
+    (with
+      {
+        unsigned int prec = element_precision (type);
+        tree utype = unsigned_type_for (type);
+      }
+    (view_convert:type
+      (rshift (view_convert:utype @0)
+              { build_int_cst (integer_type_node, prec - 1);})))))
 #endif
 
 (for cnd (cond vec_cond)
diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr88152-1.C b/gcc/testsuite/g++.dg/tree-ssa/pr88152-1.C
index 423ec897c1d..21299b886f0 100644
--- a/gcc/testsuite/g++.dg/tree-ssa/pr88152-1.C
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr88152-1.C
@@ -1,7 +1,7 @@
 // PR target/88152
 // { dg-do compile }
 // { dg-options "-O2 -std=c++14 -fdump-tree-forwprop1" }
-// { dg-final { scan-tree-dump-times " (?:<|>=) \{ 0\[, ]" 120 "forwprop1" } }
+// { dg-final { scan-tree-dump-times " (?:(?:<|>=) \{ 0\[, \]|>> (?:7|15|31|63))" 120 "forwprop1" } }
 
 template
 using V [[gnu::vector_size (sizeof (T) * N)]] = T;
diff --git a/gcc/testsuite/g++.target/i386/avx2-pr115517.C b/gcc/testsuite/g++.target/i386/avx2-pr115517.C
new file mode 100644
index 00000000000..ec000c57542
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/avx2-pr115517.C
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx2 -O2" } */
+/* { dg-final { scan-assembler-times "vpsrlq" 2 } } */
+/* { dg-final { scan-assembler-times "vpsrld" 2 } } */
+/* { dg-final { scan-assembler-times "vpsrlw" 2 } } */
+
+typedef short v8hi __attribute__((vector_size(16)));
+typedef short v16hi __attribute__((vector_size(32)));
+typedef int v4si __attribute__((vector_size(16)));
+typedef int v8si __attribute__((vector_size(32)));
+typedef long long v2di __attribute__((vector_size(16)));
+typedef long long v4di __attribute__((vector_size(32)));
+
+v8hi
+foo (v8hi a)
+{
+  v8hi const1_op = __extension__(v8hi){1,1,1,1,1,1,1,1};
+  v8hi const0_op = __extension__(v8hi){0,0,0,0,0,0,0,0};
+  return a < const0_op ? const1_op : const0_op;
+}
+
+v16hi
+foo2 (v16hi a)
+{
+  v16hi const1_op = __extension__(v16hi){1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1};
+  v16hi const0_op = __extension__(v16hi){0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
+  return a < const0_op ? const1_op : const0_op;
+}
+
+v4si
+foo3 (v4si a)
+{
+  v4si const1_op = __extension__(v4si){1,1,1,1};
+  v4si const0_op = __extension__(v4si){0,0,0,0};
+  return a < const0_op ? const1_op : const0_op;
+}
+
+v8si
+foo4 (v8si a)
+{
+  v8si const1_op = __extension__(v8si){1,1,1,1,1,1,1,1};
+  v8si const0_op = __extension__(v8si){0,0,0,0,0,0,0,0};
+  return a < const0_op ? const1_op : const0_op;
+}
+
+v2di
+foo3 (v2di a)
+{
+  v2di const1_op = __extension__(v2di){1,1};
+  v2di const0_op = __extension__(v2di){0,0};
+  return a < const0_op ? const1_op : const0_op;
+}
+
+v4di
+foo4 (v4di a)
+{
+  v4di const1_op = __extension__(v4di){1,1,1,1};
+  v4di const0_op = __extension__(v4di){0,0,0,0};
+  return a < const0_op ? const1_op : const0_op;
+}
diff --git a/gcc/testsuite/g++.target/i386/avx512-pr115517.C b/gcc/testsuite/g++.target/i386/avx512-pr115517.C
new file mode 100644
index 00000000000..22df41bbdc9
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/avx512-pr115517.C
@@ -0,0 +1,70 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vpsrad" 3 } } */
+/* { dg-final { scan-assembler-times "vpsraw" 3 } } */
+/* { dg-final { scan-assembler-times "vpsraq" 3 } } */
+
+typedef short v8hi __attribute__((vector_size(16)));
+typedef short v16hi __attribute__((vector_size(32)));
+typedef short v32hi __attribute__((vector_size(64)));
+typedef int v4si __attribute__((vector_size(16)));
+typedef int v8si __attribute__((vector_size(32)));
+typedef int v16si __attribute__((vector_size(64)));
+typedef long long v2di __attribute__((vector_size(16)));
+typedef long long v4di __attribute__((vector_size(32)));
+typedef long long v8di __attribute__((vector_size(64)));
+
+v8hi
+foo (v8hi a)
+{
+  return a < __extension__(v8hi) { 0, 0, 0, 0, 0, 0, 0, 0};
+}
+
+v16hi
+foo2 (v16hi a)
+{
+  return a < __extension__(v16hi) { 0, 0, 0, 0, 0, 0, 0, 0,0, 0, 0, 0, 0, 0, 0, 0};
+}
+
+v32hi
+foo3 (v32hi a)
+{
+  return a < __extension__(v32hi) { 0, 0, 0, 0, 0, 0, 0, 0,0, 0, 0, 0, 0, 0, 0, 0,
+                                    0, 0, 0, 0, 0, 0, 0, 0,0, 0, 0, 0, 0, 0, 0, 0};
+}
+
+v4si
+foo4 (v4si a)
+{
+  return a < __extension__(v4si) { 0, 0, 0, 0};
+}
+
+v8si
+foo5 (v8si a)
+{
+  return a < __extension__(v8si) { 0, 0, 0, 0, 0, 0, 0, 0};
+}
+
+v16si
+foo6 (v16si a)
+{
+  return a < __extension__(v16si) { 0, 0, 0, 0, 0, 0, 0, 0,0, 0, 0, 0, 0, 0, 0, 0};
+}
+
+v2di
+foo7 (v2di a)
+{
+  return a < __extension__(v2di) { 0, 0};
+}
+
+v4di
+foo8 (v4di a)
+{
+  return a < __extension__(v4di) { 0, 0, 0, 0};
+}
+
+v8di
+foo9 (v8di a)
+{
+  return a < __extension__(v8di) { 0, 0, 0, 0, 0, 0, 0, 0};
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx2-pr115517.c b/gcc/testsuite/gcc.target/i386/avx2-pr115517.c
new file mode 100644
index 00000000000..5b2620b0dc1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx2-pr115517.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx2 -O2" } */
+/* { dg-final { scan-assembler-times "vpsrad" 2 } } */
+/* { dg-final { scan-assembler-times "vpsraw" 2 } } */
+
+typedef short v8hi __attribute__((vector_size(16)));
+typedef short v16hi __attribute__((vector_size(32)));
+typedef int v4si __attribute__((vector_size(16)));
+typedef int v8si __attribute__((vector_size(32)));
+
+v8hi
+foo (v8hi a)
+{
+  return a < __extension__(v8hi) { 0, 0, 0, 0, 0, 0, 0, 0};
+}
+
+v16hi
+foo2 (v16hi a)
+{
+  return a < __extension__(v16hi) { 0, 0, 0, 0, 0, 0, 0, 0,0, 0, 0, 0, 0, 0, 0, 0};
+}
+
+v4si
+foo3 (v4si a)
+{
+  return a < __extension__(v4si) { 0, 0, 0, 0};
+}
+
+v8si
+foo4 (v8si a)
+{
+  return a < __extension__(v8si) { 0, 0, 0, 0, 0, 0, 0, 0};
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512-pr115517.c b/gcc/testsuite/gcc.target/i386/avx512-pr115517.c
new file mode 100644
index 00000000000..22df41bbdc9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512-pr115517.c
@@ -0,0 +1,70 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vpsrad" 3 } } */
+/* { dg-final { scan-assembler-times "vpsraw" 3 } } */
+/* { dg-final { scan-assembler-times "vpsraq" 3 } } */
+
+typedef short v8hi __attribute__((vector_size(16)));
+typedef short v16hi __attribute__((vector_size(32)));
+typedef short v32hi __attribute__((vector_size(64)));
+typedef int v4si __attribute__((vector_size(16)));
+typedef int v8si __attribute__((vector_size(32)));
+typedef int v16si __attribute__((vector_size(64)));
+typedef long long v2di __attribute__((vector_size(16)));
+typedef long long v4di __attribute__((vector_size(32)));
+typedef long long v8di __attribute__((vector_size(64)));
+
+v8hi
+foo (v8hi a)
+{
+  return a < __extension__(v8hi) { 0, 0, 0, 0, 0, 0, 0, 0};
+}
+
+v16hi
+foo2 (v16hi a)
+{
+  return a < __extension__(v16hi) { 0, 0, 0, 0, 0, 0, 0, 0,0, 0, 0, 0, 0, 0, 0, 0};
+}
+
+v32hi
+foo3 (v32hi a)
+{
+  return a < __extension__(v32hi) { 0, 0, 0, 0, 0, 0, 0, 0,0, 0, 0, 0, 0, 0, 0, 0,
+                                    0, 0, 0, 0, 0, 0, 0, 0,0, 0, 0, 0, 0, 0, 0, 0};
+}
+
+v4si
+foo4 (v4si a)
+{
+  return a < __extension__(v4si) { 0, 0, 0, 0};
+}
+
+v8si
+foo5 (v8si a)
+{
+  return a < __extension__(v8si) { 0, 0, 0, 0, 0, 0, 0, 0};
+}
+
+v16si
+foo6 (v16si a)
+{
+  return a < __extension__(v16si) { 0, 0, 0, 0, 0, 0, 0, 0,0, 0, 0, 0, 0, 0, 0, 0};
+}
+
+v2di
+foo7 (v2di a)
+{
+  return a < __extension__(v2di) { 0, 0};
+}
+
+v4di
+foo8 (v4di a)
+{
+  return a < __extension__(v4di) { 0, 0, 0, 0};
+}
+
+v8di
+foo9 (v8di a)
+{
+  return a < __extension__(v8di) { 0, 0, 0, 0, 0, 0, 0, 0};
+}