From patchwork Sun Jun 23 23:25:56 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: liuhongt
X-Patchwork-Id: 1951359
From: liuhongt
To: gcc-patches@gcc.gnu.org
Cc: crazylht@gmail.com, hjl.tools@gmail.com
Subject: [PATCH V2] [x86] Optimize a < 0 ? -1 : 0 to (signed)a >> 31.
Date: Mon, 24 Jun 2024 07:25:56 +0800
Message-Id: <20240623232556.314365-1-hongtao.liu@intel.com>
X-Mailer: git-send-email 2.31.1
List-Id: Gcc-patches mailing list

> I think the check for TYPE_UNSIGNED should be of TREE_TYPE (@0) rather
> than type here.

Changed.

> Or maybe you need `types_match (type, TREE_TYPE (@0))` too.

Changed to use tree_nop_conversion_p (type, TREE_TYPE (@0)) and added
view_convert to the rshift.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?

Try to optimize x < 0 ? -1 : 0 into (signed) x >> 31
and x < 0 ? 1 : 0 into (unsigned) x >> 31 for vector integer types,
where 31 stands for the element precision minus one.

Move the optimization done in ix86_expand_int_vcond to match.pd.

gcc/ChangeLog:

	PR target/114189
	* match.pd: Simplify a < 0 ? -1 : 0 to (signed) a >> 31
	and a < 0 ? 1 : 0 to (unsigned) a >> 31 for vector integer
	type.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx2-pr115517.c: New test.
	* gcc.target/i386/avx512-pr115517.c: New test.
	* g++.target/i386/avx2-pr115517.C: New test.
	* g++.target/i386/avx512-pr115517.C: New test.
	* g++.dg/tree-ssa/pr88152-1.C: Adjust testcase.
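For reviewers who want to convince themselves of the identity, here is a
minimal scalar sketch of the two rewrites for 32-bit elements. It is not
part of the patch: the function names are made up for illustration, and
it assumes that >> on a negative signed value is an arithmetic shift,
which GCC documents but ISO C leaves implementation-defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reference form of the first pattern: x < 0 ? -1 : 0.  */
int32_t mask_select (int32_t x) { return x < 0 ? -1 : 0; }

/* Shift form: an arithmetic right shift by (precision - 1) copies the
   sign bit into every bit position.  Assumes arithmetic >> on signed
   values (true for GCC, implementation-defined in ISO C).  */
int32_t mask_shift (int32_t x) { return x >> 31; }

/* Reference form of the second pattern: x < 0 ? 1 : 0.  */
int32_t bit_select (int32_t x) { return x < 0 ? 1 : 0; }

/* Shift form: a logical right shift on the unsigned reinterpretation
   leaves only the sign bit, in bit 0.  */
int32_t bit_shift (int32_t x) { return (int32_t) ((uint32_t) x >> 31); }

/* Compare both forms on a few boundary values (hypothetical helper).  */
void check_forms (void)
{
  static const int32_t samples[] =
    { INT32_MIN, -2, -1, 0, 1, 2, INT32_MAX };
  for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
    {
      assert (mask_select (samples[i]) == mask_shift (samples[i]));
      assert (bit_select (samples[i]) == bit_shift (samples[i]));
    }
}
```

The match.pd patterns apply the same idea per vector element, using
element_precision - 1 as the shift count instead of a hard-coded 31.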
---
 gcc/match.pd                                  | 31 ++++++++
 gcc/testsuite/g++.dg/tree-ssa/pr88152-1.C     |  2 +-
 gcc/testsuite/g++.target/i386/avx2-pr115517.C | 60 ++++++++++++++++
 .../g++.target/i386/avx512-pr115517.C         | 70 +++++++++++++++++++
 gcc/testsuite/gcc.target/i386/avx2-pr115517.c | 33 +++++++++
 .../gcc.target/i386/avx512-pr115517.c         | 70 +++++++++++++++++++
 6 files changed, 265 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.target/i386/avx2-pr115517.C
 create mode 100644 gcc/testsuite/g++.target/i386/avx512-pr115517.C
 create mode 100644 gcc/testsuite/gcc.target/i386/avx2-pr115517.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512-pr115517.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 3d0689c9312..1d10451d0de 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -5927,6 +5927,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (if (VECTOR_INTEGER_TYPE_P (type)
        && target_supports_op_p (type, MINMAX, optab_vector))
    (minmax @0 @1))))
+
+/* Try to optimize x < 0 ? -1 : 0 into (signed) x >> 31
+   and x < 0 ? 1 : 0 into (unsigned) x >> 31.  */
+(simplify
+ (vec_cond (lt @0 integer_zerop) integer_all_onesp integer_zerop)
+ (if (VECTOR_INTEGER_TYPE_P (TREE_TYPE (@0))
+      && !TYPE_UNSIGNED (TREE_TYPE (@0))
+      && tree_nop_conversion_p (type, TREE_TYPE (@0))
+      && target_supports_op_p (TREE_TYPE (@0), RSHIFT_EXPR, optab_scalar))
+  (with
+   {
+     unsigned int prec = element_precision (TREE_TYPE (@0));
+   }
+   (view_convert:type
+    (rshift @0 { build_int_cst (integer_type_node, prec - 1);})))))
+
+(simplify
+ (vec_cond (lt @0 integer_zerop) integer_onep integer_zerop)
+ (if (VECTOR_INTEGER_TYPE_P (TREE_TYPE (@0))
+      && !TYPE_UNSIGNED (TREE_TYPE (@0))
+      && tree_nop_conversion_p (type, TREE_TYPE (@0))
+      && target_supports_op_p (unsigned_type_for (TREE_TYPE (@0)),
+                               RSHIFT_EXPR, optab_scalar))
+  (with
+   {
+     unsigned int prec = element_precision (TREE_TYPE (@0));
+     tree utype = unsigned_type_for (TREE_TYPE (@0));
+   }
+   (view_convert:type
+    (rshift (view_convert:utype @0)
+            { build_int_cst (integer_type_node, prec - 1);})))))
 #endif
 
 (for cnd (cond vec_cond)
diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr88152-1.C b/gcc/testsuite/g++.dg/tree-ssa/pr88152-1.C
index 423ec897c1d..21299b886f0 100644
--- a/gcc/testsuite/g++.dg/tree-ssa/pr88152-1.C
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr88152-1.C
@@ -1,7 +1,7 @@
 // PR target/88152
 // { dg-do compile }
 // { dg-options "-O2 -std=c++14 -fdump-tree-forwprop1" }
-// { dg-final { scan-tree-dump-times " (?:<|>=) \{ 0\[, ]" 120 "forwprop1" } }
+// { dg-final { scan-tree-dump-times " (?:(?:<|>=) \{ 0\[, \]|>> (?:7|15|31|63))" 120 "forwprop1" } }
 
 template using V [[gnu::vector_size (sizeof (T) * N)]] = T;
diff --git a/gcc/testsuite/g++.target/i386/avx2-pr115517.C b/gcc/testsuite/g++.target/i386/avx2-pr115517.C
new file mode 100644
index 00000000000..ec000c57542
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/avx2-pr115517.C
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx2 -O2" } */
+/* { dg-final { scan-assembler-times "vpsrlq" 2 } } */
+/* { dg-final { scan-assembler-times "vpsrld" 2 } } */
+/* { dg-final { scan-assembler-times "vpsrlw" 2 } } */
+
+typedef short v8hi __attribute__((vector_size(16)));
+typedef short v16hi __attribute__((vector_size(32)));
+typedef int v4si __attribute__((vector_size(16)));
+typedef int v8si __attribute__((vector_size(32)));
+typedef long long v2di __attribute__((vector_size(16)));
+typedef long long v4di __attribute__((vector_size(32)));
+
+v8hi
+foo (v8hi a)
+{
+  v8hi const1_op = __extension__(v8hi){1,1,1,1,1,1,1,1};
+  v8hi const0_op = __extension__(v8hi){0,0,0,0,0,0,0,0};
+  return a < const0_op ? const1_op : const0_op;
+}
+
+v16hi
+foo2 (v16hi a)
+{
+  v16hi const1_op = __extension__(v16hi){1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1};
+  v16hi const0_op = __extension__(v16hi){0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
+  return a < const0_op ? const1_op : const0_op;
+}
+
+v4si
+foo3 (v4si a)
+{
+  v4si const1_op = __extension__(v4si){1,1,1,1};
+  v4si const0_op = __extension__(v4si){0,0,0,0};
+  return a < const0_op ? const1_op : const0_op;
+}
+
+v8si
+foo4 (v8si a)
+{
+  v8si const1_op = __extension__(v8si){1,1,1,1,1,1,1,1};
+  v8si const0_op = __extension__(v8si){0,0,0,0,0,0,0,0};
+  return a < const0_op ? const1_op : const0_op;
+}
+
+v2di
+foo3 (v2di a)
+{
+  v2di const1_op = __extension__(v2di){1,1};
+  v2di const0_op = __extension__(v2di){0,0};
+  return a < const0_op ? const1_op : const0_op;
+}
+
+v4di
+foo4 (v4di a)
+{
+  v4di const1_op = __extension__(v4di){1,1,1,1};
+  v4di const0_op = __extension__(v4di){0,0,0,0};
+  return a < const0_op ? const1_op : const0_op;
+}
diff --git a/gcc/testsuite/g++.target/i386/avx512-pr115517.C b/gcc/testsuite/g++.target/i386/avx512-pr115517.C
new file mode 100644
index 00000000000..22df41bbdc9
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/avx512-pr115517.C
@@ -0,0 +1,70 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vpsrad" 3 } } */
+/* { dg-final { scan-assembler-times "vpsraw" 3 } } */
+/* { dg-final { scan-assembler-times "vpsraq" 3 } } */
+
+typedef short v8hi __attribute__((vector_size(16)));
+typedef short v16hi __attribute__((vector_size(32)));
+typedef short v32hi __attribute__((vector_size(64)));
+typedef int v4si __attribute__((vector_size(16)));
+typedef int v8si __attribute__((vector_size(32)));
+typedef int v16si __attribute__((vector_size(64)));
+typedef long long v2di __attribute__((vector_size(16)));
+typedef long long v4di __attribute__((vector_size(32)));
+typedef long long v8di __attribute__((vector_size(64)));
+
+v8hi
+foo (v8hi a)
+{
+  return a < __extension__(v8hi) { 0, 0, 0, 0, 0, 0, 0, 0};
+}
+
+v16hi
+foo2 (v16hi a)
+{
+  return a < __extension__(v16hi) { 0, 0, 0, 0, 0, 0, 0, 0,0, 0, 0, 0, 0, 0, 0, 0};
+}
+
+v32hi
+foo3 (v32hi a)
+{
+  return a < __extension__(v32hi) { 0, 0, 0, 0, 0, 0, 0, 0,0, 0, 0, 0, 0, 0, 0, 0,
+    0, 0, 0, 0, 0, 0, 0, 0,0, 0, 0, 0, 0, 0, 0, 0};
+}
+
+v4si
+foo4 (v4si a)
+{
+  return a < __extension__(v4si) { 0, 0, 0, 0};
+}
+
+v8si
+foo5 (v8si a)
+{
+  return a < __extension__(v8si) { 0, 0, 0, 0, 0, 0, 0, 0};
+}
+
+v16si
+foo6 (v16si a)
+{
+  return a < __extension__(v16si) { 0, 0, 0, 0, 0, 0, 0, 0,0, 0, 0, 0, 0, 0, 0, 0};
+}
+
+v2di
+foo7 (v2di a)
+{
+  return a < __extension__(v2di) { 0, 0};
+}
+
+v4di
+foo8 (v4di a)
+{
+  return a < __extension__(v4di) { 0, 0, 0, 0};
+}
+
+v8di
+foo9 (v8di a)
+{
+  return a < __extension__(v8di) { 0, 0, 0, 0, 0, 0, 0, 0};
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx2-pr115517.c b/gcc/testsuite/gcc.target/i386/avx2-pr115517.c
new file mode 100644
index 00000000000..5b2620b0dc1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx2-pr115517.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx2 -O2" } */
+/* { dg-final { scan-assembler-times "vpsrad" 2 } } */
+/* { dg-final { scan-assembler-times "vpsraw" 2 } } */
+
+typedef short v8hi __attribute__((vector_size(16)));
+typedef short v16hi __attribute__((vector_size(32)));
+typedef int v4si __attribute__((vector_size(16)));
+typedef int v8si __attribute__((vector_size(32)));
+
+v8hi
+foo (v8hi a)
+{
+  return a < __extension__(v8hi) { 0, 0, 0, 0, 0, 0, 0, 0};
+}
+
+v16hi
+foo2 (v16hi a)
+{
+  return a < __extension__(v16hi) { 0, 0, 0, 0, 0, 0, 0, 0,0, 0, 0, 0, 0, 0, 0, 0};
+}
+
+v4si
+foo3 (v4si a)
+{
+  return a < __extension__(v4si) { 0, 0, 0, 0};
+}
+
+v8si
+foo4 (v8si a)
+{
+  return a < __extension__(v8si) { 0, 0, 0, 0, 0, 0, 0, 0};
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512-pr115517.c b/gcc/testsuite/gcc.target/i386/avx512-pr115517.c
new file mode 100644
index 00000000000..22df41bbdc9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512-pr115517.c
@@ -0,0 +1,70 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vpsrad" 3 } } */
+/* { dg-final { scan-assembler-times "vpsraw" 3 } } */
+/* { dg-final { scan-assembler-times "vpsraq" 3 } } */
+
+typedef short v8hi __attribute__((vector_size(16)));
+typedef short v16hi __attribute__((vector_size(32)));
+typedef short v32hi __attribute__((vector_size(64)));
+typedef int v4si __attribute__((vector_size(16)));
+typedef int v8si __attribute__((vector_size(32)));
+typedef int v16si __attribute__((vector_size(64)));
+typedef long long v2di __attribute__((vector_size(16)));
+typedef long long v4di __attribute__((vector_size(32)));
+typedef long long v8di __attribute__((vector_size(64)));
+
+v8hi
+foo (v8hi a)
+{
+  return a < __extension__(v8hi) { 0, 0, 0, 0, 0, 0, 0, 0};
+}
+
+v16hi
+foo2 (v16hi a)
+{
+  return a < __extension__(v16hi) { 0, 0, 0, 0, 0, 0, 0, 0,0, 0, 0, 0, 0, 0, 0, 0};
+}
+
+v32hi
+foo3 (v32hi a)
+{
+  return a < __extension__(v32hi) { 0, 0, 0, 0, 0, 0, 0, 0,0, 0, 0, 0, 0, 0, 0, 0,
+    0, 0, 0, 0, 0, 0, 0, 0,0, 0, 0, 0, 0, 0, 0, 0};
+}
+
+v4si
+foo4 (v4si a)
+{
+  return a < __extension__(v4si) { 0, 0, 0, 0};
+}
+
+v8si
+foo5 (v8si a)
+{
+  return a < __extension__(v8si) { 0, 0, 0, 0, 0, 0, 0, 0};
+}
+
+v16si
+foo6 (v16si a)
+{
+  return a < __extension__(v16si) { 0, 0, 0, 0, 0, 0, 0, 0,0, 0, 0, 0, 0, 0, 0, 0};
+}
+
+v2di
+foo7 (v2di a)
+{
+  return a < __extension__(v2di) { 0, 0};
+}
+
+v4di
+foo8 (v4di a)
+{
+  return a < __extension__(v4di) { 0, 0, 0, 0};
+}
+
+v8di
+foo9 (v8di a)
+{
+  return a < __extension__(v8di) { 0, 0, 0, 0, 0, 0, 0, 0};
+}