From patchwork Thu Oct 26 01:14:31 2023
X-Patchwork-Submitter: "Liu, Hongtao"
X-Patchwork-Id: 1855557
From: liuhongt
To: gcc-patches@gcc.gnu.org
Cc: crazylht@gmail.com, hjl.tools@gmail.com
Subject: [PATCH V2 1/2] Pass type of comparison operands instead of comparison result to truth_type_for in build_vec_cmp.
Date: Thu, 26 Oct 2023 09:14:31 +0800
Message-Id: <20231026011432.3484969-1-hongtao.liu@intel.com>

>I think it's indeed on purpose that the result of v1 < v2 is a signed
>integer vector type.
>But build_vec_cmp should not use the truth type for the result but instead the
>truth type for the comparison, so

Change build_vec_cmp in both the C and C++ front ends. Note that the jit
part already uses the type of the comparison operands instead of the
result type.

gcc/c/ChangeLog:

	* c-typeck.cc (build_vec_cmp): Pass type of arg0 to truth_type_for.

gcc/cp/ChangeLog:

	* typeck.cc (build_vec_cmp): Pass type of arg0 to truth_type_for.
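Below is a minimal C sketch of the source-level semantics that build_vec_cmp implements. It uses GCC's generic vector extension, and the typedef and function names are hypothetical, chosen only for illustration. The value of `a < b` on two float vectors has a signed integer vector type (v4si here). The comparison itself, however, operates on the operand type (v4sf). With this patch, the front ends pass the operand type, TREE_TYPE (arg0), to truth_type_for instead of the result type.

```c
#include <assert.h>

/* Hypothetical typedefs: a 4 x float vector and its signed integer
   counterpart of the same width (GCC generic vector extension).  */
typedef float v4sf __attribute__ ((vector_size (16)));
typedef int v4si __attribute__ ((vector_size (16)));

/* Comparing two float vectors yields a signed integer vector with the
   same lane count and width: -1 (all bits set) where the comparison
   holds, 0 where it does not.  */
v4si
lt_mask (v4sf a, v4sf b)
{
  return a < b;
}

/* Lane-by-lane spelling of the same result, mirroring the
   VEC_COND_EXPR <cmp, {-1, ...}, {0, ...}> that build_vec_cmp builds.  */
v4si
lt_mask_by_lane (v4sf a, v4sf b)
{
  v4si r = { 0, 0, 0, 0 };
  for (int i = 0; i < 4; i++)
    r[i] = a[i] < b[i] ? -1 : 0;
  return r;
}

/* Aborts if the two spellings disagree for A and B.  */
void
check_lt_mask (v4sf a, v4sf b)
{
  v4si r = lt_mask (a, b);
  v4si s = lt_mask_by_lane (a, b);
  for (int i = 0; i < 4; i++)
    assert (r[i] == s[i]);
}
```

For a = {1, 2, 3, 4} and b = {4, 3, 2, 1}, both functions return {-1, -1, 0, 0}.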
---
 gcc/c/c-typeck.cc | 2 +-
 gcc/cp/typeck.cc  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index e55e887da14..41ee38368f2 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -11982,7 +11982,7 @@ build_vec_cmp (tree_code code, tree type,
 {
   tree zero_vec = build_zero_cst (type);
   tree minus_one_vec = build_minus_one_cst (type);
-  tree cmp_type = truth_type_for (type);
+  tree cmp_type = truth_type_for (TREE_TYPE (arg0));
   tree cmp = build2 (code, cmp_type, arg0, arg1);
   return build3 (VEC_COND_EXPR, type, cmp, minus_one_vec, zero_vec);
 }
diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc
index 8132bd7fccc..7b2ad51bde7 100644
--- a/gcc/cp/typeck.cc
+++ b/gcc/cp/typeck.cc
@@ -4826,7 +4826,7 @@ build_vec_cmp (tree_code code, tree type,
 {
   tree zero_vec = build_zero_cst (type);
   tree minus_one_vec = build_minus_one_cst (type);
-  tree cmp_type = truth_type_for (type);
+  tree cmp_type = truth_type_for (TREE_TYPE (arg0));
   tree cmp = build2 (code, cmp_type, arg0, arg1);
   return build3 (VEC_COND_EXPR, type, cmp, minus_one_vec, zero_vec);
 }

From patchwork Thu Oct 26 01:14:32 2023
X-Patchwork-Submitter: "Liu, Hongtao"
X-Patchwork-Id: 1855558
From: liuhongt
To: gcc-patches@gcc.gnu.org
Cc: crazylht@gmail.com, hjl.tools@gmail.com
Subject: [PATCH V2 2/2] Support vec_cmpmn/vcondmn for v2hf/v4hf.
Date: Thu, 26 Oct 2023 09:14:32 +0800
Message-Id: <20231026011432.3484969-2-hongtao.liu@intel.com>
In-Reply-To: <20231026011432.3484969-1-hongtao.liu@intel.com>
References: <20231026011432.3484969-1-hongtao.liu@intel.com>

>vcond and vcondeq shouldn't be necessary if there's
>vcond_mask and vcmp support which is the "modern"
>way of handling vcond. Unless the ISA really can do
>compare and select with a single instruction.

The V2 patch removes vcond/vcondu from the initial version [1]. Since
ix86_expand_int_vcond performs many optimizations, this patch adds several
extra combine splitters to get the same optimizations.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633946.html

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ready to push to trunk.

gcc/ChangeLog:

	PR target/103861
	* config/i386/i386-expand.cc (ix86_expand_sse_movcc): Handle
	V2HF/V2BF/V4HF/V4BFmode.
	* config/i386/i386.cc (ix86_get_mask_mode): Return QImode when
	data_mode is V4HF/V2HFmode.
	* config/i386/mmx.md (vec_cmpv4hfqi): New expander.
	(vcond_mask_v4hi): Ditto.
	(vcond_mask_qi): Ditto.
	(vec_cmpv2hfqi): Ditto.
	(vcond_mask_v2hi): Ditto.
	(mmx_pblendvb_): Add 2 combine splitters after the patterns.
	(mmx_pblendvb_v8qi): Ditto.
	(v2hi3): Add a combine splitter after the pattern.
	(3): Ditto.
	(v8qi3): Ditto.
	(3): Ditto.
	* config/i386/sse.md (vcond): Merge this with ..
	(vcond): .. this into ..
	(vcond): .. this, and extend to V8BF/V16BF/V32BFmode.

gcc/testsuite/ChangeLog:

	* g++.target/i386/part-vect-vcondhf.C: New test.
	* gcc.target/i386/part-vect-vec_cmphf.c: New test.
---
 gcc/config/i386/i386-expand.cc                |   4 +
 gcc/config/i386/i386.cc                       |   6 +-
 gcc/config/i386/mmx.md                        | 269 +++++++++++++++++-
 gcc/config/i386/sse.md                        |  25 +-
 .../g++.target/i386/part-vect-vcondhf.C       |  45 +++
 .../gcc.target/i386/part-vect-vec_cmphf.c     |  26 ++
 6 files changed, 352 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/i386/part-vect-vcondhf.C
 create mode 100644 gcc/testsuite/gcc.target/i386/part-vect-vec_cmphf.c

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 1eae9d7c78c..9658f9c5a2d 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -4198,6 +4198,8 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp, rtx op_true, rtx op_false)
         break;
       case E_V8QImode:
       case E_V4HImode:
+      case E_V4HFmode:
+      case E_V4BFmode:
       case E_V2SImode:
         if (TARGET_SSE4_1)
           {
@@ -4207,6 +4209,8 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp, rtx op_true, rtx op_false)
         break;
       case E_V4QImode:
       case E_V2HImode:
+      case E_V2HFmode:
+      case E_V2BFmode:
         if (TARGET_SSE4_1)
           {
             gen = gen_mmx_pblendvb_v4qi;
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 641e7680335..c9c07beaeb1 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -24360,7 +24360,11 @@ ix86_get_mask_mode (machine_mode data_mode)
 
   /* Scalar mask case. */
   if ((TARGET_AVX512F && TARGET_EVEX512 && vector_size == 64)
-      || (TARGET_AVX512VL && (vector_size == 32 || vector_size == 16)))
+      || (TARGET_AVX512VL && (vector_size == 32 || vector_size == 16))
+      /* AVX512FP16 only supports vector comparison
+         to kmask for _Float16. */
+      || (TARGET_AVX512VL && TARGET_AVX512FP16
+          && GET_MODE_INNER (data_mode) == E_HFmode))
     {
       if (elem_size == 4
           || elem_size == 8
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 491a0a51272..78bb618f54c 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -61,6 +61,9 @@
 (define_mode_iterator MMXMODE248 [V4HI V2SI V1DI])
 (define_mode_iterator V_32 [V4QI V2HI V1SI V2HF V2BF])
 (define_mode_iterator V2FI_32 [V2HF V2BF V2HI])
+(define_mode_iterator V4FI_64 [V4HF V4BF V4HI])
+(define_mode_iterator V4F_64 [V4HF V4BF])
+(define_mode_iterator V2F_32 [V2HF V2BF])
 
 ;; 4-byte integer vector modes
 (define_mode_iterator VI_32 [V4QI V2HI])
@@ -1972,10 +1975,12 @@ (define_mode_attr mov_to_sse_suffix
   [(V2HF "d") (V4HF "q") (V2HI "d") (V4HI "q")])
 
 (define_mode_attr mmxxmmmode
-  [(V2HF "V8HF") (V2HI "V8HI") (V2BF "V8BF")])
+  [(V2HF "V8HF") (V2HI "V8HI") (V2BF "V8BF")
+   (V4HF "V8HF") (V4HI "V8HI") (V4BF "V8BF")])
 
 (define_mode_attr mmxxmmmodelower
-  [(V2HF "v8hf") (V2HI "v8hi") (V2BF "v8bf")])
+  [(V2HF "v8hf") (V2HI "v8hi") (V2BF "v8bf")
+   (V4HF "v8hf") (V4HI "v8hi") (V4BF "v8bf")])
 
 (define_expand "movd__to_sse"
   [(set (match_operand: 0 "register_operand")
@@ -2114,6 +2119,110 @@ (define_insn_and_split "*mmx_nabs2"
   [(set (match_dup 0)
         (ior: (match_dup 1) (match_dup 2)))])
 
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;;
+;; Parallel half-precision floating point comparisons
+;;
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+
+(define_expand "vec_cmpv4hfqi"
+  [(set (match_operand:QI 0 "register_operand")
+        (match_operator:QI 1 ""
+          [(match_operand:V4HF 2 "nonimmediate_operand")
+           (match_operand:V4HF 3 "nonimmediate_operand")]))]
+  "TARGET_MMX_WITH_SSE && TARGET_AVX512FP16 && TARGET_AVX512VL
+   && ix86_partial_vec_fp_math"
+{
+  rtx ops[4];
+  ops[3] = gen_reg_rtx (V8HFmode);
+  ops[2] = gen_reg_rtx (V8HFmode);
+
+  emit_insn (gen_movq_v4hf_to_sse (ops[3], operands[3]));
+  emit_insn (gen_movq_v4hf_to_sse (ops[2], operands[2]));
+  emit_insn (gen_vec_cmpv8hfqi (operands[0], operands[1], ops[2], ops[3]));
+  DONE;
+})
+
+(define_expand "vcond_mask_v4hi"
+  [(set (match_operand:V4F_64 0 "register_operand")
+        (vec_merge:V4F_64
+          (match_operand:V4F_64 1 "register_operand")
+          (match_operand:V4F_64 2 "register_operand")
+          (match_operand:V4HI 3 "register_operand")))]
+  "TARGET_MMX_WITH_SSE && TARGET_SSE4_1"
+{
+  ix86_expand_sse_movcc (operands[0], operands[3],
+                         operands[1], operands[2]);
+  DONE;
+})
+
+(define_expand "vcond_mask_qi"
+  [(set (match_operand:V4FI_64 0 "register_operand")
+        (vec_merge:V4FI_64
+          (match_operand:V4FI_64 1 "register_operand")
+          (match_operand:V4FI_64 2 "register_operand")
+          (match_operand:QI 3 "register_operand")))]
+  "TARGET_MMX_WITH_SSE && TARGET_AVX512BW && TARGET_AVX512VL"
+{
+  rtx op0 = gen_reg_rtx (mode);
+  operands[1] = lowpart_subreg (mode, operands[1], mode);
+  operands[2] = lowpart_subreg (mode, operands[2], mode);
+  emit_insn (gen_vcond_mask_qi (op0, operands[1],
+                                operands[2], operands[3]));
+  emit_move_insn (operands[0],
+                  lowpart_subreg (mode, op0, mode));
+  DONE;
+})
+
+(define_expand "vec_cmpv2hfqi"
+  [(set (match_operand:QI 0 "register_operand")
+        (match_operator:QI 1 ""
+          [(match_operand:V2HF 2 "nonimmediate_operand")
+           (match_operand:V2HF 3 "nonimmediate_operand")]))]
+  "TARGET_AVX512FP16 && TARGET_AVX512VL
+   && ix86_partial_vec_fp_math"
+{
+  rtx ops[4];
+  ops[3] = gen_reg_rtx (V8HFmode);
+  ops[2] = gen_reg_rtx (V8HFmode);
+
+  emit_insn (gen_movd_v2hf_to_sse (ops[3], operands[3]));
+  emit_insn (gen_movd_v2hf_to_sse (ops[2], operands[2]));
+  emit_insn (gen_vec_cmpv8hfqi (operands[0], operands[1], ops[2], ops[3]));
+  DONE;
+})
+
+(define_expand "vcond_mask_v2hi"
+  [(set (match_operand:V2F_32 0 "register_operand")
+        (vec_merge:V2F_32
+          (match_operand:V2F_32 1 "register_operand")
+          (match_operand:V2F_32 2 "register_operand")
+          (match_operand:V2HI 3 "register_operand")))]
+  "TARGET_SSE4_1"
+{
+  ix86_expand_sse_movcc (operands[0], operands[3],
+                         operands[1], operands[2]);
+  DONE;
+})
+
+(define_expand "vcond_mask_qi"
+  [(set (match_operand:V2FI_32 0 "register_operand")
+        (vec_merge:V2FI_32
+          (match_operand:V2FI_32 1 "register_operand")
+          (match_operand:V2FI_32 2 "register_operand")
+          (match_operand:QI 3 "register_operand")))]
+  "TARGET_AVX512BW && TARGET_AVX512VL"
+{
+  rtx op0 = gen_reg_rtx (mode);
+  operands[1] = lowpart_subreg (mode, operands[1], mode);
+  operands[2] = lowpart_subreg (mode, operands[2], mode);
+  emit_insn (gen_vcond_mask_qi (op0, operands[1],
+                                operands[2], operands[3]));
+  emit_move_insn (operands[0],
+                  lowpart_subreg (mode, op0, mode));
+  DONE;
+})
+
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;
 ;; Parallel half-precision floating point rounding operations.
@@ -3251,6 +3360,21 @@ (define_insn "3"
    (set_attr "prefix" "orig,orig,vex")
    (set_attr "mode" "TI")])
 
+(define_split
+  [(set (match_operand:V4HI 0 "register_operand")
+        (eq:V4HI
+          (eq:V4HI
+            (us_minus:V4HI
+              (match_operand:V4HI 1 "register_operand")
+              (match_operand:V4HI 2 "register_operand"))
+            (match_operand:V4HI 3 "const0_operand"))
+          (match_operand:V4HI 4 "const0_operand")))]
+  "TARGET_SSE4_1 && TARGET_MMX_WITH_SSE"
+  [(set (match_dup 0)
+        (umin:V4HI (match_dup 1) (match_dup 2)))
+   (set (match_dup 0)
+        (eq:V4HI (match_dup 0) (match_dup 2)))])
+
 (define_expand "mmx_v8qi3"
   [(set (match_operand:V8QI 0 "register_operand")
         (umaxmin:V8QI
@@ -3284,6 +3408,21 @@ (define_expand "v8qi3"
           (match_operand:V8QI 2 "register_operand")))]
   "TARGET_MMX_WITH_SSE")
 
+(define_split
+  [(set (match_operand:V8QI 0 "register_operand")
+        (eq:V8QI
+          (eq:V8QI
+            (us_minus:V8QI
+              (match_operand:V8QI 1 "register_operand")
+              (match_operand:V8QI 2 "register_operand"))
+            (match_operand:V8QI 3 "const0_operand"))
+          (match_operand:V8QI 4 "const0_operand")))]
+  "TARGET_MMX_WITH_SSE"
+  [(set (match_dup 0)
+        (umin:V8QI (match_dup 1) (match_dup 2)))
+   (set (match_dup 0)
+        (eq:V8QI (match_dup 0) (match_dup 2)))])
+
 (define_insn "3"
   [(set (match_operand:VI1_16_32 0 "register_operand" "=x,Yw")
         (umaxmin:VI1_16_32
@@ -3297,6 +3436,21 @@ (define_insn "3"
    (set_attr "type" "sseiadd")
    (set_attr "mode" "TI")])
 
+(define_split
+  [(set (match_operand:V4QI 0 "register_operand")
+        (eq:V4QI
+          (eq:V4QI
+            (us_minus:V4QI
+              (match_operand:V4QI 1 "register_operand")
+              (match_operand:V4QI 2 "register_operand"))
+            (match_operand:V4QI 3 "const0_operand"))
+          (match_operand:V4QI 4 "const0_operand")))]
+  "TARGET_SSE2"
+  [(set (match_dup 0)
+        (umin:V4QI (match_dup 1) (match_dup 2)))
+   (set (match_dup 0)
+        (eq:V4QI (match_dup 0) (match_dup 2)))])
+
 (define_insn "v2hi3"
   [(set (match_operand:V2HI 0 "register_operand" "=Yr,*x,Yv")
         (umaxmin:V2HI
@@ -3313,6 +3467,21 @@ (define_insn "v2hi3"
    (set_attr "prefix" "orig,orig,vex")
    (set_attr "mode" "TI")])
 
+(define_split
+  [(set (match_operand:V2HI 0 "register_operand")
+        (eq:V2HI
+          (eq:V2HI
+            (us_minus:V2HI
+              (match_operand:V2HI 1 "register_operand")
+              (match_operand:V2HI 2 "register_operand"))
+            (match_operand:V2HI 3 "const0_operand"))
+          (match_operand:V2HI 4 "const0_operand")))]
+  "TARGET_SSE4_1"
+  [(set (match_dup 0)
+        (umin:V2HI (match_dup 1) (match_dup 2)))
+   (set (match_dup 0)
+        (eq:V2HI (match_dup 0) (match_dup 2)))])
+
 (define_insn "ssse3_abs2"
   [(set (match_operand:MMXMODEI 0 "register_operand" "=y,Yv")
         (abs:MMXMODEI
@@ -3785,6 +3954,54 @@ (define_insn "mmx_pblendvb_v8qi"
    (set_attr "btver2_decode" "vector")
    (set_attr "mode" "TI")])
 
+(define_split
+  [(set (match_operand:V8QI 0 "register_operand")
+        (unspec:V8QI
+          [(match_operand:V8QI 1 "register_operand")
+           (match_operand:V8QI 2 "register_operand")
+           (eq:V8QI
+             (eq:V8QI
+               (match_operand:V8QI 3 "register_operand")
+               (match_operand:V8QI 4 "register_operand"))
+             (match_operand:V8QI 5 "const0_operand"))]
+          UNSPEC_BLENDV))]
+  "TARGET_MMX_WITH_SSE"
+  [(set (match_dup 6)
+        (eq:V8QI (match_dup 3) (match_dup 4)))
+   (set (match_dup 0)
+        (unspec:V8QI
+          [(match_dup 2)
+           (match_dup 1)
+           (match_dup 6)]
+          UNSPEC_BLENDV))]
+  "operands[6] = gen_reg_rtx (V8QImode);")
+
+(define_split
+  [(set (match_operand:V8QI 0 "register_operand")
+        (unspec:V8QI
+          [(match_operand:V8QI 1 "register_operand")
+           (match_operand:V8QI 2 "register_operand")
+           (subreg:V8QI
+             (eq:MMXMODE24
+               (eq:MMXMODE24
+                 (match_operand:MMXMODE24 3 "register_operand")
+                 (match_operand:MMXMODE24 4 "register_operand"))
+               (match_operand:MMXMODE24 5 "const0_operand")) 0)]
+          UNSPEC_BLENDV))]
+  "TARGET_MMX_WITH_SSE"
+  [(set (match_dup 6)
+        (eq:MMXMODE24 (match_dup 3) (match_dup 4)))
+   (set (match_dup 0)
+        (unspec:V8QI
+          [(match_dup 2)
+           (match_dup 1)
+           (match_dup 7)]
+          UNSPEC_BLENDV))]
+{
+  operands[6] = gen_reg_rtx (mode);
+  operands[7] = lowpart_subreg (V8QImode, operands[6], mode);
+})
+
 (define_insn "mmx_pblendvb_"
   [(set (match_operand:VI_16_32 0 "register_operand" "=Yr,*x,x")
         (unspec:VI_16_32
@@ -3805,6 +4022,54 @@ (define_insn "mmx_pblendvb_"
    (set_attr "btver2_decode" "vector")
    (set_attr "mode" "TI")])
 
+(define_split
+  [(set (match_operand:VI_16_32 0 "register_operand")
+        (unspec:VI_16_32
+          [(match_operand:VI_16_32 1 "register_operand")
+           (match_operand:VI_16_32 2 "register_operand")
+           (eq:VI_16_32
+             (eq:VI_16_32
+               (match_operand:VI_16_32 3 "register_operand")
+               (match_operand:VI_16_32 4 "register_operand"))
+             (match_operand:VI_16_32 5 "const0_operand"))]
+          UNSPEC_BLENDV))]
+  "TARGET_SSE2"
+  [(set (match_dup 6)
+        (eq:VI_16_32 (match_dup 3) (match_dup 4)))
+   (set (match_dup 0)
+        (unspec:VI_16_32
+          [(match_dup 2)
+           (match_dup 1)
+           (match_dup 6)]
+          UNSPEC_BLENDV))]
+  "operands[6] = gen_reg_rtx (mode);")
+
+(define_split
+  [(set (match_operand:V4QI 0 "register_operand")
+        (unspec:V4QI
+          [(match_operand:V4QI 1 "register_operand")
+           (match_operand:V4QI 2 "register_operand")
+           (subreg:V4QI
+             (eq:V2HI
+               (eq:V2HI
+                 (match_operand:V2HI 3 "register_operand")
+                 (match_operand:V2HI 4 "register_operand"))
+               (match_operand:V2HI 5 "const0_operand")) 0)]
+          UNSPEC_BLENDV))]
+  "TARGET_SSE2"
+  [(set (match_dup 6)
+        (eq:V2HI (match_dup 3) (match_dup 4)))
+   (set (match_dup 0)
+        (unspec:V4QI
+          [(match_dup 2)
+           (match_dup 1)
+           (match_dup 7)]
+          UNSPEC_BLENDV))]
+{
+  operands[6] = gen_reg_rtx (V2HImode);
+  operands[7] = lowpart_subreg (V4QImode, operands[6], V2HImode);
+})
+
 ;; XOP parallel XMM conditional moves
 (define_insn "*xop_pcmov_"
   [(set (match_operand:MMXMODE124 0 "register_operand" "=x")
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index c988935d4df..e2a7cbeb722 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -4644,29 +4644,14 @@ (define_expand "vcond"
   DONE;
 })
 
-(define_expand "vcond"
-  [(set (match_operand:VHF_AVX512VL 0 "register_operand")
-        (if_then_else:VHF_AVX512VL
-          (match_operator 3 ""
-            [(match_operand:VHF_AVX512VL 4 "vector_operand")
-             (match_operand:VHF_AVX512VL 5 "vector_operand")])
-          (match_operand:VHF_AVX512VL 1 "general_operand")
-          (match_operand:VHF_AVX512VL 2 "general_operand")))]
-  "TARGET_AVX512FP16"
-{
-  bool ok = ix86_expand_fp_vcond (operands);
-  gcc_assert (ok);
-  DONE;
-})
-
-(define_expand "vcond"
-  [(set (match_operand: 0 "register_operand")
-        (if_then_else:
+(define_expand "vcond"
+  [(set (match_operand:VI2HFBF_AVX512VL 0 "register_operand")
+        (if_then_else:VI2HFBF_AVX512VL
           (match_operator 3 ""
             [(match_operand:VHF_AVX512VL 4 "vector_operand")
              (match_operand:VHF_AVX512VL 5 "vector_operand")])
-          (match_operand: 1 "general_operand")
-          (match_operand: 2 "general_operand")))]
+          (match_operand:VI2HFBF_AVX512VL 1 "general_operand")
+          (match_operand:VI2HFBF_AVX512VL 2 "general_operand")))]
   "TARGET_AVX512FP16"
 {
   bool ok = ix86_expand_fp_vcond (operands);
diff --git a/gcc/testsuite/g++.target/i386/part-vect-vcondhf.C b/gcc/testsuite/g++.target/i386/part-vect-vcondhf.C
new file mode 100644
index 00000000000..f19727816cf
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/part-vect-vcondhf.C
@@ -0,0 +1,45 @@
+/* PR target/103861 */
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl" } */
+/* { dg-final { scan-assembler-times "vpcmpeqw" 6 } } */
+/* { dg-final { scan-assembler-times "vpcmpgtw" 2 } } */
+/* { dg-final { scan-assembler-times "vpminuw" 2 } } */
+/* { dg-final { scan-assembler-times "vcmpph" 8 } } */
+/* { dg-final { scan-assembler-times "vpblendvb" 8 } } */
+typedef unsigned short __attribute__((__vector_size__ (4))) __v2hu;
+typedef short __attribute__((__vector_size__ (4))) __v2hi;
+
+typedef unsigned short __attribute__((__vector_size__ (8))) __v4hu;
+typedef short __attribute__((__vector_size__ (8))) __v4hi;
+
+typedef _Float16 __attribute__((__vector_size__ (4))) __v2hf;
+typedef _Float16 __attribute__((__vector_size__ (8))) __v4hf;
+
+
+__v2hu au, bu;
+__v2hi as, bs;
+__v2hf af, bf;
+
+__v4hu cu, du;
+__v4hi cs, ds;
+__v4hf cf, df;
+
+__v2hf auf (__v2hu a, __v2hu b) { return (a > b) ? af : bf; }
+__v2hf asf (__v2hi a, __v2hi b) { return (a > b) ? af : bf; }
+__v2hu afu (__v2hf a, __v2hf b) { return (a > b) ? au : bu; }
+__v2hi afs (__v2hf a, __v2hf b) { return (a > b) ? as : bs; }
+
+__v4hf cuf (__v4hu c, __v4hu d) { return (c > d) ? cf : df; }
+__v4hf csf (__v4hi c, __v4hi d) { return (c > d) ? cf : df; }
+__v4hu cfu (__v4hf c, __v4hf d) { return (c > d) ? cu : du; }
+__v4hi cfs (__v4hf c, __v4hf d) { return (c > d) ? cs : ds; }
+
+__v2hf auf_ne (__v2hu a, __v2hu b) { return (a != b) ? af : bf; }
+__v2hf asf_ne (__v2hi a, __v2hi b) { return (a != b) ? af : bf; }
+__v2hu afu_ne (__v2hf a, __v2hf b) { return (a != b) ? au : bu; }
+__v2hi afs_ne (__v2hf a, __v2hf b) { return (a != b) ? as : bs; }
+
+__v4hf cuf_ne (__v4hu c, __v4hu d) { return (c != d) ? cf : df; }
+__v4hf csf_ne (__v4hi c, __v4hi d) { return (c != d) ? cf : df; }
+__v4hu cfu_ne (__v4hf c, __v4hf d) { return (c != d) ? cu : du; }
+__v4hi cfs_ne (__v4hf c, __v4hf d) { return (c != d) ? cs : ds; }
diff --git a/gcc/testsuite/gcc.target/i386/part-vect-vec_cmphf.c b/gcc/testsuite/gcc.target/i386/part-vect-vec_cmphf.c
new file mode 100644
index 00000000000..ee8659395eb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/part-vect-vec_cmphf.c
@@ -0,0 +1,26 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl" } */
+/* { dg-final { scan-assembler-times "vcmpph" 10 } } */
+
+typedef _Float16 __attribute__((__vector_size__ (4))) v2hf;
+typedef _Float16 __attribute__((__vector_size__ (8))) v4hf;
+
+
+#define VCMPMN(type, op, name) \
+type \
+__attribute__ ((noinline, noclone)) \
+vec_cmp_##type##type##name (type a, type b) \
+{ \
+  return a op b; \
+}
+
+VCMPMN (v4hf, <, lt)
+VCMPMN (v2hf, <, lt)
+VCMPMN (v4hf, <=, le)
+VCMPMN (v2hf, <=, le)
+VCMPMN (v4hf, >, gt)
+VCMPMN (v2hf, >, gt)
+VCMPMN (v4hf, >=, ge)
+VCMPMN (v2hf, >=, ge)
+VCMPMN (v4hf, ==, eq)
+VCMPMN (v2hf, ==, eq)
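Several of the combine splitters added to mmx.md match RTL in which a vector compare result is itself compared against zero. To help when reading them, here is a scalar C model of a single 8-bit lane. It is a sketch of the lane arithmetic only, not of the RTL patterns or the instructions GCC emits. All helper names are hypothetical, and `blendv_u8` is a simplified stand-in for pblendvb, which takes the second data operand wherever the mask byte's top bit is set.

The model exhaustively checks three per-lane facts:

- An unsigned saturating subtraction (us_minus) is zero exactly when x <= y, which is also exactly when umin (x, y) == x.
- eq (eq (us_minus (x, y), 0), 0) is all ones exactly when x > y.
- A blend whose mask is the negated equality eq (eq (x, y), 0) gives the same result as a blend on eq (x, y) with its two data operands swapped. This is the rewrite the UNSPEC_BLENDV splitters perform.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar models of one 8-bit vector lane.  */

/* Unsigned saturating subtraction (RTL us_minus, x86 psubusb).  */
static uint8_t
us_minus_u8 (uint8_t a, uint8_t b)
{
  return a > b ? (uint8_t) (a - b) : 0;
}

/* Unsigned minimum (RTL umin, x86 pminub).  */
static uint8_t
umin_u8 (uint8_t a, uint8_t b)
{
  return a < b ? a : b;
}

/* Vector-style equality: all ones when equal, zero otherwise.  */
static uint8_t
eq_u8 (uint8_t a, uint8_t b)
{
  return a == b ? 0xff : 0x00;
}

/* Simplified pblendvb for one byte: pick B when the mask's top bit
   is set, otherwise A.  */
static uint8_t
blendv_u8 (uint8_t a, uint8_t b, uint8_t mask)
{
  return (mask & 0x80) ? b : a;
}

/* Checks the identities for every (x, y) pair of byte values and
   returns the number of pairs checked; aborts on the first failure.  */
unsigned
check_lane_identities (void)
{
  const uint8_t op1 = 0x11, op2 = 0x22;  /* arbitrary, distinct data */
  unsigned n = 0;
  for (unsigned a = 0; a < 256; a++)
    for (unsigned b = 0; b < 256; b++)
      {
        uint8_t x = (uint8_t) a, y = (uint8_t) b;
        uint8_t sat = us_minus_u8 (x, y);

        /* us_minus (x, y) == 0  <=>  x <= y  <=>  umin (x, y) == x.  */
        assert ((sat == 0) == (x <= y));
        assert ((sat == 0) == (umin_u8 (x, y) == x));

        /* eq (eq (us_minus (x, y), 0), 0) is all ones exactly when x > y.  */
        assert ((eq_u8 (eq_u8 (sat, 0), 0) == 0xff) == (x > y));

        /* Blending on the negated mask equals blending on eq (x, y)
           with the two data operands swapped.  */
        uint8_t neg_mask = eq_u8 (eq_u8 (x, y), 0);
        assert (blendv_u8 (op1, op2, neg_mask)
                == blendv_u8 (op2, op1, eq_u8 (x, y)));
        n++;
      }
  return n;
}
```

Calling check_lane_identities () returns 65536, the number of (x, y) pairs checked.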