From patchwork Thu Sep 21 07:19:56 2023
From: "Hu, Lin1" <lin1.hu@intel.com>
To: gcc-patches@gcc.gnu.org
Cc: hongtao.liu@intel.com, ubizjak@gmail.com, haochen.jiang@intel.com
Subject: [PATCH 01/18] Initial support for -mevex512
Date: Thu, 21 Sep 2023 15:19:56 +0800
Message-Id: <20230921072013.2124750-2-lin1.hu@intel.com>
In-Reply-To: <20230921072013.2124750-1-lin1.hu@intel.com>
References: <20230921072013.2124750-1-lin1.hu@intel.com>
From: Haochen Jiang <haochen.jiang@intel.com>

gcc/ChangeLog:

	* common/config/i386/i386-common.cc (OPTION_MASK_ISA2_EVEX512_SET):
	New.
	(OPTION_MASK_ISA2_EVEX512_UNSET): Ditto.
	(ix86_handle_option): Handle EVEX512.
	* config/i386/i386-c.cc (ix86_target_macros_internal): Ditto.
	* config/i386/i386-options.cc (isa2_opts): Ditto.
	(ix86_valid_target_attribute_inner_p): Ditto.
	(ix86_option_override_internal): Set EVEX512 target if it is
	not explicitly set when AVX512 is enabled.  Disable
	AVX512{PF,ER,4VNNIW,4FMAPS} for -mno-evex512.
	* config/i386/i386.opt: Add mevex512.  Temporarily RejectNegative.
---
 gcc/common/config/i386/i386-common.cc | 15 +++++++++++++++
 gcc/config/i386/i386-c.cc             |  2 ++
 gcc/config/i386/i386-options.cc       | 19 ++++++++++++++++++-
 gcc/config/i386/i386.opt              |  4 ++++
 4 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc
index 95468b7c405..8cc59e08d06 100644
--- a/gcc/common/config/i386/i386-common.cc
+++ b/gcc/common/config/i386/i386-common.cc
@@ -123,6 +123,7 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA2_SM3_SET OPTION_MASK_ISA2_SM3
 #define OPTION_MASK_ISA2_SHA512_SET OPTION_MASK_ISA2_SHA512
 #define OPTION_MASK_ISA2_SM4_SET OPTION_MASK_ISA2_SM4
+#define OPTION_MASK_ISA2_EVEX512_SET OPTION_MASK_ISA2_EVEX512
 
 /* SSE4 includes both SSE4.1 and SSE4.2.  -msse4 should be the same
    as -msse4.2.  */
@@ -309,6 +310,7 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA2_SM3_UNSET OPTION_MASK_ISA2_SM3
 #define OPTION_MASK_ISA2_SHA512_UNSET OPTION_MASK_ISA2_SHA512
 #define OPTION_MASK_ISA2_SM4_UNSET OPTION_MASK_ISA2_SM4
+#define OPTION_MASK_ISA2_EVEX512_UNSET OPTION_MASK_ISA2_EVEX512
 
 /* SSE4 includes both SSE4.1 and SSE4.2.  -mno-sse4 should the same
    as -mno-sse4.1. */
@@ -1341,6 +1343,19 @@ ix86_handle_option (struct gcc_options *opts,
 	}
       return true;
 
+    case OPT_mevex512:
+      if (value)
+	{
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_EVEX512_SET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_EVEX512_SET;
+	}
+      else
+	{
+	  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_EVEX512_UNSET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_EVEX512_UNSET;
+	}
+      return true;
+
     case OPT_mfma:
       if (value)
 	{
diff --git a/gcc/config/i386/i386-c.cc b/gcc/config/i386/i386-c.cc
index 47768fa0940..93154efa7ff 100644
--- a/gcc/config/i386/i386-c.cc
+++ b/gcc/config/i386/i386-c.cc
@@ -707,6 +707,8 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
     def_or_undef (parse_in, "__SHA512__");
   if (isa_flag2 & OPTION_MASK_ISA2_SM4)
     def_or_undef (parse_in, "__SM4__");
+  if (isa_flag2 & OPTION_MASK_ISA2_EVEX512)
+    def_or_undef (parse_in, "__EVEX512__");
   if (TARGET_IAMCU)
     {
       def_or_undef (parse_in, "__iamcu");
diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index e47f9ed5d5f..a1a7a92da9f 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -250,7 +250,8 @@ static struct ix86_target_opts isa2_opts[] =
   { "-mavxvnniint16",	OPTION_MASK_ISA2_AVXVNNIINT16 },
   { "-msm3",		OPTION_MASK_ISA2_SM3 },
   { "-msha512",		OPTION_MASK_ISA2_SHA512 },
-  { "-msm4",		OPTION_MASK_ISA2_SM4 }
+  { "-msm4",		OPTION_MASK_ISA2_SM4 },
+  { "-mevex512",	OPTION_MASK_ISA2_EVEX512 }
 };
 
 static struct ix86_target_opts isa_opts[] =
@@ -1109,6 +1110,7 @@ ix86_valid_target_attribute_inner_p (tree fndecl, tree args, char *p_strings[],
     IX86_ATTR_ISA ("sm3", OPT_msm3),
     IX86_ATTR_ISA ("sha512", OPT_msha512),
     IX86_ATTR_ISA ("sm4", OPT_msm4),
+    IX86_ATTR_ISA ("evex512", OPT_mevex512),
 
     /* enum options */
     IX86_ATTR_ENUM ("fpmath=", OPT_mfpmath_),
@@ -2559,6 +2561,21 @@ ix86_option_override_internal (bool main_args_p,
 	&= ~((OPTION_MASK_ISA_BMI | OPTION_MASK_ISA_BMI2 | OPTION_MASK_ISA_TBM)
 	     & ~opts->x_ix86_isa_flags_explicit);
 
+  /* Set EVEX512 target if it is not explicitly set
+     when AVX512 is enabled.  */
+  if (TARGET_AVX512F_P(opts->x_ix86_isa_flags)
+      && !(opts->x_ix86_isa_flags2_explicit & OPTION_MASK_ISA2_EVEX512))
+    opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_EVEX512;
+
+  /* Disable AVX512{PF,ER,4VNNIW,4FMAPS} for -mno-evex512.  */
+  if (!TARGET_EVEX512_P(opts->x_ix86_isa_flags2))
+    {
+      opts->x_ix86_isa_flags
+	&= ~(OPTION_MASK_ISA_AVX512PF | OPTION_MASK_ISA_AVX512ER);
+      opts->x_ix86_isa_flags2
+	&= ~(OPTION_MASK_ISA2_AVX5124FMAPS | OPTION_MASK_ISA2_AVX5124VNNIW);
+    }
+
   /* Validate -mpreferred-stack-boundary= value or default it to
      PREFERRED_STACK_BOUNDARY_DEFAULT.  */
   ix86_preferred_stack_boundary = PREFERRED_STACK_BOUNDARY_DEFAULT;
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 78b499304a4..6d8601b1f75 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1310,3 +1310,7 @@ Enable vectorization for gather instruction.
 mscatter
 Target Alias(mtune-ctrl=, use_scatter, ^use_scatter)
 Enable vectorization for scatter instruction.
+
+mevex512
+Target RejectNegative Mask(ISA2_EVEX512) Var(ix86_isa_flags2) Save
+Support 512 bit vector built-in functions and code generation.
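
To make the intended semantics concrete, here is a minimal sketch of how
user code can observe these changes, assuming a GCC with the complete
series applied.  (This first patch still marks the option RejectNegative,
so -mno-evex512 only becomes usable once a later patch in the series
drops that.)  The file name and function names below are illustrative,
not part of the patch:

/* evex512-demo.c: illustrative only.  Compile with e.g.
     gcc -O2 -mavx512f -c evex512-demo.c                (EVEX512 implied)
     gcc -O2 -mavx512f -mno-evex512 -c evex512-demo.c   (after the full series)  */

#include <immintrin.h>

#ifdef __EVEX512__
/* 512-bit vectors and EVEX512 code generation are available.  */
__m512d
add8 (__m512d a, __m512d b)
{
  return _mm512_add_pd (a, b);
}
#else
/* Only the 128/256-bit subset of AVX512 is available.  */
__m256d
add4 (__m256d a, __m256d b)
{
  return _mm256_add_pd (a, b);
}
#endif

/* evex512 also participates in the target attribute, via the new
   IX86_ATTR_ISA entry added above.  */
__attribute__ ((target ("avx512f,evex512")))
__m512d
add8_target (__m512d a, __m512d b)
{
  return _mm512_add_pd (a, b);
}

Note how the first hunk of ix86_option_override_internal keeps existing
behavior: plain -mavx512f turns EVEX512 on unless the user said otherwise.
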
From patchwork Thu Sep 21 07:19:58 2023
From: "Hu, Lin1" <lin1.hu@intel.com>
To: gcc-patches@gcc.gnu.org
Cc: hongtao.liu@intel.com, ubizjak@gmail.com, haochen.jiang@intel.com
Subject: [PATCH 03/18] [PATCH 2/5] Push evex512 target for 512 bit intrins
Date: Thu, 21 Sep 2023 15:19:58 +0800
Message-Id: <20230921072013.2124750-4-lin1.hu@intel.com>
In-Reply-To: <20230921072013.2124750-1-lin1.hu@intel.com>
References: <20230921072013.2124750-1-lin1.hu@intel.com>
From: Haochen Jiang <haochen.jiang@intel.com>

gcc/ChangeLog:

	* config/i386/avx512dqintrin.h: Add evex512 target for 512 bit
	intrins.
---
 gcc/config/i386/avx512dqintrin.h | 1840 +++++++++++++++---------------
 1 file changed, 926 insertions(+), 914 deletions(-)

diff --git a/gcc/config/i386/avx512dqintrin.h b/gcc/config/i386/avx512dqintrin.h
index 93900a0b5c7..b6a1d499e25 100644
--- a/gcc/config/i386/avx512dqintrin.h
+++ b/gcc/config/i386/avx512dqintrin.h
@@ -184,1275 +184,1426 @@ _kandn_mask8 (__mmask8 __A, __mmask8 __B)
   return (__mmask8) __builtin_ia32_kandnqi ((__mmask8) __A, (__mmask8) __B);
 }
 
-extern __inline __m512d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_broadcast_f64x2 (__m128d __A)
-{
-  return (__m512d)
-	 __builtin_ia32_broadcastf64x2_512_mask ((__v2df) __A,
-						 _mm512_undefined_pd (),
-						 (__mmask8) -1);
-}
-
-extern __inline __m512d
+#ifdef __OPTIMIZE__
+extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_broadcast_f64x2 (__m512d __O, __mmask8 __M, __m128d __A)
+_kshiftli_mask8 (__mmask8 __A, unsigned int __B)
 {
-  return (__m512d) __builtin_ia32_broadcastf64x2_512_mask ((__v2df)
-							   __A,
-							   (__v8df)
-							   __O, __M);
+  return (__mmask8) __builtin_ia32_kshiftliqi ((__mmask8) __A, (__mmask8) __B);
 }
 
-extern __inline __m512d
+extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_broadcast_f64x2 (__mmask8 __M, __m128d __A)
+_kshiftri_mask8 (__mmask8 __A, unsigned int __B)
 {
-  return (__m512d) __builtin_ia32_broadcastf64x2_512_mask ((__v2df)
-							   __A,
-							   (__v8df)
-							   _mm512_setzero_ps (),
-							   __M);
+  return (__mmask8) __builtin_ia32_kshiftriqi ((__mmask8) __A, (__mmask8) __B);
 }
 
-extern __inline __m512i
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_broadcast_i64x2 (__m128i __A)
+_mm_reduce_sd (__m128d __A, __m128d __B, int __C)
 {
-  return (__m512i)
-	 __builtin_ia32_broadcasti64x2_512_mask ((__v2di) __A,
-						 _mm512_undefined_epi32 (),
+  return (__m128d) __builtin_ia32_reducesd_mask ((__v2df) __A,
+						 (__v2df) __B, __C,
+						 (__v2df) _mm_setzero_pd (),
 						 (__mmask8) -1);
 }
 
-extern __inline __m512i
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_broadcast_i64x2 (__m512i __O, __mmask8 __M, __m128i __A)
+_mm_reduce_round_sd (__m128d __A, __m128d __B, int __C, const int __R)
 {
-  return (__m512i) __builtin_ia32_broadcasti64x2_512_mask ((__v2di)
-							   __A,
-							   (__v8di)
-							   __O, __M);
+  return (__m128d) __builtin_ia32_reducesd_mask_round ((__v2df) __A,
+						       (__v2df) __B, __C,
+						       (__v2df)
+						       _mm_setzero_pd (),
+						       (__mmask8) -1, __R);
 }
 
-extern __inline __m512i
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_broadcast_i64x2 (__mmask8 __M, __m128i __A)
+_mm_mask_reduce_sd (__m128d __W, __mmask8 __U, __m128d
__A, + __m128d __B, int __C) { - return (__m512i) __builtin_ia32_broadcasti64x2_512_mask ((__v2di) - __A, - (__v8di) - _mm512_setzero_si512 (), - __M); + return (__m128d) __builtin_ia32_reducesd_mask ((__v2df) __A, + (__v2df) __B, __C, + (__v2df) __W, + (__mmask8) __U); } -extern __inline __m512 +extern __inline __m128d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_broadcast_f32x2 (__m128 __A) +_mm_mask_reduce_round_sd (__m128d __W, __mmask8 __U, __m128d __A, + __m128d __B, int __C, const int __R) { - return (__m512) - __builtin_ia32_broadcastf32x2_512_mask ((__v4sf) __A, - (__v16sf)_mm512_undefined_ps (), - (__mmask16) -1); + return (__m128d) __builtin_ia32_reducesd_mask_round ((__v2df) __A, + (__v2df) __B, __C, + (__v2df) __W, + __U, __R); } -extern __inline __m512 +extern __inline __m128d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_broadcast_f32x2 (__m512 __O, __mmask16 __M, __m128 __A) +_mm_maskz_reduce_sd (__mmask8 __U, __m128d __A, __m128d __B, int __C) { - return (__m512) __builtin_ia32_broadcastf32x2_512_mask ((__v4sf) __A, - (__v16sf) - __O, __M); + return (__m128d) __builtin_ia32_reducesd_mask ((__v2df) __A, + (__v2df) __B, __C, + (__v2df) _mm_setzero_pd (), + (__mmask8) __U); } -extern __inline __m512 +extern __inline __m128d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_broadcast_f32x2 (__mmask16 __M, __m128 __A) +_mm_maskz_reduce_round_sd (__mmask8 __U, __m128d __A, __m128d __B, + int __C, const int __R) { - return (__m512) __builtin_ia32_broadcastf32x2_512_mask ((__v4sf) __A, - (__v16sf) - _mm512_setzero_ps (), - __M); + return (__m128d) __builtin_ia32_reducesd_mask_round ((__v2df) __A, + (__v2df) __B, __C, + (__v2df) + _mm_setzero_pd (), + __U, __R); } -extern __inline __m512i +extern __inline __m128 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_broadcast_i32x2 (__m128i __A) +_mm_reduce_ss (__m128 __A, __m128 __B, int __C) { - return (__m512i) - __builtin_ia32_broadcasti32x2_512_mask ((__v4si) __A, - (__v16si) - _mm512_undefined_epi32 (), - (__mmask16) -1); + return (__m128) __builtin_ia32_reducess_mask ((__v4sf) __A, + (__v4sf) __B, __C, + (__v4sf) _mm_setzero_ps (), + (__mmask8) -1); } -extern __inline __m512i +extern __inline __m128 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_broadcast_i32x2 (__m512i __O, __mmask16 __M, __m128i __A) +_mm_reduce_round_ss (__m128 __A, __m128 __B, int __C, const int __R) { - return (__m512i) __builtin_ia32_broadcasti32x2_512_mask ((__v4si) - __A, - (__v16si) - __O, __M); + return (__m128) __builtin_ia32_reducess_mask_round ((__v4sf) __A, + (__v4sf) __B, __C, + (__v4sf) + _mm_setzero_ps (), + (__mmask8) -1, __R); } -extern __inline __m512i +extern __inline __m128 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_broadcast_i32x2 (__mmask16 __M, __m128i __A) +_mm_mask_reduce_ss (__m128 __W, __mmask8 __U, __m128 __A, + __m128 __B, int __C) { - return (__m512i) __builtin_ia32_broadcasti32x2_512_mask ((__v4si) - __A, - (__v16si) - _mm512_setzero_si512 (), - __M); + return (__m128) __builtin_ia32_reducess_mask ((__v4sf) __A, + (__v4sf) __B, __C, + (__v4sf) __W, + (__mmask8) __U); } -extern __inline __m512 +extern __inline __m128 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_broadcast_f32x8 (__m256 __A) +_mm_mask_reduce_round_ss (__m128 __W, __mmask8 __U, __m128 __A, + __m128 __B, int __C, const int __R) { - return 
(__m512) - __builtin_ia32_broadcastf32x8_512_mask ((__v8sf) __A, - _mm512_undefined_ps (), - (__mmask16) -1); + return (__m128) __builtin_ia32_reducess_mask_round ((__v4sf) __A, + (__v4sf) __B, __C, + (__v4sf) __W, + __U, __R); } -extern __inline __m512 +extern __inline __m128 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_broadcast_f32x8 (__m512 __O, __mmask16 __M, __m256 __A) +_mm_maskz_reduce_ss (__mmask8 __U, __m128 __A, __m128 __B, int __C) { - return (__m512) __builtin_ia32_broadcastf32x8_512_mask ((__v8sf) __A, - (__v16sf)__O, - __M); + return (__m128) __builtin_ia32_reducess_mask ((__v4sf) __A, + (__v4sf) __B, __C, + (__v4sf) _mm_setzero_ps (), + (__mmask8) __U); } -extern __inline __m512 +extern __inline __m128 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_broadcast_f32x8 (__mmask16 __M, __m256 __A) +_mm_maskz_reduce_round_ss (__mmask8 __U, __m128 __A, __m128 __B, + int __C, const int __R) { - return (__m512) __builtin_ia32_broadcastf32x8_512_mask ((__v8sf) __A, - (__v16sf) - _mm512_setzero_ps (), - __M); + return (__m128) __builtin_ia32_reducess_mask_round ((__v4sf) __A, + (__v4sf) __B, __C, + (__v4sf) + _mm_setzero_ps (), + __U, __R); } -extern __inline __m512i +extern __inline __m128d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_broadcast_i32x8 (__m256i __A) +_mm_range_sd (__m128d __A, __m128d __B, int __C) { - return (__m512i) - __builtin_ia32_broadcasti32x8_512_mask ((__v8si) __A, - (__v16si) - _mm512_undefined_epi32 (), - (__mmask16) -1); + return (__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df) __A, + (__v2df) __B, __C, + (__v2df) + _mm_setzero_pd (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m512i +extern __inline __m128d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_broadcast_i32x8 (__m512i __O, __mmask16 __M, __m256i __A) +_mm_mask_range_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B, int __C) { - return (__m512i) __builtin_ia32_broadcasti32x8_512_mask ((__v8si) - __A, - (__v16si)__O, - __M); + return (__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df) __A, + (__v2df) __B, __C, + (__v2df) __W, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m512i +extern __inline __m128d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_broadcast_i32x8 (__mmask16 __M, __m256i __A) +_mm_maskz_range_sd (__mmask8 __U, __m128d __A, __m128d __B, int __C) { - return (__m512i) __builtin_ia32_broadcasti32x8_512_mask ((__v8si) - __A, - (__v16si) - _mm512_setzero_si512 (), - __M); + return (__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df) __A, + (__v2df) __B, __C, + (__v2df) + _mm_setzero_pd (), + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m512i +extern __inline __m128 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mullo_epi64 (__m512i __A, __m512i __B) +_mm_range_ss (__m128 __A, __m128 __B, int __C) { - return (__m512i) ((__v8du) __A * (__v8du) __B); + return (__m128) __builtin_ia32_rangess128_mask_round ((__v4sf) __A, + (__v4sf) __B, __C, + (__v4sf) + _mm_setzero_ps (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m512i +extern __inline __m128 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_mullo_epi64 (__m512i __W, __mmask8 __U, __m512i __A, - __m512i __B) +_mm_mask_range_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B, int __C) { - return 
(__m512i) __builtin_ia32_pmullq512_mask ((__v8di) __A, - (__v8di) __B, - (__v8di) __W, - (__mmask8) __U); + return (__m128) __builtin_ia32_rangess128_mask_round ((__v4sf) __A, + (__v4sf) __B, __C, + (__v4sf) __W, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m512i +extern __inline __m128 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_mullo_epi64 (__mmask8 __U, __m512i __A, __m512i __B) +_mm_maskz_range_ss (__mmask8 __U, __m128 __A, __m128 __B, int __C) { - return (__m512i) __builtin_ia32_pmullq512_mask ((__v8di) __A, - (__v8di) __B, - (__v8di) - _mm512_setzero_si512 (), - (__mmask8) __U); + return (__m128) __builtin_ia32_rangess128_mask_round ((__v4sf) __A, + (__v4sf) __B, __C, + (__v4sf) + _mm_setzero_ps (), + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m512d +extern __inline __m128d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_xor_pd (__m512d __A, __m512d __B) +_mm_range_round_sd (__m128d __A, __m128d __B, int __C, const int __R) { - return (__m512d) __builtin_ia32_xorpd512_mask ((__v8df) __A, - (__v8df) __B, - (__v8df) - _mm512_setzero_pd (), - (__mmask8) -1); + return (__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df) __A, + (__v2df) __B, __C, + (__v2df) + _mm_setzero_pd (), + (__mmask8) -1, __R); } -extern __inline __m512d +extern __inline __m128d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_xor_pd (__m512d __W, __mmask8 __U, __m512d __A, - __m512d __B) +_mm_mask_range_round_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B, + int __C, const int __R) { - return (__m512d) __builtin_ia32_xorpd512_mask ((__v8df) __A, - (__v8df) __B, - (__v8df) __W, - (__mmask8) __U); + return (__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df) __A, + (__v2df) __B, __C, + (__v2df) __W, + (__mmask8) __U, __R); } -extern __inline __m512d +extern __inline __m128d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_xor_pd (__mmask8 __U, __m512d __A, __m512d __B) +_mm_maskz_range_round_sd (__mmask8 __U, __m128d __A, __m128d __B, int __C, + const int __R) { - return (__m512d) __builtin_ia32_xorpd512_mask ((__v8df) __A, - (__v8df) __B, - (__v8df) - _mm512_setzero_pd (), - (__mmask8) __U); + return (__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df) __A, + (__v2df) __B, __C, + (__v2df) + _mm_setzero_pd (), + (__mmask8) __U, __R); } -extern __inline __m512 +extern __inline __m128 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_xor_ps (__m512 __A, __m512 __B) +_mm_range_round_ss (__m128 __A, __m128 __B, int __C, const int __R) { - return (__m512) __builtin_ia32_xorps512_mask ((__v16sf) __A, - (__v16sf) __B, - (__v16sf) - _mm512_setzero_ps (), - (__mmask16) -1); + return (__m128) __builtin_ia32_rangess128_mask_round ((__v4sf) __A, + (__v4sf) __B, __C, + (__v4sf) + _mm_setzero_ps (), + (__mmask8) -1, __R); } -extern __inline __m512 +extern __inline __m128 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_xor_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B) +_mm_mask_range_round_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B, + int __C, const int __R) { - return (__m512) __builtin_ia32_xorps512_mask ((__v16sf) __A, - (__v16sf) __B, - (__v16sf) __W, - (__mmask16) __U); + return (__m128) __builtin_ia32_rangess128_mask_round ((__v4sf) __A, + (__v4sf) __B, __C, + (__v4sf) __W, + (__mmask8) __U, __R); } -extern __inline __m512 +extern __inline __m128 
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_xor_ps (__mmask16 __U, __m512 __A, __m512 __B) +_mm_maskz_range_round_ss (__mmask8 __U, __m128 __A, __m128 __B, int __C, + const int __R) { - return (__m512) __builtin_ia32_xorps512_mask ((__v16sf) __A, - (__v16sf) __B, - (__v16sf) - _mm512_setzero_ps (), - (__mmask16) __U); + return (__m128) __builtin_ia32_rangess128_mask_round ((__v4sf) __A, + (__v4sf) __B, __C, + (__v4sf) + _mm_setzero_ps (), + (__mmask8) __U, __R); } -extern __inline __m512d +extern __inline __mmask8 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_or_pd (__m512d __A, __m512d __B) +_mm_fpclass_ss_mask (__m128 __A, const int __imm) { - return (__m512d) __builtin_ia32_orpd512_mask ((__v8df) __A, - (__v8df) __B, - (__v8df) - _mm512_setzero_pd (), - (__mmask8) -1); + return (__mmask8) __builtin_ia32_fpclassss_mask ((__v4sf) __A, __imm, + (__mmask8) -1); } -extern __inline __m512d +extern __inline __mmask8 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_or_pd (__m512d __W, __mmask8 __U, __m512d __A, __m512d __B) +_mm_fpclass_sd_mask (__m128d __A, const int __imm) { - return (__m512d) __builtin_ia32_orpd512_mask ((__v8df) __A, - (__v8df) __B, - (__v8df) __W, - (__mmask8) __U); + return (__mmask8) __builtin_ia32_fpclasssd_mask ((__v2df) __A, __imm, + (__mmask8) -1); } -extern __inline __m512d +extern __inline __mmask8 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_or_pd (__mmask8 __U, __m512d __A, __m512d __B) +_mm_mask_fpclass_ss_mask (__mmask8 __U, __m128 __A, const int __imm) { - return (__m512d) __builtin_ia32_orpd512_mask ((__v8df) __A, - (__v8df) __B, - (__v8df) - _mm512_setzero_pd (), - (__mmask8) __U); + return (__mmask8) __builtin_ia32_fpclassss_mask ((__v4sf) __A, __imm, __U); } -extern __inline __m512 +extern __inline __mmask8 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_or_ps (__m512 __A, __m512 __B) +_mm_mask_fpclass_sd_mask (__mmask8 __U, __m128d __A, const int __imm) { - return (__m512) __builtin_ia32_orps512_mask ((__v16sf) __A, - (__v16sf) __B, - (__v16sf) - _mm512_setzero_ps (), - (__mmask16) -1); + return (__mmask8) __builtin_ia32_fpclasssd_mask ((__v2df) __A, __imm, __U); } -extern __inline __m512 -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_or_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B) -{ - return (__m512) __builtin_ia32_orps512_mask ((__v16sf) __A, - (__v16sf) __B, - (__v16sf) __W, - (__mmask16) __U); -} +#else +#define _kshiftli_mask8(X, Y) \ + ((__mmask8) __builtin_ia32_kshiftliqi ((__mmask8)(X), (__mmask8)(Y))) -extern __inline __m512 -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_or_ps (__mmask16 __U, __m512 __A, __m512 __B) -{ - return (__m512) __builtin_ia32_orps512_mask ((__v16sf) __A, - (__v16sf) __B, - (__v16sf) - _mm512_setzero_ps (), - (__mmask16) __U); -} +#define _kshiftri_mask8(X, Y) \ + ((__mmask8) __builtin_ia32_kshiftriqi ((__mmask8)(X), (__mmask8)(Y))) + +#define _mm_range_sd(A, B, C) \ + ((__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df)(__m128d)(A), \ + (__v2df)(__m128d)(B), (int)(C), (__v2df) _mm_setzero_pd (), \ + (__mmask8) -1, _MM_FROUND_CUR_DIRECTION)) + +#define _mm_mask_range_sd(W, U, A, B, C) \ + ((__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df)(__m128d)(A), \ + (__v2df)(__m128d)(B), (int)(C), (__v2df)(__m128d)(W), \ + (__mmask8)(U), _MM_FROUND_CUR_DIRECTION)) + +#define 
_mm_maskz_range_sd(U, A, B, C) \ + ((__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df)(__m128d)(A), \ + (__v2df)(__m128d)(B), (int)(C), (__v2df) _mm_setzero_pd (), \ + (__mmask8)(U), _MM_FROUND_CUR_DIRECTION)) + +#define _mm_range_ss(A, B, C) \ + ((__m128) __builtin_ia32_rangess128_mask_round ((__v4sf)(__m128)(A), \ + (__v4sf)(__m128)(B), (int)(C), (__v4sf) _mm_setzero_ps (), \ + (__mmask8) -1, _MM_FROUND_CUR_DIRECTION)) + +#define _mm_mask_range_ss(W, U, A, B, C) \ + ((__m128) __builtin_ia32_rangess128_mask_round ((__v4sf)(__m128)(A), \ + (__v4sf)(__m128)(B), (int)(C), (__v4sf)(__m128)(W), \ + (__mmask8)(U), _MM_FROUND_CUR_DIRECTION)) + +#define _mm_maskz_range_ss(U, A, B, C) \ + ((__m128) __builtin_ia32_rangess128_mask_round ((__v4sf)(__m128)(A), \ + (__v4sf)(__m128)(B), (int)(C), (__v4sf) _mm_setzero_ps (), \ + (__mmask8)(U), _MM_FROUND_CUR_DIRECTION)) + +#define _mm_range_round_sd(A, B, C, R) \ + ((__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df)(__m128d)(A), \ + (__v2df)(__m128d)(B), (int)(C), (__v2df) _mm_setzero_pd (), \ + (__mmask8) -1, (R))) + +#define _mm_mask_range_round_sd(W, U, A, B, C, R) \ + ((__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df)(__m128d)(A), \ + (__v2df)(__m128d)(B), (int)(C), (__v2df)(__m128d)(W), \ + (__mmask8)(U), (R))) + +#define _mm_maskz_range_round_sd(U, A, B, C, R) \ + ((__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df)(__m128d)(A), \ + (__v2df)(__m128d)(B), (int)(C), (__v2df) _mm_setzero_pd (), \ + (__mmask8)(U), (R))) + +#define _mm_range_round_ss(A, B, C, R) \ + ((__m128) __builtin_ia32_rangess128_mask_round ((__v4sf)(__m128)(A), \ + (__v4sf)(__m128)(B), (int)(C), (__v4sf) _mm_setzero_ps (), \ + (__mmask8) -1, (R))) + +#define _mm_mask_range_round_ss(W, U, A, B, C, R) \ + ((__m128) __builtin_ia32_rangess128_mask_round ((__v4sf)(__m128)(A), \ + (__v4sf)(__m128)(B), (int)(C), (__v4sf)(__m128)(W), \ + (__mmask8)(U), (R))) + +#define _mm_maskz_range_round_ss(U, A, B, C, R) \ + ((__m128) __builtin_ia32_rangess128_mask_round ((__v4sf)(__m128)(A), \ + (__v4sf)(__m128)(B), (int)(C), (__v4sf) _mm_setzero_ps (), \ + (__mmask8)(U), (R))) + +#define _mm_fpclass_ss_mask(X, C) \ + ((__mmask8) __builtin_ia32_fpclassss_mask ((__v4sf) (__m128) (X), \ + (int) (C), (__mmask8) (-1))) \ + +#define _mm_fpclass_sd_mask(X, C) \ + ((__mmask8) __builtin_ia32_fpclasssd_mask ((__v2df) (__m128d) (X), \ + (int) (C), (__mmask8) (-1))) \ + +#define _mm_mask_fpclass_ss_mask(X, C, U) \ + ((__mmask8) __builtin_ia32_fpclassss_mask ((__v4sf) (__m128) (X), \ + (int) (C), (__mmask8) (U))) + +#define _mm_mask_fpclass_sd_mask(X, C, U) \ + ((__mmask8) __builtin_ia32_fpclasssd_mask ((__v2df) (__m128d) (X), \ + (int) (C), (__mmask8) (U))) +#define _mm_reduce_sd(A, B, C) \ + ((__m128d) __builtin_ia32_reducesd_mask ((__v2df)(__m128d)(A), \ + (__v2df)(__m128d)(B), (int)(C), (__v2df) _mm_setzero_pd (), \ + (__mmask8)-1)) + +#define _mm_mask_reduce_sd(W, U, A, B, C) \ + ((__m128d) __builtin_ia32_reducesd_mask ((__v2df)(__m128d)(A), \ + (__v2df)(__m128d)(B), (int)(C), (__v2df)(__m128d)(W), (__mmask8)(U))) + +#define _mm_maskz_reduce_sd(U, A, B, C) \ + ((__m128d) __builtin_ia32_reducesd_mask ((__v2df)(__m128d)(A), \ + (__v2df)(__m128d)(B), (int)(C), (__v2df) _mm_setzero_pd (), \ + (__mmask8)(U))) + +#define _mm_reduce_round_sd(A, B, C, R) \ + ((__m128d) __builtin_ia32_reducesd_round ((__v2df)(__m128d)(A), \ + (__v2df)(__m128d)(B), (int)(C), (__mmask8)(U), (int)(R))) + +#define _mm_mask_reduce_round_sd(W, U, A, B, C, R) \ + ((__m128d) __builtin_ia32_reducesd_mask_round 
((__v2df)(__m128d)(A), \ + (__v2df)(__m128d)(B), (int)(C), (__v2df)(__m128d)(W), \ + (__mmask8)(U), (int)(R))) + +#define _mm_maskz_reduce_round_sd(U, A, B, C, R) \ + ((__m128d) __builtin_ia32_reducesd_mask_round ((__v2df)(__m128d)(A), \ + (__v2df)(__m128d)(B), (int)(C), (__v2df) _mm_setzero_pd (), \ + (__mmask8)(U), (int)(R))) + +#define _mm_reduce_ss(A, B, C) \ + ((__m128) __builtin_ia32_reducess_mask ((__v4sf)(__m128)(A), \ + (__v4sf)(__m128)(B), (int)(C), (__v4sf) _mm_setzero_ps (), \ + (__mmask8)-1)) + +#define _mm_mask_reduce_ss(W, U, A, B, C) \ + ((__m128) __builtin_ia32_reducess_mask ((__v4sf)(__m128)(A), \ + (__v4sf)(__m128)(B), (int)(C), (__v4sf)(__m128)(W), (__mmask8)(U))) + +#define _mm_maskz_reduce_ss(U, A, B, C) \ + ((__m128) __builtin_ia32_reducess_mask ((__v4sf)(__m128)(A), \ + (__v4sf)(__m128)(B), (int)(C), (__v4sf) _mm_setzero_ps (), \ + (__mmask8)(U))) + +#define _mm_reduce_round_ss(A, B, C, R) \ + ((__m128) __builtin_ia32_reducess_round ((__v4sf)(__m128)(A), \ + (__v4sf)(__m128)(B), (int)(C), (__mmask8)(U), (int)(R))) + +#define _mm_mask_reduce_round_ss(W, U, A, B, C, R) \ + ((__m128) __builtin_ia32_reducess_mask_round ((__v4sf)(__m128)(A), \ + (__v4sf)(__m128)(B), (int)(C), (__v4sf)(__m128)(W), \ + (__mmask8)(U), (int)(R))) + +#define _mm_maskz_reduce_round_ss(U, A, B, C, R) \ + ((__m128) __builtin_ia32_reducesd_mask_round ((__v4sf)(__m128)(A), \ + (__v4sf)(__m128)(B), (int)(C), (__v4sf) _mm_setzero_ps (), \ + (__mmask8)(U), (int)(R))) + +#endif + +#ifdef __DISABLE_AVX512DQ__ +#undef __DISABLE_AVX512DQ__ +#pragma GCC pop_options +#endif /* __DISABLE_AVX512DQ__ */ + +#if !defined (__AVX512DQ__) || !defined (__EVEX512__) +#pragma GCC push_options +#pragma GCC target("avx512dq,evex512") +#define __DISABLE_AVX512DQ_512__ +#endif /* __AVX512DQ_512__ */ extern __inline __m512d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_and_pd (__m512d __A, __m512d __B) +_mm512_broadcast_f64x2 (__m128d __A) { - return (__m512d) __builtin_ia32_andpd512_mask ((__v8df) __A, - (__v8df) __B, - (__v8df) - _mm512_setzero_pd (), + return (__m512d) + __builtin_ia32_broadcastf64x2_512_mask ((__v2df) __A, + _mm512_undefined_pd (), (__mmask8) -1); } extern __inline __m512d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_and_pd (__m512d __W, __mmask8 __U, __m512d __A, - __m512d __B) +_mm512_mask_broadcast_f64x2 (__m512d __O, __mmask8 __M, __m128d __A) { - return (__m512d) __builtin_ia32_andpd512_mask ((__v8df) __A, - (__v8df) __B, - (__v8df) __W, - (__mmask8) __U); + return (__m512d) __builtin_ia32_broadcastf64x2_512_mask ((__v2df) + __A, + (__v8df) + __O, __M); } extern __inline __m512d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_and_pd (__mmask8 __U, __m512d __A, __m512d __B) +_mm512_maskz_broadcast_f64x2 (__mmask8 __M, __m128d __A) { - return (__m512d) __builtin_ia32_andpd512_mask ((__v8df) __A, - (__v8df) __B, - (__v8df) - _mm512_setzero_pd (), - (__mmask8) __U); + return (__m512d) __builtin_ia32_broadcastf64x2_512_mask ((__v2df) + __A, + (__v8df) + _mm512_setzero_ps (), + __M); } -extern __inline __m512 +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_and_ps (__m512 __A, __m512 __B) +_mm512_broadcast_i64x2 (__m128i __A) { - return (__m512) __builtin_ia32_andps512_mask ((__v16sf) __A, - (__v16sf) __B, - (__v16sf) - _mm512_setzero_ps (), - (__mmask16) -1); + return (__m512i) + __builtin_ia32_broadcasti64x2_512_mask ((__v2di) __A, + 
_mm512_undefined_epi32 (), + (__mmask8) -1); } -extern __inline __m512 +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_and_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B) +_mm512_mask_broadcast_i64x2 (__m512i __O, __mmask8 __M, __m128i __A) { - return (__m512) __builtin_ia32_andps512_mask ((__v16sf) __A, - (__v16sf) __B, - (__v16sf) __W, - (__mmask16) __U); + return (__m512i) __builtin_ia32_broadcasti64x2_512_mask ((__v2di) + __A, + (__v8di) + __O, __M); } -extern __inline __m512 +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_and_ps (__mmask16 __U, __m512 __A, __m512 __B) +_mm512_maskz_broadcast_i64x2 (__mmask8 __M, __m128i __A) { - return (__m512) __builtin_ia32_andps512_mask ((__v16sf) __A, - (__v16sf) __B, - (__v16sf) - _mm512_setzero_ps (), - (__mmask16) __U); + return (__m512i) __builtin_ia32_broadcasti64x2_512_mask ((__v2di) + __A, + (__v8di) + _mm512_setzero_si512 (), + __M); } -extern __inline __m512d +extern __inline __m512 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_andnot_pd (__m512d __A, __m512d __B) +_mm512_broadcast_f32x2 (__m128 __A) { - return (__m512d) __builtin_ia32_andnpd512_mask ((__v8df) __A, - (__v8df) __B, - (__v8df) - _mm512_setzero_pd (), - (__mmask8) -1); + return (__m512) + __builtin_ia32_broadcastf32x2_512_mask ((__v4sf) __A, + (__v16sf)_mm512_undefined_ps (), + (__mmask16) -1); } -extern __inline __m512d +extern __inline __m512 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_andnot_pd (__m512d __W, __mmask8 __U, __m512d __A, - __m512d __B) +_mm512_mask_broadcast_f32x2 (__m512 __O, __mmask16 __M, __m128 __A) { - return (__m512d) __builtin_ia32_andnpd512_mask ((__v8df) __A, - (__v8df) __B, - (__v8df) __W, - (__mmask8) __U); + return (__m512) __builtin_ia32_broadcastf32x2_512_mask ((__v4sf) __A, + (__v16sf) + __O, __M); } -extern __inline __m512d +extern __inline __m512 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_andnot_pd (__mmask8 __U, __m512d __A, __m512d __B) +_mm512_maskz_broadcast_f32x2 (__mmask16 __M, __m128 __A) { - return (__m512d) __builtin_ia32_andnpd512_mask ((__v8df) __A, - (__v8df) __B, - (__v8df) - _mm512_setzero_pd (), - (__mmask8) __U); + return (__m512) __builtin_ia32_broadcastf32x2_512_mask ((__v4sf) __A, + (__v16sf) + _mm512_setzero_ps (), + __M); } -extern __inline __m512 +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_andnot_ps (__m512 __A, __m512 __B) +_mm512_broadcast_i32x2 (__m128i __A) { - return (__m512) __builtin_ia32_andnps512_mask ((__v16sf) __A, - (__v16sf) __B, - (__v16sf) - _mm512_setzero_ps (), + return (__m512i) + __builtin_ia32_broadcasti32x2_512_mask ((__v4si) __A, + (__v16si) + _mm512_undefined_epi32 (), (__mmask16) -1); } -extern __inline __m512 +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_andnot_ps (__m512 __W, __mmask16 __U, __m512 __A, - __m512 __B) +_mm512_mask_broadcast_i32x2 (__m512i __O, __mmask16 __M, __m128i __A) { - return (__m512) __builtin_ia32_andnps512_mask ((__v16sf) __A, - (__v16sf) __B, - (__v16sf) __W, - (__mmask16) __U); + return (__m512i) __builtin_ia32_broadcasti32x2_512_mask ((__v4si) + __A, + (__v16si) + __O, __M); } -extern __inline __m512 +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_andnot_ps (__mmask16 
__U, __m512 __A, __m512 __B) +_mm512_maskz_broadcast_i32x2 (__mmask16 __M, __m128i __A) { - return (__m512) __builtin_ia32_andnps512_mask ((__v16sf) __A, - (__v16sf) __B, - (__v16sf) - _mm512_setzero_ps (), - (__mmask16) __U); + return (__m512i) __builtin_ia32_broadcasti32x2_512_mask ((__v4si) + __A, + (__v16si) + _mm512_setzero_si512 (), + __M); } -extern __inline __mmask16 +extern __inline __m512 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_movepi32_mask (__m512i __A) +_mm512_broadcast_f32x8 (__m256 __A) { - return (__mmask16) __builtin_ia32_cvtd2mask512 ((__v16si) __A); + return (__m512) + __builtin_ia32_broadcastf32x8_512_mask ((__v8sf) __A, + _mm512_undefined_ps (), + (__mmask16) -1); } -extern __inline __mmask8 +extern __inline __m512 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_movepi64_mask (__m512i __A) +_mm512_mask_broadcast_f32x8 (__m512 __O, __mmask16 __M, __m256 __A) { - return (__mmask8) __builtin_ia32_cvtq2mask512 ((__v8di) __A); + return (__m512) __builtin_ia32_broadcastf32x8_512_mask ((__v8sf) __A, + (__v16sf)__O, + __M); } -extern __inline __m512i +extern __inline __m512 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_movm_epi32 (__mmask16 __A) +_mm512_maskz_broadcast_f32x8 (__mmask16 __M, __m256 __A) { - return (__m512i) __builtin_ia32_cvtmask2d512 (__A); + return (__m512) __builtin_ia32_broadcastf32x8_512_mask ((__v8sf) __A, + (__v16sf) + _mm512_setzero_ps (), + __M); } extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_movm_epi64 (__mmask8 __A) +_mm512_broadcast_i32x8 (__m256i __A) { - return (__m512i) __builtin_ia32_cvtmask2q512 (__A); + return (__m512i) + __builtin_ia32_broadcasti32x8_512_mask ((__v8si) __A, + (__v16si) + _mm512_undefined_epi32 (), + (__mmask16) -1); } extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_cvttpd_epi64 (__m512d __A) +_mm512_mask_broadcast_i32x8 (__m512i __O, __mmask16 __M, __m256i __A) { - return (__m512i) __builtin_ia32_cvttpd2qq512_mask ((__v8df) __A, - (__v8di) - _mm512_setzero_si512 (), - (__mmask8) -1, - _MM_FROUND_CUR_DIRECTION); + return (__m512i) __builtin_ia32_broadcasti32x8_512_mask ((__v8si) + __A, + (__v16si)__O, + __M); } extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_cvttpd_epi64 (__m512i __W, __mmask8 __U, __m512d __A) +_mm512_maskz_broadcast_i32x8 (__mmask16 __M, __m256i __A) { - return (__m512i) __builtin_ia32_cvttpd2qq512_mask ((__v8df) __A, - (__v8di) __W, - (__mmask8) __U, - _MM_FROUND_CUR_DIRECTION); + return (__m512i) __builtin_ia32_broadcasti32x8_512_mask ((__v8si) + __A, + (__v16si) + _mm512_setzero_si512 (), + __M); } extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_cvttpd_epi64 (__mmask8 __U, __m512d __A) +_mm512_mullo_epi64 (__m512i __A, __m512i __B) { - return (__m512i) __builtin_ia32_cvttpd2qq512_mask ((__v8df) __A, - (__v8di) - _mm512_setzero_si512 (), - (__mmask8) __U, - _MM_FROUND_CUR_DIRECTION); + return (__m512i) ((__v8du) __A * (__v8du) __B); } extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_cvttpd_epu64 (__m512d __A) +_mm512_mask_mullo_epi64 (__m512i __W, __mmask8 __U, __m512i __A, + __m512i __B) { - return (__m512i) __builtin_ia32_cvttpd2uqq512_mask ((__v8df) __A, - (__v8di) - _mm512_setzero_si512 (), - (__mmask8) -1, - _MM_FROUND_CUR_DIRECTION); + 
return (__m512i) __builtin_ia32_pmullq512_mask ((__v8di) __A, + (__v8di) __B, + (__v8di) __W, + (__mmask8) __U); } extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_cvttpd_epu64 (__m512i __W, __mmask8 __U, __m512d __A) +_mm512_maskz_mullo_epi64 (__mmask8 __U, __m512i __A, __m512i __B) { - return (__m512i) __builtin_ia32_cvttpd2uqq512_mask ((__v8df) __A, - (__v8di) __W, - (__mmask8) __U, - _MM_FROUND_CUR_DIRECTION); + return (__m512i) __builtin_ia32_pmullq512_mask ((__v8di) __A, + (__v8di) __B, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) __U); } -extern __inline __m512i +extern __inline __m512d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_cvttpd_epu64 (__mmask8 __U, __m512d __A) +_mm512_xor_pd (__m512d __A, __m512d __B) { - return (__m512i) __builtin_ia32_cvttpd2uqq512_mask ((__v8df) __A, - (__v8di) - _mm512_setzero_si512 (), - (__mmask8) __U, - _MM_FROUND_CUR_DIRECTION); + return (__m512d) __builtin_ia32_xorpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) -1); } -extern __inline __m512i +extern __inline __m512d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_cvttps_epi64 (__m256 __A) +_mm512_mask_xor_pd (__m512d __W, __mmask8 __U, __m512d __A, + __m512d __B) { - return (__m512i) __builtin_ia32_cvttps2qq512_mask ((__v8sf) __A, - (__v8di) - _mm512_setzero_si512 (), - (__mmask8) -1, - _MM_FROUND_CUR_DIRECTION); + return (__m512d) __builtin_ia32_xorpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) __W, + (__mmask8) __U); } -extern __inline __m512i +extern __inline __m512d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_cvttps_epi64 (__m512i __W, __mmask8 __U, __m256 __A) +_mm512_maskz_xor_pd (__mmask8 __U, __m512d __A, __m512d __B) { - return (__m512i) __builtin_ia32_cvttps2qq512_mask ((__v8sf) __A, - (__v8di) __W, - (__mmask8) __U, - _MM_FROUND_CUR_DIRECTION); + return (__m512d) __builtin_ia32_xorpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) __U); } -extern __inline __m512i +extern __inline __m512 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_cvttps_epi64 (__mmask8 __U, __m256 __A) +_mm512_xor_ps (__m512 __A, __m512 __B) { - return (__m512i) __builtin_ia32_cvttps2qq512_mask ((__v8sf) __A, - (__v8di) - _mm512_setzero_si512 (), - (__mmask8) __U, - _MM_FROUND_CUR_DIRECTION); + return (__m512) __builtin_ia32_xorps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) + _mm512_setzero_ps (), + (__mmask16) -1); } -extern __inline __m512i +extern __inline __m512 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_cvttps_epu64 (__m256 __A) +_mm512_mask_xor_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B) { - return (__m512i) __builtin_ia32_cvttps2uqq512_mask ((__v8sf) __A, - (__v8di) - _mm512_setzero_si512 (), - (__mmask8) -1, - _MM_FROUND_CUR_DIRECTION); + return (__m512) __builtin_ia32_xorps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) __W, + (__mmask16) __U); } -extern __inline __m512i +extern __inline __m512 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_cvttps_epu64 (__m512i __W, __mmask8 __U, __m256 __A) +_mm512_maskz_xor_ps (__mmask16 __U, __m512 __A, __m512 __B) { - return (__m512i) __builtin_ia32_cvttps2uqq512_mask ((__v8sf) __A, - (__v8di) __W, - (__mmask8) __U, - _MM_FROUND_CUR_DIRECTION); + return (__m512) __builtin_ia32_xorps512_mask 
((__v16sf) __A, + (__v16sf) __B, + (__v16sf) + _mm512_setzero_ps (), + (__mmask16) __U); } -extern __inline __m512i +extern __inline __m512d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_cvttps_epu64 (__mmask8 __U, __m256 __A) +_mm512_or_pd (__m512d __A, __m512d __B) { - return (__m512i) __builtin_ia32_cvttps2uqq512_mask ((__v8sf) __A, - (__v8di) - _mm512_setzero_si512 (), - (__mmask8) __U, - _MM_FROUND_CUR_DIRECTION); + return (__m512d) __builtin_ia32_orpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) -1); } -extern __inline __m512i +extern __inline __m512d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_cvtpd_epi64 (__m512d __A) +_mm512_mask_or_pd (__m512d __W, __mmask8 __U, __m512d __A, __m512d __B) { - return (__m512i) __builtin_ia32_cvtpd2qq512_mask ((__v8df) __A, - (__v8di) - _mm512_setzero_si512 (), - (__mmask8) -1, - _MM_FROUND_CUR_DIRECTION); + return (__m512d) __builtin_ia32_orpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) __W, + (__mmask8) __U); } -extern __inline __m512i +extern __inline __m512d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_cvtpd_epi64 (__m512i __W, __mmask8 __U, __m512d __A) +_mm512_maskz_or_pd (__mmask8 __U, __m512d __A, __m512d __B) { - return (__m512i) __builtin_ia32_cvtpd2qq512_mask ((__v8df) __A, - (__v8di) __W, - (__mmask8) __U, - _MM_FROUND_CUR_DIRECTION); + return (__m512d) __builtin_ia32_orpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) __U); } -extern __inline __m512i +extern __inline __m512 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_cvtpd_epi64 (__mmask8 __U, __m512d __A) +_mm512_or_ps (__m512 __A, __m512 __B) { - return (__m512i) __builtin_ia32_cvtpd2qq512_mask ((__v8df) __A, - (__v8di) - _mm512_setzero_si512 (), - (__mmask8) __U, - _MM_FROUND_CUR_DIRECTION); + return (__m512) __builtin_ia32_orps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) + _mm512_setzero_ps (), + (__mmask16) -1); } -extern __inline __m512i +extern __inline __m512 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_cvtpd_epu64 (__m512d __A) +_mm512_mask_or_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B) { - return (__m512i) __builtin_ia32_cvtpd2uqq512_mask ((__v8df) __A, - (__v8di) - _mm512_setzero_si512 (), - (__mmask8) -1, - _MM_FROUND_CUR_DIRECTION); + return (__m512) __builtin_ia32_orps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) __W, + (__mmask16) __U); } -extern __inline __m512i +extern __inline __m512 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_cvtpd_epu64 (__m512i __W, __mmask8 __U, __m512d __A) +_mm512_maskz_or_ps (__mmask16 __U, __m512 __A, __m512 __B) { - return (__m512i) __builtin_ia32_cvtpd2uqq512_mask ((__v8df) __A, - (__v8di) __W, - (__mmask8) __U, - _MM_FROUND_CUR_DIRECTION); + return (__m512) __builtin_ia32_orps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) + _mm512_setzero_ps (), + (__mmask16) __U); } -extern __inline __m512i +extern __inline __m512d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_cvtpd_epu64 (__mmask8 __U, __m512d __A) +_mm512_and_pd (__m512d __A, __m512d __B) { - return (__m512i) __builtin_ia32_cvtpd2uqq512_mask ((__v8df) __A, - (__v8di) - _mm512_setzero_si512 (), - (__mmask8) __U, - _MM_FROUND_CUR_DIRECTION); + return (__m512d) __builtin_ia32_andpd512_mask ((__v8df) __A, + (__v8df) __B, + 
(__v8df) + _mm512_setzero_pd (), + (__mmask8) -1); } -extern __inline __m512i +extern __inline __m512d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_cvtps_epi64 (__m256 __A) +_mm512_mask_and_pd (__m512d __W, __mmask8 __U, __m512d __A, + __m512d __B) { - return (__m512i) __builtin_ia32_cvtps2qq512_mask ((__v8sf) __A, - (__v8di) - _mm512_setzero_si512 (), - (__mmask8) -1, - _MM_FROUND_CUR_DIRECTION); + return (__m512d) __builtin_ia32_andpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) __W, + (__mmask8) __U); } -extern __inline __m512i +extern __inline __m512d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_cvtps_epi64 (__m512i __W, __mmask8 __U, __m256 __A) +_mm512_maskz_and_pd (__mmask8 __U, __m512d __A, __m512d __B) { - return (__m512i) __builtin_ia32_cvtps2qq512_mask ((__v8sf) __A, - (__v8di) __W, - (__mmask8) __U, - _MM_FROUND_CUR_DIRECTION); + return (__m512d) __builtin_ia32_andpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) __U); } -extern __inline __m512i +extern __inline __m512 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_cvtps_epi64 (__mmask8 __U, __m256 __A) +_mm512_and_ps (__m512 __A, __m512 __B) { - return (__m512i) __builtin_ia32_cvtps2qq512_mask ((__v8sf) __A, - (__v8di) - _mm512_setzero_si512 (), - (__mmask8) __U, - _MM_FROUND_CUR_DIRECTION); + return (__m512) __builtin_ia32_andps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) + _mm512_setzero_ps (), + (__mmask16) -1); } -extern __inline __m512i +extern __inline __m512 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_cvtps_epu64 (__m256 __A) +_mm512_mask_and_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B) { - return (__m512i) __builtin_ia32_cvtps2uqq512_mask ((__v8sf) __A, - (__v8di) - _mm512_setzero_si512 (), - (__mmask8) -1, - _MM_FROUND_CUR_DIRECTION); + return (__m512) __builtin_ia32_andps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) __W, + (__mmask16) __U); } -extern __inline __m512i +extern __inline __m512 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_cvtps_epu64 (__m512i __W, __mmask8 __U, __m256 __A) +_mm512_maskz_and_ps (__mmask16 __U, __m512 __A, __m512 __B) { - return (__m512i) __builtin_ia32_cvtps2uqq512_mask ((__v8sf) __A, - (__v8di) __W, - (__mmask8) __U, - _MM_FROUND_CUR_DIRECTION); + return (__m512) __builtin_ia32_andps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) + _mm512_setzero_ps (), + (__mmask16) __U); } -extern __inline __m512i +extern __inline __m512d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_cvtps_epu64 (__mmask8 __U, __m256 __A) +_mm512_andnot_pd (__m512d __A, __m512d __B) { - return (__m512i) __builtin_ia32_cvtps2uqq512_mask ((__v8sf) __A, - (__v8di) - _mm512_setzero_si512 (), - (__mmask8) __U, - _MM_FROUND_CUR_DIRECTION); + return (__m512d) __builtin_ia32_andnpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) -1); } -extern __inline __m256 +extern __inline __m512d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_cvtepi64_ps (__m512i __A) +_mm512_mask_andnot_pd (__m512d __W, __mmask8 __U, __m512d __A, + __m512d __B) { - return (__m256) __builtin_ia32_cvtqq2ps512_mask ((__v8di) __A, - (__v8sf) - _mm256_setzero_ps (), - (__mmask8) -1, - _MM_FROUND_CUR_DIRECTION); + return (__m512d) __builtin_ia32_andnpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) __W, + 
(__mmask8) __U); } -extern __inline __m256 +extern __inline __m512d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_cvtepi64_ps (__m256 __W, __mmask8 __U, __m512i __A) +_mm512_maskz_andnot_pd (__mmask8 __U, __m512d __A, __m512d __B) { - return (__m256) __builtin_ia32_cvtqq2ps512_mask ((__v8di) __A, - (__v8sf) __W, - (__mmask8) __U, - _MM_FROUND_CUR_DIRECTION); + return (__m512d) __builtin_ia32_andnpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) __U); } -extern __inline __m256 +extern __inline __m512 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_cvtepi64_ps (__mmask8 __U, __m512i __A) +_mm512_andnot_ps (__m512 __A, __m512 __B) { - return (__m256) __builtin_ia32_cvtqq2ps512_mask ((__v8di) __A, - (__v8sf) - _mm256_setzero_ps (), - (__mmask8) __U, - _MM_FROUND_CUR_DIRECTION); + return (__m512) __builtin_ia32_andnps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) + _mm512_setzero_ps (), + (__mmask16) -1); } -extern __inline __m256 +extern __inline __m512 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_cvtepu64_ps (__m512i __A) +_mm512_mask_andnot_ps (__m512 __W, __mmask16 __U, __m512 __A, + __m512 __B) { - return (__m256) __builtin_ia32_cvtuqq2ps512_mask ((__v8di) __A, - (__v8sf) - _mm256_setzero_ps (), - (__mmask8) -1, - _MM_FROUND_CUR_DIRECTION); + return (__m512) __builtin_ia32_andnps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) __W, + (__mmask16) __U); } -extern __inline __m256 +extern __inline __m512 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_cvtepu64_ps (__m256 __W, __mmask8 __U, __m512i __A) +_mm512_maskz_andnot_ps (__mmask16 __U, __m512 __A, __m512 __B) { - return (__m256) __builtin_ia32_cvtuqq2ps512_mask ((__v8di) __A, - (__v8sf) __W, - (__mmask8) __U, - _MM_FROUND_CUR_DIRECTION); + return (__m512) __builtin_ia32_andnps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) + _mm512_setzero_ps (), + (__mmask16) __U); } -extern __inline __m256 +extern __inline __mmask16 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_cvtepu64_ps (__mmask8 __U, __m512i __A) +_mm512_movepi32_mask (__m512i __A) { - return (__m256) __builtin_ia32_cvtuqq2ps512_mask ((__v8di) __A, - (__v8sf) - _mm256_setzero_ps (), - (__mmask8) __U, - _MM_FROUND_CUR_DIRECTION); + return (__mmask16) __builtin_ia32_cvtd2mask512 ((__v16si) __A); } -extern __inline __m512d +extern __inline __mmask8 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_cvtepi64_pd (__m512i __A) +_mm512_movepi64_mask (__m512i __A) { - return (__m512d) __builtin_ia32_cvtqq2pd512_mask ((__v8di) __A, - (__v8df) - _mm512_setzero_pd (), - (__mmask8) -1, - _MM_FROUND_CUR_DIRECTION); + return (__mmask8) __builtin_ia32_cvtq2mask512 ((__v8di) __A); } -extern __inline __m512d +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_cvtepi64_pd (__m512d __W, __mmask8 __U, __m512i __A) +_mm512_movm_epi32 (__mmask16 __A) { - return (__m512d) __builtin_ia32_cvtqq2pd512_mask ((__v8di) __A, - (__v8df) __W, - (__mmask8) __U, - _MM_FROUND_CUR_DIRECTION); + return (__m512i) __builtin_ia32_cvtmask2d512 (__A); } -extern __inline __m512d +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_cvtepi64_pd (__mmask8 __U, __m512i __A) +_mm512_movm_epi64 (__mmask8 __A) { - return (__m512d) __builtin_ia32_cvtqq2pd512_mask ((__v8di) __A, 
- (__v8df) - _mm512_setzero_pd (), - (__mmask8) __U, - _MM_FROUND_CUR_DIRECTION); + return (__m512i) __builtin_ia32_cvtmask2q512 (__A); } -extern __inline __m512d +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_cvtepu64_pd (__m512i __A) +_mm512_cvttpd_epi64 (__m512d __A) { - return (__m512d) __builtin_ia32_cvtuqq2pd512_mask ((__v8di) __A, - (__v8df) - _mm512_setzero_pd (), + return (__m512i) __builtin_ia32_cvttpd2qq512_mask ((__v8df) __A, + (__v8di) + _mm512_setzero_si512 (), (__mmask8) -1, _MM_FROUND_CUR_DIRECTION); } -extern __inline __m512d +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_cvtepu64_pd (__m512d __W, __mmask8 __U, __m512i __A) +_mm512_mask_cvttpd_epi64 (__m512i __W, __mmask8 __U, __m512d __A) { - return (__m512d) __builtin_ia32_cvtuqq2pd512_mask ((__v8di) __A, - (__v8df) __W, + return (__m512i) __builtin_ia32_cvttpd2qq512_mask ((__v8df) __A, + (__v8di) __W, (__mmask8) __U, _MM_FROUND_CUR_DIRECTION); } -extern __inline __m512d +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_cvtepu64_pd (__mmask8 __U, __m512i __A) +_mm512_maskz_cvttpd_epi64 (__mmask8 __U, __m512d __A) { - return (__m512d) __builtin_ia32_cvtuqq2pd512_mask ((__v8di) __A, - (__v8df) - _mm512_setzero_pd (), + return (__m512i) __builtin_ia32_cvttpd2qq512_mask ((__v8df) __A, + (__v8di) + _mm512_setzero_si512 (), (__mmask8) __U, _MM_FROUND_CUR_DIRECTION); } -#ifdef __OPTIMIZE__ -extern __inline __mmask8 +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_kshiftli_mask8 (__mmask8 __A, unsigned int __B) +_mm512_cvttpd_epu64 (__m512d __A) { - return (__mmask8) __builtin_ia32_kshiftliqi ((__mmask8) __A, (__mmask8) __B); + return (__m512i) __builtin_ia32_cvttpd2uqq512_mask ((__v8df) __A, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __mmask8 +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_kshiftri_mask8 (__mmask8 __A, unsigned int __B) +_mm512_mask_cvttpd_epu64 (__m512i __W, __mmask8 __U, __m512d __A) { - return (__mmask8) __builtin_ia32_kshiftriqi ((__mmask8) __A, (__mmask8) __B); + return (__m512i) __builtin_ia32_cvttpd2uqq512_mask ((__v8df) __A, + (__v8di) __W, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m512d +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_range_pd (__m512d __A, __m512d __B, int __C) +_mm512_maskz_cvttpd_epu64 (__mmask8 __U, __m512d __A) { - return (__m512d) __builtin_ia32_rangepd512_mask ((__v8df) __A, - (__v8df) __B, __C, - (__v8df) - _mm512_setzero_pd (), - (__mmask8) -1, - _MM_FROUND_CUR_DIRECTION); + return (__m512i) __builtin_ia32_cvttpd2uqq512_mask ((__v8df) __A, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m512d +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_range_pd (__m512d __W, __mmask8 __U, - __m512d __A, __m512d __B, int __C) +_mm512_cvttps_epi64 (__m256 __A) { - return (__m512d) __builtin_ia32_rangepd512_mask ((__v8df) __A, - (__v8df) __B, __C, - (__v8df) __W, - (__mmask8) __U, - _MM_FROUND_CUR_DIRECTION); + return (__m512i) __builtin_ia32_cvttps2qq512_mask ((__v8sf) __A, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m512d 
+extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_range_pd (__mmask8 __U, __m512d __A, __m512d __B, int __C) +_mm512_mask_cvttps_epi64 (__m512i __W, __mmask8 __U, __m256 __A) { - return (__m512d) __builtin_ia32_rangepd512_mask ((__v8df) __A, - (__v8df) __B, __C, - (__v8df) - _mm512_setzero_pd (), - (__mmask8) __U, - _MM_FROUND_CUR_DIRECTION); + return (__m512i) __builtin_ia32_cvttps2qq512_mask ((__v8sf) __A, + (__v8di) __W, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m512 +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_range_ps (__m512 __A, __m512 __B, int __C) +_mm512_maskz_cvttps_epi64 (__mmask8 __U, __m256 __A) { - return (__m512) __builtin_ia32_rangeps512_mask ((__v16sf) __A, - (__v16sf) __B, __C, - (__v16sf) - _mm512_setzero_ps (), - (__mmask16) -1, - _MM_FROUND_CUR_DIRECTION); + return (__m512i) __builtin_ia32_cvttps2qq512_mask ((__v8sf) __A, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m512 +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_range_ps (__m512 __W, __mmask16 __U, - __m512 __A, __m512 __B, int __C) +_mm512_cvttps_epu64 (__m256 __A) { - return (__m512) __builtin_ia32_rangeps512_mask ((__v16sf) __A, - (__v16sf) __B, __C, - (__v16sf) __W, - (__mmask16) __U, - _MM_FROUND_CUR_DIRECTION); + return (__m512i) __builtin_ia32_cvttps2uqq512_mask ((__v8sf) __A, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m512 +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_range_ps (__mmask16 __U, __m512 __A, __m512 __B, int __C) +_mm512_mask_cvttps_epu64 (__m512i __W, __mmask8 __U, __m256 __A) { - return (__m512) __builtin_ia32_rangeps512_mask ((__v16sf) __A, - (__v16sf) __B, __C, - (__v16sf) - _mm512_setzero_ps (), - (__mmask16) __U, - _MM_FROUND_CUR_DIRECTION); + return (__m512i) __builtin_ia32_cvttps2uqq512_mask ((__v8sf) __A, + (__v8di) __W, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128d +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_reduce_sd (__m128d __A, __m128d __B, int __C) +_mm512_maskz_cvttps_epu64 (__mmask8 __U, __m256 __A) { - return (__m128d) __builtin_ia32_reducesd_mask ((__v2df) __A, - (__v2df) __B, __C, - (__v2df) _mm_setzero_pd (), - (__mmask8) -1); + return (__m512i) __builtin_ia32_cvttps2uqq512_mask ((__v8sf) __A, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128d +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_reduce_round_sd (__m128d __A, __m128d __B, int __C, const int __R) +_mm512_cvtpd_epi64 (__m512d __A) { - return (__m128d) __builtin_ia32_reducesd_mask_round ((__v2df) __A, - (__v2df) __B, __C, - (__v2df) - _mm_setzero_pd (), - (__mmask8) -1, __R); + return (__m512i) __builtin_ia32_cvtpd2qq512_mask ((__v8df) __A, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtpd_epi64 (__m512i __W, __mmask8 __U, __m512d __A) +{ + return (__m512i) __builtin_ia32_cvtpd2qq512_mask ((__v8df) __A, + (__v8di) __W, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128d +extern 
__inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_reduce_sd (__m128d __W, __mmask8 __U, __m128d __A, - __m128d __B, int __C) +_mm512_maskz_cvtpd_epi64 (__mmask8 __U, __m512d __A) { - return (__m128d) __builtin_ia32_reducesd_mask ((__v2df) __A, - (__v2df) __B, __C, - (__v2df) __W, - (__mmask8) __U); + return (__m512i) __builtin_ia32_cvtpd2qq512_mask ((__v8df) __A, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128d +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_reduce_round_sd (__m128d __W, __mmask8 __U, __m128d __A, - __m128d __B, int __C, const int __R) +_mm512_cvtpd_epu64 (__m512d __A) { - return (__m128d) __builtin_ia32_reducesd_mask_round ((__v2df) __A, - (__v2df) __B, __C, - (__v2df) __W, - __U, __R); + return (__m512i) __builtin_ia32_cvtpd2uqq512_mask ((__v8df) __A, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128d +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_maskz_reduce_sd (__mmask8 __U, __m128d __A, __m128d __B, int __C) +_mm512_mask_cvtpd_epu64 (__m512i __W, __mmask8 __U, __m512d __A) { - return (__m128d) __builtin_ia32_reducesd_mask ((__v2df) __A, - (__v2df) __B, __C, - (__v2df) _mm_setzero_pd (), - (__mmask8) __U); + return (__m512i) __builtin_ia32_cvtpd2uqq512_mask ((__v8df) __A, + (__v8di) __W, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128d +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_maskz_reduce_round_sd (__mmask8 __U, __m128d __A, __m128d __B, - int __C, const int __R) +_mm512_maskz_cvtpd_epu64 (__mmask8 __U, __m512d __A) { - return (__m128d) __builtin_ia32_reducesd_mask_round ((__v2df) __A, - (__v2df) __B, __C, - (__v2df) - _mm_setzero_pd (), - __U, __R); + return (__m512i) __builtin_ia32_cvtpd2uqq512_mask ((__v8df) __A, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128 +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_reduce_ss (__m128 __A, __m128 __B, int __C) +_mm512_cvtps_epi64 (__m256 __A) { - return (__m128) __builtin_ia32_reducess_mask ((__v4sf) __A, - (__v4sf) __B, __C, - (__v4sf) _mm_setzero_ps (), - (__mmask8) -1); + return (__m512i) __builtin_ia32_cvtps2qq512_mask ((__v8sf) __A, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128 +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_reduce_round_ss (__m128 __A, __m128 __B, int __C, const int __R) +_mm512_mask_cvtps_epi64 (__m512i __W, __mmask8 __U, __m256 __A) { - return (__m128) __builtin_ia32_reducess_mask_round ((__v4sf) __A, - (__v4sf) __B, __C, - (__v4sf) - _mm_setzero_ps (), - (__mmask8) -1, __R); + return (__m512i) __builtin_ia32_cvtps2qq512_mask ((__v8sf) __A, + (__v8di) __W, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128 +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_reduce_ss (__m128 __W, __mmask8 __U, __m128 __A, - __m128 __B, int __C) +_mm512_maskz_cvtps_epi64 (__mmask8 __U, __m256 __A) { - return (__m128) __builtin_ia32_reducess_mask ((__v4sf) __A, - (__v4sf) __B, __C, - (__v4sf) __W, - (__mmask8) __U); + return (__m512i) __builtin_ia32_cvtps2qq512_mask ((__v8sf) 
__A, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128 +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_reduce_round_ss (__m128 __W, __mmask8 __U, __m128 __A, - __m128 __B, int __C, const int __R) +_mm512_cvtps_epu64 (__m256 __A) { - return (__m128) __builtin_ia32_reducess_mask_round ((__v4sf) __A, - (__v4sf) __B, __C, - (__v4sf) __W, - __U, __R); + return (__m512i) __builtin_ia32_cvtps2uqq512_mask ((__v8sf) __A, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128 +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_maskz_reduce_ss (__mmask8 __U, __m128 __A, __m128 __B, int __C) +_mm512_mask_cvtps_epu64 (__m512i __W, __mmask8 __U, __m256 __A) { - return (__m128) __builtin_ia32_reducess_mask ((__v4sf) __A, - (__v4sf) __B, __C, - (__v4sf) _mm_setzero_ps (), - (__mmask8) __U); + return (__m512i) __builtin_ia32_cvtps2uqq512_mask ((__v8sf) __A, + (__v8di) __W, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128 +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_maskz_reduce_round_ss (__mmask8 __U, __m128 __A, __m128 __B, - int __C, const int __R) +_mm512_maskz_cvtps_epu64 (__mmask8 __U, __m256 __A) { - return (__m128) __builtin_ia32_reducess_mask_round ((__v4sf) __A, - (__v4sf) __B, __C, - (__v4sf) - _mm_setzero_ps (), - __U, __R); + return (__m512i) __builtin_ia32_cvtps2uqq512_mask ((__v8sf) __A, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128d +extern __inline __m256 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_range_sd (__m128d __A, __m128d __B, int __C) +_mm512_cvtepi64_ps (__m512i __A) { - return (__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df) __A, - (__v2df) __B, __C, - (__v2df) - _mm_setzero_pd (), + return (__m256) __builtin_ia32_cvtqq2ps512_mask ((__v8di) __A, + (__v8sf) + _mm256_setzero_ps (), (__mmask8) -1, _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128d +extern __inline __m256 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_range_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B, int __C) +_mm512_mask_cvtepi64_ps (__m256 __W, __mmask8 __U, __m512i __A) { - return (__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df) __A, - (__v2df) __B, __C, - (__v2df) __W, + return (__m256) __builtin_ia32_cvtqq2ps512_mask ((__v8di) __A, + (__v8sf) __W, (__mmask8) __U, _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128d +extern __inline __m256 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_maskz_range_sd (__mmask8 __U, __m128d __A, __m128d __B, int __C) +_mm512_maskz_cvtepi64_ps (__mmask8 __U, __m512i __A) { - return (__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df) __A, - (__v2df) __B, __C, - (__v2df) - _mm_setzero_pd (), + return (__m256) __builtin_ia32_cvtqq2ps512_mask ((__v8di) __A, + (__v8sf) + _mm256_setzero_ps (), (__mmask8) __U, _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128 +extern __inline __m256 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_range_ss (__m128 __A, __m128 __B, int __C) +_mm512_cvtepu64_ps (__m512i __A) { - return (__m128) __builtin_ia32_rangess128_mask_round ((__v4sf) __A, - (__v4sf) __B, __C, - (__v4sf) - _mm_setzero_ps (), - (__mmask8) -1, - _MM_FROUND_CUR_DIRECTION); + return 
(__m256) __builtin_ia32_cvtuqq2ps512_mask ((__v8di) __A, + (__v8sf) + _mm256_setzero_ps (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128 +extern __inline __m256 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_range_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B, int __C) +_mm512_mask_cvtepu64_ps (__m256 __W, __mmask8 __U, __m512i __A) { - return (__m128) __builtin_ia32_rangess128_mask_round ((__v4sf) __A, - (__v4sf) __B, __C, - (__v4sf) __W, - (__mmask8) __U, - _MM_FROUND_CUR_DIRECTION); + return (__m256) __builtin_ia32_cvtuqq2ps512_mask ((__v8di) __A, + (__v8sf) __W, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); } +extern __inline __m256 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtepu64_ps (__mmask8 __U, __m512i __A) +{ + return (__m256) __builtin_ia32_cvtuqq2ps512_mask ((__v8di) __A, + (__v8sf) + _mm256_setzero_ps (), + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} -extern __inline __m128 +extern __inline __m512d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_maskz_range_ss (__mmask8 __U, __m128 __A, __m128 __B, int __C) +_mm512_cvtepi64_pd (__m512i __A) { - return (__m128) __builtin_ia32_rangess128_mask_round ((__v4sf) __A, - (__v4sf) __B, __C, - (__v4sf) - _mm_setzero_ps (), - (__mmask8) __U, - _MM_FROUND_CUR_DIRECTION); + return (__m512d) __builtin_ia32_cvtqq2pd512_mask ((__v8di) __A, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128d +extern __inline __m512d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_range_round_sd (__m128d __A, __m128d __B, int __C, const int __R) +_mm512_mask_cvtepi64_pd (__m512d __W, __mmask8 __U, __m512i __A) { - return (__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df) __A, - (__v2df) __B, __C, - (__v2df) - _mm_setzero_pd (), - (__mmask8) -1, __R); + return (__m512d) __builtin_ia32_cvtqq2pd512_mask ((__v8di) __A, + (__v8df) __W, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128d +extern __inline __m512d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_range_round_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B, - int __C, const int __R) +_mm512_maskz_cvtepi64_pd (__mmask8 __U, __m512i __A) +{ + return (__m512d) __builtin_ia32_cvtqq2pd512_mask ((__v8di) __A, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtepu64_pd (__m512i __A) { - return (__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df) __A, - (__v2df) __B, __C, - (__v2df) __W, - (__mmask8) __U, __R); + return (__m512d) __builtin_ia32_cvtuqq2pd512_mask ((__v8di) __A, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128d +extern __inline __m512d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_maskz_range_round_sd (__mmask8 __U, __m128d __A, __m128d __B, int __C, - const int __R) +_mm512_mask_cvtepu64_pd (__m512d __W, __mmask8 __U, __m512i __A) { - return (__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df) __A, - (__v2df) __B, __C, - (__v2df) - _mm_setzero_pd (), - (__mmask8) __U, __R); + return (__m512d) __builtin_ia32_cvtuqq2pd512_mask ((__v8di) __A, + (__v8df) __W, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128 +extern __inline __m512d __attribute__ ((__gnu_inline__, 
__always_inline__, __artificial__)) -_mm_range_round_ss (__m128 __A, __m128 __B, int __C, const int __R) +_mm512_maskz_cvtepu64_pd (__mmask8 __U, __m512i __A) { - return (__m128) __builtin_ia32_rangess128_mask_round ((__v4sf) __A, - (__v4sf) __B, __C, - (__v4sf) - _mm_setzero_ps (), - (__mmask8) -1, __R); + return (__m512d) __builtin_ia32_cvtuqq2pd512_mask ((__v8di) __A, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128 +#ifdef __OPTIMIZE__ +extern __inline __m512d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_range_round_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B, - int __C, const int __R) +_mm512_range_pd (__m512d __A, __m512d __B, int __C) { - return (__m128) __builtin_ia32_rangess128_mask_round ((__v4sf) __A, - (__v4sf) __B, __C, - (__v4sf) __W, - (__mmask8) __U, __R); + return (__m512d) __builtin_ia32_rangepd512_mask ((__v8df) __A, + (__v8df) __B, __C, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128 +extern __inline __m512d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_maskz_range_round_ss (__mmask8 __U, __m128 __A, __m128 __B, int __C, - const int __R) +_mm512_mask_range_pd (__m512d __W, __mmask8 __U, + __m512d __A, __m512d __B, int __C) { - return (__m128) __builtin_ia32_rangess128_mask_round ((__v4sf) __A, - (__v4sf) __B, __C, - (__v4sf) - _mm_setzero_ps (), - (__mmask8) __U, __R); + return (__m512d) __builtin_ia32_rangepd512_mask ((__v8df) __A, + (__v8df) __B, __C, + (__v8df) __W, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __mmask8 +extern __inline __m512d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_fpclass_ss_mask (__m128 __A, const int __imm) +_mm512_maskz_range_pd (__mmask8 __U, __m512d __A, __m512d __B, int __C) { - return (__mmask8) __builtin_ia32_fpclassss_mask ((__v4sf) __A, __imm, - (__mmask8) -1); + return (__m512d) __builtin_ia32_rangepd512_mask ((__v8df) __A, + (__v8df) __B, __C, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __mmask8 +extern __inline __m512 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_fpclass_sd_mask (__m128d __A, const int __imm) +_mm512_range_ps (__m512 __A, __m512 __B, int __C) { - return (__mmask8) __builtin_ia32_fpclasssd_mask ((__v2df) __A, __imm, - (__mmask8) -1); + return (__m512) __builtin_ia32_rangeps512_mask ((__v16sf) __A, + (__v16sf) __B, __C, + (__v16sf) + _mm512_setzero_ps (), + (__mmask16) -1, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __mmask8 +extern __inline __m512 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_fpclass_ss_mask (__mmask8 __U, __m128 __A, const int __imm) +_mm512_mask_range_ps (__m512 __W, __mmask16 __U, + __m512 __A, __m512 __B, int __C) { - return (__mmask8) __builtin_ia32_fpclassss_mask ((__v4sf) __A, __imm, __U); + return (__m512) __builtin_ia32_rangeps512_mask ((__v16sf) __A, + (__v16sf) __B, __C, + (__v16sf) __W, + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __mmask8 +extern __inline __m512 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_fpclass_sd_mask (__mmask8 __U, __m128d __A, const int __imm) +_mm512_maskz_range_ps (__mmask16 __U, __m512 __A, __m512 __B, int __C) { - return (__mmask8) __builtin_ia32_fpclasssd_mask ((__v2df) __A, __imm, __U); + return (__m512) __builtin_ia32_rangeps512_mask ((__v16sf) __A, + 
(__v16sf) __B, __C, + (__v16sf) + _mm512_setzero_ps (), + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); } extern __inline __m512i @@ -2395,72 +2546,6 @@ _mm512_fpclass_ps_mask (__m512 __A, const int __imm) } #else -#define _kshiftli_mask8(X, Y) \ - ((__mmask8) __builtin_ia32_kshiftliqi ((__mmask8)(X), (__mmask8)(Y))) - -#define _kshiftri_mask8(X, Y) \ - ((__mmask8) __builtin_ia32_kshiftriqi ((__mmask8)(X), (__mmask8)(Y))) - -#define _mm_range_sd(A, B, C) \ - ((__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df)(__m128d)(A), \ - (__v2df)(__m128d)(B), (int)(C), (__v2df) _mm_setzero_pd (), \ - (__mmask8) -1, _MM_FROUND_CUR_DIRECTION)) - -#define _mm_mask_range_sd(W, U, A, B, C) \ - ((__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df)(__m128d)(A), \ - (__v2df)(__m128d)(B), (int)(C), (__v2df)(__m128d)(W), \ - (__mmask8)(U), _MM_FROUND_CUR_DIRECTION)) - -#define _mm_maskz_range_sd(U, A, B, C) \ - ((__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df)(__m128d)(A), \ - (__v2df)(__m128d)(B), (int)(C), (__v2df) _mm_setzero_pd (), \ - (__mmask8)(U), _MM_FROUND_CUR_DIRECTION)) - -#define _mm_range_ss(A, B, C) \ - ((__m128) __builtin_ia32_rangess128_mask_round ((__v4sf)(__m128)(A), \ - (__v4sf)(__m128)(B), (int)(C), (__v4sf) _mm_setzero_ps (), \ - (__mmask8) -1, _MM_FROUND_CUR_DIRECTION)) - -#define _mm_mask_range_ss(W, U, A, B, C) \ - ((__m128) __builtin_ia32_rangess128_mask_round ((__v4sf)(__m128)(A), \ - (__v4sf)(__m128)(B), (int)(C), (__v4sf)(__m128)(W), \ - (__mmask8)(U), _MM_FROUND_CUR_DIRECTION)) - -#define _mm_maskz_range_ss(U, A, B, C) \ - ((__m128) __builtin_ia32_rangess128_mask_round ((__v4sf)(__m128)(A), \ - (__v4sf)(__m128)(B), (int)(C), (__v4sf) _mm_setzero_ps (), \ - (__mmask8)(U), _MM_FROUND_CUR_DIRECTION)) - -#define _mm_range_round_sd(A, B, C, R) \ - ((__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df)(__m128d)(A), \ - (__v2df)(__m128d)(B), (int)(C), (__v2df) _mm_setzero_pd (), \ - (__mmask8) -1, (R))) - -#define _mm_mask_range_round_sd(W, U, A, B, C, R) \ - ((__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df)(__m128d)(A), \ - (__v2df)(__m128d)(B), (int)(C), (__v2df)(__m128d)(W), \ - (__mmask8)(U), (R))) - -#define _mm_maskz_range_round_sd(U, A, B, C, R) \ - ((__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df)(__m128d)(A), \ - (__v2df)(__m128d)(B), (int)(C), (__v2df) _mm_setzero_pd (), \ - (__mmask8)(U), (R))) - -#define _mm_range_round_ss(A, B, C, R) \ - ((__m128) __builtin_ia32_rangess128_mask_round ((__v4sf)(__m128)(A), \ - (__v4sf)(__m128)(B), (int)(C), (__v4sf) _mm_setzero_ps (), \ - (__mmask8) -1, (R))) - -#define _mm_mask_range_round_ss(W, U, A, B, C, R) \ - ((__m128) __builtin_ia32_rangess128_mask_round ((__v4sf)(__m128)(A), \ - (__v4sf)(__m128)(B), (int)(C), (__v4sf)(__m128)(W), \ - (__mmask8)(U), (R))) - -#define _mm_maskz_range_round_ss(U, A, B, C, R) \ - ((__m128) __builtin_ia32_rangess128_mask_round ((__v4sf)(__m128)(A), \ - (__v4sf)(__m128)(B), (int)(C), (__v4sf) _mm_setzero_ps (), \ - (__mmask8)(U), (R))) - #define _mm512_cvtt_roundpd_epi64(A, B) \ ((__m512i)__builtin_ia32_cvttpd2qq512_mask ((A), (__v8di) \ _mm512_setzero_si512 (), \ @@ -2792,22 +2877,6 @@ _mm512_fpclass_ps_mask (__m512 __A, const int __imm) (__v16si)(__m512i)_mm512_setzero_si512 (),\ (__mmask16)(U))) -#define _mm_fpclass_ss_mask(X, C) \ - ((__mmask8) __builtin_ia32_fpclassss_mask ((__v4sf) (__m128) (X), \ - (int) (C), (__mmask8) (-1))) \ - -#define _mm_fpclass_sd_mask(X, C) \ - ((__mmask8) __builtin_ia32_fpclasssd_mask ((__v2df) (__m128d) (X), \ - (int) (C), (__mmask8) (-1))) 
\ - -#define _mm_mask_fpclass_ss_mask(X, C, U) \ - ((__mmask8) __builtin_ia32_fpclassss_mask ((__v4sf) (__m128) (X), \ - (int) (C), (__mmask8) (U))) - -#define _mm_mask_fpclass_sd_mask(X, C, U) \ - ((__mmask8) __builtin_ia32_fpclasssd_mask ((__v2df) (__m128d) (X), \ - (int) (C), (__mmask8) (U))) - #define _mm512_mask_fpclass_pd_mask(u, X, C) \ ((__mmask8) __builtin_ia32_fpclasspd512_mask ((__v8df) (__m512d) (X), \ (int) (C), (__mmask8)(u))) @@ -2824,68 +2893,11 @@ _mm512_fpclass_ps_mask (__m512 __A, const int __imm) ((__mmask16) __builtin_ia32_fpclassps512_mask ((__v16sf) (__m512) (x),\ (int) (c),(__mmask16)-1)) -#define _mm_reduce_sd(A, B, C) \ - ((__m128d) __builtin_ia32_reducesd_mask ((__v2df)(__m128d)(A), \ - (__v2df)(__m128d)(B), (int)(C), (__v2df) _mm_setzero_pd (), \ - (__mmask8)-1)) - -#define _mm_mask_reduce_sd(W, U, A, B, C) \ - ((__m128d) __builtin_ia32_reducesd_mask ((__v2df)(__m128d)(A), \ - (__v2df)(__m128d)(B), (int)(C), (__v2df)(__m128d)(W), (__mmask8)(U))) - -#define _mm_maskz_reduce_sd(U, A, B, C) \ - ((__m128d) __builtin_ia32_reducesd_mask ((__v2df)(__m128d)(A), \ - (__v2df)(__m128d)(B), (int)(C), (__v2df) _mm_setzero_pd (), \ - (__mmask8)(U))) - -#define _mm_reduce_round_sd(A, B, C, R) \ - ((__m128d) __builtin_ia32_reducesd_round ((__v2df)(__m128d)(A), \ - (__v2df)(__m128d)(B), (int)(C), (__mmask8)(U), (int)(R))) - -#define _mm_mask_reduce_round_sd(W, U, A, B, C, R) \ - ((__m128d) __builtin_ia32_reducesd_mask_round ((__v2df)(__m128d)(A), \ - (__v2df)(__m128d)(B), (int)(C), (__v2df)(__m128d)(W), \ - (__mmask8)(U), (int)(R))) - -#define _mm_maskz_reduce_round_sd(U, A, B, C, R) \ - ((__m128d) __builtin_ia32_reducesd_mask_round ((__v2df)(__m128d)(A), \ - (__v2df)(__m128d)(B), (int)(C), (__v2df) _mm_setzero_pd (), \ - (__mmask8)(U), (int)(R))) - -#define _mm_reduce_ss(A, B, C) \ - ((__m128) __builtin_ia32_reducess_mask ((__v4sf)(__m128)(A), \ - (__v4sf)(__m128)(B), (int)(C), (__v4sf) _mm_setzero_ps (), \ - (__mmask8)-1)) - -#define _mm_mask_reduce_ss(W, U, A, B, C) \ - ((__m128) __builtin_ia32_reducess_mask ((__v4sf)(__m128)(A), \ - (__v4sf)(__m128)(B), (int)(C), (__v4sf)(__m128)(W), (__mmask8)(U))) - -#define _mm_maskz_reduce_ss(U, A, B, C) \ - ((__m128) __builtin_ia32_reducess_mask ((__v4sf)(__m128)(A), \ - (__v4sf)(__m128)(B), (int)(C), (__v4sf) _mm_setzero_ps (), \ - (__mmask8)(U))) - -#define _mm_reduce_round_ss(A, B, C, R) \ - ((__m128) __builtin_ia32_reducess_round ((__v4sf)(__m128)(A), \ - (__v4sf)(__m128)(B), (int)(C), (__mmask8)(U), (int)(R))) - -#define _mm_mask_reduce_round_ss(W, U, A, B, C, R) \ - ((__m128) __builtin_ia32_reducess_mask_round ((__v4sf)(__m128)(A), \ - (__v4sf)(__m128)(B), (int)(C), (__v4sf)(__m128)(W), \ - (__mmask8)(U), (int)(R))) - -#define _mm_maskz_reduce_round_ss(U, A, B, C, R) \ - ((__m128) __builtin_ia32_reducesd_mask_round ((__v4sf)(__m128)(A), \ - (__v4sf)(__m128)(B), (int)(C), (__v4sf) _mm_setzero_ps (), \ - (__mmask8)(U), (int)(R))) - - #endif -#ifdef __DISABLE_AVX512DQ__ -#undef __DISABLE_AVX512DQ__ +#ifdef __DISABLE_AVX512DQ_512__ +#undef __DISABLE_AVX512DQ_512__ #pragma GCC pop_options -#endif /* __DISABLE_AVX512DQ__ */ +#endif /* __DISABLE_AVX512DQ_512__ */ #endif /* _AVX512DQINTRIN_H_INCLUDED */
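To make the rework above concrete, here is a minimal user-side sketch (illustration only, not part of the patch): after the split, 512-bit AVX512DQ intrinsics such as _mm512_cvttpd_epi64 sit behind the combined "avx512dq,evex512" target, while the scalar and mask helpers moved out of this block, such as _kshiftli_mask8 and _mm_reduce_sd, need only plain "avx512dq". The target attribute strings assume the evex512 attribute support introduced at the start of this series.

/* Hypothetical user code compiled against the patched headers.  */
#include <immintrin.h>

__attribute__ ((target ("avx512dq,evex512")))
__m512i
truncate_all (__m512d v)
{
  /* 512-bit operation: requires the EVEX512 section of the header.  */
  return _mm512_cvttpd_epi64 (v);
}

__attribute__ ((target ("avx512dq")))
__mmask8
nudge_mask (__mmask8 m)
{
  /* Mask helper: stays available without EVEX512.  */
  return _kshiftli_mask8 (m, 2);
}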
From patchwork Thu Sep 21 07:20:00 2023 X-Patchwork-Submitter: "Hu, Lin1" X-Patchwork-Id: 1837523 From: "Hu, Lin1" <lin1.hu@intel.com> To: gcc-patches@gcc.gnu.org Cc: hongtao.liu@intel.com, ubizjak@gmail.com, haochen.jiang@intel.com Subject: [PATCH 04/18] [PATCH 3/5] Push evex512 target for 512 bit intrins Date: Thu, 21 Sep 2023 15:19:59 +0800 Message-Id: <20230921072013.2124750-5-lin1.hu@intel.com> In-Reply-To: <20230921072013.2124750-1-lin1.hu@intel.com> References: <20230921072013.2124750-1-lin1.hu@intel.com> From: Haochen Jiang <haochen.jiang@intel.com> gcc/ChangeLog: * config/i386/avx512bwintrin.h: Add evex512 target for 512 bit intrins.
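The effect of this reshuffle, sketched here as a hypothetical example rather than anything taken from the patch itself: the 32-bit mask operations stay in the plain avx512bw section of avx512bwintrin.h, while the 64-bit mask operations and the 64-byte vector types move behind the new avx512bw+evex512 guard, so code like the following splits cleanly across the two target levels.

/* Hypothetical user code, for illustration only.  */
#include <immintrin.h>

__attribute__ ((target ("avx512bw")))
__mmask32
merge32 (__mmask32 a, __mmask32 b)
{
  /* Remains in the plain avx512bw section after this patch.  */
  return _kor_mask32 (a, b);
}

__attribute__ ((target ("avx512bw,evex512")))
__mmask64
merge64 (__mmask64 a, __mmask64 b)
{
  /* Now declared in the avx512bw,evex512 section.  */
  return _kor_mask64 (a, b);
}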
--- gcc/config/i386/avx512bwintrin.h | 291 ++++++++++++++++--------------- 1 file changed, 153 insertions(+), 138 deletions(-) diff --git a/gcc/config/i386/avx512bwintrin.h b/gcc/config/i386/avx512bwintrin.h index d1cd549ce18..925bae1457c 100644 --- a/gcc/config/i386/avx512bwintrin.h +++ b/gcc/config/i386/avx512bwintrin.h @@ -34,16 +34,6 @@ #define __DISABLE_AVX512BW__ #endif /* __AVX512BW__ */ -/* Internal data types for implementing the intrinsics. */ -typedef short __v32hi __attribute__ ((__vector_size__ (64))); -typedef short __v32hi_u __attribute__ ((__vector_size__ (64), \ - __may_alias__, __aligned__ (1))); -typedef char __v64qi __attribute__ ((__vector_size__ (64))); -typedef char __v64qi_u __attribute__ ((__vector_size__ (64), \ - __may_alias__, __aligned__ (1))); - -typedef unsigned long long __mmask64; - extern __inline unsigned char __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _ktest_mask32_u8 (__mmask32 __A, __mmask32 __B, unsigned char *__CF) @@ -54,229 +44,292 @@ _ktest_mask32_u8 (__mmask32 __A, __mmask32 __B, unsigned char *__CF) extern __inline unsigned char __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_ktest_mask64_u8 (__mmask64 __A, __mmask64 __B, unsigned char *__CF) +_ktestz_mask32_u8 (__mmask32 __A, __mmask32 __B) { - *__CF = (unsigned char) __builtin_ia32_ktestcdi (__A, __B); - return (unsigned char) __builtin_ia32_ktestzdi (__A, __B); + return (unsigned char) __builtin_ia32_ktestzsi (__A, __B); } extern __inline unsigned char __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_ktestz_mask32_u8 (__mmask32 __A, __mmask32 __B) +_ktestc_mask32_u8 (__mmask32 __A, __mmask32 __B) { - return (unsigned char) __builtin_ia32_ktestzsi (__A, __B); + return (unsigned char) __builtin_ia32_ktestcsi (__A, __B); } extern __inline unsigned char __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_ktestz_mask64_u8 (__mmask64 __A, __mmask64 __B) +_kortest_mask32_u8 (__mmask32 __A, __mmask32 __B, unsigned char *__CF) { - return (unsigned char) __builtin_ia32_ktestzdi (__A, __B); + *__CF = (unsigned char) __builtin_ia32_kortestcsi (__A, __B); + return (unsigned char) __builtin_ia32_kortestzsi (__A, __B); } extern __inline unsigned char __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_ktestc_mask32_u8 (__mmask32 __A, __mmask32 __B) +_kortestz_mask32_u8 (__mmask32 __A, __mmask32 __B) { - return (unsigned char) __builtin_ia32_ktestcsi (__A, __B); + return (unsigned char) __builtin_ia32_kortestzsi (__A, __B); } extern __inline unsigned char __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_ktestc_mask64_u8 (__mmask64 __A, __mmask64 __B) +_kortestc_mask32_u8 (__mmask32 __A, __mmask32 __B) { - return (unsigned char) __builtin_ia32_ktestcdi (__A, __B); + return (unsigned char) __builtin_ia32_kortestcsi (__A, __B); } -extern __inline unsigned char +extern __inline __mmask32 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_kortest_mask32_u8 (__mmask32 __A, __mmask32 __B, unsigned char *__CF) +_kadd_mask32
(__mmask32 __A, __mmask32 __B) { - *__CF = (unsigned char) __builtin_ia32_kortestcsi (__A, __B); - return (unsigned char) __builtin_ia32_kortestzsi (__A, __B); + return (__mmask32) __builtin_ia32_kaddsi ((__mmask32) __A, (__mmask32) __B); } -extern __inline unsigned char +extern __inline unsigned int __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_kortest_mask64_u8 (__mmask64 __A, __mmask64 __B, unsigned char *__CF) +_cvtmask32_u32 (__mmask32 __A) { - *__CF = (unsigned char) __builtin_ia32_kortestcdi (__A, __B); - return (unsigned char) __builtin_ia32_kortestzdi (__A, __B); + return (unsigned int) __builtin_ia32_kmovd ((__mmask32) __A); } -extern __inline unsigned char +extern __inline __mmask32 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_kortestz_mask32_u8 (__mmask32 __A, __mmask32 __B) +_cvtu32_mask32 (unsigned int __A) { - return (unsigned char) __builtin_ia32_kortestzsi (__A, __B); + return (__mmask32) __builtin_ia32_kmovd ((__mmask32) __A); } -extern __inline unsigned char +extern __inline __mmask32 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_kortestz_mask64_u8 (__mmask64 __A, __mmask64 __B) +_load_mask32 (__mmask32 *__A) { - return (unsigned char) __builtin_ia32_kortestzdi (__A, __B); + return (__mmask32) __builtin_ia32_kmovd (*__A); } -extern __inline unsigned char +extern __inline void __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_kortestc_mask32_u8 (__mmask32 __A, __mmask32 __B) +_store_mask32 (__mmask32 *__A, __mmask32 __B) { - return (unsigned char) __builtin_ia32_kortestcsi (__A, __B); + *(__mmask32 *) __A = __builtin_ia32_kmovd (__B); } -extern __inline unsigned char +extern __inline __mmask32 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_kortestc_mask64_u8 (__mmask64 __A, __mmask64 __B) +_knot_mask32 (__mmask32 __A) { - return (unsigned char) __builtin_ia32_kortestcdi (__A, __B); + return (__mmask32) __builtin_ia32_knotsi ((__mmask32) __A); } extern __inline __mmask32 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_kadd_mask32 (__mmask32 __A, __mmask32 __B) +_kor_mask32 (__mmask32 __A, __mmask32 __B) { - return (__mmask32) __builtin_ia32_kaddsi ((__mmask32) __A, (__mmask32) __B); + return (__mmask32) __builtin_ia32_korsi ((__mmask32) __A, (__mmask32) __B); } -extern __inline __mmask64 +extern __inline __mmask32 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_kadd_mask64 (__mmask64 __A, __mmask64 __B) +_kxnor_mask32 (__mmask32 __A, __mmask32 __B) { - return (__mmask64) __builtin_ia32_kadddi ((__mmask64) __A, (__mmask64) __B); + return (__mmask32) __builtin_ia32_kxnorsi ((__mmask32) __A, (__mmask32) __B); } -extern __inline unsigned int +extern __inline __mmask32 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_cvtmask32_u32 (__mmask32 __A) +_kxor_mask32 (__mmask32 __A, __mmask32 __B) { - return (unsigned int) __builtin_ia32_kmovd ((__mmask32) __A); + return (__mmask32) __builtin_ia32_kxorsi ((__mmask32) __A, (__mmask32) __B); } -extern __inline unsigned long long +extern __inline __mmask32 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_cvtmask64_u64 (__mmask64 __A) +_kand_mask32 (__mmask32 __A, __mmask32 __B) { - return (unsigned long long) __builtin_ia32_kmovq ((__mmask64) __A); + return (__mmask32) __builtin_ia32_kandsi ((__mmask32) __A, (__mmask32) __B); } extern __inline __mmask32 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_cvtu32_mask32 
(unsigned int __A) +_kandn_mask32 (__mmask32 __A, __mmask32 __B) { - return (__mmask32) __builtin_ia32_kmovd ((__mmask32) __A); + return (__mmask32) __builtin_ia32_kandnsi ((__mmask32) __A, (__mmask32) __B); } -extern __inline __mmask64 +extern __inline __mmask32 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_cvtu64_mask64 (unsigned long long __A) +_mm512_kunpackw (__mmask32 __A, __mmask32 __B) { - return (__mmask64) __builtin_ia32_kmovq ((__mmask64) __A); + return (__mmask32) __builtin_ia32_kunpcksi ((__mmask32) __A, + (__mmask32) __B); } extern __inline __mmask32 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_load_mask32 (__mmask32 *__A) +_kunpackw_mask32 (__mmask16 __A, __mmask16 __B) { - return (__mmask32) __builtin_ia32_kmovd (*__A); + return (__mmask32) __builtin_ia32_kunpcksi ((__mmask32) __A, + (__mmask32) __B); } -extern __inline __mmask64 +#ifdef __OPTIMIZE__ +extern __inline __mmask32 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_load_mask64 (__mmask64 *__A) +_kshiftli_mask32 (__mmask32 __A, unsigned int __B) { - return (__mmask64) __builtin_ia32_kmovq (*(__mmask64 *) __A); + return (__mmask32) __builtin_ia32_kshiftlisi ((__mmask32) __A, + (__mmask8) __B); } -extern __inline void +extern __inline __mmask32 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_store_mask32 (__mmask32 *__A, __mmask32 __B) +_kshiftri_mask32 (__mmask32 __A, unsigned int __B) { - *(__mmask32 *) __A = __builtin_ia32_kmovd (__B); + return (__mmask32) __builtin_ia32_kshiftrisi ((__mmask32) __A, + (__mmask8) __B); } -extern __inline void +#else +#define _kshiftli_mask32(X, Y) \ + ((__mmask32) __builtin_ia32_kshiftlisi ((__mmask32)(X), (__mmask8)(Y))) + +#define _kshiftri_mask32(X, Y) \ + ((__mmask32) __builtin_ia32_kshiftrisi ((__mmask32)(X), (__mmask8)(Y))) + +#endif + +#ifdef __DISABLE_AVX512BW__ +#undef __DISABLE_AVX512BW__ +#pragma GCC pop_options +#endif /* __DISABLE_AVX512BW__ */ + +#if !defined (__AVX512BW__) || !defined (__EVEX512__) +#pragma GCC push_options +#pragma GCC target("avx512bw,evex512") +#define __DISABLE_AVX512BW_512__ +#endif /* __AVX512BW_512__ */ + +/* Internal data types for implementing the intrinsics.
*/ +typedef short __v32hi __attribute__ ((__vector_size__ (64))); +typedef short __v32hi_u __attribute__ ((__vector_size__ (64), \ + __may_alias__, __aligned__ (1))); +typedef char __v64qi __attribute__ ((__vector_size__ (64))); +typedef char __v64qi_u __attribute__ ((__vector_size__ (64), \ + __may_alias__, __aligned__ (1))); + +typedef unsigned long long __mmask64; + +extern __inline unsigned char __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_store_mask64 (__mmask64 *__A, __mmask64 __B) +_ktest_mask64_u8 (__mmask64 __A, __mmask64 __B, unsigned char *__CF) { - *(__mmask64 *) __A = __builtin_ia32_kmovq (__B); + *__CF = (unsigned char) __builtin_ia32_ktestcdi (__A, __B); + return (unsigned char) __builtin_ia32_ktestzdi (__A, __B); } -extern __inline __mmask32 +extern __inline unsigned char __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_knot_mask32 (__mmask32 __A) +_ktestz_mask64_u8 (__mmask64 __A, __mmask64 __B) { - return (__mmask32) __builtin_ia32_knotsi ((__mmask32) __A); + return (unsigned char) __builtin_ia32_ktestzdi (__A, __B); } -extern __inline __mmask64 +extern __inline unsigned char __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_knot_mask64 (__mmask64 __A) +_ktestc_mask64_u8 (__mmask64 __A, __mmask64 __B) { - return (__mmask64) __builtin_ia32_knotdi ((__mmask64) __A); + return (unsigned char) __builtin_ia32_ktestcdi (__A, __B); } -extern __inline __mmask32 +extern __inline unsigned char __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_kor_mask32 (__mmask32 __A, __mmask32 __B) +_kortest_mask64_u8 (__mmask64 __A, __mmask64 __B, unsigned char *__CF) { - return (__mmask32) __builtin_ia32_korsi ((__mmask32) __A, (__mmask32) __B); + *__CF = (unsigned char) __builtin_ia32_kortestcdi (__A, __B); + return (unsigned char) __builtin_ia32_kortestzdi (__A, __B); +} + +extern __inline unsigned char +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_kortestz_mask64_u8 (__mmask64 __A, __mmask64 __B) +{ + return (unsigned char) __builtin_ia32_kortestzdi (__A, __B); +} + +extern __inline unsigned char +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_kortestc_mask64_u8 (__mmask64 __A, __mmask64 __B) +{ + return (unsigned char) __builtin_ia32_kortestcdi (__A, __B); } extern __inline __mmask64 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_kor_mask64 (__mmask64 __A, __mmask64 __B) +_kadd_mask64 (__mmask64 __A, __mmask64 __B) { - return (__mmask64) __builtin_ia32_kordi ((__mmask64) __A, (__mmask64) __B); + return (__mmask64) __builtin_ia32_kadddi ((__mmask64) __A, (__mmask64) __B); } -extern __inline __mmask32 +extern __inline unsigned long long __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_kxnor_mask32 (__mmask32 __A, __mmask32 __B) +_cvtmask64_u64 (__mmask64 __A) { - return (__mmask32) __builtin_ia32_kxnorsi ((__mmask32) __A, (__mmask32) __B); + return (unsigned long long) __builtin_ia32_kmovq ((__mmask64) __A); } extern __inline __mmask64 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_kxnor_mask64 (__mmask64 __A, __mmask64 __B) +_cvtu64_mask64 (unsigned long long __A) { - return (__mmask64) __builtin_ia32_kxnordi ((__mmask64) __A, (__mmask64) __B); + return (__mmask64) __builtin_ia32_kmovq ((__mmask64) __A); } -extern __inline __mmask32 +extern __inline __mmask64 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_kxor_mask32 (__mmask32 __A, __mmask32 __B) +_load_mask64 (__mmask64 
*__A) { - return (__mmask32) __builtin_ia32_kxorsi ((__mmask32) __A, (__mmask32) __B); + return (__mmask64) __builtin_ia32_kmovq (*(__mmask64 *) __A); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_store_mask64 (__mmask64 *__A, __mmask64 __B) +{ + *(__mmask64 *) __A = __builtin_ia32_kmovq (__B); } extern __inline __mmask64 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_kxor_mask64 (__mmask64 __A, __mmask64 __B) +_knot_mask64 (__mmask64 __A) { - return (__mmask64) __builtin_ia32_kxordi ((__mmask64) __A, (__mmask64) __B); + return (__mmask64) __builtin_ia32_knotdi ((__mmask64) __A); } -extern __inline __mmask32 +extern __inline __mmask64 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_kand_mask32 (__mmask32 __A, __mmask32 __B) +_kor_mask64 (__mmask64 __A, __mmask64 __B) { - return (__mmask32) __builtin_ia32_kandsi ((__mmask32) __A, (__mmask32) __B); + return (__mmask64) __builtin_ia32_kordi ((__mmask64) __A, (__mmask64) __B); } extern __inline __mmask64 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_kand_mask64 (__mmask64 __A, __mmask64 __B) +_kxnor_mask64 (__mmask64 __A, __mmask64 __B) { - return (__mmask64) __builtin_ia32_kanddi ((__mmask64) __A, (__mmask64) __B); + return (__mmask64) __builtin_ia32_kxnordi ((__mmask64) __A, (__mmask64) __B); } -extern __inline __mmask32 +extern __inline __mmask64 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_kandn_mask32 (__mmask32 __A, __mmask32 __B) +_kxor_mask64 (__mmask64 __A, __mmask64 __B) { - return (__mmask32) __builtin_ia32_kandnsi ((__mmask32) __A, (__mmask32) __B); + return (__mmask64) __builtin_ia32_kxordi ((__mmask64) __A, (__mmask64) __B); +} + +extern __inline __mmask64 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_kand_mask64 (__mmask64 __A, __mmask64 __B) +{ + return (__mmask64) __builtin_ia32_kanddi ((__mmask64) __A, (__mmask64) __B); } extern __inline __mmask64 @@ -366,22 +419,6 @@ _mm512_maskz_mov_epi8 (__mmask64 __U, __m512i __A) (__mmask64) __U); } -extern __inline __mmask32 -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_kunpackw (__mmask32 __A, __mmask32 __B) -{ - return (__mmask32) __builtin_ia32_kunpcksi ((__mmask32) __A, - (__mmask32) __B); -} - -extern __inline __mmask32 -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_kunpackw_mask32 (__mmask16 __A, __mmask16 __B) -{ - return (__mmask32) __builtin_ia32_kunpcksi ((__mmask32) __A, - (__mmask32) __B); -} - extern __inline __mmask64 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm512_kunpackd (__mmask64 __A, __mmask64 __B) @@ -2776,14 +2813,6 @@ _mm512_mask_packus_epi32 (__m512i __W, __mmask32 __M, __m512i __A, } #ifdef __OPTIMIZE__ -extern __inline __mmask32 -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_kshiftli_mask32 (__mmask32 __A, unsigned int __B) -{ - return (__mmask32) __builtin_ia32_kshiftlisi ((__mmask32) __A, - (__mmask8) __B); -} - extern __inline __mmask64 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _kshiftli_mask64 (__mmask64 __A, unsigned int __B) @@ -2792,14 +2821,6 @@ _kshiftli_mask64 (__mmask64 __A, unsigned int __B) (__mmask8) __B); } -extern __inline __mmask32 -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_kshiftri_mask32 (__mmask32 __A, unsigned int __B) -{ - return (__mmask32) __builtin_ia32_kshiftrisi ((__mmask32) __A, - (__mmask8) __B); -} - extern 
__inline __mmask64 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _kshiftri_mask64 (__mmask64 __A, unsigned int __B) @@ -3145,15 +3166,9 @@ _mm512_bsrli_epi128 (__m512i __A, const int __N) } #else -#define _kshiftli_mask32(X, Y) \ - ((__mmask32) __builtin_ia32_kshiftlisi ((__mmask32)(X), (__mmask8)(Y))) - #define _kshiftli_mask64(X, Y) \ ((__mmask64) __builtin_ia32_kshiftlidi ((__mmask64)(X), (__mmask8)(Y))) -#define _kshiftri_mask32(X, Y) \ - ((__mmask32) __builtin_ia32_kshiftrisi ((__mmask32)(X), (__mmask8)(Y))) - #define _kshiftri_mask64(X, Y) \ ((__mmask64) __builtin_ia32_kshiftridi ((__mmask64)(X), (__mmask8)(Y))) @@ -3328,9 +3343,9 @@ _mm512_bsrli_epi128 (__m512i __A, const int __N) #endif -#ifdef __DISABLE_AVX512BW__ -#undef __DISABLE_AVX512BW__ +#ifdef __DISABLE_AVX512BW_512__ +#undef __DISABLE_AVX512BW_512__ #pragma GCC pop_options -#endif /* __DISABLE_AVX512BW__ */ +#endif /* __DISABLE_AVX512BW_512__ */ #endif /* _AVX512BWINTRIN_H_INCLUDED */
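One detail in the hunks above deserves a note, again as an illustrative sketch rather than part of the change: the _kshift*_mask* entry points are inline functions only under __OPTIMIZE__ and fall back to macros otherwise, so the shift count must be a compile-time constant in either mode. Swapping the two 32-bit halves of a 64-bit mask, for instance, works identically at -O0 and -O2:

/* Hypothetical user code, for illustration only.  */
#include <immintrin.h>

__attribute__ ((target ("avx512bw,evex512")))
__mmask64
swap_mask_halves (__mmask64 m)
{
  /* Constant shift counts keep both the inline-function and the
     macro expansion of _kshiftli/_kshiftri well-formed.  */
  return _kor_mask64 (_kshiftli_mask64 (m, 32),
                      _kshiftri_mask64 (m, 32));
}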
From patchwork Thu Sep 21 07:20:00 2023 X-Patchwork-Submitter: "Hu, Lin1" X-Patchwork-Id: 1837520 From: "Hu, Lin1" <lin1.hu@intel.com> To: gcc-patches@gcc.gnu.org Cc: hongtao.liu@intel.com, ubizjak@gmail.com, haochen.jiang@intel.com Subject: [PATCH 05/18] [PATCH 4/5] Push evex512 target for 512 bit intrins Date: Thu, 21 Sep 2023 15:20:00 +0800 Message-Id: <20230921072013.2124750-6-lin1.hu@intel.com> In-Reply-To: <20230921072013.2124750-1-lin1.hu@intel.com> References: <20230921072013.2124750-1-lin1.hu@intel.com> From: Haochen Jiang <haochen.jiang@intel.com> gcc/ChangeLog: * config.gcc: Add avx512bitalgvlintrin.h. * config/i386/avx5124fmapsintrin.h: Add evex512 target for 512 bit intrins. * config/i386/avx5124vnniwintrin.h: Ditto. * config/i386/avx512bf16intrin.h: Ditto. * config/i386/avx512bitalgintrin.h: Add evex512 target for 512 bit intrins. Split 128/256 bit intrins to avx512bitalgvlintrin.h. * config/i386/avx512erintrin.h: Ditto. * config/i386/avx512ifmaintrin.h: Ditto. * config/i386/avx512pfintrin.h: Ditto. * config/i386/avx512vbmi2intrin.h: Ditto. * config/i386/avx512vbmiintrin.h: Ditto. * config/i386/avx512vnniintrin.h: Ditto. * config/i386/avx512vp2intersectintrin.h: Ditto. * config/i386/avx512vpopcntdqintrin.h: Ditto. * config/i386/gfniintrin.h: Ditto. * config/i386/immintrin.h: Add avx512bitalgvlintrin.h. * config/i386/vaesintrin.h: Add evex512 target for 512 bit intrins. * config/i386/vpclmulqdqintrin.h: Ditto. * config/i386/avx512bitalgvlintrin.h: New.
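The guard idiom this ChangeLog applies is the same in every header it lists. Condensed into a sketch around a hypothetical avx512foointrin.h (no such file exists; the real names are in the list above), the pattern reads:

/* Close the plain-ISA region the header opened earlier.  */
#ifdef __DISABLE_AVX512FOO__
#undef __DISABLE_AVX512FOO__
#pragma GCC pop_options
#endif /* __DISABLE_AVX512FOO__ */

/* Reopen with evex512 added before any 512-bit definitions.  */
#if !defined (__AVX512FOO__) || !defined (__EVEX512__)
#pragma GCC push_options
#pragma GCC target("avx512foo,evex512")
#define __DISABLE_AVX512FOO_512__
#endif

/* ... 64-byte vector types and 512-bit intrinsics go here ... */

#ifdef __DISABLE_AVX512FOO_512__
#undef __DISABLE_AVX512FOO_512__
#pragma GCC pop_options
#endif /* __DISABLE_AVX512FOO_512__ */

The 128/256-bit forms, where a header has them, move instead to a separate *vlintrin.h gated on the VL variant, as avx512bitalgvlintrin.h does here.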
--- gcc/config.gcc | 19 +-- gcc/config/i386/avx5124fmapsintrin.h | 2 +- gcc/config/i386/avx5124vnniwintrin.h | 2 +- gcc/config/i386/avx512bf16intrin.h | 31 ++-- gcc/config/i386/avx512bitalgintrin.h | 155 +----------------- gcc/config/i386/avx512bitalgvlintrin.h | 180 +++++++++++++++++++++ gcc/config/i386/avx512erintrin.h | 2 +- gcc/config/i386/avx512ifmaintrin.h | 4 +- gcc/config/i386/avx512pfintrin.h | 2 +- gcc/config/i386/avx512vbmi2intrin.h | 4 +- gcc/config/i386/avx512vbmiintrin.h | 4 +- gcc/config/i386/avx512vnniintrin.h | 4 +- gcc/config/i386/avx512vp2intersectintrin.h | 4 +- gcc/config/i386/avx512vpopcntdqintrin.h | 4 +- gcc/config/i386/gfniintrin.h | 76 +++++---- gcc/config/i386/immintrin.h | 2 + gcc/config/i386/vaesintrin.h | 4 +- gcc/config/i386/vpclmulqdqintrin.h | 4 +- 18 files changed, 282 insertions(+), 221 deletions(-) create mode 100644 gcc/config/i386/avx512bitalgvlintrin.h diff --git a/gcc/config.gcc b/gcc/config.gcc index ce5def08e2e..e47e6893e1d 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -425,15 +425,16 @@ i[34567]86-*-* | x86_64-*-*) avx512vbmi2vlintrin.h avx512vnniintrin.h avx512vnnivlintrin.h vaesintrin.h vpclmulqdqintrin.h avx512vpopcntdqvlintrin.h avx512bitalgintrin.h - pconfigintrin.h wbnoinvdintrin.h movdirintrin.h - waitpkgintrin.h cldemoteintrin.h avx512bf16vlintrin.h - avx512bf16intrin.h enqcmdintrin.h serializeintrin.h - avx512vp2intersectintrin.h avx512vp2intersectvlintrin.h - tsxldtrkintrin.h amxtileintrin.h amxint8intrin.h - amxbf16intrin.h x86gprintrin.h uintrintrin.h - hresetintrin.h keylockerintrin.h avxvnniintrin.h - mwaitintrin.h avx512fp16intrin.h avx512fp16vlintrin.h - avxifmaintrin.h avxvnniint8intrin.h avxneconvertintrin.h + avx512bitalgvlintrin.h pconfigintrin.h wbnoinvdintrin.h + movdirintrin.h waitpkgintrin.h cldemoteintrin.h + avx512bf16vlintrin.h avx512bf16intrin.h enqcmdintrin.h + serializeintrin.h avx512vp2intersectintrin.h + avx512vp2intersectvlintrin.h tsxldtrkintrin.h + amxtileintrin.h amxint8intrin.h amxbf16intrin.h + x86gprintrin.h uintrintrin.h hresetintrin.h + keylockerintrin.h avxvnniintrin.h mwaitintrin.h + avx512fp16intrin.h avx512fp16vlintrin.h avxifmaintrin.h + avxvnniint8intrin.h avxneconvertintrin.h cmpccxaddintrin.h amxfp16intrin.h prfchiintrin.h raointintrin.h amxcomplexintrin.h avxvnniint16intrin.h sm3intrin.h sha512intrin.h sm4intrin.h" diff --git a/gcc/config/i386/avx5124fmapsintrin.h b/gcc/config/i386/avx5124fmapsintrin.h index 97dd77c9235..4c884a5c203 100644 --- a/gcc/config/i386/avx5124fmapsintrin.h +++ b/gcc/config/i386/avx5124fmapsintrin.h @@ -30,7 +30,7 @@ #ifndef __AVX5124FMAPS__ #pragma GCC push_options -#pragma GCC target("avx5124fmaps") +#pragma GCC target("avx5124fmaps,evex512") #define __DISABLE_AVX5124FMAPS__ #endif /* __AVX5124FMAPS__ */ diff --git a/gcc/config/i386/avx5124vnniwintrin.h b/gcc/config/i386/avx5124vnniwintrin.h index fd129589798..795e4814f28 100644 --- a/gcc/config/i386/avx5124vnniwintrin.h +++ b/gcc/config/i386/avx5124vnniwintrin.h @@ -30,7 +30,7 @@ #ifndef __AVX5124VNNIW__ #pragma GCC push_options -#pragma GCC target("avx5124vnniw") +#pragma GCC target("avx5124vnniw,evex512") #define __DISABLE_AVX5124VNNIW__ #endif /* __AVX5124VNNIW__ */ diff --git a/gcc/config/i386/avx512bf16intrin.h b/gcc/config/i386/avx512bf16intrin.h index 107f4a448f6..94ccbf6389f 100644 --- a/gcc/config/i386/avx512bf16intrin.h +++ b/gcc/config/i386/avx512bf16intrin.h @@ -34,13 +34,6 @@ #define __DISABLE_AVX512BF16__ #endif /* __AVX512BF16__ */ -/* Internal data types for implementing the intrinsics. 
*/ -typedef __bf16 __v32bf __attribute__ ((__vector_size__ (64))); - -/* The Intel API is flexible enough that we must allow aliasing with other - vector types, and their scalar components. */ -typedef __bf16 __m512bh __attribute__ ((__vector_size__ (64), __may_alias__)); - /* Convert One BF16 Data to One Single Float Data. */ extern __inline float __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) @@ -49,6 +42,24 @@ _mm_cvtsbh_ss (__bf16 __A) return __builtin_ia32_cvtbf2sf (__A); } +#ifdef __DISABLE_AVX512BF16__ +#undef __DISABLE_AVX512BF16__ +#pragma GCC pop_options +#endif /* __DISABLE_AVX512BF16__ */ + +#if !defined (__AVX512BF16__) || !defined (__EVEX512__) +#pragma GCC push_options +#pragma GCC target("avx512bf16,evex512") +#define __DISABLE_AVX512BF16_512__ +#endif /* __AVX512BF16_512__ */ + +/* Internal data types for implementing the intrinsics. */ +typedef __bf16 __v32bf __attribute__ ((__vector_size__ (64))); + +/* The Intel API is flexible enough that we must allow aliasing with other + vector types, and their scalar components. */ +typedef __bf16 __m512bh __attribute__ ((__vector_size__ (64), __may_alias__)); + /* vcvtne2ps2bf16 */ extern __inline __m512bh @@ -144,9 +155,9 @@ _mm512_mask_cvtpbh_ps (__m512 __S, __mmask16 __U, __m256bh __A) (__m512i)_mm512_cvtepi16_epi32 ((__m256i)__A), 16))); } -#ifdef __DISABLE_AVX512BF16__ -#undef __DISABLE_AVX512BF16__ +#ifdef __DISABLE_AVX512BF16_512__ +#undef __DISABLE_AVX512BF16_512__ #pragma GCC pop_options -#endif /* __DISABLE_AVX512BF16__ */ +#endif /* __DISABLE_AVX512BF16_512__ */ #endif /* _AVX512BF16INTRIN_H_INCLUDED */ diff --git a/gcc/config/i386/avx512bitalgintrin.h b/gcc/config/i386/avx512bitalgintrin.h index a1c7be109a9..af8514f5838 100644 --- a/gcc/config/i386/avx512bitalgintrin.h +++ b/gcc/config/i386/avx512bitalgintrin.h @@ -22,15 +22,15 @@ . */ #if !defined _IMMINTRIN_H_INCLUDED -# error "Never use directly; include instead." +# error "Never use directly; include instead." 
#endif #ifndef _AVX512BITALGINTRIN_H_INCLUDED #define _AVX512BITALGINTRIN_H_INCLUDED -#ifndef __AVX512BITALG__ +#if !defined (__AVX512BITALG__) || !defined (__EVEX512__) #pragma GCC push_options -#pragma GCC target("avx512bitalg") +#pragma GCC target("avx512bitalg,evex512") #define __DISABLE_AVX512BITALG__ #endif /* __AVX512BITALG__ */ @@ -108,153 +108,4 @@ _mm512_mask_bitshuffle_epi64_mask (__mmask64 __M, __m512i __A, __m512i __B) #pragma GCC pop_options #endif /* __DISABLE_AVX512BITALG__ */ -#if !defined(__AVX512BITALG__) || !defined(__AVX512VL__) -#pragma GCC push_options -#pragma GCC target("avx512bitalg,avx512vl") -#define __DISABLE_AVX512BITALGVL__ -#endif /* __AVX512BITALGVL__ */ - -extern __inline __m256i -__attribute__((__gnu_inline__, __always_inline__, __artificial__)) -_mm256_mask_popcnt_epi8 (__m256i __W, __mmask32 __U, __m256i __A) -{ - return (__m256i) __builtin_ia32_vpopcountb_v32qi_mask ((__v32qi) __A, - (__v32qi) __W, - (__mmask32) __U); -} - -extern __inline __m256i -__attribute__((__gnu_inline__, __always_inline__, __artificial__)) -_mm256_maskz_popcnt_epi8 (__mmask32 __U, __m256i __A) -{ - return (__m256i) __builtin_ia32_vpopcountb_v32qi_mask ((__v32qi) __A, - (__v32qi) - _mm256_setzero_si256 (), - (__mmask32) __U); -} - -extern __inline __mmask32 -__attribute__((__gnu_inline__, __always_inline__, __artificial__)) -_mm256_bitshuffle_epi64_mask (__m256i __A, __m256i __B) -{ - return (__mmask32) __builtin_ia32_vpshufbitqmb256_mask ((__v32qi) __A, - (__v32qi) __B, - (__mmask32) -1); -} - -extern __inline __mmask32 -__attribute__((__gnu_inline__, __always_inline__, __artificial__)) -_mm256_mask_bitshuffle_epi64_mask (__mmask32 __M, __m256i __A, __m256i __B) -{ - return (__mmask32) __builtin_ia32_vpshufbitqmb256_mask ((__v32qi) __A, - (__v32qi) __B, - (__mmask32) __M); -} - -extern __inline __mmask16 -__attribute__((__gnu_inline__, __always_inline__, __artificial__)) -_mm_bitshuffle_epi64_mask (__m128i __A, __m128i __B) -{ - return (__mmask16) __builtin_ia32_vpshufbitqmb128_mask ((__v16qi) __A, - (__v16qi) __B, - (__mmask16) -1); -} - -extern __inline __mmask16 -__attribute__((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_bitshuffle_epi64_mask (__mmask16 __M, __m128i __A, __m128i __B) -{ - return (__mmask16) __builtin_ia32_vpshufbitqmb128_mask ((__v16qi) __A, - (__v16qi) __B, - (__mmask16) __M); -} - -extern __inline __m256i -__attribute__((__gnu_inline__, __always_inline__, __artificial__)) -_mm256_popcnt_epi8 (__m256i __A) -{ - return (__m256i) __builtin_ia32_vpopcountb_v32qi ((__v32qi) __A); -} - -extern __inline __m256i -__attribute__((__gnu_inline__, __always_inline__, __artificial__)) -_mm256_popcnt_epi16 (__m256i __A) -{ - return (__m256i) __builtin_ia32_vpopcountw_v16hi ((__v16hi) __A); -} - -extern __inline __m128i -__attribute__((__gnu_inline__, __always_inline__, __artificial__)) -_mm_popcnt_epi8 (__m128i __A) -{ - return (__m128i) __builtin_ia32_vpopcountb_v16qi ((__v16qi) __A); -} - -extern __inline __m128i -__attribute__((__gnu_inline__, __always_inline__, __artificial__)) -_mm_popcnt_epi16 (__m128i __A) -{ - return (__m128i) __builtin_ia32_vpopcountw_v8hi ((__v8hi) __A); -} - -extern __inline __m256i -__attribute__((__gnu_inline__, __always_inline__, __artificial__)) -_mm256_mask_popcnt_epi16 (__m256i __W, __mmask16 __U, __m256i __A) -{ - return (__m256i) __builtin_ia32_vpopcountw_v16hi_mask ((__v16hi) __A, - (__v16hi) __W, - (__mmask16) __U); -} - -extern __inline __m256i -__attribute__((__gnu_inline__, __always_inline__, __artificial__)) 
-_mm256_maskz_popcnt_epi16 (__mmask16 __U, __m256i __A) -{ - return (__m256i) __builtin_ia32_vpopcountw_v16hi_mask ((__v16hi) __A, - (__v16hi) - _mm256_setzero_si256 (), - (__mmask16) __U); -} - -extern __inline __m128i -__attribute__((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_popcnt_epi8 (__m128i __W, __mmask16 __U, __m128i __A) -{ - return (__m128i) __builtin_ia32_vpopcountb_v16qi_mask ((__v16qi) __A, - (__v16qi) __W, - (__mmask16) __U); -} - -extern __inline __m128i -__attribute__((__gnu_inline__, __always_inline__, __artificial__)) -_mm_maskz_popcnt_epi8 (__mmask16 __U, __m128i __A) -{ - return (__m128i) __builtin_ia32_vpopcountb_v16qi_mask ((__v16qi) __A, - (__v16qi) - _mm_setzero_si128 (), - (__mmask16) __U); -} -extern __inline __m128i -__attribute__((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_popcnt_epi16 (__m128i __W, __mmask8 __U, __m128i __A) -{ - return (__m128i) __builtin_ia32_vpopcountw_v8hi_mask ((__v8hi) __A, - (__v8hi) __W, - (__mmask8) __U); -} - -extern __inline __m128i -__attribute__((__gnu_inline__, __always_inline__, __artificial__)) -_mm_maskz_popcnt_epi16 (__mmask8 __U, __m128i __A) -{ - return (__m128i) __builtin_ia32_vpopcountw_v8hi_mask ((__v8hi) __A, - (__v8hi) - _mm_setzero_si128 (), - (__mmask8) __U); -} -#ifdef __DISABLE_AVX512BITALGVL__ -#undef __DISABLE_AVX512BITALGVL__ -#pragma GCC pop_options -#endif /* __DISABLE_AVX512BITALGVL__ */ - #endif /* _AVX512BITALGINTRIN_H_INCLUDED */ diff --git a/gcc/config/i386/avx512bitalgvlintrin.h b/gcc/config/i386/avx512bitalgvlintrin.h new file mode 100644 index 00000000000..36d697dea8a --- /dev/null +++ b/gcc/config/i386/avx512bitalgvlintrin.h @@ -0,0 +1,180 @@ +/* Copyright (C) 2023 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + +#if !defined _IMMINTRIN_H_INCLUDED +# error "Never use directly; include instead." 
+#endif + +#ifndef _AVX512BITALGVLINTRIN_H_INCLUDED +#define _AVX512BITALGVLINTRIN_H_INCLUDED + +#if !defined(__AVX512BITALG__) || !defined(__AVX512VL__) +#pragma GCC push_options +#pragma GCC target("avx512bitalg,avx512vl") +#define __DISABLE_AVX512BITALGVL__ +#endif /* __AVX512BITALGVL__ */ + +extern __inline __m256i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_popcnt_epi8 (__m256i __W, __mmask32 __U, __m256i __A) +{ + return (__m256i) __builtin_ia32_vpopcountb_v32qi_mask ((__v32qi) __A, + (__v32qi) __W, + (__mmask32) __U); +} + +extern __inline __m256i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_popcnt_epi8 (__mmask32 __U, __m256i __A) +{ + return (__m256i) __builtin_ia32_vpopcountb_v32qi_mask ((__v32qi) __A, + (__v32qi) + _mm256_setzero_si256 (), + (__mmask32) __U); +} + +extern __inline __mmask32 +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_bitshuffle_epi64_mask (__m256i __A, __m256i __B) +{ + return (__mmask32) __builtin_ia32_vpshufbitqmb256_mask ((__v32qi) __A, + (__v32qi) __B, + (__mmask32) -1); +} + +extern __inline __mmask32 +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_bitshuffle_epi64_mask (__mmask32 __M, __m256i __A, __m256i __B) +{ + return (__mmask32) __builtin_ia32_vpshufbitqmb256_mask ((__v32qi) __A, + (__v32qi) __B, + (__mmask32) __M); +} + +extern __inline __mmask16 +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_bitshuffle_epi64_mask (__m128i __A, __m128i __B) +{ + return (__mmask16) __builtin_ia32_vpshufbitqmb128_mask ((__v16qi) __A, + (__v16qi) __B, + (__mmask16) -1); +} + +extern __inline __mmask16 +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_bitshuffle_epi64_mask (__mmask16 __M, __m128i __A, __m128i __B) +{ + return (__mmask16) __builtin_ia32_vpshufbitqmb128_mask ((__v16qi) __A, + (__v16qi) __B, + (__mmask16) __M); +} + +extern __inline __m256i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_popcnt_epi8 (__m256i __A) +{ + return (__m256i) __builtin_ia32_vpopcountb_v32qi ((__v32qi) __A); +} + +extern __inline __m256i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_popcnt_epi16 (__m256i __A) +{ + return (__m256i) __builtin_ia32_vpopcountw_v16hi ((__v16hi) __A); +} + +extern __inline __m128i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_popcnt_epi8 (__m128i __A) +{ + return (__m128i) __builtin_ia32_vpopcountb_v16qi ((__v16qi) __A); +} + +extern __inline __m128i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_popcnt_epi16 (__m128i __A) +{ + return (__m128i) __builtin_ia32_vpopcountw_v8hi ((__v8hi) __A); +} + +extern __inline __m256i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_popcnt_epi16 (__m256i __W, __mmask16 __U, __m256i __A) +{ + return (__m256i) __builtin_ia32_vpopcountw_v16hi_mask ((__v16hi) __A, + (__v16hi) __W, + (__mmask16) __U); +} + +extern __inline __m256i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_popcnt_epi16 (__mmask16 __U, __m256i __A) +{ + return (__m256i) __builtin_ia32_vpopcountw_v16hi_mask ((__v16hi) __A, + (__v16hi) + _mm256_setzero_si256 (), + (__mmask16) __U); +} + +extern __inline __m128i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_popcnt_epi8 (__m128i __W, __mmask16 __U, __m128i __A) +{ + return (__m128i) 
__builtin_ia32_vpopcountb_v16qi_mask ((__v16qi) __A, + (__v16qi) __W, + (__mmask16) __U); +} + +extern __inline __m128i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_popcnt_epi8 (__mmask16 __U, __m128i __A) +{ + return (__m128i) __builtin_ia32_vpopcountb_v16qi_mask ((__v16qi) __A, + (__v16qi) + _mm_setzero_si128 (), + (__mmask16) __U); +} +extern __inline __m128i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_popcnt_epi16 (__m128i __W, __mmask8 __U, __m128i __A) +{ + return (__m128i) __builtin_ia32_vpopcountw_v8hi_mask ((__v8hi) __A, + (__v8hi) __W, + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_popcnt_epi16 (__mmask8 __U, __m128i __A) +{ + return (__m128i) __builtin_ia32_vpopcountw_v8hi_mask ((__v8hi) __A, + (__v8hi) + _mm_setzero_si128 (), + (__mmask8) __U); +} +#ifdef __DISABLE_AVX512BITALGVL__ +#undef __DISABLE_AVX512BITALGVL__ +#pragma GCC pop_options +#endif /* __DISABLE_AVX512BITALGVL__ */ + +#endif /* _AVX512BITALGVLINTRIN_H_INCLUDED */ diff --git a/gcc/config/i386/avx512erintrin.h b/gcc/config/i386/avx512erintrin.h index bd83b7fbbc6..5c7be9c47ac 100644 --- a/gcc/config/i386/avx512erintrin.h +++ b/gcc/config/i386/avx512erintrin.h @@ -30,7 +30,7 @@ #ifndef __AVX512ER__ #pragma GCC push_options -#pragma GCC target("avx512er") +#pragma GCC target("avx512er,evex512") #define __DISABLE_AVX512ER__ #endif /* __AVX512ER__ */ diff --git a/gcc/config/i386/avx512ifmaintrin.h b/gcc/config/i386/avx512ifmaintrin.h index fc97f1defe8..e08078b2725 100644 --- a/gcc/config/i386/avx512ifmaintrin.h +++ b/gcc/config/i386/avx512ifmaintrin.h @@ -28,9 +28,9 @@ #ifndef _AVX512IFMAINTRIN_H_INCLUDED #define _AVX512IFMAINTRIN_H_INCLUDED -#ifndef __AVX512IFMA__ +#if !defined (__AVX512IFMA__) || !defined (__EVEX512__) #pragma GCC push_options -#pragma GCC target("avx512ifma") +#pragma GCC target("avx512ifma,evex512") #define __DISABLE_AVX512IFMA__ #endif /* __AVX512IFMA__ */ diff --git a/gcc/config/i386/avx512pfintrin.h b/gcc/config/i386/avx512pfintrin.h index a547610660a..58af26ff02e 100644 --- a/gcc/config/i386/avx512pfintrin.h +++ b/gcc/config/i386/avx512pfintrin.h @@ -30,7 +30,7 @@ #ifndef __AVX512PF__ #pragma GCC push_options -#pragma GCC target("avx512pf") +#pragma GCC target("avx512pf,evex512") #define __DISABLE_AVX512PF__ #endif /* __AVX512PF__ */ diff --git a/gcc/config/i386/avx512vbmi2intrin.h b/gcc/config/i386/avx512vbmi2intrin.h index ca00f8a5f14..b7ff07b2d11 100644 --- a/gcc/config/i386/avx512vbmi2intrin.h +++ b/gcc/config/i386/avx512vbmi2intrin.h @@ -28,9 +28,9 @@ #ifndef __AVX512VBMI2INTRIN_H_INCLUDED #define __AVX512VBMI2INTRIN_H_INCLUDED -#if !defined(__AVX512VBMI2__) +#if !defined(__AVX512VBMI2__) || !defined (__EVEX512__) #pragma GCC push_options -#pragma GCC target("avx512vbmi2") +#pragma GCC target("avx512vbmi2,evex512") #define __DISABLE_AVX512VBMI2__ #endif /* __AVX512VBMI2__ */ diff --git a/gcc/config/i386/avx512vbmiintrin.h b/gcc/config/i386/avx512vbmiintrin.h index 502586090ae..1a7ab4edca3 100644 --- a/gcc/config/i386/avx512vbmiintrin.h +++ b/gcc/config/i386/avx512vbmiintrin.h @@ -28,9 +28,9 @@ #ifndef _AVX512VBMIINTRIN_H_INCLUDED #define _AVX512VBMIINTRIN_H_INCLUDED -#ifndef __AVX512VBMI__ +#if !defined (__AVX512VBMI__) || !defined (__EVEX512__) #pragma GCC push_options -#pragma GCC target("avx512vbmi") +#pragma GCC target("avx512vbmi,evex512") #define __DISABLE_AVX512VBMI__ #endif /* __AVX512VBMI__ */ diff --git 
a/gcc/config/i386/avx512vnniintrin.h b/gcc/config/i386/avx512vnniintrin.h index e36e2e57f21..1090703ec48 100644 --- a/gcc/config/i386/avx512vnniintrin.h +++ b/gcc/config/i386/avx512vnniintrin.h @@ -28,9 +28,9 @@ #ifndef __AVX512VNNIINTRIN_H_INCLUDED #define __AVX512VNNIINTRIN_H_INCLUDED -#if !defined(__AVX512VNNI__) +#if !defined(__AVX512VNNI__) || !defined (__EVEX512__) #pragma GCC push_options -#pragma GCC target("avx512vnni") +#pragma GCC target("avx512vnni,evex512") #define __DISABLE_AVX512VNNI__ #endif /* __AVX512VNNI__ */ diff --git a/gcc/config/i386/avx512vp2intersectintrin.h b/gcc/config/i386/avx512vp2intersectintrin.h index 65e2fb1abf5..bf68245155d 100644 --- a/gcc/config/i386/avx512vp2intersectintrin.h +++ b/gcc/config/i386/avx512vp2intersectintrin.h @@ -28,9 +28,9 @@ #ifndef _AVX512VP2INTERSECTINTRIN_H_INCLUDED #define _AVX512VP2INTERSECTINTRIN_H_INCLUDED -#if !defined(__AVX512VP2INTERSECT__) +#if !defined(__AVX512VP2INTERSECT__) || !defined (__EVEX512__) #pragma GCC push_options -#pragma GCC target("avx512vp2intersect") +#pragma GCC target("avx512vp2intersect,evex512") #define __DISABLE_AVX512VP2INTERSECT__ #endif /* __AVX512VP2INTERSECT__ */ diff --git a/gcc/config/i386/avx512vpopcntdqintrin.h b/gcc/config/i386/avx512vpopcntdqintrin.h index 47897fbd8d7..9470a403f8e 100644 --- a/gcc/config/i386/avx512vpopcntdqintrin.h +++ b/gcc/config/i386/avx512vpopcntdqintrin.h @@ -28,9 +28,9 @@ #ifndef _AVX512VPOPCNTDQINTRIN_H_INCLUDED #define _AVX512VPOPCNTDQINTRIN_H_INCLUDED -#ifndef __AVX512VPOPCNTDQ__ +#if !defined (__AVX512VPOPCNTDQ__) || !defined (__EVEX512__) #pragma GCC push_options -#pragma GCC target("avx512vpopcntdq") +#pragma GCC target("avx512vpopcntdq,evex512") #define __DISABLE_AVX512VPOPCNTDQ__ #endif /* __AVX512VPOPCNTDQ__ */ diff --git a/gcc/config/i386/gfniintrin.h b/gcc/config/i386/gfniintrin.h index ef3dc225b40..907e7a0cf7a 100644 --- a/gcc/config/i386/gfniintrin.h +++ b/gcc/config/i386/gfniintrin.h @@ -297,9 +297,53 @@ _mm256_maskz_gf2p8affine_epi64_epi8 (__mmask32 __A, __m256i __B, #pragma GCC pop_options #endif /* __GFNIAVX512VLBW__ */ -#if !defined(__GFNI__) || !defined(__AVX512F__) || !defined(__AVX512BW__) +#if !defined(__GFNI__) || !defined(__EVEX512__) || !defined(__AVX512F__) #pragma GCC push_options -#pragma GCC target("gfni,avx512f,avx512bw") +#pragma GCC target("gfni,avx512f,evex512") +#define __DISABLE_GFNIAVX512F__ +#endif /* __GFNIAVX512F__ */ + +extern __inline __m512i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_gf2p8mul_epi8 (__m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_vgf2p8mulb_v64qi ((__v64qi) __A, + (__v64qi) __B); +} + +#ifdef __OPTIMIZE__ +extern __inline __m512i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_gf2p8affineinv_epi64_epi8 (__m512i __A, __m512i __B, const int __C) +{ + return (__m512i) __builtin_ia32_vgf2p8affineinvqb_v64qi ((__v64qi) __A, + (__v64qi) __B, __C); +} + +extern __inline __m512i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_gf2p8affine_epi64_epi8 (__m512i __A, __m512i __B, const int __C) +{ + return (__m512i) __builtin_ia32_vgf2p8affineqb_v64qi ((__v64qi) __A, + (__v64qi) __B, __C); +} +#else +#define _mm512_gf2p8affineinv_epi64_epi8(A, B, C) \ + ((__m512i) __builtin_ia32_vgf2p8affineinvqb_v64qi ( \ + (__v64qi)(__m512i)(A), (__v64qi)(__m512i)(B), (int)(C))) +#define _mm512_gf2p8affine_epi64_epi8(A, B, C) \ + ((__m512i) __builtin_ia32_vgf2p8affineqb_v64qi ((__v64qi)(__m512i)(A), \ + (__v64qi)(__m512i)(B), (int)(C))) 
+#endif + +#ifdef __DISABLE_GFNIAVX512F__ +#undef __DISABLE_GFNIAVX512F__ +#pragma GCC pop_options +#endif /* __GFNIAVX512F__ */ + +#if !defined(__GFNI__) || !defined(__EVEX512__) || !defined(__AVX512BW__) +#pragma GCC push_options +#pragma GCC target("gfni,avx512bw,evex512") #define __DISABLE_GFNIAVX512FBW__ #endif /* __GFNIAVX512FBW__ */ @@ -319,13 +363,6 @@ _mm512_maskz_gf2p8mul_epi8 (__mmask64 __A, __m512i __B, __m512i __C) return (__m512i) __builtin_ia32_vgf2p8mulb_v64qi_mask ((__v64qi) __B, (__v64qi) __C, (__v64qi) _mm512_setzero_si512 (), __A); } -extern __inline __m512i -__attribute__((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_gf2p8mul_epi8 (__m512i __A, __m512i __B) -{ - return (__m512i) __builtin_ia32_vgf2p8mulb_v64qi ((__v64qi) __A, - (__v64qi) __B); -} #ifdef __OPTIMIZE__ extern __inline __m512i @@ -350,14 +387,6 @@ _mm512_maskz_gf2p8affineinv_epi64_epi8 (__mmask64 __A, __m512i __B, (__v64qi) _mm512_setzero_si512 (), __A); } -extern __inline __m512i -__attribute__((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_gf2p8affineinv_epi64_epi8 (__m512i __A, __m512i __B, const int __C) -{ - return (__m512i) __builtin_ia32_vgf2p8affineinvqb_v64qi ((__v64qi) __A, - (__v64qi) __B, __C); -} - extern __inline __m512i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm512_mask_gf2p8affine_epi64_epi8 (__m512i __A, __mmask64 __B, __m512i __C, @@ -375,13 +404,6 @@ _mm512_maskz_gf2p8affine_epi64_epi8 (__mmask64 __A, __m512i __B, __m512i __C, return (__m512i) __builtin_ia32_vgf2p8affineqb_v64qi_mask ((__v64qi) __B, (__v64qi) __C, __D, (__v64qi) _mm512_setzero_si512 (), __A); } -extern __inline __m512i -__attribute__((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_gf2p8affine_epi64_epi8 (__m512i __A, __m512i __B, const int __C) -{ - return (__m512i) __builtin_ia32_vgf2p8affineqb_v64qi ((__v64qi) __A, - (__v64qi) __B, __C); -} #else #define _mm512_mask_gf2p8affineinv_epi64_epi8(A, B, C, D, E) \ ((__m512i) __builtin_ia32_vgf2p8affineinvqb_v64qi_mask( \ @@ -391,9 +413,6 @@ _mm512_gf2p8affine_epi64_epi8 (__m512i __A, __m512i __B, const int __C) ((__m512i) __builtin_ia32_vgf2p8affineinvqb_v64qi_mask( \ (__v64qi)(__m512i)(B), (__v64qi)(__m512i)(C), (int)(D), \ (__v64qi)(__m512i) _mm512_setzero_si512 (), (__mmask64)(A))) -#define _mm512_gf2p8affineinv_epi64_epi8(A, B, C) \ - ((__m512i) __builtin_ia32_vgf2p8affineinvqb_v64qi ( \ - (__v64qi)(__m512i)(A), (__v64qi)(__m512i)(B), (int)(C))) #define _mm512_mask_gf2p8affine_epi64_epi8(A, B, C, D, E) \ ((__m512i) __builtin_ia32_vgf2p8affineqb_v64qi_mask((__v64qi)(__m512i)(C),\ (__v64qi)(__m512i)(D), (int)(E), (__v64qi)(__m512i)(A), (__mmask64)(B))) @@ -401,9 +420,6 @@ _mm512_gf2p8affine_epi64_epi8 (__m512i __A, __m512i __B, const int __C) ((__m512i) __builtin_ia32_vgf2p8affineqb_v64qi_mask((__v64qi)(__m512i)(B),\ (__v64qi)(__m512i)(C), (int)(D), \ (__v64qi)(__m512i) _mm512_setzero_si512 (), (__mmask64)(A))) -#define _mm512_gf2p8affine_epi64_epi8(A, B, C) \ - ((__m512i) __builtin_ia32_vgf2p8affineqb_v64qi ((__v64qi)(__m512i)(A), \ - (__v64qi)(__m512i)(B), (int)(C))) #endif #ifdef __DISABLE_GFNIAVX512FBW__ diff --git a/gcc/config/i386/immintrin.h b/gcc/config/i386/immintrin.h index 29b4dbbda24..4e17901db15 100644 --- a/gcc/config/i386/immintrin.h +++ b/gcc/config/i386/immintrin.h @@ -96,6 +96,8 @@ #include +#include + #include #include diff --git a/gcc/config/i386/vaesintrin.h b/gcc/config/i386/vaesintrin.h index 58fc19c9eb3..b2bcdbe5bd1 100644 --- a/gcc/config/i386/vaesintrin.h +++ 
b/gcc/config/i386/vaesintrin.h @@ -66,9 +66,9 @@ _mm256_aesenclast_epi128 (__m256i __A, __m256i __B) #endif /* __DISABLE_VAES__ */ -#if !defined(__VAES__) || !defined(__AVX512F__) +#if !defined(__VAES__) || !defined(__AVX512F__) || !defined(__EVEX512__) #pragma GCC push_options -#pragma GCC target("vaes,avx512f") +#pragma GCC target("vaes,avx512f,evex512") #define __DISABLE_VAESF__ #endif /* __VAES__ */ diff --git a/gcc/config/i386/vpclmulqdqintrin.h b/gcc/config/i386/vpclmulqdqintrin.h index 2c83b6037a0..c8c2c19d33f 100644 --- a/gcc/config/i386/vpclmulqdqintrin.h +++ b/gcc/config/i386/vpclmulqdqintrin.h @@ -28,9 +28,9 @@ #ifndef _VPCLMULQDQINTRIN_H_INCLUDED #define _VPCLMULQDQINTRIN_H_INCLUDED -#if !defined(__VPCLMULQDQ__) || !defined(__AVX512F__) +#if !defined(__VPCLMULQDQ__) || !defined(__AVX512F__) || !defined(__EVEX512__) #pragma GCC push_options -#pragma GCC target("vpclmulqdq,avx512f") +#pragma GCC target("vpclmulqdq,avx512f,evex512") #define __DISABLE_VPCLMULQDQF__ #endif /* __VPCLMULQDQF__ */
From patchwork Thu Sep 21 07:20:01 2023
d="scan'208";a="379326774" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Sep 2023 00:22:48 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10839"; a="817262387" X-IronPort-AV: E=Sophos;i="6.03,164,1694761200"; d="scan'208";a="817262387" Received: from shvmail03.sh.intel.com ([10.239.245.20]) by fmsmga004.fm.intel.com with ESMTP; 21 Sep 2023 00:22:15 -0700 Received: from shliclel4217.sh.intel.com (shliclel4217.sh.intel.com [10.239.240.127]) by shvmail03.sh.intel.com (Postfix) with ESMTP id 273A71005134; Thu, 21 Sep 2023 15:22:14 +0800 (CST) From: "Hu, Lin1" To: gcc-patches@gcc.gnu.org Cc: hongtao.liu@intel.com, ubizjak@gmail.com, haochen.jiang@intel.com Subject: [PATCH 06/18] [PATCH 5/5] Push evex512 target for 512 bit intrins Date: Thu, 21 Sep 2023 15:20:01 +0800 Message-Id: <20230921072013.2124750-7-lin1.hu@intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20230921072013.2124750-1-lin1.hu@intel.com> References: <20230921072013.2124750-1-lin1.hu@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-11.6 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org From: Haochen Jiang gcc/Changelog: * config/i386/avx512fp16intrin.h: Add evex512 target for 512 bit intrins. Co-authored-by: Hu, Lin1 --- gcc/config/i386/avx512fp16intrin.h | 8925 ++++++++++++++-------------- 1 file changed, 4476 insertions(+), 4449 deletions(-) diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h index dd083e5ed67..92c0c24e9bd 100644 --- a/gcc/config/i386/avx512fp16intrin.h +++ b/gcc/config/i386/avx512fp16intrin.h @@ -25,8 +25,8 @@ #error "Never use directly; include instead." #endif -#ifndef __AVX512FP16INTRIN_H_INCLUDED -#define __AVX512FP16INTRIN_H_INCLUDED +#ifndef _AVX512FP16INTRIN_H_INCLUDED +#define _AVX512FP16INTRIN_H_INCLUDED #ifndef __AVX512FP16__ #pragma GCC push_options @@ -37,21 +37,17 @@ /* Internal data types for implementing the intrinsics. */ typedef _Float16 __v8hf __attribute__ ((__vector_size__ (16))); typedef _Float16 __v16hf __attribute__ ((__vector_size__ (32))); -typedef _Float16 __v32hf __attribute__ ((__vector_size__ (64))); /* The Intel API is flexible enough that we must allow aliasing with other vector types, and their scalar components. */ typedef _Float16 __m128h __attribute__ ((__vector_size__ (16), __may_alias__)); typedef _Float16 __m256h __attribute__ ((__vector_size__ (32), __may_alias__)); -typedef _Float16 __m512h __attribute__ ((__vector_size__ (64), __may_alias__)); /* Unaligned version of the same type. 
*/ typedef _Float16 __m128h_u __attribute__ ((__vector_size__ (16), \ __may_alias__, __aligned__ (1))); typedef _Float16 __m256h_u __attribute__ ((__vector_size__ (32), \ __may_alias__, __aligned__ (1))); -typedef _Float16 __m512h_u __attribute__ ((__vector_size__ (64), \ - __may_alias__, __aligned__ (1))); extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) @@ -78,33 +74,8 @@ _mm256_set_ph (_Float16 __A15, _Float16 __A14, _Float16 __A13, __A12, __A13, __A14, __A15 }; } -extern __inline __m512h -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_set_ph (_Float16 __A31, _Float16 __A30, _Float16 __A29, - _Float16 __A28, _Float16 __A27, _Float16 __A26, - _Float16 __A25, _Float16 __A24, _Float16 __A23, - _Float16 __A22, _Float16 __A21, _Float16 __A20, - _Float16 __A19, _Float16 __A18, _Float16 __A17, - _Float16 __A16, _Float16 __A15, _Float16 __A14, - _Float16 __A13, _Float16 __A12, _Float16 __A11, - _Float16 __A10, _Float16 __A9, _Float16 __A8, - _Float16 __A7, _Float16 __A6, _Float16 __A5, - _Float16 __A4, _Float16 __A3, _Float16 __A2, - _Float16 __A1, _Float16 __A0) -{ - return __extension__ (__m512h)(__v32hf){ __A0, __A1, __A2, __A3, - __A4, __A5, __A6, __A7, - __A8, __A9, __A10, __A11, - __A12, __A13, __A14, __A15, - __A16, __A17, __A18, __A19, - __A20, __A21, __A22, __A23, - __A24, __A25, __A26, __A27, - __A28, __A29, __A30, __A31 }; -} - -/* Create vectors of elements in the reversed order from _mm_set_ph, - _mm256_set_ph and _mm512_set_ph functions. */ - +/* Create vectors of elements in the reversed order from _mm_set_ph + and _mm256_set_ph functions. */ extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm_setr_ph (_Float16 __A0, _Float16 __A1, _Float16 __A2, @@ -128,30 +99,7 @@ _mm256_setr_ph (_Float16 __A0, _Float16 __A1, _Float16 __A2, __A0); } -extern __inline __m512h -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_setr_ph (_Float16 __A0, _Float16 __A1, _Float16 __A2, - _Float16 __A3, _Float16 __A4, _Float16 __A5, - _Float16 __A6, _Float16 __A7, _Float16 __A8, - _Float16 __A9, _Float16 __A10, _Float16 __A11, - _Float16 __A12, _Float16 __A13, _Float16 __A14, - _Float16 __A15, _Float16 __A16, _Float16 __A17, - _Float16 __A18, _Float16 __A19, _Float16 __A20, - _Float16 __A21, _Float16 __A22, _Float16 __A23, - _Float16 __A24, _Float16 __A25, _Float16 __A26, - _Float16 __A27, _Float16 __A28, _Float16 __A29, - _Float16 __A30, _Float16 __A31) - -{ - return _mm512_set_ph (__A31, __A30, __A29, __A28, __A27, __A26, __A25, - __A24, __A23, __A22, __A21, __A20, __A19, __A18, - __A17, __A16, __A15, __A14, __A13, __A12, __A11, - __A10, __A9, __A8, __A7, __A6, __A5, __A4, __A3, - __A2, __A1, __A0); -} - /* Broadcast _Float16 to vector. */ - extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm_set1_ph (_Float16 __A) @@ -167,18 +115,7 @@ _mm256_set1_ph (_Float16 __A) __A, __A, __A, __A, __A, __A, __A, __A); } -extern __inline __m512h -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_set1_ph (_Float16 __A) -{ - return _mm512_set_ph (__A, __A, __A, __A, __A, __A, __A, __A, - __A, __A, __A, __A, __A, __A, __A, __A, - __A, __A, __A, __A, __A, __A, __A, __A, - __A, __A, __A, __A, __A, __A, __A, __A); -} - /* Create a vector with all zeros. 
*/ - extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm_setzero_ph (void) @@ -193,13 +130,6 @@ _mm256_setzero_ph (void) return _mm256_set1_ph (0.0f16); } -extern __inline __m512h -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_setzero_ph (void) -{ - return _mm512_set1_ph (0.0f16); -} - extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm_undefined_ph (void) @@ -222,3815 +152,4056 @@ _mm256_undefined_ph (void) return __Y; } -extern __inline __m512h -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_undefined_ph (void) -{ -#pragma GCC diagnostic push -#pragma GCC diagnostic ignored "-Winit-self" - __m512h __Y = __Y; -#pragma GCC diagnostic pop - return __Y; -} - extern __inline _Float16 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_cvtsh_h (__m128h __A) +_mm256_cvtsh_h (__m256h __A) { return __A[0]; } -extern __inline _Float16 +extern __inline __m256h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm256_cvtsh_h (__m256h __A) +_mm256_load_ph (void const *__P) { - return __A[0]; + return *(const __m256h *) __P; } -extern __inline _Float16 +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_cvtsh_h (__m512h __A) +_mm_load_ph (void const *__P) { - return __A[0]; + return *(const __m128h *) __P; } -extern __inline __m512 +extern __inline __m256h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_castph_ps (__m512h __a) +_mm256_loadu_ph (void const *__P) { - return (__m512) __a; + return *(const __m256h_u *) __P; } -extern __inline __m512d +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_castph_pd (__m512h __a) +_mm_loadu_ph (void const *__P) { - return (__m512d) __a; + return *(const __m128h_u *) __P; } -extern __inline __m512i +extern __inline void __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_castph_si512 (__m512h __a) +_mm256_store_ph (void *__P, __m256h __A) { - return (__m512i) __a; + *(__m256h *) __P = __A; } -extern __inline __m128h +extern __inline void __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_castph512_ph128 (__m512h __A) +_mm_store_ph (void *__P, __m128h __A) { - union - { - __m128h __a[4]; - __m512h __v; - } __u = { .__v = __A }; - return __u.__a[0]; + *(__m128h *) __P = __A; } -extern __inline __m256h +extern __inline void __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_castph512_ph256 (__m512h __A) +_mm256_storeu_ph (void *__P, __m256h __A) { - union - { - __m256h __a[2]; - __m512h __v; - } __u = { .__v = __A }; - return __u.__a[0]; + *(__m256h_u *) __P = __A; } -extern __inline __m512h +extern __inline void __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_castph128_ph512 (__m128h __A) +_mm_storeu_ph (void *__P, __m128h __A) { - union - { - __m128h __a[4]; - __m512h __v; - } __u; - __u.__a[0] = __A; - return __u.__v; + *(__m128h_u *) __P = __A; } -extern __inline __m512h +/* Create a vector with element 0 as F and the rest zero. 
*/ +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_castph256_ph512 (__m256h __A) +_mm_set_sh (_Float16 __F) { - union - { - __m256h __a[2]; - __m512h __v; - } __u; - __u.__a[0] = __A; - return __u.__v; + return _mm_set_ph (0.0f16, 0.0f16, 0.0f16, 0.0f16, 0.0f16, 0.0f16, 0.0f16, + __F); } -extern __inline __m512h +/* Create a vector with element 0 as *P and the rest zero. */ +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_zextph128_ph512 (__m128h __A) +_mm_load_sh (void const *__P) { - return (__m512h) _mm512_insertf32x4 (_mm512_setzero_ps (), - (__m128) __A, 0); + return _mm_set_ph (0.0f16, 0.0f16, 0.0f16, 0.0f16, 0.0f16, 0.0f16, 0.0f16, + *(_Float16 const *) __P); } -extern __inline __m512h +/* Stores the lower _Float16 value. */ +extern __inline void __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_zextph256_ph512 (__m256h __A) +_mm_store_sh (void *__P, __m128h __A) { - return (__m512h) _mm512_insertf64x4 (_mm512_setzero_pd (), - (__m256d) __A, 0); + *(_Float16 *) __P = ((__v8hf)__A)[0]; } -extern __inline __m512h -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_castps_ph (__m512 __a) +/* Intrinsics of v[add,sub,mul,div]sh. */ +extern __inline __m128h + __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_add_sh (__m128h __A, __m128h __B) { - return (__m512h) __a; + __A[0] += __B[0]; + return __A; } -extern __inline __m512h +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_castpd_ph (__m512d __a) +_mm_mask_add_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D) { - return (__m512h) __a; + return __builtin_ia32_addsh_mask (__C, __D, __A, __B); } -extern __inline __m512h +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_castsi512_ph (__m512i __a) +_mm_maskz_add_sh (__mmask8 __A, __m128h __B, __m128h __C) { - return (__m512h) __a; + return __builtin_ia32_addsh_mask (__B, __C, _mm_setzero_ph (), + __A); } -/* Create a vector with element 0 as F and the rest zero. */ extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_set_sh (_Float16 __F) +_mm_sub_sh (__m128h __A, __m128h __B) { - return _mm_set_ph (0.0f16, 0.0f16, 0.0f16, 0.0f16, 0.0f16, 0.0f16, 0.0f16, - __F); + __A[0] -= __B[0]; + return __A; } -/* Create a vector with element 0 as *P and the rest zero. 
*/ extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_load_sh (void const *__P) +_mm_mask_sub_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D) { - return _mm_set_ph (0.0f16, 0.0f16, 0.0f16, 0.0f16, 0.0f16, 0.0f16, 0.0f16, - *(_Float16 const *) __P); + return __builtin_ia32_subsh_mask (__C, __D, __A, __B); } -extern __inline __m512h +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_load_ph (void const *__P) +_mm_maskz_sub_sh (__mmask8 __A, __m128h __B, __m128h __C) { - return *(const __m512h *) __P; + return __builtin_ia32_subsh_mask (__B, __C, _mm_setzero_ph (), + __A); } -extern __inline __m256h +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm256_load_ph (void const *__P) +_mm_mul_sh (__m128h __A, __m128h __B) { - return *(const __m256h *) __P; + __A[0] *= __B[0]; + return __A; } extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_load_ph (void const *__P) +_mm_mask_mul_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D) { - return *(const __m128h *) __P; + return __builtin_ia32_mulsh_mask (__C, __D, __A, __B); } -extern __inline __m512h +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_loadu_ph (void const *__P) +_mm_maskz_mul_sh (__mmask8 __A, __m128h __B, __m128h __C) { - return *(const __m512h_u *) __P; + return __builtin_ia32_mulsh_mask (__B, __C, _mm_setzero_ph (), __A); } -extern __inline __m256h +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm256_loadu_ph (void const *__P) +_mm_div_sh (__m128h __A, __m128h __B) { - return *(const __m256h_u *) __P; + __A[0] /= __B[0]; + return __A; } extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_loadu_ph (void const *__P) +_mm_mask_div_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D) { - return *(const __m128h_u *) __P; + return __builtin_ia32_divsh_mask (__C, __D, __A, __B); } -/* Stores the lower _Float16 value. 
*/ -extern __inline void +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_store_sh (void *__P, __m128h __A) +_mm_maskz_div_sh (__mmask8 __A, __m128h __B, __m128h __C) { - *(_Float16 *) __P = ((__v8hf)__A)[0]; -} - -extern __inline void -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_store_ph (void *__P, __m512h __A) -{ - *(__m512h *) __P = __A; + return __builtin_ia32_divsh_mask (__B, __C, _mm_setzero_ph (), + __A); } -extern __inline void +#ifdef __OPTIMIZE__ +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm256_store_ph (void *__P, __m256h __A) +_mm_add_round_sh (__m128h __A, __m128h __B, const int __C) { - *(__m256h *) __P = __A; + return __builtin_ia32_addsh_mask_round (__A, __B, + _mm_setzero_ph (), + (__mmask8) -1, __C); } -extern __inline void +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_store_ph (void *__P, __m128h __A) +_mm_mask_add_round_sh (__m128h __A, __mmask8 __B, __m128h __C, + __m128h __D, const int __E) { - *(__m128h *) __P = __A; + return __builtin_ia32_addsh_mask_round (__C, __D, __A, __B, __E); } -extern __inline void +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_storeu_ph (void *__P, __m512h __A) +_mm_maskz_add_round_sh (__mmask8 __A, __m128h __B, __m128h __C, + const int __D) { - *(__m512h_u *) __P = __A; + return __builtin_ia32_addsh_mask_round (__B, __C, + _mm_setzero_ph (), + __A, __D); } -extern __inline void +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm256_storeu_ph (void *__P, __m256h __A) +_mm_sub_round_sh (__m128h __A, __m128h __B, const int __C) { - *(__m256h_u *) __P = __A; + return __builtin_ia32_subsh_mask_round (__A, __B, + _mm_setzero_ph (), + (__mmask8) -1, __C); } -extern __inline void +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_storeu_ph (void *__P, __m128h __A) +_mm_mask_sub_round_sh (__m128h __A, __mmask8 __B, __m128h __C, + __m128h __D, const int __E) { - *(__m128h_u *) __P = __A; + return __builtin_ia32_subsh_mask_round (__C, __D, __A, __B, __E); } -extern __inline __m512h +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_abs_ph (__m512h __A) +_mm_maskz_sub_round_sh (__mmask8 __A, __m128h __B, __m128h __C, + const int __D) { - return (__m512h) _mm512_and_epi32 ( _mm512_set1_epi32 (0x7FFF7FFF), - (__m512i) __A); + return __builtin_ia32_subsh_mask_round (__B, __C, + _mm_setzero_ph (), + __A, __D); } -/* Intrinsics v[add,sub,mul,div]ph. 
*/ -extern __inline __m512h +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_add_ph (__m512h __A, __m512h __B) +_mm_mul_round_sh (__m128h __A, __m128h __B, const int __C) { - return (__m512h) ((__v32hf) __A + (__v32hf) __B); + return __builtin_ia32_mulsh_mask_round (__A, __B, + _mm_setzero_ph (), + (__mmask8) -1, __C); } -extern __inline __m512h +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_add_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D) +_mm_mask_mul_round_sh (__m128h __A, __mmask8 __B, __m128h __C, + __m128h __D, const int __E) { - return __builtin_ia32_addph512_mask (__C, __D, __A, __B); + return __builtin_ia32_mulsh_mask_round (__C, __D, __A, __B, __E); } -extern __inline __m512h +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_add_ph (__mmask32 __A, __m512h __B, __m512h __C) +_mm_maskz_mul_round_sh (__mmask8 __A, __m128h __B, __m128h __C, + const int __D) { - return __builtin_ia32_addph512_mask (__B, __C, - _mm512_setzero_ph (), __A); + return __builtin_ia32_mulsh_mask_round (__B, __C, + _mm_setzero_ph (), + __A, __D); } -extern __inline __m512h +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_sub_ph (__m512h __A, __m512h __B) +_mm_div_round_sh (__m128h __A, __m128h __B, const int __C) { - return (__m512h) ((__v32hf) __A - (__v32hf) __B); + return __builtin_ia32_divsh_mask_round (__A, __B, + _mm_setzero_ph (), + (__mmask8) -1, __C); } -extern __inline __m512h +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_sub_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D) +_mm_mask_div_round_sh (__m128h __A, __mmask8 __B, __m128h __C, + __m128h __D, const int __E) { - return __builtin_ia32_subph512_mask (__C, __D, __A, __B); + return __builtin_ia32_divsh_mask_round (__C, __D, __A, __B, __E); } -extern __inline __m512h +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_sub_ph (__mmask32 __A, __m512h __B, __m512h __C) +_mm_maskz_div_round_sh (__mmask8 __A, __m128h __B, __m128h __C, + const int __D) { - return __builtin_ia32_subph512_mask (__B, __C, - _mm512_setzero_ph (), __A); + return __builtin_ia32_divsh_mask_round (__B, __C, + _mm_setzero_ph (), + __A, __D); } +#else +#define _mm_add_round_sh(A, B, C) \ + ((__m128h)__builtin_ia32_addsh_mask_round ((A), (B), \ + _mm_setzero_ph (), \ + (__mmask8)-1, (C))) -extern __inline __m512h +#define _mm_mask_add_round_sh(A, B, C, D, E) \ + ((__m128h)__builtin_ia32_addsh_mask_round ((C), (D), (A), (B), (E))) + +#define _mm_maskz_add_round_sh(A, B, C, D) \ + ((__m128h)__builtin_ia32_addsh_mask_round ((B), (C), \ + _mm_setzero_ph (), \ + (A), (D))) + +#define _mm_sub_round_sh(A, B, C) \ + ((__m128h)__builtin_ia32_subsh_mask_round ((A), (B), \ + _mm_setzero_ph (), \ + (__mmask8)-1, (C))) + +#define _mm_mask_sub_round_sh(A, B, C, D, E) \ + ((__m128h)__builtin_ia32_subsh_mask_round ((C), (D), (A), (B), (E))) + +#define _mm_maskz_sub_round_sh(A, B, C, D) \ + ((__m128h)__builtin_ia32_subsh_mask_round ((B), (C), \ + _mm_setzero_ph (), \ + (A), (D))) + +#define _mm_mul_round_sh(A, B, C) \ + ((__m128h)__builtin_ia32_mulsh_mask_round ((A), (B), \ + _mm_setzero_ph (), \ + (__mmask8)-1, (C))) + +#define _mm_mask_mul_round_sh(A, B, C, D, E) \ + ((__m128h)__builtin_ia32_mulsh_mask_round ((C), (D), (A), (B), (E))) + 
+#define _mm_maskz_mul_round_sh(A, B, C, D) \ + ((__m128h)__builtin_ia32_mulsh_mask_round ((B), (C), \ + _mm_setzero_ph (), \ + (A), (D))) + +#define _mm_div_round_sh(A, B, C) \ + ((__m128h)__builtin_ia32_divsh_mask_round ((A), (B), \ + _mm_setzero_ph (), \ + (__mmask8)-1, (C))) + +#define _mm_mask_div_round_sh(A, B, C, D, E) \ + ((__m128h)__builtin_ia32_divsh_mask_round ((C), (D), (A), (B), (E))) + +#define _mm_maskz_div_round_sh(A, B, C, D) \ + ((__m128h)__builtin_ia32_divsh_mask_round ((B), (C), \ + _mm_setzero_ph (), \ + (A), (D))) +#endif /* __OPTIMIZE__ */ + +/* Intrinsic vmaxsh vminsh. */ +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mul_ph (__m512h __A, __m512h __B) +_mm_max_sh (__m128h __A, __m128h __B) { - return (__m512h) ((__v32hf) __A * (__v32hf) __B); + __A[0] = __A[0] > __B[0] ? __A[0] : __B[0]; + return __A; } -extern __inline __m512h +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_mul_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D) +_mm_mask_max_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D) { - return __builtin_ia32_mulph512_mask (__C, __D, __A, __B); + return __builtin_ia32_maxsh_mask (__C, __D, __A, __B); } -extern __inline __m512h +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_mul_ph (__mmask32 __A, __m512h __B, __m512h __C) +_mm_maskz_max_sh (__mmask8 __A, __m128h __B, __m128h __C) { - return __builtin_ia32_mulph512_mask (__B, __C, - _mm512_setzero_ph (), __A); + return __builtin_ia32_maxsh_mask (__B, __C, _mm_setzero_ph (), + __A); } -extern __inline __m512h +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_div_ph (__m512h __A, __m512h __B) +_mm_min_sh (__m128h __A, __m128h __B) { - return (__m512h) ((__v32hf) __A / (__v32hf) __B); + __A[0] = __A[0] < __B[0] ? 
__A[0] : __B[0]; + return __A; } -extern __inline __m512h +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_div_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D) +_mm_mask_min_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D) { - return __builtin_ia32_divph512_mask (__C, __D, __A, __B); + return __builtin_ia32_minsh_mask (__C, __D, __A, __B); } -extern __inline __m512h +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_div_ph (__mmask32 __A, __m512h __B, __m512h __C) +_mm_maskz_min_sh (__mmask8 __A, __m128h __B, __m128h __C) { - return __builtin_ia32_divph512_mask (__B, __C, - _mm512_setzero_ph (), __A); + return __builtin_ia32_minsh_mask (__B, __C, _mm_setzero_ph (), + __A); } #ifdef __OPTIMIZE__ -extern __inline __m512h +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_add_round_ph (__m512h __A, __m512h __B, const int __C) +_mm_max_round_sh (__m128h __A, __m128h __B, const int __C) { - return __builtin_ia32_addph512_mask_round (__A, __B, - _mm512_setzero_ph (), - (__mmask32) -1, __C); + return __builtin_ia32_maxsh_mask_round (__A, __B, + _mm_setzero_ph (), + (__mmask8) -1, __C); } -extern __inline __m512h +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_add_round_ph (__m512h __A, __mmask32 __B, __m512h __C, - __m512h __D, const int __E) +_mm_mask_max_round_sh (__m128h __A, __mmask8 __B, __m128h __C, + __m128h __D, const int __E) { - return __builtin_ia32_addph512_mask_round (__C, __D, __A, __B, __E); + return __builtin_ia32_maxsh_mask_round (__C, __D, __A, __B, __E); } -extern __inline __m512h +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_add_round_ph (__mmask32 __A, __m512h __B, __m512h __C, - const int __D) +_mm_maskz_max_round_sh (__mmask8 __A, __m128h __B, __m128h __C, + const int __D) { - return __builtin_ia32_addph512_mask_round (__B, __C, - _mm512_setzero_ph (), - __A, __D); + return __builtin_ia32_maxsh_mask_round (__B, __C, + _mm_setzero_ph (), + __A, __D); } -extern __inline __m512h +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_sub_round_ph (__m512h __A, __m512h __B, const int __C) +_mm_min_round_sh (__m128h __A, __m128h __B, const int __C) { - return __builtin_ia32_subph512_mask_round (__A, __B, - _mm512_setzero_ph (), - (__mmask32) -1, __C); + return __builtin_ia32_minsh_mask_round (__A, __B, + _mm_setzero_ph (), + (__mmask8) -1, __C); } -extern __inline __m512h +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_sub_round_ph (__m512h __A, __mmask32 __B, __m512h __C, - __m512h __D, const int __E) +_mm_mask_min_round_sh (__m128h __A, __mmask8 __B, __m128h __C, + __m128h __D, const int __E) { - return __builtin_ia32_subph512_mask_round (__C, __D, __A, __B, __E); + return __builtin_ia32_minsh_mask_round (__C, __D, __A, __B, __E); } -extern __inline __m512h +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_sub_round_ph (__mmask32 __A, __m512h __B, __m512h __C, - const int __D) +_mm_maskz_min_round_sh (__mmask8 __A, __m128h __B, __m128h __C, + const int __D) { - return __builtin_ia32_subph512_mask_round (__B, __C, - _mm512_setzero_ph (), - __A, __D); + return __builtin_ia32_minsh_mask_round (__B, __C, + _mm_setzero_ph (), + 
__A, __D); } -extern __inline __m512h -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mul_round_ph (__m512h __A, __m512h __B, const int __C) -{ - return __builtin_ia32_mulph512_mask_round (__A, __B, - _mm512_setzero_ph (), - (__mmask32) -1, __C); -} +#else +#define _mm_max_round_sh(A, B, C) \ + (__builtin_ia32_maxsh_mask_round ((A), (B), \ + _mm_setzero_ph (), \ + (__mmask8)-1, (C))) -extern __inline __m512h -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_mul_round_ph (__m512h __A, __mmask32 __B, __m512h __C, - __m512h __D, const int __E) -{ - return __builtin_ia32_mulph512_mask_round (__C, __D, __A, __B, __E); -} +#define _mm_mask_max_round_sh(A, B, C, D, E) \ + (__builtin_ia32_maxsh_mask_round ((C), (D), (A), (B), (E))) -extern __inline __m512h +#define _mm_maskz_max_round_sh(A, B, C, D) \ + (__builtin_ia32_maxsh_mask_round ((B), (C), \ + _mm_setzero_ph (), \ + (A), (D))) + +#define _mm_min_round_sh(A, B, C) \ + (__builtin_ia32_minsh_mask_round ((A), (B), \ + _mm_setzero_ph (), \ + (__mmask8)-1, (C))) + +#define _mm_mask_min_round_sh(A, B, C, D, E) \ + (__builtin_ia32_minsh_mask_round ((C), (D), (A), (B), (E))) + +#define _mm_maskz_min_round_sh(A, B, C, D) \ + (__builtin_ia32_minsh_mask_round ((B), (C), \ + _mm_setzero_ph (), \ + (A), (D))) + +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vcmpsh. */ +#ifdef __OPTIMIZE__ +extern __inline __mmask8 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_mul_round_ph (__mmask32 __A, __m512h __B, __m512h __C, - const int __D) +_mm_cmp_sh_mask (__m128h __A, __m128h __B, const int __C) { - return __builtin_ia32_mulph512_mask_round (__B, __C, - _mm512_setzero_ph (), - __A, __D); + return (__mmask8) + __builtin_ia32_cmpsh_mask_round (__A, __B, + __C, (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m512h +extern __inline __mmask8 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_div_round_ph (__m512h __A, __m512h __B, const int __C) +_mm_mask_cmp_sh_mask (__mmask8 __A, __m128h __B, __m128h __C, + const int __D) { - return __builtin_ia32_divph512_mask_round (__A, __B, - _mm512_setzero_ph (), - (__mmask32) -1, __C); + return (__mmask8) + __builtin_ia32_cmpsh_mask_round (__B, __C, + __D, __A, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m512h +extern __inline __mmask8 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_div_round_ph (__m512h __A, __mmask32 __B, __m512h __C, - __m512h __D, const int __E) +_mm_cmp_round_sh_mask (__m128h __A, __m128h __B, const int __C, + const int __D) { - return __builtin_ia32_divph512_mask_round (__C, __D, __A, __B, __E); + return (__mmask8) __builtin_ia32_cmpsh_mask_round (__A, __B, + __C, (__mmask8) -1, + __D); } -extern __inline __m512h +extern __inline __mmask8 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_div_round_ph (__mmask32 __A, __m512h __B, __m512h __C, - const int __D) +_mm_mask_cmp_round_sh_mask (__mmask8 __A, __m128h __B, __m128h __C, + const int __D, const int __E) { - return __builtin_ia32_divph512_mask_round (__B, __C, - _mm512_setzero_ph (), - __A, __D); + return (__mmask8) __builtin_ia32_cmpsh_mask_round (__B, __C, + __D, __A, + __E); } -#else -#define _mm512_add_round_ph(A, B, C) \ - ((__m512h)__builtin_ia32_addph512_mask_round((A), (B), \ - _mm512_setzero_ph (), \ - (__mmask32)-1, (C))) - -#define _mm512_mask_add_round_ph(A, B, C, D, E) \ - ((__m512h)__builtin_ia32_addph512_mask_round((C), (D), (A), 
(B), (E))) - -#define _mm512_maskz_add_round_ph(A, B, C, D) \ - ((__m512h)__builtin_ia32_addph512_mask_round((B), (C), \ - _mm512_setzero_ph (), \ - (A), (D))) - -#define _mm512_sub_round_ph(A, B, C) \ - ((__m512h)__builtin_ia32_subph512_mask_round((A), (B), \ - _mm512_setzero_ph (), \ - (__mmask32)-1, (C))) - -#define _mm512_mask_sub_round_ph(A, B, C, D, E) \ - ((__m512h)__builtin_ia32_subph512_mask_round((C), (D), (A), (B), (E))) - -#define _mm512_maskz_sub_round_ph(A, B, C, D) \ - ((__m512h)__builtin_ia32_subph512_mask_round((B), (C), \ - _mm512_setzero_ph (), \ - (A), (D))) - -#define _mm512_mul_round_ph(A, B, C) \ - ((__m512h)__builtin_ia32_mulph512_mask_round((A), (B), \ - _mm512_setzero_ph (), \ - (__mmask32)-1, (C))) -#define _mm512_mask_mul_round_ph(A, B, C, D, E) \ - ((__m512h)__builtin_ia32_mulph512_mask_round((C), (D), (A), (B), (E))) +#else +#define _mm_cmp_sh_mask(A, B, C) \ + (__builtin_ia32_cmpsh_mask_round ((A), (B), (C), (-1), \ + (_MM_FROUND_CUR_DIRECTION))) -#define _mm512_maskz_mul_round_ph(A, B, C, D) \ - ((__m512h)__builtin_ia32_mulph512_mask_round((B), (C), \ - _mm512_setzero_ph (), \ - (A), (D))) +#define _mm_mask_cmp_sh_mask(A, B, C, D) \ + (__builtin_ia32_cmpsh_mask_round ((B), (C), (D), (A), \ + (_MM_FROUND_CUR_DIRECTION))) -#define _mm512_div_round_ph(A, B, C) \ - ((__m512h)__builtin_ia32_divph512_mask_round((A), (B), \ - _mm512_setzero_ph (), \ - (__mmask32)-1, (C))) +#define _mm_cmp_round_sh_mask(A, B, C, D) \ + (__builtin_ia32_cmpsh_mask_round ((A), (B), (C), (-1), (D))) -#define _mm512_mask_div_round_ph(A, B, C, D, E) \ - ((__m512h)__builtin_ia32_divph512_mask_round((C), (D), (A), (B), (E))) +#define _mm_mask_cmp_round_sh_mask(A, B, C, D, E) \ + (__builtin_ia32_cmpsh_mask_round ((B), (C), (D), (A), (E))) -#define _mm512_maskz_div_round_ph(A, B, C, D) \ - ((__m512h)__builtin_ia32_divph512_mask_round((B), (C), \ - _mm512_setzero_ph (), \ - (A), (D))) -#endif /* __OPTIMIZE__ */ +#endif /* __OPTIMIZE__ */ -extern __inline __m512h +/* Intrinsics vcomish. */ +extern __inline int __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_conj_pch (__m512h __A) +_mm_comieq_sh (__m128h __A, __m128h __B) { - return (__m512h) _mm512_xor_epi32 ((__m512i) __A, _mm512_set1_epi32 (1<<31)); + return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_EQ_OS, + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m512h +extern __inline int __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_conj_pch (__m512h __W, __mmask16 __U, __m512h __A) +_mm_comilt_sh (__m128h __A, __m128h __B) { - return (__m512h) - __builtin_ia32_movaps512_mask ((__v16sf) _mm512_conj_pch (__A), - (__v16sf) __W, - (__mmask16) __U); + return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_LT_OS, + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m512h +extern __inline int __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_conj_pch (__mmask16 __U, __m512h __A) +_mm_comile_sh (__m128h __A, __m128h __B) { - return (__m512h) - __builtin_ia32_movaps512_mask ((__v16sf) _mm512_conj_pch (__A), - (__v16sf) _mm512_setzero_ps (), - (__mmask16) __U); + return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_LE_OS, + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); } -/* Intrinsics of v[add,sub,mul,div]sh. 
*/ -extern __inline __m128h - __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_add_sh (__m128h __A, __m128h __B) +extern __inline int +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comigt_sh (__m128h __A, __m128h __B) { - __A[0] += __B[0]; - return __A; + return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_GT_OS, + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128h +extern __inline int __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_add_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D) +_mm_comige_sh (__m128h __A, __m128h __B) { - return __builtin_ia32_addsh_mask (__C, __D, __A, __B); + return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_GE_OS, + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128h +extern __inline int __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_maskz_add_sh (__mmask8 __A, __m128h __B, __m128h __C) +_mm_comineq_sh (__m128h __A, __m128h __B) { - return __builtin_ia32_addsh_mask (__B, __C, _mm_setzero_ph (), - __A); + return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_NEQ_US, + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128h +extern __inline int __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_sub_sh (__m128h __A, __m128h __B) +_mm_ucomieq_sh (__m128h __A, __m128h __B) { - __A[0] -= __B[0]; - return __A; + return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_EQ_OQ, + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128h +extern __inline int __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_sub_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D) +_mm_ucomilt_sh (__m128h __A, __m128h __B) { - return __builtin_ia32_subsh_mask (__C, __D, __A, __B); + return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_LT_OQ, + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128h +extern __inline int __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_maskz_sub_sh (__mmask8 __A, __m128h __B, __m128h __C) +_mm_ucomile_sh (__m128h __A, __m128h __B) { - return __builtin_ia32_subsh_mask (__B, __C, _mm_setzero_ph (), - __A); + return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_LE_OQ, + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128h +extern __inline int __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mul_sh (__m128h __A, __m128h __B) +_mm_ucomigt_sh (__m128h __A, __m128h __B) { - __A[0] *= __B[0]; - return __A; + return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_GT_OQ, + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128h +extern __inline int __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_mul_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D) +_mm_ucomige_sh (__m128h __A, __m128h __B) { - return __builtin_ia32_mulsh_mask (__C, __D, __A, __B); + return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_GE_OQ, + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128h +extern __inline int __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_maskz_mul_sh (__mmask8 __A, __m128h __B, __m128h __C) +_mm_ucomineq_sh (__m128h __A, __m128h __B) { - return __builtin_ia32_mulsh_mask (__B, __C, _mm_setzero_ph (), __A); + return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_NEQ_UQ, + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128h +#ifdef 
__OPTIMIZE__ +extern __inline int __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_div_sh (__m128h __A, __m128h __B) +_mm_comi_sh (__m128h __A, __m128h __B, const int __P) { - __A[0] /= __B[0]; - return __A; + return __builtin_ia32_cmpsh_mask_round (__A, __B, __P, + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128h +extern __inline int __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_div_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D) +_mm_comi_round_sh (__m128h __A, __m128h __B, const int __P, const int __R) { - return __builtin_ia32_divsh_mask (__C, __D, __A, __B); + return __builtin_ia32_cmpsh_mask_round (__A, __B, __P, + (__mmask8) -1,__R); } +#else +#define _mm_comi_round_sh(A, B, P, R) \ + (__builtin_ia32_cmpsh_mask_round ((A), (B), (P), (__mmask8) (-1), (R))) +#define _mm_comi_sh(A, B, P) \ + (__builtin_ia32_cmpsh_mask_round ((A), (B), (P), (__mmask8) (-1), \ + _MM_FROUND_CUR_DIRECTION)) + +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vsqrtsh. */ extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_maskz_div_sh (__mmask8 __A, __m128h __B, __m128h __C) +_mm_sqrt_sh (__m128h __A, __m128h __B) { - return __builtin_ia32_divsh_mask (__B, __C, _mm_setzero_ph (), - __A); + return __builtin_ia32_sqrtsh_mask_round (__B, __A, + _mm_setzero_ph (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); } -#ifdef __OPTIMIZE__ extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_add_round_sh (__m128h __A, __m128h __B, const int __C) +_mm_mask_sqrt_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D) { - return __builtin_ia32_addsh_mask_round (__A, __B, - _mm_setzero_ph (), - (__mmask8) -1, __C); + return __builtin_ia32_sqrtsh_mask_round (__D, __C, __A, __B, + _MM_FROUND_CUR_DIRECTION); } extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_add_round_sh (__m128h __A, __mmask8 __B, __m128h __C, - __m128h __D, const int __E) +_mm_maskz_sqrt_sh (__mmask8 __A, __m128h __B, __m128h __C) { - return __builtin_ia32_addsh_mask_round (__C, __D, __A, __B, __E); + return __builtin_ia32_sqrtsh_mask_round (__C, __B, + _mm_setzero_ph (), + __A, _MM_FROUND_CUR_DIRECTION); } +#ifdef __OPTIMIZE__ extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_maskz_add_round_sh (__mmask8 __A, __m128h __B, __m128h __C, - const int __D) +_mm_sqrt_round_sh (__m128h __A, __m128h __B, const int __C) { - return __builtin_ia32_addsh_mask_round (__B, __C, - _mm_setzero_ph (), - __A, __D); + return __builtin_ia32_sqrtsh_mask_round (__B, __A, + _mm_setzero_ph (), + (__mmask8) -1, __C); } extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_sub_round_sh (__m128h __A, __m128h __B, const int __C) +_mm_mask_sqrt_round_sh (__m128h __A, __mmask8 __B, __m128h __C, + __m128h __D, const int __E) { - return __builtin_ia32_subsh_mask_round (__A, __B, - _mm_setzero_ph (), - (__mmask8) -1, __C); + return __builtin_ia32_sqrtsh_mask_round (__D, __C, __A, __B, + __E); } extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_sub_round_sh (__m128h __A, __mmask8 __B, __m128h __C, - __m128h __D, const int __E) +_mm_maskz_sqrt_round_sh (__mmask8 __A, __m128h __B, __m128h __C, + const int __D) { - return __builtin_ia32_subsh_mask_round (__C, __D, __A, __B, __E); + return __builtin_ia32_sqrtsh_mask_round (__C, __B, + 
_mm_setzero_ph (), + __A, __D); } +#else +#define _mm_sqrt_round_sh(A, B, C) \ + (__builtin_ia32_sqrtsh_mask_round ((B), (A), \ + _mm_setzero_ph (), \ + (__mmask8)-1, (C))) + +#define _mm_mask_sqrt_round_sh(A, B, C, D, E) \ + (__builtin_ia32_sqrtsh_mask_round ((D), (C), (A), (B), (E))) + +#define _mm_maskz_sqrt_round_sh(A, B, C, D) \ + (__builtin_ia32_sqrtsh_mask_round ((C), (B), \ + _mm_setzero_ph (), \ + (A), (D))) + +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vrsqrtsh. */ extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_maskz_sub_round_sh (__mmask8 __A, __m128h __B, __m128h __C, - const int __D) +_mm_rsqrt_sh (__m128h __A, __m128h __B) { - return __builtin_ia32_subsh_mask_round (__B, __C, - _mm_setzero_ph (), - __A, __D); + return __builtin_ia32_rsqrtsh_mask (__B, __A, _mm_setzero_ph (), + (__mmask8) -1); } extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mul_round_sh (__m128h __A, __m128h __B, const int __C) +_mm_mask_rsqrt_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D) { - return __builtin_ia32_mulsh_mask_round (__A, __B, - _mm_setzero_ph (), - (__mmask8) -1, __C); + return __builtin_ia32_rsqrtsh_mask (__D, __C, __A, __B); } extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_mul_round_sh (__m128h __A, __mmask8 __B, __m128h __C, - __m128h __D, const int __E) +_mm_maskz_rsqrt_sh (__mmask8 __A, __m128h __B, __m128h __C) { - return __builtin_ia32_mulsh_mask_round (__C, __D, __A, __B, __E); + return __builtin_ia32_rsqrtsh_mask (__C, __B, _mm_setzero_ph (), + __A); } +/* Intrinsics vrcpsh. */ extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_maskz_mul_round_sh (__mmask8 __A, __m128h __B, __m128h __C, - const int __D) +_mm_rcp_sh (__m128h __A, __m128h __B) { - return __builtin_ia32_mulsh_mask_round (__B, __C, - _mm_setzero_ph (), - __A, __D); + return __builtin_ia32_rcpsh_mask (__B, __A, _mm_setzero_ph (), + (__mmask8) -1); } extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_div_round_sh (__m128h __A, __m128h __B, const int __C) +_mm_mask_rcp_sh (__m128h __A, __mmask32 __B, __m128h __C, __m128h __D) { - return __builtin_ia32_divsh_mask_round (__A, __B, - _mm_setzero_ph (), - (__mmask8) -1, __C); + return __builtin_ia32_rcpsh_mask (__D, __C, __A, __B); } extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_div_round_sh (__m128h __A, __mmask8 __B, __m128h __C, - __m128h __D, const int __E) +_mm_maskz_rcp_sh (__mmask32 __A, __m128h __B, __m128h __C) { - return __builtin_ia32_divsh_mask_round (__C, __D, __A, __B, __E); + return __builtin_ia32_rcpsh_mask (__C, __B, _mm_setzero_ph (), + __A); } +/* Intrinsics vscalefsh. 
*/ extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_maskz_div_round_sh (__mmask8 __A, __m128h __B, __m128h __C, - const int __D) +_mm_scalef_sh (__m128h __A, __m128h __B) { - return __builtin_ia32_divsh_mask_round (__B, __C, - _mm_setzero_ph (), - __A, __D); + return __builtin_ia32_scalefsh_mask_round (__A, __B, + _mm_setzero_ph (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); } -#else -#define _mm_add_round_sh(A, B, C) \ - ((__m128h)__builtin_ia32_addsh_mask_round ((A), (B), \ - _mm_setzero_ph (), \ - (__mmask8)-1, (C))) - -#define _mm_mask_add_round_sh(A, B, C, D, E) \ - ((__m128h)__builtin_ia32_addsh_mask_round ((C), (D), (A), (B), (E))) - -#define _mm_maskz_add_round_sh(A, B, C, D) \ - ((__m128h)__builtin_ia32_addsh_mask_round ((B), (C), \ - _mm_setzero_ph (), \ - (A), (D))) - -#define _mm_sub_round_sh(A, B, C) \ - ((__m128h)__builtin_ia32_subsh_mask_round ((A), (B), \ - _mm_setzero_ph (), \ - (__mmask8)-1, (C))) - -#define _mm_mask_sub_round_sh(A, B, C, D, E) \ - ((__m128h)__builtin_ia32_subsh_mask_round ((C), (D), (A), (B), (E))) - -#define _mm_maskz_sub_round_sh(A, B, C, D) \ - ((__m128h)__builtin_ia32_subsh_mask_round ((B), (C), \ - _mm_setzero_ph (), \ - (A), (D))) - -#define _mm_mul_round_sh(A, B, C) \ - ((__m128h)__builtin_ia32_mulsh_mask_round ((A), (B), \ - _mm_setzero_ph (), \ - (__mmask8)-1, (C))) - -#define _mm_mask_mul_round_sh(A, B, C, D, E) \ - ((__m128h)__builtin_ia32_mulsh_mask_round ((C), (D), (A), (B), (E))) - -#define _mm_maskz_mul_round_sh(A, B, C, D) \ - ((__m128h)__builtin_ia32_mulsh_mask_round ((B), (C), \ - _mm_setzero_ph (), \ - (A), (D))) - -#define _mm_div_round_sh(A, B, C) \ - ((__m128h)__builtin_ia32_divsh_mask_round ((A), (B), \ - _mm_setzero_ph (), \ - (__mmask8)-1, (C))) - -#define _mm_mask_div_round_sh(A, B, C, D, E) \ - ((__m128h)__builtin_ia32_divsh_mask_round ((C), (D), (A), (B), (E))) - -#define _mm_maskz_div_round_sh(A, B, C, D) \ - ((__m128h)__builtin_ia32_divsh_mask_round ((B), (C), \ - _mm_setzero_ph (), \ - (A), (D))) -#endif /* __OPTIMIZE__ */ -/* Intrinsic vmaxph vminph. 
*/ -extern __inline __m512h +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_max_ph (__m512h __A, __m512h __B) +_mm_mask_scalef_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D) { - return __builtin_ia32_maxph512_mask (__A, __B, - _mm512_setzero_ph (), - (__mmask32) -1); + return __builtin_ia32_scalefsh_mask_round (__C, __D, __A, __B, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m512h +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_max_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D) +_mm_maskz_scalef_sh (__mmask8 __A, __m128h __B, __m128h __C) { - return __builtin_ia32_maxph512_mask (__C, __D, __A, __B); + return __builtin_ia32_scalefsh_mask_round (__B, __C, + _mm_setzero_ph (), + __A, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m512h +#ifdef __OPTIMIZE__ +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_max_ph (__mmask32 __A, __m512h __B, __m512h __C) +_mm_scalef_round_sh (__m128h __A, __m128h __B, const int __C) { - return __builtin_ia32_maxph512_mask (__B, __C, - _mm512_setzero_ph (), __A); + return __builtin_ia32_scalefsh_mask_round (__A, __B, + _mm_setzero_ph (), + (__mmask8) -1, __C); } -extern __inline __m512h +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_min_ph (__m512h __A, __m512h __B) +_mm_mask_scalef_round_sh (__m128h __A, __mmask8 __B, __m128h __C, + __m128h __D, const int __E) { - return __builtin_ia32_minph512_mask (__A, __B, - _mm512_setzero_ph (), - (__mmask32) -1); + return __builtin_ia32_scalefsh_mask_round (__C, __D, __A, __B, + __E); } -extern __inline __m512h +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_min_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D) +_mm_maskz_scalef_round_sh (__mmask8 __A, __m128h __B, __m128h __C, + const int __D) { - return __builtin_ia32_minph512_mask (__C, __D, __A, __B); + return __builtin_ia32_scalefsh_mask_round (__B, __C, + _mm_setzero_ph (), + __A, __D); } -extern __inline __m512h -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_min_ph (__mmask32 __A, __m512h __B, __m512h __C) -{ - return __builtin_ia32_minph512_mask (__B, __C, - _mm512_setzero_ph (), __A); -} +#else +#define _mm_scalef_round_sh(A, B, C) \ + (__builtin_ia32_scalefsh_mask_round ((A), (B), \ + _mm_setzero_ph (), \ + (__mmask8)-1, (C))) +#define _mm_mask_scalef_round_sh(A, B, C, D, E) \ + (__builtin_ia32_scalefsh_mask_round ((C), (D), (A), (B), (E))) + +#define _mm_maskz_scalef_round_sh(A, B, C, D) \ + (__builtin_ia32_scalefsh_mask_round ((B), (C), _mm_setzero_ph (), \ + (A), (D))) + +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vreducesh. 
*/ #ifdef __OPTIMIZE__ -extern __inline __m512h +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_max_round_ph (__m512h __A, __m512h __B, const int __C) +_mm_reduce_sh (__m128h __A, __m128h __B, int __C) { - return __builtin_ia32_maxph512_mask_round (__A, __B, - _mm512_setzero_ph (), - (__mmask32) -1, __C); + return __builtin_ia32_reducesh_mask_round (__A, __B, __C, + _mm_setzero_ph (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m512h +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_max_round_ph (__m512h __A, __mmask32 __B, __m512h __C, - __m512h __D, const int __E) +_mm_mask_reduce_sh (__m128h __A, __mmask8 __B, __m128h __C, + __m128h __D, int __E) { - return __builtin_ia32_maxph512_mask_round (__C, __D, __A, __B, __E); + return __builtin_ia32_reducesh_mask_round (__C, __D, __E, __A, __B, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m512h +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_max_round_ph (__mmask32 __A, __m512h __B, __m512h __C, - const int __D) +_mm_maskz_reduce_sh (__mmask8 __A, __m128h __B, __m128h __C, int __D) { - return __builtin_ia32_maxph512_mask_round (__B, __C, - _mm512_setzero_ph (), - __A, __D); + return __builtin_ia32_reducesh_mask_round (__B, __C, __D, + _mm_setzero_ph (), __A, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m512h +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_min_round_ph (__m512h __A, __m512h __B, const int __C) +_mm_reduce_round_sh (__m128h __A, __m128h __B, int __C, const int __D) { - return __builtin_ia32_minph512_mask_round (__A, __B, - _mm512_setzero_ph (), - (__mmask32) -1, __C); + return __builtin_ia32_reducesh_mask_round (__A, __B, __C, + _mm_setzero_ph (), + (__mmask8) -1, __D); } -extern __inline __m512h +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_min_round_ph (__m512h __A, __mmask32 __B, __m512h __C, - __m512h __D, const int __E) +_mm_mask_reduce_round_sh (__m128h __A, __mmask8 __B, __m128h __C, + __m128h __D, int __E, const int __F) { - return __builtin_ia32_minph512_mask_round (__C, __D, __A, __B, __E); + return __builtin_ia32_reducesh_mask_round (__C, __D, __E, __A, + __B, __F); } -extern __inline __m512h +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_min_round_ph (__mmask32 __A, __m512h __B, __m512h __C, - const int __D) +_mm_maskz_reduce_round_sh (__mmask8 __A, __m128h __B, __m128h __C, + int __D, const int __E) { - return __builtin_ia32_minph512_mask_round (__B, __C, - _mm512_setzero_ph (), - __A, __D); + return __builtin_ia32_reducesh_mask_round (__B, __C, __D, + _mm_setzero_ph (), + __A, __E); } #else -#define _mm512_max_round_ph(A, B, C) \ - (__builtin_ia32_maxph512_mask_round ((A), (B), \ - _mm512_setzero_ph (), \ - (__mmask32)-1, (C))) +#define _mm_reduce_sh(A, B, C) \ + (__builtin_ia32_reducesh_mask_round ((A), (B), (C), \ + _mm_setzero_ph (), \ + (__mmask8)-1, \ + _MM_FROUND_CUR_DIRECTION)) -#define _mm512_mask_max_round_ph(A, B, C, D, E) \ - (__builtin_ia32_maxph512_mask_round ((C), (D), (A), (B), (E))) +#define _mm_mask_reduce_sh(A, B, C, D, E) \ + (__builtin_ia32_reducesh_mask_round ((C), (D), (E), (A), (B), \ + _MM_FROUND_CUR_DIRECTION)) -#define _mm512_maskz_max_round_ph(A, B, C, D) \ - (__builtin_ia32_maxph512_mask_round ((B), (C), \ - 
_mm512_setzero_ph (), \ - (A), (D))) +#define _mm_maskz_reduce_sh(A, B, C, D) \ + (__builtin_ia32_reducesh_mask_round ((B), (C), (D), \ + _mm_setzero_ph (), \ + (A), _MM_FROUND_CUR_DIRECTION)) -#define _mm512_min_round_ph(A, B, C) \ - (__builtin_ia32_minph512_mask_round ((A), (B), \ - _mm512_setzero_ph (), \ - (__mmask32)-1, (C))) +#define _mm_reduce_round_sh(A, B, C, D) \ + (__builtin_ia32_reducesh_mask_round ((A), (B), (C), \ + _mm_setzero_ph (), \ + (__mmask8)-1, (D))) -#define _mm512_mask_min_round_ph(A, B, C, D, E) \ - (__builtin_ia32_minph512_mask_round ((C), (D), (A), (B), (E))) +#define _mm_mask_reduce_round_sh(A, B, C, D, E, F) \ + (__builtin_ia32_reducesh_mask_round ((C), (D), (E), (A), (B), (F))) + +#define _mm_maskz_reduce_round_sh(A, B, C, D, E) \ + (__builtin_ia32_reducesh_mask_round ((B), (C), (D), \ + _mm_setzero_ph (), \ + (A), (E))) -#define _mm512_maskz_min_round_ph(A, B, C, D) \ - (__builtin_ia32_minph512_mask_round ((B), (C), \ - _mm512_setzero_ph (), \ - (A), (D))) #endif /* __OPTIMIZE__ */ -/* Intrinsic vmaxsh vminsh. */ +/* Intrinsics vrndscalesh. */ +#ifdef __OPTIMIZE__ extern __inline __m128h -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_max_sh (__m128h __A, __m128h __B) + __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_roundscale_sh (__m128h __A, __m128h __B, int __C) { - __A[0] = __A[0] > __B[0] ? __A[0] : __B[0]; - return __A; + return __builtin_ia32_rndscalesh_mask_round (__A, __B, __C, + _mm_setzero_ph (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); } extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_max_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D) +_mm_mask_roundscale_sh (__m128h __A, __mmask8 __B, __m128h __C, + __m128h __D, int __E) { - return __builtin_ia32_maxsh_mask (__C, __D, __A, __B); + return __builtin_ia32_rndscalesh_mask_round (__C, __D, __E, __A, __B, + _MM_FROUND_CUR_DIRECTION); } extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_maskz_max_sh (__mmask8 __A, __m128h __B, __m128h __C) +_mm_maskz_roundscale_sh (__mmask8 __A, __m128h __B, __m128h __C, int __D) { - return __builtin_ia32_maxsh_mask (__B, __C, _mm_setzero_ph (), - __A); + return __builtin_ia32_rndscalesh_mask_round (__B, __C, __D, + _mm_setzero_ph (), __A, + _MM_FROUND_CUR_DIRECTION); } extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_min_sh (__m128h __A, __m128h __B) +_mm_roundscale_round_sh (__m128h __A, __m128h __B, int __C, const int __D) { - __A[0] = __A[0] < __B[0] ? 
__A[0] : __B[0]; - return __A; + return __builtin_ia32_rndscalesh_mask_round (__A, __B, __C, + _mm_setzero_ph (), + (__mmask8) -1, + __D); } extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_min_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D) +_mm_mask_roundscale_round_sh (__m128h __A, __mmask8 __B, __m128h __C, + __m128h __D, int __E, const int __F) { - return __builtin_ia32_minsh_mask (__C, __D, __A, __B); + return __builtin_ia32_rndscalesh_mask_round (__C, __D, __E, + __A, __B, __F); } extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_maskz_min_sh (__mmask8 __A, __m128h __B, __m128h __C) +_mm_maskz_roundscale_round_sh (__mmask8 __A, __m128h __B, __m128h __C, + int __D, const int __E) { - return __builtin_ia32_minsh_mask (__B, __C, _mm_setzero_ph (), - __A); + return __builtin_ia32_rndscalesh_mask_round (__B, __C, __D, + _mm_setzero_ph (), + __A, __E); } +#else +#define _mm_roundscale_sh(A, B, C) \ + (__builtin_ia32_rndscalesh_mask_round ((A), (B), (C), \ + _mm_setzero_ph (), \ + (__mmask8)-1, \ + _MM_FROUND_CUR_DIRECTION)) + +#define _mm_mask_roundscale_sh(A, B, C, D, E) \ + (__builtin_ia32_rndscalesh_mask_round ((C), (D), (E), (A), (B), \ + _MM_FROUND_CUR_DIRECTION)) + +#define _mm_maskz_roundscale_sh(A, B, C, D) \ + (__builtin_ia32_rndscalesh_mask_round ((B), (C), (D), \ + _mm_setzero_ph (), \ + (A), _MM_FROUND_CUR_DIRECTION)) + +#define _mm_roundscale_round_sh(A, B, C, D) \ + (__builtin_ia32_rndscalesh_mask_round ((A), (B), (C), \ + _mm_setzero_ph (), \ + (__mmask8)-1, (D))) + +#define _mm_mask_roundscale_round_sh(A, B, C, D, E, F) \ + (__builtin_ia32_rndscalesh_mask_round ((C), (D), (E), (A), (B), (F))) + +#define _mm_maskz_roundscale_round_sh(A, B, C, D, E) \ + (__builtin_ia32_rndscalesh_mask_round ((B), (C), (D), \ + _mm_setzero_ph (), \ + (A), (E))) + +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vfpclasssh. */ #ifdef __OPTIMIZE__ -extern __inline __m128h -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_max_round_sh (__m128h __A, __m128h __B, const int __C) +extern __inline __mmask8 + __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fpclass_sh_mask (__m128h __A, const int __imm) { - return __builtin_ia32_maxsh_mask_round (__A, __B, - _mm_setzero_ph (), - (__mmask8) -1, __C); + return (__mmask8) __builtin_ia32_fpclasssh_mask ((__v8hf) __A, __imm, + (__mmask8) -1); } -extern __inline __m128h +extern __inline __mmask8 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_max_round_sh (__m128h __A, __mmask8 __B, __m128h __C, - __m128h __D, const int __E) +_mm_mask_fpclass_sh_mask (__mmask8 __U, __m128h __A, const int __imm) { - return __builtin_ia32_maxsh_mask_round (__C, __D, __A, __B, __E); + return (__mmask8) __builtin_ia32_fpclasssh_mask ((__v8hf) __A, __imm, __U); } +#else +#define _mm_fpclass_sh_mask(X, C) \ + ((__mmask8) __builtin_ia32_fpclasssh_mask ((__v8hf) (__m128h) (X), \ + (int) (C), (__mmask8) (-1))) \ + +#define _mm_mask_fpclass_sh_mask(U, X, C) \ + ((__mmask8) __builtin_ia32_fpclasssh_mask ((__v8hf) (__m128h) (X), \ + (int) (C), (__mmask8) (U))) + +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vgetexpsh. 
*/ extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_maskz_max_round_sh (__mmask8 __A, __m128h __B, __m128h __C, - const int __D) +_mm_getexp_sh (__m128h __A, __m128h __B) { - return __builtin_ia32_maxsh_mask_round (__B, __C, - _mm_setzero_ph (), - __A, __D); + return (__m128h) + __builtin_ia32_getexpsh_mask_round ((__v8hf) __A, (__v8hf) __B, + (__v8hf) _mm_setzero_ph (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); } extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_min_round_sh (__m128h __A, __m128h __B, const int __C) +_mm_mask_getexp_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B) { - return __builtin_ia32_minsh_mask_round (__A, __B, - _mm_setzero_ph (), - (__mmask8) -1, __C); + return (__m128h) + __builtin_ia32_getexpsh_mask_round ((__v8hf) __A, (__v8hf) __B, + (__v8hf) __W, (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); } extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_min_round_sh (__m128h __A, __mmask8 __B, __m128h __C, - __m128h __D, const int __E) +_mm_maskz_getexp_sh (__mmask8 __U, __m128h __A, __m128h __B) { - return __builtin_ia32_minsh_mask_round (__C, __D, __A, __B, __E); + return (__m128h) + __builtin_ia32_getexpsh_mask_round ((__v8hf) __A, (__v8hf) __B, + (__v8hf) _mm_setzero_ph (), + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); } +#ifdef __OPTIMIZE__ extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_maskz_min_round_sh (__mmask8 __A, __m128h __B, __m128h __C, - const int __D) +_mm_getexp_round_sh (__m128h __A, __m128h __B, const int __R) { - return __builtin_ia32_minsh_mask_round (__B, __C, - _mm_setzero_ph (), - __A, __D); + return (__m128h) __builtin_ia32_getexpsh_mask_round ((__v8hf) __A, + (__v8hf) __B, + _mm_setzero_ph (), + (__mmask8) -1, + __R); } -#else -#define _mm_max_round_sh(A, B, C) \ - (__builtin_ia32_maxsh_mask_round ((A), (B), \ - _mm_setzero_ph (), \ - (__mmask8)-1, (C))) - -#define _mm_mask_max_round_sh(A, B, C, D, E) \ - (__builtin_ia32_maxsh_mask_round ((C), (D), (A), (B), (E))) +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_getexp_round_sh (__m128h __W, __mmask8 __U, __m128h __A, + __m128h __B, const int __R) +{ + return (__m128h) __builtin_ia32_getexpsh_mask_round ((__v8hf) __A, + (__v8hf) __B, + (__v8hf) __W, + (__mmask8) __U, __R); +} -#define _mm_maskz_max_round_sh(A, B, C, D) \ - (__builtin_ia32_maxsh_mask_round ((B), (C), \ - _mm_setzero_ph (), \ - (A), (D))) +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_getexp_round_sh (__mmask8 __U, __m128h __A, __m128h __B, + const int __R) +{ + return (__m128h) __builtin_ia32_getexpsh_mask_round ((__v8hf) __A, + (__v8hf) __B, + (__v8hf) + _mm_setzero_ph (), + (__mmask8) __U, __R); +} -#define _mm_min_round_sh(A, B, C) \ - (__builtin_ia32_minsh_mask_round ((A), (B), \ - _mm_setzero_ph (), \ - (__mmask8)-1, (C))) +#else +#define _mm_getexp_round_sh(A, B, R) \ + ((__m128h)__builtin_ia32_getexpsh_mask_round((__v8hf)(__m128h)(A), \ + (__v8hf)(__m128h)(B), \ + (__v8hf)_mm_setzero_ph(), \ + (__mmask8)-1, R)) -#define _mm_mask_min_round_sh(A, B, C, D, E) \ - (__builtin_ia32_minsh_mask_round ((C), (D), (A), (B), (E))) +#define _mm_mask_getexp_round_sh(W, U, A, B, C) \ + (__m128h)__builtin_ia32_getexpsh_mask_round(A, B, W, U, C) -#define _mm_maskz_min_round_sh(A, B, C, D) \ - 
(__builtin_ia32_minsh_mask_round ((B), (C), \ - _mm_setzero_ph (), \ - (A), (D))) +#define _mm_maskz_getexp_round_sh(U, A, B, C) \ + (__m128h)__builtin_ia32_getexpsh_mask_round(A, B, \ + (__v8hf)_mm_setzero_ph(), \ + U, C) #endif /* __OPTIMIZE__ */ -/* vcmpph */ -#ifdef __OPTIMIZE -extern __inline __mmask32 +/* Intrinsics vgetmantsh. */ +#ifdef __OPTIMIZE__ +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_cmp_ph_mask (__m512h __A, __m512h __B, const int __C) +_mm_getmant_sh (__m128h __A, __m128h __B, + _MM_MANTISSA_NORM_ENUM __C, + _MM_MANTISSA_SIGN_ENUM __D) { - return (__mmask32) __builtin_ia32_cmpph512_mask (__A, __B, __C, - (__mmask32) -1); + return (__m128h) + __builtin_ia32_getmantsh_mask_round ((__v8hf) __A, (__v8hf) __B, + (__D << 2) | __C, _mm_setzero_ph (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __mmask32 +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_cmp_ph_mask (__mmask32 __A, __m512h __B, __m512h __C, - const int __D) +_mm_mask_getmant_sh (__m128h __W, __mmask8 __U, __m128h __A, + __m128h __B, _MM_MANTISSA_NORM_ENUM __C, + _MM_MANTISSA_SIGN_ENUM __D) { - return (__mmask32) __builtin_ia32_cmpph512_mask (__B, __C, __D, - __A); + return (__m128h) + __builtin_ia32_getmantsh_mask_round ((__v8hf) __A, (__v8hf) __B, + (__D << 2) | __C, (__v8hf) __W, + __U, _MM_FROUND_CUR_DIRECTION); } -extern __inline __mmask32 +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_cmp_round_ph_mask (__m512h __A, __m512h __B, const int __C, - const int __D) +_mm_maskz_getmant_sh (__mmask8 __U, __m128h __A, __m128h __B, + _MM_MANTISSA_NORM_ENUM __C, + _MM_MANTISSA_SIGN_ENUM __D) { - return (__mmask32) __builtin_ia32_cmpph512_mask_round (__A, __B, - __C, (__mmask32) -1, - __D); + return (__m128h) + __builtin_ia32_getmantsh_mask_round ((__v8hf) __A, (__v8hf) __B, + (__D << 2) | __C, + (__v8hf) _mm_setzero_ph(), + __U, _MM_FROUND_CUR_DIRECTION); } -extern __inline __mmask32 +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_cmp_round_ph_mask (__mmask32 __A, __m512h __B, __m512h __C, - const int __D, const int __E) +_mm_getmant_round_sh (__m128h __A, __m128h __B, + _MM_MANTISSA_NORM_ENUM __C, + _MM_MANTISSA_SIGN_ENUM __D, const int __R) { - return (__mmask32) __builtin_ia32_cmpph512_mask_round (__B, __C, - __D, __A, - __E); + return (__m128h) __builtin_ia32_getmantsh_mask_round ((__v8hf) __A, + (__v8hf) __B, + (__D << 2) | __C, + _mm_setzero_ph (), + (__mmask8) -1, + __R); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_getmant_round_sh (__m128h __W, __mmask8 __U, __m128h __A, + __m128h __B, _MM_MANTISSA_NORM_ENUM __C, + _MM_MANTISSA_SIGN_ENUM __D, const int __R) +{ + return (__m128h) __builtin_ia32_getmantsh_mask_round ((__v8hf) __A, + (__v8hf) __B, + (__D << 2) | __C, + (__v8hf) __W, + __U, __R); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_getmant_round_sh (__mmask8 __U, __m128h __A, __m128h __B, + _MM_MANTISSA_NORM_ENUM __C, + _MM_MANTISSA_SIGN_ENUM __D, const int __R) +{ + return (__m128h) __builtin_ia32_getmantsh_mask_round ((__v8hf) __A, + (__v8hf) __B, + (__D << 2) | __C, + (__v8hf) + _mm_setzero_ph(), + __U, __R); } #else -#define _mm512_cmp_ph_mask(A, B, C) \ - (__builtin_ia32_cmpph512_mask ((A), (B), (C), (-1))) +#define 
_mm_getmant_sh(X, Y, C, D) \ + ((__m128h)__builtin_ia32_getmantsh_mask_round ((__v8hf)(__m128h)(X), \ + (__v8hf)(__m128h)(Y), \ + (int)(((D)<<2) | (C)), \ + (__v8hf)(__m128h) \ + _mm_setzero_ph (), \ + (__mmask8)-1, \ + _MM_FROUND_CUR_DIRECTION)) -#define _mm512_mask_cmp_ph_mask(A, B, C, D) \ - (__builtin_ia32_cmpph512_mask ((B), (C), (D), (A))) +#define _mm_mask_getmant_sh(W, U, X, Y, C, D) \ + ((__m128h)__builtin_ia32_getmantsh_mask_round ((__v8hf)(__m128h)(X), \ + (__v8hf)(__m128h)(Y), \ + (int)(((D)<<2) | (C)), \ + (__v8hf)(__m128h)(W), \ + (__mmask8)(U), \ + _MM_FROUND_CUR_DIRECTION)) -#define _mm512_cmp_round_ph_mask(A, B, C, D) \ - (__builtin_ia32_cmpph512_mask_round ((A), (B), (C), (-1), (D))) +#define _mm_maskz_getmant_sh(U, X, Y, C, D) \ + ((__m128h)__builtin_ia32_getmantsh_mask_round ((__v8hf)(__m128h)(X), \ + (__v8hf)(__m128h)(Y), \ + (int)(((D)<<2) | (C)), \ + (__v8hf)(__m128h) \ + _mm_setzero_ph(), \ + (__mmask8)(U), \ + _MM_FROUND_CUR_DIRECTION)) -#define _mm512_mask_cmp_round_ph_mask(A, B, C, D, E) \ - (__builtin_ia32_cmpph512_mask_round ((B), (C), (D), (A), (E))) +#define _mm_getmant_round_sh(X, Y, C, D, R) \ + ((__m128h)__builtin_ia32_getmantsh_mask_round ((__v8hf)(__m128h)(X), \ + (__v8hf)(__m128h)(Y), \ + (int)(((D)<<2) | (C)), \ + (__v8hf)(__m128h) \ + _mm_setzero_ph (), \ + (__mmask8)-1, \ + (R))) + +#define _mm_mask_getmant_round_sh(W, U, X, Y, C, D, R) \ + ((__m128h)__builtin_ia32_getmantsh_mask_round ((__v8hf)(__m128h)(X), \ + (__v8hf)(__m128h)(Y), \ + (int)(((D)<<2) | (C)), \ + (__v8hf)(__m128h)(W), \ + (__mmask8)(U), \ + (R))) + +#define _mm_maskz_getmant_round_sh(U, X, Y, C, D, R) \ + ((__m128h)__builtin_ia32_getmantsh_mask_round ((__v8hf)(__m128h)(X), \ + (__v8hf)(__m128h)(Y), \ + (int)(((D)<<2) | (C)), \ + (__v8hf)(__m128h) \ + _mm_setzero_ph(), \ + (__mmask8)(U), \ + (R))) #endif /* __OPTIMIZE__ */ -/* Intrinsics vcmpsh. */ -#ifdef __OPTIMIZE__ -extern __inline __mmask8 +/* Intrinsics vmovw. */ +extern __inline __m128i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_cmp_sh_mask (__m128h __A, __m128h __B, const int __C) +_mm_cvtsi16_si128 (short __A) { - return (__mmask8) - __builtin_ia32_cmpsh_mask_round (__A, __B, - __C, (__mmask8) -1, - _MM_FROUND_CUR_DIRECTION); + return _mm_set_epi16 (0, 0, 0, 0, 0, 0, 0, __A); } -extern __inline __mmask8 +extern __inline short __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_cmp_sh_mask (__mmask8 __A, __m128h __B, __m128h __C, - const int __D) +_mm_cvtsi128_si16 (__m128i __A) { - return (__mmask8) - __builtin_ia32_cmpsh_mask_round (__B, __C, - __D, __A, - _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_vec_ext_v8hi ((__v8hi)__A, 0); } -extern __inline __mmask8 +/* Intrinsics vmovsh. 
*/ +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_cmp_round_sh_mask (__m128h __A, __m128h __B, const int __C, - const int __D) +_mm_mask_load_sh (__m128h __A, __mmask8 __B, _Float16 const* __C) { - return (__mmask8) __builtin_ia32_cmpsh_mask_round (__A, __B, - __C, (__mmask8) -1, - __D); + return __builtin_ia32_loadsh_mask (__C, __A, __B); } -extern __inline __mmask8 +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_cmp_round_sh_mask (__mmask8 __A, __m128h __B, __m128h __C, - const int __D, const int __E) +_mm_maskz_load_sh (__mmask8 __A, _Float16 const* __B) { - return (__mmask8) __builtin_ia32_cmpsh_mask_round (__B, __C, - __D, __A, - __E); + return __builtin_ia32_loadsh_mask (__B, _mm_setzero_ph (), __A); } -#else -#define _mm_cmp_sh_mask(A, B, C) \ - (__builtin_ia32_cmpsh_mask_round ((A), (B), (C), (-1), \ - (_MM_FROUND_CUR_DIRECTION))) - -#define _mm_mask_cmp_sh_mask(A, B, C, D) \ - (__builtin_ia32_cmpsh_mask_round ((B), (C), (D), (A), \ - (_MM_FROUND_CUR_DIRECTION))) - -#define _mm_cmp_round_sh_mask(A, B, C, D) \ - (__builtin_ia32_cmpsh_mask_round ((A), (B), (C), (-1), (D))) - -#define _mm_mask_cmp_round_sh_mask(A, B, C, D, E) \ - (__builtin_ia32_cmpsh_mask_round ((B), (C), (D), (A), (E))) - -#endif /* __OPTIMIZE__ */ - -/* Intrinsics vcomish. */ -extern __inline int +extern __inline void __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_comieq_sh (__m128h __A, __m128h __B) +_mm_mask_store_sh (_Float16 const* __A, __mmask8 __B, __m128h __C) { - return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_EQ_OS, - (__mmask8) -1, - _MM_FROUND_CUR_DIRECTION); + __builtin_ia32_storesh_mask (__A, __C, __B); } -extern __inline int +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_comilt_sh (__m128h __A, __m128h __B) +_mm_move_sh (__m128h __A, __m128h __B) { - return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_LT_OS, - (__mmask8) -1, - _MM_FROUND_CUR_DIRECTION); + __A[0] = __B[0]; + return __A; } -extern __inline int +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_comile_sh (__m128h __A, __m128h __B) +_mm_mask_move_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D) { - return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_LE_OS, - (__mmask8) -1, - _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_vmovsh_mask (__C, __D, __A, __B); } -extern __inline int +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_comigt_sh (__m128h __A, __m128h __B) +_mm_maskz_move_sh (__mmask8 __A, __m128h __B, __m128h __C) { - return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_GT_OS, - (__mmask8) -1, - _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_vmovsh_mask (__B, __C, _mm_setzero_ph (), __A); } +/* Intrinsics vcvtsh2si, vcvtsh2us. 
*/ extern __inline int __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_comige_sh (__m128h __A, __m128h __B) +_mm_cvtsh_i32 (__m128h __A) { - return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_GE_OS, - (__mmask8) -1, - _MM_FROUND_CUR_DIRECTION); + return (int) __builtin_ia32_vcvtsh2si32_round (__A, _MM_FROUND_CUR_DIRECTION); } -extern __inline int +extern __inline unsigned __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_comineq_sh (__m128h __A, __m128h __B) +_mm_cvtsh_u32 (__m128h __A) { - return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_NEQ_US, - (__mmask8) -1, - _MM_FROUND_CUR_DIRECTION); + return (int) __builtin_ia32_vcvtsh2usi32_round (__A, + _MM_FROUND_CUR_DIRECTION); } +#ifdef __OPTIMIZE__ extern __inline int __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_ucomieq_sh (__m128h __A, __m128h __B) +_mm_cvt_roundsh_i32 (__m128h __A, const int __R) { - return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_EQ_OQ, - (__mmask8) -1, - _MM_FROUND_CUR_DIRECTION); + return (int) __builtin_ia32_vcvtsh2si32_round (__A, __R); } -extern __inline int +extern __inline unsigned __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_ucomilt_sh (__m128h __A, __m128h __B) +_mm_cvt_roundsh_u32 (__m128h __A, const int __R) { - return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_LT_OQ, - (__mmask8) -1, - _MM_FROUND_CUR_DIRECTION); + return (int) __builtin_ia32_vcvtsh2usi32_round (__A, __R); } -extern __inline int -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_ucomile_sh (__m128h __A, __m128h __B) -{ - return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_LE_OQ, - (__mmask8) -1, - _MM_FROUND_CUR_DIRECTION); -} +#else +#define _mm_cvt_roundsh_i32(A, B) \ + ((int)__builtin_ia32_vcvtsh2si32_round ((A), (B))) +#define _mm_cvt_roundsh_u32(A, B) \ + ((int)__builtin_ia32_vcvtsh2usi32_round ((A), (B))) -extern __inline int -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_ucomigt_sh (__m128h __A, __m128h __B) -{ - return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_GT_OQ, - (__mmask8) -1, - _MM_FROUND_CUR_DIRECTION); -} +#endif /* __OPTIMIZE__ */ -extern __inline int +#ifdef __x86_64__ +extern __inline long long __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_ucomige_sh (__m128h __A, __m128h __B) +_mm_cvtsh_i64 (__m128h __A) { - return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_GE_OQ, - (__mmask8) -1, - _MM_FROUND_CUR_DIRECTION); + return (long long) + __builtin_ia32_vcvtsh2si64_round (__A, _MM_FROUND_CUR_DIRECTION); } -extern __inline int +extern __inline unsigned long long __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_ucomineq_sh (__m128h __A, __m128h __B) +_mm_cvtsh_u64 (__m128h __A) { - return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_NEQ_UQ, - (__mmask8) -1, - _MM_FROUND_CUR_DIRECTION); + return (long long) + __builtin_ia32_vcvtsh2usi64_round (__A, _MM_FROUND_CUR_DIRECTION); } #ifdef __OPTIMIZE__ -extern __inline int +extern __inline long long __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_comi_sh (__m128h __A, __m128h __B, const int __P) +_mm_cvt_roundsh_i64 (__m128h __A, const int __R) { - return __builtin_ia32_cmpsh_mask_round (__A, __B, __P, - (__mmask8) -1, - _MM_FROUND_CUR_DIRECTION); + return (long long) __builtin_ia32_vcvtsh2si64_round (__A, __R); } -extern __inline int +extern __inline unsigned long long __attribute__ ((__gnu_inline__, __always_inline__, 
__artificial__)) -_mm_comi_round_sh (__m128h __A, __m128h __B, const int __P, const int __R) +_mm_cvt_roundsh_u64 (__m128h __A, const int __R) { - return __builtin_ia32_cmpsh_mask_round (__A, __B, __P, - (__mmask8) -1,__R); + return (long long) __builtin_ia32_vcvtsh2usi64_round (__A, __R); } #else -#define _mm_comi_round_sh(A, B, P, R) \ - (__builtin_ia32_cmpsh_mask_round ((A), (B), (P), (__mmask8) (-1), (R))) -#define _mm_comi_sh(A, B, P) \ - (__builtin_ia32_cmpsh_mask_round ((A), (B), (P), (__mmask8) (-1), \ - _MM_FROUND_CUR_DIRECTION)) - -#endif /* __OPTIMIZE__ */ +#define _mm_cvt_roundsh_i64(A, B) \ + ((long long)__builtin_ia32_vcvtsh2si64_round ((A), (B))) +#define _mm_cvt_roundsh_u64(A, B) \ + ((long long)__builtin_ia32_vcvtsh2usi64_round ((A), (B))) -/* Intrinsics vsqrtph. */ -extern __inline __m512h -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_sqrt_ph (__m512h __A) -{ - return __builtin_ia32_sqrtph512_mask_round (__A, - _mm512_setzero_ph(), - (__mmask32) -1, - _MM_FROUND_CUR_DIRECTION); -} +#endif /* __OPTIMIZE__ */ +#endif /* __x86_64__ */ -extern __inline __m512h +/* Intrinsics vcvtsi2sh, vcvtusi2sh. */ +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_sqrt_ph (__m512h __A, __mmask32 __B, __m512h __C) +_mm_cvti32_sh (__m128h __A, int __B) { - return __builtin_ia32_sqrtph512_mask_round (__C, __A, __B, - _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_vcvtsi2sh32_round (__A, __B, _MM_FROUND_CUR_DIRECTION); } -extern __inline __m512h +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_sqrt_ph (__mmask32 __A, __m512h __B) +_mm_cvtu32_sh (__m128h __A, unsigned int __B) { - return __builtin_ia32_sqrtph512_mask_round (__B, - _mm512_setzero_ph (), - __A, - _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_vcvtusi2sh32_round (__A, __B, _MM_FROUND_CUR_DIRECTION); } #ifdef __OPTIMIZE__ -extern __inline __m512h -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_sqrt_round_ph (__m512h __A, const int __B) -{ - return __builtin_ia32_sqrtph512_mask_round (__A, - _mm512_setzero_ph(), - (__mmask32) -1, __B); -} - -extern __inline __m512h +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_sqrt_round_ph (__m512h __A, __mmask32 __B, __m512h __C, - const int __D) +_mm_cvt_roundi32_sh (__m128h __A, int __B, const int __R) { - return __builtin_ia32_sqrtph512_mask_round (__C, __A, __B, __D); + return __builtin_ia32_vcvtsi2sh32_round (__A, __B, __R); } -extern __inline __m512h +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_sqrt_round_ph (__mmask32 __A, __m512h __B, const int __C) +_mm_cvt_roundu32_sh (__m128h __A, unsigned int __B, const int __R) { - return __builtin_ia32_sqrtph512_mask_round (__B, - _mm512_setzero_ph (), - __A, __C); + return __builtin_ia32_vcvtusi2sh32_round (__A, __B, __R); } #else -#define _mm512_sqrt_round_ph(A, B) \ - (__builtin_ia32_sqrtph512_mask_round ((A), \ - _mm512_setzero_ph (), \ - (__mmask32)-1, (B))) - -#define _mm512_mask_sqrt_round_ph(A, B, C, D) \ - (__builtin_ia32_sqrtph512_mask_round ((C), (A), (B), (D))) - -#define _mm512_maskz_sqrt_round_ph(A, B, C) \ - (__builtin_ia32_sqrtph512_mask_round ((B), \ - _mm512_setzero_ph (), \ - (A), (C))) +#define _mm_cvt_roundi32_sh(A, B, C) \ + (__builtin_ia32_vcvtsi2sh32_round ((A), (B), (C))) +#define _mm_cvt_roundu32_sh(A, B, C) \ + 
(__builtin_ia32_vcvtusi2sh32_round ((A), (B), (C))) #endif /* __OPTIMIZE__ */ -/* Intrinsics vrsqrtph. */ -extern __inline __m512h +#ifdef __x86_64__ +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_rsqrt_ph (__m512h __A) +_mm_cvti64_sh (__m128h __A, long long __B) { - return __builtin_ia32_rsqrtph512_mask (__A, _mm512_setzero_ph (), - (__mmask32) -1); + return __builtin_ia32_vcvtsi2sh64_round (__A, __B, _MM_FROUND_CUR_DIRECTION); } -extern __inline __m512h +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_rsqrt_ph (__m512h __A, __mmask32 __B, __m512h __C) +_mm_cvtu64_sh (__m128h __A, unsigned long long __B) { - return __builtin_ia32_rsqrtph512_mask (__C, __A, __B); + return __builtin_ia32_vcvtusi2sh64_round (__A, __B, _MM_FROUND_CUR_DIRECTION); } -extern __inline __m512h +#ifdef __OPTIMIZE__ +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_rsqrt_ph (__mmask32 __A, __m512h __B) +_mm_cvt_roundi64_sh (__m128h __A, long long __B, const int __R) { - return __builtin_ia32_rsqrtph512_mask (__B, _mm512_setzero_ph (), - __A); + return __builtin_ia32_vcvtsi2sh64_round (__A, __B, __R); } -/* Intrinsics vrsqrtsh. */ extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_rsqrt_sh (__m128h __A, __m128h __B) +_mm_cvt_roundu64_sh (__m128h __A, unsigned long long __B, const int __R) { - return __builtin_ia32_rsqrtsh_mask (__B, __A, _mm_setzero_ph (), - (__mmask8) -1); + return __builtin_ia32_vcvtusi2sh64_round (__A, __B, __R); } -extern __inline __m128h +#else +#define _mm_cvt_roundi64_sh(A, B, C) \ + (__builtin_ia32_vcvtsi2sh64_round ((A), (B), (C))) +#define _mm_cvt_roundu64_sh(A, B, C) \ + (__builtin_ia32_vcvtusi2sh64_round ((A), (B), (C))) + +#endif /* __OPTIMIZE__ */ +#endif /* __x86_64__ */ + +/* Intrinsics vcvttsh2si, vcvttsh2us. */ +extern __inline int __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_rsqrt_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D) +_mm_cvttsh_i32 (__m128h __A) { - return __builtin_ia32_rsqrtsh_mask (__D, __C, __A, __B); + return (int) + __builtin_ia32_vcvttsh2si32_round (__A, _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128h +extern __inline unsigned __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_maskz_rsqrt_sh (__mmask8 __A, __m128h __B, __m128h __C) +_mm_cvttsh_u32 (__m128h __A) { - return __builtin_ia32_rsqrtsh_mask (__C, __B, _mm_setzero_ph (), - __A); + return (int) + __builtin_ia32_vcvttsh2usi32_round (__A, _MM_FROUND_CUR_DIRECTION); } -/* Intrinsics vsqrtsh. 
*/ -extern __inline __m128h +#ifdef __OPTIMIZE__ +extern __inline int __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_sqrt_sh (__m128h __A, __m128h __B) +_mm_cvtt_roundsh_i32 (__m128h __A, const int __R) { - return __builtin_ia32_sqrtsh_mask_round (__B, __A, - _mm_setzero_ph (), - (__mmask8) -1, - _MM_FROUND_CUR_DIRECTION); + return (int) __builtin_ia32_vcvttsh2si32_round (__A, __R); } -extern __inline __m128h +extern __inline unsigned __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_sqrt_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D) +_mm_cvtt_roundsh_u32 (__m128h __A, const int __R) { - return __builtin_ia32_sqrtsh_mask_round (__D, __C, __A, __B, - _MM_FROUND_CUR_DIRECTION); + return (int) __builtin_ia32_vcvttsh2usi32_round (__A, __R); } -extern __inline __m128h +#else +#define _mm_cvtt_roundsh_i32(A, B) \ + ((int)__builtin_ia32_vcvttsh2si32_round ((A), (B))) +#define _mm_cvtt_roundsh_u32(A, B) \ + ((int)__builtin_ia32_vcvttsh2usi32_round ((A), (B))) + +#endif /* __OPTIMIZE__ */ + +#ifdef __x86_64__ +extern __inline long long __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_maskz_sqrt_sh (__mmask8 __A, __m128h __B, __m128h __C) +_mm_cvttsh_i64 (__m128h __A) { - return __builtin_ia32_sqrtsh_mask_round (__C, __B, - _mm_setzero_ph (), - __A, _MM_FROUND_CUR_DIRECTION); + return (long long) + __builtin_ia32_vcvttsh2si64_round (__A, _MM_FROUND_CUR_DIRECTION); } -#ifdef __OPTIMIZE__ -extern __inline __m128h +extern __inline unsigned long long __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_sqrt_round_sh (__m128h __A, __m128h __B, const int __C) +_mm_cvttsh_u64 (__m128h __A) { - return __builtin_ia32_sqrtsh_mask_round (__B, __A, - _mm_setzero_ph (), - (__mmask8) -1, __C); + return (long long) + __builtin_ia32_vcvttsh2usi64_round (__A, _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128h +#ifdef __OPTIMIZE__ +extern __inline long long __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_sqrt_round_sh (__m128h __A, __mmask8 __B, __m128h __C, - __m128h __D, const int __E) +_mm_cvtt_roundsh_i64 (__m128h __A, const int __R) { - return __builtin_ia32_sqrtsh_mask_round (__D, __C, __A, __B, - __E); + return (long long) __builtin_ia32_vcvttsh2si64_round (__A, __R); } -extern __inline __m128h +extern __inline unsigned long long __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_maskz_sqrt_round_sh (__mmask8 __A, __m128h __B, __m128h __C, - const int __D) +_mm_cvtt_roundsh_u64 (__m128h __A, const int __R) { - return __builtin_ia32_sqrtsh_mask_round (__C, __B, - _mm_setzero_ph (), - __A, __D); + return (long long) __builtin_ia32_vcvttsh2usi64_round (__A, __R); } #else -#define _mm_sqrt_round_sh(A, B, C) \ - (__builtin_ia32_sqrtsh_mask_round ((B), (A), \ - _mm_setzero_ph (), \ - (__mmask8)-1, (C))) - -#define _mm_mask_sqrt_round_sh(A, B, C, D, E) \ - (__builtin_ia32_sqrtsh_mask_round ((D), (C), (A), (B), (E))) - -#define _mm_maskz_sqrt_round_sh(A, B, C, D) \ - (__builtin_ia32_sqrtsh_mask_round ((C), (B), \ - _mm_setzero_ph (), \ - (A), (D))) +#define _mm_cvtt_roundsh_i64(A, B) \ + ((long long)__builtin_ia32_vcvttsh2si64_round ((A), (B))) +#define _mm_cvtt_roundsh_u64(A, B) \ + ((long long)__builtin_ia32_vcvttsh2usi64_round ((A), (B))) #endif /* __OPTIMIZE__ */ +#endif /* __x86_64__ */ -/* Intrinsics vrcpph. */ -extern __inline __m512h +/* Intrinsics vcvtsh2ss, vcvtsh2sd. 
*/ +extern __inline __m128 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_rcp_ph (__m512h __A) +_mm_cvtsh_ss (__m128 __A, __m128h __B) { - return __builtin_ia32_rcpph512_mask (__A, _mm512_setzero_ph (), - (__mmask32) -1); + return __builtin_ia32_vcvtsh2ss_mask_round (__B, __A, + _mm_setzero_ps (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m512h +extern __inline __m128 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_rcp_ph (__m512h __A, __mmask32 __B, __m512h __C) +_mm_mask_cvtsh_ss (__m128 __A, __mmask8 __B, __m128 __C, + __m128h __D) { - return __builtin_ia32_rcpph512_mask (__C, __A, __B); + return __builtin_ia32_vcvtsh2ss_mask_round (__D, __C, __A, __B, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m512h +extern __inline __m128 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_rcp_ph (__mmask32 __A, __m512h __B) +_mm_maskz_cvtsh_ss (__mmask8 __A, __m128 __B, + __m128h __C) { - return __builtin_ia32_rcpph512_mask (__B, _mm512_setzero_ph (), - __A); + return __builtin_ia32_vcvtsh2ss_mask_round (__C, __B, + _mm_setzero_ps (), + __A, _MM_FROUND_CUR_DIRECTION); } -/* Intrinsics vrcpsh. */ -extern __inline __m128h +extern __inline __m128d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_rcp_sh (__m128h __A, __m128h __B) +_mm_cvtsh_sd (__m128d __A, __m128h __B) { - return __builtin_ia32_rcpsh_mask (__B, __A, _mm_setzero_ph (), - (__mmask8) -1); + return __builtin_ia32_vcvtsh2sd_mask_round (__B, __A, + _mm_setzero_pd (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128h +extern __inline __m128d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_rcp_sh (__m128h __A, __mmask32 __B, __m128h __C, __m128h __D) +_mm_mask_cvtsh_sd (__m128d __A, __mmask8 __B, __m128d __C, + __m128h __D) { - return __builtin_ia32_rcpsh_mask (__D, __C, __A, __B); + return __builtin_ia32_vcvtsh2sd_mask_round (__D, __C, __A, __B, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128h +extern __inline __m128d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_maskz_rcp_sh (__mmask32 __A, __m128h __B, __m128h __C) +_mm_maskz_cvtsh_sd (__mmask8 __A, __m128d __B, __m128h __C) { - return __builtin_ia32_rcpsh_mask (__C, __B, _mm_setzero_ph (), - __A); + return __builtin_ia32_vcvtsh2sd_mask_round (__C, __B, + _mm_setzero_pd (), + __A, _MM_FROUND_CUR_DIRECTION); } -/* Intrinsics vscalefph. 
*/ -extern __inline __m512h +#ifdef __OPTIMIZE__ +extern __inline __m128 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_scalef_ph (__m512h __A, __m512h __B) +_mm_cvt_roundsh_ss (__m128 __A, __m128h __B, const int __R) { - return __builtin_ia32_scalefph512_mask_round (__A, __B, - _mm512_setzero_ph (), - (__mmask32) -1, - _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_vcvtsh2ss_mask_round (__B, __A, + _mm_setzero_ps (), + (__mmask8) -1, __R); } -extern __inline __m512h +extern __inline __m128 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_scalef_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D) +_mm_mask_cvt_roundsh_ss (__m128 __A, __mmask8 __B, __m128 __C, + __m128h __D, const int __R) { - return __builtin_ia32_scalefph512_mask_round (__C, __D, __A, __B, - _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_vcvtsh2ss_mask_round (__D, __C, __A, __B, __R); } -extern __inline __m512h +extern __inline __m128 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_scalef_ph (__mmask32 __A, __m512h __B, __m512h __C) +_mm_maskz_cvt_roundsh_ss (__mmask8 __A, __m128 __B, + __m128h __C, const int __R) { - return __builtin_ia32_scalefph512_mask_round (__B, __C, - _mm512_setzero_ph (), - __A, - _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_vcvtsh2ss_mask_round (__C, __B, + _mm_setzero_ps (), + __A, __R); } -#ifdef __OPTIMIZE__ -extern __inline __m512h +extern __inline __m128d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_scalef_round_ph (__m512h __A, __m512h __B, const int __C) +_mm_cvt_roundsh_sd (__m128d __A, __m128h __B, const int __R) { - return __builtin_ia32_scalefph512_mask_round (__A, __B, - _mm512_setzero_ph (), - (__mmask32) -1, __C); + return __builtin_ia32_vcvtsh2sd_mask_round (__B, __A, + _mm_setzero_pd (), + (__mmask8) -1, __R); } -extern __inline __m512h +extern __inline __m128d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_scalef_round_ph (__m512h __A, __mmask32 __B, __m512h __C, - __m512h __D, const int __E) +_mm_mask_cvt_roundsh_sd (__m128d __A, __mmask8 __B, __m128d __C, + __m128h __D, const int __R) { - return __builtin_ia32_scalefph512_mask_round (__C, __D, __A, __B, - __E); + return __builtin_ia32_vcvtsh2sd_mask_round (__D, __C, __A, __B, __R); } -extern __inline __m512h +extern __inline __m128d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_scalef_round_ph (__mmask32 __A, __m512h __B, __m512h __C, - const int __D) +_mm_maskz_cvt_roundsh_sd (__mmask8 __A, __m128d __B, __m128h __C, const int __R) { - return __builtin_ia32_scalefph512_mask_round (__B, __C, - _mm512_setzero_ph (), - __A, __D); + return __builtin_ia32_vcvtsh2sd_mask_round (__C, __B, + _mm_setzero_pd (), + __A, __R); } #else -#define _mm512_scalef_round_ph(A, B, C) \ - (__builtin_ia32_scalefph512_mask_round ((A), (B), \ - _mm512_setzero_ph (), \ - (__mmask32)-1, (C))) +#define _mm_cvt_roundsh_ss(A, B, R) \ + (__builtin_ia32_vcvtsh2ss_mask_round ((B), (A), \ + _mm_setzero_ps (), \ + (__mmask8) -1, (R))) -#define _mm512_mask_scalef_round_ph(A, B, C, D, E) \ - (__builtin_ia32_scalefph512_mask_round ((C), (D), (A), (B), (E))) +#define _mm_mask_cvt_roundsh_ss(A, B, C, D, R) \ + (__builtin_ia32_vcvtsh2ss_mask_round ((D), (C), (A), (B), (R))) -#define _mm512_maskz_scalef_round_ph(A, B, C, D) \ - (__builtin_ia32_scalefph512_mask_round ((B), (C), \ - _mm512_setzero_ph (), \ - (A), (D))) +#define _mm_maskz_cvt_roundsh_ss(A, B, 
C, R) \ + (__builtin_ia32_vcvtsh2ss_mask_round ((C), (B), \ + _mm_setzero_ps (), \ + (A), (R))) -#endif /* __OPTIMIZE__ */ +#define _mm_cvt_roundsh_sd(A, B, R) \ + (__builtin_ia32_vcvtsh2sd_mask_round ((B), (A), \ + _mm_setzero_pd (), \ + (__mmask8) -1, (R))) -/* Intrinsics vscalefsh. */ -extern __inline __m128h -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_scalef_sh (__m128h __A, __m128h __B) +#define _mm_mask_cvt_roundsh_sd(A, B, C, D, R) \ + (__builtin_ia32_vcvtsh2sd_mask_round ((D), (C), (A), (B), (R))) + +#define _mm_maskz_cvt_roundsh_sd(A, B, C, R) \ + (__builtin_ia32_vcvtsh2sd_mask_round ((C), (B), \ + _mm_setzero_pd (), \ + (A), (R))) + +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vcvtss2sh, vcvtsd2sh. */ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtss_sh (__m128h __A, __m128 __B) { - return __builtin_ia32_scalefsh_mask_round (__A, __B, - _mm_setzero_ph (), - (__mmask8) -1, - _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_vcvtss2sh_mask_round (__B, __A, + _mm_setzero_ph (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); } extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_scalef_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D) +_mm_mask_cvtss_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128 __D) { - return __builtin_ia32_scalefsh_mask_round (__C, __D, __A, __B, - _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_vcvtss2sh_mask_round (__D, __C, __A, __B, + _MM_FROUND_CUR_DIRECTION); } extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_maskz_scalef_sh (__mmask8 __A, __m128h __B, __m128h __C) +_mm_maskz_cvtss_sh (__mmask8 __A, __m128h __B, __m128 __C) { - return __builtin_ia32_scalefsh_mask_round (__B, __C, - _mm_setzero_ph (), - __A, - _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_vcvtss2sh_mask_round (__C, __B, + _mm_setzero_ph (), + __A, _MM_FROUND_CUR_DIRECTION); } -#ifdef __OPTIMIZE__ extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_scalef_round_sh (__m128h __A, __m128h __B, const int __C) +_mm_cvtsd_sh (__m128h __A, __m128d __B) { - return __builtin_ia32_scalefsh_mask_round (__A, __B, - _mm_setzero_ph (), - (__mmask8) -1, __C); + return __builtin_ia32_vcvtsd2sh_mask_round (__B, __A, + _mm_setzero_ph (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); } extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_scalef_round_sh (__m128h __A, __mmask8 __B, __m128h __C, - __m128h __D, const int __E) +_mm_mask_cvtsd_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128d __D) { - return __builtin_ia32_scalefsh_mask_round (__C, __D, __A, __B, - __E); + return __builtin_ia32_vcvtsd2sh_mask_round (__D, __C, __A, __B, + _MM_FROUND_CUR_DIRECTION); } extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_maskz_scalef_round_sh (__mmask8 __A, __m128h __B, __m128h __C, - const int __D) +_mm_maskz_cvtsd_sh (__mmask8 __A, __m128h __B, __m128d __C) { - return __builtin_ia32_scalefsh_mask_round (__B, __C, - _mm_setzero_ph (), - __A, __D); + return __builtin_ia32_vcvtsd2sh_mask_round (__C, __B, + _mm_setzero_ph (), + __A, _MM_FROUND_CUR_DIRECTION); } -#else -#define _mm_scalef_round_sh(A, B, C) \ - (__builtin_ia32_scalefsh_mask_round ((A), (B), \ - _mm_setzero_ph (), \ - (__mmask8)-1, (C))) - -#define _mm_mask_scalef_round_sh(A, B, C, D, E) \ - 
-#else
-#define _mm_scalef_round_sh(A, B, C)  \
-  (__builtin_ia32_scalefsh_mask_round ((A), (B),  \
-      _mm_setzero_ph (),  \
-      (__mmask8)-1, (C)))
-
-#define _mm_mask_scalef_round_sh(A, B, C, D, E)  \
-  (__builtin_ia32_scalefsh_mask_round ((C), (D), (A), (B), (E)))
-
-#define _mm_maskz_scalef_round_sh(A, B, C, D)  \
-  (__builtin_ia32_scalefsh_mask_round ((B), (C), _mm_setzero_ph (),  \
-      (A), (D)))
-
-#endif /* __OPTIMIZE__ */
-
-/* Intrinsics vreduceph.  */
 #ifdef __OPTIMIZE__
-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_reduce_ph (__m512h __A, int __B)
+_mm_cvt_roundss_sh (__m128h __A, __m128 __B, const int __R)
 {
-  return __builtin_ia32_reduceph512_mask_round (__A, __B,
-      _mm512_setzero_ph (),
-      (__mmask32) -1,
-      _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_vcvtss2sh_mask_round (__B, __A,
+      _mm_setzero_ph (),
+      (__mmask8) -1, __R);
 }

-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_reduce_ph (__m512h __A, __mmask32 __B, __m512h __C, int __D)
+_mm_mask_cvt_roundss_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128 __D,
+      const int __R)
 {
-  return __builtin_ia32_reduceph512_mask_round (__C, __D, __A, __B,
-      _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_vcvtss2sh_mask_round (__D, __C, __A, __B, __R);
 }

-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_reduce_ph (__mmask32 __A, __m512h __B, int __C)
+_mm_maskz_cvt_roundss_sh (__mmask8 __A, __m128h __B, __m128 __C,
+      const int __R)
 {
-  return __builtin_ia32_reduceph512_mask_round (__B, __C,
-      _mm512_setzero_ph (),
-      __A,
-      _MM_FROUND_CUR_DIRECTION);
+  return __builtin_ia32_vcvtss2sh_mask_round (__C, __B,
+      _mm_setzero_ph (),
+      __A, __R);
 }

-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_reduce_round_ph (__m512h __A, int __B, const int __C)
+_mm_cvt_roundsd_sh (__m128h __A, __m128d __B, const int __R)
 {
-  return __builtin_ia32_reduceph512_mask_round (__A, __B,
-      _mm512_setzero_ph (),
-      (__mmask32) -1, __C);
+  return __builtin_ia32_vcvtsd2sh_mask_round (__B, __A,
+      _mm_setzero_ph (),
+      (__mmask8) -1, __R);
 }

-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_reduce_round_ph (__m512h __A, __mmask32 __B, __m512h __C,
-      int __D, const int __E)
+_mm_mask_cvt_roundsd_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128d __D,
+      const int __R)
 {
-  return __builtin_ia32_reduceph512_mask_round (__C, __D, __A, __B,
-      __E);
+  return __builtin_ia32_vcvtsd2sh_mask_round (__D, __C, __A, __B, __R);
 }

-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_reduce_round_ph (__mmask32 __A, __m512h __B, int __C,
-      const int __D)
+_mm_maskz_cvt_roundsd_sh (__mmask8 __A, __m128h __B, __m128d __C,
+      const int __R)
 {
-  return __builtin_ia32_reduceph512_mask_round (__B, __C,
-      _mm512_setzero_ph (),
-      __A, __D);
+  return __builtin_ia32_vcvtsd2sh_mask_round (__C, __B,
+      _mm_setzero_ph (),
+      __A, __R);
 }

 #else
-#define _mm512_reduce_ph(A, B)  \
-  (__builtin_ia32_reduceph512_mask_round ((A), (B),  \
-      _mm512_setzero_ph (),  \
-      (__mmask32)-1,  \
-      _MM_FROUND_CUR_DIRECTION))
+#define _mm_cvt_roundss_sh(A, B, R)  \
+  (__builtin_ia32_vcvtss2sh_mask_round ((B), (A),  \
+      _mm_setzero_ph (),  \
+      (__mmask8) -1, R))

-#define _mm512_mask_reduce_ph(A, B, C, D)  \
-  (__builtin_ia32_reduceph512_mask_round ((C), (D), (A), (B),  \
-      _MM_FROUND_CUR_DIRECTION))
+#define _mm_mask_cvt_roundss_sh(A, B, C, D, R)  \
+  (__builtin_ia32_vcvtss2sh_mask_round ((D), (C), (A), (B), (R)))

-#define _mm512_maskz_reduce_ph(A, B, C)  \
-  (__builtin_ia32_reduceph512_mask_round ((B), (C),  \
-      _mm512_setzero_ph (),  \
-      (A), _MM_FROUND_CUR_DIRECTION))
+#define _mm_maskz_cvt_roundss_sh(A, B, C, R)  \
+  (__builtin_ia32_vcvtss2sh_mask_round ((C), (B),  \
+      _mm_setzero_ph (),  \
+      A, R))

-#define _mm512_reduce_round_ph(A, B, C)  \
-  (__builtin_ia32_reduceph512_mask_round ((A), (B),  \
-      _mm512_setzero_ph (),  \
-      (__mmask32)-1, (C)))
+#define _mm_cvt_roundsd_sh(A, B, R)  \
+  (__builtin_ia32_vcvtsd2sh_mask_round ((B), (A),  \
+      _mm_setzero_ph (),  \
+      (__mmask8) -1, R))

-#define _mm512_mask_reduce_round_ph(A, B, C, D, E)  \
-  (__builtin_ia32_reduceph512_mask_round ((C), (D), (A), (B), (E)))
+#define _mm_mask_cvt_roundsd_sh(A, B, C, D, R)  \
+  (__builtin_ia32_vcvtsd2sh_mask_round ((D), (C), (A), (B), (R)))

-#define _mm512_maskz_reduce_round_ph(A, B, C, D)  \
-  (__builtin_ia32_reduceph512_mask_round ((B), (C),  \
-      _mm512_setzero_ph (),  \
-      (A), (D)))
+#define _mm_maskz_cvt_roundsd_sh(A, B, C, R)  \
+  (__builtin_ia32_vcvtsd2sh_mask_round ((C), (B),  \
+      _mm_setzero_ph (),  \
+      (A), (R)))

 #endif /* __OPTIMIZE__ */

-/* Intrinsics vreducesh.  */
-#ifdef __OPTIMIZE__
-extern __inline __m128h
+extern __inline _Float16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_reduce_sh (__m128h __A, __m128h __B, int __C)
+_mm_cvtsh_h (__m128h __A)
 {
-  return __builtin_ia32_reducesh_mask_round (__A, __B, __C,
-      _mm_setzero_ph (),
-      (__mmask8) -1,
-      _MM_FROUND_CUR_DIRECTION);
+  return __A[0];
 }

+/* Intrinsics vfmadd[132,213,231]sh.  */
 extern __inline __m128h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_reduce_sh (__m128h __A, __mmask8 __B, __m128h __C,
-      __m128h __D, int __E)
+  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fmadd_sh (__m128h __W, __m128h __A, __m128h __B)
 {
-  return __builtin_ia32_reducesh_mask_round (__C, __D, __E, __A, __B,
-      _MM_FROUND_CUR_DIRECTION);
+  return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
+      (__v8hf) __A,
+      (__v8hf) __B,
+      (__mmask8) -1,
+      _MM_FROUND_CUR_DIRECTION);
 }

 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_reduce_sh (__mmask8 __A, __m128h __B, __m128h __C, int __D)
+_mm_mask_fmadd_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B)
 {
-  return __builtin_ia32_reducesh_mask_round (__B, __C, __D,
-      _mm_setzero_ph (), __A,
-      _MM_FROUND_CUR_DIRECTION);
+  return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
+      (__v8hf) __A,
+      (__v8hf) __B,
+      (__mmask8) __U,
+      _MM_FROUND_CUR_DIRECTION);
 }

 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_reduce_round_sh (__m128h __A, __m128h __B, int __C, const int __D)
+_mm_mask3_fmadd_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U)
 {
-  return __builtin_ia32_reducesh_mask_round (__A, __B, __C,
-      _mm_setzero_ph (),
-      (__mmask8) -1, __D);
+  return (__m128h) __builtin_ia32_vfmaddsh3_mask3 ((__v8hf) __W,
+      (__v8hf) __A,
+      (__v8hf) __B,
+      (__mmask8) __U,
+      _MM_FROUND_CUR_DIRECTION);
 }
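A quick illustration of the merge-masking semantics implemented above
(illustrative only, not part of the patch; assumes -mavx512fp16):

  #include <immintrin.h>

  /* Lane 0: w*a + b if bit 0 of m is set, else w[0] is kept;
     lanes 1-7 always come from w.  */
  __m128h fma_lane0 (__m128h w, __mmask8 m, __m128h a, __m128h b)
  {
    return _mm_mask_fmadd_sh (w, m, a, b);
  }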
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_reduce_round_sh (__m128h __A, __mmask8 __B, __m128h __C,
-      __m128h __D, int __E, const int __F)
+_mm_maskz_fmadd_sh (__mmask8 __U, __m128h __W, __m128h __A, __m128h __B)
 {
-  return __builtin_ia32_reducesh_mask_round (__C, __D, __E, __A,
-      __B, __F);
+  return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W,
+      (__v8hf) __A,
+      (__v8hf) __B,
+      (__mmask8) __U,
+      _MM_FROUND_CUR_DIRECTION);
 }
+
+#ifdef __OPTIMIZE__
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_reduce_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
-      int __D, const int __E)
+_mm_fmadd_round_sh (__m128h __W, __m128h __A, __m128h __B, const int __R)
 {
-  return __builtin_ia32_reducesh_mask_round (__B, __C, __D,
-      _mm_setzero_ph (),
-      __A, __E);
+  return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
+      (__v8hf) __A,
+      (__v8hf) __B,
+      (__mmask8) -1,
+      __R);
 }
-#else
-#define _mm_reduce_sh(A, B, C)  \
-  (__builtin_ia32_reducesh_mask_round ((A), (B), (C),  \
-      _mm_setzero_ph (),  \
-      (__mmask8)-1,  \
-      _MM_FROUND_CUR_DIRECTION))
-
-#define _mm_mask_reduce_sh(A, B, C, D, E)  \
-  (__builtin_ia32_reducesh_mask_round ((C), (D), (E), (A), (B),  \
-      _MM_FROUND_CUR_DIRECTION))
-
-#define _mm_maskz_reduce_sh(A, B, C, D)  \
-  (__builtin_ia32_reducesh_mask_round ((B), (C), (D),  \
-      _mm_setzero_ph (),  \
-      (A), _MM_FROUND_CUR_DIRECTION))
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_fmadd_round_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B,
+      const int __R)
+{
+  return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
+      (__v8hf) __A,
+      (__v8hf) __B,
+      (__mmask8) __U, __R);
+}

-#define _mm_reduce_round_sh(A, B, C, D)  \
-  (__builtin_ia32_reducesh_mask_round ((A), (B), (C),  \
-      _mm_setzero_ph (),  \
-      (__mmask8)-1, (D)))
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask3_fmadd_round_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U,
+      const int __R)
+{
+  return (__m128h) __builtin_ia32_vfmaddsh3_mask3 ((__v8hf) __W,
+      (__v8hf) __A,
+      (__v8hf) __B,
+      (__mmask8) __U, __R);
+}

-#define _mm_mask_reduce_round_sh(A, B, C, D, E, F)  \
-  (__builtin_ia32_reducesh_mask_round ((C), (D), (E), (A), (B), (F)))
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_fmadd_round_sh (__mmask8 __U, __m128h __W, __m128h __A,
+      __m128h __B, const int __R)
+{
+  return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W,
+      (__v8hf) __A,
+      (__v8hf) __B,
+      (__mmask8) __U, __R);
+}

-#define _mm_maskz_reduce_round_sh(A, B, C, D, E)  \
-  (__builtin_ia32_reducesh_mask_round ((B), (C), (D),  \
-      _mm_setzero_ph (),  \
-      (A), (E)))
+#else
+#define _mm_fmadd_round_sh(A, B, C, R)  \
+  ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), (B), (C), (-1), (R)))
+#define _mm_mask_fmadd_round_sh(A, U, B, C, R)  \
+  ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), (B), (C), (U), (R)))
+#define _mm_mask3_fmadd_round_sh(A, B, C, U, R)  \
+  ((__m128h) __builtin_ia32_vfmaddsh3_mask3 ((A), (B), (C), (U), (R)))
+#define _mm_maskz_fmadd_round_sh(U, A, B, C, R)  \
+  ((__m128h) __builtin_ia32_vfmaddsh3_maskz ((A), (B), (C), (U), (R)))
 #endif /* __OPTIMIZE__ */
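As with the other *_round forms, the rounding argument has to fold to a
compile-time constant: without -O the inline functions above are
replaced by the #else macro bodies. A minimal sketch (illustrative
only, assumes -mavx512fp16):

  #include <immintrin.h>

  __m128h fma_rz (__m128h w, __m128h a, __m128h b)
  {
    /* Round toward zero, suppress exceptions.  */
    return _mm_fmadd_round_sh (w, a, b,
                               _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC);
  }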
-/* Intrinsics vrndscaleph.  */
-#ifdef __OPTIMIZE__
-extern __inline __m512h
+/* Intrinsics vfnmadd[132,213,231]sh.  */
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_roundscale_ph (__m512h __A, int __B)
+_mm_fnmadd_sh (__m128h __W, __m128h __A, __m128h __B)
 {
-  return __builtin_ia32_rndscaleph512_mask_round (__A, __B,
-      _mm512_setzero_ph (),
-      (__mmask32) -1,
-      _MM_FROUND_CUR_DIRECTION);
+  return (__m128h) __builtin_ia32_vfnmaddsh3_mask ((__v8hf) __W,
+      (__v8hf) __A,
+      (__v8hf) __B,
+      (__mmask8) -1,
+      _MM_FROUND_CUR_DIRECTION);
 }

-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_roundscale_ph (__m512h __A, __mmask32 __B,
-      __m512h __C, int __D)
+_mm_mask_fnmadd_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B)
 {
-  return __builtin_ia32_rndscaleph512_mask_round (__C, __D, __A, __B,
+  return (__m128h) __builtin_ia32_vfnmaddsh3_mask ((__v8hf) __W,
+      (__v8hf) __A,
+      (__v8hf) __B,
+      (__mmask8) __U,
      _MM_FROUND_CUR_DIRECTION);
 }

-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_roundscale_ph (__mmask32 __A, __m512h __B, int __C)
+_mm_mask3_fnmadd_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U)
 {
-  return __builtin_ia32_rndscaleph512_mask_round (__B, __C,
-      _mm512_setzero_ph (),
-      __A,
-      _MM_FROUND_CUR_DIRECTION);
+  return (__m128h) __builtin_ia32_vfnmaddsh3_mask3 ((__v8hf) __W,
+      (__v8hf) __A,
+      (__v8hf) __B,
+      (__mmask8) __U,
+      _MM_FROUND_CUR_DIRECTION);
 }

-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_roundscale_round_ph (__m512h __A, int __B, const int __C)
+_mm_maskz_fnmadd_sh (__mmask8 __U, __m128h __W, __m128h __A, __m128h __B)
 {
-  return __builtin_ia32_rndscaleph512_mask_round (__A, __B,
-      _mm512_setzero_ph (),
-      (__mmask32) -1,
-      __C);
+  return (__m128h) __builtin_ia32_vfnmaddsh3_maskz ((__v8hf) __W,
+      (__v8hf) __A,
+      (__v8hf) __B,
+      (__mmask8) __U,
+      _MM_FROUND_CUR_DIRECTION);
 }
-extern __inline __m512h
+
+#ifdef __OPTIMIZE__
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_roundscale_round_ph (__m512h __A, __mmask32 __B,
-      __m512h __C, int __D, const int __E)
+_mm_fnmadd_round_sh (__m128h __W, __m128h __A, __m128h __B, const int __R)
 {
-  return __builtin_ia32_rndscaleph512_mask_round (__C, __D, __A,
-      __B, __E);
+  return (__m128h) __builtin_ia32_vfnmaddsh3_mask ((__v8hf) __W,
+      (__v8hf) __A,
+      (__v8hf) __B,
+      (__mmask8) -1,
+      __R);
 }

-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_roundscale_round_ph (__mmask32 __A, __m512h __B, int __C,
-      const int __D)
+_mm_mask_fnmadd_round_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B,
+      const int __R)
 {
-  return __builtin_ia32_rndscaleph512_mask_round (__B, __C,
-      _mm512_setzero_ph (),
-      __A, __D);
+  return (__m128h) __builtin_ia32_vfnmaddsh3_mask ((__v8hf) __W,
+      (__v8hf) __A,
+      (__v8hf) __B,
+      (__mmask8) __U, __R);
 }
-#else
-#define _mm512_roundscale_ph(A, B)  \
-  (__builtin_ia32_rndscaleph512_mask_round ((A), (B),  \
-      _mm512_setzero_ph (),  \
-      (__mmask32)-1,  \
-      _MM_FROUND_CUR_DIRECTION))
-
-#define _mm512_mask_roundscale_ph(A, B, C, D)  \
-  (__builtin_ia32_rndscaleph512_mask_round ((C), (D), (A), (B),  \
-      _MM_FROUND_CUR_DIRECTION))
-
-#define _mm512_maskz_roundscale_ph(A, B, C)  \
-  (__builtin_ia32_rndscaleph512_mask_round ((B), (C),  \
-      _mm512_setzero_ph (),  \
-      (A),  \
-      _MM_FROUND_CUR_DIRECTION))
-#define _mm512_roundscale_round_ph(A, B, C)  \
-  (__builtin_ia32_rndscaleph512_mask_round ((A), (B),  \
-      _mm512_setzero_ph (),  \
-      (__mmask32)-1, (C)))
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask3_fnmadd_round_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U,
+      const int __R)
+{
+  return (__m128h) __builtin_ia32_vfnmaddsh3_mask3 ((__v8hf) __W,
+      (__v8hf) __A,
+      (__v8hf) __B,
+      (__mmask8) __U, __R);
+}

-#define _mm512_mask_roundscale_round_ph(A, B, C, D, E)  \
-  (__builtin_ia32_rndscaleph512_mask_round ((C), (D), (A), (B), (E)))
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_fnmadd_round_sh (__mmask8 __U, __m128h __W, __m128h __A,
+      __m128h __B, const int __R)
+{
+  return (__m128h) __builtin_ia32_vfnmaddsh3_maskz ((__v8hf) __W,
+      (__v8hf) __A,
+      (__v8hf) __B,
+      (__mmask8) __U, __R);
+}

-#define _mm512_maskz_roundscale_round_ph(A, B, C, D)  \
-  (__builtin_ia32_rndscaleph512_mask_round ((B), (C),  \
-      _mm512_setzero_ph (),  \
-      (A), (D)))
+#else
+#define _mm_fnmadd_round_sh(A, B, C, R)  \
+  ((__m128h) __builtin_ia32_vfnmaddsh3_mask ((A), (B), (C), (-1), (R)))
+#define _mm_mask_fnmadd_round_sh(A, U, B, C, R)  \
+  ((__m128h) __builtin_ia32_vfnmaddsh3_mask ((A), (B), (C), (U), (R)))
+#define _mm_mask3_fnmadd_round_sh(A, B, C, U, R)  \
+  ((__m128h) __builtin_ia32_vfnmaddsh3_mask3 ((A), (B), (C), (U), (R)))
+#define _mm_maskz_fnmadd_round_sh(U, A, B, C, R)  \
+  ((__m128h) __builtin_ia32_vfnmaddsh3_maskz ((A), (B), (C), (U), (R)))
 #endif /* __OPTIMIZE__ */

-/* Intrinsics vrndscalesh.  */
-#ifdef __OPTIMIZE__
+/* Intrinsics vfmsub[132,213,231]sh.  */
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_roundscale_sh (__m128h __A, __m128h __B, int __C)
+_mm_fmsub_sh (__m128h __W, __m128h __A, __m128h __B)
 {
-  return __builtin_ia32_rndscalesh_mask_round (__A, __B, __C,
-      _mm_setzero_ph (),
-      (__mmask8) -1,
-      _MM_FROUND_CUR_DIRECTION);
+  return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
+      (__v8hf) __A,
+      -(__v8hf) __B,
+      (__mmask8) -1,
+      _MM_FROUND_CUR_DIRECTION);
 }

 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_roundscale_sh (__m128h __A, __mmask8 __B, __m128h __C,
-      __m128h __D, int __E)
+_mm_mask_fmsub_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B)
 {
-  return __builtin_ia32_rndscalesh_mask_round (__C, __D, __E, __A, __B,
-      _MM_FROUND_CUR_DIRECTION);
+  return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
+      (__v8hf) __A,
+      -(__v8hf) __B,
+      (__mmask8) __U,
+      _MM_FROUND_CUR_DIRECTION);
 }

 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_roundscale_sh (__mmask8 __A, __m128h __B, __m128h __C, int __D)
+_mm_mask3_fmsub_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U)
 {
-  return __builtin_ia32_rndscalesh_mask_round (__B, __C, __D,
-      _mm_setzero_ph (), __A,
-      _MM_FROUND_CUR_DIRECTION);
+  return (__m128h) __builtin_ia32_vfmsubsh3_mask3 ((__v8hf) __W,
+      (__v8hf) __A,
+      (__v8hf) __B,
+      (__mmask8) __U,
+      _MM_FROUND_CUR_DIRECTION);
 }
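Note how fmsub is expressed through the fmadd builtin with a negated
third operand (-(__v8hf) __B); only the mask3 variant goes through the
dedicated vfmsubsh3_mask3 builtin, presumably so that the operand that
survives masking keeps its original sign. A sketch of the resulting
lane-0 semantics (illustrative only):

  #include <immintrin.h>

  /* Lane 0: w*a - b; lanes 1-7 from w.  */
  __m128h fms (__m128h w, __m128h a, __m128h b)
  {
    return _mm_fmsub_sh (w, a, b);
  }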
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_roundscale_round_sh (__m128h __A, __m128h __B, int __C, const int __D)
+_mm_maskz_fmsub_sh (__mmask8 __U, __m128h __W, __m128h __A, __m128h __B)
 {
-  return __builtin_ia32_rndscalesh_mask_round (__A, __B, __C,
-      _mm_setzero_ph (),
-      (__mmask8) -1,
-      __D);
+  return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W,
+      (__v8hf) __A,
+      -(__v8hf) __B,
+      (__mmask8) __U,
+      _MM_FROUND_CUR_DIRECTION);
 }
+
+#ifdef __OPTIMIZE__
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_roundscale_round_sh (__m128h __A, __mmask8 __B, __m128h __C,
-      __m128h __D, int __E, const int __F)
+_mm_fmsub_round_sh (__m128h __W, __m128h __A, __m128h __B, const int __R)
 {
-  return __builtin_ia32_rndscalesh_mask_round (__C, __D, __E,
-      __A, __B, __F);
+  return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
+      (__v8hf) __A,
+      -(__v8hf) __B,
+      (__mmask8) -1,
+      __R);
 }

 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_roundscale_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
-      int __D, const int __E)
+_mm_mask_fmsub_round_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B,
+      const int __R)
 {
-  return __builtin_ia32_rndscalesh_mask_round (__B, __C, __D,
-      _mm_setzero_ph (),
-      __A, __E);
+  return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
+      (__v8hf) __A,
+      -(__v8hf) __B,
+      (__mmask8) __U, __R);
 }
-#else
-#define _mm_roundscale_sh(A, B, C)  \
-  (__builtin_ia32_rndscalesh_mask_round ((A), (B), (C),  \
-      _mm_setzero_ph (),  \
-      (__mmask8)-1,  \
-      _MM_FROUND_CUR_DIRECTION))
-
-#define _mm_mask_roundscale_sh(A, B, C, D, E)  \
-  (__builtin_ia32_rndscalesh_mask_round ((C), (D), (E), (A), (B),  \
-      _MM_FROUND_CUR_DIRECTION))
-
-#define _mm_maskz_roundscale_sh(A, B, C, D)  \
-  (__builtin_ia32_rndscalesh_mask_round ((B), (C), (D),  \
-      _mm_setzero_ph (),  \
-      (A), _MM_FROUND_CUR_DIRECTION))
-
-#define _mm_roundscale_round_sh(A, B, C, D)  \
-  (__builtin_ia32_rndscalesh_mask_round ((A), (B), (C),  \
-      _mm_setzero_ph (),  \
-      (__mmask8)-1, (D)))
-
-#define _mm_mask_roundscale_round_sh(A, B, C, D, E, F)  \
-  (__builtin_ia32_rndscalesh_mask_round ((C), (D), (E), (A), (B), (F)))
-
-#define _mm_maskz_roundscale_round_sh(A, B, C, D, E)  \
-  (__builtin_ia32_rndscalesh_mask_round ((B), (C), (D),  \
-      _mm_setzero_ph (),  \
-      (A), (E)))
-
-#endif /* __OPTIMIZE__ */
-
-/* Intrinsics vfpclasssh.  */
-#ifdef __OPTIMIZE__
-extern __inline __mmask8
-  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fpclass_sh_mask (__m128h __A, const int __imm)
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask3_fmsub_round_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U,
+      const int __R)
 {
-  return (__mmask8) __builtin_ia32_fpclasssh_mask ((__v8hf) __A, __imm,
-      (__mmask8) -1);
+  return (__m128h) __builtin_ia32_vfmsubsh3_mask3 ((__v8hf) __W,
+      (__v8hf) __A,
+      (__v8hf) __B,
+      (__mmask8) __U, __R);
 }

-extern __inline __mmask8
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fpclass_sh_mask (__mmask8 __U, __m128h __A, const int __imm)
+_mm_maskz_fmsub_round_sh (__mmask8 __U, __m128h __W, __m128h __A,
+      __m128h __B, const int __R)
 {
-  return (__mmask8) __builtin_ia32_fpclasssh_mask ((__v8hf) __A, __imm, __U);
+  return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W,
+      (__v8hf) __A,
+      -(__v8hf) __B,
+      (__mmask8) __U, __R);
 }
 #else
-#define _mm_fpclass_sh_mask(X, C)  \
-  ((__mmask8) __builtin_ia32_fpclasssh_mask ((__v8hf) (__m128h) (X),  \
-      (int) (C), (__mmask8) (-1)))  \
+#define _mm_fmsub_round_sh(A, B, C, R)  \
+  ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), (B), -(C), (-1), (R)))
+#define _mm_mask_fmsub_round_sh(A, U, B, C, R)  \
+  ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), (B), -(C), (U), (R)))
+#define _mm_mask3_fmsub_round_sh(A, B, C, U, R)  \
+  ((__m128h) __builtin_ia32_vfmsubsh3_mask3 ((A), (B), (C), (U), (R)))
+#define _mm_maskz_fmsub_round_sh(U, A, B, C, R)  \
+  ((__m128h) __builtin_ia32_vfmaddsh3_maskz ((A), (B), -(C), (U), (R)))

-#define _mm_mask_fpclass_sh_mask(U, X, C)  \
-  ((__mmask8) __builtin_ia32_fpclasssh_mask ((__v8hf) (__m128h) (X),  \
-      (int) (C), (__mmask8) (U)))
 #endif /* __OPTIMIZE__ */

-/* Intrinsics vfpclassph.  */
-#ifdef __OPTIMIZE__
-extern __inline __mmask32
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fpclass_ph_mask (__mmask32 __U, __m512h __A,
-      const int __imm)
+/* Intrinsics vfnmsub[132,213,231]sh.  */
+extern __inline __m128h
+  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fnmsub_sh (__m128h __W, __m128h __A, __m128h __B)
 {
-  return (__mmask32) __builtin_ia32_fpclassph512_mask ((__v32hf) __A,
-      __imm, __U);
+  return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
+      -(__v8hf) __A,
+      -(__v8hf) __B,
+      (__mmask8) -1,
+      _MM_FROUND_CUR_DIRECTION);
 }

-extern __inline __mmask32
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fpclass_ph_mask (__m512h __A, const int __imm)
+_mm_mask_fnmsub_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B)
 {
-  return (__mmask32) __builtin_ia32_fpclassph512_mask ((__v32hf) __A,
-      __imm,
-      (__mmask32) -1);
+  return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
+      -(__v8hf) __A,
+      -(__v8hf) __B,
+      (__mmask8) __U,
+      _MM_FROUND_CUR_DIRECTION);
 }
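fnmsub likewise reuses vfmaddsh3 with both the multiplicand and the
addend negated, so lane 0 computes -(w*a) - b. A quick sanity sketch
(illustrative only):

  #include <immintrin.h>

  __m128h fnms (__m128h w, __m128h a, __m128h b)
  {
    return _mm_fnmsub_sh (w, a, b);   /* -(w[0]*a[0]) - b[0] in lane 0 */
  }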
-#else
-#define _mm512_mask_fpclass_ph_mask(u, x, c)  \
-  ((__mmask32) __builtin_ia32_fpclassph512_mask ((__v32hf) (__m512h) (x),  \
-      (int) (c),(__mmask8)(u)))
-
-#define _mm512_fpclass_ph_mask(x, c)  \
-  ((__mmask32) __builtin_ia32_fpclassph512_mask ((__v32hf) (__m512h) (x),  \
-      (int) (c),(__mmask8)-1))
-#endif /* __OPIMTIZE__ */
-
-/* Intrinsics vgetexpph, vgetexpsh.  */
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_getexp_sh (__m128h __A, __m128h __B)
+_mm_mask3_fnmsub_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U)
 {
-  return (__m128h)
-    __builtin_ia32_getexpsh_mask_round ((__v8hf) __A, (__v8hf) __B,
-      (__v8hf) _mm_setzero_ph (),
-      (__mmask8) -1,
-      _MM_FROUND_CUR_DIRECTION);
+  return (__m128h) __builtin_ia32_vfmsubsh3_mask3 ((__v8hf) __W,
+      -(__v8hf) __A,
+      (__v8hf) __B,
+      (__mmask8) __U,
+      _MM_FROUND_CUR_DIRECTION);
 }

 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_getexp_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B)
+_mm_maskz_fnmsub_sh (__mmask8 __U, __m128h __W, __m128h __A, __m128h __B)
 {
-  return (__m128h)
-    __builtin_ia32_getexpsh_mask_round ((__v8hf) __A, (__v8hf) __B,
-      (__v8hf) __W, (__mmask8) __U,
-      _MM_FROUND_CUR_DIRECTION);
+  return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W,
+      -(__v8hf) __A,
+      -(__v8hf) __B,
+      (__mmask8) __U,
+      _MM_FROUND_CUR_DIRECTION);
 }
+
+#ifdef __OPTIMIZE__
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_getexp_sh (__mmask8 __U, __m128h __A, __m128h __B)
+_mm_fnmsub_round_sh (__m128h __W, __m128h __A, __m128h __B, const int __R)
 {
-  return (__m128h)
-    __builtin_ia32_getexpsh_mask_round ((__v8hf) __A, (__v8hf) __B,
-      (__v8hf) _mm_setzero_ph (),
-      (__mmask8) __U,
-      _MM_FROUND_CUR_DIRECTION);
+  return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
+      -(__v8hf) __A,
+      -(__v8hf) __B,
+      (__mmask8) -1,
+      __R);
 }

-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_getexp_ph (__m512h __A)
+_mm_mask_fnmsub_round_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B,
+      const int __R)
 {
-  return (__m512h)
-    __builtin_ia32_getexpph512_mask ((__v32hf) __A,
-      (__v32hf) _mm512_setzero_ph (),
-      (__mmask32) -1, _MM_FROUND_CUR_DIRECTION);
+  return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
+      -(__v8hf) __A,
+      -(__v8hf) __B,
+      (__mmask8) __U, __R);
 }

-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_getexp_ph (__m512h __W, __mmask32 __U, __m512h __A)
+_mm_mask3_fnmsub_round_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U,
+      const int __R)
 {
-  return (__m512h)
-    __builtin_ia32_getexpph512_mask ((__v32hf) __A, (__v32hf) __W,
-      (__mmask32) __U, _MM_FROUND_CUR_DIRECTION);
+  return (__m128h) __builtin_ia32_vfmsubsh3_mask3 ((__v8hf) __W,
+      -(__v8hf) __A,
+      (__v8hf) __B,
+      (__mmask8) __U, __R);
 }

-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_getexp_ph (__mmask32 __U, __m512h __A)
+_mm_maskz_fnmsub_round_sh (__mmask8 __U, __m128h __W, __m128h __A,
+      __m128h __B, const int __R)
 {
-  return (__m512h)
-    __builtin_ia32_getexpph512_mask ((__v32hf) __A,
-      (__v32hf) _mm512_setzero_ph (),
-      (__mmask32) __U, _MM_FROUND_CUR_DIRECTION);
+  return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W,
+      -(__v8hf) __A,
+      -(__v8hf) __B,
+      (__mmask8) __U, __R);
 }

-#ifdef __OPTIMIZE__
+#else
+#define _mm_fnmsub_round_sh(A, B, C, R)  \
+  ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), -(B), -(C), (-1), (R)))
+#define _mm_mask_fnmsub_round_sh(A, U, B, C, R)  \
+  ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), -(B), -(C), (U), (R)))
+#define _mm_mask3_fnmsub_round_sh(A, B, C, U, R)  \
+  ((__m128h) __builtin_ia32_vfmsubsh3_mask3 ((A), -(B), (C), (U), (R)))
+#define _mm_maskz_fnmsub_round_sh(U, A, B, C, R)  \
+  ((__m128h) __builtin_ia32_vfmaddsh3_maskz ((A), -(B), -(C), (U), (R)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vf[,c]maddcsh.  */
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_getexp_round_sh (__m128h __A, __m128h __B, const int __R)
+_mm_mask_fcmadd_sch (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
 {
-  return (__m128h) __builtin_ia32_getexpsh_mask_round ((__v8hf) __A,
-      (__v8hf) __B,
-      _mm_setzero_ph (),
-      (__mmask8) -1,
-      __R);
+  return (__m128h)
+    __builtin_ia32_vfcmaddcsh_mask_round ((__v8hf) __A,
+      (__v8hf) __C,
+      (__v8hf) __D, __B,
+      _MM_FROUND_CUR_DIRECTION);
 }

 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_getexp_round_sh (__m128h __W, __mmask8 __U, __m128h __A,
-      __m128h __B, const int __R)
+_mm_mask3_fcmadd_sch (__m128h __A, __m128h __B, __m128h __C, __mmask8 __D)
 {
-  return (__m128h) __builtin_ia32_getexpsh_mask_round ((__v8hf) __A,
-      (__v8hf) __B,
-      (__v8hf) __W,
-      (__mmask8) __U, __R);
+  return (__m128h)
+    __builtin_ia32_vfcmaddcsh_mask3_round ((__v8hf) __A,
+      (__v8hf) __B,
+      (__v8hf) __C, __D,
+      _MM_FROUND_CUR_DIRECTION);
 }

 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_getexp_round_sh (__mmask8 __U, __m128h __A, __m128h __B,
-      const int __R)
+_mm_maskz_fcmadd_sch (__mmask8 __A, __m128h __B, __m128h __C, __m128h __D)
 {
-  return (__m128h) __builtin_ia32_getexpsh_mask_round ((__v8hf) __A,
-      (__v8hf) __B,
-      (__v8hf)
-      _mm_setzero_ph (),
-      (__mmask8) __U, __R);
+  return (__m128h)
+    __builtin_ia32_vfcmaddcsh_maskz_round ((__v8hf) __B,
+      (__v8hf) __C,
+      (__v8hf) __D,
+      __A, _MM_FROUND_CUR_DIRECTION);
 }

-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_getexp_round_ph (__m512h __A, const int __R)
+_mm_fcmadd_sch (__m128h __A, __m128h __B, __m128h __C)
 {
-  return (__m512h) __builtin_ia32_getexpph512_mask ((__v32hf) __A,
-      (__v32hf)
-      _mm512_setzero_ph (),
-      (__mmask32) -1, __R);
+  return (__m128h)
+    __builtin_ia32_vfcmaddcsh_round ((__v8hf) __A,
+      (__v8hf) __B,
+      (__v8hf) __C,
+      _MM_FROUND_CUR_DIRECTION);
 }

-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_getexp_round_ph (__m512h __W, __mmask32 __U, __m512h __A,
-      const int __R)
-{
-  return (__m512h) __builtin_ia32_getexpph512_mask ((__v32hf) __A,
-      (__v32hf) __W,
-      (__mmask32) __U, __R);
-}
-
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_getexp_round_ph (__mmask32 __U, __m512h __A, const int __R)
+_mm_mask_fmadd_sch (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
 {
-  return (__m512h) __builtin_ia32_getexpph512_mask ((__v32hf) __A,
-      (__v32hf)
-      _mm512_setzero_ph (),
-      (__mmask32) __U, __R);
+  return (__m128h)
+    __builtin_ia32_vfmaddcsh_mask_round ((__v8hf) __A,
+      (__v8hf) __C,
+      (__v8hf) __D, __B,
+      _MM_FROUND_CUR_DIRECTION);
 }
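The *sch intrinsics treat the low two _Float16 lanes as one complex
value (real part in lane 0, imaginary part in lane 1); the fc*
spellings map to the conjugating vfcmaddcsh instruction. Illustrative
use (not part of the patch):

  #include <immintrin.h>

  __m128h caxpy (__m128h a, __m128h b, __m128h c)
  {
    return _mm_fcmadd_sch (a, b, c);   /* conjugating complex FMA */
  }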
-#else
-#define _mm_getexp_round_sh(A, B, R)  \
-  ((__m128h)__builtin_ia32_getexpsh_mask_round((__v8hf)(__m128h)(A),  \
-      (__v8hf)(__m128h)(B),  \
-      (__v8hf)_mm_setzero_ph(),  \
-      (__mmask8)-1, R))
-
-#define _mm_mask_getexp_round_sh(W, U, A, B, C)  \
-  (__m128h)__builtin_ia32_getexpsh_mask_round(A, B, W, U, C)
-
-#define _mm_maskz_getexp_round_sh(U, A, B, C)  \
-  (__m128h)__builtin_ia32_getexpsh_mask_round(A, B,  \
-      (__v8hf)_mm_setzero_ph(),  \
-      U, C)
-
-#define _mm512_getexp_round_ph(A, R)  \
-  ((__m512h)__builtin_ia32_getexpph512_mask((__v32hf)(__m512h)(A),  \
-      (__v32hf)_mm512_setzero_ph(), (__mmask32)-1, R))
-
-#define _mm512_mask_getexp_round_ph(W, U, A, R)  \
-  ((__m512h)__builtin_ia32_getexpph512_mask((__v32hf)(__m512h)(A),  \
-      (__v32hf)(__m512h)(W), (__mmask32)(U), R))
-
-#define _mm512_maskz_getexp_round_ph(U, A, R)  \
-  ((__m512h)__builtin_ia32_getexpph512_mask((__v32hf)(__m512h)(A),  \
-      (__v32hf)_mm512_setzero_ph(), (__mmask32)(U), R))
-
-#endif /* __OPTIMIZE__ */
-
-/* Intrinsics vgetmantph, vgetmantsh.  */
-#ifdef __OPTIMIZE__
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_getmant_sh (__m128h __A, __m128h __B,
-      _MM_MANTISSA_NORM_ENUM __C,
-      _MM_MANTISSA_SIGN_ENUM __D)
+_mm_mask3_fmadd_sch (__m128h __A, __m128h __B, __m128h __C, __mmask8 __D)
 {
   return (__m128h)
-    __builtin_ia32_getmantsh_mask_round ((__v8hf) __A, (__v8hf) __B,
-      (__D << 2) | __C, _mm_setzero_ph (),
-      (__mmask8) -1,
-      _MM_FROUND_CUR_DIRECTION);
+    __builtin_ia32_vfmaddcsh_mask3_round ((__v8hf) __A,
+      (__v8hf) __B,
+      (__v8hf) __C, __D,
+      _MM_FROUND_CUR_DIRECTION);
 }

 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_getmant_sh (__m128h __W, __mmask8 __U, __m128h __A,
-      __m128h __B, _MM_MANTISSA_NORM_ENUM __C,
-      _MM_MANTISSA_SIGN_ENUM __D)
+_mm_maskz_fmadd_sch (__mmask8 __A, __m128h __B, __m128h __C, __m128h __D)
 {
   return (__m128h)
-    __builtin_ia32_getmantsh_mask_round ((__v8hf) __A, (__v8hf) __B,
-      (__D << 2) | __C, (__v8hf) __W,
-      __U, _MM_FROUND_CUR_DIRECTION);
+    __builtin_ia32_vfmaddcsh_maskz_round ((__v8hf) __B,
+      (__v8hf) __C,
+      (__v8hf) __D,
+      __A, _MM_FROUND_CUR_DIRECTION);
 }

 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_getmant_sh (__mmask8 __U, __m128h __A, __m128h __B,
-      _MM_MANTISSA_NORM_ENUM __C,
-      _MM_MANTISSA_SIGN_ENUM __D)
+_mm_fmadd_sch (__m128h __A, __m128h __B, __m128h __C)
 {
   return (__m128h)
-    __builtin_ia32_getmantsh_mask_round ((__v8hf) __A, (__v8hf) __B,
-      (__D << 2) | __C,
-      (__v8hf) _mm_setzero_ph(),
-      __U, _MM_FROUND_CUR_DIRECTION);
+    __builtin_ia32_vfmaddcsh_round ((__v8hf) __A,
+      (__v8hf) __B,
+      (__v8hf) __C,
+      _MM_FROUND_CUR_DIRECTION);
 }

-extern __inline __m512h
+#ifdef __OPTIMIZE__
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_getmant_ph (__m512h __A, _MM_MANTISSA_NORM_ENUM __B,
-      _MM_MANTISSA_SIGN_ENUM __C)
+_mm_mask_fcmadd_round_sch (__m128h __A, __mmask8 __B, __m128h __C,
+      __m128h __D, const int __E)
 {
-  return (__m512h) __builtin_ia32_getmantph512_mask ((__v32hf) __A,
-      (__C << 2) | __B,
-      _mm512_setzero_ph (),
-      (__mmask32) -1,
-      _MM_FROUND_CUR_DIRECTION);
+  return (__m128h)
+    __builtin_ia32_vfcmaddcsh_mask_round ((__v8hf) __A,
+      (__v8hf) __C,
+      (__v8hf) __D,
+      __B, __E);
 }

-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_getmant_ph (__m512h __W, __mmask32 __U, __m512h __A,
-      _MM_MANTISSA_NORM_ENUM __B,
-      _MM_MANTISSA_SIGN_ENUM __C)
+_mm_mask3_fcmadd_round_sch (__m128h __A, __m128h __B, __m128h __C,
+      __mmask8 __D, const int __E)
 {
-  return (__m512h) __builtin_ia32_getmantph512_mask ((__v32hf) __A,
-      (__C << 2) | __B,
-      (__v32hf) __W, __U,
-      _MM_FROUND_CUR_DIRECTION);
+  return (__m128h)
+    __builtin_ia32_vfcmaddcsh_mask3_round ((__v8hf) __A,
+      (__v8hf) __B,
+      (__v8hf) __C,
+      __D, __E);
 }

-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_getmant_ph (__mmask32 __U, __m512h __A,
-      _MM_MANTISSA_NORM_ENUM __B,
-      _MM_MANTISSA_SIGN_ENUM __C)
+_mm_maskz_fcmadd_round_sch (__mmask8 __A, __m128h __B, __m128h __C,
+      __m128h __D, const int __E)
 {
-  return (__m512h) __builtin_ia32_getmantph512_mask ((__v32hf) __A,
-      (__C << 2) | __B,
-      (__v32hf)
-      _mm512_setzero_ph (),
-      __U,
-      _MM_FROUND_CUR_DIRECTION);
+  return (__m128h)
+    __builtin_ia32_vfcmaddcsh_maskz_round ((__v8hf) __B,
+      (__v8hf) __C,
+      (__v8hf) __D,
+      __A, __E);
 }

 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_getmant_round_sh (__m128h __A, __m128h __B,
-      _MM_MANTISSA_NORM_ENUM __C,
-      _MM_MANTISSA_SIGN_ENUM __D, const int __R)
+_mm_fcmadd_round_sch (__m128h __A, __m128h __B, __m128h __C, const int __D)
 {
-  return (__m128h) __builtin_ia32_getmantsh_mask_round ((__v8hf) __A,
-      (__v8hf) __B,
-      (__D << 2) | __C,
-      _mm_setzero_ph (),
-      (__mmask8) -1,
-      __R);
+  return (__m128h)
+    __builtin_ia32_vfcmaddcsh_round ((__v8hf) __A,
+      (__v8hf) __B,
+      (__v8hf) __C,
+      __D);
 }

 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_getmant_round_sh (__m128h __W, __mmask8 __U, __m128h __A,
-      __m128h __B, _MM_MANTISSA_NORM_ENUM __C,
-      _MM_MANTISSA_SIGN_ENUM __D, const int __R)
+_mm_mask_fmadd_round_sch (__m128h __A, __mmask8 __B, __m128h __C,
+      __m128h __D, const int __E)
 {
-  return (__m128h) __builtin_ia32_getmantsh_mask_round ((__v8hf) __A,
-      (__v8hf) __B,
-      (__D << 2) | __C,
-      (__v8hf) __W,
-      __U, __R);
+  return (__m128h)
+    __builtin_ia32_vfmaddcsh_mask_round ((__v8hf) __A,
+      (__v8hf) __C,
+      (__v8hf) __D,
+      __B, __E);
 }

 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_getmant_round_sh (__mmask8 __U, __m128h __A, __m128h __B,
-      _MM_MANTISSA_NORM_ENUM __C,
-      _MM_MANTISSA_SIGN_ENUM __D, const int __R)
+_mm_mask3_fmadd_round_sch (__m128h __A, __m128h __B, __m128h __C,
+      __mmask8 __D, const int __E)
 {
-  return (__m128h) __builtin_ia32_getmantsh_mask_round ((__v8hf) __A,
-      (__v8hf) __B,
-      (__D << 2) | __C,
-      (__v8hf)
-      _mm_setzero_ph(),
-      __U, __R);
+  return (__m128h)
+    __builtin_ia32_vfmaddcsh_mask3_round ((__v8hf) __A,
+      (__v8hf) __B,
+      (__v8hf) __C,
+      __D, __E);
 }

-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_getmant_round_ph (__m512h __A, _MM_MANTISSA_NORM_ENUM __B,
-      _MM_MANTISSA_SIGN_ENUM __C, const int __R)
+_mm_maskz_fmadd_round_sch (__mmask8 __A, __m128h __B, __m128h __C,
+      __m128h __D, const int __E)
 {
-  return (__m512h) __builtin_ia32_getmantph512_mask ((__v32hf) __A,
-      (__C << 2) | __B,
-      _mm512_setzero_ph (),
-      (__mmask32) -1, __R);
+  return (__m128h)
+    __builtin_ia32_vfmaddcsh_maskz_round ((__v8hf) __B,
+      (__v8hf) __C,
+      (__v8hf) __D,
+      __A, __E);
 }

-extern __inline __m512h
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_getmant_round_ph (__m512h __W, __mmask32 __U, __m512h __A,
-      _MM_MANTISSA_NORM_ENUM __B,
-      _MM_MANTISSA_SIGN_ENUM __C, const int __R)
+_mm_fmadd_round_sch (__m128h __A, __m128h __B, __m128h __C, const int __D)
 {
-  return (__m512h) __builtin_ia32_getmantph512_mask ((__v32hf) __A,
-      (__C << 2) | __B,
-      (__v32hf) __W, __U,
-      __R);
+  return (__m128h)
+    __builtin_ia32_vfmaddcsh_round ((__v8hf) __A,
+      (__v8hf) __B,
+      (__v8hf) __C,
+      __D);
 }
+#else
+#define _mm_mask_fcmadd_round_sch(A, B, C, D, E)  \
+  ((__m128h)  \
+    __builtin_ia32_vfcmaddcsh_mask_round ((__v8hf) (A),  \
+      (__v8hf) (C),  \
+      (__v8hf) (D),  \
+      (B), (E)))

-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_getmant_round_ph (__mmask32 __U, __m512h __A,
-      _MM_MANTISSA_NORM_ENUM __B,
-      _MM_MANTISSA_SIGN_ENUM __C, const int __R)
-{
-  return (__m512h) __builtin_ia32_getmantph512_mask ((__v32hf) __A,
-      (__C << 2) | __B,
-      (__v32hf)
-      _mm512_setzero_ph (),
-      __U, __R);
-}
-#else
-#define _mm512_getmant_ph(X, B, C)  \
-  ((__m512h)__builtin_ia32_getmantph512_mask ((__v32hf)(__m512h)(X),  \
-      (int)(((C)<<2) | (B)),  \
-      (__v32hf)(__m512h)  \
-      _mm512_setzero_ph(),  \
-      (__mmask32)-1,  \
-      _MM_FROUND_CUR_DIRECTION))
+#define _mm_mask3_fcmadd_round_sch(A, B, C, D, E)  \
+  ((__m128h)  \
+    __builtin_ia32_vfcmaddcsh_mask3_round ((__v8hf) (A),  \
+      (__v8hf) (B),  \
+      (__v8hf) (C),  \
+      (D), (E)))

-#define _mm512_mask_getmant_ph(W, U, X, B, C)  \
-  ((__m512h)__builtin_ia32_getmantph512_mask ((__v32hf)(__m512h)(X),  \
-      (int)(((C)<<2) | (B)),  \
-      (__v32hf)(__m512h)(W),  \
-      (__mmask32)(U),  \
-      _MM_FROUND_CUR_DIRECTION))
+#define _mm_maskz_fcmadd_round_sch(A, B, C, D, E)  \
+  __builtin_ia32_vfcmaddcsh_maskz_round ((B), (C), (D), (A), (E))
+#define _mm_fcmadd_round_sch(A, B, C, D)  \
+  __builtin_ia32_vfcmaddcsh_round ((A), (B), (C), (D))

-#define _mm512_maskz_getmant_ph(U, X, B, C)  \
-  ((__m512h)__builtin_ia32_getmantph512_mask ((__v32hf)(__m512h)(X),  \
-      (int)(((C)<<2) | (B)),  \
-      (__v32hf)(__m512h)  \
-      _mm512_setzero_ph(),  \
-      (__mmask32)(U),  \
-      _MM_FROUND_CUR_DIRECTION))
-
-#define _mm_getmant_sh(X, Y, C, D)  \
-  ((__m128h)__builtin_ia32_getmantsh_mask_round ((__v8hf)(__m128h)(X),  \
-      (__v8hf)(__m128h)(Y),  \
-      (int)(((D)<<2) | (C)),  \
-      (__v8hf)(__m128h)  \
-      _mm_setzero_ph (),  \
-      (__mmask8)-1,  \
-      _MM_FROUND_CUR_DIRECTION))
-
-#define _mm_mask_getmant_sh(W, U, X, Y, C, D)  \
-  ((__m128h)__builtin_ia32_getmantsh_mask_round ((__v8hf)(__m128h)(X),  \
-      (__v8hf)(__m128h)(Y),  \
-      (int)(((D)<<2) | (C)),  \
-      (__v8hf)(__m128h)(W),  \
-      (__mmask8)(U),  \
-      _MM_FROUND_CUR_DIRECTION))
-
-#define _mm_maskz_getmant_sh(U, X, Y, C, D)  \
-  ((__m128h)__builtin_ia32_getmantsh_mask_round ((__v8hf)(__m128h)(X),  \
-      (__v8hf)(__m128h)(Y),  \
-      (int)(((D)<<2) | (C)),  \
-      (__v8hf)(__m128h)  \
-      _mm_setzero_ph(),  \
-      (__mmask8)(U),  \
-      _MM_FROUND_CUR_DIRECTION))
-
-#define _mm512_getmant_round_ph(X, B, C, R)  \
-  ((__m512h)__builtin_ia32_getmantph512_mask ((__v32hf)(__m512h)(X),  \
-      (int)(((C)<<2) | (B)),  \
-      (__v32hf)(__m512h)  \
-      _mm512_setzero_ph(),  \
-      (__mmask32)-1,  \
-      (R)))
-
-#define _mm512_mask_getmant_round_ph(W, U, X, B, C, R)  \
-  ((__m512h)__builtin_ia32_getmantph512_mask ((__v32hf)(__m512h)(X),  \
-      (int)(((C)<<2) | (B)),  \
-      (__v32hf)(__m512h)(W),  \
-      (__mmask32)(U),  \
-      (R)))
-
-
-#define _mm512_maskz_getmant_round_ph(U, X, B, C, R)  \
-  ((__m512h)__builtin_ia32_getmantph512_mask ((__v32hf)(__m512h)(X),  \
-      (int)(((C)<<2) | (B)),  \
-      (__v32hf)(__m512h)  \
-      _mm512_setzero_ph(),  \
-      (__mmask32)(U),  \
-      (R)))
+#define _mm_mask_fmadd_round_sch(A, B, C, D, E)  \
+  ((__m128h)  \
+    __builtin_ia32_vfmaddcsh_mask_round ((__v8hf) (A),  \
+      (__v8hf) (C),  \
+      (__v8hf) (D),  \
+      (B), (E)))

-#define _mm_getmant_round_sh(X, Y, C, D, R)  \
-  ((__m128h)__builtin_ia32_getmantsh_mask_round ((__v8hf)(__m128h)(X),  \
-      (__v8hf)(__m128h)(Y),  \
-      (int)(((D)<<2) | (C)),  \
-      (__v8hf)(__m128h)  \
-      _mm_setzero_ph (),  \
-      (__mmask8)-1,  \
-      (R)))
+#define _mm_mask3_fmadd_round_sch(A, B, C, D, E)  \
+  ((__m128h)  \
+    __builtin_ia32_vfmaddcsh_mask3_round ((__v8hf) (A),  \
+      (__v8hf) (B),  \
+      (__v8hf) (C),  \
+      (D), (E)))

-#define _mm_mask_getmant_round_sh(W, U, X, Y, C, D, R)  \
-  ((__m128h)__builtin_ia32_getmantsh_mask_round ((__v8hf)(__m128h)(X),  \
-      (__v8hf)(__m128h)(Y),  \
-      (int)(((D)<<2) | (C)),  \
-      (__v8hf)(__m128h)(W),  \
-      (__mmask8)(U),  \
-      (R)))
+#define _mm_maskz_fmadd_round_sch(A, B, C, D, E)  \
+  __builtin_ia32_vfmaddcsh_maskz_round ((B), (C), (D), (A), (E))

-#define _mm_maskz_getmant_round_sh(U, X, Y, C, D, R)  \
-  ((__m128h)__builtin_ia32_getmantsh_mask_round ((__v8hf)(__m128h)(X),  \
-      (__v8hf)(__m128h)(Y),  \
-      (int)(((D)<<2) | (C)),  \
-      (__v8hf)(__m128h)  \
-      _mm_setzero_ph(),  \
-      (__mmask8)(U),  \
-      (R)))
+#define _mm_fmadd_round_sch(A, B, C, D)  \
+  __builtin_ia32_vfmaddcsh_round ((A), (B), (C), (D))

 #endif /* __OPTIMIZE__ */

-/* Intrinsics vmovw.  */
-extern __inline __m128i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtsi16_si128 (short __A)
-{
-  return _mm_set_epi16 (0, 0, 0, 0, 0, 0, 0, __A);
-}
-
-extern __inline short
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtsi128_si16 (__m128i __A)
-{
-  return __builtin_ia32_vec_ext_v8hi ((__v8hi)__A, 0);
-}
-
-/* Intrinsics vmovsh.  */
+/* Intrinsics vf[,c]mulcsh.  */
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_load_sh (__m128h __A, __mmask8 __B, _Float16 const* __C)
+_mm_fcmul_sch (__m128h __A, __m128h __B)
 {
-  return __builtin_ia32_loadsh_mask (__C, __A, __B);
+  return (__m128h)
+    __builtin_ia32_vfcmulcsh_round ((__v8hf) __A,
+      (__v8hf) __B,
+      _MM_FROUND_CUR_DIRECTION);
 }

 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_load_sh (__mmask8 __A, _Float16 const* __B)
+_mm_mask_fcmul_sch (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
 {
-  return __builtin_ia32_loadsh_mask (__B, _mm_setzero_ph (), __A);
+  return (__m128h)
+    __builtin_ia32_vfcmulcsh_mask_round ((__v8hf) __C,
+      (__v8hf) __D,
+      (__v8hf) __A,
+      __B, _MM_FROUND_CUR_DIRECTION);
 }

-extern __inline void
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_store_sh (_Float16 const* __A, __mmask8 __B, __m128h __C)
+_mm_maskz_fcmul_sch (__mmask8 __A, __m128h __B, __m128h __C)
 {
-  __builtin_ia32_storesh_mask (__A, __C, __B);
+  return (__m128h)
+    __builtin_ia32_vfcmulcsh_mask_round ((__v8hf) __B,
+      (__v8hf) __C,
+      _mm_setzero_ph (),
+      __A, _MM_FROUND_CUR_DIRECTION);
 }

 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_move_sh (__m128h __A, __m128h __B)
+_mm_fmul_sch (__m128h __A, __m128h __B)
 {
-  __A[0] = __B[0];
-  return __A;
+  return (__m128h)
+    __builtin_ia32_vfmulcsh_round ((__v8hf) __A,
+      (__v8hf) __B,
+      _MM_FROUND_CUR_DIRECTION);
 }

 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_move_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
+_mm_mask_fmul_sch (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
 {
-  return __builtin_ia32_vmovsh_mask (__C, __D, __A, __B);
+  return (__m128h)
+    __builtin_ia32_vfmulcsh_mask_round ((__v8hf) __C,
+      (__v8hf) __D,
+      (__v8hf) __A,
+      __B, _MM_FROUND_CUR_DIRECTION);
 }

 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_move_sh (__mmask8 __A, __m128h __B, __m128h __C)
+_mm_maskz_fmul_sch (__mmask8 __A, __m128h __B, __m128h __C)
 {
-  return __builtin_ia32_vmovsh_mask (__B, __C, _mm_setzero_ph (), __A);
+  return (__m128h)
+    __builtin_ia32_vfmulcsh_mask_round ((__v8hf) __B,
+      (__v8hf) __C,
+      _mm_setzero_ph (),
+      __A, _MM_FROUND_CUR_DIRECTION);
 }
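Analogous illustration for the complex scalar multiplies just defined
(fcmul being the conjugating form; the _mm_mul_sch/_mm_cmul_sch aliases
follow further below) -- illustrative only, not part of the patch:

  #include <immintrin.h>

  /* Complex product of the low (real, imag) _Float16 pairs.  */
  __m128h cprod (__m128h a, __m128h b)
  {
    return _mm_fmul_sch (a, b);
  }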
-/* Intrinsics vcvtph2dq.  */
-extern __inline __m512i
+#ifdef __OPTIMIZE__
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtph_epi32 (__m256h __A)
+_mm_fcmul_round_sch (__m128h __A, __m128h __B, const int __D)
 {
-  return (__m512i)
-    __builtin_ia32_vcvtph2dq512_mask_round (__A,
-      (__v16si)
-      _mm512_setzero_si512 (),
-      (__mmask16) -1,
-      _MM_FROUND_CUR_DIRECTION);
+  return (__m128h)
+    __builtin_ia32_vfcmulcsh_round ((__v8hf) __A,
+      (__v8hf) __B,
+      __D);
 }

-extern __inline __m512i
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtph_epi32 (__m512i __A, __mmask16 __B, __m256h __C)
+_mm_mask_fcmul_round_sch (__m128h __A, __mmask8 __B, __m128h __C,
+      __m128h __D, const int __E)
 {
-  return (__m512i)
-    __builtin_ia32_vcvtph2dq512_mask_round (__C,
-      (__v16si) __A,
-      __B,
-      _MM_FROUND_CUR_DIRECTION);
+  return (__m128h)
+    __builtin_ia32_vfcmulcsh_mask_round ((__v8hf) __C,
+      (__v8hf) __D,
+      (__v8hf) __A,
+      __B, __E);
 }

-extern __inline __m512i
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtph_epi32 (__mmask16 __A, __m256h __B)
+_mm_maskz_fcmul_round_sch (__mmask8 __A, __m128h __B, __m128h __C,
+      const int __E)
 {
-  return (__m512i)
-    __builtin_ia32_vcvtph2dq512_mask_round (__B,
-      (__v16si)
-      _mm512_setzero_si512 (),
-      __A,
-      _MM_FROUND_CUR_DIRECTION);
+  return (__m128h)
+    __builtin_ia32_vfcmulcsh_mask_round ((__v8hf) __B,
+      (__v8hf) __C,
+      _mm_setzero_ph (),
+      __A, __E);
 }

-#ifdef __OPTIMIZE__
-extern __inline __m512i
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvt_roundph_epi32 (__m256h __A, int __B)
+_mm_fmul_round_sch (__m128h __A, __m128h __B, const int __D)
 {
-  return (__m512i)
-    __builtin_ia32_vcvtph2dq512_mask_round (__A,
-      (__v16si)
-      _mm512_setzero_si512 (),
-      (__mmask16) -1,
-      __B);
+  return (__m128h)
+    __builtin_ia32_vfmulcsh_round ((__v8hf) __A,
+      (__v8hf) __B, __D);
 }

-extern __inline __m512i
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvt_roundph_epi32 (__m512i __A, __mmask16 __B, __m256h __C, int __D)
+_mm_mask_fmul_round_sch (__m128h __A, __mmask8 __B, __m128h __C,
+      __m128h __D, const int __E)
 {
-  return (__m512i)
-    __builtin_ia32_vcvtph2dq512_mask_round (__C,
-      (__v16si) __A,
-      __B,
-      __D);
+  return (__m128h)
+    __builtin_ia32_vfmulcsh_mask_round ((__v8hf) __C,
+      (__v8hf) __D,
+      (__v8hf) __A,
+      __B, __E);
 }

-extern __inline __m512i
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvt_roundph_epi32 (__mmask16 __A, __m256h __B, int __C)
+_mm_maskz_fmul_round_sch (__mmask8 __A, __m128h __B, __m128h __C, const int __E)
 {
-  return (__m512i)
-    __builtin_ia32_vcvtph2dq512_mask_round (__B,
-      (__v16si)
-      _mm512_setzero_si512 (),
-      __A,
-      __C);
+  return (__m128h)
+    __builtin_ia32_vfmulcsh_mask_round ((__v8hf) __B,
+      (__v8hf) __C,
+      _mm_setzero_ph (),
+      __A, __E);
 }
 #else
-#define _mm512_cvt_roundph_epi32(A, B)  \
-  ((__m512i)  \
-   __builtin_ia32_vcvtph2dq512_mask_round ((A),  \
-      (__v16si)  \
-      _mm512_setzero_si512 (),  \
-      (__mmask16)-1,  \
-      (B)))
+#define _mm_fcmul_round_sch(__A, __B, __D)  \
+  (__m128h) __builtin_ia32_vfcmulcsh_round ((__v8hf) __A,  \
+      (__v8hf) __B, __D)

-#define _mm512_mask_cvt_roundph_epi32(A, B, C, D)  \
-  ((__m512i)  \
-   __builtin_ia32_vcvtph2dq512_mask_round ((C), (__v16si)(A), (B), (D)))
+#define _mm_mask_fcmul_round_sch(__A, __B, __C, __D, __E)  \
+  (__m128h) __builtin_ia32_vfcmulcsh_mask_round ((__v8hf) __C,  \
+      (__v8hf) __D,  \
+      (__v8hf) __A,  \
+      __B, __E)

-#define _mm512_maskz_cvt_roundph_epi32(A, B, C)  \
-  ((__m512i)  \
-   __builtin_ia32_vcvtph2dq512_mask_round ((B),  \
-      (__v16si)  \
-      _mm512_setzero_si512 (),  \
-      (A),  \
-      (C)))
+#define _mm_maskz_fcmul_round_sch(__A, __B, __C, __E)  \
+  (__m128h) __builtin_ia32_vfcmulcsh_mask_round ((__v8hf) __B,  \
+      (__v8hf) __C,  \
+      _mm_setzero_ph (),  \
+      __A, __E)
+
+#define _mm_fmul_round_sch(__A, __B, __D)  \
+  (__m128h) __builtin_ia32_vfmulcsh_round ((__v8hf) __A,  \
+      (__v8hf) __B, __D)
+
+#define _mm_mask_fmul_round_sch(__A, __B, __C, __D, __E)  \
+  (__m128h) __builtin_ia32_vfmulcsh_mask_round ((__v8hf) __C,  \
+      (__v8hf) __D,  \
+      (__v8hf) __A,  \
+      __B, __E)
+
+#define _mm_maskz_fmul_round_sch(__A, __B, __C, __E)  \
+  (__m128h) __builtin_ia32_vfmulcsh_mask_round ((__v8hf) __B,  \
+      (__v8hf) __C,  \
+      _mm_setzero_ph (),  \
+      __A, __E)

 #endif /* __OPTIMIZE__ */

-/* Intrinsics vcvtph2udq.  */
-extern __inline __m512i
+#define _mm_mul_sch(A, B) _mm_fmul_sch ((A), (B))
+#define _mm_mask_mul_sch(W, U, A, B) _mm_mask_fmul_sch ((W), (U), (A), (B))
+#define _mm_maskz_mul_sch(U, A, B) _mm_maskz_fmul_sch ((U), (A), (B))
+#define _mm_mul_round_sch(A, B, R) _mm_fmul_round_sch ((A), (B), (R))
+#define _mm_mask_mul_round_sch(W, U, A, B, R)  \
+  _mm_mask_fmul_round_sch ((W), (U), (A), (B), (R))
+#define _mm_maskz_mul_round_sch(U, A, B, R)  \
+  _mm_maskz_fmul_round_sch ((U), (A), (B), (R))
+
+#define _mm_cmul_sch(A, B) _mm_fcmul_sch ((A), (B))
+#define _mm_mask_cmul_sch(W, U, A, B) _mm_mask_fcmul_sch ((W), (U), (A), (B))
+#define _mm_maskz_cmul_sch(U, A, B) _mm_maskz_fcmul_sch ((U), (A), (B))
+#define _mm_cmul_round_sch(A, B, R) _mm_fcmul_round_sch ((A), (B), (R))
+#define _mm_mask_cmul_round_sch(W, U, A, B, R)  \
+  _mm_mask_fcmul_round_sch ((W), (U), (A), (B), (R))
+#define _mm_maskz_cmul_round_sch(U, A, B, R)  \
+  _mm_maskz_fcmul_round_sch ((U), (A), (B), (R))
+
+#ifdef __DISABLE_AVX512FP16__
+#undef __DISABLE_AVX512FP16__
+#pragma GCC pop_options
+#endif /* __DISABLE_AVX512FP16__ */
+
+#if !defined (__AVX512FP16__) || !defined (__EVEX512__)
+#pragma GCC push_options
+#pragma GCC target("avx512fp16,evex512")
+#define __DISABLE_AVX512FP16_512__
+#endif /* __AVX512FP16_512__ */
+
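From this point on the header is compiled under
target("avx512fp16,evex512"), so translation units where either feature
is disabled on the command line can still reach the 512-bit types and
intrinsics below through the same pragma. A sketch of user code doing
that (illustrative only, not part of the patch):

  #include <immintrin.h>

  #pragma GCC push_options
  #pragma GCC target ("avx512fp16,evex512")
  __m512h add_ph512 (__m512h a, __m512h b) { return _mm512_add_ph (a, b); }
  #pragma GCC pop_options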
+typedef _Float16 __v32hf __attribute__ ((__vector_size__ (64)));
+typedef _Float16 __m512h __attribute__ ((__vector_size__ (64), __may_alias__));
+typedef _Float16 __m512h_u __attribute__ ((__vector_size__ (64),  \
+      __may_alias__, __aligned__ (1)));
+
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtph_epu32 (__m256h __A)
+_mm512_set_ph (_Float16 __A31, _Float16 __A30, _Float16 __A29,
+      _Float16 __A28, _Float16 __A27, _Float16 __A26,
+      _Float16 __A25, _Float16 __A24, _Float16 __A23,
+      _Float16 __A22, _Float16 __A21, _Float16 __A20,
+      _Float16 __A19, _Float16 __A18, _Float16 __A17,
+      _Float16 __A16, _Float16 __A15, _Float16 __A14,
+      _Float16 __A13, _Float16 __A12, _Float16 __A11,
+      _Float16 __A10, _Float16 __A9, _Float16 __A8,
+      _Float16 __A7, _Float16 __A6, _Float16 __A5,
+      _Float16 __A4, _Float16 __A3, _Float16 __A2,
+      _Float16 __A1, _Float16 __A0)
 {
-  return (__m512i)
-    __builtin_ia32_vcvtph2udq512_mask_round (__A,
-      (__v16si)
-      _mm512_setzero_si512 (),
-      (__mmask16) -1,
-      _MM_FROUND_CUR_DIRECTION);
+  return __extension__ (__m512h)(__v32hf){ __A0, __A1, __A2, __A3,
+      __A4, __A5, __A6, __A7,
+      __A8, __A9, __A10, __A11,
+      __A12, __A13, __A14, __A15,
+      __A16, __A17, __A18, __A19,
+      __A20, __A21, __A22, __A23,
+      __A24, __A25, __A26, __A27,
+      __A28, __A29, __A30, __A31 };
 }

-extern __inline __m512i
+/* Create vectors of elements in the reversed order from
+   _mm512_set_ph functions.  */
+
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtph_epu32 (__m512i __A, __mmask16 __B, __m256h __C)
+_mm512_setr_ph (_Float16 __A0, _Float16 __A1, _Float16 __A2,
+      _Float16 __A3, _Float16 __A4, _Float16 __A5,
+      _Float16 __A6, _Float16 __A7, _Float16 __A8,
+      _Float16 __A9, _Float16 __A10, _Float16 __A11,
+      _Float16 __A12, _Float16 __A13, _Float16 __A14,
+      _Float16 __A15, _Float16 __A16, _Float16 __A17,
+      _Float16 __A18, _Float16 __A19, _Float16 __A20,
+      _Float16 __A21, _Float16 __A22, _Float16 __A23,
+      _Float16 __A24, _Float16 __A25, _Float16 __A26,
+      _Float16 __A27, _Float16 __A28, _Float16 __A29,
+      _Float16 __A30, _Float16 __A31)
 {
-  return (__m512i)
-    __builtin_ia32_vcvtph2udq512_mask_round (__C,
-      (__v16si) __A,
-      __B,
-      _MM_FROUND_CUR_DIRECTION);
+  return _mm512_set_ph (__A31, __A30, __A29, __A28, __A27, __A26, __A25,
+      __A24, __A23, __A22, __A21, __A20, __A19, __A18,
+      __A17, __A16, __A15, __A14, __A13, __A12, __A11,
+      __A10, __A9, __A8, __A7, __A6, __A5, __A4, __A3,
+      __A2, __A1, __A0);
 }

-extern __inline __m512i
+/* Broadcast _Float16 to vector.  */
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtph_epu32 (__mmask16 __A, __m256h __B)
+_mm512_set1_ph (_Float16 __A)
 {
-  return (__m512i)
-    __builtin_ia32_vcvtph2udq512_mask_round (__B,
-      (__v16si)
-      _mm512_setzero_si512 (),
-      __A,
-      _MM_FROUND_CUR_DIRECTION);
+  return _mm512_set_ph (__A, __A, __A, __A, __A, __A, __A, __A,
+      __A, __A, __A, __A, __A, __A, __A, __A,
+      __A, __A, __A, __A, __A, __A, __A, __A,
+      __A, __A, __A, __A, __A, __A, __A, __A);
 }
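Note the argument order: _mm512_set_ph takes element 31 first and
element 0 last, while _mm512_setr_ph above is the memory-order
counterpart. For instance (illustrative only, assumes -mavx512fp16 with
512-bit vectors enabled):

  #include <immintrin.h>

  /* Element i of the result holds (_Float16) i.  */
  __m512h iota32 (void)
  {
    return _mm512_setr_ph (0.0f16, 1.0f16, 2.0f16, 3.0f16,
                           4.0f16, 5.0f16, 6.0f16, 7.0f16,
                           8.0f16, 9.0f16, 10.0f16, 11.0f16,
                           12.0f16, 13.0f16, 14.0f16, 15.0f16,
                           16.0f16, 17.0f16, 18.0f16, 19.0f16,
                           20.0f16, 21.0f16, 22.0f16, 23.0f16,
                           24.0f16, 25.0f16, 26.0f16, 27.0f16,
                           28.0f16, 29.0f16, 30.0f16, 31.0f16);
  }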
-#ifdef __OPTIMIZE__
-extern __inline __m512i
+/* Create a vector with all zeros.  */
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvt_roundph_epu32 (__m256h __A, int __B)
+_mm512_setzero_ph (void)
 {
-  return (__m512i)
-    __builtin_ia32_vcvtph2udq512_mask_round (__A,
-      (__v16si)
-      _mm512_setzero_si512 (),
-      (__mmask16) -1,
-      __B);
+  return _mm512_set1_ph (0.0f16);
 }

-extern __inline __m512i
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvt_roundph_epu32 (__m512i __A, __mmask16 __B, __m256h __C, int __D)
+_mm512_undefined_ph (void)
 {
-  return (__m512i)
-    __builtin_ia32_vcvtph2udq512_mask_round (__C,
-      (__v16si) __A,
-      __B,
-      __D);
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Winit-self"
+  __m512h __Y = __Y;
+#pragma GCC diagnostic pop
+  return __Y;
 }

-extern __inline __m512i
+extern __inline _Float16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvt_roundph_epu32 (__mmask16 __A, __m256h __B, int __C)
+_mm512_cvtsh_h (__m512h __A)
 {
-  return (__m512i)
-    __builtin_ia32_vcvtph2udq512_mask_round (__B,
-      (__v16si)
-      _mm512_setzero_si512 (),
-      __A,
-      __C);
+  return __A[0];
 }
-#else
-#define _mm512_cvt_roundph_epu32(A, B)  \
-  ((__m512i)  \
-   __builtin_ia32_vcvtph2udq512_mask_round ((A),  \
-      (__v16si)  \
-      _mm512_setzero_si512 (),  \
-      (__mmask16)-1,  \
-      (B)))
-
-#define _mm512_mask_cvt_roundph_epu32(A, B, C, D)  \
-  ((__m512i)  \
-   __builtin_ia32_vcvtph2udq512_mask_round ((C), (__v16si)(A), (B), (D)))
-
-#define _mm512_maskz_cvt_roundph_epu32(A, B, C)  \
-  ((__m512i)  \
-   __builtin_ia32_vcvtph2udq512_mask_round ((B),  \
-      (__v16si)  \
-      _mm512_setzero_si512 (),  \
-      (A),  \
-      (C)))
-
-#endif /* __OPTIMIZE__ */
-
-/* Intrinsics vcvttph2dq.  */
-extern __inline __m512i
+extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvttph_epi32 (__m256h __A)
+_mm512_castph_ps (__m512h __a)
 {
-  return (__m512i)
-    __builtin_ia32_vcvttph2dq512_mask_round (__A,
-      (__v16si)
-      _mm512_setzero_si512 (),
-      (__mmask16) -1,
-      _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __a;
 }

-extern __inline __m512i
+extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvttph_epi32 (__m512i __A, __mmask16 __B, __m256h __C)
+_mm512_castph_pd (__m512h __a)
 {
-  return (__m512i)
-    __builtin_ia32_vcvttph2dq512_mask_round (__C,
-      (__v16si) __A,
-      __B,
-      _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __a;
 }

 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvttph_epi32 (__mmask16 __A, __m256h __B)
+_mm512_castph_si512 (__m512h __a)
 {
-  return (__m512i)
-    __builtin_ia32_vcvttph2dq512_mask_round (__B,
-      (__v16si)
-      _mm512_setzero_si512 (),
-      __A,
-      _MM_FROUND_CUR_DIRECTION);
+  return (__m512i) __a;
 }

-#ifdef __OPTIMIZE__
-extern __inline __m512i
+extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtt_roundph_epi32 (__m256h __A, int __B)
+_mm512_castph512_ph128 (__m512h __A)
 {
-  return (__m512i)
-    __builtin_ia32_vcvttph2dq512_mask_round (__A,
-      (__v16si)
-      _mm512_setzero_si512 (),
-      (__mmask16) -1,
-      __B);
+  union
+  {
+    __m128h __a[4];
+    __m512h __v;
+  } __u = { .__v = __A };
+  return __u.__a[0];
 }

-extern __inline __m512i
+extern __inline __m256h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtt_roundph_epi32 (__m512i __A, __mmask16 __B,
-      __m256h __C, int __D)
+_mm512_castph512_ph256 (__m512h __A)
 {
-  return (__m512i)
-    __builtin_ia32_vcvttph2dq512_mask_round (__C,
-      (__v16si) __A,
-      __B,
-      __D);
+  union
+  {
+    __m256h __a[2];
+    __m512h __v;
+  } __u = { .__v = __A };
+  return __u.__a[0];
 }

-extern __inline __m512i
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtt_roundph_epi32 (__mmask16 __A, __m256h __B, int __C)
+_mm512_castph128_ph512 (__m128h __A)
 {
-  return (__m512i)
-    __builtin_ia32_vcvttph2dq512_mask_round (__B,
-      (__v16si)
-      _mm512_setzero_si512 (),
-      __A,
-      __C);
+  union
+  {
+    __m128h __a[4];
+    __m512h __v;
+  } __u;
+  __u.__a[0] = __A;
+  return __u.__v;
 }
-#else
-#define _mm512_cvtt_roundph_epi32(A, B)  \
-  ((__m512i)  \
-   __builtin_ia32_vcvttph2dq512_mask_round ((A),  \
-      (__v16si)  \
-      (_mm512_setzero_si512 ()),  \
-      (__mmask16)(-1), (B)))
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_castph256_ph512 (__m256h __A)
+{
+  union
+  {
+    __m256h __a[2];
+    __m512h __v;
+  } __u;
+  __u.__a[0] = __A;
+  return __u.__v;
+}

-#define _mm512_mask_cvtt_roundph_epi32(A, B, C, D)  \
-  ((__m512i)  \
-   __builtin_ia32_vcvttph2dq512_mask_round ((C),  \
-      (__v16si)(A),  \
-      (B),  \
-      (D)))
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_zextph128_ph512 (__m128h __A)
+{
+  return (__m512h) _mm512_insertf32x4 (_mm512_setzero_ps (),
+      (__m128) __A, 0);
+}

-#define _mm512_maskz_cvtt_roundph_epi32(A, B, C)  \
-  ((__m512i)  \
-   __builtin_ia32_vcvttph2dq512_mask_round ((B),  \
-      (__v16si)  \
-      _mm512_setzero_si512 (),  \
-      (A),  \
-      (C)))
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_zextph256_ph512 (__m256h __A)
+{
+  return (__m512h) _mm512_insertf64x4 (_mm512_setzero_pd (),
+      (__m256d) __A, 0);
+}

-#endif /* __OPTIMIZE__ */
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_castps_ph (__m512 __a)
+{
+  return (__m512h) __a;
+}

-/* Intrinsics vcvttph2udq.  */
-extern __inline __m512i
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvttph_epu32 (__m256h __A)
+_mm512_castpd_ph (__m512d __a)
 {
-  return (__m512i)
-    __builtin_ia32_vcvttph2udq512_mask_round (__A,
-      (__v16si)
-      _mm512_setzero_si512 (),
-      (__mmask16) -1,
-      _MM_FROUND_CUR_DIRECTION);
+  return (__m512h) __a;
 }

-extern __inline __m512i
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvttph_epu32 (__m512i __A, __mmask16 __B, __m256h __C)
+_mm512_castsi512_ph (__m512i __a)
 {
-  return (__m512i)
-    __builtin_ia32_vcvttph2udq512_mask_round (__C,
-      (__v16si) __A,
-      __B,
-      _MM_FROUND_CUR_DIRECTION);
+  return (__m512h) __a;
 }

-extern __inline __m512i
+/* Create a vector with element 0 as *P and the rest zero.  */
+extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvttph_epu32 (__mmask16 __A, __m256h __B)
+_mm512_load_ph (void const *__P)
 {
-  return (__m512i)
-    __builtin_ia32_vcvttph2udq512_mask_round (__B,
-      (__v16si)
-      _mm512_setzero_si512 (),
-      __A,
-      _MM_FROUND_CUR_DIRECTION);
+  return *(const __m512h *) __P;
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_loadu_ph (void const *__P)
+{
+  return *(const __m512h_u *) __P;
+}
+
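_mm512_load_ph dereferences a plain __m512h pointer and therefore
assumes 64-byte alignment, while _mm512_loadu_ph goes through the
__aligned__ (1) __m512h_u type and accepts any address. For example
(illustrative only):

  #include <immintrin.h>

  __m512h load_any (const _Float16 *p)   /* p may be unaligned */
  {
    return _mm512_loadu_ph (p);
  }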
*/ +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_store_ph (void *__P, __m512h __A) +{ + *(__m512h *) __P = __A; +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_storeu_ph (void *__P, __m512h __A) +{ + *(__m512h_u *) __P = __A; +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_abs_ph (__m512h __A) +{ + return (__m512h) _mm512_and_epi32 ( _mm512_set1_epi32 (0x7FFF7FFF), + (__m512i) __A); +} + +/* Intrinsics v[add,sub,mul,div]ph. */ +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_add_ph (__m512h __A, __m512h __B) +{ + return (__m512h) ((__v32hf) __A + (__v32hf) __B); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_add_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D) +{ + return __builtin_ia32_addph512_mask (__C, __D, __A, __B); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_add_ph (__mmask32 __A, __m512h __B, __m512h __C) +{ + return __builtin_ia32_addph512_mask (__B, __C, + _mm512_setzero_ph (), __A); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_sub_ph (__m512h __A, __m512h __B) +{ + return (__m512h) ((__v32hf) __A - (__v32hf) __B); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_sub_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D) +{ + return __builtin_ia32_subph512_mask (__C, __D, __A, __B); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_sub_ph (__mmask32 __A, __m512h __B, __m512h __C) +{ + return __builtin_ia32_subph512_mask (__B, __C, + _mm512_setzero_ph (), __A); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mul_ph (__m512h __A, __m512h __B) +{ + return (__m512h) ((__v32hf) __A * (__v32hf) __B); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_mul_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D) +{ + return __builtin_ia32_mulph512_mask (__C, __D, __A, __B); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_mul_ph (__mmask32 __A, __m512h __B, __m512h __C) +{ + return __builtin_ia32_mulph512_mask (__B, __C, + _mm512_setzero_ph (), __A); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_div_ph (__m512h __A, __m512h __B) +{ + return (__m512h) ((__v32hf) __A / (__v32hf) __B); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_div_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D) +{ + return __builtin_ia32_divph512_mask (__C, __D, __A, __B); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_div_ph (__mmask32 __A, __m512h __B, __m512h __C) +{ + return __builtin_ia32_divph512_mask (__B, __C, + _mm512_setzero_ph (), __A); } #ifdef __OPTIMIZE__ -extern __inline __m512i +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_cvtt_roundph_epu32 (__m256h __A, int __B) +_mm512_add_round_ph (__m512h __A, __m512h __B, const int __C) 
{ - return (__m512i) - __builtin_ia32_vcvttph2udq512_mask_round (__A, - (__v16si) - _mm512_setzero_si512 (), - (__mmask16) -1, - __B); + return __builtin_ia32_addph512_mask_round (__A, __B, + _mm512_setzero_ph (), + (__mmask32) -1, __C); } -extern __inline __m512i +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_cvtt_roundph_epu32 (__m512i __A, __mmask16 __B, - __m256h __C, int __D) +_mm512_mask_add_round_ph (__m512h __A, __mmask32 __B, __m512h __C, + __m512h __D, const int __E) { - return (__m512i) - __builtin_ia32_vcvttph2udq512_mask_round (__C, - (__v16si) __A, - __B, - __D); + return __builtin_ia32_addph512_mask_round (__C, __D, __A, __B, __E); } -extern __inline __m512i +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_cvtt_roundph_epu32 (__mmask16 __A, __m256h __B, int __C) +_mm512_maskz_add_round_ph (__mmask32 __A, __m512h __B, __m512h __C, + const int __D) { - return (__m512i) - __builtin_ia32_vcvttph2udq512_mask_round (__B, - (__v16si) - _mm512_setzero_si512 (), - __A, - __C); + return __builtin_ia32_addph512_mask_round (__B, __C, + _mm512_setzero_ph (), + __A, __D); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_sub_round_ph (__m512h __A, __m512h __B, const int __C) +{ + return __builtin_ia32_subph512_mask_round (__A, __B, + _mm512_setzero_ph (), + (__mmask32) -1, __C); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_sub_round_ph (__m512h __A, __mmask32 __B, __m512h __C, + __m512h __D, const int __E) +{ + return __builtin_ia32_subph512_mask_round (__C, __D, __A, __B, __E); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_sub_round_ph (__mmask32 __A, __m512h __B, __m512h __C, + const int __D) +{ + return __builtin_ia32_subph512_mask_round (__B, __C, + _mm512_setzero_ph (), + __A, __D); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mul_round_ph (__m512h __A, __m512h __B, const int __C) +{ + return __builtin_ia32_mulph512_mask_round (__A, __B, + _mm512_setzero_ph (), + (__mmask32) -1, __C); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_mul_round_ph (__m512h __A, __mmask32 __B, __m512h __C, + __m512h __D, const int __E) +{ + return __builtin_ia32_mulph512_mask_round (__C, __D, __A, __B, __E); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_mul_round_ph (__mmask32 __A, __m512h __B, __m512h __C, + const int __D) +{ + return __builtin_ia32_mulph512_mask_round (__B, __C, + _mm512_setzero_ph (), + __A, __D); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_div_round_ph (__m512h __A, __m512h __B, const int __C) +{ + return __builtin_ia32_divph512_mask_round (__A, __B, + _mm512_setzero_ph (), + (__mmask32) -1, __C); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_div_round_ph (__m512h __A, __mmask32 __B, __m512h __C, + __m512h __D, const int __E) +{ + return __builtin_ia32_divph512_mask_round (__C, __D, __A, __B, __E); } +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_div_round_ph (__mmask32 __A, __m512h __B, __m512h __C, + 
const int __D) +{ + return __builtin_ia32_divph512_mask_round (__B, __C, + _mm512_setzero_ph (), + __A, __D); +} #else -#define _mm512_cvtt_roundph_epu32(A, B) \ - ((__m512i) \ - __builtin_ia32_vcvttph2udq512_mask_round ((A), \ - (__v16si) \ - _mm512_setzero_si512 (), \ - (__mmask16)-1, \ - (B))) +#define _mm512_add_round_ph(A, B, C) \ + ((__m512h)__builtin_ia32_addph512_mask_round((A), (B), \ + _mm512_setzero_ph (), \ + (__mmask32)-1, (C))) -#define _mm512_mask_cvtt_roundph_epu32(A, B, C, D) \ - ((__m512i) \ - __builtin_ia32_vcvttph2udq512_mask_round ((C), \ - (__v16si)(A), \ - (B), \ - (D))) +#define _mm512_mask_add_round_ph(A, B, C, D, E) \ + ((__m512h)__builtin_ia32_addph512_mask_round((C), (D), (A), (B), (E))) -#define _mm512_maskz_cvtt_roundph_epu32(A, B, C) \ - ((__m512i) \ - __builtin_ia32_vcvttph2udq512_mask_round ((B), \ - (__v16si) \ - _mm512_setzero_si512 (), \ - (A), \ - (C))) +#define _mm512_maskz_add_round_ph(A, B, C, D) \ + ((__m512h)__builtin_ia32_addph512_mask_round((B), (C), \ + _mm512_setzero_ph (), \ + (A), (D))) -#endif /* __OPTIMIZE__ */ +#define _mm512_sub_round_ph(A, B, C) \ + ((__m512h)__builtin_ia32_subph512_mask_round((A), (B), \ + _mm512_setzero_ph (), \ + (__mmask32)-1, (C))) -/* Intrinsics vcvtdq2ph. */ -extern __inline __m256h +#define _mm512_mask_sub_round_ph(A, B, C, D, E) \ + ((__m512h)__builtin_ia32_subph512_mask_round((C), (D), (A), (B), (E))) + +#define _mm512_maskz_sub_round_ph(A, B, C, D) \ + ((__m512h)__builtin_ia32_subph512_mask_round((B), (C), \ + _mm512_setzero_ph (), \ + (A), (D))) + +#define _mm512_mul_round_ph(A, B, C) \ + ((__m512h)__builtin_ia32_mulph512_mask_round((A), (B), \ + _mm512_setzero_ph (), \ + (__mmask32)-1, (C))) + +#define _mm512_mask_mul_round_ph(A, B, C, D, E) \ + ((__m512h)__builtin_ia32_mulph512_mask_round((C), (D), (A), (B), (E))) + +#define _mm512_maskz_mul_round_ph(A, B, C, D) \ + ((__m512h)__builtin_ia32_mulph512_mask_round((B), (C), \ + _mm512_setzero_ph (), \ + (A), (D))) + +#define _mm512_div_round_ph(A, B, C) \ + ((__m512h)__builtin_ia32_divph512_mask_round((A), (B), \ + _mm512_setzero_ph (), \ + (__mmask32)-1, (C))) + +#define _mm512_mask_div_round_ph(A, B, C, D, E) \ + ((__m512h)__builtin_ia32_divph512_mask_round((C), (D), (A), (B), (E))) + +#define _mm512_maskz_div_round_ph(A, B, C, D) \ + ((__m512h)__builtin_ia32_divph512_mask_round((B), (C), \ + _mm512_setzero_ph (), \ + (A), (D))) +#endif /* __OPTIMIZE__ */ + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_conj_pch (__m512h __A) +{ + return (__m512h) _mm512_xor_epi32 ((__m512i) __A, _mm512_set1_epi32 (1<<31)); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_conj_pch (__m512h __W, __mmask16 __U, __m512h __A) +{ + return (__m512h) + __builtin_ia32_movaps512_mask ((__v16sf) _mm512_conj_pch (__A), + (__v16sf) __W, + (__mmask16) __U); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_conj_pch (__mmask16 __U, __m512h __A) +{ + return (__m512h) + __builtin_ia32_movaps512_mask ((__v16sf) _mm512_conj_pch (__A), + (__v16sf) _mm512_setzero_ps (), + (__mmask16) __U); +} + +/* Intrinsic vmaxph vminph. 
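These follow the same masking convention as the v[add,sub,mul,div]ph intrinsics above. A minimal sketch of that convention (assumes -mavx512fp16; values are illustrative): the _mask form keeps lanes from its first operand where the mask bit is clear, the _maskz form zeroes them:

   __m512h a = _mm512_set1_ph ((_Float16) 2.0);
   __m512h b = _mm512_set1_ph ((_Float16) 3.0);
   __mmask32 m = 0x0000FFFF;                      // low 16 lanes only
   __m512h r0 = _mm512_add_ph (a, b);             // 5.0 in all 32 lanes
   __m512h r1 = _mm512_mask_add_ph (a, m, a, b);  // 5.0 low, a high
   __m512h r2 = _mm512_maskz_mul_ph (m, a, b);    // 6.0 low, 0.0 high
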
*/ +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_cvtepi32_ph (__m512i __A) +_mm512_max_ph (__m512h __A, __m512h __B) { - return __builtin_ia32_vcvtdq2ph512_mask_round ((__v16si) __A, - _mm256_setzero_ph (), - (__mmask16) -1, - _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_maxph512_mask (__A, __B, + _mm512_setzero_ph (), + (__mmask32) -1); } -extern __inline __m256h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_cvtepi32_ph (__m256h __A, __mmask16 __B, __m512i __C) +_mm512_mask_max_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D) { - return __builtin_ia32_vcvtdq2ph512_mask_round ((__v16si) __C, - __A, - __B, - _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_maxph512_mask (__C, __D, __A, __B); } -extern __inline __m256h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_cvtepi32_ph (__mmask16 __A, __m512i __B) +_mm512_maskz_max_ph (__mmask32 __A, __m512h __B, __m512h __C) { - return __builtin_ia32_vcvtdq2ph512_mask_round ((__v16si) __B, - _mm256_setzero_ph (), - __A, - _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_maxph512_mask (__B, __C, + _mm512_setzero_ph (), __A); } -#ifdef __OPTIMIZE__ -extern __inline __m256h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_cvt_roundepi32_ph (__m512i __A, int __B) +_mm512_min_ph (__m512h __A, __m512h __B) { - return __builtin_ia32_vcvtdq2ph512_mask_round ((__v16si) __A, - _mm256_setzero_ph (), - (__mmask16) -1, - __B); + return __builtin_ia32_minph512_mask (__A, __B, + _mm512_setzero_ph (), + (__mmask32) -1); } -extern __inline __m256h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_cvt_roundepi32_ph (__m256h __A, __mmask16 __B, __m512i __C, int __D) +_mm512_mask_min_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D) { - return __builtin_ia32_vcvtdq2ph512_mask_round ((__v16si) __C, - __A, - __B, - __D); + return __builtin_ia32_minph512_mask (__C, __D, __A, __B); } -extern __inline __m256h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_cvt_roundepi32_ph (__mmask16 __A, __m512i __B, int __C) +_mm512_maskz_min_ph (__mmask32 __A, __m512h __B, __m512h __C) { - return __builtin_ia32_vcvtdq2ph512_mask_round ((__v16si) __B, - _mm256_setzero_ph (), - __A, - __C); + return __builtin_ia32_minph512_mask (__B, __C, + _mm512_setzero_ph (), __A); } -#else -#define _mm512_cvt_roundepi32_ph(A, B) \ - (__builtin_ia32_vcvtdq2ph512_mask_round ((__v16si)(A), \ - _mm256_setzero_ph (), \ - (__mmask16)-1, \ - (B))) - -#define _mm512_mask_cvt_roundepi32_ph(A, B, C, D) \ - (__builtin_ia32_vcvtdq2ph512_mask_round ((__v16si)(C), \ - (A), \ - (B), \ - (D))) - -#define _mm512_maskz_cvt_roundepi32_ph(A, B, C) \ - (__builtin_ia32_vcvtdq2ph512_mask_round ((__v16si)(B), \ - _mm256_setzero_ph (), \ - (A), \ - (C))) - -#endif /* __OPTIMIZE__ */ - -/* Intrinsics vcvtudq2ph. 
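(On the vmaxph/vminph intrinsics just added: like vmaxps/vminps, they are not symmetric in the presence of NaN; when an input is NaN the second source operand is returned. A sketch, assuming -mavx512fp16:

   __m512h x = _mm512_set1_ph ((_Float16) -1.0);
   __m512h hi = _mm512_max_ph (x, _mm512_setzero_ph ());  // 0.0 per lane
   __m512h lo = _mm512_min_ph (x, _mm512_setzero_ph ());  // -1.0 per lane

)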
*/ -extern __inline __m256h +#ifdef __OPTIMIZE__ +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_cvtepu32_ph (__m512i __A) +_mm512_max_round_ph (__m512h __A, __m512h __B, const int __C) { - return __builtin_ia32_vcvtudq2ph512_mask_round ((__v16si) __A, - _mm256_setzero_ph (), - (__mmask16) -1, - _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_maxph512_mask_round (__A, __B, + _mm512_setzero_ph (), + (__mmask32) -1, __C); } -extern __inline __m256h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_cvtepu32_ph (__m256h __A, __mmask16 __B, __m512i __C) +_mm512_mask_max_round_ph (__m512h __A, __mmask32 __B, __m512h __C, + __m512h __D, const int __E) { - return __builtin_ia32_vcvtudq2ph512_mask_round ((__v16si) __C, - __A, - __B, - _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_maxph512_mask_round (__C, __D, __A, __B, __E); } -extern __inline __m256h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_cvtepu32_ph (__mmask16 __A, __m512i __B) +_mm512_maskz_max_round_ph (__mmask32 __A, __m512h __B, __m512h __C, + const int __D) { - return __builtin_ia32_vcvtudq2ph512_mask_round ((__v16si) __B, - _mm256_setzero_ph (), - __A, - _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_maxph512_mask_round (__B, __C, + _mm512_setzero_ph (), + __A, __D); } -#ifdef __OPTIMIZE__ -extern __inline __m256h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_cvt_roundepu32_ph (__m512i __A, int __B) +_mm512_min_round_ph (__m512h __A, __m512h __B, const int __C) { - return __builtin_ia32_vcvtudq2ph512_mask_round ((__v16si) __A, - _mm256_setzero_ph (), - (__mmask16) -1, - __B); + return __builtin_ia32_minph512_mask_round (__A, __B, + _mm512_setzero_ph (), + (__mmask32) -1, __C); } -extern __inline __m256h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_cvt_roundepu32_ph (__m256h __A, __mmask16 __B, __m512i __C, int __D) +_mm512_mask_min_round_ph (__m512h __A, __mmask32 __B, __m512h __C, + __m512h __D, const int __E) { - return __builtin_ia32_vcvtudq2ph512_mask_round ((__v16si) __C, - __A, - __B, - __D); + return __builtin_ia32_minph512_mask_round (__C, __D, __A, __B, __E); } -extern __inline __m256h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_cvt_roundepu32_ph (__mmask16 __A, __m512i __B, int __C) +_mm512_maskz_min_round_ph (__mmask32 __A, __m512h __B, __m512h __C, + const int __D) { - return __builtin_ia32_vcvtudq2ph512_mask_round ((__v16si) __B, - _mm256_setzero_ph (), - __A, - __C); + return __builtin_ia32_minph512_mask_round (__B, __C, + _mm512_setzero_ph (), + __A, __D); } #else -#define _mm512_cvt_roundepu32_ph(A, B) \ - (__builtin_ia32_vcvtudq2ph512_mask_round ((__v16si)(A), \ - _mm256_setzero_ph (), \ - (__mmask16)-1, \ - B)) +#define _mm512_max_round_ph(A, B, C) \ + (__builtin_ia32_maxph512_mask_round ((A), (B), \ + _mm512_setzero_ph (), \ + (__mmask32)-1, (C))) -#define _mm512_mask_cvt_roundepu32_ph(A, B, C, D) \ - (__builtin_ia32_vcvtudq2ph512_mask_round ((__v16si)C, \ - A, \ - B, \ - D)) +#define _mm512_mask_max_round_ph(A, B, C, D, E) \ + (__builtin_ia32_maxph512_mask_round ((C), (D), (A), (B), (E))) -#define _mm512_maskz_cvt_roundepu32_ph(A, B, C) \ - (__builtin_ia32_vcvtudq2ph512_mask_round ((__v16si)B, \ - _mm256_setzero_ph (), \ - A, \ - C)) +#define 
_mm512_maskz_max_round_ph(A, B, C, D) \ + (__builtin_ia32_maxph512_mask_round ((B), (C), \ + _mm512_setzero_ph (), \ + (A), (D))) -#endif /* __OPTIMIZE__ */ +#define _mm512_min_round_ph(A, B, C) \ + (__builtin_ia32_minph512_mask_round ((A), (B), \ + _mm512_setzero_ph (), \ + (__mmask32)-1, (C))) -/* Intrinsics vcvtph2qq. */ -extern __inline __m512i -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_cvtph_epi64 (__m128h __A) -{ - return __builtin_ia32_vcvtph2qq512_mask_round (__A, - _mm512_setzero_si512 (), - (__mmask8) -1, - _MM_FROUND_CUR_DIRECTION); -} +#define _mm512_mask_min_round_ph(A, B, C, D, E) \ + (__builtin_ia32_minph512_mask_round ((C), (D), (A), (B), (E))) -extern __inline __m512i -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_cvtph_epi64 (__m512i __A, __mmask8 __B, __m128h __C) -{ - return __builtin_ia32_vcvtph2qq512_mask_round (__C, __A, __B, - _MM_FROUND_CUR_DIRECTION); -} +#define _mm512_maskz_min_round_ph(A, B, C, D) \ + (__builtin_ia32_minph512_mask_round ((B), (C), \ + _mm512_setzero_ph (), \ + (A), (D))) +#endif /* __OPTIMIZE__ */ -extern __inline __m512i +/* Intrinsics vcmpph. */ +#ifdef __OPTIMIZE__ extern __inline __mmask32 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_cvtph_epi64 (__mmask8 __A, __m128h __B) +_mm512_cmp_ph_mask (__m512h __A, __m512h __B, const int __C) { - return __builtin_ia32_vcvtph2qq512_mask_round (__B, - _mm512_setzero_si512 (), - __A, - _MM_FROUND_CUR_DIRECTION); + return (__mmask32) __builtin_ia32_cmpph512_mask (__A, __B, __C, + (__mmask32) -1); } -#ifdef __OPTIMIZE__ -extern __inline __m512i +extern __inline __mmask32 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_cvt_roundph_epi64 (__m128h __A, int __B) +_mm512_mask_cmp_ph_mask (__mmask32 __A, __m512h __B, __m512h __C, + const int __D) { - return __builtin_ia32_vcvtph2qq512_mask_round (__A, - _mm512_setzero_si512 (), - (__mmask8) -1, - __B); + return (__mmask32) __builtin_ia32_cmpph512_mask (__B, __C, __D, + __A); } -extern __inline __m512i +extern __inline __mmask32 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_cvt_roundph_epi64 (__m512i __A, __mmask8 __B, __m128h __C, int __D) +_mm512_cmp_round_ph_mask (__m512h __A, __m512h __B, const int __C, + const int __D) { - return __builtin_ia32_vcvtph2qq512_mask_round (__C, __A, __B, __D); + return (__mmask32) __builtin_ia32_cmpph512_mask_round (__A, __B, + __C, (__mmask32) -1, + __D); } -extern __inline __m512i +extern __inline __mmask32 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_cvt_roundph_epi64 (__mmask8 __A, __m128h __B, int __C) +_mm512_mask_cmp_round_ph_mask (__mmask32 __A, __m512h __B, __m512h __C, + const int __D, const int __E) { - return __builtin_ia32_vcvtph2qq512_mask_round (__B, - _mm512_setzero_si512 (), - __A, - __C); + return (__mmask32) __builtin_ia32_cmpph512_mask_round (__B, __C, + __D, __A, + __E); } -#else -#define _mm512_cvt_roundph_epi64(A, B) \ - (__builtin_ia32_vcvtph2qq512_mask_round ((A), \ - _mm512_setzero_si512 (), \ - (__mmask8)-1, \ - (B))) - -#define _mm512_mask_cvt_roundph_epi64(A, B, C, D) \ - (__builtin_ia32_vcvtph2qq512_mask_round ((C), (A), (B), (D))) +#else +#define _mm512_cmp_ph_mask(A, B, C) \ + (__builtin_ia32_cmpph512_mask ((A), (B), (C), (-1))) -#define _mm512_maskz_cvt_roundph_epi64(A, B, C) \ - (__builtin_ia32_vcvtph2qq512_mask_round ((B), \ - _mm512_setzero_si512 (), \ - (A), \ - (C))) +#define 
_mm512_mask_cmp_ph_mask(A, B, C, D) \ + (__builtin_ia32_cmpph512_mask ((B), (C), (D), (A))) + +#define _mm512_cmp_round_ph_mask(A, B, C, D) \ + (__builtin_ia32_cmpph512_mask_round ((A), (B), (C), (-1), (D))) + +#define _mm512_mask_cmp_round_ph_mask(A, B, C, D, E) \ + (__builtin_ia32_cmpph512_mask_round ((B), (C), (D), (A), (E))) #endif /* __OPTIMIZE__ */ -/* Intrinsics vcvtph2uqq. */ -extern __inline __m512i +/* Intrinsics vsqrtph. */ +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_cvtph_epu64 (__m128h __A) +_mm512_sqrt_ph (__m512h __A) { - return __builtin_ia32_vcvtph2uqq512_mask_round (__A, - _mm512_setzero_si512 (), - (__mmask8) -1, - _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_sqrtph512_mask_round (__A, + _mm512_setzero_ph(), + (__mmask32) -1, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m512i +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_cvtph_epu64 (__m512i __A, __mmask8 __B, __m128h __C) +_mm512_mask_sqrt_ph (__m512h __A, __mmask32 __B, __m512h __C) { - return __builtin_ia32_vcvtph2uqq512_mask_round (__C, __A, __B, - _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_sqrtph512_mask_round (__C, __A, __B, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m512i +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_cvtph_epu64 (__mmask8 __A, __m128h __B) +_mm512_maskz_sqrt_ph (__mmask32 __A, __m512h __B) { - return __builtin_ia32_vcvtph2uqq512_mask_round (__B, - _mm512_setzero_si512 (), - __A, - _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_sqrtph512_mask_round (__B, + _mm512_setzero_ph (), + __A, + _MM_FROUND_CUR_DIRECTION); } #ifdef __OPTIMIZE__ - -extern __inline __m512i +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_cvt_roundph_epu64 (__m128h __A, int __B) +_mm512_sqrt_round_ph (__m512h __A, const int __B) { - return __builtin_ia32_vcvtph2uqq512_mask_round (__A, - _mm512_setzero_si512 (), - (__mmask8) -1, - __B); + return __builtin_ia32_sqrtph512_mask_round (__A, + _mm512_setzero_ph(), + (__mmask32) -1, __B); } -extern __inline __m512i +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_cvt_roundph_epu64 (__m512i __A, __mmask8 __B, __m128h __C, int __D) +_mm512_mask_sqrt_round_ph (__m512h __A, __mmask32 __B, __m512h __C, + const int __D) { - return __builtin_ia32_vcvtph2uqq512_mask_round (__C, __A, __B, __D); + return __builtin_ia32_sqrtph512_mask_round (__C, __A, __B, __D); } -extern __inline __m512i +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_cvt_roundph_epu64 (__mmask8 __A, __m128h __B, int __C) +_mm512_maskz_sqrt_round_ph (__mmask32 __A, __m512h __B, const int __C) { - return __builtin_ia32_vcvtph2uqq512_mask_round (__B, - _mm512_setzero_si512 (), - __A, - __C); + return __builtin_ia32_sqrtph512_mask_round (__B, + _mm512_setzero_ph (), + __A, __C); } #else -#define _mm512_cvt_roundph_epu64(A, B) \ - (__builtin_ia32_vcvtph2uqq512_mask_round ((A), \ - _mm512_setzero_si512 (), \ - (__mmask8)-1, \ - (B))) +#define _mm512_sqrt_round_ph(A, B) \ + (__builtin_ia32_sqrtph512_mask_round ((A), \ + _mm512_setzero_ph (), \ + (__mmask32)-1, (B))) -#define _mm512_mask_cvt_roundph_epu64(A, B, C, D) \ - (__builtin_ia32_vcvtph2uqq512_mask_round ((C), (A), (B), (D))) +#define _mm512_mask_sqrt_round_ph(A, B, C, D) \ + 
(__builtin_ia32_sqrtph512_mask_round ((C), (A), (B), (D))) -#define _mm512_maskz_cvt_roundph_epu64(A, B, C) \ - (__builtin_ia32_vcvtph2uqq512_mask_round ((B), \ - _mm512_setzero_si512 (), \ - (A), \ - (C))) +#define _mm512_maskz_sqrt_round_ph(A, B, C) \ + (__builtin_ia32_sqrtph512_mask_round ((B), \ + _mm512_setzero_ph (), \ + (A), (C))) #endif /* __OPTIMIZE__ */ -/* Intrinsics vcvttph2qq. */ -extern __inline __m512i +/* Intrinsics vrsqrtph. */ +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_cvttph_epi64 (__m128h __A) +_mm512_rsqrt_ph (__m512h __A) { - return __builtin_ia32_vcvttph2qq512_mask_round (__A, - _mm512_setzero_si512 (), - (__mmask8) -1, - _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_rsqrtph512_mask (__A, _mm512_setzero_ph (), + (__mmask32) -1); } -extern __inline __m512i +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_cvttph_epi64 (__m512i __A, __mmask8 __B, __m128h __C) +_mm512_mask_rsqrt_ph (__m512h __A, __mmask32 __B, __m512h __C) { - return __builtin_ia32_vcvttph2qq512_mask_round (__C, __A, __B, - _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_rsqrtph512_mask (__C, __A, __B); } -extern __inline __m512i +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_cvttph_epi64 (__mmask8 __A, __m128h __B) +_mm512_maskz_rsqrt_ph (__mmask32 __A, __m512h __B) { - return __builtin_ia32_vcvttph2qq512_mask_round (__B, - _mm512_setzero_si512 (), - __A, - _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_rsqrtph512_mask (__B, _mm512_setzero_ph (), + __A); } -#ifdef __OPTIMIZE__ -extern __inline __m512i +/* Intrinsics vrcpph. */ +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_cvtt_roundph_epi64 (__m128h __A, int __B) +_mm512_rcp_ph (__m512h __A) { - return __builtin_ia32_vcvttph2qq512_mask_round (__A, - _mm512_setzero_si512 (), - (__mmask8) -1, - __B); + return __builtin_ia32_rcpph512_mask (__A, _mm512_setzero_ph (), + (__mmask32) -1); } -extern __inline __m512i +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_cvtt_roundph_epi64 (__m512i __A, __mmask8 __B, __m128h __C, int __D) +_mm512_mask_rcp_ph (__m512h __A, __mmask32 __B, __m512h __C) { - return __builtin_ia32_vcvttph2qq512_mask_round (__C, __A, __B, __D); + return __builtin_ia32_rcpph512_mask (__C, __A, __B); } -extern __inline __m512i +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_cvtt_roundph_epi64 (__mmask8 __A, __m128h __B, int __C) +_mm512_maskz_rcp_ph (__mmask32 __A, __m512h __B) { - return __builtin_ia32_vcvttph2qq512_mask_round (__B, - _mm512_setzero_si512 (), - __A, - __C); + return __builtin_ia32_rcpph512_mask (__B, _mm512_setzero_ph (), + __A); } -#else -#define _mm512_cvtt_roundph_epi64(A, B) \ - (__builtin_ia32_vcvttph2qq512_mask_round ((A), \ - _mm512_setzero_si512 (), \ - (__mmask8)-1, \ - (B))) - -#define _mm512_mask_cvtt_roundph_epi64(A, B, C, D) \ - __builtin_ia32_vcvttph2qq512_mask_round ((C), (A), (B), (D)) - -#define _mm512_maskz_cvtt_roundph_epi64(A, B, C) \ - (__builtin_ia32_vcvttph2qq512_mask_round ((B), \ - _mm512_setzero_si512 (), \ - (A), \ - (C))) - -#endif /* __OPTIMIZE__ */ - -/* Intrinsics vcvttph2uqq. */ -extern __inline __m512i +/* Intrinsics vscalefph. 
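First, a sketch tying together the vcmpph and vsqrtph intrinsics defined above (assumes -mavx512fp16 and the _CMP_* predicates from <immintrin.h>; values are illustrative): take square roots only where the input is non-negative, zeroing the other lanes:

   __m512h v = _mm512_set1_ph ((_Float16) 4.0);
   __mmask32 ok = _mm512_cmp_ph_mask (v, _mm512_setzero_ph (),
                                      _CMP_GE_OQ);
   __m512h r = _mm512_maskz_sqrt_ph (ok, v);   // 2.0 where ok, else 0.0

Note that vrsqrtph and vrcpph are fast approximations rather than correctly rounded results.
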
*/ +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_cvttph_epu64 (__m128h __A) +_mm512_scalef_ph (__m512h __A, __m512h __B) { - return __builtin_ia32_vcvttph2uqq512_mask_round (__A, - _mm512_setzero_si512 (), - (__mmask8) -1, - _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_scalefph512_mask_round (__A, __B, + _mm512_setzero_ph (), + (__mmask32) -1, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m512i +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_cvttph_epu64 (__m512i __A, __mmask8 __B, __m128h __C) +_mm512_mask_scalef_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D) { - return __builtin_ia32_vcvttph2uqq512_mask_round (__C, __A, __B, - _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_scalefph512_mask_round (__C, __D, __A, __B, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m512i +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_cvttph_epu64 (__mmask8 __A, __m128h __B) +_mm512_maskz_scalef_ph (__mmask32 __A, __m512h __B, __m512h __C) { - return __builtin_ia32_vcvttph2uqq512_mask_round (__B, - _mm512_setzero_si512 (), - __A, - _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_scalefph512_mask_round (__B, __C, + _mm512_setzero_ph (), + __A, + _MM_FROUND_CUR_DIRECTION); } #ifdef __OPTIMIZE__ -extern __inline __m512i +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_cvtt_roundph_epu64 (__m128h __A, int __B) +_mm512_scalef_round_ph (__m512h __A, __m512h __B, const int __C) { - return __builtin_ia32_vcvttph2uqq512_mask_round (__A, - _mm512_setzero_si512 (), - (__mmask8) -1, - __B); + return __builtin_ia32_scalefph512_mask_round (__A, __B, + _mm512_setzero_ph (), + (__mmask32) -1, __C); } -extern __inline __m512i +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_cvtt_roundph_epu64 (__m512i __A, __mmask8 __B, __m128h __C, int __D) +_mm512_mask_scalef_round_ph (__m512h __A, __mmask32 __B, __m512h __C, + __m512h __D, const int __E) { - return __builtin_ia32_vcvttph2uqq512_mask_round (__C, __A, __B, __D); + return __builtin_ia32_scalefph512_mask_round (__C, __D, __A, __B, + __E); } -extern __inline __m512i +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_cvtt_roundph_epu64 (__mmask8 __A, __m128h __B, int __C) +_mm512_maskz_scalef_round_ph (__mmask32 __A, __m512h __B, __m512h __C, + const int __D) { - return __builtin_ia32_vcvttph2uqq512_mask_round (__B, - _mm512_setzero_si512 (), - __A, - __C); + return __builtin_ia32_scalefph512_mask_round (__B, __C, + _mm512_setzero_ph (), + __A, __D); } #else -#define _mm512_cvtt_roundph_epu64(A, B) \ - (__builtin_ia32_vcvttph2uqq512_mask_round ((A), \ - _mm512_setzero_si512 (), \ - (__mmask8)-1, \ - (B))) - -#define _mm512_mask_cvtt_roundph_epu64(A, B, C, D) \ - __builtin_ia32_vcvttph2uqq512_mask_round ((C), (A), (B), (D)) +#define _mm512_scalef_round_ph(A, B, C) \ + (__builtin_ia32_scalefph512_mask_round ((A), (B), \ + _mm512_setzero_ph (), \ + (__mmask32)-1, (C))) -#define _mm512_maskz_cvtt_roundph_epu64(A, B, C) \ - (__builtin_ia32_vcvttph2uqq512_mask_round ((B), \ - _mm512_setzero_si512 (), \ - (A), \ - (C))) +#define _mm512_mask_scalef_round_ph(A, B, C, D, E) \ + (__builtin_ia32_scalefph512_mask_round ((C), (D), (A), (B), (E))) -#endif /* __OPTIMIZE__ */ +#define _mm512_maskz_scalef_round_ph(A, B, C, D) \ + 
(__builtin_ia32_scalefph512_mask_round ((B), (C), \ + _mm512_setzero_ph (), \ + (A), (D))) -/* Intrinsics vcvtqq2ph. */ -extern __inline __m128h - __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_cvtepi64_ph (__m512i __A) +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vreduceph. */ +#ifdef __OPTIMIZE__ +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_reduce_ph (__m512h __A, int __B) { - return __builtin_ia32_vcvtqq2ph512_mask_round ((__v8di) __A, - _mm_setzero_ph (), - (__mmask8) -1, - _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_reduceph512_mask_round (__A, __B, + _mm512_setzero_ph (), + (__mmask32) -1, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_cvtepi64_ph (__m128h __A, __mmask8 __B, __m512i __C) +_mm512_mask_reduce_ph (__m512h __A, __mmask32 __B, __m512h __C, int __D) { - return __builtin_ia32_vcvtqq2ph512_mask_round ((__v8di) __C, - __A, - __B, - _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_reduceph512_mask_round (__C, __D, __A, __B, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_cvtepi64_ph (__mmask8 __A, __m512i __B) +_mm512_maskz_reduce_ph (__mmask32 __A, __m512h __B, int __C) { - return __builtin_ia32_vcvtqq2ph512_mask_round ((__v8di) __B, - _mm_setzero_ph (), - __A, - _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_reduceph512_mask_round (__B, __C, + _mm512_setzero_ph (), + __A, + _MM_FROUND_CUR_DIRECTION); } -#ifdef __OPTIMIZE__ -extern __inline __m128h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_cvt_roundepi64_ph (__m512i __A, int __B) +_mm512_reduce_round_ph (__m512h __A, int __B, const int __C) { - return __builtin_ia32_vcvtqq2ph512_mask_round ((__v8di) __A, - _mm_setzero_ph (), - (__mmask8) -1, - __B); + return __builtin_ia32_reduceph512_mask_round (__A, __B, + _mm512_setzero_ph (), + (__mmask32) -1, __C); } -extern __inline __m128h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_cvt_roundepi64_ph (__m128h __A, __mmask8 __B, __m512i __C, int __D) +_mm512_mask_reduce_round_ph (__m512h __A, __mmask32 __B, __m512h __C, + int __D, const int __E) { - return __builtin_ia32_vcvtqq2ph512_mask_round ((__v8di) __C, - __A, - __B, - __D); + return __builtin_ia32_reduceph512_mask_round (__C, __D, __A, __B, + __E); } -extern __inline __m128h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_cvt_roundepi64_ph (__mmask8 __A, __m512i __B, int __C) +_mm512_maskz_reduce_round_ph (__mmask32 __A, __m512h __B, int __C, + const int __D) { - return __builtin_ia32_vcvtqq2ph512_mask_round ((__v8di) __B, - _mm_setzero_ph (), - __A, - __C); + return __builtin_ia32_reduceph512_mask_round (__B, __C, + _mm512_setzero_ph (), + __A, __D); } #else -#define _mm512_cvt_roundepi64_ph(A, B) \ - (__builtin_ia32_vcvtqq2ph512_mask_round ((__v8di)(A), \ - _mm_setzero_ph (), \ - (__mmask8)-1, \ - (B))) +#define _mm512_reduce_ph(A, B) \ + (__builtin_ia32_reduceph512_mask_round ((A), (B), \ + _mm512_setzero_ph (), \ + (__mmask32)-1, \ + _MM_FROUND_CUR_DIRECTION)) -#define _mm512_mask_cvt_roundepi64_ph(A, B, C, D) \ - (__builtin_ia32_vcvtqq2ph512_mask_round ((__v8di)(C), (A), (B), (D))) +#define _mm512_mask_reduce_ph(A, B, C, D) \ 
+ (__builtin_ia32_reduceph512_mask_round ((C), (D), (A), (B), \ + _MM_FROUND_CUR_DIRECTION)) -#define _mm512_maskz_cvt_roundepi64_ph(A, B, C) \ - (__builtin_ia32_vcvtqq2ph512_mask_round ((__v8di)(B), \ - _mm_setzero_ph (), \ - (A), \ - (C))) +#define _mm512_maskz_reduce_ph(A, B, C) \ + (__builtin_ia32_reduceph512_mask_round ((B), (C), \ + _mm512_setzero_ph (), \ + (A), _MM_FROUND_CUR_DIRECTION)) + +#define _mm512_reduce_round_ph(A, B, C) \ + (__builtin_ia32_reduceph512_mask_round ((A), (B), \ + _mm512_setzero_ph (), \ + (__mmask32)-1, (C))) + +#define _mm512_mask_reduce_round_ph(A, B, C, D, E) \ + (__builtin_ia32_reduceph512_mask_round ((C), (D), (A), (B), (E))) + +#define _mm512_maskz_reduce_round_ph(A, B, C, D) \ + (__builtin_ia32_reduceph512_mask_round ((B), (C), \ + _mm512_setzero_ph (), \ + (A), (D))) #endif /* __OPTIMIZE__ */ -/* Intrinsics vcvtuqq2ph. */ -extern __inline __m128h -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_cvtepu64_ph (__m512i __A) +/* Intrinsics vrndscaleph. */ +#ifdef __OPTIMIZE__ +extern __inline __m512h + __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_roundscale_ph (__m512h __A, int __B) { - return __builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di) __A, - _mm_setzero_ph (), - (__mmask8) -1, + return __builtin_ia32_rndscaleph512_mask_round (__A, __B, + _mm512_setzero_ph (), + (__mmask32) -1, _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_cvtepu64_ph (__m128h __A, __mmask8 __B, __m512i __C) +_mm512_mask_roundscale_ph (__m512h __A, __mmask32 __B, + __m512h __C, int __D) { - return __builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di) __C, - __A, - __B, + return __builtin_ia32_rndscaleph512_mask_round (__C, __D, __A, __B, _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_cvtepu64_ph (__mmask8 __A, __m512i __B) +_mm512_maskz_roundscale_ph (__mmask32 __A, __m512h __B, int __C) { - return __builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di) __B, - _mm_setzero_ph (), + return __builtin_ia32_rndscaleph512_mask_round (__B, __C, + _mm512_setzero_ph (), __A, _MM_FROUND_CUR_DIRECTION); } -#ifdef __OPTIMIZE__ -extern __inline __m128h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_cvt_roundepu64_ph (__m512i __A, int __B) +_mm512_roundscale_round_ph (__m512h __A, int __B, const int __C) { - return __builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di) __A, - _mm_setzero_ph (), - (__mmask8) -1, - __B); + return __builtin_ia32_rndscaleph512_mask_round (__A, __B, + _mm512_setzero_ph (), + (__mmask32) -1, + __C); } -extern __inline __m128h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_cvt_roundepu64_ph (__m128h __A, __mmask8 __B, __m512i __C, int __D) +_mm512_mask_roundscale_round_ph (__m512h __A, __mmask32 __B, + __m512h __C, int __D, const int __E) { - return __builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di) __C, - __A, - __B, - __D); + return __builtin_ia32_rndscaleph512_mask_round (__C, __D, __A, + __B, __E); } -extern __inline __m128h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_cvt_roundepu64_ph (__mmask8 __A, __m512i __B, int __C) +_mm512_maskz_roundscale_round_ph (__mmask32 __A, __m512h __B, int __C, + const int __D) { - return 
__builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di) __B, - _mm_setzero_ph (), - __A, - __C); + return __builtin_ia32_rndscaleph512_mask_round (__B, __C, + _mm512_setzero_ph (), + __A, __D); } #else -#define _mm512_cvt_roundepu64_ph(A, B) \ - (__builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di)(A), \ - _mm_setzero_ph (), \ - (__mmask8)-1, \ - (B))) +#define _mm512_roundscale_ph(A, B) \ + (__builtin_ia32_rndscaleph512_mask_round ((A), (B), \ + _mm512_setzero_ph (), \ + (__mmask32)-1, \ + _MM_FROUND_CUR_DIRECTION)) -#define _mm512_mask_cvt_roundepu64_ph(A, B, C, D) \ - (__builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di)(C), (A), (B), (D))) +#define _mm512_mask_roundscale_ph(A, B, C, D) \ + (__builtin_ia32_rndscaleph512_mask_round ((C), (D), (A), (B), \ + _MM_FROUND_CUR_DIRECTION)) -#define _mm512_maskz_cvt_roundepu64_ph(A, B, C) \ - (__builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di)(B), \ - _mm_setzero_ph (), \ - (A), \ - (C))) +#define _mm512_maskz_roundscale_ph(A, B, C) \ + (__builtin_ia32_rndscaleph512_mask_round ((B), (C), \ + _mm512_setzero_ph (), \ + (A), \ + _MM_FROUND_CUR_DIRECTION)) +#define _mm512_roundscale_round_ph(A, B, C) \ + (__builtin_ia32_rndscaleph512_mask_round ((A), (B), \ + _mm512_setzero_ph (), \ + (__mmask32)-1, (C))) + +#define _mm512_mask_roundscale_round_ph(A, B, C, D, E) \ + (__builtin_ia32_rndscaleph512_mask_round ((C), (D), (A), (B), (E))) + +#define _mm512_maskz_roundscale_round_ph(A, B, C, D) \ + (__builtin_ia32_rndscaleph512_mask_round ((B), (C), \ + _mm512_setzero_ph (), \ + (A), (D))) #endif /* __OPTIMIZE__ */ -/* Intrinsics vcvtph2w. */ -extern __inline __m512i +/* Intrinsics vfpclassph. */ +#ifdef __OPTIMIZE__ +extern __inline __mmask32 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fpclass_ph_mask (__mmask32 __U, __m512h __A, + const int __imm) +{ + return (__mmask32) __builtin_ia32_fpclassph512_mask ((__v32hf) __A, + __imm, __U); +} + +extern __inline __mmask32 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fpclass_ph_mask (__m512h __A, const int __imm) +{ + return (__mmask32) __builtin_ia32_fpclassph512_mask ((__v32hf) __A, + __imm, + (__mmask32) -1); +} + +#else +#define _mm512_mask_fpclass_ph_mask(u, x, c) \ + ((__mmask32) __builtin_ia32_fpclassph512_mask ((__v32hf) (__m512h) (x), \ + (int) (c), (__mmask32)(u))) + +#define _mm512_fpclass_ph_mask(x, c) \ + ((__mmask32) __builtin_ia32_fpclassph512_mask ((__v32hf) (__m512h) (x), \ + (int) (c), (__mmask32)-1)) +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vgetexpph. 
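First, a sketch of the vfpclassph intrinsics above (assumes -mavx512fp16; the immediate uses the usual vfpclass category bits, e.g. 0x01 QNaN, 0x80 SNaN, 0x08 +Inf, 0x10 -Inf): flag the lanes that are NaN or infinite:

   __m512h v = _mm512_set1_ph ((_Float16) 1.0);
   __mmask32 bad = _mm512_fpclass_ph_mask (v, 0x81 | 0x18);  // NaN or Inf
   int nbad = __builtin_popcount ((unsigned int) bad);       // 0 here
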
*/ +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_cvtph_epi16 (__m512h __A) +_mm512_getexp_ph (__m512h __A) { - return (__m512i) - __builtin_ia32_vcvtph2w512_mask_round (__A, - (__v32hi) - _mm512_setzero_si512 (), - (__mmask32) -1, - _MM_FROUND_CUR_DIRECTION); + return (__m512h) + __builtin_ia32_getexpph512_mask ((__v32hf) __A, + (__v32hf) _mm512_setzero_ph (), + (__mmask32) -1, _MM_FROUND_CUR_DIRECTION); } -extern __inline __m512i +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_cvtph_epi16 (__m512i __A, __mmask32 __B, __m512h __C) +_mm512_mask_getexp_ph (__m512h __W, __mmask32 __U, __m512h __A) { - return (__m512i) - __builtin_ia32_vcvtph2w512_mask_round (__C, - (__v32hi) __A, - __B, - _MM_FROUND_CUR_DIRECTION); + return (__m512h) + __builtin_ia32_getexpph512_mask ((__v32hf) __A, (__v32hf) __W, + (__mmask32) __U, _MM_FROUND_CUR_DIRECTION); } -extern __inline __m512i +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_cvtph_epi16 (__mmask32 __A, __m512h __B) +_mm512_maskz_getexp_ph (__mmask32 __U, __m512h __A) { - return (__m512i) - __builtin_ia32_vcvtph2w512_mask_round (__B, - (__v32hi) - _mm512_setzero_si512 (), - __A, - _MM_FROUND_CUR_DIRECTION); + return (__m512h) + __builtin_ia32_getexpph512_mask ((__v32hf) __A, + (__v32hf) _mm512_setzero_ph (), + (__mmask32) __U, _MM_FROUND_CUR_DIRECTION); } #ifdef __OPTIMIZE__ -extern __inline __m512i +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_cvt_roundph_epi16 (__m512h __A, int __B) +_mm512_getexp_round_ph (__m512h __A, const int __R) { - return (__m512i) - __builtin_ia32_vcvtph2w512_mask_round (__A, - (__v32hi) - _mm512_setzero_si512 (), - (__mmask32) -1, - __B); + return (__m512h) __builtin_ia32_getexpph512_mask ((__v32hf) __A, + (__v32hf) + _mm512_setzero_ph (), + (__mmask32) -1, __R); } -extern __inline __m512i +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_cvt_roundph_epi16 (__m512i __A, __mmask32 __B, __m512h __C, int __D) +_mm512_mask_getexp_round_ph (__m512h __W, __mmask32 __U, __m512h __A, + const int __R) { - return (__m512i) - __builtin_ia32_vcvtph2w512_mask_round (__C, - (__v32hi) __A, - __B, - __D); + return (__m512h) __builtin_ia32_getexpph512_mask ((__v32hf) __A, + (__v32hf) __W, + (__mmask32) __U, __R); } -extern __inline __m512i +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_cvt_roundph_epi16 (__mmask32 __A, __m512h __B, int __C) +_mm512_maskz_getexp_round_ph (__mmask32 __U, __m512h __A, const int __R) { - return (__m512i) - __builtin_ia32_vcvtph2w512_mask_round (__B, - (__v32hi) - _mm512_setzero_si512 (), - __A, - __C); + return (__m512h) __builtin_ia32_getexpph512_mask ((__v32hf) __A, + (__v32hf) + _mm512_setzero_ph (), + (__mmask32) __U, __R); } #else -#define _mm512_cvt_roundph_epi16(A, B) \ - ((__m512i)__builtin_ia32_vcvtph2w512_mask_round ((A), \ - (__v32hi) \ - _mm512_setzero_si512 (), \ - (__mmask32)-1, \ - (B))) +#define _mm512_getexp_round_ph(A, R) \ + ((__m512h)__builtin_ia32_getexpph512_mask((__v32hf)(__m512h)(A), \ + (__v32hf)_mm512_setzero_ph(), (__mmask32)-1, R)) -#define _mm512_mask_cvt_roundph_epi16(A, B, C, D) \ - ((__m512i)__builtin_ia32_vcvtph2w512_mask_round ((C), \ - (__v32hi)(A), \ - (B), \ - (D))) +#define _mm512_mask_getexp_round_ph(W, U, A, R) \ + 
((__m512h)__builtin_ia32_getexpph512_mask((__v32hf)(__m512h)(A), \ + (__v32hf)(__m512h)(W), (__mmask32)(U), R)) -#define _mm512_maskz_cvt_roundph_epi16(A, B, C) \ - ((__m512i)__builtin_ia32_vcvtph2w512_mask_round ((B), \ - (__v32hi) \ - _mm512_setzero_si512 (), \ - (A), \ - (C))) +#define _mm512_maskz_getexp_round_ph(U, A, R) \ + ((__m512h)__builtin_ia32_getexpph512_mask((__v32hf)(__m512h)(A), \ + (__v32hf)_mm512_setzero_ph(), (__mmask32)(U), R)) #endif /* __OPTIMIZE__ */ -/* Intrinsics vcvtph2uw. */ -extern __inline __m512i +/* Intrinsics vgetmantph. */ +#ifdef __OPTIMIZE__ +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_cvtph_epu16 (__m512h __A) +_mm512_getmant_ph (__m512h __A, _MM_MANTISSA_NORM_ENUM __B, + _MM_MANTISSA_SIGN_ENUM __C) { - return (__m512i) - __builtin_ia32_vcvtph2uw512_mask_round (__A, - (__v32hi) - _mm512_setzero_si512 (), - (__mmask32) -1, - _MM_FROUND_CUR_DIRECTION); + return (__m512h) __builtin_ia32_getmantph512_mask ((__v32hf) __A, + (__C << 2) | __B, + _mm512_setzero_ph (), + (__mmask32) -1, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m512i +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_cvtph_epu16 (__m512i __A, __mmask32 __B, __m512h __C) +_mm512_mask_getmant_ph (__m512h __W, __mmask32 __U, __m512h __A, + _MM_MANTISSA_NORM_ENUM __B, + _MM_MANTISSA_SIGN_ENUM __C) { - return (__m512i) - __builtin_ia32_vcvtph2uw512_mask_round (__C, (__v32hi) __A, __B, - _MM_FROUND_CUR_DIRECTION); + return (__m512h) __builtin_ia32_getmantph512_mask ((__v32hf) __A, + (__C << 2) | __B, + (__v32hf) __W, __U, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m512i +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_cvtph_epu16 (__mmask32 __A, __m512h __B) +_mm512_maskz_getmant_ph (__mmask32 __U, __m512h __A, + _MM_MANTISSA_NORM_ENUM __B, + _MM_MANTISSA_SIGN_ENUM __C) { - return (__m512i) - __builtin_ia32_vcvtph2uw512_mask_round (__B, - (__v32hi) - _mm512_setzero_si512 (), - __A, - _MM_FROUND_CUR_DIRECTION); + return (__m512h) __builtin_ia32_getmantph512_mask ((__v32hf) __A, + (__C << 2) | __B, + (__v32hf) + _mm512_setzero_ph (), + __U, + _MM_FROUND_CUR_DIRECTION); } -#ifdef __OPTIMIZE__ -extern __inline __m512i +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_cvt_roundph_epu16 (__m512h __A, int __B) +_mm512_getmant_round_ph (__m512h __A, _MM_MANTISSA_NORM_ENUM __B, + _MM_MANTISSA_SIGN_ENUM __C, const int __R) { - return (__m512i) - __builtin_ia32_vcvtph2uw512_mask_round (__A, - (__v32hi) - _mm512_setzero_si512 (), - (__mmask32) -1, - __B); + return (__m512h) __builtin_ia32_getmantph512_mask ((__v32hf) __A, + (__C << 2) | __B, + _mm512_setzero_ph (), + (__mmask32) -1, __R); } -extern __inline __m512i +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_cvt_roundph_epu16 (__m512i __A, __mmask32 __B, __m512h __C, int __D) +_mm512_mask_getmant_round_ph (__m512h __W, __mmask32 __U, __m512h __A, + _MM_MANTISSA_NORM_ENUM __B, + _MM_MANTISSA_SIGN_ENUM __C, const int __R) { - return (__m512i) - __builtin_ia32_vcvtph2uw512_mask_round (__C, (__v32hi) __A, __B, __D); + return (__m512h) __builtin_ia32_getmantph512_mask ((__v32hf) __A, + (__C << 2) | __B, + (__v32hf) __W, __U, + __R); } -extern __inline __m512i +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) 
-_mm512_maskz_cvt_roundph_epu16 (__mmask32 __A, __m512h __B, int __C) +_mm512_maskz_getmant_round_ph (__mmask32 __U, __m512h __A, + _MM_MANTISSA_NORM_ENUM __B, + _MM_MANTISSA_SIGN_ENUM __C, const int __R) { - return (__m512i) - __builtin_ia32_vcvtph2uw512_mask_round (__B, - (__v32hi) - _mm512_setzero_si512 (), - __A, - __C); + return (__m512h) __builtin_ia32_getmantph512_mask ((__v32hf) __A, + (__C << 2) | __B, + (__v32hf) + _mm512_setzero_ph (), + __U, __R); } #else -#define _mm512_cvt_roundph_epu16(A, B) \ - ((__m512i) \ - __builtin_ia32_vcvtph2uw512_mask_round ((A), \ - (__v32hi) \ - _mm512_setzero_si512 (), \ - (__mmask32)-1, (B))) +#define _mm512_getmant_ph(X, B, C) \ + ((__m512h)__builtin_ia32_getmantph512_mask ((__v32hf)(__m512h)(X), \ + (int)(((C)<<2) | (B)), \ + (__v32hf)(__m512h) \ + _mm512_setzero_ph(), \ + (__mmask32)-1, \ + _MM_FROUND_CUR_DIRECTION)) -#define _mm512_mask_cvt_roundph_epu16(A, B, C, D) \ - ((__m512i) \ - __builtin_ia32_vcvtph2uw512_mask_round ((C), (__v32hi)(A), (B), (D))) +#define _mm512_mask_getmant_ph(W, U, X, B, C) \ + ((__m512h)__builtin_ia32_getmantph512_mask ((__v32hf)(__m512h)(X), \ + (int)(((C)<<2) | (B)), \ + (__v32hf)(__m512h)(W), \ + (__mmask32)(U), \ + _MM_FROUND_CUR_DIRECTION)) -#define _mm512_maskz_cvt_roundph_epu16(A, B, C) \ - ((__m512i) \ - __builtin_ia32_vcvtph2uw512_mask_round ((B), \ - (__v32hi) \ - _mm512_setzero_si512 (), \ - (A), \ - (C))) + +#define _mm512_maskz_getmant_ph(U, X, B, C) \ + ((__m512h)__builtin_ia32_getmantph512_mask ((__v32hf)(__m512h)(X), \ + (int)(((C)<<2) | (B)), \ + (__v32hf)(__m512h) \ + _mm512_setzero_ph(), \ + (__mmask32)(U), \ + _MM_FROUND_CUR_DIRECTION)) + +#define _mm512_getmant_round_ph(X, B, C, R) \ + ((__m512h)__builtin_ia32_getmantph512_mask ((__v32hf)(__m512h)(X), \ + (int)(((C)<<2) | (B)), \ + (__v32hf)(__m512h) \ + _mm512_setzero_ph(), \ + (__mmask32)-1, \ + (R))) + +#define _mm512_mask_getmant_round_ph(W, U, X, B, C, R) \ + ((__m512h)__builtin_ia32_getmantph512_mask ((__v32hf)(__m512h)(X), \ + (int)(((C)<<2) | (B)), \ + (__v32hf)(__m512h)(W), \ + (__mmask32)(U), \ + (R))) + + +#define _mm512_maskz_getmant_round_ph(U, X, B, C, R) \ + ((__m512h)__builtin_ia32_getmantph512_mask ((__v32hf)(__m512h)(X), \ + (int)(((C)<<2) | (B)), \ + (__v32hf)(__m512h) \ + _mm512_setzero_ph(), \ + (__mmask32)(U), \ + (R))) #endif /* __OPTIMIZE__ */ -/* Intrinsics vcvttph2w. */ +/* Intrinsics vcvtph2dq. 
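The vgetexpph/vgetmantph/vscalefph intrinsics above decompose and reassemble a value: with _MM_MANT_NORM_1_2 the mantissa lands in [1.0, 2.0) and x == scalef (getmant (x), getexp (x)). A sketch (assumes -mavx512fp16):

   __m512h x = _mm512_set1_ph ((_Float16) 12.0);
   __m512h e = _mm512_getexp_ph (x);                      // 3.0 per lane
   __m512h m = _mm512_getmant_ph (x, _MM_MANT_NORM_1_2,
                                  _MM_MANT_SIGN_src);     // 1.5 per lane
   __m512h y = _mm512_scalef_ph (m, e);                   // 12.0 again
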
*/ extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_cvttph_epi16 (__m512h __A) +_mm512_cvtph_epi32 (__m256h __A) { return (__m512i) - __builtin_ia32_vcvttph2w512_mask_round (__A, - (__v32hi) + __builtin_ia32_vcvtph2dq512_mask_round (__A, + (__v16si) _mm512_setzero_si512 (), - (__mmask32) -1, + (__mmask16) -1, _MM_FROUND_CUR_DIRECTION); } extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_cvttph_epi16 (__m512i __A, __mmask32 __B, __m512h __C) +_mm512_mask_cvtph_epi32 (__m512i __A, __mmask16 __B, __m256h __C) { return (__m512i) - __builtin_ia32_vcvttph2w512_mask_round (__C, - (__v32hi) __A, + __builtin_ia32_vcvtph2dq512_mask_round (__C, + (__v16si) __A, __B, _MM_FROUND_CUR_DIRECTION); } extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_cvttph_epi16 (__mmask32 __A, __m512h __B) +_mm512_maskz_cvtph_epi32 (__mmask16 __A, __m256h __B) { return (__m512i) - __builtin_ia32_vcvttph2w512_mask_round (__B, - (__v32hi) + __builtin_ia32_vcvtph2dq512_mask_round (__B, + (__v16si) _mm512_setzero_si512 (), __A, _MM_FROUND_CUR_DIRECTION); @@ -4039,2994 +4210,2868 @@ _mm512_maskz_cvttph_epi16 (__mmask32 __A, __m512h __B) #ifdef __OPTIMIZE__ extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_cvtt_roundph_epi16 (__m512h __A, int __B) +_mm512_cvt_roundph_epi32 (__m256h __A, int __B) { return (__m512i) - __builtin_ia32_vcvttph2w512_mask_round (__A, - (__v32hi) + __builtin_ia32_vcvtph2dq512_mask_round (__A, + (__v16si) _mm512_setzero_si512 (), - (__mmask32) -1, + (__mmask16) -1, __B); } extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_cvtt_roundph_epi16 (__m512i __A, __mmask32 __B, - __m512h __C, int __D) +_mm512_mask_cvt_roundph_epi32 (__m512i __A, __mmask16 __B, __m256h __C, int __D) { return (__m512i) - __builtin_ia32_vcvttph2w512_mask_round (__C, - (__v32hi) __A, + __builtin_ia32_vcvtph2dq512_mask_round (__C, + (__v16si) __A, __B, __D); } extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_cvtt_roundph_epi16 (__mmask32 __A, __m512h __B, int __C) +_mm512_maskz_cvt_roundph_epi32 (__mmask16 __A, __m256h __B, int __C) { return (__m512i) - __builtin_ia32_vcvttph2w512_mask_round (__B, - (__v32hi) + __builtin_ia32_vcvtph2dq512_mask_round (__B, + (__v16si) _mm512_setzero_si512 (), __A, __C); } #else -#define _mm512_cvtt_roundph_epi16(A, B) \ - ((__m512i) \ - __builtin_ia32_vcvttph2w512_mask_round ((A), \ - (__v32hi) \ - _mm512_setzero_si512 (), \ - (__mmask32)-1, \ +#define _mm512_cvt_roundph_epi32(A, B) \ + ((__m512i) \ + __builtin_ia32_vcvtph2dq512_mask_round ((A), \ + (__v16si) \ + _mm512_setzero_si512 (), \ + (__mmask16)-1, \ (B))) -#define _mm512_mask_cvtt_roundph_epi16(A, B, C, D) \ - ((__m512i) \ - __builtin_ia32_vcvttph2w512_mask_round ((C), \ - (__v32hi)(A), \ - (B), \ - (D))) - -#define _mm512_maskz_cvtt_roundph_epi16(A, B, C) \ - ((__m512i) \ - __builtin_ia32_vcvttph2w512_mask_round ((B), \ - (__v32hi) \ - _mm512_setzero_si512 (), \ - (A), \ - (C))) - -#endif /* __OPTIMIZE__ */ - -/* Intrinsics vcvttph2uw. 
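(On the vcvtph2dq/vcvttph2dq pair above: _mm512_cvtph_epi32 rounds under the current rounding mode, while the t-suffixed truncating form chops toward zero. A sketch, assuming -mavx512fp16:

   __m256h h = _mm256_set1_ph ((_Float16) 2.75);
   __m512i r = _mm512_cvtph_epi32 (h);    // 3 under round-to-nearest-even
   __m512i t = _mm512_cvttph_epi32 (h);   // 2, truncation

)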
*/ -extern __inline __m512i -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_cvttph_epu16 (__m512h __A) -{ - return (__m512i) - __builtin_ia32_vcvttph2uw512_mask_round (__A, - (__v32hi) - _mm512_setzero_si512 (), - (__mmask32) -1, - _MM_FROUND_CUR_DIRECTION); -} - -extern __inline __m512i -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_cvttph_epu16 (__m512i __A, __mmask32 __B, __m512h __C) -{ - return (__m512i) - __builtin_ia32_vcvttph2uw512_mask_round (__C, - (__v32hi) __A, - __B, - _MM_FROUND_CUR_DIRECTION); -} - -extern __inline __m512i -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_cvttph_epu16 (__mmask32 __A, __m512h __B) -{ - return (__m512i) - __builtin_ia32_vcvttph2uw512_mask_round (__B, - (__v32hi) - _mm512_setzero_si512 (), - __A, - _MM_FROUND_CUR_DIRECTION); -} - -#ifdef __OPTIMIZE__ -extern __inline __m512i -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_cvtt_roundph_epu16 (__m512h __A, int __B) -{ - return (__m512i) - __builtin_ia32_vcvttph2uw512_mask_round (__A, - (__v32hi) - _mm512_setzero_si512 (), - (__mmask32) -1, - __B); -} - -extern __inline __m512i -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_cvtt_roundph_epu16 (__m512i __A, __mmask32 __B, - __m512h __C, int __D) -{ - return (__m512i) - __builtin_ia32_vcvttph2uw512_mask_round (__C, - (__v32hi) __A, - __B, - __D); -} - -extern __inline __m512i -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_cvtt_roundph_epu16 (__mmask32 __A, __m512h __B, int __C) -{ - return (__m512i) - __builtin_ia32_vcvttph2uw512_mask_round (__B, - (__v32hi) - _mm512_setzero_si512 (), - __A, - __C); -} - -#else -#define _mm512_cvtt_roundph_epu16(A, B) \ - ((__m512i) \ - __builtin_ia32_vcvttph2uw512_mask_round ((A), \ - (__v32hi) \ - _mm512_setzero_si512 (), \ - (__mmask32)-1, \ - (B))) - -#define _mm512_mask_cvtt_roundph_epu16(A, B, C, D) \ - ((__m512i) \ - __builtin_ia32_vcvttph2uw512_mask_round ((C), \ - (__v32hi)(A), \ - (B), \ - (D))) - -#define _mm512_maskz_cvtt_roundph_epu16(A, B, C) \ - ((__m512i) \ - __builtin_ia32_vcvttph2uw512_mask_round ((B), \ - (__v32hi) \ - _mm512_setzero_si512 (), \ - (A), \ - (C))) - -#endif /* __OPTIMIZE__ */ - -/* Intrinsics vcvtw2ph. 
*/ -extern __inline __m512h -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_cvtepi16_ph (__m512i __A) -{ - return __builtin_ia32_vcvtw2ph512_mask_round ((__v32hi) __A, - _mm512_setzero_ph (), - (__mmask32) -1, - _MM_FROUND_CUR_DIRECTION); -} - -extern __inline __m512h -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_cvtepi16_ph (__m512h __A, __mmask32 __B, __m512i __C) -{ - return __builtin_ia32_vcvtw2ph512_mask_round ((__v32hi) __C, - __A, - __B, - _MM_FROUND_CUR_DIRECTION); -} - -extern __inline __m512h -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_cvtepi16_ph (__mmask32 __A, __m512i __B) -{ - return __builtin_ia32_vcvtw2ph512_mask_round ((__v32hi) __B, - _mm512_setzero_ph (), - __A, - _MM_FROUND_CUR_DIRECTION); -} - -#ifdef __OPTIMIZE__ -extern __inline __m512h -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_cvt_roundepi16_ph (__m512i __A, int __B) -{ - return __builtin_ia32_vcvtw2ph512_mask_round ((__v32hi) __A, - _mm512_setzero_ph (), - (__mmask32) -1, - __B); -} - -extern __inline __m512h -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_cvt_roundepi16_ph (__m512h __A, __mmask32 __B, __m512i __C, int __D) -{ - return __builtin_ia32_vcvtw2ph512_mask_round ((__v32hi) __C, - __A, - __B, - __D); -} - -extern __inline __m512h -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_cvt_roundepi16_ph (__mmask32 __A, __m512i __B, int __C) -{ - return __builtin_ia32_vcvtw2ph512_mask_round ((__v32hi) __B, - _mm512_setzero_ph (), - __A, - __C); -} - -#else -#define _mm512_cvt_roundepi16_ph(A, B) \ - (__builtin_ia32_vcvtw2ph512_mask_round ((__v32hi)(A), \ - _mm512_setzero_ph (), \ - (__mmask32)-1, \ - (B))) - -#define _mm512_mask_cvt_roundepi16_ph(A, B, C, D) \ - (__builtin_ia32_vcvtw2ph512_mask_round ((__v32hi)(C), \ - (A), \ - (B), \ - (D))) +#define _mm512_mask_cvt_roundph_epi32(A, B, C, D) \ + ((__m512i) \ + __builtin_ia32_vcvtph2dq512_mask_round ((C), (__v16si)(A), (B), (D))) -#define _mm512_maskz_cvt_roundepi16_ph(A, B, C) \ - (__builtin_ia32_vcvtw2ph512_mask_round ((__v32hi)(B), \ - _mm512_setzero_ph (), \ - (A), \ - (C))) +#define _mm512_maskz_cvt_roundph_epi32(A, B, C) \ + ((__m512i) \ + __builtin_ia32_vcvtph2dq512_mask_round ((B), \ + (__v16si) \ + _mm512_setzero_si512 (), \ + (A), \ + (C))) #endif /* __OPTIMIZE__ */ -/* Intrinsics vcvtuw2ph. */ - extern __inline __m512h - __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) - _mm512_cvtepu16_ph (__m512i __A) - { - return __builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi) __A, - _mm512_setzero_ph (), - (__mmask32) -1, - _MM_FROUND_CUR_DIRECTION); - } +/* Intrinsics vcvtph2udq. 
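The _round variants above embed an explicit rounding/SAE operand in the call instead of using the ambient MXCSR mode; the operand must be a compile-time constant. A sketch (assumes -mavx512fp16 and the _MM_FROUND_* constants from <immintrin.h>):

   __m256h h = _mm256_set1_ph ((_Float16) 2.5);
   __m512i up = _mm512_cvt_roundph_epi32 (h, _MM_FROUND_TO_POS_INF
                                             | _MM_FROUND_NO_EXC);  // 3
   __m512i dn = _mm512_cvt_roundph_epi32 (h, _MM_FROUND_TO_NEG_INF
                                             | _MM_FROUND_NO_EXC);  // 2
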
*/ +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtph_epu32 (__m256h __A) +{ + return (__m512i) + __builtin_ia32_vcvtph2udq512_mask_round (__A, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) -1, + _MM_FROUND_CUR_DIRECTION); +} -extern __inline __m512h +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_cvtepu16_ph (__m512h __A, __mmask32 __B, __m512i __C) +_mm512_mask_cvtph_epu32 (__m512i __A, __mmask16 __B, __m256h __C) { - return __builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi) __C, - __A, - __B, - _MM_FROUND_CUR_DIRECTION); + return (__m512i) + __builtin_ia32_vcvtph2udq512_mask_round (__C, + (__v16si) __A, + __B, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m512h +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_cvtepu16_ph (__mmask32 __A, __m512i __B) +_mm512_maskz_cvtph_epu32 (__mmask16 __A, __m256h __B) { - return __builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi) __B, - _mm512_setzero_ph (), - __A, - _MM_FROUND_CUR_DIRECTION); + return (__m512i) + __builtin_ia32_vcvtph2udq512_mask_round (__B, + (__v16si) + _mm512_setzero_si512 (), + __A, + _MM_FROUND_CUR_DIRECTION); } #ifdef __OPTIMIZE__ -extern __inline __m512h +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_cvt_roundepu16_ph (__m512i __A, int __B) +_mm512_cvt_roundph_epu32 (__m256h __A, int __B) { - return __builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi) __A, - _mm512_setzero_ph (), - (__mmask32) -1, - __B); + return (__m512i) + __builtin_ia32_vcvtph2udq512_mask_round (__A, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) -1, + __B); } -extern __inline __m512h +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_cvt_roundepu16_ph (__m512h __A, __mmask32 __B, __m512i __C, int __D) +_mm512_mask_cvt_roundph_epu32 (__m512i __A, __mmask16 __B, __m256h __C, int __D) { - return __builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi) __C, - __A, - __B, - __D); + return (__m512i) + __builtin_ia32_vcvtph2udq512_mask_round (__C, + (__v16si) __A, + __B, + __D); } -extern __inline __m512h +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_cvt_roundepu16_ph (__mmask32 __A, __m512i __B, int __C) +_mm512_maskz_cvt_roundph_epu32 (__mmask16 __A, __m256h __B, int __C) { - return __builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi) __B, - _mm512_setzero_ph (), - __A, - __C); + return (__m512i) + __builtin_ia32_vcvtph2udq512_mask_round (__B, + (__v16si) + _mm512_setzero_si512 (), + __A, + __C); } #else -#define _mm512_cvt_roundepu16_ph(A, B) \ - (__builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi)(A), \ - _mm512_setzero_ph (), \ - (__mmask32)-1, \ - (B))) +#define _mm512_cvt_roundph_epu32(A, B) \ + ((__m512i) \ + __builtin_ia32_vcvtph2udq512_mask_round ((A), \ + (__v16si) \ + _mm512_setzero_si512 (), \ + (__mmask16)-1, \ + (B))) -#define _mm512_mask_cvt_roundepu16_ph(A, B, C, D) \ - (__builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi)(C), \ - (A), \ - (B), \ - (D))) +#define _mm512_mask_cvt_roundph_epu32(A, B, C, D) \ + ((__m512i) \ + __builtin_ia32_vcvtph2udq512_mask_round ((C), (__v16si)(A), (B), (D))) -#define _mm512_maskz_cvt_roundepu16_ph(A, B, C) \ - (__builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi)(B), \ - _mm512_setzero_ph (), \ - (A), \ - (C))) +#define _mm512_maskz_cvt_roundph_epu32(A, B, C) \ + ((__m512i) \ + 
__builtin_ia32_vcvtph2udq512_mask_round ((B), \ + (__v16si) \ + _mm512_setzero_si512 (), \ + (A), \ + (C))) #endif /* __OPTIMIZE__ */ -/* Intrinsics vcvtsh2si, vcvtsh2us. */ -extern __inline int +/* Intrinsics vcvttph2dq. */ +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_cvtsh_i32 (__m128h __A) +_mm512_cvttph_epi32 (__m256h __A) { - return (int) __builtin_ia32_vcvtsh2si32_round (__A, _MM_FROUND_CUR_DIRECTION); + return (__m512i) + __builtin_ia32_vcvttph2dq512_mask_round (__A, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) -1, + _MM_FROUND_CUR_DIRECTION); } -extern __inline unsigned +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_cvtsh_u32 (__m128h __A) +_mm512_mask_cvttph_epi32 (__m512i __A, __mmask16 __B, __m256h __C) { - return (int) __builtin_ia32_vcvtsh2usi32_round (__A, - _MM_FROUND_CUR_DIRECTION); + return (__m512i) + __builtin_ia32_vcvttph2dq512_mask_round (__C, + (__v16si) __A, + __B, + _MM_FROUND_CUR_DIRECTION); } -#ifdef __OPTIMIZE__ -extern __inline int +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_cvt_roundsh_i32 (__m128h __A, const int __R) +_mm512_maskz_cvttph_epi32 (__mmask16 __A, __m256h __B) { - return (int) __builtin_ia32_vcvtsh2si32_round (__A, __R); + return (__m512i) + __builtin_ia32_vcvttph2dq512_mask_round (__B, + (__v16si) + _mm512_setzero_si512 (), + __A, + _MM_FROUND_CUR_DIRECTION); } -extern __inline unsigned +#ifdef __OPTIMIZE__ +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_cvt_roundsh_u32 (__m128h __A, const int __R) +_mm512_cvtt_roundph_epi32 (__m256h __A, int __B) { - return (int) __builtin_ia32_vcvtsh2usi32_round (__A, __R); + return (__m512i) + __builtin_ia32_vcvttph2dq512_mask_round (__A, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) -1, + __B); } -#else -#define _mm_cvt_roundsh_i32(A, B) \ - ((int)__builtin_ia32_vcvtsh2si32_round ((A), (B))) -#define _mm_cvt_roundsh_u32(A, B) \ - ((int)__builtin_ia32_vcvtsh2usi32_round ((A), (B))) - -#endif /* __OPTIMIZE__ */ - -#ifdef __x86_64__ -extern __inline long long +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_cvtsh_i64 (__m128h __A) +_mm512_mask_cvtt_roundph_epi32 (__m512i __A, __mmask16 __B, + __m256h __C, int __D) { - return (long long) - __builtin_ia32_vcvtsh2si64_round (__A, _MM_FROUND_CUR_DIRECTION); + return (__m512i) + __builtin_ia32_vcvttph2dq512_mask_round (__C, + (__v16si) __A, + __B, + __D); } -extern __inline unsigned long long +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_cvtsh_u64 (__m128h __A) +_mm512_maskz_cvtt_roundph_epi32 (__mmask16 __A, __m256h __B, int __C) { - return (long long) - __builtin_ia32_vcvtsh2usi64_round (__A, _MM_FROUND_CUR_DIRECTION); + return (__m512i) + __builtin_ia32_vcvttph2dq512_mask_round (__B, + (__v16si) + _mm512_setzero_si512 (), + __A, + __C); } -#ifdef __OPTIMIZE__ -extern __inline long long +#else +#define _mm512_cvtt_roundph_epi32(A, B) \ + ((__m512i) \ + __builtin_ia32_vcvttph2dq512_mask_round ((A), \ + (__v16si) \ + _mm512_setzero_si512 (), \ + (__mmask16)-1, \ + (B))) + +#define _mm512_mask_cvtt_roundph_epi32(A, B, C, D) \ + ((__m512i) \ + __builtin_ia32_vcvttph2dq512_mask_round ((C), \ + (__v16si)(A), \ + (B), \ + (D))) + +#define _mm512_maskz_cvtt_roundph_epi32(A, B, C) \ + ((__m512i) \ + __builtin_ia32_vcvttph2dq512_mask_round ((B), \ +
(__v16si) \ + _mm512_setzero_si512 (), \ + (A), \ + (C))) + +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vcvttph2udq. */ +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_cvt_roundsh_i64 (__m128h __A, const int __R) +_mm512_cvttph_epu32 (__m256h __A) { - return (long long) __builtin_ia32_vcvtsh2si64_round (__A, __R); + return (__m512i) + __builtin_ia32_vcvttph2udq512_mask_round (__A, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) -1, + _MM_FROUND_CUR_DIRECTION); } -extern __inline unsigned long long +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_cvt_roundsh_u64 (__m128h __A, const int __R) +_mm512_mask_cvttph_epu32 (__m512i __A, __mmask16 __B, __m256h __C) { - return (long long) __builtin_ia32_vcvtsh2usi64_round (__A, __R); + return (__m512i) + __builtin_ia32_vcvttph2udq512_mask_round (__C, + (__v16si) __A, + __B, + _MM_FROUND_CUR_DIRECTION); } -#else -#define _mm_cvt_roundsh_i64(A, B) \ - ((long long)__builtin_ia32_vcvtsh2si64_round ((A), (B))) -#define _mm_cvt_roundsh_u64(A, B) \ - ((long long)__builtin_ia32_vcvtsh2usi64_round ((A), (B))) - -#endif /* __OPTIMIZE__ */ -#endif /* __x86_64__ */ - -/* Intrinsics vcvttsh2si, vcvttsh2us. */ -extern __inline int +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_cvttsh_i32 (__m128h __A) +_mm512_maskz_cvttph_epu32 (__mmask16 __A, __m256h __B) { - return (int) - __builtin_ia32_vcvttsh2si32_round (__A, _MM_FROUND_CUR_DIRECTION); + return (__m512i) + __builtin_ia32_vcvttph2udq512_mask_round (__B, + (__v16si) + _mm512_setzero_si512 (), + __A, + _MM_FROUND_CUR_DIRECTION); } -extern __inline unsigned +#ifdef __OPTIMIZE__ +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_cvttsh_u32 (__m128h __A) +_mm512_cvtt_roundph_epu32 (__m256h __A, int __B) { - return (int) - __builtin_ia32_vcvttsh2usi32_round (__A, _MM_FROUND_CUR_DIRECTION); + return (__m512i) + __builtin_ia32_vcvttph2udq512_mask_round (__A, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) -1, + __B); } -#ifdef __OPTIMIZE__ -extern __inline int +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_cvtt_roundsh_i32 (__m128h __A, const int __R) +_mm512_mask_cvtt_roundph_epu32 (__m512i __A, __mmask16 __B, + __m256h __C, int __D) { - return (int) __builtin_ia32_vcvttsh2si32_round (__A, __R); + return (__m512i) + __builtin_ia32_vcvttph2udq512_mask_round (__C, + (__v16si) __A, + __B, + __D); } -extern __inline unsigned +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_cvtt_roundsh_u32 (__m128h __A, const int __R) +_mm512_maskz_cvtt_roundph_epu32 (__mmask16 __A, __m256h __B, int __C) { - return (int) __builtin_ia32_vcvttsh2usi32_round (__A, __R); + return (__m512i) + __builtin_ia32_vcvttph2udq512_mask_round (__B, + (__v16si) + _mm512_setzero_si512 (), + __A, + __C); } #else -#define _mm_cvtt_roundsh_i32(A, B) \ - ((int)__builtin_ia32_vcvttsh2si32_round ((A), (B))) -#define _mm_cvtt_roundsh_u32(A, B) \ - ((int)__builtin_ia32_vcvttsh2usi32_round ((A), (B))) +#define _mm512_cvtt_roundph_epu32(A, B) \ + ((__m512i) \ + __builtin_ia32_vcvttph2udq512_mask_round ((A), \ + (__v16si) \ + _mm512_setzero_si512 (), \ + (__mmask16)-1, \ + (B))) + +#define _mm512_mask_cvtt_roundph_epu32(A, B, C, D) \ + ((__m512i) \ + __builtin_ia32_vcvttph2udq512_mask_round ((C), \ + (__v16si)(A), \ + (B), \ + (D))) + +#define 
_mm512_maskz_cvtt_roundph_epu32(A, B, C) \ + ((__m512i) \ + __builtin_ia32_vcvttph2udq512_mask_round ((B), \ + (__v16si) \ + _mm512_setzero_si512 (), \ + (A), \ + (C))) #endif /* __OPTIMIZE__ */ -#ifdef __x86_64__ -extern __inline long long +/* Intrinsics vcvtdq2ph. */ +extern __inline __m256h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_cvttsh_i64 (__m128h __A) +_mm512_cvtepi32_ph (__m512i __A) { - return (long long) - __builtin_ia32_vcvttsh2si64_round (__A, _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_vcvtdq2ph512_mask_round ((__v16si) __A, + _mm256_setzero_ph (), + (__mmask16) -1, + _MM_FROUND_CUR_DIRECTION); } -extern __inline unsigned long long +extern __inline __m256h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_cvttsh_u64 (__m128h __A) +_mm512_mask_cvtepi32_ph (__m256h __A, __mmask16 __B, __m512i __C) { - return (long long) - __builtin_ia32_vcvttsh2usi64_round (__A, _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_vcvtdq2ph512_mask_round ((__v16si) __C, + __A, + __B, + _MM_FROUND_CUR_DIRECTION); } -#ifdef __OPTIMIZE__ -extern __inline long long +extern __inline __m256h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_cvtt_roundsh_i64 (__m128h __A, const int __R) +_mm512_maskz_cvtepi32_ph (__mmask16 __A, __m512i __B) { - return (long long) __builtin_ia32_vcvttsh2si64_round (__A, __R); + return __builtin_ia32_vcvtdq2ph512_mask_round ((__v16si) __B, + _mm256_setzero_ph (), + __A, + _MM_FROUND_CUR_DIRECTION); } -extern __inline unsigned long long +#ifdef __OPTIMIZE__ +extern __inline __m256h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_cvtt_roundsh_u64 (__m128h __A, const int __R) +_mm512_cvt_roundepi32_ph (__m512i __A, int __B) { - return (long long) __builtin_ia32_vcvttsh2usi64_round (__A, __R); + return __builtin_ia32_vcvtdq2ph512_mask_round ((__v16si) __A, + _mm256_setzero_ph (), + (__mmask16) -1, + __B); } -#else -#define _mm_cvtt_roundsh_i64(A, B) \ - ((long long)__builtin_ia32_vcvttsh2si64_round ((A), (B))) -#define _mm_cvtt_roundsh_u64(A, B) \ - ((long long)__builtin_ia32_vcvttsh2usi64_round ((A), (B))) - -#endif /* __OPTIMIZE__ */ -#endif /* __x86_64__ */ - -/* Intrinsics vcvtsi2sh, vcvtusi2sh. 
*/ -extern __inline __m128h +extern __inline __m256h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_cvti32_sh (__m128h __A, int __B) +_mm512_mask_cvt_roundepi32_ph (__m256h __A, __mmask16 __B, __m512i __C, int __D) { - return __builtin_ia32_vcvtsi2sh32_round (__A, __B, _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_vcvtdq2ph512_mask_round ((__v16si) __C, + __A, + __B, + __D); } -extern __inline __m128h +extern __inline __m256h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_cvtu32_sh (__m128h __A, unsigned int __B) +_mm512_maskz_cvt_roundepi32_ph (__mmask16 __A, __m512i __B, int __C) { - return __builtin_ia32_vcvtusi2sh32_round (__A, __B, _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_vcvtdq2ph512_mask_round ((__v16si) __B, + _mm256_setzero_ph (), + __A, + __C); } -#ifdef __OPTIMIZE__ -extern __inline __m128h +#else +#define _mm512_cvt_roundepi32_ph(A, B) \ + (__builtin_ia32_vcvtdq2ph512_mask_round ((__v16si)(A), \ + _mm256_setzero_ph (), \ + (__mmask16)-1, \ + (B))) + +#define _mm512_mask_cvt_roundepi32_ph(A, B, C, D) \ + (__builtin_ia32_vcvtdq2ph512_mask_round ((__v16si)(C), \ + (A), \ + (B), \ + (D))) + +#define _mm512_maskz_cvt_roundepi32_ph(A, B, C) \ + (__builtin_ia32_vcvtdq2ph512_mask_round ((__v16si)(B), \ + _mm256_setzero_ph (), \ + (A), \ + (C))) + +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vcvtudq2ph. */ +extern __inline __m256h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_cvt_roundi32_sh (__m128h __A, int __B, const int __R) +_mm512_cvtepu32_ph (__m512i __A) { - return __builtin_ia32_vcvtsi2sh32_round (__A, __B, __R); + return __builtin_ia32_vcvtudq2ph512_mask_round ((__v16si) __A, + _mm256_setzero_ph (), + (__mmask16) -1, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128h +extern __inline __m256h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_cvt_roundu32_sh (__m128h __A, unsigned int __B, const int __R) +_mm512_mask_cvtepu32_ph (__m256h __A, __mmask16 __B, __m512i __C) { - return __builtin_ia32_vcvtusi2sh32_round (__A, __B, __R); + return __builtin_ia32_vcvtudq2ph512_mask_round ((__v16si) __C, + __A, + __B, + _MM_FROUND_CUR_DIRECTION); } -#else -#define _mm_cvt_roundi32_sh(A, B, C) \ - (__builtin_ia32_vcvtsi2sh32_round ((A), (B), (C))) -#define _mm_cvt_roundu32_sh(A, B, C) \ - (__builtin_ia32_vcvtusi2sh32_round ((A), (B), (C))) - -#endif /* __OPTIMIZE__ */ - -#ifdef __x86_64__ -extern __inline __m128h +extern __inline __m256h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_cvti64_sh (__m128h __A, long long __B) +_mm512_maskz_cvtepu32_ph (__mmask16 __A, __m512i __B) { - return __builtin_ia32_vcvtsi2sh64_round (__A, __B, _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_vcvtudq2ph512_mask_round ((__v16si) __B, + _mm256_setzero_ph (), + __A, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128h +#ifdef __OPTIMIZE__ +extern __inline __m256h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_cvtu64_sh (__m128h __A, unsigned long long __B) +_mm512_cvt_roundepu32_ph (__m512i __A, int __B) { - return __builtin_ia32_vcvtusi2sh64_round (__A, __B, _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_vcvtudq2ph512_mask_round ((__v16si) __A, + _mm256_setzero_ph (), + (__mmask16) -1, + __B); } -#ifdef __OPTIMIZE__ -extern __inline __m128h +extern __inline __m256h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_cvt_roundi64_sh (__m128h __A, long long __B, const int __R) +_mm512_mask_cvt_roundepu32_ph 
(__m256h __A, __mmask16 __B, __m512i __C, int __D) { - return __builtin_ia32_vcvtsi2sh64_round (__A, __B, __R); + return __builtin_ia32_vcvtudq2ph512_mask_round ((__v16si) __C, + __A, + __B, + __D); } -extern __inline __m128h +extern __inline __m256h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_cvt_roundu64_sh (__m128h __A, unsigned long long __B, const int __R) +_mm512_maskz_cvt_roundepu32_ph (__mmask16 __A, __m512i __B, int __C) { - return __builtin_ia32_vcvtusi2sh64_round (__A, __B, __R); + return __builtin_ia32_vcvtudq2ph512_mask_round ((__v16si) __B, + _mm256_setzero_ph (), + __A, + __C); } -#else -#define _mm_cvt_roundi64_sh(A, B, C) \ - (__builtin_ia32_vcvtsi2sh64_round ((A), (B), (C))) -#define _mm_cvt_roundu64_sh(A, B, C) \ - (__builtin_ia32_vcvtusi2sh64_round ((A), (B), (C))) +#else +#define _mm512_cvt_roundepu32_ph(A, B) \ + (__builtin_ia32_vcvtudq2ph512_mask_round ((__v16si)(A), \ + _mm256_setzero_ph (), \ + (__mmask16)-1, \ + (B))) + +#define _mm512_mask_cvt_roundepu32_ph(A, B, C, D) \ + (__builtin_ia32_vcvtudq2ph512_mask_round ((__v16si)(C), \ + (A), \ + (B), \ + (D))) + +#define _mm512_maskz_cvt_roundepu32_ph(A, B, C) \ + (__builtin_ia32_vcvtudq2ph512_mask_round ((__v16si)(B), \ + _mm256_setzero_ph (), \ + (A), \ + (C))) #endif /* __OPTIMIZE__ */ -#endif /* __x86_64__ */ -/* Intrinsics vcvtph2pd. */ -extern __inline __m512d +/* Intrinsics vcvtph2qq. */ +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_cvtph_pd (__m128h __A) +_mm512_cvtph_epi64 (__m128h __A) { - return __builtin_ia32_vcvtph2pd512_mask_round (__A, - _mm512_setzero_pd (), + return __builtin_ia32_vcvtph2qq512_mask_round (__A, + _mm512_setzero_si512 (), (__mmask8) -1, _MM_FROUND_CUR_DIRECTION); } -extern __inline __m512d +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_cvtph_pd (__m512d __A, __mmask8 __B, __m128h __C) +_mm512_mask_cvtph_epi64 (__m512i __A, __mmask8 __B, __m128h __C) { - return __builtin_ia32_vcvtph2pd512_mask_round (__C, __A, __B, + return __builtin_ia32_vcvtph2qq512_mask_round (__C, __A, __B, _MM_FROUND_CUR_DIRECTION); } -extern __inline __m512d +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_cvtph_pd (__mmask8 __A, __m128h __B) +_mm512_maskz_cvtph_epi64 (__mmask8 __A, __m128h __B) { - return __builtin_ia32_vcvtph2pd512_mask_round (__B, - _mm512_setzero_pd (), + return __builtin_ia32_vcvtph2qq512_mask_round (__B, + _mm512_setzero_si512 (), __A, _MM_FROUND_CUR_DIRECTION); } #ifdef __OPTIMIZE__ -extern __inline __m512d +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_cvt_roundph_pd (__m128h __A, int __B) +_mm512_cvt_roundph_epi64 (__m128h __A, int __B) { - return __builtin_ia32_vcvtph2pd512_mask_round (__A, - _mm512_setzero_pd (), + return __builtin_ia32_vcvtph2qq512_mask_round (__A, + _mm512_setzero_si512 (), (__mmask8) -1, __B); } -extern __inline __m512d +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_cvt_roundph_pd (__m512d __A, __mmask8 __B, __m128h __C, int __D) +_mm512_mask_cvt_roundph_epi64 (__m512i __A, __mmask8 __B, __m128h __C, int __D) { - return __builtin_ia32_vcvtph2pd512_mask_round (__C, __A, __B, __D); + return __builtin_ia32_vcvtph2qq512_mask_round (__C, __A, __B, __D); } -extern __inline __m512d +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvt_roundph_pd (__mmask8 __A, __m128h __B, int __C) +_mm512_maskz_cvt_roundph_epi64 (__mmask8 __A, __m128h __B, int __C) { - return __builtin_ia32_vcvtph2pd512_mask_round (__B, - _mm512_setzero_pd (), + return __builtin_ia32_vcvtph2qq512_mask_round (__B, + _mm512_setzero_si512 (), __A, __C); } #else -#define _mm512_cvt_roundph_pd(A, B) \ - (__builtin_ia32_vcvtph2pd512_mask_round ((A), \ - _mm512_setzero_pd (), \ +#define _mm512_cvt_roundph_epi64(A, B) \ + (__builtin_ia32_vcvtph2qq512_mask_round ((A), \ + _mm512_setzero_si512 (), \ (__mmask8)-1, \ (B))) -#define _mm512_mask_cvt_roundph_pd(A, B, C, D) \ - (__builtin_ia32_vcvtph2pd512_mask_round ((C), (A), (B), (D))) +#define _mm512_mask_cvt_roundph_epi64(A, B, C, D) \ + (__builtin_ia32_vcvtph2qq512_mask_round ((C), (A), (B), (D))) -#define _mm512_maskz_cvt_roundph_pd(A, B, C) \ - (__builtin_ia32_vcvtph2pd512_mask_round ((B), \ - _mm512_setzero_pd (), \ - (A), \ +#define _mm512_maskz_cvt_roundph_epi64(A, B, C) \ + (__builtin_ia32_vcvtph2qq512_mask_round ((B), \ + _mm512_setzero_si512 (), \ + (A), \ (C))) #endif /* __OPTIMIZE__ */ -/* Intrinsics vcvtph2psx. */ -extern __inline __m512 +/* Intrinsics vcvtph2uqq. */ +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_cvtxph_ps (__m256h __A) +_mm512_cvtph_epu64 (__m128h __A) { - return __builtin_ia32_vcvtph2psx512_mask_round (__A, - _mm512_setzero_ps (), - (__mmask16) -1, + return __builtin_ia32_vcvtph2uqq512_mask_round (__A, + _mm512_setzero_si512 (), + (__mmask8) -1, _MM_FROUND_CUR_DIRECTION); } -extern __inline __m512 +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_cvtxph_ps (__m512 __A, __mmask16 __B, __m256h __C) +_mm512_mask_cvtph_epu64 (__m512i __A, __mmask8 __B, __m128h __C) { - return __builtin_ia32_vcvtph2psx512_mask_round (__C, __A, __B, + return __builtin_ia32_vcvtph2uqq512_mask_round (__C, __A, __B, _MM_FROUND_CUR_DIRECTION); } -extern __inline __m512 +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_cvtxph_ps (__mmask16 __A, __m256h __B) +_mm512_maskz_cvtph_epu64 (__mmask8 __A, __m128h __B) { - return __builtin_ia32_vcvtph2psx512_mask_round (__B, - _mm512_setzero_ps (), + return __builtin_ia32_vcvtph2uqq512_mask_round (__B, + _mm512_setzero_si512 (), __A, _MM_FROUND_CUR_DIRECTION); } #ifdef __OPTIMIZE__ -extern __inline __m512 + +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_cvtx_roundph_ps (__m256h __A, int __B) +_mm512_cvt_roundph_epu64 (__m128h __A, int __B) { - return __builtin_ia32_vcvtph2psx512_mask_round (__A, - _mm512_setzero_ps (), - (__mmask16) -1, + return __builtin_ia32_vcvtph2uqq512_mask_round (__A, + _mm512_setzero_si512 (), + (__mmask8) -1, __B); } -extern __inline __m512 +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_cvtx_roundph_ps (__m512 __A, __mmask16 __B, __m256h __C, int __D) +_mm512_mask_cvt_roundph_epu64 (__m512i __A, __mmask8 __B, __m128h __C, int __D) { - return __builtin_ia32_vcvtph2psx512_mask_round (__C, __A, __B, __D); + return __builtin_ia32_vcvtph2uqq512_mask_round (__C, __A, __B, __D); } -extern __inline __m512 +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_cvtx_roundph_ps (__mmask16 __A, __m256h __B, int __C) +_mm512_maskz_cvt_roundph_epu64 (__mmask8 __A, __m128h __B, int __C) { - return 
__builtin_ia32_vcvtph2psx512_mask_round (__B, - _mm512_setzero_ps (), + return __builtin_ia32_vcvtph2uqq512_mask_round (__B, + _mm512_setzero_si512 (), __A, __C); } #else -#define _mm512_cvtx_roundph_ps(A, B) \ - (__builtin_ia32_vcvtph2psx512_mask_round ((A), \ - _mm512_setzero_ps (), \ - (__mmask16)-1, \ +#define _mm512_cvt_roundph_epu64(A, B) \ + (__builtin_ia32_vcvtph2uqq512_mask_round ((A), \ + _mm512_setzero_si512 (), \ + (__mmask8)-1, \ (B))) -#define _mm512_mask_cvtx_roundph_ps(A, B, C, D) \ - (__builtin_ia32_vcvtph2psx512_mask_round ((C), (A), (B), (D))) +#define _mm512_mask_cvt_roundph_epu64(A, B, C, D) \ + (__builtin_ia32_vcvtph2uqq512_mask_round ((C), (A), (B), (D))) -#define _mm512_maskz_cvtx_roundph_ps(A, B, C) \ - (__builtin_ia32_vcvtph2psx512_mask_round ((B), \ - _mm512_setzero_ps (), \ +#define _mm512_maskz_cvt_roundph_epu64(A, B, C) \ + (__builtin_ia32_vcvtph2uqq512_mask_round ((B), \ + _mm512_setzero_si512 (), \ (A), \ (C))) + #endif /* __OPTIMIZE__ */ -/* Intrinsics vcvtps2ph. */ -extern __inline __m256h +/* Intrinsics vcvttph2qq. */ +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_cvtxps_ph (__m512 __A) +_mm512_cvttph_epi64 (__m128h __A) { - return __builtin_ia32_vcvtps2phx512_mask_round ((__v16sf) __A, - _mm256_setzero_ph (), - (__mmask16) -1, + return __builtin_ia32_vcvttph2qq512_mask_round (__A, + _mm512_setzero_si512 (), + (__mmask8) -1, _MM_FROUND_CUR_DIRECTION); } -extern __inline __m256h +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_cvtxps_ph (__m256h __A, __mmask16 __B, __m512 __C) +_mm512_mask_cvttph_epi64 (__m512i __A, __mmask8 __B, __m128h __C) { - return __builtin_ia32_vcvtps2phx512_mask_round ((__v16sf) __C, - __A, __B, + return __builtin_ia32_vcvttph2qq512_mask_round (__C, __A, __B, _MM_FROUND_CUR_DIRECTION); } -extern __inline __m256h +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_cvtxps_ph (__mmask16 __A, __m512 __B) +_mm512_maskz_cvttph_epi64 (__mmask8 __A, __m128h __B) { - return __builtin_ia32_vcvtps2phx512_mask_round ((__v16sf) __B, - _mm256_setzero_ph (), + return __builtin_ia32_vcvttph2qq512_mask_round (__B, + _mm512_setzero_si512 (), __A, - _MM_FROUND_CUR_DIRECTION); -} - -#ifdef __OPTIMIZE__ -extern __inline __m256h -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_cvtx_roundps_ph (__m512 __A, int __B) -{ - return __builtin_ia32_vcvtps2phx512_mask_round ((__v16sf) __A, - _mm256_setzero_ph (), - (__mmask16) -1, - __B); -} - -extern __inline __m256h -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_cvtx_roundps_ph (__m256h __A, __mmask16 __B, __m512 __C, int __D) -{ - return __builtin_ia32_vcvtps2phx512_mask_round ((__v16sf) __C, - __A, __B, __D); -} - -extern __inline __m256h -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_cvtx_roundps_ph (__mmask16 __A, __m512 __B, int __C) -{ - return __builtin_ia32_vcvtps2phx512_mask_round ((__v16sf) __B, - _mm256_setzero_ph (), - __A, __C); -} - -#else -#define _mm512_cvtx_roundps_ph(A, B) \ - (__builtin_ia32_vcvtps2phx512_mask_round ((__v16sf)(A), \ - _mm256_setzero_ph (),\ - (__mmask16)-1, (B))) - -#define _mm512_mask_cvtx_roundps_ph(A, B, C, D) \ - (__builtin_ia32_vcvtps2phx512_mask_round ((__v16sf)(C), \ - (A), (B), (D))) - -#define _mm512_maskz_cvtx_roundps_ph(A, B, C) \ - (__builtin_ia32_vcvtps2phx512_mask_round ((__v16sf)(B), \ - 
_mm256_setzero_ph (),\ - (A), (C))) -#endif /* __OPTIMIZE__ */ - -/* Intrinsics vcvtpd2ph. */ -extern __inline __m128h -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_cvtpd_ph (__m512d __A) -{ - return __builtin_ia32_vcvtpd2ph512_mask_round ((__v8df) __A, - _mm_setzero_ph (), - (__mmask8) -1, - _MM_FROUND_CUR_DIRECTION); -} - -extern __inline __m128h -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_cvtpd_ph (__m128h __A, __mmask8 __B, __m512d __C) -{ - return __builtin_ia32_vcvtpd2ph512_mask_round ((__v8df) __C, - __A, __B, - _MM_FROUND_CUR_DIRECTION); -} - -extern __inline __m128h -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_cvtpd_ph (__mmask8 __A, __m512d __B) -{ - return __builtin_ia32_vcvtpd2ph512_mask_round ((__v8df) __B, - _mm_setzero_ph (), - __A, - _MM_FROUND_CUR_DIRECTION); + _MM_FROUND_CUR_DIRECTION); } #ifdef __OPTIMIZE__ -extern __inline __m128h +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_cvt_roundpd_ph (__m512d __A, int __B) +_mm512_cvtt_roundph_epi64 (__m128h __A, int __B) { - return __builtin_ia32_vcvtpd2ph512_mask_round ((__v8df) __A, - _mm_setzero_ph (), - (__mmask8) -1, - __B); + return __builtin_ia32_vcvttph2qq512_mask_round (__A, + _mm512_setzero_si512 (), + (__mmask8) -1, + __B); } -extern __inline __m128h +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_cvt_roundpd_ph (__m128h __A, __mmask8 __B, __m512d __C, int __D) +_mm512_mask_cvtt_roundph_epi64 (__m512i __A, __mmask8 __B, __m128h __C, int __D) { - return __builtin_ia32_vcvtpd2ph512_mask_round ((__v8df) __C, - __A, __B, __D); + return __builtin_ia32_vcvttph2qq512_mask_round (__C, __A, __B, __D); } -extern __inline __m128h +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_cvt_roundpd_ph (__mmask8 __A, __m512d __B, int __C) +_mm512_maskz_cvtt_roundph_epi64 (__mmask8 __A, __m128h __B, int __C) { - return __builtin_ia32_vcvtpd2ph512_mask_round ((__v8df) __B, - _mm_setzero_ph (), - __A, __C); + return __builtin_ia32_vcvttph2qq512_mask_round (__B, + _mm512_setzero_si512 (), + __A, + __C); } #else -#define _mm512_cvt_roundpd_ph(A, B) \ - (__builtin_ia32_vcvtpd2ph512_mask_round ((__v8df)(A), \ - _mm_setzero_ph (), \ - (__mmask8)-1, (B))) +#define _mm512_cvtt_roundph_epi64(A, B) \ + (__builtin_ia32_vcvttph2qq512_mask_round ((A), \ + _mm512_setzero_si512 (), \ + (__mmask8)-1, \ + (B))) -#define _mm512_mask_cvt_roundpd_ph(A, B, C, D) \ - (__builtin_ia32_vcvtpd2ph512_mask_round ((__v8df)(C), \ - (A), (B), (D))) +#define _mm512_mask_cvtt_roundph_epi64(A, B, C, D) \ + (__builtin_ia32_vcvttph2qq512_mask_round ((C), (A), (B), (D))) -#define _mm512_maskz_cvt_roundpd_ph(A, B, C) \ - (__builtin_ia32_vcvtpd2ph512_mask_round ((__v8df)(B), \ - _mm_setzero_ph (), \ - (A), (C))) +#define _mm512_maskz_cvtt_roundph_epi64(A, B, C) \ + (__builtin_ia32_vcvttph2qq512_mask_round ((B), \ + _mm512_setzero_si512 (), \ + (A), \ + (C))) #endif /* __OPTIMIZE__ */ -/* Intrinsics vcvtsh2ss, vcvtsh2sd. */ -extern __inline __m128 +/* Intrinsics vcvttph2uqq.
*/ +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_cvtsh_ss (__m128 __A, __m128h __B) +_mm512_cvttph_epu64 (__m128h __A) { - return __builtin_ia32_vcvtsh2ss_mask_round (__B, __A, - _mm_setzero_ps (), - (__mmask8) -1, - _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_vcvttph2uqq512_mask_round (__A, + _mm512_setzero_si512 (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128 +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_cvtsh_ss (__m128 __A, __mmask8 __B, __m128 __C, - __m128h __D) +_mm512_mask_cvttph_epu64 (__m512i __A, __mmask8 __B, __m128h __C) { - return __builtin_ia32_vcvtsh2ss_mask_round (__D, __C, __A, __B, - _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_vcvttph2uqq512_mask_round (__C, __A, __B, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128 +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_maskz_cvtsh_ss (__mmask8 __A, __m128 __B, - __m128h __C) +_mm512_maskz_cvttph_epu64 (__mmask8 __A, __m128h __B) { - return __builtin_ia32_vcvtsh2ss_mask_round (__C, __B, - _mm_setzero_ps (), - __A, _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_vcvttph2uqq512_mask_round (__B, + _mm512_setzero_si512 (), + __A, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128d +#ifdef __OPTIMIZE__ +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_cvtsh_sd (__m128d __A, __m128h __B) +_mm512_cvtt_roundph_epu64 (__m128h __A, int __B) { - return __builtin_ia32_vcvtsh2sd_mask_round (__B, __A, - _mm_setzero_pd (), - (__mmask8) -1, - _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_vcvttph2uqq512_mask_round (__A, + _mm512_setzero_si512 (), + (__mmask8) -1, + __B); } -extern __inline __m128d +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_cvtsh_sd (__m128d __A, __mmask8 __B, __m128d __C, - __m128h __D) +_mm512_mask_cvtt_roundph_epu64 (__m512i __A, __mmask8 __B, __m128h __C, int __D) { - return __builtin_ia32_vcvtsh2sd_mask_round (__D, __C, __A, __B, - _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_vcvttph2uqq512_mask_round (__C, __A, __B, __D); } -extern __inline __m128d +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_maskz_cvtsh_sd (__mmask8 __A, __m128d __B, __m128h __C) +_mm512_maskz_cvtt_roundph_epu64 (__mmask8 __A, __m128h __B, int __C) { - return __builtin_ia32_vcvtsh2sd_mask_round (__C, __B, - _mm_setzero_pd (), - __A, _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_vcvttph2uqq512_mask_round (__B, + _mm512_setzero_si512 (), + __A, + __C); } -#ifdef __OPTIMIZE__ -extern __inline __m128 -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_cvt_roundsh_ss (__m128 __A, __m128h __B, const int __R) +#else +#define _mm512_cvtt_roundph_epu64(A, B) \ + (__builtin_ia32_vcvttph2uqq512_mask_round ((A), \ + _mm512_setzero_si512 (), \ + (__mmask8)-1, \ + (B))) + +#define _mm512_mask_cvtt_roundph_epu64(A, B, C, D) \ + (__builtin_ia32_vcvttph2uqq512_mask_round ((C), (A), (B), (D))) + +#define _mm512_maskz_cvtt_roundph_epu64(A, B, C) \ + (__builtin_ia32_vcvttph2uqq512_mask_round ((B), \ + _mm512_setzero_si512 (), \ + (A), \ + (C))) + +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vcvtqq2ph.
*/ +extern __inline __m128h + __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtepi64_ph (__m512i __A) { - return __builtin_ia32_vcvtsh2ss_mask_round (__B, __A, - _mm_setzero_ps (), - (__mmask8) -1, __R); + return __builtin_ia32_vcvtqq2ph512_mask_round ((__v8di) __A, + _mm_setzero_ph (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128 +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_cvt_roundsh_ss (__m128 __A, __mmask8 __B, __m128 __C, - __m128h __D, const int __R) +_mm512_mask_cvtepi64_ph (__m128h __A, __mmask8 __B, __m512i __C) { - return __builtin_ia32_vcvtsh2ss_mask_round (__D, __C, __A, __B, __R); + return __builtin_ia32_vcvtqq2ph512_mask_round ((__v8di) __C, + __A, + __B, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128 +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_maskz_cvt_roundsh_ss (__mmask8 __A, __m128 __B, - __m128h __C, const int __R) +_mm512_maskz_cvtepi64_ph (__mmask8 __A, __m512i __B) { - return __builtin_ia32_vcvtsh2ss_mask_round (__C, __B, - _mm_setzero_ps (), - __A, __R); + return __builtin_ia32_vcvtqq2ph512_mask_round ((__v8di) __B, + _mm_setzero_ph (), + __A, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128d +#ifdef __OPTIMIZE__ +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_cvt_roundsh_sd (__m128d __A, __m128h __B, const int __R) +_mm512_cvt_roundepi64_ph (__m512i __A, int __B) { - return __builtin_ia32_vcvtsh2sd_mask_round (__B, __A, - _mm_setzero_pd (), - (__mmask8) -1, __R); + return __builtin_ia32_vcvtqq2ph512_mask_round ((__v8di) __A, + _mm_setzero_ph (), + (__mmask8) -1, + __B); } -extern __inline __m128d +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_cvt_roundsh_sd (__m128d __A, __mmask8 __B, __m128d __C, - __m128h __D, const int __R) +_mm512_mask_cvt_roundepi64_ph (__m128h __A, __mmask8 __B, __m512i __C, int __D) { - return __builtin_ia32_vcvtsh2sd_mask_round (__D, __C, __A, __B, __R); + return __builtin_ia32_vcvtqq2ph512_mask_round ((__v8di) __C, + __A, + __B, + __D); } -extern __inline __m128d +extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_maskz_cvt_roundsh_sd (__mmask8 __A, __m128d __B, __m128h __C, const int __R) +_mm512_maskz_cvt_roundepi64_ph (__mmask8 __A, __m512i __B, int __C) { - return __builtin_ia32_vcvtsh2sd_mask_round (__C, __B, - _mm_setzero_pd (), - __A, __R); + return __builtin_ia32_vcvtqq2ph512_mask_round ((__v8di) __B, + _mm_setzero_ph (), + __A, + __C); } #else -#define _mm_cvt_roundsh_ss(A, B, R) \ - (__builtin_ia32_vcvtsh2ss_mask_round ((B), (A), \ - _mm_setzero_ps (), \ - (__mmask8) -1, (R))) - -#define _mm_mask_cvt_roundsh_ss(A, B, C, D, R) \ - (__builtin_ia32_vcvtsh2ss_mask_round ((D), (C), (A), (B), (R))) - -#define _mm_maskz_cvt_roundsh_ss(A, B, C, R) \ - (__builtin_ia32_vcvtsh2ss_mask_round ((C), (B), \ - _mm_setzero_ps (), \ - (A), (R))) - -#define _mm_cvt_roundsh_sd(A, B, R) \ - (__builtin_ia32_vcvtsh2sd_mask_round ((B), (A), \ - _mm_setzero_pd (), \ - (__mmask8) -1, (R))) +#define _mm512_cvt_roundepi64_ph(A, B) \ + (__builtin_ia32_vcvtqq2ph512_mask_round ((__v8di)(A), \ + _mm_setzero_ph (), \ + (__mmask8)-1, \ + (B))) -#define _mm_mask_cvt_roundsh_sd(A, B, C, D, R) \ - (__builtin_ia32_vcvtsh2sd_mask_round ((D), (C), (A), (B), (R))) +#define _mm512_mask_cvt_roundepi64_ph(A, B, C, D) \ + 
(__builtin_ia32_vcvtqq2ph512_mask_round ((__v8di)(C), (A), (B), (D))) -#define _mm_maskz_cvt_roundsh_sd(A, B, C, R) \ - (__builtin_ia32_vcvtsh2sd_mask_round ((C), (B), \ - _mm_setzero_pd (), \ - (A), (R))) +#define _mm512_maskz_cvt_roundepi64_ph(A, B, C) \ + (__builtin_ia32_vcvtqq2ph512_mask_round ((__v8di)(B), \ + _mm_setzero_ph (), \ + (A), \ + (C))) #endif /* __OPTIMIZE__ */ -/* Intrinsics vcvtss2sh, vcvtsd2sh. */ +/* Intrinsics vcvtuqq2ph. */ extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_cvtss_sh (__m128h __A, __m128 __B) +_mm512_cvtepu64_ph (__m512i __A) { - return __builtin_ia32_vcvtss2sh_mask_round (__B, __A, - _mm_setzero_ph (), - (__mmask8) -1, - _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di) __A, + _mm_setzero_ph (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); } extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_cvtss_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128 __D) +_mm512_mask_cvtepu64_ph (__m128h __A, __mmask8 __B, __m512i __C) { - return __builtin_ia32_vcvtss2sh_mask_round (__D, __C, __A, __B, - _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di) __C, + __A, + __B, + _MM_FROUND_CUR_DIRECTION); } extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_maskz_cvtss_sh (__mmask8 __A, __m128h __B, __m128 __C) +_mm512_maskz_cvtepu64_ph (__mmask8 __A, __m512i __B) { - return __builtin_ia32_vcvtss2sh_mask_round (__C, __B, - _mm_setzero_ph (), - __A, _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di) __B, + _mm_setzero_ph (), + __A, + _MM_FROUND_CUR_DIRECTION); } +#ifdef __OPTIMIZE__ extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_cvtsd_sh (__m128h __A, __m128d __B) +_mm512_cvt_roundepu64_ph (__m512i __A, int __B) { - return __builtin_ia32_vcvtsd2sh_mask_round (__B, __A, - _mm_setzero_ph (), - (__mmask8) -1, - _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di) __A, + _mm_setzero_ph (), + (__mmask8) -1, + __B); } extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_cvtsd_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128d __D) +_mm512_mask_cvt_roundepu64_ph (__m128h __A, __mmask8 __B, __m512i __C, int __D) { - return __builtin_ia32_vcvtsd2sh_mask_round (__D, __C, __A, __B, - _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di) __C, + __A, + __B, + __D); } extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_maskz_cvtsd_sh (__mmask8 __A, __m128h __B, __m128d __C) +_mm512_maskz_cvt_roundepu64_ph (__mmask8 __A, __m512i __B, int __C) { - return __builtin_ia32_vcvtsd2sh_mask_round (__C, __B, - _mm_setzero_ph (), - __A, _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di) __B, + _mm_setzero_ph (), + __A, + __C); } -#ifdef __OPTIMIZE__ -extern __inline __m128h +#else +#define _mm512_cvt_roundepu64_ph(A, B) \ + (__builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di)(A), \ + _mm_setzero_ph (), \ + (__mmask8)-1, \ + (B))) + +#define _mm512_mask_cvt_roundepu64_ph(A, B, C, D) \ + (__builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di)(C), (A), (B), (D))) + +#define _mm512_maskz_cvt_roundepu64_ph(A, B, C) \ + (__builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di)(B), \ + _mm_setzero_ph (), \ + (A), \ + (C))) + +#endif 
/* __OPTIMIZE__ */ + +/* Intrinsics vcvtph2w. */ +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_cvt_roundss_sh (__m128h __A, __m128 __B, const int __R) +_mm512_cvtph_epi16 (__m512h __A) { - return __builtin_ia32_vcvtss2sh_mask_round (__B, __A, - _mm_setzero_ph (), - (__mmask8) -1, __R); + return (__m512i) + __builtin_ia32_vcvtph2w512_mask_round (__A, + (__v32hi) + _mm512_setzero_si512 (), + (__mmask32) -1, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128h +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_cvt_roundss_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128 __D, - const int __R) +_mm512_mask_cvtph_epi16 (__m512i __A, __mmask32 __B, __m512h __C) { - return __builtin_ia32_vcvtss2sh_mask_round (__D, __C, __A, __B, __R); + return (__m512i) + __builtin_ia32_vcvtph2w512_mask_round (__C, + (__v32hi) __A, + __B, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128h +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_maskz_cvt_roundss_sh (__mmask8 __A, __m128h __B, __m128 __C, - const int __R) +_mm512_maskz_cvtph_epi16 (__mmask32 __A, __m512h __B) { - return __builtin_ia32_vcvtss2sh_mask_round (__C, __B, - _mm_setzero_ph (), - __A, __R); + return (__m512i) + __builtin_ia32_vcvtph2w512_mask_round (__B, + (__v32hi) + _mm512_setzero_si512 (), + __A, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128h +#ifdef __OPTIMIZE__ +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_cvt_roundsd_sh (__m128h __A, __m128d __B, const int __R) +_mm512_cvt_roundph_epi16 (__m512h __A, int __B) { - return __builtin_ia32_vcvtsd2sh_mask_round (__B, __A, - _mm_setzero_ph (), - (__mmask8) -1, __R); + return (__m512i) + __builtin_ia32_vcvtph2w512_mask_round (__A, + (__v32hi) + _mm512_setzero_si512 (), + (__mmask32) -1, + __B); } -extern __inline __m128h +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_cvt_roundsd_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128d __D, - const int __R) +_mm512_mask_cvt_roundph_epi16 (__m512i __A, __mmask32 __B, __m512h __C, int __D) { - return __builtin_ia32_vcvtsd2sh_mask_round (__D, __C, __A, __B, __R); + return (__m512i) + __builtin_ia32_vcvtph2w512_mask_round (__C, + (__v32hi) __A, + __B, + __D); } -extern __inline __m128h +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_maskz_cvt_roundsd_sh (__mmask8 __A, __m128h __B, __m128d __C, - const int __R) +_mm512_maskz_cvt_roundph_epi16 (__mmask32 __A, __m512h __B, int __C) { - return __builtin_ia32_vcvtsd2sh_mask_round (__C, __B, - _mm_setzero_ph (), - __A, __R); + return (__m512i) + __builtin_ia32_vcvtph2w512_mask_round (__B, + (__v32hi) + _mm512_setzero_si512 (), + __A, + __C); } #else -#define _mm_cvt_roundss_sh(A, B, R) \ - (__builtin_ia32_vcvtss2sh_mask_round ((B), (A), \ - _mm_setzero_ph (), \ - (__mmask8) -1, R)) - -#define _mm_mask_cvt_roundss_sh(A, B, C, D, R) \ - (__builtin_ia32_vcvtss2sh_mask_round ((D), (C), (A), (B), (R))) - -#define _mm_maskz_cvt_roundss_sh(A, B, C, R) \ - (__builtin_ia32_vcvtss2sh_mask_round ((C), (B), \ - _mm_setzero_ph (), \ - A, R)) - -#define _mm_cvt_roundsd_sh(A, B, R) \ - (__builtin_ia32_vcvtsd2sh_mask_round ((B), (A), \ - _mm_setzero_ph (), \ - (__mmask8) -1, R)) +#define _mm512_cvt_roundph_epi16(A, B) \ + ((__m512i)__builtin_ia32_vcvtph2w512_mask_round ((A), \ + (__v32hi) \ + 
_mm512_setzero_si512 (), \ + (__mmask32)-1, \ + (B))) -#define _mm_mask_cvt_roundsd_sh(A, B, C, D, R) \ - (__builtin_ia32_vcvtsd2sh_mask_round ((D), (C), (A), (B), (R))) +#define _mm512_mask_cvt_roundph_epi16(A, B, C, D) \ + ((__m512i)__builtin_ia32_vcvtph2w512_mask_round ((C), \ + (__v32hi)(A), \ + (B), \ + (D))) -#define _mm_maskz_cvt_roundsd_sh(A, B, C, R) \ - (__builtin_ia32_vcvtsd2sh_mask_round ((C), (B), \ - _mm_setzero_ph (), \ - (A), (R))) +#define _mm512_maskz_cvt_roundph_epi16(A, B, C) \ + ((__m512i)__builtin_ia32_vcvtph2w512_mask_round ((B), \ + (__v32hi) \ + _mm512_setzero_si512 (), \ + (A), \ + (C))) #endif /* __OPTIMIZE__ */ -/* Intrinsics vfmaddsub[132,213,231]ph. */ -extern __inline __m512h -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_fmaddsub_ph (__m512h __A, __m512h __B, __m512h __C) -{ - return (__m512h) - __builtin_ia32_vfmaddsubph512_mask ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - (__mmask32) -1, - _MM_FROUND_CUR_DIRECTION); -} - -extern __inline __m512h +/* Intrinsics vcvtph2uw. */ +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_fmaddsub_ph (__m512h __A, __mmask32 __U, __m512h __B, __m512h __C) +_mm512_cvtph_epu16 (__m512h __A) { - return (__m512h) - __builtin_ia32_vfmaddsubph512_mask ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - (__mmask32) __U, - _MM_FROUND_CUR_DIRECTION); + return (__m512i) + __builtin_ia32_vcvtph2uw512_mask_round (__A, + (__v32hi) + _mm512_setzero_si512 (), + (__mmask32) -1, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m512h +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask3_fmaddsub_ph (__m512h __A, __m512h __B, __m512h __C, __mmask32 __U) +_mm512_mask_cvtph_epu16 (__m512i __A, __mmask32 __B, __m512h __C) { - return (__m512h) - __builtin_ia32_vfmaddsubph512_mask3 ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - (__mmask32) __U, - _MM_FROUND_CUR_DIRECTION); + return (__m512i) + __builtin_ia32_vcvtph2uw512_mask_round (__C, (__v32hi) __A, __B, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m512h +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_fmaddsub_ph (__mmask32 __U, __m512h __A, __m512h __B, __m512h __C) +_mm512_maskz_cvtph_epu16 (__mmask32 __A, __m512h __B) { - return (__m512h) - __builtin_ia32_vfmaddsubph512_maskz ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - (__mmask32) __U, - _MM_FROUND_CUR_DIRECTION); + return (__m512i) + __builtin_ia32_vcvtph2uw512_mask_round (__B, + (__v32hi) + _mm512_setzero_si512 (), + __A, + _MM_FROUND_CUR_DIRECTION); } #ifdef __OPTIMIZE__ -extern __inline __m512h -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_fmaddsub_round_ph (__m512h __A, __m512h __B, __m512h __C, const int __R) -{ - return (__m512h) - __builtin_ia32_vfmaddsubph512_mask ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - (__mmask32) -1, __R); -} - -extern __inline __m512h +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_fmaddsub_round_ph (__m512h __A, __mmask32 __U, __m512h __B, - __m512h __C, const int __R) +_mm512_cvt_roundph_epu16 (__m512h __A, int __B) { - return (__m512h) - __builtin_ia32_vfmaddsubph512_mask ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - (__mmask32) __U, __R); + return (__m512i) + __builtin_ia32_vcvtph2uw512_mask_round (__A, + (__v32hi) + _mm512_setzero_si512 (), + (__mmask32) -1, + __B); } -extern 
__inline __m512h +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask3_fmaddsub_round_ph (__m512h __A, __m512h __B, __m512h __C, - __mmask32 __U, const int __R) +_mm512_mask_cvt_roundph_epu16 (__m512i __A, __mmask32 __B, __m512h __C, int __D) { - return (__m512h) - __builtin_ia32_vfmaddsubph512_mask3 ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - (__mmask32) __U, __R); + return (__m512i) + __builtin_ia32_vcvtph2uw512_mask_round (__C, (__v32hi) __A, __B, __D); } -extern __inline __m512h +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_fmaddsub_round_ph (__mmask32 __U, __m512h __A, __m512h __B, - __m512h __C, const int __R) +_mm512_maskz_cvt_roundph_epu16 (__mmask32 __A, __m512h __B, int __C) { - return (__m512h) - __builtin_ia32_vfmaddsubph512_maskz ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - (__mmask32) __U, __R); + return (__m512i) + __builtin_ia32_vcvtph2uw512_mask_round (__B, + (__v32hi) + _mm512_setzero_si512 (), + __A, + __C); } #else -#define _mm512_fmaddsub_round_ph(A, B, C, R) \ - ((__m512h)__builtin_ia32_vfmaddsubph512_mask ((A), (B), (C), -1, (R))) - -#define _mm512_mask_fmaddsub_round_ph(A, U, B, C, R) \ - ((__m512h)__builtin_ia32_vfmaddsubph512_mask ((A), (B), (C), (U), (R))) +#define _mm512_cvt_roundph_epu16(A, B) \ + ((__m512i) \ + __builtin_ia32_vcvtph2uw512_mask_round ((A), \ + (__v32hi) \ + _mm512_setzero_si512 (), \ + (__mmask32)-1, (B))) -#define _mm512_mask3_fmaddsub_round_ph(A, B, C, U, R) \ - ((__m512h)__builtin_ia32_vfmaddsubph512_mask3 ((A), (B), (C), (U), (R))) +#define _mm512_mask_cvt_roundph_epu16(A, B, C, D) \ + ((__m512i) \ + __builtin_ia32_vcvtph2uw512_mask_round ((C), (__v32hi)(A), (B), (D))) -#define _mm512_maskz_fmaddsub_round_ph(U, A, B, C, R) \ - ((__m512h)__builtin_ia32_vfmaddsubph512_maskz ((A), (B), (C), (U), (R))) +#define _mm512_maskz_cvt_roundph_epu16(A, B, C) \ + ((__m512i) \ + __builtin_ia32_vcvtph2uw512_mask_round ((B), \ + (__v32hi) \ + _mm512_setzero_si512 (), \ + (A), \ + (C))) #endif /* __OPTIMIZE__ */ -/* Intrinsics vfmsubadd[132,213,231]ph. */ -extern __inline __m512h -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) - _mm512_fmsubadd_ph (__m512h __A, __m512h __B, __m512h __C) -{ - return (__m512h) - __builtin_ia32_vfmsubaddph512_mask ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - (__mmask32) -1, - _MM_FROUND_CUR_DIRECTION); -} - -extern __inline __m512h +/* Intrinsics vcvttph2w. 
*/ +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_fmsubadd_ph (__m512h __A, __mmask32 __U, - __m512h __B, __m512h __C) +_mm512_cvttph_epi16 (__m512h __A) { - return (__m512h) - __builtin_ia32_vfmsubaddph512_mask ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - (__mmask32) __U, - _MM_FROUND_CUR_DIRECTION); + return (__m512i) + __builtin_ia32_vcvttph2w512_mask_round (__A, + (__v32hi) + _mm512_setzero_si512 (), + (__mmask32) -1, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m512h +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask3_fmsubadd_ph (__m512h __A, __m512h __B, - __m512h __C, __mmask32 __U) +_mm512_mask_cvttph_epi16 (__m512i __A, __mmask32 __B, __m512h __C) { - return (__m512h) - __builtin_ia32_vfmsubaddph512_mask3 ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - (__mmask32) __U, - _MM_FROUND_CUR_DIRECTION); + return (__m512i) + __builtin_ia32_vcvttph2w512_mask_round (__C, + (__v32hi) __A, + __B, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m512h +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_fmsubadd_ph (__mmask32 __U, __m512h __A, - __m512h __B, __m512h __C) +_mm512_maskz_cvttph_epi16 (__mmask32 __A, __m512h __B) { - return (__m512h) - __builtin_ia32_vfmsubaddph512_maskz ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - (__mmask32) __U, - _MM_FROUND_CUR_DIRECTION); + return (__m512i) + __builtin_ia32_vcvttph2w512_mask_round (__B, + (__v32hi) + _mm512_setzero_si512 (), + __A, + _MM_FROUND_CUR_DIRECTION); } #ifdef __OPTIMIZE__ -extern __inline __m512h -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_fmsubadd_round_ph (__m512h __A, __m512h __B, - __m512h __C, const int __R) -{ - return (__m512h) - __builtin_ia32_vfmsubaddph512_mask ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - (__mmask32) -1, __R); -} - -extern __inline __m512h +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_fmsubadd_round_ph (__m512h __A, __mmask32 __U, __m512h __B, - __m512h __C, const int __R) +_mm512_cvtt_roundph_epi16 (__m512h __A, int __B) { - return (__m512h) - __builtin_ia32_vfmsubaddph512_mask ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - (__mmask32) __U, __R); + return (__m512i) + __builtin_ia32_vcvttph2w512_mask_round (__A, + (__v32hi) + _mm512_setzero_si512 (), + (__mmask32) -1, + __B); } -extern __inline __m512h +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask3_fmsubadd_round_ph (__m512h __A, __m512h __B, __m512h __C, - __mmask32 __U, const int __R) +_mm512_mask_cvtt_roundph_epi16 (__m512i __A, __mmask32 __B, + __m512h __C, int __D) { - return (__m512h) - __builtin_ia32_vfmsubaddph512_mask3 ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - (__mmask32) __U, __R); + return (__m512i) + __builtin_ia32_vcvttph2w512_mask_round (__C, + (__v32hi) __A, + __B, + __D); } -extern __inline __m512h +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_fmsubadd_round_ph (__mmask32 __U, __m512h __A, __m512h __B, - __m512h __C, const int __R) +_mm512_maskz_cvtt_roundph_epi16 (__mmask32 __A, __m512h __B, int __C) { - return (__m512h) - __builtin_ia32_vfmsubaddph512_maskz ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - (__mmask32) __U, __R); + return (__m512i) + __builtin_ia32_vcvttph2w512_mask_round (__B, + (__v32hi) + 
_mm512_setzero_si512 (), + __A, + __C); } #else -#define _mm512_fmsubadd_round_ph(A, B, C, R) \ - ((__m512h)__builtin_ia32_vfmsubaddph512_mask ((A), (B), (C), -1, (R))) - -#define _mm512_mask_fmsubadd_round_ph(A, U, B, C, R) \ - ((__m512h)__builtin_ia32_vfmsubaddph512_mask ((A), (B), (C), (U), (R))) +#define _mm512_cvtt_roundph_epi16(A, B) \ + ((__m512i) \ + __builtin_ia32_vcvttph2w512_mask_round ((A), \ + (__v32hi) \ + _mm512_setzero_si512 (), \ + (__mmask32)-1, \ + (B))) -#define _mm512_mask3_fmsubadd_round_ph(A, B, C, U, R) \ - ((__m512h)__builtin_ia32_vfmsubaddph512_mask3 ((A), (B), (C), (U), (R))) +#define _mm512_mask_cvtt_roundph_epi16(A, B, C, D) \ + ((__m512i) \ + __builtin_ia32_vcvttph2w512_mask_round ((C), \ + (__v32hi)(A), \ + (B), \ + (D))) -#define _mm512_maskz_fmsubadd_round_ph(U, A, B, C, R) \ - ((__m512h)__builtin_ia32_vfmsubaddph512_maskz ((A), (B), (C), (U), (R))) +#define _mm512_maskz_cvtt_roundph_epi16(A, B, C) \ + ((__m512i) \ + __builtin_ia32_vcvttph2w512_mask_round ((B), \ + (__v32hi) \ + _mm512_setzero_si512 (), \ + (A), \ + (C))) #endif /* __OPTIMIZE__ */ -/* Intrinsics vfmadd[132,213,231]ph. */ -extern __inline __m512h -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) - _mm512_fmadd_ph (__m512h __A, __m512h __B, __m512h __C) -{ - return (__m512h) - __builtin_ia32_vfmaddph512_mask ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - (__mmask32) -1, - _MM_FROUND_CUR_DIRECTION); -} - -extern __inline __m512h -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_fmadd_ph (__m512h __A, __mmask32 __U, __m512h __B, __m512h __C) -{ - return (__m512h) - __builtin_ia32_vfmaddph512_mask ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - (__mmask32) __U, - _MM_FROUND_CUR_DIRECTION); -} - -extern __inline __m512h +/* Intrinsics vcvttph2uw. 
*/ +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask3_fmadd_ph (__m512h __A, __m512h __B, __m512h __C, __mmask32 __U) +_mm512_cvttph_epu16 (__m512h __A) { - return (__m512h) - __builtin_ia32_vfmaddph512_mask3 ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - (__mmask32) __U, - _MM_FROUND_CUR_DIRECTION); + return (__m512i) + __builtin_ia32_vcvttph2uw512_mask_round (__A, + (__v32hi) + _mm512_setzero_si512 (), + (__mmask32) -1, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m512h +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_fmadd_ph (__mmask32 __U, __m512h __A, __m512h __B, __m512h __C) +_mm512_mask_cvttph_epu16 (__m512i __A, __mmask32 __B, __m512h __C) { - return (__m512h) - __builtin_ia32_vfmaddph512_maskz ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - (__mmask32) __U, - _MM_FROUND_CUR_DIRECTION); + return (__m512i) + __builtin_ia32_vcvttph2uw512_mask_round (__C, + (__v32hi) __A, + __B, + _MM_FROUND_CUR_DIRECTION); } -#ifdef __OPTIMIZE__ -extern __inline __m512h +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_fmadd_round_ph (__m512h __A, __m512h __B, __m512h __C, const int __R) +_mm512_maskz_cvttph_epu16 (__mmask32 __A, __m512h __B) { - return (__m512h) __builtin_ia32_vfmaddph512_mask ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - (__mmask32) -1, __R); + return (__m512i) + __builtin_ia32_vcvttph2uw512_mask_round (__B, + (__v32hi) + _mm512_setzero_si512 (), + __A, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m512h +#ifdef __OPTIMIZE__ +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_fmadd_round_ph (__m512h __A, __mmask32 __U, __m512h __B, - __m512h __C, const int __R) +_mm512_cvtt_roundph_epu16 (__m512h __A, int __B) { - return (__m512h) __builtin_ia32_vfmaddph512_mask ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - (__mmask32) __U, __R); + return (__m512i) + __builtin_ia32_vcvttph2uw512_mask_round (__A, + (__v32hi) + _mm512_setzero_si512 (), + (__mmask32) -1, + __B); } -extern __inline __m512h +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask3_fmadd_round_ph (__m512h __A, __m512h __B, __m512h __C, - __mmask32 __U, const int __R) +_mm512_mask_cvtt_roundph_epu16 (__m512i __A, __mmask32 __B, + __m512h __C, int __D) { - return (__m512h) __builtin_ia32_vfmaddph512_mask3 ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - (__mmask32) __U, __R); + return (__m512i) + __builtin_ia32_vcvttph2uw512_mask_round (__C, + (__v32hi) __A, + __B, + __D); } -extern __inline __m512h +extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_fmadd_round_ph (__mmask32 __U, __m512h __A, __m512h __B, - __m512h __C, const int __R) +_mm512_maskz_cvtt_roundph_epu16 (__mmask32 __A, __m512h __B, int __C) { - return (__m512h) __builtin_ia32_vfmaddph512_maskz ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - (__mmask32) __U, __R); + return (__m512i) + __builtin_ia32_vcvttph2uw512_mask_round (__B, + (__v32hi) + _mm512_setzero_si512 (), + __A, + __C); } #else -#define _mm512_fmadd_round_ph(A, B, C, R) \ - ((__m512h)__builtin_ia32_vfmaddph512_mask ((A), (B), (C), -1, (R))) - -#define _mm512_mask_fmadd_round_ph(A, U, B, C, R) \ - ((__m512h)__builtin_ia32_vfmaddph512_mask ((A), (B), (C), (U), (R))) +#define _mm512_cvtt_roundph_epu16(A, B) \ + ((__m512i) \ + 
__builtin_ia32_vcvttph2uw512_mask_round ((A), \ + (__v32hi) \ + _mm512_setzero_si512 (), \ + (__mmask32)-1, \ + (B))) -#define _mm512_mask3_fmadd_round_ph(A, B, C, U, R) \ - ((__m512h)__builtin_ia32_vfmaddph512_mask3 ((A), (B), (C), (U), (R))) +#define _mm512_mask_cvtt_roundph_epu16(A, B, C, D) \ + ((__m512i) \ + __builtin_ia32_vcvttph2uw512_mask_round ((C), \ + (__v32hi)(A), \ + (B), \ + (D))) -#define _mm512_maskz_fmadd_round_ph(U, A, B, C, R) \ - ((__m512h)__builtin_ia32_vfmaddph512_maskz ((A), (B), (C), (U), (R))) +#define _mm512_maskz_cvtt_roundph_epu16(A, B, C) \ + ((__m512i) \ + __builtin_ia32_vcvttph2uw512_mask_round ((B), \ + (__v32hi) \ + _mm512_setzero_si512 (), \ + (A), \ + (C))) #endif /* __OPTIMIZE__ */ -/* Intrinsics vfnmadd[132,213,231]ph. */ -extern __inline __m512h -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_fnmadd_ph (__m512h __A, __m512h __B, __m512h __C) -{ - return (__m512h) - __builtin_ia32_vfnmaddph512_mask ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - (__mmask32) -1, - _MM_FROUND_CUR_DIRECTION); -} - +/* Intrinsics vcvtw2ph. */ extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_fnmadd_ph (__m512h __A, __mmask32 __U, __m512h __B, __m512h __C) +_mm512_cvtepi16_ph (__m512i __A) { - return (__m512h) - __builtin_ia32_vfnmaddph512_mask ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - (__mmask32) __U, - _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_vcvtw2ph512_mask_round ((__v32hi) __A, + _mm512_setzero_ph (), + (__mmask32) -1, + _MM_FROUND_CUR_DIRECTION); } extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask3_fnmadd_ph (__m512h __A, __m512h __B, __m512h __C, __mmask32 __U) +_mm512_mask_cvtepi16_ph (__m512h __A, __mmask32 __B, __m512i __C) { - return (__m512h) - __builtin_ia32_vfnmaddph512_mask3 ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - (__mmask32) __U, - _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_vcvtw2ph512_mask_round ((__v32hi) __C, + __A, + __B, + _MM_FROUND_CUR_DIRECTION); } extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_fnmadd_ph (__mmask32 __U, __m512h __A, __m512h __B, __m512h __C) +_mm512_maskz_cvtepi16_ph (__mmask32 __A, __m512i __B) { - return (__m512h) - __builtin_ia32_vfnmaddph512_maskz ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - (__mmask32) __U, - _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_vcvtw2ph512_mask_round ((__v32hi) __B, + _mm512_setzero_ph (), + __A, + _MM_FROUND_CUR_DIRECTION); } #ifdef __OPTIMIZE__ extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_fnmadd_round_ph (__m512h __A, __m512h __B, __m512h __C, const int __R) -{ - return (__m512h) __builtin_ia32_vfnmaddph512_mask ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - (__mmask32) -1, __R); -} - -extern __inline __m512h -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_fnmadd_round_ph (__m512h __A, __mmask32 __U, __m512h __B, - __m512h __C, const int __R) +_mm512_cvt_roundepi16_ph (__m512i __A, int __B) { - return (__m512h) __builtin_ia32_vfnmaddph512_mask ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - (__mmask32) __U, __R); + return __builtin_ia32_vcvtw2ph512_mask_round ((__v32hi) __A, + _mm512_setzero_ph (), + (__mmask32) -1, + __B); } extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask3_fnmadd_round_ph (__m512h 
__A, __m512h __B, __m512h __C, - __mmask32 __U, const int __R) +_mm512_mask_cvt_roundepi16_ph (__m512h __A, __mmask32 __B, __m512i __C, int __D) { - return (__m512h) __builtin_ia32_vfnmaddph512_mask3 ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - (__mmask32) __U, __R); + return __builtin_ia32_vcvtw2ph512_mask_round ((__v32hi) __C, + __A, + __B, + __D); } extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_fnmadd_round_ph (__mmask32 __U, __m512h __A, __m512h __B, - __m512h __C, const int __R) +_mm512_maskz_cvt_roundepi16_ph (__mmask32 __A, __m512i __B, int __C) { - return (__m512h) __builtin_ia32_vfnmaddph512_maskz ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - (__mmask32) __U, __R); + return __builtin_ia32_vcvtw2ph512_mask_round ((__v32hi) __B, + _mm512_setzero_ph (), + __A, + __C); } #else -#define _mm512_fnmadd_round_ph(A, B, C, R) \ - ((__m512h)__builtin_ia32_vfnmaddph512_mask ((A), (B), (C), -1, (R))) - -#define _mm512_mask_fnmadd_round_ph(A, U, B, C, R) \ - ((__m512h)__builtin_ia32_vfnmaddph512_mask ((A), (B), (C), (U), (R))) +#define _mm512_cvt_roundepi16_ph(A, B) \ + (__builtin_ia32_vcvtw2ph512_mask_round ((__v32hi)(A), \ + _mm512_setzero_ph (), \ + (__mmask32)-1, \ + (B))) -#define _mm512_mask3_fnmadd_round_ph(A, B, C, U, R) \ - ((__m512h)__builtin_ia32_vfnmaddph512_mask3 ((A), (B), (C), (U), (R))) +#define _mm512_mask_cvt_roundepi16_ph(A, B, C, D) \ + (__builtin_ia32_vcvtw2ph512_mask_round ((__v32hi)(C), \ + (A), \ + (B), \ + (D))) -#define _mm512_maskz_fnmadd_round_ph(U, A, B, C, R) \ - ((__m512h)__builtin_ia32_vfnmaddph512_maskz ((A), (B), (C), (U), (R))) +#define _mm512_maskz_cvt_roundepi16_ph(A, B, C) \ + (__builtin_ia32_vcvtw2ph512_mask_round ((__v32hi)(B), \ + _mm512_setzero_ph (), \ + (A), \ + (C))) #endif /* __OPTIMIZE__ */ -/* Intrinsics vfmsub[132,213,231]ph. */ -extern __inline __m512h -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_fmsub_ph (__m512h __A, __m512h __B, __m512h __C) -{ - return (__m512h) - __builtin_ia32_vfmsubph512_mask ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - (__mmask32) -1, - _MM_FROUND_CUR_DIRECTION); -} - -extern __inline __m512h -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_fmsub_ph (__m512h __A, __mmask32 __U, __m512h __B, __m512h __C) -{ - return (__m512h) - __builtin_ia32_vfmsubph512_mask ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - (__mmask32) __U, - _MM_FROUND_CUR_DIRECTION); -} +/* Intrinsics vcvtuw2ph. 
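A matching sketch for the integer-to-half direction (vcvtw2ph above, vcvtuw2ph below), again illustrative only; src, k and v are placeholder names:

#include <immintrin.h>

/* Convert 32 signed 16-bit integers to _Float16.  */
__m512h
i16_to_ph (__m512i v)
{
  return _mm512_cvtepi16_ph (v);
}

/* Merge-masking unsigned variant: lanes whose k bit is clear keep src.  */
__m512h
u16_to_ph (__m512h src, __mmask32 k, __m512i v)
{
  return _mm512_mask_cvtepu16_ph (src, k, v);
}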
*/ + extern __inline __m512h + __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) + _mm512_cvtepu16_ph (__m512i __A) + { + return __builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi) __A, + _mm512_setzero_ph (), + (__mmask32) -1, + _MM_FROUND_CUR_DIRECTION); + } extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask3_fmsub_ph (__m512h __A, __m512h __B, __m512h __C, __mmask32 __U) +_mm512_mask_cvtepu16_ph (__m512h __A, __mmask32 __B, __m512i __C) { - return (__m512h) - __builtin_ia32_vfmsubph512_mask3 ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - (__mmask32) __U, - _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi) __C, + __A, + __B, + _MM_FROUND_CUR_DIRECTION); } extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_fmsub_ph (__mmask32 __U, __m512h __A, __m512h __B, __m512h __C) +_mm512_maskz_cvtepu16_ph (__mmask32 __A, __m512i __B) { - return (__m512h) - __builtin_ia32_vfmsubph512_maskz ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - (__mmask32) __U, - _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi) __B, + _mm512_setzero_ph (), + __A, + _MM_FROUND_CUR_DIRECTION); } #ifdef __OPTIMIZE__ extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_fmsub_round_ph (__m512h __A, __m512h __B, __m512h __C, const int __R) -{ - return (__m512h) __builtin_ia32_vfmsubph512_mask ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - (__mmask32) -1, __R); -} - -extern __inline __m512h -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_fmsub_round_ph (__m512h __A, __mmask32 __U, __m512h __B, - __m512h __C, const int __R) +_mm512_cvt_roundepu16_ph (__m512i __A, int __B) { - return (__m512h) __builtin_ia32_vfmsubph512_mask ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - (__mmask32) __U, __R); + return __builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi) __A, + _mm512_setzero_ph (), + (__mmask32) -1, + __B); } extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask3_fmsub_round_ph (__m512h __A, __m512h __B, __m512h __C, - __mmask32 __U, const int __R) +_mm512_mask_cvt_roundepu16_ph (__m512h __A, __mmask32 __B, __m512i __C, int __D) { - return (__m512h) __builtin_ia32_vfmsubph512_mask3 ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - (__mmask32) __U, __R); + return __builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi) __C, + __A, + __B, + __D); } extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_fmsub_round_ph (__mmask32 __U, __m512h __A, __m512h __B, - __m512h __C, const int __R) +_mm512_maskz_cvt_roundepu16_ph (__mmask32 __A, __m512i __B, int __C) { - return (__m512h) __builtin_ia32_vfmsubph512_maskz ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - (__mmask32) __U, __R); + return __builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi) __B, + _mm512_setzero_ph (), + __A, + __C); } #else -#define _mm512_fmsub_round_ph(A, B, C, R) \ - ((__m512h)__builtin_ia32_vfmsubph512_mask ((A), (B), (C), -1, (R))) - -#define _mm512_mask_fmsub_round_ph(A, U, B, C, R) \ - ((__m512h)__builtin_ia32_vfmsubph512_mask ((A), (B), (C), (U), (R))) +#define _mm512_cvt_roundepu16_ph(A, B) \ + (__builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi)(A), \ + _mm512_setzero_ph (), \ + (__mmask32)-1, \ + (B))) -#define _mm512_mask3_fmsub_round_ph(A, B, C, U, R) \ - 
((__m512h)__builtin_ia32_vfmsubph512_mask3 ((A), (B), (C), (U), (R))) +#define _mm512_mask_cvt_roundepu16_ph(A, B, C, D) \ + (__builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi)(C), \ + (A), \ + (B), \ + (D))) -#define _mm512_maskz_fmsub_round_ph(U, A, B, C, R) \ - ((__m512h)__builtin_ia32_vfmsubph512_maskz ((A), (B), (C), (U), (R))) +#define _mm512_maskz_cvt_roundepu16_ph(A, B, C) \ + (__builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi)(B), \ + _mm512_setzero_ph (), \ + (A), \ + (C))) #endif /* __OPTIMIZE__ */ -/* Intrinsics vfnmsub[132,213,231]ph. */ -extern __inline __m512h +/* Intrinsics vcvtph2pd. */ +extern __inline __m512d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_fnmsub_ph (__m512h __A, __m512h __B, __m512h __C) +_mm512_cvtph_pd (__m128h __A) { - return (__m512h) - __builtin_ia32_vfnmsubph512_mask ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - (__mmask32) -1, - _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_vcvtph2pd512_mask_round (__A, + _mm512_setzero_pd (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m512h +extern __inline __m512d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_fnmsub_ph (__m512h __A, __mmask32 __U, __m512h __B, __m512h __C) +_mm512_mask_cvtph_pd (__m512d __A, __mmask8 __B, __m128h __C) { - return (__m512h) - __builtin_ia32_vfnmsubph512_mask ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - (__mmask32) __U, - _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_vcvtph2pd512_mask_round (__C, __A, __B, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m512h +extern __inline __m512d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask3_fnmsub_ph (__m512h __A, __m512h __B, __m512h __C, __mmask32 __U) +_mm512_maskz_cvtph_pd (__mmask8 __A, __m128h __B) { - return (__m512h) - __builtin_ia32_vfnmsubph512_mask3 ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - (__mmask32) __U, - _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_vcvtph2pd512_mask_round (__B, + _mm512_setzero_pd (), + __A, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m512h +#ifdef __OPTIMIZE__ +extern __inline __m512d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_fnmsub_ph (__mmask32 __U, __m512h __A, __m512h __B, __m512h __C) +_mm512_cvt_roundph_pd (__m128h __A, int __B) { - return (__m512h) - __builtin_ia32_vfnmsubph512_maskz ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - (__mmask32) __U, - _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_vcvtph2pd512_mask_round (__A, + _mm512_setzero_pd (), + (__mmask8) -1, + __B); } -#ifdef __OPTIMIZE__ -extern __inline __m512h +extern __inline __m512d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_fnmsub_round_ph (__m512h __A, __m512h __B, __m512h __C, const int __R) +_mm512_mask_cvt_roundph_pd (__m512d __A, __mmask8 __B, __m128h __C, int __D) { - return (__m512h) __builtin_ia32_vfnmsubph512_mask ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - (__mmask32) -1, __R); + return __builtin_ia32_vcvtph2pd512_mask_round (__C, __A, __B, __D); } -extern __inline __m512h +extern __inline __m512d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_fnmsub_round_ph (__m512h __A, __mmask32 __U, __m512h __B, - __m512h __C, const int __R) +_mm512_maskz_cvt_roundph_pd (__mmask8 __A, __m128h __B, int __C) { - return (__m512h) __builtin_ia32_vfnmsubph512_mask ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - (__mmask32) __U, __R); + return 
__builtin_ia32_vcvtph2pd512_mask_round (__B, + _mm512_setzero_pd (), + __A, + __C); } -extern __inline __m512h +#else +#define _mm512_cvt_roundph_pd(A, B) \ + (__builtin_ia32_vcvtph2pd512_mask_round ((A), \ + _mm512_setzero_pd (), \ + (__mmask8)-1, \ + (B))) + +#define _mm512_mask_cvt_roundph_pd(A, B, C, D) \ + (__builtin_ia32_vcvtph2pd512_mask_round ((C), (A), (B), (D))) + +#define _mm512_maskz_cvt_roundph_pd(A, B, C) \ + (__builtin_ia32_vcvtph2pd512_mask_round ((B), \ + _mm512_setzero_pd (), \ + (A), \ + (C))) + +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vcvtph2psx. */ +extern __inline __m512 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask3_fnmsub_round_ph (__m512h __A, __m512h __B, __m512h __C, - __mmask32 __U, const int __R) +_mm512_cvtxph_ps (__m256h __A) { - return (__m512h) __builtin_ia32_vfnmsubph512_mask3 ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - (__mmask32) __U, __R); + return __builtin_ia32_vcvtph2psx512_mask_round (__A, + _mm512_setzero_ps (), + (__mmask16) -1, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m512h +extern __inline __m512 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_fnmsub_round_ph (__mmask32 __U, __m512h __A, __m512h __B, - __m512h __C, const int __R) +_mm512_mask_cvtxph_ps (__m512 __A, __mmask16 __B, __m256h __C) { - return (__m512h) __builtin_ia32_vfnmsubph512_maskz ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - (__mmask32) __U, __R); + return __builtin_ia32_vcvtph2psx512_mask_round (__C, __A, __B, + _MM_FROUND_CUR_DIRECTION); } -#else -#define _mm512_fnmsub_round_ph(A, B, C, R) \ - ((__m512h)__builtin_ia32_vfnmsubph512_mask ((A), (B), (C), -1, (R))) - -#define _mm512_mask_fnmsub_round_ph(A, U, B, C, R) \ - ((__m512h)__builtin_ia32_vfnmsubph512_mask ((A), (B), (C), (U), (R))) - -#define _mm512_mask3_fnmsub_round_ph(A, B, C, U, R) \ - ((__m512h)__builtin_ia32_vfnmsubph512_mask3 ((A), (B), (C), (U), (R))) - -#define _mm512_maskz_fnmsub_round_ph(U, A, B, C, R) \ - ((__m512h)__builtin_ia32_vfnmsubph512_maskz ((A), (B), (C), (U), (R))) - -#endif /* __OPTIMIZE__ */ - -/* Intrinsics vfmadd[132,213,231]sh. 
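A short sketch of the PH-to-PD widening added above: _mm512_cvtph_pd widens the eight low _Float16 values of a __m128h to eight doubles, and since the conversion is exact the _round_ form only takes an SAE control (x is a placeholder name):

#include <immintrin.h>

__m512d
ph_to_pd (__m128h x)
{
  return _mm512_cvtph_pd (x);
}

/* No rounding occurs on a widening convert; only exception
   suppression applies.  */
__m512d
ph_to_pd_sae (__m128h x)
{
  return _mm512_cvt_roundph_pd (x, _MM_FROUND_NO_EXC);
}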
*/ -extern __inline __m128h - __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_fmadd_sh (__m128h __W, __m128h __A, __m128h __B) +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtxph_ps (__mmask16 __A, __m256h __B) { - return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W, - (__v8hf) __A, - (__v8hf) __B, - (__mmask8) -1, + return __builtin_ia32_vcvtph2psx512_mask_round (__B, + _mm512_setzero_ps (), + __A, _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128h +#ifdef __OPTIMIZE__ +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtx_roundph_ps (__m256h __A, int __B) +{ + return __builtin_ia32_vcvtph2psx512_mask_round (__A, + _mm512_setzero_ps (), + (__mmask16) -1, + __B); +} + +extern __inline __m512 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_fmadd_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B) +_mm512_mask_cvtx_roundph_ps (__m512 __A, __mmask16 __B, __m256h __C, int __D) { - return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W, - (__v8hf) __A, - (__v8hf) __B, - (__mmask8) __U, - _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_vcvtph2psx512_mask_round (__C, __A, __B, __D); } -extern __inline __m128h +extern __inline __m512 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask3_fmadd_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U) +_mm512_maskz_cvtx_roundph_ps (__mmask16 __A, __m256h __B, int __C) { - return (__m128h) __builtin_ia32_vfmaddsh3_mask3 ((__v8hf) __W, - (__v8hf) __A, - (__v8hf) __B, - (__mmask8) __U, - _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_vcvtph2psx512_mask_round (__B, + _mm512_setzero_ps (), + __A, + __C); } -extern __inline __m128h +#else +#define _mm512_cvtx_roundph_ps(A, B) \ + (__builtin_ia32_vcvtph2psx512_mask_round ((A), \ + _mm512_setzero_ps (), \ + (__mmask16)-1, \ + (B))) + +#define _mm512_mask_cvtx_roundph_ps(A, B, C, D) \ + (__builtin_ia32_vcvtph2psx512_mask_round ((C), (A), (B), (D))) + +#define _mm512_maskz_cvtx_roundph_ps(A, B, C) \ + (__builtin_ia32_vcvtph2psx512_mask_round ((B), \ + _mm512_setzero_ps (), \ + (A), \ + (C))) +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vcvtps2ph. 
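For the single-precision pair, a minimal round-trip sketch (vcvtph2psx above, vcvtps2phx in the group that follows); v is a placeholder name:

#include <immintrin.h>

/* 16 _Float16 values widened to 16 floats.  */
__m512
ph_to_ps (__m256h v)
{
  return _mm512_cvtxph_ps (v);
}

/* And narrowed back, rounding to half precision.  */
__m256h
ps_to_ph (__m512 v)
{
  return _mm512_cvtxps_ph (v);
}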
*/ +extern __inline __m256h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_maskz_fmadd_sh (__mmask8 __U, __m128h __W, __m128h __A, __m128h __B) +_mm512_cvtxps_ph (__m512 __A) { - return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W, - (__v8hf) __A, - (__v8hf) __B, - (__mmask8) __U, - _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_vcvtps2phx512_mask_round ((__v16sf) __A, + _mm256_setzero_ph (), + (__mmask16) -1, + _MM_FROUND_CUR_DIRECTION); } +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtxps_ph (__m256h __A, __mmask16 __B, __m512 __C) +{ + return __builtin_ia32_vcvtps2phx512_mask_round ((__v16sf) __C, + __A, __B, + _MM_FROUND_CUR_DIRECTION); +} -#ifdef __OPTIMIZE__ -extern __inline __m128h +extern __inline __m256h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_fmadd_round_sh (__m128h __W, __m128h __A, __m128h __B, const int __R) +_mm512_maskz_cvtxps_ph (__mmask16 __A, __m512 __B) { - return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W, - (__v8hf) __A, - (__v8hf) __B, - (__mmask8) -1, - __R); + return __builtin_ia32_vcvtps2phx512_mask_round ((__v16sf) __B, + _mm256_setzero_ph (), + __A, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128h +#ifdef __OPTIMIZE__ +extern __inline __m256h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_fmadd_round_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B, - const int __R) +_mm512_cvtx_roundps_ph (__m512 __A, int __B) { - return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W, - (__v8hf) __A, - (__v8hf) __B, - (__mmask8) __U, __R); + return __builtin_ia32_vcvtps2phx512_mask_round ((__v16sf) __A, + _mm256_setzero_ph (), + (__mmask16) -1, + __B); } -extern __inline __m128h +extern __inline __m256h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask3_fmadd_round_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U, - const int __R) +_mm512_mask_cvtx_roundps_ph (__m256h __A, __mmask16 __B, __m512 __C, int __D) { - return (__m128h) __builtin_ia32_vfmaddsh3_mask3 ((__v8hf) __W, - (__v8hf) __A, - (__v8hf) __B, - (__mmask8) __U, __R); + return __builtin_ia32_vcvtps2phx512_mask_round ((__v16sf) __C, + __A, __B, __D); } -extern __inline __m128h +extern __inline __m256h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_maskz_fmadd_round_sh (__mmask8 __U, __m128h __W, __m128h __A, - __m128h __B, const int __R) +_mm512_maskz_cvtx_roundps_ph (__mmask16 __A, __m512 __B, int __C) { - return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W, - (__v8hf) __A, - (__v8hf) __B, - (__mmask8) __U, __R); + return __builtin_ia32_vcvtps2phx512_mask_round ((__v16sf) __B, + _mm256_setzero_ph (), + __A, __C); } #else -#define _mm_fmadd_round_sh(A, B, C, R) \ - ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), (B), (C), (-1), (R))) -#define _mm_mask_fmadd_round_sh(A, U, B, C, R) \ - ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), (B), (C), (U), (R))) -#define _mm_mask3_fmadd_round_sh(A, B, C, U, R) \ - ((__m128h) __builtin_ia32_vfmaddsh3_mask3 ((A), (B), (C), (U), (R))) -#define _mm_maskz_fmadd_round_sh(U, A, B, C, R) \ - ((__m128h) __builtin_ia32_vfmaddsh3_maskz ((A), (B), (C), (U), (R))) +#define _mm512_cvtx_roundps_ph(A, B) \ + (__builtin_ia32_vcvtps2phx512_mask_round ((__v16sf)(A), \ + _mm256_setzero_ph (),\ + (__mmask16)-1, (B))) -#endif /* __OPTIMIZE__ */ +#define _mm512_mask_cvtx_roundps_ph(A, B, C, D) \ + (__builtin_ia32_vcvtps2phx512_mask_round 
((__v16sf)(C), \ + (A), (B), (D))) -/* Intrinsics vfnmadd[132,213,231]sh. */ -extern __inline __m128h - __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_fnmadd_sh (__m128h __W, __m128h __A, __m128h __B) -{ - return (__m128h) __builtin_ia32_vfnmaddsh3_mask ((__v8hf) __W, - (__v8hf) __A, - (__v8hf) __B, - (__mmask8) -1, - _MM_FROUND_CUR_DIRECTION); -} +#define _mm512_maskz_cvtx_roundps_ph(A, B, C) \ + (__builtin_ia32_vcvtps2phx512_mask_round ((__v16sf)(B), \ + _mm256_setzero_ph (),\ + (A), (C))) +#endif /* __OPTIMIZE__ */ +/* Intrinsics vcvtpd2ph. */ extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_fnmadd_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B) +_mm512_cvtpd_ph (__m512d __A) { - return (__m128h) __builtin_ia32_vfnmaddsh3_mask ((__v8hf) __W, - (__v8hf) __A, - (__v8hf) __B, - (__mmask8) __U, - _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_vcvtpd2ph512_mask_round ((__v8df) __A, + _mm_setzero_ph (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); } extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask3_fnmadd_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U) +_mm512_mask_cvtpd_ph (__m128h __A, __mmask8 __B, __m512d __C) { - return (__m128h) __builtin_ia32_vfnmaddsh3_mask3 ((__v8hf) __W, - (__v8hf) __A, - (__v8hf) __B, - (__mmask8) __U, - _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_vcvtpd2ph512_mask_round ((__v8df) __C, + __A, __B, + _MM_FROUND_CUR_DIRECTION); } extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_maskz_fnmadd_sh (__mmask8 __U, __m128h __W, __m128h __A, __m128h __B) +_mm512_maskz_cvtpd_ph (__mmask8 __A, __m512d __B) { - return (__m128h) __builtin_ia32_vfnmaddsh3_maskz ((__v8hf) __W, - (__v8hf) __A, - (__v8hf) __B, - (__mmask8) __U, - _MM_FROUND_CUR_DIRECTION); + return __builtin_ia32_vcvtpd2ph512_mask_round ((__v8df) __B, + _mm_setzero_ph (), + __A, + _MM_FROUND_CUR_DIRECTION); } - #ifdef __OPTIMIZE__ extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_fnmadd_round_sh (__m128h __W, __m128h __A, __m128h __B, const int __R) -{ - return (__m128h) __builtin_ia32_vfnmaddsh3_mask ((__v8hf) __W, - (__v8hf) __A, - (__v8hf) __B, - (__mmask8) -1, - __R); -} - -extern __inline __m128h -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_fnmadd_round_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B, - const int __R) +_mm512_cvt_roundpd_ph (__m512d __A, int __B) { - return (__m128h) __builtin_ia32_vfnmaddsh3_mask ((__v8hf) __W, - (__v8hf) __A, - (__v8hf) __B, - (__mmask8) __U, __R); + return __builtin_ia32_vcvtpd2ph512_mask_round ((__v8df) __A, + _mm_setzero_ph (), + (__mmask8) -1, + __B); } extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask3_fnmadd_round_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U, - const int __R) +_mm512_mask_cvt_roundpd_ph (__m128h __A, __mmask8 __B, __m512d __C, int __D) { - return (__m128h) __builtin_ia32_vfnmaddsh3_mask3 ((__v8hf) __W, - (__v8hf) __A, - (__v8hf) __B, - (__mmask8) __U, __R); + return __builtin_ia32_vcvtpd2ph512_mask_round ((__v8df) __C, + __A, __B, __D); } extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_maskz_fnmadd_round_sh (__mmask8 __U, __m128h __W, __m128h __A, - __m128h __B, const int __R) +_mm512_maskz_cvt_roundpd_ph (__mmask8 __A, __m512d __B, int __C) { - 
return (__m128h) __builtin_ia32_vfnmaddsh3_maskz ((__v8hf) __W, - (__v8hf) __A, - (__v8hf) __B, - (__mmask8) __U, __R); + return __builtin_ia32_vcvtpd2ph512_mask_round ((__v8df) __B, + _mm_setzero_ph (), + __A, __C); } #else -#define _mm_fnmadd_round_sh(A, B, C, R) \ - ((__m128h) __builtin_ia32_vfnmaddsh3_mask ((A), (B), (C), (-1), (R))) -#define _mm_mask_fnmadd_round_sh(A, U, B, C, R) \ - ((__m128h) __builtin_ia32_vfnmaddsh3_mask ((A), (B), (C), (U), (R))) -#define _mm_mask3_fnmadd_round_sh(A, B, C, U, R) \ - ((__m128h) __builtin_ia32_vfnmaddsh3_mask3 ((A), (B), (C), (U), (R))) -#define _mm_maskz_fnmadd_round_sh(U, A, B, C, R) \ - ((__m128h) __builtin_ia32_vfnmaddsh3_maskz ((A), (B), (C), (U), (R))) +#define _mm512_cvt_roundpd_ph(A, B) \ + (__builtin_ia32_vcvtpd2ph512_mask_round ((__v8df)(A), \ + _mm_setzero_ph (), \ + (__mmask8)-1, (B))) + +#define _mm512_mask_cvt_roundpd_ph(A, B, C, D) \ + (__builtin_ia32_vcvtpd2ph512_mask_round ((__v8df)(C), \ + (A), (B), (D))) + +#define _mm512_maskz_cvt_roundpd_ph(A, B, C) \ + (__builtin_ia32_vcvtpd2ph512_mask_round ((__v8df)(B), \ + _mm_setzero_ph (), \ + (A), (C))) #endif /* __OPTIMIZE__ */ -/* Intrinsics vfmsub[132,213,231]sh. */ -extern __inline __m128h - __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_fmsub_sh (__m128h __W, __m128h __A, __m128h __B) -{ - return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W, - (__v8hf) __A, - -(__v8hf) __B, - (__mmask8) -1, - _MM_FROUND_CUR_DIRECTION); +/* Intrinsics vfmaddsub[132,213,231]ph. */ +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fmaddsub_ph (__m512h __A, __m512h __B, __m512h __C) +{ + return (__m512h) + __builtin_ia32_vfmaddsubph512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) -1, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_fmsub_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B) +_mm512_mask_fmaddsub_ph (__m512h __A, __mmask32 __U, __m512h __B, __m512h __C) { - return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W, - (__v8hf) __A, - -(__v8hf) __B, - (__mmask8) __U, - _MM_FROUND_CUR_DIRECTION); + return (__m512h) + __builtin_ia32_vfmaddsubph512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask3_fmsub_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U) +_mm512_mask3_fmaddsub_ph (__m512h __A, __m512h __B, __m512h __C, __mmask32 __U) { - return (__m128h) __builtin_ia32_vfmsubsh3_mask3 ((__v8hf) __W, - (__v8hf) __A, - (__v8hf) __B, - (__mmask8) __U, - _MM_FROUND_CUR_DIRECTION); + return (__m512h) + __builtin_ia32_vfmaddsubph512_mask3 ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_maskz_fmsub_sh (__mmask8 __U, __m128h __W, __m128h __A, __m128h __B) +_mm512_maskz_fmaddsub_ph (__mmask32 __U, __m512h __A, __m512h __B, __m512h __C) { - return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W, - (__v8hf) __A, - -(__v8hf) __B, - (__mmask8) __U, - _MM_FROUND_CUR_DIRECTION); + return (__m512h) + __builtin_ia32_vfmaddsubph512_maskz ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, + 
_MM_FROUND_CUR_DIRECTION); } - #ifdef __OPTIMIZE__ -extern __inline __m128h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_fmsub_round_sh (__m128h __W, __m128h __A, __m128h __B, const int __R) +_mm512_fmaddsub_round_ph (__m512h __A, __m512h __B, __m512h __C, const int __R) { - return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W, - (__v8hf) __A, - -(__v8hf) __B, - (__mmask8) -1, - __R); + return (__m512h) + __builtin_ia32_vfmaddsubph512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) -1, __R); } -extern __inline __m128h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_fmsub_round_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B, - const int __R) +_mm512_mask_fmaddsub_round_ph (__m512h __A, __mmask32 __U, __m512h __B, + __m512h __C, const int __R) { - return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W, - (__v8hf) __A, - -(__v8hf) __B, - (__mmask8) __U, __R); + return (__m512h) + __builtin_ia32_vfmaddsubph512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, __R); } -extern __inline __m128h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask3_fmsub_round_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U, - const int __R) +_mm512_mask3_fmaddsub_round_ph (__m512h __A, __m512h __B, __m512h __C, + __mmask32 __U, const int __R) { - return (__m128h) __builtin_ia32_vfmsubsh3_mask3 ((__v8hf) __W, - (__v8hf) __A, - (__v8hf) __B, - (__mmask8) __U, __R); + return (__m512h) + __builtin_ia32_vfmaddsubph512_mask3 ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, __R); } -extern __inline __m128h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_maskz_fmsub_round_sh (__mmask8 __U, __m128h __W, __m128h __A, - __m128h __B, const int __R) +_mm512_maskz_fmaddsub_round_ph (__mmask32 __U, __m512h __A, __m512h __B, + __m512h __C, const int __R) { - return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W, - (__v8hf) __A, - -(__v8hf) __B, - (__mmask8) __U, __R); + return (__m512h) + __builtin_ia32_vfmaddsubph512_maskz ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, __R); } #else -#define _mm_fmsub_round_sh(A, B, C, R) \ - ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), (B), -(C), (-1), (R))) -#define _mm_mask_fmsub_round_sh(A, U, B, C, R) \ - ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), (B), -(C), (U), (R))) -#define _mm_mask3_fmsub_round_sh(A, B, C, U, R) \ - ((__m128h) __builtin_ia32_vfmsubsh3_mask3 ((A), (B), (C), (U), (R))) -#define _mm_maskz_fmsub_round_sh(U, A, B, C, R) \ - ((__m128h) __builtin_ia32_vfmaddsh3_maskz ((A), (B), -(C), (U), (R))) +#define _mm512_fmaddsub_round_ph(A, B, C, R) \ + ((__m512h)__builtin_ia32_vfmaddsubph512_mask ((A), (B), (C), -1, (R))) + +#define _mm512_mask_fmaddsub_round_ph(A, U, B, C, R) \ + ((__m512h)__builtin_ia32_vfmaddsubph512_mask ((A), (B), (C), (U), (R))) + +#define _mm512_mask3_fmaddsub_round_ph(A, B, C, U, R) \ + ((__m512h)__builtin_ia32_vfmaddsubph512_mask3 ((A), (B), (C), (U), (R))) + +#define _mm512_maskz_fmaddsub_round_ph(U, A, B, C, R) \ + ((__m512h)__builtin_ia32_vfmaddsubph512_maskz ((A), (B), (C), (U), (R))) #endif /* __OPTIMIZE__ */ -/* Intrinsics vfnmsub[132,213,231]sh. 
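Per lane, the fmaddsub forms above subtract c in even lanes and add it in odd lanes. A scalar reference model, illustrative only and assuming a compiler with _Float16 support:

/* Reference for _mm512_fmaddsub_ph: dst[i] = a[i] * b[i] + c[i] for
   odd i, a[i] * b[i] - c[i] for even i.  */
static void
fmaddsub_ph_ref (const _Float16 *a, const _Float16 *b,
                 const _Float16 *c, _Float16 *dst)
{
  for (int i = 0; i < 32; i++)
    dst[i] = (i & 1) ? a[i] * b[i] + c[i] : a[i] * b[i] - c[i];
}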
*/ -extern __inline __m128h - __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_fnmsub_sh (__m128h __W, __m128h __A, __m128h __B) +/* Intrinsics vfmsubadd[132,213,231]ph. */ +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) + _mm512_fmsubadd_ph (__m512h __A, __m512h __B, __m512h __C) { - return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W, - -(__v8hf) __A, - -(__v8hf) __B, - (__mmask8) -1, - _MM_FROUND_CUR_DIRECTION); + return (__m512h) + __builtin_ia32_vfmsubaddph512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) -1, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_fnmsub_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B) +_mm512_mask_fmsubadd_ph (__m512h __A, __mmask32 __U, + __m512h __B, __m512h __C) { - return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W, - -(__v8hf) __A, - -(__v8hf) __B, - (__mmask8) __U, - _MM_FROUND_CUR_DIRECTION); + return (__m512h) + __builtin_ia32_vfmsubaddph512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask3_fnmsub_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U) +_mm512_mask3_fmsubadd_ph (__m512h __A, __m512h __B, + __m512h __C, __mmask32 __U) { - return (__m128h) __builtin_ia32_vfmsubsh3_mask3 ((__v8hf) __W, - -(__v8hf) __A, - (__v8hf) __B, - (__mmask8) __U, - _MM_FROUND_CUR_DIRECTION); + return (__m512h) + __builtin_ia32_vfmsubaddph512_mask3 ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_maskz_fnmsub_sh (__mmask8 __U, __m128h __W, __m128h __A, __m128h __B) +_mm512_maskz_fmsubadd_ph (__mmask32 __U, __m512h __A, + __m512h __B, __m512h __C) { - return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W, - -(__v8hf) __A, - -(__v8hf) __B, - (__mmask8) __U, - _MM_FROUND_CUR_DIRECTION); + return (__m512h) + __builtin_ia32_vfmsubaddph512_maskz ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, + _MM_FROUND_CUR_DIRECTION); } - #ifdef __OPTIMIZE__ -extern __inline __m128h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_fnmsub_round_sh (__m128h __W, __m128h __A, __m128h __B, const int __R) +_mm512_fmsubadd_round_ph (__m512h __A, __m512h __B, + __m512h __C, const int __R) { - return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W, - -(__v8hf) __A, - -(__v8hf) __B, - (__mmask8) -1, - __R); + return (__m512h) + __builtin_ia32_vfmsubaddph512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) -1, __R); } -extern __inline __m128h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_fnmsub_round_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B, - const int __R) +_mm512_mask_fmsubadd_round_ph (__m512h __A, __mmask32 __U, __m512h __B, + __m512h __C, const int __R) { - return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W, - -(__v8hf) __A, - -(__v8hf) __B, - (__mmask8) __U, __R); + return (__m512h) + __builtin_ia32_vfmsubaddph512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, __R); } -extern __inline __m128h +extern 
__inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask3_fnmsub_round_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U, - const int __R) +_mm512_mask3_fmsubadd_round_ph (__m512h __A, __m512h __B, __m512h __C, + __mmask32 __U, const int __R) { - return (__m128h) __builtin_ia32_vfmsubsh3_mask3 ((__v8hf) __W, - -(__v8hf) __A, - (__v8hf) __B, - (__mmask8) __U, __R); + return (__m512h) + __builtin_ia32_vfmsubaddph512_mask3 ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, __R); } -extern __inline __m128h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_maskz_fnmsub_round_sh (__mmask8 __U, __m128h __W, __m128h __A, - __m128h __B, const int __R) -{ - return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W, - -(__v8hf) __A, - -(__v8hf) __B, - (__mmask8) __U, __R); +_mm512_maskz_fmsubadd_round_ph (__mmask32 __U, __m512h __A, __m512h __B, + __m512h __C, const int __R) +{ + return (__m512h) + __builtin_ia32_vfmsubaddph512_maskz ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, __R); } #else -#define _mm_fnmsub_round_sh(A, B, C, R) \ - ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), -(B), -(C), (-1), (R))) -#define _mm_mask_fnmsub_round_sh(A, U, B, C, R) \ - ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), -(B), -(C), (U), (R))) -#define _mm_mask3_fnmsub_round_sh(A, B, C, U, R) \ - ((__m128h) __builtin_ia32_vfmsubsh3_mask3 ((A), -(B), (C), (U), (R))) -#define _mm_maskz_fnmsub_round_sh(U, A, B, C, R) \ - ((__m128h) __builtin_ia32_vfmaddsh3_maskz ((A), -(B), -(C), (U), (R))) +#define _mm512_fmsubadd_round_ph(A, B, C, R) \ + ((__m512h)__builtin_ia32_vfmsubaddph512_mask ((A), (B), (C), -1, (R))) + +#define _mm512_mask_fmsubadd_round_ph(A, U, B, C, R) \ + ((__m512h)__builtin_ia32_vfmsubaddph512_mask ((A), (B), (C), (U), (R))) + +#define _mm512_mask3_fmsubadd_round_ph(A, B, C, U, R) \ + ((__m512h)__builtin_ia32_vfmsubaddph512_mask3 ((A), (B), (C), (U), (R))) + +#define _mm512_maskz_fmsubadd_round_ph(U, A, B, C, R) \ + ((__m512h)__builtin_ia32_vfmsubaddph512_maskz ((A), (B), (C), (U), (R))) #endif /* __OPTIMIZE__ */ -/* Intrinsics vf[,c]maddcph. */ +/* Intrinsics vfmadd[132,213,231]ph. 
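The fmsubadd forms above are the mirror image (add in even lanes, subtract in odd), while the fmadd group below is the plain fused a * b + c. Its _round_ variants take an explicit rounding control, e.g. (a, b and c are placeholder names):

#include <immintrin.h>

/* Fused a * b + c on 32 _Float16 lanes, round-to-nearest-even with
   exceptions suppressed.  */
__m512h
fma_ph_rne (__m512h a, __m512h b, __m512h c)
{
  return _mm512_fmadd_round_ph (a, b, c,
                                _MM_FROUND_TO_NEAREST_INT
                                | _MM_FROUND_NO_EXC);
}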
*/ extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_fcmadd_pch (__m512h __A, __m512h __B, __m512h __C) + _mm512_fmadd_ph (__m512h __A, __m512h __B, __m512h __C) { return (__m512h) - __builtin_ia32_vfcmaddcph512_round ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - _MM_FROUND_CUR_DIRECTION); + __builtin_ia32_vfmaddph512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) -1, + _MM_FROUND_CUR_DIRECTION); } extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_fcmadd_pch (__m512h __A, __mmask16 __B, __m512h __C, __m512h __D) +_mm512_mask_fmadd_ph (__m512h __A, __mmask32 __U, __m512h __B, __m512h __C) { return (__m512h) - __builtin_ia32_vfcmaddcph512_mask_round ((__v32hf) __A, - (__v32hf) __C, - (__v32hf) __D, __B, - _MM_FROUND_CUR_DIRECTION); + __builtin_ia32_vfmaddph512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, + _MM_FROUND_CUR_DIRECTION); } extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask3_fcmadd_pch (__m512h __A, __m512h __B, __m512h __C, __mmask16 __D) +_mm512_mask3_fmadd_ph (__m512h __A, __m512h __B, __m512h __C, __mmask32 __U) { return (__m512h) - __builtin_ia32_vfcmaddcph512_mask3_round ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - __D, _MM_FROUND_CUR_DIRECTION); + __builtin_ia32_vfmaddph512_mask3 ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, + _MM_FROUND_CUR_DIRECTION); } extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_fcmadd_pch (__mmask16 __A, __m512h __B, __m512h __C, __m512h __D) +_mm512_maskz_fmadd_ph (__mmask32 __U, __m512h __A, __m512h __B, __m512h __C) { return (__m512h) - __builtin_ia32_vfcmaddcph512_maskz_round ((__v32hf) __B, - (__v32hf) __C, - (__v32hf) __D, - __A, _MM_FROUND_CUR_DIRECTION); + __builtin_ia32_vfmaddph512_maskz ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, + _MM_FROUND_CUR_DIRECTION); } +#ifdef __OPTIMIZE__ extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_fmadd_pch (__m512h __A, __m512h __B, __m512h __C) +_mm512_fmadd_round_ph (__m512h __A, __m512h __B, __m512h __C, const int __R) { - return (__m512h) - __builtin_ia32_vfmaddcph512_round ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - _MM_FROUND_CUR_DIRECTION); + return (__m512h) __builtin_ia32_vfmaddph512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) -1, __R); } extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_fmadd_pch (__m512h __A, __mmask16 __B, __m512h __C, __m512h __D) +_mm512_mask_fmadd_round_ph (__m512h __A, __mmask32 __U, __m512h __B, + __m512h __C, const int __R) { - return (__m512h) - __builtin_ia32_vfmaddcph512_mask_round ((__v32hf) __A, - (__v32hf) __C, - (__v32hf) __D, __B, - _MM_FROUND_CUR_DIRECTION); + return (__m512h) __builtin_ia32_vfmaddph512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, __R); } extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask3_fmadd_pch (__m512h __A, __m512h __B, __m512h __C, __mmask16 __D) +_mm512_mask3_fmadd_round_ph (__m512h __A, __m512h __B, __m512h __C, + __mmask32 __U, const int __R) { - return (__m512h) - __builtin_ia32_vfmaddcph512_mask3_round ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - __D, _MM_FROUND_CUR_DIRECTION); + return 
(__m512h) __builtin_ia32_vfmaddph512_mask3 ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, __R); } extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_fmadd_pch (__mmask16 __A, __m512h __B, __m512h __C, __m512h __D) +_mm512_maskz_fmadd_round_ph (__mmask32 __U, __m512h __A, __m512h __B, + __m512h __C, const int __R) { - return (__m512h) - __builtin_ia32_vfmaddcph512_maskz_round ((__v32hf) __B, - (__v32hf) __C, - (__v32hf) __D, - __A, _MM_FROUND_CUR_DIRECTION); + return (__m512h) __builtin_ia32_vfmaddph512_maskz ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, __R); } -#ifdef __OPTIMIZE__ +#else +#define _mm512_fmadd_round_ph(A, B, C, R) \ + ((__m512h)__builtin_ia32_vfmaddph512_mask ((A), (B), (C), -1, (R))) + +#define _mm512_mask_fmadd_round_ph(A, U, B, C, R) \ + ((__m512h)__builtin_ia32_vfmaddph512_mask ((A), (B), (C), (U), (R))) + +#define _mm512_mask3_fmadd_round_ph(A, B, C, U, R) \ + ((__m512h)__builtin_ia32_vfmaddph512_mask3 ((A), (B), (C), (U), (R))) + +#define _mm512_maskz_fmadd_round_ph(U, A, B, C, R) \ + ((__m512h)__builtin_ia32_vfmaddph512_maskz ((A), (B), (C), (U), (R))) + +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vfnmadd[132,213,231]ph. */ extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_fcmadd_round_pch (__m512h __A, __m512h __B, __m512h __C, const int __D) +_mm512_fnmadd_ph (__m512h __A, __m512h __B, __m512h __C) { return (__m512h) - __builtin_ia32_vfcmaddcph512_round ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - __D); + __builtin_ia32_vfnmaddph512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) -1, + _MM_FROUND_CUR_DIRECTION); } extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_fcmadd_round_pch (__m512h __A, __mmask16 __B, __m512h __C, - __m512h __D, const int __E) +_mm512_mask_fnmadd_ph (__m512h __A, __mmask32 __U, __m512h __B, __m512h __C) { return (__m512h) - __builtin_ia32_vfcmaddcph512_mask_round ((__v32hf) __A, - (__v32hf) __C, - (__v32hf) __D, __B, - __E); + __builtin_ia32_vfnmaddph512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, + _MM_FROUND_CUR_DIRECTION); } extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask3_fcmadd_round_pch (__m512h __A, __m512h __B, __m512h __C, - __mmask16 __D, const int __E) +_mm512_mask3_fnmadd_ph (__m512h __A, __m512h __B, __m512h __C, __mmask32 __U) { return (__m512h) - __builtin_ia32_vfcmaddcph512_mask3_round ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - __D, __E); + __builtin_ia32_vfnmaddph512_mask3 ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, + _MM_FROUND_CUR_DIRECTION); } extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_fcmadd_round_pch (__mmask16 __A, __m512h __B, __m512h __C, - __m512h __D, const int __E) +_mm512_maskz_fnmadd_ph (__mmask32 __U, __m512h __A, __m512h __B, __m512h __C) { return (__m512h) - __builtin_ia32_vfcmaddcph512_maskz_round ((__v32hf) __B, - (__v32hf) __C, - (__v32hf) __D, - __A, __E); + __builtin_ia32_vfnmaddph512_maskz ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, + _MM_FROUND_CUR_DIRECTION); } +#ifdef __OPTIMIZE__ extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_fmadd_round_pch (__m512h __A, __m512h __B, __m512h __C, const int __D) 
+_mm512_fnmadd_round_ph (__m512h __A, __m512h __B, __m512h __C, const int __R) { - return (__m512h) - __builtin_ia32_vfmaddcph512_round ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - __D); + return (__m512h) __builtin_ia32_vfnmaddph512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) -1, __R); } extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_fmadd_round_pch (__m512h __A, __mmask16 __B, __m512h __C, - __m512h __D, const int __E) +_mm512_mask_fnmadd_round_ph (__m512h __A, __mmask32 __U, __m512h __B, + __m512h __C, const int __R) { - return (__m512h) - __builtin_ia32_vfmaddcph512_mask_round ((__v32hf) __A, - (__v32hf) __C, - (__v32hf) __D, __B, - __E); + return (__m512h) __builtin_ia32_vfnmaddph512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, __R); } extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask3_fmadd_round_pch (__m512h __A, __m512h __B, __m512h __C, - __mmask16 __D, const int __E) +_mm512_mask3_fnmadd_round_ph (__m512h __A, __m512h __B, __m512h __C, + __mmask32 __U, const int __R) { - return (__m512h) - __builtin_ia32_vfmaddcph512_mask3_round ((__v32hf) __A, - (__v32hf) __B, - (__v32hf) __C, - __D, __E); + return (__m512h) __builtin_ia32_vfnmaddph512_mask3 ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, __R); } extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_fmadd_round_pch (__mmask16 __A, __m512h __B, __m512h __C, - __m512h __D, const int __E) +_mm512_maskz_fnmadd_round_ph (__mmask32 __U, __m512h __A, __m512h __B, + __m512h __C, const int __R) { - return (__m512h) - __builtin_ia32_vfmaddcph512_maskz_round ((__v32hf) __B, - (__v32hf) __C, - (__v32hf) __D, - __A, __E); + return (__m512h) __builtin_ia32_vfnmaddph512_maskz ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, __R); } #else -#define _mm512_fcmadd_round_pch(A, B, C, D) \ - (__m512h) __builtin_ia32_vfcmaddcph512_round ((A), (B), (C), (D)) - -#define _mm512_mask_fcmadd_round_pch(A, B, C, D, E) \ - ((__m512h) \ - __builtin_ia32_vfcmaddcph512_mask_round ((__v32hf) (A), \ - (__v32hf) (C), \ - (__v32hf) (D), \ - (B), (E))) - - -#define _mm512_mask3_fcmadd_round_pch(A, B, C, D, E) \ - ((__m512h) \ - __builtin_ia32_vfcmaddcph512_mask3_round ((A), (B), (C), (D), (E))) - -#define _mm512_maskz_fcmadd_round_pch(A, B, C, D, E) \ - (__m512h) \ - __builtin_ia32_vfcmaddcph512_maskz_round ((B), (C), (D), (A), (E)) - -#define _mm512_fmadd_round_pch(A, B, C, D) \ - (__m512h) __builtin_ia32_vfmaddcph512_round ((A), (B), (C), (D)) +#define _mm512_fnmadd_round_ph(A, B, C, R) \ + ((__m512h)__builtin_ia32_vfnmaddph512_mask ((A), (B), (C), -1, (R))) -#define _mm512_mask_fmadd_round_pch(A, B, C, D, E) \ - ((__m512h) \ - __builtin_ia32_vfmaddcph512_mask_round ((__v32hf) (A), \ - (__v32hf) (C), \ - (__v32hf) (D), \ - (B), (E))) +#define _mm512_mask_fnmadd_round_ph(A, U, B, C, R) \ + ((__m512h)__builtin_ia32_vfnmaddph512_mask ((A), (B), (C), (U), (R))) -#define _mm512_mask3_fmadd_round_pch(A, B, C, D, E) \ - (__m512h) \ - __builtin_ia32_vfmaddcph512_mask3_round ((A), (B), (C), (D), (E)) +#define _mm512_mask3_fnmadd_round_ph(A, B, C, U, R) \ + ((__m512h)__builtin_ia32_vfnmaddph512_mask3 ((A), (B), (C), (U), (R))) -#define _mm512_maskz_fmadd_round_pch(A, B, C, D, E) \ - (__m512h) \ - __builtin_ia32_vfmaddcph512_maskz_round ((B), (C), (D), (A), (E)) +#define _mm512_maskz_fnmadd_round_ph(U, A, B, C, 
R) \ + ((__m512h)__builtin_ia32_vfnmaddph512_maskz ((A), (B), (C), (U), (R))) #endif /* __OPTIMIZE__ */ -/* Intrinsics vf[,c]mulcph. */ +/* Intrinsics vfmsub[132,213,231]ph. */ extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_fcmul_pch (__m512h __A, __m512h __B) +_mm512_fmsub_ph (__m512h __A, __m512h __B, __m512h __C) { return (__m512h) - __builtin_ia32_vfcmulcph512_round ((__v32hf) __A, - (__v32hf) __B, - _MM_FROUND_CUR_DIRECTION); + __builtin_ia32_vfmsubph512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) -1, + _MM_FROUND_CUR_DIRECTION); } extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_fcmul_pch (__m512h __A, __mmask16 __B, __m512h __C, __m512h __D) +_mm512_mask_fmsub_ph (__m512h __A, __mmask32 __U, __m512h __B, __m512h __C) { return (__m512h) - __builtin_ia32_vfcmulcph512_mask_round ((__v32hf) __C, - (__v32hf) __D, - (__v32hf) __A, - __B, _MM_FROUND_CUR_DIRECTION); + __builtin_ia32_vfmsubph512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, + _MM_FROUND_CUR_DIRECTION); } extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_fcmul_pch (__mmask16 __A, __m512h __B, __m512h __C) +_mm512_mask3_fmsub_ph (__m512h __A, __m512h __B, __m512h __C, __mmask32 __U) { return (__m512h) - __builtin_ia32_vfcmulcph512_mask_round ((__v32hf) __B, - (__v32hf) __C, - _mm512_setzero_ph (), - __A, _MM_FROUND_CUR_DIRECTION); + __builtin_ia32_vfmsubph512_mask3 ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, + _MM_FROUND_CUR_DIRECTION); } extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_fmul_pch (__m512h __A, __m512h __B) +_mm512_maskz_fmsub_ph (__mmask32 __U, __m512h __A, __m512h __B, __m512h __C) { return (__m512h) - __builtin_ia32_vfmulcph512_round ((__v32hf) __A, + __builtin_ia32_vfmsubph512_maskz ((__v32hf) __A, (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, _MM_FROUND_CUR_DIRECTION); } +#ifdef __OPTIMIZE__ extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_fmul_pch (__m512h __A, __mmask16 __B, __m512h __C, __m512h __D) +_mm512_fmsub_round_ph (__m512h __A, __m512h __B, __m512h __C, const int __R) { - return (__m512h) - __builtin_ia32_vfmulcph512_mask_round ((__v32hf) __C, - (__v32hf) __D, - (__v32hf) __A, - __B, _MM_FROUND_CUR_DIRECTION); + return (__m512h) __builtin_ia32_vfmsubph512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) -1, __R); } extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_fmul_pch (__mmask16 __A, __m512h __B, __m512h __C) +_mm512_mask_fmsub_round_ph (__m512h __A, __mmask32 __U, __m512h __B, + __m512h __C, const int __R) { - return (__m512h) - __builtin_ia32_vfmulcph512_mask_round ((__v32hf) __B, - (__v32hf) __C, - _mm512_setzero_ph (), - __A, _MM_FROUND_CUR_DIRECTION); + return (__m512h) __builtin_ia32_vfmsubph512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, __R); } -#ifdef __OPTIMIZE__ extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_fcmul_round_pch (__m512h __A, __m512h __B, const int __D) +_mm512_mask3_fmsub_round_ph (__m512h __A, __m512h __B, __m512h __C, + __mmask32 __U, const int __R) { - return (__m512h) - __builtin_ia32_vfcmulcph512_round ((__v32hf) __A, - (__v32hf) __B, __D); + return (__m512h) 
__builtin_ia32_vfmsubph512_mask3 ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, __R); } extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_fcmul_round_pch (__m512h __A, __mmask16 __B, __m512h __C, - __m512h __D, const int __E) +_mm512_maskz_fmsub_round_ph (__mmask32 __U, __m512h __A, __m512h __B, + __m512h __C, const int __R) { - return (__m512h) - __builtin_ia32_vfcmulcph512_mask_round ((__v32hf) __C, - (__v32hf) __D, - (__v32hf) __A, - __B, __E); + return (__m512h) __builtin_ia32_vfmsubph512_maskz ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, __R); } +#else +#define _mm512_fmsub_round_ph(A, B, C, R) \ + ((__m512h)__builtin_ia32_vfmsubph512_mask ((A), (B), (C), -1, (R))) + +#define _mm512_mask_fmsub_round_ph(A, U, B, C, R) \ + ((__m512h)__builtin_ia32_vfmsubph512_mask ((A), (B), (C), (U), (R))) + +#define _mm512_mask3_fmsub_round_ph(A, B, C, U, R) \ + ((__m512h)__builtin_ia32_vfmsubph512_mask3 ((A), (B), (C), (U), (R))) + +#define _mm512_maskz_fmsub_round_ph(U, A, B, C, R) \ + ((__m512h)__builtin_ia32_vfmsubph512_maskz ((A), (B), (C), (U), (R))) + +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vfnmsub[132,213,231]ph. */ extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_fcmul_round_pch (__mmask16 __A, __m512h __B, - __m512h __C, const int __E) +_mm512_fnmsub_ph (__m512h __A, __m512h __B, __m512h __C) { return (__m512h) - __builtin_ia32_vfcmulcph512_mask_round ((__v32hf) __B, - (__v32hf) __C, - _mm512_setzero_ph (), - __A, __E); + __builtin_ia32_vfnmsubph512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) -1, + _MM_FROUND_CUR_DIRECTION); } extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_fmul_round_pch (__m512h __A, __m512h __B, const int __D) +_mm512_mask_fnmsub_ph (__m512h __A, __mmask32 __U, __m512h __B, __m512h __C) { return (__m512h) - __builtin_ia32_vfmulcph512_round ((__v32hf) __A, + __builtin_ia32_vfnmsubph512_mask ((__v32hf) __A, (__v32hf) __B, - __D); + (__v32hf) __C, + (__mmask32) __U, + _MM_FROUND_CUR_DIRECTION); } extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_fmul_round_pch (__m512h __A, __mmask16 __B, __m512h __C, - __m512h __D, const int __E) +_mm512_mask3_fnmsub_ph (__m512h __A, __m512h __B, __m512h __C, __mmask32 __U) { return (__m512h) - __builtin_ia32_vfmulcph512_mask_round ((__v32hf) __C, - (__v32hf) __D, - (__v32hf) __A, - __B, __E); + __builtin_ia32_vfnmsubph512_mask3 ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, + _MM_FROUND_CUR_DIRECTION); } extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_maskz_fmul_round_pch (__mmask16 __A, __m512h __B, - __m512h __C, const int __E) +_mm512_maskz_fnmsub_ph (__mmask32 __U, __m512h __A, __m512h __B, __m512h __C) { return (__m512h) - __builtin_ia32_vfmulcph512_mask_round ((__v32hf) __B, - (__v32hf) __C, - _mm512_setzero_ph (), - __A, __E); + __builtin_ia32_vfnmsubph512_maskz ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, + _MM_FROUND_CUR_DIRECTION); } -#else -#define _mm512_fcmul_round_pch(A, B, D) \ - (__m512h) __builtin_ia32_vfcmulcph512_round ((A), (B), (D)) +#ifdef __OPTIMIZE__ +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fnmsub_round_ph (__m512h __A, __m512h __B, __m512h __C, const int __R) +{ + return 
(__m512h) __builtin_ia32_vfnmsubph512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) -1, __R); +} -#define _mm512_mask_fcmul_round_pch(A, B, C, D, E) \ - (__m512h) __builtin_ia32_vfcmulcph512_mask_round ((C), (D), (A), (B), (E)) +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fnmsub_round_ph (__m512h __A, __mmask32 __U, __m512h __B, + __m512h __C, const int __R) +{ + return (__m512h) __builtin_ia32_vfnmsubph512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, __R); +} -#define _mm512_maskz_fcmul_round_pch(A, B, C, E) \ - (__m512h) __builtin_ia32_vfcmulcph512_mask_round ((B), (C), \ - (__v32hf) \ - _mm512_setzero_ph (), \ - (A), (E)) +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask3_fnmsub_round_ph (__m512h __A, __m512h __B, __m512h __C, + __mmask32 __U, const int __R) +{ + return (__m512h) __builtin_ia32_vfnmsubph512_mask3 ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, __R); +} -#define _mm512_fmul_round_pch(A, B, D) \ - (__m512h) __builtin_ia32_vfmulcph512_round ((A), (B), (D)) +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_fnmsub_round_ph (__mmask32 __U, __m512h __A, __m512h __B, + __m512h __C, const int __R) +{ + return (__m512h) __builtin_ia32_vfnmsubph512_maskz ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, __R); +} -#define _mm512_mask_fmul_round_pch(A, B, C, D, E) \ - (__m512h) __builtin_ia32_vfmulcph512_mask_round ((C), (D), (A), (B), (E)) +#else +#define _mm512_fnmsub_round_ph(A, B, C, R) \ + ((__m512h)__builtin_ia32_vfnmsubph512_mask ((A), (B), (C), -1, (R))) -#define _mm512_maskz_fmul_round_pch(A, B, C, E) \ - (__m512h) __builtin_ia32_vfmulcph512_mask_round ((B), (C), \ - (__v32hf) \ - _mm512_setzero_ph (), \ - (A), (E)) +#define _mm512_mask_fnmsub_round_ph(A, U, B, C, R) \ + ((__m512h)__builtin_ia32_vfnmsubph512_mask ((A), (B), (C), (U), (R))) + +#define _mm512_mask3_fnmsub_round_ph(A, B, C, U, R) \ + ((__m512h)__builtin_ia32_vfnmsubph512_mask3 ((A), (B), (C), (U), (R))) + +#define _mm512_maskz_fnmsub_round_ph(U, A, B, C, R) \ + ((__m512h)__builtin_ia32_vfnmsubph512_maskz ((A), (B), (C), (U), (R))) #endif /* __OPTIMIZE__ */ -/* Intrinsics vf[,c]maddcsh. */ -extern __inline __m128h +/* Intrinsics vf[,c]maddcph. 
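The vf[,c]maddcph intrinsics defined below treat a __m512h as 16 complex values, real part in the even _Float16 lane and imaginary part in the odd lane. A scalar reference model, illustrative only and modulo intermediate precision (conj_b selects the conjugating fcmadd form; _Float16 support is assumed):

/* Reference: _mm512_fmadd_pch computes a * b + c on 16 complex numbers;
   _mm512_fcmadd_pch conjugates b first, i.e. a * conj (b) + c.  */
static void
fmadd_pch_ref (const _Float16 *a, const _Float16 *b,
               const _Float16 *c, _Float16 *dst, int conj_b)
{
  for (int i = 0; i < 16; i++)
    {
      float ar = a[2 * i], ai = a[2 * i + 1];
      float br = b[2 * i], bi = conj_b ? -b[2 * i + 1] : b[2 * i + 1];
      dst[2 * i] = ar * br - ai * bi + c[2 * i];
      dst[2 * i + 1] = ar * bi + ai * br + c[2 * i + 1];
    }
}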
*/ +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_fcmadd_sch (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D) +_mm512_fcmadd_pch (__m512h __A, __m512h __B, __m512h __C) { - return (__m128h) - __builtin_ia32_vfcmaddcsh_mask_round ((__v8hf) __A, - (__v8hf) __C, - (__v8hf) __D, __B, - _MM_FROUND_CUR_DIRECTION); + return (__m512h) + __builtin_ia32_vfcmaddcph512_round ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask3_fcmadd_sch (__m128h __A, __m128h __B, __m128h __C, __mmask8 __D) +_mm512_mask_fcmadd_pch (__m512h __A, __mmask16 __B, __m512h __C, __m512h __D) { - return (__m128h) - __builtin_ia32_vfcmaddcsh_mask3_round ((__v8hf) __A, - (__v8hf) __B, - (__v8hf) __C, __D, - _MM_FROUND_CUR_DIRECTION); + return (__m512h) + __builtin_ia32_vfcmaddcph512_mask_round ((__v32hf) __A, + (__v32hf) __C, + (__v32hf) __D, __B, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_maskz_fcmadd_sch (__mmask8 __A, __m128h __B, __m128h __C, __m128h __D) +_mm512_mask3_fcmadd_pch (__m512h __A, __m512h __B, __m512h __C, __mmask16 __D) { - return (__m128h) - __builtin_ia32_vfcmaddcsh_maskz_round ((__v8hf) __B, - (__v8hf) __C, - (__v8hf) __D, - __A, _MM_FROUND_CUR_DIRECTION); + return (__m512h) + __builtin_ia32_vfcmaddcph512_mask3_round ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + __D, _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_fcmadd_sch (__m128h __A, __m128h __B, __m128h __C) +_mm512_maskz_fcmadd_pch (__mmask16 __A, __m512h __B, __m512h __C, __m512h __D) { - return (__m128h) - __builtin_ia32_vfcmaddcsh_round ((__v8hf) __A, - (__v8hf) __B, - (__v8hf) __C, - _MM_FROUND_CUR_DIRECTION); + return (__m512h) + __builtin_ia32_vfcmaddcph512_maskz_round ((__v32hf) __B, + (__v32hf) __C, + (__v32hf) __D, + __A, _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_fmadd_sch (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D) +_mm512_fmadd_pch (__m512h __A, __m512h __B, __m512h __C) { - return (__m128h) - __builtin_ia32_vfmaddcsh_mask_round ((__v8hf) __A, - (__v8hf) __C, - (__v8hf) __D, __B, - _MM_FROUND_CUR_DIRECTION); + return (__m512h) + __builtin_ia32_vfmaddcph512_round ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask3_fmadd_sch (__m128h __A, __m128h __B, __m128h __C, __mmask8 __D) +_mm512_mask_fmadd_pch (__m512h __A, __mmask16 __B, __m512h __C, __m512h __D) { - return (__m128h) - __builtin_ia32_vfmaddcsh_mask3_round ((__v8hf) __A, - (__v8hf) __B, - (__v8hf) __C, __D, - _MM_FROUND_CUR_DIRECTION); + return (__m512h) + __builtin_ia32_vfmaddcph512_mask_round ((__v32hf) __A, + (__v32hf) __C, + (__v32hf) __D, __B, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_maskz_fmadd_sch (__mmask8 __A, __m128h __B, __m128h __C, __m128h __D) +_mm512_mask3_fmadd_pch (__m512h __A, __m512h __B, __m512h __C, __mmask16 __D) { - return (__m128h) - 
__builtin_ia32_vfmaddcsh_maskz_round ((__v8hf) __B, - (__v8hf) __C, - (__v8hf) __D, - __A, _MM_FROUND_CUR_DIRECTION); + return (__m512h) + __builtin_ia32_vfmaddcph512_mask3_round ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + __D, _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_fmadd_sch (__m128h __A, __m128h __B, __m128h __C) +_mm512_maskz_fmadd_pch (__mmask16 __A, __m512h __B, __m512h __C, __m512h __D) { - return (__m128h) - __builtin_ia32_vfmaddcsh_round ((__v8hf) __A, - (__v8hf) __B, - (__v8hf) __C, - _MM_FROUND_CUR_DIRECTION); + return (__m512h) + __builtin_ia32_vfmaddcph512_maskz_round ((__v32hf) __B, + (__v32hf) __C, + (__v32hf) __D, + __A, _MM_FROUND_CUR_DIRECTION); } #ifdef __OPTIMIZE__ -extern __inline __m128h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_fcmadd_round_sch (__m128h __A, __mmask8 __B, __m128h __C, - __m128h __D, const int __E) +_mm512_fcmadd_round_pch (__m512h __A, __m512h __B, __m512h __C, const int __D) { - return (__m128h) - __builtin_ia32_vfcmaddcsh_mask_round ((__v8hf) __A, - (__v8hf) __C, - (__v8hf) __D, - __B, __E); + return (__m512h) + __builtin_ia32_vfcmaddcph512_round ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + __D); } -extern __inline __m128h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask3_fcmadd_round_sch (__m128h __A, __m128h __B, __m128h __C, - __mmask8 __D, const int __E) +_mm512_mask_fcmadd_round_pch (__m512h __A, __mmask16 __B, __m512h __C, + __m512h __D, const int __E) { - return (__m128h) - __builtin_ia32_vfcmaddcsh_mask3_round ((__v8hf) __A, - (__v8hf) __B, - (__v8hf) __C, - __D, __E); + return (__m512h) + __builtin_ia32_vfcmaddcph512_mask_round ((__v32hf) __A, + (__v32hf) __C, + (__v32hf) __D, __B, + __E); } -extern __inline __m128h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_maskz_fcmadd_round_sch (__mmask8 __A, __m128h __B, __m128h __C, - __m128h __D, const int __E) +_mm512_mask3_fcmadd_round_pch (__m512h __A, __m512h __B, __m512h __C, + __mmask16 __D, const int __E) { - return (__m128h) - __builtin_ia32_vfcmaddcsh_maskz_round ((__v8hf) __B, - (__v8hf) __C, - (__v8hf) __D, - __A, __E); + return (__m512h) + __builtin_ia32_vfcmaddcph512_mask3_round ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + __D, __E); } -extern __inline __m128h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_fcmadd_round_sch (__m128h __A, __m128h __B, __m128h __C, const int __D) +_mm512_maskz_fcmadd_round_pch (__mmask16 __A, __m512h __B, __m512h __C, + __m512h __D, const int __E) { - return (__m128h) - __builtin_ia32_vfcmaddcsh_round ((__v8hf) __A, - (__v8hf) __B, - (__v8hf) __C, - __D); + return (__m512h) + __builtin_ia32_vfcmaddcph512_maskz_round ((__v32hf) __B, + (__v32hf) __C, + (__v32hf) __D, + __A, __E); } -extern __inline __m128h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_fmadd_round_sch (__m128h __A, __mmask8 __B, __m128h __C, - __m128h __D, const int __E) +_mm512_fmadd_round_pch (__m512h __A, __m512h __B, __m512h __C, const int __D) { - return (__m128h) - __builtin_ia32_vfmaddcsh_mask_round ((__v8hf) __A, - (__v8hf) __C, - (__v8hf) __D, - __B, __E); + return (__m512h) + __builtin_ia32_vfmaddcph512_round ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + __D); } -extern 
__inline __m128h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask3_fmadd_round_sch (__m128h __A, __m128h __B, __m128h __C, - __mmask8 __D, const int __E) +_mm512_mask_fmadd_round_pch (__m512h __A, __mmask16 __B, __m512h __C, + __m512h __D, const int __E) { - return (__m128h) - __builtin_ia32_vfmaddcsh_mask3_round ((__v8hf) __A, - (__v8hf) __B, - (__v8hf) __C, - __D, __E); + return (__m512h) + __builtin_ia32_vfmaddcph512_mask_round ((__v32hf) __A, + (__v32hf) __C, + (__v32hf) __D, __B, + __E); } -extern __inline __m128h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_maskz_fmadd_round_sch (__mmask8 __A, __m128h __B, __m128h __C, - __m128h __D, const int __E) +_mm512_mask3_fmadd_round_pch (__m512h __A, __m512h __B, __m512h __C, + __mmask16 __D, const int __E) { - return (__m128h) - __builtin_ia32_vfmaddcsh_maskz_round ((__v8hf) __B, - (__v8hf) __C, - (__v8hf) __D, - __A, __E); + return (__m512h) + __builtin_ia32_vfmaddcph512_mask3_round ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + __D, __E); } -extern __inline __m128h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_fmadd_round_sch (__m128h __A, __m128h __B, __m128h __C, const int __D) +_mm512_maskz_fmadd_round_pch (__mmask16 __A, __m512h __B, __m512h __C, + __m512h __D, const int __E) { - return (__m128h) - __builtin_ia32_vfmaddcsh_round ((__v8hf) __A, - (__v8hf) __B, - (__v8hf) __C, - __D); + return (__m512h) + __builtin_ia32_vfmaddcph512_maskz_round ((__v32hf) __B, + (__v32hf) __C, + (__v32hf) __D, + __A, __E); } + #else -#define _mm_mask_fcmadd_round_sch(A, B, C, D, E) \ - ((__m128h) \ - __builtin_ia32_vfcmaddcsh_mask_round ((__v8hf) (A), \ - (__v8hf) (C), \ - (__v8hf) (D), \ - (B), (E))) +#define _mm512_fcmadd_round_pch(A, B, C, D) \ + (__m512h) __builtin_ia32_vfcmaddcph512_round ((A), (B), (C), (D)) +#define _mm512_mask_fcmadd_round_pch(A, B, C, D, E) \ + ((__m512h) \ + __builtin_ia32_vfcmaddcph512_mask_round ((__v32hf) (A), \ + (__v32hf) (C), \ + (__v32hf) (D), \ + (B), (E))) -#define _mm_mask3_fcmadd_round_sch(A, B, C, D, E) \ - ((__m128h) \ - __builtin_ia32_vfcmaddcsh_mask3_round ((__v8hf) (A), \ - (__v8hf) (B), \ - (__v8hf) (C), \ - (D), (E))) -#define _mm_maskz_fcmadd_round_sch(A, B, C, D, E) \ - __builtin_ia32_vfcmaddcsh_maskz_round ((B), (C), (D), (A), (E)) +#define _mm512_mask3_fcmadd_round_pch(A, B, C, D, E) \ + ((__m512h) \ + __builtin_ia32_vfcmaddcph512_mask3_round ((A), (B), (C), (D), (E))) -#define _mm_fcmadd_round_sch(A, B, C, D) \ - __builtin_ia32_vfcmaddcsh_round ((A), (B), (C), (D)) +#define _mm512_maskz_fcmadd_round_pch(A, B, C, D, E) \ + (__m512h) \ + __builtin_ia32_vfcmaddcph512_maskz_round ((B), (C), (D), (A), (E)) -#define _mm_mask_fmadd_round_sch(A, B, C, D, E) \ - ((__m128h) \ - __builtin_ia32_vfmaddcsh_mask_round ((__v8hf) (A), \ - (__v8hf) (C), \ - (__v8hf) (D), \ - (B), (E))) +#define _mm512_fmadd_round_pch(A, B, C, D) \ + (__m512h) __builtin_ia32_vfmaddcph512_round ((A), (B), (C), (D)) -#define _mm_mask3_fmadd_round_sch(A, B, C, D, E) \ - ((__m128h) \ - __builtin_ia32_vfmaddcsh_mask3_round ((__v8hf) (A), \ - (__v8hf) (B), \ - (__v8hf) (C), \ - (D), (E))) +#define _mm512_mask_fmadd_round_pch(A, B, C, D, E) \ + ((__m512h) \ + __builtin_ia32_vfmaddcph512_mask_round ((__v32hf) (A), \ + (__v32hf) (C), \ + (__v32hf) (D), \ + (B), (E))) -#define _mm_maskz_fmadd_round_sch(A, B, C, D, E) \ - __builtin_ia32_vfmaddcsh_maskz_round ((B), (C), (D), (A), (E)) 
+#define _mm512_mask3_fmadd_round_pch(A, B, C, D, E) \ + (__m512h) \ + __builtin_ia32_vfmaddcph512_mask3_round ((A), (B), (C), (D), (E)) -#define _mm_fmadd_round_sch(A, B, C, D) \ - __builtin_ia32_vfmaddcsh_round ((A), (B), (C), (D)) +#define _mm512_maskz_fmadd_round_pch(A, B, C, D, E) \ + (__m512h) \ + __builtin_ia32_vfmaddcph512_maskz_round ((B), (C), (D), (A), (E)) #endif /* __OPTIMIZE__ */ -/* Intrinsics vf[,c]mulcsh. */ -extern __inline __m128h +/* Intrinsics vf[,c]mulcph. */ +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_fcmul_sch (__m128h __A, __m128h __B) +_mm512_fcmul_pch (__m512h __A, __m512h __B) { - return (__m128h) - __builtin_ia32_vfcmulcsh_round ((__v8hf) __A, - (__v8hf) __B, - _MM_FROUND_CUR_DIRECTION); + return (__m512h) + __builtin_ia32_vfcmulcph512_round ((__v32hf) __A, + (__v32hf) __B, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_fcmul_sch (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D) +_mm512_mask_fcmul_pch (__m512h __A, __mmask16 __B, __m512h __C, __m512h __D) { - return (__m128h) - __builtin_ia32_vfcmulcsh_mask_round ((__v8hf) __C, - (__v8hf) __D, - (__v8hf) __A, - __B, _MM_FROUND_CUR_DIRECTION); + return (__m512h) + __builtin_ia32_vfcmulcph512_mask_round ((__v32hf) __C, + (__v32hf) __D, + (__v32hf) __A, + __B, _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_maskz_fcmul_sch (__mmask8 __A, __m128h __B, __m128h __C) +_mm512_maskz_fcmul_pch (__mmask16 __A, __m512h __B, __m512h __C) { - return (__m128h) - __builtin_ia32_vfcmulcsh_mask_round ((__v8hf) __B, - (__v8hf) __C, - _mm_setzero_ph (), - __A, _MM_FROUND_CUR_DIRECTION); + return (__m512h) + __builtin_ia32_vfcmulcph512_mask_round ((__v32hf) __B, + (__v32hf) __C, + _mm512_setzero_ph (), + __A, _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_fmul_sch (__m128h __A, __m128h __B) +_mm512_fmul_pch (__m512h __A, __m512h __B) { - return (__m128h) - __builtin_ia32_vfmulcsh_round ((__v8hf) __A, - (__v8hf) __B, - _MM_FROUND_CUR_DIRECTION); + return (__m512h) + __builtin_ia32_vfmulcph512_round ((__v32hf) __A, + (__v32hf) __B, + _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_fmul_sch (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D) +_mm512_mask_fmul_pch (__m512h __A, __mmask16 __B, __m512h __C, __m512h __D) { - return (__m128h) - __builtin_ia32_vfmulcsh_mask_round ((__v8hf) __C, - (__v8hf) __D, - (__v8hf) __A, - __B, _MM_FROUND_CUR_DIRECTION); + return (__m512h) + __builtin_ia32_vfmulcph512_mask_round ((__v32hf) __C, + (__v32hf) __D, + (__v32hf) __A, + __B, _MM_FROUND_CUR_DIRECTION); } -extern __inline __m128h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_maskz_fmul_sch (__mmask8 __A, __m128h __B, __m128h __C) +_mm512_maskz_fmul_pch (__mmask16 __A, __m512h __B, __m512h __C) { - return (__m128h) - __builtin_ia32_vfmulcsh_mask_round ((__v8hf) __B, - (__v8hf) __C, - _mm_setzero_ph (), - __A, _MM_FROUND_CUR_DIRECTION); + return (__m512h) + __builtin_ia32_vfmulcph512_mask_round ((__v32hf) __B, + (__v32hf) __C, + _mm512_setzero_ph (), + __A, _MM_FROUND_CUR_DIRECTION); } #ifdef __OPTIMIZE__ -extern 
__inline __m128h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_fcmul_round_sch (__m128h __A, __m128h __B, const int __D) +_mm512_fcmul_round_pch (__m512h __A, __m512h __B, const int __D) { - return (__m128h) - __builtin_ia32_vfcmulcsh_round ((__v8hf) __A, - (__v8hf) __B, - __D); + return (__m512h) + __builtin_ia32_vfcmulcph512_round ((__v32hf) __A, + (__v32hf) __B, __D); } -extern __inline __m128h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_fcmul_round_sch (__m128h __A, __mmask8 __B, __m128h __C, - __m128h __D, const int __E) +_mm512_mask_fcmul_round_pch (__m512h __A, __mmask16 __B, __m512h __C, + __m512h __D, const int __E) { - return (__m128h) - __builtin_ia32_vfcmulcsh_mask_round ((__v8hf) __C, - (__v8hf) __D, - (__v8hf) __A, - __B, __E); + return (__m512h) + __builtin_ia32_vfcmulcph512_mask_round ((__v32hf) __C, + (__v32hf) __D, + (__v32hf) __A, + __B, __E); } -extern __inline __m128h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_maskz_fcmul_round_sch (__mmask8 __A, __m128h __B, __m128h __C, - const int __E) +_mm512_maskz_fcmul_round_pch (__mmask16 __A, __m512h __B, + __m512h __C, const int __E) { - return (__m128h) - __builtin_ia32_vfcmulcsh_mask_round ((__v8hf) __B, - (__v8hf) __C, - _mm_setzero_ph (), - __A, __E); + return (__m512h) + __builtin_ia32_vfcmulcph512_mask_round ((__v32hf) __B, + (__v32hf) __C, + _mm512_setzero_ph (), + __A, __E); } -extern __inline __m128h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_fmul_round_sch (__m128h __A, __m128h __B, const int __D) +_mm512_fmul_round_pch (__m512h __A, __m512h __B, const int __D) { - return (__m128h) - __builtin_ia32_vfmulcsh_round ((__v8hf) __A, - (__v8hf) __B, __D); + return (__m512h) + __builtin_ia32_vfmulcph512_round ((__v32hf) __A, + (__v32hf) __B, + __D); } -extern __inline __m128h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_fmul_round_sch (__m128h __A, __mmask8 __B, __m128h __C, - __m128h __D, const int __E) +_mm512_mask_fmul_round_pch (__m512h __A, __mmask16 __B, __m512h __C, + __m512h __D, const int __E) { - return (__m128h) - __builtin_ia32_vfmulcsh_mask_round ((__v8hf) __C, - (__v8hf) __D, - (__v8hf) __A, - __B, __E); + return (__m512h) + __builtin_ia32_vfmulcph512_mask_round ((__v32hf) __C, + (__v32hf) __D, + (__v32hf) __A, + __B, __E); } -extern __inline __m128h +extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_maskz_fmul_round_sch (__mmask8 __A, __m128h __B, __m128h __C, const int __E) +_mm512_maskz_fmul_round_pch (__mmask16 __A, __m512h __B, + __m512h __C, const int __E) { - return (__m128h) - __builtin_ia32_vfmulcsh_mask_round ((__v8hf) __B, - (__v8hf) __C, - _mm_setzero_ph (), - __A, __E); + return (__m512h) + __builtin_ia32_vfmulcph512_mask_round ((__v32hf) __B, + (__v32hf) __C, + _mm512_setzero_ph (), + __A, __E); } #else -#define _mm_fcmul_round_sch(__A, __B, __D) \ - (__m128h) __builtin_ia32_vfcmulcsh_round ((__v8hf) __A, \ - (__v8hf) __B, __D) +#define _mm512_fcmul_round_pch(A, B, D) \ + (__m512h) __builtin_ia32_vfcmulcph512_round ((A), (B), (D)) -#define _mm_mask_fcmul_round_sch(__A, __B, __C, __D, __E) \ - (__m128h) __builtin_ia32_vfcmulcsh_mask_round ((__v8hf) __C, \ - (__v8hf) __D, \ - (__v8hf) __A, \ - __B, __E) +#define _mm512_mask_fcmul_round_pch(A, B, C, D, E) \ + (__m512h) 
-#define _mm_maskz_fcmul_round_sch(__A, __B, __C, __E)		\
-  (__m128h) __builtin_ia32_vfcmulcsh_mask_round ((__v8hf) __B,	\
-						 (__v8hf) __C,	\
-						 _mm_setzero_ph (), \
-						 __A, __E)
+#define _mm512_maskz_fcmul_round_pch(A, B, C, E)		\
+  (__m512h) __builtin_ia32_vfcmulcph512_mask_round ((B), (C),	\
+						    (__v32hf)	\
+						    _mm512_setzero_ph (), \
+						    (A), (E))
 
-#define _mm_fmul_round_sch(__A, __B, __D)			\
-  (__m128h) __builtin_ia32_vfmulcsh_round ((__v8hf) __A,	\
-					   (__v8hf) __B, __D)
+#define _mm512_fmul_round_pch(A, B, D)				\
+  (__m512h) __builtin_ia32_vfmulcph512_round ((A), (B), (D))
 
-#define _mm_mask_fmul_round_sch(__A, __B, __C, __D, __E)	\
-  (__m128h) __builtin_ia32_vfmulcsh_mask_round ((__v8hf) __C,	\
-						(__v8hf) __D,	\
-						(__v8hf) __A,	\
-						__B, __E)
+#define _mm512_mask_fmul_round_pch(A, B, C, D, E)		\
+  (__m512h) __builtin_ia32_vfmulcph512_mask_round ((C), (D), (A), (B), (E))
 
-#define _mm_maskz_fmul_round_sch(__A, __B, __C, __E)		\
-  (__m128h) __builtin_ia32_vfmulcsh_mask_round ((__v8hf) __B,	\
-						(__v8hf) __C,	\
-						_mm_setzero_ph (), \
-						__A, __E)
+#define _mm512_maskz_fmul_round_pch(A, B, C, E)			\
+  (__m512h) __builtin_ia32_vfmulcph512_mask_round ((B), (C),	\
+						   (__v32hf)	\
+						   _mm512_setzero_ph (), \
+						   (A), (E))
 
 #endif /* __OPTIMIZE__ */
 
@@ -7193,27 +7238,9 @@ _mm512_set1_pch (_Float16 _Complex __A)
 #define _mm512_maskz_cmul_round_pch(U, A, B, R)			\
   _mm512_maskz_fcmul_round_pch ((U), (A), (B), (R))
 
-#define _mm_mul_sch(A, B) _mm_fmul_sch ((A), (B))
-#define _mm_mask_mul_sch(W, U, A, B) _mm_mask_fmul_sch ((W), (U), (A), (B))
-#define _mm_maskz_mul_sch(U, A, B) _mm_maskz_fmul_sch ((U), (A), (B))
-#define _mm_mul_round_sch(A, B, R) _mm_fmul_round_sch ((A), (B), (R))
-#define _mm_mask_mul_round_sch(W, U, A, B, R)			\
-  _mm_mask_fmul_round_sch ((W), (U), (A), (B), (R))
-#define _mm_maskz_mul_round_sch(U, A, B, R)			\
-  _mm_maskz_fmul_round_sch ((U), (A), (B), (R))
-
-#define _mm_cmul_sch(A, B) _mm_fcmul_sch ((A), (B))
-#define _mm_mask_cmul_sch(W, U, A, B) _mm_mask_fcmul_sch ((W), (U), (A), (B))
-#define _mm_maskz_cmul_sch(U, A, B) _mm_maskz_fcmul_sch ((U), (A), (B))
-#define _mm_cmul_round_sch(A, B, R) _mm_fcmul_round_sch ((A), (B), (R))
-#define _mm_mask_cmul_round_sch(W, U, A, B, R)			\
-  _mm_mask_fcmul_round_sch ((W), (U), (A), (B), (R))
-#define _mm_maskz_cmul_round_sch(U, A, B, R)			\
-  _mm_maskz_fcmul_round_sch ((U), (A), (B), (R))
-
-#ifdef __DISABLE_AVX512FP16__
-#undef __DISABLE_AVX512FP16__
+#ifdef __DISABLE_AVX512FP16_512__
+#undef __DISABLE_AVX512FP16_512__
 #pragma GCC pop_options
-#endif /* __DISABLE_AVX512FP16__ */
+#endif /* __DISABLE_AVX512FP16_512__ */
 
-#endif /* __AVX512FP16INTRIN_H_INCLUDED */
+#endif /* _AVX512FP16INTRIN_H_INCLUDED */
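[Editor's illustration -- not part of the patch series.] The hunk above ends the 512-bit AVX512FP16 block that now sits behind the __DISABLE_AVX512FP16_512__ guard. A minimal usage sketch of two of the _mm512_* complex-FP16 intrinsics it defines: each __m512h holds 32 _Float16 lanes, i.e. 16 complex half-precision values. The -mevex512 spelling below is the new option this series introduces, so the flags assume the series is applied:

  /* gcc -O2 -mavx512fp16 -mevex512 -c fp16-cmul.c */
  #include <immintrin.h>

  __m512h
  cmul_accumulate (__m512h acc, __m512h a, __m512h b)
  {
    /* Complex multiply-accumulate: acc += a * b per complex lane.  */
    return _mm512_fmadd_pch (a, b, acc);
  }

  __m512h
  cmul_conj (__m512h a, __m512h b)
  {
    /* Complex multiply with one operand conjugated (vfcmulcph).  */
    return _mm512_fcmul_pch (a, b);
  }

Without the series (or with 512-bit vectors disabled), the same calls are expected to be rejected rather than silently narrowed.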
From patchwork Thu Sep 21 07:20:02 2023
X-Patchwork-Submitter: "Hu, Lin1"
X-Patchwork-Id: 1837519
From: "Hu, Lin1"
To: gcc-patches@gcc.gnu.org
Cc: hongtao.liu@intel.com, ubizjak@gmail.com, haochen.jiang@intel.com
Subject: [PATCH 07/18] [PATCH 1/5] Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins
Date: Thu, 21 Sep 2023 15:20:02 +0800
Message-Id: <20230921072013.2124750-8-lin1.hu@intel.com>
In-Reply-To: <20230921072013.2124750-1-lin1.hu@intel.com>
References: <20230921072013.2124750-1-lin1.hu@intel.com>

From: Haochen Jiang

gcc/ChangeLog:

	* config/i386/i386-builtin.def (BDESC): Add
	OPTION_MASK_ISA2_EVEX512.
	* config/i386/i386-builtins.cc (ix86_init_mmx_sse_builtins):
	Ditto.
---
 gcc/config/i386/i386-builtin.def | 648 +++++++++++++++----------------
 gcc/config/i386/i386-builtins.cc |  72 ++--
 2 files changed, 372 insertions(+), 348 deletions(-)
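[Editor's illustration -- not part of the patch.] The diff below adds OPTION_MASK_ISA2_EVEX512 as the second ISA mask of each 512-bit AVX512F builtin descriptor, so those builtins are only registered while EVEX512 is enabled. A user-level sketch: _mm512_mask_storeu_ps expands to __builtin_ia32_storeups512_mask, one of the descriptors changed here. It keeps compiling under -mavx512f with 512-bit vectors enabled; once later patches in the series drop RejectNegative from -mevex512, building with -mno-evex512 is expected to reject it.

  /* gcc -O2 -mavx512f -mevex512 -c evex512-store.c */
  #include <immintrin.h>

  void
  store_masked (float *dst, __m512 v, __mmask16 m)
  {
    /* Lowers to __builtin_ia32_storeups512_mask, now gated on EVEX512.  */
    _mm512_mask_storeu_ps (dst, m, v);
  }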
"__builtin_ia32_loadups512_mask", IX86_BUILTIN_LOADUPS512, UNKNOWN, (int) V16SF_FTYPE_PCFLOAT_V16SF_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_loadv16sf_mask, "__builtin_ia32_loadaps512_mask", IX86_BUILTIN_LOADAPS512, UNKNOWN, (int) V16SF_FTYPE_PCV16SF_V16SF_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_loadv16si_mask, "__builtin_ia32_movdqa32load512_mask", IX86_BUILTIN_MOVDQA32LOAD512, UNKNOWN, (int) V16SI_FTYPE_PCV16SI_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_loadv8df_mask, "__builtin_ia32_loadapd512_mask", IX86_BUILTIN_LOADAPD512, UNKNOWN, (int) V8DF_FTYPE_PCV8DF_V8DF_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_loadv8di_mask, "__builtin_ia32_movdqa64load512_mask", IX86_BUILTIN_MOVDQA64LOAD512, UNKNOWN, (int) V8DI_FTYPE_PCV8DI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_movntv16sf, "__builtin_ia32_movntps512", IX86_BUILTIN_MOVNTPS512, UNKNOWN, (int) VOID_FTYPE_PFLOAT_V16SF) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_movntv8df, "__builtin_ia32_movntpd512", IX86_BUILTIN_MOVNTPD512, UNKNOWN, (int) VOID_FTYPE_PDOUBLE_V8DF) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_movntv8di, "__builtin_ia32_movntdq512", IX86_BUILTIN_MOVNTDQ512, UNKNOWN, (int) VOID_FTYPE_PV8DI_V8DI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_movntdqa, "__builtin_ia32_movntdqa512", IX86_BUILTIN_MOVNTDQA512, UNKNOWN, (int) V8DI_FTYPE_PV8DI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_storev16si_mask, "__builtin_ia32_storedqusi512_mask", IX86_BUILTIN_STOREDQUSI512, UNKNOWN, (int) VOID_FTYPE_PINT_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_storev8di_mask, "__builtin_ia32_storedqudi512_mask", IX86_BUILTIN_STOREDQUDI512, UNKNOWN, (int) VOID_FTYPE_PINT64_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_storev8df_mask, "__builtin_ia32_storeupd512_mask", IX86_BUILTIN_STOREUPD512, UNKNOWN, (int) VOID_FTYPE_PDOUBLE_V8DF_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_us_truncatev8div8si2_mask_store, "__builtin_ia32_pmovusqd512mem_mask", IX86_BUILTIN_PMOVUSQD512_MEM, UNKNOWN, (int) VOID_FTYPE_PV8SI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_ss_truncatev8div8si2_mask_store, "__builtin_ia32_pmovsqd512mem_mask", IX86_BUILTIN_PMOVSQD512_MEM, UNKNOWN, (int) VOID_FTYPE_PV8SI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_truncatev8div8si2_mask_store, "__builtin_ia32_pmovqd512mem_mask", IX86_BUILTIN_PMOVQD512_MEM, UNKNOWN, (int) VOID_FTYPE_PV8SI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_us_truncatev8div8hi2_mask_store, "__builtin_ia32_pmovusqw512mem_mask", IX86_BUILTIN_PMOVUSQW512_MEM, UNKNOWN, (int) VOID_FTYPE_PV8HI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_ss_truncatev8div8hi2_mask_store, "__builtin_ia32_pmovsqw512mem_mask", IX86_BUILTIN_PMOVSQW512_MEM, UNKNOWN, (int) VOID_FTYPE_PV8HI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_truncatev8div8hi2_mask_store, "__builtin_ia32_pmovqw512mem_mask", IX86_BUILTIN_PMOVQW512_MEM, UNKNOWN, (int) VOID_FTYPE_PV8HI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_us_truncatev16siv16hi2_mask_store, "__builtin_ia32_pmovusdw512mem_mask", IX86_BUILTIN_PMOVUSDW512_MEM, UNKNOWN, (int) VOID_FTYPE_PV16HI_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_ss_truncatev16siv16hi2_mask_store, "__builtin_ia32_pmovsdw512mem_mask", IX86_BUILTIN_PMOVSDW512_MEM, UNKNOWN, (int) VOID_FTYPE_PV16HI_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 
0, CODE_FOR_avx512f_truncatev16siv16hi2_mask_store, "__builtin_ia32_pmovdw512mem_mask", IX86_BUILTIN_PMOVDW512_MEM, UNKNOWN, (int) VOID_FTYPE_PV16HI_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_truncatev8div16qi2_mask_store_2, "__builtin_ia32_pmovqb512mem_mask", IX86_BUILTIN_PMOVQB512_MEM, UNKNOWN, (int) VOID_FTYPE_PUDI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_us_truncatev8div16qi2_mask_store_2, "__builtin_ia32_pmovusqb512mem_mask", IX86_BUILTIN_PMOVUSQB512_MEM, UNKNOWN, (int) VOID_FTYPE_PUDI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_ss_truncatev8div16qi2_mask_store_2, "__builtin_ia32_pmovsqb512mem_mask", IX86_BUILTIN_PMOVSQB512_MEM, UNKNOWN, (int) VOID_FTYPE_PUDI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_us_truncatev16siv16qi2_mask_store, "__builtin_ia32_pmovusdb512mem_mask", IX86_BUILTIN_PMOVUSDB512_MEM, UNKNOWN, (int) VOID_FTYPE_PV16QI_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_ss_truncatev16siv16qi2_mask_store, "__builtin_ia32_pmovsdb512mem_mask", IX86_BUILTIN_PMOVSDB512_MEM, UNKNOWN, (int) VOID_FTYPE_PV16QI_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_truncatev16siv16qi2_mask_store, "__builtin_ia32_pmovdb512mem_mask", IX86_BUILTIN_PMOVDB512_MEM, UNKNOWN, (int) VOID_FTYPE_PV16QI_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_storev16sf_mask, "__builtin_ia32_storeups512_mask", IX86_BUILTIN_STOREUPS512, UNKNOWN, (int) VOID_FTYPE_PFLOAT_V16SF_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_storev16sf_mask, "__builtin_ia32_storeaps512_mask", IX86_BUILTIN_STOREAPS512, UNKNOWN, (int) VOID_FTYPE_PV16SF_V16SF_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_storev16si_mask, "__builtin_ia32_movdqa32store512_mask", IX86_BUILTIN_MOVDQA32STORE512, UNKNOWN, (int) VOID_FTYPE_PV16SI_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_storev8df_mask, "__builtin_ia32_storeapd512_mask", IX86_BUILTIN_STOREAPD512, UNKNOWN, (int) VOID_FTYPE_PV8DF_V8DF_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_storev8di_mask, "__builtin_ia32_movdqa64store512_mask", IX86_BUILTIN_MOVDQA64STORE512, UNKNOWN, (int) VOID_FTYPE_PV8DI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_compressstorev16sf_mask, "__builtin_ia32_compressstoresf512_mask", IX86_BUILTIN_COMPRESSPSSTORE512, UNKNOWN, (int) VOID_FTYPE_PV16SF_V16SF_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_compressstorev16si_mask, "__builtin_ia32_compressstoresi512_mask", IX86_BUILTIN_PCOMPRESSDSTORE512, UNKNOWN, (int) VOID_FTYPE_PV16SI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_compressstorev8df_mask, "__builtin_ia32_compressstoredf512_mask", IX86_BUILTIN_COMPRESSPDSTORE512, UNKNOWN, (int) VOID_FTYPE_PV8DF_V8DF_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_compressstorev8di_mask, "__builtin_ia32_compressstoredi512_mask", IX86_BUILTIN_PCOMPRESSQSTORE512, UNKNOWN, (int) VOID_FTYPE_PV8DI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_expandv16sf_mask, "__builtin_ia32_expandloadsf512_mask", IX86_BUILTIN_EXPANDPSLOAD512, UNKNOWN, (int) V16SF_FTYPE_PCV16SF_V16SF_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_expandv16sf_maskz, "__builtin_ia32_expandloadsf512_maskz", IX86_BUILTIN_EXPANDPSLOAD512Z, UNKNOWN, (int) V16SF_FTYPE_PCV16SF_V16SF_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, 
OPTION_MASK_ISA2_EVEX512, CODE_FOR_expandv16si_mask, "__builtin_ia32_expandloadsi512_mask", IX86_BUILTIN_PEXPANDDLOAD512, UNKNOWN, (int) V16SI_FTYPE_PCV16SI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_expandv16si_maskz, "__builtin_ia32_expandloadsi512_maskz", IX86_BUILTIN_PEXPANDDLOAD512Z, UNKNOWN, (int) V16SI_FTYPE_PCV16SI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_expandv8df_mask, "__builtin_ia32_expandloaddf512_mask", IX86_BUILTIN_EXPANDPDLOAD512, UNKNOWN, (int) V8DF_FTYPE_PCV8DF_V8DF_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_expandv8df_maskz, "__builtin_ia32_expandloaddf512_maskz", IX86_BUILTIN_EXPANDPDLOAD512Z, UNKNOWN, (int) V8DF_FTYPE_PCV8DF_V8DF_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_expandv8di_mask, "__builtin_ia32_expandloaddi512_mask", IX86_BUILTIN_PEXPANDQLOAD512, UNKNOWN, (int) V8DI_FTYPE_PCV8DI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_expandv8di_maskz, "__builtin_ia32_expandloaddi512_maskz", IX86_BUILTIN_PEXPANDQLOAD512Z, UNKNOWN, (int) V8DI_FTYPE_PCV8DI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_loadv16si_mask, "__builtin_ia32_loaddqusi512_mask", IX86_BUILTIN_LOADDQUSI512, UNKNOWN, (int) V16SI_FTYPE_PCINT_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_loadv8di_mask, "__builtin_ia32_loaddqudi512_mask", IX86_BUILTIN_LOADDQUDI512, UNKNOWN, (int) V8DI_FTYPE_PCINT64_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_loadv8df_mask, "__builtin_ia32_loadupd512_mask", IX86_BUILTIN_LOADUPD512, UNKNOWN, (int) V8DF_FTYPE_PCDOUBLE_V8DF_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_loadv16sf_mask, "__builtin_ia32_loadups512_mask", IX86_BUILTIN_LOADUPS512, UNKNOWN, (int) V16SF_FTYPE_PCFLOAT_V16SF_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_loadv16sf_mask, "__builtin_ia32_loadaps512_mask", IX86_BUILTIN_LOADAPS512, UNKNOWN, (int) V16SF_FTYPE_PCV16SF_V16SF_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_loadv16si_mask, "__builtin_ia32_movdqa32load512_mask", IX86_BUILTIN_MOVDQA32LOAD512, UNKNOWN, (int) V16SI_FTYPE_PCV16SI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_loadv8df_mask, "__builtin_ia32_loadapd512_mask", IX86_BUILTIN_LOADAPD512, UNKNOWN, (int) V8DF_FTYPE_PCV8DF_V8DF_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_loadv8di_mask, "__builtin_ia32_movdqa64load512_mask", IX86_BUILTIN_MOVDQA64LOAD512, UNKNOWN, (int) V8DI_FTYPE_PCV8DI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_movntv16sf, "__builtin_ia32_movntps512", IX86_BUILTIN_MOVNTPS512, UNKNOWN, (int) VOID_FTYPE_PFLOAT_V16SF) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_movntv8df, "__builtin_ia32_movntpd512", IX86_BUILTIN_MOVNTPD512, UNKNOWN, (int) VOID_FTYPE_PDOUBLE_V8DF) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_movntv8di, "__builtin_ia32_movntdq512", IX86_BUILTIN_MOVNTDQ512, UNKNOWN, (int) VOID_FTYPE_PV8DI_V8DI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_movntdqa, "__builtin_ia32_movntdqa512", IX86_BUILTIN_MOVNTDQA512, UNKNOWN, (int) V8DI_FTYPE_PV8DI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, 
CODE_FOR_avx512f_storev16si_mask, "__builtin_ia32_storedqusi512_mask", IX86_BUILTIN_STOREDQUSI512, UNKNOWN, (int) VOID_FTYPE_PINT_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_storev8di_mask, "__builtin_ia32_storedqudi512_mask", IX86_BUILTIN_STOREDQUDI512, UNKNOWN, (int) VOID_FTYPE_PINT64_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_storev8df_mask, "__builtin_ia32_storeupd512_mask", IX86_BUILTIN_STOREUPD512, UNKNOWN, (int) VOID_FTYPE_PDOUBLE_V8DF_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_us_truncatev8div8si2_mask_store, "__builtin_ia32_pmovusqd512mem_mask", IX86_BUILTIN_PMOVUSQD512_MEM, UNKNOWN, (int) VOID_FTYPE_PV8SI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ss_truncatev8div8si2_mask_store, "__builtin_ia32_pmovsqd512mem_mask", IX86_BUILTIN_PMOVSQD512_MEM, UNKNOWN, (int) VOID_FTYPE_PV8SI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_truncatev8div8si2_mask_store, "__builtin_ia32_pmovqd512mem_mask", IX86_BUILTIN_PMOVQD512_MEM, UNKNOWN, (int) VOID_FTYPE_PV8SI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_us_truncatev8div8hi2_mask_store, "__builtin_ia32_pmovusqw512mem_mask", IX86_BUILTIN_PMOVUSQW512_MEM, UNKNOWN, (int) VOID_FTYPE_PV8HI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ss_truncatev8div8hi2_mask_store, "__builtin_ia32_pmovsqw512mem_mask", IX86_BUILTIN_PMOVSQW512_MEM, UNKNOWN, (int) VOID_FTYPE_PV8HI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_truncatev8div8hi2_mask_store, "__builtin_ia32_pmovqw512mem_mask", IX86_BUILTIN_PMOVQW512_MEM, UNKNOWN, (int) VOID_FTYPE_PV8HI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_us_truncatev16siv16hi2_mask_store, "__builtin_ia32_pmovusdw512mem_mask", IX86_BUILTIN_PMOVUSDW512_MEM, UNKNOWN, (int) VOID_FTYPE_PV16HI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ss_truncatev16siv16hi2_mask_store, "__builtin_ia32_pmovsdw512mem_mask", IX86_BUILTIN_PMOVSDW512_MEM, UNKNOWN, (int) VOID_FTYPE_PV16HI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_truncatev16siv16hi2_mask_store, "__builtin_ia32_pmovdw512mem_mask", IX86_BUILTIN_PMOVDW512_MEM, UNKNOWN, (int) VOID_FTYPE_PV16HI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_truncatev8div16qi2_mask_store_2, "__builtin_ia32_pmovqb512mem_mask", IX86_BUILTIN_PMOVQB512_MEM, UNKNOWN, (int) VOID_FTYPE_PUDI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_us_truncatev8div16qi2_mask_store_2, "__builtin_ia32_pmovusqb512mem_mask", IX86_BUILTIN_PMOVUSQB512_MEM, UNKNOWN, (int) VOID_FTYPE_PUDI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ss_truncatev8div16qi2_mask_store_2, "__builtin_ia32_pmovsqb512mem_mask", IX86_BUILTIN_PMOVSQB512_MEM, UNKNOWN, (int) VOID_FTYPE_PUDI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_us_truncatev16siv16qi2_mask_store, "__builtin_ia32_pmovusdb512mem_mask", IX86_BUILTIN_PMOVUSDB512_MEM, UNKNOWN, (int) VOID_FTYPE_PV16QI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ss_truncatev16siv16qi2_mask_store, "__builtin_ia32_pmovsdb512mem_mask", IX86_BUILTIN_PMOVSDB512_MEM, 
UNKNOWN, (int) VOID_FTYPE_PV16QI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_truncatev16siv16qi2_mask_store, "__builtin_ia32_pmovdb512mem_mask", IX86_BUILTIN_PMOVDB512_MEM, UNKNOWN, (int) VOID_FTYPE_PV16QI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_storev16sf_mask, "__builtin_ia32_storeups512_mask", IX86_BUILTIN_STOREUPS512, UNKNOWN, (int) VOID_FTYPE_PFLOAT_V16SF_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_storev16sf_mask, "__builtin_ia32_storeaps512_mask", IX86_BUILTIN_STOREAPS512, UNKNOWN, (int) VOID_FTYPE_PV16SF_V16SF_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_storev16si_mask, "__builtin_ia32_movdqa32store512_mask", IX86_BUILTIN_MOVDQA32STORE512, UNKNOWN, (int) VOID_FTYPE_PV16SI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_storev8df_mask, "__builtin_ia32_storeapd512_mask", IX86_BUILTIN_STOREAPD512, UNKNOWN, (int) VOID_FTYPE_PV8DF_V8DF_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_storev8di_mask, "__builtin_ia32_movdqa64store512_mask", IX86_BUILTIN_MOVDQA64STORE512, UNKNOWN, (int) VOID_FTYPE_PV8DI_V8DI_UQI) BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_loaddf_mask, "__builtin_ia32_loadsd_mask", IX86_BUILTIN_LOADSD_MASK, UNKNOWN, (int) V2DF_FTYPE_PCDOUBLE_V2DF_UQI) BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_loadsf_mask, "__builtin_ia32_loadss_mask", IX86_BUILTIN_LOADSS_MASK, UNKNOWN, (int) V4SF_FTYPE_PCFLOAT_V4SF_UQI) BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_storedf_mask, "__builtin_ia32_storesd_mask", IX86_BUILTIN_STORESD_MASK, UNKNOWN, (int) VOID_FTYPE_PDOUBLE_V2DF_UQI) @@ -1360,231 +1360,231 @@ BDESC (OPTION_MASK_ISA_BMI2, 0, CODE_FOR_bmi2_pext_si3, "__builtin_ia32_pext_si" BDESC (OPTION_MASK_ISA_BMI2 | OPTION_MASK_ISA_64BIT, 0, CODE_FOR_bmi2_pext_di3, "__builtin_ia32_pext_di", IX86_BUILTIN_PEXT64, UNKNOWN, (int) UINT64_FTYPE_UINT64_UINT64) /* AVX512F */ -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_si512_256si, "__builtin_ia32_si512_256si", IX86_BUILTIN_SI512_SI256, UNKNOWN, (int) V16SI_FTYPE_V8SI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_ps512_256ps, "__builtin_ia32_ps512_256ps", IX86_BUILTIN_PS512_PS256, UNKNOWN, (int) V16SF_FTYPE_V8SF) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_pd512_256pd, "__builtin_ia32_pd512_256pd", IX86_BUILTIN_PD512_PD256, UNKNOWN, (int) V8DF_FTYPE_V4DF) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_si512_si, "__builtin_ia32_si512_si", IX86_BUILTIN_SI512_SI, UNKNOWN, (int) V16SI_FTYPE_V4SI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_ps512_ps, "__builtin_ia32_ps512_ps", IX86_BUILTIN_PS512_PS, UNKNOWN, (int) V16SF_FTYPE_V4SF) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_pd512_pd, "__builtin_ia32_pd512_pd", IX86_BUILTIN_PD512_PD, UNKNOWN, (int) V8DF_FTYPE_V2DF) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_alignv16si_mask, "__builtin_ia32_alignd512_mask", IX86_BUILTIN_ALIGND512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_INT_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_alignv8di_mask, "__builtin_ia32_alignq512_mask", IX86_BUILTIN_ALIGNQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_blendmv16si, "__builtin_ia32_blendmd_512_mask", IX86_BUILTIN_BLENDMD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_blendmv8df, 
"__builtin_ia32_blendmpd_512_mask", IX86_BUILTIN_BLENDMPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_blendmv16sf, "__builtin_ia32_blendmps_512_mask", IX86_BUILTIN_BLENDMPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_blendmv8di, "__builtin_ia32_blendmq_512_mask", IX86_BUILTIN_BLENDMQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_broadcastv16sf_mask, "__builtin_ia32_broadcastf32x4_512", IX86_BUILTIN_BROADCASTF32X4_512, UNKNOWN, (int) V16SF_FTYPE_V4SF_V16SF_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_broadcastv8df_mask, "__builtin_ia32_broadcastf64x4_512", IX86_BUILTIN_BROADCASTF64X4_512, UNKNOWN, (int) V8DF_FTYPE_V4DF_V8DF_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_broadcastv16si_mask, "__builtin_ia32_broadcasti32x4_512", IX86_BUILTIN_BROADCASTI32X4_512, UNKNOWN, (int) V16SI_FTYPE_V4SI_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_broadcastv8di_mask, "__builtin_ia32_broadcasti64x4_512", IX86_BUILTIN_BROADCASTI64X4_512, UNKNOWN, (int) V8DI_FTYPE_V4DI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vec_dupv8df_mask, "__builtin_ia32_broadcastsd512", IX86_BUILTIN_BROADCASTSD512, UNKNOWN, (int) V8DF_FTYPE_V2DF_V8DF_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vec_dupv16sf_mask, "__builtin_ia32_broadcastss512", IX86_BUILTIN_BROADCASTSS512, UNKNOWN, (int) V16SF_FTYPE_V4SF_V16SF_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_cmpv16si3_mask, "__builtin_ia32_cmpd512_mask", IX86_BUILTIN_CMPD512, UNKNOWN, (int) UHI_FTYPE_V16SI_V16SI_INT_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_cmpv8di3_mask, "__builtin_ia32_cmpq512_mask", IX86_BUILTIN_CMPQ512, UNKNOWN, (int) UQI_FTYPE_V8DI_V8DI_INT_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_compressv8df_mask, "__builtin_ia32_compressdf512_mask", IX86_BUILTIN_COMPRESSPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_compressv16sf_mask, "__builtin_ia32_compresssf512_mask", IX86_BUILTIN_COMPRESSPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_floatv8siv8df2_mask, "__builtin_ia32_cvtdq2pd512_mask", IX86_BUILTIN_CVTDQ2PD512, UNKNOWN, (int) V8DF_FTYPE_V8SI_V8DF_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vcvtps2ph512_mask_sae, "__builtin_ia32_vcvtps2ph512_mask", IX86_BUILTIN_CVTPS2PH512, UNKNOWN, (int) V16HI_FTYPE_V16SF_INT_V16HI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_floatunsv8siv8df2_mask, "__builtin_ia32_cvtudq2pd512_mask", IX86_BUILTIN_CVTUDQ2PD512, UNKNOWN, (int) V8DF_FTYPE_V8SI_V8DF_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_si512_256si, "__builtin_ia32_si512_256si", IX86_BUILTIN_SI512_SI256, UNKNOWN, (int) V16SI_FTYPE_V8SI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ps512_256ps, "__builtin_ia32_ps512_256ps", IX86_BUILTIN_PS512_PS256, UNKNOWN, (int) V16SF_FTYPE_V8SF) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_pd512_256pd, "__builtin_ia32_pd512_256pd", IX86_BUILTIN_PD512_PD256, UNKNOWN, (int) V8DF_FTYPE_V4DF) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_si512_si, "__builtin_ia32_si512_si", IX86_BUILTIN_SI512_SI, UNKNOWN, (int) V16SI_FTYPE_V4SI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ps512_ps, 
"__builtin_ia32_ps512_ps", IX86_BUILTIN_PS512_PS, UNKNOWN, (int) V16SF_FTYPE_V4SF) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_pd512_pd, "__builtin_ia32_pd512_pd", IX86_BUILTIN_PD512_PD, UNKNOWN, (int) V8DF_FTYPE_V2DF) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_alignv16si_mask, "__builtin_ia32_alignd512_mask", IX86_BUILTIN_ALIGND512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_INT_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_alignv8di_mask, "__builtin_ia32_alignq512_mask", IX86_BUILTIN_ALIGNQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_blendmv16si, "__builtin_ia32_blendmd_512_mask", IX86_BUILTIN_BLENDMD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_blendmv8df, "__builtin_ia32_blendmpd_512_mask", IX86_BUILTIN_BLENDMPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_blendmv16sf, "__builtin_ia32_blendmps_512_mask", IX86_BUILTIN_BLENDMPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_blendmv8di, "__builtin_ia32_blendmq_512_mask", IX86_BUILTIN_BLENDMQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_broadcastv16sf_mask, "__builtin_ia32_broadcastf32x4_512", IX86_BUILTIN_BROADCASTF32X4_512, UNKNOWN, (int) V16SF_FTYPE_V4SF_V16SF_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_broadcastv8df_mask, "__builtin_ia32_broadcastf64x4_512", IX86_BUILTIN_BROADCASTF64X4_512, UNKNOWN, (int) V8DF_FTYPE_V4DF_V8DF_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_broadcastv16si_mask, "__builtin_ia32_broadcasti32x4_512", IX86_BUILTIN_BROADCASTI32X4_512, UNKNOWN, (int) V16SI_FTYPE_V4SI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_broadcastv8di_mask, "__builtin_ia32_broadcasti64x4_512", IX86_BUILTIN_BROADCASTI64X4_512, UNKNOWN, (int) V8DI_FTYPE_V4DI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vec_dupv8df_mask, "__builtin_ia32_broadcastsd512", IX86_BUILTIN_BROADCASTSD512, UNKNOWN, (int) V8DF_FTYPE_V2DF_V8DF_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vec_dupv16sf_mask, "__builtin_ia32_broadcastss512", IX86_BUILTIN_BROADCASTSS512, UNKNOWN, (int) V16SF_FTYPE_V4SF_V16SF_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cmpv16si3_mask, "__builtin_ia32_cmpd512_mask", IX86_BUILTIN_CMPD512, UNKNOWN, (int) UHI_FTYPE_V16SI_V16SI_INT_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cmpv8di3_mask, "__builtin_ia32_cmpq512_mask", IX86_BUILTIN_CMPQ512, UNKNOWN, (int) UQI_FTYPE_V8DI_V8DI_INT_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_compressv8df_mask, "__builtin_ia32_compressdf512_mask", IX86_BUILTIN_COMPRESSPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_compressv16sf_mask, "__builtin_ia32_compresssf512_mask", IX86_BUILTIN_COMPRESSPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_floatv8siv8df2_mask, 
"__builtin_ia32_cvtdq2pd512_mask", IX86_BUILTIN_CVTDQ2PD512, UNKNOWN, (int) V8DF_FTYPE_V8SI_V8DF_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vcvtps2ph512_mask_sae, "__builtin_ia32_vcvtps2ph512_mask", IX86_BUILTIN_CVTPS2PH512, UNKNOWN, (int) V16HI_FTYPE_V16SF_INT_V16HI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_floatunsv8siv8df2_mask, "__builtin_ia32_cvtudq2pd512_mask", IX86_BUILTIN_CVTUDQ2PD512, UNKNOWN, (int) V8DF_FTYPE_V8SI_V8DF_UQI) BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_cvtusi2sd32, "__builtin_ia32_cvtusi2sd32", IX86_BUILTIN_CVTUSI2SD32, UNKNOWN, (int) V2DF_FTYPE_V2DF_UINT) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_expandv8df_mask, "__builtin_ia32_expanddf512_mask", IX86_BUILTIN_EXPANDPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_expandv8df_maskz, "__builtin_ia32_expanddf512_maskz", IX86_BUILTIN_EXPANDPD512Z, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_expandv16sf_mask, "__builtin_ia32_expandsf512_mask", IX86_BUILTIN_EXPANDPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_expandv16sf_maskz, "__builtin_ia32_expandsf512_maskz", IX86_BUILTIN_EXPANDPS512Z, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vextractf32x4_mask, "__builtin_ia32_extractf32x4_mask", IX86_BUILTIN_EXTRACTF32X4, UNKNOWN, (int) V4SF_FTYPE_V16SF_INT_V4SF_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vextractf64x4_mask, "__builtin_ia32_extractf64x4_mask", IX86_BUILTIN_EXTRACTF64X4, UNKNOWN, (int) V4DF_FTYPE_V8DF_INT_V4DF_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vextracti32x4_mask, "__builtin_ia32_extracti32x4_mask", IX86_BUILTIN_EXTRACTI32X4, UNKNOWN, (int) V4SI_FTYPE_V16SI_INT_V4SI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vextracti64x4_mask, "__builtin_ia32_extracti64x4_mask", IX86_BUILTIN_EXTRACTI64X4, UNKNOWN, (int) V4DI_FTYPE_V8DI_INT_V4DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vinsertf32x4_mask, "__builtin_ia32_insertf32x4_mask", IX86_BUILTIN_INSERTF32X4, UNKNOWN, (int) V16SF_FTYPE_V16SF_V4SF_INT_V16SF_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vinsertf64x4_mask, "__builtin_ia32_insertf64x4_mask", IX86_BUILTIN_INSERTF64X4, UNKNOWN, (int) V8DF_FTYPE_V8DF_V4DF_INT_V8DF_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vinserti32x4_mask, "__builtin_ia32_inserti32x4_mask", IX86_BUILTIN_INSERTI32X4, UNKNOWN, (int) V16SI_FTYPE_V16SI_V4SI_INT_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vinserti64x4_mask, "__builtin_ia32_inserti64x4_mask", IX86_BUILTIN_INSERTI64X4, UNKNOWN, (int) V8DI_FTYPE_V8DI_V4DI_INT_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_loadv8df_mask, "__builtin_ia32_movapd512_mask", IX86_BUILTIN_MOVAPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_loadv16sf_mask, "__builtin_ia32_movaps512_mask", IX86_BUILTIN_MOVAPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_movddup512_mask, "__builtin_ia32_movddup512_mask", IX86_BUILTIN_MOVDDUP512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_loadv16si_mask, "__builtin_ia32_movdqa32_512_mask", IX86_BUILTIN_MOVDQA32_512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, 
CODE_FOR_avx512f_loadv8di_mask, "__builtin_ia32_movdqa64_512_mask", IX86_BUILTIN_MOVDQA64_512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_movshdup512_mask, "__builtin_ia32_movshdup512_mask", IX86_BUILTIN_MOVSHDUP512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_movsldup512_mask, "__builtin_ia32_movsldup512_mask", IX86_BUILTIN_MOVSLDUP512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_absv16si2_mask, "__builtin_ia32_pabsd512_mask", IX86_BUILTIN_PABSD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_absv8di2_mask, "__builtin_ia32_pabsq512_mask", IX86_BUILTIN_PABSQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_addv16si3_mask, "__builtin_ia32_paddd512_mask", IX86_BUILTIN_PADDD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_addv8di3_mask, "__builtin_ia32_paddq512_mask", IX86_BUILTIN_PADDQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_andv16si3_mask, "__builtin_ia32_pandd512_mask", IX86_BUILTIN_PANDD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_andnotv16si3_mask, "__builtin_ia32_pandnd512_mask", IX86_BUILTIN_PANDND512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_andnotv8di3_mask, "__builtin_ia32_pandnq512_mask", IX86_BUILTIN_PANDNQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_andv8di3_mask, "__builtin_ia32_pandq512_mask", IX86_BUILTIN_PANDQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vec_dupv16si_mask, "__builtin_ia32_pbroadcastd512", IX86_BUILTIN_PBROADCASTD512, UNKNOWN, (int) V16SI_FTYPE_V4SI_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vec_dup_gprv16si_mask, "__builtin_ia32_pbroadcastd512_gpr_mask", IX86_BUILTIN_PBROADCASTD512_GPR, UNKNOWN, (int) V16SI_FTYPE_SI_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512CD, 0, CODE_FOR_avx512cd_maskb_vec_dupv8di, "__builtin_ia32_broadcastmb512", IX86_BUILTIN_PBROADCASTMB512, UNKNOWN, (int) V8DI_FTYPE_UQI) -BDESC (OPTION_MASK_ISA_AVX512CD, 0, CODE_FOR_avx512cd_maskw_vec_dupv16si, "__builtin_ia32_broadcastmw512", IX86_BUILTIN_PBROADCASTMW512, UNKNOWN, (int) V16SI_FTYPE_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vec_dupv8di_mask, "__builtin_ia32_pbroadcastq512", IX86_BUILTIN_PBROADCASTQ512, UNKNOWN, (int) V8DI_FTYPE_V2DI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vec_dup_gprv8di_mask, "__builtin_ia32_pbroadcastq512_gpr_mask", IX86_BUILTIN_PBROADCASTQ512_GPR, UNKNOWN, (int) V8DI_FTYPE_DI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_eqv16si3_mask, "__builtin_ia32_pcmpeqd512_mask", IX86_BUILTIN_PCMPEQD512_MASK, UNKNOWN, (int) UHI_FTYPE_V16SI_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_eqv8di3_mask, "__builtin_ia32_pcmpeqq512_mask", IX86_BUILTIN_PCMPEQQ512_MASK, UNKNOWN, (int) UQI_FTYPE_V8DI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_gtv16si3_mask, "__builtin_ia32_pcmpgtd512_mask", IX86_BUILTIN_PCMPGTD512_MASK, UNKNOWN, (int) UHI_FTYPE_V16SI_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_gtv8di3_mask, "__builtin_ia32_pcmpgtq512_mask", IX86_BUILTIN_PCMPGTQ512_MASK, UNKNOWN, (int) 
UQI_FTYPE_V8DI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_compressv16si_mask, "__builtin_ia32_compresssi512_mask", IX86_BUILTIN_PCOMPRESSD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_compressv8di_mask, "__builtin_ia32_compressdi512_mask", IX86_BUILTIN_PCOMPRESSQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_expandv16si_mask, "__builtin_ia32_expandsi512_mask", IX86_BUILTIN_PEXPANDD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_expandv16si_maskz, "__builtin_ia32_expandsi512_maskz", IX86_BUILTIN_PEXPANDD512Z, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_expandv8di_mask, "__builtin_ia32_expanddi512_mask", IX86_BUILTIN_PEXPANDQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_expandv8di_maskz, "__builtin_ia32_expanddi512_maskz", IX86_BUILTIN_PEXPANDQ512Z, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_smaxv16si3_mask, "__builtin_ia32_pmaxsd512_mask", IX86_BUILTIN_PMAXSD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_smaxv8di3_mask, "__builtin_ia32_pmaxsq512_mask", IX86_BUILTIN_PMAXSQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_umaxv16si3_mask, "__builtin_ia32_pmaxud512_mask", IX86_BUILTIN_PMAXUD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_umaxv8di3_mask, "__builtin_ia32_pmaxuq512_mask", IX86_BUILTIN_PMAXUQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sminv16si3_mask, "__builtin_ia32_pminsd512_mask", IX86_BUILTIN_PMINSD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sminv8di3_mask, "__builtin_ia32_pminsq512_mask", IX86_BUILTIN_PMINSQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_uminv16si3_mask, "__builtin_ia32_pminud512_mask", IX86_BUILTIN_PMINUD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_uminv8di3_mask, "__builtin_ia32_pminuq512_mask", IX86_BUILTIN_PMINUQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_truncatev16siv16qi2_mask, "__builtin_ia32_pmovdb512_mask", IX86_BUILTIN_PMOVDB512, UNKNOWN, (int) V16QI_FTYPE_V16SI_V16QI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_truncatev16siv16hi2_mask, "__builtin_ia32_pmovdw512_mask", IX86_BUILTIN_PMOVDW512, UNKNOWN, (int) V16HI_FTYPE_V16SI_V16HI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_truncatev8div16qi2_mask, "__builtin_ia32_pmovqb512_mask", IX86_BUILTIN_PMOVQB512, UNKNOWN, (int) V16QI_FTYPE_V8DI_V16QI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_truncatev8div8si2_mask, "__builtin_ia32_pmovqd512_mask", IX86_BUILTIN_PMOVQD512, UNKNOWN, (int) V8SI_FTYPE_V8DI_V8SI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_truncatev8div8hi2_mask, "__builtin_ia32_pmovqw512_mask", IX86_BUILTIN_PMOVQW512, UNKNOWN, (int) V8HI_FTYPE_V8DI_V8HI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_ss_truncatev16siv16qi2_mask, "__builtin_ia32_pmovsdb512_mask", IX86_BUILTIN_PMOVSDB512, UNKNOWN, (int) V16QI_FTYPE_V16SI_V16QI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, 
CODE_FOR_avx512f_ss_truncatev16siv16hi2_mask, "__builtin_ia32_pmovsdw512_mask", IX86_BUILTIN_PMOVSDW512, UNKNOWN, (int) V16HI_FTYPE_V16SI_V16HI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_ss_truncatev8div16qi2_mask, "__builtin_ia32_pmovsqb512_mask", IX86_BUILTIN_PMOVSQB512, UNKNOWN, (int) V16QI_FTYPE_V8DI_V16QI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_ss_truncatev8div8si2_mask, "__builtin_ia32_pmovsqd512_mask", IX86_BUILTIN_PMOVSQD512, UNKNOWN, (int) V8SI_FTYPE_V8DI_V8SI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_ss_truncatev8div8hi2_mask, "__builtin_ia32_pmovsqw512_mask", IX86_BUILTIN_PMOVSQW512, UNKNOWN, (int) V8HI_FTYPE_V8DI_V8HI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_sign_extendv16qiv16si2_mask, "__builtin_ia32_pmovsxbd512_mask", IX86_BUILTIN_PMOVSXBD512, UNKNOWN, (int) V16SI_FTYPE_V16QI_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_sign_extendv8qiv8di2_mask, "__builtin_ia32_pmovsxbq512_mask", IX86_BUILTIN_PMOVSXBQ512, UNKNOWN, (int) V8DI_FTYPE_V16QI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_sign_extendv8siv8di2_mask, "__builtin_ia32_pmovsxdq512_mask", IX86_BUILTIN_PMOVSXDQ512, UNKNOWN, (int) V8DI_FTYPE_V8SI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_sign_extendv16hiv16si2_mask, "__builtin_ia32_pmovsxwd512_mask", IX86_BUILTIN_PMOVSXWD512, UNKNOWN, (int) V16SI_FTYPE_V16HI_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_sign_extendv8hiv8di2_mask, "__builtin_ia32_pmovsxwq512_mask", IX86_BUILTIN_PMOVSXWQ512, UNKNOWN, (int) V8DI_FTYPE_V8HI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_us_truncatev16siv16qi2_mask, "__builtin_ia32_pmovusdb512_mask", IX86_BUILTIN_PMOVUSDB512, UNKNOWN, (int) V16QI_FTYPE_V16SI_V16QI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_us_truncatev16siv16hi2_mask, "__builtin_ia32_pmovusdw512_mask", IX86_BUILTIN_PMOVUSDW512, UNKNOWN, (int) V16HI_FTYPE_V16SI_V16HI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_us_truncatev8div16qi2_mask, "__builtin_ia32_pmovusqb512_mask", IX86_BUILTIN_PMOVUSQB512, UNKNOWN, (int) V16QI_FTYPE_V8DI_V16QI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_us_truncatev8div8si2_mask, "__builtin_ia32_pmovusqd512_mask", IX86_BUILTIN_PMOVUSQD512, UNKNOWN, (int) V8SI_FTYPE_V8DI_V8SI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_us_truncatev8div8hi2_mask, "__builtin_ia32_pmovusqw512_mask", IX86_BUILTIN_PMOVUSQW512, UNKNOWN, (int) V8HI_FTYPE_V8DI_V8HI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_zero_extendv16qiv16si2_mask, "__builtin_ia32_pmovzxbd512_mask", IX86_BUILTIN_PMOVZXBD512, UNKNOWN, (int) V16SI_FTYPE_V16QI_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_zero_extendv8qiv8di2_mask, "__builtin_ia32_pmovzxbq512_mask", IX86_BUILTIN_PMOVZXBQ512, UNKNOWN, (int) V8DI_FTYPE_V16QI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_zero_extendv8siv8di2_mask, "__builtin_ia32_pmovzxdq512_mask", IX86_BUILTIN_PMOVZXDQ512, UNKNOWN, (int) V8DI_FTYPE_V8SI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_zero_extendv16hiv16si2_mask, "__builtin_ia32_pmovzxwd512_mask", IX86_BUILTIN_PMOVZXWD512, UNKNOWN, (int) V16SI_FTYPE_V16HI_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_zero_extendv8hiv8di2_mask, "__builtin_ia32_pmovzxwq512_mask", IX86_BUILTIN_PMOVZXWQ512, UNKNOWN, (int) V8DI_FTYPE_V8HI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_vec_widen_smult_even_v16si_mask, 
"__builtin_ia32_pmuldq512_mask", IX86_BUILTIN_PMULDQ512, UNKNOWN, (int) V8DI_FTYPE_V16SI_V16SI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_mulv16si3_mask, "__builtin_ia32_pmulld512_mask" , IX86_BUILTIN_PMULLD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_vec_widen_umult_even_v16si_mask, "__builtin_ia32_pmuludq512_mask", IX86_BUILTIN_PMULUDQ512, UNKNOWN, (int) V8DI_FTYPE_V16SI_V16SI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_iorv16si3_mask, "__builtin_ia32_pord512_mask", IX86_BUILTIN_PORD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_iorv8di3_mask, "__builtin_ia32_porq512_mask", IX86_BUILTIN_PORQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_rolv16si_mask, "__builtin_ia32_prold512_mask", IX86_BUILTIN_PROLD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_INT_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_rolv8di_mask, "__builtin_ia32_prolq512_mask", IX86_BUILTIN_PROLQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_INT_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_rolvv16si_mask, "__builtin_ia32_prolvd512_mask", IX86_BUILTIN_PROLVD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_rolvv8di_mask, "__builtin_ia32_prolvq512_mask", IX86_BUILTIN_PROLVQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_rorv16si_mask, "__builtin_ia32_prord512_mask", IX86_BUILTIN_PRORD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_INT_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_rorv8di_mask, "__builtin_ia32_prorq512_mask", IX86_BUILTIN_PRORQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_INT_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_rorvv16si_mask, "__builtin_ia32_prorvd512_mask", IX86_BUILTIN_PRORVD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_rorvv8di_mask, "__builtin_ia32_prorvq512_mask", IX86_BUILTIN_PRORVQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_pshufdv3_mask, "__builtin_ia32_pshufd512_mask", IX86_BUILTIN_PSHUFD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_INT_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_ashlv16si3_mask, "__builtin_ia32_pslld512_mask", IX86_BUILTIN_PSLLD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V4SI_V16SI_UHI_COUNT) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_ashlv16si3_mask, "__builtin_ia32_pslldi512_mask", IX86_BUILTIN_PSLLDI512, UNKNOWN, (int) V16SI_FTYPE_V16SI_INT_V16SI_UHI_COUNT) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_ashlv8di3_mask, "__builtin_ia32_psllq512_mask", IX86_BUILTIN_PSLLQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V2DI_V8DI_UQI_COUNT) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_ashlv8di3_mask, "__builtin_ia32_psllqi512_mask", IX86_BUILTIN_PSLLQI512, UNKNOWN, (int) V8DI_FTYPE_V8DI_INT_V8DI_UQI_COUNT) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_ashlvv16si_mask, "__builtin_ia32_psllv16si_mask", IX86_BUILTIN_PSLLVV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_ashlvv8di_mask, "__builtin_ia32_psllv8di_mask", IX86_BUILTIN_PSLLVV8DI, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_ashrv16si3_mask, "__builtin_ia32_psrad512_mask", IX86_BUILTIN_PSRAD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V4SI_V16SI_UHI_COUNT) -BDESC 
(OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_ashrv16si3_mask, "__builtin_ia32_psradi512_mask", IX86_BUILTIN_PSRADI512, UNKNOWN, (int) V16SI_FTYPE_V16SI_INT_V16SI_UHI_COUNT) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_ashrv8di3_mask, "__builtin_ia32_psraq512_mask", IX86_BUILTIN_PSRAQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V2DI_V8DI_UQI_COUNT) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_ashrv8di3_mask, "__builtin_ia32_psraqi512_mask", IX86_BUILTIN_PSRAQI512, UNKNOWN, (int) V8DI_FTYPE_V8DI_INT_V8DI_UQI_COUNT) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_ashrvv16si_mask, "__builtin_ia32_psrav16si_mask", IX86_BUILTIN_PSRAVV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_ashrvv8di_mask, "__builtin_ia32_psrav8di_mask", IX86_BUILTIN_PSRAVV8DI, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_lshrv16si3_mask, "__builtin_ia32_psrld512_mask", IX86_BUILTIN_PSRLD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V4SI_V16SI_UHI_COUNT) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_lshrv16si3_mask, "__builtin_ia32_psrldi512_mask", IX86_BUILTIN_PSRLDI512, UNKNOWN, (int) V16SI_FTYPE_V16SI_INT_V16SI_UHI_COUNT) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_lshrv8di3_mask, "__builtin_ia32_psrlq512_mask", IX86_BUILTIN_PSRLQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V2DI_V8DI_UQI_COUNT) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_lshrv8di3_mask, "__builtin_ia32_psrlqi512_mask", IX86_BUILTIN_PSRLQI512, UNKNOWN, (int) V8DI_FTYPE_V8DI_INT_V8DI_UQI_COUNT) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_lshrvv16si_mask, "__builtin_ia32_psrlv16si_mask", IX86_BUILTIN_PSRLVV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_lshrvv8di_mask, "__builtin_ia32_psrlv8di_mask", IX86_BUILTIN_PSRLVV8DI, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_subv16si3_mask, "__builtin_ia32_psubd512_mask", IX86_BUILTIN_PSUBD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_subv8di3_mask, "__builtin_ia32_psubq512_mask", IX86_BUILTIN_PSUBQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_testmv16si3_mask, "__builtin_ia32_ptestmd512", IX86_BUILTIN_PTESTMD512, UNKNOWN, (int) UHI_FTYPE_V16SI_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_testmv8di3_mask, "__builtin_ia32_ptestmq512", IX86_BUILTIN_PTESTMQ512, UNKNOWN, (int) UQI_FTYPE_V8DI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_testnmv16si3_mask, "__builtin_ia32_ptestnmd512", IX86_BUILTIN_PTESTNMD512, UNKNOWN, (int) UHI_FTYPE_V16SI_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_testnmv8di3_mask, "__builtin_ia32_ptestnmq512", IX86_BUILTIN_PTESTNMQ512, UNKNOWN, (int) UQI_FTYPE_V8DI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_interleave_highv16si_mask, "__builtin_ia32_punpckhdq512_mask", IX86_BUILTIN_PUNPCKHDQ512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_interleave_highv8di_mask, "__builtin_ia32_punpckhqdq512_mask", IX86_BUILTIN_PUNPCKHQDQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_interleave_lowv16si_mask, "__builtin_ia32_punpckldq512_mask", IX86_BUILTIN_PUNPCKLDQ512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_interleave_lowv8di_mask, 
"__builtin_ia32_punpcklqdq512_mask", IX86_BUILTIN_PUNPCKLQDQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_xorv16si3_mask, "__builtin_ia32_pxord512_mask", IX86_BUILTIN_PXORD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_xorv8di3_mask, "__builtin_ia32_pxorq512_mask", IX86_BUILTIN_PXORQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_rcp14v8df_mask, "__builtin_ia32_rcp14pd512_mask", IX86_BUILTIN_RCP14PD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_rcp14v16sf_mask, "__builtin_ia32_rcp14ps512_mask", IX86_BUILTIN_RCP14PS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_expandv8df_mask, "__builtin_ia32_expanddf512_mask", IX86_BUILTIN_EXPANDPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_expandv8df_maskz, "__builtin_ia32_expanddf512_maskz", IX86_BUILTIN_EXPANDPD512Z, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_expandv16sf_mask, "__builtin_ia32_expandsf512_mask", IX86_BUILTIN_EXPANDPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_expandv16sf_maskz, "__builtin_ia32_expandsf512_maskz", IX86_BUILTIN_EXPANDPS512Z, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vextractf32x4_mask, "__builtin_ia32_extractf32x4_mask", IX86_BUILTIN_EXTRACTF32X4, UNKNOWN, (int) V4SF_FTYPE_V16SF_INT_V4SF_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vextractf64x4_mask, "__builtin_ia32_extractf64x4_mask", IX86_BUILTIN_EXTRACTF64X4, UNKNOWN, (int) V4DF_FTYPE_V8DF_INT_V4DF_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vextracti32x4_mask, "__builtin_ia32_extracti32x4_mask", IX86_BUILTIN_EXTRACTI32X4, UNKNOWN, (int) V4SI_FTYPE_V16SI_INT_V4SI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vextracti64x4_mask, "__builtin_ia32_extracti64x4_mask", IX86_BUILTIN_EXTRACTI64X4, UNKNOWN, (int) V4DI_FTYPE_V8DI_INT_V4DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vinsertf32x4_mask, "__builtin_ia32_insertf32x4_mask", IX86_BUILTIN_INSERTF32X4, UNKNOWN, (int) V16SF_FTYPE_V16SF_V4SF_INT_V16SF_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vinsertf64x4_mask, "__builtin_ia32_insertf64x4_mask", IX86_BUILTIN_INSERTF64X4, UNKNOWN, (int) V8DF_FTYPE_V8DF_V4DF_INT_V8DF_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vinserti32x4_mask, "__builtin_ia32_inserti32x4_mask", IX86_BUILTIN_INSERTI32X4, UNKNOWN, (int) V16SI_FTYPE_V16SI_V4SI_INT_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vinserti64x4_mask, "__builtin_ia32_inserti64x4_mask", IX86_BUILTIN_INSERTI64X4, UNKNOWN, (int) V8DI_FTYPE_V8DI_V4DI_INT_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_loadv8df_mask, "__builtin_ia32_movapd512_mask", IX86_BUILTIN_MOVAPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_loadv16sf_mask, "__builtin_ia32_movaps512_mask", IX86_BUILTIN_MOVAPS512, UNKNOWN, (int) 
V16SF_FTYPE_V16SF_V16SF_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_movddup512_mask, "__builtin_ia32_movddup512_mask", IX86_BUILTIN_MOVDDUP512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_loadv16si_mask, "__builtin_ia32_movdqa32_512_mask", IX86_BUILTIN_MOVDQA32_512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_loadv8di_mask, "__builtin_ia32_movdqa64_512_mask", IX86_BUILTIN_MOVDQA64_512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_movshdup512_mask, "__builtin_ia32_movshdup512_mask", IX86_BUILTIN_MOVSHDUP512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_movsldup512_mask, "__builtin_ia32_movsldup512_mask", IX86_BUILTIN_MOVSLDUP512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_absv16si2_mask, "__builtin_ia32_pabsd512_mask", IX86_BUILTIN_PABSD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_absv8di2_mask, "__builtin_ia32_pabsq512_mask", IX86_BUILTIN_PABSQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_addv16si3_mask, "__builtin_ia32_paddd512_mask", IX86_BUILTIN_PADDD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_addv8di3_mask, "__builtin_ia32_paddq512_mask", IX86_BUILTIN_PADDQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_andv16si3_mask, "__builtin_ia32_pandd512_mask", IX86_BUILTIN_PANDD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_andnotv16si3_mask, "__builtin_ia32_pandnd512_mask", IX86_BUILTIN_PANDND512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_andnotv8di3_mask, "__builtin_ia32_pandnq512_mask", IX86_BUILTIN_PANDNQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_andv8di3_mask, "__builtin_ia32_pandq512_mask", IX86_BUILTIN_PANDQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vec_dupv16si_mask, "__builtin_ia32_pbroadcastd512", IX86_BUILTIN_PBROADCASTD512, UNKNOWN, (int) V16SI_FTYPE_V4SI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vec_dup_gprv16si_mask, "__builtin_ia32_pbroadcastd512_gpr_mask", IX86_BUILTIN_PBROADCASTD512_GPR, UNKNOWN, (int) V16SI_FTYPE_SI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512CD, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512cd_maskb_vec_dupv8di, "__builtin_ia32_broadcastmb512", IX86_BUILTIN_PBROADCASTMB512, UNKNOWN, (int) V8DI_FTYPE_UQI) +BDESC (OPTION_MASK_ISA_AVX512CD, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512cd_maskw_vec_dupv16si, "__builtin_ia32_broadcastmw512", IX86_BUILTIN_PBROADCASTMW512, UNKNOWN, (int) V16SI_FTYPE_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vec_dupv8di_mask, "__builtin_ia32_pbroadcastq512", IX86_BUILTIN_PBROADCASTQ512, UNKNOWN, (int) V8DI_FTYPE_V2DI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, 
OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vec_dup_gprv8di_mask, "__builtin_ia32_pbroadcastq512_gpr_mask", IX86_BUILTIN_PBROADCASTQ512_GPR, UNKNOWN, (int) V8DI_FTYPE_DI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_eqv16si3_mask, "__builtin_ia32_pcmpeqd512_mask", IX86_BUILTIN_PCMPEQD512_MASK, UNKNOWN, (int) UHI_FTYPE_V16SI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_eqv8di3_mask, "__builtin_ia32_pcmpeqq512_mask", IX86_BUILTIN_PCMPEQQ512_MASK, UNKNOWN, (int) UQI_FTYPE_V8DI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_gtv16si3_mask, "__builtin_ia32_pcmpgtd512_mask", IX86_BUILTIN_PCMPGTD512_MASK, UNKNOWN, (int) UHI_FTYPE_V16SI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_gtv8di3_mask, "__builtin_ia32_pcmpgtq512_mask", IX86_BUILTIN_PCMPGTQ512_MASK, UNKNOWN, (int) UQI_FTYPE_V8DI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_compressv16si_mask, "__builtin_ia32_compresssi512_mask", IX86_BUILTIN_PCOMPRESSD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_compressv8di_mask, "__builtin_ia32_compressdi512_mask", IX86_BUILTIN_PCOMPRESSQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_expandv16si_mask, "__builtin_ia32_expandsi512_mask", IX86_BUILTIN_PEXPANDD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_expandv16si_maskz, "__builtin_ia32_expandsi512_maskz", IX86_BUILTIN_PEXPANDD512Z, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_expandv8di_mask, "__builtin_ia32_expanddi512_mask", IX86_BUILTIN_PEXPANDQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_expandv8di_maskz, "__builtin_ia32_expanddi512_maskz", IX86_BUILTIN_PEXPANDQ512Z, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_smaxv16si3_mask, "__builtin_ia32_pmaxsd512_mask", IX86_BUILTIN_PMAXSD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_smaxv8di3_mask, "__builtin_ia32_pmaxsq512_mask", IX86_BUILTIN_PMAXSQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_umaxv16si3_mask, "__builtin_ia32_pmaxud512_mask", IX86_BUILTIN_PMAXUD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_umaxv8di3_mask, "__builtin_ia32_pmaxuq512_mask", IX86_BUILTIN_PMAXUQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_sminv16si3_mask, "__builtin_ia32_pminsd512_mask", IX86_BUILTIN_PMINSD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_sminv8di3_mask, "__builtin_ia32_pminsq512_mask", IX86_BUILTIN_PMINSQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_uminv16si3_mask, "__builtin_ia32_pminud512_mask", IX86_BUILTIN_PMINUD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, 
CODE_FOR_uminv8di3_mask, "__builtin_ia32_pminuq512_mask", IX86_BUILTIN_PMINUQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_truncatev16siv16qi2_mask, "__builtin_ia32_pmovdb512_mask", IX86_BUILTIN_PMOVDB512, UNKNOWN, (int) V16QI_FTYPE_V16SI_V16QI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_truncatev16siv16hi2_mask, "__builtin_ia32_pmovdw512_mask", IX86_BUILTIN_PMOVDW512, UNKNOWN, (int) V16HI_FTYPE_V16SI_V16HI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_truncatev8div16qi2_mask, "__builtin_ia32_pmovqb512_mask", IX86_BUILTIN_PMOVQB512, UNKNOWN, (int) V16QI_FTYPE_V8DI_V16QI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_truncatev8div8si2_mask, "__builtin_ia32_pmovqd512_mask", IX86_BUILTIN_PMOVQD512, UNKNOWN, (int) V8SI_FTYPE_V8DI_V8SI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_truncatev8div8hi2_mask, "__builtin_ia32_pmovqw512_mask", IX86_BUILTIN_PMOVQW512, UNKNOWN, (int) V8HI_FTYPE_V8DI_V8HI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ss_truncatev16siv16qi2_mask, "__builtin_ia32_pmovsdb512_mask", IX86_BUILTIN_PMOVSDB512, UNKNOWN, (int) V16QI_FTYPE_V16SI_V16QI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ss_truncatev16siv16hi2_mask, "__builtin_ia32_pmovsdw512_mask", IX86_BUILTIN_PMOVSDW512, UNKNOWN, (int) V16HI_FTYPE_V16SI_V16HI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ss_truncatev8div16qi2_mask, "__builtin_ia32_pmovsqb512_mask", IX86_BUILTIN_PMOVSQB512, UNKNOWN, (int) V16QI_FTYPE_V8DI_V16QI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ss_truncatev8div8si2_mask, "__builtin_ia32_pmovsqd512_mask", IX86_BUILTIN_PMOVSQD512, UNKNOWN, (int) V8SI_FTYPE_V8DI_V8SI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ss_truncatev8div8hi2_mask, "__builtin_ia32_pmovsqw512_mask", IX86_BUILTIN_PMOVSQW512, UNKNOWN, (int) V8HI_FTYPE_V8DI_V8HI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_sign_extendv16qiv16si2_mask, "__builtin_ia32_pmovsxbd512_mask", IX86_BUILTIN_PMOVSXBD512, UNKNOWN, (int) V16SI_FTYPE_V16QI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_sign_extendv8qiv8di2_mask, "__builtin_ia32_pmovsxbq512_mask", IX86_BUILTIN_PMOVSXBQ512, UNKNOWN, (int) V8DI_FTYPE_V16QI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_sign_extendv8siv8di2_mask, "__builtin_ia32_pmovsxdq512_mask", IX86_BUILTIN_PMOVSXDQ512, UNKNOWN, (int) V8DI_FTYPE_V8SI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_sign_extendv16hiv16si2_mask, "__builtin_ia32_pmovsxwd512_mask", IX86_BUILTIN_PMOVSXWD512, UNKNOWN, (int) V16SI_FTYPE_V16HI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_sign_extendv8hiv8di2_mask, "__builtin_ia32_pmovsxwq512_mask", IX86_BUILTIN_PMOVSXWQ512, UNKNOWN, (int) V8DI_FTYPE_V8HI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_us_truncatev16siv16qi2_mask, "__builtin_ia32_pmovusdb512_mask", IX86_BUILTIN_PMOVUSDB512, UNKNOWN, (int) V16QI_FTYPE_V16SI_V16QI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_us_truncatev16siv16hi2_mask, "__builtin_ia32_pmovusdw512_mask", 
IX86_BUILTIN_PMOVUSDW512, UNKNOWN, (int) V16HI_FTYPE_V16SI_V16HI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_us_truncatev8div16qi2_mask, "__builtin_ia32_pmovusqb512_mask", IX86_BUILTIN_PMOVUSQB512, UNKNOWN, (int) V16QI_FTYPE_V8DI_V16QI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_us_truncatev8div8si2_mask, "__builtin_ia32_pmovusqd512_mask", IX86_BUILTIN_PMOVUSQD512, UNKNOWN, (int) V8SI_FTYPE_V8DI_V8SI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_us_truncatev8div8hi2_mask, "__builtin_ia32_pmovusqw512_mask", IX86_BUILTIN_PMOVUSQW512, UNKNOWN, (int) V8HI_FTYPE_V8DI_V8HI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_zero_extendv16qiv16si2_mask, "__builtin_ia32_pmovzxbd512_mask", IX86_BUILTIN_PMOVZXBD512, UNKNOWN, (int) V16SI_FTYPE_V16QI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_zero_extendv8qiv8di2_mask, "__builtin_ia32_pmovzxbq512_mask", IX86_BUILTIN_PMOVZXBQ512, UNKNOWN, (int) V8DI_FTYPE_V16QI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_zero_extendv8siv8di2_mask, "__builtin_ia32_pmovzxdq512_mask", IX86_BUILTIN_PMOVZXDQ512, UNKNOWN, (int) V8DI_FTYPE_V8SI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_zero_extendv16hiv16si2_mask, "__builtin_ia32_pmovzxwd512_mask", IX86_BUILTIN_PMOVZXWD512, UNKNOWN, (int) V16SI_FTYPE_V16HI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_zero_extendv8hiv8di2_mask, "__builtin_ia32_pmovzxwq512_mask", IX86_BUILTIN_PMOVZXWQ512, UNKNOWN, (int) V8DI_FTYPE_V8HI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vec_widen_smult_even_v16si_mask, "__builtin_ia32_pmuldq512_mask", IX86_BUILTIN_PMULDQ512, UNKNOWN, (int) V8DI_FTYPE_V16SI_V16SI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_mulv16si3_mask, "__builtin_ia32_pmulld512_mask" , IX86_BUILTIN_PMULLD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vec_widen_umult_even_v16si_mask, "__builtin_ia32_pmuludq512_mask", IX86_BUILTIN_PMULUDQ512, UNKNOWN, (int) V8DI_FTYPE_V16SI_V16SI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_iorv16si3_mask, "__builtin_ia32_pord512_mask", IX86_BUILTIN_PORD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_iorv8di3_mask, "__builtin_ia32_porq512_mask", IX86_BUILTIN_PORQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_rolv16si_mask, "__builtin_ia32_prold512_mask", IX86_BUILTIN_PROLD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_INT_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_rolv8di_mask, "__builtin_ia32_prolq512_mask", IX86_BUILTIN_PROLQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_INT_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_rolvv16si_mask, "__builtin_ia32_prolvd512_mask", IX86_BUILTIN_PROLVD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_rolvv8di_mask, "__builtin_ia32_prolvq512_mask", IX86_BUILTIN_PROLVQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, 
CODE_FOR_avx512f_rorv16si_mask, "__builtin_ia32_prord512_mask", IX86_BUILTIN_PRORD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_INT_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_rorv8di_mask, "__builtin_ia32_prorq512_mask", IX86_BUILTIN_PRORQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_INT_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_rorvv16si_mask, "__builtin_ia32_prorvd512_mask", IX86_BUILTIN_PRORVD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_rorvv8di_mask, "__builtin_ia32_prorvq512_mask", IX86_BUILTIN_PRORVQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_pshufdv3_mask, "__builtin_ia32_pshufd512_mask", IX86_BUILTIN_PSHUFD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_INT_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_ashlv16si3_mask, "__builtin_ia32_pslld512_mask", IX86_BUILTIN_PSLLD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V4SI_V16SI_UHI_COUNT) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_ashlv16si3_mask, "__builtin_ia32_pslldi512_mask", IX86_BUILTIN_PSLLDI512, UNKNOWN, (int) V16SI_FTYPE_V16SI_INT_V16SI_UHI_COUNT) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_ashlv8di3_mask, "__builtin_ia32_psllq512_mask", IX86_BUILTIN_PSLLQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V2DI_V8DI_UQI_COUNT) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_ashlv8di3_mask, "__builtin_ia32_psllqi512_mask", IX86_BUILTIN_PSLLQI512, UNKNOWN, (int) V8DI_FTYPE_V8DI_INT_V8DI_UQI_COUNT) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ashlvv16si_mask, "__builtin_ia32_psllv16si_mask", IX86_BUILTIN_PSLLVV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ashlvv8di_mask, "__builtin_ia32_psllv8di_mask", IX86_BUILTIN_PSLLVV8DI, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_ashrv16si3_mask, "__builtin_ia32_psrad512_mask", IX86_BUILTIN_PSRAD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V4SI_V16SI_UHI_COUNT) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_ashrv16si3_mask, "__builtin_ia32_psradi512_mask", IX86_BUILTIN_PSRADI512, UNKNOWN, (int) V16SI_FTYPE_V16SI_INT_V16SI_UHI_COUNT) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_ashrv8di3_mask, "__builtin_ia32_psraq512_mask", IX86_BUILTIN_PSRAQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V2DI_V8DI_UQI_COUNT) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_ashrv8di3_mask, "__builtin_ia32_psraqi512_mask", IX86_BUILTIN_PSRAQI512, UNKNOWN, (int) V8DI_FTYPE_V8DI_INT_V8DI_UQI_COUNT) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ashrvv16si_mask, "__builtin_ia32_psrav16si_mask", IX86_BUILTIN_PSRAVV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ashrvv8di_mask, "__builtin_ia32_psrav8di_mask", IX86_BUILTIN_PSRAVV8DI, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_lshrv16si3_mask, "__builtin_ia32_psrld512_mask", IX86_BUILTIN_PSRLD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V4SI_V16SI_UHI_COUNT) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_lshrv16si3_mask, 
"__builtin_ia32_psrldi512_mask", IX86_BUILTIN_PSRLDI512, UNKNOWN, (int) V16SI_FTYPE_V16SI_INT_V16SI_UHI_COUNT) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_lshrv8di3_mask, "__builtin_ia32_psrlq512_mask", IX86_BUILTIN_PSRLQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V2DI_V8DI_UQI_COUNT) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_lshrv8di3_mask, "__builtin_ia32_psrlqi512_mask", IX86_BUILTIN_PSRLQI512, UNKNOWN, (int) V8DI_FTYPE_V8DI_INT_V8DI_UQI_COUNT) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_lshrvv16si_mask, "__builtin_ia32_psrlv16si_mask", IX86_BUILTIN_PSRLVV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_lshrvv8di_mask, "__builtin_ia32_psrlv8di_mask", IX86_BUILTIN_PSRLVV8DI, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_subv16si3_mask, "__builtin_ia32_psubd512_mask", IX86_BUILTIN_PSUBD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_subv8di3_mask, "__builtin_ia32_psubq512_mask", IX86_BUILTIN_PSUBQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_testmv16si3_mask, "__builtin_ia32_ptestmd512", IX86_BUILTIN_PTESTMD512, UNKNOWN, (int) UHI_FTYPE_V16SI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_testmv8di3_mask, "__builtin_ia32_ptestmq512", IX86_BUILTIN_PTESTMQ512, UNKNOWN, (int) UQI_FTYPE_V8DI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_testnmv16si3_mask, "__builtin_ia32_ptestnmd512", IX86_BUILTIN_PTESTNMD512, UNKNOWN, (int) UHI_FTYPE_V16SI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_testnmv8di3_mask, "__builtin_ia32_ptestnmq512", IX86_BUILTIN_PTESTNMQ512, UNKNOWN, (int) UQI_FTYPE_V8DI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_interleave_highv16si_mask, "__builtin_ia32_punpckhdq512_mask", IX86_BUILTIN_PUNPCKHDQ512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_interleave_highv8di_mask, "__builtin_ia32_punpckhqdq512_mask", IX86_BUILTIN_PUNPCKHQDQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_interleave_lowv16si_mask, "__builtin_ia32_punpckldq512_mask", IX86_BUILTIN_PUNPCKLDQ512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_interleave_lowv8di_mask, "__builtin_ia32_punpcklqdq512_mask", IX86_BUILTIN_PUNPCKLQDQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_xorv16si3_mask, "__builtin_ia32_pxord512_mask", IX86_BUILTIN_PXORD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_xorv8di3_mask, "__builtin_ia32_pxorq512_mask", IX86_BUILTIN_PXORQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_rcp14v8df_mask, "__builtin_ia32_rcp14pd512_mask", IX86_BUILTIN_RCP14PD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_rcp14v16sf_mask, "__builtin_ia32_rcp14ps512_mask", 
IX86_BUILTIN_RCP14PS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI) BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_srcp14v2df, "__builtin_ia32_rcp14sd", IX86_BUILTIN_RCP14SD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF) BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_srcp14v2df_mask, "__builtin_ia32_rcp14sd_mask", IX86_BUILTIN_RCP14SDMASK, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI) BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_srcp14v4sf, "__builtin_ia32_rcp14ss", IX86_BUILTIN_RCP14SS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF) BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_srcp14v4sf_mask, "__builtin_ia32_rcp14ss_mask", IX86_BUILTIN_RCP14SSMASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_rsqrt14v8df_mask, "__builtin_ia32_rsqrt14pd512_mask", IX86_BUILTIN_RSQRT14PD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_rsqrt14v16sf_mask, "__builtin_ia32_rsqrt14ps512_mask", IX86_BUILTIN_RSQRT14PS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_rsqrt14v8df_mask, "__builtin_ia32_rsqrt14pd512_mask", IX86_BUILTIN_RSQRT14PD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_rsqrt14v16sf_mask, "__builtin_ia32_rsqrt14ps512_mask", IX86_BUILTIN_RSQRT14PS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI) BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_rsqrt14v2df, "__builtin_ia32_rsqrt14sd", IX86_BUILTIN_RSQRT14SD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF) BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_rsqrt14_v2df_mask, "__builtin_ia32_rsqrt14sd_mask", IX86_BUILTIN_RSQRT14SDMASK, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI) BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_rsqrt14v4sf, "__builtin_ia32_rsqrt14ss", IX86_BUILTIN_RSQRT14SS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF) BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_rsqrt14_v4sf_mask, "__builtin_ia32_rsqrt14ss_mask", IX86_BUILTIN_RSQRT14SSMASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_shufpd512_mask, "__builtin_ia32_shufpd512_mask", IX86_BUILTIN_SHUFPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_INT_V8DF_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_shufps512_mask, "__builtin_ia32_shufps512_mask", IX86_BUILTIN_SHUFPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_INT_V16SF_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_shuf_f32x4_mask, "__builtin_ia32_shuf_f32x4_mask", IX86_BUILTIN_SHUF_F32x4, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_INT_V16SF_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_shuf_f64x2_mask, "__builtin_ia32_shuf_f64x2_mask", IX86_BUILTIN_SHUF_F64x2, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_INT_V8DF_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_shuf_i32x4_mask, "__builtin_ia32_shuf_i32x4_mask", IX86_BUILTIN_SHUF_I32x4, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_INT_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_shuf_i64x2_mask, "__builtin_ia32_shuf_i64x2_mask", IX86_BUILTIN_SHUF_I64x2, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_ucmpv16si3_mask, "__builtin_ia32_ucmpd512_mask", IX86_BUILTIN_UCMPD512, UNKNOWN, (int) UHI_FTYPE_V16SI_V16SI_INT_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_ucmpv8di3_mask, "__builtin_ia32_ucmpq512_mask", IX86_BUILTIN_UCMPQ512, UNKNOWN, (int) UQI_FTYPE_V8DI_V8DI_INT_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_unpckhpd512_mask, 
"__builtin_ia32_unpckhpd512_mask", IX86_BUILTIN_UNPCKHPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_unpckhps512_mask, "__builtin_ia32_unpckhps512_mask", IX86_BUILTIN_UNPCKHPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_unpcklpd512_mask, "__builtin_ia32_unpcklpd512_mask", IX86_BUILTIN_UNPCKLPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_unpcklps512_mask, "__builtin_ia32_unpcklps512_mask", IX86_BUILTIN_UNPCKLPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_UHI) -BDESC (OPTION_MASK_ISA_AVX512CD, 0, CODE_FOR_clzv16si2_mask, "__builtin_ia32_vplzcntd_512_mask", IX86_BUILTIN_VPCLZCNTD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512CD, 0, CODE_FOR_clzv8di2_mask, "__builtin_ia32_vplzcntq_512_mask", IX86_BUILTIN_VPCLZCNTQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512CD, 0, CODE_FOR_conflictv16si_mask, "__builtin_ia32_vpconflictsi_512_mask", IX86_BUILTIN_VPCONFLICTD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512CD, 0, CODE_FOR_conflictv8di_mask, "__builtin_ia32_vpconflictdi_512_mask", IX86_BUILTIN_VPCONFLICTQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_permv8df_mask, "__builtin_ia32_permdf512_mask", IX86_BUILTIN_VPERMDF512, UNKNOWN, (int) V8DF_FTYPE_V8DF_INT_V8DF_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_permv8di_mask, "__builtin_ia32_permdi512_mask", IX86_BUILTIN_VPERMDI512, UNKNOWN, (int) V8DI_FTYPE_V8DI_INT_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vpermi2varv16si3_mask, "__builtin_ia32_vpermi2vard512_mask", IX86_BUILTIN_VPERMI2VARD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vpermi2varv8df3_mask, "__builtin_ia32_vpermi2varpd512_mask", IX86_BUILTIN_VPERMI2VARPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DI_V8DF_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vpermi2varv16sf3_mask, "__builtin_ia32_vpermi2varps512_mask", IX86_BUILTIN_VPERMI2VARPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SI_V16SF_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vpermi2varv8di3_mask, "__builtin_ia32_vpermi2varq512_mask", IX86_BUILTIN_VPERMI2VARQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vpermilv8df_mask, "__builtin_ia32_vpermilpd512_mask", IX86_BUILTIN_VPERMILPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_INT_V8DF_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vpermilv16sf_mask, "__builtin_ia32_vpermilps512_mask", IX86_BUILTIN_VPERMILPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_INT_V16SF_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vpermilvarv8df3_mask, "__builtin_ia32_vpermilvarpd512_mask", IX86_BUILTIN_VPERMILVARPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DI_V8DF_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vpermilvarv16sf3_mask, "__builtin_ia32_vpermilvarps512_mask", IX86_BUILTIN_VPERMILVARPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SI_V16SF_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vpermt2varv16si3_mask, "__builtin_ia32_vpermt2vard512_mask", IX86_BUILTIN_VPERMT2VARD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vpermt2varv16si3_maskz, "__builtin_ia32_vpermt2vard512_maskz", IX86_BUILTIN_VPERMT2VARD512_MASKZ, UNKNOWN, (int) 
V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vpermt2varv8df3_mask, "__builtin_ia32_vpermt2varpd512_mask", IX86_BUILTIN_VPERMT2VARPD512, UNKNOWN, (int) V8DF_FTYPE_V8DI_V8DF_V8DF_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vpermt2varv8df3_maskz, "__builtin_ia32_vpermt2varpd512_maskz", IX86_BUILTIN_VPERMT2VARPD512_MASKZ, UNKNOWN, (int) V8DF_FTYPE_V8DI_V8DF_V8DF_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vpermt2varv16sf3_mask, "__builtin_ia32_vpermt2varps512_mask", IX86_BUILTIN_VPERMT2VARPS512, UNKNOWN, (int) V16SF_FTYPE_V16SI_V16SF_V16SF_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vpermt2varv16sf3_maskz, "__builtin_ia32_vpermt2varps512_maskz", IX86_BUILTIN_VPERMT2VARPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SI_V16SF_V16SF_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vpermt2varv8di3_mask, "__builtin_ia32_vpermt2varq512_mask", IX86_BUILTIN_VPERMT2VARQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vpermt2varv8di3_maskz, "__builtin_ia32_vpermt2varq512_maskz", IX86_BUILTIN_VPERMT2VARQ512_MASKZ, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_permvarv8df_mask, "__builtin_ia32_permvardf512_mask", IX86_BUILTIN_VPERMVARDF512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DI_V8DF_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_permvarv8di_mask, "__builtin_ia32_permvardi512_mask", IX86_BUILTIN_VPERMVARDI512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_permvarv16sf_mask, "__builtin_ia32_permvarsf512_mask", IX86_BUILTIN_VPERMVARSF512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SI_V16SF_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_permvarv16si_mask, "__builtin_ia32_permvarsi512_mask", IX86_BUILTIN_VPERMVARSI512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vternlogv16si_mask, "__builtin_ia32_pternlogd512_mask", IX86_BUILTIN_VTERNLOGD512_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_INT_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vternlogv16si_maskz, "__builtin_ia32_pternlogd512_maskz", IX86_BUILTIN_VTERNLOGD512_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_INT_UHI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vternlogv8di_mask, "__builtin_ia32_pternlogq512_mask", IX86_BUILTIN_VTERNLOGQ512_MASK, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_INT_UQI) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vternlogv8di_maskz, "__builtin_ia32_pternlogq512_maskz", IX86_BUILTIN_VTERNLOGQ512_MASKZ, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_INT_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_shufpd512_mask, "__builtin_ia32_shufpd512_mask", IX86_BUILTIN_SHUFPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_INT_V8DF_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_shufps512_mask, "__builtin_ia32_shufps512_mask", IX86_BUILTIN_SHUFPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_INT_V16SF_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_shuf_f32x4_mask, "__builtin_ia32_shuf_f32x4_mask", IX86_BUILTIN_SHUF_F32x4, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_INT_V16SF_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_shuf_f64x2_mask, "__builtin_ia32_shuf_f64x2_mask", IX86_BUILTIN_SHUF_F64x2, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_INT_V8DF_UQI) +BDESC 
(OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_shuf_i32x4_mask, "__builtin_ia32_shuf_i32x4_mask", IX86_BUILTIN_SHUF_I32x4, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_INT_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_shuf_i64x2_mask, "__builtin_ia32_shuf_i64x2_mask", IX86_BUILTIN_SHUF_I64x2, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ucmpv16si3_mask, "__builtin_ia32_ucmpd512_mask", IX86_BUILTIN_UCMPD512, UNKNOWN, (int) UHI_FTYPE_V16SI_V16SI_INT_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ucmpv8di3_mask, "__builtin_ia32_ucmpq512_mask", IX86_BUILTIN_UCMPQ512, UNKNOWN, (int) UQI_FTYPE_V8DI_V8DI_INT_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_unpckhpd512_mask, "__builtin_ia32_unpckhpd512_mask", IX86_BUILTIN_UNPCKHPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_unpckhps512_mask, "__builtin_ia32_unpckhps512_mask", IX86_BUILTIN_UNPCKHPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_unpcklpd512_mask, "__builtin_ia32_unpcklpd512_mask", IX86_BUILTIN_UNPCKLPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_unpcklps512_mask, "__builtin_ia32_unpcklps512_mask", IX86_BUILTIN_UNPCKLPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_UHI) +BDESC (OPTION_MASK_ISA_AVX512CD, OPTION_MASK_ISA2_EVEX512, CODE_FOR_clzv16si2_mask, "__builtin_ia32_vplzcntd_512_mask", IX86_BUILTIN_VPCLZCNTD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512CD, OPTION_MASK_ISA2_EVEX512, CODE_FOR_clzv8di2_mask, "__builtin_ia32_vplzcntq_512_mask", IX86_BUILTIN_VPCLZCNTQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512CD, OPTION_MASK_ISA2_EVEX512, CODE_FOR_conflictv16si_mask, "__builtin_ia32_vpconflictsi_512_mask", IX86_BUILTIN_VPCONFLICTD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512CD, OPTION_MASK_ISA2_EVEX512, CODE_FOR_conflictv8di_mask, "__builtin_ia32_vpconflictdi_512_mask", IX86_BUILTIN_VPCONFLICTQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_permv8df_mask, "__builtin_ia32_permdf512_mask", IX86_BUILTIN_VPERMDF512, UNKNOWN, (int) V8DF_FTYPE_V8DF_INT_V8DF_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_permv8di_mask, "__builtin_ia32_permdi512_mask", IX86_BUILTIN_VPERMDI512, UNKNOWN, (int) V8DI_FTYPE_V8DI_INT_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vpermi2varv16si3_mask, "__builtin_ia32_vpermi2vard512_mask", IX86_BUILTIN_VPERMI2VARD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vpermi2varv8df3_mask, "__builtin_ia32_vpermi2varpd512_mask", IX86_BUILTIN_VPERMI2VARPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DI_V8DF_UQI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vpermi2varv16sf3_mask, "__builtin_ia32_vpermi2varps512_mask", IX86_BUILTIN_VPERMI2VARPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SI_V16SF_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vpermi2varv8di3_mask, "__builtin_ia32_vpermi2varq512_mask", 
IX86_BUILTIN_VPERMI2VARQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vpermilv8df_mask, "__builtin_ia32_vpermilpd512_mask", IX86_BUILTIN_VPERMILPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_INT_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vpermilv16sf_mask, "__builtin_ia32_vpermilps512_mask", IX86_BUILTIN_VPERMILPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_INT_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vpermilvarv8df3_mask, "__builtin_ia32_vpermilvarpd512_mask", IX86_BUILTIN_VPERMILVARPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DI_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vpermilvarv16sf3_mask, "__builtin_ia32_vpermilvarps512_mask", IX86_BUILTIN_VPERMILVARPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SI_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vpermt2varv16si3_mask, "__builtin_ia32_vpermt2vard512_mask", IX86_BUILTIN_VPERMT2VARD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vpermt2varv16si3_maskz, "__builtin_ia32_vpermt2vard512_maskz", IX86_BUILTIN_VPERMT2VARD512_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vpermt2varv8df3_mask, "__builtin_ia32_vpermt2varpd512_mask", IX86_BUILTIN_VPERMT2VARPD512, UNKNOWN, (int) V8DF_FTYPE_V8DI_V8DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vpermt2varv8df3_maskz, "__builtin_ia32_vpermt2varpd512_maskz", IX86_BUILTIN_VPERMT2VARPD512_MASKZ, UNKNOWN, (int) V8DF_FTYPE_V8DI_V8DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vpermt2varv16sf3_mask, "__builtin_ia32_vpermt2varps512_mask", IX86_BUILTIN_VPERMT2VARPS512, UNKNOWN, (int) V16SF_FTYPE_V16SI_V16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vpermt2varv16sf3_maskz, "__builtin_ia32_vpermt2varps512_maskz", IX86_BUILTIN_VPERMT2VARPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SI_V16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vpermt2varv8di3_mask, "__builtin_ia32_vpermt2varq512_mask", IX86_BUILTIN_VPERMT2VARQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vpermt2varv8di3_maskz, "__builtin_ia32_vpermt2varq512_maskz", IX86_BUILTIN_VPERMT2VARQ512_MASKZ, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_permvarv8df_mask, "__builtin_ia32_permvardf512_mask", IX86_BUILTIN_VPERMVARDF512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DI_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_permvarv8di_mask, "__builtin_ia32_permvardi512_mask", IX86_BUILTIN_VPERMVARDI512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_permvarv16sf_mask, "__builtin_ia32_permvarsf512_mask", IX86_BUILTIN_VPERMVARSF512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SI_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_permvarv16si_mask, "__builtin_ia32_permvarsi512_mask", IX86_BUILTIN_VPERMVARSI512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vternlogv16si_mask, "__builtin_ia32_pternlogd512_mask", IX86_BUILTIN_VTERNLOGD512_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_INT_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vternlogv16si_maskz, "__builtin_ia32_pternlogd512_maskz", IX86_BUILTIN_VTERNLOGD512_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_INT_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vternlogv8di_mask, "__builtin_ia32_pternlogq512_mask", IX86_BUILTIN_VTERNLOGQ512_MASK, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_INT_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vternlogv8di_maskz, "__builtin_ia32_pternlogq512_maskz", IX86_BUILTIN_VTERNLOGQ512_MASKZ, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_INT_UQI)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_movdf_mask, "__builtin_ia32_movesd_mask", IX86_BUILTIN_MOVSD_MASK, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_movsf_mask, "__builtin_ia32_movess_mask", IX86_BUILTIN_MOVSS_MASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_copysignv16sf3, "__builtin_ia32_copysignps512", IX86_BUILTIN_CPYSGNPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_copysignv8df3, "__builtin_ia32_copysignpd512", IX86_BUILTIN_CPYSGNPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_sqrtv8df2, "__builtin_ia32_sqrtpd512", IX86_BUILTIN_SQRTPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sqrtv16sf2, "__builtin_ia32_sqrtps512", IX86_BUILTIN_SQRTPS_NR512, UNKNOWN, (int) V16SF_FTYPE_V16SF)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_copysignv16sf3, "__builtin_ia32_copysignps512", IX86_BUILTIN_CPYSGNPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_copysignv8df3, "__builtin_ia32_copysignpd512", IX86_BUILTIN_CPYSGNPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_sqrtv8df2, "__builtin_ia32_sqrtpd512", IX86_BUILTIN_SQRTPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_sqrtv16sf2, "__builtin_ia32_sqrtps512", IX86_BUILTIN_SQRTPS_NR512, UNKNOWN, (int) V16SF_FTYPE_V16SF)
 BDESC (OPTION_MASK_ISA_AVX512ER, 0, CODE_FOR_avx512er_exp2v16sf, "__builtin_ia32_exp2ps", IX86_BUILTIN_EXP2PS, UNKNOWN, (int) V16SF_FTYPE_V16SF)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_rndscalev32hf, "__builtin_ia32_floorph512", IX86_BUILTIN_FLOORPH512, (enum rtx_code) ROUND_FLOOR, (int) V32HF_FTYPE_V32HF_ROUND)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_rndscalev32hf, "__builtin_ia32_ceilph512", IX86_BUILTIN_CEILPH512, (enum rtx_code) ROUND_CEIL, (int) V32HF_FTYPE_V32HF_ROUND)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_rndscalev32hf, "__builtin_ia32_truncph512", IX86_BUILTIN_TRUNCPH512, (enum rtx_code) ROUND_TRUNC, (int) V32HF_FTYPE_V32HF_ROUND)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_roundps512, "__builtin_ia32_floorps512", IX86_BUILTIN_FLOORPS512, (enum rtx_code) ROUND_FLOOR, (int) V16SF_FTYPE_V16SF_ROUND)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_roundps512, "__builtin_ia32_ceilps512", IX86_BUILTIN_CEILPS512, (enum rtx_code) ROUND_CEIL, (int) V16SF_FTYPE_V16SF_ROUND)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_roundps512, "__builtin_ia32_truncps512", IX86_BUILTIN_TRUNCPS512, (enum rtx_code) ROUND_TRUNC, (int) V16SF_FTYPE_V16SF_ROUND)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_roundpd512, "__builtin_ia32_floorpd512", IX86_BUILTIN_FLOORPD512, (enum rtx_code) ROUND_FLOOR, (int) V8DF_FTYPE_V8DF_ROUND)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_roundpd512, "__builtin_ia32_ceilpd512", IX86_BUILTIN_CEILPD512, (enum rtx_code) ROUND_CEIL, (int) V8DF_FTYPE_V8DF_ROUND)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_roundpd512, "__builtin_ia32_truncpd512", IX86_BUILTIN_TRUNCPD512, (enum rtx_code) ROUND_TRUNC, (int) V8DF_FTYPE_V8DF_ROUND)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fix_notruncv16sfv16si, "__builtin_ia32_cvtps2dq512", IX86_BUILTIN_CVTPS2DQ512, UNKNOWN, (int) V16SI_FTYPE_V16SF)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vec_pack_sfix_v8df, "__builtin_ia32_vec_pack_sfix512", IX86_BUILTIN_VEC_PACK_SFIX512, UNKNOWN, (int) V16SI_FTYPE_V8DF_V8DF)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_roundv16sf2_sfix, "__builtin_ia32_roundps_az_sfix512", IX86_BUILTIN_ROUNDPS_AZ_SFIX512, UNKNOWN, (int) V16SI_FTYPE_V16SF)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_roundps512_sfix, "__builtin_ia32_floorps_sfix512", IX86_BUILTIN_FLOORPS_SFIX512, (enum rtx_code) ROUND_FLOOR, (int) V16SI_FTYPE_V16SF_ROUND)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_roundps512_sfix, "__builtin_ia32_ceilps_sfix512", IX86_BUILTIN_CEILPS_SFIX512, (enum rtx_code) ROUND_CEIL, (int) V16SI_FTYPE_V16SF_ROUND)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_roundv8df2_vec_pack_sfix, "__builtin_ia32_roundpd_az_vec_pack_sfix512", IX86_BUILTIN_ROUNDPD_AZ_VEC_PACK_SFIX512, UNKNOWN, (int) V16SI_FTYPE_V8DF_V8DF)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_roundpd_vec_pack_sfix512, "__builtin_ia32_floorpd_vec_pack_sfix512", IX86_BUILTIN_FLOORPD_VEC_PACK_SFIX512, (enum rtx_code) ROUND_FLOOR, (int) V16SI_FTYPE_V8DF_V8DF_ROUND)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_roundpd_vec_pack_sfix512, "__builtin_ia32_ceilpd_vec_pack_sfix512", IX86_BUILTIN_CEILPD_VEC_PACK_SFIX512, (enum rtx_code) ROUND_CEIL, (int) V16SI_FTYPE_V8DF_V8DF_ROUND)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_roundps512, "__builtin_ia32_floorps512", IX86_BUILTIN_FLOORPS512, (enum rtx_code) ROUND_FLOOR, (int) V16SF_FTYPE_V16SF_ROUND)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_roundps512, "__builtin_ia32_ceilps512", IX86_BUILTIN_CEILPS512, (enum rtx_code) ROUND_CEIL, (int) V16SF_FTYPE_V16SF_ROUND)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_roundps512, "__builtin_ia32_truncps512", IX86_BUILTIN_TRUNCPS512, (enum rtx_code) ROUND_TRUNC, (int) V16SF_FTYPE_V16SF_ROUND)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_roundpd512, "__builtin_ia32_floorpd512", IX86_BUILTIN_FLOORPD512, (enum rtx_code) ROUND_FLOOR, (int) V8DF_FTYPE_V8DF_ROUND)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_roundpd512, "__builtin_ia32_ceilpd512", IX86_BUILTIN_CEILPD512, (enum rtx_code) ROUND_CEIL, (int) V8DF_FTYPE_V8DF_ROUND)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_roundpd512, "__builtin_ia32_truncpd512", IX86_BUILTIN_TRUNCPD512, (enum rtx_code) ROUND_TRUNC, (int) V8DF_FTYPE_V8DF_ROUND)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fix_notruncv16sfv16si, "__builtin_ia32_cvtps2dq512", IX86_BUILTIN_CVTPS2DQ512, UNKNOWN, (int) V16SI_FTYPE_V16SF)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vec_pack_sfix_v8df, "__builtin_ia32_vec_pack_sfix512", IX86_BUILTIN_VEC_PACK_SFIX512, UNKNOWN, (int) V16SI_FTYPE_V8DF_V8DF)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_roundv16sf2_sfix, "__builtin_ia32_roundps_az_sfix512", IX86_BUILTIN_ROUNDPS_AZ_SFIX512, UNKNOWN, (int) V16SI_FTYPE_V16SF)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_roundps512_sfix, "__builtin_ia32_floorps_sfix512", IX86_BUILTIN_FLOORPS_SFIX512, (enum rtx_code) ROUND_FLOOR, (int) V16SI_FTYPE_V16SF_ROUND)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_roundps512_sfix, "__builtin_ia32_ceilps_sfix512", IX86_BUILTIN_CEILPS_SFIX512, (enum rtx_code) ROUND_CEIL, (int) V16SI_FTYPE_V16SF_ROUND)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_roundv8df2_vec_pack_sfix, "__builtin_ia32_roundpd_az_vec_pack_sfix512", IX86_BUILTIN_ROUNDPD_AZ_VEC_PACK_SFIX512, UNKNOWN, (int) V16SI_FTYPE_V8DF_V8DF)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_roundpd_vec_pack_sfix512, "__builtin_ia32_floorpd_vec_pack_sfix512", IX86_BUILTIN_FLOORPD_VEC_PACK_SFIX512, (enum rtx_code) ROUND_FLOOR, (int) V16SI_FTYPE_V8DF_V8DF_ROUND)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_roundpd_vec_pack_sfix512, "__builtin_ia32_ceilpd_vec_pack_sfix512", IX86_BUILTIN_CEILPD_VEC_PACK_SFIX512, (enum rtx_code) ROUND_CEIL, (int) V16SI_FTYPE_V8DF_V8DF_ROUND)
 /* Mask arithmetic operations */
 BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_kashiftqi, "__builtin_ia32_kshiftliqi", IX86_BUILTIN_KSHIFTLI8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI_CONST)
@@ -3034,26 +3034,26 @@ BDESC_END (ARGS, ROUND_ARGS)
 /* AVX512F.  */
 BDESC_FIRST (round_args, ROUND_ARGS,
-      OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_addv8df3_mask_round, "__builtin_ia32_addpd512_mask", IX86_BUILTIN_ADDPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_addv16sf3_mask_round, "__builtin_ia32_addps512_mask", IX86_BUILTIN_ADDPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+      OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_addv8df3_mask_round, "__builtin_ia32_addpd512_mask", IX86_BUILTIN_ADDPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_addv16sf3_mask_round, "__builtin_ia32_addps512_mask", IX86_BUILTIN_ADDPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse2_vmaddv2df3_round, "__builtin_ia32_addsd_round", IX86_BUILTIN_ADDSD_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse2_vmaddv2df3_mask_round, "__builtin_ia32_addsd_mask_round", IX86_BUILTIN_ADDSD_MASK_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse_vmaddv4sf3_round, "__builtin_ia32_addss_round", IX86_BUILTIN_ADDSS_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse_vmaddv4sf3_mask_round, "__builtin_ia32_addss_mask_round", IX86_BUILTIN_ADDSS_MASK_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_cmpv8df3_mask_round, "__builtin_ia32_cmppd512_mask", IX86_BUILTIN_CMPPD512, UNKNOWN, (int) UQI_FTYPE_V8DF_V8DF_INT_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_cmpv16sf3_mask_round, "__builtin_ia32_cmpps512_mask", IX86_BUILTIN_CMPPS512, UNKNOWN, (int) UHI_FTYPE_V16SF_V16SF_INT_UHI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cmpv8df3_mask_round, "__builtin_ia32_cmppd512_mask", IX86_BUILTIN_CMPPD512, UNKNOWN, (int) UQI_FTYPE_V8DF_V8DF_INT_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cmpv16sf3_mask_round, "__builtin_ia32_cmpps512_mask", IX86_BUILTIN_CMPPS512, UNKNOWN, (int) UHI_FTYPE_V16SF_V16SF_INT_UHI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vmcmpv2df3_mask_round, "__builtin_ia32_cmpsd_mask", IX86_BUILTIN_CMPSD_MASK, UNKNOWN, (int) UQI_FTYPE_V2DF_V2DF_INT_UQI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vmcmpv4sf3_mask_round, "__builtin_ia32_cmpss_mask", IX86_BUILTIN_CMPSS_MASK, UNKNOWN, (int) UQI_FTYPE_V4SF_V4SF_INT_UQI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse2_comi_round, "__builtin_ia32_vcomisd", IX86_BUILTIN_COMIDF, UNKNOWN, (int) INT_FTYPE_V2DF_V2DF_INT_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse_comi_round, "__builtin_ia32_vcomiss", IX86_BUILTIN_COMISF, UNKNOWN, (int) INT_FTYPE_V4SF_V4SF_INT_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_floatv16siv16sf2_mask_round, "__builtin_ia32_cvtdq2ps512_mask", IX86_BUILTIN_CVTDQ2PS512, UNKNOWN, (int) V16SF_FTYPE_V16SI_V16SF_HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_cvtpd2dq512_mask_round, "__builtin_ia32_cvtpd2dq512_mask", IX86_BUILTIN_CVTPD2DQ512, UNKNOWN, (int) V8SI_FTYPE_V8DF_V8SI_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_cvtpd2ps512_mask_round, "__builtin_ia32_cvtpd2ps512_mask", IX86_BUILTIN_CVTPD2PS512, UNKNOWN, (int) V8SF_FTYPE_V8DF_V8SF_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_fixuns_notruncv8dfv8si2_mask_round, "__builtin_ia32_cvtpd2udq512_mask", IX86_BUILTIN_CVTPD2UDQ512, UNKNOWN, (int) V8SI_FTYPE_V8DF_V8SI_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vcvtph2ps512_mask_round, "__builtin_ia32_vcvtph2ps512_mask", IX86_BUILTIN_CVTPH2PS512, UNKNOWN, (int) V16SF_FTYPE_V16HI_V16SF_HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fix_notruncv16sfv16si_mask_round, "__builtin_ia32_cvtps2dq512_mask", IX86_BUILTIN_CVTPS2DQ512_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SF_V16SI_HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_cvtps2pd512_mask_round, "__builtin_ia32_cvtps2pd512_mask", IX86_BUILTIN_CVTPS2PD512, UNKNOWN, (int) V8DF_FTYPE_V8SF_V8DF_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fixuns_notruncv16sfv16si_mask_round, "__builtin_ia32_cvtps2udq512_mask", IX86_BUILTIN_CVTPS2UDQ512, UNKNOWN, (int) V16SI_FTYPE_V16SF_V16SI_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_floatv16siv16sf2_mask_round, "__builtin_ia32_cvtdq2ps512_mask", IX86_BUILTIN_CVTDQ2PS512, UNKNOWN, (int) V16SF_FTYPE_V16SI_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cvtpd2dq512_mask_round, "__builtin_ia32_cvtpd2dq512_mask", IX86_BUILTIN_CVTPD2DQ512, UNKNOWN, (int) V8SI_FTYPE_V8DF_V8SI_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cvtpd2ps512_mask_round, "__builtin_ia32_cvtpd2ps512_mask", IX86_BUILTIN_CVTPD2PS512, UNKNOWN, (int) V8SF_FTYPE_V8DF_V8SF_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_fixuns_notruncv8dfv8si2_mask_round, "__builtin_ia32_cvtpd2udq512_mask", IX86_BUILTIN_CVTPD2UDQ512, UNKNOWN, (int) V8SI_FTYPE_V8DF_V8SI_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vcvtph2ps512_mask_round, "__builtin_ia32_vcvtph2ps512_mask", IX86_BUILTIN_CVTPH2PS512, UNKNOWN, (int) V16SF_FTYPE_V16HI_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fix_notruncv16sfv16si_mask_round, "__builtin_ia32_cvtps2dq512_mask", IX86_BUILTIN_CVTPS2DQ512_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SF_V16SI_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cvtps2pd512_mask_round, "__builtin_ia32_cvtps2pd512_mask", IX86_BUILTIN_CVTPS2PD512, UNKNOWN, (int) V8DF_FTYPE_V8SF_V8DF_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fixuns_notruncv16sfv16si_mask_round, "__builtin_ia32_cvtps2udq512_mask", IX86_BUILTIN_CVTPS2UDQ512, UNKNOWN, (int) V16SI_FTYPE_V16SF_V16SI_HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse2_cvtsd2ss_round, "__builtin_ia32_cvtsd2ss_round", IX86_BUILTIN_CVTSD2SS_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V2DF_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse2_cvtsd2ss_mask_round, "__builtin_ia32_cvtsd2ss_mask_round", IX86_BUILTIN_CVTSD2SS_MASK_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V2DF_V4SF_UQI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F | OPTION_MASK_ISA_64BIT, 0, CODE_FOR_sse2_cvtsi2sdq_round, "__builtin_ia32_cvtsi2sd64", IX86_BUILTIN_CVTSI2SD64, UNKNOWN, (int) V2DF_FTYPE_V2DF_INT64_INT)
@@ -3069,64 +3069,64 @@ BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_floatunsv16siv16sf2_mask_round, "__b
 BDESC (OPTION_MASK_ISA_AVX512F | OPTION_MASK_ISA_64BIT, 0, CODE_FOR_cvtusi2sd64_round, "__builtin_ia32_cvtusi2sd64", IX86_BUILTIN_CVTUSI2SD64, UNKNOWN, (int) V2DF_FTYPE_V2DF_UINT64_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_cvtusi2ss32_round, "__builtin_ia32_cvtusi2ss32", IX86_BUILTIN_CVTUSI2SS32, UNKNOWN, (int) V4SF_FTYPE_V4SF_UINT_INT)
 BDESC (OPTION_MASK_ISA_AVX512F | OPTION_MASK_ISA_64BIT, 0, CODE_FOR_cvtusi2ss64_round, "__builtin_ia32_cvtusi2ss64", IX86_BUILTIN_CVTUSI2SS64, UNKNOWN, (int) V4SF_FTYPE_V4SF_UINT64_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_divv8df3_mask_round, "__builtin_ia32_divpd512_mask", IX86_BUILTIN_DIVPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_divv16sf3_mask_round, "__builtin_ia32_divps512_mask", IX86_BUILTIN_DIVPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_divv8df3_mask_round, "__builtin_ia32_divpd512_mask", IX86_BUILTIN_DIVPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_divv16sf3_mask_round, "__builtin_ia32_divps512_mask", IX86_BUILTIN_DIVPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse2_vmdivv2df3_round, "__builtin_ia32_divsd_round", IX86_BUILTIN_DIVSD_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse2_vmdivv2df3_mask_round, "__builtin_ia32_divsd_mask_round", IX86_BUILTIN_DIVSD_MASK_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse_vmdivv4sf3_round, "__builtin_ia32_divss_round", IX86_BUILTIN_DIVSS_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse_vmdivv4sf3_mask_round, "__builtin_ia32_divss_mask_round", IX86_BUILTIN_DIVSS_MASK_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fixupimmv8df_mask_round, "__builtin_ia32_fixupimmpd512_mask", IX86_BUILTIN_FIXUPIMMPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DI_INT_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fixupimmv8df_maskz_round, "__builtin_ia32_fixupimmpd512_maskz", IX86_BUILTIN_FIXUPIMMPD512_MASKZ, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DI_INT_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fixupimmv16sf_mask_round, "__builtin_ia32_fixupimmps512_mask", IX86_BUILTIN_FIXUPIMMPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SI_INT_HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fixupimmv16sf_maskz_round, "__builtin_ia32_fixupimmps512_maskz", IX86_BUILTIN_FIXUPIMMPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SI_INT_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fixupimmv8df_mask_round, "__builtin_ia32_fixupimmpd512_mask", IX86_BUILTIN_FIXUPIMMPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DI_INT_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fixupimmv8df_maskz_round, "__builtin_ia32_fixupimmpd512_maskz", IX86_BUILTIN_FIXUPIMMPD512_MASKZ, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DI_INT_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fixupimmv16sf_mask_round, "__builtin_ia32_fixupimmps512_mask", IX86_BUILTIN_FIXUPIMMPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SI_INT_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fixupimmv16sf_maskz_round, "__builtin_ia32_fixupimmps512_maskz", IX86_BUILTIN_FIXUPIMMPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SI_INT_HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_sfixupimmv2df_mask_round, "__builtin_ia32_fixupimmsd_mask", IX86_BUILTIN_FIXUPIMMSD128_MASK, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DI_INT_QI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_sfixupimmv2df_maskz_round, "__builtin_ia32_fixupimmsd_maskz", IX86_BUILTIN_FIXUPIMMSD128_MASKZ, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DI_INT_QI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_sfixupimmv4sf_mask_round, "__builtin_ia32_fixupimmss_mask", IX86_BUILTIN_FIXUPIMMSS128_MASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SI_INT_QI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_sfixupimmv4sf_maskz_round, "__builtin_ia32_fixupimmss_maskz", IX86_BUILTIN_FIXUPIMMSS128_MASKZ, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SI_INT_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_getexpv8df_mask_round, "__builtin_ia32_getexppd512_mask", IX86_BUILTIN_GETEXPPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_getexpv16sf_mask_round, "__builtin_ia32_getexpps512_mask", IX86_BUILTIN_GETEXPPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_getexpv8df_mask_round, "__builtin_ia32_getexppd512_mask", IX86_BUILTIN_GETEXPPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_getexpv16sf_mask_round, "__builtin_ia32_getexpps512_mask", IX86_BUILTIN_GETEXPPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_sgetexpv2df_round, "__builtin_ia32_getexpsd128_round", IX86_BUILTIN_GETEXPSD128, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_sgetexpv2df_mask_round, "__builtin_ia32_getexpsd_mask_round", IX86_BUILTIN_GETEXPSD_MASK_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_sgetexpv4sf_round, "__builtin_ia32_getexpss128_round", IX86_BUILTIN_GETEXPSS128, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_sgetexpv4sf_mask_round, "__builtin_ia32_getexpss_mask_round", IX86_BUILTIN_GETEXPSS_MASK_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_getmantv8df_mask_round, "__builtin_ia32_getmantpd512_mask", IX86_BUILTIN_GETMANTPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_INT_V8DF_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_getmantv16sf_mask_round, "__builtin_ia32_getmantps512_mask", IX86_BUILTIN_GETMANTPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_INT_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_getmantv8df_mask_round, "__builtin_ia32_getmantpd512_mask", IX86_BUILTIN_GETMANTPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_INT_V8DF_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_getmantv16sf_mask_round, "__builtin_ia32_getmantps512_mask", IX86_BUILTIN_GETMANTPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_INT_V16SF_HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vgetmantv2df_round, "__builtin_ia32_getmantsd_round", IX86_BUILTIN_GETMANTSD128, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vgetmantv2df_mask_round, "__builtin_ia32_getmantsd_mask_round", IX86_BUILTIN_GETMANTSD_MASK_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT_V2DF_UQI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vgetmantv4sf_round, "__builtin_ia32_getmantss_round", IX86_BUILTIN_GETMANTSS128, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vgetmantv4sf_mask_round, "__builtin_ia32_getmantss_mask_round", IX86_BUILTIN_GETMANTSS_MASK_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT_V4SF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_smaxv8df3_mask_round, "__builtin_ia32_maxpd512_mask", IX86_BUILTIN_MAXPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_smaxv16sf3_mask_round, "__builtin_ia32_maxps512_mask", IX86_BUILTIN_MAXPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_smaxv8df3_mask_round, "__builtin_ia32_maxpd512_mask", IX86_BUILTIN_MAXPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_smaxv16sf3_mask_round, "__builtin_ia32_maxps512_mask", IX86_BUILTIN_MAXPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse2_vmsmaxv2df3_round, "__builtin_ia32_maxsd_round", IX86_BUILTIN_MAXSD_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse2_vmsmaxv2df3_mask_round, "__builtin_ia32_maxsd_mask_round", IX86_BUILTIN_MAXSD_MASK_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse_vmsmaxv4sf3_round, "__builtin_ia32_maxss_round", IX86_BUILTIN_MAXSS_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse_vmsmaxv4sf3_mask_round, "__builtin_ia32_maxss_mask_round", IX86_BUILTIN_MAXSS_MASK_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sminv8df3_mask_round, "__builtin_ia32_minpd512_mask", IX86_BUILTIN_MINPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sminv16sf3_mask_round, "__builtin_ia32_minps512_mask", IX86_BUILTIN_MINPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_sminv8df3_mask_round, "__builtin_ia32_minpd512_mask", IX86_BUILTIN_MINPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_sminv16sf3_mask_round, "__builtin_ia32_minps512_mask", IX86_BUILTIN_MINPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse2_vmsminv2df3_round, "__builtin_ia32_minsd_round", IX86_BUILTIN_MINSD_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse2_vmsminv2df3_mask_round, "__builtin_ia32_minsd_mask_round", IX86_BUILTIN_MINSD_MASK_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse_vmsminv4sf3_round, "__builtin_ia32_minss_round", IX86_BUILTIN_MINSS_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse_vmsminv4sf3_mask_round, "__builtin_ia32_minss_mask_round", IX86_BUILTIN_MINSS_MASK_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_mulv8df3_mask_round, "__builtin_ia32_mulpd512_mask", IX86_BUILTIN_MULPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_mulv16sf3_mask_round, "__builtin_ia32_mulps512_mask", IX86_BUILTIN_MULPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_mulv8df3_mask_round, "__builtin_ia32_mulpd512_mask", IX86_BUILTIN_MULPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_mulv16sf3_mask_round, "__builtin_ia32_mulps512_mask", IX86_BUILTIN_MULPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse2_vmmulv2df3_round, "__builtin_ia32_mulsd_round", IX86_BUILTIN_MULSD_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse2_vmmulv2df3_mask_round, "__builtin_ia32_mulsd_mask_round", IX86_BUILTIN_MULSD_MASK_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse_vmmulv4sf3_round, "__builtin_ia32_mulss_round", IX86_BUILTIN_MULSS_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse_vmmulv4sf3_mask_round, "__builtin_ia32_mulss_mask_round", IX86_BUILTIN_MULSS_MASK_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_rndscalev8df_mask_round, "__builtin_ia32_rndscalepd_mask", IX86_BUILTIN_RNDSCALEPD, UNKNOWN, (int) V8DF_FTYPE_V8DF_INT_V8DF_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_rndscalev16sf_mask_round, "__builtin_ia32_rndscaleps_mask", IX86_BUILTIN_RNDSCALEPS, UNKNOWN, (int) V16SF_FTYPE_V16SF_INT_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_rndscalev8df_mask_round, "__builtin_ia32_rndscalepd_mask", IX86_BUILTIN_RNDSCALEPD, UNKNOWN, (int) V8DF_FTYPE_V8DF_INT_V8DF_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_rndscalev16sf_mask_round, "__builtin_ia32_rndscaleps_mask", IX86_BUILTIN_RNDSCALEPS, UNKNOWN, (int) V16SF_FTYPE_V16SF_INT_V16SF_HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_rndscalev2df_mask_round, "__builtin_ia32_rndscalesd_mask_round", IX86_BUILTIN_RNDSCALESD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT_V2DF_UQI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_rndscalev4sf_mask_round, "__builtin_ia32_rndscaless_mask_round", IX86_BUILTIN_RNDSCALESS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT_V4SF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_scalefv8df_mask_round, "__builtin_ia32_scalefpd512_mask", IX86_BUILTIN_SCALEFPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_scalefv16sf_mask_round, "__builtin_ia32_scalefps512_mask", IX86_BUILTIN_SCALEFPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_scalefv8df_mask_round, "__builtin_ia32_scalefpd512_mask", IX86_BUILTIN_SCALEFPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_scalefv16sf_mask_round, "__builtin_ia32_scalefps512_mask", IX86_BUILTIN_SCALEFPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vmscalefv2df_mask_round, "__builtin_ia32_scalefsd_mask_round", IX86_BUILTIN_SCALEFSD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vmscalefv4sf_mask_round, "__builtin_ia32_scalefss_mask_round", IX86_BUILTIN_SCALEFSS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_sqrtv8df2_mask_round, "__builtin_ia32_sqrtpd512_mask", IX86_BUILTIN_SQRTPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_sqrtv16sf2_mask_round, "__builtin_ia32_sqrtps512_mask", IX86_BUILTIN_SQRTPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_sqrtv8df2_mask_round, "__builtin_ia32_sqrtpd512_mask", IX86_BUILTIN_SQRTPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_sqrtv16sf2_mask_round, "__builtin_ia32_sqrtps512_mask", IX86_BUILTIN_SQRTPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse2_vmsqrtv2df2_mask_round, "__builtin_ia32_sqrtsd_mask_round", IX86_BUILTIN_SQRTSD_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse_vmsqrtv4sf2_mask_round, "__builtin_ia32_sqrtss_mask_round", IX86_BUILTIN_SQRTSS_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_subv8df3_mask_round, "__builtin_ia32_subpd512_mask", IX86_BUILTIN_SUBPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_subv16sf3_mask_round, "__builtin_ia32_subps512_mask", IX86_BUILTIN_SUBPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_subv8df3_mask_round, "__builtin_ia32_subpd512_mask", IX86_BUILTIN_SUBPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_subv16sf3_mask_round, "__builtin_ia32_subps512_mask", IX86_BUILTIN_SUBPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse2_vmsubv2df3_round, "__builtin_ia32_subsd_round", IX86_BUILTIN_SUBSD_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse2_vmsubv2df3_mask_round, "__builtin_ia32_subsd_mask_round", IX86_BUILTIN_SUBSD_MASK_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse_vmsubv4sf3_round, "__builtin_ia32_subss_round", IX86_BUILTIN_SUBSS_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT)
@@ -3147,12 +3147,12 @@ BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse_cvttss2si_round, "__builtin_ia32
 BDESC (OPTION_MASK_ISA_AVX512F | OPTION_MASK_ISA_64BIT, 0, CODE_FOR_sse_cvttss2siq_round, "__builtin_ia32_vcvttss2si64", IX86_BUILTIN_VCVTTSS2SI64, UNKNOWN, (int) INT64_FTYPE_V4SF_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vcvttss2usi_round, "__builtin_ia32_vcvttss2usi32", IX86_BUILTIN_VCVTTSS2USI32, UNKNOWN, (int) UINT_FTYPE_V4SF_INT)
 BDESC (OPTION_MASK_ISA_AVX512F | OPTION_MASK_ISA_64BIT, 0, CODE_FOR_avx512f_vcvttss2usiq_round, "__builtin_ia32_vcvttss2usi64", IX86_BUILTIN_VCVTTSS2USI64, UNKNOWN, (int) UINT64_FTYPE_V4SF_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmadd_v8df_mask_round, "__builtin_ia32_vfmaddpd512_mask", IX86_BUILTIN_VFMADDPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmadd_v8df_mask3_round, "__builtin_ia32_vfmaddpd512_mask3", IX86_BUILTIN_VFMADDPD512_MASK3, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmadd_v8df_maskz_round, "__builtin_ia32_vfmaddpd512_maskz", IX86_BUILTIN_VFMADDPD512_MASKZ, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmadd_v16sf_mask_round, "__builtin_ia32_vfmaddps512_mask", IX86_BUILTIN_VFMADDPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmadd_v16sf_mask3_round, "__builtin_ia32_vfmaddps512_mask3", IX86_BUILTIN_VFMADDPS512_MASK3, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmadd_v16sf_maskz_round, "__builtin_ia32_vfmaddps512_maskz", IX86_BUILTIN_VFMADDPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmadd_v8df_mask_round, "__builtin_ia32_vfmaddpd512_mask", IX86_BUILTIN_VFMADDPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmadd_v8df_mask3_round, "__builtin_ia32_vfmaddpd512_mask3", IX86_BUILTIN_VFMADDPD512_MASK3, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmadd_v8df_maskz_round, "__builtin_ia32_vfmaddpd512_maskz", IX86_BUILTIN_VFMADDPD512_MASKZ, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmadd_v16sf_mask_round, "__builtin_ia32_vfmaddps512_mask", IX86_BUILTIN_VFMADDPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmadd_v16sf_mask3_round, "__builtin_ia32_vfmaddps512_mask3", IX86_BUILTIN_VFMADDPS512_MASK3, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmadd_v16sf_maskz_round, "__builtin_ia32_vfmaddps512_maskz", IX86_BUILTIN_VFMADDPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_fmai_vmfmadd_v2df_round, "__builtin_ia32_vfmaddsd3_round", IX86_BUILTIN_VFMADDSD3_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_fmai_vmfmadd_v4sf_round, "__builtin_ia32_vfmaddss3_round", IX86_BUILTIN_VFMADDSS3_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vmfmadd_v2df_mask_round, "__builtin_ia32_vfmaddsd3_mask", IX86_BUILTIN_VFMADDSD3_MASK, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI_INT)
@@ -3163,32 +3163,32 @@ BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vmfmadd_v4sf_mask_round, "__
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vmfmadd_v4sf_mask3_round, "__builtin_ia32_vfmaddss3_mask3", IX86_BUILTIN_VFMADDSS3_MASK3, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vmfmadd_v4sf_maskz_round, "__builtin_ia32_vfmaddss3_maskz", IX86_BUILTIN_VFMADDSS3_MASKZ, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vmfmsub_v4sf_mask3_round, "__builtin_ia32_vfmsubss3_mask3", IX86_BUILTIN_VFMSUBSS3_MASK3, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmaddsub_v8df_mask_round, "__builtin_ia32_vfmaddsubpd512_mask", IX86_BUILTIN_VFMADDSUBPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmaddsub_v8df_mask3_round, "__builtin_ia32_vfmaddsubpd512_mask3", IX86_BUILTIN_VFMADDSUBPD512_MASK3, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
"__builtin_ia32_vfmaddsubpd512_maskz", IX86_BUILTIN_VFMADDSUBPD512_MASKZ, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmaddsub_v16sf_mask_round, "__builtin_ia32_vfmaddsubps512_mask", IX86_BUILTIN_VFMADDSUBPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmaddsub_v16sf_mask3_round, "__builtin_ia32_vfmaddsubps512_mask3", IX86_BUILTIN_VFMADDSUBPS512_MASK3, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmaddsub_v16sf_maskz_round, "__builtin_ia32_vfmaddsubps512_maskz", IX86_BUILTIN_VFMADDSUBPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmsubadd_v8df_mask3_round, "__builtin_ia32_vfmsubaddpd512_mask3", IX86_BUILTIN_VFMSUBADDPD512_MASK3, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmsubadd_v16sf_mask3_round, "__builtin_ia32_vfmsubaddps512_mask3", IX86_BUILTIN_VFMSUBADDPS512_MASK3, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmsub_v8df_mask_round, "__builtin_ia32_vfmsubpd512_mask", IX86_BUILTIN_VFMSUBPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmsub_v8df_mask3_round, "__builtin_ia32_vfmsubpd512_mask3", IX86_BUILTIN_VFMSUBPD512_MASK3, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmsub_v8df_maskz_round, "__builtin_ia32_vfmsubpd512_maskz", IX86_BUILTIN_VFMSUBPD512_MASKZ, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmsub_v16sf_mask_round, "__builtin_ia32_vfmsubps512_mask", IX86_BUILTIN_VFMSUBPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmsub_v16sf_mask3_round, "__builtin_ia32_vfmsubps512_mask3", IX86_BUILTIN_VFMSUBPS512_MASK3, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmsub_v16sf_maskz_round, "__builtin_ia32_vfmsubps512_maskz", IX86_BUILTIN_VFMSUBPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fnmadd_v8df_mask_round, "__builtin_ia32_vfnmaddpd512_mask", IX86_BUILTIN_VFNMADDPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fnmadd_v8df_mask3_round, "__builtin_ia32_vfnmaddpd512_mask3", IX86_BUILTIN_VFNMADDPD512_MASK3, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fnmadd_v8df_maskz_round, "__builtin_ia32_vfnmaddpd512_maskz", IX86_BUILTIN_VFNMADDPD512_MASKZ, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fnmadd_v16sf_mask_round, "__builtin_ia32_vfnmaddps512_mask", IX86_BUILTIN_VFNMADDPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fnmadd_v16sf_mask3_round, "__builtin_ia32_vfnmaddps512_mask3", IX86_BUILTIN_VFNMADDPS512_MASK3, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fnmadd_v16sf_maskz_round, "__builtin_ia32_vfnmaddps512_maskz", IX86_BUILTIN_VFNMADDPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT) -BDESC 
(OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fnmsub_v8df_mask_round, "__builtin_ia32_vfnmsubpd512_mask", IX86_BUILTIN_VFNMSUBPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fnmsub_v8df_mask3_round, "__builtin_ia32_vfnmsubpd512_mask3", IX86_BUILTIN_VFNMSUBPD512_MASK3, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fnmsub_v8df_maskz_round, "__builtin_ia32_vfnmsubpd512_maskz", IX86_BUILTIN_VFNMSUBPD512_MASKZ, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fnmsub_v16sf_mask_round, "__builtin_ia32_vfnmsubps512_mask", IX86_BUILTIN_VFNMSUBPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fnmsub_v16sf_mask3_round, "__builtin_ia32_vfnmsubps512_mask3", IX86_BUILTIN_VFNMSUBPS512_MASK3, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fnmsub_v16sf_maskz_round, "__builtin_ia32_vfnmsubps512_maskz", IX86_BUILTIN_VFNMSUBPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmaddsub_v8df_mask_round, "__builtin_ia32_vfmaddsubpd512_mask", IX86_BUILTIN_VFMADDSUBPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmaddsub_v8df_mask3_round, "__builtin_ia32_vfmaddsubpd512_mask3", IX86_BUILTIN_VFMADDSUBPD512_MASK3, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmaddsub_v8df_maskz_round, "__builtin_ia32_vfmaddsubpd512_maskz", IX86_BUILTIN_VFMADDSUBPD512_MASKZ, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmaddsub_v16sf_mask_round, "__builtin_ia32_vfmaddsubps512_mask", IX86_BUILTIN_VFMADDSUBPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmaddsub_v16sf_mask3_round, "__builtin_ia32_vfmaddsubps512_mask3", IX86_BUILTIN_VFMADDSUBPS512_MASK3, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmaddsub_v16sf_maskz_round, "__builtin_ia32_vfmaddsubps512_maskz", IX86_BUILTIN_VFMADDSUBPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmsubadd_v8df_mask3_round, "__builtin_ia32_vfmsubaddpd512_mask3", IX86_BUILTIN_VFMSUBADDPD512_MASK3, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmsubadd_v16sf_mask3_round, "__builtin_ia32_vfmsubaddps512_mask3", IX86_BUILTIN_VFMSUBADDPS512_MASK3, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmsub_v8df_mask_round, "__builtin_ia32_vfmsubpd512_mask", IX86_BUILTIN_VFMSUBPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmsub_v8df_mask3_round, "__builtin_ia32_vfmsubpd512_mask3", IX86_BUILTIN_VFMSUBPD512_MASK3, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmsub_v8df_maskz_round, 
"__builtin_ia32_vfmsubpd512_maskz", IX86_BUILTIN_VFMSUBPD512_MASKZ, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmsub_v16sf_mask_round, "__builtin_ia32_vfmsubps512_mask", IX86_BUILTIN_VFMSUBPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmsub_v16sf_mask3_round, "__builtin_ia32_vfmsubps512_mask3", IX86_BUILTIN_VFMSUBPS512_MASK3, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmsub_v16sf_maskz_round, "__builtin_ia32_vfmsubps512_maskz", IX86_BUILTIN_VFMSUBPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fnmadd_v8df_mask_round, "__builtin_ia32_vfnmaddpd512_mask", IX86_BUILTIN_VFNMADDPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fnmadd_v8df_mask3_round, "__builtin_ia32_vfnmaddpd512_mask3", IX86_BUILTIN_VFNMADDPD512_MASK3, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fnmadd_v8df_maskz_round, "__builtin_ia32_vfnmaddpd512_maskz", IX86_BUILTIN_VFNMADDPD512_MASKZ, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fnmadd_v16sf_mask_round, "__builtin_ia32_vfnmaddps512_mask", IX86_BUILTIN_VFNMADDPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fnmadd_v16sf_mask3_round, "__builtin_ia32_vfnmaddps512_mask3", IX86_BUILTIN_VFNMADDPS512_MASK3, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fnmadd_v16sf_maskz_round, "__builtin_ia32_vfnmaddps512_maskz", IX86_BUILTIN_VFNMADDPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fnmsub_v8df_mask_round, "__builtin_ia32_vfnmsubpd512_mask", IX86_BUILTIN_VFNMSUBPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fnmsub_v8df_mask3_round, "__builtin_ia32_vfnmsubpd512_mask3", IX86_BUILTIN_VFNMSUBPD512_MASK3, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fnmsub_v8df_maskz_round, "__builtin_ia32_vfnmsubpd512_maskz", IX86_BUILTIN_VFNMSUBPD512_MASKZ, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fnmsub_v16sf_mask_round, "__builtin_ia32_vfnmsubps512_mask", IX86_BUILTIN_VFNMSUBPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fnmsub_v16sf_mask3_round, "__builtin_ia32_vfnmsubps512_mask3", IX86_BUILTIN_VFNMSUBPS512_MASK3, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT) +BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fnmsub_v16sf_maskz_round, "__builtin_ia32_vfnmsubps512_maskz", IX86_BUILTIN_VFNMSUBPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT) /* AVX512ER */ BDESC (OPTION_MASK_ISA_AVX512ER, 0, CODE_FOR_avx512er_exp2v8df_mask_round, 
"__builtin_ia32_exp2pd_mask", IX86_BUILTIN_EXP2PD_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_QI_INT) diff --git a/gcc/config/i386/i386-builtins.cc b/gcc/config/i386/i386-builtins.cc index 8a0b8dfe073..e1d1dac2ba2 100644 --- a/gcc/config/i386/i386-builtins.cc +++ b/gcc/config/i386/i386-builtins.cc @@ -784,83 +784,103 @@ ix86_init_mmx_sse_builtins (void) IX86_BUILTIN_GATHERALTDIV8SI); /* AVX512F */ - def_builtin_pure (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_gathersiv16sf", + def_builtin_pure (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, + "__builtin_ia32_gathersiv16sf", V16SF_FTYPE_V16SF_PCVOID_V16SI_HI_INT, IX86_BUILTIN_GATHER3SIV16SF); - def_builtin_pure (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_gathersiv8df", + def_builtin_pure (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, + "__builtin_ia32_gathersiv8df", V8DF_FTYPE_V8DF_PCVOID_V8SI_QI_INT, IX86_BUILTIN_GATHER3SIV8DF); - def_builtin_pure (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_gatherdiv16sf", + def_builtin_pure (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, + "__builtin_ia32_gatherdiv16sf", V8SF_FTYPE_V8SF_PCVOID_V8DI_QI_INT, IX86_BUILTIN_GATHER3DIV16SF); - def_builtin_pure (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_gatherdiv8df", + def_builtin_pure (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, + "__builtin_ia32_gatherdiv8df", V8DF_FTYPE_V8DF_PCVOID_V8DI_QI_INT, IX86_BUILTIN_GATHER3DIV8DF); - def_builtin_pure (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_gathersiv16si", + def_builtin_pure (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, + "__builtin_ia32_gathersiv16si", V16SI_FTYPE_V16SI_PCVOID_V16SI_HI_INT, IX86_BUILTIN_GATHER3SIV16SI); - def_builtin_pure (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_gathersiv8di", + def_builtin_pure (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, + "__builtin_ia32_gathersiv8di", V8DI_FTYPE_V8DI_PCVOID_V8SI_QI_INT, IX86_BUILTIN_GATHER3SIV8DI); - def_builtin_pure (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_gatherdiv16si", + def_builtin_pure (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, + "__builtin_ia32_gatherdiv16si", V8SI_FTYPE_V8SI_PCVOID_V8DI_QI_INT, IX86_BUILTIN_GATHER3DIV16SI); - def_builtin_pure (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_gatherdiv8di", + def_builtin_pure (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, + "__builtin_ia32_gatherdiv8di", V8DI_FTYPE_V8DI_PCVOID_V8DI_QI_INT, IX86_BUILTIN_GATHER3DIV8DI); - def_builtin_pure (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_gather3altsiv8df ", + def_builtin_pure (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, + "__builtin_ia32_gather3altsiv8df ", V8DF_FTYPE_V8DF_PCDOUBLE_V16SI_QI_INT, IX86_BUILTIN_GATHER3ALTSIV8DF); - def_builtin_pure (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_gather3altdiv16sf ", + def_builtin_pure (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, + "__builtin_ia32_gather3altdiv16sf ", V16SF_FTYPE_V16SF_PCFLOAT_V8DI_HI_INT, IX86_BUILTIN_GATHER3ALTDIV16SF); - def_builtin_pure (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_gather3altsiv8di ", + def_builtin_pure (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, + "__builtin_ia32_gather3altsiv8di ", V8DI_FTYPE_V8DI_PCINT64_V16SI_QI_INT, IX86_BUILTIN_GATHER3ALTSIV8DI); - def_builtin_pure (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_gather3altdiv16si ", + def_builtin_pure (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, + "__builtin_ia32_gather3altdiv16si ", V16SI_FTYPE_V16SI_PCINT_V8DI_HI_INT, IX86_BUILTIN_GATHER3ALTDIV16SI); - def_builtin (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_scattersiv16sf", + def_builtin 
(OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, + "__builtin_ia32_scattersiv16sf", VOID_FTYPE_PVOID_HI_V16SI_V16SF_INT, IX86_BUILTIN_SCATTERSIV16SF); - def_builtin (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_scattersiv8df", + def_builtin (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, + "__builtin_ia32_scattersiv8df", VOID_FTYPE_PVOID_QI_V8SI_V8DF_INT, IX86_BUILTIN_SCATTERSIV8DF); - def_builtin (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_scatterdiv16sf", + def_builtin (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, + "__builtin_ia32_scatterdiv16sf", VOID_FTYPE_PVOID_QI_V8DI_V8SF_INT, IX86_BUILTIN_SCATTERDIV16SF); - def_builtin (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_scatterdiv8df", + def_builtin (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, + "__builtin_ia32_scatterdiv8df", VOID_FTYPE_PVOID_QI_V8DI_V8DF_INT, IX86_BUILTIN_SCATTERDIV8DF); - def_builtin (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_scattersiv16si", + def_builtin (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, + "__builtin_ia32_scattersiv16si", VOID_FTYPE_PVOID_HI_V16SI_V16SI_INT, IX86_BUILTIN_SCATTERSIV16SI); - def_builtin (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_scattersiv8di", + def_builtin (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, + "__builtin_ia32_scattersiv8di", VOID_FTYPE_PVOID_QI_V8SI_V8DI_INT, IX86_BUILTIN_SCATTERSIV8DI); - def_builtin (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_scatterdiv16si", + def_builtin (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, + "__builtin_ia32_scatterdiv16si", VOID_FTYPE_PVOID_QI_V8DI_V8SI_INT, IX86_BUILTIN_SCATTERDIV16SI); - def_builtin (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_scatterdiv8di", + def_builtin (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, + "__builtin_ia32_scatterdiv8di", VOID_FTYPE_PVOID_QI_V8DI_V8DI_INT, IX86_BUILTIN_SCATTERDIV8DI); @@ -1009,19 +1029,23 @@ ix86_init_mmx_sse_builtins (void) VOID_FTYPE_PVOID_QI_V2DI_V2DI_INT, IX86_BUILTIN_SCATTERDIV2DI); - def_builtin (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_scatteraltsiv8df ", + def_builtin (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, + "__builtin_ia32_scatteraltsiv8df ", VOID_FTYPE_PDOUBLE_QI_V16SI_V8DF_INT, IX86_BUILTIN_SCATTERALTSIV8DF); - def_builtin (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_scatteraltdiv16sf ", + def_builtin (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, + "__builtin_ia32_scatteraltdiv16sf ", VOID_FTYPE_PFLOAT_HI_V8DI_V16SF_INT, IX86_BUILTIN_SCATTERALTDIV16SF); - def_builtin (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_scatteraltsiv8di ", + def_builtin (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, + "__builtin_ia32_scatteraltsiv8di ", VOID_FTYPE_PLONGLONG_QI_V16SI_V8DI_INT, IX86_BUILTIN_SCATTERALTSIV8DI); - def_builtin (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_scatteraltdiv16si ", + def_builtin (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, + "__builtin_ia32_scatteraltdiv16si ", VOID_FTYPE_PINT_HI_V8DI_V16SI_INT, IX86_BUILTIN_SCATTERALTDIV16SI); From patchwork Thu Sep 21 07:20:03 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Hu, Lin1" X-Patchwork-Id: 1837512 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=intel.com header.i=@intel.com header.a=rsa-sha256 header.s=Intel header.b=jIN5OCBA; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) 
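[Editor's note, not part of the original mails: a minimal user-level sketch of what gating the BDESC/def_builtin tables on OPTION_MASK_ISA2_EVEX512 means in practice. It assumes a compiler with the complete series applied, where -mno-evex512 is accepted; the file name and function below are invented for illustration only.]

/* evex512-gate.c -- illustrative sketch only.  */
#include <immintrin.h>

__m512d
sqrt512 (__m512d x)
{
  /* _mm512_sqrt_pd expands to __builtin_ia32_sqrtpd512_mask, one of the
     512-bit builtins gated above; with this series it is only defined
     when both AVX512F and EVEX512 are enabled.  */
  return _mm512_sqrt_pd (x);
}

/* gcc -O2 -mavx512f -c evex512-gate.c               # OK, EVEX512 implied
   gcc -O2 -mavx512f -mno-evex512 -c evex512-gate.c  # expected to be
      rejected: the 512-bit builtin is not enabled, only 128/256-bit
      EVEX-encoded forms remain available.  */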
From patchwork Thu Sep 21 07:20:03 2023
X-Patchwork-Submitter: "Hu, Lin1"
X-Patchwork-Id: 1837512
From: "Hu, Lin1"
To: gcc-patches@gcc.gnu.org
Cc: hongtao.liu@intel.com, ubizjak@gmail.com, haochen.jiang@intel.com
Subject: [PATCH 08/18] [PATCH 2/5] Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins
Date: Thu, 21 Sep 2023 15:20:03 +0800
Message-Id: <20230921072013.2124750-9-lin1.hu@intel.com>
In-Reply-To: <20230921072013.2124750-1-lin1.hu@intel.com>
References: <20230921072013.2124750-1-lin1.hu@intel.com>
---
 gcc/config/i386/i386-builtin.def | 94 ++++++++++++++++----------
 1 file changed, 47 insertions(+), 47 deletions(-)

diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 0cc526383db..7a0dec9bc8b 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -2408,37 +2408,37 @@ BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_cmpv2df3_mask, "__builtin_
 BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_cmpv4sf3_mask, "__builtin_ia32_cmpps128_mask", IX86_BUILTIN_CMPPS128_MASK, UNKNOWN, (int) UQI_FTYPE_V4SF_V4SF_INT_UQI)
 
 /* AVX512DQ. */
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_broadcastv16sf_mask, "__builtin_ia32_broadcastf32x2_512_mask", IX86_BUILTIN_BROADCASTF32x2_512, UNKNOWN, (int) V16SF_FTYPE_V4SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_broadcastv16si_mask, "__builtin_ia32_broadcasti32x2_512_mask", IX86_BUILTIN_BROADCASTI32x2_512, UNKNOWN, (int) V16SI_FTYPE_V4SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_broadcastv8df_mask_1, "__builtin_ia32_broadcastf64x2_512_mask", IX86_BUILTIN_BROADCASTF64X2_512, UNKNOWN, (int) V8DF_FTYPE_V2DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_broadcastv8di_mask_1, "__builtin_ia32_broadcasti64x2_512_mask", IX86_BUILTIN_BROADCASTI64X2_512, UNKNOWN, (int) V8DI_FTYPE_V2DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_broadcastv16sf_mask_1, "__builtin_ia32_broadcastf32x8_512_mask", IX86_BUILTIN_BROADCASTF32X8_512, UNKNOWN, (int) V16SF_FTYPE_V8SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_broadcastv16si_mask_1, "__builtin_ia32_broadcasti32x8_512_mask", IX86_BUILTIN_BROADCASTI32X8_512, UNKNOWN, (int) V16SI_FTYPE_V8SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_vextractf64x2_mask, "__builtin_ia32_extractf64x2_512_mask", IX86_BUILTIN_EXTRACTF64X2_512, UNKNOWN, (int) V2DF_FTYPE_V8DF_INT_V2DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_vextractf32x8_mask, "__builtin_ia32_extractf32x8_mask", IX86_BUILTIN_EXTRACTF32X8, UNKNOWN, (int) V8SF_FTYPE_V16SF_INT_V8SF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_vextracti64x2_mask, "__builtin_ia32_extracti64x2_512_mask", IX86_BUILTIN_EXTRACTI64X2_512, UNKNOWN, (int) V2DI_FTYPE_V8DI_INT_V2DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_vextracti32x8_mask, "__builtin_ia32_extracti32x8_mask", IX86_BUILTIN_EXTRACTI32X8, UNKNOWN, (int) V8SI_FTYPE_V16SI_INT_V8SI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_reducepv8df_mask, "__builtin_ia32_reducepd512_mask", IX86_BUILTIN_REDUCEPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_INT_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_reducepv16sf_mask, "__builtin_ia32_reduceps512_mask", IX86_BUILTIN_REDUCEPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_INT_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_mulv8di3_mask, "__builtin_ia32_pmullq512_mask", IX86_BUILTIN_PMULLQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_xorv8df3_mask, "__builtin_ia32_xorpd512_mask", IX86_BUILTIN_XORPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_xorv16sf3_mask, "__builtin_ia32_xorps512_mask", IX86_BUILTIN_XORPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_iorv8df3_mask, "__builtin_ia32_orpd512_mask", IX86_BUILTIN_ORPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_iorv16sf3_mask, "__builtin_ia32_orps512_mask", IX86_BUILTIN_ORPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_andv8df3_mask, "__builtin_ia32_andpd512_mask", IX86_BUILTIN_ANDPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_andv16sf3_mask, "__builtin_ia32_andps512_mask", IX86_BUILTIN_ANDPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512f_andnotv8df3_mask, "__builtin_ia32_andnpd512_mask", IX86_BUILTIN_ANDNPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512f_andnotv16sf3_mask, "__builtin_ia32_andnps512_mask", IX86_BUILTIN_ANDNPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_vinsertf32x8_mask, "__builtin_ia32_insertf32x8_mask", IX86_BUILTIN_INSERTF32X8, UNKNOWN, (int) V16SF_FTYPE_V16SF_V8SF_INT_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_vinserti32x8_mask, "__builtin_ia32_inserti32x8_mask", IX86_BUILTIN_INSERTI32X8, UNKNOWN, (int) V16SI_FTYPE_V16SI_V8SI_INT_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_vinsertf64x2_mask, "__builtin_ia32_insertf64x2_512_mask", IX86_BUILTIN_INSERTF64X2_512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V2DF_INT_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_vinserti64x2_mask, "__builtin_ia32_inserti64x2_512_mask", IX86_BUILTIN_INSERTI64X2_512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V2DI_INT_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_fpclassv8df_mask, "__builtin_ia32_fpclasspd512_mask", IX86_BUILTIN_FPCLASSPD512, UNKNOWN, (int) QI_FTYPE_V8DF_INT_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_fpclassv16sf_mask, "__builtin_ia32_fpclassps512_mask", IX86_BUILTIN_FPCLASSPS512, UNKNOWN, (int) HI_FTYPE_V16SF_INT_UHI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512f_cvtd2maskv16si, "__builtin_ia32_cvtd2mask512", IX86_BUILTIN_CVTD2MASK512, UNKNOWN, (int) UHI_FTYPE_V16SI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512f_cvtq2maskv8di, "__builtin_ia32_cvtq2mask512", IX86_BUILTIN_CVTQ2MASK512, UNKNOWN, (int) UQI_FTYPE_V8DI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512f_cvtmask2dv16si, "__builtin_ia32_cvtmask2d512", IX86_BUILTIN_CVTMASK2D512, UNKNOWN, (int) V16SI_FTYPE_UHI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512f_cvtmask2qv8di, "__builtin_ia32_cvtmask2q512", IX86_BUILTIN_CVTMASK2Q512, UNKNOWN, (int) V8DI_FTYPE_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_broadcastv16sf_mask, "__builtin_ia32_broadcastf32x2_512_mask", IX86_BUILTIN_BROADCASTF32x2_512, UNKNOWN, (int) V16SF_FTYPE_V4SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_broadcastv16si_mask, "__builtin_ia32_broadcasti32x2_512_mask", IX86_BUILTIN_BROADCASTI32x2_512, UNKNOWN, (int) V16SI_FTYPE_V4SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_broadcastv8df_mask_1, "__builtin_ia32_broadcastf64x2_512_mask", IX86_BUILTIN_BROADCASTF64X2_512, UNKNOWN, (int) V8DF_FTYPE_V2DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_broadcastv8di_mask_1, "__builtin_ia32_broadcasti64x2_512_mask", IX86_BUILTIN_BROADCASTI64X2_512, UNKNOWN, (int) V8DI_FTYPE_V2DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_broadcastv16sf_mask_1, "__builtin_ia32_broadcastf32x8_512_mask", IX86_BUILTIN_BROADCASTF32X8_512, UNKNOWN, (int) V16SF_FTYPE_V8SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_broadcastv16si_mask_1, "__builtin_ia32_broadcasti32x8_512_mask", IX86_BUILTIN_BROADCASTI32X8_512, UNKNOWN, (int) V16SI_FTYPE_V8SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_vextractf64x2_mask, "__builtin_ia32_extractf64x2_512_mask", IX86_BUILTIN_EXTRACTF64X2_512, UNKNOWN, (int) V2DF_FTYPE_V8DF_INT_V2DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_vextractf32x8_mask, "__builtin_ia32_extractf32x8_mask", IX86_BUILTIN_EXTRACTF32X8, UNKNOWN, (int) V8SF_FTYPE_V16SF_INT_V8SF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_vextracti64x2_mask, "__builtin_ia32_extracti64x2_512_mask", IX86_BUILTIN_EXTRACTI64X2_512, UNKNOWN, (int) V2DI_FTYPE_V8DI_INT_V2DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_vextracti32x8_mask, "__builtin_ia32_extracti32x8_mask", IX86_BUILTIN_EXTRACTI32X8, UNKNOWN, (int) V8SI_FTYPE_V16SI_INT_V8SI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_reducepv8df_mask, "__builtin_ia32_reducepd512_mask", IX86_BUILTIN_REDUCEPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_INT_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_reducepv16sf_mask, "__builtin_ia32_reduceps512_mask", IX86_BUILTIN_REDUCEPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_INT_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_mulv8di3_mask, "__builtin_ia32_pmullq512_mask", IX86_BUILTIN_PMULLQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_xorv8df3_mask, "__builtin_ia32_xorpd512_mask", IX86_BUILTIN_XORPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_xorv16sf3_mask, "__builtin_ia32_xorps512_mask", IX86_BUILTIN_XORPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_iorv8df3_mask, "__builtin_ia32_orpd512_mask", IX86_BUILTIN_ORPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_iorv16sf3_mask, "__builtin_ia32_orps512_mask", IX86_BUILTIN_ORPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_andv8df3_mask, "__builtin_ia32_andpd512_mask", IX86_BUILTIN_ANDPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_andv16sf3_mask, "__builtin_ia32_andps512_mask", IX86_BUILTIN_ANDPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_andnotv8df3_mask, "__builtin_ia32_andnpd512_mask", IX86_BUILTIN_ANDNPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_andnotv16sf3_mask, "__builtin_ia32_andnps512_mask", IX86_BUILTIN_ANDNPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_vinsertf32x8_mask, "__builtin_ia32_insertf32x8_mask", IX86_BUILTIN_INSERTF32X8, UNKNOWN, (int) V16SF_FTYPE_V16SF_V8SF_INT_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_vinserti32x8_mask, "__builtin_ia32_inserti32x8_mask", IX86_BUILTIN_INSERTI32X8, UNKNOWN, (int) V16SI_FTYPE_V16SI_V8SI_INT_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_vinsertf64x2_mask, "__builtin_ia32_insertf64x2_512_mask", IX86_BUILTIN_INSERTF64X2_512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V2DF_INT_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_vinserti64x2_mask, "__builtin_ia32_inserti64x2_512_mask", IX86_BUILTIN_INSERTI64X2_512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V2DI_INT_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_fpclassv8df_mask, "__builtin_ia32_fpclasspd512_mask", IX86_BUILTIN_FPCLASSPD512, UNKNOWN, (int) QI_FTYPE_V8DF_INT_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_fpclassv16sf_mask, "__builtin_ia32_fpclassps512_mask", IX86_BUILTIN_FPCLASSPS512, UNKNOWN, (int) HI_FTYPE_V16SF_INT_UHI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cvtd2maskv16si, "__builtin_ia32_cvtd2mask512", IX86_BUILTIN_CVTD2MASK512, UNKNOWN, (int) UHI_FTYPE_V16SI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cvtq2maskv8di, "__builtin_ia32_cvtq2mask512", IX86_BUILTIN_CVTQ2MASK512, UNKNOWN, (int) UQI_FTYPE_V8DI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cvtmask2dv16si, "__builtin_ia32_cvtmask2d512", IX86_BUILTIN_CVTMASK2D512, UNKNOWN, (int) V16SI_FTYPE_UHI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cvtmask2qv8di, "__builtin_ia32_cvtmask2q512", IX86_BUILTIN_CVTMASK2Q512, UNKNOWN, (int) V8DI_FTYPE_UQI)
 
 /* AVX512BW. */
 BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kunpcksi, "__builtin_ia32_kunpcksi", IX86_BUILTIN_KUNPCKWD, UNKNOWN, (int) USI_FTYPE_USI_USI)
@@ -3207,26 +3207,26 @@ BDESC (OPTION_MASK_ISA_AVX512ER, 0, CODE_FOR_avx512er_vmrsqrt28v4sf_round, "__bu
 BDESC (OPTION_MASK_ISA_AVX512ER, 0, CODE_FOR_avx512er_vmrsqrt28v4sf_mask_round, "__builtin_ia32_rsqrt28ss_mask_round", IX86_BUILTIN_RSQRT28SS_MASK_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI_INT)
 
 /* AVX512DQ. */
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_reducepv8df_mask_round, "__builtin_ia32_reducepd512_mask_round", IX86_BUILTIN_REDUCEPD512_MASK_ROUND, UNKNOWN, (int) V8DF_FTYPE_V8DF_INT_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_reducepv16sf_mask_round, "__builtin_ia32_reduceps512_mask_round", IX86_BUILTIN_REDUCEPS512_MASK_ROUND, UNKNOWN, (int) V16SF_FTYPE_V16SF_INT_V16SF_UHI_INT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_reducepv8df_mask_round, "__builtin_ia32_reducepd512_mask_round", IX86_BUILTIN_REDUCEPD512_MASK_ROUND, UNKNOWN, (int) V8DF_FTYPE_V8DF_INT_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_reducepv16sf_mask_round, "__builtin_ia32_reduceps512_mask_round", IX86_BUILTIN_REDUCEPS512_MASK_ROUND, UNKNOWN, (int) V16SF_FTYPE_V16SF_INT_V16SF_UHI_INT)
 BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_reducesv2df_mask_round, "__builtin_ia32_reducesd_mask_round", IX86_BUILTIN_REDUCESD128_MASK_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT_V2DF_UQI_INT)
 BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_reducesv4sf_mask_round, "__builtin_ia32_reducess_mask_round", IX86_BUILTIN_REDUCESS128_MASK_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT_V4SF_UQI_INT)
 BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_rangesv2df_mask_round, "__builtin_ia32_rangesd128_mask_round", IX86_BUILTIN_RANGESD128, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT_V2DF_UQI_INT)
 BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_rangesv4sf_mask_round, "__builtin_ia32_rangess128_mask_round", IX86_BUILTIN_RANGESS128, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT_V4SF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_fix_notruncv8dfv8di2_mask_round, "__builtin_ia32_cvtpd2qq512_mask", IX86_BUILTIN_CVTPD2QQ512, UNKNOWN, (int) V8DI_FTYPE_V8DF_V8DI_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_cvtps2qqv8di_mask_round, "__builtin_ia32_cvtps2qq512_mask", IX86_BUILTIN_CVTPS2QQ512, UNKNOWN, (int) V8DI_FTYPE_V8SF_V8DI_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_fixuns_notruncv8dfv8di2_mask_round, "__builtin_ia32_cvtpd2uqq512_mask", IX86_BUILTIN_CVTPD2UQQ512, UNKNOWN, (int) V8DI_FTYPE_V8DF_V8DI_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_cvtps2uqqv8di_mask_round, "__builtin_ia32_cvtps2uqq512_mask", IX86_BUILTIN_CVTPS2UQQ512, UNKNOWN, (int) V8DI_FTYPE_V8SF_V8DI_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_floatv8div8sf2_mask_round, "__builtin_ia32_cvtqq2ps512_mask", IX86_BUILTIN_CVTQQ2PS512, UNKNOWN, (int) V8SF_FTYPE_V8DI_V8SF_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_floatunsv8div8sf2_mask_round, "__builtin_ia32_cvtuqq2ps512_mask", IX86_BUILTIN_CVTUQQ2PS512, UNKNOWN, (int) V8SF_FTYPE_V8DI_V8SF_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_floatv8div8df2_mask_round, "__builtin_ia32_cvtqq2pd512_mask", IX86_BUILTIN_CVTQQ2PD512, UNKNOWN, (int) V8DF_FTYPE_V8DI_V8DF_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_floatunsv8div8df2_mask_round, "__builtin_ia32_cvtuqq2pd512_mask", IX86_BUILTIN_CVTUQQ2PD512, UNKNOWN, (int) V8DF_FTYPE_V8DI_V8DF_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_fix_truncv8sfv8di2_mask_round, "__builtin_ia32_cvttps2qq512_mask", IX86_BUILTIN_CVTTPS2QQ512, UNKNOWN, (int) V8DI_FTYPE_V8SF_V8DI_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_fixuns_truncv8sfv8di2_mask_round, "__builtin_ia32_cvttps2uqq512_mask", IX86_BUILTIN_CVTTPS2UQQ512, UNKNOWN, (int) V8DI_FTYPE_V8SF_V8DI_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_fix_truncv8dfv8di2_mask_round, "__builtin_ia32_cvttpd2qq512_mask", IX86_BUILTIN_CVTTPD2QQ512, UNKNOWN, (int) V8DI_FTYPE_V8DF_V8DI_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_fixuns_truncv8dfv8di2_mask_round, "__builtin_ia32_cvttpd2uqq512_mask", IX86_BUILTIN_CVTTPD2UQQ512, UNKNOWN, (int) V8DI_FTYPE_V8DF_V8DI_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_rangepv16sf_mask_round, "__builtin_ia32_rangeps512_mask", IX86_BUILTIN_RANGEPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_INT_V16SF_HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_rangepv8df_mask_round, "__builtin_ia32_rangepd512_mask", IX86_BUILTIN_RANGEPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_INT_V8DF_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_fix_notruncv8dfv8di2_mask_round, "__builtin_ia32_cvtpd2qq512_mask", IX86_BUILTIN_CVTPD2QQ512, UNKNOWN, (int) V8DI_FTYPE_V8DF_V8DI_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_cvtps2qqv8di_mask_round, "__builtin_ia32_cvtps2qq512_mask", IX86_BUILTIN_CVTPS2QQ512, UNKNOWN, (int) V8DI_FTYPE_V8SF_V8DI_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_fixuns_notruncv8dfv8di2_mask_round, "__builtin_ia32_cvtpd2uqq512_mask", IX86_BUILTIN_CVTPD2UQQ512, UNKNOWN, (int) V8DI_FTYPE_V8DF_V8DI_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_cvtps2uqqv8di_mask_round, "__builtin_ia32_cvtps2uqq512_mask", IX86_BUILTIN_CVTPS2UQQ512, UNKNOWN, (int) V8DI_FTYPE_V8SF_V8DI_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_floatv8div8sf2_mask_round, "__builtin_ia32_cvtqq2ps512_mask", IX86_BUILTIN_CVTQQ2PS512, UNKNOWN, (int) V8SF_FTYPE_V8DI_V8SF_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_floatunsv8div8sf2_mask_round, "__builtin_ia32_cvtuqq2ps512_mask", IX86_BUILTIN_CVTUQQ2PS512, UNKNOWN, (int) V8SF_FTYPE_V8DI_V8SF_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_floatv8div8df2_mask_round, "__builtin_ia32_cvtqq2pd512_mask", IX86_BUILTIN_CVTQQ2PD512, UNKNOWN, (int) V8DF_FTYPE_V8DI_V8DF_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_floatunsv8div8df2_mask_round, "__builtin_ia32_cvtuqq2pd512_mask", IX86_BUILTIN_CVTUQQ2PD512, UNKNOWN, (int) V8DF_FTYPE_V8DI_V8DF_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_fix_truncv8sfv8di2_mask_round, "__builtin_ia32_cvttps2qq512_mask", IX86_BUILTIN_CVTTPS2QQ512, UNKNOWN, (int) V8DI_FTYPE_V8SF_V8DI_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_fixuns_truncv8sfv8di2_mask_round, "__builtin_ia32_cvttps2uqq512_mask", IX86_BUILTIN_CVTTPS2UQQ512, UNKNOWN, (int) V8DI_FTYPE_V8SF_V8DI_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_fix_truncv8dfv8di2_mask_round, "__builtin_ia32_cvttpd2qq512_mask", IX86_BUILTIN_CVTTPD2QQ512, UNKNOWN, (int) V8DI_FTYPE_V8DF_V8DI_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_fixuns_truncv8dfv8di2_mask_round, "__builtin_ia32_cvttpd2uqq512_mask", IX86_BUILTIN_CVTTPD2UQQ512, UNKNOWN, (int) V8DI_FTYPE_V8DF_V8DI_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_rangepv16sf_mask_round, "__builtin_ia32_rangeps512_mask", IX86_BUILTIN_RANGEPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_INT_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_rangepv8df_mask_round, "__builtin_ia32_rangepd512_mask", IX86_BUILTIN_RANGEPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_INT_V8DF_QI_INT)
 
 /* AVX512FP16. */
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_addv32hf3_mask_round, "__builtin_ia32_addph512_mask_round", IX86_BUILTIN_ADDPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)

From patchwork Thu Sep 21 07:20:04 2023
From: "Hu, Lin1"
To: gcc-patches@gcc.gnu.org
Cc: hongtao.liu@intel.com, ubizjak@gmail.com, haochen.jiang@intel.com
Subject: [PATCH 09/18] [PATCH 3/5] Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins
Date: Thu, 21 Sep 2023 15:20:04 +0800
Message-Id: <20230921072013.2124750-10-lin1.hu@intel.com>
In-Reply-To: <20230921072013.2124750-1-lin1.hu@intel.com>
References: <20230921072013.2124750-1-lin1.hu@intel.com>

From: Haochen Jiang

gcc/ChangeLog:

	* config/i386/i386-builtin.def (BDESC): Add OPTION_MASK_ISA2_EVEX512.
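Same pattern as the previous patch, now for the AVX512BW descriptors: the second BDESC field changes from 0 to OPTION_MASK_ISA2_EVEX512 for the 512-bit vector builtins and for the 64-bit mask-register builtins (kanddi, kadddi, kmovq, kunpckdi, ...), whose masks only have a use when 64 byte lanes, i.e. a 512-bit vector, are available. A hedged sketch at the intrinsic level (the gating is this series' stated intent; it is not yet observable while -mno-evex512 is RejectNegative):

/* Illustrative only: _kadd_mask64 expands to __builtin_ia32_kadddi,
   one of the AVX512BW mask builtins below that is now additionally
   gated on OPTION_MASK_ISA2_EVEX512.  */
#include <immintrin.h>

__mmask64
combine_masks (__mmask64 a, __mmask64 b)
{
  return _kadd_mask64 (a, b);
}

/* Assumed: accepted under -mavx512bw (EVEX512 implied), to be rejected
   under -mavx512bw -mno-evex512 once the negative option is allowed.  */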
---
 gcc/config/i386/i386-builtin.def | 226 +++++++++++++++----------
 1 file changed, 113 insertions(+), 113 deletions(-)

diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 7a0dec9bc8b..167d530a537 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -293,10 +293,10 @@ BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_CMPCCXADD, CODE_FOR_cmpccxadd_si,
 BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_CMPCCXADD, CODE_FOR_cmpccxadd_di, "__builtin_ia32_cmpccxadd64", IX86_BUILTIN_CMPCCXADD64, UNKNOWN, (int) LONGLONG_FTYPE_PLONGLONG_LONGLONG_LONGLONG_INT)
 
 /* AVX512BW */
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_loadv32hi_mask, "__builtin_ia32_loaddquhi512_mask", IX86_BUILTIN_LOADDQUHI512_MASK, UNKNOWN, (int) V32HI_FTYPE_PCSHORT_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_loadv64qi_mask, "__builtin_ia32_loaddquqi512_mask", IX86_BUILTIN_LOADDQUQI512_MASK, UNKNOWN, (int) V64QI_FTYPE_PCCHAR_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_storev32hi_mask, "__builtin_ia32_storedquhi512_mask", IX86_BUILTIN_STOREDQUHI512_MASK, UNKNOWN, (int) VOID_FTYPE_PSHORT_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_storev64qi_mask, "__builtin_ia32_storedquqi512_mask", IX86_BUILTIN_STOREDQUQI512_MASK, UNKNOWN, (int) VOID_FTYPE_PCHAR_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_loadv32hi_mask, "__builtin_ia32_loaddquhi512_mask", IX86_BUILTIN_LOADDQUHI512_MASK, UNKNOWN, (int) V32HI_FTYPE_PCSHORT_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_loadv64qi_mask, "__builtin_ia32_loaddquqi512_mask", IX86_BUILTIN_LOADDQUQI512_MASK, UNKNOWN, (int) V64QI_FTYPE_PCCHAR_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_storev32hi_mask, "__builtin_ia32_storedquhi512_mask", IX86_BUILTIN_STOREDQUHI512_MASK, UNKNOWN, (int) VOID_FTYPE_PSHORT_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_storev64qi_mask, "__builtin_ia32_storedquqi512_mask", IX86_BUILTIN_STOREDQUQI512_MASK, UNKNOWN, (int) VOID_FTYPE_PCHAR_V64QI_UDI)
 
 /* AVX512VP2INTERSECT */
 BDESC (0, OPTION_MASK_ISA2_AVX512VP2INTERSECT, CODE_FOR_nothing, "__builtin_ia32_2intersectd512", IX86_BUILTIN_2INTERSECTD512, UNKNOWN, (int) VOID_FTYPE_PUHI_PUHI_V16SI_V16SI)
@@ -407,9 +407,9 @@ BDESC (OPTION_MASK_ISA_AVX512BW | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl
 BDESC
(OPTION_MASK_ISA_AVX512BW | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_ss_truncatev16hiv16qi2_mask_store, "__builtin_ia32_pmovswb256mem_mask", IX86_BUILTIN_PMOVSWB256_MEM, UNKNOWN, (int) VOID_FTYPE_PV16QI_V16HI_UHI) BDESC (OPTION_MASK_ISA_AVX512BW | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_us_truncatev8hiv8qi2_mask_store_2, "__builtin_ia32_pmovuswb128mem_mask", IX86_BUILTIN_PMOVUSWB128_MEM, UNKNOWN, (int) VOID_FTYPE_PUDI_V8HI_UQI) BDESC (OPTION_MASK_ISA_AVX512BW | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_us_truncatev16hiv16qi2_mask_store, "__builtin_ia32_pmovuswb256mem_mask", IX86_BUILTIN_PMOVUSWB256_MEM, UNKNOWN, (int) VOID_FTYPE_PV16QI_V16HI_UHI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_us_truncatev32hiv32qi2_mask_store, "__builtin_ia32_pmovuswb512mem_mask", IX86_BUILTIN_PMOVUSWB512_MEM, UNKNOWN, (int) VOID_FTYPE_PV32QI_V32HI_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_ss_truncatev32hiv32qi2_mask_store, "__builtin_ia32_pmovswb512mem_mask", IX86_BUILTIN_PMOVSWB512_MEM, UNKNOWN, (int) VOID_FTYPE_PV32QI_V32HI_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_truncatev32hiv32qi2_mask_store, "__builtin_ia32_pmovwb512mem_mask", IX86_BUILTIN_PMOVWB512_MEM, UNKNOWN, (int) VOID_FTYPE_PV32QI_V32HI_USI) +BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_us_truncatev32hiv32qi2_mask_store, "__builtin_ia32_pmovuswb512mem_mask", IX86_BUILTIN_PMOVUSWB512_MEM, UNKNOWN, (int) VOID_FTYPE_PV32QI_V32HI_USI) +BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_ss_truncatev32hiv32qi2_mask_store, "__builtin_ia32_pmovswb512mem_mask", IX86_BUILTIN_PMOVSWB512_MEM, UNKNOWN, (int) VOID_FTYPE_PV32QI_V32HI_USI) +BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_truncatev32hiv32qi2_mask_store, "__builtin_ia32_pmovwb512mem_mask", IX86_BUILTIN_PMOVWB512_MEM, UNKNOWN, (int) VOID_FTYPE_PV32QI_V32HI_USI) /* AVX512FP16 */ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_loadhf_mask, "__builtin_ia32_loadsh_mask", IX86_BUILTIN_LOADSH_MASK, UNKNOWN, (int) V8HF_FTYPE_PCFLOAT16_V8HF_UQI) @@ -1590,61 +1590,61 @@ BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_round BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_kashiftqi, "__builtin_ia32_kshiftliqi", IX86_BUILTIN_KSHIFTLI8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI_CONST) BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_kashifthi, "__builtin_ia32_kshiftlihi", IX86_BUILTIN_KSHIFTLI16, UNKNOWN, (int) UHI_FTYPE_UHI_UQI) BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kashiftsi, "__builtin_ia32_kshiftlisi", IX86_BUILTIN_KSHIFTLI32, UNKNOWN, (int) USI_FTYPE_USI_UQI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kashiftdi, "__builtin_ia32_kshiftlidi", IX86_BUILTIN_KSHIFTLI64, UNKNOWN, (int) UDI_FTYPE_UDI_UQI) +BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_kashiftdi, "__builtin_ia32_kshiftlidi", IX86_BUILTIN_KSHIFTLI64, UNKNOWN, (int) UDI_FTYPE_UDI_UQI) BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_klshiftrtqi, "__builtin_ia32_kshiftriqi", IX86_BUILTIN_KSHIFTRI8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI_CONST) BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_klshiftrthi, "__builtin_ia32_kshiftrihi", IX86_BUILTIN_KSHIFTRI16, UNKNOWN, (int) UHI_FTYPE_UHI_UQI) BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_klshiftrtsi, "__builtin_ia32_kshiftrisi", IX86_BUILTIN_KSHIFTRI32, UNKNOWN, (int) USI_FTYPE_USI_UQI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_klshiftrtdi, "__builtin_ia32_kshiftridi", IX86_BUILTIN_KSHIFTRI64, UNKNOWN, (int) 
UDI_FTYPE_UDI_UQI) +BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_klshiftrtdi, "__builtin_ia32_kshiftridi", IX86_BUILTIN_KSHIFTRI64, UNKNOWN, (int) UDI_FTYPE_UDI_UQI) BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_kandqi, "__builtin_ia32_kandqi", IX86_BUILTIN_KAND8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI) BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_kandhi, "__builtin_ia32_kandhi", IX86_BUILTIN_KAND16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI) BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kandsi, "__builtin_ia32_kandsi", IX86_BUILTIN_KAND32, UNKNOWN, (int) USI_FTYPE_USI_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kanddi, "__builtin_ia32_kanddi", IX86_BUILTIN_KAND64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI) +BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_kanddi, "__builtin_ia32_kanddi", IX86_BUILTIN_KAND64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI) BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_kandnqi, "__builtin_ia32_kandnqi", IX86_BUILTIN_KANDN8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI) BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_kandnhi, "__builtin_ia32_kandnhi", IX86_BUILTIN_KANDN16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI) BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kandnsi, "__builtin_ia32_kandnsi", IX86_BUILTIN_KANDN32, UNKNOWN, (int) USI_FTYPE_USI_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kandndi, "__builtin_ia32_kandndi", IX86_BUILTIN_KANDN64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI) +BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_kandndi, "__builtin_ia32_kandndi", IX86_BUILTIN_KANDN64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI) BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_knotqi, "__builtin_ia32_knotqi", IX86_BUILTIN_KNOT8, UNKNOWN, (int) UQI_FTYPE_UQI) BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_knothi, "__builtin_ia32_knothi", IX86_BUILTIN_KNOT16, UNKNOWN, (int) UHI_FTYPE_UHI) BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_knotsi, "__builtin_ia32_knotsi", IX86_BUILTIN_KNOT32, UNKNOWN, (int) USI_FTYPE_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_knotdi, "__builtin_ia32_knotdi", IX86_BUILTIN_KNOT64, UNKNOWN, (int) UDI_FTYPE_UDI) +BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_knotdi, "__builtin_ia32_knotdi", IX86_BUILTIN_KNOT64, UNKNOWN, (int) UDI_FTYPE_UDI) BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_kiorqi, "__builtin_ia32_korqi", IX86_BUILTIN_KOR8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI) BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_kiorhi, "__builtin_ia32_korhi", IX86_BUILTIN_KOR16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI) BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kiorsi, "__builtin_ia32_korsi", IX86_BUILTIN_KOR32, UNKNOWN, (int) USI_FTYPE_USI_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kiordi, "__builtin_ia32_kordi", IX86_BUILTIN_KOR64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI) +BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_kiordi, "__builtin_ia32_kordi", IX86_BUILTIN_KOR64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI) BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_ktestqi, "__builtin_ia32_ktestcqi", IX86_BUILTIN_KTESTC8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI) BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_ktestqi, "__builtin_ia32_ktestzqi", IX86_BUILTIN_KTESTZ8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI) BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_ktesthi, "__builtin_ia32_ktestchi", IX86_BUILTIN_KTESTC16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI) BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_ktesthi, "__builtin_ia32_ktestzhi", IX86_BUILTIN_KTESTZ16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_ktestsi, 
"__builtin_ia32_ktestcsi", IX86_BUILTIN_KTESTC32, UNKNOWN, (int) USI_FTYPE_USI_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_ktestsi, "__builtin_ia32_ktestzsi", IX86_BUILTIN_KTESTZ32, UNKNOWN, (int) USI_FTYPE_USI_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_ktestdi, "__builtin_ia32_ktestcdi", IX86_BUILTIN_KTESTC64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_ktestdi, "__builtin_ia32_ktestzdi", IX86_BUILTIN_KTESTZ64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI) +BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_ktestsi, "__builtin_ia32_ktestcsi", IX86_BUILTIN_KTESTC32, UNKNOWN, (int) USI_FTYPE_USI_USI) +BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_ktestsi, "__builtin_ia32_ktestzsi", IX86_BUILTIN_KTESTZ32, UNKNOWN, (int) USI_FTYPE_USI_USI) +BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_ktestdi, "__builtin_ia32_ktestcdi", IX86_BUILTIN_KTESTC64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI) +BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_ktestdi, "__builtin_ia32_ktestzdi", IX86_BUILTIN_KTESTZ64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI) BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_kortestqi, "__builtin_ia32_kortestcqi", IX86_BUILTIN_KORTESTC8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI) BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_kortestqi, "__builtin_ia32_kortestzqi", IX86_BUILTIN_KORTESTZ8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI) BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_kortesthi, "__builtin_ia32_kortestchi", IX86_BUILTIN_KORTESTC16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI) BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_kortesthi, "__builtin_ia32_kortestzhi", IX86_BUILTIN_KORTESTZ16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI) BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kortestsi, "__builtin_ia32_kortestcsi", IX86_BUILTIN_KORTESTC32, UNKNOWN, (int) USI_FTYPE_USI_USI) BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kortestsi, "__builtin_ia32_kortestzsi", IX86_BUILTIN_KORTESTZ32, UNKNOWN, (int) USI_FTYPE_USI_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kortestdi, "__builtin_ia32_kortestcdi", IX86_BUILTIN_KORTESTC64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kortestdi, "__builtin_ia32_kortestzdi", IX86_BUILTIN_KORTESTZ64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI) +BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_kortestdi, "__builtin_ia32_kortestcdi", IX86_BUILTIN_KORTESTC64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI) +BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_kortestdi, "__builtin_ia32_kortestzdi", IX86_BUILTIN_KORTESTZ64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI) BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_kunpckhi, "__builtin_ia32_kunpckhi", IX86_BUILTIN_KUNPCKBW, UNKNOWN, (int) UHI_FTYPE_UHI_UHI) BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_kxnorqi, "__builtin_ia32_kxnorqi", IX86_BUILTIN_KXNOR8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI) BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_kxnorhi, "__builtin_ia32_kxnorhi", IX86_BUILTIN_KXNOR16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI) BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kxnorsi, "__builtin_ia32_kxnorsi", IX86_BUILTIN_KXNOR32, UNKNOWN, (int) USI_FTYPE_USI_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kxnordi, "__builtin_ia32_kxnordi", IX86_BUILTIN_KXNOR64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI) +BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_kxnordi, "__builtin_ia32_kxnordi", IX86_BUILTIN_KXNOR64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI) BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_kxorqi, "__builtin_ia32_kxorqi", 
IX86_BUILTIN_KXOR8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI) BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_kxorhi, "__builtin_ia32_kxorhi", IX86_BUILTIN_KXOR16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI) BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kxorsi, "__builtin_ia32_kxorsi", IX86_BUILTIN_KXOR32, UNKNOWN, (int) USI_FTYPE_USI_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kxordi, "__builtin_ia32_kxordi", IX86_BUILTIN_KXOR64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI) +BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_kxordi, "__builtin_ia32_kxordi", IX86_BUILTIN_KXOR64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI) BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_kmovb, "__builtin_ia32_kmovb", IX86_BUILTIN_KMOV8, UNKNOWN, (int) UQI_FTYPE_UQI) BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_kmovw, "__builtin_ia32_kmovw", IX86_BUILTIN_KMOV16, UNKNOWN, (int) UHI_FTYPE_UHI) BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kmovd, "__builtin_ia32_kmovd", IX86_BUILTIN_KMOV32, UNKNOWN, (int) USI_FTYPE_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kmovq, "__builtin_ia32_kmovq", IX86_BUILTIN_KMOV64, UNKNOWN, (int) UDI_FTYPE_UDI) +BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_kmovq, "__builtin_ia32_kmovq", IX86_BUILTIN_KMOV64, UNKNOWN, (int) UDI_FTYPE_UDI) BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_kaddqi, "__builtin_ia32_kaddqi", IX86_BUILTIN_KADD8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI) BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_kaddhi, "__builtin_ia32_kaddhi", IX86_BUILTIN_KADD16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI) BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kaddsi, "__builtin_ia32_kaddsi", IX86_BUILTIN_KADD32, UNKNOWN, (int) USI_FTYPE_USI_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kadddi, "__builtin_ia32_kadddi", IX86_BUILTIN_KADD64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI) +BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_kadddi, "__builtin_ia32_kadddi", IX86_BUILTIN_KADD64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI) /* SHA */ BDESC (OPTION_MASK_ISA_SSE2, 0, CODE_FOR_sha1msg1, 0, IX86_BUILTIN_SHA1MSG1, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI) @@ -2442,96 +2442,96 @@ BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cvtm /* AVX512BW. 
*/ BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kunpcksi, "__builtin_ia32_kunpcksi", IX86_BUILTIN_KUNPCKWD, UNKNOWN, (int) USI_FTYPE_USI_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kunpckdi, "__builtin_ia32_kunpckdi", IX86_BUILTIN_KUNPCKDQ, UNKNOWN, (int) UDI_FTYPE_UDI_UDI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_packusdw_mask, "__builtin_ia32_packusdw512_mask", IX86_BUILTIN_PACKUSDW512, UNKNOWN, (int) V32HI_FTYPE_V16SI_V16SI_V32HI_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_ashlv4ti3, "__builtin_ia32_pslldq512", IX86_BUILTIN_PSLLDQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_INT_CONVERT) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_lshrv4ti3, "__builtin_ia32_psrldq512", IX86_BUILTIN_PSRLDQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_INT_CONVERT) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_packssdw_mask, "__builtin_ia32_packssdw512_mask", IX86_BUILTIN_PACKSSDW512, UNKNOWN, (int) V32HI_FTYPE_V16SI_V16SI_V32HI_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_palignrv4ti, "__builtin_ia32_palignr512", IX86_BUILTIN_PALIGNR512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT_CONVERT) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_palignrv64qi_mask, "__builtin_ia32_palignr512_mask", IX86_BUILTIN_PALIGNR512_MASK, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT_V8DI_UDI_CONVERT) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_loadv32hi_mask, "__builtin_ia32_movdquhi512_mask", IX86_BUILTIN_MOVDQUHI512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_loadv64qi_mask, "__builtin_ia32_movdquqi512_mask", IX86_BUILTIN_MOVDQUQI512_MASK, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_UDI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512f_psadbw, "__builtin_ia32_psadbw512", IX86_BUILTIN_PSADBW512, UNKNOWN, (int) V8DI_FTYPE_V64QI_V64QI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_dbpsadbwv32hi_mask, "__builtin_ia32_dbpsadbw512_mask", IX86_BUILTIN_DBPSADBW512, UNKNOWN, (int) V32HI_FTYPE_V64QI_V64QI_INT_V32HI_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_vec_dupv64qi_mask, "__builtin_ia32_pbroadcastb512_mask", IX86_BUILTIN_PBROADCASTB512, UNKNOWN, (int) V64QI_FTYPE_V16QI_V64QI_UDI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_vec_dup_gprv64qi_mask, "__builtin_ia32_pbroadcastb512_gpr_mask", IX86_BUILTIN_PBROADCASTB512_GPR, UNKNOWN, (int) V64QI_FTYPE_QI_V64QI_UDI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_vec_dupv32hi_mask, "__builtin_ia32_pbroadcastw512_mask", IX86_BUILTIN_PBROADCASTW512, UNKNOWN, (int) V32HI_FTYPE_V8HI_V32HI_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_vec_dup_gprv32hi_mask, "__builtin_ia32_pbroadcastw512_gpr_mask", IX86_BUILTIN_PBROADCASTW512_GPR, UNKNOWN, (int) V32HI_FTYPE_HI_V32HI_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_sign_extendv32qiv32hi2_mask, "__builtin_ia32_pmovsxbw512_mask", IX86_BUILTIN_PMOVSXBW512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32QI_V32HI_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_zero_extendv32qiv32hi2_mask, "__builtin_ia32_pmovzxbw512_mask", IX86_BUILTIN_PMOVZXBW512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32QI_V32HI_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_permvarv32hi_mask, "__builtin_ia32_permvarhi512_mask", IX86_BUILTIN_VPERMVARHI512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_vpermt2varv32hi3_mask, "__builtin_ia32_vpermt2varhi512_mask", IX86_BUILTIN_VPERMT2VARHI512, UNKNOWN, (int) 
V32HI_FTYPE_V32HI_V32HI_V32HI_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_vpermt2varv32hi3_maskz, "__builtin_ia32_vpermt2varhi512_maskz", IX86_BUILTIN_VPERMT2VARHI512_MASKZ, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_vpermi2varv32hi3_mask, "__builtin_ia32_vpermi2varhi512_mask", IX86_BUILTIN_VPERMI2VARHI512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_uavgv64qi3_mask, "__builtin_ia32_pavgb512_mask", IX86_BUILTIN_PAVGB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_uavgv32hi3_mask, "__builtin_ia32_pavgw512_mask", IX86_BUILTIN_PAVGW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_addv64qi3_mask, "__builtin_ia32_paddb512_mask", IX86_BUILTIN_PADDB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_subv64qi3_mask, "__builtin_ia32_psubb512_mask", IX86_BUILTIN_PSUBB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_sssubv64qi3_mask, "__builtin_ia32_psubsb512_mask", IX86_BUILTIN_PSUBSB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_ssaddv64qi3_mask, "__builtin_ia32_paddsb512_mask", IX86_BUILTIN_PADDSB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_ussubv64qi3_mask, "__builtin_ia32_psubusb512_mask", IX86_BUILTIN_PSUBUSB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_usaddv64qi3_mask, "__builtin_ia32_paddusb512_mask", IX86_BUILTIN_PADDUSB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_subv32hi3_mask, "__builtin_ia32_psubw512_mask", IX86_BUILTIN_PSUBW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_addv32hi3_mask, "__builtin_ia32_paddw512_mask", IX86_BUILTIN_PADDW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_sssubv32hi3_mask, "__builtin_ia32_psubsw512_mask", IX86_BUILTIN_PSUBSW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_ssaddv32hi3_mask, "__builtin_ia32_paddsw512_mask", IX86_BUILTIN_PADDSW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_ussubv32hi3_mask, "__builtin_ia32_psubusw512_mask", IX86_BUILTIN_PSUBUSW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_usaddv32hi3_mask, "__builtin_ia32_paddusw512_mask", IX86_BUILTIN_PADDUSW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_umaxv32hi3_mask, "__builtin_ia32_pmaxuw512_mask", IX86_BUILTIN_PMAXUW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_smaxv32hi3_mask, "__builtin_ia32_pmaxsw512_mask", IX86_BUILTIN_PMAXSW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_uminv32hi3_mask, "__builtin_ia32_pminuw512_mask", IX86_BUILTIN_PMINUW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_sminv32hi3_mask, "__builtin_ia32_pminsw512_mask", IX86_BUILTIN_PMINSW512, UNKNOWN, (int) 
V32HI_FTYPE_V32HI_V32HI_V32HI_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_umaxv64qi3_mask, "__builtin_ia32_pmaxub512_mask", IX86_BUILTIN_PMAXUB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_smaxv64qi3_mask, "__builtin_ia32_pmaxsb512_mask", IX86_BUILTIN_PMAXSB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_uminv64qi3_mask, "__builtin_ia32_pminub512_mask", IX86_BUILTIN_PMINUB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_sminv64qi3_mask, "__builtin_ia32_pminsb512_mask", IX86_BUILTIN_PMINSB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_truncatev32hiv32qi2_mask, "__builtin_ia32_pmovwb512_mask", IX86_BUILTIN_PMOVWB512, UNKNOWN, (int) V32QI_FTYPE_V32HI_V32QI_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_ss_truncatev32hiv32qi2_mask, "__builtin_ia32_pmovswb512_mask", IX86_BUILTIN_PMOVSWB512, UNKNOWN, (int) V32QI_FTYPE_V32HI_V32QI_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_us_truncatev32hiv32qi2_mask, "__builtin_ia32_pmovuswb512_mask", IX86_BUILTIN_PMOVUSWB512, UNKNOWN, (int) V32QI_FTYPE_V32HI_V32QI_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_umulhrswv32hi3_mask, "__builtin_ia32_pmulhrsw512_mask", IX86_BUILTIN_PMULHRSW512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_umulv32hi3_highpart_mask, "__builtin_ia32_pmulhuw512_mask" , IX86_BUILTIN_PMULHUW512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_smulv32hi3_highpart_mask, "__builtin_ia32_pmulhw512_mask" , IX86_BUILTIN_PMULHW512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_mulv32hi3_mask, "__builtin_ia32_pmullw512_mask", IX86_BUILTIN_PMULLW512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_ashlv32hi3_mask, "__builtin_ia32_psllwi512_mask", IX86_BUILTIN_PSLLWI512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_INT_V32HI_USI_COUNT) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_ashlv32hi3_mask, "__builtin_ia32_psllw512_mask", IX86_BUILTIN_PSLLW512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V8HI_V32HI_USI_COUNT) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_packsswb_mask, "__builtin_ia32_packsswb512_mask", IX86_BUILTIN_PACKSSWB512, UNKNOWN, (int) V64QI_FTYPE_V32HI_V32HI_V64QI_UDI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_packuswb_mask, "__builtin_ia32_packuswb512_mask", IX86_BUILTIN_PACKUSWB512, UNKNOWN, (int) V64QI_FTYPE_V32HI_V32HI_V64QI_UDI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_ashrvv32hi_mask, "__builtin_ia32_psrav32hi_mask", IX86_BUILTIN_PSRAVV32HI, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_pmaddubsw512v32hi_mask, "__builtin_ia32_pmaddubsw512_mask", IX86_BUILTIN_PMADDUBSW512_MASK, UNKNOWN, (int) V32HI_FTYPE_V64QI_V64QI_V32HI_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_pmaddwd512v32hi_mask, "__builtin_ia32_pmaddwd512_mask", IX86_BUILTIN_PMADDWD512_MASK, UNKNOWN, (int) V16SI_FTYPE_V32HI_V32HI_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_lshrvv32hi_mask, "__builtin_ia32_psrlv32hi_mask", IX86_BUILTIN_PSRLVV32HI, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, 
CODE_FOR_avx512bw_interleave_highv64qi_mask, "__builtin_ia32_punpckhbw512_mask", IX86_BUILTIN_PUNPCKHBW512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_interleave_highv32hi_mask, "__builtin_ia32_punpckhwd512_mask", IX86_BUILTIN_PUNPCKHWD512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_interleave_lowv64qi_mask, "__builtin_ia32_punpcklbw512_mask", IX86_BUILTIN_PUNPCKLBW512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_interleave_lowv32hi_mask, "__builtin_ia32_punpcklwd512_mask", IX86_BUILTIN_PUNPCKLWD512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_pshufbv64qi3_mask, "__builtin_ia32_pshufb512_mask", IX86_BUILTIN_PSHUFB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_pshufhwv32hi_mask, "__builtin_ia32_pshufhw512_mask", IX86_BUILTIN_PSHUFHW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_INT_V32HI_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_pshuflwv32hi_mask, "__builtin_ia32_pshuflw512_mask", IX86_BUILTIN_PSHUFLW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_INT_V32HI_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_ashrv32hi3_mask, "__builtin_ia32_psrawi512_mask", IX86_BUILTIN_PSRAWI512, UNKNOWN, (int) V32HI_FTYPE_V32HI_INT_V32HI_USI_COUNT) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_ashrv32hi3_mask, "__builtin_ia32_psraw512_mask", IX86_BUILTIN_PSRAW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V8HI_V32HI_USI_COUNT) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_lshrv32hi3_mask, "__builtin_ia32_psrlwi512_mask", IX86_BUILTIN_PSRLWI512, UNKNOWN, (int) V32HI_FTYPE_V32HI_INT_V32HI_USI_COUNT) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_lshrv32hi3_mask, "__builtin_ia32_psrlw512_mask", IX86_BUILTIN_PSRLW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V8HI_V32HI_USI_COUNT) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_cvtb2maskv64qi, "__builtin_ia32_cvtb2mask512", IX86_BUILTIN_CVTB2MASK512, UNKNOWN, (int) UDI_FTYPE_V64QI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_cvtw2maskv32hi, "__builtin_ia32_cvtw2mask512", IX86_BUILTIN_CVTW2MASK512, UNKNOWN, (int) USI_FTYPE_V32HI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_cvtmask2bv64qi, "__builtin_ia32_cvtmask2b512", IX86_BUILTIN_CVTMASK2B512, UNKNOWN, (int) V64QI_FTYPE_UDI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_cvtmask2wv32hi, "__builtin_ia32_cvtmask2w512", IX86_BUILTIN_CVTMASK2W512, UNKNOWN, (int) V32HI_FTYPE_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_eqv64qi3_mask, "__builtin_ia32_pcmpeqb512_mask", IX86_BUILTIN_PCMPEQB512_MASK, UNKNOWN, (int) UDI_FTYPE_V64QI_V64QI_UDI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_eqv32hi3_mask, "__builtin_ia32_pcmpeqw512_mask", IX86_BUILTIN_PCMPEQW512_MASK, UNKNOWN, (int) USI_FTYPE_V32HI_V32HI_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_gtv64qi3_mask, "__builtin_ia32_pcmpgtb512_mask", IX86_BUILTIN_PCMPGTB512_MASK, UNKNOWN, (int) UDI_FTYPE_V64QI_V64QI_UDI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_gtv32hi3_mask, "__builtin_ia32_pcmpgtw512_mask", IX86_BUILTIN_PCMPGTW512_MASK, UNKNOWN, (int) USI_FTYPE_V32HI_V32HI_USI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_testmv64qi3_mask, "__builtin_ia32_ptestmb512", IX86_BUILTIN_PTESTMB512, UNKNOWN, (int) UDI_FTYPE_V64QI_V64QI_UDI) -BDESC (OPTION_MASK_ISA_AVX512BW, 0, 
CODE_FOR_avx512bw_testmv32hi3_mask, "__builtin_ia32_ptestmw512", IX86_BUILTIN_PTESTMW512, UNKNOWN, (int) USI_FTYPE_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_testnmv64qi3_mask, "__builtin_ia32_ptestnmb512", IX86_BUILTIN_PTESTNMB512, UNKNOWN, (int) UDI_FTYPE_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_testnmv32hi3_mask, "__builtin_ia32_ptestnmw512", IX86_BUILTIN_PTESTNMW512, UNKNOWN, (int) USI_FTYPE_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_ashlvv32hi_mask, "__builtin_ia32_psllv32hi_mask", IX86_BUILTIN_PSLLVV32HI, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_absv64qi2_mask, "__builtin_ia32_pabsb512_mask", IX86_BUILTIN_PABSB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_absv32hi2_mask, "__builtin_ia32_pabsw512_mask", IX86_BUILTIN_PABSW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_blendmv32hi, "__builtin_ia32_blendmw_512_mask", IX86_BUILTIN_BLENDMW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_blendmv64qi, "__builtin_ia32_blendmb_512_mask", IX86_BUILTIN_BLENDMB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_cmpv64qi3_mask, "__builtin_ia32_cmpb512_mask", IX86_BUILTIN_CMPB512, UNKNOWN, (int) UDI_FTYPE_V64QI_V64QI_INT_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_cmpv32hi3_mask, "__builtin_ia32_cmpw512_mask", IX86_BUILTIN_CMPW512, UNKNOWN, (int) USI_FTYPE_V32HI_V32HI_INT_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_ucmpv64qi3_mask, "__builtin_ia32_ucmpb512_mask", IX86_BUILTIN_UCMPB512, UNKNOWN, (int) UDI_FTYPE_V64QI_V64QI_INT_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_ucmpv32hi3_mask, "__builtin_ia32_ucmpw512_mask", IX86_BUILTIN_UCMPW512, UNKNOWN, (int) USI_FTYPE_V32HI_V32HI_INT_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_kunpckdi, "__builtin_ia32_kunpckdi", IX86_BUILTIN_KUNPCKDQ, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_packusdw_mask, "__builtin_ia32_packusdw512_mask", IX86_BUILTIN_PACKUSDW512, UNKNOWN, (int) V32HI_FTYPE_V16SI_V16SI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_ashlv4ti3, "__builtin_ia32_pslldq512", IX86_BUILTIN_PSLLDQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_INT_CONVERT)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_lshrv4ti3, "__builtin_ia32_psrldq512", IX86_BUILTIN_PSRLDQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_INT_CONVERT)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_packssdw_mask, "__builtin_ia32_packssdw512_mask", IX86_BUILTIN_PACKSSDW512, UNKNOWN, (int) V32HI_FTYPE_V16SI_V16SI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_palignrv4ti, "__builtin_ia32_palignr512", IX86_BUILTIN_PALIGNR512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT_CONVERT)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_palignrv64qi_mask, "__builtin_ia32_palignr512_mask", IX86_BUILTIN_PALIGNR512_MASK, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT_V8DI_UDI_CONVERT)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_loadv32hi_mask, "__builtin_ia32_movdquhi512_mask", IX86_BUILTIN_MOVDQUHI512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_loadv64qi_mask, "__builtin_ia32_movdquqi512_mask", IX86_BUILTIN_MOVDQUQI512_MASK, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_psadbw, "__builtin_ia32_psadbw512", IX86_BUILTIN_PSADBW512, UNKNOWN, (int) V8DI_FTYPE_V64QI_V64QI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_dbpsadbwv32hi_mask, "__builtin_ia32_dbpsadbw512_mask", IX86_BUILTIN_DBPSADBW512, UNKNOWN, (int) V32HI_FTYPE_V64QI_V64QI_INT_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_vec_dupv64qi_mask, "__builtin_ia32_pbroadcastb512_mask", IX86_BUILTIN_PBROADCASTB512, UNKNOWN, (int) V64QI_FTYPE_V16QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_vec_dup_gprv64qi_mask, "__builtin_ia32_pbroadcastb512_gpr_mask", IX86_BUILTIN_PBROADCASTB512_GPR, UNKNOWN, (int) V64QI_FTYPE_QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_vec_dupv32hi_mask, "__builtin_ia32_pbroadcastw512_mask", IX86_BUILTIN_PBROADCASTW512, UNKNOWN, (int) V32HI_FTYPE_V8HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_vec_dup_gprv32hi_mask, "__builtin_ia32_pbroadcastw512_gpr_mask", IX86_BUILTIN_PBROADCASTW512_GPR, UNKNOWN, (int) V32HI_FTYPE_HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_sign_extendv32qiv32hi2_mask, "__builtin_ia32_pmovsxbw512_mask", IX86_BUILTIN_PMOVSXBW512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32QI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_zero_extendv32qiv32hi2_mask, "__builtin_ia32_pmovzxbw512_mask", IX86_BUILTIN_PMOVZXBW512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32QI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_permvarv32hi_mask, "__builtin_ia32_permvarhi512_mask", IX86_BUILTIN_VPERMVARHI512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_vpermt2varv32hi3_mask, "__builtin_ia32_vpermt2varhi512_mask", IX86_BUILTIN_VPERMT2VARHI512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_vpermt2varv32hi3_maskz, "__builtin_ia32_vpermt2varhi512_maskz", IX86_BUILTIN_VPERMT2VARHI512_MASKZ, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_vpermi2varv32hi3_mask, "__builtin_ia32_vpermi2varhi512_mask", IX86_BUILTIN_VPERMI2VARHI512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_uavgv64qi3_mask, "__builtin_ia32_pavgb512_mask", IX86_BUILTIN_PAVGB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_uavgv32hi3_mask, "__builtin_ia32_pavgw512_mask", IX86_BUILTIN_PAVGW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_addv64qi3_mask, "__builtin_ia32_paddb512_mask", IX86_BUILTIN_PADDB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_subv64qi3_mask, "__builtin_ia32_psubb512_mask", IX86_BUILTIN_PSUBB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_sssubv64qi3_mask, "__builtin_ia32_psubsb512_mask", IX86_BUILTIN_PSUBSB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_ssaddv64qi3_mask, "__builtin_ia32_paddsb512_mask", IX86_BUILTIN_PADDSB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_ussubv64qi3_mask, "__builtin_ia32_psubusb512_mask", IX86_BUILTIN_PSUBUSB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_usaddv64qi3_mask, "__builtin_ia32_paddusb512_mask", IX86_BUILTIN_PADDUSB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_subv32hi3_mask, "__builtin_ia32_psubw512_mask", IX86_BUILTIN_PSUBW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_addv32hi3_mask, "__builtin_ia32_paddw512_mask", IX86_BUILTIN_PADDW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_sssubv32hi3_mask, "__builtin_ia32_psubsw512_mask", IX86_BUILTIN_PSUBSW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_ssaddv32hi3_mask, "__builtin_ia32_paddsw512_mask", IX86_BUILTIN_PADDSW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_ussubv32hi3_mask, "__builtin_ia32_psubusw512_mask", IX86_BUILTIN_PSUBUSW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_usaddv32hi3_mask, "__builtin_ia32_paddusw512_mask", IX86_BUILTIN_PADDUSW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_umaxv32hi3_mask, "__builtin_ia32_pmaxuw512_mask", IX86_BUILTIN_PMAXUW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_smaxv32hi3_mask, "__builtin_ia32_pmaxsw512_mask", IX86_BUILTIN_PMAXSW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_uminv32hi3_mask, "__builtin_ia32_pminuw512_mask", IX86_BUILTIN_PMINUW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_sminv32hi3_mask, "__builtin_ia32_pminsw512_mask", IX86_BUILTIN_PMINSW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_umaxv64qi3_mask, "__builtin_ia32_pmaxub512_mask", IX86_BUILTIN_PMAXUB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_smaxv64qi3_mask, "__builtin_ia32_pmaxsb512_mask", IX86_BUILTIN_PMAXSB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_uminv64qi3_mask, "__builtin_ia32_pminub512_mask", IX86_BUILTIN_PMINUB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_sminv64qi3_mask, "__builtin_ia32_pminsb512_mask", IX86_BUILTIN_PMINSB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_truncatev32hiv32qi2_mask, "__builtin_ia32_pmovwb512_mask", IX86_BUILTIN_PMOVWB512, UNKNOWN, (int) V32QI_FTYPE_V32HI_V32QI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_ss_truncatev32hiv32qi2_mask, "__builtin_ia32_pmovswb512_mask", IX86_BUILTIN_PMOVSWB512, UNKNOWN, (int) V32QI_FTYPE_V32HI_V32QI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_us_truncatev32hiv32qi2_mask, "__builtin_ia32_pmovuswb512_mask", IX86_BUILTIN_PMOVUSWB512, UNKNOWN, (int) V32QI_FTYPE_V32HI_V32QI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_umulhrswv32hi3_mask, "__builtin_ia32_pmulhrsw512_mask", IX86_BUILTIN_PMULHRSW512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_umulv32hi3_highpart_mask, "__builtin_ia32_pmulhuw512_mask" , IX86_BUILTIN_PMULHUW512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_smulv32hi3_highpart_mask, "__builtin_ia32_pmulhw512_mask" , IX86_BUILTIN_PMULHW512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_mulv32hi3_mask, "__builtin_ia32_pmullw512_mask", IX86_BUILTIN_PMULLW512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_ashlv32hi3_mask, "__builtin_ia32_psllwi512_mask", IX86_BUILTIN_PSLLWI512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_INT_V32HI_USI_COUNT)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_ashlv32hi3_mask, "__builtin_ia32_psllw512_mask", IX86_BUILTIN_PSLLW512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V8HI_V32HI_USI_COUNT)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_packsswb_mask, "__builtin_ia32_packsswb512_mask", IX86_BUILTIN_PACKSSWB512, UNKNOWN, (int) V64QI_FTYPE_V32HI_V32HI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_packuswb_mask, "__builtin_ia32_packuswb512_mask", IX86_BUILTIN_PACKUSWB512, UNKNOWN, (int) V64QI_FTYPE_V32HI_V32HI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_ashrvv32hi_mask, "__builtin_ia32_psrav32hi_mask", IX86_BUILTIN_PSRAVV32HI, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_pmaddubsw512v32hi_mask, "__builtin_ia32_pmaddubsw512_mask", IX86_BUILTIN_PMADDUBSW512_MASK, UNKNOWN, (int) V32HI_FTYPE_V64QI_V64QI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_pmaddwd512v32hi_mask, "__builtin_ia32_pmaddwd512_mask", IX86_BUILTIN_PMADDWD512_MASK, UNKNOWN, (int) V16SI_FTYPE_V32HI_V32HI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_lshrvv32hi_mask, "__builtin_ia32_psrlv32hi_mask", IX86_BUILTIN_PSRLVV32HI, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_interleave_highv64qi_mask, "__builtin_ia32_punpckhbw512_mask", IX86_BUILTIN_PUNPCKHBW512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_interleave_highv32hi_mask, "__builtin_ia32_punpckhwd512_mask", IX86_BUILTIN_PUNPCKHWD512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_interleave_lowv64qi_mask, "__builtin_ia32_punpcklbw512_mask", IX86_BUILTIN_PUNPCKLBW512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_interleave_lowv32hi_mask, "__builtin_ia32_punpcklwd512_mask", IX86_BUILTIN_PUNPCKLWD512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_pshufbv64qi3_mask, "__builtin_ia32_pshufb512_mask", IX86_BUILTIN_PSHUFB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_pshufhwv32hi_mask, "__builtin_ia32_pshufhw512_mask", IX86_BUILTIN_PSHUFHW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_INT_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_pshuflwv32hi_mask, "__builtin_ia32_pshuflw512_mask", IX86_BUILTIN_PSHUFLW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_INT_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_ashrv32hi3_mask, "__builtin_ia32_psrawi512_mask", IX86_BUILTIN_PSRAWI512, UNKNOWN, (int) V32HI_FTYPE_V32HI_INT_V32HI_USI_COUNT)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_ashrv32hi3_mask, "__builtin_ia32_psraw512_mask", IX86_BUILTIN_PSRAW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V8HI_V32HI_USI_COUNT)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_lshrv32hi3_mask, "__builtin_ia32_psrlwi512_mask", IX86_BUILTIN_PSRLWI512, UNKNOWN, (int) V32HI_FTYPE_V32HI_INT_V32HI_USI_COUNT)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_lshrv32hi3_mask, "__builtin_ia32_psrlw512_mask", IX86_BUILTIN_PSRLW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V8HI_V32HI_USI_COUNT)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_cvtb2maskv64qi, "__builtin_ia32_cvtb2mask512", IX86_BUILTIN_CVTB2MASK512, UNKNOWN, (int) UDI_FTYPE_V64QI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_cvtw2maskv32hi, "__builtin_ia32_cvtw2mask512", IX86_BUILTIN_CVTW2MASK512, UNKNOWN, (int) USI_FTYPE_V32HI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_cvtmask2bv64qi, "__builtin_ia32_cvtmask2b512", IX86_BUILTIN_CVTMASK2B512, UNKNOWN, (int) V64QI_FTYPE_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_cvtmask2wv32hi, "__builtin_ia32_cvtmask2w512", IX86_BUILTIN_CVTMASK2W512, UNKNOWN, (int) V32HI_FTYPE_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_eqv64qi3_mask, "__builtin_ia32_pcmpeqb512_mask", IX86_BUILTIN_PCMPEQB512_MASK, UNKNOWN, (int) UDI_FTYPE_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_eqv32hi3_mask, "__builtin_ia32_pcmpeqw512_mask", IX86_BUILTIN_PCMPEQW512_MASK, UNKNOWN, (int) USI_FTYPE_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_gtv64qi3_mask, "__builtin_ia32_pcmpgtb512_mask", IX86_BUILTIN_PCMPGTB512_MASK, UNKNOWN, (int) UDI_FTYPE_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_gtv32hi3_mask, "__builtin_ia32_pcmpgtw512_mask", IX86_BUILTIN_PCMPGTW512_MASK, UNKNOWN, (int) USI_FTYPE_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_testmv64qi3_mask, "__builtin_ia32_ptestmb512", IX86_BUILTIN_PTESTMB512, UNKNOWN, (int) UDI_FTYPE_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_testmv32hi3_mask, "__builtin_ia32_ptestmw512", IX86_BUILTIN_PTESTMW512, UNKNOWN, (int) USI_FTYPE_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_testnmv64qi3_mask, "__builtin_ia32_ptestnmb512", IX86_BUILTIN_PTESTNMB512, UNKNOWN, (int) UDI_FTYPE_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_testnmv32hi3_mask, "__builtin_ia32_ptestnmw512", IX86_BUILTIN_PTESTNMW512, UNKNOWN, (int) USI_FTYPE_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_ashlvv32hi_mask, "__builtin_ia32_psllv32hi_mask", IX86_BUILTIN_PSLLVV32HI, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_absv64qi2_mask, "__builtin_ia32_pabsb512_mask", IX86_BUILTIN_PABSB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_absv32hi2_mask, "__builtin_ia32_pabsw512_mask", IX86_BUILTIN_PABSW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_blendmv32hi, "__builtin_ia32_blendmw_512_mask", IX86_BUILTIN_BLENDMW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_blendmv64qi, "__builtin_ia32_blendmb_512_mask", IX86_BUILTIN_BLENDMB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_cmpv64qi3_mask, "__builtin_ia32_cmpb512_mask", IX86_BUILTIN_CMPB512, UNKNOWN, (int) UDI_FTYPE_V64QI_V64QI_INT_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_cmpv32hi3_mask, "__builtin_ia32_cmpw512_mask", IX86_BUILTIN_CMPW512, UNKNOWN, (int) USI_FTYPE_V32HI_V32HI_INT_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_ucmpv64qi3_mask, "__builtin_ia32_ucmpb512_mask", IX86_BUILTIN_UCMPB512, UNKNOWN, (int) UDI_FTYPE_V64QI_V64QI_INT_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_ucmpv32hi3_mask, "__builtin_ia32_ucmpw512_mask", IX86_BUILTIN_UCMPW512, UNKNOWN, (int) USI_FTYPE_V32HI_V32HI_INT_USI)
 /* AVX512IFMA */
 BDESC (OPTION_MASK_ISA_AVX512IFMA, 0, CODE_FOR_vpmadd52luqv8di_mask, "__builtin_ia32_vpmadd52luq512_mask", IX86_BUILTIN_VPMADD52LUQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)

From patchwork Thu Sep 21 07:20:05 2023
X-Patchwork-Submitter: "Hu, Lin1"
X-Patchwork-Id: 1837521
From: "Hu, Lin1"
To: gcc-patches@gcc.gnu.org
Cc: hongtao.liu@intel.com, ubizjak@gmail.com, haochen.jiang@intel.com
Subject: [PATCH 10/18] [PATCH 4/5] Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins
Date: Thu, 21 Sep 2023 15:20:05 +0800
Message-Id: <20230921072013.2124750-11-lin1.hu@intel.com>
In-Reply-To: <20230921072013.2124750-1-lin1.hu@intel.com>
References: <20230921072013.2124750-1-lin1.hu@intel.com>

From: Haochen Jiang

gcc/ChangeLog:

	* config/i386/i386-builtin.def (BDESC): Add OPTION_MASK_ISA2_EVEX512.
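As a quick usage sketch of what the new second mask means (illustrative only, not part of the patch; the function name abs_bytes is invented): _mm512_abs_epi8 expands to __builtin_ia32_pabsb512_mask, one of the descriptors changed below, so the builtin is registered only when both AVX512BW and EVEX512 are enabled. Per patch 01 of this series, enabling AVX512BW implies EVEX512 unless EVEX512 is explicitly disabled.

#include <immintrin.h>

/* Sketch, assuming a GCC with this series applied: the target
   attribute turns on AVX512BW, which implies EVEX512, so the
   512-bit builtin behind _mm512_abs_epi8 is available.  With
   EVEX512 disabled instead, the OPTION_MASK_ISA2_EVEX512 gate
   added here should make the same builtin unavailable.  */
__attribute__ ((target ("avx512bw")))
__m512i
abs_bytes (__m512i x)
{
  return _mm512_abs_epi8 (x);
}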
---
 gcc/config/i386/i386-builtin.def | 188 +++++++++++++++----------------
 1 file changed, 94 insertions(+), 94 deletions(-)

diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 167d530a537..8250e2998cd 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -299,8 +299,8 @@ BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_sto
 BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_storev64qi_mask, "__builtin_ia32_storedquqi512_mask", IX86_BUILTIN_STOREDQUQI512_MASK, UNKNOWN, (int) VOID_FTYPE_PCHAR_V64QI_UDI)
 /* AVX512VP2INTERSECT */
-BDESC (0, OPTION_MASK_ISA2_AVX512VP2INTERSECT, CODE_FOR_nothing, "__builtin_ia32_2intersectd512", IX86_BUILTIN_2INTERSECTD512, UNKNOWN, (int) VOID_FTYPE_PUHI_PUHI_V16SI_V16SI)
-BDESC (0, OPTION_MASK_ISA2_AVX512VP2INTERSECT, CODE_FOR_nothing, "__builtin_ia32_2intersectq512", IX86_BUILTIN_2INTERSECTQ512, UNKNOWN, (int) VOID_FTYPE_PUQI_PUQI_V8DI_V8DI)
+BDESC (0, OPTION_MASK_ISA2_AVX512VP2INTERSECT | OPTION_MASK_ISA2_EVEX512, CODE_FOR_nothing, "__builtin_ia32_2intersectd512", IX86_BUILTIN_2INTERSECTD512, UNKNOWN, (int) VOID_FTYPE_PUHI_PUHI_V16SI_V16SI)
+BDESC (0, OPTION_MASK_ISA2_AVX512VP2INTERSECT | OPTION_MASK_ISA2_EVEX512, CODE_FOR_nothing, "__builtin_ia32_2intersectq512", IX86_BUILTIN_2INTERSECTQ512, UNKNOWN, (int) VOID_FTYPE_PUQI_PUQI_V8DI_V8DI)
 BDESC (0, OPTION_MASK_ISA2_AVX512VP2INTERSECT, CODE_FOR_nothing, "__builtin_ia32_2intersectd256", IX86_BUILTIN_2INTERSECTD256, UNKNOWN, (int) VOID_FTYPE_PUQI_PUQI_V8SI_V8SI)
 BDESC (0, OPTION_MASK_ISA2_AVX512VP2INTERSECT, CODE_FOR_nothing, "__builtin_ia32_2intersectq256", IX86_BUILTIN_2INTERSECTQ256, UNKNOWN, (int) VOID_FTYPE_PUQI_PUQI_V4DI_V4DI)
 BDESC (0, OPTION_MASK_ISA2_AVX512VP2INTERSECT, CODE_FOR_nothing, "__builtin_ia32_2intersectd128", IX86_BUILTIN_2INTERSECTD128, UNKNOWN, (int) VOID_FTYPE_PUQI_PUQI_V4SI_V4SI)
@@ -430,17 +430,17 @@ BDESC (OPTION_MASK_ISA_PKU, 0, CODE_FOR_rdpkru, "__builtin_ia32_rdpkru", IX86_B
 BDESC (OPTION_MASK_ISA_PKU, 0, CODE_FOR_wrpkru, "__builtin_ia32_wrpkru", IX86_BUILTIN_WRPKRU, UNKNOWN, (int) VOID_FTYPE_UNSIGNED)
 /* VBMI2 */
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_compressstorev64qi_mask, "__builtin_ia32_compressstoreuqi512_mask", IX86_BUILTIN_PCOMPRESSBSTORE512, UNKNOWN, (int) VOID_FTYPE_PV64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_compressstorev32hi_mask, "__builtin_ia32_compressstoreuhi512_mask", IX86_BUILTIN_PCOMPRESSWSTORE512, UNKNOWN, (int) VOID_FTYPE_PV32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_compressstorev64qi_mask, "__builtin_ia32_compressstoreuqi512_mask", IX86_BUILTIN_PCOMPRESSBSTORE512, UNKNOWN, (int) VOID_FTYPE_PV64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_compressstorev32hi_mask, "__builtin_ia32_compressstoreuhi512_mask", IX86_BUILTIN_PCOMPRESSWSTORE512, UNKNOWN, (int) VOID_FTYPE_PV32HI_V32HI_USI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_compressstorev32qi_mask, "__builtin_ia32_compressstoreuqi256_mask", IX86_BUILTIN_PCOMPRESSBSTORE256, UNKNOWN, (int) VOID_FTYPE_PV32QI_V32QI_USI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_compressstorev16qi_mask, "__builtin_ia32_compressstoreuqi128_mask", IX86_BUILTIN_PCOMPRESSBSTORE128, UNKNOWN, (int) VOID_FTYPE_PV16QI_V16QI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_compressstorev16hi_mask, "__builtin_ia32_compressstoreuhi256_mask", IX86_BUILTIN_PCOMPRESSWSTORE256, UNKNOWN, (int) VOID_FTYPE_PV16HI_V16HI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_compressstorev8hi_mask, "__builtin_ia32_compressstoreuhi128_mask", IX86_BUILTIN_PCOMPRESSWSTORE128, UNKNOWN, (int) VOID_FTYPE_PV8HI_V8HI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_expandv64qi_mask, "__builtin_ia32_expandloadqi512_mask", IX86_BUILTIN_PEXPANDBLOAD512, UNKNOWN, (int) V64QI_FTYPE_PCV64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_expandv64qi_maskz, "__builtin_ia32_expandloadqi512_maskz", IX86_BUILTIN_PEXPANDBLOAD512Z, UNKNOWN, (int) V64QI_FTYPE_PCV64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_expandv32hi_mask, "__builtin_ia32_expandloadhi512_mask", IX86_BUILTIN_PEXPANDWLOAD512, UNKNOWN, (int) V32HI_FTYPE_PCV32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_expandv32hi_maskz, "__builtin_ia32_expandloadhi512_maskz", IX86_BUILTIN_PEXPANDWLOAD512Z, UNKNOWN, (int) V32HI_FTYPE_PCV32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_expandv64qi_mask, "__builtin_ia32_expandloadqi512_mask", IX86_BUILTIN_PEXPANDBLOAD512, UNKNOWN, (int) V64QI_FTYPE_PCV64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_expandv64qi_maskz, "__builtin_ia32_expandloadqi512_maskz", IX86_BUILTIN_PEXPANDBLOAD512Z, UNKNOWN, (int) V64QI_FTYPE_PCV64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_expandv32hi_mask, "__builtin_ia32_expandloadhi512_mask", IX86_BUILTIN_PEXPANDWLOAD512, UNKNOWN, (int) V32HI_FTYPE_PCV32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_expandv32hi_maskz, "__builtin_ia32_expandloadhi512_maskz", IX86_BUILTIN_PEXPANDWLOAD512Z, UNKNOWN, (int) V32HI_FTYPE_PCV32HI_V32HI_USI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv32qi_mask, "__builtin_ia32_expandloadqi256_mask", IX86_BUILTIN_PEXPANDBLOAD256, UNKNOWN, (int) V32QI_FTYPE_PCV32QI_V32QI_USI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv32qi_maskz, "__builtin_ia32_expandloadqi256_maskz", IX86_BUILTIN_PEXPANDBLOAD256Z, UNKNOWN, (int) V32QI_FTYPE_PCV32QI_V32QI_USI)
@@ -2534,10 +2534,10 @@ BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_ucm
 BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_ucmpv32hi3_mask, "__builtin_ia32_ucmpw512_mask", IX86_BUILTIN_UCMPW512, UNKNOWN, (int) USI_FTYPE_V32HI_V32HI_INT_USI)
 /* AVX512IFMA */
-BDESC (OPTION_MASK_ISA_AVX512IFMA, 0, CODE_FOR_vpmadd52luqv8di_mask, "__builtin_ia32_vpmadd52luq512_mask", IX86_BUILTIN_VPMADD52LUQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512IFMA, 0, CODE_FOR_vpmadd52luqv8di_maskz, "__builtin_ia32_vpmadd52luq512_maskz", IX86_BUILTIN_VPMADD52LUQ512_MASKZ, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512IFMA, 0, CODE_FOR_vpmadd52huqv8di_mask, "__builtin_ia32_vpmadd52huq512_mask", IX86_BUILTIN_VPMADD52HUQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512IFMA, 0, CODE_FOR_vpmadd52huqv8di_maskz, "__builtin_ia32_vpmadd52huq512_maskz", IX86_BUILTIN_VPMADD52HUQ512_MASKZ, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512IFMA, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpmadd52luqv8di_mask, "__builtin_ia32_vpmadd52luq512_mask", IX86_BUILTIN_VPMADD52LUQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512IFMA, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpmadd52luqv8di_maskz, "__builtin_ia32_vpmadd52luq512_maskz", IX86_BUILTIN_VPMADD52LUQ512_MASKZ, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512IFMA, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpmadd52huqv8di_mask, "__builtin_ia32_vpmadd52huq512_mask", IX86_BUILTIN_VPMADD52HUQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512IFMA, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpmadd52huqv8di_maskz, "__builtin_ia32_vpmadd52huq512_maskz", IX86_BUILTIN_VPMADD52HUQ512_MASKZ, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512IFMA | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpmadd52luqv4di_mask, "__builtin_ia32_vpmadd52luq256_mask", IX86_BUILTIN_VPMADD52LUQ256, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_V4DI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512IFMA | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpmadd52luqv4di_maskz, "__builtin_ia32_vpmadd52luq256_maskz", IX86_BUILTIN_VPMADD52LUQ256_MASKZ, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_V4DI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512IFMA | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpmadd52huqv4di_mask, "__builtin_ia32_vpmadd52huq256_mask", IX86_BUILTIN_VPMADD52HUQ256, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_V4DI_UQI)
@@ -2552,13 +2552,13 @@ BDESC (OPTION_MASK_ISA_AVX512IFMA | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_A
 BDESC (OPTION_MASK_ISA_AVX512IFMA | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVXIFMA, CODE_FOR_vpmadd52huqv2di, "__builtin_ia32_vpmadd52huq128", IX86_BUINTIN_VPMADD52HUQ128, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_V2DI)
 /* AVX512VBMI */
-BDESC (OPTION_MASK_ISA_AVX512VBMI, 0, CODE_FOR_vpmultishiftqbv64qi_mask, "__builtin_ia32_vpmultishiftqb512_mask", IX86_BUILTIN_VPMULTISHIFTQB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpmultishiftqbv64qi_mask, "__builtin_ia32_vpmultishiftqb512_mask", IX86_BUILTIN_VPMULTISHIFTQB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpmultishiftqbv32qi_mask, "__builtin_ia32_vpmultishiftqb256_mask", IX86_BUILTIN_VPMULTISHIFTQB256, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_V32QI_USI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpmultishiftqbv16qi_mask, "__builtin_ia32_vpmultishiftqb128_mask", IX86_BUILTIN_VPMULTISHIFTQB128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_V16QI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI, 0, CODE_FOR_avx512bw_permvarv64qi_mask, "__builtin_ia32_permvarqi512_mask", IX86_BUILTIN_VPERMVARQI512_MASK, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI, 0, CODE_FOR_avx512bw_vpermt2varv64qi3_mask, "__builtin_ia32_vpermt2varqi512_mask", IX86_BUILTIN_VPERMT2VARQI512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI, 0, CODE_FOR_avx512bw_vpermt2varv64qi3_maskz, "__builtin_ia32_vpermt2varqi512_maskz", IX86_BUILTIN_VPERMT2VARQI512_MASKZ, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI, 0, CODE_FOR_avx512bw_vpermi2varv64qi3_mask, "__builtin_ia32_vpermi2varqi512_mask", IX86_BUILTIN_VPERMI2VARQI512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_permvarv64qi_mask, "__builtin_ia32_permvarqi512_mask", IX86_BUILTIN_VPERMVARQI512_MASK, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_vpermt2varv64qi3_mask, "__builtin_ia32_vpermt2varqi512_mask", IX86_BUILTIN_VPERMT2VARQI512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_vpermt2varv64qi3_maskz, "__builtin_ia32_vpermt2varqi512_maskz", IX86_BUILTIN_VPERMT2VARQI512_MASKZ, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_vpermi2varv64qi3_mask, "__builtin_ia32_vpermi2varqi512_mask", IX86_BUILTIN_VPERMI2VARQI512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_permvarv32qi_mask, "__builtin_ia32_permvarqi256_mask", IX86_BUILTIN_VPERMVARQI256_MASK, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_V32QI_USI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_permvarv16qi_mask, "__builtin_ia32_permvarqi128_mask", IX86_BUILTIN_VPERMVARQI128_MASK, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_V16QI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_vpermt2varv32qi3_mask, "__builtin_ia32_vpermt2varqi256_mask", IX86_BUILTIN_VPERMT2VARQI256, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_V32QI_USI)
@@ -2569,16 +2569,16 @@ BDESC (OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512
 BDESC (OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_vpermi2varv16qi3_mask, "__builtin_ia32_vpermi2varqi128_mask", IX86_BUILTIN_VPERMI2VARQI128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_V16QI_UHI)
 /* VBMI2 */
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_compressv64qi_mask, "__builtin_ia32_compressqi512_mask", IX86_BUILTIN_PCOMPRESSB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_compressv32hi_mask, "__builtin_ia32_compresshi512_mask", IX86_BUILTIN_PCOMPRESSW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_compressv64qi_mask, "__builtin_ia32_compressqi512_mask", IX86_BUILTIN_PCOMPRESSB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_compressv32hi_mask, "__builtin_ia32_compresshi512_mask", IX86_BUILTIN_PCOMPRESSW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_USI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_compressv32qi_mask, "__builtin_ia32_compressqi256_mask", IX86_BUILTIN_PCOMPRESSB256, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_USI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_compressv16qi_mask, "__builtin_ia32_compressqi128_mask", IX86_BUILTIN_PCOMPRESSB128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_compressv16hi_mask, "__builtin_ia32_compresshi256_mask", IX86_BUILTIN_PCOMPRESSW256, UNKNOWN, (int) V16HI_FTYPE_V16HI_V16HI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_compressv8hi_mask, "__builtin_ia32_compresshi128_mask", IX86_BUILTIN_PCOMPRESSW128, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_expandv64qi_mask, "__builtin_ia32_expandqi512_mask", IX86_BUILTIN_PEXPANDB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_expandv64qi_maskz, "__builtin_ia32_expandqi512_maskz", IX86_BUILTIN_PEXPANDB512Z, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_expandv32hi_mask, "__builtin_ia32_expandhi512_mask", IX86_BUILTIN_PEXPANDW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_expandv32hi_maskz, "__builtin_ia32_expandhi512_maskz", IX86_BUILTIN_PEXPANDW512Z, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_expandv64qi_mask, "__builtin_ia32_expandqi512_mask", IX86_BUILTIN_PEXPANDB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_expandv64qi_maskz, "__builtin_ia32_expandqi512_maskz", IX86_BUILTIN_PEXPANDB512Z, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_expandv32hi_mask, "__builtin_ia32_expandhi512_mask", IX86_BUILTIN_PEXPANDW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_expandv32hi_maskz, "__builtin_ia32_expandhi512_maskz", IX86_BUILTIN_PEXPANDW512Z, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_USI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv32qi_mask, "__builtin_ia32_expandqi256_mask", IX86_BUILTIN_PEXPANDB256, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_USI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv32qi_maskz, "__builtin_ia32_expandqi256_maskz", IX86_BUILTIN_PEXPANDB256Z, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_USI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv16qi_mask, "__builtin_ia32_expandqi128_mask", IX86_BUILTIN_PEXPANDB128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_UHI)
@@ -2587,64 +2587,64 @@ BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expan
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv16hi_maskz, "__builtin_ia32_expandhi256_maskz", IX86_BUILTIN_PEXPANDW256Z, UNKNOWN, (int) V16HI_FTYPE_V16HI_V16HI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv8hi_mask, "__builtin_ia32_expandhi128_mask", IX86_BUILTIN_PEXPANDW128, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv8hi_maskz, "__builtin_ia32_expandhi128_maskz", IX86_BUILTIN_PEXPANDW128Z, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshrd_v32hi, "__builtin_ia32_vpshrd_v32hi", IX86_BUILTIN_VPSHRDV32HI, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshrd_v32hi_mask, "__builtin_ia32_vpshrd_v32hi_mask", IX86_BUILTIN_VPSHRDV32HI_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_INT_V32HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshrd_v32hi, "__builtin_ia32_vpshrd_v32hi", IX86_BUILTIN_VPSHRDV32HI, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshrd_v32hi_mask, "__builtin_ia32_vpshrd_v32hi_mask", IX86_BUILTIN_VPSHRDV32HI_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_INT_V32HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrd_v16hi, "__builtin_ia32_vpshrd_v16hi", IX86_BUILTIN_VPSHRDV16HI, UNKNOWN, (int) V16HI_FTYPE_V16HI_V16HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrd_v16hi_mask, "__builtin_ia32_vpshrd_v16hi_mask", IX86_BUILTIN_VPSHRDV16HI_MASK, UNKNOWN, (int) V16HI_FTYPE_V16HI_V16HI_INT_V16HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrd_v8hi, "__builtin_ia32_vpshrd_v8hi", IX86_BUILTIN_VPSHRDV8HI, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrd_v8hi_mask, "__builtin_ia32_vpshrd_v8hi_mask", IX86_BUILTIN_VPSHRDV8HI_MASK, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI_INT_V8HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshrd_v16si, "__builtin_ia32_vpshrd_v16si", IX86_BUILTIN_VPSHRDV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_INT)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshrd_v16si_mask, "__builtin_ia32_vpshrd_v16si_mask", IX86_BUILTIN_VPSHRDV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_INT_V16SI_INT)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshrd_v16si, "__builtin_ia32_vpshrd_v16si", IX86_BUILTIN_VPSHRDV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_INT)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshrd_v16si_mask, "__builtin_ia32_vpshrd_v16si_mask", IX86_BUILTIN_VPSHRDV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_INT_V16SI_INT)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrd_v8si, "__builtin_ia32_vpshrd_v8si", IX86_BUILTIN_VPSHRDV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_INT)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrd_v8si_mask, "__builtin_ia32_vpshrd_v8si_mask", IX86_BUILTIN_VPSHRDV8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_INT_V8SI_INT)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrd_v4si, "__builtin_ia32_vpshrd_v4si", IX86_BUILTIN_VPSHRDV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_INT)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrd_v4si_mask, "__builtin_ia32_vpshrd_v4si_mask", IX86_BUILTIN_VPSHRDV4SI_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_INT_V4SI_INT)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshrd_v8di, "__builtin_ia32_vpshrd_v8di", IX86_BUILTIN_VPSHRDV8DI, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshrd_v8di_mask, "__builtin_ia32_vpshrd_v8di_mask", IX86_BUILTIN_VPSHRDV8DI_MASK, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT_V8DI_INT)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshrd_v8di, "__builtin_ia32_vpshrd_v8di", IX86_BUILTIN_VPSHRDV8DI, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshrd_v8di_mask, "__builtin_ia32_vpshrd_v8di_mask", IX86_BUILTIN_VPSHRDV8DI_MASK, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT_V8DI_INT)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrd_v4di, "__builtin_ia32_vpshrd_v4di", IX86_BUILTIN_VPSHRDV4DI, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_INT)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrd_v4di_mask, "__builtin_ia32_vpshrd_v4di_mask", IX86_BUILTIN_VPSHRDV4DI_MASK, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_INT_V4DI_INT)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrd_v2di, "__builtin_ia32_vpshrd_v2di", IX86_BUILTIN_VPSHRDV2DI, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_INT)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrd_v2di_mask, "__builtin_ia32_vpshrd_v2di_mask", IX86_BUILTIN_VPSHRDV2DI_MASK, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_INT_V2DI_INT)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshld_v32hi, "__builtin_ia32_vpshld_v32hi", IX86_BUILTIN_VPSHLDV32HI, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshld_v32hi_mask, "__builtin_ia32_vpshld_v32hi_mask", IX86_BUILTIN_VPSHLDV32HI_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_INT_V32HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshld_v32hi, "__builtin_ia32_vpshld_v32hi", IX86_BUILTIN_VPSHLDV32HI, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshld_v32hi_mask, "__builtin_ia32_vpshld_v32hi_mask", IX86_BUILTIN_VPSHLDV32HI_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_INT_V32HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshld_v16hi, "__builtin_ia32_vpshld_v16hi", IX86_BUILTIN_VPSHLDV16HI, UNKNOWN, (int) V16HI_FTYPE_V16HI_V16HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshld_v16hi_mask, "__builtin_ia32_vpshld_v16hi_mask", IX86_BUILTIN_VPSHLDV16HI_MASK, UNKNOWN, (int) V16HI_FTYPE_V16HI_V16HI_INT_V16HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshld_v8hi, "__builtin_ia32_vpshld_v8hi", IX86_BUILTIN_VPSHLDV8HI, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshld_v8hi_mask, "__builtin_ia32_vpshld_v8hi_mask", IX86_BUILTIN_VPSHLDV8HI_MASK, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI_INT_V8HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshld_v16si, "__builtin_ia32_vpshld_v16si", IX86_BUILTIN_VPSHLDV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_INT)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshld_v16si_mask, "__builtin_ia32_vpshld_v16si_mask", IX86_BUILTIN_VPSHLDV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_INT_V16SI_INT)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshld_v16si, "__builtin_ia32_vpshld_v16si", IX86_BUILTIN_VPSHLDV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_INT)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshld_v16si_mask, "__builtin_ia32_vpshld_v16si_mask", IX86_BUILTIN_VPSHLDV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_INT_V16SI_INT)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshld_v8si, "__builtin_ia32_vpshld_v8si", IX86_BUILTIN_VPSHLDV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_INT)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshld_v8si_mask, "__builtin_ia32_vpshld_v8si_mask", IX86_BUILTIN_VPSHLDV8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_INT_V8SI_INT)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshld_v4si, "__builtin_ia32_vpshld_v4si", IX86_BUILTIN_VPSHLDV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_INT)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshld_v4si_mask, "__builtin_ia32_vpshld_v4si_mask", IX86_BUILTIN_VPSHLDV4SI_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_INT_V4SI_INT)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshld_v8di, "__builtin_ia32_vpshld_v8di", IX86_BUILTIN_VPSHLDV8DI, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshld_v8di_mask, "__builtin_ia32_vpshld_v8di_mask", IX86_BUILTIN_VPSHLDV8DI_MASK, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT_V8DI_INT)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshld_v8di, "__builtin_ia32_vpshld_v8di", IX86_BUILTIN_VPSHLDV8DI, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshld_v8di_mask, "__builtin_ia32_vpshld_v8di_mask", IX86_BUILTIN_VPSHLDV8DI_MASK, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT_V8DI_INT)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshld_v4di, "__builtin_ia32_vpshld_v4di", IX86_BUILTIN_VPSHLDV4DI, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_INT)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshld_v4di_mask, "__builtin_ia32_vpshld_v4di_mask", IX86_BUILTIN_VPSHLDV4DI_MASK, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_INT_V4DI_INT)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshld_v2di, "__builtin_ia32_vpshld_v2di", IX86_BUILTIN_VPSHLDV2DI, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_INT)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshld_v2di_mask, "__builtin_ia32_vpshld_v2di_mask", IX86_BUILTIN_VPSHLDV2DI_MASK, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_INT_V2DI_INT)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshrdv_v32hi, "__builtin_ia32_vpshrdv_v32hi", IX86_BUILTIN_VPSHRDVV32HI, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshrdv_v32hi_mask, "__builtin_ia32_vpshrdv_v32hi_mask", IX86_BUILTIN_VPSHRDVV32HI_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshrdv_v32hi_maskz, "__builtin_ia32_vpshrdv_v32hi_maskz", IX86_BUILTIN_VPSHRDVV32HI_MASKZ, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshrdv_v32hi, "__builtin_ia32_vpshrdv_v32hi", IX86_BUILTIN_VPSHRDVV32HI, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshrdv_v32hi_mask, "__builtin_ia32_vpshrdv_v32hi_mask", IX86_BUILTIN_VPSHRDVV32HI_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshrdv_v32hi_maskz, "__builtin_ia32_vpshrdv_v32hi_maskz", IX86_BUILTIN_VPSHRDVV32HI_MASKZ, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrdv_v16hi, "__builtin_ia32_vpshrdv_v16hi", IX86_BUILTIN_VPSHRDVV16HI, UNKNOWN, (int) V16HI_FTYPE_V16HI_V16HI_V16HI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrdv_v16hi_mask, "__builtin_ia32_vpshrdv_v16hi_mask", IX86_BUILTIN_VPSHRDVV16HI_MASK, UNKNOWN, (int) V16HI_FTYPE_V16HI_V16HI_V16HI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrdv_v16hi_maskz, "__builtin_ia32_vpshrdv_v16hi_maskz", IX86_BUILTIN_VPSHRDVV16HI_MASKZ, UNKNOWN, (int) V16HI_FTYPE_V16HI_V16HI_V16HI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrdv_v8hi, "__builtin_ia32_vpshrdv_v8hi", IX86_BUILTIN_VPSHRDVV8HI, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI_V8HI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrdv_v8hi_mask, "__builtin_ia32_vpshrdv_v8hi_mask", IX86_BUILTIN_VPSHRDVV8HI_MASK, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI_V8HI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrdv_v8hi_maskz, "__builtin_ia32_vpshrdv_v8hi_maskz", IX86_BUILTIN_VPSHRDVV8HI_MASKZ, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI_V8HI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshrdv_v16si, "__builtin_ia32_vpshrdv_v16si", IX86_BUILTIN_VPSHRDVV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshrdv_v16si_mask, "__builtin_ia32_vpshrdv_v16si_mask", IX86_BUILTIN_VPSHRDVV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshrdv_v16si_maskz, "__builtin_ia32_vpshrdv_v16si_maskz", IX86_BUILTIN_VPSHRDVV16SI_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshrdv_v16si, "__builtin_ia32_vpshrdv_v16si", IX86_BUILTIN_VPSHRDVV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshrdv_v16si_mask, "__builtin_ia32_vpshrdv_v16si_mask", IX86_BUILTIN_VPSHRDVV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshrdv_v16si_maskz, "__builtin_ia32_vpshrdv_v16si_maskz", IX86_BUILTIN_VPSHRDVV16SI_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrdv_v8si, "__builtin_ia32_vpshrdv_v8si", IX86_BUILTIN_VPSHRDVV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrdv_v8si_mask, "__builtin_ia32_vpshrdv_v8si_mask", IX86_BUILTIN_VPSHRDVV8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrdv_v8si_maskz, "__builtin_ia32_vpshrdv_v8si_maskz", IX86_BUILTIN_VPSHRDVV8SI_MASKZ, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrdv_v4si, "__builtin_ia32_vpshrdv_v4si", IX86_BUILTIN_VPSHRDVV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrdv_v4si_mask, "__builtin_ia32_vpshrdv_v4si_mask", IX86_BUILTIN_VPSHRDVV4SI_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrdv_v4si_maskz, "__builtin_ia32_vpshrdv_v4si_maskz", IX86_BUILTIN_VPSHRDVV4SI_MASKZ, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshrdv_v8di, "__builtin_ia32_vpshrdv_v8di", IX86_BUILTIN_VPSHRDVV8DI, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshrdv_v8di_mask, "__builtin_ia32_vpshrdv_v8di_mask", IX86_BUILTIN_VPSHRDVV8DI_MASK, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshrdv_v8di_maskz, "__builtin_ia32_vpshrdv_v8di_maskz", IX86_BUILTIN_VPSHRDVV8DI_MASKZ, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshrdv_v8di, "__builtin_ia32_vpshrdv_v8di", IX86_BUILTIN_VPSHRDVV8DI, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshrdv_v8di_mask, "__builtin_ia32_vpshrdv_v8di_mask", IX86_BUILTIN_VPSHRDVV8DI_MASK, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshrdv_v8di_maskz, "__builtin_ia32_vpshrdv_v8di_maskz", IX86_BUILTIN_VPSHRDVV8DI_MASKZ, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrdv_v4di, "__builtin_ia32_vpshrdv_v4di", IX86_BUILTIN_VPSHRDVV4DI, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_V4DI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrdv_v4di_mask, "__builtin_ia32_vpshrdv_v4di_mask", IX86_BUILTIN_VPSHRDVV4DI_MASK, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_V4DI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrdv_v4di_maskz, "__builtin_ia32_vpshrdv_v4di_maskz", IX86_BUILTIN_VPSHRDVV4DI_MASKZ, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_V4DI_UQI)
@@ -2652,27 +2652,27 @@ BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshr
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrdv_v2di_mask, "__builtin_ia32_vpshrdv_v2di_mask", IX86_BUILTIN_VPSHRDVV2DI_MASK, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_V2DI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrdv_v2di_maskz, "__builtin_ia32_vpshrdv_v2di_maskz", IX86_BUILTIN_VPSHRDVV2DI_MASKZ, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_V2DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshldv_v32hi, "__builtin_ia32_vpshldv_v32hi", IX86_BUILTIN_VPSHLDVV32HI, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshldv_v32hi_mask, "__builtin_ia32_vpshldv_v32hi_mask", IX86_BUILTIN_VPSHLDVV32HI_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshldv_v32hi_maskz, "__builtin_ia32_vpshldv_v32hi_maskz", IX86_BUILTIN_VPSHLDVV32HI_MASKZ, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshldv_v32hi, "__builtin_ia32_vpshldv_v32hi", IX86_BUILTIN_VPSHLDVV32HI, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshldv_v32hi_mask, "__builtin_ia32_vpshldv_v32hi_mask", IX86_BUILTIN_VPSHLDVV32HI_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshldv_v32hi_maskz, "__builtin_ia32_vpshldv_v32hi_maskz", IX86_BUILTIN_VPSHLDVV32HI_MASKZ, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshldv_v16hi, "__builtin_ia32_vpshldv_v16hi", IX86_BUILTIN_VPSHLDVV16HI, UNKNOWN, (int) V16HI_FTYPE_V16HI_V16HI_V16HI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshldv_v16hi_mask, "__builtin_ia32_vpshldv_v16hi_mask", IX86_BUILTIN_VPSHLDVV16HI_MASK, UNKNOWN, (int) V16HI_FTYPE_V16HI_V16HI_V16HI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshldv_v16hi_maskz, "__builtin_ia32_vpshldv_v16hi_maskz", IX86_BUILTIN_VPSHLDVV16HI_MASKZ, UNKNOWN, (int) V16HI_FTYPE_V16HI_V16HI_V16HI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshldv_v8hi, "__builtin_ia32_vpshldv_v8hi", IX86_BUILTIN_VPSHLDVV8HI, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI_V8HI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshldv_v8hi_mask, "__builtin_ia32_vpshldv_v8hi_mask", IX86_BUILTIN_VPSHLDVV8HI_MASK, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI_V8HI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshldv_v8hi_maskz, "__builtin_ia32_vpshldv_v8hi_maskz", IX86_BUILTIN_VPSHLDVV8HI_MASKZ, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI_V8HI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshldv_v16si, "__builtin_ia32_vpshldv_v16si", IX86_BUILTIN_VPSHLDVV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshldv_v16si_mask, "__builtin_ia32_vpshldv_v16si_mask", IX86_BUILTIN_VPSHLDVV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshldv_v16si_maskz, "__builtin_ia32_vpshldv_v16si_maskz", IX86_BUILTIN_VPSHLDVV16SI_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshldv_v16si, "__builtin_ia32_vpshldv_v16si", IX86_BUILTIN_VPSHLDVV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshldv_v16si_mask, "__builtin_ia32_vpshldv_v16si_mask", IX86_BUILTIN_VPSHLDVV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshldv_v16si_maskz, "__builtin_ia32_vpshldv_v16si_maskz", IX86_BUILTIN_VPSHLDVV16SI_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshldv_v8si, "__builtin_ia32_vpshldv_v8si", IX86_BUILTIN_VPSHLDVV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshldv_v8si_mask, "__builtin_ia32_vpshldv_v8si_mask", IX86_BUILTIN_VPSHLDVV8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshldv_v8si_maskz, "__builtin_ia32_vpshldv_v8si_maskz", IX86_BUILTIN_VPSHLDVV8SI_MASKZ, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshldv_v4si, "__builtin_ia32_vpshldv_v4si", IX86_BUILTIN_VPSHLDVV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshldv_v4si_mask, "__builtin_ia32_vpshldv_v4si_mask", IX86_BUILTIN_VPSHLDVV4SI_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshldv_v4si_maskz, "__builtin_ia32_vpshldv_v4si_maskz", IX86_BUILTIN_VPSHLDVV4SI_MASKZ, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshldv_v8di, "__builtin_ia32_vpshldv_v8di", IX86_BUILTIN_VPSHLDVV8DI, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshldv_v8di_mask, "__builtin_ia32_vpshldv_v8di_mask", IX86_BUILTIN_VPSHLDVV8DI_MASK, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshldv_v8di_maskz, "__builtin_ia32_vpshldv_v8di_maskz", IX86_BUILTIN_VPSHLDVV8DI_MASKZ, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshldv_v8di, "__builtin_ia32_vpshldv_v8di", IX86_BUILTIN_VPSHLDVV8DI, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshldv_v8di_mask, "__builtin_ia32_vpshldv_v8di_mask", IX86_BUILTIN_VPSHLDVV8DI_MASK, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshldv_v8di_maskz, "__builtin_ia32_vpshldv_v8di_maskz", IX86_BUILTIN_VPSHLDVV8DI_MASKZ, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshldv_v4di, "__builtin_ia32_vpshldv_v4di", IX86_BUILTIN_VPSHLDVV4DI, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_V4DI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshldv_v4di_mask, "__builtin_ia32_vpshldv_v4di_mask", IX86_BUILTIN_VPSHLDVV4DI_MASK, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_V4DI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshldv_v4di_maskz, "__builtin_ia32_vpshldv_v4di_maskz", IX86_BUILTIN_VPSHLDVV4DI_MASKZ, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_V4DI_UQI)
@@ -2681,20 +2681,20 @@ BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshl
 BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshldv_v2di_maskz, "__builtin_ia32_vpshldv_v2di_maskz", IX86_BUILTIN_VPSHLDVV2DI_MASKZ, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_V2DI_UQI)
 /* GFNI */
-BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_vgf2p8affineinvqb_v64qi, "__builtin_ia32_vgf2p8affineinvqb_v64qi", IX86_BUILTIN_VGF2P8AFFINEINVQB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_INT)
-BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_vgf2p8affineinvqb_v64qi_mask, "__builtin_ia32_vgf2p8affineinvqb_v64qi_mask", IX86_BUILTIN_VGF2P8AFFINEINVQB512MASK, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_INT_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vgf2p8affineinvqb_v64qi, "__builtin_ia32_vgf2p8affineinvqb_v64qi", IX86_BUILTIN_VGF2P8AFFINEINVQB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_INT)
+BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vgf2p8affineinvqb_v64qi_mask, "__builtin_ia32_vgf2p8affineinvqb_v64qi_mask", IX86_BUILTIN_VGF2P8AFFINEINVQB512MASK, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_INT_V64QI_UDI)
 BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX, 0, CODE_FOR_vgf2p8affineinvqb_v32qi, "__builtin_ia32_vgf2p8affineinvqb_v32qi", IX86_BUILTIN_VGF2P8AFFINEINVQB256, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_INT)
 BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512VL | OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_vgf2p8affineinvqb_v32qi_mask, "__builtin_ia32_vgf2p8affineinvqb_v32qi_mask", IX86_BUILTIN_VGF2P8AFFINEINVQB256MASK, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_INT_V32QI_USI)
 BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_SSE2, 0, CODE_FOR_vgf2p8affineinvqb_v16qi, "__builtin_ia32_vgf2p8affineinvqb_v16qi", IX86_BUILTIN_VGF2P8AFFINEINVQB128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_INT)
 BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vgf2p8affineinvqb_v16qi_mask, "__builtin_ia32_vgf2p8affineinvqb_v16qi_mask", IX86_BUILTIN_VGF2P8AFFINEINVQB128MASK, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_INT_V16QI_UHI)
-BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_vgf2p8affineqb_v64qi, "__builtin_ia32_vgf2p8affineqb_v64qi", IX86_BUILTIN_VGF2P8AFFINEQB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_INT)
-BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_vgf2p8affineqb_v64qi_mask, "__builtin_ia32_vgf2p8affineqb_v64qi_mask", IX86_BUILTIN_VGF2P8AFFINEQB512MASK, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_INT_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vgf2p8affineqb_v64qi, "__builtin_ia32_vgf2p8affineqb_v64qi", IX86_BUILTIN_VGF2P8AFFINEQB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_INT)
+BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vgf2p8affineqb_v64qi_mask, "__builtin_ia32_vgf2p8affineqb_v64qi_mask", IX86_BUILTIN_VGF2P8AFFINEQB512MASK, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_INT_V64QI_UDI)
 BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX, 0, CODE_FOR_vgf2p8affineqb_v32qi, "__builtin_ia32_vgf2p8affineqb_v32qi", IX86_BUILTIN_VGF2P8AFFINEQB256, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_INT)
 BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512VL | OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_vgf2p8affineqb_v32qi_mask, "__builtin_ia32_vgf2p8affineqb_v32qi_mask", IX86_BUILTIN_VGF2P8AFFINEQB256MASK, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_INT_V32QI_USI)
 BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_SSE2, 0, CODE_FOR_vgf2p8affineqb_v16qi, "__builtin_ia32_vgf2p8affineqb_v16qi", IX86_BUILTIN_VGF2P8AFFINEQB128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_INT)
 BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vgf2p8affineqb_v16qi_mask, "__builtin_ia32_vgf2p8affineqb_v16qi_mask", IX86_BUILTIN_VGF2P8AFFINEQB128MASK, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_INT_V16QI_UHI)
-BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_vgf2p8mulb_v64qi, "__builtin_ia32_vgf2p8mulb_v64qi", IX86_BUILTIN_VGF2P8MULB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI)
-BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_vgf2p8mulb_v64qi_mask, "__builtin_ia32_vgf2p8mulb_v64qi_mask", IX86_BUILTIN_VGF2P8MULB512MASK, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vgf2p8mulb_v64qi, "__builtin_ia32_vgf2p8mulb_v64qi", IX86_BUILTIN_VGF2P8MULB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI)
+BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vgf2p8mulb_v64qi_mask, "__builtin_ia32_vgf2p8mulb_v64qi_mask", IX86_BUILTIN_VGF2P8MULB512MASK, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
 BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX, 0, CODE_FOR_vgf2p8mulb_v32qi, "__builtin_ia32_vgf2p8mulb_v32qi", IX86_BUILTIN_VGF2P8MULB256, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI)
 BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512VL | OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_vgf2p8mulb_v32qi_mask, "__builtin_ia32_vgf2p8mulb_v32qi_mask", IX86_BUILTIN_VGF2P8MULB256MASK, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_V32QI_USI)
 BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_SSE2, 0, CODE_FOR_vgf2p8mulb_v16qi, "__builtin_ia32_vgf2p8mulb_v16qi", IX86_BUILTIN_VGF2P8MULB128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI)
@@ -2702,9 +2702,9 @@ BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vgf2p8mulb_v
 /* AVX512_VNNI */
-BDESC (OPTION_MASK_ISA_AVX512VNNI, 0, CODE_FOR_vpdpbusd_v16si, "__builtin_ia32_vpdpbusd_v16si", IX86_BUILTIN_VPDPBUSDV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI)
-BDESC (OPTION_MASK_ISA_AVX512VNNI, 0, CODE_FOR_vpdpbusd_v16si_mask, "__builtin_ia32_vpdpbusd_v16si_mask", IX86_BUILTIN_VPDPBUSDV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512VNNI, 0, CODE_FOR_vpdpbusd_v16si_maskz, "__builtin_ia32_vpdpbusd_v16si_maskz", IX86_BUILTIN_VPDPBUSDV16SI_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VNNI, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpdpbusd_v16si, "__builtin_ia32_vpdpbusd_v16si", IX86_BUILTIN_VPDPBUSDV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI)
+BDESC (OPTION_MASK_ISA_AVX512VNNI, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpdpbusd_v16si_mask, "__builtin_ia32_vpdpbusd_v16si_mask", IX86_BUILTIN_VPDPBUSDV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VNNI, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpdpbusd_v16si_maskz, "__builtin_ia32_vpdpbusd_v16si_maskz", IX86_BUILTIN_VPDPBUSDV16SI_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVXVNNI, CODE_FOR_vpdpbusd_v8si, "__builtin_ia32_vpdpbusd_v8si", IX86_BUILTIN_VPDPBUSDV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI)
 BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpdpbusd_v8si_mask, "__builtin_ia32_vpdpbusd_v8si_mask", IX86_BUILTIN_VPDPBUSDV8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpdpbusd_v8si_maskz, "__builtin_ia32_vpdpbusd_v8si_maskz", IX86_BUILTIN_VPDPBUSDV8SI_MASKZ, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI)
@@ -2712,9 +2712,9 @@ BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_A
 BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpdpbusd_v4si_mask, "__builtin_ia32_vpdpbusd_v4si_mask", IX86_BUILTIN_VPDPBUSDV4SI_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpdpbusd_v4si_maskz, "__builtin_ia32_vpdpbusd_v4si_maskz", IX86_BUILTIN_VPDPBUSDV4SI_MASKZ, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VNNI, 0, CODE_FOR_vpdpbusds_v16si, "__builtin_ia32_vpdpbusds_v16si", IX86_BUILTIN_VPDPBUSDSV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI)
-BDESC (OPTION_MASK_ISA_AVX512VNNI, 0, CODE_FOR_vpdpbusds_v16si_mask, "__builtin_ia32_vpdpbusds_v16si_mask", IX86_BUILTIN_VPDPBUSDSV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512VNNI, 0, CODE_FOR_vpdpbusds_v16si_maskz, "__builtin_ia32_vpdpbusds_v16si_maskz", IX86_BUILTIN_VPDPBUSDSV16SI_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VNNI, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpdpbusds_v16si, "__builtin_ia32_vpdpbusds_v16si", IX86_BUILTIN_VPDPBUSDSV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI)
+BDESC (OPTION_MASK_ISA_AVX512VNNI, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpdpbusds_v16si_mask, "__builtin_ia32_vpdpbusds_v16si_mask", IX86_BUILTIN_VPDPBUSDSV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VNNI, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpdpbusds_v16si_maskz, "__builtin_ia32_vpdpbusds_v16si_maskz", IX86_BUILTIN_VPDPBUSDSV16SI_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVXVNNI, CODE_FOR_vpdpbusds_v8si, "__builtin_ia32_vpdpbusds_v8si", IX86_BUILTIN_VPDPBUSDSV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI)
 BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpdpbusds_v8si_mask, "__builtin_ia32_vpdpbusds_v8si_mask", IX86_BUILTIN_VPDPBUSDSV8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpdpbusds_v8si_maskz, "__builtin_ia32_vpdpbusds_v8si_maskz", IX86_BUILTIN_VPDPBUSDSV8SI_MASKZ, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI)
@@ -2722,9 +2722,9 @@ BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_A
 BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpdpbusds_v4si_mask, "__builtin_ia32_vpdpbusds_v4si_mask", IX86_BUILTIN_VPDPBUSDSV4SI_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpdpbusds_v4si_maskz, "__builtin_ia32_vpdpbusds_v4si_maskz", IX86_BUILTIN_VPDPBUSDSV4SI_MASKZ, UNKNOWN, (int)
V4SI_FTYPE_V4SI_V4SI_V4SI_UQI) -BDESC (OPTION_MASK_ISA_AVX512VNNI, 0, CODE_FOR_vpdpwssd_v16si, "__builtin_ia32_vpdpwssd_v16si", IX86_BUILTIN_VPDPWSSDV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI) -BDESC (OPTION_MASK_ISA_AVX512VNNI, 0, CODE_FOR_vpdpwssd_v16si_mask, "__builtin_ia32_vpdpwssd_v16si_mask", IX86_BUILTIN_VPDPWSSDV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512VNNI, 0, CODE_FOR_vpdpwssd_v16si_maskz, "__builtin_ia32_vpdpwssd_v16si_maskz", IX86_BUILTIN_VPDPWSSDV16SI_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512VNNI, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpdpwssd_v16si, "__builtin_ia32_vpdpwssd_v16si", IX86_BUILTIN_VPDPWSSDV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI) +BDESC (OPTION_MASK_ISA_AVX512VNNI, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpdpwssd_v16si_mask, "__builtin_ia32_vpdpwssd_v16si_mask", IX86_BUILTIN_VPDPWSSDV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512VNNI, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpdpwssd_v16si_maskz, "__builtin_ia32_vpdpwssd_v16si_maskz", IX86_BUILTIN_VPDPWSSDV16SI_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVXVNNI, CODE_FOR_vpdpwssd_v8si, "__builtin_ia32_vpdpwssd_v8si", IX86_BUILTIN_VPDPWSSDV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI) BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpdpwssd_v8si_mask, "__builtin_ia32_vpdpwssd_v8si_mask", IX86_BUILTIN_VPDPWSSDV8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI) BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpdpwssd_v8si_maskz, "__builtin_ia32_vpdpwssd_v8si_maskz", IX86_BUILTIN_VPDPWSSDV8SI_MASKZ, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI) @@ -2732,9 +2732,9 @@ BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_A BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpdpwssd_v4si_mask, "__builtin_ia32_vpdpwssd_v4si_mask", IX86_BUILTIN_VPDPWSSDV4SI_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI) BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpdpwssd_v4si_maskz, "__builtin_ia32_vpdpwssd_v4si_maskz", IX86_BUILTIN_VPDPWSSDV4SI_MASKZ, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI) -BDESC (OPTION_MASK_ISA_AVX512VNNI, 0, CODE_FOR_vpdpwssds_v16si, "__builtin_ia32_vpdpwssds_v16si", IX86_BUILTIN_VPDPWSSDSV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI) -BDESC (OPTION_MASK_ISA_AVX512VNNI, 0, CODE_FOR_vpdpwssds_v16si_mask, "__builtin_ia32_vpdpwssds_v16si_mask", IX86_BUILTIN_VPDPWSSDSV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512VNNI, 0, CODE_FOR_vpdpwssds_v16si_maskz, "__builtin_ia32_vpdpwssds_v16si_maskz", IX86_BUILTIN_VPDPWSSDSV16SI_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512VNNI, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpdpwssds_v16si, "__builtin_ia32_vpdpwssds_v16si", IX86_BUILTIN_VPDPWSSDSV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI) +BDESC (OPTION_MASK_ISA_AVX512VNNI, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpdpwssds_v16si_mask, "__builtin_ia32_vpdpwssds_v16si_mask", IX86_BUILTIN_VPDPWSSDSV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512VNNI, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpdpwssds_v16si_maskz, "__builtin_ia32_vpdpwssds_v16si_maskz", IX86_BUILTIN_VPDPWSSDSV16SI_MASKZ, UNKNOWN, (int) 
V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVXVNNI, CODE_FOR_vpdpwssds_v8si, "__builtin_ia32_vpdpwssds_v8si", IX86_BUILTIN_VPDPWSSDSV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI) BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpdpwssds_v8si_mask, "__builtin_ia32_vpdpwssds_v8si_mask", IX86_BUILTIN_VPDPWSSDSV8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI) BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpdpwssds_v8si_maskz, "__builtin_ia32_vpdpwssds_v8si_maskz", IX86_BUILTIN_VPDPWSSDSV8SI_MASKZ, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI) @@ -2773,13 +2773,13 @@ BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT16, CODE_FOR_vpdpwuuds_v4si, "__builtin_ia3 /* VPCLMULQDQ */ BDESC (OPTION_MASK_ISA_VPCLMULQDQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpclmulqdq_v2di, "__builtin_ia32_vpclmulqdq_v2di", IX86_BUILTIN_VPCLMULQDQ2, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_INT) BDESC (OPTION_MASK_ISA_VPCLMULQDQ | OPTION_MASK_ISA_AVX, 0, CODE_FOR_vpclmulqdq_v4di, "__builtin_ia32_vpclmulqdq_v4di", IX86_BUILTIN_VPCLMULQDQ4, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_INT) -BDESC (OPTION_MASK_ISA_VPCLMULQDQ | OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_vpclmulqdq_v8di, "__builtin_ia32_vpclmulqdq_v8di", IX86_BUILTIN_VPCLMULQDQ8, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT) +BDESC (OPTION_MASK_ISA_VPCLMULQDQ | OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpclmulqdq_v8di, "__builtin_ia32_vpclmulqdq_v8di", IX86_BUILTIN_VPCLMULQDQ8, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT) /* VPOPCNTDQ */ -BDESC (OPTION_MASK_ISA_AVX512VPOPCNTDQ, 0, CODE_FOR_vpopcountv16si, "__builtin_ia32_vpopcountd_v16si", IX86_BUILTIN_VPOPCOUNTDV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI) -BDESC (OPTION_MASK_ISA_AVX512VPOPCNTDQ, 0, CODE_FOR_vpopcountv16si_mask, "__builtin_ia32_vpopcountd_v16si_mask", IX86_BUILTIN_VPOPCOUNTDV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI) -BDESC (OPTION_MASK_ISA_AVX512VPOPCNTDQ, 0, CODE_FOR_vpopcountv8di, "__builtin_ia32_vpopcountq_v8di", IX86_BUILTIN_VPOPCOUNTQV8DI, UNKNOWN, (int) V8DI_FTYPE_V8DI) -BDESC (OPTION_MASK_ISA_AVX512VPOPCNTDQ, 0, CODE_FOR_vpopcountv8di_mask, "__builtin_ia32_vpopcountq_v8di_mask", IX86_BUILTIN_VPOPCOUNTQV8DI_MASK, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512VPOPCNTDQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpopcountv16si, "__builtin_ia32_vpopcountd_v16si", IX86_BUILTIN_VPOPCOUNTDV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI) +BDESC (OPTION_MASK_ISA_AVX512VPOPCNTDQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpopcountv16si_mask, "__builtin_ia32_vpopcountd_v16si_mask", IX86_BUILTIN_VPOPCOUNTDV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI) +BDESC (OPTION_MASK_ISA_AVX512VPOPCNTDQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpopcountv8di, "__builtin_ia32_vpopcountq_v8di", IX86_BUILTIN_VPOPCOUNTQV8DI, UNKNOWN, (int) V8DI_FTYPE_V8DI) +BDESC (OPTION_MASK_ISA_AVX512VPOPCNTDQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpopcountv8di_mask, "__builtin_ia32_vpopcountq_v8di_mask", IX86_BUILTIN_VPOPCOUNTQV8DI_MASK, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI) BDESC (OPTION_MASK_ISA_AVX512VPOPCNTDQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpopcountv4di, "__builtin_ia32_vpopcountq_v4di", IX86_BUILTIN_VPOPCOUNTQV4DI, UNKNOWN, (int) V4DI_FTYPE_V4DI) BDESC (OPTION_MASK_ISA_AVX512VPOPCNTDQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpopcountv4di_mask, "__builtin_ia32_vpopcountq_v4di_mask", IX86_BUILTIN_VPOPCOUNTQV4DI_MASK, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_UQI) @@ -2791,21 +2791,21 @@ BDESC 
(OPTION_MASK_ISA_AVX512VPOPCNTDQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_v BDESC (OPTION_MASK_ISA_AVX512VPOPCNTDQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpopcountv8si_mask, "__builtin_ia32_vpopcountd_v8si_mask", IX86_BUILTIN_VPOPCOUNTDV8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_UHI) /* BITALG */ -BDESC (OPTION_MASK_ISA_AVX512BITALG, 0, CODE_FOR_vpopcountv64qi, "__builtin_ia32_vpopcountb_v64qi", IX86_BUILTIN_VPOPCOUNTBV64QI, UNKNOWN, (int) V64QI_FTYPE_V64QI) -BDESC (OPTION_MASK_ISA_AVX512BITALG, 0, CODE_FOR_vpopcountv64qi_mask, "__builtin_ia32_vpopcountb_v64qi_mask", IX86_BUILTIN_VPOPCOUNTBV64QI_MASK, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_UDI) +BDESC (OPTION_MASK_ISA_AVX512BITALG, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpopcountv64qi, "__builtin_ia32_vpopcountb_v64qi", IX86_BUILTIN_VPOPCOUNTBV64QI, UNKNOWN, (int) V64QI_FTYPE_V64QI) +BDESC (OPTION_MASK_ISA_AVX512BITALG, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpopcountv64qi_mask, "__builtin_ia32_vpopcountb_v64qi_mask", IX86_BUILTIN_VPOPCOUNTBV64QI_MASK, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_UDI) BDESC (OPTION_MASK_ISA_AVX512BITALG | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpopcountv32qi, "__builtin_ia32_vpopcountb_v32qi", IX86_BUILTIN_VPOPCOUNTBV32QI, UNKNOWN, (int) V32QI_FTYPE_V32QI) BDESC (OPTION_MASK_ISA_AVX512BITALG | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpopcountv32qi_mask, "__builtin_ia32_vpopcountb_v32qi_mask", IX86_BUILTIN_VPOPCOUNTBV32QI_MASK, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_USI) BDESC (OPTION_MASK_ISA_AVX512BITALG | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpopcountv16qi, "__builtin_ia32_vpopcountb_v16qi", IX86_BUILTIN_VPOPCOUNTBV16QI, UNKNOWN, (int) V16QI_FTYPE_V16QI) BDESC (OPTION_MASK_ISA_AVX512BITALG | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpopcountv16qi_mask, "__builtin_ia32_vpopcountb_v16qi_mask", IX86_BUILTIN_VPOPCOUNTBV16QI_MASK, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_UHI) -BDESC (OPTION_MASK_ISA_AVX512BITALG, 0, CODE_FOR_vpopcountv32hi, "__builtin_ia32_vpopcountw_v32hi", IX86_BUILTIN_VPOPCOUNTWV32HI, UNKNOWN, (int) V32HI_FTYPE_V32HI) -BDESC (OPTION_MASK_ISA_AVX512BITALG, 0, CODE_FOR_vpopcountv32hi_mask, "__builtin_ia32_vpopcountw_v32hi_mask", IX86_BUILTIN_VPOPCOUNTQV32HI_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_USI) +BDESC (OPTION_MASK_ISA_AVX512BITALG, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpopcountv32hi, "__builtin_ia32_vpopcountw_v32hi", IX86_BUILTIN_VPOPCOUNTWV32HI, UNKNOWN, (int) V32HI_FTYPE_V32HI) +BDESC (OPTION_MASK_ISA_AVX512BITALG, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpopcountv32hi_mask, "__builtin_ia32_vpopcountw_v32hi_mask", IX86_BUILTIN_VPOPCOUNTQV32HI_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_USI) BDESC (OPTION_MASK_ISA_AVX512BITALG | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpopcountv16hi, "__builtin_ia32_vpopcountw_v16hi", IX86_BUILTIN_VPOPCOUNTWV16HI, UNKNOWN, (int) V16HI_FTYPE_V16HI) BDESC (OPTION_MASK_ISA_AVX512BITALG | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpopcountv16hi_mask, "__builtin_ia32_vpopcountw_v16hi_mask", IX86_BUILTIN_VPOPCOUNTQV16HI_MASK, UNKNOWN, (int) V16HI_FTYPE_V16HI_V16HI_UHI) BDESC (OPTION_MASK_ISA_AVX512BITALG | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpopcountv8hi, "__builtin_ia32_vpopcountw_v8hi", IX86_BUILTIN_VPOPCOUNTWV8HI, UNKNOWN, (int) V8HI_FTYPE_V8HI) BDESC (OPTION_MASK_ISA_AVX512BITALG | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpopcountv8hi_mask, "__builtin_ia32_vpopcountw_v8hi_mask", IX86_BUILTIN_VPOPCOUNTQV8HI_MASK, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI_UQI) -BDESC (OPTION_MASK_ISA_AVX512BITALG, 0, CODE_FOR_avx512vl_vpshufbitqmbv64qi_mask, "__builtin_ia32_vpshufbitqmb512_mask", 
IX86_BUILTIN_VPSHUFBITQMB512_MASK, UNKNOWN, (int) UDI_FTYPE_V64QI_V64QI_UDI) +BDESC (OPTION_MASK_ISA_AVX512BITALG, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512vl_vpshufbitqmbv64qi_mask, "__builtin_ia32_vpshufbitqmb512_mask", IX86_BUILTIN_VPSHUFBITQMB512_MASK, UNKNOWN, (int) UDI_FTYPE_V64QI_V64QI_UDI) BDESC (OPTION_MASK_ISA_AVX512BITALG | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_vpshufbitqmbv32qi_mask, "__builtin_ia32_vpshufbitqmb256_mask", IX86_BUILTIN_VPSHUFBITQMB256_MASK, UNKNOWN, (int) USI_FTYPE_V32QI_V32QI_USI) BDESC (OPTION_MASK_ISA_AVX512BITALG | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_vpshufbitqmbv16qi_mask, "__builtin_ia32_vpshufbitqmb128_mask", IX86_BUILTIN_VPSHUFBITQMB128_MASK, UNKNOWN, (int) UHI_FTYPE_V16QI_V16QI_UHI) @@ -2829,39 +2829,39 @@ BDESC (0, OPTION_MASK_ISA2_RDPID, CODE_FOR_rdpid, "__builtin_ia32_rdpid", IX86_B /* VAES. */ BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_VAES, CODE_FOR_vaesdec_v16qi, "__builtin_ia32_vaesdec_v16qi", IX86_BUILTIN_VAESDEC16, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI) BDESC (0, OPTION_MASK_ISA2_VAES, CODE_FOR_vaesdec_v32qi, "__builtin_ia32_vaesdec_v32qi", IX86_BUILTIN_VAESDEC32, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI) -BDESC (0, OPTION_MASK_ISA2_VAES, CODE_FOR_vaesdec_v64qi, "__builtin_ia32_vaesdec_v64qi", IX86_BUILTIN_VAESDEC64, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI) +BDESC (0, OPTION_MASK_ISA2_VAES | OPTION_MASK_ISA2_EVEX512, CODE_FOR_vaesdec_v64qi, "__builtin_ia32_vaesdec_v64qi", IX86_BUILTIN_VAESDEC64, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_VAES, CODE_FOR_vaesdeclast_v16qi, "__builtin_ia32_vaesdeclast_v16qi", IX86_BUILTIN_VAESDECLAST16, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI) BDESC (0, OPTION_MASK_ISA2_VAES, CODE_FOR_vaesdeclast_v32qi, "__builtin_ia32_vaesdeclast_v32qi", IX86_BUILTIN_VAESDECLAST32, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI) -BDESC (0, OPTION_MASK_ISA2_VAES, CODE_FOR_vaesdeclast_v64qi, "__builtin_ia32_vaesdeclast_v64qi", IX86_BUILTIN_VAESDECLAST64, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI) +BDESC (0, OPTION_MASK_ISA2_VAES | OPTION_MASK_ISA2_EVEX512, CODE_FOR_vaesdeclast_v64qi, "__builtin_ia32_vaesdeclast_v64qi", IX86_BUILTIN_VAESDECLAST64, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_VAES, CODE_FOR_vaesenc_v16qi, "__builtin_ia32_vaesenc_v16qi", IX86_BUILTIN_VAESENC16, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI) BDESC (0, OPTION_MASK_ISA2_VAES, CODE_FOR_vaesenc_v32qi, "__builtin_ia32_vaesenc_v32qi", IX86_BUILTIN_VAESENC32, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI) -BDESC (0, OPTION_MASK_ISA2_VAES, CODE_FOR_vaesenc_v64qi, "__builtin_ia32_vaesenc_v64qi", IX86_BUILTIN_VAESENC64, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI) +BDESC (0, OPTION_MASK_ISA2_VAES | OPTION_MASK_ISA2_EVEX512, CODE_FOR_vaesenc_v64qi, "__builtin_ia32_vaesenc_v64qi", IX86_BUILTIN_VAESENC64, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_VAES, CODE_FOR_vaesenclast_v16qi, "__builtin_ia32_vaesenclast_v16qi", IX86_BUILTIN_VAESENCLAST16, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI) BDESC (0, OPTION_MASK_ISA2_VAES, CODE_FOR_vaesenclast_v32qi, "__builtin_ia32_vaesenclast_v32qi", IX86_BUILTIN_VAESENCLAST32, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI) -BDESC (0, OPTION_MASK_ISA2_VAES, CODE_FOR_vaesenclast_v64qi, "__builtin_ia32_vaesenclast_v64qi", IX86_BUILTIN_VAESENCLAST64, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI) +BDESC (0, OPTION_MASK_ISA2_VAES | OPTION_MASK_ISA2_EVEX512, CODE_FOR_vaesenclast_v64qi, "__builtin_ia32_vaesenclast_v64qi", 
IX86_BUILTIN_VAESENCLAST64, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI) /* BF16 */ -BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v32bf, "__builtin_ia32_cvtne2ps2bf16_v32bf", IX86_BUILTIN_CVTNE2PS2BF16_V32BF, UNKNOWN, (int) V32BF_FTYPE_V16SF_V16SF) -BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v32bf_mask, "__builtin_ia32_cvtne2ps2bf16_v32bf_mask", IX86_BUILTIN_CVTNE2PS2BF16_V32BF_MASK, UNKNOWN, (int) V32BF_FTYPE_V16SF_V16SF_V32BF_USI) -BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v32bf_maskz, "__builtin_ia32_cvtne2ps2bf16_v32bf_maskz", IX86_BUILTIN_CVTNE2PS2BF16_V32BF_MASKZ, UNKNOWN, (int) V32BF_FTYPE_V16SF_V16SF_USI) +BDESC (0, OPTION_MASK_ISA2_AVX512BF16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cvtne2ps2bf16_v32bf, "__builtin_ia32_cvtne2ps2bf16_v32bf", IX86_BUILTIN_CVTNE2PS2BF16_V32BF, UNKNOWN, (int) V32BF_FTYPE_V16SF_V16SF) +BDESC (0, OPTION_MASK_ISA2_AVX512BF16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cvtne2ps2bf16_v32bf_mask, "__builtin_ia32_cvtne2ps2bf16_v32bf_mask", IX86_BUILTIN_CVTNE2PS2BF16_V32BF_MASK, UNKNOWN, (int) V32BF_FTYPE_V16SF_V16SF_V32BF_USI) +BDESC (0, OPTION_MASK_ISA2_AVX512BF16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cvtne2ps2bf16_v32bf_maskz, "__builtin_ia32_cvtne2ps2bf16_v32bf_maskz", IX86_BUILTIN_CVTNE2PS2BF16_V32BF_MASKZ, UNKNOWN, (int) V32BF_FTYPE_V16SF_V16SF_USI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v16bf, "__builtin_ia32_cvtne2ps2bf16_v16bf", IX86_BUILTIN_CVTNE2PS2BF16_V16BF, UNKNOWN, (int) V16BF_FTYPE_V8SF_V8SF) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v16bf_mask, "__builtin_ia32_cvtne2ps2bf16_v16bf_mask", IX86_BUILTIN_CVTNE2PS2BF16_V16BF_MASK, UNKNOWN, (int) V16BF_FTYPE_V8SF_V8SF_V16BF_UHI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v16bf_maskz, "__builtin_ia32_cvtne2ps2bf16_v16bf_maskz", IX86_BUILTIN_CVTNE2PS2BF16_V16BF_MASKZ, UNKNOWN, (int) V16BF_FTYPE_V8SF_V8SF_UHI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v8bf, "__builtin_ia32_cvtne2ps2bf16_v8bf", IX86_BUILTIN_CVTNE2PS2BF16_V8BF, UNKNOWN, (int) V8BF_FTYPE_V4SF_V4SF) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v8bf_mask, "__builtin_ia32_cvtne2ps2bf16_v8bf_mask", IX86_BUILTIN_CVTNE2PS2BF16_V8BF_MASK, UNKNOWN, (int) V8BF_FTYPE_V4SF_V4SF_V8BF_UQI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v8bf_maskz, "__builtin_ia32_cvtne2ps2bf16_v8bf_maskz", IX86_BUILTIN_CVTNE2PS2BF16_V8BF_MASKZ, UNKNOWN, (int) V8BF_FTYPE_V4SF_V4SF_UQI) -BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v16sf, "__builtin_ia32_cvtneps2bf16_v16sf", IX86_BUILTIN_CVTNEPS2BF16_V16SF, UNKNOWN, (int) V16BF_FTYPE_V16SF) -BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v16sf_mask, "__builtin_ia32_cvtneps2bf16_v16sf_mask", IX86_BUILTIN_CVTNEPS2BF16_V16SF_MASK, UNKNOWN, (int) V16BF_FTYPE_V16SF_V16BF_UHI) -BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v16sf_maskz, "__builtin_ia32_cvtneps2bf16_v16sf_maskz", IX86_BUILTIN_CVTNE2PS2BF16_V16SF_MASKZ, UNKNOWN, (int) V16BF_FTYPE_V16SF_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX512BF16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cvtneps2bf16_v16sf, "__builtin_ia32_cvtneps2bf16_v16sf", IX86_BUILTIN_CVTNEPS2BF16_V16SF, UNKNOWN, (int) V16BF_FTYPE_V16SF) +BDESC (0, 
OPTION_MASK_ISA2_AVX512BF16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cvtneps2bf16_v16sf_mask, "__builtin_ia32_cvtneps2bf16_v16sf_mask", IX86_BUILTIN_CVTNEPS2BF16_V16SF_MASK, UNKNOWN, (int) V16BF_FTYPE_V16SF_V16BF_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX512BF16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cvtneps2bf16_v16sf_maskz, "__builtin_ia32_cvtneps2bf16_v16sf_maskz", IX86_BUILTIN_CVTNE2PS2BF16_V16SF_MASKZ, UNKNOWN, (int) V16BF_FTYPE_V16SF_UHI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVXNECONVERT | OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_vcvtneps2bf16_v8sf, "__builtin_ia32_cvtneps2bf16_v8sf", IX86_BUILTIN_CVTNEPS2BF16_V8SF, UNKNOWN, (int) V8BF_FTYPE_V8SF) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v8sf_mask, "__builtin_ia32_cvtneps2bf16_v8sf_mask", IX86_BUILTIN_CVTNEPS2BF16_V8SF_MASK, UNKNOWN, (int) V8BF_FTYPE_V8SF_V8BF_UQI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v8sf_maskz, "__builtin_ia32_cvtneps2bf16_v8sf_maskz", IX86_BUILTIN_CVTNE2PS2BF16_V8SF_MASKZ, UNKNOWN, (int) V8BF_FTYPE_V8SF_UQI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVXNECONVERT | OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_vcvtneps2bf16_v4sf, "__builtin_ia32_cvtneps2bf16_v4sf", IX86_BUILTIN_CVTNEPS2BF16_V4SF, UNKNOWN, (int) V8BF_FTYPE_V4SF) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v4sf_mask, "__builtin_ia32_cvtneps2bf16_v4sf_mask", IX86_BUILTIN_CVTNEPS2BF16_V4SF_MASK, UNKNOWN, (int) V8BF_FTYPE_V4SF_V8BF_UQI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v4sf_maskz, "__builtin_ia32_cvtneps2bf16_v4sf_maskz", IX86_BUILTIN_CVTNE2PS2BF16_V4SF_MASKZ, UNKNOWN, (int) V8BF_FTYPE_V4SF_UQI) -BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v16sf, "__builtin_ia32_dpbf16ps_v16sf", IX86_BUILTIN_DPBF16PS_V16SF, UNKNOWN, (int) V16SF_FTYPE_V16SF_V32BF_V32BF) -BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v16sf_mask, "__builtin_ia32_dpbf16ps_v16sf_mask", IX86_BUILTIN_DPBF16PS_V16SF_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V32BF_V32BF_UHI) -BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v16sf_maskz, "__builtin_ia32_dpbf16ps_v16sf_maskz", IX86_BUILTIN_DPBF16PS_V16SF_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V32BF_V32BF_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX512BF16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_dpbf16ps_v16sf, "__builtin_ia32_dpbf16ps_v16sf", IX86_BUILTIN_DPBF16PS_V16SF, UNKNOWN, (int) V16SF_FTYPE_V16SF_V32BF_V32BF) +BDESC (0, OPTION_MASK_ISA2_AVX512BF16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_dpbf16ps_v16sf_mask, "__builtin_ia32_dpbf16ps_v16sf_mask", IX86_BUILTIN_DPBF16PS_V16SF_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V32BF_V32BF_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX512BF16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_dpbf16ps_v16sf_maskz, "__builtin_ia32_dpbf16ps_v16sf_maskz", IX86_BUILTIN_DPBF16PS_V16SF_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V32BF_V32BF_UHI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v8sf, "__builtin_ia32_dpbf16ps_v8sf", IX86_BUILTIN_DPBF16PS_V8SF, UNKNOWN, (int) V8SF_FTYPE_V8SF_V16BF_V16BF) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v8sf_mask, "__builtin_ia32_dpbf16ps_v8sf_mask", IX86_BUILTIN_DPBF16PS_V8SF_MASK, UNKNOWN, (int) V8SF_FTYPE_V8SF_V16BF_V16BF_UQI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v8sf_maskz, 
"__builtin_ia32_dpbf16ps_v8sf_maskz", IX86_BUILTIN_DPBF16PS_V8SF_MASKZ, UNKNOWN, (int) V8SF_FTYPE_V8SF_V16BF_V16BF_UQI) From patchwork Thu Sep 21 07:20:06 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Hu, Lin1" X-Patchwork-Id: 1837527 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=intel.com header.i=@intel.com header.a=rsa-sha256 header.s=Intel header.b=fcDjU112; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4Rrn5B6jYVz1yh6 for ; Thu, 21 Sep 2023 17:26:22 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 032273831E3E for ; Thu, 21 Sep 2023 07:26:21 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.120]) by sourceware.org (Postfix) with ESMTPS id 01F34393BA4D for ; Thu, 21 Sep 2023 07:25:18 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 01F34393BA4D Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=intel.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1695281119; x=1726817119; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=3mzbqHbCsRwYUQiAhYEyWDWRBUWmI795ASfPDjf8GB0=; b=fcDjU1128IDJbOkzI8hQmWhc5Er0f+2lcNcR1aNnVxZWU50Xf+pUCKdh tt9vJdKLuJ+HrTF1d/1EqkPSlCye0/l8Tc/WWSdmrUSQ9D91ZsLgfUfw7 ioZLcXLOPHz+5n1s3f4yjttSzKmyFoItaKlBYOwX97avNKWXvwa/QYDjs jAmdiXBbJQbPgeMeo9QEAiUi6F54jUtLdZ68jvmX6J5cBOtz4tqK72nEh /D+EVIJWycKhJsa5Q71oGPvOEVfLseb2gFmdv7PtMPB8b/Pi61QgwM8Ut /PSdvHx7AKMEn+XS/hZUKL6eLg1rDa6SXRtvxcg3BIqZ7ff1oydBYYnUH A==; X-IronPort-AV: E=McAfee;i="6600,9927,10839"; a="379326834" X-IronPort-AV: E=Sophos;i="6.03,164,1694761200"; d="scan'208";a="379326834" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Sep 2023 00:23:00 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10839"; a="817262430" X-IronPort-AV: E=Sophos;i="6.03,164,1694761200"; d="scan'208";a="817262430" Received: from shvmail03.sh.intel.com ([10.239.245.20]) by fmsmga004.fm.intel.com with ESMTP; 21 Sep 2023 00:22:15 -0700 Received: from shliclel4217.sh.intel.com (shliclel4217.sh.intel.com [10.239.240.127]) by shvmail03.sh.intel.com (Postfix) with ESMTP id 5250A1005139; Thu, 21 Sep 2023 15:22:14 +0800 (CST) From: "Hu, Lin1" To: gcc-patches@gcc.gnu.org Cc: hongtao.liu@intel.com, ubizjak@gmail.com, haochen.jiang@intel.com Subject: [PATCH 11/18] [PATCH 5/5] Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins Date: Thu, 21 Sep 2023 15:20:06 +0800 Message-Id: 
<20230921072013.2124750-12-lin1.hu@intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20230921072013.2124750-1-lin1.hu@intel.com> References: <20230921072013.2124750-1-lin1.hu@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-12.0 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, SPF_HELO_NONE, SPF_NONE, TXREP, UPPERCASE_50_75 autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org From: Haochen Jiang gcc/ChangeLog: * config/i386/i386-builtin.def (BDESC): Add OPTION_MASK_ISA2_EVEX512. --- gcc/config/i386/i386-builtin.def | 156 +++++++++++++++---------------- 1 file changed, 78 insertions(+), 78 deletions(-) diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def index 8250e2998cd..b90d5ccc969 100644 --- a/gcc/config/i386/i386-builtin.def +++ b/gcc/config/i386/i386-builtin.def @@ -1568,9 +1568,9 @@ BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_copysignv8df3 BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_sqrtv8df2, "__builtin_ia32_sqrtpd512", IX86_BUILTIN_SQRTPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF) BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_sqrtv16sf2, "__builtin_ia32_sqrtps512", IX86_BUILTIN_SQRTPS_NR512, UNKNOWN, (int) V16SF_FTYPE_V16SF) BDESC (OPTION_MASK_ISA_AVX512ER, 0, CODE_FOR_avx512er_exp2v16sf, "__builtin_ia32_exp2ps", IX86_BUILTIN_EXP2PS, UNKNOWN, (int) V16SF_FTYPE_V16SF) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_rndscalev32hf, "__builtin_ia32_floorph512", IX86_BUILTIN_FLOORPH512, (enum rtx_code) ROUND_FLOOR, (int) V32HF_FTYPE_V32HF_ROUND) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_rndscalev32hf, "__builtin_ia32_ceilph512", IX86_BUILTIN_CEILPH512, (enum rtx_code) ROUND_CEIL, (int) V32HF_FTYPE_V32HF_ROUND) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_rndscalev32hf, "__builtin_ia32_truncph512", IX86_BUILTIN_TRUNCPH512, (enum rtx_code) ROUND_TRUNC, (int) V32HF_FTYPE_V32HF_ROUND) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_rndscalev32hf, "__builtin_ia32_floorph512", IX86_BUILTIN_FLOORPH512, (enum rtx_code) ROUND_FLOOR, (int) V32HF_FTYPE_V32HF_ROUND) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_rndscalev32hf, "__builtin_ia32_ceilph512", IX86_BUILTIN_CEILPH512, (enum rtx_code) ROUND_CEIL, (int) V32HF_FTYPE_V32HF_ROUND) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_rndscalev32hf, "__builtin_ia32_truncph512", IX86_BUILTIN_TRUNCPH512, (enum rtx_code) ROUND_TRUNC, (int) V32HF_FTYPE_V32HF_ROUND) BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_roundps512, "__builtin_ia32_floorps512", IX86_BUILTIN_FLOORPS512, (enum rtx_code) ROUND_FLOOR, (int) V16SF_FTYPE_V16SF_ROUND) BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_roundps512, "__builtin_ia32_ceilps512", IX86_BUILTIN_CEILPS512, (enum rtx_code) ROUND_CEIL, (int) V16SF_FTYPE_V16SF_ROUND) BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_roundps512, "__builtin_ia32_truncps512", IX86_BUILTIN_TRUNCPS512, (enum 
rtx_code) ROUND_TRUNC, (int) V16SF_FTYPE_V16SF_ROUND) @@ -2874,40 +2874,40 @@ BDESC (OPTION_MASK_ISA_SSE2, 0, CODE_FOR_extendbfsf2_1, "__builtin_ia32_cvtbf2sf /* AVX512FP16. */ BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_addv8hf3_mask, "__builtin_ia32_addph128_mask", IX86_BUILTIN_ADDPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_addv16hf3_mask, "__builtin_ia32_addph256_mask", IX86_BUILTIN_ADDPH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_addv32hf3_mask, "__builtin_ia32_addph512_mask", IX86_BUILTIN_ADDPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_addv32hf3_mask, "__builtin_ia32_addph512_mask", IX86_BUILTIN_ADDPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_subv8hf3_mask, "__builtin_ia32_subph128_mask", IX86_BUILTIN_SUBPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_subv16hf3_mask, "__builtin_ia32_subph256_mask", IX86_BUILTIN_SUBPH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_subv32hf3_mask, "__builtin_ia32_subph512_mask", IX86_BUILTIN_SUBPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_subv32hf3_mask, "__builtin_ia32_subph512_mask", IX86_BUILTIN_SUBPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_mulv8hf3_mask, "__builtin_ia32_mulph128_mask", IX86_BUILTIN_MULPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_mulv16hf3_mask, "__builtin_ia32_mulph256_mask", IX86_BUILTIN_MULPH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_mulv32hf3_mask, "__builtin_ia32_mulph512_mask", IX86_BUILTIN_MULPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_mulv32hf3_mask, "__builtin_ia32_mulph512_mask", IX86_BUILTIN_MULPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_divv8hf3_mask, "__builtin_ia32_divph128_mask", IX86_BUILTIN_DIVPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_divv16hf3_mask, "__builtin_ia32_divph256_mask", IX86_BUILTIN_DIVPH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_divv32hf3_mask, "__builtin_ia32_divph512_mask", IX86_BUILTIN_DIVPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_divv32hf3_mask, "__builtin_ia32_divph512_mask", IX86_BUILTIN_DIVPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmaddv8hf3_mask, "__builtin_ia32_addsh_mask", IX86_BUILTIN_ADDSH_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsubv8hf3_mask, 
"__builtin_ia32_subsh_mask", IX86_BUILTIN_SUBSH_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmmulv8hf3_mask, "__builtin_ia32_mulsh_mask", IX86_BUILTIN_MULSH_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmdivv8hf3_mask, "__builtin_ia32_divsh_mask", IX86_BUILTIN_DIVSH_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_smaxv8hf3_mask, "__builtin_ia32_maxph128_mask", IX86_BUILTIN_MAXPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_smaxv16hf3_mask, "__builtin_ia32_maxph256_mask", IX86_BUILTIN_MAXPH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_smaxv32hf3_mask, "__builtin_ia32_maxph512_mask", IX86_BUILTIN_MAXPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_smaxv32hf3_mask, "__builtin_ia32_maxph512_mask", IX86_BUILTIN_MAXPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_sminv8hf3_mask, "__builtin_ia32_minph128_mask", IX86_BUILTIN_MINPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_sminv16hf3_mask, "__builtin_ia32_minph256_mask", IX86_BUILTIN_MINPH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_sminv32hf3_mask, "__builtin_ia32_minph512_mask", IX86_BUILTIN_MINPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_sminv32hf3_mask, "__builtin_ia32_minph512_mask", IX86_BUILTIN_MINPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsmaxv8hf3_mask, "__builtin_ia32_maxsh_mask", IX86_BUILTIN_MAXSH_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsminv8hf3_mask, "__builtin_ia32_minsh_mask", IX86_BUILTIN_MINSH_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_cmpv8hf3_mask, "__builtin_ia32_cmpph128_mask", IX86_BUILTIN_CMPPH128_MASK, UNKNOWN, (int) UQI_FTYPE_V8HF_V8HF_INT_UQI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_cmpv16hf3_mask, "__builtin_ia32_cmpph256_mask", IX86_BUILTIN_CMPPH256_MASK, UNKNOWN, (int) UHI_FTYPE_V16HF_V16HF_INT_UHI) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_cmpv32hf3_mask, "__builtin_ia32_cmpph512_mask", IX86_BUILTIN_CMPPH512_MASK, UNKNOWN, (int) USI_FTYPE_V32HF_V32HF_INT_USI) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_cmpv32hf3_mask, "__builtin_ia32_cmpph512_mask", IX86_BUILTIN_CMPPH512_MASK, UNKNOWN, (int) USI_FTYPE_V32HF_V32HF_INT_USI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_sqrtv8hf2_mask, "__builtin_ia32_sqrtph128_mask", IX86_BUILTIN_SQRTPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_UQI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_sqrtv16hf2_mask, "__builtin_ia32_sqrtph256_mask", IX86_BUILTIN_SQRTPH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_UHI) BDESC 
(OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_rsqrtv8hf2_mask, "__builtin_ia32_rsqrtph128_mask", IX86_BUILTIN_RSQRTPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_UQI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_rsqrtv16hf2_mask, "__builtin_ia32_rsqrtph256_mask", IX86_BUILTIN_RSQRTPH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_UHI) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_rsqrtv32hf2_mask, "__builtin_ia32_rsqrtph512_mask", IX86_BUILTIN_RSQRTPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_USI) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_rsqrtv32hf2_mask, "__builtin_ia32_rsqrtph512_mask", IX86_BUILTIN_RSQRTPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_USI) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmrsqrtv8hf2_mask, "__builtin_ia32_rsqrtsh_mask", IX86_BUILTIN_RSQRTSH_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_rcpv8hf2_mask, "__builtin_ia32_rcpph128_mask", IX86_BUILTIN_RCPPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_UQI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_rcpv16hf2_mask, "__builtin_ia32_rcpph256_mask", IX86_BUILTIN_RCPPH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_UHI) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_rcpv32hf2_mask, "__builtin_ia32_rcpph512_mask", IX86_BUILTIN_RCPPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_USI) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_rcpv32hf2_mask, "__builtin_ia32_rcpph512_mask", IX86_BUILTIN_RCPPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_USI) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmrcpv8hf2_mask, "__builtin_ia32_rcpsh_mask", IX86_BUILTIN_RCPSH_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_scalefv8hf_mask, "__builtin_ia32_scalefph128_mask", IX86_BUILTIN_SCALEFPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_scalefv16hf_mask, "__builtin_ia32_scalefph256_mask", IX86_BUILTIN_SCALEFPH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI) @@ -2917,7 +2917,7 @@ BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp1 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_rndscalev16hf_mask, "__builtin_ia32_rndscaleph256_mask", IX86_BUILTIN_RNDSCALEPH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_INT_V16HF_UHI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512dq_fpclassv16hf_mask, "__builtin_ia32_fpclassph256_mask", IX86_BUILTIN_FPCLASSPH256, UNKNOWN, (int) HI_FTYPE_V16HF_INT_UHI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512dq_fpclassv8hf_mask, "__builtin_ia32_fpclassph128_mask", IX86_BUILTIN_FPCLASSPH128, UNKNOWN, (int) QI_FTYPE_V8HF_INT_UQI) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512dq_fpclassv32hf_mask, "__builtin_ia32_fpclassph512_mask", IX86_BUILTIN_FPCLASSPH512, UNKNOWN, (int) SI_FTYPE_V32HF_INT_USI) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_fpclassv32hf_mask, "__builtin_ia32_fpclassph512_mask", IX86_BUILTIN_FPCLASSPH512, UNKNOWN, (int) SI_FTYPE_V32HF_INT_USI) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, 
CODE_FOR_avx512dq_vmfpclassv8hf_mask, "__builtin_ia32_fpclasssh_mask", IX86_BUILTIN_FPCLASSSH_MASK, UNKNOWN, (int) QI_FTYPE_V8HF_INT_UQI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_getexpv16hf_mask, "__builtin_ia32_getexpph256_mask", IX86_BUILTIN_GETEXPPH256, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_UHI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_getexpv8hf_mask, "__builtin_ia32_getexpph128_mask", IX86_BUILTIN_GETEXPPH128, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_UQI) @@ -3229,50 +3229,50 @@ BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_ran BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_rangepv8df_mask_round, "__builtin_ia32_rangepd512_mask", IX86_BUILTIN_RANGEPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_INT_V8DF_QI_INT) /* AVX512FP16. */ -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_addv32hf3_mask_round, "__builtin_ia32_addph512_mask_round", IX86_BUILTIN_ADDPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_subv32hf3_mask_round, "__builtin_ia32_subph512_mask_round", IX86_BUILTIN_SUBPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_mulv32hf3_mask_round, "__builtin_ia32_mulph512_mask_round", IX86_BUILTIN_MULPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_divv32hf3_mask_round, "__builtin_ia32_divph512_mask_round", IX86_BUILTIN_DIVPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_addv32hf3_mask_round, "__builtin_ia32_addph512_mask_round", IX86_BUILTIN_ADDPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_subv32hf3_mask_round, "__builtin_ia32_subph512_mask_round", IX86_BUILTIN_SUBPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_mulv32hf3_mask_round, "__builtin_ia32_mulph512_mask_round", IX86_BUILTIN_MULPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_divv32hf3_mask_round, "__builtin_ia32_divph512_mask_round", IX86_BUILTIN_DIVPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmaddv8hf3_mask_round, "__builtin_ia32_addsh_mask_round", IX86_BUILTIN_ADDSH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsubv8hf3_mask_round, "__builtin_ia32_subsh_mask_round", IX86_BUILTIN_SUBSH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmmulv8hf3_mask_round, "__builtin_ia32_mulsh_mask_round", IX86_BUILTIN_MULSH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmdivv8hf3_mask_round, "__builtin_ia32_divsh_mask_round", IX86_BUILTIN_DIVSH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_smaxv32hf3_mask_round, "__builtin_ia32_maxph512_mask_round", IX86_BUILTIN_MAXPH512_MASK_ROUND, UNKNOWN, (int) 
V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_sminv32hf3_mask_round, "__builtin_ia32_minph512_mask_round", IX86_BUILTIN_MINPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_smaxv32hf3_mask_round, "__builtin_ia32_maxph512_mask_round", IX86_BUILTIN_MAXPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_sminv32hf3_mask_round, "__builtin_ia32_minph512_mask_round", IX86_BUILTIN_MINPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsmaxv8hf3_mask_round, "__builtin_ia32_maxsh_mask_round", IX86_BUILTIN_MAXSH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsminv8hf3_mask_round, "__builtin_ia32_minsh_mask_round", IX86_BUILTIN_MINSH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_cmpv32hf3_mask_round, "__builtin_ia32_cmpph512_mask_round", IX86_BUILTIN_CMPPH512_MASK_ROUND, UNKNOWN, (int) USI_FTYPE_V32HF_V32HF_INT_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_cmpv32hf3_mask_round, "__builtin_ia32_cmpph512_mask_round", IX86_BUILTIN_CMPPH512_MASK_ROUND, UNKNOWN, (int) USI_FTYPE_V32HF_V32HF_INT_USI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmcmpv8hf3_mask_round, "__builtin_ia32_cmpsh_mask_round", IX86_BUILTIN_CMPSH_MASK_ROUND, UNKNOWN, (int) UQI_FTYPE_V8HF_V8HF_INT_UQI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_sqrtv32hf2_mask_round, "__builtin_ia32_sqrtph512_mask_round", IX86_BUILTIN_SQRTPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_sqrtv32hf2_mask_round, "__builtin_ia32_sqrtph512_mask_round", IX86_BUILTIN_SQRTPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_USI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsqrtv8hf2_mask_round, "__builtin_ia32_sqrtsh_mask_round", IX86_BUILTIN_SQRTSH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_scalefv32hf_mask_round, "__builtin_ia32_scalefph512_mask_round", IX86_BUILTIN_SCALEFPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_scalefv32hf_mask_round, "__builtin_ia32_scalefph512_mask_round", IX86_BUILTIN_SCALEFPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmscalefv8hf_mask_round, "__builtin_ia32_scalefsh_mask_round", IX86_BUILTIN_SCALEFSH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_reducepv32hf_mask_round, "__builtin_ia32_reduceph512_mask_round", IX86_BUILTIN_REDUCEPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_INT_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_reducepv32hf_mask_round, "__builtin_ia32_reduceph512_mask_round", IX86_BUILTIN_REDUCEPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_INT_V32HF_USI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_reducesv8hf_mask_round, "__builtin_ia32_reducesh_mask_round", 
IX86_BUILTIN_REDUCESH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_INT_V8HF_UQI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_rndscalev32hf_mask_round, "__builtin_ia32_rndscaleph512_mask_round", IX86_BUILTIN_RNDSCALEPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_INT_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_rndscalev32hf_mask_round, "__builtin_ia32_rndscaleph512_mask_round", IX86_BUILTIN_RNDSCALEPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_INT_V32HF_USI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_rndscalev8hf_mask_round, "__builtin_ia32_rndscalesh_mask_round", IX86_BUILTIN_RNDSCALESH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_INT_V8HF_UQI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_getexpv32hf_mask_round, "__builtin_ia32_getexpph512_mask", IX86_BUILTIN_GETEXPPH512, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_getexpv32hf_mask_round, "__builtin_ia32_getexpph512_mask", IX86_BUILTIN_GETEXPPH512, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_USI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_sgetexpv8hf_mask_round, "__builtin_ia32_getexpsh_mask_round", IX86_BUILTIN_GETEXPSH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_getmantv32hf_mask_round, "__builtin_ia32_getmantph512_mask", IX86_BUILTIN_GETMANTPH512, UNKNOWN, (int) V32HF_FTYPE_V32HF_INT_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_getmantv32hf_mask_round, "__builtin_ia32_getmantph512_mask", IX86_BUILTIN_GETMANTPH512, UNKNOWN, (int) V32HF_FTYPE_V32HF_INT_V32HF_USI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vgetmantv8hf_mask_round, "__builtin_ia32_getmantsh_mask_round", IX86_BUILTIN_GETMANTSH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_INT_V8HF_UQI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2dq_v16si_mask_round, "__builtin_ia32_vcvtph2dq512_mask_round", IX86_BUILTIN_VCVTPH2DQ512_MASK_ROUND, UNKNOWN, (int) V16SI_FTYPE_V16HF_V16SI_UHI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2udq_v16si_mask_round, "__builtin_ia32_vcvtph2udq512_mask_round", IX86_BUILTIN_VCVTPH2UDQ512_MASK_ROUND, UNKNOWN, (int) V16SI_FTYPE_V16HF_V16SI_UHI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fix_truncv16si2_mask_round, "__builtin_ia32_vcvttph2dq512_mask_round", IX86_BUILTIN_VCVTTPH2DQ512_MASK_ROUND, UNKNOWN, (int) V16SI_FTYPE_V16HF_V16SI_UHI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fixuns_truncv16si2_mask_round, "__builtin_ia32_vcvttph2udq512_mask_round", IX86_BUILTIN_VCVTTPH2UDQ512_MASK_ROUND, UNKNOWN, (int) V16SI_FTYPE_V16HF_V16SI_UHI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2qq_v8di_mask_round, "__builtin_ia32_vcvtph2qq512_mask_round", IX86_BUILTIN_VCVTPH2QQ512_MASK_ROUND, UNKNOWN, (int) V8DI_FTYPE_V8HF_V8DI_UQI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2uqq_v8di_mask_round, "__builtin_ia32_vcvtph2uqq512_mask_round", IX86_BUILTIN_VCVTPH2UQQ512_MASK_ROUND, UNKNOWN, (int) V8DI_FTYPE_V8HF_V8DI_UQI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fix_truncv8di2_mask_round, "__builtin_ia32_vcvttph2qq512_mask_round", IX86_BUILTIN_VCVTTPH2QQ512_MASK_ROUND, UNKNOWN, (int) V8DI_FTYPE_V8HF_V8DI_UQI_INT) -BDESC (0, 
OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fixuns_truncv8di2_mask_round, "__builtin_ia32_vcvttph2uqq512_mask_round", IX86_BUILTIN_VCVTTPH2UQQ512_MASK_ROUND, UNKNOWN, (int) V8DI_FTYPE_V8HF_V8DI_UQI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2w_v32hi_mask_round, "__builtin_ia32_vcvtph2w512_mask_round", IX86_BUILTIN_VCVTPH2W512_MASK_ROUND, UNKNOWN, (int) V32HI_FTYPE_V32HF_V32HI_USI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2uw_v32hi_mask_round, "__builtin_ia32_vcvtph2uw512_mask_round", IX86_BUILTIN_VCVTPH2UW512_MASK_ROUND, UNKNOWN, (int) V32HI_FTYPE_V32HF_V32HI_USI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fix_truncv32hi2_mask_round, "__builtin_ia32_vcvttph2w512_mask_round", IX86_BUILTIN_VCVTTPH2W512_MASK_ROUND, UNKNOWN, (int) V32HI_FTYPE_V32HF_V32HI_USI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fixuns_truncv32hi2_mask_round, "__builtin_ia32_vcvttph2uw512_mask_round", IX86_BUILTIN_VCVTTPH2UW512_MASK_ROUND, UNKNOWN, (int) V32HI_FTYPE_V32HF_V32HI_USI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtw2ph_v32hi_mask_round, "__builtin_ia32_vcvtw2ph512_mask_round", IX86_BUILTIN_VCVTW2PH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HI_V32HF_USI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtuw2ph_v32hi_mask_round, "__builtin_ia32_vcvtuw2ph512_mask_round", IX86_BUILTIN_VCVTUW2PH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HI_V32HF_USI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtdq2ph_v16si_mask_round, "__builtin_ia32_vcvtdq2ph512_mask_round", IX86_BUILTIN_VCVTDQ2PH512_MASK_ROUND, UNKNOWN, (int) V16HF_FTYPE_V16SI_V16HF_UHI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtudq2ph_v16si_mask_round, "__builtin_ia32_vcvtudq2ph512_mask_round", IX86_BUILTIN_VCVTUDQ2PH512_MASK_ROUND, UNKNOWN, (int) V16HF_FTYPE_V16SI_V16HF_UHI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtqq2ph_v8di_mask_round, "__builtin_ia32_vcvtqq2ph512_mask_round", IX86_BUILTIN_VCVTQQ2PH512_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8DI_V8HF_UQI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtuqq2ph_v8di_mask_round, "__builtin_ia32_vcvtuqq2ph512_mask_round", IX86_BUILTIN_VCVTUQQ2PH512_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8DI_V8HF_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_vcvtph2dq_v16si_mask_round, "__builtin_ia32_vcvtph2dq512_mask_round", IX86_BUILTIN_VCVTPH2DQ512_MASK_ROUND, UNKNOWN, (int) V16SI_FTYPE_V16HF_V16SI_UHI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_vcvtph2udq_v16si_mask_round, "__builtin_ia32_vcvtph2udq512_mask_round", IX86_BUILTIN_VCVTPH2UDQ512_MASK_ROUND, UNKNOWN, (int) V16SI_FTYPE_V16HF_V16SI_UHI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_fix_truncv16si2_mask_round, "__builtin_ia32_vcvttph2dq512_mask_round", IX86_BUILTIN_VCVTTPH2DQ512_MASK_ROUND, UNKNOWN, (int) V16SI_FTYPE_V16HF_V16SI_UHI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_fixuns_truncv16si2_mask_round, "__builtin_ia32_vcvttph2udq512_mask_round", IX86_BUILTIN_VCVTTPH2UDQ512_MASK_ROUND, UNKNOWN, (int) V16SI_FTYPE_V16HF_V16SI_UHI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_vcvtph2qq_v8di_mask_round, "__builtin_ia32_vcvtph2qq512_mask_round", IX86_BUILTIN_VCVTPH2QQ512_MASK_ROUND, UNKNOWN, 
(int) V8DI_FTYPE_V8HF_V8DI_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_vcvtph2uqq_v8di_mask_round, "__builtin_ia32_vcvtph2uqq512_mask_round", IX86_BUILTIN_VCVTPH2UQQ512_MASK_ROUND, UNKNOWN, (int) V8DI_FTYPE_V8HF_V8DI_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_fix_truncv8di2_mask_round, "__builtin_ia32_vcvttph2qq512_mask_round", IX86_BUILTIN_VCVTTPH2QQ512_MASK_ROUND, UNKNOWN, (int) V8DI_FTYPE_V8HF_V8DI_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_fixuns_truncv8di2_mask_round, "__builtin_ia32_vcvttph2uqq512_mask_round", IX86_BUILTIN_VCVTTPH2UQQ512_MASK_ROUND, UNKNOWN, (int) V8DI_FTYPE_V8HF_V8DI_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_vcvtph2w_v32hi_mask_round, "__builtin_ia32_vcvtph2w512_mask_round", IX86_BUILTIN_VCVTPH2W512_MASK_ROUND, UNKNOWN, (int) V32HI_FTYPE_V32HF_V32HI_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_vcvtph2uw_v32hi_mask_round, "__builtin_ia32_vcvtph2uw512_mask_round", IX86_BUILTIN_VCVTPH2UW512_MASK_ROUND, UNKNOWN, (int) V32HI_FTYPE_V32HF_V32HI_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_fix_truncv32hi2_mask_round, "__builtin_ia32_vcvttph2w512_mask_round", IX86_BUILTIN_VCVTTPH2W512_MASK_ROUND, UNKNOWN, (int) V32HI_FTYPE_V32HF_V32HI_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_fixuns_truncv32hi2_mask_round, "__builtin_ia32_vcvttph2uw512_mask_round", IX86_BUILTIN_VCVTTPH2UW512_MASK_ROUND, UNKNOWN, (int) V32HI_FTYPE_V32HF_V32HI_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_vcvtw2ph_v32hi_mask_round, "__builtin_ia32_vcvtw2ph512_mask_round", IX86_BUILTIN_VCVTW2PH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HI_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_vcvtuw2ph_v32hi_mask_round, "__builtin_ia32_vcvtuw2ph512_mask_round", IX86_BUILTIN_VCVTUW2PH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HI_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_vcvtdq2ph_v16si_mask_round, "__builtin_ia32_vcvtdq2ph512_mask_round", IX86_BUILTIN_VCVTDQ2PH512_MASK_ROUND, UNKNOWN, (int) V16HF_FTYPE_V16SI_V16HF_UHI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_vcvtudq2ph_v16si_mask_round, "__builtin_ia32_vcvtudq2ph512_mask_round", IX86_BUILTIN_VCVTUDQ2PH512_MASK_ROUND, UNKNOWN, (int) V16HF_FTYPE_V16SI_V16HF_UHI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_vcvtqq2ph_v8di_mask_round, "__builtin_ia32_vcvtqq2ph512_mask_round", IX86_BUILTIN_VCVTQQ2PH512_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8DI_V8HF_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_vcvtuqq2ph_v8di_mask_round, "__builtin_ia32_vcvtuqq2ph512_mask_round", IX86_BUILTIN_VCVTUQQ2PH512_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8DI_V8HF_UQI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsh2si_round, "__builtin_ia32_vcvtsh2si32_round", IX86_BUILTIN_VCVTSH2SI32_ROUND, UNKNOWN, (int) INT_FTYPE_V8HF_INT) BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsh2siq_round, "__builtin_ia32_vcvtsh2si64_round", IX86_BUILTIN_VCVTSH2SI64_ROUND, UNKNOWN, (int) 
INT64_FTYPE_V8HF_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsh2usi_round, "__builtin_ia32_vcvtsh2usi32_round", IX86_BUILTIN_VCVTSH2USI32_ROUND, UNKNOWN, (int) UINT_FTYPE_V8HF_INT) @@ -3285,32 +3285,32 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsi2sh_round, "__b BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsi2shq_round, "__builtin_ia32_vcvtsi2sh64_round", IX86_BUILTIN_VCVTSI2SH64_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_INT64_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtusi2sh_round, "__builtin_ia32_vcvtusi2sh32_round", IX86_BUILTIN_VCVTUSI2SH32_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_UINT_INT) BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtusi2shq_round, "__builtin_ia32_vcvtusi2sh64_round", IX86_BUILTIN_VCVTUSI2SH64_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_UINT64_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_float_extend_phv8df2_mask_round, "__builtin_ia32_vcvtph2pd512_mask_round", IX86_BUILTIN_VCVTPH2PD512_MASK_ROUND, UNKNOWN, (int) V8DF_FTYPE_V8HF_V8DF_UQI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_float_extend_phv16sf2_mask_round, "__builtin_ia32_vcvtph2psx512_mask_round", IX86_BUILTIN_VCVTPH2PSX512_MASK_ROUND, UNKNOWN, (int) V16SF_FTYPE_V16HF_V16SF_UHI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtpd2ph_v8df_mask_round, "__builtin_ia32_vcvtpd2ph512_mask_round", IX86_BUILTIN_VCVTPD2PH512_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8DF_V8HF_UQI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtps2ph_v16sf_mask_round, "__builtin_ia32_vcvtps2phx512_mask_round", IX86_BUILTIN_VCVTPS2PHX512_MASK_ROUND, UNKNOWN, (int) V16HF_FTYPE_V16SF_V16HF_UHI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_float_extend_phv8df2_mask_round, "__builtin_ia32_vcvtph2pd512_mask_round", IX86_BUILTIN_VCVTPH2PD512_MASK_ROUND, UNKNOWN, (int) V8DF_FTYPE_V8HF_V8DF_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_float_extend_phv16sf2_mask_round, "__builtin_ia32_vcvtph2psx512_mask_round", IX86_BUILTIN_VCVTPH2PSX512_MASK_ROUND, UNKNOWN, (int) V16SF_FTYPE_V16HF_V16SF_UHI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_vcvtpd2ph_v8df_mask_round, "__builtin_ia32_vcvtpd2ph512_mask_round", IX86_BUILTIN_VCVTPD2PH512_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8DF_V8HF_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_vcvtps2ph_v16sf_mask_round, "__builtin_ia32_vcvtps2phx512_mask_round", IX86_BUILTIN_VCVTPS2PHX512_MASK_ROUND, UNKNOWN, (int) V16HF_FTYPE_V16SF_V16HF_UHI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsh2ss_mask_round, "__builtin_ia32_vcvtsh2ss_mask_round", IX86_BUILTIN_VCVTSH2SS_MASK_ROUND, UNKNOWN, (int) V4SF_FTYPE_V8HF_V4SF_V4SF_UQI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsh2sd_mask_round, "__builtin_ia32_vcvtsh2sd_mask_round", IX86_BUILTIN_VCVTSH2SD_MASK_ROUND, UNKNOWN, (int) V2DF_FTYPE_V8HF_V2DF_V2DF_UQI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtss2sh_mask_round, "__builtin_ia32_vcvtss2sh_mask_round", IX86_BUILTIN_VCVTSS2SH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V4SF_V8HF_V8HF_UQI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsd2sh_mask_round, "__builtin_ia32_vcvtsd2sh_mask_round", IX86_BUILTIN_VCVTSD2SH_MASK_ROUND, UNKNOWN, (int) 
V8HF_FTYPE_V2DF_V8HF_V8HF_UQI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmaddsub_v32hf_mask_round, "__builtin_ia32_vfmaddsubph512_mask", IX86_BUILTIN_VFMADDSUBPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmaddsub_v32hf_mask3_round, "__builtin_ia32_vfmaddsubph512_mask3", IX86_BUILTIN_VFMADDSUBPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmaddsub_v32hf_maskz_round, "__builtin_ia32_vfmaddsubph512_maskz", IX86_BUILTIN_VFMADDSUBPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmsubadd_v32hf_mask_round, "__builtin_ia32_vfmsubaddph512_mask", IX86_BUILTIN_VFMSUBADDPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmsubadd_v32hf_mask3_round, "__builtin_ia32_vfmsubaddph512_mask3", IX86_BUILTIN_VFMSUBADDPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmsubadd_v32hf_maskz_round, "__builtin_ia32_vfmsubaddph512_maskz", IX86_BUILTIN_VFMSUBADDPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmadd_v32hf_mask_round, "__builtin_ia32_vfmaddph512_mask", IX86_BUILTIN_VFMADDPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmadd_v32hf_mask3_round, "__builtin_ia32_vfmaddph512_mask3", IX86_BUILTIN_VFMADDPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmadd_v32hf_maskz_round, "__builtin_ia32_vfmaddph512_maskz", IX86_BUILTIN_VFMADDPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fnmadd_v32hf_mask_round, "__builtin_ia32_vfnmaddph512_mask", IX86_BUILTIN_VFNMADDPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fnmadd_v32hf_mask3_round, "__builtin_ia32_vfnmaddph512_mask3", IX86_BUILTIN_VFNMADDPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fnmadd_v32hf_maskz_round, "__builtin_ia32_vfnmaddph512_maskz", IX86_BUILTIN_VFNMADDPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmsub_v32hf_mask_round, "__builtin_ia32_vfmsubph512_mask", IX86_BUILTIN_VFMSUBPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmsub_v32hf_mask3_round, "__builtin_ia32_vfmsubph512_mask3", IX86_BUILTIN_VFMSUBPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmsub_v32hf_maskz_round, "__builtin_ia32_vfmsubph512_maskz", IX86_BUILTIN_VFMSUBPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fnmsub_v32hf_mask_round, "__builtin_ia32_vfnmsubph512_mask", IX86_BUILTIN_VFNMSUBPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fnmsub_v32hf_mask3_round, "__builtin_ia32_vfnmsubph512_mask3", IX86_BUILTIN_VFNMSUBPH512_MASK3, UNKNOWN, (int) 
V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fnmsub_v32hf_maskz_round, "__builtin_ia32_vfnmsubph512_maskz", IX86_BUILTIN_VFNMSUBPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fmaddsub_v32hf_mask_round, "__builtin_ia32_vfmaddsubph512_mask", IX86_BUILTIN_VFMADDSUBPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fmaddsub_v32hf_mask3_round, "__builtin_ia32_vfmaddsubph512_mask3", IX86_BUILTIN_VFMADDSUBPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fmaddsub_v32hf_maskz_round, "__builtin_ia32_vfmaddsubph512_maskz", IX86_BUILTIN_VFMADDSUBPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fmsubadd_v32hf_mask_round, "__builtin_ia32_vfmsubaddph512_mask", IX86_BUILTIN_VFMSUBADDPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fmsubadd_v32hf_mask3_round, "__builtin_ia32_vfmsubaddph512_mask3", IX86_BUILTIN_VFMSUBADDPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fmsubadd_v32hf_maskz_round, "__builtin_ia32_vfmsubaddph512_maskz", IX86_BUILTIN_VFMSUBADDPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fmadd_v32hf_mask_round, "__builtin_ia32_vfmaddph512_mask", IX86_BUILTIN_VFMADDPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fmadd_v32hf_mask3_round, "__builtin_ia32_vfmaddph512_mask3", IX86_BUILTIN_VFMADDPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fmadd_v32hf_maskz_round, "__builtin_ia32_vfmaddph512_maskz", IX86_BUILTIN_VFMADDPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fnmadd_v32hf_mask_round, "__builtin_ia32_vfnmaddph512_mask", IX86_BUILTIN_VFNMADDPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fnmadd_v32hf_mask3_round, "__builtin_ia32_vfnmaddph512_mask3", IX86_BUILTIN_VFNMADDPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fnmadd_v32hf_maskz_round, "__builtin_ia32_vfnmaddph512_maskz", IX86_BUILTIN_VFNMADDPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fmsub_v32hf_mask_round, "__builtin_ia32_vfmsubph512_mask", IX86_BUILTIN_VFMSUBPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fmsub_v32hf_mask3_round, "__builtin_ia32_vfmsubph512_mask3", IX86_BUILTIN_VFMSUBPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) +BDESC (0, 
OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fmsub_v32hf_maskz_round, "__builtin_ia32_vfmsubph512_maskz", IX86_BUILTIN_VFMSUBPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fnmsub_v32hf_mask_round, "__builtin_ia32_vfnmsubph512_mask", IX86_BUILTIN_VFNMSUBPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fnmsub_v32hf_mask3_round, "__builtin_ia32_vfnmsubph512_mask3", IX86_BUILTIN_VFNMSUBPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fnmsub_v32hf_maskz_round, "__builtin_ia32_vfnmsubph512_maskz", IX86_BUILTIN_VFNMSUBPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfmadd_v8hf_mask_round, "__builtin_ia32_vfmaddsh3_mask", IX86_BUILTIN_VFMADDSH3_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfmadd_v8hf_mask3_round, "__builtin_ia32_vfmaddsh3_mask3", IX86_BUILTIN_VFMADDSH3_MASK3, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfmadd_v8hf_maskz_round, "__builtin_ia32_vfmaddsh3_maskz", IX86_BUILTIN_VFMADDSH3_MASKZ, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) @@ -3318,18 +3318,18 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfnmadd_v8hf_mask_round BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfnmadd_v8hf_mask3_round, "__builtin_ia32_vfnmaddsh3_mask3", IX86_BUILTIN_VFNMADDSH3_MASK3, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfnmadd_v8hf_maskz_round, "__builtin_ia32_vfnmaddsh3_maskz", IX86_BUILTIN_VFNMADDSH3_MASKZ, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfmsub_v8hf_mask3_round, "__builtin_ia32_vfmsubsh3_mask3", IX86_BUILTIN_VFMSUBSH3_MASK3, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_fma_fmaddc_v32hf_round, "__builtin_ia32_vfmaddcph512_round", IX86_BUILTIN_VFMADDCPH512_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmaddc_v32hf_mask1_round, "__builtin_ia32_vfmaddcph512_mask_round", IX86_BUILTIN_VFMADDCPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmaddc_v32hf_mask_round, "__builtin_ia32_vfmaddcph512_mask3_round", IX86_BUILTIN_VFMADDCPH512_MASK3_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmaddc_v32hf_maskz_round, "__builtin_ia32_vfmaddcph512_maskz_round", IX86_BUILTIN_VFMADDCPH512_MASKZ_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_fma_fcmaddc_v32hf_round, "__builtin_ia32_vfcmaddcph512_round", IX86_BUILTIN_VFCMADDCPH512_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fcmaddc_v32hf_mask1_round, "__builtin_ia32_vfcmaddcph512_mask_round", IX86_BUILTIN_VFCMADDCPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, 
CODE_FOR_avx512bw_fcmaddc_v32hf_mask_round, "__builtin_ia32_vfcmaddcph512_mask3_round", IX86_BUILTIN_VFCMADDCPH512_MASK3_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fcmaddc_v32hf_maskz_round, "__builtin_ia32_vfcmaddcph512_maskz_round", IX86_BUILTIN_VFCMADDCPH512_MASKZ_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fcmulc_v32hf_round, "__builtin_ia32_vfcmulcph512_round", IX86_BUILTIN_VFCMULCPH512_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fcmulc_v32hf_mask_round, "__builtin_ia32_vfcmulcph512_mask_round", IX86_BUILTIN_VFCMULCPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmulc_v32hf_round, "__builtin_ia32_vfmulcph512_round", IX86_BUILTIN_VFMULCPH512_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_INT) -BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmulc_v32hf_mask_round, "__builtin_ia32_vfmulcph512_mask_round", IX86_BUILTIN_VFMULCPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_fma_fmaddc_v32hf_round, "__builtin_ia32_vfmaddcph512_round", IX86_BUILTIN_VFMADDCPH512_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fmaddc_v32hf_mask1_round, "__builtin_ia32_vfmaddcph512_mask_round", IX86_BUILTIN_VFMADDCPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fmaddc_v32hf_mask_round, "__builtin_ia32_vfmaddcph512_mask3_round", IX86_BUILTIN_VFMADDCPH512_MASK3_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fmaddc_v32hf_maskz_round, "__builtin_ia32_vfmaddcph512_maskz_round", IX86_BUILTIN_VFMADDCPH512_MASKZ_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_fma_fcmaddc_v32hf_round, "__builtin_ia32_vfcmaddcph512_round", IX86_BUILTIN_VFCMADDCPH512_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fcmaddc_v32hf_mask1_round, "__builtin_ia32_vfcmaddcph512_mask_round", IX86_BUILTIN_VFCMADDCPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fcmaddc_v32hf_mask_round, "__builtin_ia32_vfcmaddcph512_mask3_round", IX86_BUILTIN_VFCMADDCPH512_MASK3_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fcmaddc_v32hf_maskz_round, "__builtin_ia32_vfcmaddcph512_maskz_round", IX86_BUILTIN_VFCMADDCPH512_MASKZ_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fcmulc_v32hf_round, "__builtin_ia32_vfcmulcph512_round", IX86_BUILTIN_VFCMULCPH512_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fcmulc_v32hf_mask_round, "__builtin_ia32_vfcmulcph512_mask_round", IX86_BUILTIN_VFCMULCPH512_MASK_ROUND, 
UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fmulc_v32hf_round, "__builtin_ia32_vfmulcph512_round", IX86_BUILTIN_VFMULCPH512_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fmulc_v32hf_mask_round, "__builtin_ia32_vfmulcph512_mask_round", IX86_BUILTIN_VFMULCPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fma_fcmaddcsh_v8hf_round, "__builtin_ia32_vfcmaddcsh_round", IX86_BUILTIN_VFCMADDCSH_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fcmaddcsh_v8hf_mask1_round, "__builtin_ia32_vfcmaddcsh_mask_round", IX86_BUILTIN_VFCMADDCSH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fcmaddcsh_v8hf_mask3_round, "__builtin_ia32_vfcmaddcsh_mask3_round", IX86_BUILTIN_VFCMADDCSH_MASK3_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
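Taken together, the hunks above move every 512-bit AVX512FP16 builtin descriptor under the new EVEX512 gate; the scalar (sh) descriptors are deliberately left on AVX512FP16 alone, which is why only the 512-bit entries change. As a rough illustration of the user-visible effect (a sketch only, assuming the negative -mno-evex512 form is accepted once the full series is in; the exact diagnostic text is not defined by this patch):

    /* evex512-fp16-sketch.c -- hypothetical example, not from the patch.
       gcc -O2 -mavx512fp16 -c evex512-fp16-sketch.c: accepted.
       gcc -O2 -mavx512fp16 -mno-evex512 -c evex512-fp16-sketch.c: expected
       to be rejected, since the 512-bit builtin now also requires EVEX512.  */
    #include <immintrin.h>

    __m512h
    fma_ph512 (__m512h a, __m512h b, __m512h c)
    {
      /* Expands to __builtin_ia32_vfmaddph512_mask, one of the
         descriptors gated on OPTION_MASK_ISA2_EVEX512 above.  */
      return _mm512_fmadd_ph (a, b, c);
    }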
From patchwork Thu Sep 21 07:20:07 2023
From: "Hu, Lin1"
To: gcc-patches@gcc.gnu.org
Cc: hongtao.liu@intel.com, ubizjak@gmail.com, haochen.jiang@intel.com
Subject: [PATCH 12/18] Disable zmm register and 512 bit libmvec call when !TARGET_EVEX512
Date: Thu, 21 Sep 2023 15:20:07 +0800
Message-Id: <20230921072013.2124750-13-lin1.hu@intel.com>
In-Reply-To: <20230921072013.2124750-1-lin1.hu@intel.com>
References: <20230921072013.2124750-1-lin1.hu@intel.com>

From: Haochen Jiang

gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_broadcast_from_constant): Disable zmm broadcast for !TARGET_EVEX512.
* config/i386/i386-options.cc (ix86_option_override_internal): Do not use PVW_512 when no-evex512.
(ix86_simd_clone_adjust): Add evex512 target into string.
* config/i386/i386.cc (type_natural_mode): Report ABI warning when using zmm register w/o evex512.
(ix86_return_in_memory): Do not allow zmm when !TARGET_EVEX512.
(ix86_hard_regno_mode_ok): Ditto.
(ix86_set_reg_reg_cost): Ditto.
(ix86_rtx_costs): Ditto.
(ix86_vector_mode_supported_p): Ditto.
(ix86_preferred_simd_mode): Ditto.
(ix86_get_mask_mode): Ditto.
(ix86_simd_clone_compute_vecsize_and_simdlen): Disable 512 bit libmvec call when !TARGET_EVEX512.
(ix86_simd_clone_usable): Ditto.
* config/i386/i386.h (BIGGEST_ALIGNMENT): Disable 512 alignment when !TARGET_EVEX512.
(MOVE_MAX): Do not use PVW_512 when !TARGET_EVEX512.
(STORE_MAX_PIECES): Ditto.
--- gcc/config/i386/i386-expand.cc | 1 + gcc/config/i386/i386-options.cc | 14 +++++---- gcc/config/i386/i386.cc | 53 ++++++++++++++++++--------------- gcc/config/i386/i386.h | 7 +++-- 4 files changed, 42 insertions(+), 33 deletions(-) diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc index e42ff27c6ef..6eedcb384c0 100644 --- a/gcc/config/i386/i386-expand.cc +++ b/gcc/config/i386/i386-expand.cc @@ -611,6 +611,7 @@ ix86_broadcast_from_constant (machine_mode mode, rtx op) avx512 embed broadcast is available.
*/ if (GET_MODE_INNER (mode) == DImode && !TARGET_64BIT && (!TARGET_AVX512F + || (GET_MODE_SIZE (mode) == 64 && !TARGET_EVEX512) || (GET_MODE_SIZE (mode) < 64 && !TARGET_AVX512VL))) return nullptr; diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc index a1a7a92da9f..e2a90d7d9e2 100644 --- a/gcc/config/i386/i386-options.cc +++ b/gcc/config/i386/i386-options.cc @@ -2845,7 +2845,8 @@ ix86_option_override_internal (bool main_args_p, opts->x_ix86_move_max = opts->x_prefer_vector_width_type; if (opts_set->x_ix86_move_max == PVW_NONE) { - if (TARGET_AVX512F_P (opts->x_ix86_isa_flags)) + if (TARGET_AVX512F_P (opts->x_ix86_isa_flags) + && TARGET_EVEX512_P (opts->x_ix86_isa_flags2)) opts->x_ix86_move_max = PVW_AVX512; else opts->x_ix86_move_max = PVW_AVX128; @@ -2866,7 +2867,8 @@ ix86_option_override_internal (bool main_args_p, opts->x_ix86_store_max = opts->x_prefer_vector_width_type; if (opts_set->x_ix86_store_max == PVW_NONE) { - if (TARGET_AVX512F_P (opts->x_ix86_isa_flags)) + if (TARGET_AVX512F_P (opts->x_ix86_isa_flags) + && TARGET_EVEX512_P (opts->x_ix86_isa_flags2)) opts->x_ix86_store_max = PVW_AVX512; else opts->x_ix86_store_max = PVW_AVX128; @@ -3145,13 +3147,13 @@ ix86_simd_clone_adjust (struct cgraph_node *node) case 'e': if (TARGET_PREFER_AVX256) { - if (!TARGET_AVX512F) - str = "avx512f,prefer-vector-width=512"; + if (!TARGET_AVX512F || !TARGET_EVEX512) + str = "avx512f,evex512,prefer-vector-width=512"; else str = "prefer-vector-width=512"; } - else if (!TARGET_AVX512F) - str = "avx512f"; + else if (!TARGET_AVX512F || !TARGET_EVEX512) + str = "avx512f,evex512"; break; default: gcc_unreachable (); diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc index 477e6cecc38..0df3bf10547 100644 --- a/gcc/config/i386/i386.cc +++ b/gcc/config/i386/i386.cc @@ -1924,7 +1924,8 @@ type_natural_mode (const_tree type, const CUMULATIVE_ARGS *cum, if (GET_MODE_NUNITS (mode) == TYPE_VECTOR_SUBPARTS (type) && GET_MODE_INNER (mode) == innermode) { - if (size == 64 && !TARGET_AVX512F && !TARGET_IAMCU) + if (size == 64 && (!TARGET_AVX512F || !TARGET_EVEX512) + && !TARGET_IAMCU) { static bool warnedavx512f; static bool warnedavx512f_ret; @@ -4347,7 +4348,7 @@ ix86_return_in_memory (const_tree type, const_tree fntype ATTRIBUTE_UNUSED) /* AVX512F values are returned in ZMM0 if available. */ if (size == 64) - return !TARGET_AVX512F; + return !TARGET_AVX512F || !TARGET_EVEX512; } if (mode == XFmode) @@ -20286,7 +20287,7 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode) - any of 512-bit wide vector mode - any scalar mode. */ if (TARGET_AVX512F - && (VALID_AVX512F_REG_OR_XI_MODE (mode) + && ((VALID_AVX512F_REG_OR_XI_MODE (mode) && TARGET_EVEX512) || VALID_AVX512F_SCALAR_MODE (mode))) return true; @@ -20538,7 +20539,7 @@ ix86_set_reg_reg_cost (machine_mode mode) case MODE_VECTOR_INT: case MODE_VECTOR_FLOAT: - if ((TARGET_AVX512F && VALID_AVX512F_REG_MODE (mode)) + if ((TARGET_AVX512F && TARGET_EVEX512 && VALID_AVX512F_REG_MODE (mode)) || (TARGET_AVX && VALID_AVX256_REG_MODE (mode)) || (TARGET_SSE2 && VALID_SSE2_REG_MODE (mode)) || (TARGET_SSE && VALID_SSE_REG_MODE (mode)) @@ -21267,7 +21268,8 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno, { /* (ior (not ...) ...) can be a single insn in AVX512. 
*/ if (GET_CODE (XEXP (x, 0)) == NOT && TARGET_AVX512F - && (GET_MODE_SIZE (mode) == 64 + && ((TARGET_EVEX512 + && GET_MODE_SIZE (mode) == 64) || (TARGET_AVX512VL && (GET_MODE_SIZE (mode) == 32 || GET_MODE_SIZE (mode) == 16)))) @@ -21315,7 +21317,8 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno, /* (and (not ...) (not ...)) can be a single insn in AVX512. */ if (GET_CODE (right) == NOT && TARGET_AVX512F - && (GET_MODE_SIZE (mode) == 64 + && ((TARGET_EVEX512 + && GET_MODE_SIZE (mode) == 64) || (TARGET_AVX512VL && (GET_MODE_SIZE (mode) == 32 || GET_MODE_SIZE (mode) == 16)))) @@ -21385,7 +21388,8 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno, { /* (not (xor ...)) can be a single insn in AVX512. */ if (GET_CODE (XEXP (x, 0)) == XOR && TARGET_AVX512F - && (GET_MODE_SIZE (mode) == 64 + && ((TARGET_EVEX512 + && GET_MODE_SIZE (mode) == 64) || (TARGET_AVX512VL && (GET_MODE_SIZE (mode) == 32 || GET_MODE_SIZE (mode) == 16)))) @@ -23000,7 +23004,7 @@ ix86_vector_mode_supported_p (machine_mode mode) return true; if (TARGET_AVX && VALID_AVX256_REG_MODE (mode)) return true; - if (TARGET_AVX512F && VALID_AVX512F_REG_MODE (mode)) + if (TARGET_AVX512F && TARGET_EVEX512 && VALID_AVX512F_REG_MODE (mode)) return true; if ((TARGET_MMX || TARGET_MMX_WITH_SSE) && VALID_MMX_REG_MODE (mode)) @@ -23690,7 +23694,7 @@ ix86_preferred_simd_mode (scalar_mode mode) switch (mode) { case E_QImode: - if (TARGET_AVX512BW && !TARGET_PREFER_AVX256) + if (TARGET_AVX512BW && TARGET_EVEX512 && !TARGET_PREFER_AVX256) return V64QImode; else if (TARGET_AVX && !TARGET_PREFER_AVX128) return V32QImode; @@ -23698,7 +23702,7 @@ ix86_preferred_simd_mode (scalar_mode mode) return V16QImode; case E_HImode: - if (TARGET_AVX512BW && !TARGET_PREFER_AVX256) + if (TARGET_AVX512BW && TARGET_EVEX512 && !TARGET_PREFER_AVX256) return V32HImode; else if (TARGET_AVX && !TARGET_PREFER_AVX128) return V16HImode; @@ -23706,7 +23710,7 @@ ix86_preferred_simd_mode (scalar_mode mode) return V8HImode; case E_SImode: - if (TARGET_AVX512F && !TARGET_PREFER_AVX256) + if (TARGET_AVX512F && TARGET_EVEX512 && !TARGET_PREFER_AVX256) return V16SImode; else if (TARGET_AVX && !TARGET_PREFER_AVX128) return V8SImode; @@ -23714,7 +23718,7 @@ ix86_preferred_simd_mode (scalar_mode mode) return V4SImode; case E_DImode: - if (TARGET_AVX512F && !TARGET_PREFER_AVX256) + if (TARGET_AVX512F && TARGET_EVEX512 && !TARGET_PREFER_AVX256) return V8DImode; else if (TARGET_AVX && !TARGET_PREFER_AVX128) return V4DImode; @@ -23728,15 +23732,16 @@ ix86_preferred_simd_mode (scalar_mode mode) { if (TARGET_PREFER_AVX128) return V8HFmode; - else if (TARGET_PREFER_AVX256) + else if (TARGET_PREFER_AVX256 || !TARGET_EVEX512) return V16HFmode; } - return V32HFmode; + if (TARGET_EVEX512) + return V32HFmode; } return word_mode; case E_SFmode: - if (TARGET_AVX512F && !TARGET_PREFER_AVX256) + if (TARGET_AVX512F && TARGET_EVEX512 && !TARGET_PREFER_AVX256) return V16SFmode; else if (TARGET_AVX && !TARGET_PREFER_AVX128) return V8SFmode; @@ -23744,7 +23749,7 @@ ix86_preferred_simd_mode (scalar_mode mode) return V4SFmode; case E_DFmode: - if (TARGET_AVX512F && !TARGET_PREFER_AVX256) + if (TARGET_AVX512F && TARGET_EVEX512 && !TARGET_PREFER_AVX256) return V8DFmode; else if (TARGET_AVX && !TARGET_PREFER_AVX128) return V4DFmode; @@ -23764,13 +23769,13 @@ ix86_preferred_simd_mode (scalar_mode mode) static unsigned int ix86_autovectorize_vector_modes (vector_modes *modes, bool all) { - if (TARGET_AVX512F && !TARGET_PREFER_AVX256) + if (TARGET_AVX512F && 
TARGET_EVEX512 && !TARGET_PREFER_AVX256) { modes->safe_push (V64QImode); modes->safe_push (V32QImode); modes->safe_push (V16QImode); } - else if (TARGET_AVX512F && all) + else if (TARGET_AVX512F && TARGET_EVEX512 && all) { modes->safe_push (V32QImode); modes->safe_push (V16QImode); @@ -23808,7 +23813,7 @@ ix86_get_mask_mode (machine_mode data_mode) unsigned elem_size = vector_size / nunits; /* Scalar mask case. */ - if ((TARGET_AVX512F && vector_size == 64) + if ((TARGET_AVX512F && TARGET_EVEX512 && vector_size == 64) || (TARGET_AVX512VL && (vector_size == 32 || vector_size == 16))) { if (elem_size == 4 @@ -24306,7 +24311,7 @@ ix86_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node, { /* If the function isn't exported, we can pick up just one ISA for the clones. */ - if (TARGET_AVX512F) + if (TARGET_AVX512F && TARGET_EVEX512) clonei->vecsize_mangle = 'e'; else if (TARGET_AVX2) clonei->vecsize_mangle = 'd'; @@ -24398,17 +24403,17 @@ ix86_simd_clone_usable (struct cgraph_node *node) return -1; if (!TARGET_AVX) return 0; - return TARGET_AVX512F ? 3 : TARGET_AVX2 ? 2 : 1; + return (TARGET_AVX512F && TARGET_EVEX512) ? 3 : TARGET_AVX2 ? 2 : 1; case 'c': if (!TARGET_AVX) return -1; - return TARGET_AVX512F ? 2 : TARGET_AVX2 ? 1 : 0; + return (TARGET_AVX512F && TARGET_EVEX512) ? 2 : TARGET_AVX2 ? 1 : 0; case 'd': if (!TARGET_AVX2) return -1; - return TARGET_AVX512F ? 1 : 0; + return (TARGET_AVX512F && TARGET_EVEX512) ? 1 : 0; case 'e': - if (!TARGET_AVX512F) + if (!TARGET_AVX512F || !TARGET_EVEX512) return -1; return 0; default: diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h index 3e8488f2ae8..aac972f5caf 100644 --- a/gcc/config/i386/i386.h +++ b/gcc/config/i386/i386.h @@ -770,7 +770,8 @@ extern const char *host_detect_local_cpu (int argc, const char **argv); TARGET_ABSOLUTE_BIGGEST_ALIGNMENT. */ #define BIGGEST_ALIGNMENT \ - (TARGET_IAMCU ? 32 : (TARGET_AVX512F ? 512 : (TARGET_AVX ? 256 : 128))) + (TARGET_IAMCU ? 32 : ((TARGET_AVX512F && TARGET_EVEX512) \ + ? 512 : (TARGET_AVX ? 256 : 128))) /* Maximum stack alignment. */ #define MAX_STACK_ALIGNMENT MAX_OFILE_ALIGNMENT @@ -1807,7 +1808,7 @@ typedef struct ix86_args { MOVE_MAX_PIECES defaults to MOVE_MAX. */ #define MOVE_MAX \ - ((TARGET_AVX512F \ + ((TARGET_AVX512F && TARGET_EVEX512\ && (ix86_move_max == PVW_AVX512 \ || ix86_store_max == PVW_AVX512)) \ ? 64 \ @@ -1826,7 +1827,7 @@ typedef struct ix86_args { store_by_pieces of 16/32/64 bytes. */ #define STORE_MAX_PIECES \ (TARGET_INTER_UNIT_MOVES_TO_VEC \ - ? ((TARGET_AVX512F && ix86_store_max == PVW_AVX512) \ + ? ((TARGET_AVX512F && TARGET_EVEX512 && ix86_store_max == PVW_AVX512) \ ? 
64 \ : ((TARGET_AVX \ && ix86_store_max >= PVW_AVX256) \
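Before moving on to the next patch in the series, a sketch of what the hunks above mean in practice: once AVX512F is enabled without EVEX512, ix86_preferred_simd_mode, MOVE_MAX and BIGGEST_ALIGNMENT all stop at 256 bits, so ordinary auto-vectorized code stays in ymm registers. The file name and exact codegen below are illustrative assumptions, not taken from the patch:

    /* evex512-width-sketch.c -- hypothetical example.
       -O3 -mavx512f              : may vectorize with zmm (V16SImode).
       -O3 -mavx512f -mno-evex512 : expected to stay at ymm (V8SImode),
       since ix86_preferred_simd_mode now requires TARGET_EVEX512 before
       returning a 512-bit mode.  */
    void
    add_two (int *restrict a, const int *restrict b, int n)
    {
      for (int i = 0; i < n; i++)
        a[i] += 2 * b[i];
    }

The same gate shows up on the ABI side: type_natural_mode now issues the existing 64-byte-vector ABI warning when EVEX512 is off, mirroring the long-standing !TARGET_AVX512F case.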
From patchwork Thu Sep 21 07:20:08 2023
From: "Hu, Lin1"
To: gcc-patches@gcc.gnu.org
Cc: hongtao.liu@intel.com, ubizjak@gmail.com, haochen.jiang@intel.com
Subject: [PATCH 13/18] Support -mevex512 for AVX512F intrins
Date: Thu, 21 Sep 2023 15:20:08 +0800
Message-Id: <20230921072013.2124750-14-lin1.hu@intel.com>
In-Reply-To: <20230921072013.2124750-1-lin1.hu@intel.com>
References: <20230921072013.2124750-1-lin1.hu@intel.com>

From: Haochen Jiang

gcc/ChangeLog:

* config/i386/i386-builtins.cc (ix86_vectorize_builtin_gather): Disable 512 bit gather when !TARGET_EVEX512.
* config/i386/i386-expand.cc (ix86_valid_mask_cmp_mode): Add TARGET_EVEX512.
(ix86_expand_int_sse_cmp): Ditto.
(ix86_expand_vector_init_one_nonzero): Disable subroutine when !TARGET_EVEX512.
(ix86_emit_swsqrtsf): Add TARGET_EVEX512.
(ix86_vectorize_vec_perm_const): Disable subroutine when !TARGET_EVEX512.
* config/i386/i386.cc (standard_sse_constant_p): Add TARGET_EVEX512.
(standard_sse_constant_opcode): Ditto.
(ix86_get_ssemov): Ditto.
(ix86_legitimate_constant_p): Ditto.
(ix86_vectorize_builtin_scatter): Disable 512 bit scatter when !TARGET_EVEX512.
* config/i386/i386.md (avx512f_512): New.
(movxi): Add TARGET_EVEX512.
(*movxi_internal_avx512f): Ditto.
(*movdi_internal): Change alternative 12 to ?Yv. Adjust mode for alternative 13.
(*movsi_internal): Change alternative 8 to ?Yv. Adjust mode for alternative 9.
(*movhi_internal): Change alternative 11 to *Yv.
(*movdf_internal): Change alternative 12 to Yv.
(*movsf_internal): Change alternative 5 to Yv. Adjust mode for alternative 5 and 6.
(*mov_internal): Change alternative 4 to Yv.
(define_split for convert SF to DF): Add TARGET_EVEX512.
(extendbfsf2_1): Ditto.
* config/i386/predicates.md (bcst_mem_operand): Disable predicate for 512 bit when !TARGET_EVEX512.
* config/i386/sse.md (VMOVE): Add TARGET_EVEX512.
(V48_AVX512VL): Ditto.
(V48_256_512_AVX512VL): Ditto.
(V48H_AVX512VL): Ditto.
(VI12_AVX512VL): Ditto.
(V): Ditto.
(V_512): Ditto.
(V_256_512): Ditto.
(VF): Ditto.
(VF1_VF2_AVX512DQ): Ditto.
(VFH): Ditto.
(VFB): Ditto.
(VF1): Ditto.
(VF1_AVX2): Ditto.
(VF2): Ditto.
(VF2H): Ditto.
(VF2_512_256): Ditto.
(VF2_512_256VL): Ditto.
(VF_512): Ditto.
(VFB_512): Ditto.
(VI48_AVX512VL): Ditto.
(VI1248_AVX512VLBW): Ditto.
(VF_AVX512VL): Ditto.
(VFH_AVX512VL): Ditto.
(VF1_AVX512VL): Ditto.
(VI): Ditto.
(VIHFBF): Ditto.
(VI_AVX2): Ditto.
(VI8): Ditto.
(VI8_AVX512VL): Ditto.
(VI2_AVX512F): Ditto.
(VI4_AVX512F): Ditto.
(VI4_AVX512VL): Ditto.
(VI48_AVX512F_AVX512VL): Ditto.
(VI8_AVX2_AVX512F): Ditto.
(VI8_AVX_AVX512F): Ditto.
(V8FI): Ditto.
(V16FI): Ditto.
(VI124_AVX2_24_AVX512F_1_AVX512BW): Ditto.
(VI248_AVX512VLBW): Ditto.
(VI248_AVX2_8_AVX512F_24_AVX512BW): Ditto.
(VI248_AVX512BW): Ditto.
(VI248_AVX512BW_AVX512VL): Ditto.
(VI48_AVX512F): Ditto.
(VI48_AVX_AVX512F): Ditto.
(VI12_AVX_AVX512F): Ditto.
(VI148_512): Ditto.
(VI124_256_AVX512F_AVX512BW): Ditto.
(VI48_512): Ditto.
(VI_AVX512BW): Ditto.
(VIHFBF_AVX512BW): Ditto.
(VI4F_256_512): Ditto.
(VI48F_256_512): Ditto.
(VI48F): Ditto.
(VI12_VI48F_AVX512VL): Ditto.
(V32_512): Ditto.
(AVX512MODE2P): Ditto.
(STORENT_MODE): Ditto.
(REDUC_PLUS_MODE): Ditto.
(REDUC_SMINMAX_MODE): Ditto.
(*andnot3): Change isa attribute to avx512f_512.
(*andnot3): Ditto.
(3): Ditto.
(tf3): Ditto.
(FMAMODEM): Add TARGET_EVEX512.
(FMAMODE_AVX512): Ditto.
(VFH_SF_AVX512VL): Ditto.
(avx512f_fix_notruncv16sfv16si): Ditto.
(fix_truncv16sfv16si2): Ditto.
(avx512f_cvtdq2pd512_2): Ditto.
(avx512f_cvtpd2dq512): Ditto.
(fix_truncv8dfv8si2): Ditto.
(avx512f_cvtpd2ps512): Ditto.
(vec_unpacks_lo_v16sf): Ditto.
(vec_unpacks_hi_v16sf): Ditto.
(vec_unpacks_float_hi_v16si): Ditto.
(vec_unpacks_float_lo_v16si): Ditto.
(vec_unpacku_float_hi_v16si): Ditto.
(vec_unpacku_float_lo_v16si): Ditto.
(vec_pack_sfix_trunc_v8df): Ditto.
(avx512f_vec_pack_sfix_v8df): Ditto.
(avx512f_unpckhps512): Ditto.
(avx512f_unpcklps512): Ditto.
(avx512f_movshdup512): Ditto.
(avx512f_movsldup512): Ditto.
(AVX512_VEC): Ditto.
(AVX512_VEC_2): Ditto.
(vec_extract_lo_v64qi): Ditto.
(vec_extract_hi_v64qi): Ditto.
(VEC_EXTRACT_MODE): Ditto.
(avx512f_unpckhpd512): Ditto.
(avx512f_movddup512): Ditto.
(avx512f_unpcklpd512): Ditto.
(*_vternlog_all): Ditto.
(*_vpternlog_1): Ditto.
(*_vpternlog_2): Ditto.
(*_vpternlog_3): Ditto.
(avx512f_shufps512_mask): Ditto.
(avx512f_shufps512_1): Ditto.
(avx512f_shufpd512_mask): Ditto.
(avx512f_shufpd512_1): Ditto.
(avx512f_interleave_highv8di): Ditto.
(avx512f_interleave_lowv8di): Ditto.
(vec_dupv2df): Ditto.
(trunc2): Ditto.
(*avx512f_2): Ditto.
(*avx512f_vpermvar_truncv8div8si_1): Ditto.
(avx512f_2_mask): Ditto.
(avx512f_2_mask_store): Ditto.
(truncv8div8qi2): Ditto.
(avx512f_v8div16qi2): Ditto.
(*avx512f_v8div16qi2_store_1): Ditto.
(*avx512f_v8div16qi2_store_2): Ditto.
(avx512f_v8div16qi2_mask): Ditto.
(*avx512f_v8div16qi2_mask_1): Ditto.
(*avx512f_v8div16qi2_mask_store_1): Ditto.
(avx512f_v8div16qi2_mask_store_2): Ditto.
(vec_widen_umult_even_v16si): Ditto.
(*vec_widen_umult_even_v16si): Ditto.
(vec_widen_smult_even_v16si): Ditto.
(*vec_widen_smult_even_v16si): Ditto.
(VEC_PERM_AVX2): Ditto.
(one_cmpl2): Ditto.
(one_cmpl2): Ditto.
(*one_cmpl2_pternlog_false_dep): Ditto.
(define_split to xor): Ditto.
(*andnot3): Ditto.
(define_split for ior): Ditto.
(*iornot3): Ditto.
(*xnor3): Ditto.
(*3): Ditto.
(avx512f_interleave_highv16si): Ditto.
(avx512f_interleave_lowv16si): Ditto.
(avx512f_pshufdv3_mask): Ditto.
(avx512f_pshufd_1): Ditto.
(*vec_extractv4ti): Ditto.
(VEXTRACTI128_MODE): Ditto.
(define_split to vec_extract): Ditto.
(VI1248_AVX512VL_AVX512BW): Ditto.
(avx512f_v16qiv16si2): Ditto.
(v16qiv16si2): Ditto.
(avx512f_v16hiv16si2): Ditto.
(v16hiv16si2): Ditto.
(avx512f_zero_extendv16hiv16si2_1): Ditto.
(avx512f_v8qiv8di2): Ditto.
(*avx512f_v8qiv8di2_1): Ditto.
(*avx512f_v8qiv8di2_2): Ditto.
(v8qiv8di2): Ditto.
(avx512f_v8hiv8di2): Ditto.
(v8hiv8di2): Ditto.
(avx512f_v8siv8di2): Ditto.
(*avx512f_zero_extendv8siv8di2_1): Ditto.
(*avx512f_zero_extendv8siv8di2_2): Ditto.
(v8siv8di2): Ditto.
(avx512f_roundps512_sfix): Ditto.
(vashrv8di3): Ditto.
(vashrv16si3): Ditto.
(pbroadcast_evex_isa): Change isa attribute to avx512f_512.
(vec_dupv4sf): Add TARGET_EVEX512.
(*vec_dupv4si): Ditto.
(*vec_dupv2di): Ditto.
(vec_dup): Change isa attribute to avx512f_512.
(VPERMI2): Add TARGET_EVEX512.
(VPERMI2I): Ditto.
(VEC_INIT_MODE): Ditto.
(VEC_INIT_HALF_MODE): Ditto.
(avx512f_vcvtph2ps512): Ditto.
(avx512f_vcvtps2ph512_mask_sae): Ditto.
(avx512f_vcvtps2ph512): Ditto.
(*avx512f_vcvtps2ph512): Ditto.
(INT_BROADCAST_MODE): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr89229-5b.c: Modify message of scan-assembler.
* gcc.target/i386/pr89229-6b.c: Ditto.
* gcc.target/i386/pr89229-7b.c: Ditto.
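As a concrete reading of the i386-builtins.cc entry at the top of this log, the vectorizer is simply never offered a 512-bit gather builtin once EVEX512 is off, so loops like the one below fall back to narrower gathers or scalar code. A hedged sketch, not taken from the patch (file name and options are illustrative):

    /* evex512-gather-sketch.c -- hypothetical example.
       -O3 -mavx512f              : a 512-bit gather may be used here.
       -O3 -mavx512f -mno-evex512 : ix86_vectorize_builtin_gather now
       returns NULL_TREE for 64-byte vector modes, leaving at most
       256-bit gathers available.  */
    void
    gather_add (double *restrict dst, const double *restrict src,
                const int *restrict idx, int n)
    {
      for (int i = 0; i < n; i++)
        dst[i] = src[idx[i]] + 1.0;
    }

The scatter-side counterpart, ix86_vectorize_builtin_scatter, gets the same 64-byte mode check further down in the diff that follows.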
--- gcc/config/i386/i386-builtins.cc | 24 +- gcc/config/i386/i386-expand.cc | 12 +- gcc/config/i386/i386.cc | 101 ++-- gcc/config/i386/i386.md | 81 ++- gcc/config/i386/predicates.md | 3 +- gcc/config/i386/sse.md | 553 +++++++++++---------- gcc/testsuite/gcc.target/i386/pr89229-5b.c | 2 +- gcc/testsuite/gcc.target/i386/pr89229-6b.c | 2 +- gcc/testsuite/gcc.target/i386/pr89229-7b.c | 2 +- 9 files changed, 445 insertions(+), 335 deletions(-) diff --git a/gcc/config/i386/i386-builtins.cc b/gcc/config/i386/i386-builtins.cc index e1d1dac2ba2..70538fbe17b 100644 --- a/gcc/config/i386/i386-builtins.cc +++ b/gcc/config/i386/i386-builtins.cc @@ -1675,6 +1675,10 @@ ix86_vectorize_builtin_gather (const_tree mem_vectype, { bool si; enum ix86_builtins code; + const machine_mode mode = TYPE_MODE (TREE_TYPE (mem_vectype)); + + if ((!TARGET_AVX512F || !TARGET_EVEX512) && GET_MODE_SIZE (mode) == 64) + return NULL_TREE; if (! TARGET_AVX2 || (known_eq (TYPE_VECTOR_SUBPARTS (mem_vectype), 2u) @@ -1755,28 +1759,16 @@ ix86_vectorize_builtin_gather (const_tree mem_vectype, code = si ? IX86_BUILTIN_GATHERSIV8SI : IX86_BUILTIN_GATHERALTDIV8SI; break; case E_V8DFmode: - if (TARGET_AVX512F) - code = si ? IX86_BUILTIN_GATHER3ALTSIV8DF : IX86_BUILTIN_GATHER3DIV8DF; - else - return NULL_TREE; + code = si ? IX86_BUILTIN_GATHER3ALTSIV8DF : IX86_BUILTIN_GATHER3DIV8DF; break; case E_V8DImode: - if (TARGET_AVX512F) - code = si ? IX86_BUILTIN_GATHER3ALTSIV8DI : IX86_BUILTIN_GATHER3DIV8DI; - else - return NULL_TREE; + code = si ? IX86_BUILTIN_GATHER3ALTSIV8DI : IX86_BUILTIN_GATHER3DIV8DI; break; case E_V16SFmode: - if (TARGET_AVX512F) - code = si ? IX86_BUILTIN_GATHER3SIV16SF : IX86_BUILTIN_GATHER3ALTDIV16SF; - else - return NULL_TREE; + code = si ? IX86_BUILTIN_GATHER3SIV16SF : IX86_BUILTIN_GATHER3ALTDIV16SF; break; case E_V16SImode: - if (TARGET_AVX512F) - code = si ? IX86_BUILTIN_GATHER3SIV16SI : IX86_BUILTIN_GATHER3ALTDIV16SI; - else - return NULL_TREE; + code = si ? IX86_BUILTIN_GATHER3SIV16SI : IX86_BUILTIN_GATHER3ALTDIV16SI; break; default: return NULL_TREE; diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc index 6eedcb384c0..0705e08d38c 100644 --- a/gcc/config/i386/i386-expand.cc +++ b/gcc/config/i386/i386-expand.cc @@ -3943,7 +3943,7 @@ ix86_valid_mask_cmp_mode (machine_mode mode) if ((inner_mode == QImode || inner_mode == HImode) && !TARGET_AVX512BW) return false; - return vector_size == 64 || TARGET_AVX512VL; + return (vector_size == 64 && TARGET_EVEX512) || TARGET_AVX512VL; } /* Return true if integer mask comparison should be used. */ @@ -4773,7 +4773,7 @@ ix86_expand_int_sse_cmp (rtx dest, enum rtx_code code, rtx cop0, rtx cop1, && GET_MODE_SIZE (GET_MODE_INNER (mode)) >= 4 /* Don't do it if not using integer masks and we'd end up with the right values in the registers though. 
*/ - && (GET_MODE_SIZE (mode) == 64 + && ((GET_MODE_SIZE (mode) == 64 && TARGET_EVEX512) || !vector_all_ones_operand (optrue, data_mode) || opfalse != CONST0_RTX (data_mode)))) { @@ -15668,6 +15668,9 @@ ix86_expand_vector_init_one_nonzero (bool mmx_ok, machine_mode mode, bool use_vector_set = false; rtx (*gen_vec_set_0) (rtx, rtx, rtx) = NULL; + if (GET_MODE_SIZE (mode) == 64 && !TARGET_EVEX512) + return false; + switch (mode) { case E_V2DImode: @@ -18288,7 +18291,7 @@ ix86_emit_swsqrtsf (rtx res, rtx a, machine_mode mode, bool recip) unsigned vector_size = GET_MODE_SIZE (mode); if (TARGET_FMA - || (TARGET_AVX512F && vector_size == 64) + || (TARGET_AVX512F && TARGET_EVEX512 && vector_size == 64) || (TARGET_AVX512VL && (vector_size == 32 || vector_size == 16))) emit_insn (gen_rtx_SET (e2, gen_rtx_FMA (mode, e0, x0, mthree))); @@ -23005,6 +23008,9 @@ ix86_vectorize_vec_perm_const (machine_mode vmode, machine_mode op_mode, unsigned int i, nelt, which; bool two_args; + if (GET_MODE_SIZE (vmode) == 64 && !TARGET_EVEX512) + return false; + /* For HF mode vector, convert it to HI using subreg. */ if (GET_MODE_INNER (vmode) == HFmode) { diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc index 0df3bf10547..635dd85e764 100644 --- a/gcc/config/i386/i386.cc +++ b/gcc/config/i386/i386.cc @@ -5263,7 +5263,7 @@ standard_sse_constant_p (rtx x, machine_mode pred_mode) switch (GET_MODE_SIZE (mode)) { case 64: - if (TARGET_AVX512F) + if (TARGET_AVX512F && TARGET_EVEX512) return 2; break; case 32: @@ -5313,9 +5313,14 @@ standard_sse_constant_opcode (rtx_insn *insn, rtx *operands) case MODE_XI: case MODE_OI: if (EXT_REX_SSE_REG_P (operands[0])) - return (TARGET_AVX512VL - ? "vpxord\t%x0, %x0, %x0" - : "vpxord\t%g0, %g0, %g0"); + { + if (TARGET_AVX512VL) + return "vpxord\t%x0, %x0, %x0"; + else if (TARGET_EVEX512) + return "vpxord\t%g0, %g0, %g0"; + else + gcc_unreachable (); + } return "vpxor\t%x0, %x0, %x0"; case MODE_V2DF: @@ -5324,16 +5329,23 @@ standard_sse_constant_opcode (rtx_insn *insn, rtx *operands) /* FALLTHRU */ case MODE_V8DF: case MODE_V4DF: - if (!EXT_REX_SSE_REG_P (operands[0])) - return "vxorpd\t%x0, %x0, %x0"; - else if (TARGET_AVX512DQ) - return (TARGET_AVX512VL - ? "vxorpd\t%x0, %x0, %x0" - : "vxorpd\t%g0, %g0, %g0"); - else - return (TARGET_AVX512VL - ? "vpxorq\t%x0, %x0, %x0" - : "vpxorq\t%g0, %g0, %g0"); + if (EXT_REX_SSE_REG_P (operands[0])) + { + if (TARGET_AVX512DQ) + return (TARGET_AVX512VL + ? "vxorpd\t%x0, %x0, %x0" + : "vxorpd\t%g0, %g0, %g0"); + else + { + if (TARGET_AVX512VL) + return "vpxorq\t%x0, %x0, %x0"; + else if (TARGET_EVEX512) + return "vpxorq\t%g0, %g0, %g0"; + else + gcc_unreachable (); + } + } + return "vxorpd\t%x0, %x0, %x0"; case MODE_V4SF: if (!EXT_REX_SSE_REG_P (operands[0])) @@ -5341,16 +5353,23 @@ standard_sse_constant_opcode (rtx_insn *insn, rtx *operands) /* FALLTHRU */ case MODE_V16SF: case MODE_V8SF: - if (!EXT_REX_SSE_REG_P (operands[0])) - return "vxorps\t%x0, %x0, %x0"; - else if (TARGET_AVX512DQ) - return (TARGET_AVX512VL - ? "vxorps\t%x0, %x0, %x0" - : "vxorps\t%g0, %g0, %g0"); - else - return (TARGET_AVX512VL - ? "vpxord\t%x0, %x0, %x0" - : "vpxord\t%g0, %g0, %g0"); + if (EXT_REX_SSE_REG_P (operands[0])) + { + if (TARGET_AVX512DQ) + return (TARGET_AVX512VL + ? 
"vxorps\t%x0, %x0, %x0" + : "vxorps\t%g0, %g0, %g0"); + else + { + if (TARGET_AVX512VL) + return "vpxord\t%x0, %x0, %x0"; + else if (TARGET_EVEX512) + return "vpxord\t%g0, %g0, %g0"; + else + gcc_unreachable (); + } + } + return "vxorps\t%x0, %x0, %x0"; default: gcc_unreachable (); @@ -5368,7 +5387,7 @@ standard_sse_constant_opcode (rtx_insn *insn, rtx *operands) case MODE_XI: case MODE_V8DF: case MODE_V16SF: - gcc_assert (TARGET_AVX512F); + gcc_assert (TARGET_AVX512F && TARGET_EVEX512); return "vpternlogd\t{$0xFF, %g0, %g0, %g0|%g0, %g0, %g0, 0xFF}"; case MODE_OI: @@ -5380,14 +5399,18 @@ standard_sse_constant_opcode (rtx_insn *insn, rtx *operands) case MODE_V2DF: case MODE_V4SF: gcc_assert (TARGET_SSE2); - if (!EXT_REX_SSE_REG_P (operands[0])) - return (TARGET_AVX - ? "vpcmpeqd\t%0, %0, %0" - : "pcmpeqd\t%0, %0"); - else if (TARGET_AVX512VL) - return "vpternlogd\t{$0xFF, %0, %0, %0|%0, %0, %0, 0xFF}"; - else - return "vpternlogd\t{$0xFF, %g0, %g0, %g0|%g0, %g0, %g0, 0xFF}"; + if (EXT_REX_SSE_REG_P (operands[0])) + { + if (TARGET_AVX512VL) + return "vpternlogd\t{$0xFF, %0, %0, %0|%0, %0, %0, 0xFF}"; + else if (TARGET_EVEX512) + return "vpternlogd\t{$0xFF, %g0, %g0, %g0|%g0, %g0, %g0, 0xFF}"; + else + gcc_unreachable (); + } + return (TARGET_AVX + ? "vpcmpeqd\t%0, %0, %0" + : "pcmpeqd\t%0, %0"); default: gcc_unreachable (); @@ -5397,7 +5420,7 @@ standard_sse_constant_opcode (rtx_insn *insn, rtx *operands) { if (GET_MODE_SIZE (mode) == 64) { - gcc_assert (TARGET_AVX512F); + gcc_assert (TARGET_AVX512F && TARGET_EVEX512); return "vpcmpeqd\t%t0, %t0, %t0"; } else if (GET_MODE_SIZE (mode) == 32) @@ -5409,7 +5432,7 @@ standard_sse_constant_opcode (rtx_insn *insn, rtx *operands) } else if (vector_all_ones_zero_extend_quarter_operand (x, mode)) { - gcc_assert (TARGET_AVX512F); + gcc_assert (TARGET_AVX512F && TARGET_EVEX512); return "vpcmpeqd\t%x0, %x0, %x0"; } @@ -5511,6 +5534,8 @@ ix86_get_ssemov (rtx *operands, unsigned size, || memory_operand (operands[1], mode)) gcc_unreachable (); size = 64; + /* We need TARGET_EVEX512 to move into zmm register. */ + gcc_assert (TARGET_EVEX512); switch (type) { case opcode_int: @@ -10727,7 +10752,7 @@ ix86_legitimate_constant_p (machine_mode mode, rtx x) case E_OImode: case E_XImode: if (!standard_sse_constant_p (x, mode) - && GET_MODE_SIZE (TARGET_AVX512F + && GET_MODE_SIZE (TARGET_AVX512F && TARGET_EVEX512 ? XImode : (TARGET_AVX ? OImode @@ -19195,10 +19220,14 @@ ix86_vectorize_builtin_scatter (const_tree vectype, { bool si; enum ix86_builtins code; + const machine_mode mode = TYPE_MODE (TREE_TYPE (vectype)); if (!TARGET_AVX512F) return NULL_TREE; + if (!TARGET_EVEX512 && GET_MODE_SIZE (mode) == 64) + return NULL_TREE; + if (known_eq (TYPE_VECTOR_SUBPARTS (vectype), 2u) ? 
!TARGET_USE_SCATTER_2PARTS : (known_eq (TYPE_VECTOR_SUBPARTS (vectype), 4u) diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index eef8a0e01eb..6eb4e540140 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -535,10 +535,11 @@ (define_attr "isa" "base,x64,nox64,x64_sse2,x64_sse4,x64_sse4_noavx, x64_avx,x64_avx512bw,x64_avx512dq,aes, sse_noavx,sse2,sse2_noavx,sse3,sse3_noavx,sse4,sse4_noavx, - avx,noavx,avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,noavx512f, - avx512bw,noavx512bw,avx512dq,noavx512dq,fma_or_avx512vl, - avx512vl,noavx512vl,avxvnni,avx512vnnivl,avx512fp16,avxifma, - avx512ifmavl,avxneconvert,avx512bf16vl,vpclmulqdqvl" + avx,noavx,avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,avx512f_512, + noavx512f,avx512bw,noavx512bw,avx512dq,noavx512dq, + fma_or_avx512vl,avx512vl,noavx512vl,avxvnni,avx512vnnivl, + avx512fp16,avxifma,avx512ifmavl,avxneconvert,avx512bf16vl, + vpclmulqdqvl" (const_string "base")) ;; The (bounding maximum) length of an instruction immediate. @@ -899,6 +900,8 @@ (eq_attr "isa" "fma_or_avx512vl") (symbol_ref "TARGET_FMA || TARGET_AVX512VL") (eq_attr "isa" "avx512f") (symbol_ref "TARGET_AVX512F") + (eq_attr "isa" "avx512f_512") + (symbol_ref "TARGET_AVX512F && TARGET_EVEX512") (eq_attr "isa" "noavx512f") (symbol_ref "!TARGET_AVX512F") (eq_attr "isa" "avx512bw") (symbol_ref "TARGET_AVX512BW") (eq_attr "isa" "noavx512bw") (symbol_ref "!TARGET_AVX512BW") @@ -2281,7 +2284,7 @@ (define_expand "movxi" [(set (match_operand:XI 0 "nonimmediate_operand") (match_operand:XI 1 "general_operand"))] - "TARGET_AVX512F" + "TARGET_AVX512F && TARGET_EVEX512" "ix86_expand_vector_move (XImode, operands); DONE;") (define_expand "movoi" @@ -2357,7 +2360,7 @@ (define_insn "*movxi_internal_avx512f" [(set (match_operand:XI 0 "nonimmediate_operand" "=v,v ,v ,m") (match_operand:XI 1 "nonimmediate_or_sse_const_operand" " C,BC,vm,v"))] - "TARGET_AVX512F + "TARGET_AVX512F && TARGET_EVEX512 && (register_operand (operands[0], XImode) || register_operand (operands[1], XImode))" { @@ -2485,9 +2488,9 @@ (define_insn "*movdi_internal" [(set (match_operand:DI 0 "nonimmediate_operand" - "=r ,o ,r,r ,r,m ,*y,*y,?*y,?m,?r,?*y,?v,?v,?v,m ,m,?r ,?*Yd,?r,?v,?*y,?*x,*k,*k ,*r,*m,*k") + "=r ,o ,r,r ,r,m ,*y,*y,?*y,?m,?r,?*y,?Yv,?v,?v,m ,m,?r ,?*Yd,?r,?v,?*y,?*x,*k,*k ,*r,*m,*k") (match_operand:DI 1 "general_operand" - "riFo,riF,Z,rem,i,re,C ,*y,Bk ,*y,*y,r ,C ,?v,Bk,?v,v,*Yd,r ,?v,r ,*x ,*y ,*r,*kBk,*k,*k,CBC"))] + "riFo,riF,Z,rem,i,re,C ,*y,Bk ,*y,*y,r ,C ,?v,Bk,?v,v,*Yd,r ,?v,r ,*x ,*y ,*r,*kBk,*k,*k,CBC"))] "!(MEM_P (operands[0]) && MEM_P (operands[1])) && ix86_hardreg_mov_ok (operands[0], operands[1])" { @@ -2605,7 +2608,7 @@ (set (attr "mode") (cond [(eq_attr "alternative" "2") (const_string "SI") - (eq_attr "alternative" "12,13") + (eq_attr "alternative" "12") (cond [(match_test "TARGET_AVX") (const_string "TI") (ior (not (match_test "TARGET_SSE2")) @@ -2613,6 +2616,18 @@ (const_string "V4SF") ] (const_string "TI")) + (eq_attr "alternative" "13") + (cond [(match_test "TARGET_AVX512VL") + (const_string "TI") + (match_test "TARGET_AVX512F") + (const_string "DF") + (match_test "TARGET_AVX") + (const_string "TI") + (ior (not (match_test "TARGET_SSE2")) + (match_test "optimize_function_for_size_p (cfun)")) + (const_string "V4SF") + ] + (const_string "TI")) (and (eq_attr "alternative" "14,15,16") (not (match_test "TARGET_SSE2"))) @@ -2706,9 +2721,9 @@ (define_insn "*movsi_internal" [(set (match_operand:SI 0 "nonimmediate_operand" - "=r,m ,*y,*y,?*y,?m,?r,?*y,?v,?v,?v,m ,?r,?v,*k,*k ,*rm,*k") 
+ "=r,m ,*y,*y,?*y,?m,?r,?*y,?Yv,?v,?v,m ,?r,?v,*k,*k ,*rm,*k") (match_operand:SI 1 "general_operand" - "g ,re,C ,*y,Bk ,*y,*y,r ,C ,?v,Bk,?v,?v,r ,*r,*kBk,*k ,CBC"))] + "g ,re,C ,*y,Bk ,*y,*y,r ,C ,?v,Bk,?v,?v,r ,*r,*kBk,*k ,CBC"))] "!(MEM_P (operands[0]) && MEM_P (operands[1])) && ix86_hardreg_mov_ok (operands[0], operands[1])" { @@ -2793,7 +2808,7 @@ (set (attr "mode") (cond [(eq_attr "alternative" "2,3") (const_string "DI") - (eq_attr "alternative" "8,9") + (eq_attr "alternative" "8") (cond [(match_test "TARGET_AVX") (const_string "TI") (ior (not (match_test "TARGET_SSE2")) @@ -2801,6 +2816,18 @@ (const_string "V4SF") ] (const_string "TI")) + (eq_attr "alternative" "9") + (cond [(match_test "TARGET_AVX512VL") + (const_string "TI") + (match_test "TARGET_AVX512F") + (const_string "SF") + (match_test "TARGET_AVX") + (const_string "TI") + (ior (not (match_test "TARGET_SSE2")) + (match_test "optimize_function_for_size_p (cfun)")) + (const_string "V4SF") + ] + (const_string "TI")) (and (eq_attr "alternative" "10,11") (not (match_test "TARGET_SSE2"))) @@ -2849,9 +2876,9 @@ (define_insn "*movhi_internal" [(set (match_operand:HI 0 "nonimmediate_operand" - "=r,r,r,m ,*k,*k ,r ,m ,*k ,?r,?*v,*v,*v,*v,m") + "=r,r,r,m ,*k,*k ,r ,m ,*k ,?r,?*v,*Yv,*v,*v,m") (match_operand:HI 1 "general_operand" - "r ,n,m,rn,r ,*km,*k,*k,CBC,*v,r ,C ,*v,m ,*v"))] + "r ,n,m,rn,r ,*km,*k,*k,CBC,*v,r ,C ,*v,m ,*v"))] "!(MEM_P (operands[0]) && MEM_P (operands[1])) && ix86_hardreg_mov_ok (operands[0], operands[1])" { @@ -3993,9 +4020,9 @@ ;; Possible store forwarding (partial memory) stall in alternatives 4, 6 and 7. (define_insn "*movdf_internal" [(set (match_operand:DF 0 "nonimmediate_operand" - "=Yf*f,m ,Yf*f,?r ,!o,?*r ,!o,!o,?r,?m,?r,?r,v,v,v,m,*x,*x,*x,m ,?r,?v,r ,o ,r ,m") + "=Yf*f,m ,Yf*f,?r ,!o,?*r ,!o,!o,?r,?m,?r,?r,Yv,v,v,m,*x,*x,*x,m ,?r,?v,r ,o ,r ,m") (match_operand:DF 1 "general_operand" - "Yf*fm,Yf*f,G ,roF,r ,*roF,*r,F ,rm,rC,C ,F ,C,v,m,v,C ,*x,m ,*x, v, r,roF,rF,rmF,rC"))] + "Yf*fm,Yf*f,G ,roF,r ,*roF,*r,F ,rm,rC,C ,F ,C ,v,m,v,C ,*x,m ,*x, v, r,roF,rF,rmF,rC"))] "!(MEM_P (operands[0]) && MEM_P (operands[1])) && (lra_in_progress || reload_completed || !CONST_DOUBLE_P (operands[1]) @@ -4170,9 +4197,9 @@ (define_insn "*movsf_internal" [(set (match_operand:SF 0 "nonimmediate_operand" - "=Yf*f,m ,Yf*f,?r ,?m,v,v,v,m,?r,?v,!*y,!*y,!m,!r,!*y,r ,m") + "=Yf*f,m ,Yf*f,?r ,?m,Yv,v,v,m,?r,?v,!*y,!*y,!m,!r,!*y,r ,m") (match_operand:SF 1 "general_operand" - "Yf*fm,Yf*f,G ,rmF,rF,C,v,m,v,v ,r ,*y ,m ,*y,*y,r ,rmF,rF"))] + "Yf*fm,Yf*f,G ,rmF,rF,C ,v,m,v,v ,r ,*y ,m ,*y,*y,r ,rmF,rF"))] "!(MEM_P (operands[0]) && MEM_P (operands[1])) && (lra_in_progress || reload_completed || !CONST_DOUBLE_P (operands[1]) @@ -4247,7 +4274,7 @@ (eq_attr "alternative" "11") (const_string "DI") (eq_attr "alternative" "5") - (cond [(and (match_test "TARGET_AVX512F") + (cond [(and (match_test "TARGET_AVX512F && TARGET_EVEX512") (not (match_test "TARGET_PREFER_AVX256"))) (const_string "V16SF") (match_test "TARGET_AVX") @@ -4271,7 +4298,11 @@ better to maintain the whole registers in single format to avoid problems on using packed logical operations. 
*/ (eq_attr "alternative" "6") - (cond [(ior (match_test "TARGET_SSE_PARTIAL_REG_DEPENDENCY") + (cond [(match_test "TARGET_AVX512VL") + (const_string "V4SF") + (match_test "TARGET_AVX512F") + (const_string "SF") + (ior (match_test "TARGET_SSE_PARTIAL_REG_DEPENDENCY") (match_test "TARGET_SSE_SPLIT_REGS")) (const_string "V4SF") ] @@ -4301,9 +4332,9 @@ (define_insn "*mov_internal" [(set (match_operand:HFBF 0 "nonimmediate_operand" - "=?r,?r,?r,?m,v,v,?r,m,?v,v") + "=?r,?r,?r,?m ,Yv,v,?r,m,?v,v") (match_operand:HFBF 1 "general_operand" - "r ,F ,m ,r,C,v, v,v,r ,m"))] + "r ,F ,m ,r,C ,v, v,v,r ,m"))] "!(MEM_P (operands[0]) && MEM_P (operands[1])) && (lra_in_progress || reload_completed @@ -5144,7 +5175,7 @@ && optimize_insn_for_speed_p () && reload_completed && (!EXT_REX_SSE_REG_P (operands[0]) - || TARGET_AVX512VL)" + || TARGET_AVX512VL || TARGET_EVEX512)" [(set (match_dup 2) (float_extend:V2DF (vec_select:V2SF @@ -5287,8 +5318,8 @@ (set_attr "memory" "none") (set (attr "enabled") (if_then_else (eq_attr "alternative" "2") - (symbol_ref "TARGET_AVX512F && !TARGET_AVX512VL - && !TARGET_PREFER_AVX256") + (symbol_ref "TARGET_AVX512F && TARGET_EVEX512 + && !TARGET_AVX512VL && !TARGET_PREFER_AVX256") (const_string "*")))]) (define_expand "extendxf2" diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md index 37d20c6303a..ef49efdbde5 100644 --- a/gcc/config/i386/predicates.md +++ b/gcc/config/i386/predicates.md @@ -1276,7 +1276,8 @@ (and (match_code "vec_duplicate") (and (match_test "TARGET_AVX512F") (ior (match_test "TARGET_AVX512VL") - (match_test "GET_MODE_SIZE (GET_MODE (op)) == 64"))) + (and (match_test "GET_MODE_SIZE (GET_MODE (op)) == 64") + (match_test "TARGET_EVEX512")))) (match_test "VALID_BCST_MODE_P (GET_MODE_INNER (GET_MODE (op)))") (match_test "GET_MODE (XEXP (op, 0)) == GET_MODE_INNER (GET_MODE (op))") diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 80b43fd7db7..8d1b75b43e0 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -253,43 +253,43 @@ ;; All vector modes including V?TImode, used in move patterns. (define_mode_iterator VMOVE - [(V64QI "TARGET_AVX512F") (V32QI "TARGET_AVX") V16QI - (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI - (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI - (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX") V2DI - (V4TI "TARGET_AVX512F") (V2TI "TARGET_AVX") V1TI - (V32HF "TARGET_AVX512F") (V16HF "TARGET_AVX") V8HF - (V32BF "TARGET_AVX512F") (V16BF "TARGET_AVX") V8BF - (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF - (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") V2DF]) + [(V64QI "TARGET_AVX512F && TARGET_EVEX512") (V32QI "TARGET_AVX") V16QI + (V32HI "TARGET_AVX512F && TARGET_EVEX512") (V16HI "TARGET_AVX") V8HI + (V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8SI "TARGET_AVX") V4SI + (V8DI "TARGET_AVX512F && TARGET_EVEX512") (V4DI "TARGET_AVX") V2DI + (V4TI "TARGET_AVX512F && TARGET_EVEX512") (V2TI "TARGET_AVX") V1TI + (V32HF "TARGET_AVX512F && TARGET_EVEX512") (V16HF "TARGET_AVX") V8HF + (V32BF "TARGET_AVX512F && TARGET_EVEX512") (V16BF "TARGET_AVX") V8BF + (V16SF "TARGET_AVX512F && TARGET_EVEX512") (V8SF "TARGET_AVX") V4SF + (V8DF "TARGET_AVX512F && TARGET_EVEX512") (V4DF "TARGET_AVX") V2DF]) ;; All AVX-512{F,VL} vector modes without HF. Supposed TARGET_AVX512F baseline. 
(define_mode_iterator V48_AVX512VL - [V16SI (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL") - V8DI (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL") - V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL") - V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")]) + [(V16SI "TARGET_EVEX512") (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL") + (V8DI "TARGET_EVEX512") (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL") + (V16SF "TARGET_EVEX512") (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL") + (V8DF "TARGET_EVEX512") (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")]) (define_mode_iterator V48_256_512_AVX512VL - [V16SI (V8SI "TARGET_AVX512VL") - V8DI (V4DI "TARGET_AVX512VL") - V16SF (V8SF "TARGET_AVX512VL") - V8DF (V4DF "TARGET_AVX512VL")]) + [(V16SI "TARGET_EVEX512") (V8SI "TARGET_AVX512VL") + (V8DI "TARGET_EVEX512") (V4DI "TARGET_AVX512VL") + (V16SF "TARGET_EVEX512") (V8SF "TARGET_AVX512VL") + (V8DF "TARGET_EVEX512") (V4DF "TARGET_AVX512VL")]) ;; All AVX-512{F,VL} vector modes. Supposed TARGET_AVX512F baseline. (define_mode_iterator V48H_AVX512VL - [V16SI (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL") - V8DI (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL") + [(V16SI "TARGET_EVEX512") (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL") + (V8DI "TARGET_EVEX512") (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL") (V32HF "TARGET_AVX512FP16") (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL") (V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL") - V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL") - V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")]) + (V16SF "TARGET_EVEX512") (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL") + (V8DF "TARGET_EVEX512") (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")]) ;; 1,2 byte AVX-512{BW,VL} vector modes. Supposed TARGET_AVX512BW baseline. 
(define_mode_iterator VI12_AVX512VL - [V64QI (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL") - V32HI (V16HI "TARGET_AVX512VL") (V8HI "TARGET_AVX512VL")]) + [(V64QI "TARGET_EVEX512") (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL") + (V32HI "TARGET_EVEX512") (V16HI "TARGET_AVX512VL") (V8HI "TARGET_AVX512VL")]) (define_mode_iterator VI12HFBF_AVX512VL [V64QI (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL") @@ -302,13 +302,13 @@ ;; All vector modes (define_mode_iterator V - [(V64QI "TARGET_AVX512F") (V32QI "TARGET_AVX") V16QI - (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI - (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI - (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX") V2DI - (V32HF "TARGET_AVX512F") (V16HF "TARGET_AVX") V8HF - (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF - (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")]) + [(V64QI "TARGET_AVX512F && TARGET_EVEX512") (V32QI "TARGET_AVX") V16QI + (V32HI "TARGET_AVX512F && TARGET_EVEX512") (V16HI "TARGET_AVX") V8HI + (V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8SI "TARGET_AVX") V4SI + (V8DI "TARGET_AVX512F && TARGET_EVEX512") (V4DI "TARGET_AVX") V2DI + (V32HF "TARGET_AVX512F && TARGET_EVEX512") (V16HF "TARGET_AVX") V8HF + (V16SF "TARGET_AVX512F && TARGET_EVEX512") (V8SF "TARGET_AVX") V4SF + (V8DF "TARGET_AVX512F && TARGET_EVEX512") (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")]) ;; All 128bit vector modes (define_mode_iterator V_128 @@ -324,22 +324,32 @@ V16HF V8HF V8SF V4SF V4DF V2DF]) ;; All 512bit vector modes -(define_mode_iterator V_512 [V64QI V32HI V16SI V8DI V16SF V8DF V32HF V32BF]) +(define_mode_iterator V_512 + [(V64QI "TARGET_EVEX512") (V32HI "TARGET_EVEX512") + (V16SI "TARGET_EVEX512") (V8DI "TARGET_EVEX512") + (V16SF "TARGET_EVEX512") (V8DF "TARGET_EVEX512") + (V32HF "TARGET_EVEX512") (V32BF "TARGET_EVEX512")]) ;; All 256bit and 512bit vector modes (define_mode_iterator V_256_512 [V32QI V16HI V16HF V16BF V8SI V4DI V8SF V4DF - (V64QI "TARGET_AVX512F") (V32HI "TARGET_AVX512F") (V32HF "TARGET_AVX512F") - (V32BF "TARGET_AVX512F") (V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F") - (V16SF "TARGET_AVX512F") (V8DF "TARGET_AVX512F")]) + (V64QI "TARGET_AVX512F && TARGET_EVEX512") + (V32HI "TARGET_AVX512F && TARGET_EVEX512") + (V32HF "TARGET_AVX512F && TARGET_EVEX512") + (V32BF "TARGET_AVX512F && TARGET_EVEX512") + (V16SI "TARGET_AVX512F && TARGET_EVEX512") + (V8DI "TARGET_AVX512F && TARGET_EVEX512") + (V16SF "TARGET_AVX512F && TARGET_EVEX512") + (V8DF "TARGET_AVX512F && TARGET_EVEX512")]) ;; All vector float modes (define_mode_iterator VF - [(V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF - (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")]) + [(V16SF "TARGET_AVX512F && TARGET_EVEX512") (V8SF "TARGET_AVX") V4SF + (V8DF "TARGET_AVX512F && TARGET_EVEX512") (V4DF "TARGET_AVX") + (V2DF "TARGET_SSE2")]) (define_mode_iterator VF1_VF2_AVX512DQ - [(V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF + [(V16SF "TARGET_AVX512F && TARGET_EVEX512") (V8SF "TARGET_AVX") V4SF (V8DF "TARGET_AVX512DQ") (V4DF "TARGET_AVX512DQ && TARGET_AVX512VL") (V2DF "TARGET_AVX512DQ && TARGET_AVX512VL")]) @@ -347,14 +357,17 @@ [(V32HF "TARGET_AVX512FP16") (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL") (V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL") - (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF - (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")]) + (V16SF "TARGET_AVX512F && TARGET_EVEX512") (V8SF "TARGET_AVX") V4SF + (V8DF "TARGET_AVX512F && TARGET_EVEX512") (V4DF "TARGET_AVX") + (V2DF "TARGET_SSE2")]) ;; 
128-, 256- and 512-bit float vector modes for bitwise operations (define_mode_iterator VFB - [(V32HF "TARGET_AVX512F") (V16HF "TARGET_AVX") (V8HF "TARGET_SSE2") - (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF - (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")]) + [(V32HF "TARGET_AVX512F && TARGET_EVEX512") + (V16HF "TARGET_AVX") (V8HF "TARGET_SSE2") + (V16SF "TARGET_AVX512F && TARGET_EVEX512") (V8SF "TARGET_AVX") V4SF + (V8DF "TARGET_AVX512F && TARGET_EVEX512") + (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")]) ;; 128- and 256-bit float vector modes (define_mode_iterator VF_128_256 @@ -369,10 +382,10 @@ ;; All SFmode vector float modes (define_mode_iterator VF1 - [(V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF]) + [(V16SF "TARGET_AVX512F && TARGET_EVEX512") (V8SF "TARGET_AVX") V4SF]) (define_mode_iterator VF1_AVX2 - [(V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX2") V4SF]) + [(V16SF "TARGET_AVX512F && TARGET_EVEX512") (V8SF "TARGET_AVX2") V4SF]) ;; 128- and 256-bit SF vector modes (define_mode_iterator VF1_128_256 @@ -383,24 +396,24 @@ ;; All DFmode vector float modes (define_mode_iterator VF2 - [(V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") V2DF]) + [(V8DF "TARGET_AVX512F && TARGET_EVEX512") (V4DF "TARGET_AVX") V2DF]) ;; All DFmode & HFmode vector float modes (define_mode_iterator VF2H [(V32HF "TARGET_AVX512FP16") (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL") (V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL") - (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") V2DF]) + (V8DF "TARGET_AVX512F && TARGET_EVEX512") (V4DF "TARGET_AVX") V2DF]) ;; 128- and 256-bit DF vector modes (define_mode_iterator VF2_128_256 [(V4DF "TARGET_AVX") V2DF]) (define_mode_iterator VF2_512_256 - [(V8DF "TARGET_AVX512F") V4DF]) + [(V8DF "TARGET_AVX512F && TARGET_EVEX512") V4DF]) (define_mode_iterator VF2_512_256VL - [V8DF (V4DF "TARGET_AVX512VL")]) + [(V8DF "TARGET_EVEX512") (V4DF "TARGET_AVX512VL")]) ;; All 128bit vector SF/DF modes (define_mode_iterator VF_128 @@ -417,30 +430,30 @@ ;; All 512bit vector float modes (define_mode_iterator VF_512 - [V16SF V8DF]) + [(V16SF "TARGET_EVEX512") (V8DF "TARGET_EVEX512")]) ;; All 512bit vector float modes for bitwise operations (define_mode_iterator VFB_512 - [V32HF V16SF V8DF]) + [(V32HF "TARGET_EVEX512") (V16SF "TARGET_EVEX512") (V8DF "TARGET_EVEX512")]) (define_mode_iterator V4SF_V8HF [V4SF V8HF]) (define_mode_iterator VI48_AVX512VL - [V16SI (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL") - V8DI (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")]) + [(V16SI "TARGET_EVEX512") (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL") + (V8DI "TARGET_EVEX512") (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")]) (define_mode_iterator VI1248_AVX512VLBW [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX512VL && TARGET_AVX512BW") (V16QI "TARGET_AVX512VL && TARGET_AVX512BW") (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX512VL && TARGET_AVX512BW") (V8HI "TARGET_AVX512VL && TARGET_AVX512BW") - V16SI (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL") - V8DI (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")]) + (V16SI "TARGET_EVEX512") (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL") + (V8DI "TARGET_EVEX512") (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")]) (define_mode_iterator VF_AVX512VL - [V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL") - V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")]) + [(V16SF "TARGET_EVEX512") (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL") + (V8DF "TARGET_EVEX512") (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")]) ;; AVX512ER SF plus 128- and 256-bit 
SF vector modes (define_mode_iterator VF1_AVX512ER_128_256 @@ -450,14 +463,14 @@ [(V32HF "TARGET_AVX512FP16") (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL") (V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL") - V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL") - V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")]) + (V16SF "TARGET_EVEX512") (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL") + (V8DF "TARGET_EVEX512") (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")]) (define_mode_iterator VF2_AVX512VL [V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")]) (define_mode_iterator VF1_AVX512VL - [V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")]) + [(V16SF "TARGET_EVEX512") (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")]) (define_mode_iterator VHFBF [V32HF V16HF V8HF V32BF V16BF V8BF]) (define_mode_iterator VHFBF_256 [V16HF V16BF]) @@ -472,7 +485,8 @@ ;; All vector integer modes (define_mode_iterator VI - [(V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F") + [(V16SI "TARGET_AVX512F && TARGET_EVEX512") + (V8DI "TARGET_AVX512F && TARGET_EVEX512") (V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX") V16QI (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX") V8HI (V8SI "TARGET_AVX") V4SI @@ -480,7 +494,8 @@ ;; All vector integer and HF modes (define_mode_iterator VIHFBF - [(V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F") + [(V16SI "TARGET_AVX512F && TARGET_EVEX512") + (V8DI "TARGET_AVX512F && TARGET_EVEX512") (V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX") V16QI (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX") V8HI (V8SI "TARGET_AVX") V4SI @@ -491,8 +506,8 @@ (define_mode_iterator VI_AVX2 [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX2") V16QI (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX2") V8HI - (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX2") V4SI - (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX2") V2DI]) + (V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8SI "TARGET_AVX2") V4SI + (V8DI "TARGET_AVX512F && TARGET_EVEX512") (V4DI "TARGET_AVX2") V2DI]) ;; All QImode vector integer modes (define_mode_iterator VI1 @@ -510,13 +525,13 @@ (V8SI "TARGET_AVX") (V4DI "TARGET_AVX")]) (define_mode_iterator VI8 - [(V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX") V2DI]) + [(V8DI "TARGET_AVX512F && TARGET_EVEX512") (V4DI "TARGET_AVX") V2DI]) (define_mode_iterator VI8_FVL [(V8DI "TARGET_AVX512F") V4DI (V2DI "TARGET_AVX512VL")]) (define_mode_iterator VI8_AVX512VL - [V8DI (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")]) + [(V8DI "TARGET_EVEX512") (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")]) (define_mode_iterator VI8_256_512 [V8DI (V4DI "TARGET_AVX512VL")]) @@ -544,7 +559,7 @@ [(V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX2") V8HI]) (define_mode_iterator VI2_AVX512F - [(V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX2") V8HI]) + [(V32HI "TARGET_AVX512F && TARGET_EVEX512") (V16HI "TARGET_AVX2") V8HI]) (define_mode_iterator VI2_AVX512VNNIBW [(V32HI "TARGET_AVX512BW || TARGET_AVX512VNNI") @@ -557,14 +572,15 @@ [(V8SI "TARGET_AVX2") V4SI]) (define_mode_iterator VI4_AVX512F - [(V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX2") V4SI]) + [(V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8SI "TARGET_AVX2") V4SI]) (define_mode_iterator VI4_AVX512VL - [V16SI (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")]) + [(V16SI "TARGET_EVEX512") (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")]) (define_mode_iterator VI48_AVX512F_AVX512VL - [V4SI V8SI (V16SI "TARGET_AVX512F") - (V2DI "TARGET_AVX512VL") (V4DI "TARGET_AVX512VL") (V8DI "TARGET_AVX512F")]) + [V4SI V8SI (V16SI "TARGET_AVX512F && TARGET_EVEX512") + (V2DI "TARGET_AVX512VL") (V4DI 
"TARGET_AVX512VL") + (V8DI "TARGET_AVX512F && TARGET_EVEX512")]) (define_mode_iterator VI2_AVX512VL [(V8HI "TARGET_AVX512VL") (V16HI "TARGET_AVX512VL") V32HI]) @@ -589,21 +605,21 @@ [(V4DI "TARGET_AVX2") V2DI]) (define_mode_iterator VI8_AVX2_AVX512F - [(V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX2") V2DI]) + [(V8DI "TARGET_AVX512F && TARGET_EVEX512") (V4DI "TARGET_AVX2") V2DI]) (define_mode_iterator VI8_AVX_AVX512F - [(V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX")]) + [(V8DI "TARGET_AVX512F && TARGET_EVEX512") (V4DI "TARGET_AVX")]) (define_mode_iterator VI4_128_8_256 [V4SI V4DI]) ;; All V8D* modes (define_mode_iterator V8FI - [V8DF V8DI]) + [(V8DF "TARGET_EVEX512") (V8DI "TARGET_EVEX512")]) ;; All V16S* modes (define_mode_iterator V16FI - [V16SF V16SI]) + [(V16SF "TARGET_EVEX512") (V16SI "TARGET_EVEX512")]) ;; ??? We should probably use TImode instead. (define_mode_iterator VIMAX_AVX2_AVX512BW @@ -630,8 +646,8 @@ (define_mode_iterator VI124_AVX2_24_AVX512F_1_AVX512BW [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX2") V16QI - (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX2") V8HI - (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX2") V4SI]) + (V32HI "TARGET_AVX512F && TARGET_EVEX512") (V16HI "TARGET_AVX2") V8HI + (V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8SI "TARGET_AVX2") V4SI]) (define_mode_iterator VI124_AVX2 [(V32QI "TARGET_AVX2") V16QI @@ -648,8 +664,8 @@ [(V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX512VL && TARGET_AVX512BW") (V8HI "TARGET_AVX512VL && TARGET_AVX512BW") - V16SI (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL") - V8DI (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")]) + (V16SI "TARGET_EVEX512") (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL") + (V8DI "TARGET_EVEX512") (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")]) (define_mode_iterator VI48_AVX2 [(V8SI "TARGET_AVX2") V4SI @@ -663,14 +679,15 @@ (define_mode_iterator VI248_AVX2_8_AVX512F_24_AVX512BW [(V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX2") V8HI (V16SI "TARGET_AVX512BW") (V8SI "TARGET_AVX2") V4SI - (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX2") V2DI]) + (V8DI "TARGET_AVX512F && TARGET_EVEX512") (V4DI "TARGET_AVX2") V2DI]) (define_mode_iterator VI248_AVX512BW - [(V32HI "TARGET_AVX512BW") V16SI V8DI]) + [(V32HI "TARGET_AVX512BW") (V16SI "TARGET_EVEX512") + (V8DI "TARGET_EVEX512")]) (define_mode_iterator VI248_AVX512BW_AVX512VL [(V32HI "TARGET_AVX512BW") - (V4DI "TARGET_AVX512VL") V16SI V8DI]) + (V4DI "TARGET_AVX512VL") (V16SI "TARGET_EVEX512") (V8DI "TARGET_EVEX512")]) ;; Suppose TARGET_AVX512VL as baseline (define_mode_iterator VI248_AVX512BW_1 @@ -684,16 +701,16 @@ V4DI V2DI]) (define_mode_iterator VI48_AVX512F - [(V16SI "TARGET_AVX512F") V8SI V4SI - (V8DI "TARGET_AVX512F") V4DI V2DI]) + [(V16SI "TARGET_AVX512F && TARGET_EVEX512") V8SI V4SI + (V8DI "TARGET_AVX512F && TARGET_EVEX512") V4DI V2DI]) (define_mode_iterator VI48_AVX_AVX512F - [(V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI - (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX") V2DI]) + [(V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8SI "TARGET_AVX") V4SI + (V8DI "TARGET_AVX512F && TARGET_EVEX512") (V4DI "TARGET_AVX") V2DI]) (define_mode_iterator VI12_AVX_AVX512F - [ (V64QI "TARGET_AVX512F") (V32QI "TARGET_AVX") V16QI - (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI]) + [(V64QI "TARGET_AVX512F && TARGET_EVEX512") (V32QI "TARGET_AVX") V16QI + (V32HI "TARGET_AVX512F && TARGET_EVEX512") (V16HI "TARGET_AVX") V8HI]) (define_mode_iterator V48_128_256 [V4SF V2DF @@ -834,7 +851,8 @@ (define_mode_iterator VI248_256 [V16HI V8SI V4DI]) (define_mode_iterator VI248_512 [V32HI V16SI 
V8DI]) (define_mode_iterator VI48_128 [V4SI V2DI]) -(define_mode_iterator VI148_512 [V64QI V16SI V8DI]) +(define_mode_iterator VI148_512 + [(V64QI "TARGET_EVEX512") (V16SI "TARGET_EVEX512") (V8DI "TARGET_EVEX512")]) (define_mode_iterator VI148_256 [V32QI V8SI V4DI]) (define_mode_iterator VI148_128 [V16QI V4SI V2DI]) @@ -844,15 +862,18 @@ [V32QI V16HI V8SI (V64QI "TARGET_AVX512BW") (V32HI "TARGET_AVX512BW") - (V16SI "TARGET_AVX512F")]) + (V16SI "TARGET_AVX512F && TARGET_EVEX512")]) (define_mode_iterator VI48_256 [V8SI V4DI]) -(define_mode_iterator VI48_512 [V16SI V8DI]) +(define_mode_iterator VI48_512 + [(V16SI "TARGET_EVEX512") (V8DI "TARGET_EVEX512")]) (define_mode_iterator VI4_256_8_512 [V8SI V8DI]) (define_mode_iterator VI_AVX512BW - [V16SI V8DI (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512BW")]) + [(V16SI "TARGET_EVEX512") (V8DI "TARGET_EVEX512") + (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512BW")]) (define_mode_iterator VIHFBF_AVX512BW - [V16SI V8DI (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512BW") - (V32HF "TARGET_AVX512BW") (V32BF "TARGET_AVX512BW")]) + [(V16SI "TARGET_EVEX512") (V8DI "TARGET_EVEX512") + (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512BW") + (V32HF "TARGET_AVX512BW") (V32BF "TARGET_AVX512BW")]) ;; Int-float size matches (define_mode_iterator VI2F_256_512 [V16HI V32HI V16HF V32HF V16BF V32BF]) @@ -862,12 +883,15 @@ (define_mode_iterator VI8F_256 [V4DI V4DF]) (define_mode_iterator VI4F_256_512 [V8SI V8SF - (V16SI "TARGET_AVX512F") (V16SF "TARGET_AVX512F")]) + (V16SI "TARGET_AVX512F && TARGET_EVEX512") + (V16SF "TARGET_AVX512F && TARGET_EVEX512")]) (define_mode_iterator VI48F_256_512 [V8SI V8SF - (V16SI "TARGET_AVX512F") (V16SF "TARGET_AVX512F") - (V8DI "TARGET_AVX512F") (V8DF "TARGET_AVX512F") - (V4DI "TARGET_AVX512VL") (V4DF "TARGET_AVX512VL")]) + (V16SI "TARGET_AVX512F && TARGET_EVEX512") + (V16SF "TARGET_AVX512F && TARGET_EVEX512") + (V8DI "TARGET_AVX512F && TARGET_EVEX512") + (V8DF "TARGET_AVX512F && TARGET_EVEX512") + (V4DI "TARGET_AVX512VL") (V4DF "TARGET_AVX512VL")]) (define_mode_iterator VF48_I1248 [V16SI V16SF V8DI V8DF V32HI V64QI]) (define_mode_iterator VF48H_AVX512VL @@ -877,14 +901,17 @@ [V2DF V4SF]) (define_mode_iterator VI48F - [V16SI V16SF V8DI V8DF + [(V16SI "TARGET_EVEX512") (V16SF "TARGET_EVEX512") + (V8DI "TARGET_EVEX512") (V8DF "TARGET_EVEX512") (V8SI "TARGET_AVX512VL") (V8SF "TARGET_AVX512VL") (V4DI "TARGET_AVX512VL") (V4DF "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")]) (define_mode_iterator VI12_VI48F_AVX512VL - [(V16SI "TARGET_AVX512F") (V16SF "TARGET_AVX512F") - (V8DI "TARGET_AVX512F") (V8DF "TARGET_AVX512F") + [(V16SI "TARGET_AVX512F && TARGET_EVEX512") + (V16SF "TARGET_AVX512F && TARGET_EVEX512") + (V8DI "TARGET_AVX512F && TARGET_EVEX512") + (V8DF "TARGET_AVX512F && TARGET_EVEX512") (V8SI "TARGET_AVX512VL") (V8SF "TARGET_AVX512VL") (V4DI "TARGET_AVX512VL") (V4DF "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL") @@ -901,7 +928,8 @@ (define_mode_iterator V8_128 [V8HI V8HF V8BF]) (define_mode_iterator V16_256 [V16HI V16HF V16BF]) -(define_mode_iterator V32_512 [V32HI V32HF V32BF]) +(define_mode_iterator V32_512 + [(V32HI "TARGET_EVEX512") (V32HF "TARGET_EVEX512") (V32BF "TARGET_EVEX512")]) ;; Mapping from float mode to required SSE level (define_mode_attr sse @@ -1295,7 +1323,8 @@ ;; Mix-n-match (define_mode_iterator AVX256MODE2P [V8SI V8SF V4DF]) -(define_mode_iterator AVX512MODE2P [V16SI V16SF V8DF]) +(define_mode_iterator AVX512MODE2P + [(V16SI 
"TARGET_EVEX512") (V16SF "TARGET_EVEX512") (V8DF "TARGET_EVEX512")]) ;; Mapping for dbpsabbw modes (define_mode_attr dbpsadbwmode @@ -1897,9 +1926,11 @@ (define_mode_iterator STORENT_MODE [(DI "TARGET_SSE2 && TARGET_64BIT") (SI "TARGET_SSE2") (SF "TARGET_SSE4A") (DF "TARGET_SSE4A") - (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX") (V2DI "TARGET_SSE2") - (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF - (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")]) + (V8DI "TARGET_AVX512F && TARGET_EVEX512") + (V4DI "TARGET_AVX") (V2DI "TARGET_SSE2") + (V16SF "TARGET_AVX512F && TARGET_EVEX512") (V8SF "TARGET_AVX") V4SF + (V8DF "TARGET_AVX512F && TARGET_EVEX512") + (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")]) (define_expand "storent" [(set (match_operand:STORENT_MODE 0 "memory_operand") @@ -3377,9 +3408,11 @@ (define_mode_iterator REDUC_PLUS_MODE [(V4DF "TARGET_AVX") (V8SF "TARGET_AVX") (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL") - (V8DF "TARGET_AVX512F") (V16SF "TARGET_AVX512F") + (V8DF "TARGET_AVX512F && TARGET_EVEX512") + (V16SF "TARGET_AVX512F && TARGET_EVEX512") (V32HF "TARGET_AVX512FP16 && TARGET_AVX512VL") - (V32QI "TARGET_AVX") (V64QI "TARGET_AVX512F")]) + (V32QI "TARGET_AVX") + (V64QI "TARGET_AVX512F && TARGET_EVEX512")]) (define_expand "reduc_plus_scal_" [(plus:REDUC_PLUS_MODE @@ -3423,9 +3456,11 @@ (V8SF "TARGET_AVX") (V4DF "TARGET_AVX") (V64QI "TARGET_AVX512BW") (V32HF "TARGET_AVX512FP16 && TARGET_AVX512VL") - (V32HI "TARGET_AVX512BW") (V16SI "TARGET_AVX512F") - (V8DI "TARGET_AVX512F") (V16SF "TARGET_AVX512F") - (V8DF "TARGET_AVX512F")]) + (V32HI "TARGET_AVX512BW") + (V16SI "TARGET_AVX512F && TARGET_EVEX512") + (V8DI "TARGET_AVX512F && TARGET_EVEX512") + (V16SF "TARGET_AVX512F && TARGET_EVEX512") + (V8DF "TARGET_AVX512F && TARGET_EVEX512")]) (define_expand "reduc__scal_" [(smaxmin:REDUC_SMINMAX_MODE @@ -5035,7 +5070,7 @@ output_asm_insn (buf, operands); return ""; } - [(set_attr "isa" "noavx,avx,avx512vl,avx512f") + [(set_attr "isa" "noavx,avx,avx512vl,avx512f_512") (set_attr "type" "sselog") (set_attr "prefix" "orig,vex,evex,evex") (set (attr "mode") @@ -5092,7 +5127,7 @@ output_asm_insn (buf, operands); return ""; } - [(set_attr "isa" "noavx,avx,avx512vl,avx512f") + [(set_attr "isa" "noavx,avx,avx512vl,avx512f_512") (set_attr "type" "sselog") (set (attr "prefix_data16") (if_then_else @@ -5161,7 +5196,7 @@ output_asm_insn (buf, operands); return ""; } - [(set_attr "isa" "noavx,avx,avx512vl,avx512f") + [(set_attr "isa" "noavx,avx,avx512vl,avx512f_512") (set_attr "type" "sselog") (set_attr "prefix" "orig,vex,evex,evex") (set (attr "mode") @@ -5223,7 +5258,7 @@ output_asm_insn (buf, operands); return ""; } - [(set_attr "isa" "noavx,avx,avx512vl,avx512f") + [(set_attr "isa" "noavx,avx,avx512vl,avx512f_512") (set_attr "type" "sselog") (set (attr "prefix_data16") (if_then_else @@ -5269,8 +5304,8 @@ (V2DF "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512VL") (V8SF "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512VL") (V4DF "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512VL") - (V16SF "TARGET_AVX512F") - (V8DF "TARGET_AVX512F") + (V16SF "TARGET_AVX512F && TARGET_EVEX512") + (V8DF "TARGET_AVX512F && TARGET_EVEX512") (HF "TARGET_AVX512FP16") (V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL") (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL") @@ -5312,8 +5347,8 @@ (V2DF "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512VL") (V8SF "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512VL") (V4DF "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512VL") - (V16SF "TARGET_AVX512F") - (V8DF "TARGET_AVX512F")]) + (V16SF "TARGET_AVX512F && 
TARGET_EVEX512") + (V8DF "TARGET_AVX512F && TARGET_EVEX512")]) (define_mode_iterator FMAMODE [SF DF V4SF V2DF V8SF V4DF]) @@ -5387,8 +5422,10 @@ (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL") (V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL") (HF "TARGET_AVX512FP16") - SF V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL") - DF V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")]) + SF (V16SF "TARGET_EVEX512") + (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL") + DF (V8DF "TARGET_EVEX512") + (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")]) (define_insn "fma_fmadd_" [(set (match_operand:VFH_SF_AVX512VL 0 "register_operand" "=v,v,v") @@ -8028,7 +8065,7 @@ (unspec:V16SI [(match_operand:V16SF 1 "" "")] UNSPEC_FIX_NOTRUNC))] - "TARGET_AVX512F" + "TARGET_AVX512F && TARGET_EVEX512" "vcvtps2dq\t{%1, %0|%0, %1}" [(set_attr "type" "ssecvt") (set_attr "prefix" "evex") @@ -8095,7 +8132,7 @@ [(set (match_operand:V16SI 0 "register_operand" "=v") (any_fix:V16SI (match_operand:V16SF 1 "" "")))] - "TARGET_AVX512F" + "TARGET_AVX512F && TARGET_EVEX512" "vcvttps2dq\t{%1, %0|%0, %1}" [(set_attr "type" "ssecvt") (set_attr "prefix" "evex") @@ -8595,7 +8632,7 @@ (const_int 2) (const_int 3) (const_int 4) (const_int 5) (const_int 6) (const_int 7)]))))] - "TARGET_AVX512F" + "TARGET_AVX512F && TARGET_EVEX512" "vcvtdq2pd\t{%t1, %0|%0, %t1}" [(set_attr "type" "ssecvt") (set_attr "prefix" "evex") @@ -8631,7 +8668,7 @@ (unspec:V8SI [(match_operand:V8DF 1 "" "")] UNSPEC_FIX_NOTRUNC))] - "TARGET_AVX512F" + "TARGET_AVX512F && TARGET_EVEX512" "vcvtpd2dq\t{%1, %0|%0, %1}" [(set_attr "type" "ssecvt") (set_attr "prefix" "evex") @@ -8789,7 +8826,7 @@ [(set (match_operand:V8SI 0 "register_operand" "=v") (any_fix:V8SI (match_operand:V8DF 1 "" "")))] - "TARGET_AVX512F" + "TARGET_AVX512F && TARGET_EVEX512" "vcvttpd2dq\t{%1, %0|%0, %1}" [(set_attr "type" "ssecvt") (set_attr "prefix" "evex") @@ -9193,7 +9230,7 @@ [(set (match_operand:V8SF 0 "register_operand" "=v") (float_truncate:V8SF (match_operand:V8DF 1 "" "")))] - "TARGET_AVX512F" + "TARGET_AVX512F && TARGET_EVEX512" "vcvtpd2ps\t{%1, %0|%0, %1}" [(set_attr "type" "ssecvt") (set_attr "prefix" "evex") @@ -9355,7 +9392,7 @@ (const_int 2) (const_int 3) (const_int 4) (const_int 5) (const_int 6) (const_int 7)]))))] - "TARGET_AVX512F" + "TARGET_AVX512F && TARGET_EVEX512" "vcvtps2pd\t{%t1, %0|%0, %t1}" [(set_attr "type" "ssecvt") (set_attr "prefix" "evex") @@ -9540,7 +9577,7 @@ (set (match_operand:V8DF 0 "register_operand") (float_extend:V8DF (match_dup 2)))] -"TARGET_AVX512F" +"TARGET_AVX512F && TARGET_EVEX512" "operands[2] = gen_reg_rtx (V8SFmode);") (define_expand "vec_unpacks_lo_v4sf" @@ -9678,7 +9715,7 @@ (set (match_operand:V8DF 0 "register_operand") (float:V8DF (match_dup 2)))] - "TARGET_AVX512F" + "TARGET_AVX512F && TARGET_EVEX512" "operands[2] = gen_reg_rtx (V8SImode);") (define_expand "vec_unpacks_float_lo_v16si" @@ -9690,7 +9727,7 @@ (const_int 2) (const_int 3) (const_int 4) (const_int 5) (const_int 6) (const_int 7)]))))] - "TARGET_AVX512F") + "TARGET_AVX512F && TARGET_EVEX512") (define_expand "vec_unpacku_float_hi_v4si" [(set (match_dup 5) @@ -9786,7 +9823,7 @@ (define_expand "vec_unpacku_float_hi_v16si" [(match_operand:V8DF 0 "register_operand") (match_operand:V16SI 1 "register_operand")] - "TARGET_AVX512F" + "TARGET_AVX512F && TARGET_EVEX512" { REAL_VALUE_TYPE TWO32r; rtx k, x, tmp[4]; @@ -9835,7 +9872,7 @@ (define_expand "vec_unpacku_float_lo_v16si" [(match_operand:V8DF 0 "register_operand") (match_operand:V16SI 1 "nonimmediate_operand")] - "TARGET_AVX512F" + 
"TARGET_AVX512F && TARGET_EVEX512" { REAL_VALUE_TYPE TWO32r; rtx k, x, tmp[3]; @@ -9929,7 +9966,7 @@ [(match_operand:V16SI 0 "register_operand") (match_operand:V8DF 1 "nonimmediate_operand") (match_operand:V8DF 2 "nonimmediate_operand")] - "TARGET_AVX512F" + "TARGET_AVX512F && TARGET_EVEX512" { rtx r1, r2; @@ -10044,7 +10081,7 @@ [(match_operand:V16SI 0 "register_operand") (match_operand:V8DF 1 "nonimmediate_operand") (match_operand:V8DF 2 "nonimmediate_operand")] - "TARGET_AVX512F" + "TARGET_AVX512F && TARGET_EVEX512" { rtx r1, r2; @@ -10237,7 +10274,7 @@ (const_int 11) (const_int 27) (const_int 14) (const_int 30) (const_int 15) (const_int 31)])))] - "TARGET_AVX512F" + "TARGET_AVX512F && TARGET_EVEX512" "vunpckhps\t{%2, %1, %0|%0, %1, %2}" [(set_attr "type" "sselog") (set_attr "prefix" "evex") @@ -10325,7 +10362,7 @@ (const_int 9) (const_int 25) (const_int 12) (const_int 28) (const_int 13) (const_int 29)])))] - "TARGET_AVX512F" + "TARGET_AVX512F && TARGET_EVEX512" "vunpcklps\t{%2, %1, %0|%0, %1, %2}" [(set_attr "type" "sselog") (set_attr "prefix" "evex") @@ -10465,7 +10502,7 @@ (const_int 11) (const_int 11) (const_int 13) (const_int 13) (const_int 15) (const_int 15)])))] - "TARGET_AVX512F" + "TARGET_AVX512F && TARGET_EVEX512" "vmovshdup\t{%1, %0|%0, %1}" [(set_attr "type" "sse") (set_attr "prefix" "evex") @@ -10518,7 +10555,7 @@ (const_int 10) (const_int 10) (const_int 12) (const_int 12) (const_int 14) (const_int 14)])))] - "TARGET_AVX512F" + "TARGET_AVX512F && TARGET_EVEX512" "vmovsldup\t{%1, %0|%0, %1}" [(set_attr "type" "sse") (set_attr "prefix" "evex") @@ -11429,7 +11466,8 @@ (V8SF "32x4") (V8SI "32x4") (V4DF "64x2") (V4DI "64x2")]) (define_mode_iterator AVX512_VEC - [(V8DF "TARGET_AVX512DQ") (V8DI "TARGET_AVX512DQ") V16SF V16SI]) + [(V8DF "TARGET_AVX512DQ") (V8DI "TARGET_AVX512DQ") + (V16SF "TARGET_EVEX512") (V16SI "TARGET_EVEX512")]) (define_expand "_vextract_mask" [(match_operand: 0 "nonimmediate_operand") @@ -11598,7 +11636,8 @@ [(V16SF "32x8") (V16SI "32x8") (V8DF "64x4") (V8DI "64x4")]) (define_mode_iterator AVX512_VEC_2 - [(V16SF "TARGET_AVX512DQ") (V16SI "TARGET_AVX512DQ") V8DF V8DI]) + [(V16SF "TARGET_AVX512DQ") (V16SI "TARGET_AVX512DQ") + (V8DF "TARGET_EVEX512") (V8DI "TARGET_EVEX512")]) (define_expand "_vextract_mask" [(match_operand: 0 "nonimmediate_operand") @@ -12155,7 +12194,8 @@ (const_int 26) (const_int 27) (const_int 28) (const_int 29) (const_int 30) (const_int 31)])))] - "TARGET_AVX512F && !(MEM_P (operands[0]) && MEM_P (operands[1]))" + "TARGET_AVX512F && TARGET_EVEX512 + && !(MEM_P (operands[0]) && MEM_P (operands[1]))" { if (TARGET_AVX512VL || REG_P (operands[0]) @@ -12203,7 +12243,7 @@ (const_int 58) (const_int 59) (const_int 60) (const_int 61) (const_int 62) (const_int 63)])))] - "TARGET_AVX512F" + "TARGET_AVX512F && TARGET_EVEX512" "vextracti64x4\t{$0x1, %1, %0|%0, %1, 0x1}" [(set_attr "type" "sselog1") (set_attr "length_immediate" "1") @@ -12299,13 +12339,13 @@ (define_mode_iterator VEC_EXTRACT_MODE [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX") V16QI (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX") V8HI - (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI - (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX") V2DI + (V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8SI "TARGET_AVX") V4SI + (V8DI "TARGET_AVX512F && TARGET_EVEX512") (V4DI "TARGET_AVX") V2DI (V32HF "TARGET_AVX512BW") (V16HF "TARGET_AVX") V8HF (V32BF "TARGET_AVX512BW") (V16BF "TARGET_AVX") V8BF - (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF - (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") V2DF - (V4TI 
"TARGET_AVX512F") (V2TI "TARGET_AVX")]) + (V16SF "TARGET_AVX512F && TARGET_EVEX512") (V8SF "TARGET_AVX") V4SF + (V8DF "TARGET_AVX512F && TARGET_EVEX512") (V4DF "TARGET_AVX") V2DF + (V4TI "TARGET_AVX512F && TARGET_EVEX512") (V2TI "TARGET_AVX")]) (define_expand "vec_extract" [(match_operand: 0 "register_operand") @@ -12347,7 +12387,7 @@ (const_int 3) (const_int 11) (const_int 5) (const_int 13) (const_int 7) (const_int 15)])))] - "TARGET_AVX512F" + "TARGET_AVX512F && TARGET_EVEX512" "vunpckhpd\t{%2, %1, %0|%0, %1, %2}" [(set_attr "type" "sselog") (set_attr "prefix" "evex") @@ -12461,7 +12501,7 @@ (const_int 2) (const_int 10) (const_int 4) (const_int 12) (const_int 6) (const_int 14)])))] - "TARGET_AVX512F" + "TARGET_AVX512F && TARGET_EVEX512" "vmovddup\t{%1, %0|%0, %1}" [(set_attr "type" "sselog1") (set_attr "prefix" "evex") @@ -12477,7 +12517,7 @@ (const_int 2) (const_int 10) (const_int 4) (const_int 12) (const_int 6) (const_int 14)])))] - "TARGET_AVX512F" + "TARGET_AVX512F && TARGET_EVEX512" "vunpcklpd\t{%2, %1, %0|%0, %1, %2}" [(set_attr "type" "sselog") (set_attr "prefix" "evex") @@ -12689,7 +12729,7 @@ (match_operand:SI 4 "const_0_to_255_operand")] UNSPEC_VTERNLOG))] "( == 64 || TARGET_AVX512VL - || (TARGET_AVX512F && !TARGET_PREFER_AVX256)) + || (TARGET_AVX512F && TARGET_EVEX512 && !TARGET_PREFER_AVX256)) /* Disallow embeded broadcast for vector HFmode since it's not real AVX512FP16 instruction. */ && (GET_MODE_SIZE (GET_MODE_INNER (mode)) >= 4 @@ -12781,7 +12821,7 @@ (match_operand:V 3 "regmem_or_bitnot_regmem_operand") (match_operand:V 4 "regmem_or_bitnot_regmem_operand"))))] "( == 64 || TARGET_AVX512VL - || (TARGET_AVX512F && !TARGET_PREFER_AVX256)) + || (TARGET_AVX512F && TARGET_EVEX512 && !TARGET_PREFER_AVX256)) && ix86_pre_reload_split () && (rtx_equal_p (STRIP_UNARY (operands[1]), STRIP_UNARY (operands[4])) @@ -12866,7 +12906,7 @@ (match_operand:V 3 "regmem_or_bitnot_regmem_operand")) (match_operand:V 4 "regmem_or_bitnot_regmem_operand")))] "( == 64 || TARGET_AVX512VL - || (TARGET_AVX512F && !TARGET_PREFER_AVX256)) + || (TARGET_AVX512F && TARGET_EVEX512 && !TARGET_PREFER_AVX256)) && ix86_pre_reload_split () && (rtx_equal_p (STRIP_UNARY (operands[1]), STRIP_UNARY (operands[4])) @@ -12950,7 +12990,7 @@ (match_operand:V 2 "regmem_or_bitnot_regmem_operand")) (match_operand:V 3 "regmem_or_bitnot_regmem_operand")))] "( == 64 || TARGET_AVX512VL - || (TARGET_AVX512F && !TARGET_PREFER_AVX256)) + || (TARGET_AVX512F && TARGET_EVEX512 && !TARGET_PREFER_AVX256)) && ix86_pre_reload_split ()" "#" "&& 1" @@ -13074,7 +13114,7 @@ (match_operand:SI 3 "const_0_to_255_operand") (match_operand:V16SF 4 "register_operand") (match_operand:HI 5 "register_operand")] - "TARGET_AVX512F" + "TARGET_AVX512F && TARGET_EVEX512" { int mask = INTVAL (operands[3]); emit_insn (gen_avx512f_shufps512_1_mask (operands[0], operands[1], operands[2], @@ -13261,7 +13301,7 @@ (match_operand 16 "const_12_to_15_operand") (match_operand 17 "const_28_to_31_operand") (match_operand 18 "const_28_to_31_operand")])))] - "TARGET_AVX512F + "TARGET_AVX512F && TARGET_EVEX512 && (INTVAL (operands[3]) == (INTVAL (operands[7]) - 4) && INTVAL (operands[4]) == (INTVAL (operands[8]) - 4) && INTVAL (operands[5]) == (INTVAL (operands[9]) - 4) @@ -13296,7 +13336,7 @@ (match_operand:SI 3 "const_0_to_255_operand") (match_operand:V8DF 4 "register_operand") (match_operand:QI 5 "register_operand")] - "TARGET_AVX512F" + "TARGET_AVX512F && TARGET_EVEX512" { int mask = INTVAL (operands[3]); emit_insn (gen_avx512f_shufpd512_1_mask (operands[0], 
operands[1], operands[2], @@ -13326,7 +13366,7 @@ (match_operand 8 "const_12_to_13_operand") (match_operand 9 "const_6_to_7_operand") (match_operand 10 "const_14_to_15_operand")])))] - "TARGET_AVX512F" + "TARGET_AVX512F && TARGET_EVEX512" { int mask; mask = INTVAL (operands[3]); @@ -13458,7 +13498,7 @@ (const_int 3) (const_int 11) (const_int 5) (const_int 13) (const_int 7) (const_int 15)])))] - "TARGET_AVX512F" + "TARGET_AVX512F && TARGET_EVEX512" "vpunpckhqdq\t{%2, %1, %0|%0, %1, %2}" [(set_attr "type" "sselog") (set_attr "prefix" "evex") @@ -13508,7 +13548,7 @@ (const_int 2) (const_int 10) (const_int 4) (const_int 12) (const_int 6) (const_int 14)])))] - "TARGET_AVX512F" + "TARGET_AVX512F && TARGET_EVEX512" "vpunpcklqdq\t{%2, %1, %0|%0, %1, %2}" [(set_attr "type" "sselog") (set_attr "prefix" "evex") @@ -13872,8 +13912,8 @@ (set_attr "mode" "V2DF,DF,V8DF") (set (attr "enabled") (cond [(eq_attr "alternative" "2") - (symbol_ref "TARGET_AVX512F && !TARGET_AVX512VL - && !TARGET_PREFER_AVX256") + (symbol_ref "TARGET_AVX512F && TARGET_EVEX512 + && !TARGET_AVX512VL && !TARGET_PREFER_AVX256") (match_test "") (const_string "*") ] @@ -13957,13 +13997,13 @@ [(set (match_operand:PMOV_DST_MODE_1 0 "nonimmediate_operand") (truncate:PMOV_DST_MODE_1 (match_operand: 1 "register_operand")))] - "TARGET_AVX512F") + "TARGET_AVX512F && TARGET_EVEX512") (define_insn "*avx512f_2" [(set (match_operand:PMOV_DST_MODE_1 0 "nonimmediate_operand" "=v,m") (any_truncate:PMOV_DST_MODE_1 (match_operand: 1 "register_operand" "v,v")))] - "TARGET_AVX512F" + "TARGET_AVX512F && TARGET_EVEX512" "vpmov\t{%1, %0|%0, %1}" [(set_attr "type" "ssemov") (set_attr "memory" "none,store") @@ -14094,7 +14134,7 @@ (const_int 2) (const_int 3) (const_int 4) (const_int 5) (const_int 6) (const_int 7)])))] - "TARGET_AVX512F && ix86_pre_reload_split ()" + "TARGET_AVX512F && TARGET_EVEX512 && ix86_pre_reload_split ()" "#" "&& 1" [(set (match_dup 0) @@ -14110,7 +14150,7 @@ (match_operand: 1 "register_operand" "v,v")) (match_operand:PMOV_DST_MODE_1 2 "nonimm_or_0_operand" "0C,0") (match_operand: 3 "register_operand" "Yk,Yk")))] - "TARGET_AVX512F" + "TARGET_AVX512F && TARGET_EVEX512" "vpmov\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}" [(set_attr "type" "ssemov") (set_attr "memory" "none,store") @@ -14124,7 +14164,7 @@ (match_operand: 1 "register_operand")) (match_dup 0) (match_operand: 2 "register_operand")))] - "TARGET_AVX512F") + "TARGET_AVX512F && TARGET_EVEX512") (define_expand "truncv32hiv32qi2" [(set (match_operand:V32QI 0 "nonimmediate_operand") @@ -15072,7 +15112,7 @@ [(set (match_operand:V8QI 0 "register_operand") (truncate:V8QI (match_operand:V8DI 1 "register_operand")))] - "TARGET_AVX512F" + "TARGET_AVX512F && TARGET_EVEX512" { rtx op0 = gen_reg_rtx (V16QImode); @@ -15092,7 +15132,7 @@ (const_int 0) (const_int 0) (const_int 0) (const_int 0) (const_int 0) (const_int 0)])))] - "TARGET_AVX512F" + "TARGET_AVX512F && TARGET_EVEX512" "vpmovqb\t{%1, %0|%0, %1}" [(set_attr "type" "ssemov") (set_attr "prefix" "evex") @@ -15102,7 +15142,7 @@ [(set (match_operand:V8QI 0 "memory_operand" "=m") (any_truncate:V8QI (match_operand:V8DI 1 "register_operand" "v")))] - "TARGET_AVX512F" + "TARGET_AVX512F && TARGET_EVEX512" "vpmovqb\t{%1, %0|%0, %1}" [(set_attr "type" "ssemov") (set_attr "memory" "store") @@ -15114,7 +15154,7 @@ (subreg:DI (any_truncate:V8QI (match_operand:V8DI 1 "register_operand")) 0))] - "TARGET_AVX512F && ix86_pre_reload_split ()" + "TARGET_AVX512F && TARGET_EVEX512 && ix86_pre_reload_split ()" "#" "&& 1" [(set (match_dup 0) @@ -15138,7 +15178,7 @@ 
(const_int 0) (const_int 0) (const_int 0) (const_int 0) (const_int 0) (const_int 0)])))] - "TARGET_AVX512F" + "TARGET_AVX512F && TARGET_EVEX512" "vpmovqb\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}" [(set_attr "type" "ssemov") (set_attr "prefix" "evex") @@ -15159,7 +15199,7 @@ (const_int 0) (const_int 0) (const_int 0) (const_int 0) (const_int 0) (const_int 0)])))] - "TARGET_AVX512F" + "TARGET_AVX512F && TARGET_EVEX512" "vpmovqb\t{%1, %0%{%2%}%{z%}|%0%{%2%}%{z%}, %1}" [(set_attr "type" "ssemov") (set_attr "prefix" "evex") @@ -15172,7 +15212,7 @@ (match_operand:V8DI 1 "register_operand" "v")) (match_dup 0) (match_operand:QI 2 "register_operand" "Yk")))] - "TARGET_AVX512F" + "TARGET_AVX512F && TARGET_EVEX512" "vpmovqb\t{%1, %0%{%2%}|%0%{%2%}, %1}" [(set_attr "type" "ssemov") (set_attr "memory" "store") @@ -15195,7 +15235,7 @@ (const_int 4) (const_int 5) (const_int 6) (const_int 7)])) (match_operand:QI 2 "register_operand")) 0))] - "TARGET_AVX512F && ix86_pre_reload_split ()" + "TARGET_AVX512F && TARGET_EVEX512 && ix86_pre_reload_split ()" "#" "&& 1" [(set (match_dup 0) @@ -15453,7 +15493,7 @@ (const_int 4) (const_int 6) (const_int 8) (const_int 10) (const_int 12) (const_int 14)])))))] - "TARGET_AVX512F" + "TARGET_AVX512F && TARGET_EVEX512" "ix86_fixup_binary_operands_no_copy (MULT, V16SImode, operands);") (define_insn "*vec_widen_umult_even_v16si" @@ -15473,7 +15513,8 @@ (const_int 4) (const_int 6) (const_int 8) (const_int 10) (const_int 12) (const_int 14)])))))] - "TARGET_AVX512F && !(MEM_P (operands[1]) && MEM_P (operands[2]))" + "TARGET_AVX512F && TARGET_EVEX512 + && !(MEM_P (operands[1]) && MEM_P (operands[2]))" "vpmuludq\t{%2, %1, %0|%0, %1, %2}" [(set_attr "type" "sseimul") (set_attr "prefix" "evex") @@ -15568,7 +15609,7 @@ (const_int 4) (const_int 6) (const_int 8) (const_int 10) (const_int 12) (const_int 14)])))))] - "TARGET_AVX512F" + "TARGET_AVX512F && TARGET_EVEX512" "ix86_fixup_binary_operands_no_copy (MULT, V16SImode, operands);") (define_insn "*vec_widen_smult_even_v16si" @@ -15588,7 +15629,8 @@ (const_int 4) (const_int 6) (const_int 8) (const_int 10) (const_int 12) (const_int 14)])))))] - "TARGET_AVX512F && !(MEM_P (operands[1]) && MEM_P (operands[2]))" + "TARGET_AVX512F && TARGET_EVEX512 + && !(MEM_P (operands[1]) && MEM_P (operands[2]))" "vpmuldq\t{%2, %1, %0|%0, %1, %2}" [(set_attr "type" "sseimul") (set_attr "prefix" "evex") @@ -17263,8 +17305,10 @@ (V8SI "TARGET_AVX2") (V4DI "TARGET_AVX2") (V8SF "TARGET_AVX2") (V4DF "TARGET_AVX2") (V16HF "TARGET_AVX512FP16") - (V16SF "TARGET_AVX512F") (V8DF "TARGET_AVX512F") - (V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F") + (V16SF "TARGET_AVX512F && TARGET_EVEX512") + (V8DF "TARGET_AVX512F && TARGET_EVEX512") + (V16SI "TARGET_AVX512F && TARGET_EVEX512") + (V8DI "TARGET_AVX512F && TARGET_EVEX512") (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512VBMI") (V32HF "TARGET_AVX512FP16")]) @@ -17293,7 +17337,7 @@ { operands[2] = CONSTM1_RTX (mode); - if (!TARGET_AVX512F) + if (!TARGET_AVX512F || (!TARGET_AVX512VL && !TARGET_EVEX512)) operands[2] = force_reg (mode, operands[2]); }) @@ -17302,6 +17346,7 @@ (xor:VI (match_operand:VI 1 "bcst_vector_operand" " 0, m,Br") (match_operand:VI 2 "vector_all_ones_operand" "BC,BC,BC")))] "TARGET_AVX512F + && ( == 64 || TARGET_AVX512VL || TARGET_EVEX512) && (! 
|| mode == SImode || mode == DImode)" @@ -17368,7 +17413,7 @@ (match_operand:VI 2 "vector_all_ones_operand" "BC,BC,BC"))) (unspec [(match_operand:VI 3 "register_operand" "0,0,0")] UNSPEC_INSN_FALSE_DEP)] - "TARGET_AVX512F" + "TARGET_AVX512F && ( == 64 || TARGET_AVX512VL || TARGET_EVEX512)" { if (TARGET_AVX512VL) return "vpternlog\t{$0x55, %1, %0, %0|%0, %0, %1, 0x55}"; @@ -17392,7 +17437,7 @@ (not: (match_operand: 1 "nonimmediate_operand"))))] " == 64 || TARGET_AVX512VL - || (TARGET_AVX512F && !TARGET_PREFER_AVX256)" + || (TARGET_AVX512F && TARGET_EVEX512 && !TARGET_PREFER_AVX256)" [(set (match_dup 0) (xor:VI48_AVX512F (vec_duplicate:VI48_AVX512F (match_dup 1)) @@ -17538,7 +17583,8 @@ (symbol_ref " == 64 || TARGET_AVX512VL") (eq_attr "alternative" "4") (symbol_ref " == 64 || TARGET_AVX512VL - || (TARGET_AVX512F && !TARGET_PREFER_AVX256)") + || (TARGET_AVX512F && TARGET_EVEX512 + && !TARGET_PREFER_AVX256)") ] (const_string "*")))]) @@ -17582,7 +17628,7 @@ (match_operand: 1 "nonimmediate_operand"))) (match_operand:VI 2 "vector_operand")))] " == 64 || TARGET_AVX512VL - || (TARGET_AVX512F && !TARGET_PREFER_AVX256)" + || (TARGET_AVX512F && TARGET_EVEX512 && !TARGET_PREFER_AVX256)" [(set (match_dup 3) (vec_duplicate:VI (match_dup 1))) (set (match_dup 0) @@ -17597,7 +17643,7 @@ (match_operand: 1 "nonimmediate_operand"))) (match_operand:VI 2 "vector_operand")))] " == 64 || TARGET_AVX512VL - || (TARGET_AVX512F && !TARGET_PREFER_AVX256)" + || (TARGET_AVX512F && TARGET_EVEX512 && !TARGET_PREFER_AVX256)" [(set (match_dup 3) (vec_duplicate:VI (match_dup 1))) (set (match_dup 0) @@ -17883,7 +17929,7 @@ (match_operand:VI 1 "bcst_vector_operand" "0,m, 0,vBr")) (match_operand:VI 2 "bcst_vector_operand" "m,0,vBr, 0")))] "( == 64 || TARGET_AVX512VL - || (TARGET_AVX512F && !TARGET_PREFER_AVX256)) + || (TARGET_AVX512F && TARGET_EVEX512 && !TARGET_PREFER_AVX256)) && (register_operand (operands[1], mode) || register_operand (operands[2], mode))" { @@ -17916,7 +17962,7 @@ (match_operand:VI 1 "bcst_vector_operand" "%0, 0") (match_operand:VI 2 "bcst_vector_operand" " m,vBr"))))] "( == 64 || TARGET_AVX512VL - || (TARGET_AVX512F && !TARGET_PREFER_AVX256)) + || (TARGET_AVX512F && TARGET_EVEX512 && !TARGET_PREFER_AVX256)) && (register_operand (operands[1], mode) || register_operand (operands[2], mode))" { @@ -17947,7 +17993,7 @@ (not:VI (match_operand:VI 1 "bcst_vector_operand" "%0, 0")) (not:VI (match_operand:VI 2 "bcst_vector_operand" "m,vBr"))))] "( == 64 || TARGET_AVX512VL - || (TARGET_AVX512F && !TARGET_PREFER_AVX256)) + || (TARGET_AVX512F && TARGET_EVEX512 && !TARGET_PREFER_AVX256)) && (register_operand (operands[1], mode) || register_operand (operands[2], mode))" { @@ -18669,7 +18715,7 @@ (const_int 11) (const_int 27) (const_int 14) (const_int 30) (const_int 15) (const_int 31)])))] - "TARGET_AVX512F" + "TARGET_AVX512F && TARGET_EVEX512" "vpunpckhdq\t{%2, %1, %0|%0, %1, %2}" [(set_attr "type" "sselog") (set_attr "prefix" "evex") @@ -18724,7 +18770,7 @@ (const_int 9) (const_int 25) (const_int 12) (const_int 28) (const_int 13) (const_int 29)])))] - "TARGET_AVX512F" + "TARGET_AVX512F && TARGET_EVEX512" "vpunpckldq\t{%2, %1, %0|%0, %1, %2}" [(set_attr "type" "sselog") (set_attr "prefix" "evex") @@ -19418,7 +19464,7 @@ (match_operand:SI 2 "const_0_to_255_operand") (match_operand:V16SI 3 "register_operand") (match_operand:HI 4 "register_operand")] - "TARGET_AVX512F" + "TARGET_AVX512F && TARGET_EVEX512" { int mask = INTVAL (operands[2]); emit_insn (gen_avx512f_pshufd_1_mask (operands[0], operands[1], @@ -19462,7 +19508,7 
@@ (match_operand 15 "const_12_to_15_operand") (match_operand 16 "const_12_to_15_operand") (match_operand 17 "const_12_to_15_operand")])))] - "TARGET_AVX512F + "TARGET_AVX512F && TARGET_EVEX512 && INTVAL (operands[2]) + 4 == INTVAL (operands[6]) && INTVAL (operands[3]) + 4 == INTVAL (operands[7]) && INTVAL (operands[4]) + 4 == INTVAL (operands[8]) @@ -20315,7 +20361,7 @@ (match_operand:V4TI 1 "register_operand" "v") (parallel [(match_operand:SI 2 "const_0_to_3_operand")])))] - "TARGET_AVX512F" + "TARGET_AVX512F && TARGET_EVEX512" "vextracti32x4\t{%2, %1, %0|%0, %1, %2}" [(set_attr "type" "sselog") (set_attr "length_immediate" "1") @@ -20323,7 +20369,7 @@ (set_attr "mode" "XI")]) (define_mode_iterator VEXTRACTI128_MODE - [(V4TI "TARGET_AVX512F") V2TI]) + [(V4TI "TARGET_AVX512F && TARGET_EVEX512") V2TI]) (define_split [(set (match_operand:TI 0 "nonimmediate_operand") @@ -20346,7 +20392,8 @@ && VECTOR_MODE_P (GET_MODE (operands[1])) && ((TARGET_SSE && GET_MODE_SIZE (GET_MODE (operands[1])) == 16) || (TARGET_AVX && GET_MODE_SIZE (GET_MODE (operands[1])) == 32) - || (TARGET_AVX512F && GET_MODE_SIZE (GET_MODE (operands[1])) == 64)) + || (TARGET_AVX512F && TARGET_EVEX512 + && GET_MODE_SIZE (GET_MODE (operands[1])) == 64)) && (mode == SImode || TARGET_64BIT || MEM_P (operands[0]))" [(set (match_dup 0) (vec_select:SWI48x (match_dup 1) (parallel [(const_int 0)])))] @@ -21994,8 +22041,9 @@ (define_mode_iterator VI1248_AVX512VL_AVX512BW [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX2") V16QI (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX2") V8HI - (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX2") V4SI - (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")]) + (V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8SI "TARGET_AVX2") V4SI + (V8DI "TARGET_AVX512F && TARGET_EVEX512") (V4DI "TARGET_AVX512VL") + (V2DI "TARGET_AVX512VL")]) (define_insn "*abs2" [(set (match_operand:VI1248_AVX512VL_AVX512BW 0 "register_operand" "=") @@ -22840,7 +22888,7 @@ [(set (match_operand:V16SI 0 "register_operand" "=v") (any_extend:V16SI (match_operand:V16QI 1 "nonimmediate_operand" "vm")))] - "TARGET_AVX512F" + "TARGET_AVX512F && TARGET_EVEX512" "vpmovbd\t{%1, %0|%0, %1}" [(set_attr "type" "ssemov") (set_attr "prefix" "evex") @@ -22850,7 +22898,7 @@ [(set (match_operand:V16SI 0 "register_operand") (any_extend:V16SI (match_operand:V16QI 1 "nonimmediate_operand")))] - "TARGET_AVX512F") + "TARGET_AVX512F && TARGET_EVEX512") (define_insn "avx2_v8qiv8si2" [(set (match_operand:V8SI 0 "register_operand" "=v") @@ -22982,7 +23030,7 @@ [(set (match_operand:V16SI 0 "register_operand" "=v") (any_extend:V16SI (match_operand:V16HI 1 "nonimmediate_operand" "vm")))] - "TARGET_AVX512F" + "TARGET_AVX512F && TARGET_EVEX512" "vpmovwd\t{%1, %0|%0, %1}" [(set_attr "type" "ssemov") (set_attr "prefix" "evex") @@ -22992,7 +23040,7 @@ [(set (match_operand:V16SI 0 "register_operand") (any_extend:V16SI (match_operand:V16HI 1 "nonimmediate_operand")))] - "TARGET_AVX512F") + "TARGET_AVX512F && TARGET_EVEX512") (define_insn_and_split "avx512f_zero_extendv16hiv16si2_1" [(set (match_operand:V32HI 0 "register_operand" "=v") @@ -23002,7 +23050,7 @@ (match_operand:V32HI 2 "const0_operand")) (match_parallel 3 "pmovzx_parallel" [(match_operand 4 "const_int_operand")])))] - "TARGET_AVX512F" + "TARGET_AVX512F && TARGET_EVEX512" "#" "&& reload_completed" [(set (match_dup 0) (zero_extend:V16SI (match_dup 1)))] @@ -23223,7 +23271,7 @@ (const_int 2) (const_int 3) (const_int 4) (const_int 5) (const_int 6) (const_int 7)]))))] - "TARGET_AVX512F" + 
"TARGET_AVX512F && TARGET_EVEX512" "vpmovbq\t{%1, %0|%0, %1}" [(set_attr "type" "ssemov") (set_attr "prefix" "evex") @@ -23233,7 +23281,7 @@ [(set (match_operand:V8DI 0 "register_operand" "=v") (any_extend:V8DI (match_operand:V8QI 1 "memory_operand" "m")))] - "TARGET_AVX512F" + "TARGET_AVX512F && TARGET_EVEX512" "vpmovbq\t{%1, %0|%0, %1}" [(set_attr "type" "ssemov") (set_attr "prefix" "evex") @@ -23251,7 +23299,7 @@ (const_int 2) (const_int 3) (const_int 4) (const_int 5) (const_int 6) (const_int 7)]))))] - "TARGET_AVX512F && ix86_pre_reload_split ()" + "TARGET_AVX512F && TARGET_EVEX512 && ix86_pre_reload_split ()" "#" "&& 1" [(set (match_dup 0) @@ -23262,7 +23310,7 @@ [(set (match_operand:V8DI 0 "register_operand") (any_extend:V8DI (match_operand:V8QI 1 "nonimmediate_operand")))] - "TARGET_AVX512F" + "TARGET_AVX512F && TARGET_EVEX512" { if (!MEM_P (operands[1])) { @@ -23402,7 +23450,7 @@ [(set (match_operand:V8DI 0 "register_operand" "=v") (any_extend:V8DI (match_operand:V8HI 1 "nonimmediate_operand" "vm")))] - "TARGET_AVX512F" + "TARGET_AVX512F && TARGET_EVEX512" "vpmovwq\t{%1, %0|%0, %1}" [(set_attr "type" "ssemov") (set_attr "prefix" "evex") @@ -23412,7 +23460,7 @@ [(set (match_operand:V8DI 0 "register_operand") (any_extend:V8DI (match_operand:V8HI 1 "nonimmediate_operand")))] - "TARGET_AVX512F") + "TARGET_AVX512F && TARGET_EVEX512") (define_insn "avx2_v4hiv4di2" [(set (match_operand:V4DI 0 "register_operand" "=v") @@ -23538,7 +23586,7 @@ [(set (match_operand:V8DI 0 "register_operand" "=v") (any_extend:V8DI (match_operand:V8SI 1 "nonimmediate_operand" "vm")))] - "TARGET_AVX512F" + "TARGET_AVX512F && TARGET_EVEX512" "vpmovdq\t{%1, %0|%0, %1}" [(set_attr "type" "ssemov") (set_attr "prefix" "evex") @@ -23552,7 +23600,7 @@ (match_operand:V16SI 2 "const0_operand")) (match_parallel 3 "pmovzx_parallel" [(match_operand 4 "const_int_operand")])))] - "TARGET_AVX512F" + "TARGET_AVX512F && TARGET_EVEX512" "#" "&& reload_completed" [(set (match_dup 0) (zero_extend:V8DI (match_dup 1)))] @@ -23571,7 +23619,7 @@ (match_operand:V16SI 3 "const0_operand")) (match_parallel 4 "pmovzx_parallel" [(match_operand 5 "const_int_operand")])))] - "TARGET_AVX512F" + "TARGET_AVX512F && TARGET_EVEX512" "#" "&& reload_completed" [(set (match_dup 0) (zero_extend:V8DI (match_dup 1)))] @@ -23583,7 +23631,7 @@ [(set (match_operand:V8DI 0 "register_operand" "=v") (any_extend:V8DI (match_operand:V8SI 1 "nonimmediate_operand" "vm")))] - "TARGET_AVX512F") + "TARGET_AVX512F && TARGET_EVEX512") (define_insn "avx2_v4siv4di2" [(set (match_operand:V4DI 0 "register_operand" "=v") @@ -23977,7 +24025,7 @@ [(match_operand:V16SI 0 "register_operand") (match_operand:V16SF 1 "nonimmediate_operand") (match_operand:SI 2 "const_0_to_15_operand")] - "TARGET_AVX512F" + "TARGET_AVX512F && TARGET_EVEX512" { rtx tmp = gen_reg_rtx (V16SFmode); emit_insn (gen_avx512f_rndscalev16sf (tmp, operands[1], operands[2])); @@ -25394,7 +25442,7 @@ (ashiftrt:V8DI (match_operand:V8DI 1 "register_operand") (match_operand:V8DI 2 "nonimmediate_operand")))] - "TARGET_AVX512F") + "TARGET_AVX512F && TARGET_EVEX512") (define_expand "vashrv4di3" [(set (match_operand:V4DI 0 "register_operand") @@ -25485,7 +25533,7 @@ [(set (match_operand:V16SI 0 "register_operand") (ashiftrt:V16SI (match_operand:V16SI 1 "register_operand") (match_operand:V16SI 2 "nonimmediate_operand")))] - "TARGET_AVX512F") + "TARGET_AVX512F && TARGET_EVEX512") (define_expand "vashrv8si3" [(set (match_operand:V8SI 0 "register_operand") @@ -26058,8 +26106,8 @@ (define_mode_attr pbroadcast_evex_isa 
[(V64QI "avx512bw") (V32QI "avx512bw") (V16QI "avx512bw") (V32HI "avx512bw") (V16HI "avx512bw") (V8HI "avx512bw") - (V16SI "avx512f") (V8SI "avx512f") (V4SI "avx512f") - (V8DI "avx512f") (V4DI "avx512f") (V2DI "avx512f") + (V16SI "avx512f_512") (V8SI "avx512f") (V4SI "avx512f") + (V8DI "avx512f_512") (V4DI "avx512f") (V2DI "avx512f") (V32HF "avx512bw") (V16HF "avx512bw") (V8HF "avx512bw") (V32BF "avx512bw") (V16BF "avx512bw") (V8BF "avx512bw")]) @@ -26602,7 +26650,7 @@ (set (attr "enabled") (if_then_else (eq_attr "alternative" "1") (symbol_ref "TARGET_AVX512F && !TARGET_AVX512VL - && !TARGET_PREFER_AVX256") + && TARGET_EVEX512 && !TARGET_PREFER_AVX256") (const_string "*")))]) (define_insn "*vec_dupv4si" @@ -26630,7 +26678,7 @@ (set (attr "enabled") (if_then_else (eq_attr "alternative" "1") (symbol_ref "TARGET_AVX512F && !TARGET_AVX512VL - && !TARGET_PREFER_AVX256") + && TARGET_EVEX512 && !TARGET_PREFER_AVX256") (const_string "*")))]) (define_insn "*vec_dupv2di" @@ -26661,7 +26709,8 @@ (if_then_else (eq_attr "alternative" "2") (symbol_ref "TARGET_AVX512VL - || (TARGET_AVX512F && !TARGET_PREFER_AVX256)") + || (TARGET_AVX512F && TARGET_EVEX512 + && !TARGET_PREFER_AVX256)") (const_string "*")))]) (define_insn "avx2_vbroadcasti128_" @@ -26741,7 +26790,7 @@ [(set_attr "type" "ssemov") (set_attr "prefix_extra" "1") (set_attr "prefix" "maybe_evex") - (set_attr "isa" "avx2,noavx2,avx2,avx512f,noavx2") + (set_attr "isa" "avx2,noavx2,avx2,avx512f_512,noavx2") (set_attr "mode" ",V8SF,,,V8SF")]) (define_split @@ -26908,7 +26957,8 @@ (set_attr "mode" "")]) (define_mode_iterator VPERMI2 - [V16SI V16SF V8DI V8DF + [(V16SI "TARGET_EVEX512") (V16SF "TARGET_EVEX512") + (V8DI "TARGET_EVEX512") (V8DF "TARGET_EVEX512") (V8SI "TARGET_AVX512VL") (V8SF "TARGET_AVX512VL") (V4DI "TARGET_AVX512VL") (V4DF "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL") @@ -26919,7 +26969,7 @@ (V16QI "TARGET_AVX512VBMI && TARGET_AVX512VL")]) (define_mode_iterator VPERMI2I - [V16SI V8DI + [(V16SI "TARGET_EVEX512") (V8DI "TARGET_EVEX512") (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL") (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL") (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX512BW && TARGET_AVX512VL") @@ -27543,28 +27593,29 @@ ;; Modes handled by vec_init expanders. (define_mode_iterator VEC_INIT_MODE - [(V64QI "TARGET_AVX512F") (V32QI "TARGET_AVX") V16QI - (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI - (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI - (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX") V2DI - (V32HF "TARGET_AVX512F") (V16HF "TARGET_AVX") V8HF - (V32BF "TARGET_AVX512F") (V16BF "TARGET_AVX") V8BF - (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF - (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2") - (V4TI "TARGET_AVX512F") (V2TI "TARGET_AVX")]) + [(V64QI "TARGET_AVX512F && TARGET_EVEX512") (V32QI "TARGET_AVX") V16QI + (V32HI "TARGET_AVX512F && TARGET_EVEX512") (V16HI "TARGET_AVX") V8HI + (V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8SI "TARGET_AVX") V4SI + (V8DI "TARGET_AVX512F && TARGET_EVEX512") (V4DI "TARGET_AVX") V2DI + (V32HF "TARGET_AVX512F && TARGET_EVEX512") (V16HF "TARGET_AVX") V8HF + (V32BF "TARGET_AVX512F && TARGET_EVEX512") (V16BF "TARGET_AVX") V8BF + (V16SF "TARGET_AVX512F && TARGET_EVEX512") (V8SF "TARGET_AVX") V4SF + (V8DF "TARGET_AVX512F && TARGET_EVEX512") + (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2") + (V4TI "TARGET_AVX512F && TARGET_EVEX512") (V2TI "TARGET_AVX")]) ;; Likewise, but for initialization from half sized vectors. 
;; Thus, these are all VEC_INIT_MODE modes except V2??. (define_mode_iterator VEC_INIT_HALF_MODE - [(V64QI "TARGET_AVX512F") (V32QI "TARGET_AVX") V16QI - (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI - (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI - (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX") - (V32HF "TARGET_AVX512F") (V16HF "TARGET_AVX") V8HF - (V32BF "TARGET_AVX512F") (V16BF "TARGET_AVX") V8BF - (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF - (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") - (V4TI "TARGET_AVX512F")]) + [(V64QI "TARGET_AVX512F && TARGET_EVEX512") (V32QI "TARGET_AVX") V16QI + (V32HI "TARGET_AVX512F && TARGET_EVEX512") (V16HI "TARGET_AVX") V8HI + (V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8SI "TARGET_AVX") V4SI + (V8DI "TARGET_AVX512F && TARGET_EVEX512") (V4DI "TARGET_AVX") + (V32HF "TARGET_AVX512F && TARGET_EVEX512") (V16HF "TARGET_AVX") V8HF + (V32BF "TARGET_AVX512F && TARGET_EVEX512") (V16BF "TARGET_AVX") V8BF + (V16SF "TARGET_AVX512F && TARGET_EVEX512") (V8SF "TARGET_AVX") V4SF + (V8DF "TARGET_AVX512F && TARGET_EVEX512") (V4DF "TARGET_AVX") + (V4TI "TARGET_AVX512F && TARGET_EVEX512")]) (define_expand "vec_init" [(match_operand:VEC_INIT_MODE 0 "register_operand") @@ -27817,7 +27868,7 @@ (unspec:V16SF [(match_operand:V16HI 1 "" "")] UNSPEC_VCVTPH2PS))] - "TARGET_AVX512F" + "TARGET_AVX512F && TARGET_EVEX512" "vcvtph2ps\t{%1, %0|%0, %1}" [(set_attr "type" "ssecvt") (set_attr "prefix" "evex") @@ -27907,7 +27958,7 @@ UNSPEC_VCVTPS2PH) (match_operand:V16HI 3 "nonimm_or_0_operand") (match_operand:HI 4 "register_operand")))] - "TARGET_AVX512F" + "TARGET_AVX512F && TARGET_EVEX512" { int round = INTVAL (operands[2]); /* Separate {sae} from rounding control imm, @@ -27926,7 +27977,7 @@ [(match_operand:V16SF 1 "register_operand" "v") (match_operand:SI 2 "const_0_to_255_operand")] UNSPEC_VCVTPS2PH))] - "TARGET_AVX512F" + "TARGET_AVX512F && TARGET_EVEX512" "vcvtps2ph\t{%2, %1, %0|%0, %1, %2}" [(set_attr "type" "ssecvt") (set_attr "prefix" "evex") @@ -27938,7 +27989,7 @@ [(match_operand:V16SF 1 "register_operand" "v") (match_operand:SI 2 "const_0_to_255_operand")] UNSPEC_VCVTPS2PH))] - "TARGET_AVX512F" + "TARGET_AVX512F && TARGET_EVEX512" "vcvtps2ph\t{%2, %1, %0|%0, %1, %2}" [(set_attr "type" "ssecvt") (set_attr "prefix" "evex") @@ -30285,10 +30336,10 @@ ;; vinserti64x4 $0x1, %ymm15, %zmm15, %zmm15 (define_mode_iterator INT_BROADCAST_MODE - [(V64QI "TARGET_AVX512F") (V32QI "TARGET_AVX") V16QI - (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI - (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI - (V8DI "TARGET_AVX512F && TARGET_64BIT") + [(V64QI "TARGET_AVX512F && TARGET_EVEX512") (V32QI "TARGET_AVX") V16QI + (V32HI "TARGET_AVX512F && TARGET_EVEX512") (V16HI "TARGET_AVX") V8HI + (V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8SI "TARGET_AVX") V4SI + (V8DI "TARGET_AVX512F && TARGET_EVEX512 && TARGET_64BIT") (V4DI "TARGET_AVX && TARGET_64BIT") (V2DI "TARGET_64BIT")]) ;; Broadcast from an integer. 
NB: Enable broadcast only if we can move diff --git a/gcc/testsuite/gcc.target/i386/pr89229-5b.c b/gcc/testsuite/gcc.target/i386/pr89229-5b.c index 261f2e12e8d..8a81585e790 100644 --- a/gcc/testsuite/gcc.target/i386/pr89229-5b.c +++ b/gcc/testsuite/gcc.target/i386/pr89229-5b.c @@ -3,4 +3,4 @@ #include "pr89229-5a.c" -/* { dg-final { scan-assembler-times "vmovdqa32\[^\n\r]*zmm1\[67]\[^\n\r]*zmm1\[67]" 1 } } */ +/* { dg-final { scan-assembler-times "vmovsd\[^\n\r]*xmm1\[67]\[^\n\r]*xmm1\[67]\[^\n\r]*xmm1\[67]" 1 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr89229-6b.c b/gcc/testsuite/gcc.target/i386/pr89229-6b.c index a74f7169e6e..0c27daa4f74 100644 --- a/gcc/testsuite/gcc.target/i386/pr89229-6b.c +++ b/gcc/testsuite/gcc.target/i386/pr89229-6b.c @@ -3,4 +3,4 @@ #include "pr89229-6a.c" -/* { dg-final { scan-assembler-times "vmovaps\[^\n\r]*zmm1\[67]\[^\n\r]*zmm1\[67]" 1 } } */ +/* { dg-final { scan-assembler-times "vmovss\[^\n\r]*xmm1\[67]\[^\n\r]*xmm1\[67]\[^\n\r]*xmm1\[67]" 1 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr89229-7b.c b/gcc/testsuite/gcc.target/i386/pr89229-7b.c index d3a56e6e2b7..baba99ec775 100644 --- a/gcc/testsuite/gcc.target/i386/pr89229-7b.c +++ b/gcc/testsuite/gcc.target/i386/pr89229-7b.c @@ -3,4 +3,4 @@ #include "pr89229-7a.c" -/* { dg-final { scan-assembler-times "vmovdqa32\[^\n\r]*zmm1\[67]\[^\n\r]*zmm1\[67]" 1 } } */ +/* { dg-final { scan-assembler-times "vmovss\[^\n\r]*xmm1\[67]\[^\n\r]*xmm1\[67]\[^\n\r]*xmm1\[67]" 1 } } */

From patchwork Thu Sep 21 07:20:09 2023
From: "Hu, Lin1"
To: gcc-patches@gcc.gnu.org
Cc: hongtao.liu@intel.com, ubizjak@gmail.com, haochen.jiang@intel.com
Subject: [PATCH 14/18] Support -mevex512 for AVX512DQ intrins
Date: Thu, 21 Sep 2023 15:20:09 +0800
Message-Id: <20230921072013.2124750-15-lin1.hu@intel.com>
In-Reply-To: <20230921072013.2124750-1-lin1.hu@intel.com>
References: <20230921072013.2124750-1-lin1.hu@intel.com>

From: Haochen Jiang

gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_expand_sse2_mulvxdi3): Add TARGET_EVEX512 for 512-bit usage. * config/i386/i386.cc (standard_sse_constant_opcode): Ditto. * config/i386/sse.md (VF1_VF2_AVX512DQ): Ditto. (VF1_128_256VL): Ditto. (VF2_AVX512VL): Ditto. (VI8_256_512): Ditto. (fixuns_trunc2): Ditto. (AVX512_VEC): Ditto. (AVX512_VEC_2): Ditto. (VI4F_BRCST32x2): Ditto. (VI8F_BRCST64x2): Ditto.
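A brief illustration of the intent here (a reviewer-style sketch, not part of the patch; the dg- options and function below are hypothetical). Since mulv8di3 funnels through ix86_expand_sse2_mulvxdi3, the single-instruction vpmullq path now fires only when EVEX512 is also enabled:

/* { dg-do compile } */
/* { dg-options "-O2 -mavx512dq -mevex512" } */

typedef long long v8di __attribute__ ((vector_size (64)));

/* With AVX512DQ and EVEX512 this multiply becomes one vpmullq; without
   EVEX512 the 512-bit shortcut in ix86_expand_sse2_mulvxdi3 is skipped.  */
v8di
mul_v8di (v8di a, v8di b)
{
  return a * b;
}
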
--- gcc/config/i386/i386-expand.cc | 2 +- gcc/config/i386/i386.cc | 22 ++++++++++++++++------ gcc/config/i386/sse.md | 24 ++++++++++++++---------- 3 files changed, 31 insertions(+), 17 deletions(-) diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc index 0705e08d38c..063561e1265 100644 --- a/gcc/config/i386/i386-expand.cc +++ b/gcc/config/i386/i386-expand.cc @@ -24008,7 +24008,7 @@ ix86_expand_sse2_mulvxdi3 (rtx op0, rtx op1, rtx op2) machine_mode mode = GET_MODE (op0); rtx t1, t2, t3, t4, t5, t6; - if (TARGET_AVX512DQ && mode == V8DImode) + if (TARGET_AVX512DQ && TARGET_EVEX512 && mode == V8DImode) emit_insn (gen_avx512dq_mulv8di3 (op0, op1, op2)); else if (TARGET_AVX512DQ && TARGET_AVX512VL && mode == V4DImode) emit_insn (gen_avx512dq_mulv4di3 (op0, op1, op2)); diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc index 635dd85e764..589b29a324d 100644 --- a/gcc/config/i386/i386.cc +++ b/gcc/config/i386/i386.cc @@ -5332,9 +5332,14 @@ standard_sse_constant_opcode (rtx_insn *insn, rtx *operands) if (EXT_REX_SSE_REG_P (operands[0])) { if (TARGET_AVX512DQ) - return (TARGET_AVX512VL - ? "vxorpd\t%x0, %x0, %x0" - : "vxorpd\t%g0, %g0, %g0"); + { + if (TARGET_AVX512VL) + return "vxorpd\t%x0, %x0, %x0"; + else if (TARGET_EVEX512) + return "vxorpd\t%g0, %g0, %g0"; + else + gcc_unreachable (); + } else { if (TARGET_AVX512VL) @@ -5356,9 +5361,14 @@ standard_sse_constant_opcode (rtx_insn *insn, rtx *operands) if (EXT_REX_SSE_REG_P (operands[0])) { if (TARGET_AVX512DQ) - return (TARGET_AVX512VL - ? "vxorps\t%x0, %x0, %x0" - : "vxorps\t%g0, %g0, %g0"); + { + if (TARGET_AVX512VL) + return "vxorps\t%x0, %x0, %x0"; + else if (TARGET_EVEX512) + return "vxorps\t%g0, %g0, %g0"; + else + gcc_unreachable (); + } else { if (TARGET_AVX512VL) diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 8d1b75b43e0..a8f93ceddc5 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -350,7 +350,8 @@ (define_mode_iterator VF1_VF2_AVX512DQ [(V16SF "TARGET_AVX512F && TARGET_EVEX512") (V8SF "TARGET_AVX") V4SF - (V8DF "TARGET_AVX512DQ") (V4DF "TARGET_AVX512DQ && TARGET_AVX512VL") + (V8DF "TARGET_AVX512DQ && TARGET_EVEX512") + (V4DF "TARGET_AVX512DQ && TARGET_AVX512VL") (V2DF "TARGET_AVX512DQ && TARGET_AVX512VL")]) (define_mode_iterator VFH @@ -392,7 +393,7 @@ [(V8SF "TARGET_AVX") V4SF]) (define_mode_iterator VF1_128_256VL - [V8SF (V4SF "TARGET_AVX512VL")]) + [(V8SF "TARGET_EVEX512") (V4SF "TARGET_AVX512VL")]) ;; All DFmode vector float modes (define_mode_iterator VF2 @@ -467,7 +468,7 @@ (V8DF "TARGET_EVEX512") (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")]) (define_mode_iterator VF2_AVX512VL - [V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")]) + [(V8DF "TARGET_EVEX512") (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")]) (define_mode_iterator VF1_AVX512VL [(V16SF "TARGET_EVEX512") (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")]) @@ -534,7 +535,7 @@ [(V8DI "TARGET_EVEX512") (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")]) (define_mode_iterator VI8_256_512 - [V8DI (V4DI "TARGET_AVX512VL")]) + [(V8DI "TARGET_EVEX512") (V4DI "TARGET_AVX512VL")]) (define_mode_iterator VI1_AVX2 [(V32QI "TARGET_AVX2") V16QI]) @@ -9075,7 +9076,7 @@ (define_insn "fixuns_trunc2" [(set (match_operand: 0 "register_operand" "=v") (unsigned_fix: - (match_operand:VF1_128_256VL 1 "nonimmediate_operand" "vm")))] + (match_operand:VF1_128_256 1 "nonimmediate_operand" "vm")))] "TARGET_AVX512VL" "vcvttps2udq\t{%1, %0|%0, %1}" [(set_attr "type" "ssecvt") @@ -11466,7 +11467,8 @@ (V8SF 
"32x4") (V8SI "32x4") (V4DF "64x2") (V4DI "64x2")]) (define_mode_iterator AVX512_VEC - [(V8DF "TARGET_AVX512DQ") (V8DI "TARGET_AVX512DQ") + [(V8DF "TARGET_AVX512DQ && TARGET_EVEX512") + (V8DI "TARGET_AVX512DQ && TARGET_EVEX512") (V16SF "TARGET_EVEX512") (V16SI "TARGET_EVEX512")]) (define_expand "_vextract_mask" @@ -11636,7 +11638,8 @@ [(V16SF "32x8") (V16SI "32x8") (V8DF "64x4") (V8DI "64x4")]) (define_mode_iterator AVX512_VEC_2 - [(V16SF "TARGET_AVX512DQ") (V16SI "TARGET_AVX512DQ") + [(V16SF "TARGET_AVX512DQ && TARGET_EVEX512") + (V16SI "TARGET_AVX512DQ && TARGET_EVEX512") (V8DF "TARGET_EVEX512") (V8DI "TARGET_EVEX512")]) (define_expand "_vextract_mask" @@ -26850,8 +26853,8 @@ ;; For broadcast[i|f]32x2. Yes there is no v4sf version, only v4si. (define_mode_iterator VI4F_BRCST32x2 - [V16SI (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL") - V16SF (V8SF "TARGET_AVX512VL")]) + [(V16SI "TARGET_EVEX512") (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL") + (V16SF "TARGET_EVEX512") (V8SF "TARGET_AVX512VL")]) (define_mode_attr 64x2mode [(V8DF "V2DF") (V8DI "V2DI") (V4DI "V2DI") (V4DF "V2DF")]) @@ -26901,7 +26904,8 @@ ;; For broadcast[i|f]64x2 (define_mode_iterator VI8F_BRCST64x2 - [V8DI V8DF (V4DI "TARGET_AVX512VL") (V4DF "TARGET_AVX512VL")]) + [(V8DI "TARGET_EVEX512") (V8DF "TARGET_EVEX512") + (V4DI "TARGET_AVX512VL") (V4DF "TARGET_AVX512VL")]) (define_insn "avx512dq_broadcast_1" [(set (match_operand:VI8F_BRCST64x2 0 "register_operand" "=v,v") From patchwork Thu Sep 21 07:20:10 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Hu, Lin1" X-Patchwork-Id: 1837522 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=intel.com header.i=@intel.com header.a=rsa-sha256 header.s=Intel header.b=GLR9YeTb; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4Rrn3V6Jwdz1ynF for ; Thu, 21 Sep 2023 17:24:54 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 759703894C37 for ; Thu, 21 Sep 2023 07:24:52 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.115]) by sourceware.org (Postfix) with ESMTPS id 2D58C3857705 for ; Thu, 21 Sep 2023 07:22:41 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 2D58C3857705 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=intel.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1695280961; x=1726816961; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=yA05PVNVQLWuvE30W9B3WQL8p5HTOuYRHkzuBXVUdxo=; 
From: "Hu, Lin1"
To: gcc-patches@gcc.gnu.org
Cc: hongtao.liu@intel.com, ubizjak@gmail.com, haochen.jiang@intel.com
Subject: [PATCH 15/18] Support -mevex512 for AVX512BW intrins
Date: Thu, 21 Sep 2023 15:20:10 +0800
Message-Id: <20230921072013.2124750-16-lin1.hu@intel.com>
In-Reply-To: <20230921072013.2124750-1-lin1.hu@intel.com>
References: <20230921072013.2124750-1-lin1.hu@intel.com>

From: Haochen Jiang

gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_expand_vector_init_duplicate): Make sure there is EVEX512 enabled. (ix86_expand_vecop_qihi2): Refuse V32QI->V32HI when no EVEX512. * config/i386/i386.cc (ix86_hard_regno_mode_ok): Disable 64-bit mask when !TARGET_EVEX512. * config/i386/i386.md (avx512bw_512): New. (SWI1248_AVX512BWDQ_64): Add TARGET_EVEX512. (*zero_extendsidi2): Change isa to avx512bw_512. (kmov_isa): Ditto. (*anddi_1): Ditto. (*andn_1): Change isa to kmov_isa. (*_1): Ditto. (*notxor_1): Ditto. (*one_cmpl2_1): Ditto. (*one_cmplsi2_1_zext): Change isa to avx512bw_512. (*ashl3_1): Change isa to kmov_isa. (*lshr3_1): Ditto. * config/i386/sse.md (VI12HFBF_AVX512VL): Add TARGET_EVEX512. (VI1248_AVX512VLBW): Ditto. (VHFBF_AVX512VL): Ditto. (VI): Ditto. (VIHFBF): Ditto. (VI_AVX2): Ditto. (VI1_AVX512): Ditto. (VI12_256_512_AVX512VL): Ditto. (VI2_AVX2_AVX512BW): Ditto. (VI2_AVX512VNNIBW): Ditto. (VI2_AVX512VL): Ditto. (VI2HFBF_AVX512VL): Ditto. (VI8_AVX2_AVX512BW): Ditto. (VIMAX_AVX2_AVX512BW): Ditto. (VIMAX_AVX512VL): Ditto. (VI12_AVX2_AVX512BW): Ditto. (VI124_AVX2_24_AVX512F_1_AVX512BW): Ditto. (VI248_AVX512VL): Ditto. (VI248_AVX512VLBW): Ditto. (VI248_AVX2_8_AVX512F_24_AVX512BW): Ditto. (VI248_AVX512BW): Ditto. (VI248_AVX512BW_AVX512VL): Ditto. (VI248_512): Ditto. (VI124_256_AVX512F_AVX512BW): Ditto. (VI_AVX512BW): Ditto. (VIHFBF_AVX512BW): Ditto. (SWI1248_AVX512BWDQ): Ditto.
(SWI1248_AVX512BW): Ditto. (SWI1248_AVX512BWDQ2): Ditto. (*knotsi_1_zext): Ditto. (define_split for zero_extend + not): Ditto. (kunpckdi): Ditto. (REDUC_SMINMAX_MODE): Ditto. (VEC_EXTRACT_MODE): Ditto. (*avx512bw_permvar_truncv16siv16hi_1): Ditto. (*avx512bw_permvar_truncv16siv16hi_1_hf): Ditto. (truncv32hiv32qi2): Ditto. (avx512bw_v32hiv32qi2): Ditto. (avx512bw_v32hiv32qi2_mask): Ditto. (avx512bw_v32hiv32qi2_mask_store): Ditto. (usadv64qi): Ditto. (VEC_PERM_AVX2): Ditto. (AVX512ZEXTMASK): Ditto. (SWI24_MASK): New. (vec_pack_trunc_): Change iterator to SWI24_MASK. (avx512bw_packsswb): Add TARGET_EVEX512. (avx512bw_packssdw): Ditto. (avx512bw_interleave_highv64qi): Ditto. (avx512bw_interleave_lowv64qi): Ditto. (avx512bw_pshuflwv32hi): Ditto. (avx512bw_pshufhwv32hi): Ditto. (vec_unpacks_lo_di): Ditto. (SWI48x_MASK): New. (vec_unpacks_hi_): Change iterator to SWI48x_MASK. (avx512bw_umulhrswv32hi3): Add TARGET_EVEX512. (VI1248_AVX512VL_AVX512BW): Ditto. (avx512bw_v32qiv32hi2): Ditto. (*avx512bw_zero_extendv32qiv32hi2_1): Ditto. (*avx512bw_zero_extendv32qiv32hi2_2): Ditto. (v32qiv32hi2): Ditto. (pbroadcast_evex_isa): Change isa attribute to avx512bw_512. (VPERMI2): Add TARGET_EVEX512. (VPERMI2I): Ditto. --- gcc/config/i386/i386-expand.cc | 3 +- gcc/config/i386/i386.cc | 4 +- gcc/config/i386/i386.md | 54 ++++----- gcc/config/i386/sse.md | 193 ++++++++++++++++++--------------- 4 files changed, 128 insertions(+), 126 deletions(-) diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc index 063561e1265..ff2423f91ed 100644 --- a/gcc/config/i386/i386-expand.cc +++ b/gcc/config/i386/i386-expand.cc @@ -15617,6 +15617,7 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, machine_mode mode, case E_V32HFmode: case E_V32BFmode: case E_V64QImode: + gcc_assert (TARGET_EVEX512); if (TARGET_AVX512BW) return ix86_vector_duplicate_value (mode, target, val); else @@ -23512,7 +23513,7 @@ ix86_expand_vecop_qihi2 (enum rtx_code code, rtx dest, rtx op1, rtx op2) bool uns_p = code != ASHIFTRT; if ((qimode == V16QImode && !TARGET_AVX2) - || (qimode == V32QImode && !TARGET_AVX512BW) + || (qimode == V32QImode && (!TARGET_AVX512BW || !TARGET_EVEX512)) /* There are no V64HImode instructions. 
*/ || qimode == V64QImode) return false; diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc index 589b29a324d..03c96ff048d 100644 --- a/gcc/config/i386/i386.cc +++ b/gcc/config/i386/i386.cc @@ -20308,8 +20308,8 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode) return MASK_PAIR_REGNO_P(regno); return ((TARGET_AVX512F && VALID_MASK_REG_MODE (mode)) - || (TARGET_AVX512BW - && VALID_MASK_AVX512BW_MODE (mode))); + || (TARGET_AVX512BW && mode == SImode) + || (TARGET_AVX512BW && TARGET_EVEX512 && mode == DImode)); } if (GET_MODE_CLASS (mode) == MODE_PARTIAL_INT) diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 6eb4e540140..bdececc2309 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -536,10 +536,10 @@ x64_avx,x64_avx512bw,x64_avx512dq,aes, sse_noavx,sse2,sse2_noavx,sse3,sse3_noavx,sse4,sse4_noavx, avx,noavx,avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,avx512f_512, - noavx512f,avx512bw,noavx512bw,avx512dq,noavx512dq, - fma_or_avx512vl,avx512vl,noavx512vl,avxvnni,avx512vnnivl, - avx512fp16,avxifma,avx512ifmavl,avxneconvert,avx512bf16vl, - vpclmulqdqvl" + noavx512f,avx512bw,avx512bw_512,noavx512bw,avx512dq, + noavx512dq,fma_or_avx512vl,avx512vl,noavx512vl,avxvnni, + avx512vnnivl,avx512fp16,avxifma,avx512ifmavl,avxneconvert, + avx512bf16vl,vpclmulqdqvl" (const_string "base")) ;; The (bounding maximum) length of an instruction immediate. @@ -904,6 +904,8 @@ (symbol_ref "TARGET_AVX512F && TARGET_EVEX512") (eq_attr "isa" "noavx512f") (symbol_ref "!TARGET_AVX512F") (eq_attr "isa" "avx512bw") (symbol_ref "TARGET_AVX512BW") + (eq_attr "isa" "avx512bw_512") + (symbol_ref "TARGET_AVX512BW && TARGET_EVEX512") (eq_attr "isa" "noavx512bw") (symbol_ref "!TARGET_AVX512BW") (eq_attr "isa" "avx512dq") (symbol_ref "TARGET_AVX512DQ") (eq_attr "isa" "noavx512dq") (symbol_ref "!TARGET_AVX512DQ") @@ -1440,7 +1442,8 @@ (define_mode_iterator SWI1248_AVX512BWDQ_64 [(QI "TARGET_AVX512DQ") HI - (SI "TARGET_AVX512BW") (DI "TARGET_AVX512BW && TARGET_64BIT")]) + (SI "TARGET_AVX512BW") + (DI "TARGET_AVX512BW && TARGET_EVEX512 && TARGET_64BIT")]) (define_insn "*cmp_ccz_1" [(set (reg FLAGS_REG) @@ -4580,7 +4583,7 @@ (eq_attr "alternative" "12") (const_string "x64_avx512bw") (eq_attr "alternative" "13") - (const_string "avx512bw") + (const_string "avx512bw_512") ] (const_string "*"))) (set (attr "mmx_isa") @@ -4657,7 +4660,7 @@ "split_double_mode (DImode, &operands[0], 1, &operands[3], &operands[4]);") (define_mode_attr kmov_isa - [(QI "avx512dq") (HI "avx512f") (SI "avx512bw") (DI "avx512bw")]) + [(QI "avx512dq") (HI "avx512f") (SI "avx512bw") (DI "avx512bw_512")]) (define_insn "zero_extenddi2" [(set (match_operand:DI 0 "register_operand" "=r,*r,*k") @@ -11124,7 +11127,7 @@ and{q}\t{%2, %0|%0, %2} # #" - [(set_attr "isa" "x64,x64,x64,x64,avx512bw") + [(set_attr "isa" "x64,x64,x64,x64,avx512bw_512") (set_attr "type" "alu,alu,alu,imovx,msklog") (set_attr "length_immediate" "*,*,*,0,*") (set (attr "prefix_rex") @@ -11647,12 +11650,13 @@ (not:SWI48 (match_operand:SWI48 1 "register_operand" "r,r,k")) (match_operand:SWI48 2 "nonimmediate_operand" "r,m,k"))) (clobber (reg:CC FLAGS_REG))] - "TARGET_BMI || TARGET_AVX512BW" + "TARGET_BMI + || (TARGET_AVX512BW && (mode == SImode || TARGET_EVEX512))" "@ andn\t{%2, %1, %0|%0, %1, %2} andn\t{%2, %1, %0|%0, %1, %2} #" - [(set_attr "isa" "bmi,bmi,avx512bw") + [(set_attr "isa" "bmi,bmi,") (set_attr "type" "bitmanip,bitmanip,msklog") (set_attr "btver2_decode" "direct, double,*") (set_attr "mode" "")]) @@ -11880,13 +11884,7 @@ {}\t{%2, 
%0|%0, %2} {}\t{%2, %0|%0, %2} #" - [(set (attr "isa") - (cond [(eq_attr "alternative" "2") - (if_then_else (eq_attr "mode" "SI,DI") - (const_string "avx512bw") - (const_string "avx512f")) - ] - (const_string "*"))) + [(set_attr "isa" "*,*,") (set_attr "type" "alu, alu, msklog") (set_attr "mode" "")]) @@ -11913,13 +11911,7 @@ DONE; } } - [(set (attr "isa") - (cond [(eq_attr "alternative" "2") - (if_then_else (eq_attr "mode" "SI,DI") - (const_string "avx512bw") - (const_string "avx512f")) - ] - (const_string "*"))) + [(set_attr "isa" "*,*,") (set_attr "type" "alu, alu, msklog") (set_attr "mode" "")]) @@ -13300,13 +13292,7 @@ "@ not{}\t%0 #" - [(set (attr "isa") - (cond [(eq_attr "alternative" "1") - (if_then_else (eq_attr "mode" "SI,DI") - (const_string "avx512bw") - (const_string "avx512f")) - ] - (const_string "*"))) + [(set_attr "isa" "*,") (set_attr "type" "negnot,msklog") (set_attr "mode" "")]) @@ -13318,7 +13304,7 @@ "@ not{l}\t%k0 #" - [(set_attr "isa" "x64,avx512bw") + [(set_attr "isa" "x64,avx512bw_512") (set_attr "type" "negnot,msklog") (set_attr "mode" "SI,SI")]) @@ -13943,7 +13929,7 @@ return "sal{}\t{%2, %0|%0, %2}"; } } - [(set_attr "isa" "*,*,bmi2,avx512bw") + [(set_attr "isa" "*,*,bmi2,") (set (attr "type") (cond [(eq_attr "alternative" "1") (const_string "lea") @@ -14995,7 +14981,7 @@ return "shr{}\t{%2, %0|%0, %2}"; } } - [(set_attr "isa" "*,bmi2,avx512bw") + [(set_attr "isa" "*,bmi2,") (set_attr "type" "ishift,ishiftx,msklog") (set (attr "length_immediate") (if_then_else diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index a8f93ceddc5..e59f6bf4410 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -292,10 +292,10 @@ (V32HI "TARGET_EVEX512") (V16HI "TARGET_AVX512VL") (V8HI "TARGET_AVX512VL")]) (define_mode_iterator VI12HFBF_AVX512VL - [V64QI (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL") - V32HI (V16HI "TARGET_AVX512VL") (V8HI "TARGET_AVX512VL") - V32HF (V16HF "TARGET_AVX512VL") (V8HF "TARGET_AVX512VL") - V32BF (V16BF "TARGET_AVX512VL") (V8BF "TARGET_AVX512VL")]) + [(V64QI "TARGET_EVEX512") (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL") + (V32HI "TARGET_EVEX512") (V16HI "TARGET_AVX512VL") (V8HI "TARGET_AVX512VL") + (V32HF "TARGET_EVEX512") (V16HF "TARGET_AVX512VL") (V8HF "TARGET_AVX512VL") + (V32BF "TARGET_EVEX512") (V16BF "TARGET_AVX512VL") (V8BF "TARGET_AVX512VL")]) (define_mode_iterator VI1_AVX512VL [V64QI (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL")]) @@ -445,9 +445,11 @@ (V8DI "TARGET_EVEX512") (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")]) (define_mode_iterator VI1248_AVX512VLBW - [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX512VL && TARGET_AVX512BW") + [(V64QI "TARGET_AVX512BW && TARGET_EVEX512") + (V32QI "TARGET_AVX512VL && TARGET_AVX512BW") (V16QI "TARGET_AVX512VL && TARGET_AVX512BW") - (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX512VL && TARGET_AVX512BW") + (V32HI "TARGET_AVX512BW && TARGET_EVEX512") + (V16HI "TARGET_AVX512VL && TARGET_AVX512BW") (V8HI "TARGET_AVX512VL && TARGET_AVX512BW") (V16SI "TARGET_EVEX512") (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL") (V8DI "TARGET_EVEX512") (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")]) @@ -481,15 +483,15 @@ [V32HF (V16HF "TARGET_AVX512VL") (V8HF "TARGET_AVX512VL")]) (define_mode_iterator VHFBF_AVX512VL - [V32HF (V16HF "TARGET_AVX512VL") (V8HF "TARGET_AVX512VL") - V32BF (V16BF "TARGET_AVX512VL") (V8BF "TARGET_AVX512VL")]) + [(V32HF "TARGET_EVEX512") (V16HF "TARGET_AVX512VL") (V8HF "TARGET_AVX512VL") + (V32BF "TARGET_EVEX512") (V16BF "TARGET_AVX512VL") (V8BF 
"TARGET_AVX512VL")]) ;; All vector integer modes (define_mode_iterator VI [(V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8DI "TARGET_AVX512F && TARGET_EVEX512") - (V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX") V16QI - (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX") V8HI + (V64QI "TARGET_AVX512BW && TARGET_EVEX512") (V32QI "TARGET_AVX") V16QI + (V32HI "TARGET_AVX512BW && TARGET_EVEX512") (V16HI "TARGET_AVX") V8HI (V8SI "TARGET_AVX") V4SI (V4DI "TARGET_AVX") V2DI]) @@ -497,16 +499,16 @@ (define_mode_iterator VIHFBF [(V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8DI "TARGET_AVX512F && TARGET_EVEX512") - (V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX") V16QI - (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX") V8HI + (V64QI "TARGET_AVX512BW && TARGET_EVEX512") (V32QI "TARGET_AVX") V16QI + (V32HI "TARGET_AVX512BW && TARGET_EVEX512") (V16HI "TARGET_AVX") V8HI (V8SI "TARGET_AVX") V4SI (V4DI "TARGET_AVX") V2DI - (V32HF "TARGET_AVX512BW") (V16HF "TARGET_AVX") V8HF - (V32BF "TARGET_AVX512BW") (V16BF "TARGET_AVX") V8BF]) + (V32HF "TARGET_AVX512BW && TARGET_EVEX512") (V16HF "TARGET_AVX") V8HF + (V32BF "TARGET_AVX512BW && TARGET_EVEX512") (V16BF "TARGET_AVX") V8BF]) (define_mode_iterator VI_AVX2 - [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX2") V16QI - (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX2") V8HI + [(V64QI "TARGET_AVX512BW && TARGET_EVEX512") (V32QI "TARGET_AVX2") V16QI + (V32HI "TARGET_AVX512BW && TARGET_EVEX512") (V16HI "TARGET_AVX2") V8HI (V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8SI "TARGET_AVX2") V4SI (V8DI "TARGET_AVX512F && TARGET_EVEX512") (V4DI "TARGET_AVX2") V2DI]) @@ -541,7 +543,7 @@ [(V32QI "TARGET_AVX2") V16QI]) (define_mode_iterator VI1_AVX512 - [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX2") V16QI]) + [(V64QI "TARGET_AVX512BW && TARGET_EVEX512") (V32QI "TARGET_AVX2") V16QI]) (define_mode_iterator VI1_AVX512F [(V64QI "TARGET_AVX512F") (V32QI "TARGET_AVX") V16QI]) @@ -550,20 +552,20 @@ [(V64QI "TARGET_AVX512VNNI") (V32QI "TARGET_AVX2") V16QI]) (define_mode_iterator VI12_256_512_AVX512VL - [V64QI (V32QI "TARGET_AVX512VL") - V32HI (V16HI "TARGET_AVX512VL")]) + [(V64QI "TARGET_EVEX512") (V32QI "TARGET_AVX512VL") + (V32HI "TARGET_EVEX512") (V16HI "TARGET_AVX512VL")]) (define_mode_iterator VI2_AVX2 [(V16HI "TARGET_AVX2") V8HI]) (define_mode_iterator VI2_AVX2_AVX512BW - [(V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX2") V8HI]) + [(V32HI "TARGET_AVX512BW && TARGET_EVEX512") (V16HI "TARGET_AVX2") V8HI]) (define_mode_iterator VI2_AVX512F [(V32HI "TARGET_AVX512F && TARGET_EVEX512") (V16HI "TARGET_AVX2") V8HI]) (define_mode_iterator VI2_AVX512VNNIBW - [(V32HI "TARGET_AVX512BW || TARGET_AVX512VNNI") + [(V32HI "(TARGET_AVX512BW || TARGET_AVX512VNNI) && TARGET_EVEX512") (V16HI "TARGET_AVX2") V8HI]) (define_mode_iterator VI4_AVX @@ -584,12 +586,12 @@ (V8DI "TARGET_AVX512F && TARGET_EVEX512")]) (define_mode_iterator VI2_AVX512VL - [(V8HI "TARGET_AVX512VL") (V16HI "TARGET_AVX512VL") V32HI]) + [(V8HI "TARGET_AVX512VL") (V16HI "TARGET_AVX512VL") (V32HI "TARGET_EVEX512")]) (define_mode_iterator VI2HFBF_AVX512VL - [(V8HI "TARGET_AVX512VL") (V16HI "TARGET_AVX512VL") V32HI - (V8HF "TARGET_AVX512VL") (V16HF "TARGET_AVX512VL") V32HF - (V8BF "TARGET_AVX512VL") (V16BF "TARGET_AVX512VL") V32BF]) + [(V8HI "TARGET_AVX512VL") (V16HI "TARGET_AVX512VL") (V32HI "TARGET_EVEX512") + (V8HF "TARGET_AVX512VL") (V16HF "TARGET_AVX512VL") (V32HF "TARGET_EVEX512") + (V8BF "TARGET_AVX512VL") (V16BF "TARGET_AVX512VL") (V32BF "TARGET_EVEX512")]) (define_mode_iterator VI2H_AVX512VL [(V8HI "TARGET_AVX512VL") (V16HI "TARGET_AVX512VL") 
V32HI @@ -600,7 +602,7 @@ [V32QI (V16QI "TARGET_AVX512VL") (V64QI "TARGET_AVX512F")]) (define_mode_iterator VI8_AVX2_AVX512BW - [(V8DI "TARGET_AVX512BW") (V4DI "TARGET_AVX2") V2DI]) + [(V8DI "TARGET_AVX512BW && TARGET_EVEX512") (V4DI "TARGET_AVX2") V2DI]) (define_mode_iterator VI8_AVX2 [(V4DI "TARGET_AVX2") V2DI]) @@ -624,11 +626,11 @@ ;; ??? We should probably use TImode instead. (define_mode_iterator VIMAX_AVX2_AVX512BW - [(V4TI "TARGET_AVX512BW") (V2TI "TARGET_AVX2") V1TI]) + [(V4TI "TARGET_AVX512BW && TARGET_EVEX512") (V2TI "TARGET_AVX2") V1TI]) ;; Suppose TARGET_AVX512BW as baseline (define_mode_iterator VIMAX_AVX512VL - [V4TI (V2TI "TARGET_AVX512VL") (V1TI "TARGET_AVX512VL")]) + [(V4TI "TARGET_EVEX512") (V2TI "TARGET_AVX512VL") (V1TI "TARGET_AVX512VL")]) (define_mode_iterator VIMAX_AVX2 [(V2TI "TARGET_AVX2") V1TI]) @@ -638,15 +640,15 @@ (V16HI "TARGET_AVX2") V8HI]) (define_mode_iterator VI12_AVX2_AVX512BW - [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX2") V16QI - (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX2") V8HI]) + [(V64QI "TARGET_AVX512BW && TARGET_EVEX512") (V32QI "TARGET_AVX2") V16QI + (V32HI "TARGET_AVX512BW && TARGET_EVEX512") (V16HI "TARGET_AVX2") V8HI]) (define_mode_iterator VI24_AVX2 [(V16HI "TARGET_AVX2") V8HI (V8SI "TARGET_AVX2") V4SI]) (define_mode_iterator VI124_AVX2_24_AVX512F_1_AVX512BW - [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX2") V16QI + [(V64QI "TARGET_AVX512BW && TARGET_EVEX512") (V32QI "TARGET_AVX2") V16QI (V32HI "TARGET_AVX512F && TARGET_EVEX512") (V16HI "TARGET_AVX2") V8HI (V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8SI "TARGET_AVX2") V4SI]) @@ -656,13 +658,13 @@ (V8SI "TARGET_AVX2") V4SI]) (define_mode_iterator VI248_AVX512VL - [V32HI V16SI V8DI + [(V32HI "TARGET_EVEX512") (V16SI "TARGET_EVEX512") (V8DI "TARGET_EVEX512") (V16HI "TARGET_AVX512VL") (V8SI "TARGET_AVX512VL") (V4DI "TARGET_AVX512VL") (V8HI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")]) (define_mode_iterator VI248_AVX512VLBW - [(V32HI "TARGET_AVX512BW") + [(V32HI "TARGET_AVX512BW && TARGET_EVEX512") (V16HI "TARGET_AVX512VL && TARGET_AVX512BW") (V8HI "TARGET_AVX512VL && TARGET_AVX512BW") (V16SI "TARGET_EVEX512") (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL") @@ -678,16 +680,16 @@ (V4DI "TARGET_AVX2") V2DI]) (define_mode_iterator VI248_AVX2_8_AVX512F_24_AVX512BW - [(V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX2") V8HI - (V16SI "TARGET_AVX512BW") (V8SI "TARGET_AVX2") V4SI + [(V32HI "TARGET_AVX512BW && TARGET_EVEX512") (V16HI "TARGET_AVX2") V8HI + (V16SI "TARGET_AVX512BW && TARGET_EVEX512") (V8SI "TARGET_AVX2") V4SI (V8DI "TARGET_AVX512F && TARGET_EVEX512") (V4DI "TARGET_AVX2") V2DI]) (define_mode_iterator VI248_AVX512BW - [(V32HI "TARGET_AVX512BW") (V16SI "TARGET_EVEX512") + [(V32HI "TARGET_AVX512BW && TARGET_EVEX512") (V16SI "TARGET_EVEX512") (V8DI "TARGET_EVEX512")]) (define_mode_iterator VI248_AVX512BW_AVX512VL - [(V32HI "TARGET_AVX512BW") + [(V32HI "TARGET_AVX512BW && TARGET_EVEX512") (V4DI "TARGET_AVX512VL") (V16SI "TARGET_EVEX512") (V8DI "TARGET_EVEX512")]) ;; Suppose TARGET_AVX512VL as baseline @@ -850,7 +852,8 @@ (define_mode_iterator VI24_128 [V8HI V4SI]) (define_mode_iterator VI248_128 [V8HI V4SI V2DI]) (define_mode_iterator VI248_256 [V16HI V8SI V4DI]) -(define_mode_iterator VI248_512 [V32HI V16SI V8DI]) +(define_mode_iterator VI248_512 + [(V32HI "TARGET_EVEX512") (V16SI "TARGET_EVEX512") (V8DI "TARGET_EVEX512")]) (define_mode_iterator VI48_128 [V4SI V2DI]) (define_mode_iterator VI148_512 [(V64QI "TARGET_EVEX512") (V16SI "TARGET_EVEX512") (V8DI 
"TARGET_EVEX512")]) @@ -861,8 +864,8 @@ (define_mode_iterator VI124_256 [V32QI V16HI V8SI]) (define_mode_iterator VI124_256_AVX512F_AVX512BW [V32QI V16HI V8SI - (V64QI "TARGET_AVX512BW") - (V32HI "TARGET_AVX512BW") + (V64QI "TARGET_AVX512BW && TARGET_EVEX512") + (V32HI "TARGET_AVX512BW && TARGET_EVEX512") (V16SI "TARGET_AVX512F && TARGET_EVEX512")]) (define_mode_iterator VI48_256 [V8SI V4DI]) (define_mode_iterator VI48_512 @@ -870,11 +873,14 @@ (define_mode_iterator VI4_256_8_512 [V8SI V8DI]) (define_mode_iterator VI_AVX512BW [(V16SI "TARGET_EVEX512") (V8DI "TARGET_EVEX512") - (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512BW")]) + (V32HI "TARGET_AVX512BW && TARGET_EVEX512") + (V64QI "TARGET_AVX512BW && TARGET_EVEX512")]) (define_mode_iterator VIHFBF_AVX512BW [(V16SI "TARGET_EVEX512") (V8DI "TARGET_EVEX512") - (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512BW") - (V32HF "TARGET_AVX512BW") (V32BF "TARGET_AVX512BW")]) + (V32HI "TARGET_AVX512BW && TARGET_EVEX512") + (V64QI "TARGET_AVX512BW && TARGET_EVEX512") + (V32HF "TARGET_AVX512BW && TARGET_EVEX512") + (V32BF "TARGET_AVX512BW && TARGET_EVEX512")]) ;; Int-float size matches (define_mode_iterator VI2F_256_512 [V16HI V32HI V16HF V32HF V16BF V32BF]) @@ -1948,17 +1954,19 @@ ;; All integer modes with AVX512BW/DQ. (define_mode_iterator SWI1248_AVX512BWDQ - [(QI "TARGET_AVX512DQ") HI (SI "TARGET_AVX512BW") (DI "TARGET_AVX512BW")]) + [(QI "TARGET_AVX512DQ") HI (SI "TARGET_AVX512BW") + (DI "TARGET_AVX512BW && TARGET_EVEX512")]) ;; All integer modes with AVX512BW, where HImode operation ;; can be used instead of QImode. (define_mode_iterator SWI1248_AVX512BW - [QI HI (SI "TARGET_AVX512BW") (DI "TARGET_AVX512BW")]) + [QI HI (SI "TARGET_AVX512BW") + (DI "TARGET_AVX512BW && TARGET_EVEX512")]) ;; All integer modes with AVX512BW/DQ, even HImode requires DQ. (define_mode_iterator SWI1248_AVX512BWDQ2 [(QI "TARGET_AVX512DQ") (HI "TARGET_AVX512DQ") - (SI "TARGET_AVX512BW") (DI "TARGET_AVX512BW")]) + (SI "TARGET_AVX512BW") (DI "TARGET_AVX512BW && TARGET_EVEX512")]) (define_expand "kmov" [(set (match_operand:SWI1248_AVX512BWDQ 0 "nonimmediate_operand") @@ -2097,7 +2105,7 @@ (zero_extend:DI (not:SI (match_operand:SI 1 "register_operand" "k")))) (unspec [(const_int 0)] UNSPEC_MASKOP)] - "TARGET_AVX512BW" + "TARGET_AVX512BW && TARGET_EVEX512" "knotd\t{%1, %0|%0, %1}"; [(set_attr "type" "msklog") (set_attr "prefix" "vex") @@ -2107,7 +2115,7 @@ [(set (match_operand:DI 0 "mask_reg_operand") (zero_extend:DI (not:SI (match_operand:SI 1 "mask_reg_operand"))))] - "TARGET_AVX512BW && reload_completed" + "TARGET_AVX512BW && TARGET_EVEX512 && reload_completed" [(parallel [(set (match_dup 0) (zero_extend:DI @@ -2213,7 +2221,7 @@ (const_int 32)) (zero_extend:DI (match_operand:SI 2 "register_operand" "k")))) (unspec [(const_int 0)] UNSPEC_MASKOP)] - "TARGET_AVX512BW" + "TARGET_AVX512BW && TARGET_EVEX512" "kunpckdq\t{%2, %1, %0|%0, %1, %2}" [(set_attr "mode" "DI")]) @@ -3455,9 +3463,9 @@ (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL") (V8SI "TARGET_AVX2") (V4DI "TARGET_AVX2") (V8SF "TARGET_AVX") (V4DF "TARGET_AVX") - (V64QI "TARGET_AVX512BW") + (V64QI "TARGET_AVX512BW && TARGET_EVEX512") (V32HF "TARGET_AVX512FP16 && TARGET_AVX512VL") - (V32HI "TARGET_AVX512BW") + (V32HI "TARGET_AVX512BW && TARGET_EVEX512") (V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8DI "TARGET_AVX512F && TARGET_EVEX512") (V16SF "TARGET_AVX512F && TARGET_EVEX512") @@ -12340,12 +12348,12 @@ ;; Modes handled by vec_extract patterns. 
(define_mode_iterator VEC_EXTRACT_MODE - [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX") V16QI - (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX") V8HI + [(V64QI "TARGET_AVX512BW && TARGET_EVEX512") (V32QI "TARGET_AVX") V16QI + (V32HI "TARGET_AVX512BW && TARGET_EVEX512") (V16HI "TARGET_AVX") V8HI (V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8SI "TARGET_AVX") V4SI (V8DI "TARGET_AVX512F && TARGET_EVEX512") (V4DI "TARGET_AVX") V2DI - (V32HF "TARGET_AVX512BW") (V16HF "TARGET_AVX") V8HF - (V32BF "TARGET_AVX512BW") (V16BF "TARGET_AVX") V8BF + (V32HF "TARGET_AVX512BW && TARGET_EVEX512") (V16HF "TARGET_AVX") V8HF + (V32BF "TARGET_AVX512BW && TARGET_EVEX512") (V16BF "TARGET_AVX") V8BF (V16SF "TARGET_AVX512F && TARGET_EVEX512") (V8SF "TARGET_AVX") V4SF (V8DF "TARGET_AVX512F && TARGET_EVEX512") (V4DF "TARGET_AVX") V2DF (V4TI "TARGET_AVX512F && TARGET_EVEX512") (V2TI "TARGET_AVX")]) @@ -14028,7 +14036,7 @@ (const_int 10) (const_int 11) (const_int 12) (const_int 13) (const_int 14) (const_int 15)])))] - "TARGET_AVX512BW && ix86_pre_reload_split ()" + "TARGET_AVX512BW && TARGET_EVEX512 && ix86_pre_reload_split ()" "#" "&& 1" [(set (match_dup 0) @@ -14053,7 +14061,7 @@ (const_int 10) (const_int 11) (const_int 12) (const_int 13) (const_int 14) (const_int 15)])))] - "TARGET_AVX512BW && ix86_pre_reload_split ()" + "TARGET_AVX512BW && TARGET_EVEX512 && ix86_pre_reload_split ()" "#" "&& 1" [(set (match_dup 0) @@ -14173,13 +14181,13 @@ [(set (match_operand:V32QI 0 "nonimmediate_operand") (truncate:V32QI (match_operand:V32HI 1 "register_operand")))] - "TARGET_AVX512BW") + "TARGET_AVX512BW && TARGET_EVEX512") (define_insn "avx512bw_v32hiv32qi2" [(set (match_operand:V32QI 0 "nonimmediate_operand" "=v,m") (any_truncate:V32QI (match_operand:V32HI 1 "register_operand" "v,v")))] - "TARGET_AVX512BW" + "TARGET_AVX512BW && TARGET_EVEX512" "vpmovwb\t{%1, %0|%0, %1}" [(set_attr "type" "ssemov") (set_attr "memory" "none,store") @@ -14225,7 +14233,7 @@ (match_operand:V32HI 1 "register_operand" "v,v")) (match_operand:V32QI 2 "nonimm_or_0_operand" "0C,0") (match_operand:SI 3 "register_operand" "Yk,Yk")))] - "TARGET_AVX512BW" + "TARGET_AVX512BW && TARGET_EVEX512" "vpmovwb\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}" [(set_attr "type" "ssemov") (set_attr "memory" "none,store") @@ -14239,7 +14247,7 @@ (match_operand:V32HI 1 "register_operand")) (match_dup 0) (match_operand:SI 2 "register_operand")))] - "TARGET_AVX512BW") + "TARGET_AVX512BW && TARGET_EVEX512") (define_mode_iterator PMOV_DST_MODE_2 [V4SI V8HI (V16QI "TARGET_AVX512BW")]) @@ -16126,7 +16134,7 @@ (match_operand:V64QI 1 "register_operand") (match_operand:V64QI 2 "nonimmediate_operand") (match_operand:V16SI 3 "nonimmediate_operand")] - "TARGET_AVX512BW" + "TARGET_AVX512BW && TARGET_EVEX512" { rtx t1 = gen_reg_rtx (V8DImode); rtx t2 = gen_reg_rtx (V16SImode); @@ -17312,7 +17320,7 @@ (V8DF "TARGET_AVX512F && TARGET_EVEX512") (V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8DI "TARGET_AVX512F && TARGET_EVEX512") - (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512VBMI") + (V32HI "TARGET_AVX512BW && TARGET_EVEX512") (V64QI "TARGET_AVX512VBMI") (V32HF "TARGET_AVX512FP16")]) (define_expand "vec_perm" @@ -18018,7 +18026,7 @@ (const_string "*")))]) (define_mode_iterator AVX512ZEXTMASK - [(DI "TARGET_AVX512BW") (SI "TARGET_AVX512BW") HI]) + [(DI "TARGET_AVX512BW && TARGET_EVEX512") (SI "TARGET_AVX512BW") HI]) (define_insn "_testm3" [(set (match_operand: 0 "register_operand" "=k") @@ -18130,16 +18138,18 @@ (unspec [(const_int 0)] UNSPEC_MASKOP)])] "TARGET_AVX512F") +(define_mode_iterator 
SWI24_MASK [HI (SI "TARGET_EVEX512")]) + (define_expand "vec_pack_trunc_" [(parallel [(set (match_operand: 0 "register_operand") (ior: (ashift: (zero_extend: - (match_operand:SWI24 2 "register_operand")) + (match_operand:SWI24_MASK 2 "register_operand")) (match_dup 3)) (zero_extend: - (match_operand:SWI24 1 "register_operand")))) + (match_operand:SWI24_MASK 1 "register_operand")))) (unspec [(const_int 0)] UNSPEC_MASKOP)])] "TARGET_AVX512BW" { @@ -18267,7 +18277,7 @@ (const_int 60) (const_int 61) (const_int 62) (const_int 63)])))] - "TARGET_AVX512BW" + "TARGET_AVX512BW && TARGET_EVEX512" "vpacksswb\t{%2, %1, %0|%0, %1, %2}" [(set_attr "type" "sselog") (set_attr "prefix" "") @@ -18336,7 +18346,7 @@ (const_int 14) (const_int 15) (const_int 28) (const_int 29) (const_int 30) (const_int 31)])))] - "TARGET_AVX512BW" + "TARGET_AVX512BW && TARGET_EVEX512" "vpackssdw\t{%2, %1, %0|%0, %1, %2}" [(set_attr "type" "sselog") (set_attr "prefix" "") @@ -18398,7 +18408,7 @@ (const_int 61) (const_int 125) (const_int 62) (const_int 126) (const_int 63) (const_int 127)])))] - "TARGET_AVX512BW" + "TARGET_AVX512BW && TARGET_EVEX512" "vpunpckhbw\t{%2, %1, %0|%0, %1, %2}" [(set_attr "type" "sselog") (set_attr "prefix" "evex") @@ -18494,7 +18504,7 @@ (const_int 53) (const_int 117) (const_int 54) (const_int 118) (const_int 55) (const_int 119)])))] - "TARGET_AVX512BW" + "TARGET_AVX512BW && TARGET_EVEX512" "vpunpcklbw\t{%2, %1, %0|%0, %1, %2}" [(set_attr "type" "sselog") (set_attr "prefix" "evex") @@ -19677,7 +19687,7 @@ [(match_operand:V32HI 1 "nonimmediate_operand" "vm") (match_operand:SI 2 "const_0_to_255_operand")] UNSPEC_PSHUFLW))] - "TARGET_AVX512BW" + "TARGET_AVX512BW && TARGET_EVEX512" "vpshuflw\t{%2, %1, %0|%0, %1, %2}" [(set_attr "type" "sselog") (set_attr "prefix" "evex") @@ -19853,7 +19863,7 @@ [(match_operand:V32HI 1 "nonimmediate_operand" "vm") (match_operand:SI 2 "const_0_to_255_operand")] UNSPEC_PSHUFHW))] - "TARGET_AVX512BW" + "TARGET_AVX512BW && TARGET_EVEX512" "vpshufhw\t{%2, %1, %0|%0, %1, %2}" [(set_attr "type" "sselog") (set_attr "prefix" "evex") @@ -20735,7 +20745,7 @@ (define_expand "vec_unpacks_lo_di" [(set (match_operand:SI 0 "register_operand") (subreg:SI (match_operand:DI 1 "register_operand") 0))] - "TARGET_AVX512BW") + "TARGET_AVX512BW && TARGET_EVEX512") (define_expand "vec_unpacku_hi_" [(match_operand: 0 "register_operand") @@ -20774,12 +20784,15 @@ (unspec [(const_int 0)] UNSPEC_MASKOP)])] "TARGET_AVX512F") +(define_mode_iterator SWI48x_MASK [SI (DI "TARGET_EVEX512")]) + (define_expand "vec_unpacks_hi_" [(parallel - [(set (subreg:SWI48x + [(set (subreg:SWI48x_MASK (match_operand: 0 "register_operand") 0) - (lshiftrt:SWI48x (match_operand:SWI48x 1 "register_operand") - (match_dup 2))) + (lshiftrt:SWI48x_MASK + (match_operand:SWI48x_MASK 1 "register_operand") + (match_dup 2))) (unspec [(const_int 0)] UNSPEC_MASKOP)])] "TARGET_AVX512BW" "operands[2] = GEN_INT (GET_MODE_BITSIZE (mode));") @@ -21534,7 +21547,7 @@ (const_int 1) (const_int 1) (const_int 1) (const_int 1)])) (const_int 1))))] - "TARGET_AVX512BW" + "TARGET_AVX512BW && TARGET_EVEX512" "vpmulhrsw\t{%2, %1, %0|%0, %1, %2}" [(set_attr "type" "sseimul") (set_attr "prefix" "evex") @@ -22042,8 +22055,8 @@ ;; Mode iterator to handle singularity w/ absence of V2DI and V4DI ;; modes for abs instruction on pre AVX-512 targets. 
(define_mode_iterator VI1248_AVX512VL_AVX512BW - [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX2") V16QI - (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX2") V8HI + [(V64QI "TARGET_AVX512BW && TARGET_EVEX512") (V32QI "TARGET_AVX2") V16QI + (V32HI "TARGET_AVX512BW && TARGET_EVEX512") (V16HI "TARGET_AVX2") V8HI (V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8SI "TARGET_AVX2") V4SI (V8DI "TARGET_AVX512F && TARGET_EVEX512") (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")]) @@ -22702,7 +22715,7 @@ [(set (match_operand:V32HI 0 "register_operand" "=v") (any_extend:V32HI (match_operand:V32QI 1 "nonimmediate_operand" "vm")))] - "TARGET_AVX512BW" + "TARGET_AVX512BW && TARGET_EVEX512" "vpmovbw\t{%1, %0|%0, %1}" [(set_attr "type" "ssemov") (set_attr "prefix" "evex") @@ -22716,7 +22729,7 @@ (match_operand:V64QI 2 "const0_operand")) (match_parallel 3 "pmovzx_parallel" [(match_operand 4 "const_int_operand")])))] - "TARGET_AVX512BW" + "TARGET_AVX512BW && TARGET_EVEX512" "#" "&& reload_completed" [(set (match_dup 0) (zero_extend:V32HI (match_dup 1)))] @@ -22736,7 +22749,7 @@ (match_operand:V64QI 3 "const0_operand")) (match_parallel 4 "pmovzx_parallel" [(match_operand 5 "const_int_operand")])))] - "TARGET_AVX512BW" + "TARGET_AVX512BW && TARGET_EVEX512" "#" "&& reload_completed" [(set (match_dup 0) (zero_extend:V32HI (match_dup 1)))] @@ -22749,7 +22762,7 @@ [(set (match_operand:V32HI 0 "register_operand") (any_extend:V32HI (match_operand:V32QI 1 "nonimmediate_operand")))] - "TARGET_AVX512BW") + "TARGET_AVX512BW && TARGET_EVEX512") (define_insn "sse4_1_v8qiv8hi2" [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,Yw") @@ -26107,12 +26120,12 @@ (set_attr "mode" "OI")]) (define_mode_attr pbroadcast_evex_isa - [(V64QI "avx512bw") (V32QI "avx512bw") (V16QI "avx512bw") - (V32HI "avx512bw") (V16HI "avx512bw") (V8HI "avx512bw") + [(V64QI "avx512bw_512") (V32QI "avx512bw") (V16QI "avx512bw") + (V32HI "avx512bw_512") (V16HI "avx512bw") (V8HI "avx512bw") (V16SI "avx512f_512") (V8SI "avx512f") (V4SI "avx512f") (V8DI "avx512f_512") (V4DI "avx512f") (V2DI "avx512f") - (V32HF "avx512bw") (V16HF "avx512bw") (V8HF "avx512bw") - (V32BF "avx512bw") (V16BF "avx512bw") (V8BF "avx512bw")]) + (V32HF "avx512bw_512") (V16HF "avx512bw") (V8HF "avx512bw") + (V32BF "avx512bw_512") (V16BF "avx512bw") (V8BF "avx512bw")]) (define_insn "avx2_pbroadcast" [(set (match_operand:VIHFBF 0 "register_operand" "=x,v") @@ -26967,7 +26980,8 @@ (V4DI "TARGET_AVX512VL") (V4DF "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL") - (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX512BW && TARGET_AVX512VL") + (V32HI "TARGET_AVX512BW && TARGET_EVEX512") + (V16HI "TARGET_AVX512BW && TARGET_AVX512VL") (V8HI "TARGET_AVX512BW && TARGET_AVX512VL") (V64QI "TARGET_AVX512VBMI") (V32QI "TARGET_AVX512VBMI && TARGET_AVX512VL") (V16QI "TARGET_AVX512VBMI && TARGET_AVX512VL")]) @@ -26976,7 +26990,8 @@ [(V16SI "TARGET_EVEX512") (V8DI "TARGET_EVEX512") (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL") (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL") - (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX512BW && TARGET_AVX512VL") + (V32HI "TARGET_AVX512BW && TARGET_EVEX512") + (V16HI "TARGET_AVX512BW && TARGET_AVX512VL") (V8HI "TARGET_AVX512BW && TARGET_AVX512VL") (V64QI "TARGET_AVX512VBMI") (V32QI "TARGET_AVX512VBMI && TARGET_AVX512VL") (V16QI "TARGET_AVX512VBMI && TARGET_AVX512VL")])
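To round off this patch, a hedged sketch of what the DImode mask gating above means in practice (not part of the patch; the dg- options and function are hypothetical):

/* { dg-do compile } */
/* { dg-options "-O2 -mavx512bw -mevex512" } */
#include <immintrin.h>

/* A V64QI compare yields a 64-bit mask; DImode mask registers and the
   kmovq/knotq mask-logic patterns now require AVX512BW together with
   EVEX512 (see the ix86_hard_regno_mode_ok hunk earlier).  */
__mmask64
bytes_differ (__m512i a, __m512i b)
{
  return ~_mm512_cmpeq_epi8_mask (a, b);
}
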
From patchwork Thu Sep 21 07:20:11 2023
X-Patchwork-Submitter: "Hu, Lin1"
X-Patchwork-Id: 1837513
From: "Hu, Lin1"
To: gcc-patches@gcc.gnu.org
Cc: hongtao.liu@intel.com, ubizjak@gmail.com, haochen.jiang@intel.com
Subject: [PATCH 16/18] Support -mevex512 for AVX512{IFMA, VBMI, VNNI, BF16,
 VPOPCNTDQ, VBMI2, BITALG, VP2INTERSECT}, VAES, GFNI, VPCLMULQDQ intrins
Date: Thu, 21 Sep 2023 15:20:11 +0800
Message-Id: <20230921072013.2124750-17-lin1.hu@intel.com>
In-Reply-To: <20230921072013.2124750-1-lin1.hu@intel.com>
References: <20230921072013.2124750-1-lin1.hu@intel.com>
From: Haochen Jiang

gcc/ChangeLog:

    * config/i386/sse.md (VI1_AVX512VL): Add TARGET_EVEX512.
    (VI8_FVL): Ditto.
    (VI1_AVX512F): Ditto.
    (VI1_AVX512VNNI): Ditto.
    (VI1_AVX512VL_F): Ditto.
    (VI12_VI48F_AVX512VL): Ditto.
    (*avx512f_permvar_truncv32hiv32qi_1): Ditto.
    (sdot_prod): Ditto.
    (VEC_PERM_AVX2): Ditto.
    (VPERMI2): Ditto.
    (VPERMI2I): Ditto.
    (vpmadd52v8di): Ditto.
    (usdot_prod): Ditto.
    (vpdpbusd_v16si): Ditto.
    (vpdpbusds_v16si): Ditto.
    (vpdpwssd_v16si): Ditto.
    (vpdpwssds_v16si): Ditto.
    (VI48_AVX512VP2VL): Ditto.
    (avx512vp2intersect_2intersectv16si): Ditto.
    (VF_AVX512BF16VL): Ditto.
    (VF1_AVX512_256): Ditto.

gcc/testsuite/ChangeLog:

    * gcc.target/i386/pr90096.c: Adjust error message.

Co-authored-by: Hu, Lin1
---
 gcc/config/i386/sse.md                  | 56 +++++++++++++------------
 gcc/testsuite/gcc.target/i386/pr90096.c |  2 +-
 2 files changed, 31 insertions(+), 27 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index e59f6bf4410..a5a95b9de66 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -298,7 +298,7 @@
    (V32BF "TARGET_EVEX512") (V16BF "TARGET_AVX512VL") (V8BF "TARGET_AVX512VL")])

 (define_mode_iterator VI1_AVX512VL
-  [V64QI (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL")])
+  [(V64QI "TARGET_EVEX512") (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL")])

 ;; All vector modes
 (define_mode_iterator V
@@ -531,7 +531,7 @@
   [(V8DI "TARGET_AVX512F && TARGET_EVEX512") (V4DI "TARGET_AVX") V2DI])

 (define_mode_iterator VI8_FVL
-  [(V8DI "TARGET_AVX512F") V4DI (V2DI "TARGET_AVX512VL")])
+  [(V8DI "TARGET_AVX512F && TARGET_EVEX512") V4DI (V2DI "TARGET_AVX512VL")])

 (define_mode_iterator VI8_AVX512VL
   [(V8DI "TARGET_EVEX512") (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")])
@@ -546,10 +546,10 @@
   [(V64QI "TARGET_AVX512BW && TARGET_EVEX512") (V32QI "TARGET_AVX2") V16QI])

 (define_mode_iterator VI1_AVX512F
-  [(V64QI "TARGET_AVX512F") (V32QI "TARGET_AVX") V16QI])
+  [(V64QI "TARGET_AVX512F && TARGET_EVEX512") (V32QI "TARGET_AVX") V16QI])

 (define_mode_iterator VI1_AVX512VNNI
-  [(V64QI "TARGET_AVX512VNNI") (V32QI "TARGET_AVX2") V16QI])
+  [(V64QI "TARGET_AVX512VNNI && TARGET_EVEX512") (V32QI "TARGET_AVX2") V16QI])

 (define_mode_iterator VI12_256_512_AVX512VL
   [(V64QI "TARGET_EVEX512") (V32QI "TARGET_AVX512VL")
@@ -599,7 +599,7 @@
    V8DI ])

 (define_mode_iterator VI1_AVX512VL_F
-  [V32QI (V16QI "TARGET_AVX512VL") (V64QI "TARGET_AVX512F")])
+  [V32QI (V16QI "TARGET_AVX512VL") (V64QI "TARGET_AVX512F && TARGET_EVEX512")])

 (define_mode_iterator VI8_AVX2_AVX512BW
   [(V8DI "TARGET_AVX512BW && TARGET_EVEX512") (V4DI "TARGET_AVX2") V2DI])
@@ -923,8 +923,8 @@
    (V4DI "TARGET_AVX512VL") (V4DF "TARGET_AVX512VL")
    (V4SI "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
    (V2DI "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")
-   V64QI (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL")
-   V32HI (V16HI "TARGET_AVX512VL") (V8HI "TARGET_AVX512VL")])
+   (V64QI "TARGET_EVEX512") (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL")
+   (V32HI "TARGET_EVEX512") (V16HI "TARGET_AVX512VL") (V8HI "TARGET_AVX512VL")])

 (define_mode_iterator VI48F_256 [V8SI V8SF V4DI V4DF])
@@ -14217,7 +14217,7 @@
             (const_int 26) (const_int 27)
             (const_int 28) (const_int 29)
             (const_int 30) (const_int 31)])))]
-  "TARGET_AVX512VBMI && ix86_pre_reload_split ()"
+  "TARGET_AVX512VBMI && TARGET_EVEX512 && ix86_pre_reload_split ()"
   "#"
   "&& 1"
   [(set (match_dup 0)
@@ -16040,7 +16040,7 @@
   "TARGET_SSE2"
{
  /* Try with vnni instructions.  */
-  if (( == 64 && TARGET_AVX512VNNI)
+  if (( == 64 && TARGET_AVX512VNNI && TARGET_EVEX512)
      || ( < 64
          && ((TARGET_AVX512VNNI && TARGET_AVX512VL) || TARGET_AVXVNNI)))
    {
@@ -17320,7 +17320,8 @@
    (V8DF "TARGET_AVX512F && TARGET_EVEX512")
    (V16SI "TARGET_AVX512F && TARGET_EVEX512")
    (V8DI "TARGET_AVX512F && TARGET_EVEX512")
-   (V32HI "TARGET_AVX512BW && TARGET_EVEX512") (V64QI "TARGET_AVX512VBMI")
+   (V32HI "TARGET_AVX512BW && TARGET_EVEX512")
+   (V64QI "TARGET_AVX512VBMI && TARGET_EVEX512")
    (V32HF "TARGET_AVX512FP16")])

 (define_expand "vec_perm"
@@ -26983,7 +26984,8 @@
    (V32HI "TARGET_AVX512BW && TARGET_EVEX512")
    (V16HI "TARGET_AVX512BW && TARGET_AVX512VL")
    (V8HI "TARGET_AVX512BW && TARGET_AVX512VL")
-   (V64QI "TARGET_AVX512VBMI") (V32QI "TARGET_AVX512VBMI && TARGET_AVX512VL")
+   (V64QI "TARGET_AVX512VBMI && TARGET_EVEX512")
+   (V32QI "TARGET_AVX512VBMI && TARGET_AVX512VL")
    (V16QI "TARGET_AVX512VBMI && TARGET_AVX512VL")])

 (define_mode_iterator VPERMI2I
@@ -26993,7 +26995,8 @@
    (V32HI "TARGET_AVX512BW && TARGET_EVEX512")
    (V16HI "TARGET_AVX512BW && TARGET_AVX512VL")
    (V8HI "TARGET_AVX512BW && TARGET_AVX512VL")
-   (V64QI "TARGET_AVX512VBMI") (V32QI "TARGET_AVX512VBMI && TARGET_AVX512VL")
+   (V64QI "TARGET_AVX512VBMI && TARGET_EVEX512")
+   (V32QI "TARGET_AVX512VBMI && TARGET_AVX512VL")
    (V16QI "TARGET_AVX512VBMI && TARGET_AVX512VL")])

 (define_expand "_vpermi2var3_mask"
@@ -28977,7 +28980,7 @@
             (match_operand:V8DI 2 "register_operand" "v")
             (match_operand:V8DI 3 "nonimmediate_operand" "vm")]
            VPMADD52))]
-  "TARGET_AVX512IFMA"
+  "TARGET_AVX512IFMA && TARGET_EVEX512"
   "vpmadd52\t{%3, %2, %0|%0, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "prefix" "evex")
@@ -29579,9 +29582,9 @@
    (match_operand:VI1_AVX512VNNI 1 "register_operand")
    (match_operand:VI1_AVX512VNNI 2 "register_operand")
    (match_operand: 3 "register_operand")]
-  "( == 64
-   ||((TARGET_AVX512VNNI && TARGET_AVX512VL)
-      || TARGET_AVXVNNI))"
+  "(( == 64 && TARGET_EVEX512)
+   || ((TARGET_AVX512VNNI && TARGET_AVX512VL)
+       || TARGET_AVXVNNI))"
{
  operands[1] = lowpart_subreg (mode,
                                force_reg (mode, operands[1]),
@@ -29602,7 +29605,7 @@
             (match_operand:V16SI 2 "register_operand" "v")
             (match_operand:V16SI 3 "nonimmediate_operand" "vm")]
            UNSPEC_VPDPBUSD))]
-  "TARGET_AVX512VNNI"
+  "TARGET_AVX512VNNI && TARGET_EVEX512"
   "vpdpbusd\t{%3, %2, %0|%0, %2, %3}"
   [(set_attr ("prefix") ("evex"))])
@@ -29670,7 +29673,7 @@
             (match_operand:V16SI 2 "register_operand" "v")
             (match_operand:V16SI 3 "nonimmediate_operand" "vm")]
            UNSPEC_VPDPBUSDS))]
-  "TARGET_AVX512VNNI"
+  "TARGET_AVX512VNNI && TARGET_EVEX512"
   "vpdpbusds\t{%3, %2, %0|%0, %2, %3}"
   [(set_attr ("prefix") ("evex"))])
@@ -29738,7 +29741,7 @@
             (match_operand:V16SI 2 "register_operand" "v")
             (match_operand:V16SI 3 "nonimmediate_operand" "vm")]
            UNSPEC_VPDPWSSD))]
-  "TARGET_AVX512VNNI"
+  "TARGET_AVX512VNNI && TARGET_EVEX512"
   "vpdpwssd\t{%3, %2, %0|%0, %2, %3}"
   [(set_attr ("prefix") ("evex"))])
@@ -29806,7 +29809,7 @@
             (match_operand:V16SI 2 "register_operand" "v")
             (match_operand:V16SI 3 "nonimmediate_operand" "vm")]
            UNSPEC_VPDPWSSDS))]
-  "TARGET_AVX512VNNI"
+  "TARGET_AVX512VNNI && TARGET_EVEX512"
   "vpdpwssds\t{%3, %2, %0|%0, %2, %3}"
   [(set_attr ("prefix") ("evex"))])
@@ -29929,9 +29932,9 @@
   (set_attr "mode" "")])

 (define_mode_iterator VI48_AVX512VP2VL
-  [V8DI
-   (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")
-   (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")])
+  [(V8DI "TARGET_EVEX512")
+   (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")
+   (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")])

 (define_mode_iterator MASK_DWI [P2QI P2HI])
@@ -29972,12 +29975,12 @@
        (unspec:P2HI [(match_operand:V16SI 1 "register_operand" "v")
                      (match_operand:V16SI 2 "vector_operand" "vm")]
                     UNSPEC_VP2INTERSECT))]
-  "TARGET_AVX512VP2INTERSECT"
+  "TARGET_AVX512VP2INTERSECT && TARGET_EVEX512"
   "vp2intersectd\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr ("prefix") ("evex"))])

 (define_mode_iterator VF_AVX512BF16VL
-  [V32BF (V16BF "TARGET_AVX512VL") (V8BF "TARGET_AVX512VL")])
+  [(V32BF "TARGET_EVEX512") (V16BF "TARGET_AVX512VL") (V8BF "TARGET_AVX512VL")])

 ;; Converting from BF to SF
 (define_mode_attr bf16_cvt_2sf [(V32BF "V16SF") (V16BF "V8SF") (V8BF "V4SF")])
@@ -30070,7 +30073,8 @@
   "TARGET_AVX512BF16 && TARGET_AVX512VL"
   "vcvtneps2bf16{x}\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}")

-(define_mode_iterator VF1_AVX512_256 [V16SF (V8SF "TARGET_AVX512VL")])
+(define_mode_iterator VF1_AVX512_256
+  [(V16SF "TARGET_EVEX512") (V8SF "TARGET_AVX512VL")])

 (define_expand "avx512f_cvtneps2bf16__maskz"
   [(match_operand: 0 "register_operand")
diff --git a/gcc/testsuite/gcc.target/i386/pr90096.c b/gcc/testsuite/gcc.target/i386/pr90096.c
index 871e0ffc691..74f052ea8e5 100644
--- a/gcc/testsuite/gcc.target/i386/pr90096.c
+++ b/gcc/testsuite/gcc.target/i386/pr90096.c
@@ -10,7 +10,7 @@ volatile __mmask64 m64;
 void
 foo (int i)
 {
-  x1 = _mm512_gf2p8affineinv_epi64_epi8 (x1, x2, 3); /* { dg-error "needs isa option -mgfni -mavx512f" } */
+  x1 = _mm512_gf2p8affineinv_epi64_epi8 (x1, x2, 3); /* { dg-error "needs isa option -mevex512 -mgfni -mavx512f" } */
 }

 #ifdef __x86_64__
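To make the VNNI gating in this patch concrete (a sketch under the same
assumptions as above, not part of the patch itself): the 512-bit dot-product
form now depends on EVEX512, while the 128/256-bit forms remain available
through AVX512VL or AVXVNNI.

#include <immintrin.h>

/* vpdpbusd on %zmm: accepted with -mavx512vnni (EVEX512 on by default),
   rejected once -mno-evex512 is added.  */
__m512i
dot_u8s8 (__m512i acc, __m512i a, __m512i b)
{
  return _mm512_dpbusd_epi32 (acc, a, b);
}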
From patchwork Thu Sep 21 07:20:12 2023
X-Patchwork-Submitter: "Hu, Lin1"
X-Patchwork-Id: 1837518
From: "Hu, Lin1"
To: gcc-patches@gcc.gnu.org
Cc: hongtao.liu@intel.com, ubizjak@gmail.com, haochen.jiang@intel.com
Subject: [PATCH 17/18] Support -mevex512 for AVX512FP16 intrins
Date: Thu, 21 Sep 2023 15:20:12 +0800
Message-Id: <20230921072013.2124750-18-lin1.hu@intel.com>
In-Reply-To: <20230921072013.2124750-1-lin1.hu@intel.com>
References: <20230921072013.2124750-1-lin1.hu@intel.com>

From: Haochen Jiang

gcc/ChangeLog:

    * config/i386/sse.md (V48H_AVX512VL): Add TARGET_EVEX512.
    (VFH): Ditto.
    (VF2H): Ditto.
    (VFH_AVX512VL): Ditto.
    (VHFBF): Ditto.
    (VHF_AVX512VL): Ditto.
    (VI2H_AVX512VL): Ditto.
    (VI2F_256_512): Ditto.
    (VF48_I1248): Remove unused iterator.
    (VF48H_AVX512VL): Add TARGET_EVEX512.
    (VF_AVX512): Remove unused iterator.
    (REDUC_PLUS_MODE): Add TARGET_EVEX512.
    (REDUC_SMINMAX_MODE): Ditto.
    (FMAMODEM): Ditto.
    (VFH_SF_AVX512VL): Ditto.
    (VEC_PERM_AVX2): Ditto.
Co-authored-by: Hu, Lin1
---
 gcc/config/i386/sse.md | 44 ++++++++++++++++++++----------------------
 1 file changed, 21 insertions(+), 23 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index a5a95b9de66..25d53e15dce 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -280,7 +280,7 @@
 (define_mode_iterator V48H_AVX512VL
   [(V16SI "TARGET_EVEX512") (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")
    (V8DI "TARGET_EVEX512") (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")
-   (V32HF "TARGET_AVX512FP16")
+   (V32HF "TARGET_AVX512FP16 && TARGET_EVEX512")
    (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
    (V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
    (V16SF "TARGET_EVEX512") (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
@@ -355,7 +355,7 @@
    (V2DF "TARGET_AVX512DQ && TARGET_AVX512VL")])

 (define_mode_iterator VFH
-  [(V32HF "TARGET_AVX512FP16")
+  [(V32HF "TARGET_AVX512FP16 && TARGET_EVEX512")
    (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
    (V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
    (V16SF "TARGET_AVX512F && TARGET_EVEX512") (V8SF "TARGET_AVX") V4SF
@@ -401,7 +401,7 @@
 ;; All DFmode & HFmode vector float modes
 (define_mode_iterator VF2H
-  [(V32HF "TARGET_AVX512FP16")
+  [(V32HF "TARGET_AVX512FP16 && TARGET_EVEX512")
    (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
    (V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
    (V8DF "TARGET_AVX512F && TARGET_EVEX512") (V4DF "TARGET_AVX") V2DF])
@@ -463,7 +463,7 @@
   [(V16SF "TARGET_AVX512ER") (V8SF "TARGET_AVX") V4SF])

 (define_mode_iterator VFH_AVX512VL
-  [(V32HF "TARGET_AVX512FP16")
+  [(V32HF "TARGET_AVX512FP16 && TARGET_EVEX512")
    (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
    (V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
    (V16SF "TARGET_EVEX512") (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
@@ -475,12 +475,14 @@
 (define_mode_iterator VF1_AVX512VL
   [(V16SF "TARGET_EVEX512") (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")])

-(define_mode_iterator VHFBF [V32HF V16HF V8HF V32BF V16BF V8BF])
+(define_mode_iterator VHFBF
+  [(V32HF "TARGET_EVEX512") V16HF V8HF
+   (V32BF "TARGET_EVEX512") V16BF V8BF])
 (define_mode_iterator VHFBF_256 [V16HF V16BF])
 (define_mode_iterator VHFBF_128 [V8HF V8BF])

 (define_mode_iterator VHF_AVX512VL
-  [V32HF (V16HF "TARGET_AVX512VL") (V8HF "TARGET_AVX512VL")])
+  [(V32HF "TARGET_EVEX512") (V16HF "TARGET_AVX512VL") (V8HF "TARGET_AVX512VL")])

 (define_mode_iterator VHFBF_AVX512VL
   [(V32HF "TARGET_EVEX512") (V16HF "TARGET_AVX512VL") (V8HF "TARGET_AVX512VL")
@@ -594,9 +596,9 @@
    (V8BF "TARGET_AVX512VL") (V16BF "TARGET_AVX512VL") (V32BF "TARGET_EVEX512")])

 (define_mode_iterator VI2H_AVX512VL
-  [(V8HI "TARGET_AVX512VL") (V16HI "TARGET_AVX512VL") V32HI
-   (V8SI "TARGET_AVX512VL") V16SI
-   V8DI ])
+  [(V8HI "TARGET_AVX512VL") (V16HI "TARGET_AVX512VL") (V32HI "TARGET_EVEX512")
+   (V8SI "TARGET_AVX512VL") (V16SI "TARGET_EVEX512")
+   (V8DI "TARGET_EVEX512")])

 (define_mode_iterator VI1_AVX512VL_F
   [V32QI (V16QI "TARGET_AVX512VL") (V64QI "TARGET_AVX512F && TARGET_EVEX512")])
@@ -883,7 +885,10 @@
    (V32BF "TARGET_AVX512BW && TARGET_EVEX512")])

 ;; Int-float size matches
-(define_mode_iterator VI2F_256_512 [V16HI V32HI V16HF V32HF V16BF V32BF])
+(define_mode_iterator VI2F_256_512
+  [V16HI (V32HI "TARGET_EVEX512")
+   V16HF (V32HF "TARGET_EVEX512")
+   V16BF (V32BF "TARGET_EVEX512")])
 (define_mode_iterator VI4F_128 [V4SI V4SF])
 (define_mode_iterator VI8F_128 [V2DI V2DF])
 (define_mode_iterator VI4F_256 [V8SI V8SF])
@@ -899,10 +904,8 @@
    (V8DI "TARGET_AVX512F && TARGET_EVEX512")
    (V8DF "TARGET_AVX512F && TARGET_EVEX512")
    (V4DI "TARGET_AVX512VL")
 (V4DF "TARGET_AVX512VL")])

-(define_mode_iterator VF48_I1248
-  [V16SI V16SF V8DI V8DF V32HI V64QI])
 (define_mode_iterator VF48H_AVX512VL
-  [V8DF V16SF (V8SF "TARGET_AVX512VL")])
+  [(V8DF "TARGET_EVEX512") (V16SF "TARGET_EVEX512") (V8SF "TARGET_AVX512VL")])

 (define_mode_iterator VF48_128
   [V2DF V4SF])
@@ -928,11 +931,6 @@
 (define_mode_iterator VI48F_256 [V8SI V8SF V4DI V4DF])

-(define_mode_iterator VF_AVX512
-  [(V4SF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")
-   (V8SF "TARGET_AVX512VL") (V4DF "TARGET_AVX512VL")
-   V16SF V8DF])
-
 (define_mode_iterator V8_128 [V8HI V8HF V8BF])
 (define_mode_iterator V16_256 [V16HI V16HF V16BF])
 (define_mode_iterator V32_512
@@ -3419,7 +3417,7 @@
    (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
    (V8DF "TARGET_AVX512F && TARGET_EVEX512")
    (V16SF "TARGET_AVX512F && TARGET_EVEX512")
-   (V32HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
+   (V32HF "TARGET_AVX512FP16 && TARGET_AVX512VL && TARGET_EVEX512")
    (V32QI "TARGET_AVX")
    (V64QI "TARGET_AVX512F && TARGET_EVEX512")])
@@ -3464,7 +3462,7 @@
    (V8SI "TARGET_AVX2") (V4DI "TARGET_AVX2")
    (V8SF "TARGET_AVX") (V4DF "TARGET_AVX")
    (V64QI "TARGET_AVX512BW && TARGET_EVEX512")
-   (V32HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
+   (V32HF "TARGET_AVX512FP16 && TARGET_AVX512VL && TARGET_EVEX512")
    (V32HI "TARGET_AVX512BW && TARGET_EVEX512")
    (V16SI "TARGET_AVX512F && TARGET_EVEX512")
    (V8DI "TARGET_AVX512F && TARGET_EVEX512")
@@ -5318,7 +5316,7 @@
    (HF "TARGET_AVX512FP16")
    (V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
    (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
-   (V32HF "TARGET_AVX512FP16")])
+   (V32HF "TARGET_AVX512FP16 && TARGET_EVEX512")])

 (define_expand "fma4"
   [(set (match_operand:FMAMODEM 0 "register_operand")
@@ -5427,7 +5425,7 @@
 ;; Suppose AVX-512F as baseline
 (define_mode_iterator VFH_SF_AVX512VL
-  [(V32HF "TARGET_AVX512FP16")
+  [(V32HF "TARGET_AVX512FP16 && TARGET_EVEX512")
    (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
    (V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
    (HF "TARGET_AVX512FP16")
@@ -17322,7 +17320,7 @@
    (V8DI "TARGET_AVX512F && TARGET_EVEX512")
    (V32HI "TARGET_AVX512BW && TARGET_EVEX512")
    (V64QI "TARGET_AVX512VBMI && TARGET_EVEX512")
-   (V32HF "TARGET_AVX512FP16")])
+   (V32HF "TARGET_AVX512FP16 && TARGET_EVEX512")])

 (define_expand "vec_perm"
   [(match_operand:VEC_PERM_AVX2 0 "register_operand")
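For the FP16 side, the same pattern applies. A hedged sketch of the
user-visible behavior (standard AVX512FP16 intrinsics assumed; this is not
taken from the patch):

#include <immintrin.h>

/* 512-bit _Float16 add: the V32HF entries above now require TARGET_EVEX512,
   so this needs -mavx512fp16 with EVEX512 enabled; the 128/256-bit variants
   are still reachable through AVX512VL alone.  */
__m512h
add_half (__m512h a, __m512h b)
{
  return _mm512_add_ph (a, b);
}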
From patchwork Thu Sep 21 07:20:13 2023
X-Patchwork-Submitter: "Hu, Lin1"
X-Patchwork-Id: 1837516
From: "Hu, Lin1"
To: gcc-patches@gcc.gnu.org
Cc: hongtao.liu@intel.com, ubizjak@gmail.com, haochen.jiang@intel.com
Subject: [PATCH 18/18] Allow -mno-evex512 usage
Date: Thu, 21 Sep 2023 15:20:13 +0800
Message-Id: <20230921072013.2124750-19-lin1.hu@intel.com>
In-Reply-To: <20230921072013.2124750-1-lin1.hu@intel.com>
References: <20230921072013.2124750-1-lin1.hu@intel.com>

From: Haochen Jiang

gcc/ChangeLog:

    * config/i386/i386.opt: Allow -mno-evex512.

gcc/testsuite/ChangeLog:

    * gcc.target/i386/noevex512-1.c: New test.
    * gcc.target/i386/noevex512-2.c: Ditto.
    * gcc.target/i386/noevex512-3.c: Ditto.
---
 gcc/config/i386/i386.opt                    |  2 +-
 gcc/testsuite/gcc.target/i386/noevex512-1.c | 13 +++++++++++++
 gcc/testsuite/gcc.target/i386/noevex512-2.c | 13 +++++++++++++
 gcc/testsuite/gcc.target/i386/noevex512-3.c | 13 +++++++++++++
 4 files changed, 40 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/noevex512-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/noevex512-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/noevex512-3.c

diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 6d8601b1f75..34fc167af82 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1312,5 +1312,5 @@ Target Alias(mtune-ctrl=, use_scatter, ^use_scatter)
 Enable vectorization for scatter instruction.

 mevex512
-Target RejectNegative Mask(ISA2_EVEX512) Var(ix86_isa_flags2) Save
+Target Mask(ISA2_EVEX512) Var(ix86_isa_flags2) Save
 Support 512 bit vector built-in functions and code generation.
diff --git a/gcc/testsuite/gcc.target/i386/noevex512-1.c b/gcc/testsuite/gcc.target/i386/noevex512-1.c
new file mode 100644
index 00000000000..7fd45f15be6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/noevex512-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O0 -march=x86-64 -mavx512f -mno-evex512 -Wno-psabi" } */
+/* { dg-final { scan-assembler-not ".%zmm" } } */
+
+typedef double __m512d __attribute__ ((__vector_size__ (64), __may_alias__));
+
+__m512d
+foo ()
+{
+  __m512d a, b;
+  a = a + b;
+  return a;
+}
diff --git a/gcc/testsuite/gcc.target/i386/noevex512-2.c b/gcc/testsuite/gcc.target/i386/noevex512-2.c
new file mode 100644
index 00000000000..1c206e385d0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/noevex512-2.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=x86-64 -mavx512bw -mno-evex512" } */
+
+#include 
+
+long long
+foo (long long c)
+{
+  register long long a __asm ("k7") = c;
+  long long b = foo (a);
+  asm volatile ("" : "+k" (b)); /* { dg-error "inconsistent operand constraints in an 'asm'" } */
+  return b;
+}
diff --git a/gcc/testsuite/gcc.target/i386/noevex512-3.c b/gcc/testsuite/gcc.target/i386/noevex512-3.c
new file mode 100644
index 00000000000..10e00c2d61c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/noevex512-3.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-march=x86-64 -Wno-psabi -mavx512f" } */
+/* { dg-final { scan-assembler-not ".%zmm" } } */
+
+typedef double __m512d __attribute__ ((__vector_size__ (64), __may_alias__));
+
+__attribute__ ((target ("no-evex512"))) __m512d
+foo ()
+{
+  __m512d a, b;
+  a = a + b;
+  return a;
+}
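Beyond the new tests, one more hedged sketch (not from the patch) of what
-mno-evex512 is meant to preserve: EVEX features on 128/256-bit vectors,
including masking, keep working, while no %zmm register should be emitted.

#include <immintrin.h>

/* Compile with -mavx512vl -mno-evex512: 256-bit EVEX masking still works.  */
__m256i
masked_add (__mmask8 m, __m256i src, __m256i a, __m256i b)
{
  return _mm256_mask_add_epi32 (src, m, a, b);
}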