[47/62] AVX512FP16: Add scalar fma instructions.

Message ID	20210701061648.9447-48-hongtao.liu@intel.com
State	New
Headers	Return-Path: <gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org>
	DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 480863848024
	To: gcc-patches@gcc.gnu.org
	Subject: [PATCH 47/62] AVX512FP16: Add scalar fma instructions.
	Date: Thu, 1 Jul 2021 14:16:33 +0800
	Message-Id: <20210701061648.9447-48-hongtao.liu@intel.com>
	In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com>
	References: <20210701061648.9447-1-hongtao.liu@intel.com>
	Precedence: list
	From: liuhongt via Gcc-patches <gcc-patches@gcc.gnu.org>
	Reply-To: liuhongt <hongtao.liu@intel.com>
	Cc: jakub@redhat.com
	Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org
	Sender: "Gcc-patches" <gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org>
Series	Support all AVX512FP16 intrinsics.
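
For reviewers reading the masked scalar patterns in sse.md (avx512f_vmfmadd_<mode>_mask, _mask3 and _maskz_1), the nested vec_merge can be read as the lane-selection rule sketched below. This is only a reference model under stated assumptions, not code from the patch: `v8_model`, `enum fallback` and `scalar_fmadd_model` are hypothetical names, `float` lanes stand in for the eight `_Float16` lanes of `__m128h`, and a plain multiply-add is used, so half-precision rounding and the fused single rounding step are deliberately not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference model, not GCC code: eight float lanes stand in
   for the eight _Float16 lanes of __m128h.  */
typedef struct { float lane[8]; } v8_model;

/* What lane 0 falls back to when mask bit 0 is clear:
   FROM_OP1 models the _mask pattern  (inner vec_merge uses match_dup 1),
   FROM_OP3 models the _mask3 pattern (inner vec_merge uses match_dup 3),
   ZERO     models the _maskz_1 pattern (inner vec_merge uses const0).  */
enum fallback { FROM_OP1, FROM_OP3, ZERO };

static v8_model
scalar_fmadd_model (v8_model op1, v8_model op2, v8_model op3,
                    uint8_t mask, enum fallback fb)
{
  /* Outer vec_merge with (const_int 1): lanes 1..7 are copied from
     operand 3 for _mask3 and from operand 1 otherwise.  */
  v8_model r = (fb == FROM_OP3) ? op3 : op1;

  /* Inner vec_merge: only mask bit 0 can affect the result.  */
  if (mask & 1)
    r.lane[0] = op1.lane[0] * op2.lane[0] + op3.lane[0];
  else if (fb == ZERO)
    r.lane[0] = 0.0f;
  /* Otherwise lane 0 already holds the pass-through value.  */

  return r;
}

/* Small self-check with exactly representable values.  */
static void
scalar_fmadd_model_check (void)
{
  v8_model w = {{2, 10, 11, 12, 13, 14, 15, 16}};
  v8_model a = {{3, 0, 0, 0, 0, 0, 0, 0}};
  v8_model b = {{4, 20, 21, 22, 23, 24, 25, 26}};

  v8_model r = scalar_fmadd_model (w, a, b, 1, FROM_OP1);
  assert (r.lane[0] == 10.0f);   /* 2 * 3 + 4 */
  assert (r.lane[1] == 10.0f);   /* upper lanes come from operand 1 */
}
```

Read this way, the model also suggests why avx512fp16intrin.h can build most fmsub/fnmsub forms by negating arguments to the vfmaddsh3 builtins, while the mask3 fmsub form gets its own `__builtin_ia32_vfmsubsh3_mask3`: in the mask3 case operand 3 is also the pass-through source, so negating it beforehand would change the copied lanes.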

diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h index f246bab5159..5c85ec15b22 100644 --- a/gcc/config/i386/avx512fp16intrin.h +++ b/gcc/config/i386/avx512fp16intrin.h @@ -5697,6 +5697,418 @@ _mm512_maskz_fnmsub_round_ph (__mmask32 __U, __m512h __A, __m512h __B, #endif /* __OPTIMIZE__ */ +/* Intrinsics vfmadd[132,213,231]sh. */ +extern __inline __m128h + __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fmadd_sh (__m128h __W, __m128h __A, __m128h __B) +{ + return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W, + (__v8hf) __A, + (__v8hf) __B, + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_fmadd_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B) +{ + return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W, + (__v8hf) __A, + (__v8hf) __B, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask3_fmadd_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U) +{ + return (__m128h) __builtin_ia32_vfmaddsh3_mask3 ((__v8hf) __W, + (__v8hf) __A, + (__v8hf) __B, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_fmadd_sh (__mmask8 __U, __m128h __W, __m128h __A, __m128h __B) +{ + return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W, + (__v8hf) __A, + (__v8hf) __B, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + + +#ifdef __OPTIMIZE__ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fmadd_round_sh (__m128h __W, __m128h __A, __m128h __B, const int __R) +{ + return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W, + (__v8hf) __A, + (__v8hf) __B, + (__mmask8) -1, + __R); +} + +extern __inline __m128h +__attribute__ 
((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_fmadd_round_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B, + const int __R) +{ + return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W, + (__v8hf) __A, + (__v8hf) __B, + (__mmask8) __U, __R); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask3_fmadd_round_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U, + const int __R) +{ + return (__m128h) __builtin_ia32_vfmaddsh3_mask3 ((__v8hf) __W, + (__v8hf) __A, + (__v8hf) __B, + (__mmask8) __U, __R); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_fmadd_round_sh (__mmask8 __U, __m128h __W, __m128h __A, + __m128h __B, const int __R) +{ + return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W, + (__v8hf) __A, + (__v8hf) __B, + (__mmask8) __U, __R); +} + +#else +#define _mm_fmadd_round_sh(A, B, C, R) \ + ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), (B), (C), (-1), (R))) +#define _mm_mask_fmadd_round_sh(A, U, B, C, R) \ + ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), (B), (C), (U), (R))) +#define _mm_mask3_fmadd_round_sh(A, B, C, U, R) \ + ((__m128h) __builtin_ia32_vfmaddsh3_mask3 ((A), (B), (C), (U), (R))) +#define _mm_maskz_fmadd_round_sh(U, A, B, C, R) \ + ((__m128h) __builtin_ia32_vfmaddsh3_maskz ((A), (B), (C), (U), (R))) + +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vfnmadd[132,213,231]sh. 
*/ +extern __inline __m128h + __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fnmadd_sh (__m128h __W, __m128h __A, __m128h __B) +{ + return (__m128h) __builtin_ia32_vfnmaddsh3_mask ((__v8hf) __W, + (__v8hf) __A, + (__v8hf) __B, + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_fnmadd_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B) +{ + return (__m128h) __builtin_ia32_vfnmaddsh3_mask ((__v8hf) __W, + (__v8hf) __A, + (__v8hf) __B, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask3_fnmadd_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U) +{ + return (__m128h) __builtin_ia32_vfnmaddsh3_mask3 ((__v8hf) __W, + (__v8hf) __A, + (__v8hf) __B, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_fnmadd_sh (__mmask8 __U, __m128h __W, __m128h __A, __m128h __B) +{ + return (__m128h) __builtin_ia32_vfnmaddsh3_maskz ((__v8hf) __W, + (__v8hf) __A, + (__v8hf) __B, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + + +#ifdef __OPTIMIZE__ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fnmadd_round_sh (__m128h __W, __m128h __A, __m128h __B, const int __R) +{ + return (__m128h) __builtin_ia32_vfnmaddsh3_mask ((__v8hf) __W, + (__v8hf) __A, + (__v8hf) __B, + (__mmask8) -1, + __R); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_fnmadd_round_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B, + const int __R) +{ + return (__m128h) __builtin_ia32_vfnmaddsh3_mask ((__v8hf) __W, + (__v8hf) __A, + (__v8hf) __B, + (__mmask8) __U, __R); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, 
__artificial__)) +_mm_mask3_fnmadd_round_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U, + const int __R) +{ + return (__m128h) __builtin_ia32_vfnmaddsh3_mask3 ((__v8hf) __W, + (__v8hf) __A, + (__v8hf) __B, + (__mmask8) __U, __R); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_fnmadd_round_sh (__mmask8 __U, __m128h __W, __m128h __A, + __m128h __B, const int __R) +{ + return (__m128h) __builtin_ia32_vfnmaddsh3_maskz ((__v8hf) __W, + (__v8hf) __A, + (__v8hf) __B, + (__mmask8) __U, __R); +} + +#else +#define _mm_fnmadd_round_sh(A, B, C, R) \ + ((__m128h) __builtin_ia32_vfnmaddsh3_mask ((A), (B), (C), (-1), (R))) +#define _mm_mask_fnmadd_round_sh(A, U, B, C, R) \ + ((__m128h) __builtin_ia32_vfnmaddsh3_mask ((A), (B), (C), (U), (R))) +#define _mm_mask3_fnmadd_round_sh(A, B, C, U, R) \ + ((__m128h) __builtin_ia32_vfnmaddsh3_mask3 ((A), (B), (C), (U), (R))) +#define _mm_maskz_fnmadd_round_sh(U, A, B, C, R) \ + ((__m128h) __builtin_ia32_vfnmaddsh3_maskz ((A), (B), (C), (U), (R))) + +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vfmsub[132,213,231]sh. 
*/ +extern __inline __m128h + __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fmsub_sh (__m128h __W, __m128h __A, __m128h __B) +{ + return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W, + (__v8hf) __A, + -(__v8hf) __B, + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_fmsub_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B) +{ + return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W, + (__v8hf) __A, + -(__v8hf) __B, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask3_fmsub_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U) +{ + return (__m128h) __builtin_ia32_vfmsubsh3_mask3 ((__v8hf) __W, + (__v8hf) __A, + (__v8hf) __B, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_fmsub_sh (__mmask8 __U, __m128h __W, __m128h __A, __m128h __B) +{ + return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W, + (__v8hf) __A, + -(__v8hf) __B, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + + +#ifdef __OPTIMIZE__ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fmsub_round_sh (__m128h __W, __m128h __A, __m128h __B, const int __R) +{ + return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W, + (__v8hf) __A, + -(__v8hf) __B, + (__mmask8) -1, + __R); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_fmsub_round_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B, + const int __R) +{ + return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W, + (__v8hf) __A, + -(__v8hf) __B, + (__mmask8) __U, __R); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) 
+_mm_mask3_fmsub_round_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U, + const int __R) +{ + return (__m128h) __builtin_ia32_vfmsubsh3_mask3 ((__v8hf) __W, + (__v8hf) __A, + (__v8hf) __B, + (__mmask8) __U, __R); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_fmsub_round_sh (__mmask8 __U, __m128h __W, __m128h __A, + __m128h __B, const int __R) +{ + return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W, + (__v8hf) __A, + -(__v8hf) __B, + (__mmask8) __U, __R); +} + +#else +#define _mm_fmsub_round_sh(A, B, C, R) \ + ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), (B), -(C), (-1), (R))) +#define _mm_mask_fmsub_round_sh(A, U, B, C, R) \ + ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), (B), -(C), (U), (R))) +#define _mm_mask3_fmsub_round_sh(A, B, C, U, R) \ + ((__m128h) __builtin_ia32_vfmsubsh3_mask3 ((A), (B), (C), (U), (R))) +#define _mm_maskz_fmsub_round_sh(U, A, B, C, R) \ + ((__m128h) __builtin_ia32_vfmaddsh3_maskz ((A), (B), -(C), (U), (R))) + +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vfnmsub[132,213,231]sh. 
*/ +extern __inline __m128h + __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fnmsub_sh (__m128h __W, __m128h __A, __m128h __B) +{ + return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W, + -(__v8hf) __A, + -(__v8hf) __B, + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_fnmsub_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B) +{ + return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W, + -(__v8hf) __A, + -(__v8hf) __B, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask3_fnmsub_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U) +{ + return (__m128h) __builtin_ia32_vfmsubsh3_mask3 ((__v8hf) __W, + -(__v8hf) __A, + (__v8hf) __B, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_fnmsub_sh (__mmask8 __U, __m128h __W, __m128h __A, __m128h __B) +{ + return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W, + -(__v8hf) __A, + -(__v8hf) __B, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + + +#ifdef __OPTIMIZE__ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fnmsub_round_sh (__m128h __W, __m128h __A, __m128h __B, const int __R) +{ + return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W, + -(__v8hf) __A, + -(__v8hf) __B, + (__mmask8) -1, + __R); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_fnmsub_round_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B, + const int __R) +{ + return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W, + -(__v8hf) __A, + -(__v8hf) __B, + (__mmask8) __U, __R); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, 
__artificial__)) +_mm_mask3_fnmsub_round_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U, + const int __R) +{ + return (__m128h) __builtin_ia32_vfmsubsh3_mask3 ((__v8hf) __W, + -(__v8hf) __A, + (__v8hf) __B, + (__mmask8) __U, __R); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_fnmsub_round_sh (__mmask8 __U, __m128h __W, __m128h __A, + __m128h __B, const int __R) +{ + return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W, + -(__v8hf) __A, + -(__v8hf) __B, + (__mmask8) __U, __R); +} + +#else +#define _mm_fnmsub_round_sh(A, B, C, R) \ + ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), -(B), -(C), (-1), (R))) +#define _mm_mask_fnmsub_round_sh(A, U, B, C, R) \ + ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), -(B), -(C), (U), (R))) +#define _mm_mask3_fnmsub_round_sh(A, B, C, U, R) \ + ((__m128h) __builtin_ia32_vfmsubsh3_mask3 ((A), -(B), (C), (U), (R))) +#define _mm_maskz_fnmsub_round_sh(U, A, B, C, R) \ + ((__m128h) __builtin_ia32_vfmaddsh3_maskz ((A), -(B), -(C), (U), (R))) + +#endif /* __OPTIMIZE__ */ + #ifdef __DISABLE_AVX512FP16__ #undef __DISABLE_AVX512FP16__ #pragma GCC pop_options diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def index 0cdbf1bc0c0..22b924bf98d 100644 --- a/gcc/config/i386/i386-builtin-types.def +++ b/gcc/config/i386/i386-builtin-types.def @@ -1342,6 +1342,7 @@ DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, INT) DEF_FUNCTION_TYPE (V8HF, V8HF, INT, V8HF, UQI) DEF_FUNCTION_TYPE (UQI, V8HF, V8HF, INT, UQI) DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, V8HF, UQI) +DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, V8HF, INT) DEF_FUNCTION_TYPE (UQI, V8HF, V8HF, INT, UQI, INT) DEF_FUNCTION_TYPE (V8DI, V8HF, V8DI, UQI, INT) DEF_FUNCTION_TYPE (V8DF, V8HF, V8DF, UQI, INT) diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def index cf0259843cc..f446a6ce5d3 100644 --- a/gcc/config/i386/i386-builtin.def +++ b/gcc/config/i386/i386-builtin.def @@ 
-3194,6 +3194,13 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmsub_v32hf_maskz_round BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fnmsub_v32hf_mask_round, "__builtin_ia32_vfnmsubph512_mask", IX86_BUILTIN_VFNMSUBPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fnmsub_v32hf_mask3_round, "__builtin_ia32_vfnmsubph512_mask3", IX86_BUILTIN_VFNMSUBPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fnmsub_v32hf_maskz_round, "__builtin_ia32_vfnmsubph512_maskz", IX86_BUILTIN_VFNMSUBPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfmadd_v8hf_mask_round, "__builtin_ia32_vfmaddsh3_mask", IX86_BUILTIN_VFMADDSH3_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfmadd_v8hf_mask3_round, "__builtin_ia32_vfmaddsh3_mask3", IX86_BUILTIN_VFMADDSH3_MASK3, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfmadd_v8hf_maskz_round, "__builtin_ia32_vfmaddsh3_maskz", IX86_BUILTIN_VFMADDSH3_MASKZ, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfnmadd_v8hf_mask_round, "__builtin_ia32_vfnmaddsh3_mask", IX86_BUILTIN_VFNMADDSH3_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfnmadd_v8hf_mask3_round, "__builtin_ia32_vfnmaddsh3_mask3", IX86_BUILTIN_VFNMADDSH3_MASK3, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfnmadd_v8hf_maskz_round, "__builtin_ia32_vfnmaddsh3_maskz", IX86_BUILTIN_VFNMADDSH3_MASKZ, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfmsub_v8hf_mask3_round, 
"__builtin_ia32_vfmsubsh3_mask3", IX86_BUILTIN_VFMSUBSH3_MASK3, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) BDESC_END (ROUND_ARGS, MULTI_ARG) diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c index 006f4bec8db..f6de05c769a 100644 --- a/gcc/config/i386/i386-expand.c +++ b/gcc/config/i386/i386-expand.c @@ -10558,6 +10558,7 @@ ix86_expand_round_builtin (const struct builtin_description *d, case V8HF_FTYPE_V8DI_V8HF_UQI_INT: case V8HF_FTYPE_V8DF_V8HF_UQI_INT: case V16HF_FTYPE_V16SF_V16HF_UHI_INT: + case V8HF_FTYPE_V8HF_V8HF_V8HF_INT: nargs = 4; break; case V4SF_FTYPE_V4SF_V4SF_INT_INT: diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index cbf1e75c0b2..31f8fc68c65 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -5049,60 +5049,60 @@ (define_insn "<avx512>_fmsubadd_<mode>_mask3<round_name>" ;; high-order elements from the destination register. (define_expand "fmai_vmfmadd_<mode><round_name>" - [(set (match_operand:VF_128 0 "register_operand") - (vec_merge:VF_128 - (fma:VF_128 - (match_operand:VF_128 1 "register_operand") - (match_operand:VF_128 2 "<round_nimm_scalar_predicate>") - (match_operand:VF_128 3 "<round_nimm_scalar_predicate>")) + [(set (match_operand:VFH_128 0 "register_operand") + (vec_merge:VFH_128 + (fma:VFH_128 + (match_operand:VFH_128 1 "register_operand") + (match_operand:VFH_128 2 "<round_nimm_scalar_predicate>") + (match_operand:VFH_128 3 "<round_nimm_scalar_predicate>")) (match_dup 1) (const_int 1)))] "TARGET_FMA") (define_expand "fmai_vmfmsub_<mode><round_name>" - [(set (match_operand:VF_128 0 "register_operand") - (vec_merge:VF_128 - (fma:VF_128 - (match_operand:VF_128 1 "register_operand") - (match_operand:VF_128 2 "<round_nimm_scalar_predicate>") - (neg:VF_128 - (match_operand:VF_128 3 "<round_nimm_scalar_predicate>"))) + [(set (match_operand:VFH_128 0 "register_operand") + (vec_merge:VFH_128 + (fma:VFH_128 + (match_operand:VFH_128 1 "register_operand") + (match_operand:VFH_128 2 
"<round_nimm_scalar_predicate>") + (neg:VFH_128 + (match_operand:VFH_128 3 "<round_nimm_scalar_predicate>"))) (match_dup 1) (const_int 1)))] "TARGET_FMA") (define_expand "fmai_vmfnmadd_<mode><round_name>" - [(set (match_operand:VF_128 0 "register_operand") - (vec_merge:VF_128 - (fma:VF_128 - (neg:VF_128 - (match_operand:VF_128 2 "<round_nimm_scalar_predicate>")) - (match_operand:VF_128 1 "register_operand") - (match_operand:VF_128 3 "<round_nimm_scalar_predicate>")) + [(set (match_operand:VFH_128 0 "register_operand") + (vec_merge:VFH_128 + (fma:VFH_128 + (neg:VFH_128 + (match_operand:VFH_128 2 "<round_nimm_scalar_predicate>")) + (match_operand:VFH_128 1 "register_operand") + (match_operand:VFH_128 3 "<round_nimm_scalar_predicate>")) (match_dup 1) (const_int 1)))] "TARGET_FMA") (define_expand "fmai_vmfnmsub_<mode><round_name>" - [(set (match_operand:VF_128 0 "register_operand") - (vec_merge:VF_128 - (fma:VF_128 - (neg:VF_128 - (match_operand:VF_128 2 "<round_nimm_scalar_predicate>")) - (match_operand:VF_128 1 "register_operand") - (neg:VF_128 - (match_operand:VF_128 3 "<round_nimm_scalar_predicate>"))) + [(set (match_operand:VFH_128 0 "register_operand") + (vec_merge:VFH_128 + (fma:VFH_128 + (neg:VFH_128 + (match_operand:VFH_128 2 "<round_nimm_scalar_predicate>")) + (match_operand:VFH_128 1 "register_operand") + (neg:VFH_128 + (match_operand:VFH_128 3 "<round_nimm_scalar_predicate>"))) (match_dup 1) (const_int 1)))] "TARGET_FMA") (define_insn "*fmai_fmadd_<mode>" - [(set (match_operand:VF_128 0 "register_operand" "=v,v") - (vec_merge:VF_128 - (fma:VF_128 - (match_operand:VF_128 1 "register_operand" "0,0") - (match_operand:VF_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>, v") - (match_operand:VF_128 3 "<round_nimm_scalar_predicate>" "v,<round_constraint>")) + [(set (match_operand:VFH_128 0 "register_operand" "=v,v") + (vec_merge:VFH_128 + (fma:VFH_128 + (match_operand:VFH_128 1 "register_operand" "0,0") + (match_operand:VFH_128 2 
"<round_nimm_scalar_predicate>" "<round_constraint>, v") + (match_operand:VFH_128 3 "<round_nimm_scalar_predicate>" "v,<round_constraint>")) (match_dup 1) (const_int 1)))] "TARGET_FMA || TARGET_AVX512F" @@ -5113,13 +5113,13 @@ (define_insn "*fmai_fmadd_<mode>" (set_attr "mode" "<MODE>")]) (define_insn "*fmai_fmsub_<mode>" - [(set (match_operand:VF_128 0 "register_operand" "=v,v") - (vec_merge:VF_128 - (fma:VF_128 - (match_operand:VF_128 1 "register_operand" "0,0") - (match_operand:VF_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>,v") - (neg:VF_128 - (match_operand:VF_128 3 "<round_nimm_scalar_predicate>" "v,<round_constraint>"))) + [(set (match_operand:VFH_128 0 "register_operand" "=v,v") + (vec_merge:VFH_128 + (fma:VFH_128 + (match_operand:VFH_128 1 "register_operand" "0,0") + (match_operand:VFH_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>,v") + (neg:VFH_128 + (match_operand:VFH_128 3 "<round_nimm_scalar_predicate>" "v,<round_constraint>"))) (match_dup 1) (const_int 1)))] "TARGET_FMA || TARGET_AVX512F" @@ -5130,13 +5130,13 @@ (define_insn "*fmai_fmsub_<mode>" (set_attr "mode" "<MODE>")]) (define_insn "*fmai_fnmadd_<mode><round_name>" - [(set (match_operand:VF_128 0 "register_operand" "=v,v") - (vec_merge:VF_128 - (fma:VF_128 - (neg:VF_128 - (match_operand:VF_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>,v")) - (match_operand:VF_128 1 "register_operand" "0,0") - (match_operand:VF_128 3 "<round_nimm_scalar_predicate>" "v,<round_constraint>")) + [(set (match_operand:VFH_128 0 "register_operand" "=v,v") + (vec_merge:VFH_128 + (fma:VFH_128 + (neg:VFH_128 + (match_operand:VFH_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>,v")) + (match_operand:VFH_128 1 "register_operand" "0,0") + (match_operand:VFH_128 3 "<round_nimm_scalar_predicate>" "v,<round_constraint>")) (match_dup 1) (const_int 1)))] "TARGET_FMA || TARGET_AVX512F" @@ -5147,14 +5147,14 @@ (define_insn "*fmai_fnmadd_<mode><round_name>" (set_attr "mode" "<MODE>")]) 
(define_insn "*fmai_fnmsub_<mode><round_name>" - [(set (match_operand:VF_128 0 "register_operand" "=v,v") - (vec_merge:VF_128 - (fma:VF_128 - (neg:VF_128 - (match_operand:VF_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>,v")) - (match_operand:VF_128 1 "register_operand" "0,0") - (neg:VF_128 - (match_operand:VF_128 3 "<round_nimm_scalar_predicate>" "v,<round_constraint>"))) + [(set (match_operand:VFH_128 0 "register_operand" "=v,v") + (vec_merge:VFH_128 + (fma:VFH_128 + (neg:VFH_128 + (match_operand:VFH_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>,v")) + (match_operand:VFH_128 1 "register_operand" "0,0") + (neg:VFH_128 + (match_operand:VFH_128 3 "<round_nimm_scalar_predicate>" "v,<round_constraint>"))) (match_dup 1) (const_int 1)))] "TARGET_FMA || TARGET_AVX512F" @@ -5165,13 +5165,13 @@ (define_insn "*fmai_fnmsub_<mode><round_name>" (set_attr "mode" "<MODE>")]) (define_insn "avx512f_vmfmadd_<mode>_mask<round_name>" - [(set (match_operand:VF_128 0 "register_operand" "=v,v") - (vec_merge:VF_128 - (vec_merge:VF_128 - (fma:VF_128 - (match_operand:VF_128 1 "register_operand" "0,0") - (match_operand:VF_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>,v") - (match_operand:VF_128 3 "<round_nimm_scalar_predicate>" "v,<round_constraint>")) + [(set (match_operand:VFH_128 0 "register_operand" "=v,v") + (vec_merge:VFH_128 + (vec_merge:VFH_128 + (fma:VFH_128 + (match_operand:VFH_128 1 "register_operand" "0,0") + (match_operand:VFH_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>,v") + (match_operand:VFH_128 3 "<round_nimm_scalar_predicate>" "v,<round_constraint>")) (match_dup 1) (match_operand:QI 4 "register_operand" "Yk,Yk")) (match_dup 1) @@ -5184,13 +5184,13 @@ (define_insn "avx512f_vmfmadd_<mode>_mask<round_name>" (set_attr "mode" "<MODE>")]) (define_insn "avx512f_vmfmadd_<mode>_mask3<round_name>" - [(set (match_operand:VF_128 0 "register_operand" "=v") - (vec_merge:VF_128 - (vec_merge:VF_128 - (fma:VF_128 - (match_operand:VF_128 1 
"<round_nimm_scalar_predicate>" "%v") - (match_operand:VF_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>") - (match_operand:VF_128 3 "register_operand" "0")) + [(set (match_operand:VFH_128 0 "register_operand" "=v") + (vec_merge:VFH_128 + (vec_merge:VFH_128 + (fma:VFH_128 + (match_operand:VFH_128 1 "<round_nimm_scalar_predicate>" "%v") + (match_operand:VFH_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>") + (match_operand:VFH_128 3 "register_operand" "0")) (match_dup 3) (match_operand:QI 4 "register_operand" "Yk")) (match_dup 3) @@ -5201,10 +5201,10 @@ (define_insn "avx512f_vmfmadd_<mode>_mask3<round_name>" (set_attr "mode" "<MODE>")]) (define_expand "avx512f_vmfmadd_<mode>_maskz<round_expand_name>" - [(match_operand:VF_128 0 "register_operand") - (match_operand:VF_128 1 "<round_expand_nimm_predicate>") - (match_operand:VF_128 2 "<round_expand_nimm_predicate>") - (match_operand:VF_128 3 "<round_expand_nimm_predicate>") + [(match_operand:VFH_128 0 "register_operand") + (match_operand:VFH_128 1 "<round_expand_nimm_predicate>") + (match_operand:VFH_128 2 "<round_expand_nimm_predicate>") + (match_operand:VFH_128 3 "<round_expand_nimm_predicate>") (match_operand:QI 4 "register_operand")] "TARGET_AVX512F" { @@ -5215,14 +5215,14 @@ (define_expand "avx512f_vmfmadd_<mode>_maskz<round_expand_name>" }) (define_insn "avx512f_vmfmadd_<mode>_maskz_1<round_name>" - [(set (match_operand:VF_128 0 "register_operand" "=v,v") - (vec_merge:VF_128 - (vec_merge:VF_128 - (fma:VF_128 - (match_operand:VF_128 1 "register_operand" "0,0") - (match_operand:VF_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>,v") - (match_operand:VF_128 3 "<round_nimm_scalar_predicate>" "v,<round_constraint>")) - (match_operand:VF_128 4 "const0_operand" "C,C") + [(set (match_operand:VFH_128 0 "register_operand" "=v,v") + (vec_merge:VFH_128 + (vec_merge:VFH_128 + (fma:VFH_128 + (match_operand:VFH_128 1 "register_operand" "0,0") + (match_operand:VFH_128 2 
"<round_nimm_scalar_predicate>" "<round_constraint>,v") + (match_operand:VFH_128 3 "<round_nimm_scalar_predicate>" "v,<round_constraint>")) + (match_operand:VFH_128 4 "const0_operand" "C,C") (match_operand:QI 5 "register_operand" "Yk,Yk")) (match_dup 1) (const_int 1)))] @@ -5234,14 +5234,14 @@ (define_insn "avx512f_vmfmadd_<mode>_maskz_1<round_name>" (set_attr "mode" "<MODE>")]) (define_insn "*avx512f_vmfmsub_<mode>_mask<round_name>" - [(set (match_operand:VF_128 0 "register_operand" "=v,v") - (vec_merge:VF_128 - (vec_merge:VF_128 - (fma:VF_128 - (match_operand:VF_128 1 "register_operand" "0,0") - (match_operand:VF_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>,v") - (neg:VF_128 - (match_operand:VF_128 3 "<round_nimm_scalar_predicate>" "v,<round_constraint>"))) + [(set (match_operand:VFH_128 0 "register_operand" "=v,v") + (vec_merge:VFH_128 + (vec_merge:VFH_128 + (fma:VFH_128 + (match_operand:VFH_128 1 "register_operand" "0,0") + (match_operand:VFH_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>,v") + (neg:VFH_128 + (match_operand:VFH_128 3 "<round_nimm_scalar_predicate>" "v,<round_constraint>"))) (match_dup 1) (match_operand:QI 4 "register_operand" "Yk,Yk")) (match_dup 1) @@ -5254,14 +5254,14 @@ (define_insn "*avx512f_vmfmsub_<mode>_mask<round_name>" (set_attr "mode" "<MODE>")]) (define_insn "avx512f_vmfmsub_<mode>_mask3<round_name>" - [(set (match_operand:VF_128 0 "register_operand" "=v") - (vec_merge:VF_128 - (vec_merge:VF_128 - (fma:VF_128 - (match_operand:VF_128 1 "<round_nimm_scalar_predicate>" "%v") - (match_operand:VF_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>") - (neg:VF_128 - (match_operand:VF_128 3 "register_operand" "0"))) + [(set (match_operand:VFH_128 0 "register_operand" "=v") + (vec_merge:VFH_128 + (vec_merge:VFH_128 + (fma:VFH_128 + (match_operand:VFH_128 1 "<round_nimm_scalar_predicate>" "%v") + (match_operand:VFH_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>") + (neg:VFH_128 + (match_operand:VFH_128 
3 "register_operand" "0")))
 (match_dup 3)
 (match_operand:QI 4 "register_operand" "Yk"))
 (match_dup 3)
@@ -5272,15 +5272,15 @@ (define_insn "avx512f_vmfmsub_<mode>_mask3<round_name>"
 (set_attr "mode" "<MODE>")])

 (define_insn "*avx512f_vmfmsub_<mode>_maskz_1<round_name>"
- [(set (match_operand:VF_128 0 "register_operand" "=v,v")
- (vec_merge:VF_128
- (vec_merge:VF_128
- (fma:VF_128
- (match_operand:VF_128 1 "register_operand" "0,0")
- (match_operand:VF_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>,v")
- (neg:VF_128
- (match_operand:VF_128 3 "<round_nimm_scalar_predicate>" "v,<round_constraint>")))
- (match_operand:VF_128 4 "const0_operand" "C,C")
+ [(set (match_operand:VFH_128 0 "register_operand" "=v,v")
+ (vec_merge:VFH_128
+ (vec_merge:VFH_128
+ (fma:VFH_128
+ (match_operand:VFH_128 1 "register_operand" "0,0")
+ (match_operand:VFH_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>,v")
+ (neg:VFH_128
+ (match_operand:VFH_128 3 "<round_nimm_scalar_predicate>" "v,<round_constraint>")))
+ (match_operand:VFH_128 4 "const0_operand" "C,C")
 (match_operand:QI 5 "register_operand" "Yk,Yk"))
 (match_dup 1)
 (const_int 1)))]
@@ -5291,15 +5291,15 @@ (define_insn "*avx512f_vmfmsub_<mode>_maskz_1<round_name>"
 [(set_attr "type" "ssemuladd")
 (set_attr "mode" "<MODE>")])

-(define_insn "*avx512f_vmfnmadd_<mode>_mask<round_name>"
- [(set (match_operand:VF_128 0 "register_operand" "=v,v")
- (vec_merge:VF_128
- (vec_merge:VF_128
- (fma:VF_128
- (neg:VF_128
- (match_operand:VF_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>,v"))
- (match_operand:VF_128 1 "register_operand" "0,0")
- (match_operand:VF_128 3 "<round_nimm_scalar_predicate>" "v,<round_constraint>"))
+(define_insn "avx512f_vmfnmadd_<mode>_mask<round_name>"
+ [(set (match_operand:VFH_128 0 "register_operand" "=v,v")
+ (vec_merge:VFH_128
+ (vec_merge:VFH_128
+ (fma:VFH_128
+ (neg:VFH_128
+ (match_operand:VFH_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>,v"))
+ (match_operand:VFH_128 1 "register_operand" "0,0")
+ (match_operand:VFH_128 3 "<round_nimm_scalar_predicate>" "v,<round_constraint>"))
 (match_dup 1)
 (match_operand:QI 4 "register_operand" "Yk,Yk"))
 (match_dup 1)
@@ -5311,15 +5311,15 @@ (define_insn "*avx512f_vmfnmadd_<mode>_mask<round_name>"
 [(set_attr "type" "ssemuladd")
 (set_attr "mode" "<MODE>")])

-(define_insn "*avx512f_vmfnmadd_<mode>_mask3<round_name>"
- [(set (match_operand:VF_128 0 "register_operand" "=v")
- (vec_merge:VF_128
- (vec_merge:VF_128
- (fma:VF_128
- (neg:VF_128
- (match_operand:VF_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>"))
- (match_operand:VF_128 1 "<round_nimm_scalar_predicate>" "%v")
- (match_operand:VF_128 3 "register_operand" "0"))
+(define_insn "avx512f_vmfnmadd_<mode>_mask3<round_name>"
+ [(set (match_operand:VFH_128 0 "register_operand" "=v")
+ (vec_merge:VFH_128
+ (vec_merge:VFH_128
+ (fma:VFH_128
+ (neg:VFH_128
+ (match_operand:VFH_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>"))
+ (match_operand:VFH_128 1 "<round_nimm_scalar_predicate>" "%v")
+ (match_operand:VFH_128 3 "register_operand" "0"))
 (match_dup 3)
 (match_operand:QI 4 "register_operand" "Yk"))
 (match_dup 3)
@@ -5329,16 +5329,30 @@ (define_insn "*avx512f_vmfnmadd_<mode>_mask3<round_name>"
 [(set_attr "type" "ssemuladd")
 (set_attr "mode" "<MODE>")])

-(define_insn "*avx512f_vmfnmadd_<mode>_maskz_1<round_name>"
- [(set (match_operand:VF_128 0 "register_operand" "=v,v")
- (vec_merge:VF_128
- (vec_merge:VF_128
- (fma:VF_128
- (neg:VF_128
- (match_operand:VF_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>,v"))
- (match_operand:VF_128 1 "register_operand" "0,0")
- (match_operand:VF_128 3 "<round_nimm_scalar_predicate>" "v,<round_constraint>"))
- (match_operand:VF_128 4 "const0_operand" "C,C")
+(define_expand "avx512f_vmfnmadd_<mode>_maskz<round_expand_name>"
+ [(match_operand:VFH_128 0 "register_operand")
+ (match_operand:VFH_128 1 "<round_expand_nimm_predicate>")
+ (match_operand:VFH_128 2 "<round_expand_nimm_predicate>")
+ (match_operand:VFH_128 3 "<round_expand_nimm_predicate>")
+ (match_operand:QI 4 "register_operand")]
+ "TARGET_AVX512F"
+{
+ emit_insn (gen_avx512f_vmfnmadd_<mode>_maskz_1<round_expand_name> (
+ operands[0], operands[1], operands[2], operands[3],
+ CONST0_RTX (<MODE>mode), operands[4]<round_expand_operand>));
+ DONE;
+})
+
+(define_insn "avx512f_vmfnmadd_<mode>_maskz_1<round_name>"
+ [(set (match_operand:VFH_128 0 "register_operand" "=v,v")
+ (vec_merge:VFH_128
+ (vec_merge:VFH_128
+ (fma:VFH_128
+ (neg:VFH_128
+ (match_operand:VFH_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>,v"))
+ (match_operand:VFH_128 1 "register_operand" "0,0")
+ (match_operand:VFH_128 3 "<round_nimm_scalar_predicate>" "v,<round_constraint>"))
+ (match_operand:VFH_128 4 "const0_operand" "C,C")
 (match_operand:QI 5 "register_operand" "Yk,Yk"))
 (match_dup 1)
 (const_int 1)))]
@@ -5350,15 +5364,15 @@ (define_insn "*avx512f_vmfnmadd_<mode>_maskz_1<round_name>"
 (set_attr "mode" "<MODE>")])

 (define_insn "*avx512f_vmfnmsub_<mode>_mask<round_name>"
- [(set (match_operand:VF_128 0 "register_operand" "=v,v")
- (vec_merge:VF_128
- (vec_merge:VF_128
- (fma:VF_128
- (neg:VF_128
- (match_operand:VF_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>,v"))
- (match_operand:VF_128 1 "register_operand" "0,0")
- (neg:VF_128
- (match_operand:VF_128 3 "<round_nimm_scalar_predicate>" "v,<round_constraint>")))
+ [(set (match_operand:VFH_128 0 "register_operand" "=v,v")
+ (vec_merge:VFH_128
+ (vec_merge:VFH_128
+ (fma:VFH_128
+ (neg:VFH_128
+ (match_operand:VFH_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>,v"))
+ (match_operand:VFH_128 1 "register_operand" "0,0")
+ (neg:VFH_128
+ (match_operand:VFH_128 3 "<round_nimm_scalar_predicate>" "v,<round_constraint>")))
 (match_dup 1)
 (match_operand:QI 4 "register_operand" "Yk,Yk"))
 (match_dup 1)
@@ -5371,15 +5385,15 @@ (define_insn "*avx512f_vmfnmsub_<mode>_mask<round_name>"
 (set_attr "mode" "<MODE>")])

 (define_insn "*avx512f_vmfnmsub_<mode>_mask3<round_name>"
- [(set (match_operand:VF_128 0 "register_operand" "=v")
- (vec_merge:VF_128
- (vec_merge:VF_128
- (fma:VF_128
- (neg:VF_128
- (match_operand:VF_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>"))
- (match_operand:VF_128 1 "<round_nimm_scalar_predicate>" "%v")
- (neg:VF_128
- (match_operand:VF_128 3 "register_operand" "0")))
+ [(set (match_operand:VFH_128 0 "register_operand" "=v")
+ (vec_merge:VFH_128
+ (vec_merge:VFH_128
+ (fma:VFH_128
+ (neg:VFH_128
+ (match_operand:VFH_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>"))
+ (match_operand:VFH_128 1 "<round_nimm_scalar_predicate>" "%v")
+ (neg:VFH_128
+ (match_operand:VFH_128 3 "register_operand" "0")))
 (match_dup 3)
 (match_operand:QI 4 "register_operand" "Yk"))
 (match_dup 3)
@@ -5390,16 +5404,16 @@ (define_insn "*avx512f_vmfnmsub_<mode>_mask3<round_name>"
 (set_attr "mode" "<MODE>")])

 (define_insn "*avx512f_vmfnmsub_<mode>_maskz_1<round_name>"
- [(set (match_operand:VF_128 0 "register_operand" "=v,v")
- (vec_merge:VF_128
- (vec_merge:VF_128
- (fma:VF_128
- (neg:VF_128
- (match_operand:VF_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>,v"))
- (match_operand:VF_128 1 "register_operand" "0,0")
- (neg:VF_128
- (match_operand:VF_128 3 "<round_nimm_scalar_predicate>" "v,<round_constraint>")))
- (match_operand:VF_128 4 "const0_operand" "C,C")
+ [(set (match_operand:VFH_128 0 "register_operand" "=v,v")
+ (vec_merge:VFH_128
+ (vec_merge:VFH_128
+ (fma:VFH_128
+ (neg:VFH_128
+ (match_operand:VFH_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>,v"))
+ (match_operand:VFH_128 1 "register_operand" "0,0")
+ (neg:VFH_128
+ (match_operand:VFH_128 3 "<round_nimm_scalar_predicate>" "v,<round_constraint>")))
+ (match_operand:VFH_128 4 "const0_operand" "C,C")
 (match_operand:QI 5 "register_operand" "Yk,Yk"))
 (match_dup 1)
 (const_int 1)))]
diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c
index d2ab16538d8..6c2d1dc3df4 100644
--- a/gcc/testsuite/gcc.target/i386/avx-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx-1.c
@@ -775,6 +775,18 @@
 #define __builtin_ia32_vfnmsubph512_mask(A, B, C, D, E) __builtin_ia32_vfnmsubph512_mask(A, B, C, D, 8)
 #define __builtin_ia32_vfnmsubph512_mask3(A, B, C, D, E) __builtin_ia32_vfnmsubph512_mask3(A, B, C, D, 8)
 #define __builtin_ia32_vfnmsubph512_maskz(A, B, C, D, E) __builtin_ia32_vfnmsubph512_maskz(A, B, C, D, 8)
+#define __builtin_ia32_vfmaddsh3_mask(A, B, C, D, E) __builtin_ia32_vfmaddsh3_mask(A, B, C, D, 8)
+#define __builtin_ia32_vfmaddsh3_mask3(A, B, C, D, E) __builtin_ia32_vfmaddsh3_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfmaddsh3_maskz(A, B, C, D, E) __builtin_ia32_vfmaddsh3_maskz(A, B, C, D, 8)
+#define __builtin_ia32_vfnmaddsh3_mask(A, B, C, D, E) __builtin_ia32_vfnmaddsh3_mask(A, B, C, D, 8)
+#define __builtin_ia32_vfnmaddsh3_mask3(A, B, C, D, E) __builtin_ia32_vfnmaddsh3_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfnmaddsh3_maskz(A, B, C, D, E) __builtin_ia32_vfnmaddsh3_maskz(A, B, C, D, 8)
+#define __builtin_ia32_vfmsubsh3_mask(A, B, C, D, E) __builtin_ia32_vfmsubsh3_mask(A, B, C, D, 8)
+#define __builtin_ia32_vfmsubsh3_mask3(A, B, C, D, E) __builtin_ia32_vfmsubsh3_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfmsubsh3_maskz(A, B, C, D, E) __builtin_ia32_vfmsubsh3_maskz(A, B, C, D, 8)
+#define __builtin_ia32_vfnmsubsh3_mask(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_mask(A, B, C, D, 8)
+#define __builtin_ia32_vfnmsubsh3_mask3(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfnmsubsh3_maskz(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_maskz(A, B, C, D, 8)

 /* avx512fp16vlintrin.h */
 #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
index 49c72f6fcef..f16be008909 100644
--- a/gcc/testsuite/gcc.target/i386/sse-13.c
+++ b/gcc/testsuite/gcc.target/i386/sse-13.c
@@ -792,6 +792,18 @@
 #define __builtin_ia32_vfnmsubph512_mask(A, B, C, D, E) __builtin_ia32_vfnmsubph512_mask(A, B, C, D, 8)
 #define __builtin_ia32_vfnmsubph512_mask3(A, B, C, D, E) __builtin_ia32_vfnmsubph512_mask3(A, B, C, D, 8)
 #define __builtin_ia32_vfnmsubph512_maskz(A, B, C, D, E) __builtin_ia32_vfnmsubph512_maskz(A, B, C, D, 8)
+#define __builtin_ia32_vfmaddsh3_mask(A, B, C, D, E) __builtin_ia32_vfmaddsh3_mask(A, B, C, D, 8)
+#define __builtin_ia32_vfmaddsh3_mask3(A, B, C, D, E) __builtin_ia32_vfmaddsh3_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfmaddsh3_maskz(A, B, C, D, E) __builtin_ia32_vfmaddsh3_maskz(A, B, C, D, 8)
+#define __builtin_ia32_vfnmaddsh3_mask(A, B, C, D, E) __builtin_ia32_vfnmaddsh3_mask(A, B, C, D, 8)
+#define __builtin_ia32_vfnmaddsh3_mask3(A, B, C, D, E) __builtin_ia32_vfnmaddsh3_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfnmaddsh3_maskz(A, B, C, D, E) __builtin_ia32_vfnmaddsh3_maskz(A, B, C, D, 8)
+#define __builtin_ia32_vfmsubsh3_mask(A, B, C, D, E) __builtin_ia32_vfmsubsh3_mask(A, B, C, D, 8)
+#define __builtin_ia32_vfmsubsh3_mask3(A, B, C, D, E) __builtin_ia32_vfmsubsh3_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfmsubsh3_maskz(A, B, C, D, E) __builtin_ia32_vfmsubsh3_maskz(A, B, C, D, 8)
+#define __builtin_ia32_vfnmsubsh3_mask(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_mask(A, B, C, D, 8)
+#define __builtin_ia32_vfnmsubsh3_mask3(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfnmsubsh3_maskz(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_maskz(A, B, C, D, 8)

 /* avx512fp16vlintrin.h */
 #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
index 9151e50afd2..01ac4e04173 100644
--- a/gcc/testsuite/gcc.target/i386/sse-14.c
+++ b/gcc/testsuite/gcc.target/i386/sse-14.c
@@ -842,6 +842,10 @@ test_3 (_mm512_fmadd_round_ph, __m512h, __m512h, __m512h, __m512h, 9)
 test_3 (_mm512_fnmadd_round_ph, __m512h, __m512h, __m512h, __m512h, 9)
 test_3 (_mm512_fmsub_round_ph, __m512h, __m512h, __m512h, __m512h, 9)
 test_3 (_mm512_fnmsub_round_ph, __m512h, __m512h, __m512h, __m512h, 9)
+test_3 (_mm_fmadd_round_sh, __m128h, __m128h, __m128h, __m128h, 9)
+test_3 (_mm_fnmadd_round_sh, __m128h, __m128h, __m128h, __m128h, 9)
+test_3 (_mm_fmsub_round_sh, __m128h, __m128h, __m128h, __m128h, 9)
+test_3 (_mm_fnmsub_round_sh, __m128h, __m128h, __m128h, __m128h, 9)
 test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8)
 test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8)
 test_3x (_mm512_mask_reduce_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8)
@@ -892,6 +896,18 @@ test_4 (_mm512_maskz_fmsub_round_ph, __m512h, __mmask32, __m512h, __m512h, __m51
 test_4 (_mm512_mask_fnmsub_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 9)
 test_4 (_mm512_mask3_fnmsub_round_ph, __m512h, __m512h, __m512h, __m512h, __mmask32, 9)
 test_4 (_mm512_maskz_fnmsub_round_ph, __m512h, __mmask32, __m512h, __m512h, __m512h, 9)
+test_4 (_mm_mask_fmadd_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 9)
+test_4 (_mm_mask3_fmadd_round_sh, __m128h, __m128h, __m128h, __m128h, __mmask8, 9)
+test_4 (_mm_maskz_fmadd_round_sh, __m128h, __mmask8, __m128h, __m128h, __m128h, 9)
+test_4 (_mm_mask_fnmadd_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 9)
+test_4 (_mm_mask3_fnmadd_round_sh, __m128h, __m128h, __m128h, __m128h, __mmask8, 9)
+test_4 (_mm_maskz_fnmadd_round_sh, __m128h, __mmask8, __m128h, __m128h, __m128h, 9)
+test_4 (_mm_mask_fmsub_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 9)
+test_4 (_mm_mask3_fmsub_round_sh, __m128h, __m128h, __m128h, __m128h, __mmask8, 9)
+test_4 (_mm_maskz_fmsub_round_sh, __m128h, __mmask8, __m128h, __m128h, __m128h, 9)
+test_4 (_mm_mask_fnmsub_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 9)
+test_4 (_mm_mask3_fnmsub_round_sh, __m128h, __m128h, __m128h, __m128h, __mmask8, 9)
+test_4 (_mm_maskz_fnmsub_round_sh, __m128h, __mmask8, __m128h, __m128h, __m128h, 9)
 test_4x (_mm_mask_reduce_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8)
 test_4x (_mm_mask_roundscale_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8)
 test_4x (_mm_mask_getmant_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 1, 1)
diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c
index 892b6334ae2..79e3f35ab86 100644
--- a/gcc/testsuite/gcc.target/i386/sse-22.c
+++ b/gcc/testsuite/gcc.target/i386/sse-22.c
@@ -945,6 +945,10 @@ test_3 (_mm512_fmadd_round_ph, __m512h, __m512h, __m512h, __m512h, 9)
 test_3 (_mm512_fnmadd_round_ph, __m512h, __m512h, __m512h, __m512h, 9)
 test_3 (_mm512_fmsub_round_ph, __m512h, __m512h, __m512h, __m512h, 9)
 test_3 (_mm512_fnmsub_round_ph, __m512h, __m512h, __m512h, __m512h, 9)
+test_3 (_mm_fmadd_round_sh, __m128h, __m128h, __m128h, __m128h, 9)
+test_3 (_mm_fnmadd_round_sh, __m128h, __m128h, __m128h, __m128h, 9)
+test_3 (_mm_fmsub_round_sh, __m128h, __m128h, __m128h, __m128h, 9)
+test_3 (_mm_fnmsub_round_sh, __m128h, __m128h, __m128h, __m128h, 9)
 test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8)
 test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8)
 test_3x (_mm512_mask_reduce_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8)
@@ -994,6 +998,18 @@ test_4 (_mm512_maskz_fmsub_round_ph, __m512h, __mmask32, __m512h, __m512h, __m51
 test_4 (_mm512_mask_fnmsub_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 9)
 test_4 (_mm512_mask3_fnmsub_round_ph, __m512h, __m512h, __m512h, __m512h, __mmask32, 9)
 test_4 (_mm512_maskz_fnmsub_round_ph, __m512h, __mmask32, __m512h, __m512h, __m512h, 9)
+test_4 (_mm_mask_fmadd_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 9)
+test_4 (_mm_mask3_fmadd_round_sh, __m128h, __m128h, __m128h, __m128h, __mmask8, 9)
+test_4 (_mm_maskz_fmadd_round_sh, __m128h, __mmask8, __m128h, __m128h, __m128h, 9)
+test_4 (_mm_mask_fnmadd_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 9)
+test_4 (_mm_mask3_fnmadd_round_sh, __m128h, __m128h, __m128h, __m128h, __mmask8, 9)
+test_4 (_mm_maskz_fnmadd_round_sh, __m128h, __mmask8, __m128h, __m128h, __m128h, 9)
+test_4 (_mm_mask_fmsub_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 9)
+test_4 (_mm_mask3_fmsub_round_sh, __m128h, __m128h, __m128h, __m128h, __mmask8, 9)
+test_4 (_mm_maskz_fmsub_round_sh, __m128h, __mmask8, __m128h, __m128h, __m128h, 9)
+test_4 (_mm_mask_fnmsub_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 9)
+test_4 (_mm_mask3_fnmsub_round_sh, __m128h, __m128h, __m128h, __m128h, __mmask8, 9)
+test_4 (_mm_maskz_fnmsub_round_sh, __m128h, __mmask8, __m128h, __m128h, __m128h, 9)
 test_4x (_mm_mask_reduce_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8)
 test_4x (_mm_mask_roundscale_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8)
 test_4x (_mm_mask_getmant_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 1, 1)
diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
index 447b83829f3..caf14408b91 100644
--- a/gcc/testsuite/gcc.target/i386/sse-23.c
+++ b/gcc/testsuite/gcc.target/i386/sse-23.c
@@ -793,6 +793,18 @@
 #define __builtin_ia32_vfnmsubph512_mask(A, B, C, D, E) __builtin_ia32_vfnmsubph512_mask(A, B, C, D, 8)
 #define __builtin_ia32_vfnmsubph512_mask3(A, B, C, D, E) __builtin_ia32_vfnmsubph512_mask3(A, B, C, D, 8)
 #define __builtin_ia32_vfnmsubph512_maskz(A, B, C, D, E) __builtin_ia32_vfnmsubph512_maskz(A, B, C, D, 8)
+#define __builtin_ia32_vfmaddsh3_mask(A, B, C, D, E) __builtin_ia32_vfmaddsh3_mask(A, B, C, D, 8)
+#define __builtin_ia32_vfmaddsh3_mask3(A, B, C, D, E) __builtin_ia32_vfmaddsh3_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfmaddsh3_maskz(A, B, C, D, E) __builtin_ia32_vfmaddsh3_maskz(A, B, C, D, 8)
+#define __builtin_ia32_vfnmaddsh3_mask(A, B, C, D, E) __builtin_ia32_vfnmaddsh3_mask(A, B, C, D, 8)
+#define __builtin_ia32_vfnmaddsh3_mask3(A, B, C, D, E) __builtin_ia32_vfnmaddsh3_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfnmaddsh3_maskz(A, B, C, D, E) __builtin_ia32_vfnmaddsh3_maskz(A, B, C, D, 8)
+#define __builtin_ia32_vfmsubsh3_mask(A, B, C, D, E) __builtin_ia32_vfmsubsh3_mask(A, B, C, D, 8)
+#define __builtin_ia32_vfmsubsh3_mask3(A, B, C, D, E) __builtin_ia32_vfmsubsh3_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfmsubsh3_maskz(A, B, C, D, E) __builtin_ia32_vfmsubsh3_maskz(A, B, C, D, 8)
+#define __builtin_ia32_vfnmsubsh3_mask(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_mask(A, B, C, D, 8)
+#define __builtin_ia32_vfnmsubsh3_mask3(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfnmsubsh3_maskz(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_maskz(A, B, C, D, 8)

 /* avx512fp16vlintrin.h */
 #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
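
For readers following the nested vec_merge in the scalar patterns above, here is a minimal C sketch of what the RTL appears to describe. It is an illustration only, not code from the patch: the names `vec8`, `merge_kind` and `model_scalar_fma` are hypothetical, `float` stands in for `_Float16` so the sketch builds without AVX512FP16 support, and `a`, `b`, `c` follow the RTL operand numbers 1, 2, 3 rather than the intrinsic argument order. Rounding control is not modelled, and the arithmetic is an ordinary multiply and add rather than a single-rounding fused operation.

```c
/* Hypothetical 8-lane stand-in for a V8HF register; float replaces
   _Float16 so the sketch compiles on any target.  */
typedef struct { float e[8]; } vec8;

/* The three masking flavours used by the scalar patterns.  */
typedef enum { MERGE_MASK, MERGE_MASK3, MERGE_MASKZ } merge_kind;

/* Model of the vmf{,n}m{add,sub} scalar patterns.
   a, b, c correspond to RTL operands 1, 2 and 3; only bit 0 of k matters.
   neg_prod models the (neg ...) around the product operand (fnmadd, fnmsub);
   neg_add models the (neg ...) around operand 3 (fmsub, fnmsub).  */
static vec8
model_scalar_fma (vec8 a, vec8 b, vec8 c, unsigned char k,
                  merge_kind kind, int neg_prod, int neg_add)
{
  /* Outer vec_merge with (const_int 1): lanes 1..7 are copied from the
     match_dup operand, which is operand 3 for mask3 and operand 1
     for mask and maskz.  */
  vec8 r = (kind == MERGE_MASK3) ? c : a;

  /* Inner vec_merge: when mask bit 0 is clear, lane 0 falls back to the
     same match_dup operand (mask, mask3) or to zero (maskz).  */
  float fallback = (kind == MERGE_MASKZ) ? 0.0f : r.e[0];

  float prod = a.e[0] * b.e[0];
  float addend = c.e[0];
  if (neg_prod)
    prod = -prod;
  if (neg_add)
    addend = -addend;

  r.e[0] = (k & 1) ? prod + addend : fallback;
  return r;
}
```

Under these assumptions, `model_scalar_fma (a, b, c, 0, MERGE_MASKZ, 0, 0)` gives a zero in lane 0 and keeps lanes 1..7 of `a`, which matches the `const0_operand` plus `(match_dup 1)` shape of the `_maskz_1` patterns.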
