From patchwork Thu Jul 1 06:15:47 2021
X-Patchwork-Submitter: liuhongt
X-Patchwork-Id: 1499294
From: liuhongt
Reply-To: liuhongt
To: gcc-patches@gcc.gnu.org
Cc: jakub@redhat.com
Subject: [PATCH 01/62] AVX512FP16: Support vector init/broadcast for FP16.
Date: Thu, 1 Jul 2021 14:15:47 +0800
Message-Id: <20210701061648.9447-2-hongtao.liu@intel.com>
In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com>
References: <20210701061648.9447-1-hongtao.liu@intel.com>

From: "H.J. Lu"

gcc/ChangeLog:

	* config/i386/avx512fp16intrin.h (_mm_set_ph): New intrinsic.
	(_mm256_set_ph): Likewise.
	(_mm512_set_ph): Likewise.
	(_mm_setr_ph): Likewise.
	(_mm256_setr_ph): Likewise.
	(_mm512_setr_ph): Likewise.
	(_mm_set1_ph): Likewise.
	(_mm256_set1_ph): Likewise.
	(_mm512_set1_ph): Likewise.
	(_mm_setzero_ph): Likewise.
	(_mm256_setzero_ph): Likewise.
	(_mm512_setzero_ph): Likewise.
	(_mm_set_sh): Likewise.
	(_mm_load_sh): Likewise.
	(_mm_store_sh): Likewise.
	* config/i386/i386-expand.c (ix86_expand_vector_init_duplicate):
	Support vector HFmodes.
	(ix86_expand_vector_init_one_nonzero): Likewise.
	(ix86_expand_vector_init_one_var): Likewise.
	(ix86_expand_vector_init_interleave): Likewise.
	(ix86_expand_vector_init_general): Likewise.
	(ix86_expand_vector_set): Likewise.
	(ix86_expand_vector_extract): Likewise.
	* config/i386/i386-modes.def: Add HF vector modes in comment.
	* config/i386/i386.c (classify_argument): Add HF vector modes.
	(inline_secondary_memory_needed): Enable 16bit move.
	(ix86_hard_regno_mode_ok): Allow HF vector modes for AVX512FP16.
	(ix86_vector_mode_supported_p): Likewise.
	* config/i386/i386.md (mode): Add HF vector modes.
	(MODE_SIZE): Likewise.
	(ssemodesuffix): Add ph suffix for HF vector modes.
	* config/i386/sse.md (VMOVE): Adjust for HF vector modes.
	(V): Likewise.
	(V_256_512): Likewise.
	(avx512): Likewise.
	(shuffletype): Likewise.
	(sseinsnmode): Likewise.
	(ssedoublevecmode): Likewise.
	(ssehalfvecmode): Likewise.
	(ssehalfvecmodelower): Likewise.
	(ssePSmode): Likewise.
	(ssescalarmode): Likewise.
	(ssescalarmodelower): Likewise.
	(sseintprefix): Likewise.
	(i128): Likewise.
	(bcstscalarsuff): Likewise.
	(xtg_mode): Likewise.
	(VI12HF_AVX512VL): New mode_iterator.
	(VF_AVX512FP16): Likewise.
	(VIHF): Likewise.
	(VIHF_256): Likewise.
	(VIHF_AVX512BW): Likewise.
	(V16_256): Likewise.
	(V32_512): Likewise.
	(sseintmodesuffix): New mode_attr.
	(vec_set_0): New define_insn for HF vector set.
	(*avx512fp16_movsh): Likewise.
	(avx512fp16_movsh): Likewise.
	(vec_extract_lo_v32hi): Rename to ...
	(vec_extract_lo_): ... this, and adjust to allow HF vector modes.
	(vec_extract_hi_v32hi): Likewise.
	(vec_extract_hi_): Likewise.
	(vec_extract_lo_v16hi): Likewise.
	(vec_extract_lo_): Likewise.
	(vec_extract_hi_v16hi): Likewise.
	(vec_extract_hi_): Likewise.
	(*vec_extract_0): New define_insn_and_split for HF vector extract.
	(*vec_extracthf): New define_insn.
	(VEC_EXTRACT_MODE): Add HF vector modes.
	(PINSR_MODE): Add V8HF.
	(sse2p4_1): Likewise.
	(pinsr_evex_isa): Likewise.
	(_pinsr): Adjust to support insert for V8HFmode.
	(pbroadcast_evex_isa): Add HF vector modes.
	(AVX2_VEC_DUP_MODE): Likewise.
	(VEC_INIT_MODE): Likewise.
	(VEC_INIT_HALF_MODE): Likewise.
	(avx2_pbroadcast): Adjust to support HF vector mode broadcast.
	(avx2_pbroadcast_1): Likewise.
	(_vec_dup_1): Likewise.
	(_vec_dup): Likewise.
	(_vec_dup_gpr): Likewise.
---
 gcc/config/i386/avx512fp16intrin.h | 172 +++++++++++++++++++
 gcc/config/i386/i386-expand.c      |  79 ++++++++-
 gcc/config/i386/i386-modes.def     |  12 +-
 gcc/config/i386/i386.c             |  19 ++-
 gcc/config/i386/i386.md            |  13 +-
 gcc/config/i386/sse.md             | 266 ++++++++++++++++++++++-------
 6 files changed, 480 insertions(+), 81 deletions(-)

diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
index 38d63161ba6..3fc0770986e 100644
--- a/gcc/config/i386/avx512fp16intrin.h
+++ b/gcc/config/i386/avx512fp16intrin.h
@@ -45,6 +45,178 @@
 typedef _Float16 __m128h __attribute__ ((__vector_size__ (16), __may_alias__));
 typedef _Float16 __m256h __attribute__ ((__vector_size__ (32), __may_alias__));
 typedef _Float16 __m512h __attribute__ ((__vector_size__ (64), __may_alias__));
 
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_set_ph (_Float16 __A7, _Float16 __A6, _Float16 __A5,
+	    _Float16 __A4, _Float16 __A3, _Float16 __A2,
+	    _Float16 __A1, _Float16 __A0)
+{
+  return __extension__ (__m128h)(__v8hf){ __A0, __A1, __A2, __A3,
+					  __A4, __A5, __A6, __A7 };
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_set_ph (_Float16 __A15, _Float16 __A14, _Float16 __A13,
+	       _Float16 __A12, _Float16 __A11, _Float16 __A10,
+	       _Float16 __A9, _Float16 __A8, _Float16 __A7,
+	       _Float16 __A6, _Float16 __A5, _Float16 __A4,
+	       _Float16 __A3, _Float16 __A2, _Float16 __A1,
+	       _Float16 __A0)
+{
+  return __extension__ (__m256h)(__v16hf){ __A0, __A1, __A2, __A3,
+					   __A4, __A5, __A6, __A7,
+					   __A8, __A9, __A10, __A11,
+					   __A12, __A13, __A14, __A15 };
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_set_ph (_Float16 __A31, _Float16 __A30, _Float16 __A29,
+	       _Float16 __A28, _Float16 __A27, _Float16 __A26,
+	       _Float16 __A25, _Float16 __A24, _Float16 __A23,
+	       _Float16 __A22, _Float16 __A21, _Float16 __A20,
+	       _Float16 __A19, _Float16 __A18, _Float16 __A17,
+	       _Float16 __A16, _Float16 __A15, _Float16 __A14,
+	       _Float16 __A13, _Float16 __A12, _Float16 __A11,
+	       _Float16 __A10, _Float16 __A9, _Float16 __A8,
+	       _Float16 __A7, _Float16 __A6, _Float16 __A5,
+	       _Float16 __A4, _Float16 __A3, _Float16 __A2,
+	       _Float16 __A1, _Float16 __A0)
+{
+  return __extension__ (__m512h)(__v32hf){ __A0, __A1, __A2, __A3,
+					   __A4, __A5, __A6, __A7,
+					   __A8, __A9, __A10, __A11,
+					   __A12, __A13, __A14, __A15,
+					   __A16, __A17, __A18, __A19,
+					   __A20, __A21, __A22, __A23,
+					   __A24, __A25, __A26, __A27,
+					   __A28, __A29, __A30, __A31 };
+}
+
+/* Create vectors of elements in the reversed order from _mm_set_ph,
+   _mm256_set_ph and _mm512_set_ph functions.  */
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_setr_ph (_Float16 __A0, _Float16 __A1, _Float16 __A2,
+	     _Float16 __A3, _Float16 __A4, _Float16 __A5,
+	     _Float16 __A6, _Float16 __A7)
+{
+  return _mm_set_ph (__A7, __A6, __A5, __A4, __A3, __A2, __A1, __A0);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_setr_ph (_Float16 __A0, _Float16 __A1, _Float16 __A2,
+		_Float16 __A3, _Float16 __A4, _Float16 __A5,
+		_Float16 __A6, _Float16 __A7, _Float16 __A8,
+		_Float16 __A9, _Float16 __A10, _Float16 __A11,
+		_Float16 __A12, _Float16 __A13, _Float16 __A14,
+		_Float16 __A15)
+{
+  return _mm256_set_ph (__A15, __A14, __A13, __A12, __A11, __A10, __A9,
+			__A8, __A7, __A6, __A5, __A4, __A3, __A2, __A1,
+			__A0);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_setr_ph (_Float16 __A0, _Float16 __A1, _Float16 __A2,
+		_Float16 __A3, _Float16 __A4, _Float16 __A5,
+		_Float16 __A6, _Float16 __A7, _Float16 __A8,
+		_Float16 __A9, _Float16 __A10, _Float16 __A11,
+		_Float16 __A12, _Float16 __A13, _Float16 __A14,
+		_Float16 __A15, _Float16 __A16, _Float16 __A17,
+		_Float16 __A18, _Float16 __A19, _Float16 __A20,
+		_Float16 __A21, _Float16 __A22, _Float16 __A23,
+		_Float16 __A24, _Float16 __A25, _Float16 __A26,
+		_Float16 __A27, _Float16 __A28, _Float16 __A29,
+		_Float16 __A30, _Float16 __A31)
+{
+  return _mm512_set_ph (__A31, __A30, __A29, __A28, __A27, __A26, __A25,
+			__A24, __A23, __A22, __A21, __A20, __A19, __A18,
+			__A17, __A16, __A15, __A14, __A13, __A12, __A11,
+			__A10, __A9, __A8, __A7, __A6, __A5, __A4, __A3,
+			__A2, __A1, __A0);
+}
+
+/* Broadcast _Float16 to vector.  */
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_set1_ph (_Float16 __A)
+{
+  return _mm_set_ph (__A, __A, __A, __A, __A, __A, __A, __A);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_set1_ph (_Float16 __A)
+{
+  return _mm256_set_ph (__A, __A, __A, __A, __A, __A, __A, __A,
+			__A, __A, __A, __A, __A, __A, __A, __A);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_set1_ph (_Float16 __A)
+{
+  return _mm512_set_ph (__A, __A, __A, __A, __A, __A, __A, __A,
+			__A, __A, __A, __A, __A, __A, __A, __A,
+			__A, __A, __A, __A, __A, __A, __A, __A,
+			__A, __A, __A, __A, __A, __A, __A, __A);
+}
+
+/* Create a vector with all zeros.  */
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_setzero_ph (void)
+{
+  return _mm_set1_ph (0.0f);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_setzero_ph (void)
+{
+  return _mm256_set1_ph (0.0f);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_setzero_ph (void)
+{
+  return _mm512_set1_ph (0.0f);
+}
+
+/* Create a vector with element 0 as F and the rest zero.  */
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_set_sh (_Float16 __F)
+{
+  return _mm_set_ph (0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, __F);
+}
+
+/* Create a vector with element 0 as *P and the rest zero.  */
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_load_sh (void const *__P)
+{
+  return _mm_set_ph (0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		     *(_Float16 const *) __P);
+}
+
+/* Stores the lower _Float16 value.  */
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_store_sh (void *__P, __m128h __A)
+{
+  *(_Float16 *) __P = ((__v8hf)__A)[0];
+}
+
 #ifdef __DISABLE_AVX512FP16__
 #undef __DISABLE_AVX512FP16__
 #pragma GCC pop_options
diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index ab5f5b284c8..5ce7163b241 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -13914,6 +13914,11 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, machine_mode mode,
	}
       return true;
 
+    case E_V8HFmode:
+    case E_V16HFmode:
+    case E_V32HFmode:
+      return ix86_vector_duplicate_value (mode, target, val);
+
     default:
       return false;
     }
@@ -13998,6 +14003,18 @@ ix86_expand_vector_init_one_nonzero (bool mmx_ok, machine_mode mode,
       use_vector_set = TARGET_AVX512F && TARGET_64BIT && one_var == 0;
       gen_vec_set_0 = gen_vec_setv8di_0;
       break;
+    case E_V8HFmode:
+      use_vector_set = TARGET_AVX512FP16 && one_var == 0;
+      gen_vec_set_0 = gen_vec_setv8hf_0;
+      break;
+    case E_V16HFmode:
+      use_vector_set = TARGET_AVX512FP16 && one_var == 0;
+      gen_vec_set_0 = gen_vec_setv16hf_0;
+      break;
+    case E_V32HFmode:
+      use_vector_set = TARGET_AVX512FP16 && one_var == 0;
+      gen_vec_set_0 = gen_vec_setv32hf_0;
+      break;
     default:
       break;
     }
@@ -14147,6 +14164,7 @@ ix86_expand_vector_init_one_var (bool mmx_ok, machine_mode mode,
       if (!TARGET_64BIT)
	return false;
       /* FALLTHRU */
+    case E_V8HFmode:
     case E_V4DFmode:
     case E_V8SFmode:
     case E_V8SImode:
@@ -14381,13 +14399,22 @@ ix86_expand_vector_init_interleave (machine_mode mode,
 {
   machine_mode first_imode, second_imode, third_imode, inner_mode;
   int i, j;
-  rtx op0, op1;
+  rtx op, op0, op1;
   rtx (*gen_load_even) (rtx, rtx, rtx);
   rtx (*gen_interleave_first_low) (rtx, rtx, rtx);
   rtx (*gen_interleave_second_low) (rtx, rtx, rtx);
 
   switch (mode)
     {
+    case E_V8HFmode:
+      gen_load_even = gen_vec_setv8hf;
+      gen_interleave_first_low = gen_vec_interleave_lowv4si;
+      gen_interleave_second_low =
	gen_vec_interleave_lowv2di;
+      inner_mode = HFmode;
+      first_imode = V4SImode;
+      second_imode = V2DImode;
+      third_imode = VOIDmode;
+      break;
     case E_V8HImode:
       gen_load_even = gen_vec_setv8hi;
       gen_interleave_first_low = gen_vec_interleave_lowv4si;
@@ -14412,9 +14439,19 @@ ix86_expand_vector_init_interleave (machine_mode mode,
 
   for (i = 0; i < n; i++)
     {
+      op = ops [i + i];
+      if (inner_mode == HFmode)
+	{
+	  /* Convert HFmode to HImode.  */
+	  op1 = gen_reg_rtx (HImode);
+	  op1 = gen_rtx_SUBREG (HImode, force_reg (HFmode, op), 0);
+	  op = gen_reg_rtx (HImode);
+	  emit_move_insn (op, op1);
+	}
+
       /* Extend the odd element to SImode using a paradoxical SUBREG.  */
       op0 = gen_reg_rtx (SImode);
-      emit_move_insn (op0, gen_lowpart (SImode, ops [i + i]));
+      emit_move_insn (op0, gen_lowpart (SImode, op));
 
       /* Insert the SImode value as low element of V4SImode vector.  */
       op1 = gen_reg_rtx (V4SImode);
@@ -14551,6 +14588,10 @@ ix86_expand_vector_init_general (bool mmx_ok, machine_mode mode,
       half_mode = V8HImode;
       goto half;
 
+    case E_V16HFmode:
+      half_mode = V8HFmode;
+      goto half;
+
 half:
       n = GET_MODE_NUNITS (mode);
       for (i = 0; i < n; i++)
@@ -14574,6 +14615,11 @@ half:
       half_mode = V16HImode;
       goto quarter;
 
+    case E_V32HFmode:
+      quarter_mode = V8HFmode;
+      half_mode = V16HFmode;
+      goto quarter;
+
 quarter:
       n = GET_MODE_NUNITS (mode);
       for (i = 0; i < n; i++)
@@ -14610,6 +14656,9 @@ quarter:
	 move from GPR to SSE register directly.
*/ if (!TARGET_INTER_UNIT_MOVES_TO_VEC) break; + /* FALLTHRU */ + + case E_V8HFmode: n = GET_MODE_NUNITS (mode); for (i = 0; i < n; i++) @@ -15076,6 +15125,10 @@ ix86_expand_vector_set (bool mmx_ok, rtx target, rtx val, int elt) } return; + case E_V8HFmode: + use_vec_merge = true; + break; + case E_V8HImode: case E_V2HImode: use_vec_merge = TARGET_SSE2; @@ -15550,6 +15603,28 @@ ix86_expand_vector_extract (bool mmx_ok, rtx target, rtx vec, int elt) ix86_expand_vector_extract (false, target, tmp, elt & 3); return; + case E_V32HFmode: + tmp = gen_reg_rtx (V16HFmode); + if (elt < 16) + emit_insn (gen_vec_extract_lo_v32hf (tmp, vec)); + else + emit_insn (gen_vec_extract_hi_v32hf (tmp, vec)); + ix86_expand_vector_extract (false, target, tmp, elt & 15); + return; + + case E_V16HFmode: + tmp = gen_reg_rtx (V8HFmode); + if (elt < 8) + emit_insn (gen_vec_extract_lo_v16hf (tmp, vec)); + else + emit_insn (gen_vec_extract_hi_v16hf (tmp, vec)); + ix86_expand_vector_extract (false, target, tmp, elt & 7); + return; + + case E_V8HFmode: + use_vec_extr = true; + break; + case E_V8QImode: use_vec_extr = TARGET_MMX_WITH_SSE && TARGET_SSE4_1; /* ??? Could extract the appropriate HImode element and shift. 
*/ diff --git a/gcc/config/i386/i386-modes.def b/gcc/config/i386/i386-modes.def index 9232f59a925..fcadfcd4c94 100644 --- a/gcc/config/i386/i386-modes.def +++ b/gcc/config/i386/i386-modes.def @@ -84,12 +84,12 @@ VECTOR_MODES (INT, 16); /* V16QI V8HI V4SI V2DI */ VECTOR_MODES (INT, 32); /* V32QI V16HI V8SI V4DI */ VECTOR_MODES (INT, 64); /* V64QI V32HI V16SI V8DI */ VECTOR_MODES (INT, 128); /* V128QI V64HI V32SI V16DI */ -VECTOR_MODES (FLOAT, 8); /* V2SF */ -VECTOR_MODES (FLOAT, 16); /* V4SF V2DF */ -VECTOR_MODES (FLOAT, 32); /* V8SF V4DF V2TF */ -VECTOR_MODES (FLOAT, 64); /* V16SF V8DF V4TF */ -VECTOR_MODES (FLOAT, 128); /* V32SF V16DF V8TF */ -VECTOR_MODES (FLOAT, 256); /* V64SF V32DF V16TF */ +VECTOR_MODES (FLOAT, 8); /* V4HF V2SF */ +VECTOR_MODES (FLOAT, 16); /* V8HF V4SF V2DF */ +VECTOR_MODES (FLOAT, 32); /* V16HF V8SF V4DF V2TF */ +VECTOR_MODES (FLOAT, 64); /* V32HF V16SF V8DF V4TF */ +VECTOR_MODES (FLOAT, 128); /* V64HF V32SF V16DF V8TF */ +VECTOR_MODES (FLOAT, 256); /* V128HF V64SF V32DF V16TF */ VECTOR_MODE (INT, TI, 1); /* V1TI */ VECTOR_MODE (INT, DI, 1); /* V1DI */ VECTOR_MODE (INT, SI, 1); /* V1SI */ diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 9ca31e934ab..021283e6f39 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -2404,6 +2404,7 @@ classify_argument (machine_mode mode, const_tree type, case E_V8SFmode: case E_V8SImode: case E_V32QImode: + case E_V16HFmode: case E_V16HImode: case E_V4DFmode: case E_V4DImode: @@ -2414,6 +2415,7 @@ classify_argument (machine_mode mode, const_tree type, return 4; case E_V8DFmode: case E_V16SFmode: + case E_V32HFmode: case E_V8DImode: case E_V16SImode: case E_V32HImode: @@ -2431,6 +2433,7 @@ classify_argument (machine_mode mode, const_tree type, case E_V4SImode: case E_V16QImode: case E_V8HImode: + case E_V8HFmode: case E_V2DFmode: case E_V2DImode: classes[0] = X86_64_SSE_CLASS; @@ -19102,9 +19105,11 @@ inline_secondary_memory_needed (machine_mode mode, reg_class_t class1, if 
      (!TARGET_SSE2)
    return true;
 
-  /* Between SSE and general, we have moves no larger than word size.  */
+  /* Between SSE and general, we have moves no larger than word size,
+     except for AVX512FP16, where VMOVW enables 16-bit moves.  */
   if (!(INTEGER_CLASS_P (class1) || INTEGER_CLASS_P (class2))
-      || GET_MODE_SIZE (mode) < GET_MODE_SIZE (SImode)
+      || GET_MODE_SIZE (mode) < GET_MODE_SIZE (TARGET_AVX512FP16
+					       ? HImode : SImode)
       || GET_MODE_SIZE (mode) > UNITS_PER_WORD)
     return true;
 
@@ -19552,6 +19557,14 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
	  || VALID_AVX512F_SCALAR_MODE (mode)))
     return true;
 
+  /* Allow HF vector modes for AVX512FP16.  NB: Since HF vector
+     moves are implemented as integer vector moves, we allow
+     V8HFmode and V16HFmode without AVX512VL in xmm0-xmm15.  */
+  if (TARGET_AVX512FP16 && VALID_AVX512FP16_REG_MODE (mode))
+    return (mode == V32HFmode
+	    || TARGET_AVX512VL
+	    || !EXT_REX_SSE_REGNO_P (regno));
+
   /* For AVX-5124FMAPS or AVX-5124VNNIW
      allow V64SF and V64SI modes for special regnos.  */
   if ((TARGET_AVX5124FMAPS || TARGET_AVX5124VNNIW)
@@ -21663,6 +21676,8 @@ ix86_vector_mode_supported_p (machine_mode mode)
   if ((TARGET_MMX || TARGET_MMX_WITH_SSE)
       && VALID_MMX_REG_MODE (mode))
     return true;
+  if (TARGET_AVX512FP16 && VALID_AVX512FP16_REG_MODE (mode))
+    return true;
   if ((TARGET_3DNOW || TARGET_MMX_WITH_SSE)
       && VALID_MMX_REG_MODE_3DNOW (mode))
     return true;
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index ee5660e8161..25cee502f97 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -496,8 +496,8 @@ (define_attr "type"
 ;; Main data type used by the insn
 (define_attr "mode"
-  "unknown,none,QI,HI,SI,DI,TI,OI,XI,HF,SF,DF,XF,TF,V16SF,V8SF,V4DF,V4SF,
-   V2DF,V2SF,V1DF,V8DF"
+  "unknown,none,QI,HI,SI,DI,TI,OI,XI,HF,SF,DF,XF,TF,V32HF,V16HF,V8HF,
+   V16SF,V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,V8DF"
   (const_string "unknown"))
 
 ;; The CPU unit operations uses.
@@ -1098,7 +1098,8 @@ (define_mode_attr MODE_SIZE [(QI "1") (HI "2") (SI "4") (DI "8") (V2DI "16") (V4DI "32") (V8DI "64") (V1TI "16") (V2TI "32") (V4TI "64") (V2DF "16") (V4DF "32") (V8DF "64") - (V4SF "16") (V8SF "32") (V16SF "64")]) + (V4SF "16") (V8SF "32") (V16SF "64") + (V8HF "16") (V16HF "32") (V32HF "64")]) ;; Double word integer modes as mode attribute. (define_mode_attr DWI [(QI "HI") (HI "SI") (SI "DI") (DI "TI") (TI "OI")]) @@ -1239,9 +1240,9 @@ (define_mode_attr ssevecmodef [(SF "V4SF") (DF "V2DF") (TF "TF")]) ;; SSE instruction suffix for various modes (define_mode_attr ssemodesuffix [(HF "sh") (SF "ss") (DF "sd") - (V16SF "ps") (V8DF "pd") - (V8SF "ps") (V4DF "pd") - (V4SF "ps") (V2DF "pd") + (V32HF "ph") (V16SF "ps") (V8DF "pd") + (V16HF "ph") (V8SF "ps") (V4DF "pd") + (V8HF "ph") (V4SF "ps") (V2DF "pd") (V16QI "b") (V8HI "w") (V4SI "d") (V2DI "q") (V32QI "b") (V16HI "w") (V8SI "d") (V4DI "q") (V64QI "b") (V32HI "w") (V16SI "d") (V8DI "q")]) diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 446f9ba552f..1009d656cbb 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -225,6 +225,8 @@ (define_mode_iterator VMOVE (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX") V2DI (V4TI "TARGET_AVX512F") (V2TI "TARGET_AVX") V1TI + (V32HF "TARGET_AVX512FP16") (V16HF "TARGET_AVX512FP16") + (V8HF "TARGET_AVX512FP16") (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") V2DF]) @@ -240,6 +242,13 @@ (define_mode_iterator VI12_AVX512VL [V64QI (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL") V32HI (V16HI "TARGET_AVX512VL") (V8HI "TARGET_AVX512VL")]) +(define_mode_iterator VI12HF_AVX512VL + [V64QI (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL") + V32HI (V16HI "TARGET_AVX512VL") (V8HI "TARGET_AVX512VL") + (V32HF "TARGET_AVX512FP16") + (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL") + (V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL")]) + ;; Same iterator, but 
without supposed TARGET_AVX512BW (define_mode_iterator VI12_AVX512VLBW [(V64QI "TARGET_AVX512BW") (V16QI "TARGET_AVX512VL") @@ -255,6 +264,7 @@ (define_mode_iterator V (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX") V2DI + (V32HF "TARGET_AVX512FP16") (V16HF "TARGET_AVX512FP16") V8HF (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")]) @@ -277,7 +287,8 @@ (define_mode_iterator V_512 [V64QI V32HI V16SI V8DI V16SF V8DF]) (define_mode_iterator V_256_512 [V32QI V16HI V8SI V4DI V8SF V4DF (V64QI "TARGET_AVX512F") (V32HI "TARGET_AVX512F") (V16SI "TARGET_AVX512F") - (V8DI "TARGET_AVX512F") (V16SF "TARGET_AVX512F") (V8DF "TARGET_AVX512F")]) + (V8DI "TARGET_AVX512F") (V16SF "TARGET_AVX512F") (V8DF "TARGET_AVX512F") + (V16HF "TARGET_AVX512FP16") (V32HF "TARGET_AVX512FP16")]) ;; All vector float modes (define_mode_iterator VF @@ -352,6 +363,9 @@ (define_mode_iterator VF2_AVX512VL (define_mode_iterator VF1_AVX512VL [V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")]) +(define_mode_iterator VF_AVX512FP16 + [V32HF V16HF V8HF]) + ;; All vector integer modes (define_mode_iterator VI [(V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F") @@ -360,6 +374,16 @@ (define_mode_iterator VI (V8SI "TARGET_AVX") V4SI (V4DI "TARGET_AVX") V2DI]) +;; All vector integer and HF modes +(define_mode_iterator VIHF + [(V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F") + (V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX") V16QI + (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX") V8HI + (V8SI "TARGET_AVX") V4SI + (V4DI "TARGET_AVX") V2DI + (V32HF "TARGET_AVX512FP16") (V16HF "TARGET_AVX512FP16") + (V8HF "TARGET_AVX512FP16")]) + (define_mode_iterator VI_AVX2 [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX2") V16QI (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX2") V8HI @@ -562,6 +586,7 @@ (define_mode_attr avx512 (V8HI "avx512vl") (V16HI "avx512vl") (V32HI 
"avx512bw") (V4SI "avx512vl") (V8SI "avx512vl") (V16SI "avx512f") (V2DI "avx512vl") (V4DI "avx512vl") (V8DI "avx512f") + (V8HF "avx512fp16") (V16HF "avx512vl") (V32HF "avx512bw") (V4SF "avx512vl") (V8SF "avx512vl") (V16SF "avx512f") (V2DF "avx512vl") (V4DF "avx512vl") (V8DF "avx512f")]) @@ -622,12 +647,13 @@ (define_mode_attr avx2_avx512 (V8HI "avx512vl") (V16HI "avx512vl") (V32HI "avx512bw")]) (define_mode_attr shuffletype - [(V16SF "f") (V16SI "i") (V8DF "f") (V8DI "i") - (V8SF "f") (V8SI "i") (V4DF "f") (V4DI "i") - (V4SF "f") (V4SI "i") (V2DF "f") (V2DI "i") - (V32HI "i") (V16HI "i") (V8HI "i") - (V64QI "i") (V32QI "i") (V16QI "i") - (V4TI "i") (V2TI "i") (V1TI "i")]) + [(V32HF "f") (V16HF "f") (V8HF "f") + (V16SF "f") (V16SI "i") (V8DF "f") (V8DI "i") + (V8SF "f") (V8SI "i") (V4DF "f") (V4DI "i") + (V4SF "f") (V4SI "i") (V2DF "f") (V2DI "i") + (V32HI "i") (V16HI "i") (V8HI "i") + (V64QI "i") (V32QI "i") (V16QI "i") + (V4TI "i") (V2TI "i") (V1TI "i")]) (define_mode_attr ssequartermode [(V16SF "V4SF") (V8DF "V2DF") (V16SI "V4SI") (V8DI "V2DI")]) @@ -664,6 +690,8 @@ (define_mode_iterator VI_256 [V32QI V16HI V8SI V4DI]) ;; All 128 and 256bit vector integer modes (define_mode_iterator VI_128_256 [V16QI V8HI V4SI V2DI V32QI V16HI V8SI V4DI]) +;; All 256bit vector integer and HF modes +(define_mode_iterator VIHF_256 [V32QI V16HI V8SI V4DI V16HF]) ;; Various 128bit vector integer mode combinations (define_mode_iterator VI12_128 [V16QI V8HI]) @@ -685,6 +713,9 @@ (define_mode_iterator VI48_512 [V16SI V8DI]) (define_mode_iterator VI4_256_8_512 [V8SI V8DI]) (define_mode_iterator VI_AVX512BW [V16SI V8DI (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512BW")]) +(define_mode_iterator VIHF_AVX512BW + [V16SI V8DI (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512BW") + (V32HF "TARGET_AVX512FP16")]) ;; Int-float size matches (define_mode_iterator VI4F_128 [V4SI V4SF]) @@ -725,6 +756,9 @@ (define_mode_iterator VF_AVX512 (V8SF "TARGET_AVX512VL") (V4DF "TARGET_AVX512VL") V16SF V8DF]) 
+(define_mode_iterator V16_256 [V16HI V16HF]) +(define_mode_iterator V32_512 [V32HI V32HF]) + (define_mode_attr avx512bcst [(V4SI "%{1to4%}") (V2DI "%{1to2%}") (V8SI "%{1to8%}") (V4DI "%{1to4%}") @@ -774,8 +808,16 @@ (define_mode_attr sseinsnmode (V16SF "V16SF") (V8DF "V8DF") (V8SF "V8SF") (V4DF "V4DF") (V4SF "V4SF") (V2DF "V2DF") + (V8HF "TI") (V16HF "OI") (V32HF "XI") (TI "TI")]) +;; SSE integer instruction suffix for various modes +(define_mode_attr sseintmodesuffix + [(V16QI "b") (V8HI "w") (V4SI "d") (V2DI "q") + (V32QI "b") (V16HI "w") (V8SI "d") (V4DI "q") + (V64QI "b") (V32HI "w") (V16SI "d") (V8DI "q") + (V8HF "w") (V16HF "w") (V32HF "w")]) + ;; Mapping of vector modes to corresponding mask size (define_mode_attr avx512fmaskmode [(V64QI "DI") (V32QI "SI") (V16QI "HI") @@ -835,7 +877,8 @@ (define_mode_attr ssedoublevecmode (V16QI "V32QI") (V8HI "V16HI") (V4SI "V8SI") (V2DI "V4DI") (V16SF "V32SF") (V8DF "V16DF") (V8SF "V16SF") (V4DF "V8DF") - (V4SF "V8SF") (V2DF "V4DF")]) + (V4SF "V8SF") (V2DF "V4DF") + (V32HF "V64HF") (V16HF "V32HF") (V8HF "V16HF")]) ;; Mapping of vector modes to a vector mode of half size ;; instead of V1DI/V1DF, DI/DF are used for V2DI/V2DF although they are scalar. 
@@ -845,7 +888,8 @@ (define_mode_attr ssehalfvecmode (V16QI "V8QI") (V8HI "V4HI") (V4SI "V2SI") (V2DI "DI") (V16SF "V8SF") (V8DF "V4DF") (V8SF "V4SF") (V4DF "V2DF") - (V4SF "V2SF") (V2DF "DF")]) + (V4SF "V2SF") (V2DF "DF") + (V32HF "V16HF") (V16HF "V8HF") (V8HF "V4HF")]) (define_mode_attr ssehalfvecmodelower [(V64QI "v32qi") (V32HI "v16hi") (V16SI "v8si") (V8DI "v4di") (V4TI "v2ti") @@ -853,9 +897,10 @@ (define_mode_attr ssehalfvecmodelower (V16QI "v8qi") (V8HI "v4hi") (V4SI "v2si") (V16SF "v8sf") (V8DF "v4df") (V8SF "v4sf") (V4DF "v2df") - (V4SF "v2sf")]) + (V4SF "v2sf") + (V32HF "v16hf") (V16HF "v8hf") (V8HF "v4hf")]) -;; Mapping of vector modes ti packed single mode of the same size +;; Mapping of vector modes to packed single mode of the same size (define_mode_attr ssePSmode [(V16SI "V16SF") (V8DF "V16SF") (V16SF "V16SF") (V8DI "V16SF") @@ -865,7 +910,8 @@ (define_mode_attr ssePSmode (V4DI "V8SF") (V2DI "V4SF") (V4TI "V16SF") (V2TI "V8SF") (V1TI "V4SF") (V8SF "V8SF") (V4SF "V4SF") - (V4DF "V8SF") (V2DF "V4SF")]) + (V4DF "V8SF") (V2DF "V4SF") + (V32HF "V16SF") (V16HF "V8SF") (V8HF "V4SF")]) (define_mode_attr ssePSmode2 [(V8DI "V8SF") (V4DI "V4SF")]) @@ -887,6 +933,7 @@ (define_mode_attr ssescalarmodelower (V32HI "hi") (V16HI "hi") (V8HI "hi") (V16SI "si") (V8SI "si") (V4SI "si") (V8DI "di") (V4DI "di") (V2DI "di") + (V32HF "hf") (V16HF "hf") (V8HF "hf") (V16SF "sf") (V8SF "sf") (V4SF "sf") (V8DF "df") (V4DF "df") (V2DF "df") (V4TI "ti") (V2TI "ti")]) @@ -897,6 +944,7 @@ (define_mode_attr ssexmmmode (V32HI "V8HI") (V16HI "V8HI") (V8HI "V8HI") (V16SI "V4SI") (V8SI "V4SI") (V4SI "V4SI") (V8DI "V2DI") (V4DI "V2DI") (V2DI "V2DI") + (V32HF "V8HF") (V16HF "V8HF") (V8HF "V8HF") (V16SF "V4SF") (V8SF "V4SF") (V4SF "V4SF") (V8DF "V2DF") (V4DF "V2DF") (V2DF "V2DF")]) @@ -939,10 +987,11 @@ (define_mode_attr ssescalarsize (V64QI "8") (V32QI "8") (V16QI "8") (V32HI "16") (V16HI "16") (V8HI "16") (V16SI "32") (V8SI "32") (V4SI "32") + (V32HF "16") (V16HF "16") (V8HF "16") 
(V16SF "32") (V8SF "32") (V4SF "32") (V8DF "64") (V4DF "64") (V2DF "64")]) -;; SSE prefix for integer vector modes +;; SSE prefix for integer and HF vector modes (define_mode_attr sseintprefix [(V2DI "p") (V2DF "") (V4DI "p") (V4DF "") @@ -950,9 +999,9 @@ (define_mode_attr sseintprefix (V4SI "p") (V4SF "") (V8SI "p") (V8SF "") (V16SI "p") (V16SF "") - (V16QI "p") (V8HI "p") - (V32QI "p") (V16HI "p") - (V64QI "p") (V32HI "p")]) + (V16QI "p") (V8HI "p") (V8HF "p") + (V32QI "p") (V16HI "p") (V16HF "p") + (V64QI "p") (V32HI "p") (V32HF "p")]) ;; SSE scalar suffix for vector modes (define_mode_attr ssescalarmodesuffix @@ -987,7 +1036,8 @@ (define_mode_attr castmode ;; i128 for integer vectors and TARGET_AVX2, f128 otherwise. ;; i64x4 or f64x4 for 512bit modes. (define_mode_attr i128 - [(V16SF "f64x4") (V8SF "f128") (V8DF "f64x4") (V4DF "f128") + [(V16HF "%~128") (V32HF "i64x4") (V16SF "f64x4") (V8SF "f128") + (V8DF "f64x4") (V4DF "f128") (V64QI "i64x4") (V32QI "%~128") (V32HI "i64x4") (V16HI "%~128") (V16SI "i64x4") (V8SI "%~128") (V8DI "i64x4") (V4DI "%~128")]) @@ -1011,14 +1061,18 @@ (define_mode_attr bcstscalarsuff (V32HI "w") (V16HI "w") (V8HI "w") (V16SI "d") (V8SI "d") (V4SI "d") (V8DI "q") (V4DI "q") (V2DI "q") + (V32HF "w") (V16HF "w") (V8HF "w") (V16SF "ss") (V8SF "ss") (V4SF "ss") (V8DF "sd") (V4DF "sd") (V2DF "sd")]) ;; Tie mode of assembler operand to mode iterator (define_mode_attr xtg_mode - [(V16QI "x") (V8HI "x") (V4SI "x") (V2DI "x") (V4SF "x") (V2DF "x") - (V32QI "t") (V16HI "t") (V8SI "t") (V4DI "t") (V8SF "t") (V4DF "t") - (V64QI "g") (V32HI "g") (V16SI "g") (V8DI "g") (V16SF "g") (V8DF "g")]) + [(V16QI "x") (V8HI "x") (V4SI "x") (V2DI "x") + (V8HF "x") (V4SF "x") (V2DF "x") + (V32QI "t") (V16HI "t") (V8SI "t") (V4DI "t") + (V16HF "t") (V8SF "t") (V4DF "t") + (V64QI "g") (V32HI "g") (V16SI "g") (V8DI "g") + (V32HF "g") (V16SF "g") (V8DF "g")]) ;; Half mask mode for unpacks (define_mode_attr HALFMASKMODE @@ -8353,6 +8407,45 @@ (define_insn "vec_set_0" 
] (symbol_ref "true")))]) +;; vmovw also clears the higher bits +(define_insn "vec_set<mode>_0" + [(set (match_operand:VF_AVX512FP16 0 "register_operand" "=v") + (vec_merge:VF_AVX512FP16 + (vec_duplicate:VF_AVX512FP16 + (match_operand:HF 2 "nonimmediate_operand" "rm")) + (match_operand:VF_AVX512FP16 1 "const0_operand" "C") + (const_int 1)))] + "TARGET_AVX512FP16" + "vmovw\t{%2, %x0|%x0, %2}" + [(set_attr "type" "ssemov") + (set_attr "prefix" "evex") + (set_attr "mode" "HF")]) + +(define_insn "*avx512fp16_movsh" + [(set (match_operand:V8HF 0 "register_operand" "=v") + (vec_merge:V8HF + (vec_duplicate:V8HF + (match_operand:HF 2 "register_operand" "v")) + (match_operand:V8HF 1 "register_operand" "v") + (const_int 1)))] + "TARGET_AVX512FP16" + "vmovsh\t{%2, %1, %0|%0, %1, %2}" + [(set_attr "type" "ssemov") + (set_attr "prefix" "evex") + (set_attr "mode" "HF")]) + +(define_insn "avx512fp16_movsh" + [(set (match_operand:V8HF 0 "register_operand" "=v") + (vec_merge:V8HF + (match_operand:V8HF 2 "register_operand" "v") + (match_operand:V8HF 1 "register_operand" "v") + (const_int 1)))] + "TARGET_AVX512FP16" + "vmovsh\t{%2, %1, %0|%0, %1, %2}" + [(set_attr "type" "ssemov") + (set_attr "prefix" "evex") + (set_attr "mode" "HF")]) + ;; A subset is vec_setv4sf.
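The patterns above encode two distinct lane-0 behaviors: `vmovw` (used by the `vec_set` pattern, merging into a `const0_operand`) writes the scalar into lane 0 and zeroes every higher lane, while `vmovsh` merges lane 0 from one source and keeps lanes 1..7 from the other. A minimal scalar sketch in plain C (hypothetical `model_*` helpers; `float` stands in for `_Float16` so it compiles without AVX512FP16 support):

```c
#include <assert.h>
#include <string.h>

#define NELTS 8  /* lanes in a 128-bit vector of 16-bit floats */

/* Model of vmovw as used by the vec_set pattern: scalar goes to
   lane 0, all higher lanes are cleared.  */
static void
model_vmovw (float dst[NELTS], float x)
{
  memset (dst, 0, NELTS * sizeof (float));
  dst[0] = x;
}

/* Model of vmovsh (register form): lane 0 comes from the scalar,
   lanes 1..7 are kept from the other source operand.  */
static void
model_vmovsh (float dst[NELTS], const float src[NELTS], float x)
{
  memcpy (dst, src, NELTS * sizeof (float));
  dst[0] = x;
}
```

The `memset` in `model_vmovw` is the "clears the higher bits" behavior that the comment on the pattern refers to.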
(define_insn "*vec_setv4sf_sse4_1" [(set (match_operand:V4SF 0 "register_operand" "=Yr,*x,v") @@ -9189,10 +9282,10 @@ (define_insn "vec_extract_hi_" (set_attr "length_immediate" "1") (set_attr "mode" "")]) -(define_insn_and_split "vec_extract_lo_v32hi" - [(set (match_operand:V16HI 0 "nonimmediate_operand" "=v,v,m") - (vec_select:V16HI - (match_operand:V32HI 1 "nonimmediate_operand" "v,m,v") +(define_insn_and_split "vec_extract_lo_" + [(set (match_operand: 0 "nonimmediate_operand" "=v,v,m") + (vec_select: + (match_operand:V32_512 1 "nonimmediate_operand" "v,m,v") (parallel [(const_int 0) (const_int 1) (const_int 2) (const_int 3) (const_int 4) (const_int 5) @@ -9219,9 +9312,10 @@ (define_insn_and_split "vec_extract_lo_v32hi" if (!TARGET_AVX512VL && REG_P (operands[0]) && EXT_REX_SSE_REG_P (operands[1])) - operands[0] = lowpart_subreg (V32HImode, operands[0], V16HImode); + operands[0] = lowpart_subreg (mode, operands[0], + mode); else - operands[1] = gen_lowpart (V16HImode, operands[1]); + operands[1] = gen_lowpart (mode, operands[1]); } [(set_attr "type" "sselog1") (set_attr "prefix_extra" "1") @@ -9230,10 +9324,10 @@ (define_insn_and_split "vec_extract_lo_v32hi" (set_attr "prefix" "evex") (set_attr "mode" "XI")]) -(define_insn "vec_extract_hi_v32hi" - [(set (match_operand:V16HI 0 "nonimmediate_operand" "=vm") - (vec_select:V16HI - (match_operand:V32HI 1 "register_operand" "v") +(define_insn "vec_extract_hi_" + [(set (match_operand: 0 "nonimmediate_operand" "=vm") + (vec_select: + (match_operand:V32_512 1 "register_operand" "v") (parallel [(const_int 16) (const_int 17) (const_int 18) (const_int 19) (const_int 20) (const_int 21) @@ -9250,10 +9344,10 @@ (define_insn "vec_extract_hi_v32hi" (set_attr "prefix" "evex") (set_attr "mode" "XI")]) -(define_insn_and_split "vec_extract_lo_v16hi" - [(set (match_operand:V8HI 0 "nonimmediate_operand" "=v,m") - (vec_select:V8HI - (match_operand:V16HI 1 "nonimmediate_operand" "vm,v") +(define_insn_and_split "vec_extract_lo_" + [(set 
(match_operand: 0 "nonimmediate_operand" "=v,m") + (vec_select: + (match_operand:V16_256 1 "nonimmediate_operand" "vm,v") (parallel [(const_int 0) (const_int 1) (const_int 2) (const_int 3) (const_int 4) (const_int 5) @@ -9262,12 +9356,12 @@ (define_insn_and_split "vec_extract_lo_v16hi" "#" "&& reload_completed" [(set (match_dup 0) (match_dup 1))] - "operands[1] = gen_lowpart (V8HImode, operands[1]);") + "operands[1] = gen_lowpart (mode, operands[1]);") -(define_insn "vec_extract_hi_v16hi" - [(set (match_operand:V8HI 0 "nonimmediate_operand" "=xm,vm,vm") - (vec_select:V8HI - (match_operand:V16HI 1 "register_operand" "x,v,v") +(define_insn "vec_extract_hi_" + [(set (match_operand: 0 "nonimmediate_operand" "=xm,vm,vm") + (vec_select: + (match_operand:V16_256 1 "register_operand" "x,v,v") (parallel [(const_int 8) (const_int 9) (const_int 10) (const_int 11) (const_int 12) (const_int 13) @@ -9403,12 +9497,41 @@ (define_insn "vec_extract_hi_v32qi" (set_attr "prefix" "vex,evex,evex") (set_attr "mode" "OI")]) +;; NB: *vec_extract_0 must be placed before *vec_extracthf. +;; Otherwise, it will be ignored. +(define_insn_and_split "*vec_extract_0" + [(set (match_operand:HF 0 "nonimmediate_operand" "=v,m,r") + (vec_select:HF + (match_operand:VF_AVX512FP16 1 "nonimmediate_operand" "vm,v,m") + (parallel [(const_int 0)])))] + "TARGET_SSE && !(MEM_P (operands[0]) && MEM_P (operands[1]))" + "#" + "&& reload_completed" + [(set (match_dup 0) (match_dup 1))] + "operands[1] = gen_lowpart (HFmode, operands[1]);") + +(define_insn "*vec_extracthf" + [(set (match_operand:HF 0 "register_sse4nonimm_operand" "=r,m") + (vec_select:HF + (match_operand:V8HF 1 "register_operand" "v,v") + (parallel + [(match_operand:SI 2 "const_0_to_7_operand")])))] + "TARGET_AVX512FP16" + "@ + vpextrw\t{%2, %1, %k0|%k0, %1, %2} + vpextrw\t{%2, %1, %0|%0, %1, %2}" + [(set_attr "type" "sselog1") + (set_attr "prefix" "maybe_evex") + (set_attr "mode" "TI")]) + ;; Modes handled by vec_extract patterns. 
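The vec_extract_lo/vec_extract_hi changes above generalize the old V32HI/V16HI-only patterns over a mode iterator so the new HF vector modes get the same treatment: the low half is simply the low n/2 lanes (a lowpart subreg once reload is done), and the high half is lanes n/2..n-1 (a vextracti128/vextracti64x4). A plain-C sketch of the lane selection (hypothetical names; `short` stands in for any 16-bit lane):

```c
#include <assert.h>
#include <string.h>

/* Model of vec_extract_lo on a vector of 2*half 16-bit lanes:
   the result is just the low half, i.e. a lowpart subreg.  */
static void
model_extract_lo (short *dst, const short *src, int half)
{
  memcpy (dst, src, half * sizeof (short));
}

/* Model of vec_extract_hi: lanes half..2*half-1.  */
static void
model_extract_hi (short *dst, const short *src, int half)
{
  memcpy (dst, src + half, half * sizeof (short));
}
```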
(define_mode_iterator VEC_EXTRACT_MODE [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX") V16QI (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX") V8HI (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX") V2DI + (V32HF "TARGET_AVX512FP16") (V16HF "TARGET_AVX512FP16") + (V8HF "TARGET_AVX512FP16") (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") V2DF (V4TI "TARGET_AVX512F") (V2TI "TARGET_AVX")]) @@ -14639,16 +14762,16 @@ (define_expand "vec_interleave_low" ;; Modes handled by pinsr patterns. (define_mode_iterator PINSR_MODE - [(V16QI "TARGET_SSE4_1") V8HI + [(V16QI "TARGET_SSE4_1") V8HI (V8HF "TARGET_AVX512FP16") (V4SI "TARGET_SSE4_1") (V2DI "TARGET_SSE4_1 && TARGET_64BIT")]) (define_mode_attr sse2p4_1 - [(V16QI "sse4_1") (V8HI "sse2") + [(V16QI "sse4_1") (V8HI "sse2") (V8HF "sse4_1") (V4SI "sse4_1") (V2DI "sse4_1")]) (define_mode_attr pinsr_evex_isa - [(V16QI "avx512bw") (V8HI "avx512bw") + [(V16QI "avx512bw") (V8HI "avx512bw") (V8HF "avx512bw") (V4SI "avx512dq") (V2DI "avx512dq")]) ;; sse4_1_pinsrd must come before sse2_loadld since it is preferred. 
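Adding V8HF to PINSR_MODE above lets a 16-bit FP lane be inserted from a general register with `vpinsrw`, exactly as for V8HI: only the bit pattern moves, and the destination lane is selected by the const_0_to_7 immediate; the `vpextrw` pattern earlier extracts a lane the same way. A scalar model (hypothetical names):

```c
#include <assert.h>
#include <stdint.h>

#define NELTS 8

/* Model of vpinsrw on an 8-lane 16-bit vector: replace lane idx
   (0..7) with the low 16 bits of a GPR; other lanes unchanged.  */
static void
model_vpinsrw (uint16_t vec[NELTS], uint32_t gpr, int idx)
{
  vec[idx] = (uint16_t) gpr;
}

/* Model of vpextrw: zero-extend lane idx into a GPR.  */
static uint32_t
model_vpextrw (const uint16_t vec[NELTS], int idx)
{
  return vec[idx];
}
```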
@@ -14676,11 +14799,19 @@ (define_insn "_pinsr" case 2: case 4: if (GET_MODE_SIZE (mode) < GET_MODE_SIZE (SImode)) - return "vpinsr\t{%3, %k2, %1, %0|%0, %1, %k2, %3}"; + { + if (mode == V8HFmode) + return "vpinsrw\t{%3, %k2, %1, %0|%0, %1, %k2, %3}"; + else + return "vpinsr\t{%3, %k2, %1, %0|%0, %1, %k2, %3}"; + } /* FALLTHRU */ case 3: case 5: - return "vpinsr\t{%3, %2, %1, %0|%0, %1, %2, %3}"; + if (mode == V8HFmode) + return "vpinsrw\t{%3, %2, %1, %0|%0, %1, %2, %3}"; + else + return "vpinsr\t{%3, %2, %1, %0|%0, %1, %2, %3}"; default: gcc_unreachable (); } @@ -21095,16 +21226,17 @@ (define_mode_attr pbroadcast_evex_isa [(V64QI "avx512bw") (V32QI "avx512bw") (V16QI "avx512bw") (V32HI "avx512bw") (V16HI "avx512bw") (V8HI "avx512bw") (V16SI "avx512f") (V8SI "avx512f") (V4SI "avx512f") - (V8DI "avx512f") (V4DI "avx512f") (V2DI "avx512f")]) + (V8DI "avx512f") (V4DI "avx512f") (V2DI "avx512f") + (V32HF "avx512bw") (V16HF "avx512bw") (V8HF "avx512bw")]) (define_insn "avx2_pbroadcast" - [(set (match_operand:VI 0 "register_operand" "=x,v") - (vec_duplicate:VI + [(set (match_operand:VIHF 0 "register_operand" "=x,v") + (vec_duplicate:VIHF (vec_select: (match_operand: 1 "nonimmediate_operand" "xm,vm") (parallel [(const_int 0)]))))] "TARGET_AVX2" - "vpbroadcast\t{%1, %0|%0, %1}" + "vpbroadcast\t{%1, %0|%0, %1}" [(set_attr "isa" "*,") (set_attr "type" "ssemov") (set_attr "prefix_extra" "1") @@ -21112,17 +21244,17 @@ (define_insn "avx2_pbroadcast" (set_attr "mode" "")]) (define_insn "avx2_pbroadcast_1" - [(set (match_operand:VI_256 0 "register_operand" "=x,x,v,v") - (vec_duplicate:VI_256 + [(set (match_operand:VIHF_256 0 "register_operand" "=x,x,v,v") + (vec_duplicate:VIHF_256 (vec_select: - (match_operand:VI_256 1 "nonimmediate_operand" "m,x,m,v") + (match_operand:VIHF_256 1 "nonimmediate_operand" "m,x,m,v") (parallel [(const_int 0)]))))] "TARGET_AVX2" "@ - vpbroadcast\t{%1, %0|%0, %1} - vpbroadcast\t{%x1, %0|%0, %x1} - vpbroadcast\t{%1, %0|%0, %1} - vpbroadcast\t{%x1, 
%0|%0, %x1}" + vpbroadcast\t{%1, %0|%0, %1} + vpbroadcast\t{%x1, %0|%0, %x1} + vpbroadcast\t{%1, %0|%0, %1} + vpbroadcast\t{%x1, %0|%0, %x1}" [(set_attr "isa" "*,*,,") (set_attr "type" "ssemov") (set_attr "prefix_extra" "1") @@ -21476,15 +21608,15 @@ (define_insn "avx2_vec_dupv4df" (set_attr "mode" "V4DF")]) (define_insn "_vec_dup_1" - [(set (match_operand:VI_AVX512BW 0 "register_operand" "=v,v") - (vec_duplicate:VI_AVX512BW + [(set (match_operand:VIHF_AVX512BW 0 "register_operand" "=v,v") + (vec_duplicate:VIHF_AVX512BW (vec_select: - (match_operand:VI_AVX512BW 1 "nonimmediate_operand" "v,m") + (match_operand:VIHF_AVX512BW 1 "nonimmediate_operand" "v,m") (parallel [(const_int 0)]))))] "TARGET_AVX512F" "@ - vpbroadcast\t{%x1, %0|%0, %x1} - vpbroadcast\t{%x1, %0|%0, %1}" + vpbroadcast\t{%x1, %0|%0, %x1} + vpbroadcast\t{%x1, %0|%0, %1}" [(set_attr "type" "ssemov") (set_attr "prefix" "evex") (set_attr "mode" "")]) @@ -21509,8 +21641,8 @@ (define_insn "_vec_dup" (set_attr "mode" "")]) (define_insn "_vec_dup" - [(set (match_operand:VI12_AVX512VL 0 "register_operand" "=v") - (vec_duplicate:VI12_AVX512VL + [(set (match_operand:VI12HF_AVX512VL 0 "register_operand" "=v") + (vec_duplicate:VI12HF_AVX512VL (vec_select: (match_operand: 1 "nonimmediate_operand" "vm") (parallel [(const_int 0)]))))] @@ -21545,8 +21677,8 @@ (define_insn "avx512f_broadcast" (set_attr "mode" "")]) (define_insn "_vec_dup_gpr" - [(set (match_operand:VI12_AVX512VL 0 "register_operand" "=v,v") - (vec_duplicate:VI12_AVX512VL + [(set (match_operand:VI12HF_AVX512VL 0 "register_operand" "=v,v") + (vec_duplicate:VI12HF_AVX512VL (match_operand: 1 "nonimmediate_operand" "vm,r")))] "TARGET_AVX512BW" "@ @@ -21641,7 +21773,7 @@ (define_mode_attr vecdupssescalarmodesuffix [(V8SF "ss") (V4DF "sd") (V8SI "ss") (V4DI "sd")]) ;; Modes handled by AVX2 vec_dup patterns. 
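Every broadcast pattern extended above (avx2_pbroadcast, the vec_dup variants, and the GPR forms) performs the same data movement: one 16-bit scalar, taken from lane 0 of a register, from memory, or from a GPR, replicated into all lanes of the destination. A one-loop model (hypothetical name):

```c
#include <assert.h>

/* Model of vpbroadcastw: replicate a single 16-bit scalar into
   all n lanes of the destination vector.  */
static void
model_vpbroadcastw (unsigned short *dst, unsigned short scalar, int n)
{
  int i;
  for (i = 0; i < n; i++)
    dst[i] = scalar;
}
```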
(define_mode_iterator AVX2_VEC_DUP_MODE - [V32QI V16QI V16HI V8HI V8SI V4SI] + [V32QI V16QI V16HI V8HI V8SI V4SI V16HF V8HF] (define_insn "*vec_dup" [(set (match_operand:AVX2_VEC_DUP_MODE 0 "register_operand" "=x,x,v") @@ -22403,6 +22535,8 @@ (define_mode_iterator VEC_INIT_MODE (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX") V2DI + (V32HF "TARGET_AVX512FP16") (V16HF "TARGET_AVX512FP16") + (V8HF "TARGET_AVX512FP16") (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2") (V4TI "TARGET_AVX512F") (V2TI "TARGET_AVX")]) @@ -22414,6 +22548,8 @@ (define_mode_iterator VEC_INIT_HALF_MODE (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX") + (V32HF "TARGET_AVX512FP16") (V16HF "TARGET_AVX512FP16") + (V8HF "TARGET_AVX512FP16") (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") (V4TI "TARGET_AVX512F")])
From patchwork Thu Jul 1 06:15:48 2021 From: liuhongt To: gcc-patches@gcc.gnu.org Cc: jakub@redhat.com Subject: [PATCH 02/62] AVX512FP16: Add testcase for vector init and broadcast intrinsics. Date: Thu, 1 Jul 2021 14:15:48 +0800 Message-Id: <20210701061648.9447-3-hongtao.liu@intel.com> In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com> References: <20210701061648.9447-1-hongtao.liu@intel.com>
gcc/testsuite/ChangeLog: * gcc.target/i386/m512-check.h: Add union128h, union256h, union512h. * gcc.target/i386/avx512fp16-10a.c: New test. * gcc.target/i386/avx512fp16-10b.c: Ditto. * gcc.target/i386/avx512fp16-1a.c: Ditto. * gcc.target/i386/avx512fp16-1b.c: Ditto. * gcc.target/i386/avx512fp16-1c.c: Ditto. * gcc.target/i386/avx512fp16-1d.c: Ditto. * gcc.target/i386/avx512fp16-1e.c: Ditto. * gcc.target/i386/avx512fp16-2a.c: Ditto. * gcc.target/i386/avx512fp16-2b.c: Ditto. * gcc.target/i386/avx512fp16-2c.c: Ditto. * gcc.target/i386/avx512fp16-3a.c: Ditto. * gcc.target/i386/avx512fp16-3b.c: Ditto. * gcc.target/i386/avx512fp16-3c.c: Ditto. * gcc.target/i386/avx512fp16-4.c: Ditto. * gcc.target/i386/avx512fp16-5.c: Ditto. * gcc.target/i386/avx512fp16-6.c: Ditto. * gcc.target/i386/avx512fp16-7.c: Ditto. * gcc.target/i386/avx512fp16-8.c: Ditto. * gcc.target/i386/avx512fp16-9a.c: Ditto. * gcc.target/i386/avx512fp16-9b.c: Ditto.
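The first ChangeLog entry adds union128h/union256h/union512h to m512-check.h; these follow the testsuite's existing idiom of overlaying the vector type with a plain scalar array so a vector result can be compared lane by lane. A reduced stand-in (plain `float` instead of `_Float16`, hypothetical `*128f` names) shows the shape:

```c
#include <assert.h>

/* Stand-in for __m128h; plain float so any host compiler accepts it.  */
typedef struct { float v[8]; } vec128f;

/* The m512-check.h idiom: a union of the vector type and a scalar
   array, so vector results can be read back lane by lane.  */
typedef union
{
  vec128f x;
  float a[8];
} union128f;

/* Like check_union128h: returns nonzero if any lane differs.  */
static int
check_union128f (union128f u, const float *expected)
{
  int i;
  for (i = 0; i < 8; i++)
    if (u.a[i] != expected[i])
      return 1;
  return 0;
}
```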
--- .../gcc.target/i386/avx512fp16-10a.c | 14 ++ .../gcc.target/i386/avx512fp16-10b.c | 25 ++++ gcc/testsuite/gcc.target/i386/avx512fp16-1a.c | 24 ++++ gcc/testsuite/gcc.target/i386/avx512fp16-1b.c | 32 +++++ gcc/testsuite/gcc.target/i386/avx512fp16-1c.c | 26 ++++ gcc/testsuite/gcc.target/i386/avx512fp16-1d.c | 33 +++++ gcc/testsuite/gcc.target/i386/avx512fp16-1e.c | 30 ++++ gcc/testsuite/gcc.target/i386/avx512fp16-2a.c | 28 ++++ gcc/testsuite/gcc.target/i386/avx512fp16-2b.c | 33 +++++ gcc/testsuite/gcc.target/i386/avx512fp16-2c.c | 36 +++++ gcc/testsuite/gcc.target/i386/avx512fp16-3a.c | 36 +++++ gcc/testsuite/gcc.target/i386/avx512fp16-3b.c | 35 +++++ gcc/testsuite/gcc.target/i386/avx512fp16-3c.c | 40 ++++++ gcc/testsuite/gcc.target/i386/avx512fp16-4.c | 31 ++++ gcc/testsuite/gcc.target/i386/avx512fp16-5.c | 133 ++++++++++++++++++ gcc/testsuite/gcc.target/i386/avx512fp16-6.c | 57 ++++++++ gcc/testsuite/gcc.target/i386/avx512fp16-7.c | 86 +++++++++++ gcc/testsuite/gcc.target/i386/avx512fp16-8.c | 53 +++++++ gcc/testsuite/gcc.target/i386/avx512fp16-9a.c | 27 ++++ gcc/testsuite/gcc.target/i386/avx512fp16-9b.c | 49 +++++++ gcc/testsuite/gcc.target/i386/m512-check.h | 38 ++++- 21 files changed, 865 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-10a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-10b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-1c.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-1d.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-1e.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-2a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-2b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-2c.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-3a.c create mode 100644 
gcc/testsuite/gcc.target/i386/avx512fp16-3b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-3c.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-4.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-5.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-6.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-7.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-8.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-9a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-9b.c diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-10a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-10a.c new file mode 100644 index 00000000000..f06ffffa822 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-10a.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx512fp16" } */ + +#include + +__m128h +__attribute__ ((noinline, noclone)) +set_128 (_Float16 x) +{ + return _mm_set_sh (x); +} + +/* { dg-final { scan-assembler-times "vmovw\[ \t]\+\[^\n\r]*xmm0" 1 { target { ia32 } } } } */ +/* { dg-final { scan-assembler-times "vmovw\[ \t]\+\[^\n\r]*xmm0" 2 { target { ! 
ia32 } } } } */ diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-10b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-10b.c new file mode 100644 index 00000000000..055edd7aaf5 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-10b.c @@ -0,0 +1,25 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16" } */ + +#include + +static void do_test (void); + +#define DO_TEST do_test +#define AVX512FP16 +#include "avx512-check.h" +#include "avx512fp16-10a.c" + +union128h u128 = { ESP_FLOAT16, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f }; + +static void +do_test (void) +{ + __m128h v128 = set_128 (ESP_FLOAT16); + union128h a128; + + a128.x = v128; + if (check_union128h (a128, u128.a)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-1a.c new file mode 100644 index 00000000000..45c7bddeba5 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-1a.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx512fp16" } */ + +typedef _Float16 __v8hf __attribute__ ((__vector_size__ (16))); +typedef _Float16 __m128h __attribute__ ((__vector_size__ (16), __may_alias__)); + +__m128h +__attribute__ ((noinline, noclone)) +foo1 (_Float16 x) +{ + return __extension__ (__m128h)(__v8hf) { x, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f }; +} + +__m128h +__attribute__ ((noinline, noclone)) +foo2 (_Float16 *x) +{ + return __extension__ (__m128h)(__v8hf) { *x, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f }; +} + +/* { dg-final { scan-assembler-times "vmovw\[^\n\r]*xmm0" 3 { target { ! 
ia32 } } } } */ +/* { dg-final { scan-assembler-times "vmovw\[^\n\r]*xmm0" 2 { target { ia32 } } } } */ diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-1b.c new file mode 100644 index 00000000000..7560c625e25 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-1b.c @@ -0,0 +1,32 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16" } */ + +#include + +static void do_test (void); + +#define DO_TEST do_test +#define AVX512FP16 +#include "avx512-check.h" +#include "avx512fp16-1a.c" + +static void +do_test (void) +{ + _Float16 x = 25.3; + union128h u = { x, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f }; + __m128h v; + union128h a; + memset (&v, -1, sizeof (v)); + v = foo1 (x); + a.x = v; + if (check_union128h (a, u.a)) + abort (); + x = 33.3; + u.a[0] = x; + memset (&v, -1, sizeof (v)); + v = foo2 (&x); + a.x = v; + if (check_union128h (a, u.a)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-1c.c b/gcc/testsuite/gcc.target/i386/avx512fp16-1c.c new file mode 100644 index 00000000000..9814e9c0363 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-1c.c @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vmovsh" 2 { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-times "vpinsrw" 1 { target { ! 
ia32 } } } } */ +/* { dg-final { scan-assembler-times "vpinsrw" 2 { target { ia32 } } } } */ + +typedef _Float16 __v8hf __attribute__ ((__vector_size__ (16))); +typedef _Float16 __m128h __attribute__ ((__vector_size__ (16), __may_alias__)); + +__m128h +__attribute__ ((noinline, noclone)) +foo1 (__m128h a, _Float16 f) +{ + __v8hf x = (__v8hf) a; + x[2] = f; + return (__m128h) x; +} + +__m128h +__attribute__ ((noinline, noclone)) +foo2 (__m128h a, _Float16 f) +{ + __v8hf x = (__v8hf) a; + x[0] = f; + return (__m128h) x; +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-1d.c b/gcc/testsuite/gcc.target/i386/avx512fp16-1d.c new file mode 100644 index 00000000000..cdaf656eb48 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-1d.c @@ -0,0 +1,33 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16" } */ + +#include + +static void do_test (void); + +#define DO_TEST do_test +#define AVX512FP16 +#include "avx512-check.h" +#include "avx512fp16-1c.c" + +static void +do_test (void) +{ + _Float16 x = 25.3; + union128h u = { -1.2f, 1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f }; + __m128h v; + union128h a, b; + v = foo1 (u.x, x); + a.x = v; + b = u; + b.a[2] = x; + if (check_union128h (a, b.a)) + abort (); + x = 33.3; + b = u; + b.a[0] = x; + v = foo2 (u.x, x); + a.x = v; + if (check_union128h (a, b.a)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-1e.c b/gcc/testsuite/gcc.target/i386/avx512fp16-1e.c new file mode 100644 index 00000000000..04d33cfcf2b --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-1e.c @@ -0,0 +1,30 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16" } */ + +#include + +static void do_test (void); + +#define DO_TEST do_test +#define AVX512FP16 +#include "avx512-check.h" +#include "avx512fp16-1a.c" + +__m128h +__attribute__ ((noinline,noclone)) +foo3 (__m128h x) +{ + return foo1(x[0]); +} + +static void +do_test (void) +{ + union128h u = { -1.2f, 1.0f, 2.0f, 
3.0f, 4.0f, 5.0f, 6.0f, 7.0f }; + union128h a, b = { -1.2f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f}; + __m128h v; + v = foo3 (u.x); + a.x = v; + if (check_union128h (a, b.a)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-2a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-2a.c new file mode 100644 index 00000000000..c03138fb13d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-2a.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx512fp16" } */ + +typedef _Float16 __v16hf __attribute__ ((__vector_size__ (32))); +typedef _Float16 __m256h __attribute__ ((__vector_size__ (32), __may_alias__)); + +__m256h +__attribute__ ((noinline, noclone)) +foo1 (_Float16 x) +{ + return __extension__ (__m256h)(__v16hf) { x, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f }; +} + +__m256h +__attribute__ ((noinline, noclone)) +foo2 (_Float16 *x) +{ + return __extension__ (__m256h)(__v16hf) { *x, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f }; +} + +/* { dg-final { scan-assembler-times "vmovw\[^\n\r]*xmm0" 3 { target { ! 
ia32 } } } } */ +/* { dg-final { scan-assembler-times "vmovw\[^\n\r]*xmm0" 2 { target { ia32 } } } } */ diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-2b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-2b.c new file mode 100644 index 00000000000..100afd0f49c --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-2b.c @@ -0,0 +1,33 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16" } */ + +#include + +static void do_test (void); + +#define DO_TEST do_test +#define AVX512FP16 +#include "avx512-check.h" +#include "avx512fp16-2a.c" + +static void +do_test (void) +{ + _Float16 x = 25.3; + union256h u = { x, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f }; + __m256h v; + union256h a; + memset (&v, -1, sizeof (v)); + v = foo1 (x); + a.x = v; + if (check_union256h (a, u.a)) + abort (); + x = 33.3; + u.a[0] = x; + memset (&v, -1, sizeof (v)); + v = foo2 (&x); + a.x = v; + if (check_union256h (a, u.a)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-2c.c b/gcc/testsuite/gcc.target/i386/avx512fp16-2c.c new file mode 100644 index 00000000000..cf4b42a4021 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-2c.c @@ -0,0 +1,36 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16" } */ + +#include + +static void do_test (void); + +#define DO_TEST do_test +#define AVX512FP16 +#include "avx512-check.h" +#include "avx512fp16-2a.c" + +__m256h +__attribute__ ((noinline,noclone)) +foo3 (__m256h x) +{ + return foo1(x[0]); +} + +static void +do_test (void) +{ + _Float16 x = 25.3; + union256h u = { x, 3.5f, -5.9f, 0.0f, 0.0f, 0.0f, 7.7f, 0.0f, + 4.0f, -4.20f, 0.0f, 0.0f, 0.0f, -8.7f, 0.0f, 0.0f }; + + union256h exp = { x, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f }; + __m256h v; + union256h a; + memset (&v, -1, sizeof (v)); + v = foo3 (u.x); + a.x = v; + if (check_union256h (a, exp.a)) + abort (); +} 
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-3a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-3a.c new file mode 100644 index 00000000000..126e7d9ee36 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-3a.c @@ -0,0 +1,36 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx512fp16" } */ + +typedef _Float16 __v32hf __attribute__ ((__vector_size__ (64))); +typedef _Float16 __m512h __attribute__ ((__vector_size__ (64), __may_alias__)); + +__m512h +__attribute__ ((noinline, noclone)) +foo1 (_Float16 x) +{ + return __extension__ (__m512h)(__v32hf) { x, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f }; +} + +__m512h +__attribute__ ((noinline, noclone)) +foo2 (_Float16 *x) +{ + return __extension__ (__m512h)(__v32hf) { *x, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f }; +} + +/* { dg-final { scan-assembler-times "vmovw\[^\n\r]*xmm0" 3 { target { ! 
ia32 } } } } */ +/* { dg-final { scan-assembler-times "vmovw\[^\n\r]*xmm0" 2 { target { ia32 } } } } */ diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-3b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-3b.c new file mode 100644 index 00000000000..291db066bfa --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-3b.c @@ -0,0 +1,35 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16" } */ + +#include + +static void do_test (void); + +#define DO_TEST do_test +#define AVX512FP16 +#include "avx512-check.h" +#include "avx512fp16-3a.c" + +static void +do_test (void) +{ + _Float16 x = 25.3; + union512h u = { x, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f }; + __m512h v; + union512h a; + memset (&v, -1, sizeof (v)); + v = foo1 (x); + a.x = v; + if (check_union512h (a, u.a)) + abort (); + x = 33.3; + u.a[0] = x; + memset (&v, -1, sizeof (v)); + v = foo2 (&x); + a.x = v; + if (check_union512h (a, u.a)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-3c.c b/gcc/testsuite/gcc.target/i386/avx512fp16-3c.c new file mode 100644 index 00000000000..21f9e16434a --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-3c.c @@ -0,0 +1,40 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16" } */ + +#include + +static void do_test (void); + +#define DO_TEST do_test +#define AVX512FP16 +#include "avx512-check.h" +#include "avx512fp16-3a.c" + +__m512h +__attribute__ ((noinline,noclone)) +foo3 (__m512h x) +{ + return foo1(x[0]); +} + +static void +do_test (void) +{ + _Float16 x = 25.3; + union512h u = { x, 3.5f, -5.9f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, + 2.0f, -2.3f, 0.0f, 0.0f, 10.4f, 0.0f, 0.0f, 0.0f, + 3.0f, -3.2f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, + 4.0f, -4.20f, 0.0f, 0.0f, 0.0f, -8.7f, 0.0f, 0.0f }; + + union512h exp = { x, 0.0f, 0.0f, 0.0f, 0.0f, 
0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f }; + __m512h v; + union512h a; + memset (&v, -1, sizeof (v)); + v = foo3 (u.x); + a.x = v; + if (check_union512h (a, exp.a)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-4.c b/gcc/testsuite/gcc.target/i386/avx512fp16-4.c new file mode 100644 index 00000000000..1329a0434a0 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-4.c @@ -0,0 +1,31 @@ +/* { dg-do assemble { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16" } */ + +typedef _Float16 __m256h __attribute__ ((__vector_size__ (32), __may_alias__)); +typedef _Float16 __m512h __attribute__ ((__vector_size__ (64), __may_alias__)); +typedef _Float16 __m128h __attribute__ ((__vector_size__ (16), __may_alias__)); + +extern __m128h x128, y128; +extern __m256h x256, y256; +extern __m512h x512, y512; + +__m128h +foo1 (float f1, __m128h f2) +{ + x128 = y128; + return f2; +} + +__m256h +foo2 (float f1, __m256h f2) +{ + x256 = y256; + return f2; +} + +__m512h +foo3 (float f1, __m512h f2) +{ + x512 = y512; + return f2; +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-5.c b/gcc/testsuite/gcc.target/i386/avx512fp16-5.c new file mode 100644 index 00000000000..d28b9651b8b --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-5.c @@ -0,0 +1,133 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16" } */ + +#include + +static void do_test (void); + +#define DO_TEST do_test +#define AVX512FP16 +#include "avx512-check.h" + +__m128h +__attribute__ ((noinline, noclone)) +foo1 (_Float16 x) +{ + return __extension__ (__m128h)(__v8hf) { x, 0.0f, 0.0f, 0.0f, + 1.0f, 0.0f, 0.0f, 0.0f }; +} + +__m128h +__attribute__ ((noinline, noclone)) +foo2 (_Float16 x, _Float16 y) +{ + return __extension__ (__m128h)(__v8hf) { x, 0.0f, 0.0f, y, + 3.0f, 0.0f, 0.0f, 0.0f }; +} + +__m256h +__attribute__ 
((noinline, noclone)) +foo3 (_Float16 x) +{ + return __extension__ (__m256h)(__v16hf) { x, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f, + 1.0f, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f }; +} + +__m256h +__attribute__ ((noinline, noclone)) +foo4 (_Float16 x, _Float16 y) +{ + return __extension__ (__m256h)(__v16hf) { x, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, y, + 3.0f, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f }; +} + +__m512h +__attribute__ ((noinline, noclone)) +foo5 (_Float16 x) +{ + return __extension__ (__m512h)(__v32hf) { x, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f, + 1.0f, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f }; +} + +__m512h +__attribute__ ((noinline, noclone)) +foo6 (_Float16 x, _Float16 y) +{ + return __extension__ (__m512h)(__v32hf) { x, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, y, + 3.0f, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f }; +} + +static void +do_test (void) +{ + _Float16 x = 25.3; + _Float16 y = -35.7; + union128h u128 = { x, 0.0f, 0.0f, 0.0f, 1.0f, 0.0f, 0.0f, 0.0f }; + union256h u256 = { x, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, + 1.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f }; + union512h u512 = { x, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, + 1.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f }; + __m128h v128; + __m256h v256; + __m512h v512; + union128h a128; + union256h a256; + union512h a512; + + memset (&v128, -1, sizeof (v128)); + v128 = foo1 (x); + a128.x = v128; + if (check_union128h (a128, u128.a)) + abort (); + memset (&v128, -1, sizeof (v128)); + u128.a[3] = y; + u128.a[4] = 3.0f; + v128 = foo2 (x, y); + a128.x = v128; + if (check_union128h (a128, u128.a)) + abort (); + + memset (&v256, -1, sizeof (v256)); + v256 = foo3 
(x); + a256.x = v256; + if (check_union256h (a256, u256.a)) + abort (); + memset (&v256, -1, sizeof (v256)); + u256.a[7] = y; + u256.a[8] = 3.0f; + v256 = foo4 (x, y); + a256.x = v256; + if (check_union256h (a256, u256.a)) + abort (); + + memset (&v512, -1, sizeof (v512)); + v512 = foo5 (x); + a512.x = v512; + if (check_union512h (a512, u512.a)) + abort (); + memset (&v512, -1, sizeof (v512)); + u512.a[15] = y; + u512.a[16] = 3.0f; + v512 = foo6 (x, y); + a512.x = v512; + if (check_union512h (a512, u512.a)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-6.c b/gcc/testsuite/gcc.target/i386/avx512fp16-6.c new file mode 100644 index 00000000000..d85a6c40603 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-6.c @@ -0,0 +1,57 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16" } */ + +#include + +static void do_test (void); + +#define DO_TEST do_test +#define AVX512FP16 +#include "avx512-check.h" + +void +__attribute__ ((noinline, noclone)) +foo128 (_Float16 *p, __m128h x) +{ + *p = ((__v8hf)x)[0]; +} + +void +__attribute__ ((noinline, noclone)) +foo256 (_Float16 *p, __m256h x) +{ + *p = ((__v16hf)x)[0]; +} + +void +__attribute__ ((noinline, noclone)) +foo512 (_Float16 *p, __m512h x) +{ + *p = ((__v32hf)x)[0]; +} + +static void +do_test (void) +{ + _Float16 x = 25.3; + union128h u128 = { x, 0.0f, 0.0f, 0.0f, 1.0f, 0.0f, 0.0f, 0.0f }; + union256h u256 = { x, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f }; + union512h u512 = { x, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f }; + _Float16 y; + + foo128 (&y, u128.x); + if (x != y) + abort (); + + foo256 (&y, u256.x); + if (x != y) + abort (); + + foo512 (&y, u512.x); + if (x != y) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-7.c 
b/gcc/testsuite/gcc.target/i386/avx512fp16-7.c new file mode 100644 index 00000000000..26ae25fc0d4 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-7.c @@ -0,0 +1,86 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16" } */ + +#include + +static void do_test (void); + +#define DO_TEST do_test +#define AVX512FP16 +#include "avx512-check.h" + +void +__attribute__ ((noinline, noclone)) +foo128 (_Float16 *p, __m128h x) +{ + *p = ((__v8hf)x)[4]; +} + +void +__attribute__ ((noinline, noclone)) +foo256 (_Float16 *p, __m256h x) +{ + *p = ((__v16hf)x)[10]; +} + +void +__attribute__ ((noinline, noclone)) +foo512 (_Float16 *p, __m512h x) +{ + *p = ((__v32hf)x)[30]; +} + +static void +do_test (void) +{ + _Float16 x = 25.3; + union128h u128 = { 0.0f, x, 0.0f, 0.0f, x, 0.0f, 0.0f, x }; + union256h u256 = { x, 0.0f, 0.0f, 0.0f, x, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, x, 0.0f, 0.0f, x, 0.0f, 0.0f }; + union512h u512 = { x, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, + 0.0f, x, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, x, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, x, 0.0f, 0.0f, x, 0.0f }; + __m128h v128 = _mm_setr_ph (0.0f, x, 0.0f, 0.0f, + x, 0.0f, 0.0f, x); + __m256h v256 = _mm256_setr_ph (x, 0.0f, 0.0f, 0.0f, + x, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, x, 0.0f, + 0.0f, x, 0.0f, 0.0f); + __m512h v512 = _mm512_setr_ph (x, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f, + 0.0f, x, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, x, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, x, + 0.0f, 0.0f, x, 0.0f); + union128h a128; + union256h a256; + union512h a512; + _Float16 y; + + a128.x = v128; + if (check_union128h (a128, u128.a)) + abort (); + + a256.x = v256; + if (check_union256h (a256, u256.a)) + abort (); + + a512.x = v512; + if (check_union512h (a512, u512.a)) + abort (); + + foo128 (&y, u128.x); + if (x != y) + abort (); + + foo256 (&y, u256.x); + if (x != y) + abort (); + + foo512 (&y, u512.x); + if (x != y) + abort (); +} diff 
--git a/gcc/testsuite/gcc.target/i386/avx512fp16-8.c b/gcc/testsuite/gcc.target/i386/avx512fp16-8.c new file mode 100644 index 00000000000..8f103751c2f --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-8.c @@ -0,0 +1,53 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16" } */ + +#include + +static void do_test (void); + +#define DO_TEST do_test +#define AVX512FP16 +#include "avx512-check.h" + +_Float16 +__attribute__ ((noinline, noclone)) +foo128 (__m128h x) +{ + return ((__v8hf)x)[4]; +} + +_Float16 +__attribute__ ((noinline, noclone)) +foo256 (__m256h x) +{ + return ((__v16hf)x)[10]; +} + +_Float16 +__attribute__ ((noinline, noclone)) +foo512 (__m512h x) +{ + return ((__v32hf)x)[30]; +} + +static void +do_test (void) +{ + _Float16 x = 25.3; + union128h u128 = { 0.0f, 0.0f, 0.0f, 0.0f, x, 0.0f, 0.0f, 0.0f }; + union256h u256 = { 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, x, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f }; + union512h u512 = { 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, + 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, x, 0.0f }; + + if (foo128 (u128.x) != x) + abort (); + + if (foo256 (u256.x) != x) + abort (); + + if (foo512 (u512.x) != x) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-9a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-9a.c new file mode 100644 index 00000000000..580ffb51e45 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-9a.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx512fp16" } */ + +#include + +__m128h +__attribute__ ((noinline, noclone)) +set1_128 (_Float16 x) +{ + return _mm_set1_ph (x); +} + +__m256h +__attribute__ ((noinline, noclone)) +set1_256 (_Float16 x) +{ + return _mm256_set1_ph (x); +} + +__m512h +__attribute__ ((noinline, noclone)) +set1_512 (_Float16 x) +{ + return _mm512_set1_ph (x); +} + +/* { dg-final { 
scan-assembler-times "vpbroadcastw\[ \t]\+\[^\n\r]*\[xyz\]mm0" 3 } } */ diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-9b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-9b.c new file mode 100644 index 00000000000..198b23e64b4 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-9b.c @@ -0,0 +1,49 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16" } */ + +#include + +static void do_test (void); + +#define DO_TEST do_test +#define AVX512FP16 +#include "avx512-check.h" +#include "avx512fp16-9a.c" + +union128h u128 = { ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16, + ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16 }; +union256h u256 = { ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16, + ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16, + ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16, + ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16 }; +union512h u512 = { ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16, + ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16, + ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16, + ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16, + ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16, + ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16, + ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16, + ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16 }; + +static void +do_test (void) +{ + __m128h v128 = set1_128 (ESP_FLOAT16); + __m256h v256 = set1_256 (ESP_FLOAT16); + __m512h v512 = set1_512 (ESP_FLOAT16); + union128h a128; + union256h a256; + union512h a512; + + a128.x = v128; + if (check_union128h (a128, u128.a)) + abort (); + + a256.x = v256; + if (check_union256h (a256, u256.a)) + abort (); + + a512.x = v512; + if (check_union512h (a512, u512.a)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/m512-check.h b/gcc/testsuite/gcc.target/i386/m512-check.h index 6befaf0a9ba..68e74fce68d 100644 --- a/gcc/testsuite/gcc.target/i386/m512-check.h +++ 
b/gcc/testsuite/gcc.target/i386/m512-check.h
@@ -60,7 +60,24 @@ typedef union
   __m512i x;
   unsigned long long a[8];
 } union512i_uq;
-
+
+typedef union
+{
+  __m128h x;
+  _Float16 a[8];
+} union128h;
+
+typedef union
+{
+  __m256h x;
+  _Float16 a[16];
+} union256h;
+
+typedef union
+{
+  __m512h x;
+  _Float16 a[32];
+} union512h;
 
 CHECK_EXP (union512i_b, char, "%d")
 CHECK_EXP (union512i_w, short, "%d")
@@ -115,3 +132,22 @@ CHECK_ROUGH_EXP (union256, float, "%f")
 CHECK_ROUGH_EXP (union256d, double, "%f")
 CHECK_ROUGH_EXP (union128, float, "%f")
 CHECK_ROUGH_EXP (union128d, double, "%f")
+
+#ifdef AVX512FP16
+
+CHECK_EXP (union128h, _Float16, "%f")
+CHECK_EXP (union256h, _Float16, "%f")
+CHECK_EXP (union512h, _Float16, "%f")
+
+#ifndef ESP_FLOAT16
+#define ESP_FLOAT16 0.27
+#endif
+
+CHECK_FP_EXP (union128h, _Float16, ESP_FLOAT16, "%f")
+CHECK_FP_EXP (union256h, _Float16, ESP_FLOAT16, "%f")
+CHECK_FP_EXP (union512h, _Float16, ESP_FLOAT16, "%f")
+
+CHECK_ROUGH_EXP (union128h, _Float16, "%f")
+CHECK_ROUGH_EXP (union256h, _Float16, "%f")
+CHECK_ROUGH_EXP (union512h, _Float16, "%f")
+#endif

From patchwork Thu Jul 1 06:15:49 2021
X-Patchwork-Submitter: liuhongt
X-Patchwork-Id: 1499285
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 03/62] AVX512FP16: Fix HF vector passing in variable arguments.
Date: Thu, 1 Jul 2021 14:15:49 +0800
Message-Id: <20210701061648.9447-4-hongtao.liu@intel.com>
In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com>
From: liuhongt
Cc: jakub@redhat.com

From: "H.J. Lu"

gcc/ChangeLog:

	* config/i386/i386.c (function_arg_advance_64): Allow V16HFmode
	and V32HFmode.
	(function_arg_64): Likewise.
	(ix86_gimplify_va_arg): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-vararg-1.c: New test.
	* gcc.target/i386/avx512fp16-vararg-2.c: Ditto.
	* gcc.target/i386/avx512fp16-vararg-3.c: Ditto.
	* gcc.target/i386/avx512fp16-vararg-4.c: Ditto.
--- gcc/config/i386/i386.c | 8 +- .../gcc.target/i386/avx512fp16-vararg-1.c | 122 ++++++++++++++++++ .../gcc.target/i386/avx512fp16-vararg-2.c | 107 +++++++++++++++ .../gcc.target/i386/avx512fp16-vararg-3.c | 114 ++++++++++++++++ .../gcc.target/i386/avx512fp16-vararg-4.c | 115 +++++++++++++++++ 5 files changed, 465 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vararg-1.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vararg-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vararg-3.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vararg-4.c diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 021283e6f39..79e6880d9dd 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -2908,7 +2908,9 @@ function_arg_advance_64 (CUMULATIVE_ARGS *cum, machine_mode mode, /* Unnamed 512 and 256bit vector mode parameters are passed on stack. */ if (!named && (VALID_AVX512F_REG_MODE (mode) - || VALID_AVX256_REG_MODE (mode))) + || VALID_AVX256_REG_MODE (mode) + || mode == V16HFmode + || mode == V32HFmode)) return 0; if (!examine_argument (mode, type, 0, &int_nregs, &sse_nregs) @@ -3167,6 +3169,8 @@ function_arg_64 (const CUMULATIVE_ARGS *cum, machine_mode mode, case E_V32HImode: case E_V8DFmode: case E_V8DImode: + case E_V16HFmode: + case E_V32HFmode: /* Unnamed 256 and 512bit vector mode parameters are passed on stack. */ if (!named) return NULL; @@ -4658,6 +4662,8 @@ ix86_gimplify_va_arg (tree valist, tree type, gimple_seq *pre_p, case E_V32HImode: case E_V8DFmode: case E_V8DImode: + case E_V16HFmode: + case E_V32HFmode: /* Unnamed 256 and 512bit vector mode parameters are passed on stack. 
*/ if (!TARGET_64BIT_MS_ABI) { diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vararg-1.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vararg-1.c new file mode 100644 index 00000000000..9bd366838b9 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vararg-1.c @@ -0,0 +1,122 @@ +/* { dg-do run } */ +/* { dg-require-effective-target avx512fp16 } */ +/* { dg-options "-mavx512fp16" } */ + +#include +#include + +static void do_test (void); + +#define DO_TEST do_test +#define AVX512FP16 +#include "avx512-check.h" + +struct m256h +{ + __m256h v; +}; + +__m128 n1 = { -283.3, -23.3, 213.4, 1119.03 }; +struct m256h n2 = { { -93.83f16, 893.318f16, 3994.3f16, -39484.0f16, 213.4f16, 323.4f16, 42.5f16, -43.4f16, + 234.4f16, 93.9f16, 34.5f16, -14.5f16, -34.9f16, -421.0f16, 234.5f16, 214.5f16 } }; +__m128h n3 = { 11.5f16, -31.80f16, 242.3f16, 136.4f16, 42.8f16, -22.8f16, 343.8f16, 215.4f16 } ; +_Float16 n4 = 32.4f16; +double n5 = 103.3; +__m128h n6 = { -12.3f16, 2.0f16, 245.9f16, -432.1f16, 53.5f16, -13.4f16, 432.5f16, 482.4f16 }; +__m128d n7 = { -91.387, -8193.518 }; +struct m256h n8 = { { -93.83f16, 893.318f16, 3994.3f16, -39484.0f16, 213.4f16, 323.4f16, 42.5f16, -43.4f16, + 234.4f16, 93.9f16, 34.5f16, -14.5f16, -34.9f16, -421.0f16, 234.5f16, 214.5f16 } }; +__m128 n9 = { -123.3, 2.3, 3.4, -10.03 }; +__m128h n10 = { 123.3f16, -100.0f16, 246.9f16, 13.4f16, -134.4f16, 35.4f16, 156.5f16, 953.1f16 }; +_Float16 n11 = 40.7f16; +double n12 = 304.9; +__m128h n13 = { 23.3f16, -11.0f16, 24.5f16, -24.5f16, 535.4f16, 35.4f16, -13.4f16, 14.5f16 }; +__m256h n14 = { -123.3f16, 23.9f16, 34.4f16, -100.3f16, 284.4f16, 352.5f16, 131.5f16, -13.2f16, + 131.4f16, 382.5f16, 38.5f16, 99.6f16, 423.2f16, -12.44f16, 43.2f16, -34.45f16 }; +__m512h n15 = { -39.3f16, -180.9f16, 13.4f16, 35.4f16, -41.1f16, -14.4f16, 24.5f16, 53.54f16, + 238.4f16, -134.8f16, 24.5f16, 35.6f16, -346.7f16, -43.4f16, -535.3f16, 324.7f16, + 82.5f16, 21.4f16, 24.4f16, 53.4f16, 23.5f16, -24.4f16, -34.5f16, -32.5f16, + 
23.6f16, -13.4f16, 24.5f16, 35.5f16, -34.4f16, -24.5f16, -34.5f16, 13.5f16 }; +__m128d n16 = { 73.0, 63.18 }; +__m256 n17 = { -183.3, -22.3, 13.9, -119.3, 483.1, 122.3, -33.4, -9.37 }; +__m128 n18 = { -183.3, 22.3, 13.4, -19.03 }; + +__m128 e1; +struct m256h e2; +__m128h e3; +_Float16 e4; +double e5; +__m128h e6; +__m128d e7; +struct m256h e8; +__m128 e9; +__m128h e10; +_Float16 e11; +double e12; +__m128h e13; +__m256h e14; +__m512h e15; +__m128d e16; +__m256 e17; +__m128 e18; + +static void +__attribute__((noinline)) +foo (va_list va_arglist) +{ + e4 = va_arg (va_arglist, _Float16); + e5 = va_arg (va_arglist, double); + e6 = va_arg (va_arglist, __m128h); + e7 = va_arg (va_arglist, __m128d); + e8 = va_arg (va_arglist, struct m256h); + e9 = va_arg (va_arglist, __m128); + e10 = va_arg (va_arglist, __m128h); + e11 = va_arg (va_arglist, _Float16); + e12 = va_arg (va_arglist, double); + e13 = va_arg (va_arglist, __m128h); + e14 = va_arg (va_arglist, __m256h); + e15 = va_arg (va_arglist, __m512h); + e16 = va_arg (va_arglist, __m128d); + e17 = va_arg (va_arglist, __m256); + e18 = va_arg (va_arglist, __m128); + va_end (va_arglist); +} + +static void +__attribute__((noinline)) +test (__m128 a1, struct m256h a2, __m128h a3, ...) 
+{ + va_list va_arglist; + + e1 = a1; + e2 = a2; + e3 = a3; + va_start (va_arglist, a3); + foo (va_arglist); + va_end (va_arglist); +} + +static void +do_test (void) +{ + test (n1, n2, n3, n4, n5, n6, n7, n8, n9, n10, n11, n12, + n13, n14, n15, n16, n17, n18); + assert (__builtin_memcmp (&e1, &n1, sizeof (e1)) == 0); + assert (__builtin_memcmp (&e2, &n2, sizeof (e2)) == 0); + assert (__builtin_memcmp (&e3, &n3, sizeof (e3)) == 0); + assert (n4 == e4); + assert (n5 == e5); + assert (__builtin_memcmp (&e6, &n6, sizeof (e6)) == 0); + assert (__builtin_memcmp (&e7, &n7, sizeof (e7)) == 0); + assert (__builtin_memcmp (&e8, &n8, sizeof (e8)) == 0); + assert (__builtin_memcmp (&e9, &n9, sizeof (e9)) == 0); + assert (__builtin_memcmp (&e10, &n10, sizeof (e10)) == 0); + assert (n11 == e11); + assert (n12 == e12); + assert (__builtin_memcmp (&e13, &n13, sizeof (e13)) == 0); + assert (__builtin_memcmp (&e14, &n14, sizeof (e14)) == 0); + assert (__builtin_memcmp (&e15, &n15, sizeof (e15)) == 0); + assert (__builtin_memcmp (&e16, &n16, sizeof (e16)) == 0); + assert (__builtin_memcmp (&e17, &n17, sizeof (e17)) == 0); + assert (__builtin_memcmp (&e18, &n18, sizeof (e18)) == 0); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vararg-2.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vararg-2.c new file mode 100644 index 00000000000..043f1c75d00 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vararg-2.c @@ -0,0 +1,107 @@ +/* { dg-do run } */ +/* { dg-require-effective-target avx512fp16 } */ +/* { dg-options "-mavx512fp16" } */ + +#include +#include + +static void do_test (void); + +#define DO_TEST do_test +#define AVX512FP16 +#include "avx512-check.h" + +__m128 n1 = { -283.3, -23.3, 213.4, 1119.03 }; +__m256d n2 = { -93.83, 893.318, 3994.3, -39484.0 }; +__m128h n3 = { 11.5f16, -31.80f16, 242.3f16, 136.4f16, 42.8f16, -22.8f16, 343.8f16, 215.4f16 } ; +_Float16 n4 = 32.4f16; +double n5 = 103.3; +__m128h n6 = { -12.3f16, 2.0f16, 245.9f16, -432.1f16, 53.5f16, -13.4f16, 
432.5f16, 482.4f16 }; +__m128d n7 = { -91.387, -8193.518 }; +__m256d n8 = { -123.3, 2.3, 3.4, -10.03 }; +__m128 n9 = { -123.3, 2.3, 3.4, -10.03 }; +__m128h n10 = { 123.3f16, -100.0f16, 246.9f16, 13.4f16, -134.4f16, 35.4f16, 156.5f16, 953.1f16 }; +_Float16 n11 = 40.7f16; +double n12 = 304.9; +__m128h n13 = { 23.3f16, -11.0f16, 24.5f16, -24.5f16, 535.4f16, 35.4f16, -13.4f16, 14.5f16 }; +__m256h n14 = { -123.3f16, 23.9f16, 34.4f16, -100.3f16, 284.4f16, 352.5f16, 131.5f16, -13.2f16, + 131.4f16, 382.5f16, 38.5f16, 99.6f16, 423.2f16, -12.44f16, 43.2f16, -34.45f16 }; +__m512h n15 = { -39.3f16, -180.9f16, 13.4f16, 35.4f16, -41.1f16, -14.4f16, 24.5f16, 53.54f16, + 238.4f16, -134.8f16, 24.5f16, 35.6f16, -346.7f16, -43.4f16, -535.3f16, 324.7f16, + 82.5f16, 21.4f16, 24.4f16, 53.4f16, 23.5f16, -24.4f16, -34.5f16, -32.5f16, + 23.6f16, -13.4f16, 24.5f16, 35.5f16, -34.4f16, -24.5f16, -34.5f16, 13.5f16 }; +__m128d n16 = { 73.0, 63.18 }; +__m256 n17 = { -183.3, -22.3, 13.9, -119.3, 483.1, 122.3, -33.4, -9.37 }; +__m128 n18 = { -183.3, 22.3, 13.4, -19.03 }; + +__m128 e1; +__m256d e2; +__m128h e3; +_Float16 e4; +double e5; +__m128h e6; +__m128d e7; +__m256d e8; +__m128 e9; +__m128h e10; +_Float16 e11; +double e12; +__m128h e13; +__m256h e14; +__m512h e15; +__m128d e16; +__m256 e17; +__m128 e18; + +static void +__attribute__((noinline)) +test (__m128 a1, __m256d a2, __m128h a3, ...) 
+{ + va_list va_arglist; + + e1 = a1; + e2 = a2; + e3 = a3; + va_start (va_arglist, a3); + e4 = va_arg (va_arglist, _Float16); + e5 = va_arg (va_arglist, double); + e6 = va_arg (va_arglist, __m128h); + e7 = va_arg (va_arglist, __m128d); + e8 = va_arg (va_arglist, __m256d); + e9 = va_arg (va_arglist, __m128); + e10 = va_arg (va_arglist, __m128h); + e11 = va_arg (va_arglist, _Float16); + e12 = va_arg (va_arglist, double); + e13 = va_arg (va_arglist, __m128h); + e14 = va_arg (va_arglist, __m256h); + e15 = va_arg (va_arglist, __m512h); + e16 = va_arg (va_arglist, __m128d); + e17 = va_arg (va_arglist, __m256); + e18 = va_arg (va_arglist, __m128); + va_end (va_arglist); +} + +static void +do_test (void) +{ + test (n1, n2, n3, n4, n5, n6, n7, n8, n9, n10, n11, n12, + n13, n14, n15, n16, n17, n18); + assert (__builtin_memcmp (&e1, &n1, sizeof (e1)) == 0); + assert (__builtin_memcmp (&e2, &n2, sizeof (e2)) == 0); + assert (__builtin_memcmp (&e3, &n3, sizeof (e3)) == 0); + assert (n4 == e4); + assert (n5 == e5); + assert (__builtin_memcmp (&e6, &n6, sizeof (e6)) == 0); + assert (__builtin_memcmp (&e7, &n7, sizeof (e7)) == 0); + assert (__builtin_memcmp (&e8, &n8, sizeof (e8)) == 0); + assert (__builtin_memcmp (&e9, &n9, sizeof (e9)) == 0); + assert (__builtin_memcmp (&e10, &n10, sizeof (e10)) == 0); + assert (n11 == e11); + assert (n12 == e12); + assert (__builtin_memcmp (&e13, &n13, sizeof (e13)) == 0); + assert (__builtin_memcmp (&e14, &n14, sizeof (e14)) == 0); + assert (__builtin_memcmp (&e15, &n15, sizeof (e15)) == 0); + assert (__builtin_memcmp (&e16, &n16, sizeof (e16)) == 0); + assert (__builtin_memcmp (&e17, &n17, sizeof (e17)) == 0); + assert (__builtin_memcmp (&e18, &n18, sizeof (e18)) == 0); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vararg-3.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vararg-3.c new file mode 100644 index 00000000000..cb414a97753 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vararg-3.c @@ -0,0 +1,114 @@ +/* { dg-do 
run } */ +/* { dg-require-effective-target avx512fp16 } */ +/* { dg-options "-mavx512fp16" } */ + +#include +#include + +static void do_test (void); + +#define DO_TEST do_test +#define AVX512FP16 +#include "avx512-check.h" + +struct m256h +{ + __m256h v; +}; + +__m128 n1 = { -283.3, -23.3, 213.4, 1119.03 }; +struct m256h n2 = { { -93.83f16, 893.318f16, 3994.3f16, -39484.0f16, 213.4f16, 323.4f16, 42.5f16, -43.4f16, + 234.4f16, 93.9f16, 34.5f16, -14.5f16, -34.9f16, -421.0f16, 234.5f16, 214.5f16 } }; +__m128h n3 = { 11.5f16, -31.80f16, 242.3f16, 136.4f16, 42.8f16, -22.8f16, 343.8f16, 215.4f16 } ; +_Float16 n4 = 32.4f16; +double n5 = 103.3; +__m128h n6 = { -12.3f16, 2.0f16, 245.9f16, -432.1f16, 53.5f16, -13.4f16, 432.5f16, 482.4f16 }; +__m128d n7 = { -91.387, -8193.518 }; +struct m256h n8 = { { -93.83f16, 893.318f16, 3994.3f16, -39484.0f16, 213.4f16, 323.4f16, 42.5f16, -43.4f16, + 234.4f16, 93.9f16, 34.5f16, -14.5f16, -34.9f16, -421.0f16, 234.5f16, 214.5f16 } }; +__m128 n9 = { -123.3, 2.3, 3.4, -10.03 }; +__m128h n10 = { 123.3f16, -100.0f16, 246.9f16, 13.4f16, -134.4f16, 35.4f16, 156.5f16, 953.1f16 }; +_Float16 n11 = 40.7f16; +double n12 = 304.9; +__m128h n13 = { 23.3f16, -11.0f16, 24.5f16, -24.5f16, 535.4f16, 35.4f16, -13.4f16, 14.5f16 }; +__m256h n14 = { -123.3f16, 23.9f16, 34.4f16, -100.3f16, 284.4f16, 352.5f16, 131.5f16, -13.2f16, + 131.4f16, 382.5f16, 38.5f16, 99.6f16, 423.2f16, -12.44f16, 43.2f16, -34.45f16 }; +__m512h n15 = { -39.3f16, -180.9f16, 13.4f16, 35.4f16, -41.1f16, -14.4f16, 24.5f16, 53.54f16, + 238.4f16, -134.8f16, 24.5f16, 35.6f16, -346.7f16, -43.4f16, -535.3f16, 324.7f16, + 82.5f16, 21.4f16, 24.4f16, 53.4f16, 23.5f16, -24.4f16, -34.5f16, -32.5f16, + 23.6f16, -13.4f16, 24.5f16, 35.5f16, -34.4f16, -24.5f16, -34.5f16, 13.5f16 }; +__m128d n16 = { 73.0, 63.18 }; +__m256 n17 = { -183.3, -22.3, 13.9, -119.3, 483.1, 122.3, -33.4, -9.37 }; +__m128 n18 = { -183.3, 22.3, 13.4, -19.03 }; + +__m128 e1; +struct m256h e2; +__m128h e3; +_Float16 e4; +double e5; 
+__m128h e6; +__m128d e7; +struct m256h e8; +__m128 e9; +__m128h e10; +_Float16 e11; +double e12; +__m128h e13; +__m256h e14; +__m512h e15; +__m128d e16; +__m256 e17; +__m128 e18; + +static void +__attribute__((noinline)) +test (__m128 a1, struct m256h a2, __m128h a3, ...) +{ + va_list va_arglist; + + e1 = a1; + e2 = a2; + e3 = a3; + va_start (va_arglist, a3); + e4 = va_arg (va_arglist, _Float16); + e5 = va_arg (va_arglist, double); + e6 = va_arg (va_arglist, __m128h); + e7 = va_arg (va_arglist, __m128d); + e8 = va_arg (va_arglist, struct m256h); + e9 = va_arg (va_arglist, __m128); + e10 = va_arg (va_arglist, __m128h); + e11 = va_arg (va_arglist, _Float16); + e12 = va_arg (va_arglist, double); + e13 = va_arg (va_arglist, __m128h); + e14 = va_arg (va_arglist, __m256h); + e15 = va_arg (va_arglist, __m512h); + e16 = va_arg (va_arglist, __m128d); + e17 = va_arg (va_arglist, __m256); + e18 = va_arg (va_arglist, __m128); + va_end (va_arglist); +} + +static void +do_test (void) +{ + test (n1, n2, n3, n4, n5, n6, n7, n8, n9, n10, n11, n12, + n13, n14, n15, n16, n17, n18); + assert (__builtin_memcmp (&e1, &n1, sizeof (e1)) == 0); + assert (__builtin_memcmp (&e2, &n2, sizeof (e2)) == 0); + assert (__builtin_memcmp (&e3, &n3, sizeof (e3)) == 0); + assert (n4 == e4); + assert (n5 == e5); + assert (__builtin_memcmp (&e6, &n6, sizeof (e6)) == 0); + assert (__builtin_memcmp (&e7, &n7, sizeof (e7)) == 0); + assert (__builtin_memcmp (&e8, &n8, sizeof (e8)) == 0); + assert (__builtin_memcmp (&e9, &n9, sizeof (e9)) == 0); + assert (__builtin_memcmp (&e10, &n10, sizeof (e10)) == 0); + assert (n11 == e11); + assert (n12 == e12); + assert (__builtin_memcmp (&e13, &n13, sizeof (e13)) == 0); + assert (__builtin_memcmp (&e14, &n14, sizeof (e14)) == 0); + assert (__builtin_memcmp (&e15, &n15, sizeof (e15)) == 0); + assert (__builtin_memcmp (&e16, &n16, sizeof (e16)) == 0); + assert (__builtin_memcmp (&e17, &n17, sizeof (e17)) == 0); + assert (__builtin_memcmp (&e18, &n18, sizeof (e18)) == 
0); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vararg-4.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vararg-4.c new file mode 100644 index 00000000000..962c2bf031d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vararg-4.c @@ -0,0 +1,115 @@ +/* { dg-do run } */ +/* { dg-require-effective-target avx512fp16 } */ +/* { dg-options "-mavx512fp16" } */ + +#include +#include + +static void do_test (void); + +#define DO_TEST do_test +#define AVX512FP16 +#include "avx512-check.h" + +__m128 n1 = { -283.3, -23.3, 213.4, 1119.03 }; +__m256d n2 = { -93.83, 893.318, 3994.3, -39484.0 }; +__m128h n3 = { 11.5f16, -31.80f16, 242.3f16, 136.4f16, 42.8f16, -22.8f16, 343.8f16, 215.4f16 } ; +_Float16 n4 = 32.4f16; +double n5 = 103.3; +__m128h n6 = { -12.3f16, 2.0f16, 245.9f16, -432.1f16, 53.5f16, -13.4f16, 432.5f16, 482.4f16 }; +__m128d n7 = { -91.387, -8193.518 }; +__m256d n8 = { -123.3, 2.3, 3.4, -10.03 }; +__m128 n9 = { -123.3, 2.3, 3.4, -10.03 }; +__m128h n10 = { 123.3f16, -100.0f16, 246.9f16, 13.4f16, -134.4f16, 35.4f16, 156.5f16, 953.1f16 }; +_Float16 n11 = 40.7f16; +double n12 = 304.9; +__m128h n13 = { 23.3f16, -11.0f16, 24.5f16, -24.5f16, 535.4f16, 35.4f16, -13.4f16, 14.5f16 }; +__m256h n14 = { -123.3f16, 23.9f16, 34.4f16, -100.3f16, 284.4f16, 352.5f16, 131.5f16, -13.2f16, + 131.4f16, 382.5f16, 38.5f16, 99.6f16, 423.2f16, -12.44f16, 43.2f16, -34.45f16 }; +__m512h n15 = { -39.3f16, -180.9f16, 13.4f16, 35.4f16, -41.1f16, -14.4f16, 24.5f16, 53.54f16, + 238.4f16, -134.8f16, 24.5f16, 35.6f16, -346.7f16, -43.4f16, -535.3f16, 324.7f16, + 82.5f16, 21.4f16, 24.4f16, 53.4f16, 23.5f16, -24.4f16, -34.5f16, -32.5f16, + 23.6f16, -13.4f16, 24.5f16, 35.5f16, -34.4f16, -24.5f16, -34.5f16, 13.5f16 }; +__m128d n16 = { 73.0, 63.18 }; +__m256 n17 = { -183.3, -22.3, 13.9, -119.3, 483.1, 122.3, -33.4, -9.37 }; +__m128 n18 = { -183.3, 22.3, 13.4, -19.03 }; + +__m128 e1; +__m256d e2; +__m128h e3; +_Float16 e4; +double e5; +__m128h e6; +__m128d e7; +__m256d e8; +__m128 e9; 
+__m128h e10; +_Float16 e11; +double e12; +__m128h e13; +__m256h e14; +__m512h e15; +__m128d e16; +__m256 e17; +__m128 e18; + +static void +__attribute__((noinline)) +foo (va_list va_arglist) +{ + e4 = va_arg (va_arglist, _Float16); + e5 = va_arg (va_arglist, double); + e6 = va_arg (va_arglist, __m128h); + e7 = va_arg (va_arglist, __m128d); + e8 = va_arg (va_arglist, __m256d); + e9 = va_arg (va_arglist, __m128); + e10 = va_arg (va_arglist, __m128h); + e11 = va_arg (va_arglist, _Float16); + e12 = va_arg (va_arglist, double); + e13 = va_arg (va_arglist, __m128h); + e14 = va_arg (va_arglist, __m256h); + e15 = va_arg (va_arglist, __m512h); + e16 = va_arg (va_arglist, __m128d); + e17 = va_arg (va_arglist, __m256); + e18 = va_arg (va_arglist, __m128); + va_end (va_arglist); +} + +static void +__attribute__((noinline)) +test (__m128 a1, __m256d a2, __m128h a3, ...) +{ + va_list va_arglist; + + e1 = a1; + e2 = a2; + e3 = a3; + va_start (va_arglist, a3); + foo (va_arglist); + va_end (va_arglist); +} + +static void +do_test (void) +{ + test (n1, n2, n3, n4, n5, n6, n7, n8, n9, n10, n11, n12, + n13, n14, n15, n16, n17, n18); + assert (__builtin_memcmp (&e1, &n1, sizeof (e1)) == 0); + assert (__builtin_memcmp (&e2, &n2, sizeof (e2)) == 0); + assert (__builtin_memcmp (&e3, &n3, sizeof (e3)) == 0); + assert (n4 == e4); + assert (n5 == e5); + assert (__builtin_memcmp (&e6, &n6, sizeof (e6)) == 0); + assert (__builtin_memcmp (&e7, &n7, sizeof (e7)) == 0); + assert (__builtin_memcmp (&e8, &n8, sizeof (e8)) == 0); + assert (__builtin_memcmp (&e9, &n9, sizeof (e9)) == 0); + assert (__builtin_memcmp (&e10, &n10, sizeof (e10)) == 0); + assert (n11 == e11); + assert (n12 == e12); + assert (__builtin_memcmp (&e13, &n13, sizeof (e13)) == 0); + assert (__builtin_memcmp (&e14, &n14, sizeof (e14)) == 0); + assert (__builtin_memcmp (&e15, &n15, sizeof (e15)) == 0); + assert (__builtin_memcmp (&e16, &n16, sizeof (e16)) == 0); + assert (__builtin_memcmp (&e17, &n17, sizeof (e17)) == 0); + 
assert (__builtin_memcmp (&e18, &n18, sizeof (e18)) == 0); +} From patchwork Thu Jul 1 06:15:50 2021 X-Patchwork-Submitter: liuhongt X-Patchwork-Id: 1499307 To: gcc-patches@gcc.gnu.org Subject: [PATCH 04/62] AVX512FP16: Add ABI tests for xmm. Date: Thu, 1 Jul 2021 14:15:50 +0800 Message-Id: <20210701061648.9447-5-hongtao.liu@intel.com> In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com> References: <20210701061648.9447-1-hongtao.liu@intel.com> From: liuhongt Cc: jakub@redhat.com From: "H.J. Lu" Copied from regular XMM ABI tests. Only run AVX512FP16 ABI tests for ELF targets.
gcc/testsuite/ChangeLog: * gcc.target/x86_64/abi/avx512fp16/abi-avx512fp16-xmm.exp: New exp file for abi test. * gcc.target/x86_64/abi/avx512fp16/args.h: New header file for abi test. * gcc.target/x86_64/abi/avx512fp16/avx512fp16-check.h: Likewise. * gcc.target/x86_64/abi/avx512fp16/avx512fp16-xmm-check.h: Likewise. * gcc.target/x86_64/abi/avx512fp16/defines.h: Likewise. * gcc.target/x86_64/abi/avx512fp16/macros.h: Likewise. * gcc.target/x86_64/abi/avx512fp16/asm-support.S: New asm for abi check. * gcc.target/x86_64/abi/avx512fp16/test_3_element_struct_and_unions.c: New test. * gcc.target/x86_64/abi/avx512fp16/test_basic_alignment.c: Likewise. * gcc.target/x86_64/abi/avx512fp16/test_basic_array_size_and_align.c: Likewise. * gcc.target/x86_64/abi/avx512fp16/test_basic_returning.c: Likewise. * gcc.target/x86_64/abi/avx512fp16/test_basic_sizes.c: Likewise. * gcc.target/x86_64/abi/avx512fp16/test_basic_struct_size_and_align.c: Likewise. * gcc.target/x86_64/abi/avx512fp16/test_basic_union_size_and_align.c: Likewise. * gcc.target/x86_64/abi/avx512fp16/test_complex_returning.c: Likewise. * gcc.target/x86_64/abi/avx512fp16/test_m64m128_returning.c: Likewise. * gcc.target/x86_64/abi/avx512fp16/test_passing_floats.c: Likewise. * gcc.target/x86_64/abi/avx512fp16/test_passing_m64m128.c: Likewise. * gcc.target/x86_64/abi/avx512fp16/test_passing_structs.c: Likewise. * gcc.target/x86_64/abi/avx512fp16/test_passing_unions.c: Likewise. * gcc.target/x86_64/abi/avx512fp16/test_struct_returning.c: Likewise. * gcc.target/x86_64/abi/avx512fp16/test_varargs-m128.c: Likewise. 
--- .../abi/avx512fp16/abi-avx512fp16-xmm.exp | 48 + .../gcc.target/x86_64/abi/avx512fp16/args.h | 190 +++ .../x86_64/abi/avx512fp16/asm-support.S | 81 ++ .../x86_64/abi/avx512fp16/avx512fp16-check.h | 74 ++ .../abi/avx512fp16/avx512fp16-xmm-check.h | 3 + .../x86_64/abi/avx512fp16/defines.h | 150 +++ .../gcc.target/x86_64/abi/avx512fp16/macros.h | 53 + .../test_3_element_struct_and_unions.c | 692 +++++++++++ .../abi/avx512fp16/test_basic_alignment.c | 45 + .../test_basic_array_size_and_align.c | 43 + .../abi/avx512fp16/test_basic_returning.c | 87 ++ .../x86_64/abi/avx512fp16/test_basic_sizes.c | 43 + .../test_basic_struct_size_and_align.c | 42 + .../test_basic_union_size_and_align.c | 40 + .../abi/avx512fp16/test_complex_returning.c | 104 ++ .../abi/avx512fp16/test_m64m128_returning.c | 73 ++ .../abi/avx512fp16/test_passing_floats.c | 1066 +++++++++++++++++ .../abi/avx512fp16/test_passing_m64m128.c | 510 ++++++++ .../abi/avx512fp16/test_passing_structs.c | 332 +++++ .../abi/avx512fp16/test_passing_unions.c | 335 ++++++ .../abi/avx512fp16/test_struct_returning.c | 274 +++++ .../x86_64/abi/avx512fp16/test_varargs-m128.c | 164 +++ 22 files changed, 4449 insertions(+) create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/abi-avx512fp16-xmm.exp create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/args.h create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/asm-support.S create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/avx512fp16-check.h create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/avx512fp16-xmm-check.h create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/defines.h create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/macros.h create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_3_element_struct_and_unions.c create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_alignment.c create mode 100644 
gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_array_size_and_align.c create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_returning.c create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_sizes.c create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_struct_size_and_align.c create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_union_size_and_align.c create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_complex_returning.c create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_m64m128_returning.c create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_floats.c create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_m64m128.c create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_structs.c create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_unions.c create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_struct_returning.c create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_varargs-m128.c diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/abi-avx512fp16-xmm.exp b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/abi-avx512fp16-xmm.exp new file mode 100644 index 00000000000..33d24762788 --- /dev/null +++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/abi-avx512fp16-xmm.exp @@ -0,0 +1,48 @@ +# Copyright (C) 2019 Free Software Foundation, Inc. + +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with GCC; see the file COPYING3. If not see +# . + +# The x86-64 ABI testsuite needs one additional assembler file for most +# testcases. For simplicity we will just link it into each test. + +load_lib c-torture.exp +load_lib target-supports.exp +load_lib torture-options.exp +load_lib clearcap.exp +load_lib file-format.exp + +if { (![istarget x86_64-*-*] && ![istarget i?86-*-*]) + || [is-effective-target ia32] + || [gcc_target_object_format] != "elf" + || ![is-effective-target avx512fp16] } then { + return +} + + +torture-init +clearcap-init +set-torture-options $C_TORTURE_OPTIONS +set additional_flags "-W -Wall -Wno-abi -mavx512fp16" + +foreach src [lsort [glob -nocomplain $srcdir/$subdir/test_*.c]] { + if {[runtest_file_p $runtests $src]} { + c-torture-execute [list $src \ + $srcdir/$subdir/asm-support.S] \ + $additional_flags + } +} + +clearcap-finish +torture-finish diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/args.h b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/args.h new file mode 100644 index 00000000000..4a7b9a90fbe --- /dev/null +++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/args.h @@ -0,0 +1,190 @@ +#ifndef INCLUDED_ARGS_H +#define INCLUDED_ARGS_H + +#include + +/* This defines the calling sequences for integers and floats. 
*/ +#define I0 rdi +#define I1 rsi +#define I2 rdx +#define I3 rcx +#define I4 r8 +#define I5 r9 +#define F0 xmm0 +#define F1 xmm1 +#define F2 xmm2 +#define F3 xmm3 +#define F4 xmm4 +#define F5 xmm5 +#define F6 xmm6 +#define F7 xmm7 + +typedef union { + _Float16 __Float16[8]; + float _float[4]; + double _double[2]; + long _long[2]; + int _int[4]; + unsigned long _ulong[2]; +#ifdef CHECK_M64_M128 + __m64 _m64[2]; + __m128 _m128[1]; + __m128h _m128h[1]; +#endif +} XMM_T; + +typedef union { + _Float16 __Float16; + float _float; + double _double; + ldouble _ldouble; + ulong _ulong[2]; +} X87_T; +extern void (*callthis)(void); +extern unsigned long rax,rbx,rcx,rdx,rsi,rdi,rsp,rbp,r8,r9,r10,r11,r12,r13,r14,r15; +XMM_T xmm_regs[16]; +X87_T x87_regs[8]; +extern volatile unsigned long volatile_var; +extern void snapshot (void); +extern void snapshot_ret (void); +#define WRAP_CALL(N) \ + (callthis = (void (*)()) (N), (typeof (&N)) snapshot) +#define WRAP_RET(N) \ + (callthis = (void (*)()) (N), (typeof (&N)) snapshot_ret) + +/* Clear all integer registers. */ +#define clear_int_hardware_registers \ + asm __volatile__ ("xor %%rax, %%rax\n\t" \ + "xor %%rbx, %%rbx\n\t" \ + "xor %%rcx, %%rcx\n\t" \ + "xor %%rdx, %%rdx\n\t" \ + "xor %%rsi, %%rsi\n\t" \ + "xor %%rdi, %%rdi\n\t" \ + "xor %%r8, %%r8\n\t" \ + "xor %%r9, %%r9\n\t" \ + "xor %%r10, %%r10\n\t" \ + "xor %%r11, %%r11\n\t" \ + "xor %%r12, %%r12\n\t" \ + "xor %%r13, %%r13\n\t" \ + "xor %%r14, %%r14\n\t" \ + "xor %%r15, %%r15\n\t" \ + ::: "rax", "rbx", "rcx", "rdx", "rsi", "rdi", "r8", \ + "r9", "r10", "r11", "r12", "r13", "r14", "r15"); + +/* This is the list of registers available for passing arguments. Not all of + these are used or even really available. 
*/ +struct IntegerRegisters +{ + unsigned long rax, rbx, rcx, rdx, rsi, rdi, r8, r9, r10, r11, r12, r13, r14, r15; +}; +struct FloatRegisters +{ + double mm0, mm1, mm2, mm3, mm4, mm5, mm6, mm7; + ldouble st0, st1, st2, st3, st4, st5, st6, st7; + XMM_T xmm0, xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7, xmm8, xmm9, + xmm10, xmm11, xmm12, xmm13, xmm14, xmm15; +}; + +/* Implemented in scalarargs.c */ +extern struct IntegerRegisters iregs; +extern struct FloatRegisters fregs; +extern unsigned int num_iregs, num_fregs; + +#define check_int_arguments do { \ + assert (num_iregs <= 0 || iregs.I0 == I0); \ + assert (num_iregs <= 1 || iregs.I1 == I1); \ + assert (num_iregs <= 2 || iregs.I2 == I2); \ + assert (num_iregs <= 3 || iregs.I3 == I3); \ + assert (num_iregs <= 4 || iregs.I4 == I4); \ + assert (num_iregs <= 5 || iregs.I5 == I5); \ + } while (0) + +#define check_char_arguments check_int_arguments +#define check_short_arguments check_int_arguments +#define check_long_arguments check_int_arguments + +/* Clear register struct. */ +#define clear_struct_registers \ + rax = rbx = rcx = rdx = rdi = rsi = rbp = rsp \ + = r8 = r9 = r10 = r11 = r12 = r13 = r14 = r15 = 0; \ + memset (&iregs, 0, sizeof (iregs)); \ + memset (&fregs, 0, sizeof (fregs)); \ + memset (xmm_regs, 0, sizeof (xmm_regs)); \ + memset (x87_regs, 0, sizeof (x87_regs)); + +/* Clear both hardware and register structs for integers. */ +#define clear_int_registers \ + clear_struct_registers \ + clear_int_hardware_registers + +/* TODO: Do the checking. 
*/ +#define check_f_arguments(T) do { \ + assert (num_fregs <= 0 || fregs.xmm0._ ## T [0] == xmm_regs[0]._ ## T [0]); \ + assert (num_fregs <= 1 || fregs.xmm1._ ## T [0] == xmm_regs[1]._ ## T [0]); \ + assert (num_fregs <= 2 || fregs.xmm2._ ## T [0] == xmm_regs[2]._ ## T [0]); \ + assert (num_fregs <= 3 || fregs.xmm3._ ## T [0] == xmm_regs[3]._ ## T [0]); \ + assert (num_fregs <= 4 || fregs.xmm4._ ## T [0] == xmm_regs[4]._ ## T [0]); \ + assert (num_fregs <= 5 || fregs.xmm5._ ## T [0] == xmm_regs[5]._ ## T [0]); \ + assert (num_fregs <= 6 || fregs.xmm6._ ## T [0] == xmm_regs[6]._ ## T [0]); \ + assert (num_fregs <= 7 || fregs.xmm7._ ## T [0] == xmm_regs[7]._ ## T [0]); \ + } while (0) + +#define check_float16_arguments check_f_arguments(_Float16) +#define check_float_arguments check_f_arguments(float) +#define check_double_arguments check_f_arguments(double) + +#define check_vector_arguments(T,O) do { \ + assert (num_fregs <= 0 \ + || memcmp (((char *) &fregs.xmm0) + (O), \ + &xmm_regs[0], \ + sizeof (__ ## T) - (O)) == 0); \ + assert (num_fregs <= 1 \ + || memcmp (((char *) &fregs.xmm1) + (O), \ + &xmm_regs[1], \ + sizeof (__ ## T) - (O)) == 0); \ + assert (num_fregs <= 2 \ + || memcmp (((char *) &fregs.xmm2) + (O), \ + &xmm_regs[2], \ + sizeof (__ ## T) - (O)) == 0); \ + assert (num_fregs <= 3 \ + || memcmp (((char *) &fregs.xmm3) + (O), \ + &xmm_regs[3], \ + sizeof (__ ## T) - (O)) == 0); \ + assert (num_fregs <= 4 \ + || memcmp (((char *) &fregs.xmm4) + (O), \ + &xmm_regs[4], \ + sizeof (__ ## T) - (O)) == 0); \ + assert (num_fregs <= 5 \ + || memcmp (((char *) &fregs.xmm5) + (O), \ + &xmm_regs[5], \ + sizeof (__ ## T) - (O)) == 0); \ + assert (num_fregs <= 6 \ + || memcmp (((char *) &fregs.xmm6) + (O), \ + &xmm_regs[6], \ + sizeof (__ ## T) - (O)) == 0); \ + assert (num_fregs <= 7 \ + || memcmp (((char *) &fregs.xmm7) + (O), \ + &xmm_regs[7], \ + sizeof (__ ## T) - (O)) == 0); \ + } while (0) + +#define check_m64_arguments check_vector_arguments(m64, 0) 
+#define check_m128_arguments check_vector_arguments(m128, 0) + +/* ldoubles are not passed in registers */ +#define check_ldouble_arguments + +/* TODO: Do the clearing. */ +#define clear_float_hardware_registers +#define clear_x87_hardware_registers + +#define clear_float_registers \ + clear_struct_registers \ + clear_float_hardware_registers + +#define clear_x87_registers \ + clear_struct_registers \ + clear_x87_hardware_registers + + +#endif /* INCLUDED_ARGS_H */ diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/asm-support.S b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/asm-support.S new file mode 100644 index 00000000000..7849acd2649 --- /dev/null +++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/asm-support.S @@ -0,0 +1,81 @@ + .text + .p2align 4,,15 +.globl snapshot + .type snapshot, @function +snapshot: +.LFB3: + movq %rax, rax(%rip) + movq %rbx, rbx(%rip) + movq %rcx, rcx(%rip) + movq %rdx, rdx(%rip) + movq %rdi, rdi(%rip) + movq %rsi, rsi(%rip) + movq %rbp, rbp(%rip) + movq %rsp, rsp(%rip) + movq %r8, r8(%rip) + movq %r9, r9(%rip) + movq %r10, r10(%rip) + movq %r11, r11(%rip) + movq %r12, r12(%rip) + movq %r13, r13(%rip) + movq %r14, r14(%rip) + movq %r15, r15(%rip) + vmovdqu %xmm0, xmm_regs+0(%rip) + vmovdqu %xmm1, xmm_regs+16(%rip) + vmovdqu %xmm2, xmm_regs+32(%rip) + vmovdqu %xmm3, xmm_regs+48(%rip) + vmovdqu %xmm4, xmm_regs+64(%rip) + vmovdqu %xmm5, xmm_regs+80(%rip) + vmovdqu %xmm6, xmm_regs+96(%rip) + vmovdqu %xmm7, xmm_regs+112(%rip) + vmovdqu %xmm8, xmm_regs+128(%rip) + vmovdqu %xmm9, xmm_regs+144(%rip) + vmovdqu %xmm10, xmm_regs+160(%rip) + vmovdqu %xmm11, xmm_regs+176(%rip) + vmovdqu %xmm12, xmm_regs+192(%rip) + vmovdqu %xmm13, xmm_regs+208(%rip) + vmovdqu %xmm14, xmm_regs+224(%rip) + vmovdqu %xmm15, xmm_regs+240(%rip) + jmp *callthis(%rip) +.LFE3: + .size snapshot, .-snapshot + + .p2align 4,,15 +.globl snapshot_ret + .type snapshot_ret, @function +snapshot_ret: + movq %rdi, rdi(%rip) + subq $8, %rsp + call *callthis(%rip) + addq 
$8, %rsp + movq %rax, rax(%rip) + movq %rdx, rdx(%rip) + vmovdqu %xmm0, xmm_regs+0(%rip) + vmovdqu %xmm1, xmm_regs+16(%rip) + fstpt x87_regs(%rip) + fstpt x87_regs+16(%rip) + fldt x87_regs+16(%rip) + fldt x87_regs(%rip) + ret + .size snapshot_ret, .-snapshot_ret + + .comm callthis,8,8 + .comm rax,8,8 + .comm rbx,8,8 + .comm rcx,8,8 + .comm rdx,8,8 + .comm rsi,8,8 + .comm rdi,8,8 + .comm rsp,8,8 + .comm rbp,8,8 + .comm r8,8,8 + .comm r9,8,8 + .comm r10,8,8 + .comm r11,8,8 + .comm r12,8,8 + .comm r13,8,8 + .comm r14,8,8 + .comm r15,8,8 + .comm xmm_regs,256,32 + .comm x87_regs,128,32 + .comm volatile_var,8,8 diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/avx512fp16-check.h b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/avx512fp16-check.h new file mode 100644 index 00000000000..9fbec9d03ff --- /dev/null +++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/avx512fp16-check.h @@ -0,0 +1,74 @@ +#include +#include + +/* Check if the OS supports executing AVX512FP16 instructions. */ + +#define XCR_XFEATURE_ENABLED_MASK 0x0 + +#define XSTATE_FP 0x1 +#define XSTATE_SSE 0x2 +#define XSTATE_YMM 0x4 +#define XSTATE_OPMASK 0x20 +#define XSTATE_ZMM 0x40 +#define XSTATE_HI_ZMM 0x80 + +static int +check_osxsave (void) +{ + unsigned int eax, ebx, ecx, edx; + + if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx)) + return 0; + + return (ecx & bit_OSXSAVE) != 0; +} + +static int +avx512fp16_os_support (void) +{ + unsigned int eax, edx; + unsigned int ecx = XCR_XFEATURE_ENABLED_MASK; + unsigned int mask = XSTATE_MASK; + + if (!check_osxsave ()) + return 0; + + __asm__ ("xgetbv" : "=a" (eax), "=d" (edx) : "c" (ecx)); + + return ((eax & mask) == mask); +} + +static void do_test (void); + +int +main () +{ + unsigned int eax, ebx, ecx, edx; + + if (!avx512fp16_os_support ()) + return 0; + + if (__get_cpuid_max (0, NULL) < 7) + return 0; + + __cpuid_count (7, 0, eax, ebx, ecx, edx); + + /* Run AVX512FP16 test only if host has ISA support. 
*/ + if (((ebx & (bit_AVX512F | bit_AVX512BW)) + == (bit_AVX512F | bit_AVX512BW)) + && (edx & bit_AVX512FP16) + && AVX512VL (ebx)) + { + do_test (); +#ifdef DEBUG + printf ("PASSED\n"); +#endif + return 0; + } + +#ifdef DEBUG + printf ("SKIPPED\n"); +#endif + + return 0; +} diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/avx512fp16-xmm-check.h b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/avx512fp16-xmm-check.h new file mode 100644 index 00000000000..0abe09f1166 --- /dev/null +++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/avx512fp16-xmm-check.h @@ -0,0 +1,3 @@ +#define AVX512VL(ebx) (ebx & bit_AVX512VL) +#define XSTATE_MASK (XSTATE_SSE | XSTATE_OPMASK) +#include "avx512fp16-check.h" diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/defines.h b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/defines.h new file mode 100644 index 00000000000..17f2c27edc6 --- /dev/null +++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/defines.h @@ -0,0 +1,150 @@ +#ifndef DEFINED_DEFINES_H +#define DEFINED_DEFINES_H + +/* Get __m64 and __m128. */ +#include + +typedef unsigned long ulong; +typedef long double ldouble; + +/* These defines determines what part of the test should be run. When + GCC implements these parts, the defines should be uncommented to + enable testing. */ + +/* Scalar type __int128. */ +/* #define CHECK_INT128 */ + +/* Scalar type long double. */ +#define CHECK_LONG_DOUBLE + +/* Scalar type __float128. */ +/* #define CHECK_FLOAT128 */ + +/* Scalar types __m64 and __m128. */ +#define CHECK_M64_M128 + +/* Returning of complex type. */ +#define CHECK_COMPLEX + +/* Structs with size >= 16. */ +#define CHECK_LARGER_STRUCTS + +/* Checks for passing floats and doubles. */ +#define CHECK_FLOAT_DOUBLE_PASSING + +/* Union passing with not-extremely-simple unions. */ +#define CHECK_LARGER_UNION_PASSING + +/* Variable args. */ +#define CHECK_VARARGS + +/* Check argument passing and returning for scalar types with sizeof = 16. 
*/ +/* TODO: Implement these tests. Don't activate them for now. */ +#define CHECK_LARGE_SCALAR_PASSING + +/* Defines for sizing and alignment. */ + +#define TYPE_SIZE_CHAR 1 +#define TYPE_SIZE_SHORT 2 +#define TYPE_SIZE_INT 4 +#define TYPE_SIZE_LONG 8 +#define TYPE_SIZE_LONG_LONG 8 +#define TYPE_SIZE_INT128 16 +#define TYPE_SIZE_FLOAT16 2 +#define TYPE_SIZE_FLOAT 4 +#define TYPE_SIZE_DOUBLE 8 +#define TYPE_SIZE_LONG_DOUBLE 16 +#define TYPE_SIZE_FLOAT128 16 +#define TYPE_SIZE_M64 8 +#define TYPE_SIZE_M128 16 +#define TYPE_SIZE_ENUM 4 +#define TYPE_SIZE_POINTER 8 + +#define TYPE_ALIGN_CHAR 1 +#define TYPE_ALIGN_SHORT 2 +#define TYPE_ALIGN_INT 4 +#define TYPE_ALIGN_LONG 8 +#define TYPE_ALIGN_LONG_LONG 8 +#define TYPE_ALIGN_INT128 16 +#define TYPE_ALIGN_FLOAT16 2 +#define TYPE_ALIGN_FLOAT 4 +#define TYPE_ALIGN_DOUBLE 8 +#define TYPE_ALIGN_LONG_DOUBLE 16 +#define TYPE_ALIGN_FLOAT128 16 +#define TYPE_ALIGN_M64 8 +#define TYPE_ALIGN_M128 16 +#define TYPE_ALIGN_ENUM 4 +#define TYPE_ALIGN_POINTER 8 + +/* These defines control the building of the list of types to check. There + is a string identifying the type (with a comma after), a size of the type + (also with a comma and an integer for adding to the total amount of types) + and an alignment of the type (which is currently not really needed since + the abi specifies that alignof == sizeof for all scalar types). 
*/ +#ifdef CHECK_INT128 +#define CI128_STR "__int128", +#define CI128_SIZ TYPE_SIZE_INT128, +#define CI128_ALI TYPE_ALIGN_INT128, +#define CI128_RET "???", +#else +#define CI128_STR +#define CI128_SIZ +#define CI128_ALI +#define CI128_RET +#endif +#ifdef CHECK_LONG_DOUBLE +#define CLD_STR "long double", +#define CLD_SIZ TYPE_SIZE_LONG_DOUBLE, +#define CLD_ALI TYPE_ALIGN_LONG_DOUBLE, +#define CLD_RET "x87_regs[0]._ldouble", +#else +#define CLD_STR +#define CLD_SIZ +#define CLD_ALI +#define CLD_RET +#endif +#ifdef CHECK_FLOAT128 +#define CF128_STR "__float128", +#define CF128_SIZ TYPE_SIZE_FLOAT128, +#define CF128_ALI TYPE_ALIGN_FLOAT128, +#define CF128_RET "???", +#else +#define CF128_STR +#define CF128_SIZ +#define CF128_ALI +#define CF128_RET +#endif +#ifdef CHECK_M64_M128 +#define CMM_STR "__m64", "__m128", +#define CMM_SIZ TYPE_SIZE_M64, TYPE_SIZE_M128, +#define CMM_ALI TYPE_ALIGN_M64, TYPE_ALIGN_M128, +#define CMM_RET "???", "???", +#else +#define CMM_STR +#define CMM_SIZ +#define CMM_ALI +#define CMM_RET +#endif + +/* Used in size and alignment tests. */ +enum dummytype { enumtype }; + +extern void abort (void); + +/* Assertion macro. 
*/ +#define assert(test) if (!(test)) abort() + +#ifdef __GNUC__ +#define ATTRIBUTE_UNUSED __attribute__((__unused__)) +#else +#define ATTRIBUTE_UNUSED +#endif + +#ifdef __GNUC__ +#define PACKED __attribute__((__packed__)) +#else +#warning Some tests will fail due to missing __packed__ support +#define PACKED +#endif + +#endif /* DEFINED_DEFINES_H */ diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/macros.h b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/macros.h new file mode 100644 index 00000000000..98fbc660f27 --- /dev/null +++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/macros.h @@ -0,0 +1,53 @@ +#ifndef MACROS_H + +#define check_size(_t, _size) assert(sizeof(_t) == (_size)) + +#define check_align(_t, _align) assert(__alignof__(_t) == (_align)) + +#define check_align_lv(_t, _align) assert(__alignof__(_t) == (_align) \ + && (((unsigned long)&(_t)) & ((_align) - 1) ) == 0) + +#define check_basic_struct_size_and_align(_type, _size, _align) { \ + struct _str { _type dummy; } _t; \ + check_size(_t, _size); \ + check_align_lv(_t, _align); \ +} + +#define check_array_size_and_align(_type, _size, _align) { \ + _type _a[1]; _type _b[2]; _type _c[16]; \ + struct _str { _type _a[1]; } _s; \ + check_align_lv(_a[0], _align); \ + check_size(_a, _size); \ + check_size(_b, (_size*2)); \ + check_size(_c, (_size*16)); \ + check_size(_s, _size); \ + check_align_lv(_s._a[0], _align); \ +} + +#define check_basic_union_size_and_align(_type, _size, _align) { \ + union _union { _type dummy; } _u; \ + check_size(_u, _size); \ + check_align_lv(_u, _align); \ +} + +#define run_signed_tests2(_function, _arg1, _arg2) \ + _function(_arg1, _arg2); \ + _function(signed _arg1, _arg2); \ + _function(unsigned _arg1, _arg2); + +#define run_signed_tests3(_function, _arg1, _arg2, _arg3) \ + _function(_arg1, _arg2, _arg3); \ + _function(signed _arg1, _arg2, _arg3); \ + _function(unsigned _arg1, _arg2, _arg3); + +/* Check size of a struct and a union of three types. 
*/ + +#define check_struct_and_union3(type1, type2, type3, struct_size, align_size) \ +{ \ + struct _str { type1 t1; type2 t2; type3 t3; } _t; \ + union _uni { type1 t1; type2 t2; type3 t3; } _u; \ + check_size(_t, struct_size); \ + check_size(_u, align_size); \ +} + +#endif // MACROS_H diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_3_element_struct_and_unions.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_3_element_struct_and_unions.c new file mode 100644 index 00000000000..cc94e0fe0e9 --- /dev/null +++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_3_element_struct_and_unions.c @@ -0,0 +1,692 @@ +/* This is an autogenerated file. Do not edit. */ + +#include "defines.h" +#include "macros.h" + +/* Check structs and unions of all permutations of 3 basic types. */ +int +main (void) +{ + check_struct_and_union3(char, char, char, 3, 1); + check_struct_and_union3(char, char, short, 4, 2); + check_struct_and_union3(char, char, int, 8, 4); + check_struct_and_union3(char, char, long, 16, 8); + check_struct_and_union3(char, char, long long, 16, 8); + check_struct_and_union3(char, char, float, 8, 4); + check_struct_and_union3(char, char, double, 16, 8); + check_struct_and_union3(char, char, long double, 32, 16); + check_struct_and_union3(char, short, char, 6, 2); + check_struct_and_union3(char, short, short, 6, 2); + check_struct_and_union3(char, short, int, 8, 4); + check_struct_and_union3(char, short, long, 16, 8); + check_struct_and_union3(char, short, long long, 16, 8); + check_struct_and_union3(char, short, float, 8, 4); + check_struct_and_union3(char, short, double, 16, 8); + check_struct_and_union3(char, short, long double, 32, 16); + check_struct_and_union3(char, int, char, 12, 4); + check_struct_and_union3(char, int, short, 12, 4); + check_struct_and_union3(char, int, int, 12, 4); + check_struct_and_union3(char, int, long, 16, 8); + check_struct_and_union3(char, int, long long, 16, 8); + check_struct_and_union3(char, int, float, 12, 
4); + check_struct_and_union3(char, int, double, 16, 8); + check_struct_and_union3(char, int, long double, 32, 16); + check_struct_and_union3(char, long, char, 24, 8); + check_struct_and_union3(char, long, short, 24, 8); + check_struct_and_union3(char, long, int, 24, 8); + check_struct_and_union3(char, long, long, 24, 8); + check_struct_and_union3(char, long, long long, 24, 8); + check_struct_and_union3(char, long, float, 24, 8); + check_struct_and_union3(char, long, double, 24, 8); + check_struct_and_union3(char, long, long double, 32, 16); + check_struct_and_union3(char, long long, char, 24, 8); + check_struct_and_union3(char, long long, short, 24, 8); + check_struct_and_union3(char, long long, int, 24, 8); + check_struct_and_union3(char, long long, long, 24, 8); + check_struct_and_union3(char, long long, long long, 24, 8); + check_struct_and_union3(char, long long, float, 24, 8); + check_struct_and_union3(char, long long, double, 24, 8); + check_struct_and_union3(char, long long, long double, 32, 16); + check_struct_and_union3(char, float, char, 12, 4); + check_struct_and_union3(char, float, short, 12, 4); + check_struct_and_union3(char, float, int, 12, 4); + check_struct_and_union3(char, float, long, 16, 8); + check_struct_and_union3(char, float, long long, 16, 8); + check_struct_and_union3(char, float, float, 12, 4); + check_struct_and_union3(char, float, double, 16, 8); + check_struct_and_union3(char, float, long double, 32, 16); + check_struct_and_union3(char, double, char, 24, 8); + check_struct_and_union3(char, double, short, 24, 8); + check_struct_and_union3(char, double, int, 24, 8); + check_struct_and_union3(char, double, long, 24, 8); + check_struct_and_union3(char, double, long long, 24, 8); + check_struct_and_union3(char, double, float, 24, 8); + check_struct_and_union3(char, double, double, 24, 8); + check_struct_and_union3(char, double, long double, 32, 16); + check_struct_and_union3(char, long double, char, 48, 16); + check_struct_and_union3(char, 
long double, short, 48, 16); + check_struct_and_union3(char, long double, int, 48, 16); + check_struct_and_union3(char, long double, long, 48, 16); + check_struct_and_union3(char, long double, long long, 48, 16); + check_struct_and_union3(char, long double, float, 48, 16); + check_struct_and_union3(char, long double, double, 48, 16); + check_struct_and_union3(char, long double, long double, 48, 16); + check_struct_and_union3(short, char, char, 4, 2); + check_struct_and_union3(short, char, short, 6, 2); + check_struct_and_union3(short, char, int, 8, 4); + check_struct_and_union3(short, char, long, 16, 8); + check_struct_and_union3(short, char, long long, 16, 8); + check_struct_and_union3(short, char, float, 8, 4); + check_struct_and_union3(short, char, double, 16, 8); + check_struct_and_union3(short, char, long double, 32, 16); + check_struct_and_union3(short, short, char, 6, 2); + check_struct_and_union3(short, short, short, 6, 2); + check_struct_and_union3(short, short, int, 8, 4); + check_struct_and_union3(short, short, long, 16, 8); + check_struct_and_union3(short, short, long long, 16, 8); + check_struct_and_union3(short, short, float, 8, 4); + check_struct_and_union3(short, short, double, 16, 8); + check_struct_and_union3(short, short, long double, 32, 16); + check_struct_and_union3(short, int, char, 12, 4); + check_struct_and_union3(short, int, short, 12, 4); + check_struct_and_union3(short, int, int, 12, 4); + check_struct_and_union3(short, int, long, 16, 8); + check_struct_and_union3(short, int, long long, 16, 8); + check_struct_and_union3(short, int, float, 12, 4); + check_struct_and_union3(short, int, double, 16, 8); + check_struct_and_union3(short, int, long double, 32, 16); + check_struct_and_union3(short, long, char, 24, 8); + check_struct_and_union3(short, long, short, 24, 8); + check_struct_and_union3(short, long, int, 24, 8); + check_struct_and_union3(short, long, long, 24, 8); + check_struct_and_union3(short, long, long long, 24, 8); + 
check_struct_and_union3(short, long, float, 24, 8); + check_struct_and_union3(short, long, double, 24, 8); + check_struct_and_union3(short, long, long double, 32, 16); + check_struct_and_union3(short, long long, char, 24, 8); + check_struct_and_union3(short, long long, short, 24, 8); + check_struct_and_union3(short, long long, int, 24, 8); + check_struct_and_union3(short, long long, long, 24, 8); + check_struct_and_union3(short, long long, long long, 24, 8); + check_struct_and_union3(short, long long, float, 24, 8); + check_struct_and_union3(short, long long, double, 24, 8); + check_struct_and_union3(short, long long, long double, 32, 16); + check_struct_and_union3(short, float, char, 12, 4); + check_struct_and_union3(short, float, short, 12, 4); + check_struct_and_union3(short, float, int, 12, 4); + check_struct_and_union3(short, float, long, 16, 8); + check_struct_and_union3(short, float, long long, 16, 8); + check_struct_and_union3(short, float, float, 12, 4); + check_struct_and_union3(short, float, double, 16, 8); + check_struct_and_union3(short, float, long double, 32, 16); + check_struct_and_union3(short, double, char, 24, 8); + check_struct_and_union3(short, double, short, 24, 8); + check_struct_and_union3(short, double, int, 24, 8); + check_struct_and_union3(short, double, long, 24, 8); + check_struct_and_union3(short, double, long long, 24, 8); + check_struct_and_union3(short, double, float, 24, 8); + check_struct_and_union3(short, double, double, 24, 8); + check_struct_and_union3(short, double, long double, 32, 16); + check_struct_and_union3(short, long double, char, 48, 16); + check_struct_and_union3(short, long double, short, 48, 16); + check_struct_and_union3(short, long double, int, 48, 16); + check_struct_and_union3(short, long double, long, 48, 16); + check_struct_and_union3(short, long double, long long, 48, 16); + check_struct_and_union3(short, long double, float, 48, 16); + check_struct_and_union3(short, long double, double, 48, 16); + 
check_struct_and_union3(short, long double, long double, 48, 16); + check_struct_and_union3(int, char, char, 8, 4); + check_struct_and_union3(int, char, short, 8, 4); + check_struct_and_union3(int, char, int, 12, 4); + check_struct_and_union3(int, char, long, 16, 8); + check_struct_and_union3(int, char, long long, 16, 8); + check_struct_and_union3(int, char, float, 12, 4); + check_struct_and_union3(int, char, double, 16, 8); + check_struct_and_union3(int, char, long double, 32, 16); + check_struct_and_union3(int, short, char, 8, 4); + check_struct_and_union3(int, short, short, 8, 4); + check_struct_and_union3(int, short, int, 12, 4); + check_struct_and_union3(int, short, long, 16, 8); + check_struct_and_union3(int, short, long long, 16, 8); + check_struct_and_union3(int, short, float, 12, 4); + check_struct_and_union3(int, short, double, 16, 8); + check_struct_and_union3(int, short, long double, 32, 16); + check_struct_and_union3(int, int, char, 12, 4); + check_struct_and_union3(int, int, short, 12, 4); + check_struct_and_union3(int, int, int, 12, 4); + check_struct_and_union3(int, int, long, 16, 8); + check_struct_and_union3(int, int, long long, 16, 8); + check_struct_and_union3(int, int, float, 12, 4); + check_struct_and_union3(int, int, double, 16, 8); + check_struct_and_union3(int, int, long double, 32, 16); + check_struct_and_union3(int, long, char, 24, 8); + check_struct_and_union3(int, long, short, 24, 8); + check_struct_and_union3(int, long, int, 24, 8); + check_struct_and_union3(int, long, long, 24, 8); + check_struct_and_union3(int, long, long long, 24, 8); + check_struct_and_union3(int, long, float, 24, 8); + check_struct_and_union3(int, long, double, 24, 8); + check_struct_and_union3(int, long, long double, 32, 16); + check_struct_and_union3(int, long long, char, 24, 8); + check_struct_and_union3(int, long long, short, 24, 8); + check_struct_and_union3(int, long long, int, 24, 8); + check_struct_and_union3(int, long long, long, 24, 8); + 
check_struct_and_union3(int, long long, long long, 24, 8); + check_struct_and_union3(int, long long, float, 24, 8); + check_struct_and_union3(int, long long, double, 24, 8); + check_struct_and_union3(int, long long, long double, 32, 16); + check_struct_and_union3(int, float, char, 12, 4); + check_struct_and_union3(int, float, short, 12, 4); + check_struct_and_union3(int, float, int, 12, 4); + check_struct_and_union3(int, float, long, 16, 8); + check_struct_and_union3(int, float, long long, 16, 8); + check_struct_and_union3(int, float, float, 12, 4); + check_struct_and_union3(int, float, double, 16, 8); + check_struct_and_union3(int, float, long double, 32, 16); + check_struct_and_union3(int, double, char, 24, 8); + check_struct_and_union3(int, double, short, 24, 8); + check_struct_and_union3(int, double, int, 24, 8); + check_struct_and_union3(int, double, long, 24, 8); + check_struct_and_union3(int, double, long long, 24, 8); + check_struct_and_union3(int, double, float, 24, 8); + check_struct_and_union3(int, double, double, 24, 8); + check_struct_and_union3(int, double, long double, 32, 16); + check_struct_and_union3(int, long double, char, 48, 16); + check_struct_and_union3(int, long double, short, 48, 16); + check_struct_and_union3(int, long double, int, 48, 16); + check_struct_and_union3(int, long double, long, 48, 16); + check_struct_and_union3(int, long double, long long, 48, 16); + check_struct_and_union3(int, long double, float, 48, 16); + check_struct_and_union3(int, long double, double, 48, 16); + check_struct_and_union3(int, long double, long double, 48, 16); + check_struct_and_union3(long, char, char, 16, 8); + check_struct_and_union3(long, char, short, 16, 8); + check_struct_and_union3(long, char, int, 16, 8); + check_struct_and_union3(long, char, long, 24, 8); + check_struct_and_union3(long, char, long long, 24, 8); + check_struct_and_union3(long, char, float, 16, 8); + check_struct_and_union3(long, char, double, 24, 8); + 
check_struct_and_union3(long, char, long double, 32, 16); + check_struct_and_union3(long, short, char, 16, 8); + check_struct_and_union3(long, short, short, 16, 8); + check_struct_and_union3(long, short, int, 16, 8); + check_struct_and_union3(long, short, long, 24, 8); + check_struct_and_union3(long, short, long long, 24, 8); + check_struct_and_union3(long, short, float, 16, 8); + check_struct_and_union3(long, short, double, 24, 8); + check_struct_and_union3(long, short, long double, 32, 16); + check_struct_and_union3(long, int, char, 16, 8); + check_struct_and_union3(long, int, short, 16, 8); + check_struct_and_union3(long, int, int, 16, 8); + check_struct_and_union3(long, int, long, 24, 8); + check_struct_and_union3(long, int, long long, 24, 8); + check_struct_and_union3(long, int, float, 16, 8); + check_struct_and_union3(long, int, double, 24, 8); + check_struct_and_union3(long, int, long double, 32, 16); + check_struct_and_union3(long, long, char, 24, 8); + check_struct_and_union3(long, long, short, 24, 8); + check_struct_and_union3(long, long, int, 24, 8); + check_struct_and_union3(long, long, long, 24, 8); + check_struct_and_union3(long, long, long long, 24, 8); + check_struct_and_union3(long, long, float, 24, 8); + check_struct_and_union3(long, long, double, 24, 8); + check_struct_and_union3(long, long, long double, 32, 16); + check_struct_and_union3(long, long long, char, 24, 8); + check_struct_and_union3(long, long long, short, 24, 8); + check_struct_and_union3(long, long long, int, 24, 8); + check_struct_and_union3(long, long long, long, 24, 8); + check_struct_and_union3(long, long long, long long, 24, 8); + check_struct_and_union3(long, long long, float, 24, 8); + check_struct_and_union3(long, long long, double, 24, 8); + check_struct_and_union3(long, long long, long double, 32, 16); + check_struct_and_union3(long, float, char, 16, 8); + check_struct_and_union3(long, float, short, 16, 8); + check_struct_and_union3(long, float, int, 16, 8); + 
check_struct_and_union3(long, float, long, 24, 8); + check_struct_and_union3(long, float, long long, 24, 8); + check_struct_and_union3(long, float, float, 16, 8); + check_struct_and_union3(long, float, double, 24, 8); + check_struct_and_union3(long, float, long double, 32, 16); + check_struct_and_union3(long, double, char, 24, 8); + check_struct_and_union3(long, double, short, 24, 8); + check_struct_and_union3(long, double, int, 24, 8); + check_struct_and_union3(long, double, long, 24, 8); + check_struct_and_union3(long, double, long long, 24, 8); + check_struct_and_union3(long, double, float, 24, 8); + check_struct_and_union3(long, double, double, 24, 8); + check_struct_and_union3(long, double, long double, 32, 16); + check_struct_and_union3(long, long double, char, 48, 16); + check_struct_and_union3(long, long double, short, 48, 16); + check_struct_and_union3(long, long double, int, 48, 16); + check_struct_and_union3(long, long double, long, 48, 16); + check_struct_and_union3(long, long double, long long, 48, 16); + check_struct_and_union3(long, long double, float, 48, 16); + check_struct_and_union3(long, long double, double, 48, 16); + check_struct_and_union3(long, long double, long double, 48, 16); + check_struct_and_union3(long long, char, char, 16, 8); + check_struct_and_union3(long long, char, short, 16, 8); + check_struct_and_union3(long long, char, int, 16, 8); + check_struct_and_union3(long long, char, long, 24, 8); + check_struct_and_union3(long long, char, long long, 24, 8); + check_struct_and_union3(long long, char, float, 16, 8); + check_struct_and_union3(long long, char, double, 24, 8); + check_struct_and_union3(long long, char, long double, 32, 16); + check_struct_and_union3(long long, short, char, 16, 8); + check_struct_and_union3(long long, short, short, 16, 8); + check_struct_and_union3(long long, short, int, 16, 8); + check_struct_and_union3(long long, short, long, 24, 8); + check_struct_and_union3(long long, short, long long, 24, 8); + 
check_struct_and_union3(long long, short, float, 16, 8); + check_struct_and_union3(long long, short, double, 24, 8); + check_struct_and_union3(long long, short, long double, 32, 16); + check_struct_and_union3(long long, int, char, 16, 8); + check_struct_and_union3(long long, int, short, 16, 8); + check_struct_and_union3(long long, int, int, 16, 8); + check_struct_and_union3(long long, int, long, 24, 8); + check_struct_and_union3(long long, int, long long, 24, 8); + check_struct_and_union3(long long, int, float, 16, 8); + check_struct_and_union3(long long, int, double, 24, 8); + check_struct_and_union3(long long, int, long double, 32, 16); + check_struct_and_union3(long long, long, char, 24, 8); + check_struct_and_union3(long long, long, short, 24, 8); + check_struct_and_union3(long long, long, int, 24, 8); + check_struct_and_union3(long long, long, long, 24, 8); + check_struct_and_union3(long long, long, long long, 24, 8); + check_struct_and_union3(long long, long, float, 24, 8); + check_struct_and_union3(long long, long, double, 24, 8); + check_struct_and_union3(long long, long, long double, 32, 16); + check_struct_and_union3(long long, long long, char, 24, 8); + check_struct_and_union3(long long, long long, short, 24, 8); + check_struct_and_union3(long long, long long, int, 24, 8); + check_struct_and_union3(long long, long long, long, 24, 8); + check_struct_and_union3(long long, long long, long long, 24, 8); + check_struct_and_union3(long long, long long, float, 24, 8); + check_struct_and_union3(long long, long long, double, 24, 8); + check_struct_and_union3(long long, long long, long double, 32, 16); + check_struct_and_union3(long long, float, char, 16, 8); + check_struct_and_union3(long long, float, short, 16, 8); + check_struct_and_union3(long long, float, int, 16, 8); + check_struct_and_union3(long long, float, long, 24, 8); + check_struct_and_union3(long long, float, long long, 24, 8); + check_struct_and_union3(long long, float, float, 16, 8); + 
check_struct_and_union3(long long, float, double, 24, 8); + check_struct_and_union3(long long, float, long double, 32, 16); + check_struct_and_union3(long long, double, char, 24, 8); + check_struct_and_union3(long long, double, short, 24, 8); + check_struct_and_union3(long long, double, int, 24, 8); + check_struct_and_union3(long long, double, long, 24, 8); + check_struct_and_union3(long long, double, long long, 24, 8); + check_struct_and_union3(long long, double, float, 24, 8); + check_struct_and_union3(long long, double, double, 24, 8); + check_struct_and_union3(long long, double, long double, 32, 16); + check_struct_and_union3(long long, long double, char, 48, 16); + check_struct_and_union3(long long, long double, short, 48, 16); + check_struct_and_union3(long long, long double, int, 48, 16); + check_struct_and_union3(long long, long double, long, 48, 16); + check_struct_and_union3(long long, long double, long long, 48, 16); + check_struct_and_union3(long long, long double, float, 48, 16); + check_struct_and_union3(long long, long double, double, 48, 16); + check_struct_and_union3(long long, long double, long double, 48, 16); + check_struct_and_union3(float, char, char, 8, 4); + check_struct_and_union3(float, char, short, 8, 4); + check_struct_and_union3(float, char, int, 12, 4); + check_struct_and_union3(float, char, long, 16, 8); + check_struct_and_union3(float, char, long long, 16, 8); + check_struct_and_union3(float, char, float, 12, 4); + check_struct_and_union3(float, char, double, 16, 8); + check_struct_and_union3(float, char, long double, 32, 16); + check_struct_and_union3(float, short, char, 8, 4); + check_struct_and_union3(float, short, short, 8, 4); + check_struct_and_union3(float, short, int, 12, 4); + check_struct_and_union3(float, short, long, 16, 8); + check_struct_and_union3(float, short, long long, 16, 8); + check_struct_and_union3(float, short, float, 12, 4); + check_struct_and_union3(float, short, double, 16, 8); + 
check_struct_and_union3(float, short, long double, 32, 16); + check_struct_and_union3(float, int, char, 12, 4); + check_struct_and_union3(float, int, short, 12, 4); + check_struct_and_union3(float, int, int, 12, 4); + check_struct_and_union3(float, int, long, 16, 8); + check_struct_and_union3(float, int, long long, 16, 8); + check_struct_and_union3(float, int, float, 12, 4); + check_struct_and_union3(float, int, double, 16, 8); + check_struct_and_union3(float, int, long double, 32, 16); + check_struct_and_union3(float, long, char, 24, 8); + check_struct_and_union3(float, long, short, 24, 8); + check_struct_and_union3(float, long, int, 24, 8); + check_struct_and_union3(float, long, long, 24, 8); + check_struct_and_union3(float, long, long long, 24, 8); + check_struct_and_union3(float, long, float, 24, 8); + check_struct_and_union3(float, long, double, 24, 8); + check_struct_and_union3(float, long, long double, 32, 16); + check_struct_and_union3(float, long long, char, 24, 8); + check_struct_and_union3(float, long long, short, 24, 8); + check_struct_and_union3(float, long long, int, 24, 8); + check_struct_and_union3(float, long long, long, 24, 8); + check_struct_and_union3(float, long long, long long, 24, 8); + check_struct_and_union3(float, long long, float, 24, 8); + check_struct_and_union3(float, long long, double, 24, 8); + check_struct_and_union3(float, long long, long double, 32, 16); + check_struct_and_union3(float, float, char, 12, 4); + check_struct_and_union3(float, float, short, 12, 4); + check_struct_and_union3(float, float, int, 12, 4); + check_struct_and_union3(float, float, long, 16, 8); + check_struct_and_union3(float, float, long long, 16, 8); + check_struct_and_union3(float, float, float, 12, 4); + check_struct_and_union3(float, float, double, 16, 8); + check_struct_and_union3(float, float, long double, 32, 16); + check_struct_and_union3(float, double, char, 24, 8); + check_struct_and_union3(float, double, short, 24, 8); + 
check_struct_and_union3(float, double, int, 24, 8); + check_struct_and_union3(float, double, long, 24, 8); + check_struct_and_union3(float, double, long long, 24, 8); + check_struct_and_union3(float, double, float, 24, 8); + check_struct_and_union3(float, double, double, 24, 8); + check_struct_and_union3(float, double, long double, 32, 16); + check_struct_and_union3(float, long double, char, 48, 16); + check_struct_and_union3(float, long double, short, 48, 16); + check_struct_and_union3(float, long double, int, 48, 16); + check_struct_and_union3(float, long double, long, 48, 16); + check_struct_and_union3(float, long double, long long, 48, 16); + check_struct_and_union3(float, long double, float, 48, 16); + check_struct_and_union3(float, long double, double, 48, 16); + check_struct_and_union3(float, long double, long double, 48, 16); + check_struct_and_union3(double, char, char, 16, 8); + check_struct_and_union3(double, char, short, 16, 8); + check_struct_and_union3(double, char, int, 16, 8); + check_struct_and_union3(double, char, long, 24, 8); + check_struct_and_union3(double, char, long long, 24, 8); + check_struct_and_union3(double, char, float, 16, 8); + check_struct_and_union3(double, char, double, 24, 8); + check_struct_and_union3(double, char, long double, 32, 16); + check_struct_and_union3(double, short, char, 16, 8); + check_struct_and_union3(double, short, short, 16, 8); + check_struct_and_union3(double, short, int, 16, 8); + check_struct_and_union3(double, short, long, 24, 8); + check_struct_and_union3(double, short, long long, 24, 8); + check_struct_and_union3(double, short, float, 16, 8); + check_struct_and_union3(double, short, double, 24, 8); + check_struct_and_union3(double, short, long double, 32, 16); + check_struct_and_union3(double, int, char, 16, 8); + check_struct_and_union3(double, int, short, 16, 8); + check_struct_and_union3(double, int, int, 16, 8); + check_struct_and_union3(double, int, long, 24, 8); + check_struct_and_union3(double, 
int, long long, 24, 8); + check_struct_and_union3(double, int, float, 16, 8); + check_struct_and_union3(double, int, double, 24, 8); + check_struct_and_union3(double, int, long double, 32, 16); + check_struct_and_union3(double, long, char, 24, 8); + check_struct_and_union3(double, long, short, 24, 8); + check_struct_and_union3(double, long, int, 24, 8); + check_struct_and_union3(double, long, long, 24, 8); + check_struct_and_union3(double, long, long long, 24, 8); + check_struct_and_union3(double, long, float, 24, 8); + check_struct_and_union3(double, long, double, 24, 8); + check_struct_and_union3(double, long, long double, 32, 16); + check_struct_and_union3(double, long long, char, 24, 8); + check_struct_and_union3(double, long long, short, 24, 8); + check_struct_and_union3(double, long long, int, 24, 8); + check_struct_and_union3(double, long long, long, 24, 8); + check_struct_and_union3(double, long long, long long, 24, 8); + check_struct_and_union3(double, long long, float, 24, 8); + check_struct_and_union3(double, long long, double, 24, 8); + check_struct_and_union3(double, long long, long double, 32, 16); + check_struct_and_union3(double, float, char, 16, 8); + check_struct_and_union3(double, float, short, 16, 8); + check_struct_and_union3(double, float, int, 16, 8); + check_struct_and_union3(double, float, long, 24, 8); + check_struct_and_union3(double, float, long long, 24, 8); + check_struct_and_union3(double, float, float, 16, 8); + check_struct_and_union3(double, float, double, 24, 8); + check_struct_and_union3(double, float, long double, 32, 16); + check_struct_and_union3(double, double, char, 24, 8); + check_struct_and_union3(double, double, short, 24, 8); + check_struct_and_union3(double, double, int, 24, 8); + check_struct_and_union3(double, double, long, 24, 8); + check_struct_and_union3(double, double, long long, 24, 8); + check_struct_and_union3(double, double, float, 24, 8); + check_struct_and_union3(double, double, double, 24, 8); + 
check_struct_and_union3(double, double, long double, 32, 16); + check_struct_and_union3(double, long double, char, 48, 16); + check_struct_and_union3(double, long double, short, 48, 16); + check_struct_and_union3(double, long double, int, 48, 16); + check_struct_and_union3(double, long double, long, 48, 16); + check_struct_and_union3(double, long double, long long, 48, 16); + check_struct_and_union3(double, long double, float, 48, 16); + check_struct_and_union3(double, long double, double, 48, 16); + check_struct_and_union3(double, long double, long double, 48, 16); + check_struct_and_union3(long double, char, char, 32, 16); + check_struct_and_union3(long double, char, short, 32, 16); + check_struct_and_union3(long double, char, int, 32, 16); + check_struct_and_union3(long double, char, long, 32, 16); + check_struct_and_union3(long double, char, long long, 32, 16); + check_struct_and_union3(long double, char, float, 32, 16); + check_struct_and_union3(long double, char, double, 32, 16); + check_struct_and_union3(long double, char, long double, 48, 16); + check_struct_and_union3(long double, short, char, 32, 16); + check_struct_and_union3(long double, short, short, 32, 16); + check_struct_and_union3(long double, short, int, 32, 16); + check_struct_and_union3(long double, short, long, 32, 16); + check_struct_and_union3(long double, short, long long, 32, 16); + check_struct_and_union3(long double, short, float, 32, 16); + check_struct_and_union3(long double, short, double, 32, 16); + check_struct_and_union3(long double, short, long double, 48, 16); + check_struct_and_union3(long double, int, char, 32, 16); + check_struct_and_union3(long double, int, short, 32, 16); + check_struct_and_union3(long double, int, int, 32, 16); + check_struct_and_union3(long double, int, long, 32, 16); + check_struct_and_union3(long double, int, long long, 32, 16); + check_struct_and_union3(long double, int, float, 32, 16); + check_struct_and_union3(long double, int, double, 32, 16); + 
check_struct_and_union3(long double, int, long double, 48, 16); + check_struct_and_union3(long double, long, char, 32, 16); + check_struct_and_union3(long double, long, short, 32, 16); + check_struct_and_union3(long double, long, int, 32, 16); + check_struct_and_union3(long double, long, long, 32, 16); + check_struct_and_union3(long double, long, long long, 32, 16); + check_struct_and_union3(long double, long, float, 32, 16); + check_struct_and_union3(long double, long, double, 32, 16); + check_struct_and_union3(long double, long, long double, 48, 16); + check_struct_and_union3(long double, long long, char, 32, 16); + check_struct_and_union3(long double, long long, short, 32, 16); + check_struct_and_union3(long double, long long, int, 32, 16); + check_struct_and_union3(long double, long long, long, 32, 16); + check_struct_and_union3(long double, long long, long long, 32, 16); + check_struct_and_union3(long double, long long, float, 32, 16); + check_struct_and_union3(long double, long long, double, 32, 16); + check_struct_and_union3(long double, long long, long double, 48, 16); + check_struct_and_union3(long double, float, char, 32, 16); + check_struct_and_union3(long double, float, short, 32, 16); + check_struct_and_union3(long double, float, int, 32, 16); + check_struct_and_union3(long double, float, long, 32, 16); + check_struct_and_union3(long double, float, long long, 32, 16); + check_struct_and_union3(long double, float, float, 32, 16); + check_struct_and_union3(long double, float, double, 32, 16); + check_struct_and_union3(long double, float, long double, 48, 16); + check_struct_and_union3(long double, double, char, 32, 16); + check_struct_and_union3(long double, double, short, 32, 16); + check_struct_and_union3(long double, double, int, 32, 16); + check_struct_and_union3(long double, double, long, 32, 16); + check_struct_and_union3(long double, double, long long, 32, 16); + check_struct_and_union3(long double, double, float, 32, 16); + 
check_struct_and_union3(long double, double, double, 32, 16); + check_struct_and_union3(long double, double, long double, 48, 16); + check_struct_and_union3(long double, long double, char, 48, 16); + check_struct_and_union3(long double, long double, short, 48, 16); + check_struct_and_union3(long double, long double, int, 48, 16); + check_struct_and_union3(long double, long double, long, 48, 16); + check_struct_and_union3(long double, long double, long long, 48, 16); + check_struct_and_union3(long double, long double, float, 48, 16); + check_struct_and_union3(long double, long double, double, 48, 16); + check_struct_and_union3(long double, long double, long double, 48, 16); + check_struct_and_union3(char, char, _Float16, 4, 2); + check_struct_and_union3(char, _Float16, char, 6, 2); + check_struct_and_union3(char, _Float16, _Float16, 6, 2); + check_struct_and_union3(char, _Float16, int, 8, 4); + check_struct_and_union3(char, _Float16, long, 16, 8); + check_struct_and_union3(char, _Float16, long long, 16, 8); + check_struct_and_union3(char, _Float16, float, 8, 4); + check_struct_and_union3(char, _Float16, double, 16, 8); + check_struct_and_union3(char, _Float16, long double, 32, 16); + check_struct_and_union3(char, int, _Float16, 12, 4); + check_struct_and_union3(char, long, _Float16, 24, 8); + check_struct_and_union3(char, long long, _Float16, 24, 8); + check_struct_and_union3(char, float, _Float16, 12, 4); + check_struct_and_union3(char, double, _Float16, 24, 8); + check_struct_and_union3(char, long double, _Float16, 48, 16); + check_struct_and_union3(_Float16, char, char, 4, 2); + check_struct_and_union3(_Float16, char, _Float16, 6, 2); + check_struct_and_union3(_Float16, char, int, 8, 4); + check_struct_and_union3(_Float16, char, long, 16, 8); + check_struct_and_union3(_Float16, char, long long, 16, 8); + check_struct_and_union3(_Float16, char, float, 8, 4); + check_struct_and_union3(_Float16, char, double, 16, 8); + check_struct_and_union3(_Float16, char, long 
double, 32, 16); + check_struct_and_union3(_Float16, _Float16, char, 6, 2); + check_struct_and_union3(_Float16, _Float16, _Float16, 6, 2); + check_struct_and_union3(_Float16, _Float16, int, 8, 4); + check_struct_and_union3(_Float16, _Float16, long, 16, 8); + check_struct_and_union3(_Float16, _Float16, long long, 16, 8); + check_struct_and_union3(_Float16, _Float16, float, 8, 4); + check_struct_and_union3(_Float16, _Float16, double, 16, 8); + check_struct_and_union3(_Float16, _Float16, long double, 32, 16); + check_struct_and_union3(_Float16, int, char, 12, 4); + check_struct_and_union3(_Float16, int, _Float16, 12, 4); + check_struct_and_union3(_Float16, int, int, 12, 4); + check_struct_and_union3(_Float16, int, long, 16, 8); + check_struct_and_union3(_Float16, int, long long, 16, 8); + check_struct_and_union3(_Float16, int, float, 12, 4); + check_struct_and_union3(_Float16, int, double, 16, 8); + check_struct_and_union3(_Float16, int, long double, 32, 16); + check_struct_and_union3(_Float16, long, char, 24, 8); + check_struct_and_union3(_Float16, long, _Float16, 24, 8); + check_struct_and_union3(_Float16, long, int, 24, 8); + check_struct_and_union3(_Float16, long, long, 24, 8); + check_struct_and_union3(_Float16, long, long long, 24, 8); + check_struct_and_union3(_Float16, long, float, 24, 8); + check_struct_and_union3(_Float16, long, double, 24, 8); + check_struct_and_union3(_Float16, long, long double, 32, 16); + check_struct_and_union3(_Float16, long long, char, 24, 8); + check_struct_and_union3(_Float16, long long, _Float16, 24, 8); + check_struct_and_union3(_Float16, long long, int, 24, 8); + check_struct_and_union3(_Float16, long long, long, 24, 8); + check_struct_and_union3(_Float16, long long, long long, 24, 8); + check_struct_and_union3(_Float16, long long, float, 24, 8); + check_struct_and_union3(_Float16, long long, double, 24, 8); + check_struct_and_union3(_Float16, long long, long double, 32, 16); + check_struct_and_union3(_Float16, float, char, 12, 
4); + check_struct_and_union3(_Float16, float, _Float16, 12, 4); + check_struct_and_union3(_Float16, float, int, 12, 4); + check_struct_and_union3(_Float16, float, long, 16, 8); + check_struct_and_union3(_Float16, float, long long, 16, 8); + check_struct_and_union3(_Float16, float, float, 12, 4); + check_struct_and_union3(_Float16, float, double, 16, 8); + check_struct_and_union3(_Float16, float, long double, 32, 16); + check_struct_and_union3(_Float16, double, char, 24, 8); + check_struct_and_union3(_Float16, double, _Float16, 24, 8); + check_struct_and_union3(_Float16, double, int, 24, 8); + check_struct_and_union3(_Float16, double, long, 24, 8); + check_struct_and_union3(_Float16, double, long long, 24, 8); + check_struct_and_union3(_Float16, double, float, 24, 8); + check_struct_and_union3(_Float16, double, double, 24, 8); + check_struct_and_union3(_Float16, double, long double, 32, 16); + check_struct_and_union3(_Float16, long double, char, 48, 16); + check_struct_and_union3(_Float16, long double, _Float16, 48, 16); + check_struct_and_union3(_Float16, long double, int, 48, 16); + check_struct_and_union3(_Float16, long double, long, 48, 16); + check_struct_and_union3(_Float16, long double, long long, 48, 16); + check_struct_and_union3(_Float16, long double, float, 48, 16); + check_struct_and_union3(_Float16, long double, double, 48, 16); + check_struct_and_union3(_Float16, long double, long double, 48, 16); + check_struct_and_union3(int, char, _Float16, 8, 4); + check_struct_and_union3(int, _Float16, char, 8, 4); + check_struct_and_union3(int, _Float16, _Float16, 8, 4); + check_struct_and_union3(int, _Float16, int, 12, 4); + check_struct_and_union3(int, _Float16, long, 16, 8); + check_struct_and_union3(int, _Float16, long long, 16, 8); + check_struct_and_union3(int, _Float16, float, 12, 4); + check_struct_and_union3(int, _Float16, double, 16, 8); + check_struct_and_union3(int, _Float16, long double, 32, 16); + check_struct_and_union3(int, int, _Float16, 12, 4); 
+ check_struct_and_union3(int, long, _Float16, 24, 8); + check_struct_and_union3(int, long long, _Float16, 24, 8); + check_struct_and_union3(int, float, _Float16, 12, 4); + check_struct_and_union3(int, double, _Float16, 24, 8); + check_struct_and_union3(int, long double, _Float16, 48, 16); + check_struct_and_union3(long, char, _Float16, 16, 8); + check_struct_and_union3(long, _Float16, char, 16, 8); + check_struct_and_union3(long, _Float16, _Float16, 16, 8); + check_struct_and_union3(long, _Float16, int, 16, 8); + check_struct_and_union3(long, _Float16, long, 24, 8); + check_struct_and_union3(long, _Float16, long long, 24, 8); + check_struct_and_union3(long, _Float16, float, 16, 8); + check_struct_and_union3(long, _Float16, double, 24, 8); + check_struct_and_union3(long, _Float16, long double, 32, 16); + check_struct_and_union3(long, int, _Float16, 16, 8); + check_struct_and_union3(long, long, _Float16, 24, 8); + check_struct_and_union3(long, long long, _Float16, 24, 8); + check_struct_and_union3(long, float, _Float16, 16, 8); + check_struct_and_union3(long, double, _Float16, 24, 8); + check_struct_and_union3(long, long double, _Float16, 48, 16); + check_struct_and_union3(long long, char, _Float16, 16, 8); + check_struct_and_union3(long long, _Float16, char, 16, 8); + check_struct_and_union3(long long, _Float16, _Float16, 16, 8); + check_struct_and_union3(long long, _Float16, int, 16, 8); + check_struct_and_union3(long long, _Float16, long, 24, 8); + check_struct_and_union3(long long, _Float16, long long, 24, 8); + check_struct_and_union3(long long, _Float16, float, 16, 8); + check_struct_and_union3(long long, _Float16, double, 24, 8); + check_struct_and_union3(long long, _Float16, long double, 32, 16); + check_struct_and_union3(long long, int, _Float16, 16, 8); + check_struct_and_union3(long long, long, _Float16, 24, 8); + check_struct_and_union3(long long, long long, _Float16, 24, 8); + check_struct_and_union3(long long, float, _Float16, 16, 8); + 
check_struct_and_union3(long long, double, _Float16, 24, 8); + check_struct_and_union3(long long, long double, _Float16, 48, 16); + check_struct_and_union3(float, char, _Float16, 8, 4); + check_struct_and_union3(float, _Float16, char, 8, 4); + check_struct_and_union3(float, _Float16, _Float16, 8, 4); + check_struct_and_union3(float, _Float16, int, 12, 4); + check_struct_and_union3(float, _Float16, long, 16, 8); + check_struct_and_union3(float, _Float16, long long, 16, 8); + check_struct_and_union3(float, _Float16, float, 12, 4); + check_struct_and_union3(float, _Float16, double, 16, 8); + check_struct_and_union3(float, _Float16, long double, 32, 16); + check_struct_and_union3(float, int, _Float16, 12, 4); + check_struct_and_union3(float, long, _Float16, 24, 8); + check_struct_and_union3(float, long long, _Float16, 24, 8); + check_struct_and_union3(float, float, _Float16, 12, 4); + check_struct_and_union3(float, double, _Float16, 24, 8); + check_struct_and_union3(float, long double, _Float16, 48, 16); + check_struct_and_union3(double, char, _Float16, 16, 8); + check_struct_and_union3(double, _Float16, char, 16, 8); + check_struct_and_union3(double, _Float16, _Float16, 16, 8); + check_struct_and_union3(double, _Float16, int, 16, 8); + check_struct_and_union3(double, _Float16, long, 24, 8); + check_struct_and_union3(double, _Float16, long long, 24, 8); + check_struct_and_union3(double, _Float16, float, 16, 8); + check_struct_and_union3(double, _Float16, double, 24, 8); + check_struct_and_union3(double, _Float16, long double, 32, 16); + check_struct_and_union3(double, int, _Float16, 16, 8); + check_struct_and_union3(double, long, _Float16, 24, 8); + check_struct_and_union3(double, long long, _Float16, 24, 8); + check_struct_and_union3(double, float, _Float16, 16, 8); + check_struct_and_union3(double, double, _Float16, 24, 8); + check_struct_and_union3(double, long double, _Float16, 48, 16); + check_struct_and_union3(long double, char, _Float16, 32, 16); + 
check_struct_and_union3(long double, _Float16, char, 32, 16); + check_struct_and_union3(long double, _Float16, _Float16, 32, 16); + check_struct_and_union3(long double, _Float16, int, 32, 16); + check_struct_and_union3(long double, _Float16, long, 32, 16); + check_struct_and_union3(long double, _Float16, long long, 32, 16); + check_struct_and_union3(long double, _Float16, float, 32, 16); + check_struct_and_union3(long double, _Float16, double, 32, 16); + check_struct_and_union3(long double, _Float16, long double, 48, 16); + check_struct_and_union3(long double, int, _Float16, 32, 16); + check_struct_and_union3(long double, long, _Float16, 32, 16); + check_struct_and_union3(long double, long long, _Float16, 32, 16); + check_struct_and_union3(long double, float, _Float16, 32, 16); + check_struct_and_union3(long double, double, _Float16, 32, 16); + check_struct_and_union3(long double, long double, _Float16, 48, 16); + return 0; +} diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_alignment.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_alignment.c new file mode 100644 index 00000000000..2a72b5c9e18 --- /dev/null +++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_alignment.c @@ -0,0 +1,45 @@ +/* This checks alignment of basic types. */ + +#include "defines.h" +#include "macros.h" + + +int +main (void) +{ + /* Integral types. */ + run_signed_tests2(check_align, char, TYPE_ALIGN_CHAR); + run_signed_tests2(check_align, short, TYPE_ALIGN_SHORT); + run_signed_tests2(check_align, int, TYPE_ALIGN_INT); + run_signed_tests2(check_align, long, TYPE_ALIGN_LONG); + run_signed_tests2(check_align, long long, TYPE_ALIGN_LONG_LONG); +#ifdef CHECK_INT128 + run_signed_tests2(check_align, __int128, TYPE_ALIGN_INT128); +#endif + check_align(enumtype, TYPE_ALIGN_ENUM); + + /* Floating point types. 
*/ + check_align(float, TYPE_ALIGN_FLOAT); + check_align(double, TYPE_ALIGN_DOUBLE); +#ifdef CHECK_LONG_DOUBLE + check_align(long double, TYPE_ALIGN_LONG_DOUBLE); +#endif +#ifdef CHECK_FLOAT128 + check_align(__float128, TYPE_ALIGN_FLOAT128); +#endif + + /* Packed types - MMX, 3DNow!, SSE and SSE2. */ +#ifdef CHECK_M64_M128 + check_align(__m64, TYPE_ALIGN_M64); + check_align(__m128, TYPE_ALIGN_M128); +#endif + + /* _Float16 type. */ + check_align(_Float16, TYPE_ALIGN_FLOAT16); + + /* Pointer types. */ + check_align(void *, TYPE_ALIGN_POINTER); + check_align(void (*)(), TYPE_ALIGN_POINTER); + + return 0; +} diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_array_size_and_align.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_array_size_and_align.c new file mode 100644 index 00000000000..d58b9d1c43c --- /dev/null +++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_array_size_and_align.c @@ -0,0 +1,43 @@ +/* This checks size and alignment of arrays of basic types. */ + +#include "defines.h" +#include "macros.h" + + +int +main (void) +{ + /* Integral types. */ + run_signed_tests3(check_array_size_and_align, char, TYPE_SIZE_CHAR, TYPE_ALIGN_CHAR); + run_signed_tests3(check_array_size_and_align, short, TYPE_SIZE_SHORT, TYPE_ALIGN_SHORT); + run_signed_tests3(check_array_size_and_align, int, TYPE_SIZE_INT, TYPE_ALIGN_INT); + run_signed_tests3(check_array_size_and_align, long, TYPE_SIZE_LONG, TYPE_ALIGN_LONG); + run_signed_tests3(check_array_size_and_align, long long, TYPE_SIZE_LONG_LONG, TYPE_ALIGN_LONG_LONG); +#ifdef CHECK_INT128 + run_signed_tests3(check_array_size_and_align, __int128, TYPE_SIZE_INT128, TYPE_ALIGN_INT128); +#endif + check_array_size_and_align(enum dummytype, TYPE_SIZE_ENUM, TYPE_ALIGN_ENUM); + + /* Floating point types. 
*/ + check_array_size_and_align(float, TYPE_SIZE_FLOAT, TYPE_ALIGN_FLOAT); + check_array_size_and_align(double, TYPE_SIZE_DOUBLE, TYPE_ALIGN_DOUBLE); +#ifdef CHECK_LONG_DOUBLE + check_array_size_and_align(long double, TYPE_SIZE_LONG_DOUBLE, TYPE_ALIGN_LONG_DOUBLE); +#endif +#ifdef CHECK_FLOAT128 + check_array_size_and_align(__float128, TYPE_SIZE_FLOAT128, TYPE_ALIGN_FLOAT128); +#endif + + /* Packed types - MMX, 3DNow!, SSE and SSE2. */ +#ifdef CHECK_M64_M128 + check_array_size_and_align(__m64, TYPE_SIZE_M64, TYPE_ALIGN_M64); + check_array_size_and_align(__m128, TYPE_SIZE_M128, TYPE_ALIGN_M128); +#endif + + /* Pointer types. The function pointer doesn't work with these macros. */ + check_array_size_and_align(void *, TYPE_SIZE_POINTER, TYPE_ALIGN_POINTER); + + check_array_size_and_align(_Float16, TYPE_SIZE_FLOAT16, TYPE_ALIGN_FLOAT16); + + return 0; +} diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_returning.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_returning.c new file mode 100644 index 00000000000..36fb24e6250 --- /dev/null +++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_returning.c @@ -0,0 +1,87 @@ +/* This is an autogenerated file. Do not edit. 
*/ + +#include "avx512fp16-xmm-check.h" +#include "defines.h" +#include "macros.h" +#include "args.h" + +char +fun_test_returning_char (void) +{ + volatile_var++; + return 64; +} + +short +fun_test_returning_short (void) +{ + volatile_var++; + return 65; +} + +int +fun_test_returning_int (void) +{ + volatile_var++; + return 66; +} + +long +fun_test_returning_long (void) +{ + volatile_var++; + return 67; +} + +long long +fun_test_returning_long_long (void) +{ + volatile_var++; + return 68; +} + +float +fun_test_returning_float (void) +{ + volatile_var++; + return 69; +} + +double +fun_test_returning_double (void) +{ + volatile_var++; + return 70; +} + +long double +fun_test_returning_long_double (void) +{ + volatile_var++; + return 71; +} + +_Float16 +fun_test_returning_float16 (void) +{ + volatile_var++; + return 72; +} + +#define def_test_returning_type_xmm(fun, type, ret, reg) \ + { type var = WRAP_RET (fun) (); \ + assert (ret == (type) reg && ret == var); } + +static void +do_test (void) +{ + def_test_returning_type_xmm(fun_test_returning_char, char, 64, rax); + def_test_returning_type_xmm(fun_test_returning_short, short, 65, rax); + def_test_returning_type_xmm(fun_test_returning_int, int, 66, rax); + def_test_returning_type_xmm(fun_test_returning_long, long, 67, rax); + def_test_returning_type_xmm(fun_test_returning_long_long, long long, 68, rax); + def_test_returning_type_xmm(fun_test_returning_float, float, 69, xmm_regs[0]._float[0]); + def_test_returning_type_xmm(fun_test_returning_double, double, 70, xmm_regs[0]._double[0]); + def_test_returning_type_xmm(fun_test_returning_long_double, long double, 71, x87_regs[0]._ldouble); + def_test_returning_type_xmm(fun_test_returning_float16, _Float16, 72, xmm_regs[0].__Float16[0]); +} diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_sizes.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_sizes.c new file mode 100644 index 00000000000..47f3a5e87ca --- /dev/null +++ 
b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_sizes.c @@ -0,0 +1,43 @@ +/* This checks sizes of basic types. */ + +#include "defines.h" +#include "macros.h" + + +int +main (void) +{ + /* Integral types. */ + run_signed_tests2(check_size, char, TYPE_SIZE_CHAR); + run_signed_tests2(check_size, short, TYPE_SIZE_SHORT); + run_signed_tests2(check_size, int, TYPE_SIZE_INT); + run_signed_tests2(check_size, long, TYPE_SIZE_LONG); + run_signed_tests2(check_size, long long, TYPE_SIZE_LONG_LONG); +#ifdef CHECK_INT128 + run_signed_tests2(check_size, __int128, TYPE_SIZE_INT128); +#endif + check_size(enumtype, TYPE_SIZE_ENUM); + + /* Floating point types. */ + check_size(_Float16, TYPE_SIZE_FLOAT16); + check_size(float, TYPE_SIZE_FLOAT); + check_size(double, TYPE_SIZE_DOUBLE); +#ifdef CHECK_LONG_DOUBLE + check_size(long double, TYPE_SIZE_LONG_DOUBLE); +#endif +#ifdef CHECK_FLOAT128 + check_size(__float128, TYPE_SIZE_FLOAT128); +#endif + + /* Packed types - MMX, 3DNow!, SSE and SSE2. */ +#ifdef CHECK_M64_M128 + check_size(__m64, TYPE_SIZE_M64); + check_size(__m128, TYPE_SIZE_M128); +#endif + + /* Pointer types. */ + check_size(void *, TYPE_SIZE_POINTER); + check_size(void (*)(), TYPE_SIZE_POINTER); + + return 0; +} diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_struct_size_and_align.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_struct_size_and_align.c new file mode 100644 index 00000000000..3d1add464a2 --- /dev/null +++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_struct_size_and_align.c @@ -0,0 +1,42 @@ +/* This checks size and alignment of structs with a single basic type + element. All basic types are checked. */ + +#include "avx512fp16-xmm-check.h" +#include "defines.h" +#include "macros.h" + + +static void +do_test (void) +{ + /* Integral types. 
*/ + run_signed_tests3(check_basic_struct_size_and_align, char, TYPE_SIZE_CHAR, TYPE_ALIGN_CHAR); + run_signed_tests3(check_basic_struct_size_and_align, short, TYPE_SIZE_SHORT, TYPE_ALIGN_SHORT); + run_signed_tests3(check_basic_struct_size_and_align, int, TYPE_SIZE_INT, TYPE_ALIGN_INT); + run_signed_tests3(check_basic_struct_size_and_align, long, TYPE_SIZE_LONG, TYPE_ALIGN_LONG); + run_signed_tests3(check_basic_struct_size_and_align, long long, TYPE_SIZE_LONG_LONG, TYPE_ALIGN_LONG_LONG); +#ifdef CHECK_INT128 + run_signed_tests3(check_basic_struct_size_and_align, __int128, TYPE_SIZE_INT128, TYPE_ALIGN_INT128); +#endif + check_basic_struct_size_and_align(enum dummytype, TYPE_SIZE_ENUM, TYPE_ALIGN_ENUM); + + /* Floating point types. */ + check_basic_struct_size_and_align(_Float16, TYPE_SIZE_FLOAT16, TYPE_ALIGN_FLOAT16); + check_basic_struct_size_and_align(float, TYPE_SIZE_FLOAT, TYPE_ALIGN_FLOAT); + check_basic_struct_size_and_align(double, TYPE_SIZE_DOUBLE, TYPE_ALIGN_DOUBLE); +#ifdef CHECK_LONG_DOUBLE + check_basic_struct_size_and_align(long double, TYPE_SIZE_LONG_DOUBLE, TYPE_ALIGN_LONG_DOUBLE); +#endif +#ifdef CHECK_FLOAT128 + check_basic_struct_size_and_align(__float128, TYPE_SIZE_FLOAT128, TYPE_ALIGN_FLOAT128); +#endif + + /* Packed types - MMX, 3DNow!, SSE and SSE2. */ +#ifdef CHECK_M64_M128 + check_basic_struct_size_and_align(__m64, TYPE_SIZE_M64, TYPE_ALIGN_M64); + check_basic_struct_size_and_align(__m128, TYPE_SIZE_M128, TYPE_ALIGN_M128); +#endif + + /* Pointer types. The function pointer doesn't work with these macros. 
*/ + check_basic_struct_size_and_align(void *, TYPE_SIZE_POINTER, TYPE_ALIGN_POINTER); +} diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_union_size_and_align.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_union_size_and_align.c new file mode 100644 index 00000000000..632feebe920 --- /dev/null +++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_union_size_and_align.c @@ -0,0 +1,40 @@ +/* Test of simple unions, size and alignment. */ + +#include "avx512fp16-xmm-check.h" +#include "defines.h" +#include "macros.h" + +static void +do_test (void) +{ + /* Integral types. */ + run_signed_tests3(check_basic_union_size_and_align, char, TYPE_SIZE_CHAR, TYPE_ALIGN_CHAR); + run_signed_tests3(check_basic_union_size_and_align, short, TYPE_SIZE_SHORT, TYPE_ALIGN_SHORT); + run_signed_tests3(check_basic_union_size_and_align, int, TYPE_SIZE_INT, TYPE_ALIGN_INT); + run_signed_tests3(check_basic_union_size_and_align, long, TYPE_SIZE_LONG, TYPE_ALIGN_LONG); + run_signed_tests3(check_basic_union_size_and_align, long long, TYPE_SIZE_LONG_LONG, TYPE_ALIGN_LONG_LONG); +#ifdef CHECK_INT128 + run_signed_tests3(check_basic_union_size_and_align, __int128, TYPE_SIZE_INT128, TYPE_ALIGN_INT128); +#endif + check_basic_union_size_and_align(enum dummytype, TYPE_SIZE_ENUM, TYPE_ALIGN_ENUM); + + /* Floating point types. */ + check_basic_union_size_and_align(_Float16, TYPE_SIZE_FLOAT16, TYPE_ALIGN_FLOAT16); + check_basic_union_size_and_align(float, TYPE_SIZE_FLOAT, TYPE_ALIGN_FLOAT); + check_basic_union_size_and_align(double, TYPE_SIZE_DOUBLE, TYPE_ALIGN_DOUBLE); +#ifdef CHECK_LONG_DOUBLE + check_basic_union_size_and_align(long double, TYPE_SIZE_LONG_DOUBLE, TYPE_ALIGN_LONG_DOUBLE); +#endif +#ifdef CHECK_FLOAT128 + check_basic_union_size_and_align(__float128, TYPE_SIZE_FLOAT128, TYPE_ALIGN_FLOAT128); +#endif + + /* Packed types - MMX, 3DNow!, SSE and SSE2. 
*/ +#ifdef CHECK_M64_M128 + check_basic_union_size_and_align(__m64, TYPE_SIZE_M64, TYPE_ALIGN_M64); + check_basic_union_size_and_align(__m128, TYPE_SIZE_M128, TYPE_ALIGN_M128); +#endif + + /* Pointer types. The function pointer doesn't work with these macros. */ + check_basic_union_size_and_align(void *, TYPE_SIZE_POINTER, TYPE_ALIGN_POINTER); +} diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_complex_returning.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_complex_returning.c new file mode 100644 index 00000000000..829d86e9ee7 --- /dev/null +++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_complex_returning.c @@ -0,0 +1,104 @@ +/* This is a small test case for returning a complex number. Written by + Andreas Jaeger. */ + +#include "avx512fp16-xmm-check.h" +#include "defines.h" + +#define BUILD_F16_COMPLEX(real, imag) \ + ({ __complex__ _Float16 __retval; \ + __real__ __retval = (real); \ + __imag__ __retval = (imag); \ + __retval; }) + +__complex__ _Float16 +aj_f16_times2 (__complex__ _Float16 x) +{ + __complex__ _Float16 res; + + __real__ res = (2.0 * __real__ x); + __imag__ res = (2.0 * __imag__ x); + + return res; +} + +#define BUILD_F_COMPLEX(real, imag) \ + ({ __complex__ float __retval; \ + __real__ __retval = (real); \ + __imag__ __retval = (imag); \ + __retval; }) + +#define BUILD_D_COMPLEX(real, imag) \ + ({ __complex__ double __retval; \ + __real__ __retval = (real); \ + __imag__ __retval = (imag); \ + __retval; }) + +#define BUILD_LD_COMPLEX(real, imag) \ + ({ __complex__ long double __retval; \ + __real__ __retval = (real); \ + __imag__ __retval = (imag); \ + __retval; }) + +__complex__ float +aj_f_times2 (__complex__ float x) +{ + __complex__ float res; + + __real__ res = (2.0 * __real__ x); + __imag__ res = (2.0 * __imag__ x); + + return res; +} + +__complex__ double +aj_d_times2 (__complex__ double x) +{ + __complex__ double res; + + __real__ res = (2.0 * __real__ x); + __imag__ res = (2.0 * __imag__ x); + + return 
res; +} + +__complex__ long double +aj_ld_times2 (__complex__ long double x) +{ + __complex__ long double res; + + __real__ res = (2.0 * __real__ x); + __imag__ res = (2.0 * __imag__ x); + + return res; +} + +static void +do_test (void) +{ +#ifdef CHECK_COMPLEX + _Complex _Float16 f16c, f16d; + _Complex float fc, fd; + _Complex double dc, dd; + _Complex long double ldc, ldd; + + f16c = BUILD_F16_COMPLEX (2.0, 3.0); + f16d = aj_f16_times2 (f16c); + + assert (__real__ f16d == 4.0f16 && __imag__ f16d == 6.0f16); + + fc = BUILD_F_COMPLEX (2.0f, 3.0f); + fd = aj_f_times2 (fc); + + assert (__real__ fd == 4.0f && __imag__ fd == 6.0f); + + dc = BUILD_D_COMPLEX (2.0, 3.0); + dd = aj_d_times2 (dc); + + assert (__real__ dd == 4.0 && __imag__ dd == 6.0); + + ldc = BUILD_LD_COMPLEX (2.0L, 3.0L); + ldd = aj_ld_times2 (ldc); + + assert (__real__ ldd == 4.0L && __imag__ ldd == 6.0L); +#endif +} diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_m64m128_returning.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_m64m128_returning.c new file mode 100644 index 00000000000..34afee66586 --- /dev/null +++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_m64m128_returning.c @@ -0,0 +1,73 @@ +#include <stdio.h> +#include "avx512fp16-xmm-check.h" +#include "defines.h" +#include "macros.h" +#include "args.h" + +struct IntegerRegisters iregs; +struct FloatRegisters fregs; +unsigned int num_iregs, num_fregs; + +__m64 +fun_test_returning___m64 (void) +{ + volatile_var++; + return (__m64){72,0}; +} + +__m128 +fun_test_returning___m128 (void) +{ + volatile_var++; + return (__m128){73,0,0,0}; +} + +__m128h +fun_test_returning___m128h (void) +{ + volatile_var++; + return (__m128h){1.1f16, 2.2f16, 3.3f16, 4.4f16, 5.5f16, + 6.6f16, 7.7f16, 8.8f16}; +} + +__m64 test_64; +__m128 test_128; +__m128h test_128h; + +static void +do_test (void) +{ + unsigned failed = 0; + XMM_T xmmt1, xmmt2; + + /* We jump through hoops to compare the results as gcc 3.3 does throw + an ICE when trying to 
generate a compare for a == b, when a and b + are of __m64 or __m128 type :-( */ + clear_struct_registers; + test_64 = (__m64){72,0}; + xmmt1._m64[0] = test_64; + xmmt2._m64[0] = WRAP_RET (fun_test_returning___m64)(); + if (xmmt1._long[0] != xmmt2._long[0] + || xmmt1._long[0] != xmm_regs[0]._long[0]) + printf ("fail m64\n"), failed++; + + clear_struct_registers; + test_128 = (__m128){73,0}; + xmmt1._m128[0] = test_128; + xmmt2._m128[0] = WRAP_RET (fun_test_returning___m128)(); + if (xmmt1._long[0] != xmmt2._long[0] + || xmmt1._long[0] != xmm_regs[0]._long[0]) + printf ("fail m128\n"), failed++; + + clear_struct_registers; + test_128h = (__m128h){1.1f16, 2.2f16, 3.3f16, 4.4f16, 5.5f16, + 6.6f16, 7.7f16, 8.8f16}; + xmmt1._m128h[0] = test_128h; + xmmt2._m128h[0] = WRAP_RET (fun_test_returning___m128h)(); + if (xmmt1._long[0] != xmmt2._long[0] + || xmmt1._long[0] != xmm_regs[0]._long[0]) + printf ("fail m128h\n"), failed++; + + if (failed) + abort (); +} diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_floats.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_floats.c new file mode 100644 index 00000000000..678b25c14d3 --- /dev/null +++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_floats.c @@ -0,0 +1,1066 @@ +/* This is an autogenerated file. Do not edit. */ + +#include "avx512fp16-xmm-check.h" +#include "defines.h" +#include "macros.h" +#include "args.h" + +struct IntegerRegisters iregs; +struct FloatRegisters fregs; +unsigned int num_iregs, num_fregs; + +/* This struct holds values for argument checking. 
*/ +struct +{ + _Float16 f0, f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11, f12, f13, f14, + f15, f16, f17, f18, f19, f20, f21, f22, f23; +} values__Float16; + +struct +{ + float f0, f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11, f12, f13, f14, f15, + f16, f17, f18, f19, f20, f21, f22, f23; +} values_float; + +struct +{ + double f0, f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11, f12, f13, f14, f15, + f16, f17, f18, f19, f20, f21, f22, f23; +} values_double; + +struct +{ + ldouble f0, f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11, f12, f13, f14, + f15, f16, f17, f18, f19, f20, f21, f22, f23; +} values_ldouble; + +void +fun_check_float16_passing_8_values (_Float16 f0 ATTRIBUTE_UNUSED, + _Float16 f1 ATTRIBUTE_UNUSED, + _Float16 f2 ATTRIBUTE_UNUSED, + _Float16 f3 ATTRIBUTE_UNUSED, + _Float16 f4 ATTRIBUTE_UNUSED, + _Float16 f5 ATTRIBUTE_UNUSED, + _Float16 f6 ATTRIBUTE_UNUSED, + _Float16 f7 ATTRIBUTE_UNUSED) +{ + /* Check argument values. */ + assert (values__Float16.f0 == f0); + assert (values__Float16.f1 == f1); + assert (values__Float16.f2 == f2); + assert (values__Float16.f3 == f3); + assert (values__Float16.f4 == f4); + assert (values__Float16.f5 == f5); + assert (values__Float16.f6 == f6); + assert (values__Float16.f7 == f7); +} + +void +fun_check_float16_passing_8_regs (_Float16 f0 ATTRIBUTE_UNUSED, + _Float16 f1 ATTRIBUTE_UNUSED, + _Float16 f2 ATTRIBUTE_UNUSED, + _Float16 f3 ATTRIBUTE_UNUSED, + _Float16 f4 ATTRIBUTE_UNUSED, + _Float16 f5 ATTRIBUTE_UNUSED, + _Float16 f6 ATTRIBUTE_UNUSED, + _Float16 f7 ATTRIBUTE_UNUSED) +{ + /* Check register contents. 
*/ + check_float16_arguments; +} + +void +fun_check_float16_passing_16_values (_Float16 f0 ATTRIBUTE_UNUSED, + _Float16 f1 ATTRIBUTE_UNUSED, + _Float16 f2 ATTRIBUTE_UNUSED, + _Float16 f3 ATTRIBUTE_UNUSED, + _Float16 f4 ATTRIBUTE_UNUSED, + _Float16 f5 ATTRIBUTE_UNUSED, + _Float16 f6 ATTRIBUTE_UNUSED, + _Float16 f7 ATTRIBUTE_UNUSED, + _Float16 f8 ATTRIBUTE_UNUSED, + _Float16 f9 ATTRIBUTE_UNUSED, + _Float16 f10 ATTRIBUTE_UNUSED, + _Float16 f11 ATTRIBUTE_UNUSED, + _Float16 f12 ATTRIBUTE_UNUSED, + _Float16 f13 ATTRIBUTE_UNUSED, + _Float16 f14 ATTRIBUTE_UNUSED, + _Float16 f15 ATTRIBUTE_UNUSED) +{ + /* Check argument values. */ + assert (values__Float16.f0 == f0); + assert (values__Float16.f1 == f1); + assert (values__Float16.f2 == f2); + assert (values__Float16.f3 == f3); + assert (values__Float16.f4 == f4); + assert (values__Float16.f5 == f5); + assert (values__Float16.f6 == f6); + assert (values__Float16.f7 == f7); + assert (values__Float16.f8 == f8); + assert (values__Float16.f9 == f9); + assert (values__Float16.f10 == f10); + assert (values__Float16.f11 == f11); + assert (values__Float16.f12 == f12); + assert (values__Float16.f13 == f13); + assert (values__Float16.f14 == f14); + assert (values__Float16.f15 == f15); +} + +void +fun_check_float16_passing_16_regs (_Float16 f0 ATTRIBUTE_UNUSED, + _Float16 f1 ATTRIBUTE_UNUSED, + _Float16 f2 ATTRIBUTE_UNUSED, + _Float16 f3 ATTRIBUTE_UNUSED, + _Float16 f4 ATTRIBUTE_UNUSED, + _Float16 f5 ATTRIBUTE_UNUSED, + _Float16 f6 ATTRIBUTE_UNUSED, + _Float16 f7 ATTRIBUTE_UNUSED, + _Float16 f8 ATTRIBUTE_UNUSED, + _Float16 f9 ATTRIBUTE_UNUSED, + _Float16 f10 ATTRIBUTE_UNUSED, + _Float16 f11 ATTRIBUTE_UNUSED, + _Float16 f12 ATTRIBUTE_UNUSED, + _Float16 f13 ATTRIBUTE_UNUSED, + _Float16 f14 ATTRIBUTE_UNUSED, + _Float16 f15 ATTRIBUTE_UNUSED) +{ + /* Check register contents. 
*/ + check_float16_arguments; +} + +void +fun_check_float16_passing_20_values (_Float16 f0 ATTRIBUTE_UNUSED, + _Float16 f1 ATTRIBUTE_UNUSED, + _Float16 f2 ATTRIBUTE_UNUSED, + _Float16 f3 ATTRIBUTE_UNUSED, + _Float16 f4 ATTRIBUTE_UNUSED, + _Float16 f5 ATTRIBUTE_UNUSED, + _Float16 f6 ATTRIBUTE_UNUSED, + _Float16 f7 ATTRIBUTE_UNUSED, + _Float16 f8 ATTRIBUTE_UNUSED, + _Float16 f9 ATTRIBUTE_UNUSED, + _Float16 f10 ATTRIBUTE_UNUSED, + _Float16 f11 ATTRIBUTE_UNUSED, + _Float16 f12 ATTRIBUTE_UNUSED, + _Float16 f13 ATTRIBUTE_UNUSED, + _Float16 f14 ATTRIBUTE_UNUSED, + _Float16 f15 ATTRIBUTE_UNUSED, + _Float16 f16 ATTRIBUTE_UNUSED, + _Float16 f17 ATTRIBUTE_UNUSED, + _Float16 f18 ATTRIBUTE_UNUSED, + _Float16 f19 ATTRIBUTE_UNUSED) +{ + /* Check argument values. */ + assert (values__Float16.f0 == f0); + assert (values__Float16.f1 == f1); + assert (values__Float16.f2 == f2); + assert (values__Float16.f3 == f3); + assert (values__Float16.f4 == f4); + assert (values__Float16.f5 == f5); + assert (values__Float16.f6 == f6); + assert (values__Float16.f7 == f7); + assert (values__Float16.f8 == f8); + assert (values__Float16.f9 == f9); + assert (values__Float16.f10 == f10); + assert (values__Float16.f11 == f11); + assert (values__Float16.f12 == f12); + assert (values__Float16.f13 == f13); + assert (values__Float16.f14 == f14); + assert (values__Float16.f15 == f15); + assert (values__Float16.f16 == f16); + assert (values__Float16.f17 == f17); + assert (values__Float16.f18 == f18); + assert (values__Float16.f19 == f19); +} + +void +fun_check_float16_passing_20_regs (_Float16 f0 ATTRIBUTE_UNUSED, + _Float16 f1 ATTRIBUTE_UNUSED, + _Float16 f2 ATTRIBUTE_UNUSED, + _Float16 f3 ATTRIBUTE_UNUSED, + _Float16 f4 ATTRIBUTE_UNUSED, + _Float16 f5 ATTRIBUTE_UNUSED, + _Float16 f6 ATTRIBUTE_UNUSED, + _Float16 f7 ATTRIBUTE_UNUSED, + _Float16 f8 ATTRIBUTE_UNUSED, + _Float16 f9 ATTRIBUTE_UNUSED, + _Float16 f10 ATTRIBUTE_UNUSED, + _Float16 f11 ATTRIBUTE_UNUSED, + _Float16 f12 ATTRIBUTE_UNUSED, + _Float16 f13 
ATTRIBUTE_UNUSED, + _Float16 f14 ATTRIBUTE_UNUSED, + _Float16 f15 ATTRIBUTE_UNUSED, + _Float16 f16 ATTRIBUTE_UNUSED, + _Float16 f17 ATTRIBUTE_UNUSED, + _Float16 f18 ATTRIBUTE_UNUSED, + _Float16 f19 ATTRIBUTE_UNUSED) +{ + /* Check register contents. */ + check_float16_arguments; +} + +void +fun_check_float_passing_float8_values (float f0 ATTRIBUTE_UNUSED, + float f1 ATTRIBUTE_UNUSED, + float f2 ATTRIBUTE_UNUSED, + float f3 ATTRIBUTE_UNUSED, + float f4 ATTRIBUTE_UNUSED, + float f5 ATTRIBUTE_UNUSED, + float f6 ATTRIBUTE_UNUSED, + float f7 ATTRIBUTE_UNUSED) +{ + /* Check argument values. */ + assert (values_float.f0 == f0); + assert (values_float.f1 == f1); + assert (values_float.f2 == f2); + assert (values_float.f3 == f3); + assert (values_float.f4 == f4); + assert (values_float.f5 == f5); + assert (values_float.f6 == f6); + assert (values_float.f7 == f7); + +} + +void +fun_check_float_passing_float8_regs (float f0 ATTRIBUTE_UNUSED, + float f1 ATTRIBUTE_UNUSED, + float f2 ATTRIBUTE_UNUSED, + float f3 ATTRIBUTE_UNUSED, + float f4 ATTRIBUTE_UNUSED, + float f5 ATTRIBUTE_UNUSED, + float f6 ATTRIBUTE_UNUSED, + float f7 ATTRIBUTE_UNUSED) +{ + /* Check register contents. */ + check_float_arguments; +} + +void +fun_check_float_passing_float16_values (float f0 ATTRIBUTE_UNUSED, + float f1 ATTRIBUTE_UNUSED, + float f2 ATTRIBUTE_UNUSED, + float f3 ATTRIBUTE_UNUSED, + float f4 ATTRIBUTE_UNUSED, + float f5 ATTRIBUTE_UNUSED, + float f6 ATTRIBUTE_UNUSED, + float f7 ATTRIBUTE_UNUSED, + float f8 ATTRIBUTE_UNUSED, + float f9 ATTRIBUTE_UNUSED, + float f10 ATTRIBUTE_UNUSED, + float f11 ATTRIBUTE_UNUSED, + float f12 ATTRIBUTE_UNUSED, + float f13 ATTRIBUTE_UNUSED, + float f14 ATTRIBUTE_UNUSED, + float f15 ATTRIBUTE_UNUSED) +{ + /* Check argument values. 
*/ + assert (values_float.f0 == f0); + assert (values_float.f1 == f1); + assert (values_float.f2 == f2); + assert (values_float.f3 == f3); + assert (values_float.f4 == f4); + assert (values_float.f5 == f5); + assert (values_float.f6 == f6); + assert (values_float.f7 == f7); + assert (values_float.f8 == f8); + assert (values_float.f9 == f9); + assert (values_float.f10 == f10); + assert (values_float.f11 == f11); + assert (values_float.f12 == f12); + assert (values_float.f13 == f13); + assert (values_float.f14 == f14); + assert (values_float.f15 == f15); + +} + +void +fun_check_float_passing_float16_regs (float f0 ATTRIBUTE_UNUSED, + float f1 ATTRIBUTE_UNUSED, + float f2 ATTRIBUTE_UNUSED, + float f3 ATTRIBUTE_UNUSED, + float f4 ATTRIBUTE_UNUSED, + float f5 ATTRIBUTE_UNUSED, + float f6 ATTRIBUTE_UNUSED, + float f7 ATTRIBUTE_UNUSED, + float f8 ATTRIBUTE_UNUSED, + float f9 ATTRIBUTE_UNUSED, + float f10 ATTRIBUTE_UNUSED, + float f11 ATTRIBUTE_UNUSED, + float f12 ATTRIBUTE_UNUSED, + float f13 ATTRIBUTE_UNUSED, + float f14 ATTRIBUTE_UNUSED, + float f15 ATTRIBUTE_UNUSED) +{ + /* Check register contents. */ + check_float_arguments; +} + +void +fun_check_float_passing_float20_values (float f0 ATTRIBUTE_UNUSED, + float f1 ATTRIBUTE_UNUSED, + float f2 ATTRIBUTE_UNUSED, + float f3 ATTRIBUTE_UNUSED, + float f4 ATTRIBUTE_UNUSED, + float f5 ATTRIBUTE_UNUSED, + float f6 ATTRIBUTE_UNUSED, + float f7 ATTRIBUTE_UNUSED, + float f8 ATTRIBUTE_UNUSED, + float f9 ATTRIBUTE_UNUSED, + float f10 ATTRIBUTE_UNUSED, + float f11 ATTRIBUTE_UNUSED, + float f12 ATTRIBUTE_UNUSED, + float f13 ATTRIBUTE_UNUSED, + float f14 ATTRIBUTE_UNUSED, + float f15 ATTRIBUTE_UNUSED, + float f16 ATTRIBUTE_UNUSED, + float f17 ATTRIBUTE_UNUSED, + float f18 ATTRIBUTE_UNUSED, + float f19 ATTRIBUTE_UNUSED) +{ + /* Check argument values. 
*/ + assert (values_float.f0 == f0); + assert (values_float.f1 == f1); + assert (values_float.f2 == f2); + assert (values_float.f3 == f3); + assert (values_float.f4 == f4); + assert (values_float.f5 == f5); + assert (values_float.f6 == f6); + assert (values_float.f7 == f7); + assert (values_float.f8 == f8); + assert (values_float.f9 == f9); + assert (values_float.f10 == f10); + assert (values_float.f11 == f11); + assert (values_float.f12 == f12); + assert (values_float.f13 == f13); + assert (values_float.f14 == f14); + assert (values_float.f15 == f15); + assert (values_float.f16 == f16); + assert (values_float.f17 == f17); + assert (values_float.f18 == f18); + assert (values_float.f19 == f19); + +} + +void +fun_check_float_passing_float20_regs (float f0 ATTRIBUTE_UNUSED, + float f1 ATTRIBUTE_UNUSED, + float f2 ATTRIBUTE_UNUSED, + float f3 ATTRIBUTE_UNUSED, + float f4 ATTRIBUTE_UNUSED, + float f5 ATTRIBUTE_UNUSED, + float f6 ATTRIBUTE_UNUSED, + float f7 ATTRIBUTE_UNUSED, + float f8 ATTRIBUTE_UNUSED, + float f9 ATTRIBUTE_UNUSED, + float f10 ATTRIBUTE_UNUSED, + float f11 ATTRIBUTE_UNUSED, + float f12 ATTRIBUTE_UNUSED, + float f13 ATTRIBUTE_UNUSED, + float f14 ATTRIBUTE_UNUSED, + float f15 ATTRIBUTE_UNUSED, + float f16 ATTRIBUTE_UNUSED, + float f17 ATTRIBUTE_UNUSED, + float f18 ATTRIBUTE_UNUSED, + float f19 ATTRIBUTE_UNUSED) +{ + /* Check register contents. */ + check_float_arguments; +} + +void +fun_check_float_passing_double8_values (double f0 ATTRIBUTE_UNUSED, + double f1 ATTRIBUTE_UNUSED, + double f2 ATTRIBUTE_UNUSED, + double f3 ATTRIBUTE_UNUSED, + double f4 ATTRIBUTE_UNUSED, + double f5 ATTRIBUTE_UNUSED, + double f6 ATTRIBUTE_UNUSED, + double f7 ATTRIBUTE_UNUSED) +{ + /* Check argument values. 
*/ + assert (values_double.f0 == f0); + assert (values_double.f1 == f1); + assert (values_double.f2 == f2); + assert (values_double.f3 == f3); + assert (values_double.f4 == f4); + assert (values_double.f5 == f5); + assert (values_double.f6 == f6); + assert (values_double.f7 == f7); + +} + +void +fun_check_float_passing_double8_regs (double f0 ATTRIBUTE_UNUSED, + double f1 ATTRIBUTE_UNUSED, + double f2 ATTRIBUTE_UNUSED, + double f3 ATTRIBUTE_UNUSED, + double f4 ATTRIBUTE_UNUSED, + double f5 ATTRIBUTE_UNUSED, + double f6 ATTRIBUTE_UNUSED, + double f7 ATTRIBUTE_UNUSED) +{ + /* Check register contents. */ + check_double_arguments; +} + +void +fun_check_float_passing_double16_values (double f0 ATTRIBUTE_UNUSED, + double f1 ATTRIBUTE_UNUSED, + double f2 ATTRIBUTE_UNUSED, + double f3 ATTRIBUTE_UNUSED, + double f4 ATTRIBUTE_UNUSED, + double f5 ATTRIBUTE_UNUSED, + double f6 ATTRIBUTE_UNUSED, + double f7 ATTRIBUTE_UNUSED, + double f8 ATTRIBUTE_UNUSED, + double f9 ATTRIBUTE_UNUSED, + double f10 ATTRIBUTE_UNUSED, + double f11 ATTRIBUTE_UNUSED, + double f12 ATTRIBUTE_UNUSED, + double f13 ATTRIBUTE_UNUSED, + double f14 ATTRIBUTE_UNUSED, + double f15 ATTRIBUTE_UNUSED) +{ + /* Check argument values. 
*/ + assert (values_double.f0 == f0); + assert (values_double.f1 == f1); + assert (values_double.f2 == f2); + assert (values_double.f3 == f3); + assert (values_double.f4 == f4); + assert (values_double.f5 == f5); + assert (values_double.f6 == f6); + assert (values_double.f7 == f7); + assert (values_double.f8 == f8); + assert (values_double.f9 == f9); + assert (values_double.f10 == f10); + assert (values_double.f11 == f11); + assert (values_double.f12 == f12); + assert (values_double.f13 == f13); + assert (values_double.f14 == f14); + assert (values_double.f15 == f15); + +} + +void +fun_check_float_passing_double16_regs (double f0 ATTRIBUTE_UNUSED, + double f1 ATTRIBUTE_UNUSED, + double f2 ATTRIBUTE_UNUSED, + double f3 ATTRIBUTE_UNUSED, + double f4 ATTRIBUTE_UNUSED, + double f5 ATTRIBUTE_UNUSED, + double f6 ATTRIBUTE_UNUSED, + double f7 ATTRIBUTE_UNUSED, + double f8 ATTRIBUTE_UNUSED, + double f9 ATTRIBUTE_UNUSED, + double f10 ATTRIBUTE_UNUSED, + double f11 ATTRIBUTE_UNUSED, + double f12 ATTRIBUTE_UNUSED, + double f13 ATTRIBUTE_UNUSED, + double f14 ATTRIBUTE_UNUSED, + double f15 ATTRIBUTE_UNUSED) +{ + /* Check register contents. */ + check_double_arguments; +} + +void +fun_check_float_passing_double20_values (double f0 ATTRIBUTE_UNUSED, + double f1 ATTRIBUTE_UNUSED, + double f2 ATTRIBUTE_UNUSED, + double f3 ATTRIBUTE_UNUSED, + double f4 ATTRIBUTE_UNUSED, + double f5 ATTRIBUTE_UNUSED, + double f6 ATTRIBUTE_UNUSED, + double f7 ATTRIBUTE_UNUSED, + double f8 ATTRIBUTE_UNUSED, + double f9 ATTRIBUTE_UNUSED, + double f10 ATTRIBUTE_UNUSED, + double f11 ATTRIBUTE_UNUSED, + double f12 ATTRIBUTE_UNUSED, + double f13 ATTRIBUTE_UNUSED, + double f14 ATTRIBUTE_UNUSED, + double f15 ATTRIBUTE_UNUSED, + double f16 ATTRIBUTE_UNUSED, + double f17 ATTRIBUTE_UNUSED, + double f18 ATTRIBUTE_UNUSED, + double f19 ATTRIBUTE_UNUSED) +{ + /* Check argument values. 
*/ + assert (values_double.f0 == f0); + assert (values_double.f1 == f1); + assert (values_double.f2 == f2); + assert (values_double.f3 == f3); + assert (values_double.f4 == f4); + assert (values_double.f5 == f5); + assert (values_double.f6 == f6); + assert (values_double.f7 == f7); + assert (values_double.f8 == f8); + assert (values_double.f9 == f9); + assert (values_double.f10 == f10); + assert (values_double.f11 == f11); + assert (values_double.f12 == f12); + assert (values_double.f13 == f13); + assert (values_double.f14 == f14); + assert (values_double.f15 == f15); + assert (values_double.f16 == f16); + assert (values_double.f17 == f17); + assert (values_double.f18 == f18); + assert (values_double.f19 == f19); + +} + +void +fun_check_float_passing_double20_regs (double f0 ATTRIBUTE_UNUSED, + double f1 ATTRIBUTE_UNUSED, + double f2 ATTRIBUTE_UNUSED, + double f3 ATTRIBUTE_UNUSED, + double f4 ATTRIBUTE_UNUSED, + double f5 ATTRIBUTE_UNUSED, + double f6 ATTRIBUTE_UNUSED, + double f7 ATTRIBUTE_UNUSED, + double f8 ATTRIBUTE_UNUSED, + double f9 ATTRIBUTE_UNUSED, + double f10 ATTRIBUTE_UNUSED, + double f11 ATTRIBUTE_UNUSED, + double f12 ATTRIBUTE_UNUSED, + double f13 ATTRIBUTE_UNUSED, + double f14 ATTRIBUTE_UNUSED, + double f15 ATTRIBUTE_UNUSED, + double f16 ATTRIBUTE_UNUSED, + double f17 ATTRIBUTE_UNUSED, + double f18 ATTRIBUTE_UNUSED, + double f19 ATTRIBUTE_UNUSED) +{ + /* Check register contents. */ + check_double_arguments; +} + +void +fun_check_x87_passing_ldouble8_values (ldouble f0 ATTRIBUTE_UNUSED, + ldouble f1 ATTRIBUTE_UNUSED, + ldouble f2 ATTRIBUTE_UNUSED, + ldouble f3 ATTRIBUTE_UNUSED, + ldouble f4 ATTRIBUTE_UNUSED, + ldouble f5 ATTRIBUTE_UNUSED, + ldouble f6 ATTRIBUTE_UNUSED, + ldouble f7 ATTRIBUTE_UNUSED) +{ + /* Check argument values. 
*/ + assert (values_ldouble.f0 == f0); + assert (values_ldouble.f1 == f1); + assert (values_ldouble.f2 == f2); + assert (values_ldouble.f3 == f3); + assert (values_ldouble.f4 == f4); + assert (values_ldouble.f5 == f5); + assert (values_ldouble.f6 == f6); + assert (values_ldouble.f7 == f7); + +} + +void +fun_check_x87_passing_ldouble8_regs (ldouble f0 ATTRIBUTE_UNUSED, + ldouble f1 ATTRIBUTE_UNUSED, + ldouble f2 ATTRIBUTE_UNUSED, + ldouble f3 ATTRIBUTE_UNUSED, + ldouble f4 ATTRIBUTE_UNUSED, + ldouble f5 ATTRIBUTE_UNUSED, + ldouble f6 ATTRIBUTE_UNUSED, + ldouble f7 ATTRIBUTE_UNUSED) +{ + /* Check register contents. */ + check_ldouble_arguments; +} + +void +fun_check_x87_passing_ldouble16_values (ldouble f0 ATTRIBUTE_UNUSED, + ldouble f1 ATTRIBUTE_UNUSED, + ldouble f2 ATTRIBUTE_UNUSED, + ldouble f3 ATTRIBUTE_UNUSED, + ldouble f4 ATTRIBUTE_UNUSED, + ldouble f5 ATTRIBUTE_UNUSED, + ldouble f6 ATTRIBUTE_UNUSED, + ldouble f7 ATTRIBUTE_UNUSED, + ldouble f8 ATTRIBUTE_UNUSED, + ldouble f9 ATTRIBUTE_UNUSED, + ldouble f10 ATTRIBUTE_UNUSED, + ldouble f11 ATTRIBUTE_UNUSED, + ldouble f12 ATTRIBUTE_UNUSED, + ldouble f13 ATTRIBUTE_UNUSED, + ldouble f14 ATTRIBUTE_UNUSED, + ldouble f15 ATTRIBUTE_UNUSED) +{ + /* Check argument values. 
*/ + assert (values_ldouble.f0 == f0); + assert (values_ldouble.f1 == f1); + assert (values_ldouble.f2 == f2); + assert (values_ldouble.f3 == f3); + assert (values_ldouble.f4 == f4); + assert (values_ldouble.f5 == f5); + assert (values_ldouble.f6 == f6); + assert (values_ldouble.f7 == f7); + assert (values_ldouble.f8 == f8); + assert (values_ldouble.f9 == f9); + assert (values_ldouble.f10 == f10); + assert (values_ldouble.f11 == f11); + assert (values_ldouble.f12 == f12); + assert (values_ldouble.f13 == f13); + assert (values_ldouble.f14 == f14); + assert (values_ldouble.f15 == f15); + +} + +void +fun_check_x87_passing_ldouble16_regs (ldouble f0 ATTRIBUTE_UNUSED, + ldouble f1 ATTRIBUTE_UNUSED, + ldouble f2 ATTRIBUTE_UNUSED, + ldouble f3 ATTRIBUTE_UNUSED, + ldouble f4 ATTRIBUTE_UNUSED, + ldouble f5 ATTRIBUTE_UNUSED, + ldouble f6 ATTRIBUTE_UNUSED, + ldouble f7 ATTRIBUTE_UNUSED, + ldouble f8 ATTRIBUTE_UNUSED, + ldouble f9 ATTRIBUTE_UNUSED, + ldouble f10 ATTRIBUTE_UNUSED, + ldouble f11 ATTRIBUTE_UNUSED, + ldouble f12 ATTRIBUTE_UNUSED, + ldouble f13 ATTRIBUTE_UNUSED, + ldouble f14 ATTRIBUTE_UNUSED, + ldouble f15 ATTRIBUTE_UNUSED) +{ + /* Check register contents. */ + check_ldouble_arguments; +} + +void +fun_check_x87_passing_ldouble20_values (ldouble f0 ATTRIBUTE_UNUSED, + ldouble f1 ATTRIBUTE_UNUSED, + ldouble f2 ATTRIBUTE_UNUSED, + ldouble f3 ATTRIBUTE_UNUSED, + ldouble f4 ATTRIBUTE_UNUSED, + ldouble f5 ATTRIBUTE_UNUSED, + ldouble f6 ATTRIBUTE_UNUSED, + ldouble f7 ATTRIBUTE_UNUSED, + ldouble f8 ATTRIBUTE_UNUSED, + ldouble f9 ATTRIBUTE_UNUSED, + ldouble f10 ATTRIBUTE_UNUSED, + ldouble f11 ATTRIBUTE_UNUSED, + ldouble f12 ATTRIBUTE_UNUSED, + ldouble f13 ATTRIBUTE_UNUSED, + ldouble f14 ATTRIBUTE_UNUSED, + ldouble f15 ATTRIBUTE_UNUSED, + ldouble f16 ATTRIBUTE_UNUSED, + ldouble f17 ATTRIBUTE_UNUSED, + ldouble f18 ATTRIBUTE_UNUSED, + ldouble f19 ATTRIBUTE_UNUSED) +{ + /* Check argument values. 
*/ + assert (values_ldouble.f0 == f0); + assert (values_ldouble.f1 == f1); + assert (values_ldouble.f2 == f2); + assert (values_ldouble.f3 == f3); + assert (values_ldouble.f4 == f4); + assert (values_ldouble.f5 == f5); + assert (values_ldouble.f6 == f6); + assert (values_ldouble.f7 == f7); + assert (values_ldouble.f8 == f8); + assert (values_ldouble.f9 == f9); + assert (values_ldouble.f10 == f10); + assert (values_ldouble.f11 == f11); + assert (values_ldouble.f12 == f12); + assert (values_ldouble.f13 == f13); + assert (values_ldouble.f14 == f14); + assert (values_ldouble.f15 == f15); + assert (values_ldouble.f16 == f16); + assert (values_ldouble.f17 == f17); + assert (values_ldouble.f18 == f18); + assert (values_ldouble.f19 == f19); + +} + +void +fun_check_x87_passing_ldouble20_regs (ldouble f0 ATTRIBUTE_UNUSED, + ldouble f1 ATTRIBUTE_UNUSED, + ldouble f2 ATTRIBUTE_UNUSED, + ldouble f3 ATTRIBUTE_UNUSED, + ldouble f4 ATTRIBUTE_UNUSED, + ldouble f5 ATTRIBUTE_UNUSED, + ldouble f6 ATTRIBUTE_UNUSED, + ldouble f7 ATTRIBUTE_UNUSED, + ldouble f8 ATTRIBUTE_UNUSED, + ldouble f9 ATTRIBUTE_UNUSED, + ldouble f10 ATTRIBUTE_UNUSED, + ldouble f11 ATTRIBUTE_UNUSED, + ldouble f12 ATTRIBUTE_UNUSED, + ldouble f13 ATTRIBUTE_UNUSED, + ldouble f14 ATTRIBUTE_UNUSED, + ldouble f15 ATTRIBUTE_UNUSED, + ldouble f16 ATTRIBUTE_UNUSED, + ldouble f17 ATTRIBUTE_UNUSED, + ldouble f18 ATTRIBUTE_UNUSED, + ldouble f19 ATTRIBUTE_UNUSED) +{ + /* Check register contents. 
*/ + check_ldouble_arguments; +} + +#define def_check_float16_passing8(_f0, _f1, _f2, _f3, _f4, _f5, _f6,\ + _f7, _func1, _func2, TYPE) \ + values_ ## TYPE .f0 = _f0; \ + values_ ## TYPE .f1 = _f1; \ + values_ ## TYPE .f2 = _f2; \ + values_ ## TYPE .f3 = _f3; \ + values_ ## TYPE .f4 = _f4; \ + values_ ## TYPE .f5 = _f5; \ + values_ ## TYPE .f6 = _f6; \ + values_ ## TYPE .f7 = _f7; \ + WRAP_CALL(_func1) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7); \ + clear_float_registers; \ + fregs.F0._ ## TYPE [0] = _f0; \ + fregs.F1._ ## TYPE [0] = _f1; \ + fregs.F2._ ## TYPE [0] = _f2; \ + fregs.F3._ ## TYPE [0] = _f3; \ + fregs.F4._ ## TYPE [0] = _f4; \ + fregs.F5._ ## TYPE [0] = _f5; \ + fregs.F6._ ## TYPE [0] = _f6; \ + fregs.F7._ ## TYPE [0] = _f7; \ + num_fregs = 8; \ + WRAP_CALL(_func2) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7); + +#define def_check_float16_passing16(_f0, _f1, _f2, _f3, _f4, _f5, _f6, \ + _f7, _f8, _f9, _f10, _f11, _f12, _f13, \ + _f14, _f15, _func1, _func2, TYPE) \ + values_ ## TYPE .f0 = _f0; \ + values_ ## TYPE .f1 = _f1; \ + values_ ## TYPE .f2 = _f2; \ + values_ ## TYPE .f3 = _f3; \ + values_ ## TYPE .f4 = _f4; \ + values_ ## TYPE .f5 = _f5; \ + values_ ## TYPE .f6 = _f6; \ + values_ ## TYPE .f7 = _f7; \ + values_ ## TYPE .f8 = _f8; \ + values_ ## TYPE .f9 = _f9; \ + values_ ## TYPE .f10 = _f10; \ + values_ ## TYPE .f11 = _f11; \ + values_ ## TYPE .f12 = _f12; \ + values_ ## TYPE .f13 = _f13; \ + values_ ## TYPE .f14 = _f14; \ + values_ ## TYPE .f15 = _f15; \ + WRAP_CALL(_func1) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, _f9, \ + _f10, _f11, _f12, _f13, _f14, _f15); \ + clear_float_registers; \ + fregs.F0._ ## TYPE [0] = _f0; \ + fregs.F1._ ## TYPE [0] = _f1; \ + fregs.F2._ ## TYPE [0] = _f2; \ + fregs.F3._ ## TYPE [0] = _f3; \ + fregs.F4._ ## TYPE [0] = _f4; \ + fregs.F5._ ## TYPE [0] = _f5; \ + fregs.F6._ ## TYPE [0] = _f6; \ + fregs.F7._ ## TYPE [0] = _f7; \ + num_fregs = 8; \ + WRAP_CALL(_func2) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, _f9, \ 
+ _f10, _f11, _f12, _f13, _f14, _f15); + +#define def_check_float16_passing20(_f0, _f1, _f2, _f3, _f4, _f5, _f6, \ + _f7, _f8, _f9, _f10, _f11, _f12, \ + _f13, _f14, _f15, _f16, _f17, \ + _f18, _f19, _func1, _func2, TYPE) \ + values_ ## TYPE .f0 = _f0; \ + values_ ## TYPE .f1 = _f1; \ + values_ ## TYPE .f2 = _f2; \ + values_ ## TYPE .f3 = _f3; \ + values_ ## TYPE .f4 = _f4; \ + values_ ## TYPE .f5 = _f5; \ + values_ ## TYPE .f6 = _f6; \ + values_ ## TYPE .f7 = _f7; \ + values_ ## TYPE .f8 = _f8; \ + values_ ## TYPE .f9 = _f9; \ + values_ ## TYPE .f10 = _f10; \ + values_ ## TYPE .f11 = _f11; \ + values_ ## TYPE .f12 = _f12; \ + values_ ## TYPE .f13 = _f13; \ + values_ ## TYPE .f14 = _f14; \ + values_ ## TYPE .f15 = _f15; \ + values_ ## TYPE .f16 = _f16; \ + values_ ## TYPE .f17 = _f17; \ + values_ ## TYPE .f18 = _f18; \ + values_ ## TYPE .f19 = _f19; \ + WRAP_CALL(_func1) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, \ + _f9, _f10, _f11, _f12, _f13, _f14, _f15, _f16, \ + _f17, _f18, _f19); \ + clear_float_registers; \ + fregs.F0._ ## TYPE [0] = _f0; \ + fregs.F1._ ## TYPE [0] = _f1; \ + fregs.F2._ ## TYPE [0] = _f2; \ + fregs.F3._ ## TYPE [0] = _f3; \ + fregs.F4._ ## TYPE [0] = _f4; \ + fregs.F5._ ## TYPE [0] = _f5; \ + fregs.F6._ ## TYPE [0] = _f6; \ + fregs.F7._ ## TYPE [0] = _f7; \ + num_fregs = 8; \ + WRAP_CALL(_func2) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, _f9, \ + _f10, _f11, _f12, _f13, _f14, _f15, _f16, _f17, \ + _f18, _f19); + + +#define def_check_float_passing8(_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _func1, _func2, TYPE) \ + values_ ## TYPE .f0 = _f0; \ + values_ ## TYPE .f1 = _f1; \ + values_ ## TYPE .f2 = _f2; \ + values_ ## TYPE .f3 = _f3; \ + values_ ## TYPE .f4 = _f4; \ + values_ ## TYPE .f5 = _f5; \ + values_ ## TYPE .f6 = _f6; \ + values_ ## TYPE .f7 = _f7; \ + WRAP_CALL(_func1) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7); \ + \ + clear_float_registers; \ + fregs.F0._ ## TYPE [0] = _f0; \ + fregs.F1._ ## TYPE [0] = _f1; \ + fregs.F2._ ## TYPE 
[0] = _f2; \ + fregs.F3._ ## TYPE [0] = _f3; \ + fregs.F4._ ## TYPE [0] = _f4; \ + fregs.F5._ ## TYPE [0] = _f5; \ + fregs.F6._ ## TYPE [0] = _f6; \ + fregs.F7._ ## TYPE [0] = _f7; \ + num_fregs = 8; \ + WRAP_CALL(_func2) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7); + +#define def_check_float_passing16(_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, _f9, _f10, _f11, _f12, _f13, _f14, _f15, _func1, _func2, TYPE) \ + values_ ## TYPE .f0 = _f0; \ + values_ ## TYPE .f1 = _f1; \ + values_ ## TYPE .f2 = _f2; \ + values_ ## TYPE .f3 = _f3; \ + values_ ## TYPE .f4 = _f4; \ + values_ ## TYPE .f5 = _f5; \ + values_ ## TYPE .f6 = _f6; \ + values_ ## TYPE .f7 = _f7; \ + values_ ## TYPE .f8 = _f8; \ + values_ ## TYPE .f9 = _f9; \ + values_ ## TYPE .f10 = _f10; \ + values_ ## TYPE .f11 = _f11; \ + values_ ## TYPE .f12 = _f12; \ + values_ ## TYPE .f13 = _f13; \ + values_ ## TYPE .f14 = _f14; \ + values_ ## TYPE .f15 = _f15; \ + WRAP_CALL(_func1) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, _f9, _f10, _f11, _f12, _f13, _f14, _f15); \ + \ + clear_float_registers; \ + fregs.F0._ ## TYPE [0] = _f0; \ + fregs.F1._ ## TYPE [0] = _f1; \ + fregs.F2._ ## TYPE [0] = _f2; \ + fregs.F3._ ## TYPE [0] = _f3; \ + fregs.F4._ ## TYPE [0] = _f4; \ + fregs.F5._ ## TYPE [0] = _f5; \ + fregs.F6._ ## TYPE [0] = _f6; \ + fregs.F7._ ## TYPE [0] = _f7; \ + num_fregs = 8; \ + WRAP_CALL(_func2) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, _f9, _f10, _f11, _f12, _f13, _f14, _f15); + +#define def_check_float_passing20(_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, _f9, _f10, _f11, _f12, _f13, _f14, _f15, _f16, _f17, _f18, _f19, _func1, _func2, TYPE) \ + values_ ## TYPE .f0 = _f0; \ + values_ ## TYPE .f1 = _f1; \ + values_ ## TYPE .f2 = _f2; \ + values_ ## TYPE .f3 = _f3; \ + values_ ## TYPE .f4 = _f4; \ + values_ ## TYPE .f5 = _f5; \ + values_ ## TYPE .f6 = _f6; \ + values_ ## TYPE .f7 = _f7; \ + values_ ## TYPE .f8 = _f8; \ + values_ ## TYPE .f9 = _f9; \ + values_ ## TYPE .f10 = _f10; \ + values_ ## TYPE .f11 = 
_f11; \ + values_ ## TYPE .f12 = _f12; \ + values_ ## TYPE .f13 = _f13; \ + values_ ## TYPE .f14 = _f14; \ + values_ ## TYPE .f15 = _f15; \ + values_ ## TYPE .f16 = _f16; \ + values_ ## TYPE .f17 = _f17; \ + values_ ## TYPE .f18 = _f18; \ + values_ ## TYPE .f19 = _f19; \ + WRAP_CALL(_func1) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, _f9, _f10, _f11, _f12, _f13, _f14, _f15, _f16, _f17, _f18, _f19); \ + \ + clear_float_registers; \ + fregs.F0._ ## TYPE [0] = _f0; \ + fregs.F1._ ## TYPE [0] = _f1; \ + fregs.F2._ ## TYPE [0] = _f2; \ + fregs.F3._ ## TYPE [0] = _f3; \ + fregs.F4._ ## TYPE [0] = _f4; \ + fregs.F5._ ## TYPE [0] = _f5; \ + fregs.F6._ ## TYPE [0] = _f6; \ + fregs.F7._ ## TYPE [0] = _f7; \ + num_fregs = 8; \ + WRAP_CALL(_func2) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, _f9, _f10, _f11, _f12, _f13, _f14, _f15, _f16, _f17, _f18, _f19); + +#define def_check_x87_passing8(_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _func1, _func2, TYPE) \ + values_ ## TYPE .f0 = _f0; \ + values_ ## TYPE .f1 = _f1; \ + values_ ## TYPE .f2 = _f2; \ + values_ ## TYPE .f3 = _f3; \ + values_ ## TYPE .f4 = _f4; \ + values_ ## TYPE .f5 = _f5; \ + values_ ## TYPE .f6 = _f6; \ + values_ ## TYPE .f7 = _f7; \ + WRAP_CALL(_func1) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7); \ + \ + clear_x87_registers; \ + num_fregs = 0; \ + WRAP_CALL(_func2) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7); + +#define def_check_x87_passing16(_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, _f9, _f10, _f11, _f12, _f13, _f14, _f15, _func1, _func2, TYPE) \ + values_ ## TYPE .f0 = _f0; \ + values_ ## TYPE .f1 = _f1; \ + values_ ## TYPE .f2 = _f2; \ + values_ ## TYPE .f3 = _f3; \ + values_ ## TYPE .f4 = _f4; \ + values_ ## TYPE .f5 = _f5; \ + values_ ## TYPE .f6 = _f6; \ + values_ ## TYPE .f7 = _f7; \ + values_ ## TYPE .f8 = _f8; \ + values_ ## TYPE .f9 = _f9; \ + values_ ## TYPE .f10 = _f10; \ + values_ ## TYPE .f11 = _f11; \ + values_ ## TYPE .f12 = _f12; \ + values_ ## TYPE .f13 = _f13; \ + values_ ## TYPE .f14 = _f14; \ 
+ values_ ## TYPE .f15 = _f15; \ + WRAP_CALL(_func1) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, _f9, _f10, _f11, _f12, _f13, _f14, _f15); \ + \ + clear_x87_registers; \ + num_fregs = 0; \ + WRAP_CALL(_func2) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, _f9, _f10, _f11, _f12, _f13, _f14, _f15); + +#define def_check_x87_passing20(_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, _f9, _f10, _f11, _f12, _f13, _f14, _f15, _f16, _f17, _f18, _f19, _func1, _func2, TYPE) \ + values_ ## TYPE .f0 = _f0; \ + values_ ## TYPE .f1 = _f1; \ + values_ ## TYPE .f2 = _f2; \ + values_ ## TYPE .f3 = _f3; \ + values_ ## TYPE .f4 = _f4; \ + values_ ## TYPE .f5 = _f5; \ + values_ ## TYPE .f6 = _f6; \ + values_ ## TYPE .f7 = _f7; \ + values_ ## TYPE .f8 = _f8; \ + values_ ## TYPE .f9 = _f9; \ + values_ ## TYPE .f10 = _f10; \ + values_ ## TYPE .f11 = _f11; \ + values_ ## TYPE .f12 = _f12; \ + values_ ## TYPE .f13 = _f13; \ + values_ ## TYPE .f14 = _f14; \ + values_ ## TYPE .f15 = _f15; \ + values_ ## TYPE .f16 = _f16; \ + values_ ## TYPE .f17 = _f17; \ + values_ ## TYPE .f18 = _f18; \ + values_ ## TYPE .f19 = _f19; \ + WRAP_CALL(_func1) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, _f9, _f10, _f11, _f12, _f13, _f14, _f15, _f16, _f17, _f18, _f19); \ + \ + clear_x87_registers; \ + num_fregs = 0; \ + WRAP_CALL(_func2) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, _f9, _f10, _f11, _f12, _f13, _f14, _f15, _f16, _f17, _f18, _f19); + +void +test_float16_on_stack () +{ + def_check_float16_passing8 (32, 33, 34, 35, 36, 37, 38, 39, + fun_check_float16_passing_8_values, + fun_check_float16_passing_8_regs, _Float16); + + def_check_float16_passing16 (32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, + 44, 45, 46, 47, + fun_check_float16_passing_16_values, + fun_check_float16_passing_16_regs, _Float16); +} + +void +test_too_many_float16 () +{ + def_check_float16_passing20 (32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, + 44, 45, 46, 47, 48, 49, 50, 51, + fun_check_float16_passing_20_values, + 
fun_check_float16_passing_20_regs, _Float16); +} + +void +test_floats_on_stack () +{ + def_check_float_passing8 (32, 33, 34, 35, 36, 37, 38, 39, + fun_check_float_passing_float8_values, + fun_check_float_passing_float8_regs, float); + + def_check_float_passing16 (32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, + 44, 45, 46, 47, + fun_check_float_passing_float16_values, + fun_check_float_passing_float16_regs, float); +} + +void +test_too_many_floats () +{ + def_check_float_passing20 (32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, + 44, 45, 46, 47, 48, 49, 50, 51, + fun_check_float_passing_float20_values, + fun_check_float_passing_float20_regs, float); +} + +void +test_doubles_on_stack () +{ + def_check_float_passing8 (32, 33, 34, 35, 36, 37, 38, 39, + fun_check_float_passing_double8_values, + fun_check_float_passing_double8_regs, double); + + def_check_float_passing16 (32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, + 44, 45, 46, 47, + fun_check_float_passing_double16_values, + fun_check_float_passing_double16_regs, double); +} + +void +test_too_many_doubles () +{ + def_check_float_passing20 (32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, + 44, 45, 46, 47, 48, 49, 50, 51, + fun_check_float_passing_double20_values, + fun_check_float_passing_double20_regs, double); +} + +void +test_long_doubles_on_stack () +{ + def_check_x87_passing8 (32, 33, 34, 35, 36, 37, 38, 39, + fun_check_x87_passing_ldouble8_values, + fun_check_x87_passing_ldouble8_regs, ldouble); +} + +void +test_too_many_long_doubles () +{ + def_check_x87_passing20 (32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, + 45, 46, 47, 48, 49, 50, 51, + fun_check_x87_passing_ldouble20_values, + fun_check_x87_passing_ldouble20_regs, ldouble); +} + +void +test_float128s_on_stack () +{ +} + +void +test_too_many_float128s () +{ +} + + +static void +do_test (void) +{ + test_float16_on_stack (); + test_too_many_float16 (); + test_floats_on_stack (); + test_too_many_floats (); + test_doubles_on_stack (); + 
test_too_many_doubles ();
+  test_long_doubles_on_stack ();
+  test_too_many_long_doubles ();
+  test_float128s_on_stack ();
+  test_too_many_float128s ();
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_m64m128.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_m64m128.c
new file mode 100644
index 00000000000..66c27aef7af
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_m64m128.c
@@ -0,0 +1,510 @@
+#include <stdio.h>
+#include "avx512fp16-xmm-check.h"
+#include "defines.h"
+#include "macros.h"
+#include "args.h"
+
+struct IntegerRegisters iregs;
+struct FloatRegisters fregs;
+unsigned int num_iregs, num_fregs;
+
+/* This struct holds values for argument checking.  */
+struct
+{
+  XMM_T i0, i1, i2, i3, i4, i5, i6, i7, i8, i9, i10, i11, i12, i13, i14, i15,
+    i16, i17, i18, i19, i20, i21, i22, i23;
+} values;
+
+char *pass;
+int failed = 0;
+
+#undef assert
+#define assert(c) do { \
+  if (!(c)) {failed++; printf ("failed %s\n", pass); } \
+} while (0)
+
+#define compare(X1,X2,T) do { \
+  assert (memcmp (&X1, &X2, sizeof (T)) == 0); \
+} while (0)
+
+void
+fun_check_passing_m64_8_values (__m64 i0 ATTRIBUTE_UNUSED,
+				__m64 i1 ATTRIBUTE_UNUSED,
+				__m64 i2 ATTRIBUTE_UNUSED,
+				__m64 i3 ATTRIBUTE_UNUSED,
+				__m64 i4 ATTRIBUTE_UNUSED,
+				__m64 i5 ATTRIBUTE_UNUSED,
+				__m64 i6 ATTRIBUTE_UNUSED,
+				__m64 i7 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  compare (values.i0, i0, __m64);
+  compare (values.i1, i1, __m64);
+  compare (values.i2, i2, __m64);
+  compare (values.i3, i3, __m64);
+  compare (values.i4, i4, __m64);
+  compare (values.i5, i5, __m64);
+  compare (values.i6, i6, __m64);
+  compare (values.i7, i7, __m64);
+}
+
+void
+fun_check_passing_m64_8_regs (__m64 i0 ATTRIBUTE_UNUSED,
+			      __m64 i1 ATTRIBUTE_UNUSED,
+			      __m64 i2 ATTRIBUTE_UNUSED,
+			      __m64 i3 ATTRIBUTE_UNUSED,
+			      __m64 i4 ATTRIBUTE_UNUSED,
+			      __m64 i5 ATTRIBUTE_UNUSED,
+			      __m64 i6 ATTRIBUTE_UNUSED,
+			      __m64 i7 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.
*/ + check_m64_arguments; +} + +void +fun_check_passing_m64_20_values (__m64 i0 ATTRIBUTE_UNUSED, + __m64 i1 ATTRIBUTE_UNUSED, + __m64 i2 ATTRIBUTE_UNUSED, + __m64 i3 ATTRIBUTE_UNUSED, + __m64 i4 ATTRIBUTE_UNUSED, + __m64 i5 ATTRIBUTE_UNUSED, + __m64 i6 ATTRIBUTE_UNUSED, + __m64 i7 ATTRIBUTE_UNUSED, + __m64 i8 ATTRIBUTE_UNUSED, + __m64 i9 ATTRIBUTE_UNUSED, + __m64 i10 ATTRIBUTE_UNUSED, + __m64 i11 ATTRIBUTE_UNUSED, + __m64 i12 ATTRIBUTE_UNUSED, + __m64 i13 ATTRIBUTE_UNUSED, + __m64 i14 ATTRIBUTE_UNUSED, + __m64 i15 ATTRIBUTE_UNUSED, + __m64 i16 ATTRIBUTE_UNUSED, + __m64 i17 ATTRIBUTE_UNUSED, + __m64 i18 ATTRIBUTE_UNUSED, + __m64 i19 ATTRIBUTE_UNUSED) +{ + /* Check argument values. */ + compare (values.i0, i0, __m64); + compare (values.i1, i1, __m64); + compare (values.i2, i2, __m64); + compare (values.i3, i3, __m64); + compare (values.i4, i4, __m64); + compare (values.i5, i5, __m64); + compare (values.i6, i6, __m64); + compare (values.i7, i7, __m64); + compare (values.i8, i8, __m64); + compare (values.i9, i9, __m64); + compare (values.i10, i10, __m64); + compare (values.i11, i11, __m64); + compare (values.i12, i12, __m64); + compare (values.i13, i13, __m64); + compare (values.i14, i14, __m64); + compare (values.i15, i15, __m64); + compare (values.i16, i16, __m64); + compare (values.i17, i17, __m64); + compare (values.i18, i18, __m64); + compare (values.i19, i19, __m64); +} + +void +fun_check_passing_m64_20_regs (__m64 i0 ATTRIBUTE_UNUSED, + __m64 i1 ATTRIBUTE_UNUSED, + __m64 i2 ATTRIBUTE_UNUSED, + __m64 i3 ATTRIBUTE_UNUSED, + __m64 i4 ATTRIBUTE_UNUSED, + __m64 i5 ATTRIBUTE_UNUSED, + __m64 i6 ATTRIBUTE_UNUSED, + __m64 i7 ATTRIBUTE_UNUSED, + __m64 i8 ATTRIBUTE_UNUSED, + __m64 i9 ATTRIBUTE_UNUSED, + __m64 i10 ATTRIBUTE_UNUSED, + __m64 i11 ATTRIBUTE_UNUSED, + __m64 i12 ATTRIBUTE_UNUSED, + __m64 i13 ATTRIBUTE_UNUSED, + __m64 i14 ATTRIBUTE_UNUSED, + __m64 i15 ATTRIBUTE_UNUSED, + __m64 i16 ATTRIBUTE_UNUSED, + __m64 i17 ATTRIBUTE_UNUSED, + __m64 i18 ATTRIBUTE_UNUSED, + 
__m64 i19 ATTRIBUTE_UNUSED) +{ + /* Check register contents. */ + check_m64_arguments; +} + +void +fun_check_passing_m128_8_values (__m128 i0 ATTRIBUTE_UNUSED, + __m128 i1 ATTRIBUTE_UNUSED, + __m128 i2 ATTRIBUTE_UNUSED, + __m128 i3 ATTRIBUTE_UNUSED, + __m128 i4 ATTRIBUTE_UNUSED, + __m128 i5 ATTRIBUTE_UNUSED, + __m128 i6 ATTRIBUTE_UNUSED, + __m128 i7 ATTRIBUTE_UNUSED) +{ + /* Check argument values. */ + compare (values.i0, i0, __m128); + compare (values.i1, i1, __m128); + compare (values.i2, i2, __m128); + compare (values.i3, i3, __m128); + compare (values.i4, i4, __m128); + compare (values.i5, i5, __m128); + compare (values.i6, i6, __m128); + compare (values.i7, i7, __m128); +} + +void +fun_check_passing_m128h_8_values (__m128h i0 ATTRIBUTE_UNUSED, + __m128h i1 ATTRIBUTE_UNUSED, + __m128h i2 ATTRIBUTE_UNUSED, + __m128h i3 ATTRIBUTE_UNUSED, + __m128h i4 ATTRIBUTE_UNUSED, + __m128h i5 ATTRIBUTE_UNUSED, + __m128h i6 ATTRIBUTE_UNUSED, + __m128h i7 ATTRIBUTE_UNUSED) +{ + /* Check argument values. */ + compare (values.i0, i0, __m128h); + compare (values.i1, i1, __m128h); + compare (values.i2, i2, __m128h); + compare (values.i3, i3, __m128h); + compare (values.i4, i4, __m128h); + compare (values.i5, i5, __m128h); + compare (values.i6, i6, __m128h); + compare (values.i7, i7, __m128h); +} + +void +fun_check_passing_m128_8_regs (__m128 i0 ATTRIBUTE_UNUSED, + __m128 i1 ATTRIBUTE_UNUSED, + __m128 i2 ATTRIBUTE_UNUSED, + __m128 i3 ATTRIBUTE_UNUSED, + __m128 i4 ATTRIBUTE_UNUSED, + __m128 i5 ATTRIBUTE_UNUSED, + __m128 i6 ATTRIBUTE_UNUSED, + __m128 i7 ATTRIBUTE_UNUSED) +{ + /* Check register contents. */ + check_m128_arguments; +} + +void +fun_check_passing_m128h_8_regs (__m128h i0 ATTRIBUTE_UNUSED, + __m128h i1 ATTRIBUTE_UNUSED, + __m128h i2 ATTRIBUTE_UNUSED, + __m128h i3 ATTRIBUTE_UNUSED, + __m128h i4 ATTRIBUTE_UNUSED, + __m128h i5 ATTRIBUTE_UNUSED, + __m128h i6 ATTRIBUTE_UNUSED, + __m128h i7 ATTRIBUTE_UNUSED) +{ + /* Check register contents. 
*/ + check_m128_arguments; +} + +void +fun_check_passing_m128_20_values (__m128 i0 ATTRIBUTE_UNUSED, + __m128 i1 ATTRIBUTE_UNUSED, + __m128 i2 ATTRIBUTE_UNUSED, + __m128 i3 ATTRIBUTE_UNUSED, + __m128 i4 ATTRIBUTE_UNUSED, + __m128 i5 ATTRIBUTE_UNUSED, + __m128 i6 ATTRIBUTE_UNUSED, + __m128 i7 ATTRIBUTE_UNUSED, + __m128 i8 ATTRIBUTE_UNUSED, + __m128 i9 ATTRIBUTE_UNUSED, + __m128 i10 ATTRIBUTE_UNUSED, + __m128 i11 ATTRIBUTE_UNUSED, + __m128 i12 ATTRIBUTE_UNUSED, + __m128 i13 ATTRIBUTE_UNUSED, + __m128 i14 ATTRIBUTE_UNUSED, + __m128 i15 ATTRIBUTE_UNUSED, + __m128 i16 ATTRIBUTE_UNUSED, + __m128 i17 ATTRIBUTE_UNUSED, + __m128 i18 ATTRIBUTE_UNUSED, + __m128 i19 ATTRIBUTE_UNUSED) +{ + /* Check argument values. */ + compare (values.i0, i0, __m128); + compare (values.i1, i1, __m128); + compare (values.i2, i2, __m128); + compare (values.i3, i3, __m128); + compare (values.i4, i4, __m128); + compare (values.i5, i5, __m128); + compare (values.i6, i6, __m128); + compare (values.i7, i7, __m128); + compare (values.i8, i8, __m128); + compare (values.i9, i9, __m128); + compare (values.i10, i10, __m128); + compare (values.i11, i11, __m128); + compare (values.i12, i12, __m128); + compare (values.i13, i13, __m128); + compare (values.i14, i14, __m128); + compare (values.i15, i15, __m128); + compare (values.i16, i16, __m128); + compare (values.i17, i17, __m128); + compare (values.i18, i18, __m128); + compare (values.i19, i19, __m128); +} + +void +fun_check_passing_m128h_20_values (__m128h i0 ATTRIBUTE_UNUSED, + __m128h i1 ATTRIBUTE_UNUSED, + __m128h i2 ATTRIBUTE_UNUSED, + __m128h i3 ATTRIBUTE_UNUSED, + __m128h i4 ATTRIBUTE_UNUSED, + __m128h i5 ATTRIBUTE_UNUSED, + __m128h i6 ATTRIBUTE_UNUSED, + __m128h i7 ATTRIBUTE_UNUSED, + __m128h i8 ATTRIBUTE_UNUSED, + __m128h i9 ATTRIBUTE_UNUSED, + __m128h i10 ATTRIBUTE_UNUSED, + __m128h i11 ATTRIBUTE_UNUSED, + __m128h i12 ATTRIBUTE_UNUSED, + __m128h i13 ATTRIBUTE_UNUSED, + __m128h i14 ATTRIBUTE_UNUSED, + __m128h i15 ATTRIBUTE_UNUSED, + __m128h i16 
ATTRIBUTE_UNUSED, + __m128h i17 ATTRIBUTE_UNUSED, + __m128h i18 ATTRIBUTE_UNUSED, + __m128h i19 ATTRIBUTE_UNUSED) +{ + /* Check argument values. */ + compare (values.i0, i0, __m128h); + compare (values.i1, i1, __m128h); + compare (values.i2, i2, __m128h); + compare (values.i3, i3, __m128h); + compare (values.i4, i4, __m128h); + compare (values.i5, i5, __m128h); + compare (values.i6, i6, __m128h); + compare (values.i7, i7, __m128h); + compare (values.i8, i8, __m128h); + compare (values.i9, i9, __m128h); + compare (values.i10, i10, __m128h); + compare (values.i11, i11, __m128h); + compare (values.i12, i12, __m128h); + compare (values.i13, i13, __m128h); + compare (values.i14, i14, __m128h); + compare (values.i15, i15, __m128h); + compare (values.i16, i16, __m128h); + compare (values.i17, i17, __m128h); + compare (values.i18, i18, __m128h); + compare (values.i19, i19, __m128h); +} + +void +fun_check_passing_m128_20_regs (__m128 i0 ATTRIBUTE_UNUSED, + __m128 i1 ATTRIBUTE_UNUSED, + __m128 i2 ATTRIBUTE_UNUSED, + __m128 i3 ATTRIBUTE_UNUSED, + __m128 i4 ATTRIBUTE_UNUSED, + __m128 i5 ATTRIBUTE_UNUSED, + __m128 i6 ATTRIBUTE_UNUSED, + __m128 i7 ATTRIBUTE_UNUSED, + __m128 i8 ATTRIBUTE_UNUSED, + __m128 i9 ATTRIBUTE_UNUSED, + __m128 i10 ATTRIBUTE_UNUSED, + __m128 i11 ATTRIBUTE_UNUSED, + __m128 i12 ATTRIBUTE_UNUSED, + __m128 i13 ATTRIBUTE_UNUSED, + __m128 i14 ATTRIBUTE_UNUSED, + __m128 i15 ATTRIBUTE_UNUSED, + __m128 i16 ATTRIBUTE_UNUSED, + __m128 i17 ATTRIBUTE_UNUSED, + __m128 i18 ATTRIBUTE_UNUSED, + __m128 i19 ATTRIBUTE_UNUSED) +{ + /* Check register contents. 
*/ + check_m128_arguments; +} + +void +fun_check_passing_m128h_20_regs (__m128h i0 ATTRIBUTE_UNUSED, + __m128h i1 ATTRIBUTE_UNUSED, + __m128h i2 ATTRIBUTE_UNUSED, + __m128h i3 ATTRIBUTE_UNUSED, + __m128h i4 ATTRIBUTE_UNUSED, + __m128h i5 ATTRIBUTE_UNUSED, + __m128h i6 ATTRIBUTE_UNUSED, + __m128h i7 ATTRIBUTE_UNUSED, + __m128h i8 ATTRIBUTE_UNUSED, + __m128h i9 ATTRIBUTE_UNUSED, + __m128h i10 ATTRIBUTE_UNUSED, + __m128h i11 ATTRIBUTE_UNUSED, + __m128h i12 ATTRIBUTE_UNUSED, + __m128h i13 ATTRIBUTE_UNUSED, + __m128h i14 ATTRIBUTE_UNUSED, + __m128h i15 ATTRIBUTE_UNUSED, + __m128h i16 ATTRIBUTE_UNUSED, + __m128h i17 ATTRIBUTE_UNUSED, + __m128h i18 ATTRIBUTE_UNUSED, + __m128h i19 ATTRIBUTE_UNUSED) +{ + /* Check register contents. */ + check_m128_arguments; +} + +#define def_check_int_passing8(_i0, _i1, _i2, _i3, \ + _i4, _i5, _i6, _i7, \ + _func1, _func2, TYPE) \ + values.i0.TYPE[0] = _i0; \ + values.i1.TYPE[0] = _i1; \ + values.i2.TYPE[0] = _i2; \ + values.i3.TYPE[0] = _i3; \ + values.i4.TYPE[0] = _i4; \ + values.i5.TYPE[0] = _i5; \ + values.i6.TYPE[0] = _i6; \ + values.i7.TYPE[0] = _i7; \ + WRAP_CALL(_func1) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7); \ + clear_float_registers; \ + fregs.F0.TYPE[0] = _i0; \ + fregs.F1.TYPE[0] = _i1; \ + fregs.F2.TYPE[0] = _i2; \ + fregs.F3.TYPE[0] = _i3; \ + fregs.F4.TYPE[0] = _i4; \ + fregs.F5.TYPE[0] = _i5; \ + fregs.F6.TYPE[0] = _i6; \ + fregs.F7.TYPE[0] = _i7; \ + num_fregs = 8; \ + WRAP_CALL(_func2) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7); + +#define def_check_int_passing20(_i0, _i1, _i2, _i3, _i4, _i5, _i6, \ + _i7, _i8, _i9, _i10, _i11, _i12, _i13, \ + _i14, _i15, _i16, _i17, _i18, _i19, \ + _func1, _func2, TYPE) \ + values.i0.TYPE[0] = _i0; \ + values.i1.TYPE[0] = _i1; \ + values.i2.TYPE[0] = _i2; \ + values.i3.TYPE[0] = _i3; \ + values.i4.TYPE[0] = _i4; \ + values.i5.TYPE[0] = _i5; \ + values.i6.TYPE[0] = _i6; \ + values.i7.TYPE[0] = _i7; \ + values.i8.TYPE[0] = _i8; \ + values.i9.TYPE[0] = _i9; \ + values.i10.TYPE[0] = _i10; \ 
+ values.i11.TYPE[0] = _i11; \ + values.i12.TYPE[0] = _i12; \ + values.i13.TYPE[0] = _i13; \ + values.i14.TYPE[0] = _i14; \ + values.i15.TYPE[0] = _i15; \ + values.i16.TYPE[0] = _i16; \ + values.i17.TYPE[0] = _i17; \ + values.i18.TYPE[0] = _i18; \ + values.i19.TYPE[0] = _i19; \ + WRAP_CALL(_func1) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _i8, \ + _i9, _i10, _i11, _i12, _i13, _i14, _i15, _i16, \ + _i17, _i18, _i19); \ + clear_float_registers; \ + fregs.F0.TYPE[0] = _i0; \ + fregs.F1.TYPE[0] = _i1; \ + fregs.F2.TYPE[0] = _i2; \ + fregs.F3.TYPE[0] = _i3; \ + fregs.F4.TYPE[0] = _i4; \ + fregs.F5.TYPE[0] = _i5; \ + fregs.F6.TYPE[0] = _i6; \ + fregs.F7.TYPE[0] = _i7; \ + num_fregs = 8; \ + WRAP_CALL(_func2) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _i8, \ + _i9, _i10, _i11, _i12, _i13, _i14, _i15, _i16, \ + _i17, _i18, _i19); + +void +test_m64_on_stack () +{ + __m64 x[8]; + int i; + for (i = 0; i < 8; i++) + x[i] = (__m64){32 + i, 0}; + pass = "m64-8"; + def_check_int_passing8 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7], + fun_check_passing_m64_8_values, + fun_check_passing_m64_8_regs, _m64); +} + +void +test_too_many_m64 () +{ + __m64 x[20]; + int i; + for (i = 0; i < 20; i++) + x[i] = (__m64){32 + i, 0}; + pass = "m64-20"; + def_check_int_passing20 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7], + x[8], x[9], x[10], x[11], x[12], x[13], x[14], + x[15], x[16], x[17], x[18], x[19], + fun_check_passing_m64_20_values, + fun_check_passing_m64_20_regs, _m64); +} + +void +test_m128_on_stack () +{ + __m128 x[8]; + int i; + for (i = 0; i < 8; i++) + x[i] = (__m128){32 + i, 0, 0, 0}; + pass = "m128-8"; + def_check_int_passing8 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7], + fun_check_passing_m128_8_values, + fun_check_passing_m128_8_regs, _m128); +} + +void +test_m128h_on_stack () +{ + __m128h x[8]; + int i; + for (i = 0; i < 8; i++) + x[i] = (__m128h){1.1f16, 2.2f16, 3.3f16, 4.4f16, 5.5f16, + 6.6f16, 7.7f16, 8.8f16}; + pass = "m128h-8"; + def_check_int_passing8 (x[0], 
x[1], x[2], x[3], x[4], x[5], x[6], x[7],
+			  fun_check_passing_m128h_8_values,
+			  fun_check_passing_m128h_8_regs, _m128h);
+}
+
+void
+test_too_many_m128 ()
+{
+  __m128 x[20];
+  int i;
+  for (i = 0; i < 20; i++)
+    x[i] = (__m128){32 + i, 0, 0, 0};
+  pass = "m128-20";
+  def_check_int_passing20 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7],
+			   x[8], x[9], x[10], x[11], x[12], x[13], x[14],
+			   x[15], x[16], x[17], x[18], x[19],
+			   fun_check_passing_m128_20_values,
+			   fun_check_passing_m128_20_regs, _m128);
+}
+
+void
+test_too_many_m128h ()
+{
+  __m128h x[20];
+  int i;
+  for (i = 0; i < 20; i++)
+    x[i] = (__m128h){1.1f16, 2.2f16, 3.3f16, 4.4f16, 5.5f16,
+		     6.6f16, 7.7f16, 8.8f16};
+  pass = "m128h-20";
+  def_check_int_passing20 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7],
+			   x[8], x[9], x[10], x[11], x[12], x[13], x[14],
+			   x[15], x[16], x[17], x[18], x[19],
+			   fun_check_passing_m128h_20_values,
+			   fun_check_passing_m128h_20_regs, _m128h);
+}
+
+static void
+do_test (void)
+{
+  test_m64_on_stack ();
+  test_too_many_m64 ();
+  test_m128_on_stack ();
+  test_too_many_m128 ();
+  test_m128h_on_stack ();
+  test_too_many_m128h ();
+  if (failed)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_structs.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_structs.c
new file mode 100644
index 00000000000..4d1956a846d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_structs.c
@@ -0,0 +1,332 @@
+/* This tests passing of structs.  */
+
+#include "avx512fp16-xmm-check.h"
+#include "defines.h"
+#include "args.h"
+#include <complex.h>
+
+struct IntegerRegisters iregs;
+struct FloatRegisters fregs;
+unsigned int num_iregs, num_fregs;
+
+struct int_struct
+{
+  int i;
+};
+
+struct long_struct
+{
+  long long l;
+};
+
+struct long2_struct
+{
+  long long l1, l2;
+};
+
+struct long3_struct
+{
+  long long l1, l2, l3;
+};
+
+
+/* Check that the struct is passed as the individual members in iregs.
*/ +void +check_struct_passing1 (struct int_struct is ATTRIBUTE_UNUSED) +{ + check_int_arguments; +} + +void +check_struct_passing2 (struct long_struct ls ATTRIBUTE_UNUSED) +{ + check_int_arguments; +} + +void +check_struct_passing3 (struct long2_struct ls ATTRIBUTE_UNUSED) +{ + check_int_arguments; +} + +void +check_struct_passing4 (struct long3_struct ls ATTRIBUTE_UNUSED) +{ + /* Check the passing on the stack by comparing the address of the + stack elements to the expected place on the stack. */ + assert ((unsigned long)&ls.l1 == rsp+8); + assert ((unsigned long)&ls.l2 == rsp+16); + assert ((unsigned long)&ls.l3 == rsp+24); +} + +#ifdef CHECK_M64_M128 +struct m128_struct +{ + __m128 x; +}; + +struct m128_2_struct +{ + __m128 x1, x2; +}; + +/* Check that the struct is passed as the individual members in fregs. */ +void +check_struct_passing5 (struct m128_struct ms1 ATTRIBUTE_UNUSED, + struct m128_struct ms2 ATTRIBUTE_UNUSED, + struct m128_struct ms3 ATTRIBUTE_UNUSED, + struct m128_struct ms4 ATTRIBUTE_UNUSED, + struct m128_struct ms5 ATTRIBUTE_UNUSED, + struct m128_struct ms6 ATTRIBUTE_UNUSED, + struct m128_struct ms7 ATTRIBUTE_UNUSED, + struct m128_struct ms8 ATTRIBUTE_UNUSED) +{ + check_m128_arguments; +} + +void +check_struct_passing6 (struct m128_2_struct ms ATTRIBUTE_UNUSED) +{ + /* Check the passing on the stack by comparing the address of the + stack elements to the expected place on the stack. 
*/ + assert ((unsigned long)&ms.x1 == rsp+8); + assert ((unsigned long)&ms.x2 == rsp+24); +} +#endif + +struct flex1_struct +{ + long long i; + long long flex[]; +}; + +struct flex2_struct +{ + long long i; + long long flex[0]; +}; + +void +check_struct_passing7 (struct flex1_struct is ATTRIBUTE_UNUSED) +{ + check_int_arguments; +} + +void +check_struct_passing8 (struct flex2_struct is ATTRIBUTE_UNUSED) +{ + check_int_arguments; +} + +struct complex1_struct +{ + int c; + __complex__ float x; +}; + +struct complex1a_struct +{ + long long l; + float f; +}; + +struct complex2_struct +{ + int c; + __complex__ float x; + float y; +}; + +struct complex2a_struct +{ + long long l; + double d; +}; + +struct complex3_struct +{ + int c; + __complex__ _Float16 x; +}; + +struct complex3a_struct +{ + long long l; + _Float16 f; +}; + +struct complex4_struct +{ + int c; + __complex__ _Float16 x; + _Float16 y; +}; + +struct complex4a_struct +{ + long long l; + _Float16 f; +}; + +void +check_struct_passing9 (struct complex1_struct is ATTRIBUTE_UNUSED) +{ + check_int_arguments; + check_float_arguments; +} + +void +check_struct_passing10 (struct complex2_struct is ATTRIBUTE_UNUSED) +{ + check_int_arguments; + check_double_arguments; +} + +void +check_struct_passing11 (struct complex3_struct is ATTRIBUTE_UNUSED) +{ + check_int_arguments; + check_float16_arguments; +} + +void +check_struct_passing12 (struct complex4_struct is ATTRIBUTE_UNUSED) +{ + check_int_arguments; + check_float16_arguments; +} + +static struct flex1_struct f1s = { 60, { } }; +static struct flex2_struct f2s = { 61, { } }; + +static void +do_test (void) +{ + struct int_struct is = { 48 }; + struct long_struct ls = { 49 }; +#ifdef CHECK_LARGER_STRUCTS + struct long2_struct l2s = { 50, 51 }; + struct long3_struct l3s = { 52, 53, 54 }; +#endif +#ifdef CHECK_M64_M128 + struct m128_struct m128s[8]; + struct m128_2_struct m128_2s = { + { 48.394, 39.3, -397.9, 3484.9 }, + { -8.394, -93.3, 7.9, 84.94 } + }; + int i; +#endif 
+ struct complex1_struct c1s = { 4, ( -13.4 + 3.5*I ) }; + union + { + struct complex1_struct c; + struct complex1a_struct u; + } c1u; + struct complex2_struct c2s = { 4, ( -13.4 + 3.5*I ), -34.5 }; + union + { + struct complex2_struct c; + struct complex2a_struct u; + } c2u; + + struct complex3_struct c3s = { 4, ( -13.4 + 3.5*I ) }; + union + { + struct complex3_struct c; + struct complex3a_struct u; + } c3u; + + struct complex4_struct c4s = { 4, ( -13.4 + 3.5*I ), -34.5 }; + union + { + struct complex4_struct c; + struct complex4a_struct u; + } c4u; + + clear_struct_registers; + iregs.I0 = is.i; + num_iregs = 1; + clear_int_hardware_registers; + WRAP_CALL (check_struct_passing1)(is); + + clear_struct_registers; + iregs.I0 = ls.l; + num_iregs = 1; + clear_int_hardware_registers; + WRAP_CALL (check_struct_passing2)(ls); + +#ifdef CHECK_LARGER_STRUCTS + clear_struct_registers; + iregs.I0 = l2s.l1; + iregs.I1 = l2s.l2; + num_iregs = 2; + clear_int_hardware_registers; + WRAP_CALL (check_struct_passing3)(l2s); + WRAP_CALL (check_struct_passing4)(l3s); +#endif + +#ifdef CHECK_M64_M128 + clear_struct_registers; + for (i = 0; i < 8; i++) + { + m128s[i].x = (__m128){32+i, 0, i, 0}; + (&fregs.xmm0)[i]._m128[0] = m128s[i].x; + } + num_fregs = 8; + clear_float_hardware_registers; + WRAP_CALL (check_struct_passing5)(m128s[0], m128s[1], m128s[2], m128s[3], + m128s[4], m128s[5], m128s[6], m128s[7]); + WRAP_CALL (check_struct_passing6)(m128_2s); +#endif + + clear_struct_registers; + iregs.I0 = f1s.i; + num_iregs = 1; + clear_int_hardware_registers; + WRAP_CALL (check_struct_passing7)(f1s); + + clear_struct_registers; + iregs.I0 = f2s.i; + num_iregs = 1; + clear_int_hardware_registers; + WRAP_CALL (check_struct_passing8)(f2s); + + clear_struct_registers; + c1u.c = c1s; + iregs.I0 = c1u.u.l; + num_iregs = 1; + fregs.xmm0._float [0] = c1u.u.f; + num_fregs = 1; + clear_int_hardware_registers; + clear_float_hardware_registers; + WRAP_CALL (check_struct_passing9)(c1s); + + 
clear_struct_registers; + c2u.c = c2s; + iregs.I0 = c2u.u.l; + num_iregs = 1; + fregs.xmm0._double[0] = c2u.u.d; + num_fregs = 1; + clear_int_hardware_registers; + clear_float_hardware_registers; + WRAP_CALL (check_struct_passing10)(c2s); + + clear_struct_registers; + c3u.c = c3s; + iregs.I0 = c3u.u.l; + num_iregs = 1; + num_fregs = 0; + clear_int_hardware_registers; + clear_float_hardware_registers; + WRAP_CALL (check_struct_passing11)(c3s); + + clear_struct_registers; + c4u.c = c4s; + iregs.I0 = c4u.u.l; + num_iregs = 1; + fregs.xmm0.__Float16 [0] = c4u.u.f; + num_fregs = 1; + clear_int_hardware_registers; + clear_float_hardware_registers; + WRAP_CALL (check_struct_passing12)(c4s); +} diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_unions.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_unions.c new file mode 100644 index 00000000000..640b3057f93 --- /dev/null +++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_unions.c @@ -0,0 +1,335 @@ +/* This tests passing of unions.
*/ + +#include "avx512fp16-xmm-check.h" +#include "defines.h" +#include "args.h" + +struct IntegerRegisters iregs; +struct FloatRegisters fregs; +unsigned int num_iregs, num_fregs; + +struct int_struct +{ + int i; +}; + +struct long_struct +{ + long l; +}; + +union un1 +{ + char c; + int i; +}; + +union un2 +{ + char c1; + long l; + char c2; +}; + +union un3 +{ + struct int_struct is; + struct long_struct ls; + union un1 un; +}; + + +void +check_union_passing1(union un1 u ATTRIBUTE_UNUSED) +{ + check_int_arguments; +} + +void +check_union_passing2(union un2 u1 ATTRIBUTE_UNUSED) +{ + check_int_arguments; +} + +void +check_union_passing3(union un3 u ATTRIBUTE_UNUSED) +{ + check_int_arguments; +} + +#define check_union_passing1 WRAP_CALL(check_union_passing1) +#define check_union_passing2 WRAP_CALL(check_union_passing2) +#define check_union_passing3 WRAP_CALL(check_union_passing3) + +#ifdef CHECK_M64_M128 +union un4 +{ + __m128 x; + float f; +}; + +union un5 +{ + __m128 x; + long i; +}; + +void +check_union_passing4(union un4 u1 ATTRIBUTE_UNUSED, + union un4 u2 ATTRIBUTE_UNUSED, + union un4 u3 ATTRIBUTE_UNUSED, + union un4 u4 ATTRIBUTE_UNUSED, + union un4 u5 ATTRIBUTE_UNUSED, + union un4 u6 ATTRIBUTE_UNUSED, + union un4 u7 ATTRIBUTE_UNUSED, + union un4 u8 ATTRIBUTE_UNUSED) +{ + check_m128_arguments; +} + +void +check_union_passing5(union un5 u ATTRIBUTE_UNUSED) +{ + check_int_arguments; + check_vector_arguments(m128, 8); +} + +union un4a +{ + __m128 x; + _Float16 f; +}; + +void +check_union_passing4a(union un4a u1 ATTRIBUTE_UNUSED, + union un4a u2 ATTRIBUTE_UNUSED, + union un4a u3 ATTRIBUTE_UNUSED, + union un4a u4 ATTRIBUTE_UNUSED, + union un4a u5 ATTRIBUTE_UNUSED, + union un4a u6 ATTRIBUTE_UNUSED, + union un4a u7 ATTRIBUTE_UNUSED, + union un4a u8 ATTRIBUTE_UNUSED) +{ + check_m128_arguments; +} + +union un4b +{ + __m128h x; + _Float16 f; +}; + +void +check_union_passing4b(union un4b u1 ATTRIBUTE_UNUSED, + union un4b u2 ATTRIBUTE_UNUSED, + union un4b u3 
ATTRIBUTE_UNUSED, + union un4b u4 ATTRIBUTE_UNUSED, + union un4b u5 ATTRIBUTE_UNUSED, + union un4b u6 ATTRIBUTE_UNUSED, + union un4b u7 ATTRIBUTE_UNUSED, + union un4b u8 ATTRIBUTE_UNUSED) +{ + check_m128_arguments; +} + +#define check_union_passing4 WRAP_CALL(check_union_passing4) +#define check_union_passing4a WRAP_CALL(check_union_passing4a) +#define check_union_passing4b WRAP_CALL(check_union_passing4b) +#define check_union_passing5 WRAP_CALL(check_union_passing5) +#endif + +union un6 +{ + long double ld; + int i; +}; + + +void +check_union_passing6(union un6 u ATTRIBUTE_UNUSED) +{ + /* Check the passing on the stack by comparing the address of the + stack elements to the expected place on the stack. */ + assert ((unsigned long)&u.ld == rsp+8); + assert ((unsigned long)&u.i == rsp+8); +} + +#define check_union_passing6 WRAP_CALL(check_union_passing6) + +union un7 +{ + long double ld; + _Float16 f; +}; + +void +check_union_passing7(union un7 u ATTRIBUTE_UNUSED) +{ + /* Check the passing on the stack by comparing the address of the + stack elements to the expected place on the stack. */ + assert ((unsigned long)&u.ld == rsp+8); + assert ((unsigned long)&u.f == rsp+8); +} + +#define check_union_passing7 WRAP_CALL(check_union_passing7) + +union un8 +{ + _Float16 f; + int i; +}; + +void +check_union_passing8(union un8 u ATTRIBUTE_UNUSED) +{ + check_int_arguments; +} + +#define check_union_passing8 WRAP_CALL(check_union_passing8) + +static void +do_test (void) +{ + union un1 u1; +#ifdef CHECK_LARGER_UNION_PASSING + union un2 u2; + union un3 u3; + struct int_struct is; + struct long_struct ls; +#endif /* CHECK_LARGER_UNION_PASSING */ +#ifdef CHECK_M64_M128 + union un4 u4[8]; + union un4a u4a[8]; + union un4b u4b[8]; + union un5 u5 = { { 48.394, 39.3, -397.9, 3484.9 } }; + int i; +#endif + union un6 u6; + union un7 u7; + union un8 u8; + + /* Check a union with char, int. 
*/ + clear_struct_registers; + u1.i = 0; /* clear the struct to not have high bits left */ + u1.c = 32; + iregs.I0 = 32; + num_iregs = 1; + clear_int_hardware_registers; + check_union_passing1(u1); + u1.i = 0; /* clear the struct to not have high bits left */ + u1.i = 33; + iregs.I0 = 33; + num_iregs = 1; + clear_int_hardware_registers; + check_union_passing1(u1); + + /* Check a union with char, long, char. */ +#ifdef CHECK_LARGER_UNION_PASSING + clear_struct_registers; + u2.l = 0; /* clear the struct to not have high bits left */ + u2.c1 = 34; + iregs.I0 = 34; + num_iregs = 1; + clear_int_hardware_registers; + check_union_passing2(u2); + u2.l = 0; /* clear the struct to not have high bits left */ + u2.l = 35; + iregs.I0 = 35; + num_iregs = 1; + clear_int_hardware_registers; + check_union_passing2(u2); + u2.l = 0; /* clear the struct to not have high bits left */ + u2.c2 = 36; + iregs.I0 = 36; + num_iregs = 1; + clear_int_hardware_registers; + check_union_passing2(u2); + + /* check a union containing two structs and a union. 
*/ + clear_struct_registers; + is.i = 37; + u3.ls.l = 0; /* clear the struct to not have high bits left */ + u3.is = is; + iregs.I0 = 37; + num_iregs = 1; + clear_int_hardware_registers; + check_union_passing3(u3); + ls.l = 38; + u3.ls.l = 0; /* clear the struct to not have high bits left */ + u3.ls = ls; + iregs.I0 = 38; + num_iregs = 1; + clear_int_hardware_registers; + check_union_passing3(u3); + u1.c = 39; + u3.ls.l = 0; /* clear the struct to not have high bits left */ + u3.un = u1; + iregs.I0 = 39; + num_iregs = 1; + clear_int_hardware_registers; + check_union_passing3(u3); + u1.i = 40; + u3.ls.l = 0; /* clear the struct to not have high bits left */ + u3.un = u1; + iregs.I0 = 40; + num_iregs = 1; + clear_int_hardware_registers; + check_union_passing3(u3); +#endif /* CHECK_LARGER_UNION_PASSING */ + +#ifdef CHECK_M64_M128 + clear_struct_registers; + for (i = 0; i < 8; i++) + { + u4[i].x = (__m128){32+i, 0, i, 0}; + (&fregs.xmm0)[i]._m128[0] = u4[i].x; + } + num_fregs = 8; + clear_float_hardware_registers; + check_union_passing4(u4[0], u4[1], u4[2], u4[3], + u4[4], u4[5], u4[6], u4[7]); + + clear_struct_registers; + for (i = 0; i < 8; i++) + { + u4a[i].x = (__m128){32+i, 0, i, 0}; + (&fregs.xmm0)[i]._m128[0] = u4a[i].x; + } + num_fregs = 8; + clear_float_hardware_registers; + check_union_passing4a(u4a[0], u4a[1], u4a[2], u4a[3], + u4a[4], u4a[5], u4a[6], u4a[7]); + + clear_struct_registers; + for (i = 0; i < 8; i++) + { + u4b[i].x = (__m128h){33+i, 0, i, 0, -i, 1, 2 * i, i + 8}; + (&fregs.xmm0)[i]._m128h[0] = u4b[i].x; + } + num_fregs = 8; + clear_float_hardware_registers; + check_union_passing4b(u4b[0], u4b[1], u4b[2], u4b[3], + u4b[4], u4b[5], u4b[6], u4b[7]); + + clear_struct_registers; + fregs.xmm0._m128[0] = u5.x; + num_fregs = 1; + num_iregs = 1; + iregs.I0 = u5.i; + clear_float_hardware_registers; + check_union_passing5(u5); +#endif + + u6.i = 2; + check_union_passing6(u6); + + u7.f = 2.0f16; + check_union_passing7(u7); + + clear_struct_registers; + u8.i
= 8; + num_iregs = 1; + iregs.I0 = u8.i; + clear_int_hardware_registers; + check_union_passing8(u8); +} diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_struct_returning.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_struct_returning.c new file mode 100644 index 00000000000..92578127be7 --- /dev/null +++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_struct_returning.c @@ -0,0 +1,274 @@ +/* This tests returning of structures. */ + +#include <stdio.h> +#include "avx512fp16-xmm-check.h" +#include "defines.h" +#include "macros.h" +#include "args.h" + +struct IntegerRegisters iregs; +struct FloatRegisters fregs; +unsigned int num_iregs, num_fregs; + +int current_test; +int num_failed = 0; + +#undef assert +#define assert(test) do { if (!(test)) {fprintf (stderr, "failed in test %d\n", current_test); num_failed++; } } while (0) + +#define xmm0h xmm_regs[0].__Float16 +#define xmm1h xmm_regs[1].__Float16 +#define xmm0f xmm_regs[0]._float +#define xmm0d xmm_regs[0]._double +#define xmm1f xmm_regs[1]._float +#define xmm1d xmm_regs[1]._double + +typedef enum { + INT = 0, + SSE_H, + SSE_F, + SSE_D, + X87, + MEM, + INT_SSE, + SSE_INT, + SSE_F_V, + SSE_F_H, + SSE_F_H8 +} Type; + +/* Structures which should be returned in INTEGER. */ +#define D(I,MEMBERS,B) struct S_ ## I { MEMBERS ; }; Type class_ ## I = INT; \
struct S_ ## I f_ ## I (void) { struct S_ ## I s; memset (&s, 0, sizeof(s)); B; return s; } + +D(1,char m1, s.m1=42) +D(2,short m1, s.m1=42) +D(3,int m1, s.m1=42) +D(4,long m1, s.m1=42) +D(5,long long m1, s.m1=42) +D(6,char m1;short s, s.m1=42) +D(7,char m1;int i, s.m1=42) +D(8,char m1; long l, s.m1=42) +D(9,char m1; long long l, s.m1=42) +D(10,char m1[16], s.m1[0]=42) +D(11,short m1[8], s.m1[0]=42) +D(12,int m1[4], s.m1[0]=42) +D(13,long m1[2], s.m1[0]=42) +D(14,long long m1[2], s.m1[0]=42) + +#undef D + +/* Structures which should be returned in SSE.
*/ +#define D(I,MEMBERS,C,B) struct S_ ## I { MEMBERS ; }; Type class_ ## I = C; \ +struct S_ ## I f_ ## I (void) { struct S_ ## I s; memset (&s, 0, sizeof(s)); B; return s; } + +D(100,float f,SSE_F, s.f=42) +D(101,double d,SSE_D, s.d=42) +D(102,float f;float f2,SSE_F, s.f=42) +D(103,float f;double d,SSE_F, s.f=42) +D(104,double d; float f,SSE_D, s.d=42) +D(105,double d; double d2,SSE_D, s.d=42) +D(106,float f[2],SSE_F, s.f[0]=42) +D(107,float f[3],SSE_F, s.f[0]=42) +D(108,float f[4],SSE_F, s.f[0]=42) +D(109,double d[2],SSE_D, s.d[0]=42) +D(110,float f[2]; double d,SSE_F, s.f[0]=42) +D(111,double d;float f[2],SSE_D, s.d=42) + +D(120,_Float16 f,SSE_H, s.f=42) +D(121,_Float16 f;_Float16 f2,SSE_H, s.f=42) +D(122,_Float16 f;float d,SSE_H, s.f=42) +D(123,_Float16 f;double d,SSE_H, s.f=42) +D(124,double d; _Float16 f,SSE_D, s.d=42) +D(125,_Float16 f[2],SSE_H, s.f[0]=42) +D(126,_Float16 f[3],SSE_H, s.f[0]=42) +D(127,_Float16 f[4],SSE_H, s.f[0]=42) +D(128,_Float16 f[2]; double d,SSE_H, s.f[0]=42) +D(129,double d;_Float16 f[2],SSE_D, s.d=42) + +#undef D + +/* Structures which should be returned on x87 stack. */ +#define D(I,MEMBERS) struct S_ ## I { MEMBERS ; }; Type class_ ## I = X87; \ +struct S_ ## I f_ ## I (void) { struct S_ ## I s = { 42 }; return s; } + +/* The only struct containing a long double, which is returned in + registers at all, is the singleton struct. All others are too large. + This includes a struct containing complex long double, which is passed + in memory, although a complex long double type itself is returned in + two registers. */ +D(200,long double ld) + +#undef D + +/* Structures which should be returned in INT (low) and SSE (high). 
*/ +#define D(I,MEMBERS) struct S_ ## I { MEMBERS ; }; Type class_ ## I = INT_SSE; \ +struct S_ ## I f_ ## I (void) { struct S_ ## I s = { 42,43 }; return s; } + +D(300,char m1; float m2) +D(301,char m1; double m2) +D(302,short m1; float m2) +D(303,short m1; double m2) +D(304,int m1; float m2) +D(305,int m1; double m2) +D(306,long long m1; float m2) +D(307,long long m1; double m2) + +D(310,char m1; _Float16 m2) +D(311,short m1; _Float16 m2) +D(312,int m1; _Float16 m2) +D(313,long long m1; _Float16 m2) + +#undef D + +void check_300 (void) +{ + XMM_T x; + x._ulong[0] = rax; + switch (current_test) { + case 300: assert ((rax & 0xff) == 42 && x._float[1] == 43); break; + case 301: assert ((rax & 0xff) == 42 && xmm0d[0] == 43); break; + case 302: assert ((rax & 0xffff) == 42 && x._float[1] == 43); break; + case 303: assert ((rax & 0xffff) == 42 && xmm0d[0] == 43); break; + case 304: assert ((rax & 0xffffffff) == 42 && x._float[1] == 43); break; + case 305: assert ((rax & 0xffffffff) == 42 && xmm0d[0] == 43); break; + case 306: assert (rax == 42 && xmm0f[0] == 43); break; + case 307: assert (rax == 42 && xmm0d[0] == 43); break; + case 310: assert ((rax & 0xff) == 42 && x.__Float16[1] == 43); break; + case 311: assert ((rax & 0xffff) == 42 && x.__Float16[1] == 43); break; + case 312: assert ((rax & 0xffffffff) == 42 && x.__Float16[2] == 43); break; + case 313: assert (rax == 42 && xmm0h[0] == 43); break; + + default: assert (0); break; + } +} + +/* Structures which should be returned in SSE (low) and INT (high). 
*/ +#define D(I,MEMBERS,B) struct S_ ## I { MEMBERS ; }; Type class_ ## I = SSE_INT; \ +struct S_ ## I f_ ## I (void) { struct S_ ## I s; memset (&s, 0, sizeof(s)); B; return s; } + +D(400,float f[2];char c, s.f[0]=42; s.c=43) +D(401,double d;char c, s.d=42; s.c=43) + +D(402,_Float16 f[4];char c, s.f[0]=42; s.c=43) + +#undef D + +void check_400 (void) +{ + switch (current_test) { + case 400: assert (xmm0f[0] == 42 && (rax & 0xff) == 43); break; + case 401: assert (xmm0d[0] == 42 && (rax & 0xff) == 43); break; + case 402: assert (xmm0h[0] == 42 && (rax & 0xff) == 43); break; + + default: assert (0); break; + } +} + +/* Structures which should be returned in MEM. */ +void *struct_addr; +#define D(I,MEMBERS) struct S_ ## I { MEMBERS ; }; Type class_ ## I = MEM; \ +struct S_ ## I f_ ## I (void) { union {unsigned char c; struct S_ ## I s;} u; memset (&u.s, 0, sizeof(u.s)); u.c = 42; return u.s; } + +/* Too large. */ +D(500,char m1[17]) +D(501,short m1[9]) +D(502,int m1[5]) +D(503,long m1[3]) +D(504,short m1[8];char c) +D(505,char m1[1];int i[4]) +D(506,float m1[5]) +D(507,double m1[3]) +D(508,char m1[1];float f[4]) +D(509,char m1[1];double d[2]) +D(510,__complex long double m1[1]) + +/* Too large due to padding. */ +D(520,char m1[1];int i;char c2; int i2; char c3) + +/* Unnaturally aligned members. */ +D(530,short m1[1];int i PACKED) + +D(540,_Float16 m1[10]) +D(541,char m1[1];_Float16 f[8]) + +#undef D + + +/* Special tests. 
*/ +#define D(I,MEMBERS,C,B) struct S_ ## I { MEMBERS ; }; Type class_ ## I = C; \ +struct S_ ## I f_ ## I (void) { struct S_ ## I s; B; return s; } +D(600,float f[4], SSE_F_V, s.f[0] = s.f[1] = s.f[2] = s.f[3] = 42) +D(601,_Float16 f[4], SSE_F_H, s.f[0] = s.f[1] = s.f[2] = s.f[3] = 42) +D(602,_Float16 f[8], SSE_F_H8, + s.f[0] = s.f[1] = s.f[2] = s.f[3] = s.f[4] = s.f[5] = s.f[6] = s.f[7] = 42) +#undef D + +void clear_all (void) +{ + clear_int_registers; + clear_float_registers; + clear_x87_registers; +} + +void check_all (Type class, unsigned long size) +{ + switch (class) { + case INT: if (size < 8) rax &= ~0UL >> (64-8*size); assert (rax == 42); break; + case SSE_H: assert (xmm0h[0] == 42); break; + case SSE_F: assert (xmm0f[0] == 42); break; + case SSE_D: assert (xmm0d[0] == 42); break; + case SSE_F_V: assert (xmm0f[0] == 42 && xmm0f[1]==42 && xmm1f[0] == 42 && xmm1f[1] == 42); break; + case SSE_F_H: assert (xmm0h[0] == 42 && xmm0h[1]==42 && xmm0h[2] == 42 && xmm0h[3] == 42); break; + case SSE_F_H8: assert (xmm0h[0] == 42 && xmm0h[1]==42 && xmm0h[2] == 42 && xmm0h[3] == 42 + && xmm1h[0] == 42 && xmm1h[1]==42 && xmm1h[2] == 42 && xmm1h[3] == 42); break; + case X87: assert (x87_regs[0]._ldouble == 42); break; + case INT_SSE: check_300(); break; + case SSE_INT: check_400(); break; + /* Ideally we would like to check that rax == struct_addr. + Unfortunately the address of the target struct escapes (for setting + struct_addr), so the return struct is a temporary one whose address + is given to the f_* functions, otherwise a conforming program + could notice the struct changing already before the function returns. + This temporary struct could be anywhere. For GCC it will be on + stack, but no one is forbidding that it could be a static variable + if there's no threading or proper locking. Nobody in his right mind + will not use the stack for that. 
*/ + case MEM: assert (*(unsigned char*)struct_addr == 42 && rdi == rax); break; + } +} + +#define D(I) { struct S_ ## I s; current_test = I; struct_addr = (void*)&s; \ + clear_all(); \ + s = WRAP_RET(f_ ## I) (); \ + check_all(class_ ## I, sizeof(s)); \ +} + +static void +do_test (void) +{ + D(1) D(2) D(3) D(4) D(5) D(6) D(7) D(8) D(9) D(10) D(11) D(12) D(13) D(14) + + D(100) D(101) D(102) D(103) D(104) D(105) D(106) D(107) D(108) D(109) D(110) + D(111) + + D(120) D(121) D(122) D(123) D(124) D(125) D(126) D(127) D(128) D(129) + + D(200) + + D(300) D(301) D(302) D(303) D(304) D(305) D(306) D(307) + D(310) D(311) D(312) D(313) + + D(400) D(401) D(402) + + D(500) D(501) D(502) D(503) D(504) D(505) D(506) D(507) D(508) D(509) + D(520) + D(530) + + D(540) D(541) + + D(600) D(601) D(602) + if (num_failed) + abort (); +} +#undef D diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_varargs-m128.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_varargs-m128.c new file mode 100644 index 00000000000..5bdc44db5f4 --- /dev/null +++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_varargs-m128.c @@ -0,0 +1,164 @@ +/* Test variable number of 128-bit vector arguments passed to functions. */ + +#include <stdio.h> +#include "avx512fp16-xmm-check.h" +#include "defines.h" +#include "macros.h" +#include "args.h" + +struct IntegerRegisters iregs; +struct FloatRegisters fregs; + +/* This struct holds values for argument checking. */ +struct +{ + XMM_T i0, i1, i2, i3, i4, i5, i6, i7, i8, i9; +} values; + +char *pass; +int failed = 0; + +#undef assert +#define assert(c) do { \ + if (!(c)) {failed++; printf ("failed %s\n", pass); } \ +} while (0) + +#define compare(X1,X2,T) do { \ + assert (memcmp (&X1, &X2, sizeof (T)) == 0); \ +} while (0) + +void +fun_check_passing_m128_varargs (__m128 i0, __m128 i1, __m128 i2, + __m128 i3, ...) +{ + /* Check argument values.
*/ + void **fp = __builtin_frame_address (0); + void *ra = __builtin_return_address (0); + __m128 *argp; + + compare (values.i0, i0, __m128); + compare (values.i1, i1, __m128); + compare (values.i2, i2, __m128); + compare (values.i3, i3, __m128); + + /* Get the pointer to the return address on stack. */ + while (*fp != ra) + fp++; + + /* Skip the return address stack slot. */ + argp = (__m128 *) (((char *) fp) + 8); + + /* Check __m128 arguments passed on stack. */ + compare (values.i8, argp[0], __m128); + compare (values.i9, argp[1], __m128); + + /* Check register contents. */ + compare (fregs.xmm0, xmm_regs[0], __m128); + compare (fregs.xmm1, xmm_regs[1], __m128); + compare (fregs.xmm2, xmm_regs[2], __m128); + compare (fregs.xmm3, xmm_regs[3], __m128); + compare (fregs.xmm4, xmm_regs[4], __m128); + compare (fregs.xmm5, xmm_regs[5], __m128); + compare (fregs.xmm6, xmm_regs[6], __m128); + compare (fregs.xmm7, xmm_regs[7], __m128); +} + +void +fun_check_passing_m128h_varargs (__m128h i0, __m128h i1, __m128h i2, + __m128h i3, ...) +{ + /* Check argument values. */ + void **fp = __builtin_frame_address (0); + void *ra = __builtin_return_address (0); + __m128h *argp; + + compare (values.i0, i0, __m128h); + compare (values.i1, i1, __m128h); + compare (values.i2, i2, __m128h); + compare (values.i3, i3, __m128h); + + /* Get the pointer to the return address on stack. */ + while (*fp != ra) + fp++; + + /* Skip the return address stack slot. */ + argp = (__m128h *) (((char *) fp) + 8); + + /* Check __m128h arguments passed on stack. */ + compare (values.i8, argp[0], __m128h); + compare (values.i9, argp[1], __m128h); + + /* Check register contents. 
*/ + compare (fregs.xmm0, xmm_regs[0], __m128h); + compare (fregs.xmm1, xmm_regs[1], __m128h); + compare (fregs.xmm2, xmm_regs[2], __m128h); + compare (fregs.xmm3, xmm_regs[3], __m128h); + compare (fregs.xmm4, xmm_regs[4], __m128h); + compare (fregs.xmm5, xmm_regs[5], __m128h); + compare (fregs.xmm6, xmm_regs[6], __m128h); + compare (fregs.xmm7, xmm_regs[7], __m128h); +} + +#define def_check_int_passing_varargs(_i0, _i1, _i2, _i3, _i4, _i5, \ + _i6, _i7, _i8, _i9, \ + _func, TYPE) \ + values.i0.TYPE[0] = _i0; \ + values.i1.TYPE[0] = _i1; \ + values.i2.TYPE[0] = _i2; \ + values.i3.TYPE[0] = _i3; \ + values.i4.TYPE[0] = _i4; \ + values.i5.TYPE[0] = _i5; \ + values.i6.TYPE[0] = _i6; \ + values.i7.TYPE[0] = _i7; \ + values.i8.TYPE[0] = _i8; \ + values.i9.TYPE[0] = _i9; \ + clear_float_registers; \ + fregs.F0.TYPE[0] = _i0; \ + fregs.F1.TYPE[0] = _i1; \ + fregs.F2.TYPE[0] = _i2; \ + fregs.F3.TYPE[0] = _i3; \ + fregs.F4.TYPE[0] = _i4; \ + fregs.F5.TYPE[0] = _i5; \ + fregs.F6.TYPE[0] = _i6; \ + fregs.F7.TYPE[0] = _i7; \ + WRAP_CALL(_func) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _i8, _i9); + +void +test_m128_varargs (void) +{ + __m128 x[10]; + int i; + for (i = 0; i < 10; i++) + x[i] = (__m128){32+i, 0, 0, 0}; + pass = "m128-varargs"; + def_check_int_passing_varargs (x[0], x[1], x[2], x[3], x[4], x[5], + x[6], x[7], x[8], x[9], + fun_check_passing_m128_varargs, + _m128); +} + +void +test_m128h_varargs (void) +{ + __m128h x[10]; + int i; + for (i = 0; i < 10; i++) + x[i] = (__m128h) { + 1.1f16 + i, 2.2f16 + i, 3.3f16 + i, 4.4f16 + i, + 5.5f16 + i, 6.6f16 + i, 7.7f16 + i, 8.8f16 + i + }; + pass = "m128h-varargs"; + def_check_int_passing_varargs (x[0], x[1], x[2], x[3], x[4], x[5], + x[6], x[7], x[8], x[9], + fun_check_passing_m128h_varargs, + _m128h); +} + +static void +do_test (void) +{ + test_m128_varargs (); + test_m128h_varargs (); + if (failed) + abort (); +} From patchwork Thu Jul 1 06:15:51 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 
To: gcc-patches@gcc.gnu.org Subject: [PATCH 05/62] AVX512FP16: Add ABI test for ymm. Date: Thu, 1 Jul 2021 14:15:51 +0800 Message-Id: <20210701061648.9447-6-hongtao.liu@intel.com> In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com> References: <20210701061648.9447-1-hongtao.liu@intel.com> From: liuhongt Cc: jakub@redhat.com gcc/testsuite/ChangeLog: * gcc.target/x86_64/abi/avx512fp16/m256h/abi-avx512fp16-ymm.exp: New exp file. * gcc.target/x86_64/abi/avx512fp16/m256h/args.h: New header. * gcc.target/x86_64/abi/avx512fp16/m256h/avx512fp16-ymm-check.h: Likewise. * gcc.target/x86_64/abi/avx512fp16/m256h/asm-support.S: New.
	* gcc.target/x86_64/abi/avx512fp16/m256h/test_m256_returning.c:
	New test.
	* gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_m256.c:
	Likewise.
	* gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_structs.c:
	Likewise.
	* gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_unions.c:
	Likewise.
	* gcc.target/x86_64/abi/avx512fp16/m256h/test_varargs-m256.c:
	Likewise.
---
 .../avx512fp16/m256h/abi-avx512fp16-ymm.exp   |  45 +++
 .../x86_64/abi/avx512fp16/m256h/args.h        | 182 +++++++++
 .../x86_64/abi/avx512fp16/m256h/asm-support.S |  81 ++++
 .../avx512fp16/m256h/avx512fp16-ymm-check.h   |   3 +
 .../avx512fp16/m256h/test_m256_returning.c    |  54 +++
 .../abi/avx512fp16/m256h/test_passing_m256.c  | 370 ++++++++++++++++++
 .../avx512fp16/m256h/test_passing_structs.c   | 113 ++++++
 .../avx512fp16/m256h/test_passing_unions.c    | 337 ++++++++++++++++
 .../abi/avx512fp16/m256h/test_varargs-m256.c  | 160 ++++++++
 9 files changed, 1345 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/abi-avx512fp16-ymm.exp
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/args.h
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/asm-support.S
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/avx512fp16-ymm-check.h
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_m256_returning.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_m256.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_structs.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_unions.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_varargs-m256.c

diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/abi-avx512fp16-ymm.exp b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/abi-avx512fp16-ymm.exp
new file mode 100644
index 00000000000..ecf673bf796
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/abi-avx512fp16-ymm.exp
@@ -0,0 +1,45 @@
+# Copyright (C) 2019 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+# The x86-64 ABI testsuite needs one additional assembler file for most
+# testcases.  For simplicity we will just link it into each test.
+
+load_lib c-torture.exp
+load_lib target-supports.exp
+load_lib torture-options.exp
+load_lib file-format.exp
+
+if { (![istarget x86_64-*-*] && ![istarget i?86-*-*])
+     || [is-effective-target ia32]
+     || [gcc_target_object_format] != "elf"
+     || ![is-effective-target avx512fp16] } then {
+  return
+}
+
+
+torture-init
+set-torture-options $C_TORTURE_OPTIONS
+set additional_flags "-W -Wall -Wno-abi -mavx512fp16"
+
+foreach src [lsort [glob -nocomplain $srcdir/$subdir/test_*.c]] {
+    if {[runtest_file_p $runtests $src]} {
+	c-torture-execute [list $src \
+				 $srcdir/$subdir/asm-support.S] \
+				 $additional_flags
+    }
+}
+
+torture-finish
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/args.h b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/args.h
new file mode 100644
index 00000000000..136db48c144
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/args.h
@@ -0,0 +1,182 @@
+#ifndef INCLUDED_ARGS_H
+#define INCLUDED_ARGS_H
+
+#include <immintrin.h>
+#include <string.h>
+
+/* Assertion macro.
*/ +#define assert(test) if (!(test)) abort() + +#ifdef __GNUC__ +#define ATTRIBUTE_UNUSED __attribute__((__unused__)) +#else +#define ATTRIBUTE_UNUSED +#endif + +/* This defines the calling sequences for integers and floats. */ +#define I0 rdi +#define I1 rsi +#define I2 rdx +#define I3 rcx +#define I4 r8 +#define I5 r9 +#define F0 ymm0 +#define F1 ymm1 +#define F2 ymm2 +#define F3 ymm3 +#define F4 ymm4 +#define F5 ymm5 +#define F6 ymm6 +#define F7 ymm7 + +typedef union { + _Float16 __Float16[16]; + float _float[8]; + double _double[4]; + long _long[4]; + int _int[8]; + unsigned long _ulong[4]; + __m64 _m64[4]; + __m128 _m128[2]; + __m256 _m256[1]; + __m256h _m256h[1]; +} YMM_T; + +typedef union { + float _float; + double _double; + long double _ldouble; + unsigned long _ulong[2]; +} X87_T; +extern void (*callthis)(void); +extern unsigned long rax,rbx,rcx,rdx,rsi,rdi,rsp,rbp,r8,r9,r10,r11,r12,r13,r14,r15; +YMM_T ymm_regs[16]; +X87_T x87_regs[8]; +extern volatile unsigned long volatile_var; +extern void snapshot (void); +extern void snapshot_ret (void); +#define WRAP_CALL(N) \ + (callthis = (void (*)()) (N), (typeof (&N)) snapshot) +#define WRAP_RET(N) \ + (callthis = (void (*)()) (N), (typeof (&N)) snapshot_ret) + +/* Clear all integer registers. */ +#define clear_int_hardware_registers \ + asm __volatile__ ("xor %%rax, %%rax\n\t" \ + "xor %%rbx, %%rbx\n\t" \ + "xor %%rcx, %%rcx\n\t" \ + "xor %%rdx, %%rdx\n\t" \ + "xor %%rsi, %%rsi\n\t" \ + "xor %%rdi, %%rdi\n\t" \ + "xor %%r8, %%r8\n\t" \ + "xor %%r9, %%r9\n\t" \ + "xor %%r10, %%r10\n\t" \ + "xor %%r11, %%r11\n\t" \ + "xor %%r12, %%r12\n\t" \ + "xor %%r13, %%r13\n\t" \ + "xor %%r14, %%r14\n\t" \ + "xor %%r15, %%r15\n\t" \ + ::: "rax", "rbx", "rcx", "rdx", "rsi", "rdi", "r8", \ + "r9", "r10", "r11", "r12", "r13", "r14", "r15"); + +/* This is the list of registers available for passing arguments. Not all of + these are used or even really available. 
*/ +struct IntegerRegisters +{ + unsigned long rax, rbx, rcx, rdx, rsi, rdi, r8, r9, r10, r11, r12, r13, r14, r15; +}; +struct FloatRegisters +{ + double mm0, mm1, mm2, mm3, mm4, mm5, mm6, mm7; + long double st0, st1, st2, st3, st4, st5, st6, st7; + YMM_T ymm0, ymm1, ymm2, ymm3, ymm4, ymm5, ymm6, ymm7, ymm8, ymm9, + ymm10, ymm11, ymm12, ymm13, ymm14, ymm15; +}; + +/* Implemented in scalarargs.c */ +extern struct IntegerRegisters iregs; +extern struct FloatRegisters fregs; +extern unsigned int num_iregs, num_fregs; + +#define check_int_arguments do { \ + assert (num_iregs <= 0 || iregs.I0 == I0); \ + assert (num_iregs <= 1 || iregs.I1 == I1); \ + assert (num_iregs <= 2 || iregs.I2 == I2); \ + assert (num_iregs <= 3 || iregs.I3 == I3); \ + assert (num_iregs <= 4 || iregs.I4 == I4); \ + assert (num_iregs <= 5 || iregs.I5 == I5); \ + } while (0) + +#define check_char_arguments check_int_arguments +#define check_short_arguments check_int_arguments +#define check_long_arguments check_int_arguments + +/* Clear register struct. */ +#define clear_struct_registers \ + rax = rbx = rcx = rdx = rdi = rsi = rbp = rsp \ + = r8 = r9 = r10 = r11 = r12 = r13 = r14 = r15 = 0; \ + memset (&iregs, 0, sizeof (iregs)); \ + memset (&fregs, 0, sizeof (fregs)); \ + memset (ymm_regs, 0, sizeof (ymm_regs)); \ + memset (x87_regs, 0, sizeof (x87_regs)); + +/* Clear both hardware and register structs for integers. */ +#define clear_int_registers \ + clear_struct_registers \ + clear_int_hardware_registers + +/* TODO: Do the checking. 
*/ +#define check_f_arguments(T) do { \ + assert (num_fregs <= 0 || fregs.ymm0._ ## T [0] == ymm_regs[0]._ ## T [0]); \ + assert (num_fregs <= 1 || fregs.ymm1._ ## T [0] == ymm_regs[1]._ ## T [0]); \ + assert (num_fregs <= 2 || fregs.ymm2._ ## T [0] == ymm_regs[2]._ ## T [0]); \ + assert (num_fregs <= 3 || fregs.ymm3._ ## T [0] == ymm_regs[3]._ ## T [0]); \ + assert (num_fregs <= 4 || fregs.ymm4._ ## T [0] == ymm_regs[4]._ ## T [0]); \ + assert (num_fregs <= 5 || fregs.ymm5._ ## T [0] == ymm_regs[5]._ ## T [0]); \ + assert (num_fregs <= 6 || fregs.ymm6._ ## T [0] == ymm_regs[6]._ ## T [0]); \ + assert (num_fregs <= 7 || fregs.ymm7._ ## T [0] == ymm_regs[7]._ ## T [0]); \ + } while (0) + +#define check_float_arguments check_f_arguments(float) +#define check_double_arguments check_f_arguments(double) + +#define check_vector_arguments(T,O) do { \ + assert (num_fregs <= 0 \ + || memcmp (((char *) &fregs.ymm0) + (O), \ + &ymm_regs[0], \ + sizeof (__ ## T) - (O)) == 0); \ + assert (num_fregs <= 1 \ + || memcmp (((char *) &fregs.ymm1) + (O), \ + &ymm_regs[1], \ + sizeof (__ ## T) - (O)) == 0); \ + assert (num_fregs <= 2 \ + || memcmp (((char *) &fregs.ymm2) + (O), \ + &ymm_regs[2], \ + sizeof (__ ## T) - (O)) == 0); \ + assert (num_fregs <= 3 \ + || memcmp (((char *) &fregs.ymm3) + (O), \ + &ymm_regs[3], \ + sizeof (__ ## T) - (O)) == 0); \ + assert (num_fregs <= 4 \ + || memcmp (((char *) &fregs.ymm4) + (O), \ + &ymm_regs[4], \ + sizeof (__ ## T) - (O)) == 0); \ + assert (num_fregs <= 5 \ + || memcmp (((char *) &fregs.ymm5) + (O), \ + &ymm_regs[5], \ + sizeof (__ ## T) - (O)) == 0); \ + assert (num_fregs <= 6 \ + || memcmp (((char *) &fregs.ymm6) + (O), \ + &ymm_regs[6], \ + sizeof (__ ## T) - (O)) == 0); \ + assert (num_fregs <= 7 \ + || memcmp (((char *) &fregs.ymm7) + (O), \ + &ymm_regs[7], \ + sizeof (__ ## T) - (O)) == 0); \ + } while (0) + +#define check_m64_arguments check_vector_arguments(m64, 0) +#define check_m128_arguments check_vector_arguments(m128, 0) 
+#define check_m256_arguments check_vector_arguments(m256, 0) + +#endif /* INCLUDED_ARGS_H */ diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/asm-support.S b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/asm-support.S new file mode 100644 index 00000000000..73a59191d6d --- /dev/null +++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/asm-support.S @@ -0,0 +1,81 @@ + .text + .p2align 4,,15 +.globl snapshot + .type snapshot, @function +snapshot: +.LFB3: + movq %rax, rax(%rip) + movq %rbx, rbx(%rip) + movq %rcx, rcx(%rip) + movq %rdx, rdx(%rip) + movq %rdi, rdi(%rip) + movq %rsi, rsi(%rip) + movq %rbp, rbp(%rip) + movq %rsp, rsp(%rip) + movq %r8, r8(%rip) + movq %r9, r9(%rip) + movq %r10, r10(%rip) + movq %r11, r11(%rip) + movq %r12, r12(%rip) + movq %r13, r13(%rip) + movq %r14, r14(%rip) + movq %r15, r15(%rip) + vmovdqu %ymm0, ymm_regs+0(%rip) + vmovdqu %ymm1, ymm_regs+32(%rip) + vmovdqu %ymm2, ymm_regs+64(%rip) + vmovdqu %ymm3, ymm_regs+96(%rip) + vmovdqu %ymm4, ymm_regs+128(%rip) + vmovdqu %ymm5, ymm_regs+160(%rip) + vmovdqu %ymm6, ymm_regs+192(%rip) + vmovdqu %ymm7, ymm_regs+224(%rip) + vmovdqu %ymm8, ymm_regs+256(%rip) + vmovdqu %ymm9, ymm_regs+288(%rip) + vmovdqu %ymm10, ymm_regs+320(%rip) + vmovdqu %ymm11, ymm_regs+352(%rip) + vmovdqu %ymm12, ymm_regs+384(%rip) + vmovdqu %ymm13, ymm_regs+416(%rip) + vmovdqu %ymm14, ymm_regs+448(%rip) + vmovdqu %ymm15, ymm_regs+480(%rip) + jmp *callthis(%rip) +.LFE3: + .size snapshot, .-snapshot + + .p2align 4,,15 +.globl snapshot_ret + .type snapshot_ret, @function +snapshot_ret: + movq %rdi, rdi(%rip) + subq $8, %rsp + call *callthis(%rip) + addq $8, %rsp + movq %rax, rax(%rip) + movq %rdx, rdx(%rip) + vmovdqu %ymm0, ymm_regs+0(%rip) + vmovdqu %ymm1, ymm_regs+32(%rip) + fstpt x87_regs(%rip) + fstpt x87_regs+16(%rip) + fldt x87_regs+16(%rip) + fldt x87_regs(%rip) + ret + .size snapshot_ret, .-snapshot_ret + + .comm callthis,8,8 + .comm rax,8,8 + .comm rbx,8,8 + .comm rcx,8,8 + .comm rdx,8,8 + 
	.comm rsi,8,8
+	.comm rdi,8,8
+	.comm rsp,8,8
+	.comm rbp,8,8
+	.comm r8,8,8
+	.comm r9,8,8
+	.comm r10,8,8
+	.comm r11,8,8
+	.comm r12,8,8
+	.comm r13,8,8
+	.comm r14,8,8
+	.comm r15,8,8
+	.comm ymm_regs,512,32
+	.comm x87_regs,128,32
+	.comm volatile_var,8,8
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/avx512fp16-ymm-check.h b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/avx512fp16-ymm-check.h
new file mode 100644
index 00000000000..6a55030c0d4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/avx512fp16-ymm-check.h
@@ -0,0 +1,3 @@
+#define AVX512VL(ebx) (ebx & bit_AVX512VL)
+#define XSTATE_MASK (XSTATE_SSE | XSTATE_YMM | XSTATE_OPMASK)
+#include "../avx512fp16-check.h"
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_m256_returning.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_m256_returning.c
new file mode 100644
index 00000000000..48e0139f416
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_m256_returning.c
@@ -0,0 +1,54 @@
+#include <stdio.h>
+#include "avx512fp16-ymm-check.h"
+#include "args.h"
+
+struct IntegerRegisters iregs;
+struct FloatRegisters fregs;
+unsigned int num_iregs, num_fregs;
+
+__m256
+fun_test_returning___m256 (void)
+{
+  volatile_var++;
+  return (__m256){73,0,0,0,0,0,0,0};
+}
+
+__m256h
+fun_test_returning___m256h (void)
+{
+  volatile_var++;
+  return (__m256h){1.1f16,2.1f16,3.1f16,4.1f16,
+		   5.1f16,6.1f16,7.1f16,8.1f16,
+		   9.1f16,10.1f16,11.1f16,12.1f16,
+		   13.1f16,14.1f16,15.1f16,16.1f16};
+}
+
+__m256 test_256;
+__m256h test_256h;
+
+static void
+do_test (void)
+{
+  unsigned failed = 0;
+  YMM_T ymmt1, ymmt2;
+
+  clear_struct_registers;
+  test_256 = (__m256){73,0,0,0,0,0,0,0};
+  ymmt1._m256[0] = test_256;
+  ymmt2._m256[0] = WRAP_RET (fun_test_returning___m256)();
+  if (memcmp (&ymmt1, &ymmt2, sizeof (ymmt2)) != 0)
+    printf ("fail m256\n"), failed++;
+
+  clear_struct_registers;
+  test_256h = (__m256h){1.1f16,2.1f16,3.1f16,4.1f16,
+			5.1f16,6.1f16,7.1f16,8.1f16,
+			9.1f16,10.1f16,11.1f16,12.1f16,
+			13.1f16,14.1f16,15.1f16,16.1f16};
+  ymmt1._m256h[0] = test_256h;
+  ymmt2._m256h[0] = WRAP_RET (fun_test_returning___m256h)();
+  if (memcmp (&ymmt1, &ymmt2, sizeof (ymmt2)) != 0)
+    printf ("fail m256h\n"), failed++;
+
+  if (failed)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_m256.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_m256.c
new file mode 100644
index 00000000000..bfa80d616ee
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_m256.c
@@ -0,0 +1,370 @@
+#include <stdio.h>
+#include "avx512fp16-ymm-check.h"
+#include "args.h"
+
+struct IntegerRegisters iregs;
+struct FloatRegisters fregs;
+unsigned int num_iregs, num_fregs;
+
+/* This struct holds values for argument checking.  */
+struct
+{
+  YMM_T i0, i1, i2, i3, i4, i5, i6, i7, i8, i9, i10, i11, i12, i13, i14, i15,
+	i16, i17, i18, i19, i20, i21, i22, i23;
+} values;
+
+char *pass;
+int failed = 0;
+
+#undef assert
+#define assert(c) do { \
+  if (!(c)) {failed++; printf ("failed %s\n", pass); } \
+} while (0)
+
+#define compare(X1,X2,T) do { \
+  assert (memcmp (&X1, &X2, sizeof (T)) == 0); \
+} while (0)
+
+void
+fun_check_passing_m256_8_values (__m256 i0 ATTRIBUTE_UNUSED,
+				 __m256 i1 ATTRIBUTE_UNUSED,
+				 __m256 i2 ATTRIBUTE_UNUSED,
+				 __m256 i3 ATTRIBUTE_UNUSED,
+				 __m256 i4 ATTRIBUTE_UNUSED,
+				 __m256 i5 ATTRIBUTE_UNUSED,
+				 __m256 i6 ATTRIBUTE_UNUSED,
+				 __m256 i7 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.
 */
+  compare (values.i0, i0, __m256);
+  compare (values.i1, i1, __m256);
+  compare (values.i2, i2, __m256);
+  compare (values.i3, i3, __m256);
+  compare (values.i4, i4, __m256);
+  compare (values.i5, i5, __m256);
+  compare (values.i6, i6, __m256);
+  compare (values.i7, i7, __m256);
+}
+
+void
+fun_check_passing_m256h_8_values (__m256h i0 ATTRIBUTE_UNUSED,
+				  __m256h i1 ATTRIBUTE_UNUSED,
+				  __m256h i2 ATTRIBUTE_UNUSED,
+				  __m256h i3 ATTRIBUTE_UNUSED,
+				  __m256h i4 ATTRIBUTE_UNUSED,
+				  __m256h i5 ATTRIBUTE_UNUSED,
+				  __m256h i6 ATTRIBUTE_UNUSED,
+				  __m256h i7 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  compare (values.i0, i0, __m256h);
+  compare (values.i1, i1, __m256h);
+  compare (values.i2, i2, __m256h);
+  compare (values.i3, i3, __m256h);
+  compare (values.i4, i4, __m256h);
+  compare (values.i5, i5, __m256h);
+  compare (values.i6, i6, __m256h);
+  compare (values.i7, i7, __m256h);
+}
+
+void
+fun_check_passing_m256_8_regs (__m256 i0 ATTRIBUTE_UNUSED,
+			       __m256 i1 ATTRIBUTE_UNUSED,
+			       __m256 i2 ATTRIBUTE_UNUSED,
+			       __m256 i3 ATTRIBUTE_UNUSED,
+			       __m256 i4 ATTRIBUTE_UNUSED,
+			       __m256 i5 ATTRIBUTE_UNUSED,
+			       __m256 i6 ATTRIBUTE_UNUSED,
+			       __m256 i7 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m256_arguments;
+}
+
+void
+fun_check_passing_m256h_8_regs (__m256h i0 ATTRIBUTE_UNUSED,
+				__m256h i1 ATTRIBUTE_UNUSED,
+				__m256h i2 ATTRIBUTE_UNUSED,
+				__m256h i3 ATTRIBUTE_UNUSED,
+				__m256h i4 ATTRIBUTE_UNUSED,
+				__m256h i5 ATTRIBUTE_UNUSED,
+				__m256h i6 ATTRIBUTE_UNUSED,
+				__m256h i7 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.
*/ + check_m256_arguments; +} + +void +fun_check_passing_m256_20_values (__m256 i0 ATTRIBUTE_UNUSED, + __m256 i1 ATTRIBUTE_UNUSED, + __m256 i2 ATTRIBUTE_UNUSED, + __m256 i3 ATTRIBUTE_UNUSED, + __m256 i4 ATTRIBUTE_UNUSED, + __m256 i5 ATTRIBUTE_UNUSED, + __m256 i6 ATTRIBUTE_UNUSED, + __m256 i7 ATTRIBUTE_UNUSED, + __m256 i8 ATTRIBUTE_UNUSED, + __m256 i9 ATTRIBUTE_UNUSED, + __m256 i10 ATTRIBUTE_UNUSED, + __m256 i11 ATTRIBUTE_UNUSED, + __m256 i12 ATTRIBUTE_UNUSED, + __m256 i13 ATTRIBUTE_UNUSED, + __m256 i14 ATTRIBUTE_UNUSED, + __m256 i15 ATTRIBUTE_UNUSED, + __m256 i16 ATTRIBUTE_UNUSED, + __m256 i17 ATTRIBUTE_UNUSED, + __m256 i18 ATTRIBUTE_UNUSED, + __m256 i19 ATTRIBUTE_UNUSED) +{ + /* Check argument values. */ + compare (values.i0, i0, __m256); + compare (values.i1, i1, __m256); + compare (values.i2, i2, __m256); + compare (values.i3, i3, __m256); + compare (values.i4, i4, __m256); + compare (values.i5, i5, __m256); + compare (values.i6, i6, __m256); + compare (values.i7, i7, __m256); + compare (values.i8, i8, __m256); + compare (values.i9, i9, __m256); + compare (values.i10, i10, __m256); + compare (values.i11, i11, __m256); + compare (values.i12, i12, __m256); + compare (values.i13, i13, __m256); + compare (values.i14, i14, __m256); + compare (values.i15, i15, __m256); + compare (values.i16, i16, __m256); + compare (values.i17, i17, __m256); + compare (values.i18, i18, __m256); + compare (values.i19, i19, __m256); +} + +void +fun_check_passing_m256h_20_values (__m256h i0 ATTRIBUTE_UNUSED, + __m256h i1 ATTRIBUTE_UNUSED, + __m256h i2 ATTRIBUTE_UNUSED, + __m256h i3 ATTRIBUTE_UNUSED, + __m256h i4 ATTRIBUTE_UNUSED, + __m256h i5 ATTRIBUTE_UNUSED, + __m256h i6 ATTRIBUTE_UNUSED, + __m256h i7 ATTRIBUTE_UNUSED, + __m256h i8 ATTRIBUTE_UNUSED, + __m256h i9 ATTRIBUTE_UNUSED, + __m256h i10 ATTRIBUTE_UNUSED, + __m256h i11 ATTRIBUTE_UNUSED, + __m256h i12 ATTRIBUTE_UNUSED, + __m256h i13 ATTRIBUTE_UNUSED, + __m256h i14 ATTRIBUTE_UNUSED, + __m256h i15 ATTRIBUTE_UNUSED, + __m256h i16 
ATTRIBUTE_UNUSED, + __m256h i17 ATTRIBUTE_UNUSED, + __m256h i18 ATTRIBUTE_UNUSED, + __m256h i19 ATTRIBUTE_UNUSED) +{ + /* Check argument values. */ + compare (values.i0, i0, __m256h); + compare (values.i1, i1, __m256h); + compare (values.i2, i2, __m256h); + compare (values.i3, i3, __m256h); + compare (values.i4, i4, __m256h); + compare (values.i5, i5, __m256h); + compare (values.i6, i6, __m256h); + compare (values.i7, i7, __m256h); + compare (values.i8, i8, __m256h); + compare (values.i9, i9, __m256h); + compare (values.i10, i10, __m256h); + compare (values.i11, i11, __m256h); + compare (values.i12, i12, __m256h); + compare (values.i13, i13, __m256h); + compare (values.i14, i14, __m256h); + compare (values.i15, i15, __m256h); + compare (values.i16, i16, __m256h); + compare (values.i17, i17, __m256h); + compare (values.i18, i18, __m256h); + compare (values.i19, i19, __m256h); +} + +void +fun_check_passing_m256_20_regs (__m256 i0 ATTRIBUTE_UNUSED, + __m256 i1 ATTRIBUTE_UNUSED, + __m256 i2 ATTRIBUTE_UNUSED, + __m256 i3 ATTRIBUTE_UNUSED, + __m256 i4 ATTRIBUTE_UNUSED, + __m256 i5 ATTRIBUTE_UNUSED, + __m256 i6 ATTRIBUTE_UNUSED, + __m256 i7 ATTRIBUTE_UNUSED, + __m256 i8 ATTRIBUTE_UNUSED, + __m256 i9 ATTRIBUTE_UNUSED, + __m256 i10 ATTRIBUTE_UNUSED, + __m256 i11 ATTRIBUTE_UNUSED, + __m256 i12 ATTRIBUTE_UNUSED, + __m256 i13 ATTRIBUTE_UNUSED, + __m256 i14 ATTRIBUTE_UNUSED, + __m256 i15 ATTRIBUTE_UNUSED, + __m256 i16 ATTRIBUTE_UNUSED, + __m256 i17 ATTRIBUTE_UNUSED, + __m256 i18 ATTRIBUTE_UNUSED, + __m256 i19 ATTRIBUTE_UNUSED) +{ + /* Check register contents. 
*/ + check_m256_arguments; +} + +void +fun_check_passing_m256h_20_regs (__m256h i0 ATTRIBUTE_UNUSED, + __m256h i1 ATTRIBUTE_UNUSED, + __m256h i2 ATTRIBUTE_UNUSED, + __m256h i3 ATTRIBUTE_UNUSED, + __m256h i4 ATTRIBUTE_UNUSED, + __m256h i5 ATTRIBUTE_UNUSED, + __m256h i6 ATTRIBUTE_UNUSED, + __m256h i7 ATTRIBUTE_UNUSED, + __m256h i8 ATTRIBUTE_UNUSED, + __m256h i9 ATTRIBUTE_UNUSED, + __m256h i10 ATTRIBUTE_UNUSED, + __m256h i11 ATTRIBUTE_UNUSED, + __m256h i12 ATTRIBUTE_UNUSED, + __m256h i13 ATTRIBUTE_UNUSED, + __m256h i14 ATTRIBUTE_UNUSED, + __m256h i15 ATTRIBUTE_UNUSED, + __m256h i16 ATTRIBUTE_UNUSED, + __m256h i17 ATTRIBUTE_UNUSED, + __m256h i18 ATTRIBUTE_UNUSED, + __m256h i19 ATTRIBUTE_UNUSED) +{ + /* Check register contents. */ + check_m256_arguments; +} + +#define def_check_passing8(_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _func1, _func2, TYPE) \ + values.i0.TYPE[0] = _i0; \ + values.i1.TYPE[0] = _i1; \ + values.i2.TYPE[0] = _i2; \ + values.i3.TYPE[0] = _i3; \ + values.i4.TYPE[0] = _i4; \ + values.i5.TYPE[0] = _i5; \ + values.i6.TYPE[0] = _i6; \ + values.i7.TYPE[0] = _i7; \ + WRAP_CALL(_func1) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7); \ + clear_struct_registers; \ + fregs.F0.TYPE[0] = _i0; \ + fregs.F1.TYPE[0] = _i1; \ + fregs.F2.TYPE[0] = _i2; \ + fregs.F3.TYPE[0] = _i3; \ + fregs.F4.TYPE[0] = _i4; \ + fregs.F5.TYPE[0] = _i5; \ + fregs.F6.TYPE[0] = _i6; \ + fregs.F7.TYPE[0] = _i7; \ + num_fregs = 8; \ + WRAP_CALL(_func2) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7); + +#define def_check_passing20(_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, \ + _i8, _i9, _i10, _i11, _i12, _i13, _i14, \ + _i15, _i16, _i17, _i18, _i19, _func1, \ + _func2, TYPE) \ + values.i0.TYPE[0] = _i0; \ + values.i1.TYPE[0] = _i1; \ + values.i2.TYPE[0] = _i2; \ + values.i3.TYPE[0] = _i3; \ + values.i4.TYPE[0] = _i4; \ + values.i5.TYPE[0] = _i5; \ + values.i6.TYPE[0] = _i6; \ + values.i7.TYPE[0] = _i7; \ + values.i8.TYPE[0] = _i8; \ + values.i9.TYPE[0] = _i9; \ + values.i10.TYPE[0] = _i10; \ + 
values.i11.TYPE[0] = _i11; \ + values.i12.TYPE[0] = _i12; \ + values.i13.TYPE[0] = _i13; \ + values.i14.TYPE[0] = _i14; \ + values.i15.TYPE[0] = _i15; \ + values.i16.TYPE[0] = _i16; \ + values.i17.TYPE[0] = _i17; \ + values.i18.TYPE[0] = _i18; \ + values.i19.TYPE[0] = _i19; \ + WRAP_CALL(_func1) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _i8, \ + _i9, _i10, _i11, _i12, _i13, _i14, _i15, \ + _i16, _i17, _i18, _i19); \ + clear_struct_registers; \ + fregs.F0.TYPE[0] = _i0; \ + fregs.F1.TYPE[0] = _i1; \ + fregs.F2.TYPE[0] = _i2; \ + fregs.F3.TYPE[0] = _i3; \ + fregs.F4.TYPE[0] = _i4; \ + fregs.F5.TYPE[0] = _i5; \ + fregs.F6.TYPE[0] = _i6; \ + fregs.F7.TYPE[0] = _i7; \ + num_fregs = 8; \ + WRAP_CALL(_func2) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _i8, \ + _i9, _i10, _i11, _i12, _i13, _i14, _i15, \ + _i16, _i17, _i18, _i19); + +void +test_m256_on_stack () +{ + __m256 x[8]; + int i; + for (i = 0; i < 8; i++) + x[i] = (__m256){32 + i, 0, 0, 0, 0, 0, 0, 0}; + pass = "m256-8"; + def_check_passing8 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7], + fun_check_passing_m256_8_values, + fun_check_passing_m256_8_regs, _m256); +} + +void +test_m256h_on_stack () +{ + __m256h x[8]; + int i; + for (i = 0; i < 8; i++) + x[i] = (__m256h){1.1f16 + i, 2.1f16 + i, 3.1f16 + i, 4.1f16 + i, + 5.1f16 + i, 6.1f16 + i, 7.1f16 + i, 8.1f16 + i, + 9.1f16 + i, 10.1f16 + i, 11.1f16 + i, 12.1f16 + i, + 13.1f16 + i, 14.1f16 + i, 15.1f16 + i, 16.1f16 + i}; + pass = "m256h-8"; + def_check_passing8 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7], + fun_check_passing_m256h_8_values, + fun_check_passing_m256h_8_regs, _m256h); +} + +void +test_too_many_m256 () +{ + __m256 x[20]; + int i; + for (i = 0; i < 20; i++) + x[i] = (__m256){32 + i, 0, 0, 0, 0, 0, 0, 0}; + pass = "m256-20"; + def_check_passing20 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7], x[8], + x[9], x[10], x[11], x[12], x[13], x[14], x[15], x[16], + x[17], x[18], x[19], fun_check_passing_m256_20_values, + fun_check_passing_m256_20_regs, _m256); 
+} + +void +test_too_many_m256h () +{ + __m256h x[20]; + int i; + for (i = 0; i < 20; i++) + x[i] = (__m256h){1.1f16 + i, 2.1f16 + i, 3.1f16 + i, 4.1f16 + i, + 5.1f16 + i, 6.1f16 + i, 7.1f16 + i, 8.1f16 + i, + 9.1f16 + i, 10.1f16 + i, 11.1f16 + i, 12.1f16 + i, + 13.1f16 + i, 14.1f16 + i, 15.1f16 + i, 16.1f16 + i}; + pass = "m256h-20"; + def_check_passing20 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7], x[8], + x[9], x[10], x[11], x[12], x[13], x[14], x[15], x[16], + x[17], x[18], x[19], fun_check_passing_m256h_20_values, + fun_check_passing_m256h_20_regs, _m256h); +} + +static void +do_test (void) +{ + test_m256_on_stack (); + test_too_many_m256 (); + test_m256h_on_stack (); + test_too_many_m256h (); + if (failed) + abort (); +} diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_structs.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_structs.c new file mode 100644 index 00000000000..eff10badd6b --- /dev/null +++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_structs.c @@ -0,0 +1,113 @@ +#include "avx512fp16-ymm-check.h" +#include "args.h" + +struct IntegerRegisters iregs; +struct FloatRegisters fregs; +unsigned int num_iregs, num_fregs; + +struct m256_struct +{ + __m256 x; +}; + +struct m256_2_struct +{ + __m256 x1, x2; +}; + +struct m256h_struct +{ + __m256h x; +}; + +struct m256h_2_struct +{ + __m256h x1, x2; +}; + +/* Check that the struct is passed as the individual members in fregs. 
*/ +void +check_struct_passing1 (struct m256_struct ms1 ATTRIBUTE_UNUSED, + struct m256_struct ms2 ATTRIBUTE_UNUSED, + struct m256_struct ms3 ATTRIBUTE_UNUSED, + struct m256_struct ms4 ATTRIBUTE_UNUSED, + struct m256_struct ms5 ATTRIBUTE_UNUSED, + struct m256_struct ms6 ATTRIBUTE_UNUSED, + struct m256_struct ms7 ATTRIBUTE_UNUSED, + struct m256_struct ms8 ATTRIBUTE_UNUSED) +{ + check_m256_arguments; +} + +void +check_struct_passing2 (struct m256_2_struct ms ATTRIBUTE_UNUSED) +{ + /* Check the passing on the stack by comparing the address of the + stack elements to the expected place on the stack. */ + assert ((unsigned long)&ms.x1 == rsp+8); + assert ((unsigned long)&ms.x2 == rsp+40); +} + +void +check_struct_passing1h (struct m256h_struct ms1 ATTRIBUTE_UNUSED, + struct m256h_struct ms2 ATTRIBUTE_UNUSED, + struct m256h_struct ms3 ATTRIBUTE_UNUSED, + struct m256h_struct ms4 ATTRIBUTE_UNUSED, + struct m256h_struct ms5 ATTRIBUTE_UNUSED, + struct m256h_struct ms6 ATTRIBUTE_UNUSED, + struct m256h_struct ms7 ATTRIBUTE_UNUSED, + struct m256h_struct ms8 ATTRIBUTE_UNUSED) +{ + check_m256_arguments; +} + +void +check_struct_passing2h (struct m256h_2_struct ms ATTRIBUTE_UNUSED) +{ + /* Check the passing on the stack by comparing the address of the + stack elements to the expected place on the stack. 
*/ + assert ((unsigned long)&ms.x1 == rsp+8); + assert ((unsigned long)&ms.x2 == rsp+40); +} + +static void +do_test (void) +{ + struct m256_struct m256s [8]; + struct m256h_struct m256hs [8]; + struct m256_2_struct m256_2s = { + { 48.394, 39.3, -397.9, 3484.9, -8.394, -93.3, 7.9, 84.94 }, + { -8.394, -3.3, -39.9, 34.9, 7.9, 84.94, -48.394, 39.3 } + }; + struct m256h_2_struct m256h_2s = { + { 47.364f16, 36.3f16, -367.6f16, 3474.6f16, -7.364f16, -63.3f16, 7.6f16, 74.64f16, + 57.865f16, 86.8f16, -867.6f16, 8575.6f16, -7.865f16, -68.8f16, 7.6f16, 75.65f16 }, + { -7.364f16, -3.3f16, -36.6f16, 34.6f16, 7.6f16, 74.64f16, -47.364f16, 36.3f16, + -8.364f16, -3.3f16, -36.6f16, 34.6f16, 8.6f16, 84.64f16, -48.364f16, 36.3f16 } + }; + int i; + + for (i = 0; i < 8; i++) + { + m256s[i].x = (__m256){32+i, 0, i, 0, -i, 0, i - 12, i + 8}; + + m256hs[i].x = (__m256h){33+i, 0, i, 0, -i, 0, i - 11, i + 9, + 31+i, 2, i, 3, -i, 4, i - 10, i + 7}; + } + + clear_struct_registers; + for (i = 0; i < 8; i++) + (&fregs.ymm0)[i]._m256[0] = m256s[i].x; + num_fregs = 8; + WRAP_CALL (check_struct_passing1)(m256s[0], m256s[1], m256s[2], m256s[3], + m256s[4], m256s[5], m256s[6], m256s[7]); + WRAP_CALL (check_struct_passing2)(m256_2s); + + clear_struct_registers; + for (i = 0; i < 8; i++) + (&fregs.ymm0)[i]._m256h[0] = m256hs[i].x; + num_fregs = 8; + WRAP_CALL (check_struct_passing1h)(m256hs[0], m256hs[1], m256hs[2], m256hs[3], + m256hs[4], m256hs[5], m256hs[6], m256hs[7]); + WRAP_CALL (check_struct_passing2h)(m256h_2s); +} diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_unions.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_unions.c new file mode 100644 index 00000000000..76f300c3e5d --- /dev/null +++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_unions.c @@ -0,0 +1,337 @@ +#include "avx512fp16-ymm-check.h" +#include "args.h" + +struct IntegerRegisters iregs; +struct FloatRegisters fregs; +unsigned int num_iregs, num_fregs; + 
+union un1 +{ + __m256 x; + float f; +}; + +union un2 +{ + __m256 x; + double d; +}; + +union un3 +{ + __m256 x; + __m128 v; +}; + +union un4 +{ + __m256 x; + long double ld; +}; + +union un5 +{ + __m256 x; + int i; +}; + +union un1a +{ + __m256 x; + _Float16 f; +}; + +union un1h +{ + __m256h x; + float f; +}; + +union un1hh +{ + __m256h x; + _Float16 f; +}; + +union un2h +{ + __m256h x; + double d; +}; + +union un3h +{ + __m256h x; + __m128 v; +}; + +union un4h +{ + __m256h x; + long double ld; +}; + +union un5h +{ + __m256h x; + int i; +}; + +void +check_union_passing1(union un1 u1 ATTRIBUTE_UNUSED, + union un1 u2 ATTRIBUTE_UNUSED, + union un1 u3 ATTRIBUTE_UNUSED, + union un1 u4 ATTRIBUTE_UNUSED, + union un1 u5 ATTRIBUTE_UNUSED, + union un1 u6 ATTRIBUTE_UNUSED, + union un1 u7 ATTRIBUTE_UNUSED, + union un1 u8 ATTRIBUTE_UNUSED) +{ + check_m256_arguments; +} + +void +check_union_passing1a(union un1a u1 ATTRIBUTE_UNUSED, + union un1a u2 ATTRIBUTE_UNUSED, + union un1a u3 ATTRIBUTE_UNUSED, + union un1a u4 ATTRIBUTE_UNUSED, + union un1a u5 ATTRIBUTE_UNUSED, + union un1a u6 ATTRIBUTE_UNUSED, + union un1a u7 ATTRIBUTE_UNUSED, + union un1a u8 ATTRIBUTE_UNUSED) +{ + check_m256_arguments; +} + +void +check_union_passing1h(union un1h u1 ATTRIBUTE_UNUSED, + union un1h u2 ATTRIBUTE_UNUSED, + union un1h u3 ATTRIBUTE_UNUSED, + union un1h u4 ATTRIBUTE_UNUSED, + union un1h u5 ATTRIBUTE_UNUSED, + union un1h u6 ATTRIBUTE_UNUSED, + union un1h u7 ATTRIBUTE_UNUSED, + union un1h u8 ATTRIBUTE_UNUSED) +{ + check_m256_arguments; +} + +void +check_union_passing1hh(union un1hh u1 ATTRIBUTE_UNUSED, + union un1hh u2 ATTRIBUTE_UNUSED, + union un1hh u3 ATTRIBUTE_UNUSED, + union un1hh u4 ATTRIBUTE_UNUSED, + union un1hh u5 ATTRIBUTE_UNUSED, + union un1hh u6 ATTRIBUTE_UNUSED, + union un1hh u7 ATTRIBUTE_UNUSED, + union un1hh u8 ATTRIBUTE_UNUSED) +{ + check_m256_arguments; +} + +void +check_union_passing2(union un2 u1 ATTRIBUTE_UNUSED, + union un2 u2 ATTRIBUTE_UNUSED, + union un2 u3 ATTRIBUTE_UNUSED, 
+ union un2 u4 ATTRIBUTE_UNUSED, + union un2 u5 ATTRIBUTE_UNUSED, + union un2 u6 ATTRIBUTE_UNUSED, + union un2 u7 ATTRIBUTE_UNUSED, + union un2 u8 ATTRIBUTE_UNUSED) +{ + check_m256_arguments; +} + +void +check_union_passing2h(union un2h u1 ATTRIBUTE_UNUSED, + union un2h u2 ATTRIBUTE_UNUSED, + union un2h u3 ATTRIBUTE_UNUSED, + union un2h u4 ATTRIBUTE_UNUSED, + union un2h u5 ATTRIBUTE_UNUSED, + union un2h u6 ATTRIBUTE_UNUSED, + union un2h u7 ATTRIBUTE_UNUSED, + union un2h u8 ATTRIBUTE_UNUSED) +{ + check_m256_arguments; +} + +void +check_union_passing3(union un3 u1 ATTRIBUTE_UNUSED, + union un3 u2 ATTRIBUTE_UNUSED, + union un3 u3 ATTRIBUTE_UNUSED, + union un3 u4 ATTRIBUTE_UNUSED, + union un3 u5 ATTRIBUTE_UNUSED, + union un3 u6 ATTRIBUTE_UNUSED, + union un3 u7 ATTRIBUTE_UNUSED, + union un3 u8 ATTRIBUTE_UNUSED) +{ + check_m256_arguments; +} + +void +check_union_passing3h(union un3h u1 ATTRIBUTE_UNUSED, + union un3h u2 ATTRIBUTE_UNUSED, + union un3h u3 ATTRIBUTE_UNUSED, + union un3h u4 ATTRIBUTE_UNUSED, + union un3h u5 ATTRIBUTE_UNUSED, + union un3h u6 ATTRIBUTE_UNUSED, + union un3h u7 ATTRIBUTE_UNUSED, + union un3h u8 ATTRIBUTE_UNUSED) +{ + check_m256_arguments; +} + +void +check_union_passing4(union un4 u ATTRIBUTE_UNUSED) +{ + /* Check the passing on the stack by comparing the address of the + stack elements to the expected place on the stack. */ + assert ((unsigned long)&u.x == rsp+8); + assert ((unsigned long)&u.ld == rsp+8); +} + +void +check_union_passing4h(union un4h u ATTRIBUTE_UNUSED) +{ + /* Check the passing on the stack by comparing the address of the + stack elements to the expected place on the stack. */ + assert ((unsigned long)&u.x == rsp+8); + assert ((unsigned long)&u.ld == rsp+8); +} + +void +check_union_passing5(union un5 u ATTRIBUTE_UNUSED) +{ + /* Check the passing on the stack by comparing the address of the + stack elements to the expected place on the stack. 
*/ + assert ((unsigned long)&u.x == rsp+8); + assert ((unsigned long)&u.i == rsp+8); +} + +void +check_union_passing5h(union un5h u ATTRIBUTE_UNUSED) +{ + /* Check the passing on the stack by comparing the address of the + stack elements to the expected place on the stack. */ + assert ((unsigned long)&u.x == rsp+8); + assert ((unsigned long)&u.i == rsp+8); +} + +#define check_union_passing1 WRAP_CALL(check_union_passing1) +#define check_union_passing2 WRAP_CALL(check_union_passing2) +#define check_union_passing3 WRAP_CALL(check_union_passing3) +#define check_union_passing4 WRAP_CALL(check_union_passing4) +#define check_union_passing5 WRAP_CALL(check_union_passing5) + +#define check_union_passing1h WRAP_CALL(check_union_passing1h) +#define check_union_passing1a WRAP_CALL(check_union_passing1a) +#define check_union_passing1hh WRAP_CALL(check_union_passing1hh) +#define check_union_passing2h WRAP_CALL(check_union_passing2h) +#define check_union_passing3h WRAP_CALL(check_union_passing3h) +#define check_union_passing4h WRAP_CALL(check_union_passing4h) +#define check_union_passing5h WRAP_CALL(check_union_passing5h) + +static void +do_test (void) +{ + union un1 u1[8]; + union un2 u2[8]; + union un3 u3[8]; + union un4 u4; + union un5 u5; + union un1a u1a[8]; + union un1h u1h[8]; + union un1hh u1hh[8]; + union un2h u2h[8]; + union un3h u3h[8]; + union un4h u4h; + union un5h u5h; + int i; + + for (i = 0; i < 8; i++) + { + u1[i].x = (__m256){32+i, 0, i, 0, -i, 0, i - 12, i + 8}; + u1h[i].x = (__m256h){32+i, 0, i, 0, -i, 0, i - 12, i + 8, + 33+i, 1, i, 2, -i, 4, i - 11, i + 9}; + } + + clear_struct_registers; + for (i = 0; i < 8; i++) + (&fregs.ymm0)[i]._m256[0] = u1[i].x; + num_fregs = 8; + check_union_passing1(u1[0], u1[1], u1[2], u1[3], + u1[4], u1[5], u1[6], u1[7]); + + clear_struct_registers; + for (i = 0; i < 8; i++) + { + u1a[i].x = u1[i].x; + (&fregs.ymm0)[i]._m256[0] = u1a[i].x; + } + num_fregs = 8; + check_union_passing1a(u1a[0], u1a[1], u1a[2], u1a[3], + u1a[4], 
u1a[5], u1a[6], u1a[7]); + + clear_struct_registers; + for (i = 0; i < 8; i++) + (&fregs.ymm0)[i]._m256h[0] = u1h[i].x; + num_fregs = 8; + check_union_passing1h(u1h[0], u1h[1], u1h[2], u1h[3], + u1h[4], u1h[5], u1h[6], u1h[7]); + + clear_struct_registers; + for (i = 0; i < 8; i++) + { + u1hh[i].x = u1h[i].x; + (&fregs.ymm0)[i]._m256h[0] = u1hh[i].x; + } + num_fregs = 8; + check_union_passing1hh(u1hh[0], u1hh[1], u1hh[2], u1hh[3], + u1hh[4], u1hh[5], u1hh[6], u1hh[7]); + + clear_struct_registers; + for (i = 0; i < 8; i++) + { + u2[i].x = u1[i].x; + (&fregs.ymm0)[i]._m256[0] = u2[i].x; + } + num_fregs = 8; + check_union_passing2(u2[0], u2[1], u2[2], u2[3], + u2[4], u2[5], u2[6], u2[7]); + + clear_struct_registers; + for (i = 0; i < 8; i++) + { + u2h[i].x = u1h[i].x; + (&fregs.ymm0)[i]._m256h[0] = u2h[i].x; + } + num_fregs = 8; + check_union_passing2h(u2h[0], u2h[1], u2h[2], u2h[3], + u2h[4], u2h[5], u2h[6], u2h[7]); + + clear_struct_registers; + for (i = 0; i < 8; i++) + { + u3[i].x = u1[i].x; + (&fregs.ymm0)[i]._m256[0] = u3[i].x; + } + num_fregs = 8; + check_union_passing3(u3[0], u3[1], u3[2], u3[3], + u3[4], u3[5], u3[6], u3[7]); + + clear_struct_registers; + for (i = 0; i < 8; i++) + { + u3h[i].x = u1h[i].x; + (&fregs.ymm0)[i]._m256h[0] = u3h[i].x; + } + num_fregs = 8; + check_union_passing3h(u3h[0], u3h[1], u3h[2], u3h[3], + u3h[4], u3h[5], u3h[6], u3h[7]); + + check_union_passing4(u4); + check_union_passing5(u5); + + check_union_passing4h(u4h); + check_union_passing5h(u5h); +} diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_varargs-m256.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_varargs-m256.c new file mode 100644 index 00000000000..f15adb4a33b --- /dev/null +++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_varargs-m256.c @@ -0,0 +1,160 @@ +/* Test variable number of 256-bit vector arguments passed to functions. 
*/ + +#include +#include "avx512fp16-ymm-check.h" +#include "args.h" + +struct IntegerRegisters iregs; +struct FloatRegisters fregs; + +/* This struct holds values for argument checking. */ +struct +{ + YMM_T i0, i1, i2, i3, i4, i5, i6, i7, i8, i9; +} values; + +char *pass; +int failed = 0; + +#undef assert +#define assert(c) do { \ + if (!(c)) {failed++; printf ("failed %s\n", pass); } \ +} while (0) + +#define compare(X1,X2,T) do { \ + assert (memcmp (&X1, &X2, sizeof (T)) == 0); \ +} while (0) + +void +fun_check_passing_m256_varargs (__m256 i0, __m256 i1, __m256 i2, + __m256 i3, ...) +{ + /* Check argument values. */ + void **fp = __builtin_frame_address (0); + void *ra = __builtin_return_address (0); + __m256 *argp; + + compare (values.i0, i0, __m256); + compare (values.i1, i1, __m256); + compare (values.i2, i2, __m256); + compare (values.i3, i3, __m256); + + /* Get the pointer to the return address on stack. */ + while (*fp != ra) + fp++; + + /* Skip the return address stack slot. */ + argp = (__m256 *)(((char *) fp) + 8); + + /* Check __m256 arguments passed on stack. */ + compare (values.i4, argp[0], __m256); + compare (values.i5, argp[1], __m256); + compare (values.i6, argp[2], __m256); + compare (values.i7, argp[3], __m256); + compare (values.i8, argp[4], __m256); + compare (values.i9, argp[5], __m256); + + /* Check register contents. */ + compare (fregs.ymm0, ymm_regs[0], __m256); + compare (fregs.ymm1, ymm_regs[1], __m256); + compare (fregs.ymm2, ymm_regs[2], __m256); + compare (fregs.ymm3, ymm_regs[3], __m256); +} + +void +fun_check_passing_m256h_varargs (__m256h i0, __m256h i1, __m256h i2, + __m256h i3, ...) +{ + /* Check argument values. */ + void **fp = __builtin_frame_address (0); + void *ra = __builtin_return_address (0); + __m256h *argp; + + compare (values.i0, i0, __m256h); + compare (values.i1, i1, __m256h); + compare (values.i2, i2, __m256h); + compare (values.i3, i3, __m256h); + + /* Get the pointer to the return address on stack. 
*/ + while (*fp != ra) + fp++; + + /* Skip the return address stack slot. */ + argp = (__m256h *)(((char *) fp) + 8); + + /* Check __m256h arguments passed on stack. */ + compare (values.i4, argp[0], __m256h); + compare (values.i5, argp[1], __m256h); + compare (values.i6, argp[2], __m256h); + compare (values.i7, argp[3], __m256h); + compare (values.i8, argp[4], __m256h); + compare (values.i9, argp[5], __m256h); + + /* Check register contents. */ + compare (fregs.ymm0, ymm_regs[0], __m256h); + compare (fregs.ymm1, ymm_regs[1], __m256h); + compare (fregs.ymm2, ymm_regs[2], __m256h); + compare (fregs.ymm3, ymm_regs[3], __m256h); +} + +#define def_check_int_passing_varargs(_i0, _i1, _i2, _i3, _i4, _i5, \ + _i6, _i7, _i8, _i9, \ + _func, TYPE) \ + values.i0.TYPE[0] = _i0; \ + values.i1.TYPE[0] = _i1; \ + values.i2.TYPE[0] = _i2; \ + values.i3.TYPE[0] = _i3; \ + values.i4.TYPE[0] = _i4; \ + values.i5.TYPE[0] = _i5; \ + values.i6.TYPE[0] = _i6; \ + values.i7.TYPE[0] = _i7; \ + values.i8.TYPE[0] = _i8; \ + values.i9.TYPE[0] = _i9; \ + clear_struct_registers; \ + fregs.F0.TYPE[0] = _i0; \ + fregs.F1.TYPE[0] = _i1; \ + fregs.F2.TYPE[0] = _i2; \ + fregs.F3.TYPE[0] = _i3; \ + WRAP_CALL(_func) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _i8, _i9); + +void +test_m256_varargs (void) +{ + __m256 x[10]; + int i; + for (i = 0; i < 10; i++) + x[i] = (__m256){32+i, 0, 0, 0, 0, 0, 0, 0}; + pass = "m256-varargs"; + def_check_int_passing_varargs (x[0], x[1], x[2], x[3], x[4], x[5], + x[6], x[7], x[8], x[9], + fun_check_passing_m256_varargs, + _m256); +} + +void +test_m256h_varargs (void) +{ + __m256h x[10]; + int i; + for (i = 0; i < 10; i++) + x[i] = (__m256h) { + 1.1f16 + i, 2.2f16 + i, 3.3f16 + i, 4.4f16 + i, + 5.5f16 + i, 6.6f16 + i, 7.7f16 + i, 8.8f16 + i, + 9.9f16 + i, 10.10f16 + i, 11.11f16 + i, 12.12f16 + i, + 13.13f16 + i, 14.14f16 + i, 15.15f16 + i, 16.16f16 + i + }; + pass = "m256h-varargs"; + def_check_int_passing_varargs (x[0], x[1], x[2], x[3], x[4], x[5], + x[6], x[7], x[8], 
x[9], + fun_check_passing_m256h_varargs, + _m256h); +} + +void +do_test (void) +{ + test_m256_varargs (); + test_m256h_varargs (); + if (failed) + abort (); +} From patchwork Thu Jul 1 06:15:52 2021 X-Patchwork-Submitter: liuhongt X-Patchwork-Id: 1499316 To: gcc-patches@gcc.gnu.org Cc: jakub@redhat.com Subject: [PATCH 06/62] AVX512FP16: Add abi test for zmm Date: Thu, 1 Jul 2021 14:15:52 +0800 Message-Id: <20210701061648.9447-7-hongtao.liu@intel.com> In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com> References: <20210701061648.9447-1-hongtao.liu@intel.com> From: liuhongt gcc/testsuite/ChangeLog: *
gcc.target/x86_64/abi/avx512fp16/m512h/abi-avx512fp16-zmm.exp: New file. * gcc.target/x86_64/abi/avx512fp16/m512h/args.h: Likewise. * gcc.target/x86_64/abi/avx512fp16/m512h/asm-support.S: Likewise. * gcc.target/x86_64/abi/avx512fp16/m512h/avx512fp16-zmm-check.h: Likewise. * gcc.target/x86_64/abi/avx512fp16/m512h/test_m512_returning.c: Likewise. * gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_m512.c: Likewise. * gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_structs.c: Likewise. * gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_unions.c: Likewise. * gcc.target/x86_64/abi/avx512fp16/m512h/test_varargs-m512.c: Likewise. --- .../avx512fp16/m512h/abi-avx512fp16-zmm.exp | 48 ++ .../x86_64/abi/avx512fp16/m512h/args.h | 186 ++++++++ .../x86_64/abi/avx512fp16/m512h/asm-support.S | 97 ++++ .../avx512fp16/m512h/avx512fp16-zmm-check.h | 4 + .../avx512fp16/m512h/test_m512_returning.c | 62 +++ .../abi/avx512fp16/m512h/test_passing_m512.c | 380 ++++++++++++++++ .../avx512fp16/m512h/test_passing_structs.c | 123 ++++++ .../avx512fp16/m512h/test_passing_unions.c | 415 ++++++++++++++++++ .../abi/avx512fp16/m512h/test_varargs-m512.c | 164 +++++++ 9 files changed, 1479 insertions(+) create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/abi-avx512fp16-zmm.exp create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/args.h create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/asm-support.S create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/avx512fp16-zmm-check.h create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_m512_returning.c create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_m512.c create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_structs.c create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_unions.c create mode 100644 
gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_varargs-m512.c diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/abi-avx512fp16-zmm.exp b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/abi-avx512fp16-zmm.exp new file mode 100644 index 00000000000..33d24762788 --- /dev/null +++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/abi-avx512fp16-zmm.exp @@ -0,0 +1,48 @@ +# Copyright (C) 2019 Free Software Foundation, Inc. + +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with GCC; see the file COPYING3. If not see +# <http://www.gnu.org/licenses/>. + +# The x86-64 ABI testsuite needs one additional assembler file for most +# testcases. For simplicity we will just link it into each test.
+ +load_lib c-torture.exp +load_lib target-supports.exp +load_lib torture-options.exp +load_lib clearcap.exp +load_lib file-format.exp + +if { (![istarget x86_64-*-*] && ![istarget i?86-*-*]) + || [is-effective-target ia32] + || [gcc_target_object_format] != "elf" + || ![is-effective-target avx512fp16] } then { + return +} + + +torture-init +clearcap-init +set-torture-options $C_TORTURE_OPTIONS +set additional_flags "-W -Wall -Wno-abi -mavx512fp16" + +foreach src [lsort [glob -nocomplain $srcdir/$subdir/test_*.c]] { + if {[runtest_file_p $runtests $src]} { + c-torture-execute [list $src \ + $srcdir/$subdir/asm-support.S] \ + $additional_flags + } +} + +clearcap-finish +torture-finish diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/args.h b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/args.h new file mode 100644 index 00000000000..ec89fae4597 --- /dev/null +++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/args.h @@ -0,0 +1,186 @@ +#ifndef INCLUDED_ARGS_H +#define INCLUDED_ARGS_H + +#include +#include + +/* Assertion macro. */ +#define assert(test) if (!(test)) abort() + +#ifdef __GNUC__ +#define ATTRIBUTE_UNUSED __attribute__((__unused__)) +#else +#define ATTRIBUTE_UNUSED +#endif + +/* This defines the calling sequences for integers and floats. 
*/ +#define I0 rdi +#define I1 rsi +#define I2 rdx +#define I3 rcx +#define I4 r8 +#define I5 r9 +#define F0 zmm0 +#define F1 zmm1 +#define F2 zmm2 +#define F3 zmm3 +#define F4 zmm4 +#define F5 zmm5 +#define F6 zmm6 +#define F7 zmm7 + +typedef union { + _Float16 __Float16[32]; + float _float[16]; + double _double[8]; + long _long[8]; + int _int[16]; + unsigned long _ulong[8]; + __m64 _m64[8]; + __m128 _m128[4]; + __m256 _m256[2]; + __m512 _m512[1]; + __m512h _m512h[1]; +} ZMM_T; + +typedef union { + float _float; + double _double; + long double _ldouble; + unsigned long _ulong[2]; +} X87_T; +extern void (*callthis)(void); +extern unsigned long rax,rbx,rcx,rdx,rsi,rdi,rsp,rbp,r8,r9,r10,r11,r12,r13,r14,r15; +ZMM_T zmm_regs[32]; +X87_T x87_regs[8]; +extern volatile unsigned long volatile_var; +extern void snapshot (void); +extern void snapshot_ret (void); +#define WRAP_CALL(N) \ + (callthis = (void (*)()) (N), (typeof (&N)) snapshot) +#define WRAP_RET(N) \ + (callthis = (void (*)()) (N), (typeof (&N)) snapshot_ret) + +/* Clear all integer registers. */ +#define clear_int_hardware_registers \ + asm __volatile__ ("xor %%rax, %%rax\n\t" \ + "xor %%rbx, %%rbx\n\t" \ + "xor %%rcx, %%rcx\n\t" \ + "xor %%rdx, %%rdx\n\t" \ + "xor %%rsi, %%rsi\n\t" \ + "xor %%rdi, %%rdi\n\t" \ + "xor %%r8, %%r8\n\t" \ + "xor %%r9, %%r9\n\t" \ + "xor %%r10, %%r10\n\t" \ + "xor %%r11, %%r11\n\t" \ + "xor %%r12, %%r12\n\t" \ + "xor %%r13, %%r13\n\t" \ + "xor %%r14, %%r14\n\t" \ + "xor %%r15, %%r15\n\t" \ + ::: "rax", "rbx", "rcx", "rdx", "rsi", "rdi", "r8", \ + "r9", "r10", "r11", "r12", "r13", "r14", "r15"); + +/* This is the list of registers available for passing arguments. Not all of + these are used or even really available. 
*/ +struct IntegerRegisters +{ + unsigned long rax, rbx, rcx, rdx, rsi, rdi, r8, r9, r10, r11, r12, r13, r14, r15; +}; +struct FloatRegisters +{ + double mm0, mm1, mm2, mm3, mm4, mm5, mm6, mm7; + long double st0, st1, st2, st3, st4, st5, st6, st7; + ZMM_T zmm0, zmm1, zmm2, zmm3, zmm4, zmm5, zmm6, zmm7, zmm8, zmm9, + zmm10, zmm11, zmm12, zmm13, zmm14, zmm15, zmm16, zmm17, zmm18, + zmm19, zmm20, zmm21, zmm22, zmm23, zmm24, zmm25, zmm26, zmm27, + zmm28, zmm29, zmm30, zmm31; +}; + +/* Implemented in scalarargs.c */ +extern struct IntegerRegisters iregs; +extern struct FloatRegisters fregs; +extern unsigned int num_iregs, num_fregs; + +#define check_int_arguments do { \ + assert (num_iregs <= 0 || iregs.I0 == I0); \ + assert (num_iregs <= 1 || iregs.I1 == I1); \ + assert (num_iregs <= 2 || iregs.I2 == I2); \ + assert (num_iregs <= 3 || iregs.I3 == I3); \ + assert (num_iregs <= 4 || iregs.I4 == I4); \ + assert (num_iregs <= 5 || iregs.I5 == I5); \ + } while (0) + +#define check_char_arguments check_int_arguments +#define check_short_arguments check_int_arguments +#define check_long_arguments check_int_arguments + +/* Clear register struct. */ +#define clear_struct_registers \ + rax = rbx = rcx = rdx = rdi = rsi = rbp = rsp \ + = r8 = r9 = r10 = r11 = r12 = r13 = r14 = r15 = 0; \ + memset (&iregs, 0, sizeof (iregs)); \ + memset (&fregs, 0, sizeof (fregs)); \ + memset (zmm_regs, 0, sizeof (zmm_regs)); \ + memset (x87_regs, 0, sizeof (x87_regs)); + +/* Clear both hardware and register structs for integers. */ +#define clear_int_registers \ + clear_struct_registers \ + clear_int_hardware_registers + +/* TODO: Do the checking. 
*/ +#define check_f_arguments(T) do { \ + assert (num_fregs <= 0 || fregs.zmm0._ ## T [0] == zmm_regs[0]._ ## T [0]); \ + assert (num_fregs <= 1 || fregs.zmm1._ ## T [0] == zmm_regs[1]._ ## T [0]); \ + assert (num_fregs <= 2 || fregs.zmm2._ ## T [0] == zmm_regs[2]._ ## T [0]); \ + assert (num_fregs <= 3 || fregs.zmm3._ ## T [0] == zmm_regs[3]._ ## T [0]); \ + assert (num_fregs <= 4 || fregs.zmm4._ ## T [0] == zmm_regs[4]._ ## T [0]); \ + assert (num_fregs <= 5 || fregs.zmm5._ ## T [0] == zmm_regs[5]._ ## T [0]); \ + assert (num_fregs <= 6 || fregs.zmm6._ ## T [0] == zmm_regs[6]._ ## T [0]); \ + assert (num_fregs <= 7 || fregs.zmm7._ ## T [0] == zmm_regs[7]._ ## T [0]); \ + } while (0) + +#define check_float_arguments check_f_arguments(float) +#define check_double_arguments check_f_arguments(double) + +#define check_vector_arguments(T,O) do { \ + assert (num_fregs <= 0 \ + || memcmp (((char *) &fregs.zmm0) + (O), \ + &zmm_regs[0], \ + sizeof (__ ## T) - (O)) == 0); \ + assert (num_fregs <= 1 \ + || memcmp (((char *) &fregs.zmm1) + (O), \ + &zmm_regs[1], \ + sizeof (__ ## T) - (O)) == 0); \ + assert (num_fregs <= 2 \ + || memcmp (((char *) &fregs.zmm2) + (O), \ + &zmm_regs[2], \ + sizeof (__ ## T) - (O)) == 0); \ + assert (num_fregs <= 3 \ + || memcmp (((char *) &fregs.zmm3) + (O), \ + &zmm_regs[3], \ + sizeof (__ ## T) - (O)) == 0); \ + assert (num_fregs <= 4 \ + || memcmp (((char *) &fregs.zmm4) + (O), \ + &zmm_regs[4], \ + sizeof (__ ## T) - (O)) == 0); \ + assert (num_fregs <= 5 \ + || memcmp (((char *) &fregs.zmm5) + (O), \ + &zmm_regs[5], \ + sizeof (__ ## T) - (O)) == 0); \ + assert (num_fregs <= 6 \ + || memcmp (((char *) &fregs.zmm6) + (O), \ + &zmm_regs[6], \ + sizeof (__ ## T) - (O)) == 0); \ + assert (num_fregs <= 7 \ + || memcmp (((char *) &fregs.zmm7) + (O), \ + &zmm_regs[7], \ + sizeof (__ ## T) - (O)) == 0); \ + } while (0) + +#define check_m64_arguments check_vector_arguments(m64, 0) +#define check_m128_arguments check_vector_arguments(m128, 0) 
+#define check_m256_arguments check_vector_arguments(m256, 0) +#define check_m512_arguments check_vector_arguments(m512, 0) + +#endif /* INCLUDED_ARGS_H */ diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/asm-support.S b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/asm-support.S new file mode 100644 index 00000000000..0ef82876dd9 --- /dev/null +++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/asm-support.S @@ -0,0 +1,97 @@ + .text + .p2align 4,,15 +.globl snapshot + .type snapshot, @function +snapshot: +.LFB3: + movq %rax, rax(%rip) + movq %rbx, rbx(%rip) + movq %rcx, rcx(%rip) + movq %rdx, rdx(%rip) + movq %rdi, rdi(%rip) + movq %rsi, rsi(%rip) + movq %rbp, rbp(%rip) + movq %rsp, rsp(%rip) + movq %r8, r8(%rip) + movq %r9, r9(%rip) + movq %r10, r10(%rip) + movq %r11, r11(%rip) + movq %r12, r12(%rip) + movq %r13, r13(%rip) + movq %r14, r14(%rip) + movq %r15, r15(%rip) + vmovdqu32 %zmm0, zmm_regs+0(%rip) + vmovdqu32 %zmm1, zmm_regs+64(%rip) + vmovdqu32 %zmm2, zmm_regs+128(%rip) + vmovdqu32 %zmm3, zmm_regs+192(%rip) + vmovdqu32 %zmm4, zmm_regs+256(%rip) + vmovdqu32 %zmm5, zmm_regs+320(%rip) + vmovdqu32 %zmm6, zmm_regs+384(%rip) + vmovdqu32 %zmm7, zmm_regs+448(%rip) + vmovdqu32 %zmm8, zmm_regs+512(%rip) + vmovdqu32 %zmm9, zmm_regs+576(%rip) + vmovdqu32 %zmm10, zmm_regs+640(%rip) + vmovdqu32 %zmm11, zmm_regs+704(%rip) + vmovdqu32 %zmm12, zmm_regs+768(%rip) + vmovdqu32 %zmm13, zmm_regs+832(%rip) + vmovdqu32 %zmm14, zmm_regs+896(%rip) + vmovdqu32 %zmm15, zmm_regs+960(%rip) + vmovdqu32 %zmm16, zmm_regs+1024(%rip) + vmovdqu32 %zmm17, zmm_regs+1088(%rip) + vmovdqu32 %zmm18, zmm_regs+1152(%rip) + vmovdqu32 %zmm19, zmm_regs+1216(%rip) + vmovdqu32 %zmm20, zmm_regs+1280(%rip) + vmovdqu32 %zmm21, zmm_regs+1344(%rip) + vmovdqu32 %zmm22, zmm_regs+1408(%rip) + vmovdqu32 %zmm23, zmm_regs+1472(%rip) + vmovdqu32 %zmm24, zmm_regs+1536(%rip) + vmovdqu32 %zmm25, zmm_regs+1600(%rip) + vmovdqu32 %zmm26, zmm_regs+1664(%rip) + vmovdqu32 %zmm27, 
zmm_regs+1728(%rip) + vmovdqu32 %zmm28, zmm_regs+1792(%rip) + vmovdqu32 %zmm29, zmm_regs+1856(%rip) + vmovdqu32 %zmm30, zmm_regs+1920(%rip) + vmovdqu32 %zmm31, zmm_regs+1984(%rip) + jmp *callthis(%rip) +.LFE3: + .size snapshot, .-snapshot + + .p2align 4,,15 +.globl snapshot_ret + .type snapshot_ret, @function +snapshot_ret: + movq %rdi, rdi(%rip) + subq $8, %rsp + call *callthis(%rip) + addq $8, %rsp + movq %rax, rax(%rip) + movq %rdx, rdx(%rip) + vmovdqu32 %zmm0, zmm_regs+0(%rip) + vmovdqu32 %zmm1, zmm_regs+64(%rip) + fstpt x87_regs(%rip) + fstpt x87_regs+16(%rip) + fldt x87_regs+16(%rip) + fldt x87_regs(%rip) + ret + .size snapshot_ret, .-snapshot_ret + + .comm callthis,8,8 + .comm rax,8,8 + .comm rbx,8,8 + .comm rcx,8,8 + .comm rdx,8,8 + .comm rsi,8,8 + .comm rdi,8,8 + .comm rsp,8,8 + .comm rbp,8,8 + .comm r8,8,8 + .comm r9,8,8 + .comm r10,8,8 + .comm r11,8,8 + .comm r12,8,8 + .comm r13,8,8 + .comm r14,8,8 + .comm r15,8,8 + .comm zmm_regs,2048,64 + .comm x87_regs,128,32 + .comm volatile_var,8,8 diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/avx512fp16-zmm-check.h b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/avx512fp16-zmm-check.h new file mode 100644 index 00000000000..4b882cc11fc --- /dev/null +++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/avx512fp16-zmm-check.h @@ -0,0 +1,4 @@ +#define AVX512VL(ebx) 1 +#define XSTATE_MASK (XSTATE_SSE | XSTATE_YMM | XSTATE_ZMM \ + | XSTATE_HI_ZMM | XSTATE_OPMASK) +#include "../avx512fp16-check.h" diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_m512_returning.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_m512_returning.c new file mode 100644 index 00000000000..5cb59436cfd --- /dev/null +++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_m512_returning.c @@ -0,0 +1,62 @@ +#include +#include "avx512fp16-zmm-check.h" +#include "args.h" + +struct IntegerRegisters iregs; +struct FloatRegisters fregs; +unsigned int num_iregs, num_fregs; + 
+__m512 +fun_test_returning___m512 (void) +{ + volatile_var++; + return (__m512){73,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0}; +} + +__m512h +fun_test_returning___m512h (void) +{ + volatile_var++; + return (__m512h){ 1.1f16, 2.2f16, 3.3f16, 4.4f16, + 5.5f16, 6.6f16, 7.7f16, 8.8f16, + 9.9f16, 10.10f16, 11.11f16, 12.12f16, + 13.13f16, 14.14f16, 15.15f16, 16.16f16, + 17.17f16, 18.18f16, 19.19f16, 20.20f16, + 21.21f16, 22.22f16, 23.23f16, 24.24f16, + 25.25f16, 26.26f16, 27.27f16, 28.28f16, + 29.29f16, 30.30f16, 31.31f16, 32.32f16}; +} + +__m512 test_512; +__m512h test_512h; + +static void +do_test (void) +{ + unsigned failed = 0; + ZMM_T zmmt1, zmmt2; + + clear_struct_registers; + test_512 = (__m512){73,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0}; + zmmt1._m512[0] = test_512; + zmmt2._m512[0] = WRAP_RET (fun_test_returning___m512)(); + if (memcmp (&zmmt1, &zmmt2, sizeof (zmmt2)) != 0) + printf ("fail m512\n"), failed++; + + clear_struct_registers; + test_512h = (__m512h){ 1.1f16, 2.2f16, 3.3f16, 4.4f16, + 5.5f16, 6.6f16, 7.7f16, 8.8f16, + 9.9f16, 10.10f16, 11.11f16, 12.12f16, + 13.13f16, 14.14f16, 15.15f16, 16.16f16, + 17.17f16, 18.18f16, 19.19f16, 20.20f16, + 21.21f16, 22.22f16, 23.23f16, 24.24f16, + 25.25f16, 26.26f16, 27.27f16, 28.28f16, + 29.29f16, 30.30f16, 31.31f16, 32.32f16}; + zmmt1._m512h[0] = test_512h; + zmmt2._m512h[0] = WRAP_RET (fun_test_returning___m512h)(); + if (memcmp (&zmmt1, &zmmt2, sizeof (zmmt2)) != 0) + printf ("fail m512h\n"), failed++; + + if (failed) + abort (); +} diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_m512.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_m512.c new file mode 100644 index 00000000000..ad5ba2e7f92 --- /dev/null +++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_m512.c @@ -0,0 +1,380 @@ +#include +#include "avx512fp16-zmm-check.h" +#include "args.h" + +struct IntegerRegisters iregs; +struct FloatRegisters fregs; +unsigned int num_iregs, num_fregs; + +/* This struct holds 
values for argument checking. */ +struct +{ + ZMM_T i0, i1, i2, i3, i4, i5, i6, i7, i8, i9, i10, i11, i12, i13, i14, i15, + i16, i17, i18, i19, i20, i21, i22, i23; +} values; + +char *pass; +int failed = 0; + +#undef assert +#define assert(c) do { \ + if (!(c)) {failed++; printf ("failed %s\n", pass); } \ +} while (0) + +#define compare(X1,X2,T) do { \ + assert (memcmp (&X1, &X2, sizeof (T)) == 0); \ +} while (0) + +void +fun_check_passing_m512_8_values (__m512 i0 ATTRIBUTE_UNUSED, + __m512 i1 ATTRIBUTE_UNUSED, + __m512 i2 ATTRIBUTE_UNUSED, + __m512 i3 ATTRIBUTE_UNUSED, + __m512 i4 ATTRIBUTE_UNUSED, + __m512 i5 ATTRIBUTE_UNUSED, + __m512 i6 ATTRIBUTE_UNUSED, + __m512 i7 ATTRIBUTE_UNUSED) +{ + /* Check argument values. */ + compare (values.i0, i0, __m512); + compare (values.i1, i1, __m512); + compare (values.i2, i2, __m512); + compare (values.i3, i3, __m512); + compare (values.i4, i4, __m512); + compare (values.i5, i5, __m512); + compare (values.i6, i6, __m512); + compare (values.i7, i7, __m512); +} + +void +fun_check_passing_m512h_8_values (__m512h i0 ATTRIBUTE_UNUSED, + __m512h i1 ATTRIBUTE_UNUSED, + __m512h i2 ATTRIBUTE_UNUSED, + __m512h i3 ATTRIBUTE_UNUSED, + __m512h i4 ATTRIBUTE_UNUSED, + __m512h i5 ATTRIBUTE_UNUSED, + __m512h i6 ATTRIBUTE_UNUSED, + __m512h i7 ATTRIBUTE_UNUSED) +{ + /* Check argument values. */ + compare (values.i0, i0, __m512h); + compare (values.i1, i1, __m512h); + compare (values.i2, i2, __m512h); + compare (values.i3, i3, __m512h); + compare (values.i4, i4, __m512h); + compare (values.i5, i5, __m512h); + compare (values.i6, i6, __m512h); + compare (values.i7, i7, __m512h); +} + +void +fun_check_passing_m512_8_regs (__m512 i0 ATTRIBUTE_UNUSED, + __m512 i1 ATTRIBUTE_UNUSED, + __m512 i2 ATTRIBUTE_UNUSED, + __m512 i3 ATTRIBUTE_UNUSED, + __m512 i4 ATTRIBUTE_UNUSED, + __m512 i5 ATTRIBUTE_UNUSED, + __m512 i6 ATTRIBUTE_UNUSED, + __m512 i7 ATTRIBUTE_UNUSED) +{ + /* Check register contents.
*/ + check_m512_arguments; +} + +void +fun_check_passing_m512h_8_regs (__m512h i0 ATTRIBUTE_UNUSED, + __m512h i1 ATTRIBUTE_UNUSED, + __m512h i2 ATTRIBUTE_UNUSED, + __m512h i3 ATTRIBUTE_UNUSED, + __m512h i4 ATTRIBUTE_UNUSED, + __m512h i5 ATTRIBUTE_UNUSED, + __m512h i6 ATTRIBUTE_UNUSED, + __m512h i7 ATTRIBUTE_UNUSED) +{ + /* Check register contents. */ + check_m512_arguments; +} + +void +fun_check_passing_m512_20_values (__m512 i0 ATTRIBUTE_UNUSED, + __m512 i1 ATTRIBUTE_UNUSED, + __m512 i2 ATTRIBUTE_UNUSED, + __m512 i3 ATTRIBUTE_UNUSED, + __m512 i4 ATTRIBUTE_UNUSED, + __m512 i5 ATTRIBUTE_UNUSED, + __m512 i6 ATTRIBUTE_UNUSED, + __m512 i7 ATTRIBUTE_UNUSED, + __m512 i8 ATTRIBUTE_UNUSED, + __m512 i9 ATTRIBUTE_UNUSED, + __m512 i10 ATTRIBUTE_UNUSED, + __m512 i11 ATTRIBUTE_UNUSED, + __m512 i12 ATTRIBUTE_UNUSED, + __m512 i13 ATTRIBUTE_UNUSED, + __m512 i14 ATTRIBUTE_UNUSED, + __m512 i15 ATTRIBUTE_UNUSED, + __m512 i16 ATTRIBUTE_UNUSED, + __m512 i17 ATTRIBUTE_UNUSED, + __m512 i18 ATTRIBUTE_UNUSED, + __m512 i19 ATTRIBUTE_UNUSED) +{ + /* Check argument values. 
*/ + compare (values.i0, i0, __m512); + compare (values.i1, i1, __m512); + compare (values.i2, i2, __m512); + compare (values.i3, i3, __m512); + compare (values.i4, i4, __m512); + compare (values.i5, i5, __m512); + compare (values.i6, i6, __m512); + compare (values.i7, i7, __m512); + compare (values.i8, i8, __m512); + compare (values.i9, i9, __m512); + compare (values.i10, i10, __m512); + compare (values.i11, i11, __m512); + compare (values.i12, i12, __m512); + compare (values.i13, i13, __m512); + compare (values.i14, i14, __m512); + compare (values.i15, i15, __m512); + compare (values.i16, i16, __m512); + compare (values.i17, i17, __m512); + compare (values.i18, i18, __m512); + compare (values.i19, i19, __m512); +} + +void +fun_check_passing_m512h_20_values (__m512h i0 ATTRIBUTE_UNUSED, + __m512h i1 ATTRIBUTE_UNUSED, + __m512h i2 ATTRIBUTE_UNUSED, + __m512h i3 ATTRIBUTE_UNUSED, + __m512h i4 ATTRIBUTE_UNUSED, + __m512h i5 ATTRIBUTE_UNUSED, + __m512h i6 ATTRIBUTE_UNUSED, + __m512h i7 ATTRIBUTE_UNUSED, + __m512h i8 ATTRIBUTE_UNUSED, + __m512h i9 ATTRIBUTE_UNUSED, + __m512h i10 ATTRIBUTE_UNUSED, + __m512h i11 ATTRIBUTE_UNUSED, + __m512h i12 ATTRIBUTE_UNUSED, + __m512h i13 ATTRIBUTE_UNUSED, + __m512h i14 ATTRIBUTE_UNUSED, + __m512h i15 ATTRIBUTE_UNUSED, + __m512h i16 ATTRIBUTE_UNUSED, + __m512h i17 ATTRIBUTE_UNUSED, + __m512h i18 ATTRIBUTE_UNUSED, + __m512h i19 ATTRIBUTE_UNUSED) +{ + /* Check argument values. 
*/ + compare (values.i0, i0, __m512h); + compare (values.i1, i1, __m512h); + compare (values.i2, i2, __m512h); + compare (values.i3, i3, __m512h); + compare (values.i4, i4, __m512h); + compare (values.i5, i5, __m512h); + compare (values.i6, i6, __m512h); + compare (values.i7, i7, __m512h); + compare (values.i8, i8, __m512h); + compare (values.i9, i9, __m512h); + compare (values.i10, i10, __m512h); + compare (values.i11, i11, __m512h); + compare (values.i12, i12, __m512h); + compare (values.i13, i13, __m512h); + compare (values.i14, i14, __m512h); + compare (values.i15, i15, __m512h); + compare (values.i16, i16, __m512h); + compare (values.i17, i17, __m512h); + compare (values.i18, i18, __m512h); + compare (values.i19, i19, __m512h); +} + +void +fun_check_passing_m512_20_regs (__m512 i0 ATTRIBUTE_UNUSED, + __m512 i1 ATTRIBUTE_UNUSED, + __m512 i2 ATTRIBUTE_UNUSED, + __m512 i3 ATTRIBUTE_UNUSED, + __m512 i4 ATTRIBUTE_UNUSED, + __m512 i5 ATTRIBUTE_UNUSED, + __m512 i6 ATTRIBUTE_UNUSED, + __m512 i7 ATTRIBUTE_UNUSED, + __m512 i8 ATTRIBUTE_UNUSED, + __m512 i9 ATTRIBUTE_UNUSED, + __m512 i10 ATTRIBUTE_UNUSED, + __m512 i11 ATTRIBUTE_UNUSED, + __m512 i12 ATTRIBUTE_UNUSED, + __m512 i13 ATTRIBUTE_UNUSED, + __m512 i14 ATTRIBUTE_UNUSED, + __m512 i15 ATTRIBUTE_UNUSED, + __m512 i16 ATTRIBUTE_UNUSED, + __m512 i17 ATTRIBUTE_UNUSED, + __m512 i18 ATTRIBUTE_UNUSED, + __m512 i19 ATTRIBUTE_UNUSED) +{ + /* Check register contents. 
*/ + check_m512_arguments; +} + +void +fun_check_passing_m512h_20_regs (__m512h i0 ATTRIBUTE_UNUSED, + __m512h i1 ATTRIBUTE_UNUSED, + __m512h i2 ATTRIBUTE_UNUSED, + __m512h i3 ATTRIBUTE_UNUSED, + __m512h i4 ATTRIBUTE_UNUSED, + __m512h i5 ATTRIBUTE_UNUSED, + __m512h i6 ATTRIBUTE_UNUSED, + __m512h i7 ATTRIBUTE_UNUSED, + __m512h i8 ATTRIBUTE_UNUSED, + __m512h i9 ATTRIBUTE_UNUSED, + __m512h i10 ATTRIBUTE_UNUSED, + __m512h i11 ATTRIBUTE_UNUSED, + __m512h i12 ATTRIBUTE_UNUSED, + __m512h i13 ATTRIBUTE_UNUSED, + __m512h i14 ATTRIBUTE_UNUSED, + __m512h i15 ATTRIBUTE_UNUSED, + __m512h i16 ATTRIBUTE_UNUSED, + __m512h i17 ATTRIBUTE_UNUSED, + __m512h i18 ATTRIBUTE_UNUSED, + __m512h i19 ATTRIBUTE_UNUSED) +{ + /* Check register contents. */ + check_m512_arguments; +} + +#define def_check_passing8(_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _func1, _func2, TYPE) \ + values.i0.TYPE[0] = _i0; \ + values.i1.TYPE[0] = _i1; \ + values.i2.TYPE[0] = _i2; \ + values.i3.TYPE[0] = _i3; \ + values.i4.TYPE[0] = _i4; \ + values.i5.TYPE[0] = _i5; \ + values.i6.TYPE[0] = _i6; \ + values.i7.TYPE[0] = _i7; \ + WRAP_CALL(_func1) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7); \ + \ + clear_struct_registers; \ + fregs.F0.TYPE[0] = _i0; \ + fregs.F1.TYPE[0] = _i1; \ + fregs.F2.TYPE[0] = _i2; \ + fregs.F3.TYPE[0] = _i3; \ + fregs.F4.TYPE[0] = _i4; \ + fregs.F5.TYPE[0] = _i5; \ + fregs.F6.TYPE[0] = _i6; \ + fregs.F7.TYPE[0] = _i7; \ + num_fregs = 8; \ + WRAP_CALL(_func2) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7); + +#define def_check_passing20(_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _i8, _i9, \ + _i10, _i11, _i12, _i13, _i14, _i15, _i16, _i17, \ + _i18, _i19, _func1, _func2, TYPE) \ + values.i0.TYPE[0] = _i0; \ + values.i1.TYPE[0] = _i1; \ + values.i2.TYPE[0] = _i2; \ + values.i3.TYPE[0] = _i3; \ + values.i4.TYPE[0] = _i4; \ + values.i5.TYPE[0] = _i5; \ + values.i6.TYPE[0] = _i6; \ + values.i7.TYPE[0] = _i7; \ + values.i8.TYPE[0] = _i8; \ + values.i9.TYPE[0] = _i9; \ + values.i10.TYPE[0] = _i10; \ + 
values.i11.TYPE[0] = _i11; \ + values.i12.TYPE[0] = _i12; \ + values.i13.TYPE[0] = _i13; \ + values.i14.TYPE[0] = _i14; \ + values.i15.TYPE[0] = _i15; \ + values.i16.TYPE[0] = _i16; \ + values.i17.TYPE[0] = _i17; \ + values.i18.TYPE[0] = _i18; \ + values.i19.TYPE[0] = _i19; \ + WRAP_CALL(_func1) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _i8, _i9, \ + _i10, _i11, _i12, _i13, _i14, _i15, _i16, _i17, \ + _i18, _i19); \ + \ + clear_struct_registers; \ + fregs.F0.TYPE[0] = _i0; \ + fregs.F1.TYPE[0] = _i1; \ + fregs.F2.TYPE[0] = _i2; \ + fregs.F3.TYPE[0] = _i3; \ + fregs.F4.TYPE[0] = _i4; \ + fregs.F5.TYPE[0] = _i5; \ + fregs.F6.TYPE[0] = _i6; \ + fregs.F7.TYPE[0] = _i7; \ + num_fregs = 8; \ + WRAP_CALL(_func2) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _i8, _i9, \ + _i10, _i11, _i12, _i13, _i14, _i15, _i16, _i17, \ + _i18, _i19); + +void +test_m512_on_stack () +{ + __m512 x[8]; + int i; + for (i = 0; i < 8; i++) + x[i] = (__m512){32 + i, 0, 0, 0, 0, 0, 0, 0}; + pass = "m512-8"; + def_check_passing8 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7], + fun_check_passing_m512_8_values, + fun_check_passing_m512_8_regs, _m512); +} + +void +test_m512h_on_stack () +{ + __m512h x[8]; + int i; + for (i = 0; i < 8; i++) + x[i] = (__m512h){1.1f16 + i, 2.2f16 + i, 3.3f16 + i, 4.4f16 + i, + 5.5f16 + i, 6.6f16 + i, 7.7f16 + i, 8.8f16 + i, + 9.9f16 + i, 10.10f16 + i, 11.11f16 + i, 12.12f16 + i, + 13.13f16 + i, 14.14f16 + i, 15.15f16 + i, 16.16f16 + i, + 17.17f16 + i, 18.18f16 + i, 19.19f16 + i, 20.20f16 + i, + 21.21f16 + i, 22.22f16 + i, 23.23f16 + i, 24.24f16 + i, + 25.25f16 + i, 26.26f16 + i, 27.27f16 + i, 28.28f16 + i, + 29.29f16 + i, 30.30f16 + i, 31.31f16 + i, 32.32f16 + i}; + + pass = "m512h-8"; + def_check_passing8 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7], + fun_check_passing_m512h_8_values, + fun_check_passing_m512h_8_regs, _m512h); +} + +void +test_too_many_m512 () +{ + __m512 x[20]; + int i; + for (i = 0; i < 20; i++) + x[i] = (__m512){32 + i, 0, 0, 0, 0, 0, 0, 0}; + pass 
= "m512-20"; + def_check_passing20 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7], x[8], + x[9], x[10], x[11], x[12], x[13], x[14], x[15], x[16], + x[17], x[18], x[19], fun_check_passing_m512_20_values, + fun_check_passing_m512_20_regs, _m512); +} + +void +test_too_many_m512h () +{ + __m512h x[20]; + int i; + for (i = 0; i < 20; i++) + x[i] = (__m512h){ 1.1f16 + i, 2.2f16 + i, 3.3f16 + i, 4.4f16 + i, + 5.5f16 + i, 6.6f16 + i, 7.7f16 + i, 8.8f16 + i, + 9.9f16 + i, 10.10f16 + i, 11.11f16 + i, 12.12f16 + i, + 13.13f16 + i, 14.14f16 + i, 15.15f16 + i, 16.16f16 + i, + 17.17f16 + i, 18.18f16 + i, 19.19f16 + i, 20.20f16 + i, + 21.21f16 + i, 22.22f16 + i, 23.23f16 + i, 24.24f16 + i, + 25.25f16 + i, 26.26f16 + i, 27.27f16 + i, 28.28f16 + i, + 29.29f16 + i, 30.30f16 + i, 31.31f16 + i, 32.32f16 + i}; + pass = "m512h-20"; + def_check_passing20 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7], x[8], + x[9], x[10], x[11], x[12], x[13], x[14], x[15], x[16], + x[17], x[18], x[19], fun_check_passing_m512h_20_values, + fun_check_passing_m512h_20_regs, _m512h); +} + +static void +do_test (void) +{ + test_m512_on_stack (); + test_too_many_m512 (); + test_m512h_on_stack (); + test_too_many_m512h (); + if (failed) + abort (); +} diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_structs.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_structs.c new file mode 100644 index 00000000000..734e0f8e9e9 --- /dev/null +++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_structs.c @@ -0,0 +1,123 @@ +#include "avx512fp16-zmm-check.h" +#include "args.h" + +struct IntegerRegisters iregs; +struct FloatRegisters fregs; +unsigned int num_iregs, num_fregs; + +struct m512_struct +{ + __m512 x; +}; + +struct m512h_struct +{ + __m512h x; +}; + +struct m512_2_struct +{ + __m512 x1, x2; +}; + +struct m512h_2_struct +{ + __m512h x1, x2; +}; + +/* Check that the struct is passed as the individual members in fregs. 
*/ +void +check_struct_passing1 (struct m512_struct ms1 ATTRIBUTE_UNUSED, + struct m512_struct ms2 ATTRIBUTE_UNUSED, + struct m512_struct ms3 ATTRIBUTE_UNUSED, + struct m512_struct ms4 ATTRIBUTE_UNUSED, + struct m512_struct ms5 ATTRIBUTE_UNUSED, + struct m512_struct ms6 ATTRIBUTE_UNUSED, + struct m512_struct ms7 ATTRIBUTE_UNUSED, + struct m512_struct ms8 ATTRIBUTE_UNUSED) +{ + /* Check register contents. */ + check_m512_arguments; +} + +void +check_struct_passing1h (struct m512h_struct ms1 ATTRIBUTE_UNUSED, + struct m512h_struct ms2 ATTRIBUTE_UNUSED, + struct m512h_struct ms3 ATTRIBUTE_UNUSED, + struct m512h_struct ms4 ATTRIBUTE_UNUSED, + struct m512h_struct ms5 ATTRIBUTE_UNUSED, + struct m512h_struct ms6 ATTRIBUTE_UNUSED, + struct m512h_struct ms7 ATTRIBUTE_UNUSED, + struct m512h_struct ms8 ATTRIBUTE_UNUSED) +{ + /* Check register contents. */ + check_m512_arguments; +} + +void +check_struct_passing2 (struct m512_2_struct ms ATTRIBUTE_UNUSED) +{ + /* Check the passing on the stack by comparing the address of the + stack elements to the expected place on the stack. */ + assert ((unsigned long)&ms.x1 == rsp+8); + assert ((unsigned long)&ms.x2 == rsp+72); +} + +void +check_struct_passing2h (struct m512h_2_struct ms ATTRIBUTE_UNUSED) +{ + /* Check the passing on the stack by comparing the address of the + stack elements to the expected place on the stack. 
*/ + assert ((unsigned long)&ms.x1 == rsp+8); + assert ((unsigned long)&ms.x2 == rsp+72); +} + +static void +do_test (void) +{ + struct m512_struct m512s [8]; + struct m512h_struct m512hs [8]; + struct m512_2_struct m512_2s = { + { 48.394, 39.3, -397.9, 3484.9, -8.394, -93.3, 7.9, 84.94, + 48.3941, 39.31, -397.91, 3484.91, -8.3941, -93.31, 7.91, 84.941 }, + { -8.394, -3.3, -39.9, 34.9, 7.9, 84.94, -48.394, 39.3, + -8.3942, -3.32, -39.92, 34.92, 7.92, 84.942, -48.3942, 39.32 } + }; + struct m512h_2_struct m512h_2s = { + { 58.395f16, 39.3f16, -397.9f16, 3585.9f16, -8.395f16, -93.3f16, 7.9f16, 85.95f16, + 58.395f16, 39.3f16, -397.9f16, 3585.9f16, -8.395f16, -93.3f16, 7.9f16, 85.95f16, + 58.395f16, 39.3f16, -397.9f16, 3585.9f16, -8.395f16, -93.3f16, 7.9f16, 85.95f16, + 58.3951f16, 39.31f16, -397.91f16, 3585.91f16, -8.3951f16, -93.31f16, 7.91f16, 85.951f16}, + { 67.396f16, 39.3f16, -397.9f16, 3676.9f16, -7.396f16, -93.3f16, 7.9f16, 76.96f16, + 67.396f16, 39.3f16, -397.9f16, 3676.9f16, -7.396f16, -93.3f16, 7.9f16, 76.96f16, + 67.396f16, 39.3f16, -397.9f16, 3676.9f16, -7.396f16, -93.3f16, 7.9f16, 76.96f16, + 67.3961f16, 39.31f16, -397.91f16, 3676.91f16, -7.3961f16, -93.31f16, 7.91f16, 76.961f16}, + }; + int i; + + for (i = 0; i < 8; i++) + { + m512s[i].x = (__m512){32+i, 0, i, 0, -i, 0, i - 12, i + 8, + 32+i, 0, i, 0, -i, 0, i - 12, i + 8}; + m512hs[i].x = (__m512h){33+i, 1, i, 2, -i, 0, i - 15, i + 9, + 34+i, 1, i, 2, -i, 0, i - 15, i + 9, + 35+i, 1, i, 2, -i, 0, i - 15, i + 9, + 36+i, 1, i, 2, -i, 0, i - 15, i + 9}; + } + + clear_struct_registers; + for (i = 0; i < 8; i++) + (&fregs.zmm0)[i]._m512[0] = m512s[i].x; + num_fregs = 8; + WRAP_CALL (check_struct_passing1)(m512s[0], m512s[1], m512s[2], m512s[3], + m512s[4], m512s[5], m512s[6], m512s[7]); + WRAP_CALL (check_struct_passing2)(m512_2s); + + clear_struct_registers; + for (i = 0; i < 8; i++) + (&fregs.zmm0)[i]._m512h[0] = m512hs[i].x; + num_fregs = 8; + WRAP_CALL (check_struct_passing1h)(m512hs[0], m512hs[1], 
m512hs[2], m512hs[3], + m512hs[4], m512hs[5], m512hs[6], m512hs[7]); + WRAP_CALL (check_struct_passing2h)(m512h_2s); +} diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_unions.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_unions.c new file mode 100644 index 00000000000..fa801fbf7ce --- /dev/null +++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_unions.c @@ -0,0 +1,415 @@ +#include "avx512fp16-zmm-check.h" +#include "args.h" + +struct IntegerRegisters iregs; +struct FloatRegisters fregs; +unsigned int num_iregs, num_fregs; + +union un1 +{ + __m512 x; + float f; +}; + +union un2 +{ + __m512 x; + double d; +}; + +union un3 +{ + __m512 x; + __m128 v; +}; + +union un4 +{ + __m512 x; + long double ld; +}; + +union un5 +{ + __m512 x; + int i; +}; + +union un6 +{ + __m512 x; + __m256 v; +}; + +union un1h +{ + __m512 x; + _Float16 f; +}; + +union un1hf +{ + __m512h x; + float f; +}; + +union un1hh +{ + __m512h x; + _Float16 f; +}; + +union un2h +{ + __m512h x; + double d; +}; + +union un3h +{ + __m512h x; + __m128 v; +}; + +union un4h +{ + __m512h x; + long double ld; +}; + +union un5h +{ + __m512h x; + int i; +}; + +union un6h +{ + __m512h x; + __m256 v; +}; + +void +check_union_passing1(union un1 u1 ATTRIBUTE_UNUSED, + union un1 u2 ATTRIBUTE_UNUSED, + union un1 u3 ATTRIBUTE_UNUSED, + union un1 u4 ATTRIBUTE_UNUSED, + union un1 u5 ATTRIBUTE_UNUSED, + union un1 u6 ATTRIBUTE_UNUSED, + union un1 u7 ATTRIBUTE_UNUSED, + union un1 u8 ATTRIBUTE_UNUSED) +{ + /* Check register contents. */ + check_m512_arguments; +} + +void +check_union_passing1h(union un1h u1 ATTRIBUTE_UNUSED, + union un1h u2 ATTRIBUTE_UNUSED, + union un1h u3 ATTRIBUTE_UNUSED, + union un1h u4 ATTRIBUTE_UNUSED, + union un1h u5 ATTRIBUTE_UNUSED, + union un1h u6 ATTRIBUTE_UNUSED, + union un1h u7 ATTRIBUTE_UNUSED, + union un1h u8 ATTRIBUTE_UNUSED) +{ + /* Check register contents. 
*/ + check_m512_arguments; +} + +void +check_union_passing1hf(union un1hf u1 ATTRIBUTE_UNUSED, + union un1hf u2 ATTRIBUTE_UNUSED, + union un1hf u3 ATTRIBUTE_UNUSED, + union un1hf u4 ATTRIBUTE_UNUSED, + union un1hf u5 ATTRIBUTE_UNUSED, + union un1hf u6 ATTRIBUTE_UNUSED, + union un1hf u7 ATTRIBUTE_UNUSED, + union un1hf u8 ATTRIBUTE_UNUSED) +{ + /* Check register contents. */ + check_m512_arguments; +} + +void +check_union_passing1hh(union un1hh u1 ATTRIBUTE_UNUSED, + union un1hh u2 ATTRIBUTE_UNUSED, + union un1hh u3 ATTRIBUTE_UNUSED, + union un1hh u4 ATTRIBUTE_UNUSED, + union un1hh u5 ATTRIBUTE_UNUSED, + union un1hh u6 ATTRIBUTE_UNUSED, + union un1hh u7 ATTRIBUTE_UNUSED, + union un1hh u8 ATTRIBUTE_UNUSED) +{ + /* Check register contents. */ + check_m512_arguments; +} + + +void +check_union_passing2(union un2 u1 ATTRIBUTE_UNUSED, + union un2 u2 ATTRIBUTE_UNUSED, + union un2 u3 ATTRIBUTE_UNUSED, + union un2 u4 ATTRIBUTE_UNUSED, + union un2 u5 ATTRIBUTE_UNUSED, + union un2 u6 ATTRIBUTE_UNUSED, + union un2 u7 ATTRIBUTE_UNUSED, + union un2 u8 ATTRIBUTE_UNUSED) +{ + /* Check register contents. */ + check_m512_arguments; +} + +void +check_union_passing2h(union un2h u1 ATTRIBUTE_UNUSED, + union un2h u2 ATTRIBUTE_UNUSED, + union un2h u3 ATTRIBUTE_UNUSED, + union un2h u4 ATTRIBUTE_UNUSED, + union un2h u5 ATTRIBUTE_UNUSED, + union un2h u6 ATTRIBUTE_UNUSED, + union un2h u7 ATTRIBUTE_UNUSED, + union un2h u8 ATTRIBUTE_UNUSED) +{ + /* Check register contents. */ + check_m512_arguments; +} + +void +check_union_passing3(union un3 u1 ATTRIBUTE_UNUSED, + union un3 u2 ATTRIBUTE_UNUSED, + union un3 u3 ATTRIBUTE_UNUSED, + union un3 u4 ATTRIBUTE_UNUSED, + union un3 u5 ATTRIBUTE_UNUSED, + union un3 u6 ATTRIBUTE_UNUSED, + union un3 u7 ATTRIBUTE_UNUSED, + union un3 u8 ATTRIBUTE_UNUSED) +{ + /* Check register contents. 
*/ + check_m512_arguments; +} + +void +check_union_passing3h(union un3h u1 ATTRIBUTE_UNUSED, + union un3h u2 ATTRIBUTE_UNUSED, + union un3h u3 ATTRIBUTE_UNUSED, + union un3h u4 ATTRIBUTE_UNUSED, + union un3h u5 ATTRIBUTE_UNUSED, + union un3h u6 ATTRIBUTE_UNUSED, + union un3h u7 ATTRIBUTE_UNUSED, + union un3h u8 ATTRIBUTE_UNUSED) +{ + /* Check register contents. */ + check_m512_arguments; +} + +void +check_union_passing4(union un4 u ATTRIBUTE_UNUSED) +{ + /* Check the passing on the stack by comparing the address of the + stack elements to the expected place on the stack. */ + assert ((unsigned long)&u.x == rsp+8); + assert ((unsigned long)&u.ld == rsp+8); +} + +void +check_union_passing4h(union un4h u ATTRIBUTE_UNUSED) +{ + /* Check the passing on the stack by comparing the address of the + stack elements to the expected place on the stack. */ + assert ((unsigned long)&u.x == rsp+8); + assert ((unsigned long)&u.ld == rsp+8); +} + +void +check_union_passing5(union un5 u ATTRIBUTE_UNUSED) +{ + /* Check the passing on the stack by comparing the address of the + stack elements to the expected place on the stack. */ + assert ((unsigned long)&u.x == rsp+8); + assert ((unsigned long)&u.i == rsp+8); +} + +void +check_union_passing5h(union un5h u ATTRIBUTE_UNUSED) +{ + /* Check the passing on the stack by comparing the address of the + stack elements to the expected place on the stack. */ + assert ((unsigned long)&u.x == rsp+8); + assert ((unsigned long)&u.i == rsp+8); +} + +void +check_union_passing6(union un6 u1 ATTRIBUTE_UNUSED, + union un6 u2 ATTRIBUTE_UNUSED, + union un6 u3 ATTRIBUTE_UNUSED, + union un6 u4 ATTRIBUTE_UNUSED, + union un6 u5 ATTRIBUTE_UNUSED, + union un6 u6 ATTRIBUTE_UNUSED, + union un6 u7 ATTRIBUTE_UNUSED, + union un6 u8 ATTRIBUTE_UNUSED) +{ + /* Check register contents. 
*/ + check_m512_arguments; +} + +void +check_union_passing6h(union un6h u1 ATTRIBUTE_UNUSED, + union un6h u2 ATTRIBUTE_UNUSED, + union un6h u3 ATTRIBUTE_UNUSED, + union un6h u4 ATTRIBUTE_UNUSED, + union un6h u5 ATTRIBUTE_UNUSED, + union un6h u6 ATTRIBUTE_UNUSED, + union un6h u7 ATTRIBUTE_UNUSED, + union un6h u8 ATTRIBUTE_UNUSED) +{ + /* Check register contents. */ + check_m512_arguments; +} + +#define check_union_passing1 WRAP_CALL(check_union_passing1) +#define check_union_passing2 WRAP_CALL(check_union_passing2) +#define check_union_passing3 WRAP_CALL(check_union_passing3) +#define check_union_passing4 WRAP_CALL(check_union_passing4) +#define check_union_passing5 WRAP_CALL(check_union_passing5) +#define check_union_passing6 WRAP_CALL(check_union_passing6) + +#define check_union_passing1h WRAP_CALL(check_union_passing1h) +#define check_union_passing1hf WRAP_CALL(check_union_passing1hf) +#define check_union_passing1hh WRAP_CALL(check_union_passing1hh) +#define check_union_passing2h WRAP_CALL(check_union_passing2h) +#define check_union_passing3h WRAP_CALL(check_union_passing3h) +#define check_union_passing4h WRAP_CALL(check_union_passing4h) +#define check_union_passing5h WRAP_CALL(check_union_passing5h) +#define check_union_passing6h WRAP_CALL(check_union_passing6h) + + +static void +do_test (void) +{ + union un1 u1[8]; + union un2 u2[8]; + union un3 u3[8]; + union un4 u4; + union un5 u5; + union un6 u6[8]; + union un1h u1h[8]; + union un1hf u1hf[8]; + union un1hh u1hh[8]; + union un2h u2h[8]; + union un3h u3h[8]; + union un4h u4h; + union un5h u5h; + union un6h u6h[8]; + int i; + + for (i = 0; i < 8; i++) + { + u1[i].x = (__m512){32+i, 0, i, 0, -i, 0, i - 12, i + 8, + 32+i, 0, i, 0, -i, 0, i - 12, i + 8}; + + u1hf[i].x = (__m512h){ 33+i, 1, i, 2, -i, 0, i - 15, i + 9, + 34+i, 1, i, 2, -i, 0, i - 15, i + 9, + 35+i, 1, i, 2, -i, 0, i - 15, i + 9, + 36+i, 1, i, 2, -i, 0, i - 15, i + 9}; + } + + clear_struct_registers; + for (i = 0; i < 8; i++) + 
(&fregs.zmm0)[i]._m512[0] = u1[i].x; + num_fregs = 8; + check_union_passing1(u1[0], u1[1], u1[2], u1[3], + u1[4], u1[5], u1[6], u1[7]); + + clear_struct_registers; + for (i = 0; i < 8; i++) + { + u1h[i].x = u1[i].x; + (&fregs.zmm0)[i]._m512[0] = u1h[i].x; + } + num_fregs = 8; + check_union_passing1h(u1h[0], u1h[1], u1h[2], u1h[3], + u1h[4], u1h[5], u1h[6], u1h[7]); + + clear_struct_registers; + for (i = 0; i < 8; i++) + (&fregs.zmm0)[i]._m512h[0] = u1hf[i].x; + num_fregs = 8; + check_union_passing1hf(u1hf[0], u1hf[1], u1hf[2], u1hf[3], + u1hf[4], u1hf[5], u1hf[6], u1hf[7]); + + clear_struct_registers; + for (i = 0; i < 8; i++) + { + u1hh[i].x = u1hf[i].x; + (&fregs.zmm0)[i]._m512h[0] = u1hh[i].x; + } + num_fregs = 8; + check_union_passing1hh(u1hh[0], u1hh[1], u1hh[2], u1hh[3], + u1hh[4], u1hh[5], u1hh[6], u1hh[7]); + + clear_struct_registers; + for (i = 0; i < 8; i++) + { + u2[i].x = u1[i].x; + (&fregs.zmm0)[i]._m512[0] = u2[i].x; + } + num_fregs = 8; + check_union_passing2(u2[0], u2[1], u2[2], u2[3], + u2[4], u2[5], u2[6], u2[7]); + + clear_struct_registers; + for (i = 0; i < 8; i++) + { + u2h[i].x = u1hf[i].x; + (&fregs.zmm0)[i]._m512h[0] = u2h[i].x; + } + num_fregs = 8; + check_union_passing2h(u2h[0], u2h[1], u2h[2], u2h[3], + u2h[4], u2h[5], u2h[6], u2h[7]); + + clear_struct_registers; + for (i = 0; i < 8; i++) + { + u3[i].x = u1[i].x; + (&fregs.zmm0)[i]._m512[0] = u3[i].x; + } + num_fregs = 8; + check_union_passing3(u3[0], u3[1], u3[2], u3[3], + u3[4], u3[5], u3[6], u3[7]); + + clear_struct_registers; + for (i = 0; i < 8; i++) + { + u3h[i].x = u1hf[i].x; + (&fregs.zmm0)[i]._m512h[0] = u3h[i].x; + } + num_fregs = 8; + check_union_passing3h(u3h[0], u3h[1], u3h[2], u3h[3], + u3h[4], u3h[5], u3h[6], u3h[7]); + + check_union_passing4(u4); + check_union_passing5(u5); + + check_union_passing4h(u4h); + check_union_passing5h(u5h); + + clear_struct_registers; + for (i = 0; i < 8; i++) + { + u6[i].x = u1[i].x; + (&fregs.zmm0)[i]._m512[0] = u6[i].x; + } + num_fregs = 8; + 
check_union_passing6(u6[0], u6[1], u6[2], u6[3], + u6[4], u6[5], u6[6], u6[7]); + + clear_struct_registers; + for (i = 0; i < 8; i++) + { + u6h[i].x = u1hf[i].x; + (&fregs.zmm0)[i]._m512h[0] = u6h[i].x; + } + num_fregs = 8; + check_union_passing6h(u6h[0], u6h[1], u6h[2], u6h[3], + u6h[4], u6h[5], u6h[6], u6h[7]); +} diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_varargs-m512.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_varargs-m512.c new file mode 100644 index 00000000000..e6d165a8247 --- /dev/null +++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_varargs-m512.c @@ -0,0 +1,164 @@ +/* Test variable number of 512-bit vector arguments passed to functions. */ + +#include +#include "avx512fp16-zmm-check.h" +#include "args.h" + +struct IntegerRegisters iregs; +struct FloatRegisters fregs; + +/* This struct holds values for argument checking. */ +struct +{ + ZMM_T i0, i1, i2, i3, i4, i5, i6, i7, i8, i9; +} values; + +char *pass; +int failed = 0; + +#undef assert +#define assert(c) do { \ + if (!(c)) {failed++; printf ("failed %s\n", pass); } \ +} while (0) + +#define compare(X1,X2,T) do { \ + assert (memcmp (&X1, &X2, sizeof (T)) == 0); \ +} while (0) + +void +fun_check_passing_m512_varargs (__m512 i0, __m512 i1, __m512 i2, + __m512 i3, ...) +{ + /* Check argument values. */ + void **fp = __builtin_frame_address (0); + void *ra = __builtin_return_address (0); + __m512 *argp; + + compare (values.i0, i0, __m512); + compare (values.i1, i1, __m512); + compare (values.i2, i2, __m512); + compare (values.i3, i3, __m512); + + /* Get the pointer to the return address on stack. */ + while (*fp != ra) + fp++; + + /* Skip the return address stack slot. */ + argp = (__m512 *)(((char *) fp) + 8); + + /* Check __m512 arguments passed on stack. 
*/ + compare (values.i4, argp[0], __m512); + compare (values.i5, argp[1], __m512); + compare (values.i6, argp[2], __m512); + compare (values.i7, argp[3], __m512); + compare (values.i8, argp[4], __m512); + compare (values.i9, argp[5], __m512); + + /* Check register contents. */ + compare (fregs.zmm0, zmm_regs[0], __m512); + compare (fregs.zmm1, zmm_regs[1], __m512); + compare (fregs.zmm2, zmm_regs[2], __m512); + compare (fregs.zmm3, zmm_regs[3], __m512); +} + +void +fun_check_passing_m512h_varargs (__m512h i0, __m512h i1, __m512h i2, + __m512h i3, ...) +{ + /* Check argument values. */ + void **fp = __builtin_frame_address (0); + void *ra = __builtin_return_address (0); + __m512h *argp; + + compare (values.i0, i0, __m512h); + compare (values.i1, i1, __m512h); + compare (values.i2, i2, __m512h); + compare (values.i3, i3, __m512h); + + /* Get the pointer to the return address on stack. */ + while (*fp != ra) + fp++; + + /* Skip the return address stack slot. */ + argp = (__m512h *)(((char *) fp) + 8); + + /* Check __m512h arguments passed on stack. */ + compare (values.i4, argp[0], __m512h); + compare (values.i5, argp[1], __m512h); + compare (values.i6, argp[2], __m512h); + compare (values.i7, argp[3], __m512h); + compare (values.i8, argp[4], __m512h); + compare (values.i9, argp[5], __m512h); + + /* Check register contents. 
*/ + compare (fregs.zmm0, zmm_regs[0], __m512h); + compare (fregs.zmm1, zmm_regs[1], __m512h); + compare (fregs.zmm2, zmm_regs[2], __m512h); + compare (fregs.zmm3, zmm_regs[3], __m512h); +} + +#define def_check_int_passing_varargs(_i0, _i1, _i2, _i3, _i4, _i5, \ + _i6, _i7, _i8, _i9, \ + _func, TYPE) \ + values.i0.TYPE[0] = _i0; \ + values.i1.TYPE[0] = _i1; \ + values.i2.TYPE[0] = _i2; \ + values.i3.TYPE[0] = _i3; \ + values.i4.TYPE[0] = _i4; \ + values.i5.TYPE[0] = _i5; \ + values.i6.TYPE[0] = _i6; \ + values.i7.TYPE[0] = _i7; \ + values.i8.TYPE[0] = _i8; \ + values.i9.TYPE[0] = _i9; \ + clear_struct_registers; \ + fregs.F0.TYPE[0] = _i0; \ + fregs.F1.TYPE[0] = _i1; \ + fregs.F2.TYPE[0] = _i2; \ + fregs.F3.TYPE[0] = _i3; \ + WRAP_CALL(_func) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _i8, _i9); + +void +test_m512_varargs (void) +{ + __m512 x[10]; + int i; + for (i = 0; i < 10; i++) + x[i] = (__m512){32+i, 0, 0, 0, 0, 0, 0, 0}; + pass = "m512-varargs"; + def_check_int_passing_varargs (x[0], x[1], x[2], x[3], x[4], x[5], + x[6], x[7], x[8], x[9], + fun_check_passing_m512_varargs, + _m512); +} + +void +test_m512h_varargs (void) +{ + __m512h x[10]; + int i; + for (i = 0; i < 10; i++) + x[i] = (__m512h) { + 1.1f16 + i, 2.2f16 + i, 3.3f16 + i, 4.4f16 + i, + 5.5f16 + i, 6.6f16 + i, 7.7f16 + i, 8.8f16 + i, + 9.9f16 + i, 10.10f16 + i, 11.11f16 + i, 12.12f16 + i, + 13.13f16 + i, 14.14f16 + i, 15.15f16 + i, 16.16f16 + i, + 17.17f16 + i, 18.18f16 + i, 19.19f16 + i, 20.20f16 + i, + 21.21f16 + i, 22.22f16 + i, 23.23f16 + i, 24.24f16 + i, + 25.25f16 + i, 26.26f16 + i, 27.27f16 + i, 28.28f16 + i, + 29.29f16 + i, 30.30f16 + i, 31.31f16 + i, 32.32f16 + i + }; + pass = "m512h-varargs"; + def_check_int_passing_varargs (x[0], x[1], x[2], x[3], x[4], x[5], + x[6], x[7], x[8], x[9], + fun_check_passing_m512h_varargs, + _m512h); +} + +void +do_test (void) +{ + test_m512_varargs (); + test_m512h_varargs (); + if (failed) + abort (); +} From patchwork Thu Jul 1 06:15:53 2021 Content-Type: 
text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: liuhongt X-Patchwork-Id: 1499310 To: gcc-patches@gcc.gnu.org Subject: [PATCH 07/62] AVX512FP16: Add vaddph/vsubph/vdivph/vmulph. Date: Thu, 1 Jul 2021 14:15:53 +0800 Message-Id: <20210701061648.9447-8-hongtao.liu@intel.com> In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com> References: <20210701061648.9447-1-hongtao.liu@intel.com> From: liuhongt Reply-To: liuhongt Cc: jakub@redhat.com gcc/ChangeLog: * config.gcc: Add avx512fp16vlintrin.h. * config/i386/avx512fp16intrin.h: (_mm512_add_ph): New intrinsic. (_mm512_mask_add_ph): Likewise. (_mm512_maskz_add_ph): Likewise. (_mm512_sub_ph): Likewise. (_mm512_mask_sub_ph): Likewise. (_mm512_maskz_sub_ph): Likewise.
(_mm512_mul_ph): Likewise. (_mm512_mask_mul_ph): Likewise. (_mm512_maskz_mul_ph): Likewise. (_mm512_div_ph): Likewise. (_mm512_mask_div_ph): Likewise. (_mm512_maskz_div_ph): Likewise. (_mm512_add_round_ph): Likewise. (_mm512_mask_add_round_ph): Likewise. (_mm512_maskz_add_round_ph): Likewise. (_mm512_sub_round_ph): Likewise. (_mm512_mask_sub_round_ph): Likewise. (_mm512_maskz_sub_round_ph): Likewise. (_mm512_mul_round_ph): Likewise. (_mm512_mask_mul_round_ph): Likewise. (_mm512_maskz_mul_round_ph): Likewise. (_mm512_div_round_ph): Likewise. (_mm512_mask_div_round_ph): Likewise. (_mm512_maskz_div_round_ph): Likewise. * config/i386/avx512fp16vlintrin.h: New header. * config/i386/i386-builtin-types.def (V16HF, V8HF, V32HF): Add new builtin types. * config/i386/i386-builtin.def: Add corresponding builtins. * config/i386/i386-expand.c (ix86_expand_args_builtin): Handle new builtin types. (ix86_expand_round_builtin): Likewise. * config/i386/immintrin.h: Include avx512fp16vlintrin.h * config/i386/sse.md (VFH): New mode_iterator. (VF2H): Likewise. (avx512fmaskmode): Add HF vector modes. (avx512fmaskhalfmode): Likewise. (3): Adjust to for HF vector modes. (*3): Likewise. (mul3): Likewise. (*mul3): Likewise. (div3): Likewise. (_div3): Likewise. * config/i386/subst.md (SUBST_V): Add HF vector modes. (SUBST_A): Likewise. (round_mode512bit_condition): Adjust for V32HFmode. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add -mavx512vl and test for new intrinsics. * gcc.target/i386/avx-2.c: Add -mavx512vl. * gcc.target/i386/avx512fp16-11a.c: New test. * gcc.target/i386/avx512fp16-11b.c: Ditto. * gcc.target/i386/avx512vlfp16-11a.c: Ditto. * gcc.target/i386/avx512vlfp16-11b.c: Ditto. * gcc.target/i386/sse-13.c: Add test for new builtins. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/sse-14.c: Add test for new intrinsics. * gcc.target/i386/sse-22.c: Ditto. 
--- gcc/config.gcc | 2 +- gcc/config/i386/avx512fp16intrin.h | 251 ++++++++++++++++++ gcc/config/i386/avx512fp16vlintrin.h | 219 +++++++++++++++ gcc/config/i386/i386-builtin-types.def | 7 + gcc/config/i386/i386-builtin.def | 20 ++ gcc/config/i386/i386-expand.c | 5 + gcc/config/i386/immintrin.h | 2 + gcc/config/i386/sse.md | 62 +++-- gcc/config/i386/subst.md | 6 +- gcc/testsuite/gcc.target/i386/avx-1.c | 8 +- gcc/testsuite/gcc.target/i386/avx-2.c | 2 +- .../gcc.target/i386/avx512fp16-11a.c | 36 +++ .../gcc.target/i386/avx512fp16-11b.c | 75 ++++++ .../gcc.target/i386/avx512vlfp16-11a.c | 68 +++++ .../gcc.target/i386/avx512vlfp16-11b.c | 96 +++++++ gcc/testsuite/gcc.target/i386/sse-13.c | 6 + gcc/testsuite/gcc.target/i386/sse-14.c | 14 + gcc/testsuite/gcc.target/i386/sse-22.c | 14 + gcc/testsuite/gcc.target/i386/sse-23.c | 6 + 19 files changed, 872 insertions(+), 27 deletions(-) create mode 100644 gcc/config/i386/avx512fp16vlintrin.h create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-11a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-11b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512vlfp16-11a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512vlfp16-11b.c diff --git a/gcc/config.gcc b/gcc/config.gcc index 5b4f894185a..d64a8b9407e 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -416,7 +416,7 @@ i[34567]86-*-* | x86_64-*-*) tsxldtrkintrin.h amxtileintrin.h amxint8intrin.h amxbf16intrin.h x86gprintrin.h uintrintrin.h hresetintrin.h keylockerintrin.h avxvnniintrin.h - mwaitintrin.h avx512fp16intrin.h" + mwaitintrin.h avx512fp16intrin.h avx512fp16vlintrin.h" ;; ia64-*-*) extra_headers=ia64intrin.h diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h index 3fc0770986e..3e9d676dc39 100644 --- a/gcc/config/i386/avx512fp16intrin.h +++ b/gcc/config/i386/avx512fp16intrin.h @@ -217,6 +217,257 @@ _mm_store_sh (void *__P, __m128h __A) *(_Float16 *) __P = ((__v8hf)__A)[0]; } +/* Intrinsics 
v[add,sub,mul,div]ph. */ +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_add_ph (__m512h __A, __m512h __B) +{ + return (__m512h) ((__v32hf) __A + (__v32hf) __B); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_add_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D) +{ + return __builtin_ia32_vaddph_v32hf_mask (__C, __D, __A, __B); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_add_ph (__mmask32 __A, __m512h __B, __m512h __C) +{ + return __builtin_ia32_vaddph_v32hf_mask (__B, __C, + _mm512_setzero_ph (), __A); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_sub_ph (__m512h __A, __m512h __B) +{ + return (__m512h) ((__v32hf) __A - (__v32hf) __B); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_sub_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D) +{ + return __builtin_ia32_vsubph_v32hf_mask (__C, __D, __A, __B); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_sub_ph (__mmask32 __A, __m512h __B, __m512h __C) +{ + return __builtin_ia32_vsubph_v32hf_mask (__B, __C, + _mm512_setzero_ph (), __A); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mul_ph (__m512h __A, __m512h __B) +{ + return (__m512h) ((__v32hf) __A * (__v32hf) __B); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_mul_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D) +{ + return __builtin_ia32_vmulph_v32hf_mask (__C, __D, __A, __B); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_mul_ph (__mmask32 __A, __m512h __B, __m512h __C) +{ + 
return __builtin_ia32_vmulph_v32hf_mask (__B, __C, + _mm512_setzero_ph (), __A); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_div_ph (__m512h __A, __m512h __B) +{ + return (__m512h) ((__v32hf) __A / (__v32hf) __B); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_div_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D) +{ + return __builtin_ia32_vdivph_v32hf_mask (__C, __D, __A, __B); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_div_ph (__mmask32 __A, __m512h __B, __m512h __C) +{ + return __builtin_ia32_vdivph_v32hf_mask (__B, __C, + _mm512_setzero_ph (), __A); +} + +#ifdef __OPTIMIZE__ +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_add_round_ph (__m512h __A, __m512h __B, const int __C) +{ + return __builtin_ia32_vaddph_v32hf_mask_round (__A, __B, + _mm512_setzero_ph (), + (__mmask32) -1, __C); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_add_round_ph (__m512h __A, __mmask32 __B, __m512h __C, + __m512h __D, const int __E) +{ + return __builtin_ia32_vaddph_v32hf_mask_round (__C, __D, __A, __B, __E); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_add_round_ph (__mmask32 __A, __m512h __B, __m512h __C, + const int __D) +{ + return __builtin_ia32_vaddph_v32hf_mask_round (__B, __C, + _mm512_setzero_ph (), + __A, __D); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_sub_round_ph (__m512h __A, __m512h __B, const int __C) +{ + return __builtin_ia32_vsubph_v32hf_mask_round (__A, __B, + _mm512_setzero_ph (), + (__mmask32) -1, __C); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) 
+_mm512_mask_sub_round_ph (__m512h __A, __mmask32 __B, __m512h __C, + __m512h __D, const int __E) +{ + return __builtin_ia32_vsubph_v32hf_mask_round (__C, __D, __A, __B, __E); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_sub_round_ph (__mmask32 __A, __m512h __B, __m512h __C, + const int __D) +{ + return __builtin_ia32_vsubph_v32hf_mask_round (__B, __C, + _mm512_setzero_ph (), + __A, __D); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mul_round_ph (__m512h __A, __m512h __B, const int __C) +{ + return __builtin_ia32_vmulph_v32hf_mask_round (__A, __B, + _mm512_setzero_ph (), + (__mmask32) -1, __C); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_mul_round_ph (__m512h __A, __mmask32 __B, __m512h __C, + __m512h __D, const int __E) +{ + return __builtin_ia32_vmulph_v32hf_mask_round (__C, __D, __A, __B, __E); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_mul_round_ph (__mmask32 __A, __m512h __B, __m512h __C, + const int __D) +{ + return __builtin_ia32_vmulph_v32hf_mask_round (__B, __C, + _mm512_setzero_ph (), + __A, __D); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_div_round_ph (__m512h __A, __m512h __B, const int __C) +{ + return __builtin_ia32_vdivph_v32hf_mask_round (__A, __B, + _mm512_setzero_ph (), + (__mmask32) -1, __C); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_div_round_ph (__m512h __A, __mmask32 __B, __m512h __C, + __m512h __D, const int __E) +{ + return __builtin_ia32_vdivph_v32hf_mask_round (__C, __D, __A, __B, __E); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_div_round_ph (__mmask32 __A, __m512h __B, 
__m512h __C, + const int __D) +{ + return __builtin_ia32_vdivph_v32hf_mask_round (__B, __C, + _mm512_setzero_ph (), + __A, __D); +} +#else +#define _mm512_add_round_ph(A, B, C) \ + ((__m512h)__builtin_ia32_vaddph_v32hf_mask_round((A), (B), \ + _mm512_setzero_ph (),\ + (__mmask32)-1, (C))) + +#define _mm512_mask_add_round_ph(A, B, C, D, E) \ + ((__m512h)__builtin_ia32_vaddph_v32hf_mask_round((C), (D), (A), (B), (E))) + +#define _mm512_maskz_add_round_ph(A, B, C, D) \ + ((__m512h)__builtin_ia32_vaddph_v32hf_mask_round((B), (C), \ + _mm512_setzero_ph (),\ + (A), (D))) + +#define _mm512_sub_round_ph(A, B, C) \ + ((__m512h)__builtin_ia32_vsubph_v32hf_mask_round((A), (B), \ + _mm512_setzero_ph (),\ + (__mmask32)-1, (C))) + +#define _mm512_mask_sub_round_ph(A, B, C, D, E) \ + ((__m512h)__builtin_ia32_vsubph_v32hf_mask_round((C), (D), (A), (B), (E))) + +#define _mm512_maskz_sub_round_ph(A, B, C, D) \ + ((__m512h)__builtin_ia32_vsubph_v32hf_mask_round((B), (C), \ + _mm512_setzero_ph (),\ + (A), (D))) + +#define _mm512_mul_round_ph(A, B, C) \ + ((__m512h)__builtin_ia32_vmulph_v32hf_mask_round((A), (B), \ + _mm512_setzero_ph (),\ + (__mmask32)-1, (C))) + +#define _mm512_mask_mul_round_ph(A, B, C, D, E) \ + ((__m512h)__builtin_ia32_vmulph_v32hf_mask_round((C), (D), (A), (B), (E))) + +#define _mm512_maskz_mul_round_ph(A, B, C, D) \ + ((__m512h)__builtin_ia32_vmulph_v32hf_mask_round((B), (C), \ + _mm512_setzero_ph (),\ + (A), (D))) + +#define _mm512_div_round_ph(A, B, C) \ + ((__m512h)__builtin_ia32_vdivph_v32hf_mask_round((A), (B), \ + _mm512_setzero_ph (),\ + (__mmask32)-1, (C))) + +#define _mm512_mask_div_round_ph(A, B, C, D, E) \ + ((__m512h)__builtin_ia32_vdivph_v32hf_mask_round((C), (D), (A), (B), (E))) + +#define _mm512_maskz_div_round_ph(A, B, C, D) \ + ((__m512h)__builtin_ia32_vdivph_v32hf_mask_round((B), (C), \ + _mm512_setzero_ph (),\ + (A), (D))) +#endif /* __OPTIMIZE__ */ + #ifdef __DISABLE_AVX512FP16__ #undef __DISABLE_AVX512FP16__ #pragma GCC pop_options diff 
--git a/gcc/config/i386/avx512fp16vlintrin.h b/gcc/config/i386/avx512fp16vlintrin.h new file mode 100644 index 00000000000..75fa9eb29e7 --- /dev/null +++ b/gcc/config/i386/avx512fp16vlintrin.h @@ -0,0 +1,219 @@ +/* Copyright (C) 2021 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + <http://www.gnu.org/licenses/>. */ + +#ifndef _IMMINTRIN_H_INCLUDED +#error "Never use <avx512fp16vlintrin.h> directly; include <immintrin.h> instead." +#endif + +#ifndef __AVX512FP16VLINTRIN_H_INCLUDED +#define __AVX512FP16VLINTRIN_H_INCLUDED + +#if !defined(__AVX512VL__) || !defined(__AVX512FP16__) +#pragma GCC push_options +#pragma GCC target("avx512fp16,avx512vl") +#define __DISABLE_AVX512FP16VL__ +#endif /* __AVX512FP16VL__ */ + +/* Intrinsics v[add,sub,mul,div]ph.
*/ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_add_ph (__m128h __A, __m128h __B) +{ + return (__m128h) ((__v8hf) __A + (__v8hf) __B); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_add_ph (__m256h __A, __m256h __B) +{ + return (__m256h) ((__v16hf) __A + (__v16hf) __B); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_add_ph (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D) +{ + return __builtin_ia32_vaddph_v8hf_mask (__C, __D, __A, __B); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_add_ph (__m256h __A, __mmask16 __B, __m256h __C, __m256h __D) +{ + return __builtin_ia32_vaddph_v16hf_mask (__C, __D, __A, __B); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_add_ph (__mmask8 __A, __m128h __B, __m128h __C) +{ + return __builtin_ia32_vaddph_v8hf_mask (__B, __C, _mm_setzero_ph (), + __A); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_add_ph (__mmask16 __A, __m256h __B, __m256h __C) +{ + return __builtin_ia32_vaddph_v16hf_mask (__B, __C, + _mm256_setzero_ph (), __A); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_sub_ph (__m128h __A, __m128h __B) +{ + return (__m128h) ((__v8hf) __A - (__v8hf) __B); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_sub_ph (__m256h __A, __m256h __B) +{ + return (__m256h) ((__v16hf) __A - (__v16hf) __B); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_sub_ph (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D) +{ + return __builtin_ia32_vsubph_v8hf_mask (__C, __D, __A, __B); +} + +extern __inline 
__m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_sub_ph (__m256h __A, __mmask16 __B, __m256h __C, __m256h __D) +{ + return __builtin_ia32_vsubph_v16hf_mask (__C, __D, __A, __B); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_sub_ph (__mmask8 __A, __m128h __B, __m128h __C) +{ + return __builtin_ia32_vsubph_v8hf_mask (__B, __C, _mm_setzero_ph (), + __A); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_sub_ph (__mmask16 __A, __m256h __B, __m256h __C) +{ + return __builtin_ia32_vsubph_v16hf_mask (__B, __C, + _mm256_setzero_ph (), __A); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mul_ph (__m128h __A, __m128h __B) +{ + return (__m128h) ((__v8hf) __A * (__v8hf) __B); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mul_ph (__m256h __A, __m256h __B) +{ + return (__m256h) ((__v16hf) __A * (__v16hf) __B); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_mul_ph (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D) +{ + return __builtin_ia32_vmulph_v8hf_mask (__C, __D, __A, __B); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_mul_ph (__m256h __A, __mmask16 __B, __m256h __C, __m256h __D) +{ + return __builtin_ia32_vmulph_v16hf_mask (__C, __D, __A, __B); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_mul_ph (__mmask8 __A, __m128h __B, __m128h __C) +{ + return __builtin_ia32_vmulph_v8hf_mask (__B, __C, _mm_setzero_ph (), + __A); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_mul_ph (__mmask16 __A, __m256h __B, __m256h __C) +{ + return 
__builtin_ia32_vmulph_v16hf_mask (__B, __C, + _mm256_setzero_ph (), __A); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_div_ph (__m128h __A, __m128h __B) +{ + return (__m128h) ((__v8hf) __A / (__v8hf) __B); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_div_ph (__m256h __A, __m256h __B) +{ + return (__m256h) ((__v16hf) __A / (__v16hf) __B); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_div_ph (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D) +{ + return __builtin_ia32_vdivph_v8hf_mask (__C, __D, __A, __B); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_div_ph (__m256h __A, __mmask16 __B, __m256h __C, __m256h __D) +{ + return __builtin_ia32_vdivph_v16hf_mask (__C, __D, __A, __B); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_div_ph (__mmask8 __A, __m128h __B, __m128h __C) +{ + return __builtin_ia32_vdivph_v8hf_mask (__B, __C, _mm_setzero_ph (), + __A); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_div_ph (__mmask16 __A, __m256h __B, __m256h __C) +{ + return __builtin_ia32_vdivph_v16hf_mask (__B, __C, + _mm256_setzero_ph (), __A); +} + +#ifdef __DISABLE_AVX512FP16VL__ +#undef __DISABLE_AVX512FP16VL__ +#pragma GCC pop_options +#endif /* __DISABLE_AVX512FP16VL__ */ + +#endif /* __AVX512FP16VLINTRIN_H_INCLUDED */ diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def index eb5153002ae..ee3b8c30589 100644 --- a/gcc/config/i386/i386-builtin-types.def +++ b/gcc/config/i386/i386-builtin-types.def @@ -98,6 +98,7 @@ DEF_VECTOR_TYPE (V16UQI, UQI, V16QI) # AVX vectors DEF_VECTOR_TYPE (V4DF, DOUBLE) DEF_VECTOR_TYPE (V8SF, FLOAT) +DEF_VECTOR_TYPE (V16HF, FLOAT16) 
DEF_VECTOR_TYPE (V4DI, DI) DEF_VECTOR_TYPE (V8SI, SI) DEF_VECTOR_TYPE (V16HI, HI) @@ -108,6 +109,7 @@ DEF_VECTOR_TYPE (V16UHI, UHI, V16HI) # AVX512F vectors DEF_VECTOR_TYPE (V32SF, FLOAT) +DEF_VECTOR_TYPE (V32HF, FLOAT16) DEF_VECTOR_TYPE (V16SF, FLOAT) DEF_VECTOR_TYPE (V8DF, DOUBLE) DEF_VECTOR_TYPE (V8DI, DI) @@ -1302,3 +1304,8 @@ DEF_FUNCTION_TYPE (UINT8, PV2DI, PCV2DI, PCVOID) # FP16 builtins DEF_FUNCTION_TYPE (V8HF, V8HI) +DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, V8HF, UQI) +DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF, V16HF, UHI) +DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, INT) +DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, V32HF, USI) +DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, V32HF, USI, INT) diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def index 1cc0cc6968c..b783d266dd8 100644 --- a/gcc/config/i386/i386-builtin.def +++ b/gcc/config/i386/i386-builtin.def @@ -2774,6 +2774,20 @@ BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v4sf, "__builti BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v4sf_mask, "__builtin_ia32_dpbf16ps_v4sf_mask", IX86_BUILTIN_DPHI16PS_V4SF_MASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V8HI_V8HI_UQI) BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v4sf_maskz, "__builtin_ia32_dpbf16ps_v4sf_maskz", IX86_BUILTIN_DPHI16PS_V4SF_MASKZ, UNKNOWN, (int) V4SF_FTYPE_V4SF_V8HI_V8HI_UQI) +/* AVX512FP16. 
*/ +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_addv8hf3_mask, "__builtin_ia32_vaddph_v8hf_mask", IX86_BUILTIN_VADDPH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_addv16hf3_mask, "__builtin_ia32_vaddph_v16hf_mask", IX86_BUILTIN_VADDPH_V16HF_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_addv32hf3_mask, "__builtin_ia32_vaddph_v32hf_mask", IX86_BUILTIN_VADDPH_V32HF_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_subv8hf3_mask, "__builtin_ia32_vsubph_v8hf_mask", IX86_BUILTIN_VSUBPH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_subv16hf3_mask, "__builtin_ia32_vsubph_v16hf_mask", IX86_BUILTIN_VSUBPH_V16HF_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_subv32hf3_mask, "__builtin_ia32_vsubph_v32hf_mask", IX86_BUILTIN_VSUBPH_V32HF_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_mulv8hf3_mask, "__builtin_ia32_vmulph_v8hf_mask", IX86_BUILTIN_VMULPH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_mulv16hf3_mask, "__builtin_ia32_vmulph_v16hf_mask", IX86_BUILTIN_VMULPH_V16HF_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_mulv32hf3_mask, "__builtin_ia32_vmulph_v32hf_mask", IX86_BUILTIN_VMULPH_V32HF_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_divv8hf3_mask, "__builtin_ia32_vdivph_v8hf_mask", IX86_BUILTIN_VDIVPH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) +BDESC 
(OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_divv16hf3_mask, "__builtin_ia32_vdivph_v16hf_mask", IX86_BUILTIN_VDIVPH_V16HF_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_divv32hf3_mask, "__builtin_ia32_vdivph_v32hf_mask", IX86_BUILTIN_VDIVPH_V32HF_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI) + /* Builtins with rounding support. */ BDESC_END (ARGS, ROUND_ARGS) @@ -2973,6 +2987,12 @@ BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_fixuns_truncv8dfv8di2_mask_round, " BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_rangepv16sf_mask_round, "__builtin_ia32_rangeps512_mask", IX86_BUILTIN_RANGEPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_INT_V16SF_HI_INT) BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_rangepv8df_mask_round, "__builtin_ia32_rangepd512_mask", IX86_BUILTIN_RANGEPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_INT_V8DF_QI_INT) +/* AVX512FP16. */ +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_addv32hf3_mask_round, "__builtin_ia32_vaddph_v32hf_mask_round", IX86_BUILTIN_VADDPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_subv32hf3_mask_round, "__builtin_ia32_vsubph_v32hf_mask_round", IX86_BUILTIN_VSUBPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_mulv32hf3_mask_round, "__builtin_ia32_vmulph_v32hf_mask_round", IX86_BUILTIN_VMULPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_divv32hf3_mask_round, "__builtin_ia32_vdivph_v32hf_mask_round", IX86_BUILTIN_VDIVPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) + BDESC_END (ROUND_ARGS, MULTI_ARG) /* FMA4 and XOP. 
*/ diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c index 5ce7163b241..39647eb2cf1 100644 --- a/gcc/config/i386/i386-expand.c +++ b/gcc/config/i386/i386-expand.c @@ -9760,6 +9760,7 @@ ix86_expand_args_builtin (const struct builtin_description *d, case V16HI_FTYPE_V8SI_V8SI_V16HI_UHI: case V8HI_FTYPE_V4SI_V4SI_V8HI_UQI: case V4DF_FTYPE_V4DF_V4DI_V4DF_UQI: + case V32HF_FTYPE_V32HF_V32HF_V32HF_USI: case V8SF_FTYPE_V8SF_V8SI_V8SF_UQI: case V4SF_FTYPE_V4SF_V4SI_V4SF_UQI: case V2DF_FTYPE_V2DF_V2DI_V2DF_UQI: @@ -9777,6 +9778,7 @@ ix86_expand_args_builtin (const struct builtin_description *d, case V8HI_FTYPE_V8HI_V8HI_V8HI_UQI: case V8SI_FTYPE_V8SI_V8SI_V8SI_UQI: case V4SI_FTYPE_V4SI_V4SI_V4SI_UQI: + case V16HF_FTYPE_V16HF_V16HF_V16HF_UHI: case V8SF_FTYPE_V8SF_V8SF_V8SF_UQI: case V16QI_FTYPE_V16QI_V16QI_V16QI_UHI: case V16HI_FTYPE_V16HI_V16HI_V16HI_UHI: @@ -9784,6 +9786,7 @@ ix86_expand_args_builtin (const struct builtin_description *d, case V2DF_FTYPE_V2DF_V2DF_V2DF_UQI: case V4DI_FTYPE_V4DI_V4DI_V4DI_UQI: case V4DF_FTYPE_V4DF_V4DF_V4DF_UQI: + case V8HF_FTYPE_V8HF_V8HF_V8HF_UQI: case V4SF_FTYPE_V4SF_V4SF_V4SF_UQI: case V8DF_FTYPE_V8DF_V8DF_V8DF_UQI: case V8DF_FTYPE_V8DF_V8DI_V8DF_UQI: @@ -10460,6 +10463,7 @@ ix86_expand_round_builtin (const struct builtin_description *d, case INT_FTYPE_V4SF_INT: nargs = 2; break; + case V32HF_FTYPE_V32HF_V32HF_INT: case V4SF_FTYPE_V4SF_UINT_INT: case V4SF_FTYPE_V4SF_UINT64_INT: case V2DF_FTYPE_V2DF_UINT64_INT: @@ -10500,6 +10504,7 @@ ix86_expand_round_builtin (const struct builtin_description *d, case V2DF_FTYPE_V2DF_V2DF_V2DF_UQI_INT: case V4SF_FTYPE_V4SF_V4SF_V4SF_UQI_INT: case V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT: + case V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT: case V2DF_FTYPE_V2DF_V2DF_V2DF_QI_INT: case V2DF_FTYPE_V2DF_V4SF_V2DF_QI_INT: case V2DF_FTYPE_V2DF_V4SF_V2DF_UQI_INT: diff --git a/gcc/config/i386/immintrin.h b/gcc/config/i386/immintrin.h index 5344e22c9c8..e08efb9dff3 100644 --- 
a/gcc/config/i386/immintrin.h +++ b/gcc/config/i386/immintrin.h @@ -96,6 +96,8 @@ #include +#include + #include #include diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 1009d656cbb..2c1b6fbcd86 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -295,6 +295,13 @@ (define_mode_iterator VF [(V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")]) +(define_mode_iterator VFH + [(V32HF "TARGET_AVX512FP16") + (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL") + (V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL") + (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF + (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")]) + ;; 128- and 256-bit float vector modes (define_mode_iterator VF_128_256 [(V8SF "TARGET_AVX") V4SF @@ -318,6 +325,13 @@ (define_mode_iterator VF1_128_256VL (define_mode_iterator VF2 [(V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") V2DF]) +;; All DFmode & HFmode vector float modes +(define_mode_iterator VF2H + [(V32HF "TARGET_AVX512FP16") + (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL") + (V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL") + (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") V2DF]) + ;; 128- and 256-bit DF vector modes (define_mode_iterator VF2_128_256 [(V4DF "TARGET_AVX") V2DF]) @@ -824,6 +838,7 @@ (define_mode_attr avx512fmaskmode (V32HI "SI") (V16HI "HI") (V8HI "QI") (V4HI "QI") (V16SI "HI") (V8SI "QI") (V4SI "QI") (V8DI "QI") (V4DI "QI") (V2DI "QI") + (V32HF "SI") (V16HF "HI") (V8HF "QI") (V16SF "HI") (V8SF "QI") (V4SF "QI") (V8DF "QI") (V4DF "QI") (V2DF "QI")]) @@ -842,6 +857,7 @@ (define_mode_attr avx512fmaskhalfmode (V32HI "HI") (V16HI "QI") (V8HI "QI") (V4HI "QI") (V16SI "QI") (V8SI "QI") (V4SI "QI") (V8DI "QI") (V4DI "QI") (V2DI "QI") + (V32HF "HI") (V16HF "QI") (V8HF "QI") (V16SF "QI") (V8SF "QI") (V4SF "QI") (V8DF "QI") (V4DF "QI") (V2DF "QI")]) @@ -1940,18 +1956,18 @@ (define_insn_and_split "*nabs2" [(set_attr "isa" "noavx,noavx,avx,avx")]) 
(define_expand "3" - [(set (match_operand:VF 0 "register_operand") - (plusminus:VF - (match_operand:VF 1 "") - (match_operand:VF 2 "")))] + [(set (match_operand:VFH 0 "register_operand") + (plusminus:VFH + (match_operand:VFH 1 "") + (match_operand:VFH 2 "")))] "TARGET_SSE && && " "ix86_fixup_binary_operands_no_copy (, mode, operands);") (define_insn "*3" - [(set (match_operand:VF 0 "register_operand" "=x,v") - (plusminus:VF - (match_operand:VF 1 "" "0,v") - (match_operand:VF 2 "" "xBm,")))] + [(set (match_operand:VFH 0 "register_operand" "=x,v") + (plusminus:VFH + (match_operand:VFH 1 "" "0,v") + (match_operand:VFH 2 "" "xBm,")))] "TARGET_SSE && ix86_binary_operator_ok (, mode, operands) && && " "@ @@ -2002,18 +2018,18 @@ (define_insn "_vm3" (set_attr "mode" "")]) (define_expand "mul3" - [(set (match_operand:VF 0 "register_operand") - (mult:VF - (match_operand:VF 1 "") - (match_operand:VF 2 "")))] + [(set (match_operand:VFH 0 "register_operand") + (mult:VFH + (match_operand:VFH 1 "") + (match_operand:VFH 2 "")))] "TARGET_SSE && && " "ix86_fixup_binary_operands_no_copy (MULT, mode, operands);") (define_insn "*mul3" - [(set (match_operand:VF 0 "register_operand" "=x,v") - (mult:VF - (match_operand:VF 1 "" "%0,v") - (match_operand:VF 2 "" "xBm,")))] + [(set (match_operand:VFH 0 "register_operand" "=x,v") + (mult:VFH + (match_operand:VFH 1 "" "%0,v") + (match_operand:VFH 2 "" "xBm,")))] "TARGET_SSE && ix86_binary_operator_ok (MULT, mode, operands) && && " "@ @@ -2067,9 +2083,9 @@ (define_insn "_vm3")]) (define_expand "div3" - [(set (match_operand:VF2 0 "register_operand") - (div:VF2 (match_operand:VF2 1 "register_operand") - (match_operand:VF2 2 "vector_operand")))] + [(set (match_operand:VF2H 0 "register_operand") + (div:VF2H (match_operand:VF2H 1 "register_operand") + (match_operand:VF2H 2 "vector_operand")))] "TARGET_SSE2") (define_expand "div3" @@ -2090,10 +2106,10 @@ (define_expand "div3" }) (define_insn "_div3" - [(set (match_operand:VF 0 "register_operand" 
"=x,v") - (div:VF - (match_operand:VF 1 "register_operand" "0,v") - (match_operand:VF 2 "" "xBm,")))] + [(set (match_operand:VFH 0 "register_operand" "=x,v") + (div:VFH + (match_operand:VFH 1 "register_operand" "0,v") + (match_operand:VFH 2 "" "xBm,")))] "TARGET_SSE && && " "@ div\t{%2, %0|%0, %2} diff --git a/gcc/config/i386/subst.md b/gcc/config/i386/subst.md index 477a89803fa..762383bfd11 100644 --- a/gcc/config/i386/subst.md +++ b/gcc/config/i386/subst.md @@ -24,6 +24,7 @@ (define_mode_iterator SUBST_V V32HI V16HI V8HI V16SI V8SI V4SI V8DI V4DI V2DI + V32HF V16HF V8HF V16SF V8SF V4SF V8DF V4DF V2DF]) @@ -35,6 +36,7 @@ (define_mode_iterator SUBST_A V32HI V16HI V8HI V16SI V8SI V4SI V8DI V4DI V2DI + V32HF V16HF V8HF V16SF V8SF V4SF V8DF V4DF V2DF QI HI SI DI SF DF]) @@ -142,7 +144,9 @@ (define_subst_attr "round_prefix" "round" "vex" "evex") (define_subst_attr "round_mode512bit_condition" "round" "1" "(mode == V16SFmode || mode == V8DFmode || mode == V8DImode - || mode == V16SImode)") + || mode == V16SImode + || mode == V32HFmode)") + (define_subst_attr "round_modev8sf_condition" "round" "1" "(mode == V8SFmode)") (define_subst_attr "round_modev4sf_condition" "round" "1" "(mode == V4SFmode)") (define_subst_attr "round_codefor" "round" "*" "") diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c index f3676077743..1eaee861141 100644 --- a/gcc/testsuite/gcc.target/i386/avx-1.c +++ b/gcc/testsuite/gcc.target/i386/avx-1.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -maes -mpclmul -mgfni -mavx512bw -mavx512fp16" } */ +/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -maes -mpclmul -mgfni -mavx512bw -mavx512fp16 -mavx512vl" } */ /* { dg-add-options bind_pic_locally } */ #include @@ -685,6 +685,12 @@ #define __builtin_ia32_vpshld_v2di(A, B, C) __builtin_ia32_vpshld_v2di(A, B, 1) #define 
__builtin_ia32_vpshld_v2di_mask(A, B, C, D, E) __builtin_ia32_vpshld_v2di_mask(A, B, 1, D, E) +/* avx512fp16intrin.h */ +#define __builtin_ia32_vaddph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vaddph_v32hf_mask_round(A, B, C, D, 8) +#define __builtin_ia32_vsubph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vsubph_v32hf_mask_round(A, B, C, D, 8) +#define __builtin_ia32_vmulph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vmulph_v32hf_mask_round(A, B, C, D, 8) +#define __builtin_ia32_vdivph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vdivph_v32hf_mask_round(A, B, C, D, 8) + /* vpclmulqdqintrin.h */ #define __builtin_ia32_vpclmulqdq_v4di(A, B, C) __builtin_ia32_vpclmulqdq_v4di(A, B, 1) #define __builtin_ia32_vpclmulqdq_v2di(A, B, C) __builtin_ia32_vpclmulqdq_v2di(A, B, 1) diff --git a/gcc/testsuite/gcc.target/i386/avx-2.c b/gcc/testsuite/gcc.target/i386/avx-2.c index 1751c52565c..642ae4d7bfb 100644 --- a/gcc/testsuite/gcc.target/i386/avx-2.c +++ b/gcc/testsuite/gcc.target/i386/avx-2.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -msse4a -maes -mpclmul -mavx512bw -mavx512fp16" } */ +/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -msse4a -maes -mpclmul -mavx512bw -mavx512fp16 -mavx512vl" } */ /* { dg-add-options bind_pic_locally } */ #include diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-11a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-11a.c new file mode 100644 index 00000000000..28492fa3f7b --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-11a.c @@ -0,0 +1,36 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx512fp16" } */ + +#include <immintrin.h> +__m512h +__attribute__ ((noinline, noclone)) +vadd512 (__m512h a, __m512h b) +{ + return a + b; +} + +__m512h +__attribute__ ((noinline, noclone)) +vsub512 (__m512h a, __m512h b) +{ + return a - b; +} + +__m512h +__attribute__ ((noinline, noclone)) +vmul512 (__m512h a, __m512h
b) +{ + return a * b; +} + +__m512h +__attribute__ ((noinline, noclone)) +vdiv512 (__m512h a, __m512h b) +{ + return a / b; +} + +/* { dg-final { scan-assembler-times "vaddph\[ \\t\]+\[^\n\r\]*%zmm\[01\]" 1 } } */ +/* { dg-final { scan-assembler-times "vsubph\[ \\t\]+\[^\n\r\]*%zmm\[01\]" 1 } } */ +/* { dg-final { scan-assembler-times "vmulph\[ \\t\]+\[^\n\r\]*%zmm\[01\]" 1 } } */ +/* { dg-final { scan-assembler-times "vdivph\[ \\t\]+\[^\n\r\]*%zmm\[01\]" 1 } } */ diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-11b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-11b.c new file mode 100644 index 00000000000..fc105152d2f --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-11b.c @@ -0,0 +1,75 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16" } */ + +#include <stdlib.h> +#include <string.h> +static void do_test (void); + +#define DO_TEST do_test +#define AVX512FP16 +#include "avx512-check.h" +#include "avx512fp16-11a.c" + +/* Get random float16 between -50.x and 50.x. */ +_Float16 +get_float16_noround() +{ + return ((int) (100.0 * rand ()/ (RAND_MAX + 1.0)) - 50) + + 0.1f * (int) (10 * rand() / (RAND_MAX + 1.0)); +} + +static void +do_test (void) +{ + _Float16 x[32]; + _Float16 y[32]; + _Float16 res_add[32]; + _Float16 res_sub[32]; + _Float16 res_mul[32]; + _Float16 res_div[32]; + for (int i = 0 ; i != 32; i++) + { + x[i] = get_float16_noround (); + y[i] = get_float16_noround (); + if (y[i] == 0) + y[i] = 1.0f; + res_add[i] = x[i] + y[i]; + res_sub[i] = x[i] - y[i]; + res_mul[i] = x[i] * y[i]; + res_div[i] = x[i] / y[i]; + + } + + union512h u512 = { x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7], + x[8], x[9], x[10], x[11], x[12], x[13], x[14], x[15], + x[16], x[17], x[18], x[19], x[20], x[21], x[22], x[23], + x[24], x[25], x[26], x[27], x[28], x[29], x[30], x[31] }; + union512h u512_1 = {y[0], y[1], y[2], y[3], y[4], y[5], y[6], y[7], + y[8], y[9], y[10], y[11], y[12], y[13], y[14], y[15], + y[16], y[17], y[18], y[19], y[20], y[21], y[22], y[23],
+ y[24], y[25], y[26], y[27], y[28], y[29], y[30], y[31] }; + + __m512h v512; + union512h a512; + + memset (&v512, -1, sizeof (v512)); + v512 = vadd512 (u512.x, u512_1.x); + a512.x = v512; + if (check_union512h (a512, res_add)) + abort (); + memset (&v512, -1, sizeof (v512)); + v512 = vsub512 (u512.x, u512_1.x); + a512.x = v512; + if (check_union512h (a512, res_sub)) + abort (); + memset (&v512, -1, sizeof (v512)); + v512 = vmul512 (u512.x, u512_1.x); + a512.x = v512; + if (check_union512h (a512, res_mul)) + abort (); + memset (&v512, -1, sizeof (v512)); + v512 = vdiv512 (u512.x, u512_1.x); + a512.x = v512; + if (check_union512h (a512, res_div)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512vlfp16-11a.c b/gcc/testsuite/gcc.target/i386/avx512vlfp16-11a.c new file mode 100644 index 00000000000..a8c6296f504 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512vlfp16-11a.c @@ -0,0 +1,68 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl" } */ + +#include <immintrin.h> +__m128h +__attribute__ ((noinline, noclone)) +vadd128 (__m128h a, __m128h b) +{ + return a + b; +} + +__m256h +__attribute__ ((noinline, noclone)) +vadd256 (__m256h a, __m256h b) +{ + return a + b; +} + +__m128h +__attribute__ ((noinline, noclone)) +vsub128 (__m128h a, __m128h b) +{ + return a - b; +} + +__m256h +__attribute__ ((noinline, noclone)) +vsub256 (__m256h a, __m256h b) +{ + return a - b; +} + +__m128h +__attribute__ ((noinline, noclone)) +vmul128 (__m128h a, __m128h b) +{ + return a * b; +} + +__m256h +__attribute__ ((noinline, noclone)) +vmul256 (__m256h a, __m256h b) +{ + return a * b; +} + +__m128h +__attribute__ ((noinline, noclone)) +vdiv128 (__m128h a, __m128h b) +{ + return a / b; +} + +__m256h +__attribute__ ((noinline, noclone)) +vdiv256 (__m256h a, __m256h b) +{ + return a / b; +} + +/* { dg-final { scan-assembler-times "vaddph\[ \\t\]+\[^\n\r\]*%xmm\[01\]" 1 } } */ +/* { dg-final { scan-assembler-times "vaddph\[ \\t\]+\[^\n\r\]*%ymm\[01\]" 1 } } */ +/* {
dg-final { scan-assembler-times "vsubph\[ \\t\]+\[^\n\r\]*%xmm\[01\]" 1 } } */ +/* { dg-final { scan-assembler-times "vsubph\[ \\t\]+\[^\n\r\]*%ymm\[01\]" 1 } } */ +/* { dg-final { scan-assembler-times "vmulph\[ \\t\]+\[^\n\r\]*%xmm\[01\]" 1 } } */ +/* { dg-final { scan-assembler-times "vmulph\[ \\t\]+\[^\n\r\]*%ymm\[01\]" 1 } } */ +/* { dg-final { scan-assembler-times "vdivph\[ \\t\]+\[^\n\r\]*%xmm\[01\]" 1 } } */ +/* { dg-final { scan-assembler-times "vdivph\[ \\t\]+\[^\n\r\]*%ymm\[01\]" 1 } } */ diff --git a/gcc/testsuite/gcc.target/i386/avx512vlfp16-11b.c b/gcc/testsuite/gcc.target/i386/avx512vlfp16-11b.c new file mode 100644 index 00000000000..b8d3e8a4e96 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512vlfp16-11b.c @@ -0,0 +1,96 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl" } */ + +#include <stdlib.h> +#include <string.h> +static void do_test (void); + +#define DO_TEST do_test +#define AVX512FP16 +#include "avx512-check.h" +#include "avx512vlfp16-11a.c" + +/* Get random float16 between -50.x and 50.x.
*/ +_Float16 +get_float16_noround() +{ + return ((int) (100.0 * rand ()/ (RAND_MAX + 1.0)) - 50) + + 0.1f * (int) (10 * rand() / (RAND_MAX + 1.0)); +} + +static void +do_test (void) +{ + _Float16 x[16]; + _Float16 y[16]; + _Float16 res_add[16]; + _Float16 res_sub[16]; + _Float16 res_mul[16]; + _Float16 res_div[16]; + for (int i = 0 ; i != 16; i++) + { + x[i] = get_float16_noround (); + y[i] = get_float16_noround (); + if (y[i] == 0) + y[i] = 1.0f; + res_add[i] = x[i] + y[i]; + res_sub[i] = x[i] - y[i]; + res_mul[i] = x[i] * y[i]; + res_div[i] = x[i] / y[i]; + + } + + union128h u128 = { x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7] }; + union128h u128_1 = { y[0], y[1], y[2], y[3], y[4], y[5], y[6], y[7] }; + union256h u256 = { x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7], + x[8], x[9], x[10], x[11], x[12], x[13], x[14], x[15] }; + union256h u256_1 = { y[0], y[1], y[2], y[3], y[4], y[5], y[6], y[7], + y[8], y[9], y[10], y[11], y[12], y[13], y[14], y[15]}; + + __m128h v128; + __m256h v256; + union128h a128; + union256h a256; + + memset (&v128, -1, sizeof (v128)); + v128 = vadd128 (u128.x, u128_1.x); + a128.x = v128; + if (check_union128h (a128, res_add)) + abort (); + memset (&v128, -1, sizeof (v128)); + v128 = vsub128 (u128.x, u128_1.x); + a128.x = v128; + if (check_union128h (a128, res_sub)) + abort (); + memset (&v128, -1, sizeof (v128)); + v128 = vmul128 (u128.x, u128_1.x); + a128.x = v128; + if (check_union128h (a128, res_mul)) + abort (); + memset (&v128, -1, sizeof (v128)); + v128 = vdiv128 (u128.x, u128_1.x); + a128.x = v128; + if (check_union128h (a128, res_div)) + abort (); + + memset (&v256, -1, sizeof (v256)); + v256 = vadd256 (u256.x, u256_1.x); + a256.x = v256; + if (check_union256h (a256, res_add)) + abort (); + memset (&v256, -1, sizeof (v256)); + v256 = vsub256 (u256.x, u256_1.x); + a256.x = v256; + if (check_union256h (a256, res_sub)) + abort (); + memset (&v256, -1, sizeof (v256)); + v256 = vmul256 (u256.x, u256_1.x); + a256.x = v256; + if 
(check_union256h (a256, res_mul)) + abort (); + memset (&v256, -1, sizeof (v256)); + v256 = vdiv256 (u256.x, u256_1.x); + a256.x = v256; + if (check_union256h (a256, res_div)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c index f5f5c113612..50ed74cd6d6 100644 --- a/gcc/testsuite/gcc.target/i386/sse-13.c +++ b/gcc/testsuite/gcc.target/i386/sse-13.c @@ -702,6 +702,12 @@ #define __builtin_ia32_vpshld_v2di(A, B, C) __builtin_ia32_vpshld_v2di(A, B, 1) #define __builtin_ia32_vpshld_v2di_mask(A, B, C, D, E) __builtin_ia32_vpshld_v2di_mask(A, B, 1, D, E) +/* avx512fp16intrin.h */ +#define __builtin_ia32_vaddph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vaddph_v32hf_mask_round(A, B, C, D, 8) +#define __builtin_ia32_vsubph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vsubph_v32hf_mask_round(A, B, C, D, 8) +#define __builtin_ia32_vmulph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vmulph_v32hf_mask_round(A, B, C, D, 8) +#define __builtin_ia32_vdivph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vdivph_v32hf_mask_round(A, B, C, D, 8) + /* vpclmulqdqintrin.h */ #define __builtin_ia32_vpclmulqdq_v4di(A, B, C) __builtin_ia32_vpclmulqdq_v4di(A, B, 1) #define __builtin_ia32_vpclmulqdq_v2di(A, B, C) __builtin_ia32_vpclmulqdq_v2di(A, B, 1) diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c index 747d504cedb..26a5e94c7ca 100644 --- a/gcc/testsuite/gcc.target/i386/sse-14.c +++ b/gcc/testsuite/gcc.target/i386/sse-14.c @@ -667,6 +667,20 @@ test_3 (_mm512_mask_rcp28_round_ps, __m512, __m512, __mmask16, __m512, 8) test_3 (_mm512_mask_rsqrt28_round_pd, __m512d, __m512d, __mmask8, __m512d, 8) test_3 (_mm512_mask_rsqrt28_round_ps, __m512, __m512, __mmask16, __m512, 8) +/* avx512fp16intrin.h */ +test_2 (_mm512_add_round_ph, __m512h, __m512h, __m512h, 8) +test_2 (_mm512_sub_round_ph, __m512h, __m512h, __m512h, 8) +test_2 (_mm512_mul_round_ph, __m512h, __m512h, __m512h, 8) +test_2 
(_mm512_div_round_ph, __m512h, __m512h, __m512h, 8) +test_3 (_mm512_maskz_add_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) +test_3 (_mm512_maskz_sub_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) +test_3 (_mm512_maskz_mul_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) +test_3 (_mm512_maskz_div_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) +test_4 (_mm512_mask_add_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8) +test_4 (_mm512_mask_sub_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8) +test_4 (_mm512_mask_mul_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8) +test_4 (_mm512_mask_div_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8) + /* shaintrin.h */ test_2 (_mm_sha1rnds4_epu32, __m128i, __m128i, __m128i, 1) diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c index 33411969901..8d25effd724 100644 --- a/gcc/testsuite/gcc.target/i386/sse-22.c +++ b/gcc/testsuite/gcc.target/i386/sse-22.c @@ -772,6 +772,20 @@ test_2 (_mm_rcp28_round_ss, __m128, __m128, __m128, 8) test_2 (_mm_rsqrt28_round_sd, __m128d, __m128d, __m128d, 8) test_2 (_mm_rsqrt28_round_ss, __m128, __m128, __m128, 8) +/* avx512fp16intrin.h */ +test_2 (_mm512_add_round_ph, __m512h, __m512h, __m512h, 8) +test_2 (_mm512_sub_round_ph, __m512h, __m512h, __m512h, 8) +test_2 (_mm512_mul_round_ph, __m512h, __m512h, __m512h, 8) +test_2 (_mm512_div_round_ph, __m512h, __m512h, __m512h, 8) +test_3 (_mm512_maskz_add_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) +test_3 (_mm512_maskz_sub_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) +test_3 (_mm512_maskz_mul_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) +test_3 (_mm512_maskz_div_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) +test_4 (_mm512_mask_add_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8) +test_4 (_mm512_mask_sub_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8) +test_4 (_mm512_mask_mul_round_ph, __m512h, __m512h, 
__mmask32, __m512h, __m512h, 8) +test_4 (_mm512_mask_div_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8) + /* shaintrin.h */ test_2 (_mm_sha1rnds4_epu32, __m128i, __m128i, __m128i, 1) diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c index 86590ca5ffb..f7dd5d7495c 100644 --- a/gcc/testsuite/gcc.target/i386/sse-23.c +++ b/gcc/testsuite/gcc.target/i386/sse-23.c @@ -703,6 +703,12 @@ #define __builtin_ia32_vpshld_v2di(A, B, C) __builtin_ia32_vpshld_v2di(A, B, 1) #define __builtin_ia32_vpshld_v2di_mask(A, B, C, D, E) __builtin_ia32_vpshld_v2di_mask(A, B, 1, D, E) +/* avx512fp16intrin.h */ +#define __builtin_ia32_vaddph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vaddph_v32hf_mask_round(A, B, C, D, 8) +#define __builtin_ia32_vsubph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vsubph_v32hf_mask_round(A, B, C, D, 8) +#define __builtin_ia32_vmulph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vmulph_v32hf_mask_round(A, B, C, D, 8) +#define __builtin_ia32_vdivph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vdivph_v32hf_mask_round(A, B, C, D, 8) + /* vpclmulqdqintrin.h */ #define __builtin_ia32_vpclmulqdq_v4di(A, B, C) __builtin_ia32_vpclmulqdq_v4di(A, B, 1) #define __builtin_ia32_vpclmulqdq_v2di(A, B, C) __builtin_ia32_vpclmulqdq_v2di(A, B, 1)
From patchwork Thu Jul 1 06:15:54 2021 X-Patchwork-Submitter: liuhongt X-Patchwork-Id: 1499313
To: gcc-patches@gcc.gnu.org Subject: [PATCH 08/62] AVX512FP16: Add testcase for vaddph/vsubph/vmulph/vdivph. Date: Thu, 1 Jul 2021 14:15:54 +0800 Message-Id: <20210701061648.9447-9-hongtao.liu@intel.com> In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com> References: <20210701061648.9447-1-hongtao.liu@intel.com> From: liuhongt Cc: jakub@redhat.com
gcc/testsuite/ChangeLog: * gcc.target/i386/avx512fp16-helper.h: New header file for FP16 runtime test. * gcc.target/i386/avx512fp16-vaddph-1a.c: New test. * gcc.target/i386/avx512fp16-vaddph-1b.c: Ditto. * gcc.target/i386/avx512fp16-vdivph-1a.c: Ditto. * gcc.target/i386/avx512fp16-vdivph-1b.c: Ditto. * gcc.target/i386/avx512fp16-vmulph-1a.c: Ditto. * gcc.target/i386/avx512fp16-vmulph-1b.c: Ditto. * gcc.target/i386/avx512fp16-vsubph-1a.c: Ditto. * gcc.target/i386/avx512fp16-vsubph-1b.c: Ditto. * gcc.target/i386/avx512fp16vl-vaddph-1a.c: Ditto. * gcc.target/i386/avx512fp16vl-vaddph-1b.c: Ditto. * gcc.target/i386/avx512fp16vl-vdivph-1a.c: Ditto. * gcc.target/i386/avx512fp16vl-vdivph-1b.c: Ditto. * gcc.target/i386/avx512fp16vl-vmulph-1a.c: Ditto. * gcc.target/i386/avx512fp16vl-vmulph-1b.c: Ditto. * gcc.target/i386/avx512fp16vl-vsubph-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vsubph-1b.c: Ditto. --- .../gcc.target/i386/avx512fp16-helper.h | 207 ++++++++++++++++++ .../gcc.target/i386/avx512fp16-vaddph-1a.c | 26 +++ .../gcc.target/i386/avx512fp16-vaddph-1b.c | 92 ++++++++ .../gcc.target/i386/avx512fp16-vdivph-1a.c | 26 +++ .../gcc.target/i386/avx512fp16-vdivph-1b.c | 97 ++++++++ .../gcc.target/i386/avx512fp16-vmulph-1a.c | 26 +++ .../gcc.target/i386/avx512fp16-vmulph-1b.c | 92 ++++++++ .../gcc.target/i386/avx512fp16-vsubph-1a.c | 26 +++ .../gcc.target/i386/avx512fp16-vsubph-1b.c | 93 ++++++++ .../gcc.target/i386/avx512fp16vl-vaddph-1a.c | 29 +++ .../gcc.target/i386/avx512fp16vl-vaddph-1b.c | 16 ++ .../gcc.target/i386/avx512fp16vl-vdivph-1a.c | 29 +++ .../gcc.target/i386/avx512fp16vl-vdivph-1b.c | 16 ++ .../gcc.target/i386/avx512fp16vl-vmulph-1a.c | 29 +++ .../gcc.target/i386/avx512fp16vl-vmulph-1b.c | 16 ++ .../gcc.target/i386/avx512fp16vl-vsubph-1a.c | 29 +++ .../gcc.target/i386/avx512fp16vl-vsubph-1b.c | 16 ++ 17 files changed, 865 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-helper.h create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vaddph-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vaddph-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vdivph-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vdivph-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmulph-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmulph-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vsubph-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vsubph-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vaddph-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vaddph-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vdivph-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vdivph-1b.c create mode 100644 
gcc/testsuite/gcc.target/i386/avx512fp16vl-vmulph-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vmulph-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vsubph-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vsubph-1b.c diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h b/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h new file mode 100644 index 00000000000..9fde88a4f7b --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h @@ -0,0 +1,207 @@ +/* This file is used for emulation of avx512fp16 runtime tests. To + verify the correctness of _Float16 calculations, the idea is to + convert _Float16 to float and do the emulation using float + instructions. The _Float16 type should not be emulated or checked + by itself. */ + +#include "avx512f-helper.h" +#ifndef AVX512FP16_HELPER_INCLUDED +#define AVX512FP16_HELPER_INCLUDED + +#ifdef DEBUG +#include <stdio.h> +#endif +#include +#include +#include + +/* Useful macros. */ +#define NOINLINE __attribute__((noinline,noclone)) +#define _ROUND_NINT (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) +#define AVX512F_MAX_ELEM 512 / 32 + +/* Structure for _Float16 emulation. */ +typedef union +{ + __m512 zmm; + __m512h zmmh; + __m256 ymm[2]; + __m256h ymmh[2]; + __m256i ymmi[2]; + __m128h xmmh[4]; + unsigned short u16[32]; + unsigned int u32[16]; + float f32[16]; + _Float16 f16[32]; +} V512; + +/* Global variables. */ +V512 src1, src2, src3; +int n_errs = 0; + +/* Helper functions for packing/unpacking ph operands.
*/ +void NOINLINE +unpack_ph_2twops(V512 src, V512 *op1, V512 *op2) +{ + V512 v1; + + op1->zmm = _mm512_cvtph_ps(src.ymmi[0]); + v1.ymm[0] = _mm512_extractf32x8_ps(src.zmm, 1); + op2->zmm = _mm512_cvtph_ps(v1.ymmi[0]); +} + +V512 NOINLINE +pack_twops_2ph(V512 op1, V512 op2) +{ + V512 v1, v2, v3; + + v1.ymmi[0] = _mm512_cvtps_ph(op1.zmm, _MM_FROUND_TO_NEAREST_INT); + v2.ymmi[0] = _mm512_cvtps_ph(op2.zmm, _MM_FROUND_TO_NEAREST_INT); + + v3.zmm = _mm512_insertf32x8(v1.zmm, v2.ymm[0], 1); + + return v3; +} + +/* Helper function used for result debugging. */ +#ifdef DEBUG +void NOINLINE +display_ps(const void *p, const char *banner, int n_elems) +{ + int i; + V512 *v = (V512*)p; + + if (banner) { + printf("%s", banner); + } + + for (i = 15; i >= n_elems; i--) { + printf(" --------"); + if (i == 8) { + printf("\n"); + if (banner) { + printf("%*s", (int)strlen(banner), ""); + } + } + } + + for (; i >= 0; i--) { + printf(" %x", v->u32[i]); + if (i == 8) { + printf("\n"); + if (banner) { + printf("%*s", (int)strlen(banner), ""); + } + } + } + printf("\n"); +} +#endif + +/* Functions/macros used for init/result checking. + Only check components within AVX512F_LEN. */ +#define TO_STRING(x) #x +#define STRINGIFY(x) TO_STRING(x) +#define NAME_OF(NAME) STRINGIFY(INTRINSIC (NAME)) + +#define CHECK_RESULT(res, exp, size, intrin) \ + check_results ((void*)res, (void*)exp, size,\ + NAME_OF(intrin)) + +/* To evaluate whether results match _Float16 precision, + only the last bit of the real/emulated result may + differ. */ +void NOINLINE +check_results(void *got, void *exp, int n_elems, char *banner) +{ + int i; + V512 *v1 = (V512*)got; + V512 *v2 = (V512*)exp; + + for (i = 0; i < n_elems; i++) { + if (v1->u16[i] != v2->u16[i] && + ((v1->u16[i] > (v2->u16[i] + 1)) || + (v1->u16[i] < (v2->u16[i] - 1)))) { + +#ifdef DEBUG + printf("ERROR: %s failed at %d'th element: %x(%f) != %x(%f)\n", + banner ?
banner : "", i, + v1->u16[i], *(float *)(&v1->u16[i]), + v2->u16[i], *(float *)(&v2->u16[i])); + display_ps(got, "got:", n_elems); + display_ps(exp, "exp:", n_elems); +#endif + n_errs++; + break; + } + } +} + +/* Functions for src/dest initialization */ +void NOINLINE +init_src() +{ + V512 v1, v2, v3, v4; + int i; + + for (i = 0; i < AVX512F_MAX_ELEM; i++) { + v1.f32[i] = -i + 1; + v2.f32[i] = i * 0.5f; + v3.f32[i] = i * 2.5f; + v4.f32[i] = i - 0.5f; + + src3.u32[i] = (i + 1) * 10; + } + + src1 = pack_twops_2ph(v1, v2); + src2 = pack_twops_2ph(v3, v4); +} + +void NOINLINE +init_dest(V512 * res, V512 * exp) +{ + int i; + V512 v1; + + for (i = 0; i < AVX512F_MAX_ELEM; i++) { + v1.f32[i] = 12 + 0.5f * i; + } + *res = *exp = pack_twops_2ph(v1, v1); +} + +#define EMULATE(NAME) EVAL(emulate_, NAME, AVX512F_LEN) + +#endif /* AVX512FP16_HELPER_INCLUDED */ + +/* Macros for AVX512VL Testing. Include V512 component usage + and mask type for emulation. */ + +#if AVX512F_LEN == 256 +#undef HF +#undef SF +#undef NET_MASK +#undef MASK_VALUE +#undef ZMASK_VALUE +#define NET_MASK 0xffff +#define MASK_VALUE 0xcccc +#define ZMASK_VALUE 0xfcc1 +#define HF(x) x.ymmh[0] +#define SF(x) x.ymm[0] +#elif AVX512F_LEN == 128 +#undef HF +#undef SF +#undef NET_MASK +#undef MASK_VALUE +#undef ZMASK_VALUE +#define NET_MASK 0xff +#define MASK_VALUE 0xcc +#define ZMASK_VALUE 0xc1 +#define HF(x) x.xmmh[0] +#define SF(x) x.xmm[0] +#else +#define NET_MASK 0xffffffff +#define MASK_VALUE 0xcccccccc +#define ZMASK_VALUE 0xfcc1fcc1 +#define HF(x) x.zmmh +#define SF(x) x.zmm +#endif + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vaddph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vaddph-1a.c new file mode 100644 index 00000000000..0590c34cebf --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vaddph-1a.c @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vaddph\[ 
\\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vaddph\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vaddph\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vaddph\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vaddph\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vaddph\[ \\t\]+\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +volatile __m512h res, res1, res2; +volatile __m512h x1, x2; +volatile __mmask32 m32; + +void extern +avx512f_test (void) +{ + res = _mm512_add_ph (x1, x2); + res1 = _mm512_mask_add_ph (res1, m32, x1, x2); + res2 = _mm512_maskz_add_ph (m32, x1, x2); + + res = _mm512_add_round_ph (x1, x2, 8); + res1 = _mm512_mask_add_round_ph (res1, m32, x1, x2, 8); + res2 = _mm512_maskz_add_round_ph (m32, x1, x2, 11); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vaddph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vaddph-1b.c new file mode 100644 index 00000000000..1c412b5c10e --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vaddph-1b.c @@ -0,0 +1,92 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS (AVX512F_LEN / 16) + +void NOINLINE +EMULATE(add_ph) (V512 * dest, V512 op1, V512 op2, + __mmask32 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + __mmask16 m1, m2; + + m1 = k & 0xffff; + m2 = (k >> 16) &
0xffff; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(op2, &v3, &v4); + unpack_ph_2twops(*dest, &v7, &v8); + + for (i = 0; i < 16; i++) { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + v5.f32[i] = 0; + } + else { + v5.u32[i] = v7.u32[i]; + } + } + else { + v5.f32[i] = v1.f32[i] + v3.f32[i]; + } + + if (((1 << i) & m2) == 0) { + if (zero_mask) { + v6.f32[i] = 0; + } + else { + v6.u32[i] = v8.u32[i]; + } + } + else { + v6.f32[i] = v2.f32[i] + v4.f32[i]; + } + + } + *dest = pack_twops_2ph(v5, v6); +} + +void +TEST (void) +{ + V512 res; + V512 exp; + + init_src(); + + EMULATE(add_ph) (&exp, src1, src2, NET_MASK, 0); + HF(res) = INTRINSIC (_add_ph) (HF(src1), HF(src2)); + CHECK_RESULT (&res, &exp, N_ELEMS, _add_ph); + + init_dest(&res, &exp); + EMULATE(add_ph) (&exp, src1, src2, MASK_VALUE, 0); + HF(res) = INTRINSIC (_mask_add_ph) (HF(res), MASK_VALUE, HF(src1), HF(src2)); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_add_ph); + + EMULATE(add_ph) (&exp, src1, src2, ZMASK_VALUE, 1); + HF(res) = INTRINSIC (_maskz_add_ph) (ZMASK_VALUE, HF(src1), HF(src2)); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_add_ph); + +#if AVX512F_LEN == 512 + EMULATE(add_ph) (&exp, src1, src2, NET_MASK, 0); + HF(res) = INTRINSIC (_add_round_ph) (HF(src1), HF(src2), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _add_round_ph); + + init_dest(&res, &exp); + EMULATE(add_ph) (&exp, src1, src2, MASK_VALUE, 0); + HF(res) = INTRINSIC (_mask_add_round_ph) (HF(res), MASK_VALUE, HF(src1), HF(src2), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_add_round_ph); + + EMULATE(add_ph) (&exp, src1, src2, ZMASK_VALUE, 1); + HF(res) = INTRINSIC (_maskz_add_round_ph) (ZMASK_VALUE, HF(src1), HF(src2), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_add_round_ph); +#endif + + if (n_errs != 0) { + abort (); + } +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vdivph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vdivph-1a.c new file mode 100644 index 00000000000..63f111f3196 
--- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vdivph-1a.c @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vdivph\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vdivph\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vdivph\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vdivph\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vdivph\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vdivph\[ \\t\]+\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +volatile __m512h res, res1, res2; +volatile __m512h x1, x2; +volatile __mmask32 m32; + +void extern +avx512f_test (void) +{ + res = _mm512_div_ph (x1, x2); + res1 = _mm512_mask_div_ph (res1, m32, x1, x2); + res2 = _mm512_maskz_div_ph (m32, x1, x2); + + res = _mm512_div_round_ph (x1, x2, 8); + res1 = _mm512_mask_div_round_ph (res1, m32, x1, x2, 8); + res2 = _mm512_maskz_div_round_ph (m32, x1, x2, 11); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vdivph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vdivph-1b.c new file mode 100644 index 00000000000..c8b38210e87 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vdivph-1b.c @@ -0,0 +1,97 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS (AVX512F_LEN / 16) +
+void NOINLINE +EMULATE(div_ph) (V512 * dest, V512 op1, V512 op2, + __mmask32 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + __mmask16 m1, m2; + + m1 = k & 0xffff; + m2 = (k >> 16) & 0xffff; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(op2, &v3, &v4); + unpack_ph_2twops(*dest, &v7, &v8); + + for (i = 0; i < 16; i++) { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + v5.f32[i] = 0; + } + else { + v5.u32[i] = v7.u32[i]; + } + } + else { + v5.f32[i] = v1.f32[i] / v3.f32[i]; + } + + if (((1 << i) & m2) == 0) { + if (zero_mask) { + v6.f32[i] = 0; + } + else { + v6.u32[i] = v8.u32[i]; + } + } + else { + v6.f32[i] = v2.f32[i] / v4.f32[i]; + } + + } + *dest = pack_twops_2ph(v5, v6); +} + +void +TEST (void) +{ + V512 res; + V512 exp; + + init_src(); + + EMULATE(div_ph) (&exp, src1, src2, NET_MASK, 0); + HF(res) = INTRINSIC (_div_ph) (HF(src1), HF(src2)); + CHECK_RESULT (&res, &exp, N_ELEMS, _div_ph); + + init_dest(&res, &exp); + EMULATE(div_ph) (&exp, src1, src2, MASK_VALUE, 0); + HF(res) = INTRINSIC (_mask_div_ph) (HF(res), MASK_VALUE, HF(src1), HF(src2)); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_div_ph); + + EMULATE(div_ph) (&exp, src1, src2, ZMASK_VALUE, 1); + HF(res) = INTRINSIC (_maskz_div_ph) (ZMASK_VALUE, HF(src1), HF(src2)); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_div_ph); + +#if AVX512F_LEN == 512 + EMULATE(div_ph) (&exp, src1, src2, NET_MASK, 0); + HF(res) = INTRINSIC (_div_round_ph) (HF(src1), HF(src2), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _div_ph); + + init_dest(&res, &exp); + EMULATE(div_ph) (&exp, src1, src2, MASK_VALUE, 0); + HF(res) = INTRINSIC (_mask_div_round_ph) (HF(res), MASK_VALUE, HF(src1), HF(src2), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_div_ph); + + EMULATE(div_ph) (&exp, src1, src2, ZMASK_VALUE, 1); + HF(res) = INTRINSIC (_maskz_div_round_ph) (ZMASK_VALUE, HF(src1), HF(src2), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_div_ph);
+#endif + + if (n_errs != 0) { + abort (); + } +} + + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vmulph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vmulph-1a.c new file mode 100644 index 00000000000..1088e255786 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vmulph-1a.c @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vmulph\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vmulph\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vmulph\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vmulph\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vmulph\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vmulph\[ \\t\]+\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +volatile __m512h res, res1, res2; +volatile __m512h x1, x2; +volatile __mmask32 m32; + +void extern +avx512f_test (void) +{ + res = _mm512_mul_ph (x1, x2); + res1 = _mm512_mask_mul_ph (res1, m32, x1, x2); + res2 = _mm512_maskz_mul_ph (m32, x1, x2); + + res = _mm512_mul_round_ph (x1, x2, 8); + res1 = _mm512_mask_mul_round_ph (res1, m32, x1, x2, 8); + res2 = _mm512_maskz_mul_round_ph (m32, x1, x2, 11); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vmulph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vmulph-1b.c new file mode 100644 index 00000000000..0d67e874d53 --- /dev/null +++
b/gcc/testsuite/gcc.target/i386/avx512fp16-vmulph-1b.c @@ -0,0 +1,92 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS (AVX512F_LEN / 16) + +void NOINLINE +EMULATE(mul_ph) (V512 * dest, V512 op1, V512 op2, + __mmask32 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + __mmask16 m1, m2; + + m1 = k & 0xffff; + m2 = (k >> 16) & 0xffff; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(op2, &v3, &v4); + unpack_ph_2twops(*dest, &v7, &v8); + + for (i = 0; i < 16; i++) { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + v5.f32[i] = 0; + } + else { + v5.u32[i] = v7.u32[i]; + } + } + else { + v5.f32[i] = v1.f32[i] * v3.f32[i]; + } + + if (((1 << i) & m2) == 0) { + if (zero_mask) { + v6.f32[i] = 0; + } + else { + v6.u32[i] = v8.u32[i]; + } + } + else { + v6.f32[i] = v2.f32[i] * v4.f32[i]; + } + + } + *dest = pack_twops_2ph(v5, v6); +} + +void +TEST (void) +{ + V512 res; + V512 exp; + + init_src(); + + EMULATE(mul_ph) (&exp, src1, src2, NET_MASK, 0); + HF(res) = INTRINSIC (_mul_ph) (HF(src1), HF(src2)); + CHECK_RESULT (&res, &exp, N_ELEMS, _mul_ph); + + init_dest(&res, &exp); + EMULATE(mul_ph) (&exp, src1, src2, MASK_VALUE, 0); + HF(res) = INTRINSIC (_mask_mul_ph) (HF(res), MASK_VALUE, HF(src1), HF(src2)); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_mul_ph); + + EMULATE(mul_ph) (&exp, src1, src2, ZMASK_VALUE, 1); + HF(res) = INTRINSIC (_maskz_mul_ph) (ZMASK_VALUE, HF(src1), HF(src2)); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_mul_ph); + +#if AVX512F_LEN == 512 + EMULATE(mul_ph) (&exp, src1, src2, NET_MASK, 0); + HF(res) = INTRINSIC (_mul_round_ph) (HF(src1), HF(src2), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _mul_ph); + + init_dest(&res, &exp); + EMULATE(mul_ph) (&exp, src1, src2, MASK_VALUE, 0); + HF(res) = INTRINSIC (_mask_mul_round_ph) (HF(res), MASK_VALUE, HF(src1), HF(src2), _ROUND_NINT); + CHECK_RESULT (&res, 
&exp, N_ELEMS, _mask_mul_ph); + + EMULATE(mul_ph) (&exp, src1, src2, ZMASK_VALUE, 1); + HF(res) = INTRINSIC (_maskz_mul_round_ph) (ZMASK_VALUE, HF(src1), HF(src2), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_mul_ph); +#endif + + if (n_errs != 0) { + abort (); + } +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vsubph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vsubph-1a.c new file mode 100644 index 00000000000..bb5eda64e37 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vsubph-1a.c @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vsubph\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vsubph\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vsubph\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vsubph\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vsubph\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vsubph\[ \\t\]+\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +volatile __m512h res, res1, res2; +volatile __m512h x1, x2; +volatile __mmask32 m32; + +void extern +avx512f_test (void) +{ + res = _mm512_sub_ph (x1, x2); + res1 = _mm512_mask_sub_ph (res1, m32, x1, x2); + res2 = _mm512_maskz_sub_ph (m32, x1, x2); + + res = _mm512_sub_round_ph (x1, x2, 8); + res1 = _mm512_mask_sub_round_ph (res1, m32, x1, x2, 8); + res2 = _mm512_maskz_sub_round_ph (m32, x1, x2, 11); +}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vsubph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vsubph-1b.c new file mode 100644 index 00000000000..bd31d98f43d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vsubph-1b.c @@ -0,0 +1,93 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS (AVX512F_LEN / 16) + +void NOINLINE +EMULATE(sub_ph) (V512 * dest, V512 op1, V512 op2, + __mmask32 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + __mmask16 m1, m2; + + m1 = k & 0xffff; + m2 = (k >> 16) & 0xffff; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(op2, &v3, &v4); + unpack_ph_2twops(*dest, &v7, &v8); + + for (i = 0; i < 16; i++) { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + v5.f32[i] = 0; + } + else { + v5.u32[i] = v7.u32[i]; + } + } + else { + v5.f32[i] = v1.f32[i] - v3.f32[i]; + } + + if (((1 << i) & m2) == 0) { + if (zero_mask) { + v6.f32[i] = 0; + } + else { + v6.u32[i] = v8.u32[i]; + } + } + else { + v6.f32[i] = v2.f32[i] - v4.f32[i]; + } + + } + *dest = pack_twops_2ph(v5, v6); +} + + +void +TEST (void) +{ + V512 res; + V512 exp; + + init_src(); + + EMULATE(sub_ph) (&exp, src1, src2, NET_MASK, 0); + HF(res) = INTRINSIC (_sub_ph) (HF(src1), HF(src2)); + CHECK_RESULT (&res, &exp, N_ELEMS, _sub_ph); + + init_dest(&res, &exp); + EMULATE(sub_ph) (&exp, src1, src2, MASK_VALUE, 0); + HF(res) = INTRINSIC (_mask_sub_ph) (HF(res), MASK_VALUE, HF(src1), HF(src2)); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_sub_ph); + + EMULATE(sub_ph) (&exp, src1, src2, ZMASK_VALUE, 1); + HF(res) = INTRINSIC (_maskz_sub_ph) (ZMASK_VALUE, HF(src1), HF(src2)); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_sub_ph); + +#if AVX512F_LEN == 512 + EMULATE(sub_ph) (&exp, src1, src2, NET_MASK, 0); + HF(res) = INTRINSIC (_sub_round_ph) (HF(src1), HF(src2), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _sub_ph); + + 
init_dest(&res, &exp); + EMULATE(sub_ph) (&exp, src1, src2, MASK_VALUE, 0); + HF(res) = INTRINSIC (_mask_sub_round_ph) (HF(res), MASK_VALUE, HF(src1), HF(src2), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_sub_ph); + + EMULATE(sub_ph) (&exp, src1, src2, ZMASK_VALUE, 1); + HF(res) = INTRINSIC (_maskz_sub_round_ph) (ZMASK_VALUE, HF(src1), HF(src2), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_sub_ph); +#endif + + if (n_errs != 0) { + abort (); + } +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vaddph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vaddph-1a.c new file mode 100644 index 00000000000..354d897dd9e --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vaddph-1a.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */ +/* { dg-final { scan-assembler-times "vaddph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vaddph\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vaddph\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vaddph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vaddph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vaddph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +volatile __m256h res1; +volatile __m128h res2; +volatile __m256h x1,x2; +volatile __m128h x3, x4; +volatile __mmask16 m16; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res1 = _mm256_add_ph (x1, x2); + res1 = _mm256_mask_add_ph (res1, m16, x1, x2); + res1 = 
_mm256_maskz_add_ph (m16, x1, x2); + + res2 = _mm_add_ph (x3, x4); + res2 = _mm_mask_add_ph (res2, m8, x3, x4); + res2 = _mm_maskz_add_ph (m8, x3, x4); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vaddph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vaddph-1b.c new file mode 100644 index 00000000000..fcf6a9058f5 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vaddph-1b.c @@ -0,0 +1,16 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */ + +#define DEBUG +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vaddph-1b.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vaddph-1b.c" + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vdivph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vdivph-1a.c new file mode 100644 index 00000000000..038d9e42fce --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vdivph-1a.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */ +/* { dg-final { scan-assembler-times "vdivph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vdivph\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vdivph\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vdivph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vdivph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vdivph\[ 
\\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +volatile __m256h res1; +volatile __m128h res2; +volatile __m256h x1,x2; +volatile __m128h x3, x4; +volatile __mmask16 m16; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res1 = _mm256_div_ph (x1, x2); + res1 = _mm256_mask_div_ph (res1, m16, x1, x2); + res1 = _mm256_maskz_div_ph (m16, x1, x2); + + res2 = _mm_div_ph (x3, x4); + res2 = _mm_mask_div_ph (res2, m8, x3, x4); + res2 = _mm_maskz_div_ph (m8, x3, x4); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vdivph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vdivph-1b.c new file mode 100644 index 00000000000..48965c6cfb8 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vdivph-1b.c @@ -0,0 +1,16 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */ + +#define DEBUG +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vdivph-1b.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vdivph-1b.c" + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vmulph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vmulph-1a.c new file mode 100644 index 00000000000..26663c5ca8d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vmulph-1a.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */ +/* { dg-final { scan-assembler-times "vmulph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vmulph\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vmulph\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final 
{ scan-assembler-times "vmulph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vmulph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vmulph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +volatile __m256h res1; +volatile __m128h res2; +volatile __m256h x1,x2; +volatile __m128h x3, x4; +volatile __mmask16 m16; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res1 = _mm256_mul_ph (x1, x2); + res1 = _mm256_mask_mul_ph (res1, m16, x1, x2); + res1 = _mm256_maskz_mul_ph (m16, x1, x2); + + res2 = _mm_mul_ph (x3, x4); + res2 = _mm_mask_mul_ph (res2, m8, x3, x4); + res2 = _mm_maskz_mul_ph (m8, x3, x4); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vmulph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vmulph-1b.c new file mode 100644 index 00000000000..2b3ba050533 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vmulph-1b.c @@ -0,0 +1,16 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */ + +#define DEBUG +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vmulph-1b.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vmulph-1b.c" + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vsubph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vsubph-1a.c new file mode 100644 index 00000000000..10e5cbfed92 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vsubph-1a.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */ +/* { dg-final { scan-assembler-times "vsubph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { 
dg-final { scan-assembler-times "vsubph\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vsubph\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vsubph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vsubph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vsubph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +volatile __m256h res1; +volatile __m128h res2; +volatile __m256h x1,x2; +volatile __m128h x3, x4; +volatile __mmask16 m16; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res1 = _mm256_sub_ph (x1, x2); + res1 = _mm256_mask_sub_ph (res1, m16, x1, x2); + res1 = _mm256_maskz_sub_ph (m16, x1, x2); + + res2 = _mm_sub_ph (x3, x4); + res2 = _mm_mask_sub_ph (res2, m8, x3, x4); + res2 = _mm_maskz_sub_ph (m8, x3, x4); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vsubph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vsubph-1b.c new file mode 100644 index 00000000000..fa162185e3c --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vsubph-1b.c @@ -0,0 +1,16 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */ + +#define DEBUG +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vsubph-1b.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vsubph-1b.c" + From patchwork Thu Jul 1 06:15:55 2021 X-Patchwork-Submitter: 
liuhongt X-Patchwork-Id: 1499312
To: gcc-patches@gcc.gnu.org Subject: [PATCH 09/62] AVX512FP16: Enable _Float16 autovectorization Date: Thu, 1 Jul 2021 14:15:55 +0800 Message-Id: <20210701061648.9447-10-hongtao.liu@intel.com> In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com> From: liuhongt Cc: jakub@redhat.com From: "H.J. Lu" gcc/ChangeLog: * config/i386/i386-expand.c (ix86_avx256_split_vector_move_misalign): Handle V16HF mode. * config/i386/i386.c (ix86_preferred_simd_mode): Handle HF mode. * config/i386/sse.md (V_256H): New mode iterator. (avx_vextractf128): Use it. gcc/testsuite/ChangeLog: * gcc.target/i386/vect-float16-1.c: New test. 
* gcc.target/i386/vect-float16-10.c: Ditto. * gcc.target/i386/vect-float16-11.c: Ditto. * gcc.target/i386/vect-float16-12.c: Ditto. * gcc.target/i386/vect-float16-2.c: Ditto. * gcc.target/i386/vect-float16-3.c: Ditto. * gcc.target/i386/vect-float16-4.c: Ditto. * gcc.target/i386/vect-float16-5.c: Ditto. * gcc.target/i386/vect-float16-6.c: Ditto. * gcc.target/i386/vect-float16-7.c: Ditto. * gcc.target/i386/vect-float16-8.c: Ditto. * gcc.target/i386/vect-float16-9.c: Ditto. --- gcc/config/i386/i386-expand.c | 4 ++++ gcc/config/i386/i386.c | 14 ++++++++++++++ gcc/config/i386/sse.md | 7 ++++++- gcc/testsuite/gcc.target/i386/vect-float16-1.c | 14 ++++++++++++++ gcc/testsuite/gcc.target/i386/vect-float16-10.c | 14 ++++++++++++++ gcc/testsuite/gcc.target/i386/vect-float16-11.c | 14 ++++++++++++++ gcc/testsuite/gcc.target/i386/vect-float16-12.c | 14 ++++++++++++++ gcc/testsuite/gcc.target/i386/vect-float16-2.c | 14 ++++++++++++++ gcc/testsuite/gcc.target/i386/vect-float16-3.c | 14 ++++++++++++++ gcc/testsuite/gcc.target/i386/vect-float16-4.c | 14 ++++++++++++++ gcc/testsuite/gcc.target/i386/vect-float16-5.c | 14 ++++++++++++++ gcc/testsuite/gcc.target/i386/vect-float16-6.c | 14 ++++++++++++++ gcc/testsuite/gcc.target/i386/vect-float16-7.c | 14 ++++++++++++++ gcc/testsuite/gcc.target/i386/vect-float16-8.c | 14 ++++++++++++++ gcc/testsuite/gcc.target/i386/vect-float16-9.c | 14 ++++++++++++++ 15 files changed, 192 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-1.c create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-10.c create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-11.c create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-12.c create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-2.c create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-3.c create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-4.c create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-5.c create mode 
100644 gcc/testsuite/gcc.target/i386/vect-float16-6.c create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-7.c create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-8.c create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-9.c diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c index 39647eb2cf1..df50c72ab16 100644 --- a/gcc/config/i386/i386-expand.c +++ b/gcc/config/i386/i386-expand.c @@ -498,6 +498,10 @@ ix86_avx256_split_vector_move_misalign (rtx op0, rtx op1) extract = gen_avx_vextractf128v32qi; mode = V16QImode; break; + case E_V16HFmode: + extract = gen_avx_vextractf128v16hf; + mode = V8HFmode; + break; case E_V8SFmode: extract = gen_avx_vextractf128v8sf; mode = V4SFmode; diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 79e6880d9dd..dc0d440061b 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -22360,6 +22360,20 @@ ix86_preferred_simd_mode (scalar_mode mode) else return V2DImode; + case E_HFmode: + if (TARGET_AVX512FP16) + { + if (TARGET_AVX512VL) + { + if (TARGET_PREFER_AVX128) + return V8HFmode; + else if (TARGET_PREFER_AVX256) + return V16HFmode; + } + return V32HFmode; + } + return word_mode; + case E_SFmode: if (TARGET_AVX512F && !TARGET_PREFER_AVX256) return V16SFmode; diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 2c1b6fbcd86..a0cfd611006 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -276,6 +276,11 @@ (define_mode_iterator V_128 (define_mode_iterator V_256 [V32QI V16HI V8SI V4DI V8SF V4DF]) +;; All 256bit vector modes including HF vector mode +(define_mode_iterator V_256H + [V32QI V16HI V8SI V4DI V8SF V4DF + (V16HF "TARGET_AVX512F && TARGET_AVX512VL")]) + ;; All 128bit and 256bit vector modes (define_mode_iterator V_128_256 [V32QI V16QI V16HI V8HI V8SI V4SI V4DI V2DI V8SF V4SF V4DF V2DF]) @@ -9045,7 +9050,7 @@ (define_expand "avx512vl_vextractf128" (define_expand "avx_vextractf128" [(match_operand: 0 "nonimmediate_operand") - 
(match_operand:V_256 1 "register_operand") + (match_operand:V_256H 1 "register_operand") (match_operand:SI 2 "const_0_to_1_operand")] "TARGET_AVX" { diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-1.c b/gcc/testsuite/gcc.target/i386/vect-float16-1.c new file mode 100644 index 00000000000..0f82cf94932 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/vect-float16-1.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -mavx512fp16 -mno-avx512vl" } */ + +/* Check that we vectorize to a full 128-bit vector for _Float16 types. */ + +void +foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b, + _Float16 *__restrict__ c) +{ + for (int i = 0; i < 256; i++) + a[i] = b[i] + c[i]; +} + +/* { dg-final { scan-assembler-times "vaddph" 8 } } */ diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-10.c b/gcc/testsuite/gcc.target/i386/vect-float16-10.c new file mode 100644 index 00000000000..217645692ad --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/vect-float16-10.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -mavx512fp16 -mno-avx512vl" } */ + +/* Check that we vectorize to a full 128-bit vector for _Float16 types. */ + +void +foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b, + _Float16 *__restrict__ c) +{ + for (int i = 0; i < 256; i++) + a[i] = b[i] / c[i]; +} + +/* { dg-final { scan-assembler-times "vdivph" 8 } } */ diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-11.c b/gcc/testsuite/gcc.target/i386/vect-float16-11.c new file mode 100644 index 00000000000..e0409ce9d3f --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/vect-float16-11.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -mavx512fp16 -mavx512vl -mprefer-vector-width=128" } */ + +/* Check that we vectorize to a full 128-bit vector for _Float16 types. 
*/ + +void +foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b, + _Float16 *__restrict__ c) +{ + for (int i = 0; i < 128; i++) + a[i] = b[i] / c[i]; +} + +/* { dg-final { scan-assembler-times "vdivph" 16 } } */ diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-12.c b/gcc/testsuite/gcc.target/i386/vect-float16-12.c new file mode 100644 index 00000000000..d92a25dc255 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/vect-float16-12.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -mavx512fp16 -mavx512vl -mprefer-vector-width=256" } */ + +/* Check that we vectorize to a full 128-bit vector for _Float16 types. */ + +void +foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b, + _Float16 *__restrict__ c) +{ + for (int i = 0; i < 256; i++) + a[i] = b[i] / c[i]; +} + +/* { dg-final { scan-assembler-times "vdivph" 16 } } */ diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-2.c b/gcc/testsuite/gcc.target/i386/vect-float16-2.c new file mode 100644 index 00000000000..974fca4ce09 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/vect-float16-2.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -mavx512fp16 -mavx512vl -mprefer-vector-width=128" } */ + +/* Check that we vectorize to a full 128-bit vector for _Float16 types. */ + +void +foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b, + _Float16 *__restrict__ c) +{ + for (int i = 0; i < 128; i++) + a[i] = b[i] + c[i]; +} + +/* { dg-final { scan-assembler-times "vaddph" 16 } } */ diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-3.c b/gcc/testsuite/gcc.target/i386/vect-float16-3.c new file mode 100644 index 00000000000..9bca9142df7 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/vect-float16-3.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -mavx512fp16 -mavx512vl -mprefer-vector-width=256" } */ + +/* Check that we vectorize to a full 128-bit vector for _Float16 types. 
*/ + +void +foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b, + _Float16 *__restrict__ c) +{ + for (int i = 0; i < 256; i++) + a[i] = b[i] + c[i]; +} + +/* { dg-final { scan-assembler-times "vaddph" 16 } } */ diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-4.c b/gcc/testsuite/gcc.target/i386/vect-float16-4.c new file mode 100644 index 00000000000..e6f26f0aa40 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/vect-float16-4.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -mavx512fp16 -mno-avx512vl" } */ + +/* Check that we vectorize to a full 128-bit vector for _Float16 types. */ + +void +foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b, + _Float16 *__restrict__ c) +{ + for (int i = 0; i < 256; i++) + a[i] = b[i] - c[i]; +} + +/* { dg-final { scan-assembler-times "vsubph" 8 } } */ diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-5.c b/gcc/testsuite/gcc.target/i386/vect-float16-5.c new file mode 100644 index 00000000000..38f287b1dc0 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/vect-float16-5.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -mavx512fp16 -mavx512vl -mprefer-vector-width=128" } */ + +/* Check that we vectorize to a full 128-bit vector for _Float16 types. */ + +void +foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b, + _Float16 *__restrict__ c) +{ + for (int i = 0; i < 128; i++) + a[i] = b[i] - c[i]; +} + +/* { dg-final { scan-assembler-times "vsubph" 16 } } */ diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-6.c b/gcc/testsuite/gcc.target/i386/vect-float16-6.c new file mode 100644 index 00000000000..bc9f7870061 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/vect-float16-6.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -mavx512fp16 -mavx512vl -mprefer-vector-width=256" } */ + +/* Check that we vectorize to a full 128-bit vector for _Float16 types. 
*/ + +void +foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b, + _Float16 *__restrict__ c) +{ + for (int i = 0; i < 256; i++) + a[i] = b[i] - c[i]; +} + +/* { dg-final { scan-assembler-times "vsubph" 16 } } */ diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-7.c b/gcc/testsuite/gcc.target/i386/vect-float16-7.c new file mode 100644 index 00000000000..b4849cf77c7 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/vect-float16-7.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -mavx512fp16 -mno-avx512vl" } */ + +/* Check that we vectorize to a full 128-bit vector for _Float16 types. */ + +void +foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b, + _Float16 *__restrict__ c) +{ + for (int i = 0; i < 256; i++) + a[i] = b[i] * c[i]; +} + +/* { dg-final { scan-assembler-times "vmulph" 8 } } */ diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-8.c b/gcc/testsuite/gcc.target/i386/vect-float16-8.c new file mode 100644 index 00000000000..71631b17cc3 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/vect-float16-8.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -mavx512fp16 -mavx512vl -mprefer-vector-width=128" } */ + +/* Check that we vectorize to a full 128-bit vector for _Float16 types. */ + +void +foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b, + _Float16 *__restrict__ c) +{ + for (int i = 0; i < 128; i++) + a[i] = b[i] * c[i]; +} + +/* { dg-final { scan-assembler-times "vmulph" 16 } } */ diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-9.c b/gcc/testsuite/gcc.target/i386/vect-float16-9.c new file mode 100644 index 00000000000..1be5c7f022f --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/vect-float16-9.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -mavx512fp16 -mavx512vl -mprefer-vector-width=256" } */ + +/* Check that we vectorize to a full 128-bit vector for _Float16 types. 
*/ + +void +foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b, + _Float16 *__restrict__ c) +{ + for (int i = 0; i < 256; i++) + a[i] = b[i] * c[i]; +} + +/* { dg-final { scan-assembler-times "vmulph" 16 } } */ From patchwork Thu Jul 1 06:15:56 2021 X-Patchwork-Submitter: liuhongt X-Patchwork-Id: 1499318 
To: gcc-patches@gcc.gnu.org Subject: [PATCH 10/62] AVX512FP16: Add vaddsh/vsubsh/vmulsh/vdivsh. 
Date: Thu, 1 Jul 2021 14:15:56 +0800 Message-Id: <20210701061648.9447-11-hongtao.liu@intel.com> In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com> From: liuhongt Cc: jakub@redhat.com, "Liu, Hongtao" From: "Liu, Hongtao" gcc/ChangeLog: * config/i386/avx512fp16intrin.h (_mm_add_sh): New intrinsic. (_mm_mask_add_sh): Likewise. (_mm_maskz_add_sh): Likewise. (_mm_sub_sh): Likewise. (_mm_mask_sub_sh): Likewise. (_mm_maskz_sub_sh): Likewise. (_mm_mul_sh): Likewise. (_mm_mask_mul_sh): Likewise. (_mm_maskz_mul_sh): Likewise. (_mm_div_sh): Likewise. (_mm_mask_div_sh): Likewise. (_mm_maskz_div_sh): Likewise. (_mm_add_round_sh): Likewise. (_mm_mask_add_round_sh): Likewise. (_mm_maskz_add_round_sh): Likewise. (_mm_sub_round_sh): Likewise. (_mm_mask_sub_round_sh): Likewise. (_mm_maskz_sub_round_sh): Likewise. (_mm_mul_round_sh): Likewise. (_mm_mask_mul_round_sh): Likewise. (_mm_maskz_mul_round_sh): Likewise. (_mm_div_round_sh): Likewise. (_mm_mask_div_round_sh): Likewise. (_mm_maskz_div_round_sh): Likewise. * config/i386/i386-builtin-types.def: Add corresponding builtin types. * config/i386/i386-builtin.def: Add corresponding new builtins. * config/i386/i386-expand.c (ix86_expand_round_builtin): Handle new builtins. 
* config/i386/sse.md (VF_128): Change description. (_vm3): Adjust to support HF vector modes. (_vm3): Likewise. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add test for new builtins. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/sse-14.c: Add test for new intrinsics. * gcc.target/i386/sse-22.c: Ditto. --- gcc/config/i386/avx512fp16intrin.h | 254 +++++++++++++++++++++++++ gcc/config/i386/i386-builtin-types.def | 2 + gcc/config/i386/i386-builtin.def | 8 + gcc/config/i386/i386-expand.c | 2 + gcc/config/i386/sse.md | 22 +-- gcc/testsuite/gcc.target/i386/avx-1.c | 4 + gcc/testsuite/gcc.target/i386/sse-13.c | 4 + gcc/testsuite/gcc.target/i386/sse-14.c | 12 ++ gcc/testsuite/gcc.target/i386/sse-22.c | 12 ++ gcc/testsuite/gcc.target/i386/sse-23.c | 4 + 10 files changed, 313 insertions(+), 11 deletions(-) diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h index 3e9d676dc39..6ae12ebf920 100644 --- a/gcc/config/i386/avx512fp16intrin.h +++ b/gcc/config/i386/avx512fp16intrin.h @@ -468,6 +468,260 @@ _mm512_maskz_div_round_ph (__mmask32 __A, __m512h __B, __m512h __C, (A), (D))) #endif /* __OPTIMIZE__ */ +/* Intrinsics of v[add,sub,mul,div]sh. 
*/ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_add_sh (__m128h __A, __m128h __B) +{ + __A[0] += __B[0]; + return __A; +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_add_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D) +{ + return __builtin_ia32_vaddsh_v8hf_mask (__C, __D, __A, __B); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_add_sh (__mmask8 __A, __m128h __B, __m128h __C) +{ + return __builtin_ia32_vaddsh_v8hf_mask (__B, __C, _mm_setzero_ph (), + __A); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_sub_sh (__m128h __A, __m128h __B) +{ + __A[0] -= __B[0]; + return __A; +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_sub_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D) +{ + return __builtin_ia32_vsubsh_v8hf_mask (__C, __D, __A, __B); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_sub_sh (__mmask8 __A, __m128h __B, __m128h __C) +{ + return __builtin_ia32_vsubsh_v8hf_mask (__B, __C, _mm_setzero_ph (), + __A); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mul_sh (__m128h __A, __m128h __B) +{ + __A[0] *= __B[0]; + return __A; +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_mul_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D) +{ + return __builtin_ia32_vmulsh_v8hf_mask (__C, __D, __A, __B); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_mul_sh (__mmask8 __A, __m128h __B, __m128h __C) +{ + return __builtin_ia32_vmulsh_v8hf_mask (__B, __C, _mm_setzero_ph (), __A); +} + +extern __inline __m128h +__attribute__ 
((__gnu_inline__, __always_inline__, __artificial__)) +_mm_div_sh (__m128h __A, __m128h __B) +{ + __A[0] /= __B[0]; + return __A; +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_div_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D) +{ + return __builtin_ia32_vdivsh_v8hf_mask (__C, __D, __A, __B); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_div_sh (__mmask8 __A, __m128h __B, __m128h __C) +{ + return __builtin_ia32_vdivsh_v8hf_mask (__B, __C, _mm_setzero_ph (), + __A); +} + +#ifdef __OPTIMIZE__ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_add_round_sh (__m128h __A, __m128h __B, const int __C) +{ + return __builtin_ia32_vaddsh_v8hf_mask_round (__A, __B, + _mm_setzero_ph (), + (__mmask8) -1, __C); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_add_round_sh (__m128h __A, __mmask8 __B, __m128h __C, + __m128h __D, const int __E) +{ + return __builtin_ia32_vaddsh_v8hf_mask_round (__C, __D, __A, __B, __E); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_add_round_sh (__mmask8 __A, __m128h __B, __m128h __C, + const int __D) +{ + return __builtin_ia32_vaddsh_v8hf_mask_round (__B, __C, + _mm_setzero_ph (), + __A, __D); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_sub_round_sh (__m128h __A, __m128h __B, const int __C) +{ + return __builtin_ia32_vsubsh_v8hf_mask_round (__A, __B, + _mm_setzero_ph (), + (__mmask8) -1, __C); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_sub_round_sh (__m128h __A, __mmask8 __B, __m128h __C, + __m128h __D, const int __E) +{ + return __builtin_ia32_vsubsh_v8hf_mask_round (__C, __D, __A, __B, __E); +} + +extern __inline 
__m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_sub_round_sh (__mmask8 __A, __m128h __B, __m128h __C, + const int __D) +{ + return __builtin_ia32_vsubsh_v8hf_mask_round (__B, __C, + _mm_setzero_ph (), + __A, __D); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mul_round_sh (__m128h __A, __m128h __B, const int __C) +{ + return __builtin_ia32_vmulsh_v8hf_mask_round (__A, __B, + _mm_setzero_ph (), + (__mmask8) -1, __C); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_mul_round_sh (__m128h __A, __mmask8 __B, __m128h __C, + __m128h __D, const int __E) +{ + return __builtin_ia32_vmulsh_v8hf_mask_round (__C, __D, __A, __B, __E); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_mul_round_sh (__mmask8 __A, __m128h __B, __m128h __C, + const int __D) +{ + return __builtin_ia32_vmulsh_v8hf_mask_round (__B, __C, + _mm_setzero_ph (), + __A, __D); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_div_round_sh (__m128h __A, __m128h __B, const int __C) +{ + return __builtin_ia32_vdivsh_v8hf_mask_round (__A, __B, + _mm_setzero_ph (), + (__mmask8) -1, __C); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_div_round_sh (__m128h __A, __mmask8 __B, __m128h __C, + __m128h __D, const int __E) +{ + return __builtin_ia32_vdivsh_v8hf_mask_round (__C, __D, __A, __B, __E); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_div_round_sh (__mmask8 __A, __m128h __B, __m128h __C, + const int __D) +{ + return __builtin_ia32_vdivsh_v8hf_mask_round (__B, __C, + _mm_setzero_ph (), + __A, __D); +} +#else +#define _mm_add_round_sh(A, B, C) \ + ((__m128h)__builtin_ia32_vaddsh_v8hf_mask_round ((A), (B), \ + 
_mm_setzero_ph (), \ + (__mmask8)-1, (C))) + +#define _mm_mask_add_round_sh(A, B, C, D, E) \ + ((__m128h)__builtin_ia32_vaddsh_v8hf_mask_round ((C), (D), (A), (B), (E))) + +#define _mm_maskz_add_round_sh(A, B, C, D) \ + ((__m128h)__builtin_ia32_vaddsh_v8hf_mask_round ((B), (C), \ + _mm_setzero_ph (), \ + (A), (D))) + +#define _mm_sub_round_sh(A, B, C) \ + ((__m128h)__builtin_ia32_vsubsh_v8hf_mask_round ((A), (B), \ + _mm_setzero_ph (), \ + (__mmask8)-1, (C))) + +#define _mm_mask_sub_round_sh(A, B, C, D, E) \ + ((__m128h)__builtin_ia32_vsubsh_v8hf_mask_round ((C), (D), (A), (B), (E))) + +#define _mm_maskz_sub_round_sh(A, B, C, D) \ + ((__m128h)__builtin_ia32_vsubsh_v8hf_mask_round ((B), (C), \ + _mm_setzero_ph (), \ + (A), (D))) + +#define _mm_mul_round_sh(A, B, C) \ + ((__m128h)__builtin_ia32_vmulsh_v8hf_mask_round ((A), (B), \ + _mm_setzero_ph (), \ + (__mmask8)-1, (C))) + +#define _mm_mask_mul_round_sh(A, B, C, D, E) \ + ((__m128h)__builtin_ia32_vmulsh_v8hf_mask_round ((C), (D), (A), (B), (E))) + +#define _mm_maskz_mul_round_sh(A, B, C, D) \ + ((__m128h)__builtin_ia32_vmulsh_v8hf_mask_round ((B), (C), \ + _mm_setzero_ph (), \ + (A), (D))) + +#define _mm_div_round_sh(A, B, C) \ + ((__m128h)__builtin_ia32_vdivsh_v8hf_mask_round ((A), (B), \ + _mm_setzero_ph (), \ + (__mmask8)-1, (C))) + +#define _mm_mask_div_round_sh(A, B, C, D, E) \ + ((__m128h)__builtin_ia32_vdivsh_v8hf_mask_round ((C), (D), (A), (B), (E))) + +#define _mm_maskz_div_round_sh(A, B, C, D) \ + ((__m128h)__builtin_ia32_vdivsh_v8hf_mask_round ((B), (C), \ + _mm_setzero_ph (), \ + (A), (D))) +#endif /* __OPTIMIZE__ */ + #ifdef __DISABLE_AVX512FP16__ #undef __DISABLE_AVX512FP16__ #pragma GCC pop_options diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def index ee3b8c30589..ed738f71927 100644 --- a/gcc/config/i386/i386-builtin-types.def +++ b/gcc/config/i386/i386-builtin-types.def @@ -1304,7 +1304,9 @@ DEF_FUNCTION_TYPE (UINT8, PV2DI, PCV2DI, PCVOID) # FP16 
builtins DEF_FUNCTION_TYPE (V8HF, V8HI) +DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, INT) DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, V8HF, UQI) +DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, V8HF, UQI, INT) DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF, V16HF, UHI) DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, INT) DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, V32HF, USI) diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def index b783d266dd8..60e2b75be14 100644 --- a/gcc/config/i386/i386-builtin.def +++ b/gcc/config/i386/i386-builtin.def @@ -2787,6 +2787,10 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_mulv32hf3_mask, "__builtin_ia32_ BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_divv8hf3_mask, "__builtin_ia32_vdivph_v8hf_mask", IX86_BUILTIN_VDIVPH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_divv16hf3_mask, "__builtin_ia32_vdivph_v16hf_mask", IX86_BUILTIN_VDIVPH_V16HF_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_divv32hf3_mask, "__builtin_ia32_vdivph_v32hf_mask", IX86_BUILTIN_VDIVPH_V32HF_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmaddv8hf3_mask, "__builtin_ia32_vaddsh_v8hf_mask", IX86_BUILTIN_VADDSH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsubv8hf3_mask, "__builtin_ia32_vsubsh_v8hf_mask", IX86_BUILTIN_VSUBSH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmmulv8hf3_mask, "__builtin_ia32_vmulsh_v8hf_mask", IX86_BUILTIN_VMULSH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmdivv8hf3_mask, "__builtin_ia32_vdivsh_v8hf_mask", IX86_BUILTIN_VDIVSH_V8HF_MASK, UNKNOWN, (int) 
V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) /* Builtins with rounding support. */ BDESC_END (ARGS, ROUND_ARGS) @@ -2992,6 +2996,10 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_addv32hf3_mask_round, "__builtin BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_subv32hf3_mask_round, "__builtin_ia32_vsubph_v32hf_mask_round", IX86_BUILTIN_VSUBPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_mulv32hf3_mask_round, "__builtin_ia32_vmulph_v32hf_mask_round", IX86_BUILTIN_VMULPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_divv32hf3_mask_round, "__builtin_ia32_vdivph_v32hf_mask_round", IX86_BUILTIN_VDIVPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmaddv8hf3_mask_round, "__builtin_ia32_vaddsh_v8hf_mask_round", IX86_BUILTIN_VADDSH_V8HF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsubv8hf3_mask_round, "__builtin_ia32_vsubsh_v8hf_mask_round", IX86_BUILTIN_VSUBSH_V8HF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmmulv8hf3_mask_round, "__builtin_ia32_vmulsh_v8hf_mask_round", IX86_BUILTIN_VMULSH_V8HF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmdivv8hf3_mask_round, "__builtin_ia32_vdivsh_v8hf_mask_round", IX86_BUILTIN_VDIVSH_V8HF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) BDESC_END (ROUND_ARGS, MULTI_ARG) diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c index df50c72ab16..d2a47150e1b 100644 --- a/gcc/config/i386/i386-expand.c +++ b/gcc/config/i386/i386-expand.c @@ -10468,6 +10468,7 @@ ix86_expand_round_builtin (const struct builtin_description *d, nargs = 2; break; case 
V32HF_FTYPE_V32HF_V32HF_INT: + case V8HF_FTYPE_V8HF_V8HF_INT: case V4SF_FTYPE_V4SF_UINT_INT: case V4SF_FTYPE_V4SF_UINT64_INT: case V2DF_FTYPE_V2DF_UINT64_INT: @@ -10515,6 +10516,7 @@ ix86_expand_round_builtin (const struct builtin_description *d, case V4SF_FTYPE_V4SF_V4SF_V4SF_QI_INT: case V4SF_FTYPE_V4SF_V2DF_V4SF_QI_INT: case V4SF_FTYPE_V4SF_V2DF_V4SF_UQI_INT: + case V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT: nargs = 5; break; case V16SF_FTYPE_V16SF_INT_V16SF_HI_INT: diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index a0cfd611006..8fa3f8ddac9 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -347,7 +347,7 @@ (define_mode_iterator VF2_512_256 (define_mode_iterator VF2_512_256VL [V8DF (V4DF "TARGET_AVX512VL")]) -;; All 128bit vector float modes +;; All 128bit vector SF/DF modes (define_mode_iterator VF_128 [V4SF (V2DF "TARGET_SSE2")]) @@ -2006,11 +2006,11 @@ (define_insn "*_vm3" (set_attr "mode" "")]) (define_insn "_vm3" - [(set (match_operand:VF_128 0 "register_operand" "=x,v") - (vec_merge:VF_128 - (plusminus:VF_128 - (match_operand:VF_128 1 "register_operand" "0,v") - (match_operand:VF_128 2 "nonimmediate_operand" "xm,")) + [(set (match_operand:VFH_128 0 "register_operand" "=x,v") + (vec_merge:VFH_128 + (plusminus:VFH_128 + (match_operand:VFH_128 1 "register_operand" "0,v") + (match_operand:VFH_128 2 "nonimmediate_operand" "xm,")) (match_dup 1) (const_int 1)))] "TARGET_SSE" @@ -2070,11 +2070,11 @@ (define_insn "*_vm3" (set_attr "mode" "")]) (define_insn "_vm3" - [(set (match_operand:VF_128 0 "register_operand" "=x,v") - (vec_merge:VF_128 - (multdiv:VF_128 - (match_operand:VF_128 1 "register_operand" "0,v") - (match_operand:VF_128 2 "nonimmediate_operand" "xm,")) + [(set (match_operand:VFH_128 0 "register_operand" "=x,v") + (vec_merge:VFH_128 + (multdiv:VFH_128 + (match_operand:VFH_128 1 "register_operand" "0,v") + (match_operand:VFH_128 2 "nonimmediate_operand" "xm,")) (match_dup 1) (const_int 1)))] "TARGET_SSE" diff --git 
a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c index 1eaee861141..26ca87ce2f5 100644 --- a/gcc/testsuite/gcc.target/i386/avx-1.c +++ b/gcc/testsuite/gcc.target/i386/avx-1.c @@ -690,6 +690,10 @@ #define __builtin_ia32_vsubph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vsubph_v32hf_mask_round(A, B, C, D, 8) #define __builtin_ia32_vmulph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vmulph_v32hf_mask_round(A, B, C, D, 8) #define __builtin_ia32_vdivph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vdivph_v32hf_mask_round(A, B, C, D, 8) +#define __builtin_ia32_vaddsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vaddsh_v8hf_mask_round(A, B, C, D, 8) +#define __builtin_ia32_vsubsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vsubsh_v8hf_mask_round(A, B, C, D, 8) +#define __builtin_ia32_vmulsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vmulsh_v8hf_mask_round(A, B, C, D, 8) +#define __builtin_ia32_vdivsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vdivsh_v8hf_mask_round(A, B, C, D, 8) /* vpclmulqdqintrin.h */ #define __builtin_ia32_vpclmulqdq_v4di(A, B, C) __builtin_ia32_vpclmulqdq_v4di(A, B, 1) diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c index 50ed74cd6d6..ae35adb5ead 100644 --- a/gcc/testsuite/gcc.target/i386/sse-13.c +++ b/gcc/testsuite/gcc.target/i386/sse-13.c @@ -707,6 +707,10 @@ #define __builtin_ia32_vsubph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vsubph_v32hf_mask_round(A, B, C, D, 8) #define __builtin_ia32_vmulph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vmulph_v32hf_mask_round(A, B, C, D, 8) #define __builtin_ia32_vdivph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vdivph_v32hf_mask_round(A, B, C, D, 8) +#define __builtin_ia32_vaddsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vaddsh_v8hf_mask_round(A, B, C, D, 8) +#define __builtin_ia32_vsubsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vsubsh_v8hf_mask_round(A, B, C, D, 8) +#define 
__builtin_ia32_vmulsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vmulsh_v8hf_mask_round(A, B, C, D, 8) +#define __builtin_ia32_vdivsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vdivsh_v8hf_mask_round(A, B, C, D, 8) /* vpclmulqdqintrin.h */ #define __builtin_ia32_vpclmulqdq_v4di(A, B, C) __builtin_ia32_vpclmulqdq_v4di(A, B, 1) diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c index 26a5e94c7ca..e79edf0a5bb 100644 --- a/gcc/testsuite/gcc.target/i386/sse-14.c +++ b/gcc/testsuite/gcc.target/i386/sse-14.c @@ -672,14 +672,26 @@ test_2 (_mm512_add_round_ph, __m512h, __m512h, __m512h, 8) test_2 (_mm512_sub_round_ph, __m512h, __m512h, __m512h, 8) test_2 (_mm512_mul_round_ph, __m512h, __m512h, __m512h, 8) test_2 (_mm512_div_round_ph, __m512h, __m512h, __m512h, 8) +test_2 (_mm_add_round_sh, __m128h, __m128h, __m128h, 8) +test_2 (_mm_sub_round_sh, __m128h, __m128h, __m128h, 8) +test_2 (_mm_mul_round_sh, __m128h, __m128h, __m128h, 8) +test_2 (_mm_div_round_sh, __m128h, __m128h, __m128h, 8) test_3 (_mm512_maskz_add_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) test_3 (_mm512_maskz_sub_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) test_3 (_mm512_maskz_mul_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) test_3 (_mm512_maskz_div_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) +test_3 (_mm_maskz_add_round_sh, __m128h, __mmask32, __m128h, __m128h, 8) +test_3 (_mm_maskz_sub_round_sh, __m128h, __mmask32, __m128h, __m128h, 8) +test_3 (_mm_maskz_mul_round_sh, __m128h, __mmask32, __m128h, __m128h, 8) +test_3 (_mm_maskz_div_round_sh, __m128h, __mmask32, __m128h, __m128h, 8) test_4 (_mm512_mask_add_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8) test_4 (_mm512_mask_sub_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8) test_4 (_mm512_mask_mul_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8) test_4 (_mm512_mask_div_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8) +test_4 
(_mm_mask_add_round_sh, __m128h, __m128h, __mmask32, __m128h, __m128h, 8) +test_4 (_mm_mask_sub_round_sh, __m128h, __m128h, __mmask32, __m128h, __m128h, 8) +test_4 (_mm_mask_mul_round_sh, __m128h, __m128h, __mmask32, __m128h, __m128h, 8) +test_4 (_mm_mask_div_round_sh, __m128h, __m128h, __mmask32, __m128h, __m128h, 8) /* shaintrin.h */ test_2 (_mm_sha1rnds4_epu32, __m128i, __m128i, __m128i, 1) diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c index 8d25effd724..2c1f27d881a 100644 --- a/gcc/testsuite/gcc.target/i386/sse-22.c +++ b/gcc/testsuite/gcc.target/i386/sse-22.c @@ -777,14 +777,26 @@ test_2 (_mm512_add_round_ph, __m512h, __m512h, __m512h, 8) test_2 (_mm512_sub_round_ph, __m512h, __m512h, __m512h, 8) test_2 (_mm512_mul_round_ph, __m512h, __m512h, __m512h, 8) test_2 (_mm512_div_round_ph, __m512h, __m512h, __m512h, 8) +test_2 (_mm_add_round_sh, __m128h, __m128h, __m128h, 8) +test_2 (_mm_sub_round_sh, __m128h, __m128h, __m128h, 8) +test_2 (_mm_mul_round_sh, __m128h, __m128h, __m128h, 8) +test_2 (_mm_div_round_sh, __m128h, __m128h, __m128h, 8) test_3 (_mm512_maskz_add_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) test_3 (_mm512_maskz_sub_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) test_3 (_mm512_maskz_mul_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) test_3 (_mm512_maskz_div_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) +test_3 (_mm_maskz_add_round_sh, __m128h, __mmask8, __m128h, __m128h, 8) +test_3 (_mm_maskz_sub_round_sh, __m128h, __mmask8, __m128h, __m128h, 8) +test_3 (_mm_maskz_mul_round_sh, __m128h, __mmask8, __m128h, __m128h, 8) +test_3 (_mm_maskz_div_round_sh, __m128h, __mmask8, __m128h, __m128h, 8) test_4 (_mm512_mask_add_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8) test_4 (_mm512_mask_sub_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8) test_4 (_mm512_mask_mul_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8) test_4 (_mm512_mask_div_round_ph, __m512h, 
__m512h, __mmask32, __m512h, __m512h, 8) +test_4 (_mm_mask_add_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8) +test_4 (_mm_mask_sub_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8) +test_4 (_mm_mask_mul_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8) +test_4 (_mm_mask_div_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8) /* shaintrin.h */ test_2 (_mm_sha1rnds4_epu32, __m128i, __m128i, __m128i, 1) diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c index f7dd5d7495c..a89aef2aa8e 100644 --- a/gcc/testsuite/gcc.target/i386/sse-23.c +++ b/gcc/testsuite/gcc.target/i386/sse-23.c @@ -708,6 +708,10 @@ #define __builtin_ia32_vsubph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vsubph_v32hf_mask_round(A, B, C, D, 8) #define __builtin_ia32_vmulph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vmulph_v32hf_mask_round(A, B, C, D, 8) #define __builtin_ia32_vdivph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vdivph_v32hf_mask_round(A, B, C, D, 8) +#define __builtin_ia32_vaddsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vaddsh_v8hf_mask_round(A, B, C, D, 8) +#define __builtin_ia32_vsubsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vsubsh_v8hf_mask_round(A, B, C, D, 8) +#define __builtin_ia32_vmulsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vmulsh_v8hf_mask_round(A, B, C, D, 8) +#define __builtin_ia32_vdivsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vdivsh_v8hf_mask_round(A, B, C, D, 8) /* vpclmulqdqintrin.h */ #define __builtin_ia32_vpclmulqdq_v4di(A, B, C) __builtin_ia32_vpclmulqdq_v4di(A, B, 1) From patchwork Thu Jul 1 06:15:57 2021 To: gcc-patches@gcc.gnu.org Subject: [PATCH 11/62] AVX512FP16: Add testcase for vaddsh/vsubsh/vmulsh/vdivsh. Date: Thu, 1 Jul 2021 14:15:57 +0800 Message-Id: <20210701061648.9447-12-hongtao.liu@intel.com> In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com> References: <20210701061648.9447-1-hongtao.liu@intel.com> From: liuhongt Cc: jakub@redhat.com gcc/testsuite/ChangeLog: * gcc.target/i386/avx512fp16-vaddsh-1a.c: New test. * gcc.target/i386/avx512fp16-vaddsh-1b.c: Ditto. * gcc.target/i386/avx512fp16-vdivsh-1a.c: Ditto. * gcc.target/i386/avx512fp16-vdivsh-1b.c: Ditto. * gcc.target/i386/avx512fp16-vmulsh-1a.c: Ditto. * gcc.target/i386/avx512fp16-vmulsh-1b.c: Ditto. * gcc.target/i386/avx512fp16-vsubsh-1a.c: Ditto. * gcc.target/i386/avx512fp16-vsubsh-1b.c: Ditto. * gcc.target/i386/pr54855-11.c: Ditto.
--- .../gcc.target/i386/avx512fp16-vaddsh-1a.c | 27 +++++ .../gcc.target/i386/avx512fp16-vaddsh-1b.c | 104 ++++++++++++++++++ .../gcc.target/i386/avx512fp16-vdivsh-1a.c | 27 +++++ .../gcc.target/i386/avx512fp16-vdivsh-1b.c | 76 +++++++++++++ .../gcc.target/i386/avx512fp16-vmulsh-1a.c | 27 +++++ .../gcc.target/i386/avx512fp16-vmulsh-1b.c | 77 +++++++++++++ .../gcc.target/i386/avx512fp16-vsubsh-1a.c | 27 +++++ .../gcc.target/i386/avx512fp16-vsubsh-1b.c | 76 +++++++++++++ gcc/testsuite/gcc.target/i386/pr54855-11.c | 16 +++ 9 files changed, 457 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vaddsh-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vaddsh-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vdivsh-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vdivsh-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmulsh-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmulsh-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vsubsh-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vsubsh-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/pr54855-11.c diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vaddsh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vaddsh-1a.c new file mode 100644 index 00000000000..97aac3fd131 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vaddsh-1a.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vaddsh\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vaddsh\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vaddsh\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { 
scan-assembler-times "vaddsh\[ \\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vaddsh\[ \\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vaddsh\[ \\t\]+\{rz-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m128h res; +volatile __m128h x1, x2; +volatile __mmask16 m16; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res = _mm_add_sh (x1, x2); + res = _mm_mask_add_sh (res, m8, x1, x2); + res = _mm_maskz_add_sh (m8, x1, x2); + + res = _mm_add_round_sh (x1, x2, 8); + res = _mm_mask_add_round_sh (res, m8, x1, x2, 8); + res = _mm_maskz_add_round_sh (m8, x1, x2, 11); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vaddsh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vaddsh-1b.c new file mode 100644 index 00000000000..724112c8fc0 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vaddsh-1b.c @@ -0,0 +1,104 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS 8 + +void NOINLINE +emulate_add_sh(V512 * dest, V512 op1, V512 op2, + __mmask8 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(op2, &v3, &v4); + unpack_ph_2twops(*dest, &v7, &v8); + + if ((k&1) || !k) + v5.f32[0] = v1.f32[0] + v3.f32[0]; + else if (zero_mask) + v5.f32[0] = 0; + else + v5.f32[0] = v7.f32[0]; + + for (i = 1; i < 8; i++) + v5.f32[i] = v1.f32[i]; + + *dest = pack_twops_2ph(v5, v6); +} + +void +test_512 (void) +{ + V512 res; + V512 exp; + + init_src(); + + emulate_add_sh(&exp, src1, src2, 0x1, 0); + res.xmmh[0] = _mm_add_sh(src1.xmmh[0], src2.xmmh[0]); + check_results(&res, 
&exp, N_ELEMS, "_mm_add_sh"); + + //DEST.fp16[0] := SRC1.fp16[0] + SRC2.fp16[0] + emulate_add_sh(&exp, src1, src2, 0x1, 0); + res.xmmh[0] = _mm_mask_add_sh(res.xmmh[0], 0x1, + src1.xmmh[0], src2.xmmh[0]); + check_results(&res, &exp, N_ELEMS, "_mm_mask_add_sh"); + + //dest.fp16[0] remains unchanged + init_dest(&res, &exp); + emulate_add_sh(&exp, src1, src2, 0x2, 0); + res.xmmh[0] = _mm_mask_add_sh(res.xmmh[0], 0x2, + src1.xmmh[0], src2.xmmh[0]); + check_results(&res, &exp, N_ELEMS, "_mm_mask_add_sh"); + + //dest.fp16[0] = 0 + emulate_add_sh(&exp, src1, src2, 0x2, 1); + res.xmmh[0] = _mm_maskz_add_sh(0x2, src1.xmmh[0], src2.xmmh[0]); + check_results(&res, &exp, N_ELEMS, "_mm_maskz_add_sh"); + + //DEST.fp16[0] := SRC1.fp16[0] + SRC2.fp16[0] + emulate_add_sh(&exp, src1, src2, 0x3, 1); + res.xmmh[0] = _mm_maskz_add_sh(0x3, src1.xmmh[0], src2.xmmh[0]); + check_results(&res, &exp, N_ELEMS, "_mm_maskz_add_sh"); + + //DEST.fp16[0] := SRC1.fp16[0] + SRC2.fp16[0] + emulate_add_sh(&exp, src1, src2, 0x1, 0); + res.xmmh[0] = _mm_add_round_sh(src1.xmmh[0], + src2.xmmh[0], _ROUND_NINT); + check_results(&res, &exp, N_ELEMS, "_mm_add_round_sh"); + + //DEST.fp16[0] := SRC1.fp16[0] + SRC2.fp16[0] + emulate_add_sh(&exp, src1, src2, 0x1, 0); + res.xmmh[0] = _mm_mask_add_round_sh(res.xmmh[0], 0x1, src1.xmmh[0], + src2.xmmh[0], _ROUND_NINT); + check_results(&res, &exp, N_ELEMS, "_mm_mask_add_round_sh"); + + //dest.fp16[0] remains unchanged + init_dest(&res, &exp); + emulate_add_sh(&exp, src1, src2, 0x2, 0); + res.xmmh[0] = _mm_mask_add_round_sh(res.xmmh[0], 0x2, src1.xmmh[0], + src2.xmmh[0], _ROUND_NINT); + check_results(&res, &exp, N_ELEMS, "_mm_mask_add_round_sh"); + + //dest.fp16[0] = 0 + emulate_add_sh(&exp, src1, src2, 0x2, 1); + res.xmmh[0] = _mm_maskz_add_round_sh(0x2, src1.xmmh[0], + src2.xmmh[0], _ROUND_NINT); + check_results(&res, &exp, N_ELEMS, "_mm_maskz_add_round_sh"); + + //DEST.fp16[0] := SRC1.fp16[0] + SRC2.fp16[0] + emulate_add_sh(&exp, src1, src2, 0x3, 1); + res.xmmh[0] 
= _mm_maskz_add_round_sh(0x3, src1.xmmh[0], + src2.xmmh[0], _ROUND_NINT); + check_results(&res, &exp, N_ELEMS, "_mm_maskz_add_round_sh"); + + if (n_errs != 0) { + abort (); + } +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vdivsh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vdivsh-1a.c new file mode 100644 index 00000000000..39f26f5d77a --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vdivsh-1a.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vdivsh\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vdivsh\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vdivsh\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vdivsh\[ \\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vdivsh\[ \\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vdivsh\[ \\t\]+\{rz-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m128h res; +volatile __m128h x1, x2; +volatile __mmask16 m16; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res = _mm_div_sh (x1, x2); + res = _mm_mask_div_sh (res, m8, x1, x2); + res = _mm_maskz_div_sh (m8, x1, x2); + + res = _mm_div_round_sh (x1, x2, 8); + res = _mm_mask_div_round_sh (res, m8, x1, x2, 8); + res = _mm_maskz_div_round_sh (m8, x1, x2, 11); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vdivsh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vdivsh-1b.c new 
file mode 100644 index 00000000000..467f5d20155 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vdivsh-1b.c @@ -0,0 +1,76 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS 8 + +void NOINLINE +emulate_div_sh(V512 * dest, V512 op1, V512 op2, + __mmask8 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(op2, &v3, &v4); + unpack_ph_2twops(*dest, &v7, &v8); + + if ((k&1) || !k) + v5.f32[0] = v1.f32[0] / v3.f32[0]; + else if (zero_mask) + v5.f32[0] = 0; + else + v5.f32[0] = v7.f32[0]; + + for (i = 1; i < 8; i++) + v5.f32[i] = v1.f32[i]; + + *dest = pack_twops_2ph(v5, v6); +} + +void +test_512 (void) +{ + V512 res; + V512 exp; + + init_src(); + + emulate_div_sh(&exp, src1, src2, 0x1, 0); + res.xmmh[0] = _mm_div_sh(src1.xmmh[0], src2.xmmh[0]); + check_results(&res, &exp, N_ELEMS, "_mm_div_sh"); + + init_dest(&res, &exp); + emulate_div_sh(&exp, src1, src2, 0x1, 0); + res.xmmh[0] = _mm_mask_div_sh(res.xmmh[0], 0x1, src1.xmmh[0], + src2.xmmh[0]); + check_results(&res, &exp, N_ELEMS, "_mm_mask_div_sh"); + + emulate_div_sh(&exp, src1, src2, 0x3, 1); + res.xmmh[0] = _mm_maskz_div_sh(0x3, src1.xmmh[0], src2.xmmh[0]); + check_results(&res, &exp, N_ELEMS, "_mm_maskz_div_sh"); + + emulate_div_sh(&exp, src1, src2, 0x1, 0); + res.xmmh[0] = _mm_div_round_sh(src1.xmmh[0], src2.xmmh[0], + _ROUND_NINT); + check_results(&res, &exp, N_ELEMS, "_mm_div_sh"); + + init_dest(&res, &exp); + emulate_div_sh(&exp, src1, src2, 0x1, 0); + res.xmmh[0] = _mm_mask_div_round_sh(res.xmmh[0], 0x1, src1.xmmh[0], + src2.xmmh[0], _ROUND_NINT); + check_results(&res, &exp, N_ELEMS, "_mm_mask_div_sh"); + + emulate_div_sh(&exp, src1, src2, 0x3, 1); + res.xmmh[0] = _mm_maskz_div_round_sh(0x3, src1.xmmh[0], + src2.xmmh[0], _ROUND_NINT); + check_results(&res, &exp, N_ELEMS, "_mm_maskz_div_sh"); + + if (n_errs 
!= 0) { + abort (); + } +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vmulsh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vmulsh-1a.c new file mode 100644 index 00000000000..85707b5f169 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vmulsh-1a.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vmulsh\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vmulsh\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vmulsh\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vmulsh\[ \\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vmulsh\[ \\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vmulsh\[ \\t\]+\{rz-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +volatile __m128h res; +volatile __m128h x1, x2; +volatile __mmask16 m16; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res = _mm_mul_sh (x1, x2); + res = _mm_mask_mul_sh (res, m8, x1, x2); + res = _mm_maskz_mul_sh (m8, x1, x2); + + res = _mm_mul_round_sh (x1, x2, 8); + res = _mm_mask_mul_round_sh (res, m8, x1, x2, 8); + res = _mm_maskz_mul_round_sh (m8, x1, x2, 11); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vmulsh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vmulsh-1b.c new file mode 100644 index 00000000000..36b6930a516 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vmulsh-1b.c @@ -0,0 +1,77 @@ +/* { dg-do run {
target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS 8 + +void NOINLINE +emulate_mul_sh(V512 * dest, V512 op1, V512 op2, + __mmask8 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(op2, &v3, &v4); + unpack_ph_2twops(*dest, &v7, &v8); + + if ((k&1) || !k) + v5.f32[0] = v1.f32[0] * v3.f32[0]; + else if (zero_mask) + v5.f32[0] = 0; + else + v5.f32[0] = v7.f32[0]; + + for (i = 1; i < 8; i++) + v5.f32[i] = v1.f32[i]; + + *dest = pack_twops_2ph(v5, v6); +} + +void +test_512 (void) +{ + V512 res; + V512 exp; + + init_src(); + + emulate_mul_sh(&exp, src1, src2, 0x1, 0); + res.xmmh[0] = _mm_mul_sh(src1.xmmh[0], src2.xmmh[0]); + check_results(&res, &exp, N_ELEMS, "_mm_mul_sh"); + + init_dest(&res, &exp); + emulate_mul_sh(&exp, src1, src2, 0x1, 0); + res.xmmh[0] = _mm_mask_mul_sh(res.xmmh[0], 0x1, src1.xmmh[0], + src2.xmmh[0]); + check_results(&res, &exp, N_ELEMS, "_mm_mask_mul_sh"); + + emulate_mul_sh(&exp, src1, src2, 0x3, 1); + res.xmmh[0] = _mm_maskz_mul_sh(0x3, src1.xmmh[0], src2.xmmh[0]); + check_results(&res, &exp, N_ELEMS, "_mm_maskz_mul_sh"); + + emulate_mul_sh(&exp, src1, src2, 0x1, 0); + res.xmmh[0] = _mm_mul_round_sh(src1.xmmh[0], src2.xmmh[0], + _ROUND_NINT); + check_results(&res, &exp, N_ELEMS, "_mm_mul_sh"); + + init_dest(&res, &exp); + emulate_mul_sh(&exp, src1, src2, 0x1, 0); + res.xmmh[0] = _mm_mask_mul_round_sh(res.xmmh[0], 0x1, src1.xmmh[0], + src2.xmmh[0], _ROUND_NINT); + check_results(&res, &exp, N_ELEMS, "_mm_mask_mul_sh"); + + emulate_mul_sh(&exp, src1, src2, 0x3, 1); + res.xmmh[0] = _mm_maskz_mul_round_sh(0x3, src1.xmmh[0], + src2.xmmh[0], _ROUND_NINT); + check_results(&res, &exp, N_ELEMS, "_mm_maskz_mul_sh"); + + if (n_errs != 0) { + abort (); + } +} + + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vsubsh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vsubsh-1a.c new 
file mode 100644 index 00000000000..8ea1eea615b --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vsubsh-1a.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vsubsh\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vsubsh\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vsubsh\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vsubsh\[ \\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vsubsh\[ \\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vsubsh\[ \\t\]+\{rz-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +volatile __m128h res; +volatile __m128h x1, x2; +volatile __mmask16 m16; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res = _mm_sub_sh (x1, x2); + res = _mm_mask_sub_sh (res, m8, x1, x2); + res = _mm_maskz_sub_sh (m8, x1, x2); + + res = _mm_sub_round_sh (x1, x2, 8); + res = _mm_mask_sub_round_sh (res, m8, x1, x2, 8); + res = _mm_maskz_sub_round_sh (m8, x1, x2, 11); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vsubsh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vsubsh-1b.c new file mode 100644 index 00000000000..df3680ebee1 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vsubsh-1b.c @@ -0,0 +1,76 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS 8
+ +void NOINLINE +emulate_sub_sh(V512 * dest, V512 op1, V512 op2, + __mmask8 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(op2, &v3, &v4); + unpack_ph_2twops(*dest, &v7, &v8); + + if ((k&1) || !k) + v5.f32[0] = v1.f32[0] - v3.f32[0]; + else if (zero_mask) + v5.f32[0] = 0; + else + v5.f32[0] = v7.f32[0]; + + for (i = 1; i < 8; i++) + v5.f32[i] = v1.f32[i]; + + *dest = pack_twops_2ph(v5, v6); +} + +void +test_512 (void) +{ + V512 res; + V512 exp; + + init_src(); + + emulate_sub_sh(&exp, src1, src2, 0x1, 0); + res.xmmh[0] = _mm_sub_sh(src1.xmmh[0], src2.xmmh[0]); + check_results(&res, &exp, N_ELEMS, "_mm_sub_sh"); + + init_dest(&res, &exp); + emulate_sub_sh(&exp, src1, src2, 0x1, 0); + res.xmmh[0] = _mm_mask_sub_sh(res.xmmh[0], 0x1, src1.xmmh[0], + src2.xmmh[0]); + check_results(&res, &exp, N_ELEMS, "_mm_mask_sub_sh"); + + emulate_sub_sh(&exp, src1, src2, 0x3, 1); + res.xmmh[0] = _mm_maskz_sub_sh(0x3, src1.xmmh[0], src2.xmmh[0]); + check_results(&res, &exp, N_ELEMS, "_mm_maskz_sub_sh"); + + emulate_sub_sh(&exp, src1, src2, 0x1, 0); + res.xmmh[0] = _mm_sub_round_sh(src1.xmmh[0], src2.xmmh[0], + _ROUND_NINT); + check_results(&res, &exp, N_ELEMS, "_mm_sub_sh"); + + init_dest(&res, &exp); + emulate_sub_sh(&exp, src1, src2, 0x1, 0); + res.xmmh[0] = _mm_mask_sub_round_sh(res.xmmh[0], 0x1, src1.xmmh[0], + src2.xmmh[0], _ROUND_NINT); + check_results(&res, &exp, N_ELEMS, "_mm_mask_sub_sh"); + + emulate_sub_sh(&exp, src1, src2, 0x3, 1); + res.xmmh[0] = _mm_maskz_sub_round_sh(0x3, src1.xmmh[0], + src2.xmmh[0], _ROUND_NINT); + check_results(&res, &exp, N_ELEMS, "_mm_maskz_sub_sh"); + + if (n_errs != 0) { + abort (); + } +} + diff --git a/gcc/testsuite/gcc.target/i386/pr54855-11.c b/gcc/testsuite/gcc.target/i386/pr54855-11.c new file mode 100644 index 00000000000..a7095665d76 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr54855-11.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 
-mavx512fp16" } */ +/* { dg-final { scan-assembler-times "vaddsh\[ \\t\]" 1 } } */ +/* { dg-final { scan-assembler-not "vpextrw\[ \\t\]" } } */ +/* { dg-final { scan-assembler-not "vmovw\[ \\t\]" } } */ +/* { dg-final { scan-assembler-not "vmovd\[ \\t\]" } } */ +/* { dg-final { scan-assembler-not "vpunpckldq\[ \\t\]" } } */ +/* { dg-final { scan-assembler-not "vpunpcklqdq\[ \\t\]" } } */ + +#include <immintrin.h> + +__m128h +foo (__m128h x, __m128h y) +{ + return _mm_add_sh (x, y); +} From patchwork Thu Jul 1 06:15:58 2021 X-Patchwork-Submitter: liuhongt X-Patchwork-Id: 1499319 To: gcc-patches@gcc.gnu.org Subject: [PATCH 12/62] AVX512FP16: Add vmaxph/vminph/vmaxsh/vminsh. Date: Thu, 1 Jul 2021 14:15:58 +0800 Message-Id: <20210701061648.9447-13-hongtao.liu@intel.com> In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com> References: <20210701061648.9447-1-hongtao.liu@intel.com> From: liuhongt Cc: jakub@redhat.com gcc/ChangeLog: * config/i386/avx512fp16intrin.h: (_mm512_max_ph): New intrinsic. (_mm512_mask_max_ph): Likewise. (_mm512_maskz_max_ph): Likewise. (_mm512_min_ph): Likewise. (_mm512_mask_min_ph): Likewise. (_mm512_maskz_min_ph): Likewise. (_mm512_max_round_ph): Likewise. (_mm512_mask_max_round_ph): Likewise. (_mm512_maskz_max_round_ph): Likewise. (_mm512_min_round_ph): Likewise. (_mm512_mask_min_round_ph): Likewise. (_mm512_maskz_min_round_ph): Likewise. (_mm_max_sh): Likewise. (_mm_mask_max_sh): Likewise. (_mm_maskz_max_sh): Likewise. (_mm_min_sh): Likewise. (_mm_mask_min_sh): Likewise. (_mm_maskz_min_sh): Likewise. (_mm_max_round_sh): Likewise. (_mm_mask_max_round_sh): Likewise. (_mm_maskz_max_round_sh): Likewise. (_mm_min_round_sh): Likewise. (_mm_mask_min_round_sh): Likewise. * config/i386/avx512fp16vlintrin.h (_mm_max_ph): New intrinsic. (_mm256_max_ph): Likewise. (_mm_mask_max_ph): Likewise. (_mm256_mask_max_ph): Likewise. (_mm_maskz_max_ph): Likewise. (_mm256_maskz_max_ph): Likewise.
(_mm_min_ph): Likewise. (_mm256_min_ph): Likewise. (_mm_mask_min_ph): Likewise. (_mm256_mask_min_ph): Likewise. (_mm_maskz_min_ph): Likewise. (_mm256_maskz_min_ph): Likewise. * config/i386/i386-builtin-types.def: Add corresponding builtin types. * config/i386/i386-builtin.def: Add corresponding new builtins. * config/i386/i386-expand.c (ix86_expand_args_builtin): Handle new builtin types. * config/i386/sse.md (3): Adjust to support HF vector modes. (*3): Likewise. (ieee_3): Likewise. (_vm3): Likewise. * config/i386/subst.md (round_saeonly_mode512bit_condition): Adjust for HF vector modes. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add test for new builtins. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/sse-14.c: Add test for new intrinsics. * gcc.target/i386/sse-22.c: Ditto. --- gcc/config/i386/avx512fp16intrin.h | 263 +++++++++++++++++++++++++ gcc/config/i386/avx512fp16vlintrin.h | 97 +++++++++ gcc/config/i386/i386-builtin-types.def | 2 + gcc/config/i386/i386-builtin.def | 12 ++ gcc/config/i386/i386-expand.c | 2 + gcc/config/i386/sse.md | 43 ++-- gcc/config/i386/subst.md | 4 +- gcc/testsuite/gcc.target/i386/avx-1.c | 4 + gcc/testsuite/gcc.target/i386/sse-13.c | 4 + gcc/testsuite/gcc.target/i386/sse-14.c | 12 ++ gcc/testsuite/gcc.target/i386/sse-22.c | 12 ++ gcc/testsuite/gcc.target/i386/sse-23.c | 4 + 12 files changed, 438 insertions(+), 21 deletions(-) diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h index 6ae12ebf920..c232419b4db 100644 --- a/gcc/config/i386/avx512fp16intrin.h +++ b/gcc/config/i386/avx512fp16intrin.h @@ -722,6 +722,269 @@ _mm_maskz_div_round_sh (__mmask8 __A, __m128h __B, __m128h __C, (A), (D))) #endif /* __OPTIMIZE__ */ +/* Intrinsic vmaxph vminph. 
*/ +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_max_ph (__m512h __A, __m512h __B) +{ + return __builtin_ia32_vmaxph_v32hf_mask (__A, __B, + _mm512_setzero_ph (), + (__mmask32) -1); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_max_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D) +{ + return __builtin_ia32_vmaxph_v32hf_mask (__C, __D, __A, __B); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_max_ph (__mmask32 __A, __m512h __B, __m512h __C) +{ + return __builtin_ia32_vmaxph_v32hf_mask (__B, __C, + _mm512_setzero_ph (), __A); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_min_ph (__m512h __A, __m512h __B) +{ + return __builtin_ia32_vminph_v32hf_mask (__A, __B, + _mm512_setzero_ph (), + (__mmask32) -1); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_min_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D) +{ + return __builtin_ia32_vminph_v32hf_mask (__C, __D, __A, __B); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_min_ph (__mmask32 __A, __m512h __B, __m512h __C) +{ + return __builtin_ia32_vminph_v32hf_mask (__B, __C, + _mm512_setzero_ph (), __A); +} + +#ifdef __OPTIMIZE__ +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_max_round_ph (__m512h __A, __m512h __B, const int __C) +{ + return __builtin_ia32_vmaxph_v32hf_mask_round (__A, __B, + _mm512_setzero_ph (), + (__mmask32) -1, __C); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_max_round_ph (__m512h __A, __mmask32 __B, __m512h __C, + __m512h __D, const int __E) +{ + return __builtin_ia32_vmaxph_v32hf_mask_round 
(__C, __D, __A, __B, __E); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_max_round_ph (__mmask32 __A, __m512h __B, __m512h __C, + const int __D) +{ + return __builtin_ia32_vmaxph_v32hf_mask_round (__B, __C, + _mm512_setzero_ph (), + __A, __D); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_min_round_ph (__m512h __A, __m512h __B, const int __C) +{ + return __builtin_ia32_vminph_v32hf_mask_round (__A, __B, + _mm512_setzero_ph (), + (__mmask32) -1, __C); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_min_round_ph (__m512h __A, __mmask32 __B, __m512h __C, + __m512h __D, const int __E) +{ + return __builtin_ia32_vminph_v32hf_mask_round (__C, __D, __A, __B, __E); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_min_round_ph (__mmask32 __A, __m512h __B, __m512h __C, + const int __D) +{ + return __builtin_ia32_vminph_v32hf_mask_round (__B, __C, + _mm512_setzero_ph (), + __A, __D); +} + +#else +#define _mm512_max_round_ph(A, B, C) \ + (__builtin_ia32_vmaxph_v32hf_mask_round ((A), (B), \ + _mm512_setzero_ph (), \ + (__mmask32)-1, (C))) + +#define _mm512_mask_max_round_ph(A, B, C, D, E) \ + (__builtin_ia32_vmaxph_v32hf_mask_round ((C), (D), (A), (B), (E))) + +#define _mm512_maskz_max_round_ph(A, B, C, D) \ + (__builtin_ia32_vmaxph_v32hf_mask_round ((B), (C), \ + _mm512_setzero_ph (), \ + (A), (D))) + +#define _mm512_min_round_ph(A, B, C) \ + (__builtin_ia32_vminph_v32hf_mask_round ((A), (B), \ + _mm512_setzero_ph (), \ + (__mmask32)-1, (C))) + +#define _mm512_mask_min_round_ph(A, B, C, D, E) \ + (__builtin_ia32_vminph_v32hf_mask_round ((C), (D), (A), (B), (E))) + +#define _mm512_maskz_min_round_ph(A, B, C, D) \ + (__builtin_ia32_vminph_v32hf_mask_round ((B), (C), \ + _mm512_setzero_ph (), \ + (A), (D))) +#endif /* 
__OPTIMIZE__ */ + +/* Intrinsic vmaxsh vminsh. */ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_max_sh (__m128h __A, __m128h __B) +{ + __A[0] = __A[0] > __B[0] ? __A[0] : __B[0]; + return __A; +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_max_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D) +{ + return __builtin_ia32_vmaxsh_v8hf_mask (__C, __D, __A, __B); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_max_sh (__mmask8 __A, __m128h __B, __m128h __C) +{ + return __builtin_ia32_vmaxsh_v8hf_mask (__B, __C, _mm_setzero_ph (), + __A); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_min_sh (__m128h __A, __m128h __B) +{ + __A[0] = __A[0] < __B[0] ? __A[0] : __B[0]; + return __A; +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_min_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D) +{ + return __builtin_ia32_vminsh_v8hf_mask (__C, __D, __A, __B); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_min_sh (__mmask8 __A, __m128h __B, __m128h __C) +{ + return __builtin_ia32_vminsh_v8hf_mask (__B, __C, _mm_setzero_ph (), + __A); +} + +#ifdef __OPTIMIZE__ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_max_round_sh (__m128h __A, __m128h __B, const int __C) +{ + return __builtin_ia32_vmaxsh_v8hf_mask_round (__A, __B, + _mm_setzero_ph (), + (__mmask8) -1, __C); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_max_round_sh (__m128h __A, __mmask8 __B, __m128h __C, + __m128h __D, const int __E) +{ + return __builtin_ia32_vmaxsh_v8hf_mask_round (__C, __D, __A, __B, __E); +} + +extern __inline __m128h +__attribute__ 
((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_max_round_sh (__mmask8 __A, __m128h __B, __m128h __C, + const int __D) +{ + return __builtin_ia32_vmaxsh_v8hf_mask_round (__B, __C, + _mm_setzero_ph (), + __A, __D); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_min_round_sh (__m128h __A, __m128h __B, const int __C) +{ + return __builtin_ia32_vminsh_v8hf_mask_round (__A, __B, + _mm_setzero_ph (), + (__mmask8) -1, __C); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_min_round_sh (__m128h __A, __mmask8 __B, __m128h __C, + __m128h __D, const int __E) +{ + return __builtin_ia32_vminsh_v8hf_mask_round (__C, __D, __A, __B, __E); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_min_round_sh (__mmask8 __A, __m128h __B, __m128h __C, + const int __D) +{ + return __builtin_ia32_vminsh_v8hf_mask_round (__B, __C, + _mm_setzero_ph (), + __A, __D); +} + +#else +#define _mm_max_round_sh(A, B, C) \ + (__builtin_ia32_vmaxsh_v8hf_mask_round ((A), (B), \ + _mm_setzero_ph (), \ + (__mmask8)-1, (C))) + +#define _mm_mask_max_round_sh(A, B, C, D, E) \ + (__builtin_ia32_vmaxsh_v8hf_mask_round ((C), (D), (A), (B), (E))) + +#define _mm_maskz_max_round_sh(A, B, C, D) \ + (__builtin_ia32_vmaxsh_v8hf_mask_round ((B), (C), \ + _mm_setzero_ph (), \ + (A), (D))) + +#define _mm_min_round_sh(A, B, C) \ + (__builtin_ia32_vminsh_v8hf_mask_round ((A), (B), \ + _mm_setzero_ph (), \ + (__mmask8)-1, (C))) + +#define _mm_mask_min_round_sh(A, B, C, D, E) \ + (__builtin_ia32_vminsh_v8hf_mask_round ((C), (D), (A), (B), (E))) + +#define _mm_maskz_min_round_sh(A, B, C, D) \ + (__builtin_ia32_vminsh_v8hf_mask_round ((B), (C), \ + _mm_setzero_ph (), \ + (A), (D))) + +#endif /* __OPTIMIZE__ */ + #ifdef __DISABLE_AVX512FP16__ #undef __DISABLE_AVX512FP16__ #pragma GCC pop_options diff --git 
a/gcc/config/i386/avx512fp16vlintrin.h b/gcc/config/i386/avx512fp16vlintrin.h index 75fa9eb29e7..bd60b4cd4ca 100644 --- a/gcc/config/i386/avx512fp16vlintrin.h +++ b/gcc/config/i386/avx512fp16vlintrin.h @@ -211,6 +211,103 @@ _mm256_maskz_div_ph (__mmask16 __A, __m256h __B, __m256h __C) _mm256_setzero_ph (), __A); } +/* Intrinsics v[max,min]ph. */ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_max_ph (__m128h __A, __m128h __B) +{ + return __builtin_ia32_vmaxph_v8hf_mask (__A, __B, + _mm_setzero_ph (), + (__mmask8) -1); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_max_ph (__m256h __A, __m256h __B) +{ + return __builtin_ia32_vmaxph_v16hf_mask (__A, __B, + _mm256_setzero_ph (), + (__mmask16) -1); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_max_ph (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D) +{ + return __builtin_ia32_vmaxph_v8hf_mask (__C, __D, __A, __B); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_max_ph (__m256h __A, __mmask16 __B, __m256h __C, __m256h __D) +{ + return __builtin_ia32_vmaxph_v16hf_mask (__C, __D, __A, __B); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_max_ph (__mmask8 __A, __m128h __B, __m128h __C) +{ + return __builtin_ia32_vmaxph_v8hf_mask (__B, __C, _mm_setzero_ph (), + __A); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_max_ph (__mmask16 __A, __m256h __B, __m256h __C) +{ + return __builtin_ia32_vmaxph_v16hf_mask (__B, __C, + _mm256_setzero_ph (), __A); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_min_ph (__m128h __A, __m128h __B) +{ + return __builtin_ia32_vminph_v8hf_mask (__A, __B, + _mm_setzero_ph (), + 
(__mmask8) -1); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_min_ph (__m256h __A, __m256h __B) +{ + return __builtin_ia32_vminph_v16hf_mask (__A, __B, + _mm256_setzero_ph (), + (__mmask16) -1); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_min_ph (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D) +{ + return __builtin_ia32_vminph_v8hf_mask (__C, __D, __A, __B); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_min_ph (__m256h __A, __mmask16 __B, __m256h __C, __m256h __D) +{ + return __builtin_ia32_vminph_v16hf_mask (__C, __D, __A, __B); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_min_ph (__mmask8 __A, __m128h __B, __m128h __C) +{ + return __builtin_ia32_vminph_v8hf_mask (__B, __C, _mm_setzero_ph (), + __A); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_min_ph (__mmask16 __A, __m256h __B, __m256h __C) +{ + return __builtin_ia32_vminph_v16hf_mask (__B, __C, + _mm256_setzero_ph (), __A); +} + #ifdef __DISABLE_AVX512FP16VL__ #undef __DISABLE_AVX512FP16VL__ #pragma GCC pop_options diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def index ed738f71927..3bd2670e229 100644 --- a/gcc/config/i386/i386-builtin-types.def +++ b/gcc/config/i386/i386-builtin-types.def @@ -1304,9 +1304,11 @@ DEF_FUNCTION_TYPE (UINT8, PV2DI, PCV2DI, PCVOID) # FP16 builtins DEF_FUNCTION_TYPE (V8HF, V8HI) +DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF) DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, INT) DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, V8HF, UQI) DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, V8HF, UQI, INT) +DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF) DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF, V16HF, UHI) DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, INT) DEF_FUNCTION_TYPE (V32HF, 
V32HF, V32HF, V32HF, USI) diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def index 60e2b75be14..28e5627ca4c 100644 --- a/gcc/config/i386/i386-builtin.def +++ b/gcc/config/i386/i386-builtin.def @@ -2791,6 +2791,14 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmaddv8hf3_mask, "__b BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsubv8hf3_mask, "__builtin_ia32_vsubsh_v8hf_mask", IX86_BUILTIN_VSUBSH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmmulv8hf3_mask, "__builtin_ia32_vmulsh_v8hf_mask", IX86_BUILTIN_VMULSH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmdivv8hf3_mask, "__builtin_ia32_vdivsh_v8hf_mask", IX86_BUILTIN_VDIVSH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_smaxv8hf3_mask, "__builtin_ia32_vmaxph_v8hf_mask", IX86_BUILTIN_VMAXPH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_smaxv16hf3_mask, "__builtin_ia32_vmaxph_v16hf_mask", IX86_BUILTIN_VMAXPH_V16HF_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_smaxv32hf3_mask, "__builtin_ia32_vmaxph_v32hf_mask", IX86_BUILTIN_VMAXPH_V32HF_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_sminv8hf3_mask, "__builtin_ia32_vminph_v8hf_mask", IX86_BUILTIN_VMINPH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_sminv16hf3_mask, "__builtin_ia32_vminph_v16hf_mask", IX86_BUILTIN_VMINPH_V16HF_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_sminv32hf3_mask, "__builtin_ia32_vminph_v32hf_mask", 
IX86_BUILTIN_VMINPH_V32HF_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsmaxv8hf3_mask, "__builtin_ia32_vmaxsh_v8hf_mask", IX86_BUILTIN_VMAXSH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsminv8hf3_mask, "__builtin_ia32_vminsh_v8hf_mask", IX86_BUILTIN_VMINSH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) /* Builtins with rounding support. */ BDESC_END (ARGS, ROUND_ARGS) @@ -3000,6 +3008,10 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmaddv8hf3_mask_round BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsubv8hf3_mask_round, "__builtin_ia32_vsubsh_v8hf_mask_round", IX86_BUILTIN_VSUBSH_V8HF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmmulv8hf3_mask_round, "__builtin_ia32_vmulsh_v8hf_mask_round", IX86_BUILTIN_VMULSH_V8HF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmdivv8hf3_mask_round, "__builtin_ia32_vdivsh_v8hf_mask_round", IX86_BUILTIN_VDIVSH_V8HF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_smaxv32hf3_mask_round, "__builtin_ia32_vmaxph_v32hf_mask_round", IX86_BUILTIN_VMAXPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_sminv32hf3_mask_round, "__builtin_ia32_vminph_v32hf_mask_round", IX86_BUILTIN_VMINPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsmaxv8hf3_mask_round, "__builtin_ia32_vmaxsh_v8hf_mask_round", IX86_BUILTIN_VMAXSH_V8HF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsminv8hf3_mask_round, 
"__builtin_ia32_vminsh_v8hf_mask_round", IX86_BUILTIN_VMINSH_V8HF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) BDESC_END (ROUND_ARGS, MULTI_ARG) diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c index d2a47150e1b..90f8e3a6d4c 100644 --- a/gcc/config/i386/i386-expand.c +++ b/gcc/config/i386/i386-expand.c @@ -9349,12 +9349,14 @@ ix86_expand_args_builtin (const struct builtin_description *d, case FLOAT128_FTYPE_FLOAT128_FLOAT128: case V16QI_FTYPE_V16QI_V16QI: case V16QI_FTYPE_V8HI_V8HI: + case V16HF_FTYPE_V16HF_V16HF: case V16SF_FTYPE_V16SF_V16SF: case V8QI_FTYPE_V8QI_V8QI: case V8QI_FTYPE_V4HI_V4HI: case V8HI_FTYPE_V8HI_V8HI: case V8HI_FTYPE_V16QI_V16QI: case V8HI_FTYPE_V4SI_V4SI: + case V8HF_FTYPE_V8HF_V8HF: case V8SF_FTYPE_V8SF_V8SF: case V8SF_FTYPE_V8SF_V8SI: case V8DF_FTYPE_V8DF_V8DF: diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 8fa3f8ddac9..976803f2a1d 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -2384,11 +2384,12 @@ (define_insn "*sse_vmrsqrtv4sf2" (set_attr "mode" "SF")]) (define_expand "3" - [(set (match_operand:VF 0 "register_operand") - (smaxmin:VF - (match_operand:VF 1 "") - (match_operand:VF 2 "")))] - "TARGET_SSE && && " + [(set (match_operand:VFH 0 "register_operand") + (smaxmin:VFH + (match_operand:VFH 1 "") + (match_operand:VFH 2 "")))] + "TARGET_SSE && + && " { if (!flag_finite_math_only || flag_signed_zeros) { @@ -2409,13 +2410,14 @@ (define_expand "3" ;; are undefined in this condition, we're certain this is correct. 
(define_insn "*3" - [(set (match_operand:VF 0 "register_operand" "=x,v") - (smaxmin:VF - (match_operand:VF 1 "" "%0,v") - (match_operand:VF 2 "" "xBm,")))] + [(set (match_operand:VFH 0 "register_operand" "=x,v") + (smaxmin:VFH + (match_operand:VFH 1 "" "%0,v") + (match_operand:VFH 2 "" "xBm,")))] "TARGET_SSE && !(MEM_P (operands[1]) && MEM_P (operands[2])) - && && " + && + && " "@ \t{%2, %0|%0, %2} v\t{%2, %1, %0|%0, %1, %2}" @@ -2432,13 +2434,14 @@ (define_insn "*3" ;; presence of -0.0 and NaN. (define_insn "ieee_3" - [(set (match_operand:VF 0 "register_operand" "=x,v") - (unspec:VF - [(match_operand:VF 1 "register_operand" "0,v") - (match_operand:VF 2 "" "xBm,")] + [(set (match_operand:VFH 0 "register_operand" "=x,v") + (unspec:VFH + [(match_operand:VFH 1 "register_operand" "0,v") + (match_operand:VFH 2 "" "xBm,")] IEEE_MAXMIN))] "TARGET_SSE - && && " + && + && " "@ \t{%2, %0|%0, %2} v\t{%2, %1, %0|%0, %1, %2}" @@ -2473,11 +2476,11 @@ (define_insn "*ieee_3" (set_attr "mode" "")]) (define_insn "_vm3" - [(set (match_operand:VF_128 0 "register_operand" "=x,v") - (vec_merge:VF_128 - (smaxmin:VF_128 - (match_operand:VF_128 1 "register_operand" "0,v") - (match_operand:VF_128 2 "nonimmediate_operand" "xm,")) + [(set (match_operand:VFH_128 0 "register_operand" "=x,v") + (vec_merge:VFH_128 + (smaxmin:VFH_128 + (match_operand:VFH_128 1 "register_operand" "0,v") + (match_operand:VFH_128 2 "nonimmediate_operand" "xm,")) (match_dup 1) (const_int 1)))] "TARGET_SSE" diff --git a/gcc/config/i386/subst.md b/gcc/config/i386/subst.md index 762383bfd11..ecb158f07e5 100644 --- a/gcc/config/i386/subst.md +++ b/gcc/config/i386/subst.md @@ -187,7 +187,9 @@ (define_subst_attr "round_saeonly_nimm_scalar_predicate" "round_saeonly" "nonimm (define_subst_attr "round_saeonly_mode512bit_condition" "round_saeonly" "1" "(mode == V16SFmode || mode == V8DFmode || mode == V8DImode - || mode == V16SImode)") + || mode == V16SImode + || mode == V32HFmode)") + (define_subst_attr 
"round_saeonly_modev8sf_condition" "round_saeonly" "1" "(mode == V8SFmode)") (define_subst "round_saeonly" diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c index 26ca87ce2f5..7106076b2a3 100644 --- a/gcc/testsuite/gcc.target/i386/avx-1.c +++ b/gcc/testsuite/gcc.target/i386/avx-1.c @@ -694,6 +694,10 @@ #define __builtin_ia32_vsubsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vsubsh_v8hf_mask_round(A, B, C, D, 8) #define __builtin_ia32_vmulsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vmulsh_v8hf_mask_round(A, B, C, D, 8) #define __builtin_ia32_vdivsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vdivsh_v8hf_mask_round(A, B, C, D, 8) +#define __builtin_ia32_vmaxph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vmaxph_v32hf_mask_round(A, B, C, D, 8) +#define __builtin_ia32_vminph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vminph_v32hf_mask_round(A, B, C, D, 8) +#define __builtin_ia32_vmaxsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vmaxsh_v8hf_mask_round(A, B, C, D, 8) +#define __builtin_ia32_vminsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vminsh_v8hf_mask_round(A, B, C, D, 8) /* vpclmulqdqintrin.h */ #define __builtin_ia32_vpclmulqdq_v4di(A, B, C) __builtin_ia32_vpclmulqdq_v4di(A, B, 1) diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c index ae35adb5ead..1732b50be6b 100644 --- a/gcc/testsuite/gcc.target/i386/sse-13.c +++ b/gcc/testsuite/gcc.target/i386/sse-13.c @@ -711,6 +711,10 @@ #define __builtin_ia32_vsubsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vsubsh_v8hf_mask_round(A, B, C, D, 8) #define __builtin_ia32_vmulsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vmulsh_v8hf_mask_round(A, B, C, D, 8) #define __builtin_ia32_vdivsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vdivsh_v8hf_mask_round(A, B, C, D, 8) +#define __builtin_ia32_vmaxph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vmaxph_v32hf_mask_round(A, B, C, D, 8) +#define 
__builtin_ia32_vminph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vminph_v32hf_mask_round(A, B, C, D, 8) +#define __builtin_ia32_vmaxsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vmaxsh_v8hf_mask_round(A, B, C, D, 8) +#define __builtin_ia32_vminsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vminsh_v8hf_mask_round(A, B, C, D, 8) /* vpclmulqdqintrin.h */ #define __builtin_ia32_vpclmulqdq_v4di(A, B, C) __builtin_ia32_vpclmulqdq_v4di(A, B, 1) diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c index e79edf0a5bb..135b4463941 100644 --- a/gcc/testsuite/gcc.target/i386/sse-14.c +++ b/gcc/testsuite/gcc.target/i386/sse-14.c @@ -676,6 +676,10 @@ test_2 (_mm_add_round_sh, __m128h, __m128h, __m128h, 8) test_2 (_mm_sub_round_sh, __m128h, __m128h, __m128h, 8) test_2 (_mm_mul_round_sh, __m128h, __m128h, __m128h, 8) test_2 (_mm_div_round_sh, __m128h, __m128h, __m128h, 8) +test_2 (_mm512_max_round_ph, __m512h, __m512h, __m512h, 8) +test_2 (_mm512_min_round_ph, __m512h, __m512h, __m512h, 8) +test_2 (_mm_max_round_sh, __m128h, __m128h, __m128h, 8) +test_2 (_mm_min_round_sh, __m128h, __m128h, __m128h, 8) test_3 (_mm512_maskz_add_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) test_3 (_mm512_maskz_sub_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) test_3 (_mm512_maskz_mul_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) @@ -684,6 +688,10 @@ test_3 (_mm_maskz_add_round_sh, __m128h, __mmask32, __m128h, __m128h, 8) test_3 (_mm_maskz_sub_round_sh, __m128h, __mmask32, __m128h, __m128h, 8) test_3 (_mm_maskz_mul_round_sh, __m128h, __mmask32, __m128h, __m128h, 8) test_3 (_mm_maskz_div_round_sh, __m128h, __mmask32, __m128h, __m128h, 8) +test_3 (_mm512_maskz_max_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) +test_3 (_mm512_maskz_min_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) +test_3 (_mm_maskz_max_round_sh, __m128h, __mmask8, __m128h, __m128h, 8) +test_3 (_mm_maskz_min_round_sh, __m128h, __mmask8, __m128h, __m128h, 8) test_4 
(_mm512_mask_add_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8) test_4 (_mm512_mask_sub_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8) test_4 (_mm512_mask_mul_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8) @@ -692,6 +700,10 @@ test_4 (_mm_mask_add_round_sh, __m128h, __m128h, __mmask32, __m128h, __m128h, 8) test_4 (_mm_mask_sub_round_sh, __m128h, __m128h, __mmask32, __m128h, __m128h, 8) test_4 (_mm_mask_mul_round_sh, __m128h, __m128h, __mmask32, __m128h, __m128h, 8) test_4 (_mm_mask_div_round_sh, __m128h, __m128h, __mmask32, __m128h, __m128h, 8) +test_4 (_mm512_mask_max_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8) +test_4 (_mm512_mask_min_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8) +test_4 (_mm_mask_max_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8) +test_4 (_mm_mask_min_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8) /* shaintrin.h */ test_2 (_mm_sha1rnds4_epu32, __m128i, __m128i, __m128i, 1) diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c index 2c1f27d881a..da3f5606207 100644 --- a/gcc/testsuite/gcc.target/i386/sse-22.c +++ b/gcc/testsuite/gcc.target/i386/sse-22.c @@ -781,6 +781,10 @@ test_2 (_mm_add_round_sh, __m128h, __m128h, __m128h, 8) test_2 (_mm_sub_round_sh, __m128h, __m128h, __m128h, 8) test_2 (_mm_mul_round_sh, __m128h, __m128h, __m128h, 8) test_2 (_mm_div_round_sh, __m128h, __m128h, __m128h, 8) +test_2 (_mm512_max_round_ph, __m512h, __m512h, __m512h, 8) +test_2 (_mm512_min_round_ph, __m512h, __m512h, __m512h, 8) +test_2 (_mm_max_round_sh, __m128h, __m128h, __m128h, 8) +test_2 (_mm_min_round_sh, __m128h, __m128h, __m128h, 8) test_3 (_mm512_maskz_add_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) test_3 (_mm512_maskz_sub_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) test_3 (_mm512_maskz_mul_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) @@ -789,6 +793,10 @@ test_3 (_mm_maskz_add_round_sh, __m128h, __mmask8, 
__m128h, __m128h, 8) test_3 (_mm_maskz_sub_round_sh, __m128h, __mmask8, __m128h, __m128h, 8) test_3 (_mm_maskz_mul_round_sh, __m128h, __mmask8, __m128h, __m128h, 8) test_3 (_mm_maskz_div_round_sh, __m128h, __mmask8, __m128h, __m128h, 8) +test_3 (_mm512_maskz_max_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) +test_3 (_mm512_maskz_min_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) +test_3 (_mm_maskz_max_round_sh, __m128h, __mmask8, __m128h, __m128h, 8) +test_3 (_mm_maskz_min_round_sh, __m128h, __mmask8, __m128h, __m128h, 8) test_4 (_mm512_mask_add_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8) test_4 (_mm512_mask_sub_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8) test_4 (_mm512_mask_mul_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8) @@ -797,6 +805,10 @@ test_4 (_mm_mask_add_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8) test_4 (_mm_mask_sub_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8) test_4 (_mm_mask_mul_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8) test_4 (_mm_mask_div_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8) +test_4 (_mm512_mask_max_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8) +test_4 (_mm512_mask_min_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8) +test_4 (_mm_mask_max_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8) +test_4 (_mm_mask_min_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8) /* shaintrin.h */ test_2 (_mm_sha1rnds4_epu32, __m128i, __m128i, __m128i, 1) diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c index a89aef2aa8e..c3fee655288 100644 --- a/gcc/testsuite/gcc.target/i386/sse-23.c +++ b/gcc/testsuite/gcc.target/i386/sse-23.c @@ -712,6 +712,10 @@ #define __builtin_ia32_vsubsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vsubsh_v8hf_mask_round(A, B, C, D, 8) #define __builtin_ia32_vmulsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vmulsh_v8hf_mask_round(A, B, 
C, D, 8) #define __builtin_ia32_vdivsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vdivsh_v8hf_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vmaxph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vmaxph_v32hf_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vminph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vminph_v32hf_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vmaxsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vmaxsh_v8hf_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vminsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vminsh_v8hf_mask_round(A, B, C, D, 8)

/* vpclmulqdqintrin.h */
#define __builtin_ia32_vpclmulqdq_v4di(A, B, C) __builtin_ia32_vpclmulqdq_v4di(A, B, 1)

From patchwork Thu Jul 1 06:15:59 2021
X-Patchwork-Submitter: liuhongt
X-Patchwork-Id: 1499320
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 13/62] AVX512FP16: Add testcase for vmaxph/vmaxsh/vminph/vminsh.
Date: Thu, 1 Jul 2021 14:15:59 +0800
Message-Id: <20210701061648.9447-14-hongtao.liu@intel.com>
In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com>
References: <20210701061648.9447-1-hongtao.liu@intel.com>
From: liuhongt
Reply-To: liuhongt
Cc: jakub@redhat.com

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-vmaxph-1a.c: New test.
	* gcc.target/i386/avx512fp16-vmaxph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vmaxsh-1.c: Ditto.
	* gcc.target/i386/avx512fp16-vmaxsh-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vminph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vminph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vminsh-1.c: Ditto.
	* gcc.target/i386/avx512fp16-vminsh-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vmaxph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vmaxph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vminph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vminph-1b.c: Ditto.
--- .../gcc.target/i386/avx512fp16-vmaxph-1a.c | 26 +++++ .../gcc.target/i386/avx512fp16-vmaxph-1b.c | 94 +++++++++++++++++++ .../gcc.target/i386/avx512fp16-vmaxsh-1.c | 27 ++++++ .../gcc.target/i386/avx512fp16-vmaxsh-1b.c | 72 ++++++++++++++ .../gcc.target/i386/avx512fp16-vminph-1a.c | 26 +++++ .../gcc.target/i386/avx512fp16-vminph-1b.c | 93 ++++++++++++++++++ .../gcc.target/i386/avx512fp16-vminsh-1.c | 27 ++++++ .../gcc.target/i386/avx512fp16-vminsh-1b.c | 72 ++++++++++++++ .../gcc.target/i386/avx512fp16vl-vmaxph-1a.c | 29 ++++++ .../gcc.target/i386/avx512fp16vl-vmaxph-1b.c | 16 ++++ .../gcc.target/i386/avx512fp16vl-vminph-1a.c | 29 ++++++ .../gcc.target/i386/avx512fp16vl-vminph-1b.c | 16 ++++ 12 files changed, 527 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmaxph-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmaxph-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmaxsh-1.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmaxsh-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vminph-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vminph-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vminsh-1.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vminsh-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vmaxph-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vmaxph-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vminph-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vminph-1b.c diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vmaxph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vmaxph-1a.c new file mode 100644 index 00000000000..b91f4bd1154 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vmaxph-1a.c @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vmaxph\[ 
\\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vmaxph\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vmaxph\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vmaxph\[ \\t\]+\{sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vmaxph\[ \\t\]+\{sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vmaxph\[ \\t\]+\{sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m512h res, res1, res2; +volatile __m512h x1, x2; +volatile __mmask32 m32; + +void extern +avx512f_test (void) +{ + res = _mm512_max_ph (x1, x2); + res1 = _mm512_mask_max_ph (res1, m32, x1, x2); + res2 = _mm512_maskz_max_ph (m32, x1, x2); + + res = _mm512_max_round_ph (x1, x2, 8); + res1 = _mm512_mask_max_round_ph (res1, m32, x1, x2, 8); + res2 = _mm512_maskz_max_round_ph (m32, x1, x2, 8); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vmaxph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vmaxph-1b.c new file mode 100644 index 00000000000..0dd4c11e9aa --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vmaxph-1b.c @@ -0,0 +1,94 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS (AVX512F_LEN / 16) + +void NOINLINE +EMULATE(max_ph) (V512 * dest, V512 op1, V512 op2, + __mmask32 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + __mmask16 m1, m2; + + m1 = k & 0xffff; + m2 = (k >> 16) & 0xffff; + 
+ unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(op2, &v3, &v4); + unpack_ph_2twops(*dest, &v7, &v8); + + for (i = 0; i < 16; i++) { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + v5.f32[i] = 0; + } + else { + v5.u32[i] = v7.u32[i]; + } + } + else { + v5.f32[i] = v1.f32[i] > v3.f32[i] ? v1.f32[i] : v3.f32[i]; + } + + if (((1 << i) & m2) == 0) { + if (zero_mask) { + v6.f32[i] = 0; + } + else { + v6.u32[i] = v8.u32[i]; + } + } + else { + v6.f32[i] = v2.f32[i] > v4.f32[i] ? v2.f32[i] : v4.f32[i]; + } + } + *dest = pack_twops_2ph(v5, v6); +} + +void +TEST (void) +{ + V512 res; + V512 exp; + + init_src(); + + EMULATE(max_ph) (&exp, src1, src2, NET_MASK, 0); + HF(res) = INTRINSIC (_max_ph) (HF(src1), HF(src2)); + CHECK_RESULT (&res, &exp, N_ELEMS, _max_ph); + + init_dest(&res, &exp); + EMULATE(max_ph) (&exp, src1, src2, MASK_VALUE, 0); + HF(res) = INTRINSIC (_mask_max_ph) (HF(res), MASK_VALUE, HF(src1), HF(src2)); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_max_ph); + + EMULATE(max_ph) (&exp, src1, src2, ZMASK_VALUE, 1); + HF(res) = INTRINSIC (_maskz_max_ph) (ZMASK_VALUE, HF(src1), HF(src2)); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_max_ph); + +#if AVX512F_LEN == 512 + EMULATE(max_ph) (&exp, src1, src2, NET_MASK, 0); + HF(res) = INTRINSIC (_max_round_ph) (HF(src1), HF(src2), 8); + CHECK_RESULT (&res, &exp, N_ELEMS, _max_ph); + + init_dest(&res, &exp); + EMULATE(max_ph) (&exp, src1, src2, MASK_VALUE, 0); + HF(res) = INTRINSIC (_mask_max_round_ph) (HF(res), MASK_VALUE, HF(src1), HF(src2), 8); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_max_ph); + + EMULATE(max_ph) (&exp, src1, src2, ZMASK_VALUE, 1); + HF(res) = INTRINSIC (_maskz_max_round_ph) (ZMASK_VALUE, HF(src1), HF(src2), 8); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_max_ph); + +#endif + + if (n_errs != 0) { + abort (); + } +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vmaxsh-1.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vmaxsh-1.c new file mode 100644 index 00000000000..d5198dcebdc --- 
/dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vmaxsh-1.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vmaxsh\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vmaxsh\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vmaxsh\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vmaxsh\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vmaxsh\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vmaxsh\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m128h res; +volatile __m128h x1, x2; +volatile __mmask16 m16; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res = _mm_max_sh (x1, x2); + res = _mm_mask_max_sh (res, m8, x1, x2); + res = _mm_maskz_max_sh (m8, x1, x2); + + res = _mm_max_round_sh (x1, x2, 8); + res = _mm_mask_max_round_sh (res, m8, x1, x2, 8); + res = _mm_maskz_max_round_sh (m8, x1, x2, 8); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vmaxsh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vmaxsh-1b.c new file mode 100644 index 00000000000..fe49de3147f --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vmaxsh-1b.c @@ -0,0 +1,72 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS 8 + +void NOINLINE +emulate_max_sh(V512 * dest, V512 op1, V512 
op2,
+	       __mmask8 k, int zero_mask)
+{
+  V512 v1, v2, v3, v4, v5, v6, v7, v8;
+  int i;
+
+  unpack_ph_2twops(op1, &v1, &v2);
+  unpack_ph_2twops(op2, &v3, &v4);
+  unpack_ph_2twops(*dest, &v7, &v8);
+
+  /* Scalar op: only element 0 is computed; elements 1..7 copy src1.  */
+  if ((k&1) || !k)
+    v5.f32[0] = v1.f32[0] > v3.f32[0] ? v1.f32[0] : v3.f32[0];
+  else if (zero_mask)
+    v5.f32[0] = 0;
+  else
+    v5.f32[0] = v7.f32[0];
+
+  for (i = 1; i < 8; i++)
+    v5.f32[i] = v1.f32[i];
+
+  *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+test_512 (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  emulate_max_sh(&exp, src1, src2, 0x1, 0);
+  res.xmmh[0] = _mm_max_sh(src1.xmmh[0], src2.xmmh[0]);
+  check_results(&res, &exp, N_ELEMS, "_mm_max_sh");
+
+  init_dest(&res, &exp);
+  emulate_max_sh(&exp, src1, src2, 0x1, 0);
+  res.xmmh[0] = _mm_mask_max_sh(res.xmmh[0], 0x1, src1.xmmh[0], src2.xmmh[0]);
+  check_results(&res, &exp, N_ELEMS, "_mm_mask_max_sh");
+
+  emulate_max_sh(&exp, src1, src2, 0x3, 1);
+  res.xmmh[0] = _mm_maskz_max_sh(0x3, src1.xmmh[0], src2.xmmh[0]);
+  check_results(&res, &exp, N_ELEMS, "_mm_maskz_max_sh");
+
+  emulate_max_sh(&exp, src1, src2, 0x1, 0);
+  res.xmmh[0] = _mm_max_round_sh(src1.xmmh[0], src2.xmmh[0], 8);
+  check_results(&res, &exp, N_ELEMS, "_mm_max_round_sh");
+
+  init_dest(&res, &exp);
+  emulate_max_sh(&exp, src1, src2, 0x1, 0);
+  res.xmmh[0] = _mm_mask_max_round_sh(res.xmmh[0], 0x1, src1.xmmh[0], src2.xmmh[0], 8);
+  check_results(&res, &exp, N_ELEMS, "_mm_mask_max_round_sh");
+
+  emulate_max_sh(&exp, src1, src2, 0x3, 1);
+  res.xmmh[0] = _mm_maskz_max_round_sh(0x3, src1.xmmh[0], src2.xmmh[0], 8);
+  check_results(&res, &exp, N_ELEMS, "_mm_maskz_max_round_sh");
+
+  if (n_errs != 0)
+    abort ();
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vminph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vminph-1a.c
new file mode 100644
index 00000000000..810a93e3870
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vminph-1a.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final
{ scan-assembler-times "vminph\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminph\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminph\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminph\[ \\t\]+\{sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminph\[ \\t\]+\{sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vminph\[ \\t\]+\{sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m512h res, res1, res2; +volatile __m512h x1, x2; +volatile __mmask32 m32; + +void extern +avx512f_test (void) +{ + res = _mm512_min_ph (x1, x2); + res1 = _mm512_mask_min_ph (res1, m32, x1, x2); + res2 = _mm512_maskz_min_ph (m32, x1, x2); + + res = _mm512_min_round_ph (x1, x2, 8); + res1 = _mm512_mask_min_round_ph (res1, m32, x1, x2, 8); + res2 = _mm512_maskz_min_round_ph (m32, x1, x2, 8); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vminph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vminph-1b.c new file mode 100644 index 00000000000..3315ce13813 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vminph-1b.c @@ -0,0 +1,93 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS (AVX512F_LEN / 16) + +void NOINLINE +EMULATE(min_ph) (V512 * dest, V512 op1, V512 op2, + __mmask32 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + __mmask16 m1, m2; + + m1 = k & 
0xffff; + m2 = (k >> 16) & 0xffff; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(op2, &v3, &v4); + unpack_ph_2twops(*dest, &v7, &v8); + + for (i = 0; i < 16; i++) { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + v5.f32[i] = 0; + } + else { + v5.u32[i] = v7.u32[i]; + } + } + else { + v5.f32[i] = v1.f32[i] < v3.f32[i] ? v1.f32[i] : v3.f32[i]; + } + + if (((1 << i) & m2) == 0) { + if (zero_mask) { + v6.f32[i] = 0; + } + else { + v6.u32[i] = v8.u32[i]; + } + } + else { + v6.f32[i] = v2.f32[i] < v4.f32[i] ? v2.f32[i] : v4.f32[i]; + } + } + *dest = pack_twops_2ph(v5, v6); +} + +void +TEST (void) +{ + V512 res; + V512 exp; + + init_src(); + + EMULATE(min_ph) (&exp, src1, src2, NET_MASK, 0); + HF(res) = INTRINSIC (_min_ph) (HF(src1), HF(src2)); + CHECK_RESULT (&res, &exp, N_ELEMS, _min_ph); + + init_dest(&res, &exp); + EMULATE(min_ph) (&exp, src1, src2, MASK_VALUE, 0); + HF(res) = INTRINSIC (_mask_min_ph) (HF(res), MASK_VALUE, HF(src1), HF(src2)); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_min_ph); + + EMULATE(min_ph) (&exp, src1, src2, ZMASK_VALUE, 1); + HF(res) = INTRINSIC (_maskz_min_ph) (ZMASK_VALUE, HF(src1), HF(src2)); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_min_ph); + +#if AVX512F_LEN == 512 + EMULATE(min_ph) (&exp, src1, src2, NET_MASK, 0); + HF(res) = INTRINSIC (_min_round_ph) (HF(src1), HF(src2), 8); + CHECK_RESULT (&res, &exp, N_ELEMS, _min_ph); + + init_dest(&res, &exp); + EMULATE(min_ph) (&exp, src1, src2, MASK_VALUE, 0); + HF(res) = INTRINSIC (_mask_min_round_ph) (HF(res), MASK_VALUE, HF(src1), HF(src2), 8); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_min_ph); + + EMULATE(min_ph) (&exp, src1, src2, ZMASK_VALUE, 1); + HF(res) = INTRINSIC (_maskz_min_round_ph) (ZMASK_VALUE, HF(src1), HF(src2), 8); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_min_ph); +#endif + + if (n_errs != 0) { + abort (); + } +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vminsh-1.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vminsh-1.c new file mode 100644 index 
00000000000..9f1d6e7da4b --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vminsh-1.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vminsh\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminsh\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminsh\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminsh\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminsh\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminsh\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m128h res; +volatile __m128h x1, x2; +volatile __mmask16 m16; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res = _mm_min_sh (x1, x2); + res = _mm_mask_min_sh (res, m8, x1, x2); + res = _mm_maskz_min_sh (m8, x1, x2); + + res = _mm_min_round_sh (x1, x2, 8); + res = _mm_mask_min_round_sh (res, m8, x1, x2, 8); + res = _mm_maskz_min_round_sh (m8, x1, x2, 8); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vminsh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vminsh-1b.c new file mode 100644 index 00000000000..13b8d86689c --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vminsh-1b.c @@ -0,0 +1,72 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS 8 + +void NOINLINE 
+emulate_min_sh(V512 * dest, V512 op1, V512 op2,
+	       __mmask8 k, int zero_mask)
+{
+  V512 v1, v2, v3, v4, v5, v6, v7, v8;
+  int i;
+
+  unpack_ph_2twops(op1, &v1, &v2);
+  unpack_ph_2twops(op2, &v3, &v4);
+  unpack_ph_2twops(*dest, &v7, &v8);
+
+  /* Scalar op: only element 0 is computed; elements 1..7 copy src1.  */
+  if ((k&1) || !k)
+    v5.f32[0] = v1.f32[0] < v3.f32[0] ? v1.f32[0] : v3.f32[0];
+  else if (zero_mask)
+    v5.f32[0] = 0;
+  else
+    v5.f32[0] = v7.f32[0];
+
+  for (i = 1; i < 8; i++)
+    v5.f32[i] = v1.f32[i];
+
+  *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+test_512 (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  emulate_min_sh(&exp, src1, src2, 0x1, 0);
+  res.xmmh[0] = _mm_min_sh(src1.xmmh[0], src2.xmmh[0]);
+  check_results(&res, &exp, N_ELEMS, "_mm_min_sh");
+
+  init_dest(&res, &exp);
+  emulate_min_sh(&exp, src1, src2, 0x1, 0);
+  res.xmmh[0] = _mm_mask_min_sh(res.xmmh[0], 0x1, src1.xmmh[0], src2.xmmh[0]);
+  check_results(&res, &exp, N_ELEMS, "_mm_mask_min_sh");
+
+  emulate_min_sh(&exp, src1, src2, 0x3, 1);
+  res.xmmh[0] = _mm_maskz_min_sh(0x3, src1.xmmh[0], src2.xmmh[0]);
+  check_results(&res, &exp, N_ELEMS, "_mm_maskz_min_sh");
+
+  emulate_min_sh(&exp, src1, src2, 0x1, 0);
+  res.xmmh[0] = _mm_min_round_sh(src1.xmmh[0], src2.xmmh[0], 8);
+  check_results(&res, &exp, N_ELEMS, "_mm_min_round_sh");
+
+  init_dest(&res, &exp);
+  emulate_min_sh(&exp, src1, src2, 0x1, 0);
+  res.xmmh[0] = _mm_mask_min_round_sh(res.xmmh[0], 0x1, src1.xmmh[0], src2.xmmh[0], 8);
+  check_results(&res, &exp, N_ELEMS, "_mm_mask_min_round_sh");
+
+  emulate_min_sh(&exp, src1, src2, 0x3, 1);
+  res.xmmh[0] = _mm_maskz_min_round_sh(0x3, src1.xmmh[0], src2.xmmh[0], 8);
+  check_results(&res, &exp, N_ELEMS, "_mm_maskz_min_round_sh");
+
+  if (n_errs != 0)
+    abort ();
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vmaxph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vmaxph-1a.c
new file mode 100644
index 00000000000..adadc4ed8d0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vmaxph-1a.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* {
dg-options "-mavx512fp16 -mavx512vl -O2" } */ +/* { dg-final { scan-assembler-times "vmaxph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vmaxph\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vmaxph\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vmaxph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vmaxph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vmaxph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +volatile __m256h res1; +volatile __m128h res2; +volatile __m256h x1, x2; +volatile __m128h x3, x4; +volatile __mmask16 m16; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res1 = _mm256_max_ph (x1, x2); + res1 = _mm256_mask_max_ph (res1, m16, x1, x2); + res1 = _mm256_maskz_max_ph (m16, x1, x2); + + res2 = _mm_max_ph (x3, x4); + res2 = _mm_mask_max_ph (res2, m8, x3, x4); + res2 = _mm_maskz_max_ph (m8, x3, x4); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vmaxph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vmaxph-1b.c new file mode 100644 index 00000000000..f9a3b70d47c --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vmaxph-1b.c @@ -0,0 +1,16 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */ + +#define DEBUG +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include 
"avx512fp16-vmaxph-1b.c" + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vminph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vminph-1a.c new file mode 100644 index 00000000000..7909541aa34 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vminph-1a.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */ +/* { dg-final { scan-assembler-times "vminph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminph\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminph\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +volatile __m256h res1; +volatile __m128h res2; +volatile __m256h x1, x2; +volatile __m128h x3, x4; +volatile __mmask16 m16; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res1 = _mm256_min_ph (x1, x2); + res1 = _mm256_mask_min_ph (res1, m16, x1, x2); + res1 = _mm256_maskz_min_ph (m16, x1, x2); + + res2 = _mm_min_ph (x3, x4); + res2 = _mm_mask_min_ph (res2, m8, x3, x4); + res2 = _mm_maskz_min_ph (m8, x3, x4); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vminph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vminph-1b.c new file mode 100644 index 00000000000..98808b0eddd --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vminph-1b.c @@ -0,0 +1,16 @@ +/* { dg-do run { target 
avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */ + +#define DEBUG +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vminph-1b.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vminph-1b.c" + From patchwork Thu Jul 1 06:16:00 2021 X-Patchwork-Submitter: liuhongt X-Patchwork-Id: 1499321 To: gcc-patches@gcc.gnu.org Subject: [PATCH 14/62] AVX512FP16: Add vcmpph/vcmpsh/vcomish/vucomish. Date: Thu, 1 Jul 2021 14:16:00 +0800 Message-Id: <20210701061648.9447-15-hongtao.liu@intel.com> In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com> References: <20210701061648.9447-1-hongtao.liu@intel.com> From: liuhongt Cc: jakub@redhat.com gcc/ChangeLog: * config/i386/avx512fp16intrin.h (_mm512_cmp_ph_mask): New intrinsic. (_mm512_mask_cmp_ph_mask): Likewise. (_mm512_cmp_round_ph_mask): Likewise. (_mm512_mask_cmp_round_ph_mask): Likewise. (_mm_cmp_sh_mask): Likewise. (_mm_mask_cmp_sh_mask): Likewise. (_mm_cmp_round_sh_mask): Likewise. (_mm_mask_cmp_round_sh_mask): Likewise. (_mm_comieq_sh): Likewise. (_mm_comilt_sh): Likewise. (_mm_comile_sh): Likewise. (_mm_comigt_sh): Likewise. (_mm_comige_sh): Likewise. (_mm_comineq_sh): Likewise. (_mm_ucomieq_sh): Likewise. (_mm_ucomilt_sh): Likewise. (_mm_ucomile_sh): Likewise. (_mm_ucomigt_sh): Likewise. (_mm_ucomige_sh): Likewise. (_mm_ucomineq_sh): Likewise. (_mm_comi_round_sh): Likewise. (_mm_comi_sh): Likewise. * config/i386/avx512fp16vlintrin.h (_mm_cmp_ph_mask): New intrinsic. (_mm_mask_cmp_ph_mask): Likewise. (_mm256_cmp_ph_mask): Likewise. (_mm256_mask_cmp_ph_mask): Likewise. * config/i386/i386-builtin-types.def: Add corresponding builtin types. 
* config/i386/i386-builtin.def: Add corresponding new builtins. * config/i386/i386-expand.c (ix86_expand_args_builtin): Handle new builtin types. (ix86_expand_round_builtin): Ditto. * config/i386/i386.md (ssevecmode): Add HF mode. * config/i386/sse.md (V48H_AVX512VL): New mode iterator to support HF vector modes. Adjust corresponding description. (ssecmpintprefix): New. (VI12_AVX512VL): Adjust to support HF vector modes. (cmp_imm_predicate): Likewise. (<avx512>_cmp<mode>3<mask_scalar_merge_name><round_saeonly_name>): Likewise. (avx512f_vmcmp<mode>3<round_saeonly_name>): Likewise. (avx512f_vmcmp<mode>3_mask<round_saeonly_name>): Likewise. (<sse>_comi<round_saeonly_name>): Likewise. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add test for new builtins. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/sse-14.c: Add test for new intrinsics. * gcc.target/i386/sse-22.c: Ditto. --- gcc/config/i386/avx512fp16intrin.h | 250 +++++++++++++++++++++++++ gcc/config/i386/avx512fp16vlintrin.h | 50 +++++ gcc/config/i386/i386-builtin-types.def | 5 + gcc/config/i386/i386-builtin.def | 5 + gcc/config/i386/i386-expand.c | 10 + gcc/config/i386/i386.md | 2 +- gcc/config/i386/sse.md | 56 ++++-- gcc/testsuite/gcc.target/i386/avx-1.c | 7 + gcc/testsuite/gcc.target/i386/sse-13.c | 7 + gcc/testsuite/gcc.target/i386/sse-14.c | 16 ++ gcc/testsuite/gcc.target/i386/sse-22.c | 16 ++ gcc/testsuite/gcc.target/i386/sse-23.c | 7 + 12 files changed, 413 insertions(+), 18 deletions(-) diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h index c232419b4db..ed8ad84a105 100644 --- a/gcc/config/i386/avx512fp16intrin.h +++ b/gcc/config/i386/avx512fp16intrin.h @@ -985,6 +985,256 @@ _mm_maskz_min_round_sh (__mmask8 __A, __m128h __B, __m128h __C, #endif /* __OPTIMIZE__ */ +/* vcmpph */ +#ifdef __OPTIMIZE__ +extern __inline __mmask32 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cmp_ph_mask (__m512h __A, __m512h __B, const int __C) +{ + return (__mmask32) __builtin_ia32_vcmpph_v32hf_mask (__A, __B, __C, + (__mmask32) -1); +} + +extern 
__inline __mmask32 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cmp_ph_mask (__mmask32 __A, __m512h __B, __m512h __C, + const int __D) +{ + return (__mmask32) __builtin_ia32_vcmpph_v32hf_mask (__B, __C, __D, + __A); +} + +extern __inline __mmask32 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cmp_round_ph_mask (__m512h __A, __m512h __B, const int __C, + const int __D) +{ + return (__mmask32) __builtin_ia32_vcmpph_v32hf_mask_round (__A, __B, + __C, (__mmask32) -1, + __D); +} + +extern __inline __mmask32 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cmp_round_ph_mask (__mmask32 __A, __m512h __B, __m512h __C, + const int __D, const int __E) +{ + return (__mmask32) __builtin_ia32_vcmpph_v32hf_mask_round (__B, __C, + __D, __A, + __E); +} + +#else +#define _mm512_cmp_ph_mask(A, B, C) \ + (__builtin_ia32_vcmpph_v32hf_mask ((A), (B), (C), (-1))) + +#define _mm512_mask_cmp_ph_mask(A, B, C, D) \ + (__builtin_ia32_vcmpph_v32hf_mask ((B), (C), (D), (A))) + +#define _mm512_cmp_round_ph_mask(A, B, C, D) \ + (__builtin_ia32_vcmpph_v32hf_mask_round ((A), (B), (C), (-1), (D))) + +#define _mm512_mask_cmp_round_ph_mask(A, B, C, D, E) \ + (__builtin_ia32_vcmpph_v32hf_mask_round ((B), (C), (D), (A), (E))) + +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vcmpsh. 
*/ +#ifdef __OPTIMIZE__ +extern __inline __mmask8 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmp_sh_mask (__m128h __A, __m128h __B, const int __C) +{ + return (__mmask8) + __builtin_ia32_vcmpsh_v8hf_mask_round (__A, __B, + __C, (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __mmask8 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cmp_sh_mask (__mmask8 __A, __m128h __B, __m128h __C, + const int __D) +{ + return (__mmask8) + __builtin_ia32_vcmpsh_v8hf_mask_round (__B, __C, + __D, __A, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __mmask8 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmp_round_sh_mask (__m128h __A, __m128h __B, const int __C, + const int __D) +{ + return (__mmask8) __builtin_ia32_vcmpsh_v8hf_mask_round (__A, __B, + __C, (__mmask8) -1, + __D); +} + +extern __inline __mmask8 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cmp_round_sh_mask (__mmask8 __A, __m128h __B, __m128h __C, + const int __D, const int __E) +{ + return (__mmask8) __builtin_ia32_vcmpsh_v8hf_mask_round (__B, __C, + __D, __A, + __E); +} + +#else +#define _mm_cmp_sh_mask(A, B, C) \ + (__builtin_ia32_vcmpsh_v8hf_mask_round ((A), (B), (C), (-1), \ + (_MM_FROUND_CUR_DIRECTION))) + +#define _mm_mask_cmp_sh_mask(A, B, C, D) \ + (__builtin_ia32_vcmpsh_v8hf_mask_round ((B), (C), (D), (A), \ + (_MM_FROUND_CUR_DIRECTION))) + +#define _mm_cmp_round_sh_mask(A, B, C, D) \ + (__builtin_ia32_vcmpsh_v8hf_mask_round ((A), (B), (C), (-1), (D))) + +#define _mm_mask_cmp_round_sh_mask(A, B, C, D, E) \ + (__builtin_ia32_vcmpsh_v8hf_mask_round ((B), (C), (D), (A), (E))) + +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vcomish. 
*/ +extern __inline int +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comieq_sh (__m128h __A, __m128h __B) +{ + return __builtin_ia32_vcmpsh_v8hf_mask_round (__A, __B, _CMP_EQ_OS, + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline int +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comilt_sh (__m128h __A, __m128h __B) +{ + return __builtin_ia32_vcmpsh_v8hf_mask_round (__A, __B, _CMP_LT_OS, + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline int +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comile_sh (__m128h __A, __m128h __B) +{ + return __builtin_ia32_vcmpsh_v8hf_mask_round (__A, __B, _CMP_LE_OS, + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline int +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comigt_sh (__m128h __A, __m128h __B) +{ + return __builtin_ia32_vcmpsh_v8hf_mask_round (__A, __B, _CMP_GT_OS, + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline int +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comige_sh (__m128h __A, __m128h __B) +{ + return __builtin_ia32_vcmpsh_v8hf_mask_round (__A, __B, _CMP_GE_OS, + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline int +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comineq_sh (__m128h __A, __m128h __B) +{ + return __builtin_ia32_vcmpsh_v8hf_mask_round (__A, __B, _CMP_NEQ_US, + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline int +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_ucomieq_sh (__m128h __A, __m128h __B) +{ + return __builtin_ia32_vcmpsh_v8hf_mask_round (__A, __B, _CMP_EQ_OQ, + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline int +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_ucomilt_sh (__m128h __A, __m128h __B) +{ + return __builtin_ia32_vcmpsh_v8hf_mask_round (__A, __B, 
_CMP_LT_OQ, + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline int +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_ucomile_sh (__m128h __A, __m128h __B) +{ + return __builtin_ia32_vcmpsh_v8hf_mask_round (__A, __B, _CMP_LE_OQ, + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline int +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_ucomigt_sh (__m128h __A, __m128h __B) +{ + return __builtin_ia32_vcmpsh_v8hf_mask_round (__A, __B, _CMP_GT_OQ, + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline int +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_ucomige_sh (__m128h __A, __m128h __B) +{ + return __builtin_ia32_vcmpsh_v8hf_mask_round (__A, __B, _CMP_GE_OQ, + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline int +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_ucomineq_sh (__m128h __A, __m128h __B) +{ + return __builtin_ia32_vcmpsh_v8hf_mask_round (__A, __B, _CMP_NEQ_UQ, + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +#ifdef __OPTIMIZE__ +extern __inline int +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) + _mm_comi_sh (__m128h __A, __m128h __B, const int __P) +{ + return __builtin_ia32_vcmpsh_v8hf_mask_round (__A, __B, __P, + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline int +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comi_round_sh (__m128h __A, __m128h __B, const int __P, const int __R) +{ + return __builtin_ia32_vcmpsh_v8hf_mask_round (__A, __B, __P, + (__mmask8) -1,__R); +} + +#else +#define _mm_comi_round_sh(A, B, P, R) \ + (__builtin_ia32_vcmpsh_v8hf_mask_round ((A), (B), (P), (__mmask8) (-1), (R))) +#define _mm_comi_sh(A, B, P) \ + (__builtin_ia32_vcmpsh_v8hf_mask_round ((A), (B), (P), (__mmask8) (-1), \ + _MM_FROUND_CUR_DIRECTION)) + +#endif /* __OPTIMIZE__ */ + #ifdef __DISABLE_AVX512FP16__ #undef __DISABLE_AVX512FP16__ #pragma 
GCC pop_options diff --git a/gcc/config/i386/avx512fp16vlintrin.h b/gcc/config/i386/avx512fp16vlintrin.h index bd60b4cd4ca..1787ed5f4ff 100644 --- a/gcc/config/i386/avx512fp16vlintrin.h +++ b/gcc/config/i386/avx512fp16vlintrin.h @@ -308,6 +308,56 @@ _mm256_maskz_min_ph (__mmask16 __A, __m256h __B, __m256h __C) _mm256_setzero_ph (), __A); } +/* vcmpph */ +#ifdef __OPTIMIZE__ +extern __inline __mmask8 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmp_ph_mask (__m128h __A, __m128h __B, const int __C) +{ + return (__mmask8) __builtin_ia32_vcmpph_v8hf_mask (__A, __B, __C, + (__mmask8) -1); +} + +extern __inline __mmask8 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cmp_ph_mask (__mmask8 __A, __m128h __B, __m128h __C, + const int __D) +{ + return (__mmask8) __builtin_ia32_vcmpph_v8hf_mask (__B, __C, __D, __A); +} + +extern __inline __mmask16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cmp_ph_mask (__m256h __A, __m256h __B, const int __C) +{ + return (__mmask16) __builtin_ia32_vcmpph_v16hf_mask (__A, __B, __C, + (__mmask16) -1); +} + +extern __inline __mmask16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cmp_ph_mask (__mmask16 __A, __m256h __B, __m256h __C, + const int __D) +{ + return (__mmask16) __builtin_ia32_vcmpph_v16hf_mask (__B, __C, __D, + __A); +} + +#else +#define _mm_cmp_ph_mask(A, B, C) \ + (__builtin_ia32_vcmpph_v8hf_mask ((A), (B), (C), (-1))) + +#define _mm_mask_cmp_ph_mask(A, B, C, D) \ + (__builtin_ia32_vcmpph_v8hf_mask ((B), (C), (D), (A))) + +#define _mm256_cmp_ph_mask(A, B, C) \ + (__builtin_ia32_vcmpph_v16hf_mask ((A), (B), (C), (-1))) + +#define _mm256_mask_cmp_ph_mask(A, B, C, D) \ + (__builtin_ia32_vcmpph_v16hf_mask ((B), (C), (D), (A))) + +#endif /* __OPTIMIZE__ */ + #ifdef __DISABLE_AVX512FP16VL__ +#undef __DISABLE_AVX512FP16VL__ +#pragma GCC pop_options diff --git a/gcc/config/i386/i386-builtin-types.def 
b/gcc/config/i386/i386-builtin-types.def index 3bd2670e229..e3070ad00bd 100644 --- a/gcc/config/i386/i386-builtin-types.def +++ b/gcc/config/i386/i386-builtin-types.def @@ -1306,10 +1306,15 @@ DEF_FUNCTION_TYPE (UINT8, PV2DI, PCV2DI, PCVOID) DEF_FUNCTION_TYPE (V8HF, V8HI) DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF) DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, INT) +DEF_FUNCTION_TYPE (UQI, V8HF, V8HF, INT, UQI) DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, V8HF, UQI) +DEF_FUNCTION_TYPE (UQI, V8HF, V8HF, INT, UQI, INT) DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, V8HF, UQI, INT) DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF) +DEF_FUNCTION_TYPE (UHI, V16HF, V16HF, INT, UHI) DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF, V16HF, UHI) DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, INT) +DEF_FUNCTION_TYPE (USI, V32HF, V32HF, INT, USI) DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, V32HF, USI) +DEF_FUNCTION_TYPE (USI, V32HF, V32HF, INT, USI, INT) DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, V32HF, USI, INT) diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def index 28e5627ca4c..045cf561ec7 100644 --- a/gcc/config/i386/i386-builtin.def +++ b/gcc/config/i386/i386-builtin.def @@ -2799,6 +2799,9 @@ BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_sminv16hf BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_sminv32hf3_mask, "__builtin_ia32_vminph_v32hf_mask", IX86_BUILTIN_VMINPH_V32HF_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsmaxv8hf3_mask, "__builtin_ia32_vmaxsh_v8hf_mask", IX86_BUILTIN_VMAXSH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsminv8hf3_mask, "__builtin_ia32_vminsh_v8hf_mask", IX86_BUILTIN_VMINSH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_cmpv8hf3_mask, "__builtin_ia32_vcmpph_v8hf_mask", IX86_BUILTIN_VCMPPH_V8HF_MASK, UNKNOWN, (int) 
UQI_FTYPE_V8HF_V8HF_INT_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_cmpv16hf3_mask, "__builtin_ia32_vcmpph_v16hf_mask", IX86_BUILTIN_VCMPPH_V16HF_MASK, UNKNOWN, (int) UHI_FTYPE_V16HF_V16HF_INT_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_cmpv32hf3_mask, "__builtin_ia32_vcmpph_v32hf_mask", IX86_BUILTIN_VCMPPH_V32HF_MASK, UNKNOWN, (int) USI_FTYPE_V32HF_V32HF_INT_USI) /* Builtins with rounding support. */ BDESC_END (ARGS, ROUND_ARGS) @@ -3012,6 +3015,8 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_smaxv32hf3_mask_round, "__builti BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_sminv32hf3_mask_round, "__builtin_ia32_vminph_v32hf_mask_round", IX86_BUILTIN_VMINPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsmaxv8hf3_mask_round, "__builtin_ia32_vmaxsh_v8hf_mask_round", IX86_BUILTIN_VMAXSH_V8HF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsminv8hf3_mask_round, "__builtin_ia32_vminsh_v8hf_mask_round", IX86_BUILTIN_VMINSH_V8HF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_cmpv32hf3_mask_round, "__builtin_ia32_vcmpph_v32hf_mask_round", IX86_BUILTIN_VCMPPH_V32HF_MASK_ROUND, UNKNOWN, (int) USI_FTYPE_V32HF_V32HF_INT_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmcmpv8hf3_mask_round, "__builtin_ia32_vcmpsh_v8hf_mask_round", IX86_BUILTIN_VCMPSH_V8HF_MASK_ROUND, UNKNOWN, (int) UQI_FTYPE_V8HF_V8HF_INT_UQI_INT) BDESC_END (ROUND_ARGS, MULTI_ARG) diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c index 90f8e3a6d4c..a79cc324ceb 100644 --- a/gcc/config/i386/i386-expand.c +++ b/gcc/config/i386/i386-expand.c @@ -9821,14 +9821,17 @@ ix86_expand_args_builtin (const struct builtin_description *d, case UQI_FTYPE_V8SI_V8SI_INT_UQI: case 
QI_FTYPE_V4DF_V4DF_INT_UQI: case QI_FTYPE_V8SF_V8SF_INT_UQI: + case UHI_FTYPE_V16HF_V16HF_INT_UHI: case UQI_FTYPE_V2DI_V2DI_INT_UQI: case UQI_FTYPE_V4SI_V4SI_INT_UQI: case UQI_FTYPE_V2DF_V2DF_INT_UQI: case UQI_FTYPE_V4SF_V4SF_INT_UQI: + case UQI_FTYPE_V8HF_V8HF_INT_UQI: case UDI_FTYPE_V64QI_V64QI_INT_UDI: case USI_FTYPE_V32QI_V32QI_INT_USI: case UHI_FTYPE_V16QI_V16QI_INT_UHI: case USI_FTYPE_V32HI_V32HI_INT_USI: + case USI_FTYPE_V32HF_V32HF_INT_USI: case UHI_FTYPE_V16HI_V16HI_INT_UHI: case UQI_FTYPE_V8HI_V8HI_INT_UQI: nargs = 4; @@ -10112,6 +10115,9 @@ ix86_expand_args_builtin (const struct builtin_description *d, case CODE_FOR_avx512f_cmpv16sf3_mask: case CODE_FOR_avx512f_vmcmpv2df3_mask: case CODE_FOR_avx512f_vmcmpv4sf3_mask: + case CODE_FOR_avx512bw_cmpv32hf3_mask: + case CODE_FOR_avx512vl_cmpv16hf3_mask: + case CODE_FOR_avx512fp16_cmpv8hf3_mask: error ("the last argument must be a 5-bit immediate"); return const0_rtx; @@ -10532,6 +10538,8 @@ ix86_expand_round_builtin (const struct builtin_description *d, case UQI_FTYPE_V2DF_V2DF_INT_UQI_INT: case UHI_FTYPE_V16SF_V16SF_INT_UHI_INT: case UQI_FTYPE_V4SF_V4SF_INT_UQI_INT: + case USI_FTYPE_V32HF_V32HF_INT_USI_INT: + case UQI_FTYPE_V8HF_V8HF_INT_UQI_INT: nargs_constant = 3; nargs = 5; break; @@ -10587,6 +10595,8 @@ ix86_expand_round_builtin (const struct builtin_description *d, case CODE_FOR_avx512f_cmpv16sf3_mask_round: case CODE_FOR_avx512f_vmcmpv2df3_mask_round: case CODE_FOR_avx512f_vmcmpv4sf3_mask_round: + case CODE_FOR_avx512f_vmcmpv8hf3_mask_round: + case CODE_FOR_avx512bw_cmpv32hf3_mask_round: error ("the immediate argument must be a 5-bit immediate"); return const0_rtx; default: diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 25cee502f97..014aba187e1 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -1252,7 +1252,7 @@ (define_mode_attr ssevecmodesuffix [(SF "ps") (DF "pd")]) ;; SSE vector mode corresponding to a scalar mode (define_mode_attr ssevecmode - [(QI "V16QI") 
(HI "V8HI") (SI "V4SI") (DI "V2DI") (SF "V4SF") (DF "V2DF")]) + [(QI "V16QI") (HI "V8HI") (SI "V4SI") (DI "V2DI") (HF "V8HF") (SF "V4SF") (DF "V2DF")]) (define_mode_attr ssevecmodelower [(QI "v16qi") (HI "v8hi") (SI "v4si") (DI "v2di") (SF "v4sf") (DF "v2df")]) diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 976803f2a1d..b7e22e0ec80 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -230,13 +230,23 @@ (define_mode_iterator VMOVE (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") V2DF]) -;; All AVX-512{F,VL} vector modes. Supposed TARGET_AVX512F baseline. +;; All AVX-512{F,VL} vector modes without HF. Supposed TARGET_AVX512F baseline. (define_mode_iterator V48_AVX512VL [V16SI (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL") V8DI (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL") V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL") V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")]) +;; All AVX-512{F,VL} vector modes. Supposed TARGET_AVX512F baseline. +(define_mode_iterator V48H_AVX512VL + [V16SI (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL") + V8DI (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL") + (V32HF "TARGET_AVX512FP16") + (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL") + (V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL") + V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL") + V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")]) + ;; 1,2 byte AVX-512{BW,VL} vector modes. Supposed TARGET_AVX512BW baseline. 
(define_mode_iterator VI12_AVX512VL [V64QI (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL") @@ -974,10 +984,10 @@ (define_mode_attr iptr [(V64QI "b") (V32HI "w") (V16SI "k") (V8DI "q") (V32QI "b") (V16HI "w") (V8SI "k") (V4DI "q") (V16QI "b") (V8HI "w") (V4SI "k") (V2DI "q") - (V16SF "k") (V8DF "q") - (V8SF "k") (V4DF "q") - (V4SF "k") (V2DF "q") - (SF "k") (DF "q")]) + (V32HF "w") (V16SF "k") (V8DF "q") + (V16HF "w") (V8SF "k") (V4DF "q") + (V8HF "w") (V4SF "k") (V2DF "q") + (HF "w") (SF "k") (DF "q")]) ;; Mapping of vector modes to VPTERNLOG suffix (define_mode_attr ternlogsuffix @@ -1024,6 +1034,18 @@ (define_mode_attr sseintprefix (V32QI "p") (V16HI "p") (V16HF "p") (V64QI "p") (V32HI "p") (V32HF "p")]) +;; SSE prefix for integer and HF vector comparison. +(define_mode_attr ssecmpintprefix + [(V2DI "p") (V2DF "") + (V4DI "p") (V4DF "") + (V8DI "p") (V8DF "") + (V4SI "p") (V4SF "") + (V8SI "p") (V8SF "") + (V16SI "p") (V16SF "") + (V16QI "p") (V8HI "p") (V8HF "") + (V32QI "p") (V16HI "p") (V16HF "") + (V64QI "p") (V32HI "p") (V32HF "")]) + ;; SSE scalar suffix for vector modes (define_mode_attr ssescalarmodesuffix [(HF "sh") (SF "ss") (DF "sd") @@ -3263,11 +3285,11 @@ (define_insn "_vmmaskcmp3" (set_attr "mode" "")]) (define_mode_attr cmp_imm_predicate - [(V16SF "const_0_to_31_operand") (V8DF "const_0_to_31_operand") + [(V32HF "const_0_to_31_operand") (V16SF "const_0_to_31_operand") (V8DF "const_0_to_31_operand") (V16SI "const_0_to_7_operand") (V8DI "const_0_to_7_operand") - (V8SF "const_0_to_31_operand") (V4DF "const_0_to_31_operand") + (V16HF "const_0_to_31_operand") (V8SF "const_0_to_31_operand") (V4DF "const_0_to_31_operand") (V8SI "const_0_to_7_operand") (V4DI "const_0_to_7_operand") - (V4SF "const_0_to_31_operand") (V2DF "const_0_to_31_operand") + (V8HF "const_0_to_31_operand") (V4SF "const_0_to_31_operand") (V2DF "const_0_to_31_operand") (V4SI "const_0_to_7_operand") (V2DI "const_0_to_7_operand") (V32HI "const_0_to_7_operand") (V64QI 
"const_0_to_7_operand") (V16HI "const_0_to_7_operand") (V32QI "const_0_to_7_operand") @@ -3276,12 +3298,12 @@ (define_mode_attr cmp_imm_predicate (define_insn "_cmp3" [(set (match_operand: 0 "register_operand" "=k") (unspec: - [(match_operand:V48_AVX512VL 1 "register_operand" "v") - (match_operand:V48_AVX512VL 2 "nonimmediate_operand" "") + [(match_operand:V48H_AVX512VL 1 "register_operand" "v") + (match_operand:V48H_AVX512VL 2 "nonimmediate_operand" "") (match_operand:SI 3 "" "n")] UNSPEC_PCMP))] "TARGET_AVX512F && " - "vcmp\t{%3, %2, %1, %0|%0, %1, %2, %3}" + "vcmp\t{%3, %2, %1, %0|%0, %1, %2, %3}" [(set_attr "type" "ssecmp") (set_attr "length_immediate" "1") (set_attr "prefix" "evex") @@ -3428,8 +3450,8 @@ (define_insn "avx512f_vmcmp3" [(set (match_operand: 0 "register_operand" "=k") (and: (unspec: - [(match_operand:VF_128 1 "register_operand" "v") - (match_operand:VF_128 2 "" "") + [(match_operand:VFH_128 1 "register_operand" "v") + (match_operand:VFH_128 2 "" "") (match_operand:SI 3 "const_0_to_31_operand" "n")] UNSPEC_PCMP) (const_int 1)))] @@ -3444,8 +3466,8 @@ (define_insn "avx512f_vmcmp3_mask" [(set (match_operand: 0 "register_operand" "=k") (and: (unspec: - [(match_operand:VF_128 1 "register_operand" "v") - (match_operand:VF_128 2 "" "") + [(match_operand:VFH_128 1 "register_operand" "v") + (match_operand:VFH_128 2 "" "") (match_operand:SI 3 "const_0_to_31_operand" "n")] UNSPEC_PCMP) (and: @@ -3461,10 +3483,10 @@ (define_insn "avx512f_vmcmp3_mask" (define_insn "_comi" [(set (reg:CCFP FLAGS_REG) (compare:CCFP - (vec_select:MODEF + (vec_select:MODEFH (match_operand: 0 "register_operand" "v") (parallel [(const_int 0)])) - (vec_select:MODEF + (vec_select:MODEFH (match_operand: 1 "" "") (parallel [(const_int 0)]))))] "SSE_FLOAT_MODE_P (mode)" diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c index 7106076b2a3..d9aa8a70e35 100644 --- a/gcc/testsuite/gcc.target/i386/avx-1.c +++ b/gcc/testsuite/gcc.target/i386/avx-1.c @@ 
-698,6 +698,13 @@ #define __builtin_ia32_vminph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vminph_v32hf_mask_round(A, B, C, D, 8) #define __builtin_ia32_vmaxsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vmaxsh_v8hf_mask_round(A, B, C, D, 8) #define __builtin_ia32_vminsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vminsh_v8hf_mask_round(A, B, C, D, 8) +#define __builtin_ia32_vcmpph_v32hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v32hf_mask(A, B, 1, D) +#define __builtin_ia32_vcmpph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vcmpph_v32hf_mask_round(A, B, 1, D, 8) +#define __builtin_ia32_vcmpsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vcmpsh_v8hf_mask_round(A, B, 1, D, 8) + +/* avx512fp16vlintrin.h */ +#define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D) +#define __builtin_ia32_vcmpph_v16hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v16hf_mask(A, B, 1, D) /* vpclmulqdqintrin.h */ #define __builtin_ia32_vpclmulqdq_v4di(A, B, C) __builtin_ia32_vpclmulqdq_v4di(A, B, 1) diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c index 1732b50be6b..9a2833d78f2 100644 --- a/gcc/testsuite/gcc.target/i386/sse-13.c +++ b/gcc/testsuite/gcc.target/i386/sse-13.c @@ -715,6 +715,13 @@ #define __builtin_ia32_vminph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vminph_v32hf_mask_round(A, B, C, D, 8) #define __builtin_ia32_vmaxsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vmaxsh_v8hf_mask_round(A, B, C, D, 8) #define __builtin_ia32_vminsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vminsh_v8hf_mask_round(A, B, C, D, 8) +#define __builtin_ia32_vcmpph_v32hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v32hf_mask(A, B, 1, D) +#define __builtin_ia32_vcmpph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vcmpph_v32hf_mask_round(A, B, 1, D, 8) +#define __builtin_ia32_vcmpsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vcmpsh_v8hf_mask_round(A, B, 1, D, 8) + +/* avx512fp16vlintrin.h */ +#define 
__builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D) +#define __builtin_ia32_vcmpph_v16hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v16hf_mask(A, B, 1, D) /* vpclmulqdqintrin.h */ #define __builtin_ia32_vpclmulqdq_v4di(A, B, C) __builtin_ia32_vpclmulqdq_v4di(A, B, 1) diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c index 135b4463941..ce0ad71f190 100644 --- a/gcc/testsuite/gcc.target/i386/sse-14.c +++ b/gcc/testsuite/gcc.target/i386/sse-14.c @@ -286,6 +286,7 @@ test_2 (_mm_add_round_sd, __m128d, __m128d, __m128d, 9) test_2 (_mm_add_round_ss, __m128, __m128, __m128, 9) test_2 (_mm_cmp_sd_mask, __mmask8, __m128d, __m128d, 1) test_2 (_mm_cmp_ss_mask, __mmask8, __m128, __m128, 1) +test_2 (_mm_cmp_sh_mask, __mmask8, __m128h, __m128h, 1) #ifdef __x86_64__ test_2 (_mm_cvt_roundi64_sd, __m128d, __m128d, long long, 9) test_2 (_mm_cvt_roundi64_ss, __m128, __m128, long long, 9) @@ -470,6 +471,7 @@ test_3 (_mm256_maskz_shldi_epi64, __m256i, __mmask8, __m256i, __m256i, 1) test_3 (_mm_maskz_shldi_epi16, __m128i, __mmask8, __m128i, __m128i, 1) test_3 (_mm_maskz_shldi_epi32, __m128i, __mmask8, __m128i, __m128i, 1) test_3 (_mm_maskz_shldi_epi64, __m128i, __mmask8, __m128i, __m128i, 1) +test_3 (_mm_mask_cmp_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1) test_3v (_mm512_i32scatter_epi32, void *, __m512i, __m512i, 1) test_3v (_mm512_i32scatter_epi64, void *, __m256i, __m512i, 1) test_3v (_mm512_i32scatter_pd, void *, __m256i, __m512d, 1) @@ -680,6 +682,11 @@ test_2 (_mm512_max_round_ph, __m512h, __m512h, __m512h, 8) test_2 (_mm512_min_round_ph, __m512h, __m512h, __m512h, 8) test_2 (_mm_max_round_sh, __m128h, __m128h, __m128h, 8) test_2 (_mm_min_round_sh, __m128h, __m128h, __m128h, 8) +test_2 (_mm512_cmp_ph_mask, __mmask32, __m512h, __m512h, 1) +test_2 (_mm_comi_sh, int, __m128h, __m128h, 1) +test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8) +test_2x (_mm_cmp_round_sh_mask, __mmask8, 
__m128h, __m128h, 1, 8) +test_2x (_mm_comi_round_sh, int, __m128h, __m128h, 1, 8) test_3 (_mm512_maskz_add_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) test_3 (_mm512_maskz_sub_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) test_3 (_mm512_maskz_mul_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) @@ -692,6 +699,9 @@ test_3 (_mm512_maskz_max_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) test_3 (_mm512_maskz_min_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) test_3 (_mm_maskz_max_round_sh, __m128h, __mmask8, __m128h, __m128h, 8) test_3 (_mm_maskz_min_round_sh, __m128h, __mmask8, __m128h, __m128h, 8) +test_3 (_mm512_mask_cmp_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1) +test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8) +test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8) test_4 (_mm512_mask_add_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8) test_4 (_mm512_mask_sub_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8) test_4 (_mm512_mask_mul_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8) @@ -705,6 +715,12 @@ test_4 (_mm512_mask_min_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, test_4 (_mm_mask_max_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8) test_4 (_mm_mask_min_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8) +/* avx512fp16vlintrin.h */ +test_2 (_mm_cmp_ph_mask, __mmask8, __m128h, __m128h, 1) +test_2 (_mm256_cmp_ph_mask, __mmask16, __m256h, __m256h, 1) +test_3 (_mm_mask_cmp_ph_mask, __mmask8, __mmask8, __m128h, __m128h, 1) +test_3 (_mm256_mask_cmp_ph_mask, __mmask16, __mmask16, __m256h, __m256h, 1) + /* shaintrin.h */ test_2 (_mm_sha1rnds4_epu32, __m128i, __m128i, __m128i, 1) diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c index da3f5606207..439346490bd 100644 --- a/gcc/testsuite/gcc.target/i386/sse-22.c +++ b/gcc/testsuite/gcc.target/i386/sse-22.c @@ -457,6 +457,7 
@@ test_2 (_mm256_shldi_epi64, __m256i, __m256i, __m256i, 1) test_2 (_mm_shldi_epi16, __m128i, __m128i, __m128i, 1) test_2 (_mm_shldi_epi32, __m128i, __m128i, __m128i, 1) test_2 (_mm_shldi_epi64, __m128i, __m128i, __m128i, 1) +test_2 (_mm_cmp_sh_mask, __mmask8, __m128h, __m128h, 1) #ifdef __x86_64__ test_2 (_mm_cvt_roundi64_sd, __m128d, __m128d, long long, 9) test_2 (_mm_cvt_roundi64_ss, __m128, __m128, long long, 9) @@ -581,6 +582,7 @@ test_3 (_mm256_maskz_shldi_epi64, __m256i, __mmask8, __m256i, __m256i, 1) test_3 (_mm_maskz_shldi_epi16, __m128i, __mmask8, __m128i, __m128i, 1) test_3 (_mm_maskz_shldi_epi32, __m128i, __mmask8, __m128i, __m128i, 1) test_3 (_mm_maskz_shldi_epi64, __m128i, __mmask8, __m128i, __m128i, 1) +test_3 (_mm_mask_cmp_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1) test_3v (_mm512_i32scatter_epi32, void *, __m512i, __m512i, 1) test_3v (_mm512_i32scatter_epi64, void *, __m256i, __m512i, 1) test_3v (_mm512_i32scatter_pd, void *, __m256i, __m512d, 1) @@ -785,6 +787,11 @@ test_2 (_mm512_max_round_ph, __m512h, __m512h, __m512h, 8) test_2 (_mm512_min_round_ph, __m512h, __m512h, __m512h, 8) test_2 (_mm_max_round_sh, __m128h, __m128h, __m128h, 8) test_2 (_mm_min_round_sh, __m128h, __m128h, __m128h, 8) +test_2 (_mm512_cmp_ph_mask, __mmask32, __m512h, __m512h, 1) +test_2 (_mm_comi_sh, int, __m128h, __m128h, 1) +test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8) +test_2x (_mm_cmp_round_sh_mask, __mmask8, __m128h, __m128h, 1, 8) +test_2x (_mm_comi_round_sh, int, __m128h, __m128h, 1, 8) test_3 (_mm512_maskz_add_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) test_3 (_mm512_maskz_sub_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) test_3 (_mm512_maskz_mul_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) @@ -797,6 +804,9 @@ test_3 (_mm512_maskz_max_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) test_3 (_mm512_maskz_min_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) test_3 (_mm_maskz_max_round_sh, __m128h, __mmask8, 
__m128h, __m128h, 8) test_3 (_mm_maskz_min_round_sh, __m128h, __mmask8, __m128h, __m128h, 8) +test_3 (_mm512_mask_cmp_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1) +test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8) +test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8) test_4 (_mm512_mask_add_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8) test_4 (_mm512_mask_sub_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8) test_4 (_mm512_mask_mul_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8) @@ -810,6 +820,12 @@ test_4 (_mm512_mask_min_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, test_4 (_mm_mask_max_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8) test_4 (_mm_mask_min_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8) +/* avx512fp16vlintrin.h */ +test_2 (_mm_cmp_ph_mask, __mmask8, __m128h, __m128h, 1) +test_2 (_mm256_cmp_ph_mask, __mmask16, __m256h, __m256h, 1) +test_3 (_mm_mask_cmp_ph_mask, __mmask8, __mmask8, __m128h, __m128h, 1) +test_3 (_mm256_mask_cmp_ph_mask, __mmask16, __mmask16, __m256h, __m256h, 1) + /* shaintrin.h */ test_2 (_mm_sha1rnds4_epu32, __m128i, __m128i, __m128i, 1) diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c index c3fee655288..f6768bac345 100644 --- a/gcc/testsuite/gcc.target/i386/sse-23.c +++ b/gcc/testsuite/gcc.target/i386/sse-23.c @@ -716,6 +716,13 @@ #define __builtin_ia32_vminph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vminph_v32hf_mask_round(A, B, C, D, 8) #define __builtin_ia32_vmaxsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vmaxsh_v8hf_mask_round(A, B, C, D, 8) #define __builtin_ia32_vminsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vminsh_v8hf_mask_round(A, B, C, D, 8) +#define __builtin_ia32_vcmpph_v32hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v32hf_mask(A, B, 1, D) +#define __builtin_ia32_vcmpph_v32hf_mask_round(A, B, C, D, E) 
__builtin_ia32_vcmpph_v32hf_mask_round(A, B, 1, D, 8) +#define __builtin_ia32_vcmpsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vcmpsh_v8hf_mask_round(A, B, 1, D, 8) + +/* avx512fp16vlintrin.h */ +#define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D) +#define __builtin_ia32_vcmpph_v16hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v16hf_mask(A, B, 1, D) /* vpclmulqdqintrin.h */ #define __builtin_ia32_vpclmulqdq_v4di(A, B, C) __builtin_ia32_vpclmulqdq_v4di(A, B, 1)

From patchwork Thu Jul 1 06:16:01 2021
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 15/62] AVX512FP16: Add testcase for vcmpph/vcmpsh/vcomish/vucomish.
Date: Thu, 1 Jul 2021 14:16:01 +0800
Message-Id: <20210701061648.9447-16-hongtao.liu@intel.com>
In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com>
References: <20210701061648.9447-1-hongtao.liu@intel.com>
From: liuhongt
Cc: jakub@redhat.com

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-helper.h (check_results_mask): New
	check function.
	* gcc.target/i386/avx512fp16-vcmpph-1a.c: New test.
	* gcc.target/i386/avx512fp16-vcmpph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vcmpsh-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vcmpsh-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vcomish-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vcomish-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vcomish-1c.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcmpph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcmpph-1b.c: Ditto.
--- .../gcc.target/i386/avx512fp16-helper.h | 37 ++++++++++ .../gcc.target/i386/avx512fp16-vcmpph-1a.c | 22 ++++++ .../gcc.target/i386/avx512fp16-vcmpph-1b.c | 70 +++++++++++++++++++ .../gcc.target/i386/avx512fp16-vcmpsh-1a.c | 21 ++++++ .../gcc.target/i386/avx512fp16-vcmpsh-1b.c | 45 ++++++++++++ .../gcc.target/i386/avx512fp16-vcomish-1a.c | 41 +++++++++++ .../gcc.target/i386/avx512fp16-vcomish-1b.c | 66 +++++++++++++++++ .../gcc.target/i386/avx512fp16-vcomish-1c.c | 66 +++++++++++++++++ .../gcc.target/i386/avx512fp16vl-vcmpph-1a.c | 24 +++++++ .../gcc.target/i386/avx512fp16vl-vcmpph-1b.c | 16 +++++ 10 files changed, 408 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcmpph-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcmpph-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcmpsh-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcmpsh-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcomish-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcomish-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcomish-1c.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcmpph-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcmpph-1b.c diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h b/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h index 9fde88a4f7b..5d3539bf312 100644 --- a/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h @@ -107,6 +107,10 @@ display_ps(const void *p, const char *banner, int n_elems) check_results ((void*)res, (void*)exp, size,\ NAME_OF(intrin)) +#define CHECK_RESULT_MASK(res, exp, size, intrin) \ + check_results_mask ((__mmask32)res, (__mmask32)exp, size,\ + NAME_OF(intrin)) + /* To evaluate whether result match _Float16 precision, only the last bit of real/emulate result could be different. 
*/ @@ -136,6 +140,18 @@ check_results(void *got, void *exp, int n_elems, char *banner) } } +void NOINLINE +check_results_mask(__mmask32 got, __mmask32 exp, int n_elems, char *banner) +{ + if (got != exp) { +#ifdef DEBUG + printf("ERROR: %s failed : got mask %x != exp mask %x\n", + banner ? banner : "", got, exp); +#endif + n_errs++; + } +} + /* Functions for src/dest initialization */ void NOINLINE init_src() @@ -156,6 +172,27 @@ init_src() src2 = pack_twops_2ph(v3, v4); } +void NOINLINE +init_src_nanf() +{ + V512 v1, v2, v3, v4; + int i; + + for (i = 0; i < 16; i++) { + v1.f32[i] = i + 1 + 0.5; + v2.f32[i] = i + 17 + 0.5; + v3.f32[i] = i * 2 + 2 + 0.5; + v4.f32[i] = i * 2 + 34 + 0.5; + + src3.u32[i] = (i + 1) * 10; + } + + v1.f32[0] = __builtin_nanf(""); + src1 = pack_twops_2ph(v1, v2); + src2 = pack_twops_2ph(v3, v4); +} + + void NOINLINE init_dest(V512 * res, V512 * exp) { diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcmpph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcmpph-1a.c new file mode 100644 index 00000000000..6425c4644c1 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcmpph-1a.c @@ -0,0 +1,22 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vcmpph\[ \\t\]+\\\$1\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%k\[0-9\](?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcmpph\[ \\t\]+\\\$2\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%k\[0-9\]\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcmpph\[ \\t\]+\\\$3\[^\n\r]*\{sae\}\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%k\[0-9\]\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcmpph\[ \\t\]+\[^\{\n\]*\\\$4\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%k\[0-9\]\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __mmask32 res, res1, res2; +volatile __m512h x1, x2; +volatile __mmask32 m32; +volatile 
__mmask8 m8; + +void extern +avx512f_test (void) +{ + res = _mm512_cmp_ph_mask (x1, x2, 1); + res1 = _mm512_mask_cmp_ph_mask (m32, x1, x2, 2); + res = _mm512_cmp_round_ph_mask (x1, x2, 3, 8); + res1 = _mm512_mask_cmp_round_ph_mask (m32, x1, x2, 4, 4); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcmpph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcmpph-1b.c new file mode 100644 index 00000000000..ec5eccfccb7 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcmpph-1b.c @@ -0,0 +1,70 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS (AVX512F_LEN / 16) + +__mmask32 NOINLINE +EMULATE(cmp_ph) (V512 op1, V512 op2, + __mmask32 k, int predicate) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i, j; + __mmask16 mr1 = 0, mr2 = 0; + __mmask16 m1, m2; + __mmask32 mr = 0; + + m1 = k & 0xffff; + m2 = (k >> 16) & 0xffff; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(op2, &v3, &v4); + + for (i = 0; i < 16; i++) { + if (((1 << i) & m1) != 0) { + j = v1.f32[i] == v3.f32[i] ? 1 : 0; + mr1 = mr1 | (j << i); + } + + if (((1 << i) & m2) != 0) { + j = v2.f32[i] == v4.f32[i] ? 
1 : 0; + mr2 = mr2 | (j << i); + } + } + + mr = mr1 | (mr2 << 16); + return mr; +} + +void +TEST (void) +{ + __mmask32 res, exp; + + init_src(); + + exp = EMULATE(cmp_ph) (src1, src2, NET_MASK, 0); + res = INTRINSIC (_cmp_ph_mask) (HF(src1), HF(src2), 0); + CHECK_RESULT_MASK (res, exp, N_ELEMS, _cmp_ph_mask); + + exp = EMULATE(cmp_ph) (src1, src2, MASK_VALUE, 0); + res = INTRINSIC (_mask_cmp_ph_mask) (MASK_VALUE, HF(src1), HF(src2), 0); + CHECK_RESULT_MASK (res, exp, N_ELEMS, _mask_cmp_ph_mask); + +#if AVX512F_LEN == 512 + exp = EMULATE(cmp_ph) (src1, src2, NET_MASK, 0); + res = INTRINSIC (_cmp_round_ph_mask) (HF(src1), HF(src2), 0, 8); + CHECK_RESULT_MASK (res, exp, N_ELEMS, _cmp_round_ph_mask); + + exp = EMULATE(cmp_ph) (src1, src2, MASK_VALUE, 0); + res = INTRINSIC (_mask_cmp_round_ph_mask) (MASK_VALUE, HF(src1), HF(src2), 0, 8); + CHECK_RESULT_MASK (res, exp, N_ELEMS, _mask_cmp_round_ph_mask); +#endif + + if (n_errs != 0) { + abort (); + } +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcmpsh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcmpsh-1a.c new file mode 100644 index 00000000000..5cce097d661 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcmpsh-1a.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vcmpsh\[ \\t\]+\\\$3\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%k\[0-9\]\[^\n\r]*(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vcmpsh\[ \\t\]+\[^\{\n\]*\\\$4\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%k\[0-9\]\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vcmpsh\[ \\t\]+\\\$3\[^\n\r]*\{sae\}\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%k\[0-9\]\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcmpsh\[ \\t\]+\[^\{\n\]*\\\$4\[^\n\r]*\{sae\}\[^\n\r\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%k\[0-9\]\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + 
+#include + +volatile __mmask8 res, res1, res2; +volatile __m128h x1, x2; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res = _mm_cmp_sh_mask (x1, x2, 3); + res = _mm_mask_cmp_sh_mask (m8, x1, x2, 4); + res = _mm_cmp_round_sh_mask (x1, x2, 3, 8); + res1 = _mm_mask_cmp_round_sh_mask (m8, x1, x2, 4, 8); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcmpsh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcmpsh-1b.c new file mode 100644 index 00000000000..9deae52b41d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcmpsh-1b.c @@ -0,0 +1,45 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS 8 + +__mmask8 NOINLINE +emulate_cmp_sh(V512 op1, V512 op2, + __mmask8 k, int predicate) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + __mmask8 mr = 0; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(op2, &v3, &v4); + + if ((k&1) || !k) + mr = v1.f32[0] == v3.f32[0] ? 
1 : 0; + + return mr; +} + +void +test_512 (void) +{ + __mmask8 res, exp; + + init_src(); + + exp = emulate_cmp_sh(src1, src2, 0x1, 0); + res = _mm_cmp_round_sh_mask(src1.xmmh[0], src2.xmmh[0], 0, 8); + check_results_mask(res, exp, 1, "_mm_cmp_round_sh_mask"); + + exp = emulate_cmp_sh(src1, src2, 0x1, 0); + res = _mm_mask_cmp_round_sh_mask(0x1, src1.xmmh[0], src2.xmmh[0], 0, 8); + check_results_mask(res, exp, 1, "_mm_mask_cmp_round_sh_mask"); + + if (n_errs != 0) { + abort (); + } +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcomish-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcomish-1a.c new file mode 100644 index 00000000000..b87ffd9b80f --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcomish-1a.c @@ -0,0 +1,41 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vcmpsh\[ \\t\]+\\\$3\[^\n\r]*\{sae\}\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%k\[0-9\]\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcmpsh\[ \\t\]+\\\$7\[^\n\r0-9]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%k\[0-9\]\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcmpsh\[ \\t\]+\\\$16\[^\n\r0-9]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%k\[0-9\]\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcmpsh\[ \\t\]+\\\$1\[^\n\r0-9]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%k\[0-9\]\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcmpsh\[ \\t\]+\\\$2\[^\n\r0-9]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%k\[0-9\]\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcmpsh\[ \\t\]+\\\$14\[^\n\r0-9]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%k\[0-9\]\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcmpsh\[ \\t\]+\\\$13\[^\n\r0-9]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%k\[0-9\]\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcmpsh\[ 
\\t\]+\\\$20\[^\n\r0-9]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%k\[0-9\]\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcmpsh\[ \\t\]+\\\$0\[^\n\r0-9]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%k\[0-9\]\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcmpsh\[ \\t\]+\\\$17\[^\n\r0-9]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%k\[0-9\]\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcmpsh\[ \\t\]+\\\$18\[^\n\r0-9]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%k\[0-9\]\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcmpsh\[ \\t\]+\\\$30\[^\n\r0-9]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%k\[0-9\]\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcmpsh\[ \\t\]+\\\$29\[^\n\r0-9]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%k\[0-9\]\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcmpsh\[ \\t\]+\\\$4\[^\n\r0-9]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%k\[0-9\]\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m128h x, y; +volatile int res; + +void extern +avx512f_test (void) +{ + res = _mm_comi_round_sh (x, y, 3, 8); + res = _mm_comi_sh (x, y, 7); + res = _mm_comieq_sh (x, y); + res = _mm_comilt_sh (x, y); + res = _mm_comile_sh (x, y); + res = _mm_comigt_sh (x, y); + res = _mm_comige_sh (x, y); + res = _mm_comineq_sh (x, y); + res = _mm_ucomieq_sh (x, y); + res = _mm_ucomilt_sh (x, y); + res = _mm_ucomile_sh (x, y); + res = _mm_ucomigt_sh (x, y); + res = _mm_ucomige_sh (x, y); + res = _mm_ucomineq_sh (x, y); +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcomish-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcomish-1b.c new file mode 100644 index 00000000000..8c398003cb9 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcomish-1b.c @@ -0,0 +1,66 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 
+#include "avx512fp16-helper.h" + + +#define CMP(imm, rel) \ + dst_ref = 0; \ + dst_ref = ((int) rel) | dst_ref; \ + dst = _mm_comi_round_sh(src1.xmmh[0], src2.xmmh[0], imm, \ + _MM_FROUND_NO_EXC); \ + if (dst_ref != dst) abort(); \ + +void +test_512 (void) +{ + V512 v1,v2,v3,v4; + float s1,s2; + int res,exp,dst; + __mmask8 dst_ref; + + init_src(); + unpack_ph_2twops(src1, &v1, &v2); + unpack_ph_2twops(src2, &v3, &v4); + s1 = v1.f32[0]; + s2 = v3.f32[0]; + + CMP(_CMP_EQ_OQ, !isunordered(s1, s2) && s1 == s2); + CMP(_CMP_LT_OS, !isunordered(s1, s2) && s1 < s2); + CMP(_CMP_LE_OS, !isunordered(s1, s2) && s1 <= s2); + CMP(_CMP_UNORD_Q, isunordered(s1, s2)); + CMP(_CMP_NEQ_UQ, isunordered(s1, s2) || s1 != s2); + CMP(_CMP_NLT_US, isunordered(s1, s2) || s1 >= s2); + CMP(_CMP_NLE_US, isunordered(s1, s2) || s1 > s2); + CMP(_CMP_ORD_Q, !isunordered(s1, s2)); + + CMP(_CMP_EQ_UQ, isunordered(s1, s2) || s1 == s2); + CMP(_CMP_NGE_US, isunordered(s1, s2) || s1 < s2); + CMP(_CMP_NGT_US, isunordered(s1, s2) || s1 <= s2); + + CMP(_CMP_FALSE_OQ, 0); + CMP(_CMP_NEQ_OQ, !isunordered(s1, s2) && s1 != s2); + CMP(_CMP_GE_OS, !isunordered(s1, s2) && s1 >= s2); + CMP(_CMP_GT_OS, !isunordered(s1, s2) && s1 > s2); + CMP(_CMP_TRUE_UQ, 1); + + CMP(_CMP_EQ_OS, !isunordered(s1, s2) && s1 == s2); + CMP(_CMP_LT_OQ, !isunordered(s1, s2) && s1 < s2); + CMP(_CMP_LE_OQ, !isunordered(s1, s2) && s1 <= s2); + CMP(_CMP_UNORD_S, isunordered(s1, s2)); + CMP(_CMP_NEQ_US, isunordered(s1, s2) || s1 != s2); + CMP(_CMP_NLT_UQ, isunordered(s1, s2) || s1 >= s2); + CMP(_CMP_NLE_UQ, isunordered(s1, s2) || s1 > s2); + CMP(_CMP_ORD_S, !isunordered(s1, s2)); + CMP(_CMP_EQ_US, isunordered(s1, s2) || s1 == s2); + CMP(_CMP_NGE_UQ, isunordered(s1, s2) || s1 < s2); + CMP(_CMP_NGT_UQ, isunordered(s1, s2) || s1 <= s2); + CMP(_CMP_FALSE_OS, 0); + CMP(_CMP_NEQ_OS, !isunordered(s1, s2) && s1 != s2); + CMP(_CMP_GE_OQ, !isunordered(s1, s2) && s1 >= s2); + CMP(_CMP_GT_OQ, !isunordered(s1, s2) && s1 > s2); + CMP(_CMP_TRUE_US, 1); +} + 
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcomish-1c.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcomish-1c.c new file mode 100644 index 00000000000..77366a8a30e --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcomish-1c.c @@ -0,0 +1,66 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + + +#define CMP(imm, rel) \ + dst_ref = 0; \ + dst_ref = ((int) rel) | dst_ref; \ + dst = _mm_comi_round_sh(src1.xmmh[0], src2.xmmh[0], imm, \ + _MM_FROUND_NO_EXC); \ + if (dst_ref != dst) abort(); \ + +void +test_512 (void) +{ + V512 v1,v2,v3,v4; + float s1,s2; + int res,exp,dst; + __mmask8 dst_ref; + + init_src_nanf(); + unpack_ph_2twops(src1, &v1, &v2); + unpack_ph_2twops(src2, &v3, &v4); + s1 = v1.f32[0]; + s2 = v3.f32[0]; + + CMP(_CMP_EQ_OQ, !isunordered(s1, s2) && s1 == s2); + CMP(_CMP_LT_OS, !isunordered(s1, s2) && s1 < s2); + CMP(_CMP_LE_OS, !isunordered(s1, s2) && s1 <= s2); + CMP(_CMP_UNORD_Q, isunordered(s1, s2)); + CMP(_CMP_NEQ_UQ, isunordered(s1, s2) || s1 != s2); + CMP(_CMP_NLT_US, isunordered(s1, s2) || s1 >= s2); + CMP(_CMP_NLE_US, isunordered(s1, s2) || s1 > s2); + CMP(_CMP_ORD_Q, !isunordered(s1, s2)); + + CMP(_CMP_EQ_UQ, isunordered(s1, s2) || s1 == s2); + CMP(_CMP_NGE_US, isunordered(s1, s2) || s1 < s2); + CMP(_CMP_NGT_US, isunordered(s1, s2) || s1 <= s2); + + CMP(_CMP_FALSE_OQ, 0); + CMP(_CMP_NEQ_OQ, !isunordered(s1, s2) && s1 != s2); + CMP(_CMP_GE_OS, !isunordered(s1, s2) && s1 >= s2); + CMP(_CMP_GT_OS, !isunordered(s1, s2) && s1 > s2); + CMP(_CMP_TRUE_UQ, 1); + + CMP(_CMP_EQ_OS, !isunordered(s1, s2) && s1 == s2); + CMP(_CMP_LT_OQ, !isunordered(s1, s2) && s1 < s2); + CMP(_CMP_LE_OQ, !isunordered(s1, s2) && s1 <= s2); + CMP(_CMP_UNORD_S, isunordered(s1, s2)); + CMP(_CMP_NEQ_US, isunordered(s1, s2) || s1 != s2); + CMP(_CMP_NLT_UQ, isunordered(s1, s2) || s1 >= s2); + CMP(_CMP_NLE_UQ, isunordered(s1, s2) || s1 > s2); + CMP(_CMP_ORD_S, 
!isunordered(s1, s2));
+  CMP(_CMP_EQ_US, isunordered(s1, s2) || s1 == s2);
+  CMP(_CMP_NGE_UQ, isunordered(s1, s2) || s1 < s2);
+  CMP(_CMP_NGT_UQ, isunordered(s1, s2) || s1 <= s2);
+  CMP(_CMP_FALSE_OS, 0);
+  CMP(_CMP_NEQ_OS, !isunordered(s1, s2) && s1 != s2);
+  CMP(_CMP_GE_OQ, !isunordered(s1, s2) && s1 >= s2);
+  CMP(_CMP_GT_OQ, !isunordered(s1, s2) && s1 > s2);
+  CMP(_CMP_TRUE_US, 1);
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcmpph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcmpph-1a.c
new file mode 100644
index 00000000000..31da2b235f2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcmpph-1a.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vcmpph\[ \\t\]+\\\$1\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%k\[0-9\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcmpph\[ \\t\]+\\\$2\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%k\[0-9\]\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcmpph\[ \\t\]+\\\$3\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%k\[0-9\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcmpph\[ \\t\]+\\\$4\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%k\[0-9\]\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __mmask16 res;
+volatile __mmask8 res1;
+volatile __m256h x1, x2;
+volatile __m128h x3, x4;
+volatile __mmask16 m16;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm256_cmp_ph_mask (x1, x2, 1);
+  res = _mm256_mask_cmp_ph_mask (m16, x1, x2, 2);
+  res1 = _mm_cmp_ph_mask (x3, x4, 3);
+  res1 = _mm_mask_cmp_ph_mask (m8, x3, x4, 4);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcmpph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcmpph-1b.c
new file mode 100644
index 00000000000..c201a9258bf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcmpph-1b.c
@@ -0,0 +1,16 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define DEBUG
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vcmpph-1b.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vcmpph-1b.c"
+

From patchwork Thu Jul 1 06:16:02 2021
X-Patchwork-Submitter: liuhongt
X-Patchwork-Id: 1499323
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 16/62] AVX512FP16: Add vsqrtph/vrsqrtph/vsqrtsh/vrsqrtsh.
Date: Thu, 1 Jul 2021 14:16:02 +0800
Message-Id: <20210701061648.9447-17-hongtao.liu@intel.com>
In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com>
From: liuhongt
Cc: jakub@redhat.com

gcc/ChangeLog:

	* config/i386/avx512fp16intrin.h (_mm512_sqrt_ph): New intrinsic.
	(_mm512_mask_sqrt_ph): Likewise.
	(_mm512_maskz_sqrt_ph): Likewise.
	(_mm512_sqrt_round_ph): Likewise.
	(_mm512_mask_sqrt_round_ph): Likewise.
	(_mm512_maskz_sqrt_round_ph): Likewise.
	(_mm512_rsqrt_ph): Likewise.
	(_mm512_mask_rsqrt_ph): Likewise.
	(_mm512_maskz_rsqrt_ph): Likewise.
	(_mm_rsqrt_sh): Likewise.
	(_mm_mask_rsqrt_sh): Likewise.
	(_mm_maskz_rsqrt_sh): Likewise.
	(_mm_sqrt_sh): Likewise.
	(_mm_mask_sqrt_sh): Likewise.
	(_mm_maskz_sqrt_sh): Likewise.
	(_mm_sqrt_round_sh): Likewise.
	(_mm_mask_sqrt_round_sh): Likewise.
	(_mm_maskz_sqrt_round_sh): Likewise.
	* config/i386/avx512fp16vlintrin.h (_mm_sqrt_ph): New intrinsic.
	(_mm256_sqrt_ph): Likewise.
	(_mm_mask_sqrt_ph): Likewise.
	(_mm256_mask_sqrt_ph): Likewise.
	(_mm_maskz_sqrt_ph): Likewise.
	(_mm256_maskz_sqrt_ph): Likewise.
	(_mm_rsqrt_ph): Likewise.
	(_mm256_rsqrt_ph): Likewise.
	(_mm_mask_rsqrt_ph): Likewise.
	(_mm256_mask_rsqrt_ph): Likewise.
	(_mm_maskz_rsqrt_ph): Likewise.
	(_mm256_maskz_rsqrt_ph): Likewise.
* config/i386/i386-builtin-types.def: Add corresponding builtin types. * config/i386/i386-builtin.def: Add corresponding new builtins. * config/i386/i386-expand.c (ix86_expand_args_builtin): Handle new builtins. (ix86_expand_round_builtin): Ditto. * config/i386/sse.md (VF_AVX512FP16VL): New. (sqrt2): Adjust for HF vector modes. (_sqrt2): Likewise. (_vmsqrt2): Likewise. (_rsqrt2): New. (avx512fp16_vmrsqrtv8hf2): Likewise. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add test for new builtins. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/sse-14.c: Add test for new intrinsics. * gcc.target/i386/sse-22.c: Ditto. --- gcc/config/i386/avx512fp16intrin.h | 193 +++++++++++++++++++++++++ gcc/config/i386/avx512fp16vlintrin.h | 93 ++++++++++++ gcc/config/i386/i386-builtin-types.def | 4 + gcc/config/i386/i386-builtin.def | 8 + gcc/config/i386/i386-expand.c | 4 + gcc/config/i386/sse.md | 44 ++++-- gcc/testsuite/gcc.target/i386/avx-1.c | 2 + gcc/testsuite/gcc.target/i386/sse-13.c | 2 + gcc/testsuite/gcc.target/i386/sse-14.c | 6 + gcc/testsuite/gcc.target/i386/sse-22.c | 6 + gcc/testsuite/gcc.target/i386/sse-23.c | 2 + 11 files changed, 355 insertions(+), 9 deletions(-) diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h index ed8ad84a105..50db5d12140 100644 --- a/gcc/config/i386/avx512fp16intrin.h +++ b/gcc/config/i386/avx512fp16intrin.h @@ -1235,6 +1235,199 @@ _mm_comi_round_sh (__m128h __A, __m128h __B, const int __P, const int __R) #endif /* __OPTIMIZE__ */ +/* Intrinsics vsqrtph. 
*/ +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_sqrt_ph (__m512h __A) +{ + return __builtin_ia32_vsqrtph_v32hf_mask_round (__A, + _mm512_setzero_ph(), + (__mmask32) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_sqrt_ph (__m512h __A, __mmask32 __B, __m512h __C) +{ + return __builtin_ia32_vsqrtph_v32hf_mask_round (__C, __A, __B, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_sqrt_ph (__mmask32 __A, __m512h __B) +{ + return __builtin_ia32_vsqrtph_v32hf_mask_round (__B, + _mm512_setzero_ph (), + __A, + _MM_FROUND_CUR_DIRECTION); +} + +#ifdef __OPTIMIZE__ +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_sqrt_round_ph (__m512h __A, const int __B) +{ + return __builtin_ia32_vsqrtph_v32hf_mask_round (__A, + _mm512_setzero_ph(), + (__mmask32) -1, __B); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_sqrt_round_ph (__m512h __A, __mmask32 __B, __m512h __C, + const int __D) +{ + return __builtin_ia32_vsqrtph_v32hf_mask_round (__C, __A, __B, __D); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_sqrt_round_ph (__mmask32 __A, __m512h __B, const int __C) +{ + return __builtin_ia32_vsqrtph_v32hf_mask_round (__B, + _mm512_setzero_ph (), + __A, __C); +} + +#else +#define _mm512_sqrt_round_ph(A, B) \ + (__builtin_ia32_vsqrtph_v32hf_mask_round ((A), \ + _mm512_setzero_ph (), \ + (__mmask32)-1, (B))) + +#define _mm512_mask_sqrt_round_ph(A, B, C, D) \ + (__builtin_ia32_vsqrtph_v32hf_mask_round ((C), (A), (B), (D))) + +#define _mm512_maskz_sqrt_round_ph(A, B, C) \ + (__builtin_ia32_vsqrtph_v32hf_mask_round ((B), \ + _mm512_setzero_ph (), \ + (A), (C))) + +#endif 
/* __OPTIMIZE__ */ + +/* Intrinsics vrsqrtph. */ +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_rsqrt_ph (__m512h __A) +{ + return __builtin_ia32_vrsqrtph_v32hf_mask (__A, _mm512_setzero_ph (), + (__mmask32) -1); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_rsqrt_ph (__m512h __A, __mmask32 __B, __m512h __C) +{ + return __builtin_ia32_vrsqrtph_v32hf_mask (__C, __A, __B); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_rsqrt_ph (__mmask32 __A, __m512h __B) +{ + return __builtin_ia32_vrsqrtph_v32hf_mask (__B, _mm512_setzero_ph (), + __A); +} + +/* Intrinsics vrsqrtsh. */ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_rsqrt_sh (__m128h __A, __m128h __B) +{ + return __builtin_ia32_vrsqrtsh_v8hf_mask (__B, __A, _mm_setzero_ph (), + (__mmask8) -1); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_rsqrt_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D) +{ + return __builtin_ia32_vrsqrtsh_v8hf_mask (__D, __C, __A, __B); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_rsqrt_sh (__mmask8 __A, __m128h __B, __m128h __C) +{ + return __builtin_ia32_vrsqrtsh_v8hf_mask (__C, __B, _mm_setzero_ph (), + __A); +} + +/* Intrinsics vsqrtsh. 
*/ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_sqrt_sh (__m128h __A, __m128h __B) +{ + return __builtin_ia32_vsqrtsh_v8hf_mask_round (__B, __A, + _mm_setzero_ph (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_sqrt_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D) +{ + return __builtin_ia32_vsqrtsh_v8hf_mask_round (__D, __C, __A, __B, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_sqrt_sh (__mmask8 __A, __m128h __B, __m128h __C) +{ + return __builtin_ia32_vsqrtsh_v8hf_mask_round (__C, __B, + _mm_setzero_ph (), + __A, _MM_FROUND_CUR_DIRECTION); +} + +#ifdef __OPTIMIZE__ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_sqrt_round_sh (__m128h __A, __m128h __B, const int __C) +{ + return __builtin_ia32_vsqrtsh_v8hf_mask_round (__B, __A, + _mm_setzero_ph (), + (__mmask8) -1, __C); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_sqrt_round_sh (__m128h __A, __mmask8 __B, __m128h __C, + __m128h __D, const int __E) +{ + return __builtin_ia32_vsqrtsh_v8hf_mask_round (__D, __C, __A, __B, + __E); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_sqrt_round_sh (__mmask8 __A, __m128h __B, __m128h __C, + const int __D) +{ + return __builtin_ia32_vsqrtsh_v8hf_mask_round (__C, __B, + _mm_setzero_ph (), + __A, __D); +} + +#else +#define _mm_sqrt_round_sh(A, B, C) \ + (__builtin_ia32_vsqrtsh_v8hf_mask_round ((B), (A), \ + _mm_setzero_ph (), \ + (__mmask8)-1, (C))) + +#define _mm_mask_sqrt_round_sh(A, B, C, D, E) \ + (__builtin_ia32_vsqrtsh_v8hf_mask_round ((D), (C), (A), (B), (E))) + +#define _mm_maskz_sqrt_round_sh(A, B, C, D) \ + 
(__builtin_ia32_vsqrtsh_v8hf_mask_round ((C), (B), \ + _mm_setzero_ph (), \ + (A), (D))) + +#endif /* __OPTIMIZE__ */ + #ifdef __DISABLE_AVX512FP16__ #undef __DISABLE_AVX512FP16__ #pragma GCC pop_options diff --git a/gcc/config/i386/avx512fp16vlintrin.h b/gcc/config/i386/avx512fp16vlintrin.h index 1787ed5f4ff..aaed85203c9 100644 --- a/gcc/config/i386/avx512fp16vlintrin.h +++ b/gcc/config/i386/avx512fp16vlintrin.h @@ -358,6 +358,99 @@ _mm_mask_cmp_ph_mask (__mmask16 __A, __m256h __B, __m256h __C, #endif /* __OPTIMIZE__ */ +/* Intrinsics vsqrtph. */ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_sqrt_ph (__m128h __A) +{ + return __builtin_ia32_vsqrtph_v8hf_mask (__A, _mm_setzero_ph (), + (__mmask8) -1); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_sqrt_ph (__m256h __A) +{ + return __builtin_ia32_vsqrtph_v16hf_mask (__A, _mm256_setzero_ph (), + (__mmask16) -1); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_sqrt_ph (__m128h __A, __mmask8 __B, __m128h __C) +{ + return __builtin_ia32_vsqrtph_v8hf_mask (__C, __A, __B); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_sqrt_ph (__m256h __A, __mmask16 __B, __m256h __C) +{ + return __builtin_ia32_vsqrtph_v16hf_mask (__C, __A, __B); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_sqrt_ph (__mmask8 __A, __m128h __B) +{ + return __builtin_ia32_vsqrtph_v8hf_mask (__B, _mm_setzero_ph (), + __A); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_sqrt_ph (__mmask16 __A, __m256h __B) +{ + return __builtin_ia32_vsqrtph_v16hf_mask (__B, _mm256_setzero_ph (), + __A); +} + +/* Intrinsics vrsqrtph. 
*/ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_rsqrt_ph (__m128h __A) +{ + return __builtin_ia32_vrsqrtph_v8hf_mask (__A, _mm_setzero_ph (), + (__mmask8) -1); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_rsqrt_ph (__m256h __A) +{ + return __builtin_ia32_vrsqrtph_v16hf_mask (__A, _mm256_setzero_ph (), + (__mmask16) -1); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_rsqrt_ph (__m128h __A, __mmask8 __B, __m128h __C) +{ + return __builtin_ia32_vrsqrtph_v8hf_mask (__C, __A, __B); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_rsqrt_ph (__m256h __A, __mmask16 __B, __m256h __C) +{ + return __builtin_ia32_vrsqrtph_v16hf_mask (__C, __A, __B); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_rsqrt_ph (__mmask8 __A, __m128h __B) +{ + return __builtin_ia32_vrsqrtph_v8hf_mask (__B, _mm_setzero_ph (), __A); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_rsqrt_ph (__mmask16 __A, __m256h __B) +{ + return __builtin_ia32_vrsqrtph_v16hf_mask (__B, _mm256_setzero_ph (), + __A); +} + #ifdef __DISABLE_AVX512FP16VL__ #undef __DISABLE_AVX512FP16VL__ #pragma GCC pop_options diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def index e3070ad00bd..9ebad6b5f49 100644 --- a/gcc/config/i386/i386-builtin-types.def +++ b/gcc/config/i386/i386-builtin-types.def @@ -1305,16 +1305,20 @@ DEF_FUNCTION_TYPE (UINT8, PV2DI, PCV2DI, PCVOID) # FP16 builtins DEF_FUNCTION_TYPE (V8HF, V8HI) DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF) +DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, UQI) DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, INT) DEF_FUNCTION_TYPE (UQI, V8HF, V8HF, INT, UQI) DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, V8HF, UQI) 
DEF_FUNCTION_TYPE (UQI, V8HF, V8HF, INT, UQI, INT) DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, V8HF, UQI, INT) DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF) +DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF, UHI) DEF_FUNCTION_TYPE (UHI, V16HF, V16HF, INT, UHI) DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF, V16HF, UHI) +DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, USI) DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, INT) DEF_FUNCTION_TYPE (USI, V32HF, V32HF, INT, USI) +DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, USI, INT) DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, V32HF, USI) DEF_FUNCTION_TYPE (USI, V32HF, V32HF, INT, USI, INT) DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, V32HF, USI, INT) diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def index 045cf561ec7..999b2e1abb5 100644 --- a/gcc/config/i386/i386-builtin.def +++ b/gcc/config/i386/i386-builtin.def @@ -2802,6 +2802,12 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsminv8hf3_mask, "__ BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_cmpv8hf3_mask, "__builtin_ia32_vcmpph_v8hf_mask", IX86_BUILTIN_VCMPPH_V8HF_MASK, UNKNOWN, (int) UQI_FTYPE_V8HF_V8HF_INT_UQI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_cmpv16hf3_mask, "__builtin_ia32_vcmpph_v16hf_mask", IX86_BUILTIN_VCMPPH_V16HF_MASK, UNKNOWN, (int) UHI_FTYPE_V16HF_V16HF_INT_UHI) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_cmpv32hf3_mask, "__builtin_ia32_vcmpph_v32hf_mask", IX86_BUILTIN_VCMPPH_V32HF_MASK, UNKNOWN, (int) USI_FTYPE_V32HF_V32HF_INT_USI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_sqrtv8hf2_mask, "__builtin_ia32_vsqrtph_v8hf_mask", IX86_BUILTIN_VSQRTPH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_sqrtv16hf2_mask, "__builtin_ia32_vsqrtph_v16hf_mask", IX86_BUILTIN_VSQRTPH_V16HF_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_UHI) +BDESC 
(OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_rsqrtv8hf2_mask, "__builtin_ia32_vrsqrtph_v8hf_mask", IX86_BUILTIN_VRSQRTPH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_rsqrtv16hf2_mask, "__builtin_ia32_vrsqrtph_v16hf_mask", IX86_BUILTIN_VRSQRTPH_V16HF_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_rsqrtv32hf2_mask, "__builtin_ia32_vrsqrtph_v32hf_mask", IX86_BUILTIN_VRSQRTPH_V32HF_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_USI) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmrsqrtv8hf2_mask, "__builtin_ia32_vrsqrtsh_v8hf_mask", IX86_BUILTIN_VRSQRTSH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) /* Builtins with rounding support. */ BDESC_END (ARGS, ROUND_ARGS) @@ -3017,6 +3023,8 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsmaxv8hf3_mask_roun BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsminv8hf3_mask_round, "__builtin_ia32_vminsh_v8hf_mask_round", IX86_BUILTIN_VMINSH_V8HF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_cmpv32hf3_mask_round, "__builtin_ia32_vcmpph_v32hf_mask_round", IX86_BUILTIN_VCMPPH_V32HF_MASK_ROUND, UNKNOWN, (int) USI_FTYPE_V32HF_V32HF_INT_USI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmcmpv8hf3_mask_round, "__builtin_ia32_vcmpsh_v8hf_mask_round", IX86_BUILTIN_VCMPSH_V8HF_MASK_ROUND, UNKNOWN, (int) UQI_FTYPE_V8HF_V8HF_INT_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_sqrtv32hf2_mask_round, "__builtin_ia32_vsqrtph_v32hf_mask_round", IX86_BUILTIN_VSQRTPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsqrtv8hf2_mask_round, "__builtin_ia32_vsqrtsh_v8hf_mask_round", IX86_BUILTIN_VSQRTSH_V8HF_MASK_ROUND, UNKNOWN, (int) 
V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) BDESC_END (ROUND_ARGS, MULTI_ARG) diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c index a79cc324ceb..d76e4405413 100644 --- a/gcc/config/i386/i386-expand.c +++ b/gcc/config/i386/i386-expand.c @@ -9532,6 +9532,7 @@ ix86_expand_args_builtin (const struct builtin_description *d, case V16HI_FTYPE_V16SI_V16HI_UHI: case V16QI_FTYPE_V16SI_V16QI_UHI: case V16QI_FTYPE_V8DI_V16QI_UQI: + case V32HF_FTYPE_V32HF_V32HF_USI: case V16SF_FTYPE_V16SF_V16SF_UHI: case V16SF_FTYPE_V4SF_V16SF_UHI: case V16SI_FTYPE_SI_V16SI_UHI: @@ -9561,12 +9562,14 @@ ix86_expand_args_builtin (const struct builtin_description *d, case V16HI_FTYPE_HI_V16HI_UHI: case V8HI_FTYPE_V8HI_V8HI_UQI: case V8HI_FTYPE_HI_V8HI_UQI: + case V16HF_FTYPE_V16HF_V16HF_UHI: case V8SF_FTYPE_V8HI_V8SF_UQI: case V4SF_FTYPE_V8HI_V4SF_UQI: case V8SI_FTYPE_V8SF_V8SI_UQI: case V4SI_FTYPE_V4SF_V4SI_UQI: case V4DI_FTYPE_V4SF_V4DI_UQI: case V2DI_FTYPE_V4SF_V2DI_UQI: + case V8HF_FTYPE_V8HF_V8HF_UQI: case V4SF_FTYPE_V4DI_V4SF_UQI: case V4SF_FTYPE_V2DI_V4SF_UQI: case V4DF_FTYPE_V4DI_V4DF_UQI: @@ -10495,6 +10498,7 @@ ix86_expand_round_builtin (const struct builtin_description *d, case V8DI_FTYPE_V8DF_V8DI_QI_INT: case V8SF_FTYPE_V8DI_V8SF_QI_INT: case V8DF_FTYPE_V8DI_V8DF_QI_INT: + case V32HF_FTYPE_V32HF_V32HF_USI_INT: case V16SF_FTYPE_V16SF_V16SF_HI_INT: case V8DI_FTYPE_V8SF_V8DI_QI_INT: case V16SF_FTYPE_V16SI_V16SF_HI_INT: diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index b7e22e0ec80..4763fd0558d 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -395,6 +395,9 @@ (define_mode_iterator VF1_AVX512VL (define_mode_iterator VF_AVX512FP16 [V32HF V16HF V8HF]) +(define_mode_iterator VF_AVX512FP16VL + [V32HF (V16HF "TARGET_AVX512VL") (V8HF "TARGET_AVX512VL")]) + ;; All vector integer modes (define_mode_iterator VI [(V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F") @@ -2238,8 +2241,8 @@ (define_insn "srcp14_mask" (set_attr "mode" "")]) (define_expand 
"sqrt2" - [(set (match_operand:VF2 0 "register_operand") - (sqrt:VF2 (match_operand:VF2 1 "vector_operand")))] + [(set (match_operand:VF2H 0 "register_operand") + (sqrt:VF2H (match_operand:VF2H 1 "vector_operand")))] "TARGET_SSE2") (define_expand "sqrt2" @@ -2259,8 +2262,8 @@ (define_expand "sqrt2" }) (define_insn "_sqrt2" - [(set (match_operand:VF 0 "register_operand" "=x,v") - (sqrt:VF (match_operand:VF 1 "" "xBm,")))] + [(set (match_operand:VFH 0 "register_operand" "=x,v") + (sqrt:VFH (match_operand:VFH 1 "" "xBm,")))] "TARGET_SSE && && " "@ sqrt\t{%1, %0|%0, %1} @@ -2273,11 +2276,11 @@ (define_insn "_sqrt2" (set_attr "mode" "")]) (define_insn "_vmsqrt2" - [(set (match_operand:VF_128 0 "register_operand" "=x,v") - (vec_merge:VF_128 - (sqrt:VF_128 - (match_operand:VF_128 1 "nonimmediate_operand" "xm,")) - (match_operand:VF_128 2 "register_operand" "0,v") + [(set (match_operand:VFH_128 0 "register_operand" "=x,v") + (vec_merge:VFH_128 + (sqrt:VFH_128 + (match_operand:VFH_128 1 "nonimmediate_operand" "xm,")) + (match_operand:VFH_128 2 "register_operand" "0,v") (const_int 1)))] "TARGET_SSE" "@ @@ -2330,6 +2333,16 @@ (define_insn "_rsqrt2" (set_attr "prefix" "maybe_vex") (set_attr "mode" "")]) +(define_insn "_rsqrt2" + [(set (match_operand:VF_AVX512FP16VL 0 "register_operand" "=v") + (unspec:VF_AVX512FP16VL + [(match_operand:VF_AVX512FP16VL 1 "vector_operand" "vBm")] UNSPEC_RSQRT))] + "TARGET_AVX512FP16" + "vrsqrtph\t{%1, %0|%0, %1}" + [(set_attr "type" "sse") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + (define_insn "rsqrt14" [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v") (unspec:VF_AVX512VL @@ -2405,6 +2418,19 @@ (define_insn "*sse_vmrsqrtv4sf2" (set_attr "prefix" "orig,vex") (set_attr "mode" "SF")]) +(define_insn "avx512fp16_vmrsqrtv8hf2" + [(set (match_operand:V8HF 0 "register_operand" "=v") + (vec_merge:V8HF + (unspec:V8HF [(match_operand:V8HF 1 "nonimmediate_operand" "vm")] + UNSPEC_RSQRT) + (match_operand:V8HF 2 "register_operand" "v") 
+ (const_int 1)))] + "TARGET_AVX512FP16" + "vrsqrtsh\t{%1, %2, %0|%0, %2, %w1}" + [(set_attr "type" "sse") + (set_attr "prefix" "evex") + (set_attr "mode" "HF")]) + (define_expand "3" [(set (match_operand:VFH 0 "register_operand") (smaxmin:VFH diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c index d9aa8a70e35..651cb1c80fb 100644 --- a/gcc/testsuite/gcc.target/i386/avx-1.c +++ b/gcc/testsuite/gcc.target/i386/avx-1.c @@ -701,6 +701,8 @@ #define __builtin_ia32_vcmpph_v32hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v32hf_mask(A, B, 1, D) #define __builtin_ia32_vcmpph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vcmpph_v32hf_mask_round(A, B, 1, D, 8) #define __builtin_ia32_vcmpsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vcmpsh_v8hf_mask_round(A, B, 1, D, 8) +#define __builtin_ia32_vsqrtph_v32hf_mask_round(C, A, B, D) __builtin_ia32_vsqrtph_v32hf_mask_round(C, A, B, 8) +#define __builtin_ia32_vsqrtsh_v8hf_mask_round(D, C, A, B, E) __builtin_ia32_vsqrtsh_v8hf_mask_round(D, C, A, B, 8) /* avx512fp16vlintrin.h */ #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D) diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c index 9a2833d78f2..94553dec9e7 100644 --- a/gcc/testsuite/gcc.target/i386/sse-13.c +++ b/gcc/testsuite/gcc.target/i386/sse-13.c @@ -718,6 +718,8 @@ #define __builtin_ia32_vcmpph_v32hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v32hf_mask(A, B, 1, D) #define __builtin_ia32_vcmpph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vcmpph_v32hf_mask_round(A, B, 1, D, 8) #define __builtin_ia32_vcmpsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vcmpsh_v8hf_mask_round(A, B, 1, D, 8) +#define __builtin_ia32_vsqrtph_v32hf_mask_round(C, A, B, D) __builtin_ia32_vsqrtph_v32hf_mask_round(C, A, B, 8) +#define __builtin_ia32_vsqrtsh_v8hf_mask_round(D, C, A, B, E) __builtin_ia32_vsqrtsh_v8hf_mask_round(D, C, A, B, 8) /* avx512fp16vlintrin.h */ #define 
__builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D) diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c index ce0ad71f190..7281bffdf2b 100644 --- a/gcc/testsuite/gcc.target/i386/sse-14.c +++ b/gcc/testsuite/gcc.target/i386/sse-14.c @@ -670,6 +670,7 @@ test_3 (_mm512_mask_rsqrt28_round_pd, __m512d, __m512d, __mmask8, __m512d, 8) test_3 (_mm512_mask_rsqrt28_round_ps, __m512, __m512, __mmask16, __m512, 8) /* avx512fp16intrin.h */ +test_1 (_mm512_sqrt_round_ph, __m512h, __m512h, 8) test_2 (_mm512_add_round_ph, __m512h, __m512h, __m512h, 8) test_2 (_mm512_sub_round_ph, __m512h, __m512h, __m512h, 8) test_2 (_mm512_mul_round_ph, __m512h, __m512h, __m512h, 8) @@ -684,6 +685,8 @@ test_2 (_mm_max_round_sh, __m128h, __m128h, __m128h, 8) test_2 (_mm_min_round_sh, __m128h, __m128h, __m128h, 8) test_2 (_mm512_cmp_ph_mask, __mmask32, __m512h, __m512h, 1) test_2 (_mm_comi_sh, int, __m128h, __m128h, 1) +test_2 (_mm512_maskz_sqrt_round_ph, __m512h, __mmask32, __m512h, 8) +test_2 (_mm_sqrt_round_sh, __m128h, __m128h, __m128h, 8) test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8) test_2x (_mm_cmp_round_sh_mask, __mmask8, __m128h, __m128h, 1, 8) test_2x (_mm_comi_round_sh, int, __m128h, __m128h, 1, 8) @@ -700,6 +703,8 @@ test_3 (_mm512_maskz_min_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) test_3 (_mm_maskz_max_round_sh, __m128h, __mmask8, __m128h, __m128h, 8) test_3 (_mm_maskz_min_round_sh, __m128h, __mmask8, __m128h, __m128h, 8) test_3 (_mm512_mask_cmp_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1) +test_3 (_mm512_mask_sqrt_round_ph, __m512h, __m512h, __mmask32, __m512h, 8) +test_3 (_mm_maskz_sqrt_round_sh, __m128h, __mmask8, __m128h, __m128h, 8) test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8) test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8) test_4 (_mm512_mask_add_round_ph, __m512h, __m512h, __mmask32, 
__m512h, __m512h, 8) @@ -714,6 +719,7 @@ test_4 (_mm512_mask_max_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, test_4 (_mm512_mask_min_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8) test_4 (_mm_mask_max_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8) test_4 (_mm_mask_min_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8) +test_4 (_mm_mask_sqrt_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8) /* avx512fp16vlintrin.h */ test_2 (_mm_cmp_ph_mask, __mmask8, __m128h, __m128h, 1) diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c index 439346490bd..04326e0e37d 100644 --- a/gcc/testsuite/gcc.target/i386/sse-22.c +++ b/gcc/testsuite/gcc.target/i386/sse-22.c @@ -775,6 +775,7 @@ test_2 (_mm_rsqrt28_round_sd, __m128d, __m128d, __m128d, 8) test_2 (_mm_rsqrt28_round_ss, __m128, __m128, __m128, 8) /* avx512fp16intrin.h */ +test_1 (_mm512_sqrt_round_ph, __m512h, __m512h, 8) test_2 (_mm512_add_round_ph, __m512h, __m512h, __m512h, 8) test_2 (_mm512_sub_round_ph, __m512h, __m512h, __m512h, 8) test_2 (_mm512_mul_round_ph, __m512h, __m512h, __m512h, 8) @@ -789,6 +790,8 @@ test_2 (_mm_max_round_sh, __m128h, __m128h, __m128h, 8) test_2 (_mm_min_round_sh, __m128h, __m128h, __m128h, 8) test_2 (_mm512_cmp_ph_mask, __mmask32, __m512h, __m512h, 1) test_2 (_mm_comi_sh, int, __m128h, __m128h, 1) +test_2 (_mm512_maskz_sqrt_round_ph, __m512h, __mmask32, __m512h, 8) +test_2 (_mm_sqrt_round_sh, __m128h, __m128h, __m128h, 8) test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8) test_2x (_mm_cmp_round_sh_mask, __mmask8, __m128h, __m128h, 1, 8) test_2x (_mm_comi_round_sh, int, __m128h, __m128h, 1, 8) @@ -805,6 +808,8 @@ test_3 (_mm512_maskz_min_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) test_3 (_mm_maskz_max_round_sh, __m128h, __mmask8, __m128h, __m128h, 8) test_3 (_mm_maskz_min_round_sh, __m128h, __mmask8, __m128h, __m128h, 8) test_3 (_mm512_mask_cmp_ph_mask, __mmask32, __mmask32, 
__m512h, __m512h, 1)
+test_3 (_mm512_mask_sqrt_round_ph, __m512h, __m512h, __mmask32, __m512h, 8)
+test_3 (_mm_maskz_sqrt_round_sh, __m128h, __mmask8, __m128h, __m128h, 8)
 test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8)
 test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8)
 test_4 (_mm512_mask_add_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
@@ -819,6 +824,7 @@ test_4 (_mm512_mask_max_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h,
 test_4 (_mm512_mask_min_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
 test_4 (_mm_mask_max_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
 test_4 (_mm_mask_min_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
+test_4 (_mm_mask_sqrt_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)

 /* avx512fp16vlintrin.h */
 test_2 (_mm_cmp_ph_mask, __mmask8, __m128h, __m128h, 1)
diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
index f6768bac345..7559d335dbc 100644
--- a/gcc/testsuite/gcc.target/i386/sse-23.c
+++ b/gcc/testsuite/gcc.target/i386/sse-23.c
@@ -719,6 +719,8 @@
 #define __builtin_ia32_vcmpph_v32hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v32hf_mask(A, B, 1, D)
 #define __builtin_ia32_vcmpph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vcmpph_v32hf_mask_round(A, B, 1, D, 8)
 #define __builtin_ia32_vcmpsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vcmpsh_v8hf_mask_round(A, B, 1, D, 8)
+#define __builtin_ia32_vsqrtph_v32hf_mask_round(C, A, B, D) __builtin_ia32_vsqrtph_v32hf_mask_round(C, A, B, 8)
+#define __builtin_ia32_vsqrtsh_v8hf_mask_round(D, C, A, B, E) __builtin_ia32_vsqrtsh_v8hf_mask_round(D, C, A, B, 8)

 /* avx512fp16vlintrin.h */
 #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)

From patchwork Thu Jul 1 06:16:03 2021
X-Patchwork-Submitter: liuhongt X-Patchwork-Id: 1499324 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=S4zADMzZ; dkim-atps=neutral Received: from sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4GFpSK0DqNz9sW8 for ; Thu, 1 Jul 2021 16:38:08 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 23707383B834 for ; Thu, 1 Jul 2021 06:38:06 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 23707383B834 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1625121486; bh=TvxfAb8aQzOxtCNifLiunNkHUehZdIXcR7fIB3InjO8=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=S4zADMzZYh9W57RJs6c9lJpAg4kylLC6EefVa7CSctdB4KX3tUOU9SKQn2B8SIoI1 SVof3gSEexxeyOkOAfBI9qN3M4PLxPNnS8TnWG6xH0QTxO1RMhqKLw/WMn1D0JAPzr z73NKEhC/xa7IkJ9JXgULB8tKBTSfSiRttMmW5FQ= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by sourceware.org (Postfix) with ESMTPS id E8A6C384B006 for ; Thu, 1 Jul 2021 06:17:19 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org E8A6C384B006 X-IronPort-AV: 
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 17/62] AVX512FP16: Add testcase for vsqrtph/vsqrtsh/vrsqrtph/vrsqrtsh.
Date: Thu, 1 Jul 2021 14:16:03 +0800
Message-Id: <20210701061648.9447-18-hongtao.liu@intel.com>
In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com>
From: liuhongt
Cc: jakub@redhat.com

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-vrsqrtph-1a.c: New test.
	* gcc.target/i386/avx512fp16-vrsqrtph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vrsqrtsh-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vrsqrtsh-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vsqrtph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vsqrtph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vsqrtsh-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vsqrtsh-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vrsqrtph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vrsqrtph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vsqrtph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vsqrtph-1b.c: Ditto.
---
 .../gcc.target/i386/avx512fp16-vrsqrtph-1a.c  | 19 ++++
 .../gcc.target/i386/avx512fp16-vrsqrtph-1b.c  | 77 ++++++++++++++++
 .../gcc.target/i386/avx512fp16-vrsqrtsh-1a.c  | 18 ++++
 .../gcc.target/i386/avx512fp16-vrsqrtsh-1b.c  | 59 ++++++++++++
 .../gcc.target/i386/avx512fp16-vsqrtph-1a.c   | 24 +++++
 .../gcc.target/i386/avx512fp16-vsqrtph-1b.c   | 92 +++++++++++++++++++
 .../gcc.target/i386/avx512fp16-vsqrtsh-1a.c   | 23 +++++
 .../gcc.target/i386/avx512fp16-vsqrtsh-1b.c   | 60 ++++++++++++
 .../i386/avx512fp16vl-vrsqrtph-1a.c           | 29 ++++++
 .../i386/avx512fp16vl-vrsqrtph-1b.c           | 16 ++++
 .../gcc.target/i386/avx512fp16vl-vsqrtph-1a.c | 29 ++++++
 .../gcc.target/i386/avx512fp16vl-vsqrtph-1b.c | 16 ++++
 12 files changed, 462 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vrsqrtph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vrsqrtph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vrsqrtsh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vrsqrtsh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vsqrtph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vsqrtph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vsqrtsh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vsqrtsh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vrsqrtph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vrsqrtph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vsqrtph-1a.c
 create mode 100644
gcc/testsuite/gcc.target/i386/avx512fp16vl-vsqrtph-1b.c diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vrsqrtph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vrsqrtph-1a.c new file mode 100644 index 00000000000..c9671e8ed0a --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vrsqrtph-1a.c @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vrsqrtph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrsqrtph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vrsqrtph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m512h res; +volatile __m512h x1; +volatile __mmask32 m32; + +void extern +avx512f_test (void) +{ + res = _mm512_rsqrt_ph (x1); + res = _mm512_mask_rsqrt_ph (res, m32, x1); + res = _mm512_maskz_rsqrt_ph (m32, x1); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vrsqrtph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vrsqrtph-1b.c new file mode 100644 index 00000000000..237971dbaa7 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vrsqrtph-1b.c @@ -0,0 +1,77 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS (AVX512F_LEN / 16) + +void NOINLINE +EMULATE(rsqrt_ph) (V512 * dest, V512 op1, + __mmask32 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + __mmask16 m1, m2; + + m1 = k & 0xffff; + m2 = (k >> 16) & 0xffff; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(*dest, &v7, &v8); + + for (i = 0; i < 16; i++) { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + v5.f32[i] = 0; + } + else { + v5.u32[i] = v7.u32[i]; + } + } + else { + v5.f32[i] = 1. 
/ sqrtf(v1.f32[i]); + } + + if (((1 << i) & m2) == 0) { + if (zero_mask) { + v6.f32[i] = 0; + } + else { + v6.u32[i] = v8.u32[i]; + } + } + else { + v6.f32[i] = 1. / sqrtf(v2.f32[i]); + } + + } + *dest = pack_twops_2ph(v5, v6); +} + +void +TEST (void) +{ + V512 res; + V512 exp; + + init_src(); + + EMULATE(rsqrt_ph) (&exp, src1, NET_MASK, 0); + HF(res) = INTRINSIC (_rsqrt_ph) (HF(src1)); + CHECK_RESULT (&res, &exp, N_ELEMS, _rsqrt_ph); + + init_dest(&res, &exp); + EMULATE(rsqrt_ph) (&exp, src1, MASK_VALUE, 0); + HF(res) = INTRINSIC (_mask_rsqrt_ph) (HF(res), MASK_VALUE, HF(src1)); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_rsqrt_ph); + + EMULATE(rsqrt_ph) (&exp, src1, ZMASK_VALUE, 1); + HF(res) = INTRINSIC (_maskz_rsqrt_ph) (ZMASK_VALUE, HF(src1)); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_rsqrt_ph); + + if (n_errs != 0) + abort (); +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vrsqrtsh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vrsqrtsh-1a.c new file mode 100644 index 00000000000..060ce33f164 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vrsqrtsh-1a.c @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vrsqrtsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrsqrtsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vrsqrtsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m128h res, x1, x2; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res = _mm_rsqrt_sh (x1, x2); + res = _mm_mask_rsqrt_sh (res, m8, x1, x2); + res = _mm_maskz_rsqrt_sh (m8, x1, x2); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vrsqrtsh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vrsqrtsh-1b.c new file mode 100644 index 
00000000000..5f20de7c24a --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vrsqrtsh-1b.c @@ -0,0 +1,59 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS 8 + +void NOINLINE +emulate_rsqrt_sh(V512 * dest, V512 op1, + __mmask8 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(*dest, &v7, &v8); + + if ((k&1) || !k) + v5.f32[0] = 1.0 / sqrtf(v1.f32[0]); + else if (zero_mask) + v5.f32[0] = 0; + else + v5.f32[0] = v7.f32[0]; + + for (i = 1; i < 8; i++) + v5.f32[i] = v1.f32[i]; + + *dest = pack_twops_2ph(v5, v6); +} + +void +test_512 (void) +{ + V512 res; + V512 exp; + + init_src(); + + emulate_rsqrt_sh(&exp, src1, 0x1, 0); + res.xmmh[0] = _mm_rsqrt_sh(exp.xmmh[0], src1.xmmh[0]); + check_results(&res, &exp, N_ELEMS, "_mm_rsqrt_sh"); + + init_dest(&res, &exp); + emulate_rsqrt_sh(&exp, src1, 0x1, 0); + res.xmmh[0] = _mm_mask_rsqrt_sh(res.xmmh[0], 0x1, exp.xmmh[0], src1.xmmh[0]); + check_results(&res, &exp, N_ELEMS, "_mm_mask_rsqrt_sh"); + + emulate_rsqrt_sh(&exp, src1, 0x1, 1); + res.xmmh[0] = _mm_maskz_rsqrt_sh(0x1, exp.xmmh[0], src1.xmmh[0]); + check_results(&res, &exp, N_ELEMS, "_mm_maskz_rsqrt_sh"); + + if (n_errs != 0) { + abort (); + } + +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vsqrtph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vsqrtph-1a.c new file mode 100644 index 00000000000..497b5bab1db --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vsqrtph-1a.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vsqrtph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vsqrtph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { 
scan-assembler-times "vsqrtph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vsqrtph\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vsqrtph\[ \\t\]+\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m512h res; +volatile __m512h x1; +volatile __mmask32 m32; + +void extern +avx512f_test (void) +{ + res = _mm512_sqrt_ph (x1); + res = _mm512_mask_sqrt_ph (res, m32, x1); + res = _mm512_maskz_sqrt_ph (m32, x1); + res = _mm512_sqrt_round_ph (x1, 4); + res = _mm512_mask_sqrt_round_ph (res, m32, x1, 8); + res = _mm512_maskz_sqrt_round_ph (m32, x1, 11); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vsqrtph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vsqrtph-1b.c new file mode 100644 index 00000000000..d4d047b194d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vsqrtph-1b.c @@ -0,0 +1,92 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS (AVX512F_LEN / 16) + +void NOINLINE +EMULATE(sqrt_ph) (V512 * dest, V512 op1, + __mmask32 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + __mmask16 m1, m2; + + m1 = k & 0xffff; + m2 = (k >> 16) & 0xffff; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(*dest, &v7, &v8); + + for (i = 0; i < 16; i++) { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + v5.f32[i] = 0; + } + else { + v5.u32[i] = v7.u32[i]; + } + } + else { + v5.f32[i] = sqrtf(v1.f32[i]); + } + + if (((1 << i) & m2) == 0) { + if (zero_mask) { + v6.f32[i] = 0; + } + else { + v6.u32[i] = v8.u32[i]; + } + } + else { + v6.f32[i] = sqrtf(v2.f32[i]); + } + + } + *dest = pack_twops_2ph(v5, v6); +} + +void +TEST (void) +{ + V512 res; 
+ V512 exp; + + init_src(); + + EMULATE(sqrt_ph) (&exp, src1, NET_MASK, 0); + HF(res) = INTRINSIC (_sqrt_ph) (HF(src1)); + CHECK_RESULT (&res, &exp, N_ELEMS, _sqrt_ph); + + init_dest(&res, &exp); + EMULATE(sqrt_ph) (&exp, src1, MASK_VALUE, 0); + HF(res) = INTRINSIC (_mask_sqrt_ph) (HF(res), MASK_VALUE, HF(src1)); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_sqrt_ph); + + EMULATE(sqrt_ph) (&exp, src1, ZMASK_VALUE, 1); + HF(res) = INTRINSIC (_maskz_sqrt_ph) (ZMASK_VALUE, HF(src1)); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_sqrt_ph); + +#if AVX512F_LEN == 512 + EMULATE(sqrt_ph) (&exp, src1, NET_MASK, 0); + HF(res) = INTRINSIC (_sqrt_round_ph) (HF(src1), 8); + CHECK_RESULT (&res, &exp, N_ELEMS, _sqrt_round_ph); + + init_dest(&res, &exp); + EMULATE(sqrt_ph) (&exp, src1, MASK_VALUE, 0); + HF(res) = INTRINSIC (_mask_sqrt_round_ph) (HF(res), MASK_VALUE, HF(src1), 8); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_sqrt_round_ph); + + EMULATE(sqrt_ph) (&exp, src1, ZMASK_VALUE, 1); + HF(res) = INTRINSIC (_maskz_sqrt_round_ph) (ZMASK_VALUE, HF(src1), 8); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_sqrt_round_ph); +#endif + + if (n_errs != 0) + abort (); +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vsqrtsh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vsqrtsh-1a.c new file mode 100644 index 00000000000..dd44534a2eb --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vsqrtsh-1a.c @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vsqrtsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vsqrtsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vsqrtsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vsqrtsh\[ 
\\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vsqrtsh\[ \\t\]+\{rz-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m128h res, x1, x2; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res = _mm_sqrt_sh (x1, x2); + res = _mm_mask_sqrt_sh (res, m8, x1, x2); + res = _mm_maskz_sqrt_sh (m8, x1, x2); + res = _mm_sqrt_round_sh (x1, x2, 4); + res = _mm_mask_sqrt_round_sh (res, m8, x1, x2, 8); + res = _mm_maskz_sqrt_round_sh (m8, x1, x2, 11); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vsqrtsh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vsqrtsh-1b.c new file mode 100644 index 00000000000..4744c6f1e55 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vsqrtsh-1b.c @@ -0,0 +1,60 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS 8 + +void NOINLINE +emulate_sqrt_sh(V512 * dest, V512 op1, + __mmask8 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(*dest, &v7, &v8); + + if ((k&1) || !k) + v5.f32[0] = sqrtf(v1.f32[0]); + else if (zero_mask) + v5.f32[0] = 0; + else + v5.f32[0] = v7.f32[0]; + + for (i = 1; i < 8; i++) + v5.f32[i] = v1.f32[i]; + + *dest = pack_twops_2ph(v5, v6); +} + +void +test_512 (void) +{ + V512 res; + V512 exp; + + init_src(); + + emulate_sqrt_sh(&exp, src1, 0x1, 0); + res.xmmh[0] = _mm_sqrt_round_sh(exp.xmmh[0], src1.xmmh[0], _ROUND_NINT); + check_results(&res, &exp, N_ELEMS, "_mm_sqrt_round_sh"); + + init_dest(&res, &exp); + emulate_sqrt_sh(&exp, src1, 0x1, 0); + res.xmmh[0] = _mm_mask_sqrt_round_sh(res.xmmh[0], 0x1, exp.xmmh[0], + src1.xmmh[0], _ROUND_NINT); + check_results(&res, &exp, N_ELEMS, "_mm_mask_sqrt_round_sh"); + + 
emulate_sqrt_sh(&exp, src1, 0x1, 1); + res.xmmh[0] = _mm_maskz_sqrt_round_sh(0x1, exp.xmmh[0], src1.xmmh[0], _ROUND_NINT); + check_results(&res, &exp, N_ELEMS, "_mm_maskz_sqrt_round_sh"); + + if (n_errs != 0) { + abort (); + } + +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vrsqrtph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vrsqrtph-1a.c new file mode 100644 index 00000000000..a5edc176b63 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vrsqrtph-1a.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */ +/* { dg-final { scan-assembler-times "vrsqrtph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrsqrtph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vrsqrtph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrsqrtph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrsqrtph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vrsqrtph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m256h res1; +volatile __m128h res2; +volatile __m256h x1; +volatile __m128h x2; +volatile __mmask16 m16; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res1 = _mm256_rsqrt_ph (x1); + res1 = _mm256_mask_rsqrt_ph (res1, m16, x1); + res1 = _mm256_maskz_rsqrt_ph (m16, x1); + + res2 = _mm_rsqrt_ph (x2); + res2 = _mm_mask_rsqrt_ph (res2, m8, x2); + res2 = _mm_maskz_rsqrt_ph (m8, x2); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vrsqrtph-1b.c 
b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vrsqrtph-1b.c new file mode 100644 index 00000000000..a5e796b8ebb --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vrsqrtph-1b.c @@ -0,0 +1,16 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */ + +#define DEBUG +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vrsqrtph-1b.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vrsqrtph-1b.c" + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vsqrtph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vsqrtph-1a.c new file mode 100644 index 00000000000..4acb137e6b8 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vsqrtph-1a.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */ +/* { dg-final { scan-assembler-times "vsqrtph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vsqrtph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vsqrtph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vsqrtph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vsqrtph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vsqrtph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m256h res1; +volatile __m128h res2; +volatile __m256h x1; +volatile __m128h x2; +volatile __mmask16 m16; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res1 = 
_mm256_sqrt_ph (x1); + res1 = _mm256_mask_sqrt_ph (res1, m16, x1); + res1 = _mm256_maskz_sqrt_ph (m16, x1); + + res2 = _mm_sqrt_ph (x2); + res2 = _mm_mask_sqrt_ph (res2, m8, x2); + res2 = _mm_maskz_sqrt_ph (m8, x2); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vsqrtph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vsqrtph-1b.c new file mode 100644 index 00000000000..9b0a91d7b5d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vsqrtph-1b.c @@ -0,0 +1,16 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */ + +#define DEBUG +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vsqrtph-1b.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vsqrtph-1b.c" + From patchwork Thu Jul 1 06:16:04 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: liuhongt X-Patchwork-Id: 1499325 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=KAaE9Z2S; dkim-atps=neutral Received: from sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4GFpTT1VxZz9sW8 for ; Thu, 1 Jul 2021 16:39:08 +1000 (AEST) Received: from 
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 18/62] AVX512FP16: Add vrcpph/vrcpsh/vscalefph/vscalefsh.
Date: Thu, 1 Jul 2021 14:16:04 +0800
Message-Id: <20210701061648.9447-19-hongtao.liu@intel.com>
In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com>
From: liuhongt
Cc: jakub@redhat.com

gcc/ChangeLog:

	* config/i386/avx512fp16intrin.h (_mm512_rcp_ph): New intrinsic.
	(_mm512_mask_rcp_ph): Likewise.
	(_mm512_maskz_rcp_ph): Likewise.
	(_mm_rcp_sh): Likewise.
	(_mm_mask_rcp_sh): Likewise.
	(_mm_maskz_rcp_sh): Likewise.
	(_mm512_scalef_ph): Likewise.
	(_mm512_mask_scalef_ph): Likewise.
	(_mm512_maskz_scalef_ph): Likewise.
	(_mm512_scalef_round_ph): Likewise.
	(_mm512_mask_scalef_round_ph): Likewise.
	(_mm512_maskz_scalef_round_ph): Likewise.
	(_mm_scalef_sh): Likewise.
	(_mm_mask_scalef_sh): Likewise.
	(_mm_maskz_scalef_sh): Likewise.
	(_mm_scalef_round_sh): Likewise.
	(_mm_mask_scalef_round_sh): Likewise.
	(_mm_maskz_scalef_round_sh): Likewise.
	* config/i386/avx512fp16vlintrin.h (_mm_rcp_ph): New intrinsic.
	(_mm256_rcp_ph): Likewise.
	(_mm_mask_rcp_ph): Likewise.
	(_mm256_mask_rcp_ph): Likewise.
	(_mm_maskz_rcp_ph): Likewise.
	(_mm256_maskz_rcp_ph): Likewise.
	(_mm_scalef_ph): Likewise.
	(_mm256_scalef_ph): Likewise.
	(_mm_mask_scalef_ph): Likewise.
	(_mm256_mask_scalef_ph): Likewise.
	(_mm_maskz_scalef_ph): Likewise.
	(_mm256_maskz_scalef_ph): Likewise.
	* config/i386/i386-builtin.def: Add new builtins.
	* config/i386/sse.md (VFH_AVX512VL): New.
	(avx512fp16_rcp2): Ditto.
	(avx512fp16_vmrcpv8hf2): Ditto.
	(avx512f_vmscalef): Adjust to support HF vector modes.
	(_scalef): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx-1.c: Add test for new builtins.
	* gcc.target/i386/sse-13.c: Ditto.
	* gcc.target/i386/sse-23.c: Ditto.
	* gcc.target/i386/sse-14.c: Add test for new intrinsics.
	* gcc.target/i386/sse-22.c: Ditto.
---
 gcc/config/i386/avx512fp16intrin.h     | 195 +++++++++++++++++++++++++
 gcc/config/i386/avx512fp16vlintrin.h   |  97 ++++++++++++
 gcc/config/i386/i386-builtin.def       |   8 +
 gcc/config/i386/sse.md                 |  49 +++++--
 gcc/testsuite/gcc.target/i386/avx-1.c  |   2 +
 gcc/testsuite/gcc.target/i386/sse-13.c |   2 +
 gcc/testsuite/gcc.target/i386/sse-14.c |   6 +
 gcc/testsuite/gcc.target/i386/sse-22.c |   3 +
 gcc/testsuite/gcc.target/i386/sse-23.c |   2 +
 9 files changed, 355 insertions(+), 9 deletions(-)

diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
index 50db5d12140..9a52d2ac36e 100644
--- a/gcc/config/i386/avx512fp16intrin.h
+++ b/gcc/config/i386/avx512fp16intrin.h
@@ -1428,6 +1428,201 @@ _mm_maskz_sqrt_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
 #endif /* __OPTIMIZE__ */
 
+/* Intrinsics vrcpph.
*/ +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_rcp_ph (__m512h __A) +{ + return __builtin_ia32_vrcpph_v32hf_mask (__A, _mm512_setzero_ph (), + (__mmask32) -1); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_rcp_ph (__m512h __A, __mmask32 __B, __m512h __C) +{ + return __builtin_ia32_vrcpph_v32hf_mask (__C, __A, __B); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_rcp_ph (__mmask32 __A, __m512h __B) +{ + return __builtin_ia32_vrcpph_v32hf_mask (__B, _mm512_setzero_ph (), + __A); +} + +/* Intrinsics vrcpsh. */ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_rcp_sh (__m128h __A, __m128h __B) +{ + return __builtin_ia32_vrcpsh_v8hf_mask (__B, __A, _mm_setzero_ph (), + (__mmask8) -1); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_rcp_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D) +{ + return __builtin_ia32_vrcpsh_v8hf_mask (__D, __C, __A, __B); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_rcp_sh (__mmask8 __A, __m128h __B, __m128h __C) +{ + return __builtin_ia32_vrcpsh_v8hf_mask (__C, __B, _mm_setzero_ph (), + __A); +} + +/* Intrinsics vscalefph.
*/ +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_scalef_ph (__m512h __A, __m512h __B) +{ + return __builtin_ia32_vscalefph_v32hf_mask_round (__A, __B, + _mm512_setzero_ph (), + (__mmask32) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_scalef_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D) +{ + return __builtin_ia32_vscalefph_v32hf_mask_round (__C, __D, __A, __B, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_scalef_ph (__mmask32 __A, __m512h __B, __m512h __C) +{ + return __builtin_ia32_vscalefph_v32hf_mask_round (__B, __C, + _mm512_setzero_ph (), + __A, + _MM_FROUND_CUR_DIRECTION); +} + +#ifdef __OPTIMIZE__ +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_scalef_round_ph (__m512h __A, __m512h __B, const int __C) +{ + return __builtin_ia32_vscalefph_v32hf_mask_round (__A, __B, + _mm512_setzero_ph (), + (__mmask32) -1, __C); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_scalef_round_ph (__m512h __A, __mmask32 __B, __m512h __C, + __m512h __D, const int __E) +{ + return __builtin_ia32_vscalefph_v32hf_mask_round (__C, __D, __A, __B, + __E); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_scalef_round_ph (__mmask32 __A, __m512h __B, __m512h __C, + const int __D) +{ + return __builtin_ia32_vscalefph_v32hf_mask_round (__B, __C, + _mm512_setzero_ph (), + __A, __D); +} + +#else +#define _mm512_scalef_round_ph(A, B, C) \ + (__builtin_ia32_vscalefph_v32hf_mask_round ((A), (B), \ + _mm512_setzero_ph (), \ + (__mmask32)-1, (C))) + +#define _mm512_mask_scalef_round_ph(A, B, C, D, E) \ + (__builtin_ia32_vscalefph_v32hf_mask_round ((C), (D), (A), 
(B), (E))) + +#define _mm512_maskz_scalef_round_ph(A, B, C, D) \ + (__builtin_ia32_vscalefph_v32hf_mask_round ((B), (C), \ + _mm512_setzero_ph (), \ + (A), (D))) + +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vscalefsh. */ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_scalef_sh (__m128h __A, __m128h __B) +{ + return __builtin_ia32_vscalefsh_v8hf_mask_round (__A, __B, + _mm_setzero_ph (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_scalef_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D) +{ + return __builtin_ia32_vscalefsh_v8hf_mask_round (__C, __D, __A, __B, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_scalef_sh (__mmask8 __A, __m128h __B, __m128h __C) +{ + return __builtin_ia32_vscalefsh_v8hf_mask_round (__B, __C, + _mm_setzero_ph (), + __A, + _MM_FROUND_CUR_DIRECTION); +} + +#ifdef __OPTIMIZE__ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_scalef_round_sh (__m128h __A, __m128h __B, const int __C) +{ + return __builtin_ia32_vscalefsh_v8hf_mask_round (__A, __B, + _mm_setzero_ph (), + (__mmask8) -1, __C); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_scalef_round_sh (__m128h __A, __mmask8 __B, __m128h __C, + __m128h __D, const int __E) +{ + return __builtin_ia32_vscalefsh_v8hf_mask_round (__C, __D, __A, __B, + __E); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_scalef_round_sh (__mmask8 __A, __m128h __B, __m128h __C, + const int __D) +{ + return __builtin_ia32_vscalefsh_v8hf_mask_round (__B, __C, + _mm_setzero_ph (), + __A, __D); +} + +#else +#define _mm_scalef_round_sh(A, B, C) \ + (__builtin_ia32_vscalefsh_v8hf_mask_round ((A), 
(B), \ + _mm_setzero_ph (), \ + (__mmask8)-1, (C))) + +#define _mm_mask_scalef_round_sh(A, B, C, D, E) \ + (__builtin_ia32_vscalefsh_v8hf_mask_round ((C), (D), (A), (B), (E))) + +#define _mm_maskz_scalef_round_sh(A, B, C, D) \ + (__builtin_ia32_vscalefsh_v8hf_mask_round ((B), (C), _mm_setzero_ph (), \ + (A), (D))) + +#endif /* __OPTIMIZE__ */ + #ifdef __DISABLE_AVX512FP16__ #undef __DISABLE_AVX512FP16__ #pragma GCC pop_options diff --git a/gcc/config/i386/avx512fp16vlintrin.h b/gcc/config/i386/avx512fp16vlintrin.h index aaed85203c9..ebda59b9f9a 100644 --- a/gcc/config/i386/avx512fp16vlintrin.h +++ b/gcc/config/i386/avx512fp16vlintrin.h @@ -451,6 +451,103 @@ _mm256_maskz_rsqrt_ph (__mmask16 __A, __m256h __B) __A); } +/* Intrinsics vrcpph. */ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_rcp_ph (__m128h __A) +{ + return __builtin_ia32_vrcpph_v8hf_mask (__A, _mm_setzero_ph (), + (__mmask8) -1); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_rcp_ph (__m256h __A) +{ + return __builtin_ia32_vrcpph_v16hf_mask (__A, _mm256_setzero_ph (), + (__mmask16) -1); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_rcp_ph (__m128h __A, __mmask8 __B, __m128h __C) +{ + return __builtin_ia32_vrcpph_v8hf_mask (__C, __A, __B); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_rcp_ph (__m256h __A, __mmask16 __B, __m256h __C) +{ + return __builtin_ia32_vrcpph_v16hf_mask (__C, __A, __B); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_rcp_ph (__mmask8 __A, __m128h __B) +{ + return __builtin_ia32_vrcpph_v8hf_mask (__B, _mm_setzero_ph (), __A); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_rcp_ph (__mmask16 __A, __m256h __B) +{ + return 
__builtin_ia32_vrcpph_v16hf_mask (__B, _mm256_setzero_ph (), + __A); +} + +/* Intrinsics vscalefph. */ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_scalef_ph (__m128h __A, __m128h __B) +{ + return __builtin_ia32_vscalefph_v8hf_mask (__A, __B, + _mm_setzero_ph (), + (__mmask8) -1); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_scalef_ph (__m256h __A, __m256h __B) +{ + return __builtin_ia32_vscalefph_v16hf_mask (__A, __B, + _mm256_setzero_ph (), + (__mmask16) -1); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_scalef_ph (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D) +{ + return __builtin_ia32_vscalefph_v8hf_mask (__C, __D, __A, __B); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_scalef_ph (__m256h __A, __mmask16 __B, __m256h __C, + __m256h __D) +{ + return __builtin_ia32_vscalefph_v16hf_mask (__C, __D, __A, __B); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_scalef_ph (__mmask8 __A, __m128h __B, __m128h __C) +{ + return __builtin_ia32_vscalefph_v8hf_mask (__B, __C, + _mm_setzero_ph (), __A); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_scalef_ph (__mmask16 __A, __m256h __B, __m256h __C) +{ + return __builtin_ia32_vscalefph_v16hf_mask (__B, __C, + _mm256_setzero_ph (), + __A); +} + #ifdef __DISABLE_AVX512FP16VL__ #undef __DISABLE_AVX512FP16VL__ #pragma GCC pop_options diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def index 999b2e1abb5..7b8ca3ba685 100644 --- a/gcc/config/i386/i386-builtin.def +++ b/gcc/config/i386/i386-builtin.def @@ -2808,6 +2808,12 @@ BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp1 BDESC 
(OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_rsqrtv16hf2_mask, "__builtin_ia32_vrsqrtph_v16hf_mask", IX86_BUILTIN_VRSQRTPH_V16HF_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_UHI) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_rsqrtv32hf2_mask, "__builtin_ia32_vrsqrtph_v32hf_mask", IX86_BUILTIN_VRSQRTPH_V32HF_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_USI) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmrsqrtv8hf2_mask, "__builtin_ia32_vrsqrtsh_v8hf_mask", IX86_BUILTIN_VRSQRTSH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_rcpv8hf2_mask, "__builtin_ia32_vrcpph_v8hf_mask", IX86_BUILTIN_VRCPPH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_rcpv16hf2_mask, "__builtin_ia32_vrcpph_v16hf_mask", IX86_BUILTIN_VRCPPH_V16HF_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_rcpv32hf2_mask, "__builtin_ia32_vrcpph_v32hf_mask", IX86_BUILTIN_VRCPPH_V32HF_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_USI) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmrcpv8hf2_mask, "__builtin_ia32_vrcpsh_v8hf_mask", IX86_BUILTIN_VRCPSH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_scalefv8hf_mask, "__builtin_ia32_vscalefph_v8hf_mask", IX86_BUILTIN_VSCALEFPH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_scalefv16hf_mask, "__builtin_ia32_vscalefph_v16hf_mask", IX86_BUILTIN_VSCALEFPH_V16HF_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI) /* Builtins with rounding support. 
*/ BDESC_END (ARGS, ROUND_ARGS) @@ -3025,6 +3031,8 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_cmpv32hf3_mask_round, " BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmcmpv8hf3_mask_round, "__builtin_ia32_vcmpsh_v8hf_mask_round", IX86_BUILTIN_VCMPSH_V8HF_MASK_ROUND, UNKNOWN, (int) UQI_FTYPE_V8HF_V8HF_INT_UQI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_sqrtv32hf2_mask_round, "__builtin_ia32_vsqrtph_v32hf_mask_round", IX86_BUILTIN_VSQRTPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_USI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsqrtv8hf2_mask_round, "__builtin_ia32_vsqrtsh_v8hf_mask_round", IX86_BUILTIN_VSQRTSH_V8HF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_scalefv32hf_mask_round, "__builtin_ia32_vscalefph_v32hf_mask_round", IX86_BUILTIN_VSCALEFPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmscalefv8hf_mask_round, "__builtin_ia32_vscalefsh_v8hf_mask_round", IX86_BUILTIN_VSCALEFSH_V8HF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) BDESC_END (ROUND_ARGS, MULTI_ARG) diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 4763fd0558d..683efe4bb0e 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -386,6 +386,13 @@ (define_mode_iterator VF_AVX512VL (define_mode_iterator VF1_AVX512ER_128_256 [(V16SF "TARGET_AVX512ER") (V8SF "TARGET_AVX") V4SF]) +(define_mode_iterator VFH_AVX512VL + [(V32HF "TARGET_AVX512FP16") + (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL") + (V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL") + V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL") + V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")]) + (define_mode_iterator VF2_AVX512VL [V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")]) @@ -2198,6 +2205,30 @@ (define_insn "*sse_vmrcpv4sf2" (set_attr "prefix" 
"orig,vex") (set_attr "mode" "SF")]) +(define_insn "avx512fp16_rcp2" + [(set (match_operand:VF_AVX512FP16VL 0 "register_operand" "=v") + (unspec:VF_AVX512FP16VL + [(match_operand:VF_AVX512FP16VL 1 "nonimmediate_operand" "vm")] + UNSPEC_RCP))] + "TARGET_AVX512FP16" + "vrcpph\t{%1, %0|%0, %1}" + [(set_attr "type" "sse") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + +(define_insn "avx512fp16_vmrcpv8hf2" + [(set (match_operand:V8HF 0 "register_operand" "=v") + (vec_merge:V8HF + (unspec:V8HF [(match_operand:V8HF 1 "nonimmediate_operand" "vm")] + UNSPEC_RCP) + (match_operand:V8HF 2 "register_operand" "v") + (const_int 1)))] + "TARGET_AVX512FP16" + "vrcpsh\t{%1, %2, %0|%0, %2, %w1}" + [(set_attr "type" "sse") + (set_attr "prefix" "evex") + (set_attr "mode" "HF")]) + (define_insn "rcp14" [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v") (unspec:VF_AVX512VL @@ -9948,11 +9979,11 @@ (define_split }) (define_insn "avx512f_vmscalef" - [(set (match_operand:VF_128 0 "register_operand" "=v") - (vec_merge:VF_128 - (unspec:VF_128 - [(match_operand:VF_128 1 "register_operand" "v") - (match_operand:VF_128 2 "" "")] + [(set (match_operand:VFH_128 0 "register_operand" "=v") + (vec_merge:VFH_128 + (unspec:VFH_128 + [(match_operand:VFH_128 1 "register_operand" "v") + (match_operand:VFH_128 2 "" "")] UNSPEC_SCALEF) (match_dup 1) (const_int 1)))] @@ -9962,10 +9993,10 @@ (define_insn "avx512f_vmscalef" (set_attr "mode" "")]) (define_insn "_scalef" - [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v") - (unspec:VF_AVX512VL - [(match_operand:VF_AVX512VL 1 "register_operand" "v") - (match_operand:VF_AVX512VL 2 "nonimmediate_operand" "")] + [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v") + (unspec:VFH_AVX512VL + [(match_operand:VFH_AVX512VL 1 "register_operand" "v") + (match_operand:VFH_AVX512VL 2 "nonimmediate_operand" "")] UNSPEC_SCALEF))] "TARGET_AVX512F" "vscalef\t{%2, %1, %0|%0, %1, %2}" diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c 
b/gcc/testsuite/gcc.target/i386/avx-1.c index 651cb1c80fb..17c396567f2 100644 --- a/gcc/testsuite/gcc.target/i386/avx-1.c +++ b/gcc/testsuite/gcc.target/i386/avx-1.c @@ -703,6 +703,8 @@ #define __builtin_ia32_vcmpsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vcmpsh_v8hf_mask_round(A, B, 1, D, 8) #define __builtin_ia32_vsqrtph_v32hf_mask_round(C, A, B, D) __builtin_ia32_vsqrtph_v32hf_mask_round(C, A, B, 8) #define __builtin_ia32_vsqrtsh_v8hf_mask_round(D, C, A, B, E) __builtin_ia32_vsqrtsh_v8hf_mask_round(D, C, A, B, 8) +#define __builtin_ia32_vscalefph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vscalefph_v32hf_mask_round(A, B, C, D, 8) +#define __builtin_ia32_vscalefsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vscalefsh_v8hf_mask_round(A, B, C, D, 8) /* avx512fp16vlintrin.h */ #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D) diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c index 94553dec9e7..c1d95fc2ead 100644 --- a/gcc/testsuite/gcc.target/i386/sse-13.c +++ b/gcc/testsuite/gcc.target/i386/sse-13.c @@ -720,6 +720,8 @@ #define __builtin_ia32_vcmpsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vcmpsh_v8hf_mask_round(A, B, 1, D, 8) #define __builtin_ia32_vsqrtph_v32hf_mask_round(C, A, B, D) __builtin_ia32_vsqrtph_v32hf_mask_round(C, A, B, 8) #define __builtin_ia32_vsqrtsh_v8hf_mask_round(D, C, A, B, E) __builtin_ia32_vsqrtsh_v8hf_mask_round(D, C, A, B, 8) +#define __builtin_ia32_vscalefph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vscalefph_v32hf_mask_round(A, B, C, D, 8) +#define __builtin_ia32_vscalefsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vscalefsh_v8hf_mask_round(A, B, C, D, 8) /* avx512fp16vlintrin.h */ #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D) diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c index 7281bffdf2b..5b6d0b082d1 100644 --- 
a/gcc/testsuite/gcc.target/i386/sse-14.c +++ b/gcc/testsuite/gcc.target/i386/sse-14.c @@ -687,6 +687,8 @@ test_2 (_mm512_cmp_ph_mask, __mmask32, __m512h, __m512h, 1) test_2 (_mm_comi_sh, int, __m128h, __m128h, 1) test_2 (_mm512_maskz_sqrt_round_ph, __m512h, __mmask32, __m512h, 8) test_2 (_mm_sqrt_round_sh, __m128h, __m128h, __m128h, 8) +test_2 (_mm512_scalef_round_ph, __m512h, __m512h, __m512h, 8) +test_2 (_mm_scalef_round_sh, __m128h, __m128h, __m128h, 8) test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8) test_2x (_mm_cmp_round_sh_mask, __mmask8, __m128h, __m128h, 1, 8) test_2x (_mm_comi_round_sh, int, __m128h, __m128h, 1, 8) @@ -705,6 +707,8 @@ test_3 (_mm_maskz_min_round_sh, __m128h, __mmask8, __m128h, __m128h, 8) test_3 (_mm512_mask_cmp_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1) test_3 (_mm512_mask_sqrt_round_ph, __m512h, __m512h, __mmask32, __m512h, 8) test_3 (_mm_maskz_sqrt_round_sh, __m128h, __mmask8, __m128h, __m128h, 8) +test_3 (_mm512_maskz_scalef_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) +test_3 (_mm_maskz_scalef_round_sh, __m128h, __mmask8, __m128h, __m128h, 8) test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8) test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8) test_4 (_mm512_mask_add_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8) @@ -720,6 +724,8 @@ test_4 (_mm512_mask_min_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, test_4 (_mm_mask_max_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8) test_4 (_mm_mask_min_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8) test_4 (_mm_mask_sqrt_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8) +test_4 (_mm512_mask_scalef_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8) +test_4 (_mm_mask_scalef_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8) /* avx512fp16vlintrin.h */ test_2 (_mm_cmp_ph_mask, __mmask8, __m128h, __m128h, 1) diff --git 
a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c index 04326e0e37d..b2de5679bb6 100644 --- a/gcc/testsuite/gcc.target/i386/sse-22.c +++ b/gcc/testsuite/gcc.target/i386/sse-22.c @@ -792,6 +792,7 @@ test_2 (_mm512_cmp_ph_mask, __mmask32, __m512h, __m512h, 1) test_2 (_mm_comi_sh, int, __m128h, __m128h, 1) test_2 (_mm512_maskz_sqrt_round_ph, __m512h, __mmask32, __m512h, 8) test_2 (_mm_sqrt_round_sh, __m128h, __m128h, __m128h, 8) +test_2 (_mm512_scalef_round_ph, __m512h, __m512h, __m512h, 8) test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8) test_2x (_mm_cmp_round_sh_mask, __mmask8, __m128h, __m128h, 1, 8) test_2x (_mm_comi_round_sh, int, __m128h, __m128h, 1, 8) @@ -810,6 +811,7 @@ test_3 (_mm_maskz_min_round_sh, __m128h, __mmask8, __m128h, __m128h, 8) test_3 (_mm512_mask_cmp_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1) test_3 (_mm512_mask_sqrt_round_ph, __m512h, __m512h, __mmask32, __m512h, 8) test_3 (_mm_maskz_sqrt_round_sh, __m128h, __mmask8, __m128h, __m128h, 8) +test_3 (_mm512_maskz_scalef_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8) test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8) test_4 (_mm512_mask_add_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8) @@ -825,6 +827,7 @@ test_4 (_mm512_mask_min_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, test_4 (_mm_mask_max_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8) test_4 (_mm_mask_min_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8) test_4 (_mm_mask_sqrt_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8) +test_4 (_mm512_mask_scalef_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8) /* avx512fp16vlintrin.h */ test_2 (_mm_cmp_ph_mask, __mmask8, __m128h, __m128h, 1) diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c index 
7559d335dbc..5948622cc4f 100644 --- a/gcc/testsuite/gcc.target/i386/sse-23.c +++ b/gcc/testsuite/gcc.target/i386/sse-23.c @@ -721,6 +721,8 @@ #define __builtin_ia32_vcmpsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vcmpsh_v8hf_mask_round(A, B, 1, D, 8) #define __builtin_ia32_vsqrtph_v32hf_mask_round(C, A, B, D) __builtin_ia32_vsqrtph_v32hf_mask_round(C, A, B, 8) #define __builtin_ia32_vsqrtsh_v8hf_mask_round(D, C, A, B, E) __builtin_ia32_vsqrtsh_v8hf_mask_round(D, C, A, B, 8) +#define __builtin_ia32_vscalefph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vscalefph_v32hf_mask_round(A, B, C, D, 8) +#define __builtin_ia32_vscalefsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vscalefsh_v8hf_mask_round(A, B, C, D, 8) /* avx512fp16vlintrin.h */ #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
From patchwork Thu Jul 1 06:16:05 2021
X-Patchwork-Submitter: liuhongt
X-Patchwork-Id: 1499326
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 19/62] AVX512FP16: Add testcase for vrcpph/vrcpsh/vscalefph/vscalefsh.
Date: Thu, 1 Jul 2021 14:16:05 +0800
Message-Id: <20210701061648.9447-20-hongtao.liu@intel.com>
In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com>
References: <20210701061648.9447-1-hongtao.liu@intel.com>
From: liuhongt
Cc: jakub@redhat.com

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-vrcpph-1a.c: New test.
	* gcc.target/i386/avx512fp16-vrcpph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vrcpsh-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vrcpsh-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vscalefph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vscalefph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vscalefsh-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vscalefsh-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vrcpph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vrcpph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vscalefph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vscalefph-1b.c: Ditto.
--- .../gcc.target/i386/avx512fp16-vrcpph-1a.c | 19 ++++ .../gcc.target/i386/avx512fp16-vrcpph-1b.c | 79 ++++++++++++++++ .../gcc.target/i386/avx512fp16-vrcpsh-1a.c | 18 ++++ .../gcc.target/i386/avx512fp16-vrcpsh-1b.c | 57 +++++++++++ .../gcc.target/i386/avx512fp16-vscalefph-1a.c | 25 +++++ .../gcc.target/i386/avx512fp16-vscalefph-1b.c | 94 +++++++++++++++++++ .../gcc.target/i386/avx512fp16-vscalefsh-1a.c | 23 +++++ .../gcc.target/i386/avx512fp16-vscalefsh-1b.c | 58 ++++++++++++ .../gcc.target/i386/avx512fp16vl-vrcpph-1a.c | 29 ++++++ .../gcc.target/i386/avx512fp16vl-vrcpph-1b.c | 16 ++++ .../i386/avx512fp16vl-vscalefph-1a.c | 29 ++++++ .../i386/avx512fp16vl-vscalefph-1b.c | 16 ++++ 12 files changed, 463 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vrcpph-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vrcpph-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vrcpsh-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vrcpsh-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vscalefph-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vscalefph-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vscalefsh-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vscalefsh-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vrcpph-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vrcpph-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vscalefph-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vscalefph-1b.c diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vrcpph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vrcpph-1a.c new file mode 100644 index 00000000000..6a5c642d7d8 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vrcpph-1a.c @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vrcpph\[ 
\\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrcpph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vrcpph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +volatile __m512h res; +volatile __m512h x1; +volatile __mmask32 m32; + +void extern +avx512f_test (void) +{ + res = _mm512_rcp_ph (x1); + res = _mm512_mask_rcp_ph (res, m32, x1); + res = _mm512_maskz_rcp_ph (m32, x1); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vrcpph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vrcpph-1b.c new file mode 100644 index 00000000000..4a65451af3b --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vrcpph-1b.c @@ -0,0 +1,79 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS (AVX512F_LEN / 16) + +void NOINLINE +EMULATE(rcp_ph) (V512 * dest, V512 op1, + __mmask32 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + __mmask16 m1, m2; + + m1 = k & 0xffff; + m2 = (k >> 16) & 0xffff; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(*dest, &v7, &v8); + + for (i = 0; i < 16; i++) { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + v5.f32[i] = 0; + } + else { + v5.u32[i] = v7.u32[i]; + } + } + else { + v5.f32[i] = 1. / v1.f32[i]; + + } + + if (((1 << i) & m2) == 0) { + if (zero_mask) { + v6.f32[i] = 0; + } + else { + v6.u32[i] = v8.u32[i]; + } + } + else { + v6.f32[i] = 1.
/ v2.f32[i]; + } + + } + *dest = pack_twops_2ph(v5, v6); +} + +void +TEST (void) +{ + V512 res; + V512 exp; + + init_src(); + + EMULATE(rcp_ph) (&exp, src1, NET_MASK, 0); + HF(res) = INTRINSIC (_rcp_ph) (HF(src1)); + CHECK_RESULT (&res, &exp, N_ELEMS, _rcp_ph); + + init_dest(&res, &exp); + EMULATE(rcp_ph) (&exp, src1, MASK_VALUE, 0); + HF(res) = INTRINSIC (_mask_rcp_ph) (HF(res), MASK_VALUE, HF(src1)); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_rcp_ph); + + EMULATE(rcp_ph) (&exp, src1, ZMASK_VALUE, 1); + HF(res) = INTRINSIC (_maskz_rcp_ph) (ZMASK_VALUE, HF(src1)); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_rcp_ph); + + if (n_errs != 0) + abort (); +} + + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vrcpsh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vrcpsh-1a.c new file mode 100644 index 00000000000..0a5a18e8b84 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vrcpsh-1a.c @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vrcpsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrcpsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vrcpsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +volatile __m128h res, x1, x2; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res = _mm_rcp_sh (x1, x2); + res = _mm_mask_rcp_sh (res, m8, x1, x2); + res = _mm_maskz_rcp_sh (m8, x1, x2); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vrcpsh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vrcpsh-1b.c new file mode 100644 index 00000000000..531689569cb --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vrcpsh-1b.c @@ -0,0 +1,57 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + 
+#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS 8 + +void NOINLINE +emulate_rcp_sh(V512 * dest, V512 op1, + __mmask32 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(*dest, &v7, &v8); + + if ((k&1) || !k) + v5.f32[0] = 1. / v1.f32[0]; + else if (zero_mask) + v5.f32[0] = 0; + else + v5.f32[0] = v7.f32[0]; + + for (i = 1; i < 8; i++) + v5.f32[i] = v1.f32[i]; + + *dest = pack_twops_2ph(v5, v6); +} + +void +test_512 (void) +{ + V512 res; + V512 exp; + + init_src(); + + emulate_rcp_sh(&exp, src1, 0x1, 0); + res.xmmh[0] = _mm_rcp_sh(exp.xmmh[0], src1.xmmh[0]); + check_results(&res, &exp, N_ELEMS, "_mm_rcp_sh"); + + init_dest(&res, &exp); + emulate_rcp_sh(&exp, src1, 0x1, 0); + res.xmmh[0] = _mm_mask_rcp_sh(res.xmmh[0], 0x1, exp.xmmh[0], src1.xmmh[0]); + check_results(&res, &exp, N_ELEMS, "_mm_mask_rcp_sh"); + + emulate_rcp_sh(&exp, src1, 0x3, 1); + res.xmmh[0] = _mm_maskz_rcp_sh(0x3, exp.xmmh[0], src1.xmmh[0]); + check_results(&res, &exp, N_ELEMS, "_mm_maskz_rcp_sh"); + + if (n_errs != 0) + abort (); +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vscalefph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vscalefph-1a.c new file mode 100644 index 00000000000..f3d27898f27 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vscalefph-1a.c @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vscalefph\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vscalefph\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vscalefph\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vscalefph\[ 
\\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vscalefph\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vscalefph\[ \\t\]+\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +volatile __m512h res, res1, res2; +volatile __m512h x1, x2; +volatile __mmask32 m32; + +void extern +avx512f_test (void) +{ + res = _mm512_scalef_ph (x1, x2); + res1 = _mm512_mask_scalef_ph (res1, m32, x1, x2); + res2 = _mm512_maskz_scalef_ph (m32, x1, x2); + res = _mm512_scalef_round_ph (x1, x2, 8); + res1 = _mm512_mask_scalef_round_ph (res1, m32, x1, x2, 8); + res2 = _mm512_maskz_scalef_round_ph (m32, x1, x2, 11); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vscalefph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vscalefph-1b.c new file mode 100644 index 00000000000..7c7288d6eb3 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vscalefph-1b.c @@ -0,0 +1,94 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define DEBUG + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS (AVX512F_LEN / 16) + +void NOINLINE +EMULATE(scalef_ph) (V512 * dest, V512 op1, V512 op2, + __mmask32 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + __mmask16 m1, m2; + + m1 = k & 0xffff; + m2 = (k >> 16) & 0xffff; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(op2, &v3, &v4); + unpack_ph_2twops(*dest, &v7, &v8); + + for (i = 0; i < 16; i++) { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + v5.f32[i] = 0; + } + else { + v5.u32[i] = v7.u32[i]; + } + } + else { + v5.f32[i] = v1.f32[i] * powf(2.0f, floorf(v3.f32[i])); + } + + if (((1 << i) & m2) == 0) { + if (zero_mask) { + v6.f32[i] = 
0; + } + else { + v6.u32[i] = v8.u32[i]; + } + } + else { + v6.f32[i] = v2.f32[i] * powf(2.0f, floorf(v4.f32[i])); + } + } + *dest = pack_twops_2ph(v5, v6); +} + +void +TEST (void) +{ + V512 res; + V512 exp; + + init_src(); + + EMULATE(scalef_ph) (&exp, src1, src2, NET_MASK, 0); + HF(res) = INTRINSIC (_scalef_ph) (HF(src1), HF(src2)); + CHECK_RESULT (&res, &exp, N_ELEMS, _scalef_ph); + + init_dest(&res, &exp); + EMULATE(scalef_ph) (&exp, src1, src2, MASK_VALUE, 0); + HF(res) = INTRINSIC (_mask_scalef_ph) (HF(res), MASK_VALUE, HF(src1), HF(src2)); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_scalef_ph); + + EMULATE(scalef_ph) (&exp, src1, src2, ZMASK_VALUE, 1); + HF(res) = INTRINSIC (_maskz_scalef_ph) (ZMASK_VALUE, HF(src1), HF(src2)); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_scalef_ph); + +#if AVX512F_LEN == 512 + EMULATE(scalef_ph) (&exp, src1, src2, NET_MASK, 0); + HF(res) = INTRINSIC (_scalef_round_ph) (HF(src1), HF(src2), 0x04); + CHECK_RESULT (&res, &exp, N_ELEMS, _scalef_round_ph); + + init_dest(&res, &exp); + EMULATE(scalef_ph) (&exp, src1, src2, MASK_VALUE, 0); + HF(res) = INTRINSIC (_mask_scalef_round_ph) (HF(res), MASK_VALUE, HF(src1), HF(src2), 0x04); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_scalef_round_ph); + + EMULATE(scalef_ph) (&exp, src1, src2, ZMASK_VALUE, 1); + HF(res) = INTRINSIC (_maskz_scalef_round_ph) (ZMASK_VALUE, HF(src1), HF(src2), 0x04); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_scalef_round_ph); +#endif + + if (n_errs != 0) + abort (); +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vscalefsh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vscalefsh-1a.c new file mode 100644 index 00000000000..999c04849e9 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vscalefsh-1a.c @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vscalefsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { 
scan-assembler-times "vscalefsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vscalefsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vscalefsh\[ \\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vscalefsh\[ \\t\]+\{rz-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +volatile __m128h res, x1, x2; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res = _mm_scalef_sh (x1, x2); + res = _mm_mask_scalef_sh (res, m8, x1, x2); + res = _mm_maskz_scalef_sh (m8, x1, x2); + res = _mm_scalef_round_sh (x1, x2, 4); + res = _mm_mask_scalef_round_sh (res, m8, x1, x2, 8); + res = _mm_maskz_scalef_round_sh (m8, x1, x2, 11); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vscalefsh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vscalefsh-1b.c new file mode 100644 index 00000000000..5db7be0715f --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vscalefsh-1b.c @@ -0,0 +1,58 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS 8 + +void NOINLINE +emulate_scalef_sh(V512 * dest, V512 op1, V512 op2, + __mmask8 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(op2, &v3, &v4); + unpack_ph_2twops(*dest, &v7, &v8); + + if ((k&1) || !k) + v5.f32[0] = v1.f32[0] * powf(2.0f, floorf(v3.f32[0])); + else if (zero_mask) + v5.f32[0] = 0; + else + v5.f32[0] = v7.f32[0]; + + for (i = 1; i < 8; i++) + v5.f32[i] = v1.f32[i]; + + *dest = pack_twops_2ph(v5, v6); +} + +void +test_512 (void) +{ + V512 res; + V512 exp; + +
init_src(); + emulate_scalef_sh(&exp, src1, src2, 0x1, 0); + res.xmmh[0] = _mm_scalef_round_sh(src1.xmmh[0], src2.xmmh[0], (0x00 | 0x08)); + check_results(&res, &exp, N_ELEMS, "_mm_scalef_round_sh"); + + init_dest(&res, &exp); + emulate_scalef_sh(&exp, src1, src2, 0x1, 0); + res.xmmh[0] = _mm_mask_scalef_round_sh(res.xmmh[0], 0x1, src1.xmmh[0], src2.xmmh[0], (0x00 | 0x08)); + check_results(&res, &exp, N_ELEMS, "_mm_mask_scalef_round_sh"); + + emulate_scalef_sh(&exp, src1, src2, 0x3, 1); + res.xmmh[0] = _mm_maskz_scalef_round_sh(0x3, src1.xmmh[0], src2.xmmh[0], (0x00 | 0x08)); + check_results(&res, &exp, N_ELEMS, "_mm_maskz_scalef_round_sh"); + + if (n_errs != 0) { + abort (); + } +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vrcpph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vrcpph-1a.c new file mode 100644 index 00000000000..5894dbc679f --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vrcpph-1a.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */ +/* { dg-final { scan-assembler-times "vrcpph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrcpph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vrcpph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrcpph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrcpph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vrcpph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +volatile __m256h res1; +volatile __m128h res2; +volatile __m256h x1; +volatile __m128h x2;
+volatile __mmask16 m16; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res1 = _mm256_rcp_ph (x1); + res1 = _mm256_mask_rcp_ph (res1, m16, x1); + res1 = _mm256_maskz_rcp_ph (m16, x1); + + res2 = _mm_rcp_ph (x2); + res2 = _mm_mask_rcp_ph (res2, m8, x2); + res2 = _mm_maskz_rcp_ph (m8, x2); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vrcpph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vrcpph-1b.c new file mode 100644 index 00000000000..a6b1e376a8e --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vrcpph-1b.c @@ -0,0 +1,16 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */ + +#define DEBUG +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vrcpph-1b.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vrcpph-1b.c" + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vscalefph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vscalefph-1a.c new file mode 100644 index 00000000000..22231d628cf --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vscalefph-1a.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */ +/* { dg-final { scan-assembler-times "vscalefph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vscalefph\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vscalefph\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vscalefph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vscalefph\[ 
\\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vscalefph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +volatile __m256h res1; +volatile __m128h res2; +volatile __m256h x1,x2; +volatile __m128h x3, x4; +volatile __mmask16 m16; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res1 = _mm256_scalef_ph (x1, x2); + res1 = _mm256_mask_scalef_ph (res1, m16, x1, x2); + res1 = _mm256_maskz_scalef_ph (m16, x1, x2); + + res2 = _mm_scalef_ph (x3, x4); + res2 = _mm_mask_scalef_ph (res2, m8, x3, x4); + res2 = _mm_maskz_scalef_ph (m8, x3, x4); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vscalefph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vscalefph-1b.c new file mode 100644 index 00000000000..5c12d08e2e1 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vscalefph-1b.c @@ -0,0 +1,16 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */ + +#define DEBUG +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vscalefph-1b.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vscalefph-1b.c" + From patchwork Thu Jul 1 06:16:06 2021 X-Patchwork-Submitter: liuhongt X-Patchwork-Id: 1499327
To: gcc-patches@gcc.gnu.org Subject: [PATCH 20/62] AVX512FP16: Add vreduceph/vreducesh/vrndscaleph/vrndscalesh. Date: Thu, 1 Jul 2021 14:16:06 +0800 Message-Id: <20210701061648.9447-21-hongtao.liu@intel.com> In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com> References: <20210701061648.9447-1-hongtao.liu@intel.com> From: liuhongt Cc: jakub@redhat.com gcc/ChangeLog: * config/i386/avx512fp16intrin.h (_mm512_reduce_ph): New intrinsic. (_mm512_mask_reduce_ph): Likewise. (_mm512_maskz_reduce_ph): Likewise. (_mm512_reduce_round_ph): Likewise. (_mm512_mask_reduce_round_ph): Likewise. (_mm512_maskz_reduce_round_ph): Likewise. (_mm_reduce_sh): Likewise. (_mm_mask_reduce_sh): Likewise. (_mm_maskz_reduce_sh): Likewise. (_mm_reduce_round_sh): Likewise. (_mm_mask_reduce_round_sh): Likewise. (_mm_maskz_reduce_round_sh): Likewise. (_mm512_roundscale_ph): Likewise. (_mm512_mask_roundscale_ph): Likewise. (_mm512_maskz_roundscale_ph): Likewise. (_mm512_roundscale_round_ph): Likewise. (_mm512_mask_roundscale_round_ph): Likewise. (_mm512_maskz_roundscale_round_ph): Likewise. (_mm_roundscale_sh): Likewise. (_mm_mask_roundscale_sh): Likewise.
(_mm_maskz_roundscale_sh): Likewise. (_mm_roundscale_round_sh): Likewise. (_mm_mask_roundscale_round_sh): Likewise. (_mm_maskz_roundscale_round_sh): Likewise. * config/i386/avx512fp16vlintrin.h: (_mm_reduce_ph): New intrinsic. (_mm_mask_reduce_ph): Likewise. (_mm_maskz_reduce_ph): Likewise. (_mm256_reduce_ph): Likewise. (_mm256_mask_reduce_ph): Likewise. (_mm256_maskz_reduce_ph): Likewise. (_mm_roundscale_ph): Likewise. (_mm_mask_roundscale_ph): Likewise. (_mm_maskz_roundscale_ph): Likewise. (_mm256_roundscale_ph): Likewise. (_mm256_mask_roundscale_ph): Likewise. (_mm256_maskz_roundscale_ph): Likewise. * config/i386/i386-builtin-types.def: Add corresponding builtin types. * config/i386/i386-builtin.def: Add corresponding new builtins. * config/i386/i386-expand.c (ix86_expand_args_builtin): Handle new builtin types. (ix86_expand_round_builtin): Ditto. * config/i386/sse.md (reducep): Renamed to ... (reducep): ... this, and adjust for round operands. (reduces): Likewise, with ... (reduces_rndscale): Adjust for HF vector modes. (avx512f_rndscale): Ditto. (*avx512f_rndscale): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add test for new builtins. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/sse-14.c: Add test for new intrinsics. * gcc.target/i386/sse-22.c: Ditto. 
--- gcc/config/i386/avx512fp16intrin.h | 359 +++++++++++++++++++++++++ gcc/config/i386/avx512fp16vlintrin.h | 153 +++++++++++ gcc/config/i386/i386-builtin-types.def | 4 + gcc/config/i386/i386-builtin.def | 8 + gcc/config/i386/i386-expand.c | 4 + gcc/config/i386/sse.md | 44 +-- gcc/testsuite/gcc.target/i386/avx-1.c | 8 + gcc/testsuite/gcc.target/i386/sse-13.c | 8 + gcc/testsuite/gcc.target/i386/sse-14.c | 36 +++ gcc/testsuite/gcc.target/i386/sse-22.c | 36 +++ gcc/testsuite/gcc.target/i386/sse-23.c | 8 + 11 files changed, 646 insertions(+), 22 deletions(-) diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h index 9a52d2ac36e..8c2c9b28987 100644 --- a/gcc/config/i386/avx512fp16intrin.h +++ b/gcc/config/i386/avx512fp16intrin.h @@ -1623,6 +1623,365 @@ _mm_maskz_scalef_round_sh (__mmask8 __A, __m128h __B, __m128h __C, #endif /* __OPTIMIZE__ */ +/* Intrinsics vreduceph. */ +#ifdef __OPTIMIZE__ +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_reduce_ph (__m512h __A, int __B) +{ + return __builtin_ia32_vreduceph_v32hf_mask_round (__A, __B, + _mm512_setzero_ph (), + (__mmask32) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_reduce_ph (__m512h __A, __mmask32 __B, __m512h __C, int __D) +{ + return __builtin_ia32_vreduceph_v32hf_mask_round (__C, __D, __A, __B, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_reduce_ph (__mmask32 __A, __m512h __B, int __C) +{ + return __builtin_ia32_vreduceph_v32hf_mask_round (__B, __C, + _mm512_setzero_ph (), + __A, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_reduce_round_ph (__m512h __A, int __B, const int __C) +{ + return __builtin_ia32_vreduceph_v32hf_mask_round (__A, __B, + 
_mm512_setzero_ph (), + (__mmask32) -1, __C); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_reduce_round_ph (__m512h __A, __mmask32 __B, __m512h __C, + int __D, const int __E) +{ + return __builtin_ia32_vreduceph_v32hf_mask_round (__C, __D, __A, __B, + __E); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_reduce_round_ph (__mmask32 __A, __m512h __B, int __C, + const int __D) +{ + return __builtin_ia32_vreduceph_v32hf_mask_round (__B, __C, + _mm512_setzero_ph (), + __A, __D); +} + +#else +#define _mm512_reduce_ph(A, B) \ + (__builtin_ia32_vreduceph_v32hf_mask_round ((A), (B), \ + _mm512_setzero_ph (), \ + (__mmask32)-1, \ + _MM_FROUND_CUR_DIRECTION)) + +#define _mm512_mask_reduce_ph(A, B, C, D) \ + (__builtin_ia32_vreduceph_v32hf_mask_round ((C), (D), (A), (B), \ + _MM_FROUND_CUR_DIRECTION)) + +#define _mm512_maskz_reduce_ph(A, B, C) \ + (__builtin_ia32_vreduceph_v32hf_mask_round ((B), (C), \ + _mm512_setzero_ph (), \ + (A), _MM_FROUND_CUR_DIRECTION)) + +#define _mm512_reduce_round_ph(A, B, C) \ + (__builtin_ia32_vreduceph_v32hf_mask_round ((A), (B), \ + _mm512_setzero_ph (), \ + (__mmask32)-1, (C))) + +#define _mm512_mask_reduce_round_ph(A, B, C, D, E) \ + (__builtin_ia32_vreduceph_v32hf_mask_round ((C), (D), (A), (B), (E))) + +#define _mm512_maskz_reduce_round_ph(A, B, C, D) \ + (__builtin_ia32_vreduceph_v32hf_mask_round ((B), (C), \ + _mm512_setzero_ph (), \ + (A), (D))) + +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vreducesh. 
*/ +#ifdef __OPTIMIZE__ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_reduce_sh (__m128h __A, __m128h __B, int __C) +{ + return __builtin_ia32_vreducesh_v8hf_mask_round (__A, __B, __C, + _mm_setzero_ph (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_reduce_sh (__m128h __A, __mmask8 __B, __m128h __C, + __m128h __D, int __E) +{ + return __builtin_ia32_vreducesh_v8hf_mask_round (__C, __D, __E, __A, __B, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_reduce_sh (__mmask8 __A, __m128h __B, __m128h __C, int __D) +{ + return __builtin_ia32_vreducesh_v8hf_mask_round (__B, __C, __D, + _mm_setzero_ph (), __A, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_reduce_round_sh (__m128h __A, __m128h __B, int __C, const int __D) +{ + return __builtin_ia32_vreducesh_v8hf_mask_round (__A, __B, __C, + _mm_setzero_ph (), + (__mmask8) -1, __D); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_reduce_round_sh (__m128h __A, __mmask8 __B, __m128h __C, + __m128h __D, int __E, const int __F) +{ + return __builtin_ia32_vreducesh_v8hf_mask_round (__C, __D, __E, __A, + __B, __F); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_reduce_round_sh (__mmask8 __A, __m128h __B, __m128h __C, + int __D, const int __E) +{ + return __builtin_ia32_vreducesh_v8hf_mask_round (__B, __C, __D, + _mm_setzero_ph (), + __A, __E); +} + +#else +#define _mm_reduce_sh(A, B, C) \ + (__builtin_ia32_vreducesh_v8hf_mask_round ((A), (B), (C), \ + _mm_setzero_ph (), \ + (__mmask8)-1, \ + _MM_FROUND_CUR_DIRECTION)) + +#define _mm_mask_reduce_sh(A, B, C, D, E) \ + 
(__builtin_ia32_vreducesh_v8hf_mask_round ((C), (D), (E), (A), (B), \ + _MM_FROUND_CUR_DIRECTION)) + +#define _mm_maskz_reduce_sh(A, B, C, D) \ + (__builtin_ia32_vreducesh_v8hf_mask_round ((B), (C), (D), \ + _mm_setzero_ph (), \ + (A), _MM_FROUND_CUR_DIRECTION)) + +#define _mm_reduce_round_sh(A, B, C, D) \ + (__builtin_ia32_vreducesh_v8hf_mask_round ((A), (B), (C), \ + _mm_setzero_ph (), \ + (__mmask8)-1, (D))) + +#define _mm_mask_reduce_round_sh(A, B, C, D, E, F) \ + (__builtin_ia32_vreducesh_v8hf_mask_round ((C), (D), (E), (A), (B), (F))) + +#define _mm_maskz_reduce_round_sh(A, B, C, D, E) \ + (__builtin_ia32_vreducesh_v8hf_mask_round ((B), (C), (D), \ + _mm_setzero_ph (), \ + (A), (E))) + +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vrndscaleph. */ +#ifdef __OPTIMIZE__ +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_roundscale_ph (__m512h __A, int __B) +{ + return __builtin_ia32_vrndscaleph_v32hf_mask_round (__A, __B, + _mm512_setzero_ph (), + (__mmask32) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_roundscale_ph (__m512h __A, __mmask32 __B, + __m512h __C, int __D) +{ + return __builtin_ia32_vrndscaleph_v32hf_mask_round (__C, __D, __A, __B, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_roundscale_ph (__mmask32 __A, __m512h __B, int __C) +{ + return __builtin_ia32_vrndscaleph_v32hf_mask_round (__B, __C, + _mm512_setzero_ph (), + __A, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_roundscale_round_ph (__m512h __A, int __B, const int __C) +{ + return __builtin_ia32_vrndscaleph_v32hf_mask_round (__A, __B, + _mm512_setzero_ph (), + (__mmask32) -1, + __C); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, 
__artificial__)) +_mm512_mask_roundscale_round_ph (__m512h __A, __mmask32 __B, + __m512h __C, int __D, const int __E) +{ + return __builtin_ia32_vrndscaleph_v32hf_mask_round (__C, __D, __A, + __B, __E); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_roundscale_round_ph (__mmask32 __A, __m512h __B, int __C, + const int __D) +{ + return __builtin_ia32_vrndscaleph_v32hf_mask_round (__B, __C, + _mm512_setzero_ph (), + __A, __D); +} + +#else +#define _mm512_roundscale_ph(A, B) \ + (__builtin_ia32_vrndscaleph_v32hf_mask_round ((A), (B), \ + _mm512_setzero_ph (), \ + (__mmask32)-1, \ + _MM_FROUND_CUR_DIRECTION)) + +#define _mm512_mask_roundscale_ph(A, B, C, D) \ + (__builtin_ia32_vrndscaleph_v32hf_mask_round ((C), (D), (A), (B), \ + _MM_FROUND_CUR_DIRECTION)) + +#define _mm512_maskz_roundscale_ph(A, B, C) \ + (__builtin_ia32_vrndscaleph_v32hf_mask_round ((B), (C), \ + _mm512_setzero_ph (), \ + (A), \ + _MM_FROUND_CUR_DIRECTION)) +#define _mm512_roundscale_round_ph(A, B, C) \ + (__builtin_ia32_vrndscaleph_v32hf_mask_round ((A), (B), \ + _mm512_setzero_ph (), \ + (__mmask32)-1, (C))) + +#define _mm512_mask_roundscale_round_ph(A, B, C, D, E) \ + (__builtin_ia32_vrndscaleph_v32hf_mask_round ((C), (D), (A), (B), (E))) + +#define _mm512_maskz_roundscale_round_ph(A, B, C, D) \ + (__builtin_ia32_vrndscaleph_v32hf_mask_round ((B), (C), \ + _mm512_setzero_ph (), \ + (A), (D))) + +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vrndscalesh. 
*/ +#ifdef __OPTIMIZE__ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_roundscale_sh (__m128h __A, __m128h __B, int __C) +{ + return __builtin_ia32_vrndscalesh_v8hf_mask_round (__A, __B, __C, + _mm_setzero_ph (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_roundscale_sh (__m128h __A, __mmask8 __B, __m128h __C, + __m128h __D, int __E) +{ + return __builtin_ia32_vrndscalesh_v8hf_mask_round (__C, __D, __E, __A, __B, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_roundscale_sh (__mmask8 __A, __m128h __B, __m128h __C, int __D) +{ + return __builtin_ia32_vrndscalesh_v8hf_mask_round (__B, __C, __D, + _mm_setzero_ph (), __A, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_roundscale_round_sh (__m128h __A, __m128h __B, int __C, const int __D) +{ + return __builtin_ia32_vrndscalesh_v8hf_mask_round (__A, __B, __C, + _mm_setzero_ph (), + (__mmask8) -1, + __D); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_roundscale_round_sh (__m128h __A, __mmask8 __B, __m128h __C, + __m128h __D, int __E, const int __F) +{ + return __builtin_ia32_vrndscalesh_v8hf_mask_round (__C, __D, __E, + __A, __B, __F); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_roundscale_round_sh (__mmask8 __A, __m128h __B, __m128h __C, + int __D, const int __E) +{ + return __builtin_ia32_vrndscalesh_v8hf_mask_round (__B, __C, __D, + _mm_setzero_ph (), + __A, __E); +} + +#else +#define _mm_roundscale_sh(A, B, C) \ + (__builtin_ia32_vrndscalesh_v8hf_mask_round ((A), (B), (C), \ + _mm_setzero_ph (), \ + (__mmask8)-1, \ + _MM_FROUND_CUR_DIRECTION)) + +#define 
_mm_mask_roundscale_sh(A, B, C, D, E) \ + (__builtin_ia32_vrndscalesh_v8hf_mask_round ((C), (D), (E), (A), (B), \ + _MM_FROUND_CUR_DIRECTION)) + +#define _mm_maskz_roundscale_sh(A, B, C, D) \ + (__builtin_ia32_vrndscalesh_v8hf_mask_round ((B), (C), (D), \ + _mm_setzero_ph (), \ + (A), _MM_FROUND_CUR_DIRECTION)) + +#define _mm_roundscale_round_sh(A, B, C, D) \ + (__builtin_ia32_vrndscalesh_v8hf_mask_round ((A), (B), (C), \ + _mm_setzero_ph (), \ + (__mmask8)-1, (D))) + +#define _mm_mask_roundscale_round_sh(A, B, C, D, E, F) \ + (__builtin_ia32_vrndscalesh_v8hf_mask_round ((C), (D), (E), (A), (B), (F))) + +#define _mm_maskz_roundscale_round_sh(A, B, C, D, E) \ + (__builtin_ia32_vrndscalesh_v8hf_mask_round ((B), (C), (D), \ + _mm_setzero_ph (), \ + (A), (E))) + +#endif /* __OPTIMIZE__ */ + #ifdef __DISABLE_AVX512FP16__ #undef __DISABLE_AVX512FP16__ #pragma GCC pop_options diff --git a/gcc/config/i386/avx512fp16vlintrin.h b/gcc/config/i386/avx512fp16vlintrin.h index ebda59b9f9a..20b6716aa00 100644 --- a/gcc/config/i386/avx512fp16vlintrin.h +++ b/gcc/config/i386/avx512fp16vlintrin.h @@ -548,6 +548,159 @@ _mm256_maskz_scalef_ph (__mmask16 __A, __m256h __B, __m256h __C) __A); } +/* Intrinsics vreduceph. 
*/ +#ifdef __OPTIMIZE__ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_reduce_ph (__m128h __A, int __B) +{ + return __builtin_ia32_vreduceph_v8hf_mask (__A, __B, + _mm_setzero_ph (), + (__mmask8) -1); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_reduce_ph (__m128h __A, __mmask8 __B, __m128h __C, int __D) +{ + return __builtin_ia32_vreduceph_v8hf_mask (__C, __D, __A, __B); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_reduce_ph (__mmask8 __A, __m128h __B, int __C) +{ + return __builtin_ia32_vreduceph_v8hf_mask (__B, __C, + _mm_setzero_ph (), __A); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_reduce_ph (__m256h __A, int __B) +{ + return __builtin_ia32_vreduceph_v16hf_mask (__A, __B, + _mm256_setzero_ph (), + (__mmask16) -1); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_reduce_ph (__m256h __A, __mmask16 __B, __m256h __C, int __D) +{ + return __builtin_ia32_vreduceph_v16hf_mask (__C, __D, __A, __B); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_reduce_ph (__mmask16 __A, __m256h __B, int __C) +{ + return __builtin_ia32_vreduceph_v16hf_mask (__B, __C, + _mm256_setzero_ph (), + __A); +} + +#else +#define _mm_reduce_ph(A, B) \ + (__builtin_ia32_vreduceph_v8hf_mask ((A), (B),\ + _mm_setzero_ph (), \ + ((__mmask8)-1))) + +#define _mm_mask_reduce_ph(A, B, C, D) \ + (__builtin_ia32_vreduceph_v8hf_mask ((C), (D), (A), (B))) + +#define _mm_maskz_reduce_ph(A, B, C) \ + (__builtin_ia32_vreduceph_v8hf_mask ((B), (C), _mm_setzero_ph (), (A))) + +#define _mm256_reduce_ph(A, B) \ + (__builtin_ia32_vreduceph_v16hf_mask ((A), (B),\ + _mm256_setzero_ph (), \ + ((__mmask16)-1))) + +#define _mm256_mask_reduce_ph(A, B, C, 
D) \ + (__builtin_ia32_vreduceph_v16hf_mask ((C), (D), (A), (B))) + +#define _mm256_maskz_reduce_ph(A, B, C) \ + (__builtin_ia32_vreduceph_v16hf_mask ((B), (C), _mm256_setzero_ph (), (A))) + +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vrndscaleph. */ +#ifdef __OPTIMIZE__ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_roundscale_ph (__m128h __A, int __B) +{ + return __builtin_ia32_vrndscaleph_v8hf_mask (__A, __B, + _mm_setzero_ph (), + (__mmask8) -1); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_roundscale_ph (__m128h __A, __mmask8 __B, __m128h __C, int __D) +{ + return __builtin_ia32_vrndscaleph_v8hf_mask (__C, __D, __A, __B); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_roundscale_ph (__mmask8 __A, __m128h __B, int __C) +{ + return __builtin_ia32_vrndscaleph_v8hf_mask (__B, __C, + _mm_setzero_ph (), __A); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_roundscale_ph (__m256h __A, int __B) +{ + return __builtin_ia32_vrndscaleph_v16hf_mask (__A, __B, + _mm256_setzero_ph (), + (__mmask16) -1); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_roundscale_ph (__m256h __A, __mmask16 __B, __m256h __C, + int __D) +{ + return __builtin_ia32_vrndscaleph_v16hf_mask (__C, __D, __A, __B); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_roundscale_ph (__mmask16 __A, __m256h __B, int __C) +{ + return __builtin_ia32_vrndscaleph_v16hf_mask (__B, __C, + _mm256_setzero_ph (), + __A); +} + +#else +#define _mm_roundscale_ph(A, B) \ + (__builtin_ia32_vrndscaleph_v8hf_mask ((A), (B), _mm_setzero_ph (), \ + ((__mmask8)-1))) + +#define _mm_mask_roundscale_ph(A, B, C, D) \ + (__builtin_ia32_vrndscaleph_v8hf_mask ((C), (D), (A), 
(B))) + +#define _mm_maskz_roundscale_ph(A, B, C) \ + (__builtin_ia32_vrndscaleph_v8hf_mask ((B), (C), _mm_setzero_ph (), (A))) + +#define _mm256_roundscale_ph(A, B) \ + (__builtin_ia32_vrndscaleph_v16hf_mask ((A), (B), \ + _mm256_setzero_ph(), \ + ((__mmask16)-1))) + +#define _mm256_mask_roundscale_ph(A, B, C, D) \ + (__builtin_ia32_vrndscaleph_v16hf_mask ((C), (D), (A), (B))) + +#define _mm256_maskz_roundscale_ph(A, B, C) \ + (__builtin_ia32_vrndscaleph_v16hf_mask ((B), (C), \ + _mm256_setzero_ph (), (A))) + +#endif /* __OPTIMIZE__ */ + #ifdef __DISABLE_AVX512FP16VL__ #undef __DISABLE_AVX512FP16VL__ #pragma GCC pop_options diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def index 9ebad6b5f49..d2ba1a5edac 100644 --- a/gcc/config/i386/i386-builtin-types.def +++ b/gcc/config/i386/i386-builtin-types.def @@ -1307,12 +1307,15 @@ DEF_FUNCTION_TYPE (V8HF, V8HI) DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF) DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, UQI) DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, INT) +DEF_FUNCTION_TYPE (V8HF, V8HF, INT, V8HF, UQI) DEF_FUNCTION_TYPE (UQI, V8HF, V8HF, INT, UQI) DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, V8HF, UQI) DEF_FUNCTION_TYPE (UQI, V8HF, V8HF, INT, UQI, INT) DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, V8HF, UQI, INT) +DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, INT, V8HF, UQI, INT) DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF) DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF, UHI) +DEF_FUNCTION_TYPE (V16HF, V16HF, INT, V16HF, UHI) DEF_FUNCTION_TYPE (UHI, V16HF, V16HF, INT, UHI) DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF, V16HF, UHI) DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, USI) @@ -1322,3 +1325,4 @@ DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, USI, INT) DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, V32HF, USI) DEF_FUNCTION_TYPE (USI, V32HF, V32HF, INT, USI, INT) DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, V32HF, USI, INT) +DEF_FUNCTION_TYPE (V32HF, V32HF, INT, V32HF, USI, INT) diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def index 
7b8ca3ba685..6964062c874 100644 --- a/gcc/config/i386/i386-builtin.def +++ b/gcc/config/i386/i386-builtin.def @@ -2814,6 +2814,10 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_rcpv32hf2_mask, "__bu BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmrcpv8hf2_mask, "__builtin_ia32_vrcpsh_v8hf_mask", IX86_BUILTIN_VRCPSH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_scalefv8hf_mask, "__builtin_ia32_vscalefph_v8hf_mask", IX86_BUILTIN_VSCALEFPH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_scalefv16hf_mask, "__builtin_ia32_vscalefph_v16hf_mask", IX86_BUILTIN_VSCALEFPH_V16HF_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_reducepv8hf_mask, "__builtin_ia32_vreduceph_v8hf_mask", IX86_BUILTIN_VREDUCEPH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_INT_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_reducepv16hf_mask, "__builtin_ia32_vreduceph_v16hf_mask", IX86_BUILTIN_VREDUCEPH_V16HF_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_INT_V16HF_UHI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_rndscalev8hf_mask, "__builtin_ia32_vrndscaleph_v8hf_mask", IX86_BUILTIN_VRNDSCALEPH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_INT_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_rndscalev16hf_mask, "__builtin_ia32_vrndscaleph_v16hf_mask", IX86_BUILTIN_VRNDSCALEPH_V16HF_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_INT_V16HF_UHI) /* Builtins with rounding support. 
*/ BDESC_END (ARGS, ROUND_ARGS) @@ -3033,6 +3037,10 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_sqrtv32hf2_mask_round BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsqrtv8hf2_mask_round, "__builtin_ia32_vsqrtsh_v8hf_mask_round", IX86_BUILTIN_VSQRTSH_V8HF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_scalefv32hf_mask_round, "__builtin_ia32_vscalefph_v32hf_mask_round", IX86_BUILTIN_VSCALEFPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmscalefv8hf_mask_round, "__builtin_ia32_vscalefsh_v8hf_mask_round", IX86_BUILTIN_VSCALEFSH_V8HF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_reducepv32hf_mask_round, "__builtin_ia32_vreduceph_v32hf_mask_round", IX86_BUILTIN_VREDUCEPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_INT_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_reducesv8hf_mask_round, "__builtin_ia32_vreducesh_v8hf_mask_round", IX86_BUILTIN_VREDUCESH_V8HF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_INT_V8HF_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_rndscalev32hf_mask_round, "__builtin_ia32_vrndscaleph_v32hf_mask_round", IX86_BUILTIN_VRNDSCALEPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_INT_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_rndscalev8hf_mask_round, "__builtin_ia32_vrndscalesh_v8hf_mask_round", IX86_BUILTIN_VRNDSCALESH_V8HF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_INT_V8HF_UQI_INT) BDESC_END (ROUND_ARGS, MULTI_ARG) diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c index d76e4405413..655234cbdd0 100644 --- a/gcc/config/i386/i386-expand.c +++ b/gcc/config/i386/i386-expand.c @@ -9883,6 +9883,8 @@ ix86_expand_args_builtin (const struct builtin_description *d, case V16SF_FTYPE_V16SF_INT_V16SF_UHI: 
case V16HI_FTYPE_V16SF_INT_V16HI_UHI: case V16SI_FTYPE_V16SI_INT_V16SI_UHI: + case V16HF_FTYPE_V16HF_INT_V16HF_UHI: + case V8HF_FTYPE_V8HF_INT_V8HF_UQI: case V4SI_FTYPE_V16SI_INT_V4SI_UQI: case V4DI_FTYPE_V8DI_INT_V4DI_UQI: case V4DF_FTYPE_V8DF_INT_V4DF_UQI: @@ -10531,6 +10533,7 @@ ix86_expand_round_builtin (const struct builtin_description *d, case V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT: nargs = 5; break; + case V32HF_FTYPE_V32HF_INT_V32HF_USI_INT: case V16SF_FTYPE_V16SF_INT_V16SF_HI_INT: case V8DF_FTYPE_V8DF_INT_V8DF_QI_INT: case V8DF_FTYPE_V8DF_INT_V8DF_UQI_INT: @@ -10553,6 +10556,7 @@ ix86_expand_round_builtin (const struct builtin_description *d, case V2DF_FTYPE_V2DF_V2DF_INT_V2DF_QI_INT: case V2DF_FTYPE_V2DF_V2DF_INT_V2DF_UQI_INT: case V4SF_FTYPE_V4SF_V4SF_INT_V4SF_UQI_INT: + case V8HF_FTYPE_V8HF_V8HF_INT_V8HF_UQI_INT: nargs = 6; nargs_constant = 4; break; diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 683efe4bb0e..f43651a95ce 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -3070,28 +3070,28 @@ (define_expand "reduc_umin_scal_v8hi" }) (define_insn "reducep" - [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v") - (unspec:VF_AVX512VL - [(match_operand:VF_AVX512VL 1 "" "") + [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v") + (unspec:VFH_AVX512VL + [(match_operand:VFH_AVX512VL 1 "" "") (match_operand:SI 2 "const_0_to_255_operand")] UNSPEC_REDUCE))] - "TARGET_AVX512DQ" + "TARGET_AVX512DQ || (VALID_AVX512FP16_REG_MODE (mode))" "vreduce\t{%2, %1, %0|%0, %1, %2}" [(set_attr "type" "sse") (set_attr "prefix" "evex") (set_attr "mode" "")]) (define_insn "reduces" - [(set (match_operand:VF_128 0 "register_operand" "=v") - (vec_merge:VF_128 - (unspec:VF_128 - [(match_operand:VF_128 1 "register_operand" "v") - (match_operand:VF_128 2 "" "") + [(set (match_operand:VFH_128 0 "register_operand" "=v") + (vec_merge:VFH_128 + (unspec:VFH_128 + [(match_operand:VFH_128 1 "register_operand" "v") + (match_operand:VFH_128 2 "" "") 
(match_operand:SI 3 "const_0_to_255_operand")] UNSPEC_REDUCE) (match_dup 1) (const_int 1)))] - "TARGET_AVX512DQ" + "TARGET_AVX512DQ || (VALID_AVX512FP16_REG_MODE (mode))" "vreduce\t{%3, %2, %1, %0|%0, %1, %2, %3}" [(set_attr "type" "sse") (set_attr "prefix" "evex") @@ -10212,9 +10212,9 @@ (define_insn "avx512f_sfixupimm_mask" (set_attr "mode" "")]) (define_insn "_rndscale" - [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v") - (unspec:VF_AVX512VL - [(match_operand:VF_AVX512VL 1 "nonimmediate_operand" "") + [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v") + (unspec:VFH_AVX512VL + [(match_operand:VFH_AVX512VL 1 "nonimmediate_operand" "") (match_operand:SI 2 "const_0_to_255_operand")] UNSPEC_ROUND))] "TARGET_AVX512F" @@ -10224,13 +10224,13 @@ (define_insn "_rndscale" (set_attr "mode" "")]) (define_insn "avx512f_rndscale" - [(set (match_operand:VF_128 0 "register_operand" "=v") - (vec_merge:VF_128 - (unspec:VF_128 - [(match_operand:VF_128 2 "" "") + [(set (match_operand:VFH_128 0 "register_operand" "=v") + (vec_merge:VFH_128 + (unspec:VFH_128 + [(match_operand:VFH_128 2 "" "") (match_operand:SI 3 "const_0_to_255_operand")] UNSPEC_ROUND) - (match_operand:VF_128 1 "register_operand" "v") + (match_operand:VFH_128 1 "register_operand" "v") (const_int 1)))] "TARGET_AVX512F" "vrndscale\t{%3, %2, %1, %0|%0, %1, %2, %3}" @@ -10239,14 +10239,14 @@ (define_insn "avx512f_rndscale")]) (define_insn "*avx512f_rndscale" - [(set (match_operand:VF_128 0 "register_operand" "=v") - (vec_merge:VF_128 - (vec_duplicate:VF_128 + [(set (match_operand:VFH_128 0 "register_operand" "=v") + (vec_merge:VFH_128 + (vec_duplicate:VFH_128 (unspec: [(match_operand: 2 "" "") (match_operand:SI 3 "const_0_to_255_operand")] UNSPEC_ROUND)) - (match_operand:VF_128 1 "register_operand" "v") + (match_operand:VFH_128 1 "register_operand" "v") (const_int 1)))] "TARGET_AVX512F" "vrndscale\t{%3, %2, %1, %0|%0, %1, %2, %3}" diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c 
b/gcc/testsuite/gcc.target/i386/avx-1.c index 17c396567f2..4c8e54e4c2a 100644 --- a/gcc/testsuite/gcc.target/i386/avx-1.c +++ b/gcc/testsuite/gcc.target/i386/avx-1.c @@ -705,6 +705,14 @@ #define __builtin_ia32_vsqrtsh_v8hf_mask_round(D, C, A, B, E) __builtin_ia32_vsqrtsh_v8hf_mask_round(D, C, A, B, 8) #define __builtin_ia32_vscalefph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vscalefph_v32hf_mask_round(A, B, C, D, 8) #define __builtin_ia32_vscalefsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vscalefsh_v8hf_mask_round(A, B, C, D, 8) +#define __builtin_ia32_vreduceph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vreduceph_v32hf_mask_round(A, 123, C, D, 8) +#define __builtin_ia32_vreduceph_v8hf_mask(A, B, C, D) __builtin_ia32_vreduceph_v8hf_mask(A, 123, C, D) +#define __builtin_ia32_vreduceph_v16hf_mask(A, B, C, D) __builtin_ia32_vreduceph_v16hf_mask(A, 123, C, D) +#define __builtin_ia32_vreducesh_v8hf_mask_round(A, B, C, D, E, F) __builtin_ia32_vreducesh_v8hf_mask_round(A, B, 123, D, E, 8) +#define __builtin_ia32_vrndscaleph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vrndscaleph_v32hf_mask_round(A, 123, C, D, 8) +#define __builtin_ia32_vrndscaleph_v8hf_mask(A, B, C, D) __builtin_ia32_vrndscaleph_v8hf_mask(A, 123, C, D) +#define __builtin_ia32_vrndscaleph_v16hf_mask(A, B, C, D) __builtin_ia32_vrndscaleph_v16hf_mask(A, 123, C, D) +#define __builtin_ia32_vrndscalesh_v8hf_mask_round(A, B, C, D, E, F) __builtin_ia32_vrndscalesh_v8hf_mask_round(A, B, 123, D, E, 8) /* avx512fp16vlintrin.h */ #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D) diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c index c1d95fc2ead..044d427c932 100644 --- a/gcc/testsuite/gcc.target/i386/sse-13.c +++ b/gcc/testsuite/gcc.target/i386/sse-13.c @@ -722,6 +722,14 @@ #define __builtin_ia32_vsqrtsh_v8hf_mask_round(D, C, A, B, E) __builtin_ia32_vsqrtsh_v8hf_mask_round(D, C, A, B, 8) #define 
__builtin_ia32_vscalefph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vscalefph_v32hf_mask_round(A, B, C, D, 8) #define __builtin_ia32_vscalefsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vscalefsh_v8hf_mask_round(A, B, C, D, 8) +#define __builtin_ia32_vreduceph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vreduceph_v32hf_mask_round(A, 123, C, D, 8) +#define __builtin_ia32_vreduceph_v8hf_mask(A, B, C, D) __builtin_ia32_vreduceph_v8hf_mask(A, 123, C, D) +#define __builtin_ia32_vreduceph_v16hf_mask(A, B, C, D) __builtin_ia32_vreduceph_v16hf_mask(A, 123, C, D) +#define __builtin_ia32_vreducesh_v8hf_mask_round(A, B, C, D, E, F) __builtin_ia32_vreducesh_v8hf_mask_round(A, B, 123, D, E, 8) +#define __builtin_ia32_vrndscaleph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vrndscaleph_v32hf_mask_round(A, 123, C, D, 8) +#define __builtin_ia32_vrndscaleph_v8hf_mask(A, B, C, D) __builtin_ia32_vrndscaleph_v8hf_mask(A, 123, C, D) +#define __builtin_ia32_vrndscaleph_v16hf_mask(A, B, C, D) __builtin_ia32_vrndscaleph_v16hf_mask(A, 123, C, D) +#define __builtin_ia32_vrndscalesh_v8hf_mask_round(A, B, C, D, E, F) __builtin_ia32_vrndscalesh_v8hf_mask_round(A, B, 123, D, E, 8) /* avx512fp16vlintrin.h */ #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D) diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c index 5b6d0b082d1..b7ffdf7e1df 100644 --- a/gcc/testsuite/gcc.target/i386/sse-14.c +++ b/gcc/testsuite/gcc.target/i386/sse-14.c @@ -671,6 +671,14 @@ test_3 (_mm512_mask_rsqrt28_round_ps, __m512, __m512, __mmask16, __m512, 8) /* avx512fp16intrin.h */ test_1 (_mm512_sqrt_round_ph, __m512h, __m512h, 8) +test_1 (_mm_reduce_ph, __m128h, __m128h, 123) +test_1 (_mm256_reduce_ph, __m256h, __m256h, 123) +test_1 (_mm512_reduce_ph, __m512h, __m512h, 123) +test_1 (_mm_roundscale_ph, __m128h, __m128h, 123) +test_1 (_mm256_roundscale_ph, __m256h, __m256h, 123) +test_1 (_mm512_roundscale_ph, __m512h, __m512h, 
123) +test_1x (_mm512_reduce_round_ph, __m512h, __m512h, 123, 8) +test_1x (_mm512_roundscale_round_ph, __m512h, __m512h, 123, 8) test_2 (_mm512_add_round_ph, __m512h, __m512h, __m512h, 8) test_2 (_mm512_sub_round_ph, __m512h, __m512h, __m512h, 8) test_2 (_mm512_mul_round_ph, __m512h, __m512h, __m512h, 8) @@ -689,9 +697,21 @@ test_2 (_mm512_maskz_sqrt_round_ph, __m512h, __mmask32, __m512h, 8) test_2 (_mm_sqrt_round_sh, __m128h, __m128h, __m128h, 8) test_2 (_mm512_scalef_round_ph, __m512h, __m512h, __m512h, 8) test_2 (_mm_scalef_round_sh, __m128h, __m128h, __m128h, 8) +test_2 (_mm_maskz_reduce_ph, __m128h, __mmask8, __m128h, 123) +test_2 (_mm256_maskz_reduce_ph, __m256h, __mmask16, __m256h, 123) +test_2 (_mm512_maskz_reduce_ph, __m512h, __mmask32, __m512h, 123) +test_2 (_mm_reduce_sh, __m128h, __m128h, __m128h, 123) +test_2 (_mm_maskz_roundscale_ph, __m128h, __mmask8, __m128h, 123) +test_2 (_mm256_maskz_roundscale_ph, __m256h, __mmask16, __m256h, 123) +test_2 (_mm512_maskz_roundscale_ph, __m512h, __mmask32, __m512h, 123) +test_2 (_mm_roundscale_sh, __m128h, __m128h, __m128h, 123) test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8) test_2x (_mm_cmp_round_sh_mask, __mmask8, __m128h, __m128h, 1, 8) test_2x (_mm_comi_round_sh, int, __m128h, __m128h, 1, 8) +test_2x (_mm512_maskz_reduce_round_ph, __m512h, __mmask32, __m512h, 123, 8) +test_2x (_mm512_maskz_roundscale_round_ph, __m512h, __mmask32, __m512h, 123, 8) +test_2x (_mm_reduce_round_sh, __m128h, __m128h, __m128h, 123, 8) +test_2x (_mm_roundscale_round_sh, __m128h, __m128h, __m128h, 123, 8) test_3 (_mm512_maskz_add_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) test_3 (_mm512_maskz_sub_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) test_3 (_mm512_maskz_mul_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) @@ -709,8 +729,20 @@ test_3 (_mm512_mask_sqrt_round_ph, __m512h, __m512h, __mmask32, __m512h, 8) test_3 (_mm_maskz_sqrt_round_sh, __m128h, __mmask8, __m128h, __m128h, 8) test_3 
(_mm512_maskz_scalef_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) test_3 (_mm_maskz_scalef_round_sh, __m128h, __mmask8, __m128h, __m128h, 8) +test_3 (_mm_mask_reduce_ph, __m128h, __m128h, __mmask8, __m128h, 123) +test_3 (_mm256_mask_reduce_ph, __m256h, __m256h, __mmask16, __m256h, 123) +test_3 (_mm512_mask_reduce_ph, __m512h, __m512h, __mmask32, __m512h, 123) +test_3 (_mm_maskz_reduce_sh, __m128h, __mmask8, __m128h, __m128h, 123) +test_3 (_mm_mask_roundscale_ph, __m128h, __m128h, __mmask8, __m128h, 123) +test_3 (_mm256_mask_roundscale_ph, __m256h, __m256h, __mmask16, __m256h, 123) +test_3 (_mm512_mask_roundscale_ph, __m512h, __m512h, __mmask32, __m512h, 123) +test_3 (_mm_maskz_roundscale_sh, __m128h, __mmask8, __m128h, __m128h, 123) test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8) test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8) +test_3x (_mm512_mask_reduce_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8) +test_3x (_mm512_mask_roundscale_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8) +test_3x (_mm_maskz_reduce_round_sh, __m128h, __mmask8, __m128h, __m128h, 123, 8) +test_3x (_mm_maskz_roundscale_round_sh, __m128h, __mmask8, __m128h, __m128h, 123, 8) test_4 (_mm512_mask_add_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8) test_4 (_mm512_mask_sub_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8) test_4 (_mm512_mask_mul_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8) @@ -726,6 +758,10 @@ test_4 (_mm_mask_min_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8) test_4 (_mm_mask_sqrt_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8) test_4 (_mm512_mask_scalef_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8) test_4 (_mm_mask_scalef_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8) +test_4 (_mm_mask_reduce_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123) +test_4 (_mm_mask_roundscale_sh, __m128h, __m128h, 
__mmask8, __m128h, __m128h, 123) +test_4x (_mm_mask_reduce_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8) +test_4x (_mm_mask_roundscale_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8) /* avx512fp16vlintrin.h */ test_2 (_mm_cmp_ph_mask, __mmask8, __m128h, __m128h, 1) diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c index b2de5679bb6..5dbe8cba5ea 100644 --- a/gcc/testsuite/gcc.target/i386/sse-22.c +++ b/gcc/testsuite/gcc.target/i386/sse-22.c @@ -776,6 +776,14 @@ test_2 (_mm_rsqrt28_round_ss, __m128, __m128, __m128, 8) /* avx512fp16intrin.h */ test_1 (_mm512_sqrt_round_ph, __m512h, __m512h, 8) +test_1 (_mm_reduce_ph, __m128h, __m128h, 123) +test_1 (_mm256_reduce_ph, __m256h, __m256h, 123) +test_1 (_mm512_reduce_ph, __m512h, __m512h, 123) +test_1 (_mm_roundscale_ph, __m128h, __m128h, 123) +test_1 (_mm256_roundscale_ph, __m256h, __m256h, 123) +test_1 (_mm512_roundscale_ph, __m512h, __m512h, 123) +test_1x (_mm512_reduce_round_ph, __m512h, __m512h, 123, 8) +test_1x (_mm512_roundscale_round_ph, __m512h, __m512h, 123, 8) test_2 (_mm512_add_round_ph, __m512h, __m512h, __m512h, 8) test_2 (_mm512_sub_round_ph, __m512h, __m512h, __m512h, 8) test_2 (_mm512_mul_round_ph, __m512h, __m512h, __m512h, 8) @@ -793,9 +801,21 @@ test_2 (_mm_comi_sh, int, __m128h, __m128h, 1) test_2 (_mm512_maskz_sqrt_round_ph, __m512h, __mmask32, __m512h, 8) test_2 (_mm_sqrt_round_sh, __m128h, __m128h, __m128h, 8) test_2 (_mm512_scalef_round_ph, __m512h, __m512h, __m512h, 8) +test_2 (_mm_maskz_reduce_ph, __m128h, __mmask8, __m128h, 123) +test_2 (_mm256_maskz_reduce_ph, __m256h, __mmask16, __m256h, 123) +test_2 (_mm512_maskz_reduce_ph, __m512h, __mmask32, __m512h, 123) +test_2 (_mm_reduce_sh, __m128h, __m128h, __m128h, 123) +test_2 (_mm_maskz_roundscale_ph, __m128h, __mmask8, __m128h, 123) +test_2 (_mm256_maskz_roundscale_ph, __m256h, __mmask16, __m256h, 123) +test_2 (_mm512_maskz_roundscale_ph, __m512h, __mmask32, __m512h, 123) 
+test_2 (_mm_roundscale_sh, __m128h, __m128h, __m128h, 123) test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8) test_2x (_mm_cmp_round_sh_mask, __mmask8, __m128h, __m128h, 1, 8) test_2x (_mm_comi_round_sh, int, __m128h, __m128h, 1, 8) +test_2x (_mm512_maskz_reduce_round_ph, __m512h, __mmask32, __m512h, 123, 8) +test_2x (_mm512_maskz_roundscale_round_ph, __m512h, __mmask32, __m512h, 123, 8) +test_2x (_mm_reduce_round_sh, __m128h, __m128h, __m128h, 123, 8) +test_2x (_mm_roundscale_round_sh, __m128h, __m128h, __m128h, 123, 8) test_3 (_mm512_maskz_add_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) test_3 (_mm512_maskz_sub_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) test_3 (_mm512_maskz_mul_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) @@ -812,8 +832,20 @@ test_3 (_mm512_mask_cmp_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1) test_3 (_mm512_mask_sqrt_round_ph, __m512h, __m512h, __mmask32, __m512h, 8) test_3 (_mm_maskz_sqrt_round_sh, __m128h, __mmask8, __m128h, __m128h, 8) test_3 (_mm512_maskz_scalef_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) +test_3 (_mm_mask_reduce_ph, __m128h, __m128h, __mmask8, __m128h, 123) +test_3 (_mm256_mask_reduce_ph, __m256h, __m256h, __mmask16, __m256h, 123) +test_3 (_mm512_mask_reduce_ph, __m512h, __m512h, __mmask32, __m512h, 123) +test_3 (_mm_maskz_reduce_sh, __m128h, __mmask8, __m128h, __m128h, 123) +test_3 (_mm_mask_roundscale_ph, __m128h, __m128h, __mmask8, __m128h, 123) +test_3 (_mm256_mask_roundscale_ph, __m256h, __m256h, __mmask16, __m256h, 123) +test_3 (_mm512_mask_roundscale_ph, __m512h, __m512h, __mmask32, __m512h, 123) +test_3 (_mm_maskz_roundscale_sh, __m128h, __mmask8, __m128h, __m128h, 123) test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8) test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8) +test_3x (_mm512_mask_reduce_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8) +test_3x (_mm512_mask_roundscale_round_ph, 
__m512h, __m512h, __mmask32, __m512h, 123, 8) +test_3x (_mm_maskz_reduce_round_sh, __m128h, __mmask8, __m128h, __m128h, 123, 8) +test_3x (_mm_maskz_roundscale_round_sh, __m128h, __mmask8, __m128h, __m128h, 123, 8) test_4 (_mm512_mask_add_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8) test_4 (_mm512_mask_sub_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8) test_4 (_mm512_mask_mul_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8) @@ -828,6 +860,10 @@ test_4 (_mm_mask_max_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8) test_4 (_mm_mask_min_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8) test_4 (_mm_mask_sqrt_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8) test_4 (_mm512_mask_scalef_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8) +test_4 (_mm_mask_reduce_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123) +test_4 (_mm_mask_roundscale_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123) +test_4x (_mm_mask_reduce_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8) +test_4x (_mm_mask_roundscale_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8) /* avx512fp16vlintrin.h */ test_2 (_mm_cmp_ph_mask, __mmask8, __m128h, __m128h, 1) diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c index 5948622cc4f..2d968f07bc8 100644 --- a/gcc/testsuite/gcc.target/i386/sse-23.c +++ b/gcc/testsuite/gcc.target/i386/sse-23.c @@ -723,6 +723,14 @@ #define __builtin_ia32_vsqrtsh_v8hf_mask_round(D, C, A, B, E) __builtin_ia32_vsqrtsh_v8hf_mask_round(D, C, A, B, 8) #define __builtin_ia32_vscalefph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vscalefph_v32hf_mask_round(A, B, C, D, 8) #define __builtin_ia32_vscalefsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vscalefsh_v8hf_mask_round(A, B, C, D, 8) +#define __builtin_ia32_vreduceph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vreduceph_v32hf_mask_round(A, 123, C, D, 8) +#define 
__builtin_ia32_vreduceph_v8hf_mask(A, B, C, D) __builtin_ia32_vreduceph_v8hf_mask(A, 123, C, D) +#define __builtin_ia32_vreduceph_v16hf_mask(A, B, C, D) __builtin_ia32_vreduceph_v16hf_mask(A, 123, C, D) +#define __builtin_ia32_vreducesh_v8hf_mask_round(A, B, C, D, E, F) __builtin_ia32_vreducesh_v8hf_mask_round(A, B, 123, D, E, 8) +#define __builtin_ia32_vrndscaleph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vrndscaleph_v32hf_mask_round(A, 123, C, D, 8) +#define __builtin_ia32_vrndscaleph_v8hf_mask(A, B, C, D) __builtin_ia32_vrndscaleph_v8hf_mask(A, 123, C, D) +#define __builtin_ia32_vrndscaleph_v16hf_mask(A, B, C, D) __builtin_ia32_vrndscaleph_v16hf_mask(A, 123, C, D) +#define __builtin_ia32_vrndscalesh_v8hf_mask_round(A, B, C, D, E, F) __builtin_ia32_vrndscalesh_v8hf_mask_round(A, B, 123, D, E, 8) /* avx512fp16vlintrin.h */ #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D) From patchwork Thu Jul 1 06:16:07 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: liuhongt X-Patchwork-Id: 1499328 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=h6x+Pkcs; dkim-atps=neutral Received: from sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4GFpXc3M7hz9sW8 for ; 
Thu, 1 Jul 2021 16:41:52 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 03A63383D816 for ; Thu, 1 Jul 2021 06:41:50 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 03A63383D816 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1625121710; bh=FE651ELLgti9LNrF17y4W9QOIka31XCOngfA9cDPsec=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=h6x+PkcsUonL6nd/40wwJQvIre7kePmcOGIqr9gEjGHhkb7P4I8EbwoT4hKh7bjo3 Ora4w5n+q5/eDw5aGTtiQGBLqi6nRH9rVEvv/pxvvzmUCywSqh2DqBEWbeYMySMaym oEe/5+l+8bgM8aM39HEinJg5QIa6j2Uxd4E7XGiQ= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by sourceware.org (Postfix) with ESMTPS id A366C3848404 for ; Thu, 1 Jul 2021 06:17:26 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org A366C3848404 X-IronPort-AV: E=McAfee;i="6200,9189,10031"; a="206656473" X-IronPort-AV: E=Sophos;i="5.83,313,1616482800"; d="scan'208";a="206656473" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Jun 2021 23:17:25 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.83,313,1616482800"; d="scan'208";a="476545855" Received: from scymds01.sc.intel.com ([10.148.94.138]) by fmsmga004.fm.intel.com with ESMTP; 30 Jun 2021 23:17:25 -0700 Received: from shliclel320.sh.intel.com (shliclel320.sh.intel.com [10.239.236.50]) by scymds01.sc.intel.com with ESMTP id 1616Gmf0031625; Wed, 30 Jun 2021 23:17:23 -0700 To: gcc-patches@gcc.gnu.org Subject: [PATCH 21/62] AVX512FP16: Add testcase for vreduceph/vreducesh/vrndscaleph/vrndscalesh. 
Date: Thu, 1 Jul 2021 14:16:07 +0800 Message-Id: <20210701061648.9447-22-hongtao.liu@intel.com> X-Mailer: git-send-email 2.18.1 In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com> References: <20210701061648.9447-1-hongtao.liu@intel.com> X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_NONE, KAM_DMARC_STATUS, KAM_LAZY_DOMAIN_SECURITY, KAM_SHORT, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: liuhongt via Gcc-patches From: liuhongt Reply-To: liuhongt Cc: jakub@redhat.com Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" gcc/testsuite/ChangeLog: * gcc.target/i386/avx512fp16-helper.h (_ROUND_CUR): New macro. * gcc.target/i386/avx512fp16-vreduceph-1a.c: New test. * gcc.target/i386/avx512fp16-vreduceph-1b.c: Ditto. * gcc.target/i386/avx512fp16-vreducesh-1a.c: Ditto. * gcc.target/i386/avx512fp16-vreducesh-1b.c: Ditto. * gcc.target/i386/avx512fp16-vrndscaleph-1a.c: Ditto. * gcc.target/i386/avx512fp16-vrndscaleph-1b.c: Ditto. * gcc.target/i386/avx512fp16-vrndscalesh-1a.c: Ditto. * gcc.target/i386/avx512fp16-vrndscalesh-1b.c: Ditto. * gcc.target/i386/avx512fp16vl-vreduceph-1a.c: Ditto. * gcc.target/i386/avx512fp16vl-vreduceph-1b.c: Ditto. * gcc.target/i386/avx512fp16vl-vrndscaleph-1a.c: Ditto. * gcc.target/i386/avx512fp16vl-vrndscaleph-1b.c: Ditto. 
--- .../gcc.target/i386/avx512fp16-helper.h | 1 + .../gcc.target/i386/avx512fp16-vreduceph-1a.c | 26 ++++ .../gcc.target/i386/avx512fp16-vreduceph-1b.c | 116 ++++++++++++++++++ .../gcc.target/i386/avx512fp16-vreducesh-1a.c | 26 ++++ .../gcc.target/i386/avx512fp16-vreducesh-1b.c | 78 ++++++++++++ .../i386/avx512fp16-vrndscaleph-1a.c | 26 ++++ .../i386/avx512fp16-vrndscaleph-1b.c | 101 +++++++++++++++ .../i386/avx512fp16-vrndscalesh-1a.c | 25 ++++ .../i386/avx512fp16-vrndscalesh-1b.c | 62 ++++++++++ .../i386/avx512fp16vl-vreduceph-1a.c | 30 +++++ .../i386/avx512fp16vl-vreduceph-1b.c | 16 +++ .../i386/avx512fp16vl-vrndscaleph-1a.c | 30 +++++ .../i386/avx512fp16vl-vrndscaleph-1b.c | 16 +++ 13 files changed, 553 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vreduceph-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vreduceph-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vreducesh-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vreducesh-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vrndscaleph-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vrndscaleph-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vrndscalesh-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vrndscalesh-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vreduceph-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vreduceph-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vrndscaleph-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vrndscaleph-1b.c diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h b/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h index 5d3539bf312..ec88888532c 100644 --- a/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h @@ -17,6 +17,7 @@ /* Useful macros. 
*/ #define NOINLINE __attribute__((noinline,noclone)) #define _ROUND_NINT (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) +#define _ROUND_CUR 8 #define AVX512F_MAX_ELEM 512 / 32 /* Structure for _Float16 emulation */ diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vreduceph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vreduceph-1a.c new file mode 100644 index 00000000000..536c1ef6b02 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vreduceph-1a.c @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vreduceph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vreduceph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vreduceph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vreduceph\[ \\t\]+\[^\{\n\]*\{sae\}\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vreduceph\[ \\t\]+\[^\{\n\]*\{sae\}\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vreduceph\[ \\t\]+\[^\{\n\]*\{sae\}\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +#define IMM 123 + +volatile __m512h x1; +volatile __mmask32 m; + +void extern +avx512fp16_test (void) +{ + x1 = _mm512_reduce_ph (x1, IMM); + x1 = _mm512_mask_reduce_ph (x1, m, x1, IMM); + x1 = _mm512_maskz_reduce_ph (m, x1, IMM); + x1 = _mm512_reduce_round_ph (x1, IMM, 8); + x1 = _mm512_mask_reduce_round_ph (x1, m, x1, IMM, 8); + x1 = _mm512_maskz_reduce_round_ph (m, x1, IMM, 8); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vreduceph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vreduceph-1b.c new file mode 100644 index 00000000000..20d1ba59fda --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vreduceph-1b.c @@ -0,0 +1,116 @@ +/* { dg-do 
run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS (AVX512F_LEN / 16) + +#ifndef __REDUCEPH__ +#define __REDUCEPH__ +V512 borrow_reduce_ps(V512 v, int imm8) +{ + V512 temp; + switch (imm8) + { + case 1: temp.zmm = _mm512_mask_reduce_ps (v.zmm, 0xffff, v.zmm, 1);break; + case 2: temp.zmm = _mm512_mask_reduce_ps (v.zmm, 0xffff, v.zmm, 2);break; + case 3: temp.zmm = _mm512_mask_reduce_ps (v.zmm, 0xffff, v.zmm, 3);break; + case 4: temp.zmm = _mm512_mask_reduce_ps (v.zmm, 0xffff, v.zmm, 4);break; + case 5: temp.zmm = _mm512_mask_reduce_ps (v.zmm, 0xffff, v.zmm, 5);break; + case 6: temp.zmm = _mm512_mask_reduce_ps (v.zmm, 0xffff, v.zmm, 6);break; + case 7: temp.zmm = _mm512_mask_reduce_ps (v.zmm, 0xffff, v.zmm, 7);break; + case 8: temp.zmm = _mm512_mask_reduce_ps (v.zmm, 0xffff, v.zmm, 8);break; + } + return temp; +} +#endif + +void NOINLINE +EMULATE(reduce_ph) (V512 * dest, V512 op1, + __mmask32 k, int imm8, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + V512 t1,t2; + int i; + __mmask16 m1, m2; + + m1 = k & 0xffff; + m2 = (k >> 16) & 0xffff; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(*dest, &v7, &v8); + t1 = borrow_reduce_ps(v1, imm8); + t2 = borrow_reduce_ps(v2, imm8); + + for (i = 0; i < 16; i++) { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + v5.f32[i] = 0; + } + else { + v5.u32[i] = v7.u32[i]; + } + } + else { + v5.f32[i] = t1.f32[i]; + } + + if (((1 << i) & m2) == 0) { + if (zero_mask) { + v6.f32[i] = 0; + } + else { + v6.u32[i] = v8.u32[i]; + } + } + else { + v6.f32[i] = t2.f32[i]; + } + + } + *dest = pack_twops_2ph(v5, v6); +} + +void +TEST (void) +{ + V512 res; + V512 exp; + + init_src(); + + EMULATE(reduce_ph) (&exp, src1, NET_MASK, 6, 0); + HF(res) = INTRINSIC (_reduce_ph) (HF(src1), 6); + CHECK_RESULT (&res, &exp, N_ELEMS, _reduce_ph); + + init_dest(&res, &exp); + EMULATE(reduce_ph) (&exp, src1, MASK_VALUE, 5, 0); + HF(res) 
= INTRINSIC (_mask_reduce_ph) (HF(res), MASK_VALUE, HF(src1), 5); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_reduce_ph); + + EMULATE(reduce_ph) (&exp, src1, ZMASK_VALUE, 4, 1); + HF(res) = INTRINSIC (_maskz_reduce_ph) (ZMASK_VALUE, HF(src1), 4); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_reduce_ph); + +#if AVX512F_LEN == 512 + EMULATE(reduce_ph) (&exp, src1, NET_MASK, 6, 0); + HF(res) = INTRINSIC (_reduce_round_ph) (HF(src1), 6, _ROUND_CUR); + CHECK_RESULT (&res, &exp, N_ELEMS, _reduce_round_ph); + + init_dest(&res, &exp); + EMULATE(reduce_ph) (&exp, src1, MASK_VALUE, 5, 0); + HF(res) = INTRINSIC (_mask_reduce_round_ph) (HF(res), MASK_VALUE, HF(src1), 5, _ROUND_CUR); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_reduce_round_ph); + + EMULATE(reduce_ph) (&exp, src1, ZMASK_VALUE, 4, 1); + HF(res) = INTRINSIC (_maskz_reduce_round_ph) (ZMASK_VALUE, HF(src1), 4, _ROUND_CUR); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_reduce_round_ph); +#endif + + if (n_errs != 0) { + abort (); + } +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vreducesh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vreducesh-1a.c new file mode 100644 index 00000000000..80369918567 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vreducesh-1a.c @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vreducesh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vreducesh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vreducesh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vreducesh\[ \\t\]+\[^\n\]*\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { 
scan-assembler-times "vreducesh\[ \\t\]+\[^\n\]*\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ + + +#include <immintrin.h> + +#define IMM 123 + +volatile __m128h x1, x2; +volatile __mmask8 m; + +void extern +avx512fp16_test (void) +{ + x1 = _mm_reduce_sh (x1, x2, IMM); + x1 = _mm_mask_reduce_sh(x1, m, x1, x2, IMM); + x1 = _mm_maskz_reduce_sh(m, x1, x2, IMM); + x1 = _mm_reduce_round_sh (x1, x2, IMM, 4); + x1 = _mm_mask_reduce_round_sh(x1, m, x1, x2, IMM, 8); + x1 = _mm_maskz_reduce_round_sh(m, x1, x2, IMM, 8); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vreducesh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vreducesh-1b.c new file mode 100644 index 00000000000..4c5dfe73c3a --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vreducesh-1b.c @@ -0,0 +1,78 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS 8 + +V512 borrow_reduce_ps(V512 v, int imm8) +{ + V512 temp; + switch (imm8) + { + case 1: temp.zmm = _mm512_mask_reduce_ps (v.zmm, 0xffff, v.zmm, 1);break; + case 2: temp.zmm = _mm512_mask_reduce_ps (v.zmm, 0xffff, v.zmm, 2);break; + case 3: temp.zmm = _mm512_mask_reduce_ps (v.zmm, 0xffff, v.zmm, 3);break; + case 4: temp.zmm = _mm512_mask_reduce_ps (v.zmm, 0xffff, v.zmm, 4);break; + case 5: temp.zmm = _mm512_mask_reduce_ps (v.zmm, 0xffff, v.zmm, 5);break; + case 6: temp.zmm = _mm512_mask_reduce_ps (v.zmm, 0xffff, v.zmm, 6);break; + case 7: temp.zmm = _mm512_mask_reduce_ps (v.zmm, 0xffff, v.zmm, 7);break; + case 8: temp.zmm = _mm512_mask_reduce_ps (v.zmm, 0xffff, v.zmm, 8);break; + } + return temp; +} + +void NOINLINE +emulate_reduce_sh(V512 * dest, V512 op1, + __mmask32 k, int imm8, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + V512 t1; + int i; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(*dest, &v7, &v8); + t1 = borrow_reduce_ps(v1, imm8); + +
if ((k&1) || !k) + v5.f32[0] = t1.f32[0]; + else if (zero_mask) + v5.f32[0] = 0; + else + v5.f32[0] = v7.f32[0]; + + for (i = 1; i < 8; i++) + v5.f32[i] = v1.f32[i]; + + *dest = pack_twops_2ph(v5, v6); +} + +void +test_512 (void) +{ + V512 res; + V512 exp; + + init_src(); + + emulate_reduce_sh(&exp, src1, 0x1, 8, 0); + res.xmmh[0] = _mm_reduce_round_sh(src1.xmmh[0], exp.xmmh[0], 8, _ROUND_CUR); + check_results(&res, &exp, N_ELEMS, "_mm_reduce_round_sh"); + + init_dest(&res, &exp); + emulate_reduce_sh(&exp, src1, 0x1, 7, 0); + res.xmmh[0] = _mm_mask_reduce_round_sh(res.xmmh[0], 0x1, src1.xmmh[0], exp.xmmh[0], 7, _ROUND_CUR); + check_results(&res, &exp, N_ELEMS, "_mm_mask_reduce_round_sh"); + + emulate_reduce_sh(&exp, src1, 0x3, 6, 1); + res.xmmh[0] = _mm_maskz_reduce_round_sh(0x3, src1.xmmh[0], exp.xmmh[0], 6, _ROUND_CUR); + check_results(&res, &exp, N_ELEMS, "_mm_maskz_reduce_round_sh"); + + + if (n_errs != 0) { + abort (); + } +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vrndscaleph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vrndscaleph-1a.c new file mode 100644 index 00000000000..8a307274a9f --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vrndscaleph-1a.c @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vrndscaleph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrndscaleph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrndscaleph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrndscaleph\[ \\t\]+\[^\n\]*\{sae\}\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrndscaleph\[ \\t\]+\[^\n\]*\{sae\}\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrndscaleph\[ 
\\t\]+\[^\n\]*\{sae\}\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +#define IMM 123 + +volatile __m512h x1; +volatile __mmask32 m; + +void extern +avx512fp16_test (void) +{ + x1 = _mm512_roundscale_ph (x1, IMM); + x1 = _mm512_mask_roundscale_ph (x1, m, x1, IMM); + x1 = _mm512_maskz_roundscale_ph (m, x1, IMM); + x1 = _mm512_roundscale_round_ph (x1, IMM, 8); + x1 = _mm512_mask_roundscale_round_ph (x1, m, x1, IMM, 8); + x1 = _mm512_maskz_roundscale_round_ph (m, x1, IMM, 8); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vrndscaleph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vrndscaleph-1b.c new file mode 100644 index 00000000000..d50e75585f1 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vrndscaleph-1b.c @@ -0,0 +1,101 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS (AVX512F_LEN / 16) + +void NOINLINE +EMULATE(roundscale_ph) (V512 * dest, V512 op1, + __mmask32 k, int zero_mask, int round) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + __mmask16 m1, m2; + V512 t1, t2; + m1 = k & 0xffff; + m2 = (k >> 16) & 0xffff; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(*dest, &v7, &v8); + if (round==0) + { + t1.zmm = _mm512_maskz_roundscale_ps (0xffff, v1.zmm, 0x11); + t2.zmm = _mm512_maskz_roundscale_ps (0xffff, v2.zmm, 0x11); + } + else + { + t1.zmm = _mm512_maskz_roundscale_ps (0xffff, v1.zmm, 0x14); + t2.zmm = _mm512_maskz_roundscale_ps (0xffff, v2.zmm, 0x14); + } + for (i = 0; i < 16; i++) + { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + v5.f32[i] = 0; + } + else { + v5.u32[i] = v7.u32[i]; + } + } + else { + v5.f32[i] = t1.f32[i]; + } + + if (((1 << i) & m2) == 0) { + if (zero_mask) { + v6.f32[i] = 0; + } + else { + v6.u32[i] = v8.u32[i]; + } + } + else { + v6.f32[i] = t2.f32[i]; + } + } + *dest = pack_twops_2ph(v5, v6); +} + +void +TEST (void) +{ + V512 res, exp; + 
+ init_src(); + + EMULATE(roundscale_ph) (&exp, src1, NET_MASK, 0, 1); + HF(res) = INTRINSIC (_roundscale_ph) (HF(src1), 0x13); + CHECK_RESULT (&res, &exp, N_ELEMS, _roundscale_ph); + + init_dest(&res, &exp); + EMULATE(roundscale_ph) (&exp, src1, MASK_VALUE, 0, 1); + HF(res) = INTRINSIC (_mask_roundscale_ph) (HF(res), MASK_VALUE, HF(src1), 0x14); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_roundscale_ph); + + EMULATE(roundscale_ph) (&exp, src1, ZMASK_VALUE, 1, 1); + HF(res) = INTRINSIC (_maskz_roundscale_ph) (ZMASK_VALUE, HF(src1), 0x14); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_roundscale_ph); + +#if AVX512F_LEN == 512 + EMULATE(roundscale_ph) (&exp, src1, NET_MASK, 0, 1); + HF(res) = INTRINSIC (_roundscale_round_ph) (HF(src1), 0x13, 0x08); + CHECK_RESULT (&res, &exp, N_ELEMS, _roundscale_round_ph); + + init_dest(&res, &exp); + EMULATE(roundscale_ph) (&exp, src1, MASK_VALUE, 0, 1); + HF(res) = INTRINSIC (_mask_roundscale_round_ph) (HF(res), MASK_VALUE, HF(src1), 0x14, 0x08); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_roundscale_round_ph); + + EMULATE(roundscale_ph) (&exp, src1, ZMASK_VALUE, 1, 1); + HF(res) = INTRINSIC (_maskz_roundscale_round_ph) (ZMASK_VALUE, HF(src1), 0x14, 0x08); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_roundscale_round_ph); +#endif + + if (n_errs != 0) { + abort (); + } +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vrndscalesh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vrndscalesh-1a.c new file mode 100644 index 00000000000..bd41b634aff --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vrndscalesh-1a.c @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vrndscalesh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vrndscalesh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { 
scan-assembler-times "vrndscalesh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrndscalesh\[ \\t\]+\[^\n\]*\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrndscalesh\[ \\t\]+\[^\n\]*\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +#define IMM 123 + +volatile __m128h x1, x2; +volatile __mmask8 m; + +void extern +avx512fp16_test (void) +{ + x1 = _mm_roundscale_sh (x1, x2, IMM); + x1 = _mm_mask_roundscale_sh(x1, m, x1, x2, IMM); + x1 = _mm_maskz_roundscale_sh(m, x1, x2, IMM); + x1 = _mm_roundscale_round_sh (x1, x2, IMM, 4); + x1 = _mm_mask_roundscale_round_sh(x1, m, x1, x2, IMM, 8); + x1 = _mm_maskz_roundscale_round_sh(m, x1, x2, IMM, 8); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vrndscalesh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vrndscalesh-1b.c new file mode 100644 index 00000000000..c1033892878 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vrndscalesh-1b.c @@ -0,0 +1,62 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS 8 + +void NOINLINE +emulate_roundscale_sh(V512 * dest, V512 op1, + __mmask8 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + V512 t1,t2; + int i; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(*dest, &v7, &v8); + t1.zmm = _mm512_maskz_roundscale_ps (0xffff, v1.zmm, 0x14); + t2.zmm = _mm512_maskz_roundscale_ps (0xffff, v2.zmm, 0x14); + + if ((k&1) || !k) + v5.f32[0] = t1.f32[0]; + else if (zero_mask) + v5.f32[0] = 0; + else + v5.f32[0] = v7.f32[0]; + + for (i = 1; i < 8; i++) + v5.f32[i] = v1.f32[i]; + + *dest = pack_twops_2ph(v5, v6); +} + +void +test_512 (void) +{ + V512 res; + 
V512 exp; + + init_src(); + + emulate_roundscale_sh(&exp, src1, 0x1, 0); + res.xmmh[0] = _mm_roundscale_round_sh(src1.xmmh[0], src1.xmmh[0], 0x1, 0x08); + check_results(&res, &exp, N_ELEMS, "_mm_roundscale_round_sh"); + + init_dest(&res, &exp); + emulate_roundscale_sh(&exp, src1, 0x1, 0); + res.xmmh[0] = _mm_mask_roundscale_round_sh(res.xmmh[0], + 0x1, src1.xmmh[0], src1.xmmh[0], 0x1, 0x08); + check_results(&res, &exp, N_ELEMS, "_mm_mask_roundscale_round_sh"); + + emulate_roundscale_sh(&exp, src1, 0x3, 1); + res.xmmh[0] = _mm_maskz_roundscale_round_sh(0x3, src1.xmmh[0], src1.xmmh[0], 0x1, 0x08); + check_results(&res, &exp, N_ELEMS, "_mm_maskz_roundscale_round_sh"); + + + if (n_errs != 0) + abort (); +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vreduceph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vreduceph-1a.c new file mode 100644 index 00000000000..4f43abd5411 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vreduceph-1a.c @@ -0,0 +1,30 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */ +/* { dg-final { scan-assembler-times "vreduceph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vreduceph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vreduceph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vreduceph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vreduceph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vreduceph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +#define IMM 123 + +volatile __m256h x2; +volatile __m128h x3; +volatile __mmask8 m8; +volatile __mmask16 m16; + +void extern +avx512fp16_test (void) +{ + x2 = _mm256_reduce_ph (x2, IMM); + x3 = _mm_reduce_ph 
(x3, IMM); + + x2 = _mm256_mask_reduce_ph (x2, m16, x2, IMM); + x3 = _mm_mask_reduce_ph (x3, m8, x3, IMM); + + x2 = _mm256_maskz_reduce_ph (m8, x2, IMM); + x3 = _mm_maskz_reduce_ph (m16, x3, IMM); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vreduceph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vreduceph-1b.c new file mode 100644 index 00000000000..38515976ce6 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vreduceph-1b.c @@ -0,0 +1,16 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */ + +#define DEBUG +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vreduceph-1b.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vreduceph-1b.c" + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vrndscaleph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vrndscaleph-1a.c new file mode 100644 index 00000000000..9fcf7e9b7bc --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vrndscaleph-1a.c @@ -0,0 +1,30 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */ +/* { dg-final { scan-assembler-times "vrndscaleph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrndscaleph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrndscaleph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrndscaleph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrndscaleph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrndscaleph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +#define IMM 123 + +volatile __m256h x2; +volatile 
__m128h x3; +volatile __mmask8 m8; +volatile __mmask16 m16; + +void extern +avx512fp16_test (void) +{ + x2 = _mm256_roundscale_ph (x2, IMM); + x3 = _mm_roundscale_ph (x3, IMM); + + x2 = _mm256_mask_roundscale_ph (x2, m16, x2, IMM); + x3 = _mm_mask_roundscale_ph (x3, m8, x3, IMM); + + x2 = _mm256_maskz_roundscale_ph (m8, x2, IMM); + x3 = _mm_maskz_roundscale_ph (m16, x3, IMM); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vrndscaleph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vrndscaleph-1b.c new file mode 100644 index 00000000000..04b00e2db2d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vrndscaleph-1b.c @@ -0,0 +1,16 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */ + +#define DEBUG +#define AVX512VL
#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vrndscaleph-1b.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vrndscaleph-1b.c" + From patchwork Thu Jul 1 06:16:08 2021 X-Patchwork-Submitter: liuhongt X-Patchwork-Id: 1499330 To: gcc-patches@gcc.gnu.org Subject: [PATCH 22/62] AVX512FP16: Add fpclass/getexp/getmant instructions. Date: Thu, 1 Jul 2021 14:16:08 +0800 Message-Id: <20210701061648.9447-23-hongtao.liu@intel.com> In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com> References: <20210701061648.9447-1-hongtao.liu@intel.com> From: liuhongt Cc: jakub@redhat.com Add vfpclassph/vfpclasssh/vgetexpph/vgetexpsh/vgetmantph/vgetmantsh. gcc/ChangeLog: * config/i386/avx512fp16intrin.h (_mm_fpclass_sh_mask): New intrinsic. (_mm_mask_fpclass_sh_mask): Likewise. (_mm512_mask_fpclass_ph_mask): Likewise. (_mm512_fpclass_ph_mask): Likewise. (_mm_getexp_sh): Likewise. (_mm_mask_getexp_sh): Likewise. (_mm_maskz_getexp_sh): Likewise. (_mm512_getexp_ph): Likewise. (_mm512_mask_getexp_ph): Likewise. (_mm512_maskz_getexp_ph): Likewise. (_mm_getexp_round_sh): Likewise. (_mm_mask_getexp_round_sh): Likewise. (_mm_maskz_getexp_round_sh): Likewise. (_mm512_getexp_round_ph): Likewise. (_mm512_mask_getexp_round_ph): Likewise. (_mm512_maskz_getexp_round_ph): Likewise. (_mm_getmant_sh): Likewise. (_mm_mask_getmant_sh): Likewise. (_mm_maskz_getmant_sh): Likewise. (_mm512_getmant_ph): Likewise. (_mm512_mask_getmant_ph): Likewise. (_mm512_maskz_getmant_ph): Likewise. (_mm_getmant_round_sh): Likewise. (_mm_mask_getmant_round_sh): Likewise. (_mm_maskz_getmant_round_sh): Likewise. (_mm512_getmant_round_ph): Likewise. 
(_mm512_mask_getmant_round_ph): Likewise. (_mm512_maskz_getmant_round_ph): Likewise. * config/i386/avx512fp16vlintrin.h (_mm_mask_fpclass_ph_mask): New intrinsic. (_mm_fpclass_ph_mask): Likewise. (_mm256_mask_fpclass_ph_mask): Likewise. (_mm256_fpclass_ph_mask): Likewise. (_mm256_getexp_ph): Likewise. (_mm256_mask_getexp_ph): Likewise. (_mm256_maskz_getexp_ph): Likewise. (_mm_getexp_ph): Likewise. (_mm_mask_getexp_ph): Likewise. (_mm_maskz_getexp_ph): Likewise. (_mm256_getmant_ph): Likewise. (_mm256_mask_getmant_ph): Likewise. (_mm256_maskz_getmant_ph): Likewise. (_mm_getmant_ph): Likewise. (_mm_mask_getmant_ph): Likewise. (_mm_maskz_getmant_ph): Likewise. * config/i386/i386-builtin-types.def: Add corresponding builtin types. * config/i386/i386-builtin.def: Add corresponding new builtins. * config/i386/i386-expand.c (ix86_expand_args_builtin): Handle new builtin types. (ix86_expand_round_builtin): Ditto. * config/i386/sse.md (vecmemsuffix): Add HF vector modes. (_getexp): Adjust to support HF vector modes. (avx512f_sgetexp): Ditto. (avx512dq_vmfpclass): Ditto. (_getmant): Ditto. (avx512f_vgetmant): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add test for new builtins. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/sse-14.c: Add test for new intrinsics. * gcc.target/i386/sse-22.c: Ditto. 
--- gcc/config/i386/avx512fp16intrin.h | 471 +++++++++++++++++++++++++ gcc/config/i386/avx512fp16vlintrin.h | 229 ++++++++++++ gcc/config/i386/i386-builtin-types.def | 3 + gcc/config/i386/i386-builtin.def | 12 + gcc/config/i386/i386-expand.c | 7 + gcc/config/i386/sse.md | 41 +-- gcc/testsuite/gcc.target/i386/avx-1.c | 10 + gcc/testsuite/gcc.target/i386/sse-13.c | 10 + gcc/testsuite/gcc.target/i386/sse-14.c | 18 + gcc/testsuite/gcc.target/i386/sse-22.c | 18 + gcc/testsuite/gcc.target/i386/sse-23.c | 10 + 11 files changed, 809 insertions(+), 20 deletions(-) diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h index 8c2c9b28987..2fbfc140c44 100644 --- a/gcc/config/i386/avx512fp16intrin.h +++ b/gcc/config/i386/avx512fp16intrin.h @@ -1982,6 +1982,477 @@ _mm_maskz_roundscale_round_sh (__mmask8 __A, __m128h __B, __m128h __C, #endif /* __OPTIMIZE__ */ +/* Intrinsics vfpclasssh. */ +#ifdef __OPTIMIZE__ +extern __inline __mmask8 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fpclass_sh_mask (__m128h __A, const int __imm) +{ + return (__mmask8) __builtin_ia32_fpclasssh_mask ((__v8hf) __A, __imm, + (__mmask8) -1); +} + +extern __inline __mmask8 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_fpclass_sh_mask (__mmask8 __U, __m128h __A, const int __imm) +{ + return (__mmask8) __builtin_ia32_fpclasssh_mask ((__v8hf) __A, __imm, __U); +} + +#else +#define _mm_fpclass_sh_mask(X, C) \ + ((__mmask8) __builtin_ia32_fpclasssh_mask ((__v8hf) (__m128h) (X), \ + (int) (C), (__mmask8) (-1))) \ + +#define _mm_mask_fpclass_sh_mask(U, X, C) \ + ((__mmask8) __builtin_ia32_fpclasssh_mask ((__v8hf) (__m128h) (X), \ + (int) (C), (__mmask8) (U))) +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vfpclassph. 
*/ +#ifdef __OPTIMIZE__ +extern __inline __mmask32 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fpclass_ph_mask (__mmask32 __U, __m512h __A, + const int __imm) +{ + return (__mmask32) __builtin_ia32_fpclassph512_mask ((__v32hf) __A, + __imm, __U); +} + +extern __inline __mmask32 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fpclass_ph_mask (__m512h __A, const int __imm) +{ + return (__mmask32) __builtin_ia32_fpclassph512_mask ((__v32hf) __A, + __imm, + (__mmask32) -1); +} + +#else +#define _mm512_mask_fpclass_ph_mask(u, x, c) \ + ((__mmask32) __builtin_ia32_fpclassph512_mask ((__v32hf) (__m512h) (x),\ + (int) (c),(__mmask8)(u))) + +#define _mm512_fpclass_ph_mask(x, c) \ + ((__mmask32) __builtin_ia32_fpclassph512_mask ((__v32hf) (__m512h) (x),\ + (int) (c),(__mmask8)-1)) +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vgetexpph, vgetexpsh. */ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_getexp_sh (__m128h __A, __m128h __B) +{ + return (__m128h) + __builtin_ia32_getexpsh_mask_round ((__v8hf) __A, (__v8hf) __B, + (__v8hf) _mm_setzero_ph (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_getexp_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B) +{ + return (__m128h) + __builtin_ia32_getexpsh_mask_round ((__v8hf) __A, (__v8hf) __B, + (__v8hf) __W, (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_getexp_sh (__mmask8 __U, __m128h __A, __m128h __B) +{ + return (__m128h) + __builtin_ia32_getexpsh_mask_round ((__v8hf) __A, (__v8hf) __B, + (__v8hf) _mm_setzero_ph (), + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_getexp_ph (__m512h __A) +{
+ return (__m512h) + __builtin_ia32_getexpph512_mask ((__v32hf) __A, + (__v32hf) _mm512_setzero_ph (), + (__mmask32) -1, _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_getexp_ph (__m512h __W, __mmask32 __U, __m512h __A) +{ + return (__m512h) + __builtin_ia32_getexpph512_mask ((__v32hf) __A, (__v32hf) __W, + (__mmask32) __U, _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_getexp_ph (__mmask32 __U, __m512h __A) +{ + return (__m512h) + __builtin_ia32_getexpph512_mask ((__v32hf) __A, + (__v32hf) _mm512_setzero_ph (), + (__mmask32) __U, _MM_FROUND_CUR_DIRECTION); +} + +#ifdef __OPTIMIZE__ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_getexp_round_sh (__m128h __A, __m128h __B, const int __R) +{ + return (__m128h) __builtin_ia32_getexpsh_mask_round ((__v8hf) __A, + (__v8hf) __B, + _mm_setzero_ph (), + (__mmask8) -1, + __R); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_getexp_round_sh (__m128h __W, __mmask8 __U, __m128h __A, + __m128h __B, const int __R) +{ + return (__m128h) __builtin_ia32_getexpsh_mask_round ((__v8hf) __A, + (__v8hf) __B, + (__v8hf) __W, + (__mmask8) __U, __R); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_getexp_round_sh (__mmask8 __U, __m128h __A, __m128h __B, + const int __R) +{ + return (__m128h) __builtin_ia32_getexpsh_mask_round ((__v8hf) __A, + (__v8hf) __B, + (__v8hf) + _mm_setzero_ph (), + (__mmask8) __U, __R); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_getexp_round_ph (__m512h __A, const int __R) +{ + return (__m512h) __builtin_ia32_getexpph512_mask ((__v32hf) __A, + (__v32hf) + _mm512_setzero_ph (), + (__mmask32) -1, __R); 
+} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_getexp_round_ph (__m512h __W, __mmask32 __U, __m512h __A, + const int __R) +{ + return (__m512h) __builtin_ia32_getexpph512_mask ((__v32hf) __A, + (__v32hf) __W, + (__mmask32) __U, __R); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_getexp_round_ph (__mmask32 __U, __m512h __A, const int __R) +{ + return (__m512h) __builtin_ia32_getexpph512_mask ((__v32hf) __A, + (__v32hf) + _mm512_setzero_ph (), + (__mmask32) __U, __R); +} + +#else +#define _mm_getexp_round_sh(A, B, R) \ + ((__m128h)__builtin_ia32_getexpsh_mask_round((__v8hf)(__m128h)(A), \ + (__v8hf)(__m128h)(B), \ + (__v8hf)_mm_setzero_ph(), \ + (__mmask8)-1, R)) + +#define _mm_mask_getexp_round_sh(W, U, A, B, C) \ + (__m128h)__builtin_ia32_getexpsh_mask_round(A, B, W, U, C) + +#define _mm_maskz_getexp_round_sh(U, A, B, C) \ + (__m128h)__builtin_ia32_getexpsh_mask_round(A, B, \ + (__v8hf)_mm_setzero_ph(), \ + U, C) + +#define _mm512_getexp_round_ph(A, R) \ + ((__m512h)__builtin_ia32_getexpph512_mask((__v32hf)(__m512h)(A), \ + (__v32hf)_mm512_setzero_ph(), (__mmask32)-1, R)) + +#define _mm512_mask_getexp_round_ph(W, U, A, R) \ + ((__m512h)__builtin_ia32_getexpph512_mask((__v32hf)(__m512h)(A), \ + (__v32hf)(__m512h)(W), (__mmask32)(U), R)) + +#define _mm512_maskz_getexp_round_ph(U, A, R) \ + ((__m512h)__builtin_ia32_getexpph512_mask((__v32hf)(__m512h)(A), \ + (__v32hf)_mm512_setzero_ph(), (__mmask32)(U), R)) + +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vgetmantph, vgetmantsh. 
*/ +#ifdef __OPTIMIZE__ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_getmant_sh (__m128h __A, __m128h __B, + _MM_MANTISSA_NORM_ENUM __C, + _MM_MANTISSA_SIGN_ENUM __D) +{ + return (__m128h) + __builtin_ia32_getmantsh_mask_round ((__v8hf) __A, (__v8hf) __B, + (__D << 2) | __C, _mm_setzero_ph (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_getmant_sh (__m128h __W, __mmask8 __U, __m128h __A, + __m128h __B, _MM_MANTISSA_NORM_ENUM __C, + _MM_MANTISSA_SIGN_ENUM __D) +{ + return (__m128h) + __builtin_ia32_getmantsh_mask_round ((__v8hf) __A, (__v8hf) __B, + (__D << 2) | __C, (__v8hf) __W, + __U, _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_getmant_sh (__mmask8 __U, __m128h __A, __m128h __B, + _MM_MANTISSA_NORM_ENUM __C, + _MM_MANTISSA_SIGN_ENUM __D) +{ + return (__m128h) + __builtin_ia32_getmantsh_mask_round ((__v8hf) __A, (__v8hf) __B, + (__D << 2) | __C, + (__v8hf) _mm_setzero_ph(), + __U, _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_getmant_ph (__m512h __A, _MM_MANTISSA_NORM_ENUM __B, + _MM_MANTISSA_SIGN_ENUM __C) +{ + return (__m512h) __builtin_ia32_getmantph512_mask ((__v32hf) __A, + (__C << 2) | __B, + _mm512_setzero_ph (), + (__mmask32) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_getmant_ph (__m512h __W, __mmask32 __U, __m512h __A, + _MM_MANTISSA_NORM_ENUM __B, + _MM_MANTISSA_SIGN_ENUM __C) +{ + return (__m512h) __builtin_ia32_getmantph512_mask ((__v32hf) __A, + (__C << 2) | __B, + (__v32hf) __W, __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) 
+_mm512_maskz_getmant_ph (__mmask32 __U, __m512h __A, + _MM_MANTISSA_NORM_ENUM __B, + _MM_MANTISSA_SIGN_ENUM __C) +{ + return (__m512h) __builtin_ia32_getmantph512_mask ((__v32hf) __A, + (__C << 2) | __B, + (__v32hf) + _mm512_setzero_ph (), + __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_getmant_round_sh (__m128h __A, __m128h __B, + _MM_MANTISSA_NORM_ENUM __C, + _MM_MANTISSA_SIGN_ENUM __D, const int __R) +{ + return (__m128h) __builtin_ia32_getmantsh_mask_round ((__v8hf) __A, + (__v8hf) __B, + (__D << 2) | __C, + _mm_setzero_ph (), + (__mmask8) -1, + __R); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_getmant_round_sh (__m128h __W, __mmask8 __U, __m128h __A, + __m128h __B, _MM_MANTISSA_NORM_ENUM __C, + _MM_MANTISSA_SIGN_ENUM __D, const int __R) +{ + return (__m128h) __builtin_ia32_getmantsh_mask_round ((__v8hf) __A, + (__v8hf) __B, + (__D << 2) | __C, + (__v8hf) __W, + __U, __R); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_getmant_round_sh (__mmask8 __U, __m128h __A, __m128h __B, + _MM_MANTISSA_NORM_ENUM __C, + _MM_MANTISSA_SIGN_ENUM __D, const int __R) +{ + return (__m128h) __builtin_ia32_getmantsh_mask_round ((__v8hf) __A, + (__v8hf) __B, + (__D << 2) | __C, + (__v8hf) + _mm_setzero_ph(), + __U, __R); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_getmant_round_ph (__m512h __A, _MM_MANTISSA_NORM_ENUM __B, + _MM_MANTISSA_SIGN_ENUM __C, const int __R) +{ + return (__m512h) __builtin_ia32_getmantph512_mask ((__v32hf) __A, + (__C << 2) | __B, + _mm512_setzero_ph (), + (__mmask32) -1, __R); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_getmant_round_ph (__m512h __W, __mmask32 __U, __m512h __A, + _MM_MANTISSA_NORM_ENUM __B, + 
_MM_MANTISSA_SIGN_ENUM __C, const int __R) +{ + return (__m512h) __builtin_ia32_getmantph512_mask ((__v32hf) __A, + (__C << 2) | __B, + (__v32hf) __W, __U, + __R); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_getmant_round_ph (__mmask32 __U, __m512h __A, + _MM_MANTISSA_NORM_ENUM __B, + _MM_MANTISSA_SIGN_ENUM __C, const int __R) +{ + return (__m512h) __builtin_ia32_getmantph512_mask ((__v32hf) __A, + (__C << 2) | __B, + (__v32hf) + _mm512_setzero_ph (), + __U, __R); +} + +#else +#define _mm512_getmant_ph(X, B, C) \ + ((__m512h)__builtin_ia32_getmantph512_mask ((__v32hf)(__m512h)(X), \ + (int)(((C)<<2) | (B)), \ + (__v32hf)(__m512h) \ + _mm512_setzero_ph(), \ + (__mmask32)-1, \ + _MM_FROUND_CUR_DIRECTION)) + +#define _mm512_mask_getmant_ph(W, U, X, B, C) \ + ((__m512h)__builtin_ia32_getmantph512_mask ((__v32hf)(__m512h)(X), \ + (int)(((C)<<2) | (B)), \ + (__v32hf)(__m512h)(W), \ + (__mmask32)(U), \ + _MM_FROUND_CUR_DIRECTION)) + + +#define _mm512_maskz_getmant_ph(U, X, B, C) \ + ((__m512h)__builtin_ia32_getmantph512_mask ((__v32hf)(__m512h)(X), \ + (int)(((C)<<2) | (B)), \ + (__v32hf)(__m512h) \ + _mm512_setzero_ph(), \ + (__mmask32)(U), \ + _MM_FROUND_CUR_DIRECTION)) + +#define _mm_getmant_sh(X, Y, C, D) \ + ((__m128h)__builtin_ia32_getmantsh_mask_round ((__v8hf)(__m128h)(X), \ + (__v8hf)(__m128h)(Y), \ + (int)(((D)<<2) | (C)), \ + (__v8hf)(__m128h) \ + _mm_setzero_ph (), \ + (__mmask8)-1, \ + _MM_FROUND_CUR_DIRECTION)) + +#define _mm_mask_getmant_sh(W, U, X, Y, C, D) \ + ((__m128h)__builtin_ia32_getmantsh_mask_round ((__v8hf)(__m128h)(X), \ + (__v8hf)(__m128h)(Y), \ + (int)(((D)<<2) | (C)), \ + (__v8hf)(__m128h)(W), \ + (__mmask8)(U), \ + _MM_FROUND_CUR_DIRECTION)) + +#define _mm_maskz_getmant_sh(U, X, Y, C, D) \ + ((__m128h)__builtin_ia32_getmantsh_mask_round ((__v8hf)(__m128h)(X), \ + (__v8hf)(__m128h)(Y), \ + (int)(((D)<<2) | (C)), \ + (__v8hf)(__m128h) \ + _mm_setzero_ph(), \ + (__mmask8)(U), \ 
+ _MM_FROUND_CUR_DIRECTION)) + +#define _mm512_getmant_round_ph(X, B, C, R) \ + ((__m512h)__builtin_ia32_getmantph512_mask ((__v32hf)(__m512h)(X), \ + (int)(((C)<<2) | (B)), \ + (__v32hf)(__m512h) \ + _mm512_setzero_ph(), \ + (__mmask32)-1, \ + (R))) + +#define _mm512_mask_getmant_round_ph(W, U, X, B, C, R) \ + ((__m512h)__builtin_ia32_getmantph512_mask ((__v32hf)(__m512h)(X), \ + (int)(((C)<<2) | (B)), \ + (__v32hf)(__m512h)(W), \ + (__mmask32)(U), \ + (R))) + + +#define _mm512_maskz_getmant_round_ph(U, X, B, C, R) \ + ((__m512h)__builtin_ia32_getmantph512_mask ((__v32hf)(__m512h)(X), \ + (int)(((C)<<2) | (B)), \ + (__v32hf)(__m512h) \ + _mm512_setzero_ph(), \ + (__mmask32)(U), \ + (R))) + +#define _mm_getmant_round_sh(X, Y, C, D, R) \ + ((__m128h)__builtin_ia32_getmantsh_mask_round ((__v8hf)(__m128h)(X), \ + (__v8hf)(__m128h)(Y), \ + (int)(((D)<<2) | (C)), \ + (__v8hf)(__m128h) \ + _mm_setzero_ph (), \ + (__mmask8)-1, \ + (R))) + +#define _mm_mask_getmant_round_sh(W, U, X, Y, C, D, R) \ + ((__m128h)__builtin_ia32_getmantsh_mask_round ((__v8hf)(__m128h)(X), \ + (__v8hf)(__m128h)(Y), \ + (int)(((D)<<2) | (C)), \ + (__v8hf)(__m128h)(W), \ + (__mmask8)(U), \ + (R))) + +#define _mm_maskz_getmant_round_sh(U, X, Y, C, D, R) \ + ((__m128h)__builtin_ia32_getmantsh_mask_round ((__v8hf)(__m128h)(X), \ + (__v8hf)(__m128h)(Y), \ + (int)(((D)<<2) | (C)), \ + (__v8hf)(__m128h) \ + _mm_setzero_ph(), \ + (__mmask8)(U), \ + (R))) + +#endif /* __OPTIMIZE__ */ + #ifdef __DISABLE_AVX512FP16__ #undef __DISABLE_AVX512FP16__ #pragma GCC pop_options diff --git a/gcc/config/i386/avx512fp16vlintrin.h b/gcc/config/i386/avx512fp16vlintrin.h index 20b6716aa00..206d60407fc 100644 --- a/gcc/config/i386/avx512fp16vlintrin.h +++ b/gcc/config/i386/avx512fp16vlintrin.h @@ -701,6 +701,235 @@ _mm256_maskz_roundscale_ph (__mmask16 __A, __m256h __B, int __C) #endif /* __OPTIMIZE__ */ +/* Intrinsics vfpclassph. 
*/ +#ifdef __OPTIMIZE__ +extern __inline __mmask8 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_fpclass_ph_mask (__mmask8 __U, __m128h __A, const int __imm) +{ + return (__mmask8) __builtin_ia32_fpclassph128_mask ((__v8hf) __A, + __imm, __U); +} + +extern __inline __mmask8 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fpclass_ph_mask (__m128h __A, const int __imm) +{ + return (__mmask8) __builtin_ia32_fpclassph128_mask ((__v8hf) __A, + __imm, + (__mmask8) -1); +} + +extern __inline __mmask16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_fpclass_ph_mask (__mmask16 __U, __m256h __A, const int __imm) +{ + return (__mmask16) __builtin_ia32_fpclassph256_mask ((__v16hf) __A, + __imm, __U); +} + +extern __inline __mmask16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_fpclass_ph_mask (__m256h __A, const int __imm) +{ + return (__mmask16) __builtin_ia32_fpclassph256_mask ((__v16hf) __A, + __imm, + (__mmask16) -1); +} + +#else +#define _mm_fpclass_ph_mask(X, C) \ + ((__mmask8) __builtin_ia32_fpclassph128_mask ((__v8hf) (__m128h) (X), \ + (int) (C),(__mmask8)-1)) + +#define _mm_mask_fpclass_ph_mask(u, X, C) \ + ((__mmask8) __builtin_ia32_fpclassph128_mask ((__v8hf) (__m128h) (X), \ + (int) (C),(__mmask8)(u))) + +#define _mm256_fpclass_ph_mask(X, C) \ + ((__mmask16) __builtin_ia32_fpclassph256_mask ((__v16hf) (__m256h) (X), \ + (int) (C),(__mmask16)-1)) + +#define _mm256_mask_fpclass_ph_mask(u, X, C) \ + ((__mmask16) __builtin_ia32_fpclassph256_mask ((__v16hf) (__m256h) (X), \ + (int) (C),(__mmask16)(u))) +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vgetexpph, vgetexpsh. 
*/ +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_getexp_ph (__m256h __A) +{ + return (__m256h) __builtin_ia32_getexpph256_mask ((__v16hf) __A, + (__v16hf) + _mm256_setzero_ph (), + (__mmask16) -1); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_getexp_ph (__m256h __W, __mmask16 __U, __m256h __A) +{ + return (__m256h) __builtin_ia32_getexpph256_mask ((__v16hf) __A, + (__v16hf) __W, + (__mmask16) __U); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_getexp_ph (__mmask16 __U, __m256h __A) +{ + return (__m256h) __builtin_ia32_getexpph256_mask ((__v16hf) __A, + (__v16hf) + _mm256_setzero_ph (), + (__mmask16) __U); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_getexp_ph (__m128h __A) +{ + return (__m128h) __builtin_ia32_getexpph128_mask ((__v8hf) __A, + (__v8hf) + _mm_setzero_ph (), + (__mmask8) -1); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_getexp_ph (__m128h __W, __mmask8 __U, __m128h __A) +{ + return (__m128h) __builtin_ia32_getexpph128_mask ((__v8hf) __A, + (__v8hf) __W, + (__mmask8) __U); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_getexp_ph (__mmask8 __U, __m128h __A) +{ + return (__m128h) __builtin_ia32_getexpph128_mask ((__v8hf) __A, + (__v8hf) + _mm_setzero_ph (), + (__mmask8) __U); +} + + +/* Intrinsics vgetmantph, vgetmantsh. 
*/ +#ifdef __OPTIMIZE__ +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_getmant_ph (__m256h __A, _MM_MANTISSA_NORM_ENUM __B, + _MM_MANTISSA_SIGN_ENUM __C) +{ + return (__m256h) __builtin_ia32_getmantph256_mask ((__v16hf) __A, + (__C << 2) | __B, + (__v16hf) + _mm256_setzero_ph (), + (__mmask16) -1); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_getmant_ph (__m256h __W, __mmask16 __U, __m256h __A, + _MM_MANTISSA_NORM_ENUM __B, + _MM_MANTISSA_SIGN_ENUM __C) +{ + return (__m256h) __builtin_ia32_getmantph256_mask ((__v16hf) __A, + (__C << 2) | __B, + (__v16hf) __W, + (__mmask16) __U); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_getmant_ph (__mmask16 __U, __m256h __A, + _MM_MANTISSA_NORM_ENUM __B, + _MM_MANTISSA_SIGN_ENUM __C) +{ + return (__m256h) __builtin_ia32_getmantph256_mask ((__v16hf) __A, + (__C << 2) | __B, + (__v16hf) + _mm256_setzero_ph (), + (__mmask16) __U); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_getmant_ph (__m128h __A, _MM_MANTISSA_NORM_ENUM __B, + _MM_MANTISSA_SIGN_ENUM __C) +{ + return (__m128h) __builtin_ia32_getmantph128_mask ((__v8hf) __A, + (__C << 2) | __B, + (__v8hf) + _mm_setzero_ph (), + (__mmask8) -1); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_getmant_ph (__m128h __W, __mmask8 __U, __m128h __A, + _MM_MANTISSA_NORM_ENUM __B, + _MM_MANTISSA_SIGN_ENUM __C) +{ + return (__m128h) __builtin_ia32_getmantph128_mask ((__v8hf) __A, + (__C << 2) | __B, + (__v8hf) __W, + (__mmask8) __U); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_getmant_ph (__mmask8 __U, __m128h __A, + _MM_MANTISSA_NORM_ENUM __B, + _MM_MANTISSA_SIGN_ENUM __C) +{ + return (__m128h) 
__builtin_ia32_getmantph128_mask ((__v8hf) __A, + (__C << 2) | __B, + (__v8hf) + _mm_setzero_ph (), + (__mmask8) __U); +} + +#else +#define _mm256_getmant_ph(X, B, C) \ + ((__m256h) __builtin_ia32_getmantph256_mask ((__v16hf)(__m256h) (X), \ + (int)(((C)<<2) | (B)), \ + (__v16hf)(__m256h)_mm256_setzero_ph (),\ + (__mmask16)-1)) + +#define _mm256_mask_getmant_ph(W, U, X, B, C) \ + ((__m256h) __builtin_ia32_getmantph256_mask ((__v16hf)(__m256h) (X), \ + (int)(((C)<<2) | (B)), \ + (__v16hf)(__m256h)(W), \ + (__mmask16)(U))) + +#define _mm256_maskz_getmant_ph(U, X, B, C) \ + ((__m256h) __builtin_ia32_getmantph256_mask ((__v16hf)(__m256h) (X), \ + (int)(((C)<<2) | (B)), \ + (__v16hf)(__m256h)_mm256_setzero_ph (),\ + (__mmask16)(U))) + +#define _mm_getmant_ph(X, B, C) \ + ((__m128h) __builtin_ia32_getmantph128_mask ((__v8hf)(__m128h) (X), \ + (int)(((C)<<2) | (B)), \ + (__v8hf)(__m128h)_mm_setzero_ph (), \ + (__mmask8)-1)) + +#define _mm_mask_getmant_ph(W, U, X, B, C) \ + ((__m128h) __builtin_ia32_getmantph128_mask ((__v8hf)(__m128h) (X), \ + (int)(((C)<<2) | (B)), \ + (__v8hf)(__m128h)(W), \ + (__mmask8)(U))) + +#define _mm_maskz_getmant_ph(U, X, B, C) \ + ((__m128h) __builtin_ia32_getmantph128_mask ((__v8hf)(__m128h) (X), \ + (int)(((C)<<2) | (B)), \ + (__v8hf)(__m128h)_mm_setzero_ph (), \ + (__mmask8)(U))) + +#endif /* __OPTIMIZE__ */ + #ifdef __DISABLE_AVX512FP16VL__ #undef __DISABLE_AVX512FP16VL__ #pragma GCC pop_options diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def index d2ba1a5edac..79e7edf13e5 100644 --- a/gcc/config/i386/i386-builtin-types.def +++ b/gcc/config/i386/i386-builtin-types.def @@ -1304,6 +1304,9 @@ DEF_FUNCTION_TYPE (UINT8, PV2DI, PCV2DI, PCVOID) # FP16 builtins DEF_FUNCTION_TYPE (V8HF, V8HI) +DEF_FUNCTION_TYPE (QI, V8HF, INT, UQI) +DEF_FUNCTION_TYPE (HI, V16HF, INT, UHI) +DEF_FUNCTION_TYPE (SI, V32HF, INT, USI) DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF) DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, UQI) 
DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, INT) diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def index 6964062c874..ed1a4a38b1c 100644 --- a/gcc/config/i386/i386-builtin.def +++ b/gcc/config/i386/i386-builtin.def @@ -2818,6 +2818,14 @@ BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_reducepv8 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_reducepv16hf_mask, "__builtin_ia32_vreduceph_v16hf_mask", IX86_BUILTIN_VREDUCEPH_V16HF_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_INT_V16HF_UHI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_rndscalev8hf_mask, "__builtin_ia32_vrndscaleph_v8hf_mask", IX86_BUILTIN_VRNDSCALEPH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_INT_V8HF_UQI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_rndscalev16hf_mask, "__builtin_ia32_vrndscaleph_v16hf_mask", IX86_BUILTIN_VRNDSCALEPH_V16HF_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_INT_V16HF_UHI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512dq_fpclassv16hf_mask, "__builtin_ia32_fpclassph256_mask", IX86_BUILTIN_FPCLASSPH256, UNKNOWN, (int) HI_FTYPE_V16HF_INT_UHI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512dq_fpclassv8hf_mask, "__builtin_ia32_fpclassph128_mask", IX86_BUILTIN_FPCLASSPH128, UNKNOWN, (int) QI_FTYPE_V8HF_INT_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512dq_fpclassv32hf_mask, "__builtin_ia32_fpclassph512_mask", IX86_BUILTIN_FPCLASSPH512, UNKNOWN, (int) SI_FTYPE_V32HF_INT_USI) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512dq_vmfpclassv8hf_mask, "__builtin_ia32_fpclasssh_mask", IX86_BUILTIN_FPCLASSSH_MASK, UNKNOWN, (int) QI_FTYPE_V8HF_INT_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_getexpv16hf_mask, "__builtin_ia32_getexpph256_mask", IX86_BUILTIN_GETEXPPH256, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_UHI) +BDESC (OPTION_MASK_ISA_AVX512VL, 
OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_getexpv8hf_mask, "__builtin_ia32_getexpph128_mask", IX86_BUILTIN_GETEXPPH128, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_getmantv16hf_mask, "__builtin_ia32_getmantph256_mask", IX86_BUILTIN_GETMANTPH256, UNKNOWN, (int) V16HF_FTYPE_V16HF_INT_V16HF_UHI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_getmantv8hf_mask, "__builtin_ia32_getmantph128_mask", IX86_BUILTIN_GETMANTPH128, UNKNOWN, (int) V8HF_FTYPE_V8HF_INT_V8HF_UQI) /* Builtins with rounding support. */ BDESC_END (ARGS, ROUND_ARGS) @@ -3041,6 +3049,10 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_reducepv32hf_mask_round, "__buil BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_reducesv8hf_mask_round, "__builtin_ia32_vreducesh_v8hf_mask_round", IX86_BUILTIN_VREDUCESH_V8HF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_INT_V8HF_UQI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_rndscalev32hf_mask_round, "__builtin_ia32_vrndscaleph_v32hf_mask_round", IX86_BUILTIN_VRNDSCALEPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_INT_V32HF_USI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_rndscalev8hf_mask_round, "__builtin_ia32_vrndscalesh_v8hf_mask_round", IX86_BUILTIN_VRNDSCALESH_V8HF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_INT_V8HF_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_getexpv32hf_mask_round, "__builtin_ia32_getexpph512_mask", IX86_BUILTIN_GETEXPPH512, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_sgetexpv8hf_mask_round, "__builtin_ia32_getexpsh_mask_round", IX86_BUILTIN_GETEXPSH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_getmantv32hf_mask_round, "__builtin_ia32_getmantph512_mask", IX86_BUILTIN_GETMANTPH512, UNKNOWN, (int) 
V32HF_FTYPE_V32HF_INT_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vgetmantv8hf_mask_round, "__builtin_ia32_getmantsh_mask_round", IX86_BUILTIN_GETMANTSH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_INT_V8HF_UQI_INT) BDESC_END (ROUND_ARGS, MULTI_ARG) diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c index 655234cbdd0..266aa411ddb 100644 --- a/gcc/config/i386/i386-expand.c +++ b/gcc/config/i386/i386-expand.c @@ -9735,6 +9735,9 @@ ix86_expand_args_builtin (const struct builtin_description *d, case HI_FTYPE_V16SF_INT_UHI: case QI_FTYPE_V8SF_INT_UQI: case QI_FTYPE_V4SF_INT_UQI: + case QI_FTYPE_V8HF_INT_UQI: + case HI_FTYPE_V16HF_INT_UHI: + case SI_FTYPE_V32HF_INT_USI: case V4SI_FTYPE_V4SI_V4SI_UHI: case V8SI_FTYPE_V8SI_V8SI_UHI: nargs = 3; @@ -10056,8 +10059,10 @@ ix86_expand_args_builtin (const struct builtin_description *d, case CODE_FOR_avx_vpermilv4df_mask: case CODE_FOR_avx512f_getmantv8df_mask: case CODE_FOR_avx512f_getmantv16sf_mask: + case CODE_FOR_avx512vl_getmantv16hf_mask: case CODE_FOR_avx512vl_getmantv8sf_mask: case CODE_FOR_avx512vl_getmantv4df_mask: + case CODE_FOR_avx512fp16_getmantv8hf_mask: case CODE_FOR_avx512vl_getmantv4sf_mask: case CODE_FOR_avx512vl_getmantv2df_mask: case CODE_FOR_avx512dq_rangepv8df_mask_round: @@ -10593,10 +10598,12 @@ ix86_expand_round_builtin (const struct builtin_description *d, { case CODE_FOR_avx512f_getmantv8df_mask_round: case CODE_FOR_avx512f_getmantv16sf_mask_round: + case CODE_FOR_avx512bw_getmantv32hf_mask_round: case CODE_FOR_avx512f_vgetmantv2df_round: case CODE_FOR_avx512f_vgetmantv2df_mask_round: case CODE_FOR_avx512f_vgetmantv4sf_round: case CODE_FOR_avx512f_vgetmantv4sf_mask_round: + case CODE_FOR_avx512f_vgetmantv8hf_mask_round: error ("the immediate argument must be a 4-bit immediate"); return const0_rtx; case CODE_FOR_avx512f_cmpv8df3_mask_round: diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index f43651a95ce..c4db778e25d 100644 --- 
a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -701,7 +701,8 @@ (define_mode_attr ssequarterinsnmode [(V16SF "V4SF") (V8DF "V2DF") (V16SI "TI") (V8DI "TI")]) (define_mode_attr vecmemsuffix - [(V16SF "{z}") (V8SF "{y}") (V4SF "{x}") + [(V32HF "{z}") (V16HF "{y}") (V8HF "{x}") + (V16SF "{z}") (V8SF "{y}") (V4SF "{x}") (V8DF "{z}") (V4DF "{y}") (V2DF "{x}")]) (define_mode_attr ssedoublemodelower @@ -10050,8 +10051,8 @@ (define_insn "_vternlog_mask" (set_attr "mode" "")]) (define_insn "_getexp" - [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v") - (unspec:VF_AVX512VL [(match_operand:VF_AVX512VL 1 "" "")] + [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v") + (unspec:VFH_AVX512VL [(match_operand:VFH_AVX512VL 1 "" "")] UNSPEC_GETEXP))] "TARGET_AVX512F" "vgetexp\t{%1, %0|%0, %1}"; @@ -10059,11 +10060,11 @@ (define_insn "_getexp" (set_attr "mode" "")]) (define_insn "avx512f_sgetexp" - [(set (match_operand:VF_128 0 "register_operand" "=v") - (vec_merge:VF_128 - (unspec:VF_128 - [(match_operand:VF_128 1 "register_operand" "v") - (match_operand:VF_128 2 "" "")] + [(set (match_operand:VFH_128 0 "register_operand" "=v") + (vec_merge:VFH_128 + (unspec:VFH_128 + [(match_operand:VFH_128 1 "register_operand" "v") + (match_operand:VFH_128 2 "" "")] UNSPEC_GETEXP) (match_dup 1) (const_int 1)))] @@ -23571,10 +23572,10 @@ (define_insn "avx512dq_ranges (define_insn "avx512dq_fpclass" [(set (match_operand: 0 "register_operand" "=k") (unspec: - [(match_operand:VF_AVX512VL 1 "vector_operand" "vm") + [(match_operand:VFH_AVX512VL 1 "vector_operand" "vm") (match_operand 2 "const_0_to_255_operand" "n")] UNSPEC_FPCLASS))] - "TARGET_AVX512DQ" + "TARGET_AVX512DQ || VALID_AVX512FP16_REG_MODE(mode)" "vfpclass\t{%2, %1, %0|%0, %1, %2}"; [(set_attr "type" "sse") (set_attr "length_immediate" "1") @@ -23585,11 +23586,11 @@ (define_insn "avx512dq_vmfpclass" [(set (match_operand: 0 "register_operand" "=k") (and: (unspec: - [(match_operand:VF_128 1 "nonimmediate_operand" 
"vm") + [(match_operand:VFH_128 1 "nonimmediate_operand" "vm") (match_operand 2 "const_0_to_255_operand" "n")] UNSPEC_FPCLASS) (const_int 1)))] - "TARGET_AVX512DQ" + "TARGET_AVX512DQ || VALID_AVX512FP16_REG_MODE(mode)" "vfpclass\t{%2, %1, %0|%0, %1, %2}"; [(set_attr "type" "sse") (set_attr "length_immediate" "1") @@ -23597,9 +23598,9 @@ (define_insn "avx512dq_vmfpclass" (set_attr "mode" "")]) (define_insn "_getmant" - [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v") - (unspec:VF_AVX512VL - [(match_operand:VF_AVX512VL 1 "nonimmediate_operand" "") + [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v") + (unspec:VFH_AVX512VL + [(match_operand:VFH_AVX512VL 1 "nonimmediate_operand" "") (match_operand:SI 2 "const_0_to_15_operand")] UNSPEC_GETMANT))] "TARGET_AVX512F" @@ -23608,11 +23609,11 @@ (define_insn "_getmant" (set_attr "mode" "")]) (define_insn "avx512f_vgetmant" - [(set (match_operand:VF_128 0 "register_operand" "=v") - (vec_merge:VF_128 - (unspec:VF_128 - [(match_operand:VF_128 1 "register_operand" "v") - (match_operand:VF_128 2 "" "") + [(set (match_operand:VFH_128 0 "register_operand" "=v") + (vec_merge:VFH_128 + (unspec:VFH_128 + [(match_operand:VFH_128 1 "register_operand" "v") + (match_operand:VFH_128 2 "" "") (match_operand:SI 3 "const_0_to_15_operand")] UNSPEC_GETMANT) (match_dup 1) diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c index 4c8e54e4c2a..b3cffa0644f 100644 --- a/gcc/testsuite/gcc.target/i386/avx-1.c +++ b/gcc/testsuite/gcc.target/i386/avx-1.c @@ -713,10 +713,20 @@ #define __builtin_ia32_vrndscaleph_v8hf_mask(A, B, C, D) __builtin_ia32_vrndscaleph_v8hf_mask(A, 123, C, D) #define __builtin_ia32_vrndscaleph_v16hf_mask(A, B, C, D) __builtin_ia32_vrndscaleph_v16hf_mask(A, 123, C, D) #define __builtin_ia32_vrndscalesh_v8hf_mask_round(A, B, C, D, E, F) __builtin_ia32_vrndscalesh_v8hf_mask_round(A, B, 123, D, E, 8) +#define __builtin_ia32_fpclassph512_mask(A, D, C) 
__builtin_ia32_fpclassph512_mask(A, 1, C) +#define __builtin_ia32_fpclasssh_mask(A, D, U) __builtin_ia32_fpclasssh_mask(A, 1, U) +#define __builtin_ia32_getexpph512_mask(A, B, C, D) __builtin_ia32_getexpph512_mask(A, B, C, 8) +#define __builtin_ia32_getexpsh_mask_round(A, B, C, D, E) __builtin_ia32_getexpsh_mask_round(A, B, C, D, 4) +#define __builtin_ia32_getmantph512_mask(A, F, C, D, E) __builtin_ia32_getmantph512_mask(A, 1, C, D, 8) +#define __builtin_ia32_getmantsh_mask_round(A, B, C, W, U, D) __builtin_ia32_getmantsh_mask_round(A, B, 1, W, U, 4) /* avx512fp16vlintrin.h */ #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D) #define __builtin_ia32_vcmpph_v16hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v16hf_mask(A, B, 1, D) +#define __builtin_ia32_fpclassph256_mask(A, D, C) __builtin_ia32_fpclassph256_mask(A, 1, C) +#define __builtin_ia32_fpclassph128_mask(A, D, C) __builtin_ia32_fpclassph128_mask(A, 1, C) +#define __builtin_ia32_getmantph256_mask(A, E, C, D) __builtin_ia32_getmantph256_mask(A, 1, C, D) +#define __builtin_ia32_getmantph128_mask(A, E, C, D) __builtin_ia32_getmantph128_mask(A, 1, C, D) /* vpclmulqdqintrin.h */ #define __builtin_ia32_vpclmulqdq_v4di(A, B, C) __builtin_ia32_vpclmulqdq_v4di(A, B, 1) diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c index 044d427c932..67ef567e437 100644 --- a/gcc/testsuite/gcc.target/i386/sse-13.c +++ b/gcc/testsuite/gcc.target/i386/sse-13.c @@ -730,10 +730,20 @@ #define __builtin_ia32_vrndscaleph_v8hf_mask(A, B, C, D) __builtin_ia32_vrndscaleph_v8hf_mask(A, 123, C, D) #define __builtin_ia32_vrndscaleph_v16hf_mask(A, B, C, D) __builtin_ia32_vrndscaleph_v16hf_mask(A, 123, C, D) #define __builtin_ia32_vrndscalesh_v8hf_mask_round(A, B, C, D, E, F) __builtin_ia32_vrndscalesh_v8hf_mask_round(A, B, 123, D, E, 8) +#define __builtin_ia32_fpclassph512_mask(A, D, C) __builtin_ia32_fpclassph512_mask(A, 1, C) +#define 
__builtin_ia32_fpclasssh_mask(A, D, U) __builtin_ia32_fpclasssh_mask(A, 1, U) +#define __builtin_ia32_getexpph512_mask(A, B, C, D) __builtin_ia32_getexpph512_mask(A, B, C, 8) +#define __builtin_ia32_getexpsh_mask_round(A, B, C, D, E) __builtin_ia32_getexpsh_mask_round(A, B, C, D, 4) +#define __builtin_ia32_getmantph512_mask(A, F, C, D, E) __builtin_ia32_getmantph512_mask(A, 1, C, D, 8) +#define __builtin_ia32_getmantsh_mask_round(A, B, C, W, U, D) __builtin_ia32_getmantsh_mask_round(A, B, 1, W, U, 4) /* avx512fp16vlintrin.h */ #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D) #define __builtin_ia32_vcmpph_v16hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v16hf_mask(A, B, 1, D) +#define __builtin_ia32_fpclassph256_mask(A, D, C) __builtin_ia32_fpclassph256_mask(A, 1, C) +#define __builtin_ia32_fpclassph128_mask(A, D, C) __builtin_ia32_fpclassph128_mask(A, 1, C) +#define __builtin_ia32_getmantph256_mask(A, E, C, D) __builtin_ia32_getmantph256_mask(A, 1, C, D) +#define __builtin_ia32_getmantph128_mask(A, E, C, D) __builtin_ia32_getmantph128_mask(A, 1, C, D) /* vpclmulqdqintrin.h */ #define __builtin_ia32_vpclmulqdq_v4di(A, B, C) __builtin_ia32_vpclmulqdq_v4di(A, B, 1) diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c index b7ffdf7e1df..04163874f90 100644 --- a/gcc/testsuite/gcc.target/i386/sse-14.c +++ b/gcc/testsuite/gcc.target/i386/sse-14.c @@ -677,8 +677,11 @@ test_1 (_mm512_reduce_ph, __m512h, __m512h, 123) test_1 (_mm_roundscale_ph, __m128h, __m128h, 123) test_1 (_mm256_roundscale_ph, __m256h, __m256h, 123) test_1 (_mm512_roundscale_ph, __m512h, __m512h, 123) +test_1 (_mm512_getexp_round_ph, __m512h, __m512h, 8) test_1x (_mm512_reduce_round_ph, __m512h, __m512h, 123, 8) test_1x (_mm512_roundscale_round_ph, __m512h, __m512h, 123, 8) +test_1x (_mm512_getmant_ph, __m512h, __m512h, 1, 1) +test_1y (_mm512_getmant_round_ph, __m512h, __m512h, 1, 1, 8) test_2 (_mm512_add_round_ph, __m512h, 
__m512h, __m512h, 8) test_2 (_mm512_sub_round_ph, __m512h, __m512h, __m512h, 8) test_2 (_mm512_mul_round_ph, __m512h, __m512h, __m512h, 8) @@ -705,6 +708,8 @@ test_2 (_mm_maskz_roundscale_ph, __m128h, __mmask8, __m128h, 123) test_2 (_mm256_maskz_roundscale_ph, __m256h, __mmask16, __m256h, 123) test_2 (_mm512_maskz_roundscale_ph, __m512h, __mmask32, __m512h, 123) test_2 (_mm_roundscale_sh, __m128h, __m128h, __m128h, 123) +test_2 (_mm512_maskz_getexp_round_ph, __m512h, __mmask32, __m512h, 8) +test_2 (_mm_getexp_round_sh, __m128h, __m128h, __m128h, 8) test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8) test_2x (_mm_cmp_round_sh_mask, __mmask8, __m128h, __m128h, 1, 8) test_2x (_mm_comi_round_sh, int, __m128h, __m128h, 1, 8) @@ -712,6 +717,10 @@ test_2x (_mm512_maskz_reduce_round_ph, __m512h, __mmask32, __m512h, 123, 8) test_2x (_mm512_maskz_roundscale_round_ph, __m512h, __mmask32, __m512h, 123, 8) test_2x (_mm_reduce_round_sh, __m128h, __m128h, __m128h, 123, 8) test_2x (_mm_roundscale_round_sh, __m128h, __m128h, __m128h, 123, 8) +test_2x (_mm512_maskz_getmant_ph, __m512h, __mmask32, __m512h, 1, 1) +test_2x (_mm_getmant_sh, __m128h, __m128h, __m128h, 1, 1) +test_2y (_mm512_maskz_getmant_round_ph, __m512h, __mmask32, __m512h, 1, 1, 8) +test_2y (_mm_getmant_round_sh, __m128h, __m128h, __m128h, 1, 1, 8) test_3 (_mm512_maskz_add_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) test_3 (_mm512_maskz_sub_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) test_3 (_mm512_maskz_mul_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) @@ -737,12 +746,18 @@ test_3 (_mm_mask_roundscale_ph, __m128h, __m128h, __mmask8, __m128h, 123) test_3 (_mm256_mask_roundscale_ph, __m256h, __m256h, __mmask16, __m256h, 123) test_3 (_mm512_mask_roundscale_ph, __m512h, __m512h, __mmask32, __m512h, 123) test_3 (_mm_maskz_roundscale_sh, __m128h, __mmask8, __m128h, __m128h, 123) +test_3 (_mm_maskz_getexp_round_sh, __m128h, __mmask8, __m128h, __m128h, 8) +test_3 
(_mm512_mask_getexp_round_ph, __m512h, __m512h, __mmask32, __m512h, 8) test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8) test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8) test_3x (_mm512_mask_reduce_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8) test_3x (_mm512_mask_roundscale_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8) test_3x (_mm_maskz_reduce_round_sh, __m128h, __mmask8, __m128h, __m128h, 123, 8) test_3x (_mm_maskz_roundscale_round_sh, __m128h, __mmask8, __m128h, __m128h, 123, 8) +test_3x (_mm512_mask_getmant_ph, __m512h, __m512h, __mmask32, __m512h, 1, 1) +test_3x (_mm_maskz_getmant_sh, __m128h, __mmask8, __m128h, __m128h, 1, 1) +test_3y (_mm_maskz_getmant_round_sh, __m128h, __mmask8, __m128h, __m128h, 1, 1, 8) +test_3y (_mm512_mask_getmant_round_ph, __m512h, __m512h, __mmask32, __m512h, 1, 1, 8) test_4 (_mm512_mask_add_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8) test_4 (_mm512_mask_sub_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8) test_4 (_mm512_mask_mul_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8) @@ -760,8 +775,11 @@ test_4 (_mm512_mask_scalef_round_ph, __m512h, __m512h, __mmask32, __m512h, __m51 test_4 (_mm_mask_scalef_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8) test_4 (_mm_mask_reduce_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123) test_4 (_mm_mask_roundscale_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123) +test_4 (_mm_mask_getexp_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8) test_4x (_mm_mask_reduce_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8) test_4x (_mm_mask_roundscale_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8) +test_4x (_mm_mask_getmant_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 1, 1) +test_4y (_mm_mask_getmant_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 1, 1, 8) /* avx512fp16vlintrin.h */ test_2 (_mm_cmp_ph_mask, 
__mmask8, __m128h, __m128h, 1) diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c index 5dbe8cba5ea..008600a393d 100644 --- a/gcc/testsuite/gcc.target/i386/sse-22.c +++ b/gcc/testsuite/gcc.target/i386/sse-22.c @@ -782,8 +782,11 @@ test_1 (_mm512_reduce_ph, __m512h, __m512h, 123) test_1 (_mm_roundscale_ph, __m128h, __m128h, 123) test_1 (_mm256_roundscale_ph, __m256h, __m256h, 123) test_1 (_mm512_roundscale_ph, __m512h, __m512h, 123) +test_1 (_mm512_getexp_round_ph, __m512h, __m512h, 8) test_1x (_mm512_reduce_round_ph, __m512h, __m512h, 123, 8) test_1x (_mm512_roundscale_round_ph, __m512h, __m512h, 123, 8) +test_1x (_mm512_getmant_ph, __m512h, __m512h, 1, 1) +test_1y (_mm512_getmant_round_ph, __m512h, __m512h, 1, 1, 8) test_2 (_mm512_add_round_ph, __m512h, __m512h, __m512h, 8) test_2 (_mm512_sub_round_ph, __m512h, __m512h, __m512h, 8) test_2 (_mm512_mul_round_ph, __m512h, __m512h, __m512h, 8) @@ -809,6 +812,8 @@ test_2 (_mm_maskz_roundscale_ph, __m128h, __mmask8, __m128h, 123) test_2 (_mm256_maskz_roundscale_ph, __m256h, __mmask16, __m256h, 123) test_2 (_mm512_maskz_roundscale_ph, __m512h, __mmask32, __m512h, 123) test_2 (_mm_roundscale_sh, __m128h, __m128h, __m128h, 123) +test_2 (_mm512_maskz_getexp_round_ph, __m512h, __mmask32, __m512h, 8) +test_2 (_mm_getexp_round_sh, __m128h, __m128h, __m128h, 8) test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8) test_2x (_mm_cmp_round_sh_mask, __mmask8, __m128h, __m128h, 1, 8) test_2x (_mm_comi_round_sh, int, __m128h, __m128h, 1, 8) @@ -816,6 +821,10 @@ test_2x (_mm512_maskz_reduce_round_ph, __m512h, __mmask32, __m512h, 123, 8) test_2x (_mm512_maskz_roundscale_round_ph, __m512h, __mmask32, __m512h, 123, 8) test_2x (_mm_reduce_round_sh, __m128h, __m128h, __m128h, 123, 8) test_2x (_mm_roundscale_round_sh, __m128h, __m128h, __m128h, 123, 8) +test_2x (_mm512_maskz_getmant_ph, __m512h, __mmask32, __m512h, 1, 1) +test_2x (_mm_getmant_sh, __m128h, __m128h, __m128h, 1, 1) 
+test_2y (_mm512_maskz_getmant_round_ph, __m512h, __mmask32, __m512h, 1, 1, 8) +test_2y (_mm_getmant_round_sh, __m128h, __m128h, __m128h, 1, 1, 8) test_3 (_mm512_maskz_add_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) test_3 (_mm512_maskz_sub_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) test_3 (_mm512_maskz_mul_round_ph, __m512h, __mmask32, __m512h, __m512h, 8) @@ -840,12 +849,18 @@ test_3 (_mm_mask_roundscale_ph, __m128h, __m128h, __mmask8, __m128h, 123) test_3 (_mm256_mask_roundscale_ph, __m256h, __m256h, __mmask16, __m256h, 123) test_3 (_mm512_mask_roundscale_ph, __m512h, __m512h, __mmask32, __m512h, 123) test_3 (_mm_maskz_roundscale_sh, __m128h, __mmask8, __m128h, __m128h, 123) +test_3 (_mm_maskz_getexp_round_sh, __m128h, __mmask8, __m128h, __m128h, 8) +test_3 (_mm512_mask_getexp_round_ph, __m512h, __m512h, __mmask32, __m512h, 8) test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8) test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8) test_3x (_mm512_mask_reduce_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8) test_3x (_mm512_mask_roundscale_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8) test_3x (_mm_maskz_reduce_round_sh, __m128h, __mmask8, __m128h, __m128h, 123, 8) test_3x (_mm_maskz_roundscale_round_sh, __m128h, __mmask8, __m128h, __m128h, 123, 8) +test_3x (_mm512_mask_getmant_ph, __m512h, __m512h, __mmask32, __m512h, 1, 1) +test_3x (_mm_maskz_getmant_sh, __m128h, __mmask8, __m128h, __m128h, 1, 1) +test_3y (_mm_maskz_getmant_round_sh, __m128h, __mmask8, __m128h, __m128h, 1, 1, 8) +test_3y (_mm512_mask_getmant_round_ph, __m512h, __m512h, __mmask32, __m512h, 1, 1, 8) test_4 (_mm512_mask_add_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8) test_4 (_mm512_mask_sub_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8) test_4 (_mm512_mask_mul_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8) @@ -862,8 +877,11 @@ test_4 (_mm_mask_sqrt_round_sh, __m128h, 
__m128h, __mmask8, __m128h, __m128h, 8)
 test_4 (_mm512_mask_scalef_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
 test_4 (_mm_mask_reduce_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123)
 test_4 (_mm_mask_roundscale_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123)
+test_4 (_mm_mask_getexp_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
 test_4x (_mm_mask_reduce_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8)
 test_4x (_mm_mask_roundscale_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8)
+test_4x (_mm_mask_getmant_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 1, 1)
+test_4y (_mm_mask_getmant_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 1, 1, 8)
 
 /* avx512fp16vlintrin.h */
 test_2 (_mm_cmp_ph_mask, __mmask8, __m128h, __m128h, 1)
diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
index 2d968f07bc8..b3f07587acb 100644
--- a/gcc/testsuite/gcc.target/i386/sse-23.c
+++ b/gcc/testsuite/gcc.target/i386/sse-23.c
@@ -731,10 +731,20 @@
 #define __builtin_ia32_vrndscaleph_v8hf_mask(A, B, C, D) __builtin_ia32_vrndscaleph_v8hf_mask(A, 123, C, D)
 #define __builtin_ia32_vrndscaleph_v16hf_mask(A, B, C, D) __builtin_ia32_vrndscaleph_v16hf_mask(A, 123, C, D)
 #define __builtin_ia32_vrndscalesh_v8hf_mask_round(A, B, C, D, E, F) __builtin_ia32_vrndscalesh_v8hf_mask_round(A, B, 123, D, E, 8)
+#define __builtin_ia32_fpclassph512_mask(A, D, C) __builtin_ia32_fpclassph512_mask(A, 1, C)
+#define __builtin_ia32_fpclasssh_mask(A, D, U) __builtin_ia32_fpclasssh_mask(A, 1, U)
+#define __builtin_ia32_getexpph512_mask(A, B, C, D) __builtin_ia32_getexpph512_mask(A, B, C, 8)
+#define __builtin_ia32_getexpsh_mask_round(A, B, C, D, E) __builtin_ia32_getexpsh_mask_round(A, B, C, D, 4)
+#define __builtin_ia32_getmantph512_mask(A, F, C, D, E) __builtin_ia32_getmantph512_mask(A, 1, C, D, 8)
+#define __builtin_ia32_getmantsh_mask_round(A, B, C, W, U, D) __builtin_ia32_getmantsh_mask_round(A,
B, 1, W, U, 4)
 /* avx512fp16vlintrin.h */
 #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
 #define __builtin_ia32_vcmpph_v16hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v16hf_mask(A, B, 1, D)
+#define __builtin_ia32_fpclassph256_mask(A, D, C) __builtin_ia32_fpclassph256_mask(A, 1, C)
+#define __builtin_ia32_fpclassph128_mask(A, D, C) __builtin_ia32_fpclassph128_mask(A, 1, C)
+#define __builtin_ia32_getmantph256_mask(A, E, C, D) __builtin_ia32_getmantph256_mask(A, 1, C, D)
+#define __builtin_ia32_getmantph128_mask(A, E, C, D) __builtin_ia32_getmantph128_mask(A, 1, C, D)
 /* vpclmulqdqintrin.h */
 #define __builtin_ia32_vpclmulqdq_v4di(A, B, C) __builtin_ia32_vpclmulqdq_v4di(A, B, 1)

From patchwork Thu Jul 1 06:16:09 2021
X-Patchwork-Submitter: liuhongt
X-Patchwork-Id: 1499331
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 23/62] AVX512FP16: Add testcase for fpclass/getmant/getexp instructions.
Date: Thu, 1 Jul 2021 14:16:09 +0800
Message-Id: <20210701061648.9447-24-hongtao.liu@intel.com>
X-Mailer: git-send-email 2.18.1
In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com>
References: <20210701061648.9447-1-hongtao.liu@intel.com>
From: liuhongt
Cc: jakub@redhat.com

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-helper.h (V512): Add xmm component.
	* gcc.target/i386/avx512fp16-vfpclassph-1a.c: New test.
	* gcc.target/i386/avx512fp16-vfpclassph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vfpclasssh-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vfpclasssh-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vgetexpph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vgetexpph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vgetexpsh-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vgetexpsh-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vgetmantph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vgetmantph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vgetmantsh-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vgetmantsh-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vfpclassph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vfpclassph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vgetexpph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vgetexpph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vgetmantph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vgetmantph-1b.c: Ditto.
---
 .../gcc.target/i386/avx512fp16-helper.h | 1 +
 .../i386/avx512fp16-vfpclassph-1a.c | 16 +++
 .../i386/avx512fp16-vfpclassph-1b.c | 77 +++++++++++++
 .../i386/avx512fp16-vfpclasssh-1a.c | 16 +++
 .../i386/avx512fp16-vfpclasssh-1b.c | 76 +++++++++++++
 .../gcc.target/i386/avx512fp16-vgetexpph-1a.c | 24 +++++
 .../gcc.target/i386/avx512fp16-vgetexpph-1b.c | 99 +++++++++++++++++
 .../gcc.target/i386/avx512fp16-vgetexpsh-1a.c | 24 +++++
 .../gcc.target/i386/avx512fp16-vgetexpsh-1b.c | 61 +++++++++++
 .../i386/avx512fp16-vgetmantph-1a.c | 24 +++++
 .../i386/avx512fp16-vgetmantph-1b.c | 102 ++++++++++++++++++
 .../i386/avx512fp16-vgetmantsh-1a.c | 24 +++++
 .../i386/avx512fp16-vgetmantsh-1b.c | 62 +++++++++++
 .../i386/avx512fp16vl-vfpclassph-1a.c | 22 ++++
 .../i386/avx512fp16vl-vfpclassph-1b.c | 16 +++
 .../i386/avx512fp16vl-vgetexpph-1a.c | 26 +++++
 .../i386/avx512fp16vl-vgetexpph-1b.c | 16 +++
 .../i386/avx512fp16vl-vgetmantph-1a.c | 30 ++++++
 .../i386/avx512fp16vl-vgetmantph-1b.c | 16 +++
 19 files changed, 732 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfpclassph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfpclassph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfpclasssh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfpclasssh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vgetexpph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vgetexpph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vgetexpsh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vgetexpsh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vgetmantph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vgetmantph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vgetmantsh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vgetmantsh-1b.c
 create mode 100644
gcc/testsuite/gcc.target/i386/avx512fp16vl-vfpclassph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfpclassph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vgetexpph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vgetexpph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vgetmantph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vgetmantph-1b.c

diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h b/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h
index ec88888532c..f6f46872c35 100644
--- a/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h
@@ -29,6 +29,7 @@ typedef union
   __m256h ymmh[2];
   __m256i ymmi[2];
   __m128h xmmh[4];
+  __m128 xmm[4];
   unsigned short u16[32];
   unsigned int u32[16];
   float f32[16];
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfpclassph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfpclassph-1a.c
new file mode 100644
index 00000000000..a97dddf6110
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfpclassph-1a.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vfpclassphz\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n^k\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfpclassphz\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n^k\]*%k\[0-7\]\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m512h x512;
+volatile __mmask32 m32;
+
+void extern
+avx512dq_test (void)
+{
+  m32 = _mm512_fpclass_ph_mask (x512, 13);
+  m32 = _mm512_mask_fpclass_ph_mask (2, x512, 13);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfpclassph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfpclassph-1b.c
new file mode 100644
index 00000000000..9ffb5606b81
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfpclassph-1b.c
@@ -0,0 +1,77 @@
+/* { dg-do run } */
+/* { dg-options
"-O2 -mavx512fp16" } */ +/* { dg-require-effective-target avx512fp16 } */ + +#define AVX512FP16 +#include "avx512f-helper.h" + +#include +#include +#include +#include "avx512f-mask-type.h" +#define SIZE (AVX512F_LEN / 16) + +#ifndef __FPCLASSPH__ +#define __FPCLASSPH__ +int check_fp_class_hp (_Float16 src, int imm) +{ + int qNaN_res = isnan (src); + int sNaN_res = isnan (src); + int Pzero_res = (src == 0.0); + int Nzero_res = (src == -0.0); + int PInf_res = (isinf (src) == 1); + int NInf_res = (isinf (src) == -1); + int Denorm_res = (fpclassify (src) == FP_SUBNORMAL); + int FinNeg_res = __builtin_finite (src) && (src < 0); + + int result = (((imm & 1) && qNaN_res) + || (((imm >> 1) & 1) && Pzero_res) + || (((imm >> 2) & 1) && Nzero_res) + || (((imm >> 3) & 1) && PInf_res) + || (((imm >> 4) & 1) && NInf_res) + || (((imm >> 5) & 1) && Denorm_res) + || (((imm >> 6) & 1) && FinNeg_res) + || (((imm >> 7) & 1) && sNaN_res)); + return result; +} +#endif + +MASK_TYPE +CALC (_Float16 *s1, int imm) +{ + int i; + MASK_TYPE res = 0; + + for (i = 0; i < SIZE; i++) + if (check_fp_class_hp(s1[i], imm)) + res = res | (1 << i); + + return res; +} + +void +TEST (void) +{ + int i; + UNION_TYPE (AVX512F_LEN, h) src; + MASK_TYPE res1, res2, res_ref = 0; + MASK_TYPE mask = MASK_VALUE; + + src.a[0] = NAN; + src.a[1] = 1.0 / 0.0; + for (i = 1; i < SIZE; i++) + { + src.a[i] = -24.43 + 0.6 * i; + } + + res1 = INTRINSIC (_fpclass_ph_mask) (src.x, 0xFF); + res2 = INTRINSIC (_mask_fpclass_ph_mask) (mask, src.x, 0xFF); + + res_ref = CALC (src.a, 0xFF); + + if (res_ref != res1) + abort (); + + if ((mask & res_ref) != res2) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfpclasssh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfpclasssh-1a.c new file mode 100644 index 00000000000..7a31fd8b47d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfpclasssh-1a.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { 
scan-assembler-times "vfpclasssh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n^k\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfpclasssh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n^k\]*%k\[0-7\]\{%k\[0-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m128h x128; +volatile __mmask8 m8; + +void extern +avx512dq_test (void) +{ + m8 = _mm_fpclass_sh_mask (x128, 13); + m8 = _mm_mask_fpclass_sh_mask (m8, x128, 13); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfpclasssh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfpclasssh-1b.c new file mode 100644 index 00000000000..bdc6f9f059a --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfpclasssh-1b.c @@ -0,0 +1,76 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx512fp16" } */ +/* { dg-require-effective-target avx512fp16 } */ + +#define AVX512FP16 +#include "avx512f-helper.h" + +#include +#include +#include +#include "avx512f-mask-type.h" +#define SIZE (128 / 16) + +#ifndef __FPCLASSSH__ +#define __FPCLASSSH__ +int check_fp_class_hp (_Float16 src, int imm) +{ + int qNaN_res = isnan (src); + int sNaN_res = isnan (src); + int Pzero_res = (src == 0.0); + int Nzero_res = (src == -0.0); + int PInf_res = (isinf (src) == 1); + int NInf_res = (isinf (src) == -1); + int Denorm_res = (fpclassify (src) == FP_SUBNORMAL); + int FinNeg_res = __builtin_finite (src) && (src < 0); + + int result = (((imm & 1) && qNaN_res) + || (((imm >> 1) & 1) && Pzero_res) + || (((imm >> 2) & 1) && Nzero_res) + || (((imm >> 3) & 1) && PInf_res) + || (((imm >> 4) & 1) && NInf_res) + || (((imm >> 5) & 1) && Denorm_res) + || (((imm >> 6) & 1) && FinNeg_res) + || (((imm >> 7) & 1) && sNaN_res)); + return result; +} +#endif + +__mmask8 +CALC (_Float16 *s1, int imm) +{ + int i; + __mmask8 res = 0; + + if (check_fp_class_hp(s1[0], imm)) + res = res | 1; + + return res; +} + +void +TEST (void) +{ + int i; + union128h src; + __mmask8 res1, res2, res_ref = 0; + __mmask8 mask = MASK_VALUE; + + src.a[0] = 1.0 / 0.0; + 
for (i = 1; i < SIZE; i++) + { + src.a[i] = -24.43 + 0.6 * i; + } + + res1 = _mm_fpclass_sh_mask (src.x, 0xFF); + res2 = _mm_mask_fpclass_sh_mask (mask, src.x, 0xFF); + + + res_ref = CALC (src.a, 0xFF); + + if (res_ref != res1) + abort (); + + if ((mask & res_ref) != res2) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vgetexpph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vgetexpph-1a.c new file mode 100644 index 00000000000..993cbd944d1 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vgetexpph-1a.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vgetexpph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1} } */ +/* { dg-final { scan-assembler-times "vgetexpph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1} } */ +/* { dg-final { scan-assembler-times "vgetexpph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1} } */ +/* { dg-final { scan-assembler-times "vgetexpph\[ \\t\]+\[^\{\n\]*\{sae\}\[^\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1} } */ +/* { dg-final { scan-assembler-times "vgetexpph\[ \\t\]+\[^\{\n\]*\{sae\}\[^\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1} } */ +/* { dg-final { scan-assembler-times "vgetexpph\[ \\t\]+\[^\{\n\]*\{sae\}\[^\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1} } */ + +#include + +volatile __m512h x; +volatile __mmask32 m; + +void extern +avx512f_test (void) +{ + x = _mm512_getexp_ph (x); + x = _mm512_mask_getexp_ph (x, m, x); + x = _mm512_maskz_getexp_ph (m, x); + x = _mm512_getexp_round_ph (x, _MM_FROUND_NO_EXC); + x = _mm512_mask_getexp_round_ph (x, m, x, _MM_FROUND_NO_EXC); + x = _mm512_maskz_getexp_round_ph (m, x, _MM_FROUND_NO_EXC); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vgetexpph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vgetexpph-1b.c new file mode 100644 index 00000000000..3483c9537dd --- /dev/null +++ 
b/gcc/testsuite/gcc.target/i386/avx512fp16-vgetexpph-1b.c @@ -0,0 +1,99 @@ + /* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS (AVX512F_LEN / 16) + +void NOINLINE +EMULATE(getexp_ph) (V512 * dest, V512 op1, + __mmask32 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + float emu[32]; + __mmask16 m1, m2; + m1 = k & 0xffff; + m2 = (k >> 16) & 0xffff; + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(*dest, &v7, &v8); + v3.zmm = _mm512_getexp_round_ps(v1.zmm, _ROUND_CUR); + v4.zmm = _mm512_getexp_round_ps(v2.zmm, _ROUND_CUR); + for (i=0; i<16; i++) + { + emu[i] = v3.f32[i]; + emu[i+16] = v4.f32[i]; + } + for (i = 0; i < 16; i++) { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + v5.f32[i] = 0; + } + else { + v5.u32[i] = v7.u32[i]; + } + } + else { + v5.f32[i] = emu[i]; + + } + + if (((1 << i) & m2) == 0) { + if (zero_mask) { + v6.f32[i] = 0; + } + else { + v6.u32[i] = v8.u32[i]; + } + } + else { + v6.f32[i] = emu[i+16]; + } + + } + *dest = pack_twops_2ph(v5, v6); +} + +void +TEST (void) +{ + V512 res; + V512 exp; + + init_src(); + + EMULATE(getexp_ph) (&exp, src1, NET_MASK, 0); + HF(res) = INTRINSIC (_getexp_ph) (HF(src1)); + CHECK_RESULT (&res, &exp, N_ELEMS, _getexp_ph); + + init_dest(&res, &exp); + EMULATE(getexp_ph) (&exp, src1, MASK_VALUE, 0); + HF(res) = INTRINSIC (_mask_getexp_ph) (HF(res), MASK_VALUE, HF(src1)); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_getexp_ph); + + EMULATE(getexp_ph) (&exp, src1, ZMASK_VALUE, 1); + HF(res) = INTRINSIC (_maskz_getexp_ph) (ZMASK_VALUE, HF(src1)); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_getexp_ph); +#if AVX512F_LEN == 512 + EMULATE(getexp_ph) (&exp, src1, NET_MASK, 0); + HF(res) = INTRINSIC (_getexp_round_ph) (HF(src1), _ROUND_CUR); + CHECK_RESULT (&res, &exp, N_ELEMS, _getexp_round_ph); + + init_dest(&res, &exp); + EMULATE(getexp_ph) (&exp, src1, MASK_VALUE, 0); + HF(res) = 
INTRINSIC (_mask_getexp_round_ph) (HF(res), MASK_VALUE, HF(src1), + _ROUND_CUR); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_getexp_round_ph); + + EMULATE(getexp_ph) (&exp, src1, ZMASK_VALUE, 1); + HF(res) = INTRINSIC (_maskz_getexp_round_ph) (ZMASK_VALUE, HF(src1), _ROUND_CUR); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_getexp_round_ph); +#endif + + if (n_errs != 0) { + abort (); + } +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vgetexpsh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vgetexpsh-1a.c new file mode 100644 index 00000000000..397fd3e14a5 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vgetexpsh-1a.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vgetexpsh\[ \\t\]+\[^\{\n\]\[^\n\]*%xmm\[0-9\]+\, %xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vgetexpsh\[ \\t\]+\[^\{\n\]\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vgetexpsh\[ \\t\]+\[^\{\n\]\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vgetexpsh\[ \\t\]+\[^\{\n\]*\{sae\}\[^\n\]*%xmm\[0-9\]+\, %xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vgetexpsh\[ \\t\]+\[^\n\]*\{sae\}\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vgetexpsh\[ \\t\]+\[^\n\]*\{sae\}\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m128h x; +volatile __mmask8 m; + +void extern +avx512f_test (void) +{ + x = _mm_getexp_sh (x, x); + x = _mm_mask_getexp_sh (x, m, x, x); + x = _mm_maskz_getexp_sh (m, x, x); + x = _mm_getexp_round_sh (x, x, _MM_FROUND_NO_EXC); + x = _mm_mask_getexp_round_sh (x, m, x, x, _MM_FROUND_NO_EXC); + x = _mm_maskz_getexp_round_sh (m, x, x, _MM_FROUND_NO_EXC); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vgetexpsh-1b.c 
b/gcc/testsuite/gcc.target/i386/avx512fp16-vgetexpsh-1b.c new file mode 100644 index 00000000000..ca9834df6e4 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vgetexpsh-1b.c @@ -0,0 +1,61 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS 8 + +void NOINLINE +emulate_getexp_sh(V512 * dest, V512 op1, + __mmask32 k, int zero_mask) +{ + V512 v0, v1, v2, v5, v6, v7, v8; + int i; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(*dest, &v7, &v8); + + v0.xmm[0] = _mm_getexp_round_ss (v1.xmm[0], v1.xmm[0], _ROUND_CUR); + + if ((k&1) || !k) + v5.f32[0] = v0.f32[0]; + else if (zero_mask) + v5.f32[0] = 0; + else + v5.f32[0] = v7.f32[0]; + + for (i = 1; i < 8; i++) + v5.f32[i] = v1.f32[i]; + *dest = pack_twops_2ph(v5, v6); +} + +void +test_512 (void) +{ + V512 res; + V512 exp; + + init_src(); + + emulate_getexp_sh(&exp, src1, 0x1, 0); + res.xmmh[0] = _mm_getexp_round_sh(exp.xmmh[0], src1.xmmh[0], _ROUND_CUR); + check_results(&res, &exp, N_ELEMS, "_mm_getexp_round_sh"); + + init_dest(&res, &exp); + emulate_getexp_sh(&exp, src1, 0x1, 0); + res.xmmh[0] = _mm_mask_getexp_round_sh(res.xmmh[0], 0x1, exp.xmmh[0], + src1.xmmh[0], _ROUND_CUR); + check_results(&res, &exp, N_ELEMS, "_mm_mask_getexp_round_sh"); + + emulate_getexp_sh(&exp, src1, 0x3, 1); + res.xmmh[0] = _mm_maskz_getexp_round_sh(0x3, exp.xmmh[0], src1.xmmh[0], + _ROUND_CUR); + check_results(&res, &exp, N_ELEMS, "_mm_maskz_getexp_round_sh"); + + if (n_errs != 0) { + abort (); + } +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vgetmantph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vgetmantph-1a.c new file mode 100644 index 00000000000..69e0c72721b --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vgetmantph-1a.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx512fp16" } */ +/* { dg-final { scan-assembler-times "vgetmantph\[ 
\\t\]+\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vgetmantph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vgetmantph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vgetmantph\[ \\t\]+\[^\{\n\]*\{sae\}\[^\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vgetmantph\[ \\t\]+\[^\{\n\]*\{sae\}\[^\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vgetmantph\[ \\t\]+\[^\{\n\]*\{sae\}\[^\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m512h x, y; +volatile __mmask32 m; + +void extern +avx512f_test (void) +{ + x = _mm512_getmant_ph (y, _MM_MANT_NORM_p75_1p5, _MM_MANT_SIGN_src); + x = _mm512_mask_getmant_ph (x, m, y, _MM_MANT_NORM_p75_1p5, _MM_MANT_SIGN_src); + x = _mm512_maskz_getmant_ph (m, y, _MM_MANT_NORM_p75_1p5, _MM_MANT_SIGN_src); + x = _mm512_getmant_round_ph (y, _MM_MANT_NORM_p75_1p5, _MM_MANT_SIGN_src, _MM_FROUND_NO_EXC); + x = _mm512_mask_getmant_round_ph (x, m, y, _MM_MANT_NORM_p75_1p5, _MM_MANT_SIGN_src, _MM_FROUND_NO_EXC); + x = _mm512_maskz_getmant_round_ph (m, y, _MM_MANT_NORM_p75_1p5, _MM_MANT_SIGN_src, _MM_FROUND_NO_EXC); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vgetmantph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vgetmantph-1b.c new file mode 100644 index 00000000000..c18d1aa5dc1 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vgetmantph-1b.c @@ -0,0 +1,102 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS (AVX512F_LEN / 16) + +void NOINLINE +EMULATE(getmant_ph) (V512 * dest, V512 op1, + __mmask32 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + float emu[32]; + __mmask16 m1, m2; + m1 = 
k & 0xffff; + m2 = (k >> 16) & 0xffff; + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(*dest, &v7, &v8); + v3.zmm = _mm512_getmant_round_ps(v1.zmm, 2, 0, _ROUND_CUR); + v4.zmm = _mm512_getmant_round_ps(v2.zmm, 2, 0, _ROUND_CUR); + for (i=0; i<16; i++) + { + emu[i] = v3.f32[i]; + emu[i+16] = v4.f32[i]; + } + for (i = 0; i < 16; i++) { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + v5.f32[i] = 0; + } + else { + v5.u32[i] = v7.u32[i]; + } + } + else { + v5.f32[i] = emu[i]; + + } + + if (((1 << i) & m2) == 0) { + if (zero_mask) { + v6.f32[i] = 0; + } + else { + v6.u32[i] = v8.u32[i]; + } + } + else { + v6.f32[i] = emu[i+16]; + } + + } + *dest = pack_twops_2ph(v5, v6); +} + +void +TEST (void) +{ + V512 res; + V512 exp; + + init_src(); + + EMULATE(getmant_ph) (&exp, src1, NET_MASK, 0); + HF(res) = INTRINSIC (_getmant_ph) (HF(src1), 2, 0); + CHECK_RESULT (&res, &exp, N_ELEMS, _getmant_ph); + + init_dest(&res, &exp); + EMULATE(getmant_ph) (&exp, src1, MASK_VALUE, 0); + HF(res) = INTRINSIC (_mask_getmant_ph) (HF(res), MASK_VALUE, + HF(src1), 2, 0); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_getmant_ph); + + EMULATE(getmant_ph) (&exp, src1, ZMASK_VALUE, 1); + HF(res) = INTRINSIC (_maskz_getmant_ph) (ZMASK_VALUE, HF(src1), + 2, 0); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_getmant_ph); +#if AVX512F_LEN == 512 + EMULATE(getmant_ph) (&exp, src1, NET_MASK, 0); + HF(res) = INTRINSIC (_getmant_round_ph) (HF(src1), 2, 0, _ROUND_CUR); + CHECK_RESULT (&res, &exp, N_ELEMS, _getmant_round_ph); + + init_dest(&res, &exp); + EMULATE(getmant_ph) (&exp, src1, MASK_VALUE, 0); + HF(res) = INTRINSIC (_mask_getmant_round_ph) (HF(res), MASK_VALUE, + HF(src1), 2, 0, _ROUND_CUR); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_getmant_round_ph); + + EMULATE(getmant_ph) (&exp, src1, ZMASK_VALUE, 1); + HF(res) = INTRINSIC (_maskz_getmant_round_ph) (ZMASK_VALUE, HF(src1), + 2, 0, _ROUND_CUR); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_getmant_round_ph); +#endif + + if (n_errs != 0) { + abort 
(); + } +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vgetmantsh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vgetmantsh-1a.c new file mode 100644 index 00000000000..b533f20341b --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vgetmantsh-1a.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx512fp16" } */ +/* { dg-final { scan-assembler-times "vgetmantsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vgetmantsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vgetmantsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vgetmantsh\[ \\t\]+\[^\{\n\]*\{sae\}\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vgetmantsh\[ \\t\]+\[^\n\]*\{sae\}\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vgetmantsh\[ \\t\]+\[^\n\]*\{sae\}\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m128h x, y, z; +volatile __mmask8 m; + +void extern +avx512f_test (void) +{ + x = _mm_getmant_sh (y, z, _MM_MANT_NORM_p75_1p5, _MM_MANT_SIGN_src); + x = _mm_mask_getmant_sh (x, m, y, z, _MM_MANT_NORM_p75_1p5, _MM_MANT_SIGN_src); + x = _mm_maskz_getmant_sh (m, y, z, _MM_MANT_NORM_p75_1p5, _MM_MANT_SIGN_src); + x = _mm_getmant_round_sh (y, z, _MM_MANT_NORM_p75_1p5, _MM_MANT_SIGN_src, _MM_FROUND_NO_EXC); + x = _mm_mask_getmant_round_sh (x, m, y, z, _MM_MANT_NORM_p75_1p5, _MM_MANT_SIGN_src, _MM_FROUND_NO_EXC); + x = _mm_maskz_getmant_round_sh (m, y, z, _MM_MANT_NORM_p75_1p5, _MM_MANT_SIGN_src, _MM_FROUND_NO_EXC); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vgetmantsh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vgetmantsh-1b.c new file mode 100644 index 00000000000..bee8b04dfc5 --- /dev/null +++ 
b/gcc/testsuite/gcc.target/i386/avx512fp16-vgetmantsh-1b.c @@ -0,0 +1,62 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS 8 + +void NOINLINE +emulate_getmant_sh(V512 * dest, V512 op1, + __mmask32 k, int zero_mask) +{ + V512 v0, v1, v2, v5, v6, v7, v8; + int i; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(*dest, &v7, &v8); + + v0.xmm[0] = _mm_getmant_round_ss (v1.xmm[0], v1.xmm[0], 2, 0, _ROUND_CUR); + + if ((k&1) || !k) + v5.f32[0] = v0.f32[0]; + else if (zero_mask) + v5.f32[0] = 0; + else + v5.f32[0] = v7.f32[0]; + + for (i = 1; i < 8; i++) + v5.f32[i] = v1.f32[i]; + *dest = pack_twops_2ph(v5, v6); +} + +void +test_512 (void) +{ + V512 res; + V512 exp; + + init_src(); + + emulate_getmant_sh(&exp, src1, 0x1, 0); + res.xmmh[0] = _mm_getmant_round_sh(src1.xmmh[0], exp.xmmh[0], + 2, 0, _ROUND_CUR); + check_results(&res, &exp, 1, "_mm_getmant_round_sh"); + + init_dest(&res, &exp); + emulate_getmant_sh(&exp, src1, 0x1, 0); + res.xmmh[0] = _mm_mask_getmant_round_sh(res.xmmh[0], 0x1, src1.xmmh[0], + exp.xmmh[0], 2, 0, _ROUND_CUR); + check_results(&res, &exp, 1, "_mm_mask_getmant_round_sh"); + + emulate_getmant_sh(&exp, src1, 0x3, 1); + res.xmmh[0] = _mm_maskz_getmant_round_sh(0x3, src1.xmmh[0], exp.xmmh[0], + 2, 0, _ROUND_CUR); + check_results(&res, &exp, 1, "_mm_maskz_getmant_round_sh"); + + if (n_errs != 0) { + abort (); + } +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfpclassph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfpclassph-1a.c new file mode 100644 index 00000000000..897a3c83692 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfpclassph-1a.c @@ -0,0 +1,22 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */ +/* { dg-final { scan-assembler-times "vfpclassphy\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n^k\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { 
scan-assembler-times "vfpclassphx\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n^k\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfpclassphy\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n^k\]*%k\[0-7\]\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfpclassphx\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n^k\]*%k\[0-7\]\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m256h x256; +volatile __m128h x128; +volatile __mmask16 m16; +volatile __mmask8 m8; + +void extern +avx512dq_test (void) +{ + m16 = _mm256_fpclass_ph_mask (x256, 13); + m8 = _mm_fpclass_ph_mask (x128, 13); + m16 = _mm256_mask_fpclass_ph_mask (2, x256, 13); + m8 = _mm_mask_fpclass_ph_mask (2, x128, 13); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfpclassph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfpclassph-1b.c new file mode 100644 index 00000000000..6745f137c27 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfpclassph-1b.c @@ -0,0 +1,16 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */ + +#define DEBUG +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vfpclassph-1b.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vfpclassph-1b.c" + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vgetexpph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vgetexpph-1a.c new file mode 100644 index 00000000000..82c23b6e63d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vgetexpph-1a.c @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512vl -mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vgetexpph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1} } */ +/* { dg-final { scan-assembler-times "vgetexpph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1} } */ +/* { dg-final { 
scan-assembler-times "vgetexpph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1} } */ +/* { dg-final { scan-assembler-times "vgetexpph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1} } */ +/* { dg-final { scan-assembler-times "vgetexpph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1} } */ +/* { dg-final { scan-assembler-times "vgetexpph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1} } */ + +#include <immintrin.h> + +volatile __m256h xx; +volatile __m128h x2; +volatile __mmask8 m8; +volatile __mmask16 m16; + +void extern +avx512vl_test (void) +{ + xx = _mm256_getexp_ph (xx); + xx = _mm256_mask_getexp_ph (xx, m16, xx); + xx = _mm256_maskz_getexp_ph (m16, xx); + x2 = _mm_getexp_ph (x2); + x2 = _mm_mask_getexp_ph (x2, m8, x2); + x2 = _mm_maskz_getexp_ph (m8, x2); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vgetexpph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vgetexpph-1b.c new file mode 100644 index 00000000000..7eb4fa4f537 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vgetexpph-1b.c @@ -0,0 +1,16 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */ + +#define DEBUG +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vgetexpph-1b.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vgetexpph-1b.c" + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vgetmantph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vgetmantph-1a.c new file mode 100644 index 00000000000..4ce6ed58cf1 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vgetmantph-1a.c @@ -0,0 +1,30 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx512vl -mavx512fp16 " } */ +/* { dg-final { scan-assembler-times "vgetmantph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vgetmantph\[ 
\\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vgetmantph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vgetmantph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vgetmantph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vgetmantph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +volatile __m256h x, y; +volatile __m128h a, b; +volatile __mmask8 m8; +volatile __mmask16 m16; + +void extern +avx512vl_test (void) +{ + x = _mm256_getmant_ph (y, _MM_MANT_NORM_p75_1p5, _MM_MANT_SIGN_src); + x = _mm256_mask_getmant_ph (x, m16, y, _MM_MANT_NORM_p75_1p5, + _MM_MANT_SIGN_src); + x = _mm256_maskz_getmant_ph (m16, y, _MM_MANT_NORM_p75_1p5, + _MM_MANT_SIGN_src); + a = _mm_getmant_ph (b, _MM_MANT_NORM_p75_1p5, _MM_MANT_SIGN_src); + a = _mm_mask_getmant_ph (a, m8, b, _MM_MANT_NORM_p75_1p5, + _MM_MANT_SIGN_src); + a = _mm_maskz_getmant_ph (m8, b, _MM_MANT_NORM_p75_1p5, + _MM_MANT_SIGN_src); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vgetmantph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vgetmantph-1b.c new file mode 100644 index 00000000000..e5f87401558 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vgetmantph-1b.c @@ -0,0 +1,16 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */ + +#define DEBUG +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vgetmantph-1b.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vgetmantph-1b.c" + From patchwork Thu Jul 1 06:16:10 2021 X-Patchwork-Submitter: liuhongt
X-Patchwork-Id: 1499329
To: gcc-patches@gcc.gnu.org Subject: [PATCH 24/62] AVX512FP16: Add vmovw/vmovsh. Date: Thu, 1 Jul 2021 14:16:10 +0800 Message-Id: <20210701061648.9447-25-hongtao.liu@intel.com> In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com> References: <20210701061648.9447-1-hongtao.liu@intel.com> From: liuhongt Cc: jakub@redhat.com gcc/ChangeLog: * config/i386/avx512fp16intrin.h: (_mm_cvtsi16_si128): New intrinsic. (_mm_cvtsi128_si16): Likewise. (_mm_mask_load_sh): Likewise. (_mm_maskz_load_sh): Likewise. (_mm_mask_store_sh): Likewise. (_mm_move_sh): Likewise. (_mm_mask_move_sh): Likewise. (_mm_maskz_move_sh): Likewise. * config/i386/i386-builtin-types.def: Add corresponding builtin types. * config/i386/i386-builtin.def: Add corresponding new builtins. 
* config/i386/i386-expand.c (ix86_expand_special_args_builtin): Handle new builtin types. (ix86_expand_vector_init_one_nonzero): Adjust for FP16 target. * config/i386/sse.md (VI2F): New mode iterator. (vec_set_0): Use new mode iterator. (avx512f_mov_mask): Adjust for HF vector mode. (avx512f_store_mask): Ditto. --- gcc/config/i386/avx512fp16intrin.h | 59 ++++++++++++++++++++++++++ gcc/config/i386/i386-builtin-types.def | 3 ++ gcc/config/i386/i386-builtin.def | 5 +++ gcc/config/i386/i386-expand.c | 11 +++++ gcc/config/i386/sse.md | 33 +++++++------- 5 files changed, 95 insertions(+), 16 deletions(-) diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h index 2fbfc140c44..cdf6646c8c6 100644 --- a/gcc/config/i386/avx512fp16intrin.h +++ b/gcc/config/i386/avx512fp16intrin.h @@ -2453,6 +2453,65 @@ _mm512_maskz_getmant_round_ph (__mmask32 __U, __m512h __A, #endif /* __OPTIMIZE__ */ +/* Intrinsics vmovw. */ +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtsi16_si128 (short __A) +{ + return _mm_set_epi16 (0, 0, 0, 0, 0, 0, 0, __A); +} + +extern __inline short +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtsi128_si16 (__m128i __A) +{ + return __builtin_ia32_vec_ext_v8hi ((__v8hi)__A, 0); +} + +/* Intrinsics vmovsh. 
*/ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_load_sh (__m128h __A, __mmask8 __B, _Float16 const* __C) +{ + return __builtin_ia32_loadsh_mask (__C, __A, __B); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_load_sh (__mmask8 __A, _Float16 const* __B) +{ + return __builtin_ia32_loadsh_mask (__B, _mm_setzero_ph (), __A); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_store_sh (_Float16 const* __A, __mmask8 __B, __m128h __C) +{ + __builtin_ia32_storesh_mask (__A, __C, __B); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_move_sh (__m128h __A, __m128h __B) +{ + __A[0] = __B[0]; + return __A; +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_move_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D) +{ + return __builtin_ia32_vmovsh_mask (__C, __D, __A, __B); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_move_sh (__mmask8 __A, __m128h __B, __m128h __C) +{ + return __builtin_ia32_vmovsh_mask (__B, __C, _mm_setzero_ph (), __A); +} + #ifdef __DISABLE_AVX512FP16__ #undef __DISABLE_AVX512FP16__ #pragma GCC pop_options diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def index 79e7edf13e5..6cf3e354c78 100644 --- a/gcc/config/i386/i386-builtin-types.def +++ b/gcc/config/i386/i386-builtin-types.def @@ -134,6 +134,7 @@ DEF_POINTER_TYPE (PCVOID, VOID, CONST) DEF_POINTER_TYPE (PVOID, VOID) DEF_POINTER_TYPE (PDOUBLE, DOUBLE) DEF_POINTER_TYPE (PFLOAT, FLOAT) +DEF_POINTER_TYPE (PCFLOAT16, FLOAT16, CONST) DEF_POINTER_TYPE (PSHORT, SHORT) DEF_POINTER_TYPE (PUSHORT, USHORT) DEF_POINTER_TYPE (PINT, INT) @@ -1308,6 +1309,8 @@ DEF_FUNCTION_TYPE (QI, V8HF, INT, UQI) DEF_FUNCTION_TYPE (HI, 
V16HF, INT, UHI) DEF_FUNCTION_TYPE (SI, V32HF, INT, USI) DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF) +DEF_FUNCTION_TYPE (VOID, PCFLOAT16, V8HF, UQI) +DEF_FUNCTION_TYPE (V8HF, PCFLOAT16, V8HF, UQI) DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, UQI) DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, INT) DEF_FUNCTION_TYPE (V8HF, V8HF, INT, V8HF, UQI) diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def index ed1a4a38b1c..be617b8f18a 100644 --- a/gcc/config/i386/i386-builtin.def +++ b/gcc/config/i386/i386-builtin.def @@ -393,6 +393,10 @@ BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_us_truncatev32hiv32qi2_mas BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_ss_truncatev32hiv32qi2_mask_store, "__builtin_ia32_pmovswb512mem_mask", IX86_BUILTIN_PMOVSWB512_MEM, UNKNOWN, (int) VOID_FTYPE_PV32QI_V32HI_USI) BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_truncatev32hiv32qi2_mask_store, "__builtin_ia32_pmovwb512mem_mask", IX86_BUILTIN_PMOVWB512_MEM, UNKNOWN, (int) VOID_FTYPE_PV32QI_V32HI_USI) +/* AVX512FP16 */ +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_loadhf_mask, "__builtin_ia32_loadsh_mask", IX86_BUILTIN_LOADSH_MASK, UNKNOWN, (int) V8HF_FTYPE_PCFLOAT16_V8HF_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_storehf_mask, "__builtin_ia32_storesh_mask", IX86_BUILTIN_STORESH_MASK, UNKNOWN, (int) VOID_FTYPE_PCFLOAT16_V8HF_UQI) + /* RDPKRU and WRPKRU. 
*/ BDESC (OPTION_MASK_ISA_PKU, 0, CODE_FOR_rdpkru, "__builtin_ia32_rdpkru", IX86_BUILTIN_RDPKRU, UNKNOWN, (int) UNSIGNED_FTYPE_VOID) BDESC (OPTION_MASK_ISA_PKU, 0, CODE_FOR_wrpkru, "__builtin_ia32_wrpkru", IX86_BUILTIN_WRPKRU, UNKNOWN, (int) VOID_FTYPE_UNSIGNED) @@ -2826,6 +2830,7 @@ BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_ BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_getexpv8hf_mask, "__builtin_ia32_getexpph128_mask", IX86_BUILTIN_GETEXPPH128, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_UQI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_getmantv16hf_mask, "__builtin_ia32_getmantph256_mask", IX86_BUILTIN_GETMANTPH256, UNKNOWN, (int) V16HF_FTYPE_V16HF_INT_V16HF_UHI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_getmantv8hf_mask, "__builtin_ia32_getmantph128_mask", IX86_BUILTIN_GETMANTPH128, UNKNOWN, (int) V8HF_FTYPE_V8HF_INT_V8HF_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_movhf_mask, "__builtin_ia32_vmovsh_mask", IX86_BUILTIN_VMOVSH_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) /* Builtins with rounding support. 
*/ BDESC_END (ARGS, ROUND_ARGS) diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c index 266aa411ddb..bfc7fc75b97 100644 --- a/gcc/config/i386/i386-expand.c +++ b/gcc/config/i386/i386-expand.c @@ -10907,6 +10907,7 @@ ix86_expand_special_args_builtin (const struct builtin_description *d, case VOID_FTYPE_PFLOAT_V16SF_UHI: case VOID_FTYPE_PFLOAT_V8SF_UQI: case VOID_FTYPE_PFLOAT_V4SF_UQI: + case VOID_FTYPE_PCFLOAT16_V8HF_UQI: case VOID_FTYPE_PV32QI_V32HI_USI: case VOID_FTYPE_PV16QI_V16HI_UHI: case VOID_FTYPE_PUDI_V8HI_UQI: @@ -10979,6 +10980,7 @@ ix86_expand_special_args_builtin (const struct builtin_description *d, case V16SF_FTYPE_PCFLOAT_V16SF_UHI: case V8SF_FTYPE_PCFLOAT_V8SF_UQI: case V4SF_FTYPE_PCFLOAT_V4SF_UQI: + case V8HF_FTYPE_PCFLOAT16_V8HF_UQI: nargs = 3; klass = load; memory = 0; @@ -13993,6 +13995,8 @@ ix86_expand_vector_init_one_nonzero (bool mmx_ok, machine_mode mode, break; case E_V8HImode: use_vector_set = TARGET_SSE2; + gen_vec_set_0 = TARGET_AVX512FP16 && one_var == 0 + ? gen_vec_setv8hi_0 : NULL; break; case E_V8QImode: use_vector_set = TARGET_MMX_WITH_SSE && TARGET_SSE4_1; @@ -14004,8 +14008,12 @@ ix86_expand_vector_init_one_nonzero (bool mmx_ok, machine_mode mode, use_vector_set = TARGET_SSE4_1; break; case E_V32QImode: + use_vector_set = TARGET_AVX; + break; case E_V16HImode: use_vector_set = TARGET_AVX; + gen_vec_set_0 = TARGET_AVX512FP16 && one_var == 0 + ? 
gen_vec_setv16hi_0 : NULL; break; case E_V8SImode: use_vector_set = TARGET_AVX; @@ -14053,6 +14061,9 @@ ix86_expand_vector_init_one_nonzero (bool mmx_ok, machine_mode mode, use_vector_set = TARGET_AVX512FP16 && one_var == 0; gen_vec_set_0 = gen_vec_setv32hf_0; break; + case E_V32HImode: + use_vector_set = TARGET_AVX512FP16 && one_var == 0; + gen_vec_set_0 = gen_vec_setv32hi_0; default: break; } diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index c4db778e25d..97f7c698d5d 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -758,6 +758,7 @@ (define_mode_iterator VIHF_AVX512BW (V32HF "TARGET_AVX512FP16")]) ;; Int-float size matches +(define_mode_iterator VI2F [V8HI V16HI V32HI V8HF V16HF V32HF]) (define_mode_iterator VI4F_128 [V4SI V4SF]) (define_mode_iterator VI8F_128 [V2DI V2DF]) (define_mode_iterator VI4F_256 [V8SI V8SF]) @@ -1317,13 +1318,13 @@ (define_insn_and_split "*_load" [(set (match_dup 0) (match_dup 1))]) (define_insn "avx512f_mov_mask" - [(set (match_operand:VF_128 0 "register_operand" "=v") - (vec_merge:VF_128 - (vec_merge:VF_128 - (match_operand:VF_128 2 "register_operand" "v") - (match_operand:VF_128 3 "nonimm_or_0_operand" "0C") + [(set (match_operand:VFH_128 0 "register_operand" "=v") + (vec_merge:VFH_128 + (vec_merge:VFH_128 + (match_operand:VFH_128 2 "register_operand" "v") + (match_operand:VFH_128 3 "nonimm_or_0_operand" "0C") (match_operand:QI 4 "register_operand" "Yk")) - (match_operand:VF_128 1 "register_operand" "v") + (match_operand:VFH_128 1 "register_operand" "v") (const_int 1)))] "TARGET_AVX512F" "vmov\t{%2, %1, %0%{%4%}%N3|%0%{%4%}%N3, %1, %2}" @@ -1336,7 +1337,7 @@ (define_expand "avx512f_load_mask" (vec_merge: (vec_merge: (vec_duplicate: - (match_operand:MODEF 1 "memory_operand")) + (match_operand:MODEFH 1 "memory_operand")) (match_operand: 2 "nonimm_or_0_operand") (match_operand:QI 3 "register_operand")) (match_dup 4) @@ -1349,7 +1350,7 @@ (define_insn "*avx512f_load_mask" (vec_merge: (vec_merge: 
(vec_duplicate: - (match_operand:MODEF 1 "memory_operand" "m")) + (match_operand:MODEFH 1 "memory_operand" "m")) (match_operand: 2 "nonimm_or_0_operand" "0C") (match_operand:QI 3 "register_operand" "Yk")) (match_operand: 4 "const0_operand" "C") @@ -1362,11 +1363,11 @@ (define_insn "*avx512f_load_mask" (set_attr "mode" "")]) (define_insn "avx512f_store_mask" - [(set (match_operand:MODEF 0 "memory_operand" "=m") - (if_then_else:MODEF + [(set (match_operand:MODEFH 0 "memory_operand" "=m") + (if_then_else:MODEFH (and:QI (match_operand:QI 2 "register_operand" "Yk") (const_int 1)) - (vec_select:MODEF + (vec_select:MODEFH (match_operand: 1 "register_operand" "v") (parallel [(const_int 0)])) (match_dup 0)))] @@ -8513,11 +8514,11 @@ (define_insn "vec_set_0" ;; vmovw also clears the higher bits (define_insn "vec_set_0" - [(set (match_operand:VF_AVX512FP16 0 "register_operand" "=v") - (vec_merge:VF_AVX512FP16 - (vec_duplicate:VF_AVX512FP16 - (match_operand:HF 2 "nonimmediate_operand" "rm")) - (match_operand:VF_AVX512FP16 1 "const0_operand" "C") + [(set (match_operand:VI2F 0 "register_operand" "=v") + (vec_merge:VI2F + (vec_duplicate:VI2F + (match_operand: 2 "nonimmediate_operand" "rm")) + (match_operand:VI2F 1 "const0_operand" "C") (const_int 1)))] "TARGET_AVX512FP16" "vmovw\t{%2, %x0|%x0, %2}" From patchwork Thu Jul 1 06:16:11 2021 X-Patchwork-Submitter: liuhongt X-Patchwork-Id: 1499332
To: gcc-patches@gcc.gnu.org Subject: [PATCH 25/62] AVX512FP16: Add testcase for vmovsh/vmovw. Date: Thu, 1 Jul 2021 14:16:11 +0800 Message-Id: <20210701061648.9447-26-hongtao.liu@intel.com> In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com> References: <20210701061648.9447-1-hongtao.liu@intel.com> From: liuhongt Cc: jakub@redhat.com gcc/testsuite/ChangeLog: * gcc.target/i386/avx512fp16-vmovsh-1a.c: New test. * gcc.target/i386/avx512fp16-vmovsh-1b.c: Ditto. * gcc.target/i386/avx512fp16-vmovw-1a.c: Ditto. * gcc.target/i386/avx512fp16-vmovw-1b.c: Ditto. * gcc.target/i386/avx512fp16-vmovw-2a.c: Ditto. * gcc.target/i386/avx512fp16-vmovw-2b.c: Ditto. * gcc.target/i386/avx512fp16-vmovw-3a.c: Ditto. * gcc.target/i386/avx512fp16-vmovw-3b.c: Ditto. * gcc.target/i386/avx512fp16-vmovw-4a.c: Ditto. * gcc.target/i386/avx512fp16-vmovw-4b.c: Ditto. 
--- .../gcc.target/i386/avx512fp16-vmovsh-1a.c | 26 ++++ .../gcc.target/i386/avx512fp16-vmovsh-1b.c | 115 ++++++++++++++++++ .../gcc.target/i386/avx512fp16-vmovw-1a.c | 15 +++ .../gcc.target/i386/avx512fp16-vmovw-1b.c | 27 ++++ .../gcc.target/i386/avx512fp16-vmovw-2a.c | 21 ++++ .../gcc.target/i386/avx512fp16-vmovw-2b.c | 53 ++++++++ .../gcc.target/i386/avx512fp16-vmovw-3a.c | 23 ++++ .../gcc.target/i386/avx512fp16-vmovw-3b.c | 52 ++++++++ .../gcc.target/i386/avx512fp16-vmovw-4a.c | 27 ++++ .../gcc.target/i386/avx512fp16-vmovw-4b.c | 52 ++++++++ 10 files changed, 411 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmovsh-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmovsh-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-2a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-2b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-3a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-3b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-4a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-4b.c diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vmovsh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vmovsh-1a.c new file mode 100644 index 00000000000..e35be10fcd0 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vmovsh-1a.c @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vmovsh\[ \\t\]+%xmm\[0-9\]+\[^\n\r\]*%\[er\]ax+\[^\n\r]*\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vmovsh\[ \\t\]+\[^\n\r\]*%\[er\]ax+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vmovsh\[ 
\\t\]+\[^\n\r\]*%\[er\]ax+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vmovsh\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vmovsh\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^z\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vmovsh\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +extern _Float16 const* p; +volatile __m128h x1, x2, res; +volatile __mmask8 m8; + +void +avx512f_test (void) +{ + x2 = _mm_mask_load_sh (x1, m8, p); + x2 = _mm_maskz_load_sh (m8, p); + _mm_mask_store_sh (p, m8, x1); + + res = _mm_move_sh (x1, x2); + res = _mm_mask_move_sh (res, m8, x1, x2); + res = _mm_maskz_move_sh (m8, x1, x2); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vmovsh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vmovsh-1b.c new file mode 100644 index 00000000000..cea224a62e6 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vmovsh-1b.c @@ -0,0 +1,115 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +void NOINLINE +emulate_mov2_load_sh(V512 * dest, V512 op1, + __mmask8 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(*dest, &v7, &v8); + + if ((k&1) || !k) + v5.f32[0] = v1.f32[0]; + else if (zero_mask) + v5.f32[0] = 0; + else + v5.f32[0] = v7.f32[0]; //remains unchanged + + for (i = 1; i < 8; i++) + v5.f32[i] = 0; + + *dest = pack_twops_2ph(v5, v6); +} + +void NOINLINE +emulate_mov3_load_sh(V512 * dest, V512 op1, V512 op2, + __mmask8 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(op2, &v3, &v4); + 
unpack_ph_2twops(*dest, &v7, &v8); + + if ((k&1) || !k) + v5.f32[0] = v3.f32[0]; + else if (zero_mask) + v5.f32[0] = 0; + else + v5.f32[0] = v7.f32[0]; //remains unchanged + + for (i = 1; i < 8; i++) + v5.f32[i] = v1.f32[i]; + + *dest = pack_twops_2ph(v5, v6); +} + +void NOINLINE +emulate_mov2_store_sh(V512 * dest, V512 op1, __mmask8 k) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(*dest, &v7, &v8); + + if ((k&1) || !k) + v5.f32[0] = v1.f32[0]; + else + v5.f32[0] = v7.f32[0]; //remains unchanged + + *dest = pack_twops_2ph(v5, v6); +} + +void +test_512 (void) +{ + V512 res; + V512 exp; + + init_src(); + + // no mask + emulate_mov2_load_sh (&exp, src1, 0x0, 0); + res.xmmh[0] = _mm_load_sh((const void *)&(src1.u16[0])); + check_results(&res, &exp, 8, "_mm_load_sh"); + + // with mask and mask bit is set + emulate_mov2_load_sh (&exp, src1, 0x1, 0); + res.xmmh[0] = _mm_mask_load_sh(res.xmmh[0], 0x1, (const void *)&(src1.u16[0])); + check_results(&res, &exp, 8, "_mm_mask_load_sh"); + + // with zero-mask + emulate_mov2_load_sh (&exp, src1, 0x0, 1); + res.xmmh[0] = _mm_maskz_load_sh(0x1, (const void *)&(src1.u16[0])); + check_results(&res, &exp, 8, "_mm_maskz_load_sh"); + + emulate_mov3_load_sh (&exp, src1, src2, 0x1, 0); + res.xmmh[0] = _mm_mask_move_sh(res.xmmh[0], 0x1, src1.xmmh[0], src2.xmmh[0]); + check_results(&res, &exp, 8, "_mm_mask_move_sh"); + + emulate_mov3_load_sh (&exp, src1, src2, 0x1, 1); + res.xmmh[0] = _mm_maskz_move_sh(0x1, src1.xmmh[0], src2.xmmh[0]); + check_results(&res, &exp, 8, "_mm_maskz_move_sh"); + + // no mask + emulate_mov2_store_sh (&exp, src1, 0x0); + _mm_store_sh((void *)&(res.u16[0]), src1.xmmh[0]); + check_results(&exp, &res, 1, "_mm_store_sh"); + + // with mask + emulate_mov2_store_sh (&exp, src1, 0x1); + _mm_mask_store_sh((void *)&(res.u16[0]), 0x1, src1.xmmh[0]); + check_results(&exp, &res, 1, "_mm_mask_store_sh"); + + if (n_errs != 0) { + abort (); + } +} diff --git 
a/gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-1a.c new file mode 100644 index 00000000000..177802c6dcb --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-1a.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vmovw\[^-]" 1 } } */ +/* { dg-final { scan-assembler-times "vpextrw" 1 } } */ +#include <immintrin.h> + +volatile __m128i x1; +volatile short x2; + +void extern +avx512f_test (void) +{ + x1 = _mm_cvtsi16_si128 (x2); + x2 = _mm_cvtsi128_si16 (x1); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-1b.c new file mode 100644 index 00000000000..a96007d6fd8 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-1b.c @@ -0,0 +1,27 @@ +/* { dg-do run {target avx512fp16} } */ +/* { dg-options "-O2 -mavx512fp16" } */ + +static void do_test (void); + +#define DO_TEST do_test +#define AVX512FP16 +#include "avx512-check.h" + +static void +do_test (void) +{ + union128i_w u; + short b = 128; + short e[8] = {0,0,0,0,0,0,0,0}; + + u.x = _mm_cvtsi16_si128 (b); + + e[0] = b; + + if (check_union128i_w (u, e)) + abort (); + u.a[0] = 123; + b = _mm_cvtsi128_si16 (u.x); + if (u.a[0] != b) + abort(); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-2a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-2a.c new file mode 100644 index 00000000000..efa24e5523c --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-2a.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx512fp16" } */ + +typedef short __v8hi __attribute__ ((__vector_size__ (16))); +typedef long long __m128i __attribute__ ((__vector_size__ (16), __may_alias__)); + +__m128i +__attribute__ ((noinline, noclone)) +foo1 (short x) +{ + return __extension__ (__m128i)(__v8hi) { x, 0, 0, 0, 0, 0, 0, 0 }; +} + +__m128i +__attribute__ ((noinline, noclone)) +foo2 (short *x) +{ + return 
__extension__ (__m128i)(__v8hi) { *x, 0, 0, 0, 0, 0, 0, 0 }; +} + +/* { dg-final { scan-assembler-times "vmovw\[^-\n\r]*xmm0" 2 } } */ diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-2b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-2b.c new file mode 100644 index 00000000000..b680a16945f --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-2b.c @@ -0,0 +1,53 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16" } */ + +#include <string.h> + +static void do_test (void); + +#define DO_TEST do_test +#define AVX512FP16 +#include "avx512-check.h" +#include "avx512fp16-vmovw-2a.c" + +__m128i +__attribute__ ((noinline,noclone)) +foo3 (__m128i x) +{ + return foo1 (((__v8hi) x)[0]); +} + +static void +do_test (void) +{ + short x; + union128i_w u = { -1, -1,}; + union128i_w exp = { 0, 0}; + __m128i v; + union128i_w a; + + x = 25; + exp.a[0] = x; + memset (&v, -1, sizeof (v)); + v = foo1 (x); + a.x = v; + if (check_union128i_w (a, exp.a)) + abort (); + + x = 33; + exp.a[0] = x; + memset (&v, -1, sizeof (v)); + v = foo2 (&x); + a.x = v; + if (check_union128i_w (a, exp.a)) + abort (); + + x = -33; + u.a[0] = x; + exp.a[0] = x; + memset (&v, -1, sizeof (v)); + v = foo3 (u.x); + a.x = v; + if (check_union128i_w (a, exp.a)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-3a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-3a.c new file mode 100644 index 00000000000..c60310710a4 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-3a.c @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx512fp16" } */ + +typedef short __v16hi __attribute__ ((__vector_size__ (32))); +typedef long long __m256i __attribute__ ((__vector_size__ (32), __may_alias__)); + +__m256i +__attribute__ ((noinline, noclone)) +foo1 (short x) +{ + return __extension__ (__m256i)(__v16hi) { x, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0 }; +} + +__m256i +__attribute__ ((noinline, noclone)) +foo2 (short *x) +{ + 
return __extension__ (__m256i)(__v16hi) { *x, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0 }; +} + +/* { dg-final { scan-assembler-times "vmovw\[^-\n\r]*xmm0" 2 } } */ diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-3b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-3b.c new file mode 100644 index 00000000000..13c1f6518f2 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-3b.c @@ -0,0 +1,52 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16" } */ + +#include <string.h> + +static void do_test (void); + +#define DO_TEST do_test +#define AVX512FP16 +#include "avx512-check.h" +#include "avx512fp16-vmovw-3a.c" + +__m256i +__attribute__ ((noinline,noclone)) +foo3 (__m256i x) +{ + return foo1 (((__v16hi) x)[0]); +} + +static void +do_test (void) +{ + short x = 25; + union256i_w u = { -1, -1, -1, -1 }; + union256i_w exp = { 0, 0, 0, 0 }; + + __m256i v; + union256i_w a; + exp.a[0] = x; + memset (&v, -1, sizeof (v)); + v = foo1 (x); + a.x = v; + if (check_union256i_w (a, exp.a)) + abort (); + + x = 33; + exp.a[0] = x; + memset (&v, -1, sizeof (v)); + v = foo2 (&x); + a.x = v; + if (check_union256i_w (a, exp.a)) + abort (); + + x = -23; + u.a[0] = x; + exp.a[0] = x; + memset (&v, -1, sizeof (v)); + v = foo3 (u.x); + a.x = v; + if (check_union256i_w (a, exp.a)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-4a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-4a.c new file mode 100644 index 00000000000..2ba198dd7fc --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-4a.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx512fp16" } */ + +typedef short __v32hi __attribute__ ((__vector_size__ (64))); +typedef long long __m512i __attribute__ ((__vector_size__ (64), __may_alias__)); + +__m512i +__attribute__ ((noinline, noclone)) +foo1 (short x) +{ + return __extension__ (__m512i)(__v32hi) { x, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 
0, 0, 0, 0, 0, 0 }; +} + +__m512i +__attribute__ ((noinline, noclone)) +foo2 (short *x) +{ + return __extension__ (__m512i)(__v32hi) { *x, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0 }; +} + +/* { dg-final { scan-assembler-times "vmovw\[^-\n\r]*xmm0" 2 } } */ diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-4b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-4b.c new file mode 100644 index 00000000000..ec6477b793f --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-4b.c @@ -0,0 +1,52 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16" } */ + +#include <string.h> + +static void do_test (void); + +#define DO_TEST do_test +#define AVX512FP16 +#include "avx512-check.h" +#include "avx512fp16-vmovw-4a.c" + +__m512i +__attribute__ ((noinline,noclone)) +foo3 (__m512i x) +{ + return foo1 (((__v32hi) x)[0]); +} + +static void +do_test (void) +{ + short x = 25; + union512i_w u = { -1, -1, -1, -1, -1, -1, -1, -1 }; + union512i_w exp = { 0, 0, 0, 0, 0, 0, 0, 0 }; + + __m512i v; + union512i_w a; + exp.a[0] = x; + memset (&v, -1, sizeof (v)); + v = foo1 (x); + a.x = v; + if (check_union512i_w (a, exp.a)) + abort (); + + x = 55; + exp.a[0] = x; + memset (&v, -1, sizeof (v)); + v = foo2 (&x); + a.x = v; + if (check_union512i_w (a, exp.a)) + abort (); + + x = 33; + u.a[0] = x; + exp.a[0] = x; + memset (&v, -1, sizeof (v)); + v = foo3 (u.x); + a.x = v; + if (check_union512i_w (a, exp.a)) + abort (); +} From patchwork Thu Jul 1 06:16:12 2021 X-Patchwork-Submitter: liuhongt X-Patchwork-Id: 1499333
To: gcc-patches@gcc.gnu.org Subject: [PATCH 26/62] AVX512FP16: Add vcvtph2dq/vcvtph2qq/vcvtph2w/vcvtph2uw/vcvtph2uqq/vcvtph2udq Date: Thu, 1 Jul 2021 14:16:12 +0800 Message-Id: <20210701061648.9447-27-hongtao.liu@intel.com> In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com> References: <20210701061648.9447-1-hongtao.liu@intel.com> From: liuhongt Cc: jakub@redhat.com gcc/ChangeLog: * config/i386/avx512fp16intrin.h (_mm512_cvtph_epi32): New intrinsic. (_mm512_mask_cvtph_epi32): Likewise. (_mm512_maskz_cvtph_epi32): Likewise. (_mm512_cvt_roundph_epi32): Likewise. (_mm512_mask_cvt_roundph_epi32): Likewise. (_mm512_maskz_cvt_roundph_epi32): Likewise. (_mm512_cvtph_epu32): Likewise. (_mm512_mask_cvtph_epu32): Likewise. (_mm512_maskz_cvtph_epu32): Likewise. (_mm512_cvt_roundph_epu32): Likewise. (_mm512_mask_cvt_roundph_epu32): Likewise. (_mm512_maskz_cvt_roundph_epu32): Likewise. (_mm512_cvtph_epi64): Likewise. (_mm512_mask_cvtph_epi64): Likewise. 
(_mm512_maskz_cvtph_epi64): Likewise. (_mm512_cvt_roundph_epi64): Likewise. (_mm512_mask_cvt_roundph_epi64): Likewise. (_mm512_maskz_cvt_roundph_epi64): Likewise. (_mm512_cvtph_epu64): Likewise. (_mm512_mask_cvtph_epu64): Likewise. (_mm512_maskz_cvtph_epu64): Likewise. (_mm512_cvt_roundph_epu64): Likewise. (_mm512_mask_cvt_roundph_epu64): Likewise. (_mm512_maskz_cvt_roundph_epu64): Likewise. (_mm512_cvtph_epi16): Likewise. (_mm512_mask_cvtph_epi16): Likewise. (_mm512_maskz_cvtph_epi16): Likewise. (_mm512_cvt_roundph_epi16): Likewise. (_mm512_mask_cvt_roundph_epi16): Likewise. (_mm512_maskz_cvt_roundph_epi16): Likewise. (_mm512_cvtph_epu16): Likewise. (_mm512_mask_cvtph_epu16): Likewise. (_mm512_maskz_cvtph_epu16): Likewise. (_mm512_cvt_roundph_epu16): Likewise. (_mm512_mask_cvt_roundph_epu16): Likewise. (_mm512_maskz_cvt_roundph_epu16): Likewise. * config/i386/avx512fp16vlintrin.h (_mm_cvtph_epi32): New intrinsic. (_mm_mask_cvtph_epi32): Likewise. (_mm_maskz_cvtph_epi32): Likewise. (_mm256_cvtph_epi32): Likewise. (_mm256_mask_cvtph_epi32): Likewise. (_mm256_maskz_cvtph_epi32): Likewise. (_mm_cvtph_epu32): Likewise. (_mm_mask_cvtph_epu32): Likewise. (_mm_maskz_cvtph_epu32): Likewise. (_mm256_cvtph_epu32): Likewise. (_mm256_mask_cvtph_epu32): Likewise. (_mm256_maskz_cvtph_epu32): Likewise. (_mm_cvtph_epi64): Likewise. (_mm_mask_cvtph_epi64): Likewise. (_mm_maskz_cvtph_epi64): Likewise. (_mm256_cvtph_epi64): Likewise. (_mm256_mask_cvtph_epi64): Likewise. (_mm256_maskz_cvtph_epi64): Likewise. (_mm_cvtph_epu64): Likewise. (_mm_mask_cvtph_epu64): Likewise. (_mm_maskz_cvtph_epu64): Likewise. (_mm256_cvtph_epu64): Likewise. (_mm256_mask_cvtph_epu64): Likewise. (_mm256_maskz_cvtph_epu64): Likewise. (_mm_cvtph_epi16): Likewise. (_mm_mask_cvtph_epi16): Likewise. (_mm_maskz_cvtph_epi16): Likewise. (_mm256_cvtph_epi16): Likewise. (_mm256_mask_cvtph_epi16): Likewise. (_mm256_maskz_cvtph_epi16): Likewise. (_mm_cvtph_epu16): Likewise. (_mm_mask_cvtph_epu16): Likewise. 
(_mm_maskz_cvtph_epu16): Likewise. (_mm256_cvtph_epu16): Likewise. (_mm256_mask_cvtph_epu16): Likewise. (_mm256_maskz_cvtph_epu16): Likewise. * config/i386/i386-builtin-types.def: Add new builtin types. * config/i386/i386-builtin.def: Add new builtins. * config/i386/i386-expand.c (ix86_expand_args_builtin): Handle new builtin types. (ix86_expand_round_builtin): Ditto. * config/i386/sse.md (sseintconvert): New. (ssePHmode): Ditto. (UNSPEC_US_FIX_NOTRUNC): Ditto. (sseintconvertsignprefix): Ditto. (avx512fp16_vcvtph2_): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add test for new builtins. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/sse-14.c: Add test for new intrinsics. * gcc.target/i386/sse-22.c: Ditto. --- gcc/config/i386/avx512fp16intrin.h | 525 +++++++++++++++++++++++++ gcc/config/i386/avx512fp16vlintrin.h | 345 ++++++++++++++++ gcc/config/i386/i386-builtin-types.def | 9 + gcc/config/i386/i386-builtin.def | 18 + gcc/config/i386/i386-expand.c | 9 + gcc/config/i386/sse.md | 35 ++ gcc/testsuite/gcc.target/i386/avx-1.c | 6 + gcc/testsuite/gcc.target/i386/sse-13.c | 6 + gcc/testsuite/gcc.target/i386/sse-14.c | 18 + gcc/testsuite/gcc.target/i386/sse-22.c | 18 + gcc/testsuite/gcc.target/i386/sse-23.c | 6 + 11 files changed, 995 insertions(+) diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h index cdf6646c8c6..42576c4ae2e 100644 --- a/gcc/config/i386/avx512fp16intrin.h +++ b/gcc/config/i386/avx512fp16intrin.h @@ -2512,6 +2512,531 @@ _mm_maskz_move_sh (__mmask8 __A, __m128h __B, __m128h __C) return __builtin_ia32_vmovsh_mask (__B, __C, _mm_setzero_ph (), __A); } +/* Intrinsics vcvtph2dq. 
*/ +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtph_epi32 (__m256h __A) +{ + return (__m512i) + __builtin_ia32_vcvtph2dq_v16si_mask_round (__A, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtph_epi32 (__m512i __A, __mmask16 __B, __m256h __C) +{ + return (__m512i) + __builtin_ia32_vcvtph2dq_v16si_mask_round (__C, + (__v16si) __A, + __B, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtph_epi32 (__mmask16 __A, __m256h __B) +{ + return (__m512i) + __builtin_ia32_vcvtph2dq_v16si_mask_round (__B, + (__v16si) + _mm512_setzero_si512 (), + __A, + _MM_FROUND_CUR_DIRECTION); +} + +#ifdef __OPTIMIZE__ +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvt_roundph_epi32 (__m256h __A, int __B) +{ + return (__m512i) + __builtin_ia32_vcvtph2dq_v16si_mask_round (__A, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) -1, + __B); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvt_roundph_epi32 (__m512i __A, __mmask16 __B, __m256h __C, int __D) +{ + return (__m512i) + __builtin_ia32_vcvtph2dq_v16si_mask_round (__C, + (__v16si) __A, + __B, + __D); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvt_roundph_epi32 (__mmask16 __A, __m256h __B, int __C) +{ + return (__m512i) + __builtin_ia32_vcvtph2dq_v16si_mask_round (__B, + (__v16si) + _mm512_setzero_si512 (), + __A, + __C); +} + +#else +#define _mm512_cvt_roundph_epi32(A, B) \ + ((__m512i) \ + __builtin_ia32_vcvtph2dq_v16si_mask_round ((A), \ + (__v16si) \ + _mm512_setzero_si512 (), \ + (__mmask16)-1, \ + (B))) + +#define 
_mm512_mask_cvt_roundph_epi32(A, B, C, D) \ + ((__m512i) \ + __builtin_ia32_vcvtph2dq_v16si_mask_round ((C), (__v16si)(A), (B), (D))) + +#define _mm512_maskz_cvt_roundph_epi32(A, B, C) \ + ((__m512i) \ + __builtin_ia32_vcvtph2dq_v16si_mask_round ((B), \ + (__v16si) \ + _mm512_setzero_si512 (), \ + (A), \ + (C))) + +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vcvtph2udq. */ +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtph_epu32 (__m256h __A) +{ + return (__m512i) + __builtin_ia32_vcvtph2udq_v16si_mask_round (__A, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtph_epu32 (__m512i __A, __mmask16 __B, __m256h __C) +{ + return (__m512i) + __builtin_ia32_vcvtph2udq_v16si_mask_round (__C, + (__v16si) __A, + __B, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtph_epu32 (__mmask16 __A, __m256h __B) +{ + return (__m512i) + __builtin_ia32_vcvtph2udq_v16si_mask_round (__B, + (__v16si) + _mm512_setzero_si512 (), + __A, + _MM_FROUND_CUR_DIRECTION); +} + +#ifdef __OPTIMIZE__ +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvt_roundph_epu32 (__m256h __A, int __B) +{ + return (__m512i) + __builtin_ia32_vcvtph2udq_v16si_mask_round (__A, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) -1, + __B); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvt_roundph_epu32 (__m512i __A, __mmask16 __B, __m256h __C, int __D) +{ + return (__m512i) + __builtin_ia32_vcvtph2udq_v16si_mask_round (__C, + (__v16si) __A, + __B, + __D); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvt_roundph_epu32 (__mmask16 __A, 
__m256h __B, int __C) +{ + return (__m512i) + __builtin_ia32_vcvtph2udq_v16si_mask_round (__B, + (__v16si) + _mm512_setzero_si512 (), + __A, + __C); +} + +#else +#define _mm512_cvt_roundph_epu32(A, B) \ + ((__m512i) \ + __builtin_ia32_vcvtph2udq_v16si_mask_round ((A), \ + (__v16si) \ + _mm512_setzero_si512 (), \ + (__mmask16)-1, \ + (B))) + +#define _mm512_mask_cvt_roundph_epu32(A, B, C, D) \ + ((__m512i) \ + __builtin_ia32_vcvtph2udq_v16si_mask_round ((C), (__v16si)(A), (B), (D))) + +#define _mm512_maskz_cvt_roundph_epu32(A, B, C) \ + ((__m512i) \ + __builtin_ia32_vcvtph2udq_v16si_mask_round ((B), \ + (__v16si) \ + _mm512_setzero_si512 (), \ + (A), \ + (C))) + +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vcvtph2qq. */ +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtph_epi64 (__m128h __A) +{ + return __builtin_ia32_vcvtph2qq_v8di_mask_round (__A, + _mm512_setzero_si512 (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtph_epi64 (__m512i __A, __mmask8 __B, __m128h __C) +{ + return __builtin_ia32_vcvtph2qq_v8di_mask_round (__C, __A, __B, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtph_epi64 (__mmask8 __A, __m128h __B) +{ + return __builtin_ia32_vcvtph2qq_v8di_mask_round (__B, + _mm512_setzero_si512 (), + __A, + _MM_FROUND_CUR_DIRECTION); +} + +#ifdef __OPTIMIZE__ +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvt_roundph_epi64 (__m128h __A, int __B) +{ + return __builtin_ia32_vcvtph2qq_v8di_mask_round (__A, + _mm512_setzero_si512 (), + (__mmask8) -1, + __B); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvt_roundph_epi64 (__m512i __A, __mmask8 __B, __m128h __C, int __D) +{ + return 
__builtin_ia32_vcvtph2qq_v8di_mask_round (__C, __A, __B, __D); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvt_roundph_epi64 (__mmask8 __A, __m128h __B, int __C) +{ + return __builtin_ia32_vcvtph2qq_v8di_mask_round (__B, + _mm512_setzero_si512 (), + __A, + __C); +} + +#else +#define _mm512_cvt_roundph_epi64(A, B) \ + (__builtin_ia32_vcvtph2qq_v8di_mask_round ((A), \ + _mm512_setzero_si512 (), \ + (__mmask8)-1, \ + (B))) + +#define _mm512_mask_cvt_roundph_epi64(A, B, C, D) \ + (__builtin_ia32_vcvtph2qq_v8di_mask_round ((C), (A), (B), (D))) + +#define _mm512_maskz_cvt_roundph_epi64(A, B, C) \ + (__builtin_ia32_vcvtph2qq_v8di_mask_round ((B), \ + _mm512_setzero_si512 (), \ + (A), \ + (C))) + +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vcvtph2uqq. */ +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtph_epu64 (__m128h __A) +{ + return __builtin_ia32_vcvtph2uqq_v8di_mask_round (__A, + _mm512_setzero_si512 (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtph_epu64 (__m512i __A, __mmask8 __B, __m128h __C) +{ + return __builtin_ia32_vcvtph2uqq_v8di_mask_round (__C, __A, __B, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtph_epu64 (__mmask8 __A, __m128h __B) +{ + return __builtin_ia32_vcvtph2uqq_v8di_mask_round (__B, + _mm512_setzero_si512 (), + __A, + _MM_FROUND_CUR_DIRECTION); +} + +#ifdef __OPTIMIZE__ + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvt_roundph_epu64 (__m128h __A, int __B) +{ + return __builtin_ia32_vcvtph2uqq_v8di_mask_round (__A, + _mm512_setzero_si512 (), + (__mmask8) -1, + __B); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, 
__artificial__)) +_mm512_mask_cvt_roundph_epu64 (__m512i __A, __mmask8 __B, __m128h __C, int __D) +{ + return __builtin_ia32_vcvtph2uqq_v8di_mask_round (__C, __A, __B, __D); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvt_roundph_epu64 (__mmask8 __A, __m128h __B, int __C) +{ + return __builtin_ia32_vcvtph2uqq_v8di_mask_round (__B, + _mm512_setzero_si512 (), + __A, + __C); +} + +#else +#define _mm512_cvt_roundph_epu64(A, B) \ + (__builtin_ia32_vcvtph2uqq_v8di_mask_round ((A), \ + _mm512_setzero_si512 (), \ + (__mmask8)-1, \ + (B))) + +#define _mm512_mask_cvt_roundph_epu64(A, B, C, D) \ + (__builtin_ia32_vcvtph2uqq_v8di_mask_round ((C), (A), (B), (D))) + +#define _mm512_maskz_cvt_roundph_epu64(A, B, C) \ + (__builtin_ia32_vcvtph2uqq_v8di_mask_round ((B), \ + _mm512_setzero_si512 (), \ + (A), \ + (C))) + +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vcvtph2w. */ +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtph_epi16 (__m512h __A) +{ + return (__m512i) + __builtin_ia32_vcvtph2w_v32hi_mask_round (__A, + (__v32hi) + _mm512_setzero_si512 (), + (__mmask32) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtph_epi16 (__m512i __A, __mmask32 __B, __m512h __C) +{ + return (__m512i) + __builtin_ia32_vcvtph2w_v32hi_mask_round (__C, + (__v32hi) __A, + __B, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtph_epi16 (__mmask32 __A, __m512h __B) +{ + return (__m512i) + __builtin_ia32_vcvtph2w_v32hi_mask_round (__B, + (__v32hi) + _mm512_setzero_si512 (), + __A, + _MM_FROUND_CUR_DIRECTION); +} + +#ifdef __OPTIMIZE__ +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvt_roundph_epi16 (__m512h __A, int __B) +{ + return 
(__m512i) + __builtin_ia32_vcvtph2w_v32hi_mask_round (__A, + (__v32hi) + _mm512_setzero_si512 (), + (__mmask32) -1, + __B); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvt_roundph_epi16 (__m512i __A, __mmask32 __B, __m512h __C, int __D) +{ + return (__m512i) + __builtin_ia32_vcvtph2w_v32hi_mask_round (__C, + (__v32hi) __A, + __B, + __D); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvt_roundph_epi16 (__mmask32 __A, __m512h __B, int __C) +{ + return (__m512i) + __builtin_ia32_vcvtph2w_v32hi_mask_round (__B, + (__v32hi) + _mm512_setzero_si512 (), + __A, + __C); +} + +#else +#define _mm512_cvt_roundph_epi16(A, B) \ + ((__m512i)__builtin_ia32_vcvtph2w_v32hi_mask_round ((A), \ + (__v32hi) \ + _mm512_setzero_si512 (), \ + (__mmask32)-1, \ + (B))) + +#define _mm512_mask_cvt_roundph_epi16(A, B, C, D) \ + ((__m512i)__builtin_ia32_vcvtph2w_v32hi_mask_round ((C), \ + (__v32hi)(A), \ + (B), \ + (D))) + +#define _mm512_maskz_cvt_roundph_epi16(A, B, C) \ + ((__m512i)__builtin_ia32_vcvtph2w_v32hi_mask_round ((B), \ + (__v32hi) \ + _mm512_setzero_si512 (), \ + (A), \ + (C))) + +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vcvtph2uw. 
*/ +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtph_epu16 (__m512h __A) +{ + return (__m512i) + __builtin_ia32_vcvtph2uw_v32hi_mask_round (__A, + (__v32hi) + _mm512_setzero_si512 (), + (__mmask32) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtph_epu16 (__m512i __A, __mmask32 __B, __m512h __C) +{ + return (__m512i) + __builtin_ia32_vcvtph2uw_v32hi_mask_round (__C, (__v32hi) __A, __B, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtph_epu16 (__mmask32 __A, __m512h __B) +{ + return (__m512i) + __builtin_ia32_vcvtph2uw_v32hi_mask_round (__B, + (__v32hi) + _mm512_setzero_si512 (), + __A, + _MM_FROUND_CUR_DIRECTION); +} + +#ifdef __OPTIMIZE__ +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvt_roundph_epu16 (__m512h __A, int __B) +{ + return (__m512i) + __builtin_ia32_vcvtph2uw_v32hi_mask_round (__A, + (__v32hi) + _mm512_setzero_si512 (), + (__mmask32) -1, + __B); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvt_roundph_epu16 (__m512i __A, __mmask32 __B, __m512h __C, int __D) +{ + return (__m512i) + __builtin_ia32_vcvtph2uw_v32hi_mask_round (__C, (__v32hi) __A, __B, __D); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvt_roundph_epu16 (__mmask32 __A, __m512h __B, int __C) +{ + return (__m512i) + __builtin_ia32_vcvtph2uw_v32hi_mask_round (__B, + (__v32hi) + _mm512_setzero_si512 (), + __A, + __C); +} + +#else +#define _mm512_cvt_roundph_epu16(A, B) \ + ((__m512i) \ + __builtin_ia32_vcvtph2uw_v32hi_mask_round ((A), \ + (__v32hi) \ + _mm512_setzero_si512 (), \ + (__mmask32)-1, (B))) + +#define _mm512_mask_cvt_roundph_epu16(A, B, C, D) \ 
+ ((__m512i) \ + __builtin_ia32_vcvtph2uw_v32hi_mask_round ((C), (__v32hi)(A), (B), (D))) + +#define _mm512_maskz_cvt_roundph_epu16(A, B, C) \ + ((__m512i) \ + __builtin_ia32_vcvtph2uw_v32hi_mask_round ((B), \ + (__v32hi) \ + _mm512_setzero_si512 (), \ + (A), \ + (C))) + +#endif /* __OPTIMIZE__ */ + #ifdef __DISABLE_AVX512FP16__ #undef __DISABLE_AVX512FP16__ #pragma GCC pop_options diff --git a/gcc/config/i386/avx512fp16vlintrin.h b/gcc/config/i386/avx512fp16vlintrin.h index 206d60407fc..8a7e0aaa6b1 100644 --- a/gcc/config/i386/avx512fp16vlintrin.h +++ b/gcc/config/i386/avx512fp16vlintrin.h @@ -930,6 +930,351 @@ _mm_maskz_getmant_ph (__mmask8 __U, __m128h __A, #endif /* __OPTIMIZE__ */ +/* Intrinsics vcvtph2dq. */ +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtph_epi32 (__m128h __A) +{ + return (__m128i) + __builtin_ia32_vcvtph2dq_v4si_mask (__A, + (__v4si) + _mm_setzero_si128 (), + (__mmask8) -1); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvtph_epi32 (__m128i __A, __mmask8 __B, __m128h __C) +{ + return (__m128i) + __builtin_ia32_vcvtph2dq_v4si_mask (__C, ( __v4si) __A, __B); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvtph_epi32 (__mmask8 __A, __m128h __B) +{ + return (__m128i) + __builtin_ia32_vcvtph2dq_v4si_mask (__B, + (__v4si) _mm_setzero_si128 (), + __A); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtph_epi32 (__m128h __A) +{ + return (__m256i) + __builtin_ia32_vcvtph2dq_v8si_mask (__A, + (__v8si) + _mm256_setzero_si256 (), + (__mmask8) -1); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvtph_epi32 (__m256i __A, __mmask8 __B, __m128h __C) +{ + return (__m256i) + __builtin_ia32_vcvtph2dq_v8si_mask (__C, ( __v8si) __A, __B); +} + +extern __inline 
__m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvtph_epi32 (__mmask8 __A, __m128h __B) +{ + return (__m256i) + __builtin_ia32_vcvtph2dq_v8si_mask (__B, + (__v8si) + _mm256_setzero_si256 (), + __A); +} + +/* Intrinsics vcvtph2udq. */ +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtph_epu32 (__m128h __A) +{ + return (__m128i) + __builtin_ia32_vcvtph2udq_v4si_mask (__A, + (__v4si) + _mm_setzero_si128 (), + (__mmask8) -1); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvtph_epu32 (__m128i __A, __mmask8 __B, __m128h __C) +{ + return (__m128i) + __builtin_ia32_vcvtph2udq_v4si_mask (__C, ( __v4si) __A, __B); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvtph_epu32 (__mmask8 __A, __m128h __B) +{ + return (__m128i) + __builtin_ia32_vcvtph2udq_v4si_mask (__B, + (__v4si) + _mm_setzero_si128 (), + __A); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtph_epu32 (__m128h __A) +{ + return (__m256i) + __builtin_ia32_vcvtph2udq_v8si_mask (__A, + (__v8si) + _mm256_setzero_si256 (), + (__mmask8) -1); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvtph_epu32 (__m256i __A, __mmask8 __B, __m128h __C) +{ + return (__m256i) + __builtin_ia32_vcvtph2udq_v8si_mask (__C, ( __v8si) __A, __B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvtph_epu32 (__mmask8 __A, __m128h __B) +{ + return (__m256i) + __builtin_ia32_vcvtph2udq_v8si_mask (__B, + (__v8si) _mm256_setzero_si256 (), + __A); +} + +/* Intrinsics vcvtph2qq. 
*/ +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtph_epi64 (__m128h __A) +{ + return + __builtin_ia32_vcvtph2qq_v2di_mask (__A, + _mm_setzero_si128 (), + (__mmask8) -1); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvtph_epi64 (__m128i __A, __mmask8 __B, __m128h __C) +{ + return __builtin_ia32_vcvtph2qq_v2di_mask (__C, __A, __B); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvtph_epi64 (__mmask8 __A, __m128h __B) +{ + return __builtin_ia32_vcvtph2qq_v2di_mask (__B, + _mm_setzero_si128 (), + __A); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtph_epi64 (__m128h __A) +{ + return __builtin_ia32_vcvtph2qq_v4di_mask (__A, + _mm256_setzero_si256 (), + (__mmask8) -1); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvtph_epi64 (__m256i __A, __mmask8 __B, __m128h __C) +{ + return __builtin_ia32_vcvtph2qq_v4di_mask (__C, __A, __B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvtph_epi64 (__mmask8 __A, __m128h __B) +{ + return __builtin_ia32_vcvtph2qq_v4di_mask (__B, + _mm256_setzero_si256 (), + __A); +} + +/* Intrinsics vcvtph2uqq. 
*/ +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtph_epu64 (__m128h __A) +{ + return __builtin_ia32_vcvtph2uqq_v2di_mask (__A, + _mm_setzero_si128 (), + (__mmask8) -1); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvtph_epu64 (__m128i __A, __mmask8 __B, __m128h __C) +{ + return __builtin_ia32_vcvtph2uqq_v2di_mask (__C, __A, __B); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvtph_epu64 (__mmask8 __A, __m128h __B) +{ + return __builtin_ia32_vcvtph2uqq_v2di_mask (__B, + _mm_setzero_si128 (), + __A); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtph_epu64 (__m128h __A) +{ + return __builtin_ia32_vcvtph2uqq_v4di_mask (__A, + _mm256_setzero_si256 (), + (__mmask8) -1); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvtph_epu64 (__m256i __A, __mmask8 __B, __m128h __C) +{ + return __builtin_ia32_vcvtph2uqq_v4di_mask (__C, __A, __B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvtph_epu64 (__mmask8 __A, __m128h __B) +{ + return __builtin_ia32_vcvtph2uqq_v4di_mask (__B, + _mm256_setzero_si256 (), + __A); +} + +/* Intrinsics vcvtph2w. 
*/ +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtph_epi16 (__m128h __A) +{ + return (__m128i) + __builtin_ia32_vcvtph2w_v8hi_mask (__A, + (__v8hi) + _mm_setzero_si128 (), + (__mmask8) -1); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvtph_epi16 (__m128i __A, __mmask8 __B, __m128h __C) +{ + return (__m128i) + __builtin_ia32_vcvtph2w_v8hi_mask (__C, ( __v8hi) __A, __B); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvtph_epi16 (__mmask8 __A, __m128h __B) +{ + return (__m128i) + __builtin_ia32_vcvtph2w_v8hi_mask (__B, + (__v8hi) + _mm_setzero_si128 (), + __A); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtph_epi16 (__m256h __A) +{ + return (__m256i) + __builtin_ia32_vcvtph2w_v16hi_mask (__A, + (__v16hi) + _mm256_setzero_si256 (), + (__mmask16) -1); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvtph_epi16 (__m256i __A, __mmask16 __B, __m256h __C) +{ + return (__m256i) + __builtin_ia32_vcvtph2w_v16hi_mask (__C, ( __v16hi) __A, __B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvtph_epi16 (__mmask16 __A, __m256h __B) +{ + return (__m256i) + __builtin_ia32_vcvtph2w_v16hi_mask (__B, + (__v16hi) + _mm256_setzero_si256 (), + __A); +} + +/* Intrinsics vcvtph2uw. 
*/ +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtph_epu16 (__m128h __A) +{ + return (__m128i) + __builtin_ia32_vcvtph2uw_v8hi_mask (__A, + (__v8hi) + _mm_setzero_si128 (), + (__mmask8) -1); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvtph_epu16 (__m128i __A, __mmask8 __B, __m128h __C) +{ + return (__m128i) + __builtin_ia32_vcvtph2uw_v8hi_mask (__C, ( __v8hi) __A, __B); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvtph_epu16 (__mmask8 __A, __m128h __B) +{ + return (__m128i) + __builtin_ia32_vcvtph2uw_v8hi_mask (__B, + (__v8hi) + _mm_setzero_si128 (), + __A); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtph_epu16 (__m256h __A) +{ + return (__m256i) + __builtin_ia32_vcvtph2uw_v16hi_mask (__A, + (__v16hi) + _mm256_setzero_si256 (), + (__mmask16) -1); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvtph_epu16 (__m256i __A, __mmask16 __B, __m256h __C) +{ + return (__m256i) + __builtin_ia32_vcvtph2uw_v16hi_mask (__C, ( __v16hi) __A, __B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvtph_epu16 (__mmask16 __A, __m256h __B) +{ + return (__m256i) + __builtin_ia32_vcvtph2uw_v16hi_mask (__B, + (__v16hi) + _mm256_setzero_si256 (), + __A); +} + #ifdef __DISABLE_AVX512FP16VL__ #undef __DISABLE_AVX512FP16VL__ #pragma GCC pop_options diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def index 6cf3e354c78..c430dc9ab48 100644 --- a/gcc/config/i386/i386-builtin-types.def +++ b/gcc/config/i386/i386-builtin-types.def @@ -1311,21 +1311,30 @@ DEF_FUNCTION_TYPE (SI, V32HF, INT, USI) DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF) DEF_FUNCTION_TYPE (VOID, PCFLOAT16, V8HF, UQI) 
DEF_FUNCTION_TYPE (V8HF, PCFLOAT16, V8HF, UQI) +DEF_FUNCTION_TYPE (V2DI, V8HF, V2DI, UQI) +DEF_FUNCTION_TYPE (V4DI, V8HF, V4DI, UQI) +DEF_FUNCTION_TYPE (V4SI, V8HF, V4SI, UQI) +DEF_FUNCTION_TYPE (V8SI, V8HF, V8SI, UQI) +DEF_FUNCTION_TYPE (V8HI, V8HF, V8HI, UQI) DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, UQI) DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, INT) DEF_FUNCTION_TYPE (V8HF, V8HF, INT, V8HF, UQI) DEF_FUNCTION_TYPE (UQI, V8HF, V8HF, INT, UQI) DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, V8HF, UQI) DEF_FUNCTION_TYPE (UQI, V8HF, V8HF, INT, UQI, INT) +DEF_FUNCTION_TYPE (V8DI, V8HF, V8DI, UQI, INT) DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, V8HF, UQI, INT) DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, INT, V8HF, UQI, INT) DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF) +DEF_FUNCTION_TYPE (V16HI, V16HF, V16HI, UHI) DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF, UHI) +DEF_FUNCTION_TYPE (V16SI, V16HF, V16SI, UHI, INT) DEF_FUNCTION_TYPE (V16HF, V16HF, INT, V16HF, UHI) DEF_FUNCTION_TYPE (UHI, V16HF, V16HF, INT, UHI) DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF, V16HF, UHI) DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, USI) DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, INT) +DEF_FUNCTION_TYPE (V32HI, V32HF, V32HI, USI, INT) DEF_FUNCTION_TYPE (USI, V32HF, V32HF, INT, USI) DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, USI, INT) DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, V32HF, USI) diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def index be617b8f18a..dde8af53ff0 100644 --- a/gcc/config/i386/i386-builtin.def +++ b/gcc/config/i386/i386-builtin.def @@ -2831,6 +2831,18 @@ BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp1 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_getmantv16hf_mask, "__builtin_ia32_getmantph256_mask", IX86_BUILTIN_GETMANTPH256, UNKNOWN, (int) V16HF_FTYPE_V16HF_INT_V16HF_UHI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_getmantv8hf_mask, "__builtin_ia32_getmantph128_mask", IX86_BUILTIN_GETMANTPH128, 
UNKNOWN, (int) V8HF_FTYPE_V8HF_INT_V8HF_UQI) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_movhf_mask, "__builtin_ia32_vmovsh_mask", IX86_BUILTIN_VMOVSH_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2dq_v4si_mask, "__builtin_ia32_vcvtph2dq_v4si_mask", IX86_BUILTIN_VCVTPH2DQ_V4SI_MASK, UNKNOWN, (int) V4SI_FTYPE_V8HF_V4SI_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2dq_v8si_mask, "__builtin_ia32_vcvtph2dq_v8si_mask", IX86_BUILTIN_VCVTPH2DQ_V8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8HF_V8SI_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2udq_v4si_mask, "__builtin_ia32_vcvtph2udq_v4si_mask", IX86_BUILTIN_VCVTPH2UDQ_V4SI_MASK, UNKNOWN, (int) V4SI_FTYPE_V8HF_V4SI_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2udq_v8si_mask, "__builtin_ia32_vcvtph2udq_v8si_mask", IX86_BUILTIN_VCVTPH2UDQ_V8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8HF_V8SI_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2qq_v2di_mask, "__builtin_ia32_vcvtph2qq_v2di_mask", IX86_BUILTIN_VCVTPH2QQ_V2DI_MASK, UNKNOWN, (int) V2DI_FTYPE_V8HF_V2DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2qq_v4di_mask, "__builtin_ia32_vcvtph2qq_v4di_mask", IX86_BUILTIN_VCVTPH2QQ_V4DI_MASK, UNKNOWN, (int) V4DI_FTYPE_V8HF_V4DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2uqq_v2di_mask, "__builtin_ia32_vcvtph2uqq_v2di_mask", IX86_BUILTIN_VCVTPH2UQQ_V2DI_MASK, UNKNOWN, (int) V2DI_FTYPE_V8HF_V2DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2uqq_v4di_mask, "__builtin_ia32_vcvtph2uqq_v4di_mask", IX86_BUILTIN_VCVTPH2UQQ_V4DI_MASK, UNKNOWN, (int) V4DI_FTYPE_V8HF_V4DI_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, 
OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2w_v8hi_mask, "__builtin_ia32_vcvtph2w_v8hi_mask", IX86_BUILTIN_VCVTPH2W_V8HI_MASK, UNKNOWN, (int) V8HI_FTYPE_V8HF_V8HI_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2w_v16hi_mask, "__builtin_ia32_vcvtph2w_v16hi_mask", IX86_BUILTIN_VCVTPH2W_V16HI_MASK, UNKNOWN, (int) V16HI_FTYPE_V16HF_V16HI_UHI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2uw_v8hi_mask, "__builtin_ia32_vcvtph2uw_v8hi_mask", IX86_BUILTIN_VCVTPH2UW_V8HI_MASK, UNKNOWN, (int) V8HI_FTYPE_V8HF_V8HI_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2uw_v16hi_mask, "__builtin_ia32_vcvtph2uw_v16hi_mask", IX86_BUILTIN_VCVTPH2UW_V16HI_MASK, UNKNOWN, (int) V16HI_FTYPE_V16HF_V16HI_UHI) /* Builtins with rounding support. */ BDESC_END (ARGS, ROUND_ARGS) @@ -3058,6 +3070,12 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_getexpv32hf_mask_round, BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_sgetexpv8hf_mask_round, "__builtin_ia32_getexpsh_mask_round", IX86_BUILTIN_GETEXPSH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_getmantv32hf_mask_round, "__builtin_ia32_getmantph512_mask", IX86_BUILTIN_GETMANTPH512, UNKNOWN, (int) V32HF_FTYPE_V32HF_INT_V32HF_USI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vgetmantv8hf_mask_round, "__builtin_ia32_getmantsh_mask_round", IX86_BUILTIN_GETMANTSH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_INT_V8HF_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2dq_v16si_mask_round, "__builtin_ia32_vcvtph2dq_v16si_mask_round", IX86_BUILTIN_VCVTPH2DQ_V16SI_MASK_ROUND, UNKNOWN, (int) V16SI_FTYPE_V16HF_V16SI_UHI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2udq_v16si_mask_round, "__builtin_ia32_vcvtph2udq_v16si_mask_round", 
IX86_BUILTIN_VCVTPH2UDQ_V16SI_MASK_ROUND, UNKNOWN, (int) V16SI_FTYPE_V16HF_V16SI_UHI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2qq_v8di_mask_round, "__builtin_ia32_vcvtph2qq_v8di_mask_round", IX86_BUILTIN_VCVTPH2QQ_V8DI_MASK_ROUND, UNKNOWN, (int) V8DI_FTYPE_V8HF_V8DI_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2uqq_v8di_mask_round, "__builtin_ia32_vcvtph2uqq_v8di_mask_round", IX86_BUILTIN_VCVTPH2UQQ_V8DI_MASK_ROUND, UNKNOWN, (int) V8DI_FTYPE_V8HF_V8DI_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2w_v32hi_mask_round, "__builtin_ia32_vcvtph2w_v32hi_mask_round", IX86_BUILTIN_VCVTPH2W_V32HI_MASK_ROUND, UNKNOWN, (int) V32HI_FTYPE_V32HF_V32HI_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2uw_v32hi_mask_round, "__builtin_ia32_vcvtph2uw_v32hi_mask_round", IX86_BUILTIN_VCVTPH2UW_V32HI_MASK_ROUND, UNKNOWN, (int) V32HI_FTYPE_V32HF_V32HI_USI_INT) BDESC_END (ROUND_ARGS, MULTI_ARG) diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c index bfc7fc75b97..59d1f4f5eea 100644 --- a/gcc/config/i386/i386-expand.c +++ b/gcc/config/i386/i386-expand.c @@ -9565,9 +9565,13 @@ ix86_expand_args_builtin (const struct builtin_description *d, case V16HF_FTYPE_V16HF_V16HF_UHI: case V8SF_FTYPE_V8HI_V8SF_UQI: case V4SF_FTYPE_V8HI_V4SF_UQI: + case V8SI_FTYPE_V8HF_V8SI_UQI: case V8SI_FTYPE_V8SF_V8SI_UQI: case V4SI_FTYPE_V4SF_V4SI_UQI: + case V4SI_FTYPE_V8HF_V4SI_UQI: + case V4DI_FTYPE_V8HF_V4DI_UQI: case V4DI_FTYPE_V4SF_V4DI_UQI: + case V2DI_FTYPE_V8HF_V2DI_UQI: case V2DI_FTYPE_V4SF_V2DI_UQI: case V8HF_FTYPE_V8HF_V8HF_UQI: case V4SF_FTYPE_V4DI_V4SF_UQI: @@ -9578,6 +9582,7 @@ ix86_expand_args_builtin (const struct builtin_description *d, case V16QI_FTYPE_V16HI_V16QI_UHI: case V16QI_FTYPE_V4SI_V16QI_UQI: case V16QI_FTYPE_V8SI_V16QI_UQI: + case V8HI_FTYPE_V8HF_V8HI_UQI: case V8HI_FTYPE_V4SI_V8HI_UQI: case V8HI_FTYPE_V8SI_V8HI_UQI: case V16QI_FTYPE_V2DI_V16QI_UQI: 
@@ -9635,6 +9640,7 @@ ix86_expand_args_builtin (const struct builtin_description *d, case V8DI_FTYPE_DI_V8DI_UQI: case V16SF_FTYPE_V8SF_V16SF_UHI: case V16SI_FTYPE_V8SI_V16SI_UHI: + case V16HI_FTYPE_V16HF_V16HI_UHI: case V16HI_FTYPE_V16HI_V16HI_UHI: case V8HI_FTYPE_V16QI_V8HI_UQI: case V16HI_FTYPE_V16QI_V16HI_UHI: @@ -10501,7 +10507,9 @@ ix86_expand_round_builtin (const struct builtin_description *d, break; case V8SF_FTYPE_V8DF_V8SF_QI_INT: case V8DF_FTYPE_V8DF_V8DF_QI_INT: + case V32HI_FTYPE_V32HF_V32HI_USI_INT: case V8SI_FTYPE_V8DF_V8SI_QI_INT: + case V8DI_FTYPE_V8HF_V8DI_UQI_INT: case V8DI_FTYPE_V8DF_V8DI_QI_INT: case V8SF_FTYPE_V8DI_V8SF_QI_INT: case V8DF_FTYPE_V8DI_V8DF_QI_INT: @@ -10510,6 +10518,7 @@ ix86_expand_round_builtin (const struct builtin_description *d, case V8DI_FTYPE_V8SF_V8DI_QI_INT: case V16SF_FTYPE_V16SI_V16SF_HI_INT: case V16SI_FTYPE_V16SF_V16SI_HI_INT: + case V16SI_FTYPE_V16HF_V16SI_UHI_INT: case V8DF_FTYPE_V8SF_V8DF_QI_INT: case V16SF_FTYPE_V16HI_V16SF_HI_INT: case V2DF_FTYPE_V2DF_V2DF_V2DF_INT: diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 97f7c698d5d..7b705422396 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -722,6 +722,11 @@ (define_mode_attr ssebytemode [(V8DI "V64QI") (V4DI "V32QI") (V2DI "V16QI") (V16SI "V64QI") (V8SI "V32QI") (V4SI "V16QI")]) +(define_mode_attr sseintconvert + [(V32HI "w") (V16HI "w") (V8HI "w") + (V16SI "dq") (V8SI "dq") (V4SI "dq") + (V8DI "qq") (V4DI "qq") (V2DI "qq")]) + ;; All 128bit vector integer modes (define_mode_iterator VI_128 [V16QI V8HI V4SI V2DI]) @@ -943,6 +948,12 @@ (define_mode_attr ssehalfvecmodelower (V4SF "v2sf") (V32HF "v16hf") (V16HF "v8hf") (V8HF "v4hf")]) +;; Mapping of vector modes to vector hf modes of conversion. 
+(define_mode_attr ssePHmode + [(V32HI "V32HF") (V16HI "V16HF") (V8HI "V8HF") + (V16SI "V16HF") (V8SI "V8HF") (V4SI "V8HF") + (V8DI "V8HF") (V4DI "V8HF") (V2DI "V8HF")]) + ;; Mapping of vector modes to packed single mode of the same size (define_mode_attr ssePSmode [(V16SI "V16SF") (V8DF "V16SF") @@ -5408,6 +5419,30 @@ (define_insn "*fma4i_vmfnmsub_<mode>" [(set_attr "type" "ssemuladd") (set_attr "mode" "<MODE>")]) +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; +;; Parallel half-precision floating point conversion operations +;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(define_int_iterator UNSPEC_US_FIX_NOTRUNC + [UNSPEC_UNSIGNED_FIX_NOTRUNC UNSPEC_FIX_NOTRUNC]) + +(define_int_attr sseintconvertsignprefix + [(UNSPEC_UNSIGNED_FIX_NOTRUNC "u") + (UNSPEC_FIX_NOTRUNC "")]) + +(define_insn "avx512fp16_vcvtph2<sseintconvertsignprefix><sseintconvert>_<mode><mask_name><round_name>" + [(set (match_operand:VI248_AVX512VL 0 "register_operand" "=v") + (unspec:VI248_AVX512VL + [(match_operand:<ssePHmode> 1 "<round_nimm_predicate>" "<round_constraint>")] + UNSPEC_US_FIX_NOTRUNC))] + "TARGET_AVX512FP16" + "vcvtph2<sseintconvertsignprefix><sseintconvert>\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}" + [(set_attr "type" "ssecvt") + (set_attr "prefix" "evex") + (set_attr "mode" "<sseinsnmode>")]) + ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; ;; Parallel single-precision floating point conversion operations diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c index b3cffa0644f..cdfc2e3b69f 100644 --- a/gcc/testsuite/gcc.target/i386/avx-1.c +++ b/gcc/testsuite/gcc.target/i386/avx-1.c @@ -719,6 +719,12 @@ #define __builtin_ia32_getexpsh_mask_round(A, B, C, D, E) __builtin_ia32_getexpsh_mask_round(A, B, C, D, 4) #define __builtin_ia32_getmantph512_mask(A, F, C, D, E) __builtin_ia32_getmantph512_mask(A, 1, C, D, 8) #define __builtin_ia32_getmantsh_mask_round(A, B, C, W, U, D) __builtin_ia32_getmantsh_mask_round(A, B, 1, W, U, 4) +#define __builtin_ia32_vcvtph2dq_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvtph2dq_v16si_mask_round(A, B, C, 8) +#define
__builtin_ia32_vcvtph2udq_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvtph2udq_v16si_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvtph2qq_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtph2qq_v8di_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvtph2uqq_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtph2uqq_v8di_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvtph2w_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtph2w_v32hi_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvtph2uw_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtph2uw_v32hi_mask_round(A, B, C, 8) /* avx512fp16vlintrin.h */ #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D) diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c index 67ef567e437..5e4aaf8ce9b 100644 --- a/gcc/testsuite/gcc.target/i386/sse-13.c +++ b/gcc/testsuite/gcc.target/i386/sse-13.c @@ -736,6 +736,12 @@ #define __builtin_ia32_getexpsh_mask_round(A, B, C, D, E) __builtin_ia32_getexpsh_mask_round(A, B, C, D, 4) #define __builtin_ia32_getmantph512_mask(A, F, C, D, E) __builtin_ia32_getmantph512_mask(A, 1, C, D, 8) #define __builtin_ia32_getmantsh_mask_round(A, B, C, W, U, D) __builtin_ia32_getmantsh_mask_round(A, B, 1, W, U, 4) +#define __builtin_ia32_vcvtph2dq_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvtph2dq_v16si_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvtph2udq_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvtph2udq_v16si_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvtph2qq_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtph2qq_v8di_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvtph2uqq_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtph2uqq_v8di_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvtph2w_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtph2w_v32hi_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvtph2uw_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtph2uw_v32hi_mask_round(A, B, C, 8) /* avx512fp16vlintrin.h */ 
#define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D) diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c index 04163874f90..32aa4518703 100644 --- a/gcc/testsuite/gcc.target/i386/sse-14.c +++ b/gcc/testsuite/gcc.target/i386/sse-14.c @@ -678,6 +678,12 @@ test_1 (_mm_roundscale_ph, __m128h, __m128h, 123) test_1 (_mm256_roundscale_ph, __m256h, __m256h, 123) test_1 (_mm512_roundscale_ph, __m512h, __m512h, 123) test_1 (_mm512_getexp_round_ph, __m512h, __m512h, 8) +test_1 (_mm512_cvt_roundph_epi16, __m512i, __m512h, 8) +test_1 (_mm512_cvt_roundph_epu16, __m512i, __m512h, 8) +test_1 (_mm512_cvt_roundph_epi32, __m512i, __m256h, 8) +test_1 (_mm512_cvt_roundph_epu32, __m512i, __m256h, 8) +test_1 (_mm512_cvt_roundph_epi64, __m512i, __m128h, 8) +test_1 (_mm512_cvt_roundph_epu64, __m512i, __m128h, 8) test_1x (_mm512_reduce_round_ph, __m512h, __m512h, 123, 8) test_1x (_mm512_roundscale_round_ph, __m512h, __m512h, 123, 8) test_1x (_mm512_getmant_ph, __m512h, __m512h, 1, 1) @@ -710,6 +716,12 @@ test_2 (_mm512_maskz_roundscale_ph, __m512h, __mmask32, __m512h, 123) test_2 (_mm_roundscale_sh, __m128h, __m128h, __m128h, 123) test_2 (_mm512_maskz_getexp_round_ph, __m512h, __mmask32, __m512h, 8) test_2 (_mm_getexp_round_sh, __m128h, __m128h, __m128h, 8) +test_2 (_mm512_maskz_cvt_roundph_epi16, __m512i, __mmask32, __m512h, 8) +test_2 (_mm512_maskz_cvt_roundph_epu16, __m512i, __mmask32, __m512h, 8) +test_2 (_mm512_maskz_cvt_roundph_epi32, __m512i, __mmask16, __m256h, 8) +test_2 (_mm512_maskz_cvt_roundph_epu32, __m512i, __mmask16, __m256h, 8) +test_2 (_mm512_maskz_cvt_roundph_epi64, __m512i, __mmask8, __m128h, 8) +test_2 (_mm512_maskz_cvt_roundph_epu64, __m512i, __mmask8, __m128h, 8) test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8) test_2x (_mm_cmp_round_sh_mask, __mmask8, __m128h, __m128h, 1, 8) test_2x (_mm_comi_round_sh, int, __m128h, __m128h, 1, 8) @@ -748,6 +760,12 @@ test_3 
(_mm512_mask_roundscale_ph, __m512h, __m512h, __mmask32, __m512h, 123) test_3 (_mm_maskz_roundscale_sh, __m128h, __mmask8, __m128h, __m128h, 123) test_3 (_mm_maskz_getexp_round_sh, __m128h, __mmask8, __m128h, __m128h, 8) test_3 (_mm512_mask_getexp_round_ph, __m512h, __m512h, __mmask32, __m512h, 8) +test_3 (_mm512_mask_cvt_roundph_epi16, __m512i, __m512i, __mmask32, __m512h, 8) +test_3 (_mm512_mask_cvt_roundph_epu16, __m512i, __m512i, __mmask32, __m512h, 8) +test_3 (_mm512_mask_cvt_roundph_epi32, __m512i, __m512i, __mmask16, __m256h, 8) +test_3 (_mm512_mask_cvt_roundph_epu32, __m512i, __m512i, __mmask16, __m256h, 8) +test_3 (_mm512_mask_cvt_roundph_epi64, __m512i, __m512i, __mmask8, __m128h, 8) +test_3 (_mm512_mask_cvt_roundph_epu64, __m512i, __m512i, __mmask8, __m128h, 8) test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8) test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8) test_3x (_mm512_mask_reduce_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8) diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c index 008600a393d..44ac10d602f 100644 --- a/gcc/testsuite/gcc.target/i386/sse-22.c +++ b/gcc/testsuite/gcc.target/i386/sse-22.c @@ -783,6 +783,12 @@ test_1 (_mm_roundscale_ph, __m128h, __m128h, 123) test_1 (_mm256_roundscale_ph, __m256h, __m256h, 123) test_1 (_mm512_roundscale_ph, __m512h, __m512h, 123) test_1 (_mm512_getexp_round_ph, __m512h, __m512h, 8) +test_1 (_mm512_cvt_roundph_epi16, __m512i, __m512h, 8) +test_1 (_mm512_cvt_roundph_epu16, __m512i, __m512h, 8) +test_1 (_mm512_cvt_roundph_epi32, __m512i, __m256h, 8) +test_1 (_mm512_cvt_roundph_epu32, __m512i, __m256h, 8) +test_1 (_mm512_cvt_roundph_epi64, __m512i, __m128h, 8) +test_1 (_mm512_cvt_roundph_epu64, __m512i, __m128h, 8) test_1x (_mm512_reduce_round_ph, __m512h, __m512h, 123, 8) test_1x (_mm512_roundscale_round_ph, __m512h, __m512h, 123, 8) test_1x (_mm512_getmant_ph, __m512h, __m512h, 1, 1) @@ 
-814,6 +820,12 @@ test_2 (_mm512_maskz_roundscale_ph, __m512h, __mmask32, __m512h, 123) test_2 (_mm_roundscale_sh, __m128h, __m128h, __m128h, 123) test_2 (_mm512_maskz_getexp_round_ph, __m512h, __mmask32, __m512h, 8) test_2 (_mm_getexp_round_sh, __m128h, __m128h, __m128h, 8) +test_2 (_mm512_maskz_cvt_roundph_epi16, __m512i, __mmask32, __m512h, 8) +test_2 (_mm512_maskz_cvt_roundph_epu16, __m512i, __mmask32, __m512h, 8) +test_2 (_mm512_maskz_cvt_roundph_epi32, __m512i, __mmask16, __m256h, 8) +test_2 (_mm512_maskz_cvt_roundph_epu32, __m512i, __mmask16, __m256h, 8) +test_2 (_mm512_maskz_cvt_roundph_epi64, __m512i, __mmask8, __m128h, 8) +test_2 (_mm512_maskz_cvt_roundph_epu64, __m512i, __mmask8, __m128h, 8) test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8) test_2x (_mm_cmp_round_sh_mask, __mmask8, __m128h, __m128h, 1, 8) test_2x (_mm_comi_round_sh, int, __m128h, __m128h, 1, 8) @@ -851,6 +863,12 @@ test_3 (_mm512_mask_roundscale_ph, __m512h, __m512h, __mmask32, __m512h, 123) test_3 (_mm_maskz_roundscale_sh, __m128h, __mmask8, __m128h, __m128h, 123) test_3 (_mm_maskz_getexp_round_sh, __m128h, __mmask8, __m128h, __m128h, 8) test_3 (_mm512_mask_getexp_round_ph, __m512h, __m512h, __mmask32, __m512h, 8) +test_3 (_mm512_mask_cvt_roundph_epi16, __m512i, __m512i, __mmask32, __m512h, 8) +test_3 (_mm512_mask_cvt_roundph_epu16, __m512i, __m512i, __mmask32, __m512h, 8) +test_3 (_mm512_mask_cvt_roundph_epi32, __m512i, __m512i, __mmask16, __m256h, 8) +test_3 (_mm512_mask_cvt_roundph_epu32, __m512i, __m512i, __mmask16, __m256h, 8) +test_3 (_mm512_mask_cvt_roundph_epi64, __m512i, __m512i, __mmask8, __m128h, 8) +test_3 (_mm512_mask_cvt_roundph_epu64, __m512i, __m512i, __mmask8, __m128h, 8) test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8) test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8) test_3x (_mm512_mask_reduce_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8) diff --git 
a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c index b3f07587acb..ae6151b4a61 100644 --- a/gcc/testsuite/gcc.target/i386/sse-23.c +++ b/gcc/testsuite/gcc.target/i386/sse-23.c @@ -737,6 +737,12 @@ #define __builtin_ia32_getexpsh_mask_round(A, B, C, D, E) __builtin_ia32_getexpsh_mask_round(A, B, C, D, 4) #define __builtin_ia32_getmantph512_mask(A, F, C, D, E) __builtin_ia32_getmantph512_mask(A, 1, C, D, 8) #define __builtin_ia32_getmantsh_mask_round(A, B, C, W, U, D) __builtin_ia32_getmantsh_mask_round(A, B, 1, W, U, 4) +#define __builtin_ia32_vcvtph2dq_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvtph2dq_v16si_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvtph2udq_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvtph2udq_v16si_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvtph2qq_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtph2qq_v8di_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvtph2uqq_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtph2uqq_v8di_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvtph2w_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtph2w_v32hi_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvtph2uw_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtph2uw_v32hi_mask_round(A, B, C, 8) /* avx512fp16vlintrin.h */ #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
From patchwork Thu Jul 1 06:16:13 2021 X-Patchwork-Submitter: liuhongt X-Patchwork-Id: 1499334 To: gcc-patches@gcc.gnu.org Subject: [PATCH 27/62] AVX512FP16: Add testcase for vcvtph2w/vcvtph2uw/vcvtph2dq/vcvtph2udq/vcvtph2qq/vcvtph2uqq. Date: Thu, 1 Jul 2021 14:16:13 +0800 Message-Id: <20210701061648.9447-28-hongtao.liu@intel.com> In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com> References: <20210701061648.9447-1-hongtao.liu@intel.com> From: liuhongt Cc: jakub@redhat.com
gcc/testsuite/ChangeLog: * gcc.target/i386/avx512fp16-helper.h (V512): Add QI components. * gcc.target/i386/avx512fp16-vcvtph2dq-1a.c: New test. * gcc.target/i386/avx512fp16-vcvtph2dq-1b.c: Ditto. * gcc.target/i386/avx512fp16-vcvtph2qq-1a.c: Ditto. * gcc.target/i386/avx512fp16-vcvtph2qq-1b.c: Ditto. * gcc.target/i386/avx512fp16-vcvtph2udq-1a.c: Ditto. * gcc.target/i386/avx512fp16-vcvtph2udq-1b.c: Ditto. * gcc.target/i386/avx512fp16-vcvtph2uqq-1a.c: Ditto. * gcc.target/i386/avx512fp16-vcvtph2uqq-1b.c: Ditto. * gcc.target/i386/avx512fp16-vcvtph2uw-1a.c: Ditto. * gcc.target/i386/avx512fp16-vcvtph2uw-1b.c: Ditto. * gcc.target/i386/avx512fp16-vcvtph2w-1a.c: Ditto. * gcc.target/i386/avx512fp16-vcvtph2w-1b.c: Ditto. * gcc.target/i386/avx512fp16vl-vcvtph2dq-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvtph2dq-1b.c: Ditto. * gcc.target/i386/avx512fp16vl-vcvtph2qq-1a.c: Ditto. * gcc.target/i386/avx512fp16vl-vcvtph2qq-1b.c: Ditto. * gcc.target/i386/avx512fp16vl-vcvtph2udq-1a.c: Ditto. * gcc.target/i386/avx512fp16vl-vcvtph2udq-1b.c: Ditto. * gcc.target/i386/avx512fp16vl-vcvtph2uqq-1a.c: Ditto. * gcc.target/i386/avx512fp16vl-vcvtph2uqq-1b.c: Ditto. * gcc.target/i386/avx512fp16vl-vcvtph2uw-1a.c: Ditto. * gcc.target/i386/avx512fp16vl-vcvtph2uw-1b.c: Ditto. * gcc.target/i386/avx512fp16vl-vcvtph2w-1a.c: Ditto. * gcc.target/i386/avx512fp16vl-vcvtph2w-1b.c: Ditto. --- .../gcc.target/i386/avx512fp16-helper.h | 25 +++++- .../gcc.target/i386/avx512fp16-vcvtph2dq-1a.c | 24 ++++++ .../gcc.target/i386/avx512fp16-vcvtph2dq-1b.c | 79 +++++++++++++++++ .../gcc.target/i386/avx512fp16-vcvtph2qq-1a.c | 24 ++++++ .../gcc.target/i386/avx512fp16-vcvtph2qq-1b.c | 78 +++++++++++++++++ .../i386/avx512fp16-vcvtph2udq-1a.c | 24 ++++++ .../i386/avx512fp16-vcvtph2udq-1b.c | 79 +++++++++++++++++ .../i386/avx512fp16-vcvtph2uqq-1a.c | 24 ++++++ .../i386/avx512fp16-vcvtph2uqq-1b.c | 78 +++++++++++++++++ .../gcc.target/i386/avx512fp16-vcvtph2uw-1a.c | 24 ++++++ .../gcc.target/i386/avx512fp16-vcvtph2uw-1b.c | 84 +++++++++++++++++++ .../gcc.target/i386/avx512fp16-vcvtph2w-1a.c | 24 ++++++ .../gcc.target/i386/avx512fp16-vcvtph2w-1b.c | 83 ++++++++++++++++++ .../i386/avx512fp16vl-vcvtph2dq-1a.c | 27 ++++++ .../i386/avx512fp16vl-vcvtph2dq-1b.c | 15 ++++ .../i386/avx512fp16vl-vcvtph2qq-1a.c | 27 ++++++ .../i386/avx512fp16vl-vcvtph2qq-1b.c | 15 ++++ .../i386/avx512fp16vl-vcvtph2udq-1a.c | 27 ++++++ .../i386/avx512fp16vl-vcvtph2udq-1b.c | 15 ++++ .../i386/avx512fp16vl-vcvtph2uqq-1a.c | 27 ++++++ .../i386/avx512fp16vl-vcvtph2uqq-1b.c | 15 ++++ .../i386/avx512fp16vl-vcvtph2uw-1a.c | 29 +++++++ .../i386/avx512fp16vl-vcvtph2uw-1b.c | 15 ++++ .../i386/avx512fp16vl-vcvtph2w-1a.c | 29 +++++++ .../i386/avx512fp16vl-vcvtph2w-1b.c | 15 ++++ 25 files changed, 903 insertions(+), 3 
deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2dq-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2dq-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2qq-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2qq-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2udq-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2udq-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2uqq-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2uqq-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2uw-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2uw-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2w-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2w-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2dq-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2dq-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2qq-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2qq-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2udq-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2udq-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2uqq-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2uqq-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2uw-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2uw-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2w-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2w-1b.c diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h b/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h index f6f46872c35..aa83b66998c 100644 --- 
a/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h @@ -25,13 +25,17 @@ typedef union { __m512 zmm; __m512h zmmh; + __m512i zmmi; __m256 ymm[2]; __m256h ymmh[2]; __m256i ymmi[2]; __m128h xmmh[4]; __m128 xmm[4]; + __m128i xmmi[4]; unsigned short u16[32]; unsigned int u32[16]; + long long s64[8]; + unsigned long long u64[8]; float f32[16]; _Float16 f16[32]; } V512; @@ -162,9 +166,9 @@ init_src() int i; for (i = 0; i < AVX512F_MAX_ELEM; i++) { - v1.f32[i] = -i + 1; + v1.f32[i] = i + 1; v2.f32[i] = i * 0.5f; - v3.f32[i] = i * 2.5f; + v3.f32[i] = i * 1.5f; v4.f32[i] = i - 0.5f; src3.u32[i] = (i + 1) * 10; @@ -217,30 +221,45 @@ init_dest(V512 * res, V512 * exp) #if AVX512F_LEN == 256 #undef HF #undef SF +#undef SI +#undef H_HF #undef NET_MASK -#undef MASK_VALUE +#undef MASK_VALUE +#undef HALF_MASK #undef ZMASK_VALUE #define NET_MASK 0xffff #define MASK_VALUE 0xcccc #define ZMASK_VALUE 0xfcc1 +#define HALF_MASK 0xcc #define HF(x) x.ymmh[0] +#define H_HF(x) x.xmmh[0] #define SF(x) x.ymm[0] +#define SI(x) x.ymmi[0] #elif AVX512F_LEN == 128 #undef HF #undef SF +#undef SI +#undef H_HF #undef NET_MASK #undef MASK_VALUE #undef ZMASK_VALUE +#undef HALF_MASK #define NET_MASK 0xff #define MASK_VALUE 0xcc +#define HALF_MASK MASK_VALUE #define ZMASK_VALUE 0xc1 #define HF(x) x.xmmh[0] #define SF(x) x.xmm[0] +#define SI(x) x.xmmi[0] +#define H_HF(x) x.xmmh[0] #else #define NET_MASK 0xffffffff #define MASK_VALUE 0xcccccccc #define ZMASK_VALUE 0xfcc1fcc1 +#define HALF_MASK 0xcccc #define HF(x) x.zmmh #define SF(x) x.zmm +#define SI(x) x.zmmi +#define H_HF(x) x.ymmh[0] #endif diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2dq-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2dq-1a.c new file mode 100644 index 00000000000..31a56393f0e --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2dq-1a.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { 
scan-assembler-times "vcvtph2dq\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vcvtph2dq\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2dq\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2dq\[ \\t\]+\{rn-sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2dq\[ \\t\]+\{rz-sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +volatile __m512i res, res1, res2; +volatile __m256h x1, x2, x3; +volatile __mmask16 m16; + +void extern +avx512f_test (void) +{ + res = _mm512_cvtph_epi32 (x1); + res1 = _mm512_mask_cvtph_epi32 (res, m16, x2); + res2 = _mm512_maskz_cvtph_epi32 (m16, x3); + res = _mm512_cvt_roundph_epi32 (x1, 4); + res1 = _mm512_mask_cvt_roundph_epi32 (res, m16, x2, 8); + res2 = _mm512_maskz_cvt_roundph_epi32 (m16, x3, 11); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2dq-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2dq-1b.c new file mode 100644 index 00000000000..80a85828271 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2dq-1b.c @@ -0,0 +1,79 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS (AVX512F_LEN / 16) + +void NOINLINE +EMULATE(cvtph2_d) (V512 * dest, V512 op1, + __mmask32 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + __mmask16 m1, m2; + + m1 = k & 0xffff; + + unpack_ph_2twops(op1, &v1, &v2); + + for (i = 0; i < 16; i++) { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + v5.u32[i] = 0; + } + else { + v5.u32[i] = dest->u32[i]; + } + } + 
else { + v5.u32[i] = v1.f32[i]; + + } + } + *dest = v5; +} + +void +TEST (void) +{ + V512 res; + V512 exp; + + init_src(); + + EMULATE(cvtph2_d)(&exp, src1, NET_MASK, 0); + SI(res) = INTRINSIC (_cvtph_epi32) (H_HF(src1)); + CHECK_RESULT (&res, &exp, N_ELEMS, _cvtph_epi32); + + init_dest(&res, &exp); + EMULATE(cvtph2_d)(&exp, src1, HALF_MASK, 0); + SI(res) = INTRINSIC (_mask_cvtph_epi32) (SI(res), HALF_MASK, H_HF(src1)); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvtph_epi32); + + EMULATE(cvtph2_d)(&exp, src1, HALF_MASK, 1); + SI(res) = INTRINSIC (_maskz_cvtph_epi32) (HALF_MASK, H_HF(src1)); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvtph_epi32); + +#if AVX512F_LEN == 512 + EMULATE(cvtph2_d)(&exp, src1, NET_MASK, 0); + SI(res) = INTRINSIC (_cvt_roundph_epi32) (H_HF(src1), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _cvt_roundph_epi32); + + init_dest(&res, &exp); + EMULATE(cvtph2_d)(&exp, src1, HALF_MASK, 0); + SI(res) = INTRINSIC (_mask_cvt_roundph_epi32) (SI(res), HALF_MASK, H_HF(src1), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvt_roundph_epi32); + + EMULATE(cvtph2_d)(&exp, src1, HALF_MASK, 1); + SI(res) = INTRINSIC (_maskz_cvt_roundph_epi32) (HALF_MASK, H_HF(src1), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvt_roundph_epi32); +#endif + + if (n_errs != 0) + abort (); +} + + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2qq-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2qq-1a.c new file mode 100644 index 00000000000..d80ee611f3c --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2qq-1a.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vcvtph2qq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vcvtph2qq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times 
"vcvtph2qq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2qq\[ \\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2qq\[ \\t\]+\{rz-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +volatile __m512i res, res1, res2; +volatile __m128h x1, x2, x3; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res = _mm512_cvtph_epi64 (x1); + res1 = _mm512_mask_cvtph_epi64 (res, m8, x2); + res2 = _mm512_maskz_cvtph_epi64 (m8, x3); + res = _mm512_cvt_roundph_epi64 (x1, 4); + res1 = _mm512_mask_cvt_roundph_epi64 (res, m8, x2, 8); + res2 = _mm512_maskz_cvt_roundph_epi64 (m8, x3, 11); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2qq-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2qq-1b.c new file mode 100644 index 00000000000..42b21cf2e4d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2qq-1b.c @@ -0,0 +1,78 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS (AVX512F_LEN / 16) + +void NOINLINE +EMULATE(cvtph2_q) (V512 * dest, V512 op1, + __mmask32 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + __mmask16 m1, m2; + + m1 = k & 0xffff; + + unpack_ph_2twops(op1, &v1, &v2); + + for (i = 0; i < 8; i++) { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + v5.u64[i] = 0; + } + else { + v5.u64[i] = dest->u64[i]; + } + } + else { + v5.u64[i] = v1.f32[i]; + } + } + *dest = v5; +} + +void +TEST (void) +{ + V512 res; + V512 exp; + + init_src(); + + EMULATE(cvtph2_q)(&exp, src1, NET_MASK, 0); + SI(res) = INTRINSIC (_cvtph_epi64) (src1.xmmh[0]); + CHECK_RESULT (&res, &exp, N_ELEMS, _cvtph_epi64); + + init_dest(&res, &exp); + 
EMULATE(cvtph2_q)(&exp, src1, 0xcc, 0); + SI(res) = INTRINSIC (_mask_cvtph_epi64) (SI(res), 0xcc, src1.xmmh[0]); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvtph_epi64); + + EMULATE(cvtph2_q)(&exp, src1, 0xfa, 1); + SI(res) = INTRINSIC (_maskz_cvtph_epi64) (0xfa, src1.xmmh[0]); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvtph_epi64); + +#if AVX512F_LEN == 512 + EMULATE(cvtph2_q)(&exp, src1, NET_MASK, 0); + SI(res) = INTRINSIC (_cvt_roundph_epi64) (src1.xmmh[0], _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _cvt_roundph_epi64); + + init_dest(&res, &exp); + EMULATE(cvtph2_q)(&exp, src1, 0xcc, 0); + SI(res) = INTRINSIC (_mask_cvt_roundph_epi64) (SI(res), 0xcc, src1.xmmh[0], _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvt_roundph_epi64); + + EMULATE(cvtph2_q)(&exp, src1, 0xfa, 1); + SI(res) = INTRINSIC (_maskz_cvt_roundph_epi64) (0xfa, src1.xmmh[0], _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvt_roundph_epi64); +#endif + + if (n_errs != 0) { + abort (); + } +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2udq-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2udq-1a.c new file mode 100644 index 00000000000..b4a833afdab --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2udq-1a.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vcvtph2udq\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vcvtph2udq\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2udq\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2udq\[ \\t\]+\{rn-sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2udq\[ 
\\t\]+\{rz-sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +volatile __m512i res, res1, res2; +volatile __m256h x1, x2, x3; +volatile __mmask16 m16; + +void extern +avx512f_test (void) +{ + res = _mm512_cvtph_epu32 (x1); + res1 = _mm512_mask_cvtph_epu32 (res, m16, x2); + res2 = _mm512_maskz_cvtph_epu32 (m16, x3); + res = _mm512_cvt_roundph_epu32 (x1, 4); + res1 = _mm512_mask_cvt_roundph_epu32 (res, m16, x2, 8); + res2 = _mm512_maskz_cvt_roundph_epu32 (m16, x3, 11); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2udq-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2udq-1b.c new file mode 100644 index 00000000000..15fa0ba2b4f --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2udq-1b.c @@ -0,0 +1,79 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS (AVX512F_LEN / 16) + +void NOINLINE +EMULATE(cvtph2_d) (V512 * dest, V512 op1, + __mmask32 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + __mmask16 m1, m2; + + m1 = k & 0xffff; + + unpack_ph_2twops(op1, &v1, &v2); + + for (i = 0; i < 16; i++) { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + v5.u32[i] = 0; + } + else { + v5.u32[i] = dest->u32[i]; + } + } + else { + v5.u32[i] = v1.f32[i]; + + } + } + *dest = v5; +} + +void +TEST (void) +{ + V512 res; + V512 exp; + + init_src(); + + EMULATE(cvtph2_d)(&exp, src1, NET_MASK, 0); + SI(res) = INTRINSIC (_cvtph_epu32) (H_HF(src1)); + CHECK_RESULT (&res, &exp, N_ELEMS, _cvtph_epu32); + + init_dest(&res, &exp); + EMULATE(cvtph2_d)(&exp, src1, HALF_MASK, 0); + SI(res) = INTRINSIC (_mask_cvtph_epu32) (SI(res), HALF_MASK, H_HF(src1)); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvtph_epu32); + + EMULATE(cvtph2_d)(&exp, src1, HALF_MASK, 1); + SI(res) = INTRINSIC (_maskz_cvtph_epu32) (HALF_MASK, H_HF(src1)); + CHECK_RESULT (&res, &exp, 
N_ELEMS, _maskz_cvtph_epu32); + +#if AVX512F_LEN == 512 + EMULATE(cvtph2_d)(&exp, src1, NET_MASK, 0); + SI(res) = INTRINSIC (_cvt_roundph_epu32) (H_HF(src1), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _cvt_roundph_epu32); + + init_dest(&res, &exp); + EMULATE(cvtph2_d)(&exp, src1, HALF_MASK, 0); + SI(res) = INTRINSIC (_mask_cvt_roundph_epu32) (SI(res), HALF_MASK, H_HF(src1), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvt_roundph_epu32); + + EMULATE(cvtph2_d)(&exp, src1, HALF_MASK, 1); + SI(res) = INTRINSIC (_maskz_cvt_roundph_epu32) (HALF_MASK, H_HF(src1), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvt_roundph_epu32); +#endif + + if (n_errs != 0) { + abort (); + } +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2uqq-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2uqq-1a.c new file mode 100644 index 00000000000..b4087798be9 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2uqq-1a.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vcvtph2uqq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vcvtph2uqq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2uqq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2uqq\[ \\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2uqq\[ \\t\]+\{rz-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +volatile __m512i res, res1, res2; +volatile __m128h x1, x2, x3; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res = _mm512_cvtph_epu64 (x1); + 
res1 = _mm512_mask_cvtph_epu64 (res, m8, x2); + res2 = _mm512_maskz_cvtph_epu64 (m8, x3); + res = _mm512_cvt_roundph_epu64 (x1, 4); + res1 = _mm512_mask_cvt_roundph_epu64 (res, m8, x2, 8); + res2 = _mm512_maskz_cvt_roundph_epu64 (m8, x3, 11); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2uqq-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2uqq-1b.c new file mode 100644 index 00000000000..7f34772aca6 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2uqq-1b.c @@ -0,0 +1,78 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS (AVX512F_LEN / 16) + +void NOINLINE +EMULATE(cvtph2_q) (V512 * dest, V512 op1, + __mmask32 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + __mmask16 m1, m2; + + m1 = k & 0xffff; + + unpack_ph_2twops(op1, &v1, &v2); + + for (i = 0; i < 8; i++) { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + v5.u64[i] = 0; + } + else { + v5.u64[i] = dest->u64[i]; + } + } + else { + v5.u64[i] = v1.f32[i]; + } + } + *dest = v5; +} + +void +TEST (void) +{ + V512 res; + V512 exp; + + init_src(); + + EMULATE(cvtph2_q)(&exp, src1, NET_MASK, 0); + SI(res) = INTRINSIC (_cvtph_epu64) (src1.xmmh[0]); + CHECK_RESULT (&res, &exp, N_ELEMS, _cvtph_epu64); + + init_dest(&res, &exp); + EMULATE(cvtph2_q)(&exp, src1, 0xcc, 0); + SI(res) = INTRINSIC (_mask_cvtph_epu64) (SI(res), 0xcc, src1.xmmh[0]); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvtph_epu64); + + EMULATE(cvtph2_q)(&exp, src1, 0xfc, 1); + SI(res) = INTRINSIC (_maskz_cvtph_epu64) (0xfc, src1.xmmh[0]); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvtph_epu64); + +#if AVX512F_LEN == 512 + EMULATE(cvtph2_q)(&exp, src1, NET_MASK, 0); + SI(res) = INTRINSIC (_cvt_roundph_epu64) (src1.xmmh[0], _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _cvt_roundph_epu64); + + init_dest(&res, &exp); + EMULATE(cvtph2_q)(&exp, src1, 0xcc, 0); + SI(res) = 
INTRINSIC (_mask_cvt_roundph_epu64) (SI(res), 0xcc, src1.xmmh[0], _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvt_roundph_epu64); + + EMULATE(cvtph2_q)(&exp, src1, 0xfc, 1); + SI(res) = INTRINSIC (_maskz_cvt_roundph_epu64) (0xfc, src1.xmmh[0], _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvt_roundph_epu64); +#endif + + if (n_errs != 0) { + abort (); + } +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2uw-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2uw-1a.c new file mode 100644 index 00000000000..262274526b1 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2uw-1a.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vcvtph2uw\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vcvtph2uw\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2uw\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2uw\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2uw\[ \\t\]+\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +volatile __m512i res, res1, res2; +volatile __m512h x1, x2, x3; +volatile __mmask32 m32; + +void extern +avx512f_test (void) +{ + res = _mm512_cvtph_epu16 (x1); + res1 = _mm512_mask_cvtph_epu16 (res, m32, x2); + res2 = _mm512_maskz_cvtph_epu16 (m32, x3); + res = _mm512_cvt_roundph_epu16 (x1, 4); + res1 = _mm512_mask_cvt_roundph_epu16 (res, m32, x2, 8); + res2 = _mm512_maskz_cvt_roundph_epu16 (m32, x3, 11); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2uw-1b.c 
b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2uw-1b.c new file mode 100644 index 00000000000..437a1f0eeae --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2uw-1b.c @@ -0,0 +1,84 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS (AVX512F_LEN / 16) + +void NOINLINE +EMULATE(cvtph2_w) (V512 * dest, V512 op1, + __mmask32 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + __mmask16 m1, m2; + + m1 = k & 0xffff; + m2 = (k >> 16) & 0xffff; + + unpack_ph_2twops(op1, &v1, &v2); + + for (i = 0; i < 16; i++) { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + dest->u16[i] = 0; + } + } + else { + dest->u16[i] = v1.f32[i]; + + } + + if (((1 << i) & m2) == 0) { + if (zero_mask) { + dest->u16[i+16] = 0; + } + } + else { + dest->u16[i+16] = v2.f32[i]; + } + } +} + +void +TEST (void) +{ + V512 res; + V512 exp; + + init_src(); + + EMULATE(cvtph2_w)(&exp, src1, NET_MASK, 0); + SI(res) = INTRINSIC (_cvtph_epu16) (HF(src1)); + CHECK_RESULT (&res, &exp, N_ELEMS, _cvtph_epu16); + + init_dest(&res, &exp); + EMULATE(cvtph2_w)(&exp, src1, MASK_VALUE, 0); + SI(res) = INTRINSIC (_mask_cvtph_epu16) (SI(res), MASK_VALUE, HF(src1)); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvtph_epu16); + + EMULATE(cvtph2_w)(&exp, src1, ZMASK_VALUE, 1); + SI(res) = INTRINSIC (_maskz_cvtph_epu16) (ZMASK_VALUE, HF(src1)); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvtph_epu16); + +#if AVX512F_LEN == 512 + EMULATE(cvtph2_w)(&exp, src1, NET_MASK, 0); + SI(res) = INTRINSIC (_cvt_roundph_epu16) (HF(src1), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _cvt_roundph_epu16); + + init_dest(&res, &exp); + EMULATE(cvtph2_w)(&exp, src1, MASK_VALUE, 0); + SI(res) = INTRINSIC (_mask_cvt_roundph_epu16) (SI(res), MASK_VALUE, HF(src1), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvt_roundph_epu16); + + EMULATE(cvtph2_w)(&exp, src1, ZMASK_VALUE, 1); 
+ SI(res) = INTRINSIC (_maskz_cvt_roundph_epu16) (ZMASK_VALUE, HF(src1), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvt_roundph_epu16); +#endif + + if (n_errs != 0) + abort (); +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2w-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2w-1a.c new file mode 100644 index 00000000000..bcaa7446d34 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2w-1a.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vcvtph2w\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vcvtph2w\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2w\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2w\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2w\[ \\t\]+\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +volatile __m512i res, res1, res2; +volatile __m512h x1, x2, x3; +volatile __mmask32 m32; + +void extern +avx512f_test (void) +{ + res = _mm512_cvtph_epi16 (x1); + res1 = _mm512_mask_cvtph_epi16 (res, m32, x2); + res2 = _mm512_maskz_cvtph_epi16 (m32, x3); + res = _mm512_cvt_roundph_epi16 (x1, 4); + res1 = _mm512_mask_cvt_roundph_epi16 (res, m32, x2, 8); + res2 = _mm512_maskz_cvt_roundph_epi16 (m32, x3, 11); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2w-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2w-1b.c new file mode 100644 index 00000000000..dfa20523932 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2w-1b.c @@ -0,0 +1,83 @@ +/* { dg-do run 
{ target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS (AVX512F_LEN / 16) + +void NOINLINE +EMULATE(cvtph2_w) (V512 * dest, V512 op1, + __mmask32 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + __mmask16 m1, m2; + + m1 = k & 0xffff; + m2 = (k >> 16) & 0xffff; + + unpack_ph_2twops(op1, &v1, &v2); + + for (i = 0; i < 16; i++) { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + dest->u16[i] = 0; + } + } + else { + dest->u16[i] = v1.f32[i]; + + } + + if (((1 << i) & m2) == 0) { + if (zero_mask) { + dest->u16[i+16] = 0; + } + } + else { + dest->u16[i+16] = v2.f32[i]; + } + } +} + +void +TEST (void) +{ + V512 res, exp; + + init_src(); + + EMULATE(cvtph2_w)(&exp, src1, NET_MASK, 0); + SI(res) = INTRINSIC (_cvtph_epi16) (HF(src1)); + CHECK_RESULT (&res, &exp, N_ELEMS, _cvtph_epi16); + + init_dest(&res, &exp); + EMULATE(cvtph2_w)(&exp, src1, MASK_VALUE, 0); + SI(res) = INTRINSIC (_mask_cvtph_epi16) (SI(res), MASK_VALUE, HF(src1)); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvtph_epi16); + + EMULATE(cvtph2_w)(&exp, src1, ZMASK_VALUE, 1); + SI(res) = INTRINSIC (_maskz_cvtph_epi16) (ZMASK_VALUE, HF(src1)); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvtph_epi16); + +#if AVX512F_LEN == 512 + EMULATE(cvtph2_w)(&exp, src1, NET_MASK, 0); + SI(res) = INTRINSIC (_cvt_roundph_epi16) (HF(src1), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _cvt_roundph_epi16); + + init_dest(&res, &exp); + EMULATE(cvtph2_w)(&exp, src1, MASK_VALUE, 0); + SI(res) = INTRINSIC (_mask_cvt_roundph_epi16) (SI(res), MASK_VALUE, HF(src1), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvt_roundph_epi16); + + EMULATE(cvtph2_w)(&exp, src1, ZMASK_VALUE, 1); + SI(res) = INTRINSIC (_maskz_cvt_roundph_epi16) (ZMASK_VALUE, HF(src1), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvt_roundph_epi16); +#endif + + if (n_errs != 0) { + abort (); + } +} + diff --git 
a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2dq-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2dq-1a.c new file mode 100644 index 00000000000..df653b0b2c7 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2dq-1a.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */ +/* { dg-final { scan-assembler-times "vcvtph2dq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2dq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2dq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2dq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2dq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2dq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +volatile __m256i res1; +volatile __m128i res2; +volatile __m128h x3; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res1 = _mm256_cvtph_epi32 (x3); + res1 = _mm256_mask_cvtph_epi32 (res1, m8, x3); + res1 = _mm256_maskz_cvtph_epi32 (m8, x3); + + res2 = _mm_cvtph_epi32 (x3); + res2 = _mm_mask_cvtph_epi32 (res2, m8, x3); + res2 = _mm_maskz_cvtph_epi32 (m8, x3); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2dq-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2dq-1b.c new file mode 100644 index 00000000000..93a3e903da4 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2dq-1b.c @@ -0,0 +1,15 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */ + +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 
+#include "avx512fp16-vcvtph2dq-1b.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vcvtph2dq-1b.c" + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2qq-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2qq-1a.c new file mode 100644 index 00000000000..ddc6f2a702e --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2qq-1a.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */ +/* { dg-final { scan-assembler-times "vcvtph2qq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2qq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2qq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2qq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2qq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2qq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +volatile __m256i res1; +volatile __m128i res2; +volatile __m128h x3; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res1 = _mm256_cvtph_epi64 (x3); + res1 = _mm256_mask_cvtph_epi64 (res1, m8, x3); + res1 = _mm256_maskz_cvtph_epi64 (m8, x3); + + res2 = _mm_cvtph_epi64 (x3); + res2 = _mm_mask_cvtph_epi64 (res2, m8, x3); + res2 = _mm_maskz_cvtph_epi64 (m8, x3); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2qq-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2qq-1b.c new file mode 100644 index 00000000000..5afc5a1836b --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2qq-1b.c @@ -0,0 
+1,15 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */ + +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vcvtph2qq-1b.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vcvtph2qq-1b.c" + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2udq-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2udq-1a.c new file mode 100644 index 00000000000..d07d76647a7 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2udq-1a.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */ +/* { dg-final { scan-assembler-times "vcvtph2udq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2udq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2udq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2udq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2udq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2udq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +volatile __m256i res1; +volatile __m128i res2; +volatile __m128h x3; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res1 = _mm256_cvtph_epu32 (x3); + res1 = _mm256_mask_cvtph_epu32 (res1, m8, x3); + res1 = _mm256_maskz_cvtph_epu32 (m8, x3); + + res2 = _mm_cvtph_epu32 (x3); + res2 = _mm_mask_cvtph_epu32 (res2, m8, x3); + res2 = _mm_maskz_cvtph_epu32 (m8, x3); +} diff --git 
a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2udq-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2udq-1b.c new file mode 100644 index 00000000000..d869a0ca259 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2udq-1b.c @@ -0,0 +1,15 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */ + +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vcvtph2udq-1b.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vcvtph2udq-1b.c" + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2uqq-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2uqq-1a.c new file mode 100644 index 00000000000..26dbf227d81 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2uqq-1a.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */ +/* { dg-final { scan-assembler-times "vcvtph2uqq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2uqq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2uqq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2uqq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2uqq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2uqq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +volatile __m256i res1; +volatile __m128i res2; +volatile __m128h x3; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res1 = _mm256_cvtph_epu64 (x3); + res1 = 
_mm256_mask_cvtph_epu64 (res1, m8, x3); + res1 = _mm256_maskz_cvtph_epu64 (m8, x3); + + res2 = _mm_cvtph_epu64 (x3); + res2 = _mm_mask_cvtph_epu64 (res2, m8, x3); + res2 = _mm_maskz_cvtph_epu64 (m8, x3); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2uqq-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2uqq-1b.c new file mode 100644 index 00000000000..d9b10a82f8e --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2uqq-1b.c @@ -0,0 +1,15 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */ + +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vcvtph2uqq-1b.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vcvtph2uqq-1b.c" + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2uw-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2uw-1a.c new file mode 100644 index 00000000000..0f9fd27881c --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2uw-1a.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */ +/* { dg-final { scan-assembler-times "vcvtph2uw\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2uw\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2uw\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2uw\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2uw\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2uw\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 
} } */ + +#include <immintrin.h> + +volatile __m256i res1; +volatile __m128i res2; +volatile __m256h x3; +volatile __m128h x4; +volatile __mmask16 m16; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res1 = _mm256_cvtph_epu16 (x3); + res1 = _mm256_mask_cvtph_epu16 (res1, m16, x3); + res1 = _mm256_maskz_cvtph_epu16 (m16, x3); + + res2 = _mm_cvtph_epu16 (x4); + res2 = _mm_mask_cvtph_epu16 (res2, m8, x4); + res2 = _mm_maskz_cvtph_epu16 (m8, x4); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2uw-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2uw-1b.c new file mode 100644 index 00000000000..280dcd75320 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2uw-1b.c @@ -0,0 +1,15 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */ + +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vcvtph2uw-1b.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vcvtph2uw-1b.c" + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2w-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2w-1a.c new file mode 100644 index 00000000000..8dee4ee25d0 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2w-1a.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */ +/* { dg-final { scan-assembler-times "vcvtph2w\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2w\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2w\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2w\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { 
scan-assembler-times "vcvtph2w\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2w\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256i res1;
+volatile __m128i res2;
+volatile __m256h x3;
+volatile __m128h x4;
+volatile __mmask16 m16;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res1 = _mm256_cvtph_epi16 (x3);
+  res1 = _mm256_mask_cvtph_epi16 (res1, m16, x3);
+  res1 = _mm256_maskz_cvtph_epi16 (m16, x3);
+
+  res2 = _mm_cvtph_epi16 (x4);
+  res2 = _mm_mask_cvtph_epi16 (res2, m8, x4);
+  res2 = _mm_maskz_cvtph_epi16 (m8, x4);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2w-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2w-1b.c
new file mode 100644
index 00000000000..739ba6478ae
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2w-1b.c
@@ -0,0 +1,15 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vcvtph2w-1b.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vcvtph2w-1b.c"
+

From patchwork Thu Jul 1 06:16:14 2021
X-Patchwork-Submitter: liuhongt
X-Patchwork-Id: 1499335
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 28/62] AVX512FP16: Add vcvtuw2ph/vcvtw2ph/vcvtdq2ph/vcvtudq2ph/vcvtqq2ph/vcvtuqq2ph
Date: Thu, 1 Jul 2021 14:16:14 +0800
Message-Id: <20210701061648.9447-29-hongtao.liu@intel.com>
In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com>
References: <20210701061648.9447-1-hongtao.liu@intel.com>
From: liuhongt
Cc: jakub@redhat.com

gcc/ChangeLog:

	* config/i386/avx512fp16intrin.h (_mm512_cvtepi32_ph): New intrinsic.
	(_mm512_mask_cvtepi32_ph): Likewise.
	(_mm512_maskz_cvtepi32_ph): Likewise.
	(_mm512_cvt_roundepi32_ph): Likewise.
	(_mm512_mask_cvt_roundepi32_ph): Likewise.
	(_mm512_maskz_cvt_roundepi32_ph): Likewise.
	(_mm512_cvtepu32_ph): Likewise.
	(_mm512_mask_cvtepu32_ph): Likewise.
	(_mm512_maskz_cvtepu32_ph): Likewise.
	(_mm512_cvt_roundepu32_ph): Likewise.
	(_mm512_mask_cvt_roundepu32_ph): Likewise.
	(_mm512_maskz_cvt_roundepu32_ph): Likewise.
	(_mm512_cvtepi64_ph): Likewise.
	(_mm512_mask_cvtepi64_ph): Likewise.
	(_mm512_maskz_cvtepi64_ph): Likewise.
	(_mm512_cvt_roundepi64_ph): Likewise.
	(_mm512_mask_cvt_roundepi64_ph): Likewise.
	(_mm512_maskz_cvt_roundepi64_ph): Likewise.
	(_mm512_cvtepu64_ph): Likewise.
	(_mm512_mask_cvtepu64_ph): Likewise.
	(_mm512_maskz_cvtepu64_ph): Likewise.
	(_mm512_cvt_roundepu64_ph): Likewise.
	(_mm512_mask_cvt_roundepu64_ph): Likewise.
	(_mm512_maskz_cvt_roundepu64_ph): Likewise.
	(_mm512_cvtepi16_ph): Likewise.
	(_mm512_mask_cvtepi16_ph): Likewise.
	(_mm512_maskz_cvtepi16_ph): Likewise.
	(_mm512_cvt_roundepi16_ph): Likewise.
	(_mm512_mask_cvt_roundepi16_ph): Likewise.
	(_mm512_maskz_cvt_roundepi16_ph): Likewise.
	(_mm512_cvtepu16_ph): Likewise.
	(_mm512_mask_cvtepu16_ph): Likewise.
	(_mm512_maskz_cvtepu16_ph): Likewise.
	(_mm512_cvt_roundepu16_ph): Likewise.
	(_mm512_mask_cvt_roundepu16_ph): Likewise.
	(_mm512_maskz_cvt_roundepu16_ph): Likewise.
	* config/i386/avx512fp16vlintrin.h (_mm_cvtepi32_ph): New intrinsic.
	(_mm_mask_cvtepi32_ph): Likewise.
	(_mm_maskz_cvtepi32_ph): Likewise.
	(_mm256_cvtepi32_ph): Likewise.
	(_mm256_mask_cvtepi32_ph): Likewise.
	(_mm256_maskz_cvtepi32_ph): Likewise.
	(_mm_cvtepu32_ph): Likewise.
	(_mm_mask_cvtepu32_ph): Likewise.
	(_mm_maskz_cvtepu32_ph): Likewise.
	(_mm256_cvtepu32_ph): Likewise.
	(_mm256_mask_cvtepu32_ph): Likewise.
	(_mm256_maskz_cvtepu32_ph): Likewise.
	(_mm_cvtepi64_ph): Likewise.
	(_mm_mask_cvtepi64_ph): Likewise.
	(_mm_maskz_cvtepi64_ph): Likewise.
	(_mm256_cvtepi64_ph): Likewise.
	(_mm256_mask_cvtepi64_ph): Likewise.
	(_mm256_maskz_cvtepi64_ph): Likewise.
	(_mm_cvtepu64_ph): Likewise.
	(_mm_mask_cvtepu64_ph): Likewise.
	(_mm_maskz_cvtepu64_ph): Likewise.
	(_mm256_cvtepu64_ph): Likewise.
	(_mm256_mask_cvtepu64_ph): Likewise.
	(_mm256_maskz_cvtepu64_ph): Likewise.
	(_mm_cvtepi16_ph): Likewise.
	(_mm_mask_cvtepi16_ph): Likewise.
	(_mm_maskz_cvtepi16_ph): Likewise.
	(_mm256_cvtepi16_ph): Likewise.
	(_mm256_mask_cvtepi16_ph): Likewise.
	(_mm256_maskz_cvtepi16_ph): Likewise.
	(_mm_cvtepu16_ph): Likewise.
	(_mm_mask_cvtepu16_ph): Likewise.
	(_mm_maskz_cvtepu16_ph): Likewise.
	(_mm256_cvtepu16_ph): Likewise.
	(_mm256_mask_cvtepu16_ph): Likewise.
	(_mm256_maskz_cvtepu16_ph): Likewise.
	* config/i386/i386-builtin-types.def: Add corresponding builtin types.
	* config/i386/i386-builtin.def: Add corresponding new builtins.
	* config/i386/i386-expand.c (ix86_expand_args_builtin): Handle
	new builtin types.
	(ix86_expand_round_builtin): Ditto.
	* config/i386/i386-modes.def: Declare V2HF and V6HF.
	* config/i386/sse.md (VI2H_AVX512VL): New.
	(qq2phsuff): Ditto.
	(sseintvecmode): Add HF vector modes.
	(avx512fp16_vcvt2ph_): New.
	(avx512fp16_vcvt2ph_): Ditto.
	(*avx512fp16_vcvt2ph_): Ditto.
	(avx512fp16_vcvt2ph__mask): Ditto.
	(*avx512fp16_vcvt2ph__mask): Ditto.
	(*avx512fp16_vcvt2ph__mask_1): Ditto.
	(avx512fp16_vcvtqq2ph_v2di): Ditto.
	(*avx512fp16_vcvtqq2ph_v2di): Ditto.
	(avx512fp16_vcvtqq2ph_v2di_mask): Ditto.
	(*avx512fp16_vcvtqq2ph_v2di_mask): Ditto.
	(*avx512fp16_vcvtqq2ph_v2di_mask_1): Ditto.
	* config/i386/subst.md (round_qq2phsuff): New subst_attr.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx-1.c: Add test for new builtins.
	* gcc.target/i386/sse-13.c: Ditto.
	* gcc.target/i386/sse-23.c: Ditto.
	* gcc.target/i386/sse-14.c: Add test for new intrinsics.
	* gcc.target/i386/sse-22.c: Ditto.
--- gcc/config/i386/avx512fp16intrin.h | 492 +++++++++++++++++++++++++ gcc/config/i386/avx512fp16vlintrin.h | 312 ++++++++++++++++ gcc/config/i386/i386-builtin-types.def | 9 + gcc/config/i386/i386-builtin.def | 18 + gcc/config/i386/i386-expand.c | 9 + gcc/config/i386/i386-modes.def | 2 + gcc/config/i386/sse.md | 153 +++++++- gcc/config/i386/subst.md | 1 + gcc/testsuite/gcc.target/i386/avx-1.c | 6 + gcc/testsuite/gcc.target/i386/sse-13.c | 6 + gcc/testsuite/gcc.target/i386/sse-14.c | 18 + gcc/testsuite/gcc.target/i386/sse-22.c | 18 + gcc/testsuite/gcc.target/i386/sse-23.c | 6 + 13 files changed, 1047 insertions(+), 3 deletions(-) diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h index 42576c4ae2e..bd801942365 100644 --- a/gcc/config/i386/avx512fp16intrin.h +++ b/gcc/config/i386/avx512fp16intrin.h @@ -2702,6 +2702,172 @@ _mm512_maskz_cvt_roundph_epu32 (__mmask16 __A, __m256h __B, int __C) #endif /* __OPTIMIZE__ */ +/* Intrinsics vcvtdq2ph. */ +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtepi32_ph (__m512i __A) +{ + return __builtin_ia32_vcvtdq2ph_v16si_mask_round ((__v16si) __A, + _mm256_setzero_ph (), + (__mmask16) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtepi32_ph (__m256h __A, __mmask16 __B, __m512i __C) +{ + return __builtin_ia32_vcvtdq2ph_v16si_mask_round ((__v16si) __C, + __A, + __B, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtepi32_ph (__mmask16 __A, __m512i __B) +{ + return __builtin_ia32_vcvtdq2ph_v16si_mask_round ((__v16si) __B, + _mm256_setzero_ph (), + __A, + _MM_FROUND_CUR_DIRECTION); +} + +#ifdef __OPTIMIZE__ +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvt_roundepi32_ph (__m512i __A, int __B) +{ + 
return __builtin_ia32_vcvtdq2ph_v16si_mask_round ((__v16si) __A, + _mm256_setzero_ph (), + (__mmask16) -1, + __B); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvt_roundepi32_ph (__m256h __A, __mmask16 __B, __m512i __C, int __D) +{ + return __builtin_ia32_vcvtdq2ph_v16si_mask_round ((__v16si) __C, + __A, + __B, + __D); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvt_roundepi32_ph (__mmask16 __A, __m512i __B, int __C) +{ + return __builtin_ia32_vcvtdq2ph_v16si_mask_round ((__v16si) __B, + _mm256_setzero_ph (), + __A, + __C); +} + +#else +#define _mm512_cvt_roundepi32_ph(A, B) \ + (__builtin_ia32_vcvtdq2ph_v16si_mask_round ((__v16si)(A), \ + _mm256_setzero_ph (), \ + (__mmask16)-1, \ + (B))) + +#define _mm512_mask_cvt_roundepi32_ph(A, B, C, D) \ + (__builtin_ia32_vcvtdq2ph_v16si_mask_round ((__v16si)(C), \ + (A), \ + (B), \ + (D))) + +#define _mm512_maskz_cvt_roundepi32_ph(A, B, C) \ + (__builtin_ia32_vcvtdq2ph_v16si_mask_round ((__v16si)(B), \ + _mm256_setzero_ph (), \ + (A), \ + (C))) + +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vcvtudq2ph. 
*/ +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtepu32_ph (__m512i __A) +{ + return __builtin_ia32_vcvtudq2ph_v16si_mask_round ((__v16si) __A, + _mm256_setzero_ph (), + (__mmask16) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtepu32_ph (__m256h __A, __mmask16 __B, __m512i __C) +{ + return __builtin_ia32_vcvtudq2ph_v16si_mask_round ((__v16si) __C, + __A, + __B, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtepu32_ph (__mmask16 __A, __m512i __B) +{ + return __builtin_ia32_vcvtudq2ph_v16si_mask_round ((__v16si) __B, + _mm256_setzero_ph (), + __A, + _MM_FROUND_CUR_DIRECTION); +} + +#ifdef __OPTIMIZE__ +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvt_roundepu32_ph (__m512i __A, int __B) +{ + return __builtin_ia32_vcvtudq2ph_v16si_mask_round ((__v16si) __A, + _mm256_setzero_ph (), + (__mmask16) -1, + __B); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvt_roundepu32_ph (__m256h __A, __mmask16 __B, __m512i __C, int __D) +{ + return __builtin_ia32_vcvtudq2ph_v16si_mask_round ((__v16si) __C, + __A, + __B, + __D); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvt_roundepu32_ph (__mmask16 __A, __m512i __B, int __C) +{ + return __builtin_ia32_vcvtudq2ph_v16si_mask_round ((__v16si) __B, + _mm256_setzero_ph (), + __A, + __C); +} + +#else +#define _mm512_cvt_roundepu32_ph(A, B) \ + (__builtin_ia32_vcvtudq2ph_v16si_mask_round ((__v16si)(A), \ + _mm256_setzero_ph (), \ + (__mmask16)-1, \ + B)) + +#define _mm512_mask_cvt_roundepu32_ph(A, B, C, D) \ + (__builtin_ia32_vcvtudq2ph_v16si_mask_round ((__v16si)C, \ + A, \ + B, \ + D)) + +#define 
_mm512_maskz_cvt_roundepu32_ph(A, B, C) \ + (__builtin_ia32_vcvtudq2ph_v16si_mask_round ((__v16si)B, \ + _mm256_setzero_ph (), \ + A, \ + C)) + +#endif /* __OPTIMIZE__ */ + /* Intrinsics vcvtph2qq. */ extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) @@ -2853,6 +3019,166 @@ _mm512_maskz_cvt_roundph_epu64 (__mmask8 __A, __m128h __B, int __C) #endif /* __OPTIMIZE__ */ +/* Intrinsics vcvtqq2ph. */ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtepi64_ph (__m512i __A) +{ + return __builtin_ia32_vcvtqq2ph_v8di_mask_round ((__v8di) __A, + _mm_setzero_ph (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtepi64_ph (__m128h __A, __mmask8 __B, __m512i __C) +{ + return __builtin_ia32_vcvtqq2ph_v8di_mask_round ((__v8di) __C, + __A, + __B, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtepi64_ph (__mmask8 __A, __m512i __B) +{ + return __builtin_ia32_vcvtqq2ph_v8di_mask_round ((__v8di) __B, + _mm_setzero_ph (), + __A, + _MM_FROUND_CUR_DIRECTION); +} + +#ifdef __OPTIMIZE__ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvt_roundepi64_ph (__m512i __A, int __B) +{ + return __builtin_ia32_vcvtqq2ph_v8di_mask_round ((__v8di) __A, + _mm_setzero_ph (), + (__mmask8) -1, + __B); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvt_roundepi64_ph (__m128h __A, __mmask8 __B, __m512i __C, int __D) +{ + return __builtin_ia32_vcvtqq2ph_v8di_mask_round ((__v8di) __C, + __A, + __B, + __D); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvt_roundepi64_ph (__mmask8 __A, __m512i __B, int __C) +{ + return 
__builtin_ia32_vcvtqq2ph_v8di_mask_round ((__v8di) __B, + _mm_setzero_ph (), + __A, + __C); +} + +#else +#define _mm512_cvt_roundepi64_ph(A, B) \ + (__builtin_ia32_vcvtqq2ph_v8di_mask_round ((__v8di)(A), \ + _mm_setzero_ph (), \ + (__mmask8)-1, \ + (B))) + +#define _mm512_mask_cvt_roundepi64_ph(A, B, C, D) \ + (__builtin_ia32_vcvtqq2ph_v8di_mask_round ((__v8di)(C), (A), (B), (D))) + +#define _mm512_maskz_cvt_roundepi64_ph(A, B, C) \ + (__builtin_ia32_vcvtqq2ph_v8di_mask_round ((__v8di)(B), \ + _mm_setzero_ph (), \ + (A), \ + (C))) + +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vcvtuqq2ph. */ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtepu64_ph (__m512i __A) +{ + return __builtin_ia32_vcvtuqq2ph_v8di_mask_round ((__v8di) __A, + _mm_setzero_ph (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtepu64_ph (__m128h __A, __mmask8 __B, __m512i __C) +{ + return __builtin_ia32_vcvtuqq2ph_v8di_mask_round ((__v8di) __C, + __A, + __B, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtepu64_ph (__mmask8 __A, __m512i __B) +{ + return __builtin_ia32_vcvtuqq2ph_v8di_mask_round ((__v8di) __B, + _mm_setzero_ph (), + __A, + _MM_FROUND_CUR_DIRECTION); +} + +#ifdef __OPTIMIZE__ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvt_roundepu64_ph (__m512i __A, int __B) +{ + return __builtin_ia32_vcvtuqq2ph_v8di_mask_round ((__v8di) __A, + _mm_setzero_ph (), + (__mmask8) -1, + __B); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvt_roundepu64_ph (__m128h __A, __mmask8 __B, __m512i __C, int __D) +{ + return __builtin_ia32_vcvtuqq2ph_v8di_mask_round ((__v8di) __C, + __A, + __B, + __D); +} + +extern __inline 
__m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvt_roundepu64_ph (__mmask8 __A, __m512i __B, int __C) +{ + return __builtin_ia32_vcvtuqq2ph_v8di_mask_round ((__v8di) __B, + _mm_setzero_ph (), + __A, + __C); +} + +#else +#define _mm512_cvt_roundepu64_ph(A, B) \ + (__builtin_ia32_vcvtuqq2ph_v8di_mask_round ((__v8di)(A), \ + _mm_setzero_ph (), \ + (__mmask8)-1, \ + (B))) + +#define _mm512_mask_cvt_roundepu64_ph(A, B, C, D) \ + (__builtin_ia32_vcvtuqq2ph_v8di_mask_round ((__v8di)(C), (A), (B), (D))) + +#define _mm512_maskz_cvt_roundepu64_ph(A, B, C) \ + (__builtin_ia32_vcvtuqq2ph_v8di_mask_round ((__v8di)(B), \ + _mm_setzero_ph (), \ + (A), \ + (C))) + +#endif /* __OPTIMIZE__ */ + /* Intrinsics vcvtph2w. */ extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) @@ -3037,6 +3363,172 @@ _mm512_maskz_cvt_roundph_epu16 (__mmask32 __A, __m512h __B, int __C) #endif /* __OPTIMIZE__ */ +/* Intrinsics vcvtw2ph. */ +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtepi16_ph (__m512i __A) +{ + return __builtin_ia32_vcvtw2ph_v32hi_mask_round ((__v32hi) __A, + _mm512_setzero_ph (), + (__mmask32) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtepi16_ph (__m512h __A, __mmask32 __B, __m512i __C) +{ + return __builtin_ia32_vcvtw2ph_v32hi_mask_round ((__v32hi) __C, + __A, + __B, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtepi16_ph (__mmask32 __A, __m512i __B) +{ + return __builtin_ia32_vcvtw2ph_v32hi_mask_round ((__v32hi) __B, + _mm512_setzero_ph (), + __A, + _MM_FROUND_CUR_DIRECTION); +} + +#ifdef __OPTIMIZE__ +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvt_roundepi16_ph (__m512i __A, int __B) +{ + 
return __builtin_ia32_vcvtw2ph_v32hi_mask_round ((__v32hi) __A, + _mm512_setzero_ph (), + (__mmask32) -1, + __B); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvt_roundepi16_ph (__m512h __A, __mmask32 __B, __m512i __C, int __D) +{ + return __builtin_ia32_vcvtw2ph_v32hi_mask_round ((__v32hi) __C, + __A, + __B, + __D); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvt_roundepi16_ph (__mmask32 __A, __m512i __B, int __C) +{ + return __builtin_ia32_vcvtw2ph_v32hi_mask_round ((__v32hi) __B, + _mm512_setzero_ph (), + __A, + __C); +} + +#else +#define _mm512_cvt_roundepi16_ph(A, B) \ + (__builtin_ia32_vcvtw2ph_v32hi_mask_round ((__v32hi)(A), \ + _mm512_setzero_ph (), \ + (__mmask32)-1, \ + (B))) + +#define _mm512_mask_cvt_roundepi16_ph(A, B, C, D) \ + (__builtin_ia32_vcvtw2ph_v32hi_mask_round ((__v32hi)(C), \ + (A), \ + (B), \ + (D))) + +#define _mm512_maskz_cvt_roundepi16_ph(A, B, C) \ + (__builtin_ia32_vcvtw2ph_v32hi_mask_round ((__v32hi)(B), \ + _mm512_setzero_ph (), \ + (A), \ + (C))) + +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vcvtuw2ph. 
*/ +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtepu16_ph (__m512i __A) +{ + return __builtin_ia32_vcvtuw2ph_v32hi_mask_round ((__v32hi) __A, + _mm512_setzero_ph (), + (__mmask32) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtepu16_ph (__m512h __A, __mmask32 __B, __m512i __C) +{ + return __builtin_ia32_vcvtuw2ph_v32hi_mask_round ((__v32hi) __C, + __A, + __B, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtepu16_ph (__mmask32 __A, __m512i __B) +{ + return __builtin_ia32_vcvtuw2ph_v32hi_mask_round ((__v32hi) __B, + _mm512_setzero_ph (), + __A, + _MM_FROUND_CUR_DIRECTION); +} + +#ifdef __OPTIMIZE__ +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvt_roundepu16_ph (__m512i __A, int __B) +{ + return __builtin_ia32_vcvtuw2ph_v32hi_mask_round ((__v32hi) __A, + _mm512_setzero_ph (), + (__mmask32) -1, + __B); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvt_roundepu16_ph (__m512h __A, __mmask32 __B, __m512i __C, int __D) +{ + return __builtin_ia32_vcvtuw2ph_v32hi_mask_round ((__v32hi) __C, + __A, + __B, + __D); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvt_roundepu16_ph (__mmask32 __A, __m512i __B, int __C) +{ + return __builtin_ia32_vcvtuw2ph_v32hi_mask_round ((__v32hi) __B, + _mm512_setzero_ph (), + __A, + __C); +} + +#else +#define _mm512_cvt_roundepu16_ph(A, B) \ + (__builtin_ia32_vcvtuw2ph_v32hi_mask_round ((__v32hi)(A), \ + _mm512_setzero_ph (), \ + (__mmask32)-1, \ + (B))) + +#define _mm512_mask_cvt_roundepu16_ph(A, B, C, D) \ + (__builtin_ia32_vcvtuw2ph_v32hi_mask_round ((__v32hi)(C), \ + (A), \ + (B), \ + (D))) + +#define 
_mm512_maskz_cvt_roundepu16_ph(A, B, C) \ + (__builtin_ia32_vcvtuw2ph_v32hi_mask_round ((__v32hi)(B), \ + _mm512_setzero_ph (), \ + (A), \ + (C))) + +#endif /* __OPTIMIZE__ */ + #ifdef __DISABLE_AVX512FP16__ #undef __DISABLE_AVX512FP16__ #pragma GCC pop_options diff --git a/gcc/config/i386/avx512fp16vlintrin.h b/gcc/config/i386/avx512fp16vlintrin.h index 8a7e0aaa6b1..93d9ff8bf3c 100644 --- a/gcc/config/i386/avx512fp16vlintrin.h +++ b/gcc/config/i386/avx512fp16vlintrin.h @@ -1050,6 +1050,110 @@ _mm256_maskz_cvtph_epu32 (__mmask8 __A, __m128h __B) __A); } +/* Intrinsics vcvtdq2ph. */ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtepi32_ph (__m128i __A) +{ + return __builtin_ia32_vcvtdq2ph_v4si_mask ((__v4si) __A, + _mm_setzero_ph (), + (__mmask8) -1); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvtepi32_ph (__m128h __A, __mmask8 __B, __m128i __C) +{ + return __builtin_ia32_vcvtdq2ph_v4si_mask ((__v4si) __C, __A, __B); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvtepi32_ph (__mmask8 __A, __m128i __B) +{ + return __builtin_ia32_vcvtdq2ph_v4si_mask ((__v4si) __B, + _mm_setzero_ph (), + __A); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtepi32_ph (__m256i __A) +{ + return __builtin_ia32_vcvtdq2ph_v8si_mask ((__v8si) __A, + _mm_setzero_ph (), + (__mmask8) -1); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvtepi32_ph (__m128h __A, __mmask8 __B, __m256i __C) +{ + return __builtin_ia32_vcvtdq2ph_v8si_mask ((__v8si) __C, __A, __B); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvtepi32_ph (__mmask8 __A, __m256i __B) +{ + return __builtin_ia32_vcvtdq2ph_v8si_mask ((__v8si) __B, + _mm_setzero_ph (), 
+ __A); +} + +/* Intrinsics vcvtudq2ph. */ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtepu32_ph (__m128i __A) +{ + return __builtin_ia32_vcvtudq2ph_v4si_mask ((__v4si) __A, + _mm_setzero_ph (), + (__mmask8) -1); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvtepu32_ph (__m128h __A, __mmask8 __B, __m128i __C) +{ + return __builtin_ia32_vcvtudq2ph_v4si_mask ((__v4si) __C, + __A, + __B); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvtepu32_ph (__mmask8 __A, __m128i __B) +{ + return __builtin_ia32_vcvtudq2ph_v4si_mask ((__v4si) __B, + _mm_setzero_ph (), + __A); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtepu32_ph (__m256i __A) +{ + return __builtin_ia32_vcvtudq2ph_v8si_mask ((__v8si) __A, + _mm_setzero_ph (), + (__mmask8) -1); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvtepu32_ph (__m128h __A, __mmask8 __B, __m256i __C) +{ + return __builtin_ia32_vcvtudq2ph_v8si_mask ((__v8si) __C, __A, __B); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvtepu32_ph (__mmask8 __A, __m256i __B) +{ + return __builtin_ia32_vcvtudq2ph_v8si_mask ((__v8si) __B, + _mm_setzero_ph (), + __A); +} + /* Intrinsics vcvtph2qq. */ extern __inline __m128i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) @@ -1153,6 +1257,108 @@ _mm256_maskz_cvtph_epu64 (__mmask8 __A, __m128h __B) __A); } +/* Intrinsics vcvtqq2ph. 
*/ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtepi64_ph (__m128i __A) +{ + return __builtin_ia32_vcvtqq2ph_v2di_mask ((__v2di) __A, + _mm_setzero_ph (), + (__mmask8) -1); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvtepi64_ph (__m128h __A, __mmask8 __B, __m128i __C) +{ + return __builtin_ia32_vcvtqq2ph_v2di_mask ((__v2di) __C, __A, __B); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvtepi64_ph (__mmask8 __A, __m128i __B) +{ + return __builtin_ia32_vcvtqq2ph_v2di_mask ((__v2di) __B, + _mm_setzero_ph (), + __A); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtepi64_ph (__m256i __A) +{ + return __builtin_ia32_vcvtqq2ph_v4di_mask ((__v4di) __A, + _mm_setzero_ph (), + (__mmask8) -1); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvtepi64_ph (__m128h __A, __mmask8 __B, __m256i __C) +{ + return __builtin_ia32_vcvtqq2ph_v4di_mask ((__v4di) __C, __A, __B); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvtepi64_ph (__mmask8 __A, __m256i __B) +{ + return __builtin_ia32_vcvtqq2ph_v4di_mask ((__v4di) __B, + _mm_setzero_ph (), + __A); +} + +/* Intrinsics vcvtuqq2ph. 
*/ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtepu64_ph (__m128i __A) +{ + return __builtin_ia32_vcvtuqq2ph_v2di_mask ((__v2di) __A, + _mm_setzero_ph (), + (__mmask8) -1); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvtepu64_ph (__m128h __A, __mmask8 __B, __m128i __C) +{ + return __builtin_ia32_vcvtuqq2ph_v2di_mask ((__v2di) __C, __A, __B); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvtepu64_ph (__mmask8 __A, __m128i __B) +{ + return __builtin_ia32_vcvtuqq2ph_v2di_mask ((__v2di) __B, + _mm_setzero_ph (), + __A); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtepu64_ph (__m256i __A) +{ + return __builtin_ia32_vcvtuqq2ph_v4di_mask ((__v4di) __A, + _mm_setzero_ph (), + (__mmask8) -1); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvtepu64_ph (__m128h __A, __mmask8 __B, __m256i __C) +{ + return __builtin_ia32_vcvtuqq2ph_v4di_mask ((__v4di) __C, __A, __B); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvtepu64_ph (__mmask8 __A, __m256i __B) +{ + return __builtin_ia32_vcvtuqq2ph_v4di_mask ((__v4di) __B, + _mm_setzero_ph (), + __A); +} + /* Intrinsics vcvtph2w. */ extern __inline __m128i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) @@ -1275,6 +1481,112 @@ _mm256_maskz_cvtph_epu16 (__mmask16 __A, __m256h __B) __A); } +/* Intrinsics vcvtw2ph. 
*/ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtepi16_ph (__m128i __A) +{ + return __builtin_ia32_vcvtw2ph_v8hi_mask ((__v8hi) __A, + _mm_setzero_ph (), + (__mmask8) -1); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvtepi16_ph (__m128h __A, __mmask8 __B, __m128i __C) +{ + return __builtin_ia32_vcvtw2ph_v8hi_mask ((__v8hi) __C, + __A, + __B); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvtepi16_ph (__mmask8 __A, __m128i __B) +{ + return __builtin_ia32_vcvtw2ph_v8hi_mask ((__v8hi) __B, + _mm_setzero_ph (), + __A); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtepi16_ph (__m256i __A) +{ + return __builtin_ia32_vcvtw2ph_v16hi_mask ((__v16hi) __A, + _mm256_setzero_ph (), + (__mmask16) -1); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvtepi16_ph (__m256h __A, __mmask16 __B, __m256i __C) +{ + return __builtin_ia32_vcvtw2ph_v16hi_mask ((__v16hi) __C, + __A, + __B); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvtepi16_ph (__mmask16 __A, __m256i __B) +{ + return __builtin_ia32_vcvtw2ph_v16hi_mask ((__v16hi) __B, + _mm256_setzero_ph (), + __A); +} + +/* Intrinsics vcvtuw2ph. 
*/ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtepu16_ph (__m128i __A) +{ + return __builtin_ia32_vcvtuw2ph_v8hi_mask ((__v8hi) __A, + _mm_setzero_ph (), + (__mmask8) -1); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvtepu16_ph (__m128h __A, __mmask8 __B, __m128i __C) +{ + return __builtin_ia32_vcvtuw2ph_v8hi_mask ((__v8hi) __C, __A, __B); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvtepu16_ph (__mmask8 __A, __m128i __B) +{ + return __builtin_ia32_vcvtuw2ph_v8hi_mask ((__v8hi) __B, + _mm_setzero_ph (), + __A); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtepu16_ph (__m256i __A) +{ + return __builtin_ia32_vcvtuw2ph_v16hi_mask ((__v16hi) __A, + _mm256_setzero_ph (), + (__mmask16) -1); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvtepu16_ph (__m256h __A, __mmask16 __B, __m256i __C) +{ + return __builtin_ia32_vcvtuw2ph_v16hi_mask ((__v16hi) __C, __A, __B); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvtepu16_ph (__mmask16 __A, __m256i __B) +{ + return __builtin_ia32_vcvtuw2ph_v16hi_mask ((__v16hi) __B, + _mm256_setzero_ph (), + __A); +} + #ifdef __DISABLE_AVX512FP16VL__ #undef __DISABLE_AVX512FP16VL__ #pragma GCC pop_options diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def index c430dc9ab48..57b9ea786e1 100644 --- a/gcc/config/i386/i386-builtin-types.def +++ b/gcc/config/i386/i386-builtin-types.def @@ -1316,6 +1316,11 @@ DEF_FUNCTION_TYPE (V4DI, V8HF, V4DI, UQI) DEF_FUNCTION_TYPE (V4SI, V8HF, V4SI, UQI) DEF_FUNCTION_TYPE (V8SI, V8HF, V8SI, UQI) DEF_FUNCTION_TYPE (V8HI, V8HF, V8HI, UQI) +DEF_FUNCTION_TYPE (V8HF, V4SI, V8HF, UQI) 
+DEF_FUNCTION_TYPE (V8HF, V8SI, V8HF, UQI) +DEF_FUNCTION_TYPE (V8HF, V2DI, V8HF, UQI) +DEF_FUNCTION_TYPE (V8HF, V4DI, V8HF, UQI) +DEF_FUNCTION_TYPE (V8HF, V8HI, V8HF, UQI) DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, UQI) DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, INT) DEF_FUNCTION_TYPE (V8HF, V8HF, INT, V8HF, UQI) @@ -1323,18 +1328,22 @@ DEF_FUNCTION_TYPE (UQI, V8HF, V8HF, INT, UQI) DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, V8HF, UQI) DEF_FUNCTION_TYPE (UQI, V8HF, V8HF, INT, UQI, INT) DEF_FUNCTION_TYPE (V8DI, V8HF, V8DI, UQI, INT) +DEF_FUNCTION_TYPE (V8HF, V8DI, V8HF, UQI, INT) DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, V8HF, UQI, INT) DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, INT, V8HF, UQI, INT) DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF) DEF_FUNCTION_TYPE (V16HI, V16HF, V16HI, UHI) +DEF_FUNCTION_TYPE (V16HF, V16HI, V16HF, UHI) DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF, UHI) DEF_FUNCTION_TYPE (V16SI, V16HF, V16SI, UHI, INT) DEF_FUNCTION_TYPE (V16HF, V16HF, INT, V16HF, UHI) DEF_FUNCTION_TYPE (UHI, V16HF, V16HF, INT, UHI) +DEF_FUNCTION_TYPE (V16HF, V16SI, V16HF, UHI, INT) DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF, V16HF, UHI) DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, USI) DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, INT) DEF_FUNCTION_TYPE (V32HI, V32HF, V32HI, USI, INT) +DEF_FUNCTION_TYPE (V32HF, V32HI, V32HF, USI, INT) DEF_FUNCTION_TYPE (USI, V32HF, V32HF, INT, USI) DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, USI, INT) DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, V32HF, USI) diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def index dde8af53ff0..44c55876e48 100644 --- a/gcc/config/i386/i386-builtin.def +++ b/gcc/config/i386/i386-builtin.def @@ -2843,6 +2843,18 @@ BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp1 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2w_v16hi_mask, "__builtin_ia32_vcvtph2w_v16hi_mask", IX86_BUILTIN_VCVTPH2W_V16HI_MASK, UNKNOWN, (int) V16HI_FTYPE_V16HF_V16HI_UHI) BDESC (OPTION_MASK_ISA_AVX512VL, 
OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2uw_v8hi_mask, "__builtin_ia32_vcvtph2uw_v8hi_mask", IX86_BUILTIN_VCVTPH2UW_V8HI_MASK, UNKNOWN, (int) V8HI_FTYPE_V8HF_V8HI_UQI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2uw_v16hi_mask, "__builtin_ia32_vcvtph2uw_v16hi_mask", IX86_BUILTIN_VCVTPH2UW_V16HI_MASK, UNKNOWN, (int) V16HI_FTYPE_V16HF_V16HI_UHI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtw2ph_v8hi_mask, "__builtin_ia32_vcvtw2ph_v8hi_mask", IX86_BUILTIN_VCVTW2PH_V8HI_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HI_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtw2ph_v16hi_mask, "__builtin_ia32_vcvtw2ph_v16hi_mask", IX86_BUILTIN_VCVTW2PH_V16HI_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HI_V16HF_UHI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtuw2ph_v8hi_mask, "__builtin_ia32_vcvtuw2ph_v8hi_mask", IX86_BUILTIN_VCVTUW2PH_V8HI_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HI_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtuw2ph_v16hi_mask, "__builtin_ia32_vcvtuw2ph_v16hi_mask", IX86_BUILTIN_VCVTUW2PH_V16HI_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HI_V16HF_UHI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtdq2ph_v4si_mask, "__builtin_ia32_vcvtdq2ph_v4si_mask", IX86_BUILTIN_VCVTDQ2PH_V4SI_MASK, UNKNOWN, (int) V8HF_FTYPE_V4SI_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtdq2ph_v8si_mask, "__builtin_ia32_vcvtdq2ph_v8si_mask", IX86_BUILTIN_VCVTDQ2PH_V8SI_MASK, UNKNOWN, (int) V8HF_FTYPE_V8SI_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtudq2ph_v4si_mask, "__builtin_ia32_vcvtudq2ph_v4si_mask", IX86_BUILTIN_VCVTUDQ2PH_V4SI_MASK, UNKNOWN, (int) V8HF_FTYPE_V4SI_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, 
CODE_FOR_avx512fp16_vcvtudq2ph_v8si_mask, "__builtin_ia32_vcvtudq2ph_v8si_mask", IX86_BUILTIN_VCVTUDQ2PH_V8SI_MASK, UNKNOWN, (int) V8HF_FTYPE_V8SI_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtqq2ph_v2di_mask, "__builtin_ia32_vcvtqq2ph_v2di_mask", IX86_BUILTIN_VCVTQQ2PH_V2DI_MASK, UNKNOWN, (int) V8HF_FTYPE_V2DI_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtqq2ph_v4di_mask, "__builtin_ia32_vcvtqq2ph_v4di_mask", IX86_BUILTIN_VCVTQQ2PH_V4DI_MASK, UNKNOWN, (int) V8HF_FTYPE_V4DI_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtuqq2ph_v2di_mask, "__builtin_ia32_vcvtuqq2ph_v2di_mask", IX86_BUILTIN_VCVTUQQ2PH_V2DI_MASK, UNKNOWN, (int) V8HF_FTYPE_V2DI_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtuqq2ph_v4di_mask, "__builtin_ia32_vcvtuqq2ph_v4di_mask", IX86_BUILTIN_VCVTUQQ2PH_V4DI_MASK, UNKNOWN, (int) V8HF_FTYPE_V4DI_V8HF_UQI) /* Builtins with rounding support. 
*/ BDESC_END (ARGS, ROUND_ARGS) @@ -3076,6 +3088,12 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2qq_v8di_mask_r BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2uqq_v8di_mask_round, "__builtin_ia32_vcvtph2uqq_v8di_mask_round", IX86_BUILTIN_VCVTPH2UQQ_V8DI_MASK_ROUND, UNKNOWN, (int) V8DI_FTYPE_V8HF_V8DI_UQI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2w_v32hi_mask_round, "__builtin_ia32_vcvtph2w_v32hi_mask_round", IX86_BUILTIN_VCVTPH2W_V32HI_MASK_ROUND, UNKNOWN, (int) V32HI_FTYPE_V32HF_V32HI_USI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2uw_v32hi_mask_round, "__builtin_ia32_vcvtph2uw_v32hi_mask_round", IX86_BUILTIN_VCVTPH2UW_V32HI_MASK_ROUND, UNKNOWN, (int) V32HI_FTYPE_V32HF_V32HI_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtw2ph_v32hi_mask_round, "__builtin_ia32_vcvtw2ph_v32hi_mask_round", IX86_BUILTIN_VCVTW2PH_V32HI_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HI_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtuw2ph_v32hi_mask_round, "__builtin_ia32_vcvtuw2ph_v32hi_mask_round", IX86_BUILTIN_VCVTUW2PH_V32HI_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HI_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtdq2ph_v16si_mask_round, "__builtin_ia32_vcvtdq2ph_v16si_mask_round", IX86_BUILTIN_VCVTDQ2PH_V16SI_MASK_ROUND, UNKNOWN, (int) V16HF_FTYPE_V16SI_V16HF_UHI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtudq2ph_v16si_mask_round, "__builtin_ia32_vcvtudq2ph_v16si_mask_round", IX86_BUILTIN_VCVTUDQ2PH_V16SI_MASK_ROUND, UNKNOWN, (int) V16HF_FTYPE_V16SI_V16HF_UHI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtqq2ph_v8di_mask_round, "__builtin_ia32_vcvtqq2ph_v8di_mask_round", IX86_BUILTIN_VCVTQQ2PH_V8DI_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8DI_V8HF_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtuqq2ph_v8di_mask_round, 
"__builtin_ia32_vcvtuqq2ph_v8di_mask_round", IX86_BUILTIN_VCVTUQQ2PH_V8DI_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8DI_V8HF_UQI_INT) BDESC_END (ROUND_ARGS, MULTI_ARG) diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c index 59d1f4f5eea..7d9e1bd6a2d 100644 --- a/gcc/config/i386/i386-expand.c +++ b/gcc/config/i386/i386-expand.c @@ -9574,6 +9574,11 @@ ix86_expand_args_builtin (const struct builtin_description *d, case V2DI_FTYPE_V8HF_V2DI_UQI: case V2DI_FTYPE_V4SF_V2DI_UQI: case V8HF_FTYPE_V8HF_V8HF_UQI: + case V8HF_FTYPE_V8HI_V8HF_UQI: + case V8HF_FTYPE_V8SI_V8HF_UQI: + case V8HF_FTYPE_V4SI_V8HF_UQI: + case V8HF_FTYPE_V4DI_V8HF_UQI: + case V8HF_FTYPE_V2DI_V8HF_UQI: case V4SF_FTYPE_V4DI_V4SF_UQI: case V4SF_FTYPE_V2DI_V4SF_UQI: case V4DF_FTYPE_V4DI_V4DF_UQI: @@ -9640,6 +9645,7 @@ ix86_expand_args_builtin (const struct builtin_description *d, case V8DI_FTYPE_DI_V8DI_UQI: case V16SF_FTYPE_V8SF_V16SF_UHI: case V16SI_FTYPE_V8SI_V16SI_UHI: + case V16HF_FTYPE_V16HI_V16HF_UHI: case V16HI_FTYPE_V16HF_V16HI_UHI: case V16HI_FTYPE_V16HI_V16HI_UHI: case V8HI_FTYPE_V16QI_V8HI_UQI: @@ -10513,16 +10519,19 @@ ix86_expand_round_builtin (const struct builtin_description *d, case V8DI_FTYPE_V8DF_V8DI_QI_INT: case V8SF_FTYPE_V8DI_V8SF_QI_INT: case V8DF_FTYPE_V8DI_V8DF_QI_INT: + case V32HF_FTYPE_V32HI_V32HF_USI_INT: case V32HF_FTYPE_V32HF_V32HF_USI_INT: case V16SF_FTYPE_V16SF_V16SF_HI_INT: case V8DI_FTYPE_V8SF_V8DI_QI_INT: case V16SF_FTYPE_V16SI_V16SF_HI_INT: case V16SI_FTYPE_V16SF_V16SI_HI_INT: case V16SI_FTYPE_V16HF_V16SI_UHI_INT: + case V16HF_FTYPE_V16SI_V16HF_UHI_INT: case V8DF_FTYPE_V8SF_V8DF_QI_INT: case V16SF_FTYPE_V16HI_V16SF_HI_INT: case V2DF_FTYPE_V2DF_V2DF_V2DF_INT: case V4SF_FTYPE_V4SF_V4SF_V4SF_INT: + case V8HF_FTYPE_V8DI_V8HF_UQI_INT: nargs = 4; break; case V4SF_FTYPE_V4SF_V4SF_INT_INT: diff --git a/gcc/config/i386/i386-modes.def b/gcc/config/i386/i386-modes.def index fcadfcd4c94..699f9a234c9 100644 --- a/gcc/config/i386/i386-modes.def +++ 
b/gcc/config/i386/i386-modes.def @@ -90,6 +90,8 @@ VECTOR_MODES (FLOAT, 32); /* V16HF V8SF V4DF V2TF */ VECTOR_MODES (FLOAT, 64); /* V32HF V16SF V8DF V4TF */ VECTOR_MODES (FLOAT, 128); /* V64HF V32SF V16DF V8TF */ VECTOR_MODES (FLOAT, 256); /* V128HF V64SF V32DF V16TF */ +VECTOR_MODE (FLOAT, HF, 2) /* V2HF */ +VECTOR_MODE (FLOAT, HF, 6) /* V6HF */ VECTOR_MODE (INT, TI, 1); /* V1TI */ VECTOR_MODE (INT, DI, 1); /* V1DI */ VECTOR_MODE (INT, SI, 1); /* V1SI */ diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 7b705422396..8b23048a232 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -494,6 +494,11 @@ (define_mode_iterator VI48_AVX512F_AVX512VL (define_mode_iterator VI2_AVX512VL [(V8HI "TARGET_AVX512VL") (V16HI "TARGET_AVX512VL") V32HI]) +(define_mode_iterator VI2H_AVX512VL + [(V8HI "TARGET_AVX512VL") (V16HI "TARGET_AVX512VL") V32HI + (V8SI "TARGET_AVX512VL") V16SI + V8DI ]) + (define_mode_iterator VI1_AVX512VL_F [V32QI (V16QI "TARGET_AVX512VL") (V64QI "TARGET_AVX512F")]) @@ -895,9 +900,9 @@ (define_mode_attr avx512fmaskhalfmode ;; Mapping of vector float modes to an integer mode of the same size (define_mode_attr sseintvecmode - [(V16SF "V16SI") (V8DF "V8DI") - (V8SF "V8SI") (V4DF "V4DI") - (V4SF "V4SI") (V2DF "V2DI") + [(V32HF "V32HI") (V16SF "V16SI") (V8DF "V8DI") + (V16HF "V16HI") (V8SF "V8SI") (V4DF "V4DI") + (V8HF "V8HI") (V4SF "V4SI") (V2DF "V2DI") (V16SI "V16SI") (V8DI "V8DI") (V8SI "V8SI") (V4DI "V4DI") (V4SI "V4SI") (V2DI "V2DI") @@ -5432,6 +5437,11 @@ (define_int_attr sseintconvertsignprefix [(UNSPEC_UNSIGNED_FIX_NOTRUNC "u") (UNSPEC_FIX_NOTRUNC "")]) +(define_mode_attr qq2phsuff + [(V32HI "") (V16HI "") (V8HI "") + (V16SI "") (V8SI "{y}") (V4SI "{x}") + (V8DI "{z}") (V4DI "{y}") (V2DI "{x}")]) + (define_insn "avx512fp16_vcvtph2_" [(set (match_operand:VI248_AVX512VL 0 "register_operand" "=v") (unspec:VI248_AVX512VL @@ -5443,6 +5453,143 @@ (define_insn "avx512fp16_vcvtph2_< (set_attr "prefix" "evex") (set_attr "mode" "")]) 
+(define_insn "avx512fp16_vcvt2ph_" + [(set (match_operand: 0 "register_operand" "=v") + (any_float: + (match_operand:VI2H_AVX512VL 1 "" "")))] + "TARGET_AVX512FP16" + "vcvt2ph\t{%1, %0|%0, %1}" + [(set_attr "type" "ssecvt") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + +(define_expand "avx512fp16_vcvt2ph_" + [(set (match_operand:V8HF 0 "register_operand" "=v") + (vec_concat:V8HF + (any_float:V4HF (match_operand:VI4_128_8_256 1 "vector_operand" "vm")) + (match_dup 2)))] + "TARGET_AVX512FP16 && TARGET_AVX512VL" + "operands[2] = CONST0_RTX (V4HFmode);") + +(define_insn "*avx512fp16_vcvt2ph_" + [(set (match_operand:V8HF 0 "register_operand" "=v") + (vec_concat:V8HF + (any_float:V4HF (match_operand:VI4_128_8_256 1 "vector_operand" "vm")) + (match_operand:V4HF 2 "const0_operand" "C")))] + "TARGET_AVX512FP16 && TARGET_AVX512VL" + "vcvt2ph\t{%1, %0|%0, %1}" + [(set_attr "type" "ssecvt") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + +(define_expand "avx512fp16_vcvt2ph__mask" + [(set (match_operand:V8HF 0 "register_operand" "=v") + (vec_concat:V8HF + (vec_merge:V4HF + (any_float:V4HF (match_operand:VI4_128_8_256 1 "vector_operand" "vm")) + (vec_select:V4HF + (match_operand:V8HF 2 "nonimm_or_0_operand" "0C") + (parallel [(const_int 0) (const_int 1) (const_int 2) (const_int 3)])) + (match_operand:QI 3 "register_operand" "Yk")) + (match_dup 4)))] + "TARGET_AVX512FP16 && TARGET_AVX512VL" + "operands[4] = CONST0_RTX (V4HFmode);") + +(define_insn "*avx512fp16_vcvt2ph__mask" + [(set (match_operand:V8HF 0 "register_operand" "=v") + (vec_concat:V8HF + (vec_merge:V4HF + (any_float:V4HF (match_operand:VI4_128_8_256 1 "vector_operand" "vm")) + (vec_select:V4HF + (match_operand:V8HF 2 "nonimm_or_0_operand" "0C") + (parallel [(const_int 0) (const_int 1) (const_int 2) (const_int 3)])) + (match_operand:QI 3 "register_operand" "Yk")) + (match_operand:V4HF 4 "const0_operand" "C")))] + "TARGET_AVX512FP16 && TARGET_AVX512VL" + "vcvt2ph\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}" + 
[(set_attr "type" "ssecvt") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + +(define_insn "*avx512fp16_vcvt2ph__mask_1" + [(set (match_operand:V8HF 0 "register_operand" "=v") + (vec_concat:V8HF + (vec_merge:V4HF + (any_float:V4HF (match_operand:VI4_128_8_256 1 + "vector_operand" "vm")) + (match_operand:V4HF 3 "const0_operand" "C") + (match_operand:QI 2 "register_operand" "Yk")) + (match_operand:V4HF 4 "const0_operand" "C")))] + "TARGET_AVX512FP16 && TARGET_AVX512VL" + "vcvt2ph\t{%1, %0%{%2%}%{z%}|%0%{%2%}%{z%}, %1}" + [(set_attr "type" "ssecvt") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + +(define_expand "avx512fp16_vcvtqq2ph_v2di" + [(set (match_operand:V8HF 0 "register_operand" "=v") + (vec_concat:V8HF + (any_float:V2HF (match_operand:V2DI 1 "vector_operand" "vm")) + (match_dup 2)))] + "TARGET_AVX512FP16 && TARGET_AVX512VL" + "operands[2] = CONST0_RTX (V6HFmode);") + +(define_insn "*avx512fp16_vcvtqq2ph_v2di" + [(set (match_operand:V8HF 0 "register_operand" "=v") + (vec_concat:V8HF + (any_float:V2HF (match_operand:V2DI 1 "vector_operand" "vm")) + (match_operand:V6HF 2 "const0_operand" "C")))] + "TARGET_AVX512FP16 && TARGET_AVX512VL" + "vcvtqq2ph{x}\t{%1, %0|%0, %1}" + [(set_attr "type" "ssecvt") + (set_attr "prefix" "evex") + (set_attr "mode" "TI")]) + +(define_expand "avx512fp16_vcvtqq2ph_v2di_mask" + [(set (match_operand:V8HF 0 "register_operand" "=v") + (vec_concat:V8HF + (vec_merge:V2HF + (any_float:V2HF (match_operand:V2DI 1 "vector_operand" "vm")) + (vec_select:V2HF + (match_operand:V8HF 2 "nonimm_or_0_operand" "0C") + (parallel [(const_int 0) (const_int 1)])) + (match_operand:QI 3 "register_operand" "Yk")) + (match_dup 4)))] + "TARGET_AVX512FP16 && TARGET_AVX512VL" + "operands[4] = CONST0_RTX (V6HFmode);") + +(define_insn "*avx512fp16_vcvtqq2ph_v2di_mask" + [(set (match_operand:V8HF 0 "register_operand" "=v") + (vec_concat:V8HF + (vec_merge:V2HF + (any_float:V2HF (match_operand:V2DI 1 "vector_operand" "vm")) + (vec_select:V2HF + 
(match_operand:V8HF 2 "nonimm_or_0_operand" "0C") + (parallel [(const_int 0) (const_int 1)])) + (match_operand:QI 3 "register_operand" "Yk")) + (match_operand:V6HF 4 "const0_operand" "C")))] + "TARGET_AVX512FP16 && TARGET_AVX512VL" + "vcvtqq2ph{x}\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}" + [(set_attr "type" "ssecvt") + (set_attr "prefix" "evex") + (set_attr "mode" "TI")]) + +(define_insn "*avx512fp16_vcvtqq2ph_v2di_mask_1" + [(set (match_operand:V8HF 0 "register_operand" "=v") + (vec_concat:V8HF + (vec_merge:V2HF + (any_float:V2HF (match_operand:V2DI 1 + "vector_operand" "vm")) + (match_operand:V2HF 3 "const0_operand" "C") + (match_operand:QI 2 "register_operand" "Yk")) + (match_operand:V6HF 4 "const0_operand" "C")))] + "TARGET_AVX512FP16 && TARGET_AVX512VL" + "vcvtqq2ph{x}\t{%1, %0%{%2%}%{z%}|%0%{%2%}%{z%}, %1}" + [(set_attr "type" "ssecvt") + (set_attr "prefix" "evex") + (set_attr "mode" "TI")]) + + ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; ;; Parallel single-precision floating point conversion operations diff --git a/gcc/config/i386/subst.md b/gcc/config/i386/subst.md index ecb158f07e5..2e9c2b38e25 100644 --- a/gcc/config/i386/subst.md +++ b/gcc/config/i386/subst.md @@ -134,6 +134,7 @@ (define_subst_attr "round_mask_op3" "round" "" "") (define_subst_attr "round_mask_op4" "round" "" "") (define_subst_attr "round_sd_mask_op4" "round" "" "") (define_subst_attr "round_constraint" "round" "vm" "v") +(define_subst_attr "round_qq2phsuff" "round" "" "") (define_subst_attr "bcst_round_constraint" "round" "vmBr" "v") (define_subst_attr "round_constraint2" "round" "m" "v") (define_subst_attr "round_constraint3" "round" "rm" "r") diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c index cdfc2e3b69f..b569cc0bdd9 100644 --- a/gcc/testsuite/gcc.target/i386/avx-1.c +++ b/gcc/testsuite/gcc.target/i386/avx-1.c @@ -725,6 +725,12 @@ #define __builtin_ia32_vcvtph2uqq_v8di_mask_round(A, B, C, D) 
__builtin_ia32_vcvtph2uqq_v8di_mask_round(A, B, C, 8) #define __builtin_ia32_vcvtph2w_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtph2w_v32hi_mask_round(A, B, C, 8) #define __builtin_ia32_vcvtph2uw_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtph2uw_v32hi_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvtw2ph_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtw2ph_v32hi_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvtuw2ph_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtuw2ph_v32hi_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvtdq2ph_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvtdq2ph_v16si_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvtudq2ph_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvtudq2ph_v16si_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvtqq2ph_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtqq2ph_v8di_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvtuqq2ph_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtuqq2ph_v8di_mask_round(A, B, C, 8) /* avx512fp16vlintrin.h */ #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D) diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c index 5e4aaf8ce9b..07e59118438 100644 --- a/gcc/testsuite/gcc.target/i386/sse-13.c +++ b/gcc/testsuite/gcc.target/i386/sse-13.c @@ -742,6 +742,12 @@ #define __builtin_ia32_vcvtph2uqq_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtph2uqq_v8di_mask_round(A, B, C, 8) #define __builtin_ia32_vcvtph2w_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtph2w_v32hi_mask_round(A, B, C, 8) #define __builtin_ia32_vcvtph2uw_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtph2uw_v32hi_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvtw2ph_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtw2ph_v32hi_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvtuw2ph_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtuw2ph_v32hi_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvtdq2ph_v16si_mask_round(A, B, C, D) 
__builtin_ia32_vcvtdq2ph_v16si_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvtudq2ph_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvtudq2ph_v16si_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvtqq2ph_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtqq2ph_v8di_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvtuqq2ph_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtuqq2ph_v8di_mask_round(A, B, C, 8) /* avx512fp16vlintrin.h */ #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D) diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c index 32aa4518703..0530192d97e 100644 --- a/gcc/testsuite/gcc.target/i386/sse-14.c +++ b/gcc/testsuite/gcc.target/i386/sse-14.c @@ -684,6 +684,12 @@ test_1 (_mm512_cvt_roundph_epi32, __m512i, __m256h, 8) test_1 (_mm512_cvt_roundph_epu32, __m512i, __m256h, 8) test_1 (_mm512_cvt_roundph_epi64, __m512i, __m128h, 8) test_1 (_mm512_cvt_roundph_epu64, __m512i, __m128h, 8) +test_1 (_mm512_cvt_roundepi16_ph, __m512h, __m512i, 8) +test_1 (_mm512_cvt_roundepu16_ph, __m512h, __m512i, 8) +test_1 (_mm512_cvt_roundepi32_ph, __m256h, __m512i, 8) +test_1 (_mm512_cvt_roundepu32_ph, __m256h, __m512i, 8) +test_1 (_mm512_cvt_roundepi64_ph, __m128h, __m512i, 8) +test_1 (_mm512_cvt_roundepu64_ph, __m128h, __m512i, 8) test_1x (_mm512_reduce_round_ph, __m512h, __m512h, 123, 8) test_1x (_mm512_roundscale_round_ph, __m512h, __m512h, 123, 8) test_1x (_mm512_getmant_ph, __m512h, __m512h, 1, 1) @@ -722,6 +728,12 @@ test_2 (_mm512_maskz_cvt_roundph_epi32, __m512i, __mmask16, __m256h, 8) test_2 (_mm512_maskz_cvt_roundph_epu32, __m512i, __mmask16, __m256h, 8) test_2 (_mm512_maskz_cvt_roundph_epi64, __m512i, __mmask8, __m128h, 8) test_2 (_mm512_maskz_cvt_roundph_epu64, __m512i, __mmask8, __m128h, 8) +test_2 (_mm512_maskz_cvt_roundepi16_ph, __m512h, __mmask32, __m512i, 8) +test_2 (_mm512_maskz_cvt_roundepu16_ph, __m512h, __mmask32, __m512i, 8) +test_2 (_mm512_maskz_cvt_roundepi32_ph, 
__m256h, __mmask16, __m512i, 8) +test_2 (_mm512_maskz_cvt_roundepu32_ph, __m256h, __mmask16, __m512i, 8) +test_2 (_mm512_maskz_cvt_roundepi64_ph, __m128h, __mmask8, __m512i, 8) +test_2 (_mm512_maskz_cvt_roundepu64_ph, __m128h, __mmask8, __m512i, 8) test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8) test_2x (_mm_cmp_round_sh_mask, __mmask8, __m128h, __m128h, 1, 8) test_2x (_mm_comi_round_sh, int, __m128h, __m128h, 1, 8) @@ -766,6 +778,12 @@ test_3 (_mm512_mask_cvt_roundph_epi32, __m512i, __m512i, __mmask16, __m256h, 8) test_3 (_mm512_mask_cvt_roundph_epu32, __m512i, __m512i, __mmask16, __m256h, 8) test_3 (_mm512_mask_cvt_roundph_epi64, __m512i, __m512i, __mmask8, __m128h, 8) test_3 (_mm512_mask_cvt_roundph_epu64, __m512i, __m512i, __mmask8, __m128h, 8) +test_3 (_mm512_mask_cvt_roundepi16_ph, __m512h, __m512h, __mmask32, __m512i, 8) +test_3 (_mm512_mask_cvt_roundepu16_ph, __m512h, __m512h, __mmask32, __m512i, 8) +test_3 (_mm512_mask_cvt_roundepi32_ph, __m256h, __m256h, __mmask16, __m512i, 8) +test_3 (_mm512_mask_cvt_roundepu32_ph, __m256h, __m256h, __mmask16, __m512i, 8) +test_3 (_mm512_mask_cvt_roundepi64_ph, __m128h, __m128h, __mmask8, __m512i, 8) +test_3 (_mm512_mask_cvt_roundepu64_ph, __m128h, __m128h, __mmask8, __m512i, 8) test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8) test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8) test_3x (_mm512_mask_reduce_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8) diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c index 44ac10d602f..04e6340516b 100644 --- a/gcc/testsuite/gcc.target/i386/sse-22.c +++ b/gcc/testsuite/gcc.target/i386/sse-22.c @@ -789,6 +789,12 @@ test_1 (_mm512_cvt_roundph_epi32, __m512i, __m256h, 8) test_1 (_mm512_cvt_roundph_epu32, __m512i, __m256h, 8) test_1 (_mm512_cvt_roundph_epi64, __m512i, __m128h, 8) test_1 (_mm512_cvt_roundph_epu64, __m512i, __m128h, 8) +test_1 
(_mm512_cvt_roundepi16_ph, __m512h, __m512i, 8) +test_1 (_mm512_cvt_roundepu16_ph, __m512h, __m512i, 8) +test_1 (_mm512_cvt_roundepi32_ph, __m256h, __m512i, 8) +test_1 (_mm512_cvt_roundepu32_ph, __m256h, __m512i, 8) +test_1 (_mm512_cvt_roundepi64_ph, __m128h, __m512i, 8) +test_1 (_mm512_cvt_roundepu64_ph, __m128h, __m512i, 8) test_1x (_mm512_reduce_round_ph, __m512h, __m512h, 123, 8) test_1x (_mm512_roundscale_round_ph, __m512h, __m512h, 123, 8) test_1x (_mm512_getmant_ph, __m512h, __m512h, 1, 1) @@ -826,6 +832,12 @@ test_2 (_mm512_maskz_cvt_roundph_epi32, __m512i, __mmask16, __m256h, 8) test_2 (_mm512_maskz_cvt_roundph_epu32, __m512i, __mmask16, __m256h, 8) test_2 (_mm512_maskz_cvt_roundph_epi64, __m512i, __mmask8, __m128h, 8) test_2 (_mm512_maskz_cvt_roundph_epu64, __m512i, __mmask8, __m128h, 8) +test_2 (_mm512_maskz_cvt_roundepi16_ph, __m512h, __mmask32, __m512i, 8) +test_2 (_mm512_maskz_cvt_roundepu16_ph, __m512h, __mmask32, __m512i, 8) +test_2 (_mm512_maskz_cvt_roundepi32_ph, __m256h, __mmask16, __m512i, 8) +test_2 (_mm512_maskz_cvt_roundepu32_ph, __m256h, __mmask16, __m512i, 8) +test_2 (_mm512_maskz_cvt_roundepi64_ph, __m128h, __mmask8, __m512i, 8) +test_2 (_mm512_maskz_cvt_roundepu64_ph, __m128h, __mmask8, __m512i, 8) test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8) test_2x (_mm_cmp_round_sh_mask, __mmask8, __m128h, __m128h, 1, 8) test_2x (_mm_comi_round_sh, int, __m128h, __m128h, 1, 8) @@ -869,6 +881,12 @@ test_3 (_mm512_mask_cvt_roundph_epi32, __m512i, __m512i, __mmask16, __m256h, 8) test_3 (_mm512_mask_cvt_roundph_epu32, __m512i, __m512i, __mmask16, __m256h, 8) test_3 (_mm512_mask_cvt_roundph_epi64, __m512i, __m512i, __mmask8, __m128h, 8) test_3 (_mm512_mask_cvt_roundph_epu64, __m512i, __m512i, __mmask8, __m128h, 8) +test_3 (_mm512_mask_cvt_roundepi16_ph, __m512h, __m512h, __mmask32, __m512i, 8) +test_3 (_mm512_mask_cvt_roundepu16_ph, __m512h, __m512h, __mmask32, __m512i, 8) +test_3 (_mm512_mask_cvt_roundepi32_ph, __m256h, __m256h, 
__mmask16, __m512i, 8) +test_3 (_mm512_mask_cvt_roundepu32_ph, __m256h, __m256h, __mmask16, __m512i, 8) +test_3 (_mm512_mask_cvt_roundepi64_ph, __m128h, __m128h, __mmask8, __m512i, 8) +test_3 (_mm512_mask_cvt_roundepu64_ph, __m128h, __m128h, __mmask8, __m512i, 8) test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8) test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8) test_3x (_mm512_mask_reduce_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8) diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c index ae6151b4a61..684891cc98b 100644 --- a/gcc/testsuite/gcc.target/i386/sse-23.c +++ b/gcc/testsuite/gcc.target/i386/sse-23.c @@ -743,6 +743,12 @@ #define __builtin_ia32_vcvtph2uqq_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtph2uqq_v8di_mask_round(A, B, C, 8) #define __builtin_ia32_vcvtph2w_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtph2w_v32hi_mask_round(A, B, C, 8) #define __builtin_ia32_vcvtph2uw_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtph2uw_v32hi_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvtw2ph_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtw2ph_v32hi_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvtuw2ph_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtuw2ph_v32hi_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvtdq2ph_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvtdq2ph_v16si_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvtudq2ph_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvtudq2ph_v16si_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvtqq2ph_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtqq2ph_v8di_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvtuqq2ph_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtuqq2ph_v8di_mask_round(A, B, C, 8) /* avx512fp16vlintrin.h */ #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D) From patchwork Thu Jul 1 06:16:15 2021 Content-Type: text/plain; 
charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit From: liuhongt To: gcc-patches@gcc.gnu.org Cc: jakub@redhat.com Subject: [PATCH 29/62] AVX512FP16: Add testcase for vcvtw2ph/vcvtuw2ph/vcvtdq2ph/vcvtudq2ph/vcvtqq2ph/vcvtuqq2ph. Date: Thu, 1 Jul 2021 14:16:15 +0800 Message-Id: <20210701061648.9447-30-hongtao.liu@intel.com> In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com> References: <20210701061648.9447-1-hongtao.liu@intel.com>

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-vcvtdq2ph-1a.c: New test.
* gcc.target/i386/avx512fp16-vcvtdq2ph-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtqq2ph-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtqq2ph-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtudq2ph-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtudq2ph-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtuqq2ph-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtuqq2ph-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtuw2ph-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtuw2ph-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtw2ph-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtw2ph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvtdq2ph-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvtdq2ph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvtqq2ph-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvtqq2ph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvtudq2ph-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvtudq2ph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvtuqq2ph-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvtuqq2ph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvtuw2ph-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvtuw2ph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvtw2ph-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvtw2ph-1b.c: Ditto.
---
 .../gcc.target/i386/avx512fp16-vcvtdq2ph-1a.c |  24 +++++
 .../gcc.target/i386/avx512fp16-vcvtdq2ph-1b.c |  79 ++++++++++++++++
 .../gcc.target/i386/avx512fp16-vcvtqq2ph-1a.c |  24 +++++
 .../gcc.target/i386/avx512fp16-vcvtqq2ph-1b.c |  84 +++++++++++++++++
 .../i386/avx512fp16-vcvtudq2ph-1a.c           |  24 +++++
 .../i386/avx512fp16-vcvtudq2ph-1b.c           |  79 ++++++++++++++++
 .../i386/avx512fp16-vcvtuqq2ph-1a.c           |  24 +++++
 .../i386/avx512fp16-vcvtuqq2ph-1b.c           |  83 +++++++++++++++++
 .../gcc.target/i386/avx512fp16-vcvtuw2ph-1a.c |  24 +++++
 .../gcc.target/i386/avx512fp16-vcvtuw2ph-1b.c |  93 +++++++++++++++++++
 .../gcc.target/i386/avx512fp16-vcvtw2ph-1a.c  |  24 +++++
 .../gcc.target/i386/avx512fp16-vcvtw2ph-1b.c  |  92 ++++++++++++++++++
 .../i386/avx512fp16vl-vcvtdq2ph-1a.c          |  27 ++++++
 .../i386/avx512fp16vl-vcvtdq2ph-1b.c          |  15 +++
 .../i386/avx512fp16vl-vcvtqq2ph-1a.c          |  28 ++++++
 .../i386/avx512fp16vl-vcvtqq2ph-1b.c          |  15 +++
 .../i386/avx512fp16vl-vcvtudq2ph-1a.c         |  27 ++++++
 .../i386/avx512fp16vl-vcvtudq2ph-1b.c         |  15 +++
 .../i386/avx512fp16vl-vcvtuqq2ph-1a.c         |  28 ++++++
 .../i386/avx512fp16vl-vcvtuqq2ph-1b.c         |  15 +++
 .../i386/avx512fp16vl-vcvtuw2ph-1a.c          |  29 ++++++
 .../i386/avx512fp16vl-vcvtuw2ph-1b.c          |  15 +++
 .../i386/avx512fp16vl-vcvtw2ph-1a.c           |  29 ++++++
 .../i386/avx512fp16vl-vcvtw2ph-1b.c           |  15 +++
 24 files changed, 912 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtdq2ph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtdq2ph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtqq2ph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtqq2ph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtudq2ph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtudq2ph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtuqq2ph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtuqq2ph-1b.c
 create mode 100644
gcc/testsuite/gcc.target/i386/avx512fp16-vcvtuw2ph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtuw2ph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtw2ph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtw2ph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtdq2ph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtdq2ph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtqq2ph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtqq2ph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtudq2ph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtudq2ph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtuqq2ph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtuqq2ph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtuw2ph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtuw2ph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtw2ph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtw2ph-1b.c

diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtdq2ph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtdq2ph-1a.c
new file mode 100644
index 00000000000..45697d94b1c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtdq2ph-1a.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vcvtdq2ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vcvtdq2ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtdq2ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtdq2ph\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtdq2ph\[ \\t\]+\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256h res, res1, res2;
+volatile __m512i x1, x2, x3;
+volatile __mmask16 m16;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm512_cvtepi32_ph (x1);
+  res1 = _mm512_mask_cvtepi32_ph (res, m16, x2);
+  res2 = _mm512_maskz_cvtepi32_ph (m16, x3);
+  res = _mm512_cvt_roundepi32_ph (x1, 4);
+  res1 = _mm512_mask_cvt_roundepi32_ph (res, m16, x2, 8);
+  res2 = _mm512_maskz_cvt_roundepi32_ph (m16, x3, 11);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtdq2ph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtdq2ph-1b.c
new file mode 100644
index 00000000000..a2bb56c25d7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtdq2ph-1b.c
@@ -0,0 +1,79 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS (AVX512F_LEN / 32)
+
+void NOINLINE
+EMULATE(cvtd2_ph) (V512 * dest, V512 op1,
+		   __mmask32 k, int zero_mask)
+{
+  V512 v1, v2, v3, v4, v5, v6, v7, v8;
+  int i;
+  __mmask16 m1, m2;
+
+  m1 = k & 0xffff;
+
+  unpack_ph_2twops(*dest, &v7, &v8);
+
+  for (i = 0; i < 16; i++) {
+    if (((1 << i) & m1) == 0) {
+      if (zero_mask) {
+	v5.f32[i] = 0;
+      }
+      else {
+	v5.u32[i] = v7.u32[i];
+      }
+    }
+    else {
+      v5.f32[i] = op1.u32[i];
+    }
+  }
+  *dest = pack_twops_2ph(v5, v5);
+}
+
+void
+TEST (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  EMULATE(cvtd2_ph)(&exp, src3, NET_MASK, 0);
+  H_HF(res) = INTRINSIC (_cvtepi32_ph) (SI(src3));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _cvtepi32_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(cvtd2_ph)(&exp, src3, HALF_MASK, 0);
+  H_HF(res) = INTRINSIC (_mask_cvtepi32_ph) (H_HF(res), HALF_MASK,
SI(src3)); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvtepi32_ph); + + EMULATE(cvtd2_ph)(&exp, src3, HALF_MASK, 1); + H_HF(res) = INTRINSIC (_maskz_cvtepi32_ph) (HALF_MASK, SI(src3)); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvtepi32_ph); + +#if AVX512F_LEN == 512 + EMULATE(cvtd2_ph)(&exp, src3, NET_MASK, 0); + H_HF(res) = INTRINSIC (_cvt_roundepi32_ph) (SI(src3), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _cvt_roundepi32_ph); + + init_dest(&res, &exp); + EMULATE(cvtd2_ph)(&exp, src3, HALF_MASK, 0); + H_HF(res) = INTRINSIC (_mask_cvt_roundepi32_ph) (H_HF(res), HALF_MASK, SI(src3), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvt_roundepi32_ph); + + EMULATE(cvtd2_ph)(&exp, src3, HALF_MASK, 1); + H_HF(res) = INTRINSIC (_maskz_cvt_roundepi32_ph) (HALF_MASK, SI(src3), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvt_roundepi32_ph); +#endif + + if (n_errs != 0) { + abort (); + } +} + + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtqq2ph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtqq2ph-1a.c new file mode 100644 index 00000000000..4e8515e9a3d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtqq2ph-1a.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vcvtqq2phz\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vcvtqq2phz\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtqq2phz\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtqq2ph\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtqq2ph\[ 
\\t\]+\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m128h res, res1, res2; +volatile __m512i x1, x2, x3; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res = _mm512_cvtepi64_ph (x1); + res1 = _mm512_mask_cvtepi64_ph (res, m8, x2); + res2 = _mm512_maskz_cvtepi64_ph (m8, x3); + res = _mm512_cvt_roundepi64_ph (x1, 4); + res1 = _mm512_mask_cvt_roundepi64_ph (res, m8, x2, 8); + res2 = _mm512_maskz_cvt_roundepi64_ph (m8, x3, 11); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtqq2ph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtqq2ph-1b.c new file mode 100644 index 00000000000..cb213b9d9f6 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtqq2ph-1b.c @@ -0,0 +1,84 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS (AVX512F_LEN / 64) + +void NOINLINE +EMULATE(cvtq2_ph) (V512 * dest, V512 op1, int n_el, + __mmask32 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + __mmask16 m1, m2; + + m1 = k & 0xffff; + + unpack_ph_2twops(*dest, &v7, &v8); + + for (i = 0; i < n_el; i++) { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + v5.f32[i] = 0; + } + else { + v5.u32[i] = v7.u32[i]; + } + } + else { + v5.f32[i] = op1.u64[i]; + } + } + + // The left part should be zero + for (i = n_el; i < 16; i++) + v5.f32[i] = 0; + + *dest = pack_twops_2ph(v5, v5); +} + +void +TEST (void) +{ + + V512 res; + V512 exp; + + init_src(); + + EMULATE(cvtq2_ph)(&exp, src3, N_ELEMS, NET_MASK, 0); + res.xmmh[0] = INTRINSIC (_cvtepi64_ph) (SI(src3)); + CHECK_RESULT (&res, &exp, 8, _cvtepi64_ph); + + init_dest(&res, &exp); + EMULATE(cvtq2_ph)(&exp, src3, N_ELEMS, 0xcc, 0); + res.xmmh[0] = INTRINSIC (_mask_cvtepi64_ph) (res.xmmh[0], 0xcc, SI(src3)); + CHECK_RESULT (&res, &exp, 8, _mask_cvtepi64_ph); + + EMULATE(cvtq2_ph)(&exp, src3, 
N_ELEMS, 0xf1, 1); + res.xmmh[0] = INTRINSIC (_maskz_cvtepi64_ph) (0xf1, SI(src3)); + CHECK_RESULT (&res, &exp, 8, _maskz_cvtepi64_ph); + +#if AVX512F_LEN == 512 + EMULATE(cvtq2_ph)(&exp, src3, N_ELEMS, NET_MASK, 0); + res.xmmh[0] = INTRINSIC (_cvt_roundepi64_ph) (SI(src3), _ROUND_NINT); + CHECK_RESULT (&res, &exp, 8, _cvt_roundepi64_ph); + + init_dest(&res, &exp); + EMULATE(cvtq2_ph)(&exp, src3, N_ELEMS, 0xcc, 0); + res.xmmh[0] = INTRINSIC (_mask_cvt_roundepi64_ph) (res.xmmh[0], 0xcc, SI(src3), _ROUND_NINT); + CHECK_RESULT (&res, &exp, 8, _mask_cvt_roundepi64_ph); + + EMULATE(cvtq2_ph)(&exp, src3, N_ELEMS, 0xf1, 1); + res.xmmh[0] = INTRINSIC (_maskz_cvt_roundepi64_ph) (0xf1, SI(src3), _ROUND_NINT); + CHECK_RESULT (&res, &exp, 8, _maskz_cvt_roundepi64_ph); +#endif + + if (n_errs != 0) { + abort (); + } +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtudq2ph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtudq2ph-1a.c new file mode 100644 index 00000000000..8d90ef6f168 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtudq2ph-1a.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vcvtudq2ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vcvtudq2ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtudq2ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtudq2ph\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtudq2ph\[ \\t\]+\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m256h res, res1, res2; +volatile __m512i x1, x2, 
x3; +volatile __mmask16 m16; + +void extern +avx512f_test (void) +{ + res = _mm512_cvtepu32_ph (x1); + res1 = _mm512_mask_cvtepu32_ph (res, m16, x2); + res2 = _mm512_maskz_cvtepu32_ph (m16, x3); + res = _mm512_cvt_roundepu32_ph (x1, 4); + res1 = _mm512_mask_cvt_roundepu32_ph (res, m16, x2, 8); + res2 = _mm512_maskz_cvt_roundepu32_ph (m16, x3, 11); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtudq2ph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtudq2ph-1b.c new file mode 100644 index 00000000000..e9c1cd1bcb0 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtudq2ph-1b.c @@ -0,0 +1,79 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS (AVX512F_LEN / 32) + +void NOINLINE +EMULATE(cvtd2_ph) (V512 * dest, V512 op1, + __mmask32 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + __mmask16 m1, m2; + + m1 = k & 0xffff; + + unpack_ph_2twops(*dest, &v7, &v8); + + for (i = 0; i < 16; i++) { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + v5.f32[i] = 0; + } + else { + v5.u32[i] = v7.u32[i]; + } + } + else { + v5.f32[i] = op1.u32[i]; + } + } + *dest = pack_twops_2ph(v5, v5); +} + +void +TEST (void) +{ + V512 res; + V512 exp; + + init_src(); + + EMULATE(cvtd2_ph)(&exp, src3, NET_MASK, 0); + H_HF(res)= INTRINSIC (_cvtepu32_ph) (SI(src3)); + CHECK_RESULT (&res, &exp, N_ELEMS, _cvtepu32_ph); + + init_dest(&res, &exp); + EMULATE(cvtd2_ph)(&exp, src3, HALF_MASK, 0); + H_HF(res) = INTRINSIC (_mask_cvtepu32_ph) (H_HF(res), HALF_MASK, SI(src3)); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvtepu32_ph); + + EMULATE(cvtd2_ph)(&exp, src3, HALF_MASK, 1); + H_HF(res) = INTRINSIC (_maskz_cvtepu32_ph) (HALF_MASK, SI(src3)); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvtepu32_ph); + +#if AVX512F_LEN == 512 + EMULATE(cvtd2_ph)(&exp, src3, NET_MASK, 0); + H_HF(res)= INTRINSIC (_cvt_roundepu32_ph) (SI(src3), _ROUND_NINT); 
+ CHECK_RESULT (&res, &exp, N_ELEMS, _cvt_roundepu32_ph); + + init_dest(&res, &exp); + EMULATE(cvtd2_ph)(&exp, src3, HALF_MASK, 0); + H_HF(res) = INTRINSIC (_mask_cvt_roundepu32_ph) (H_HF(res), HALF_MASK, SI(src3), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvt_roundepu32_ph); + + EMULATE(cvtd2_ph)(&exp, src3, HALF_MASK, 1); + H_HF(res) = INTRINSIC (_maskz_cvt_roundepu32_ph) (HALF_MASK, SI(src3), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvt_roundepu32_ph); +#endif + + if (n_errs != 0) { + abort (); + } +} + + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtuqq2ph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtuqq2ph-1a.c new file mode 100644 index 00000000000..a234bb50482 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtuqq2ph-1a.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vcvtuqq2phz\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vcvtuqq2phz\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtuqq2phz\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtuqq2ph\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtuqq2ph\[ \\t\]+\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m128h res, res1, res2; +volatile __m512i x1, x2, x3; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res = _mm512_cvtepu64_ph (x1); + res1 = _mm512_mask_cvtepu64_ph (res, m8, x2); + res2 = _mm512_maskz_cvtepu64_ph (m8, x3); + res = _mm512_cvt_roundepu64_ph (x1, 4); + res1 = 
_mm512_mask_cvt_roundepu64_ph (res, m8, x2, 8); + res2 = _mm512_maskz_cvt_roundepu64_ph (m8, x3, 11); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtuqq2ph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtuqq2ph-1b.c new file mode 100644 index 00000000000..873d9109e47 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtuqq2ph-1b.c @@ -0,0 +1,83 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS (AVX512F_LEN / 64) + +void NOINLINE +EMULATE(cvtq2_ph) (V512 * dest, V512 op1, int n_el, + __mmask32 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + __mmask16 m1, m2; + + m1 = k & 0xffff; + + unpack_ph_2twops(*dest, &v7, &v8); + + for (i = 0; i < n_el; i++) { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + v5.f32[i] = 0; + } + else { + v5.u32[i] = v7.u32[i]; + } + } + else { + v5.f32[i] = op1.u64[i]; + } + } + + // The left part should be zero + for (i = n_el; i < 16; i++) + v5.f32[i] = 0; + + *dest = pack_twops_2ph(v5, v5); +} + +void +TEST (void) +{ + V512 res; + V512 exp; + + init_src(); + + EMULATE(cvtq2_ph)(&exp, src3, N_ELEMS, NET_MASK, 0); + res.xmmh[0] = INTRINSIC (_cvtepu64_ph) (SI(src3)); + CHECK_RESULT (&res, &exp, 8, _cvtepu64_ph); + + init_dest(&res, &exp); + EMULATE(cvtq2_ph)(&exp, src3, N_ELEMS, 0xcc, 0); + res.xmmh[0] = INTRINSIC (_mask_cvtepu64_ph) (res.xmmh[0], 0xcc, SI(src3)); + CHECK_RESULT (&res, &exp, 8, _mask_cvtepu64_ph); + + EMULATE(cvtq2_ph)(&exp, src3, N_ELEMS, 0xc1, 1); + res.xmmh[0] = INTRINSIC (_maskz_cvtepu64_ph) (0xc1, SI(src3)); + CHECK_RESULT (&res, &exp, 8, _maskz_cvtepu64_ph); + +#if AVX512F_LEN == 512 + EMULATE(cvtq2_ph)(&exp, src3, N_ELEMS, NET_MASK, 0); + res.xmmh[0] = INTRINSIC (_cvt_roundepu64_ph) (SI(src3), _ROUND_NINT); + CHECK_RESULT (&res, &exp, 8, _cvt_roundepu64_ph); + + init_dest(&res, &exp); + EMULATE(cvtq2_ph)(&exp, src3, N_ELEMS, 0xcc, 0); + 
res.xmmh[0] = INTRINSIC (_mask_cvt_roundepu64_ph) (res.xmmh[0], 0xcc, SI(src3), _ROUND_NINT); + CHECK_RESULT (&res, &exp, 8, _mask_cvt_roundepu64_ph); + + EMULATE(cvtq2_ph)(&exp, src3, N_ELEMS, 0xc1, 1); + res.xmmh[0] = INTRINSIC (_maskz_cvt_roundepu64_ph) (0xc1, SI(src3), _ROUND_NINT); + CHECK_RESULT (&res, &exp, 8, _maskz_cvt_roundepu64_ph); +#endif + + if (n_errs != 0) { + abort (); + } +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtuw2ph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtuw2ph-1a.c new file mode 100644 index 00000000000..43c96a0d2fc --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtuw2ph-1a.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vcvtuw2ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtuw2ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtuw2ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vcvtuw2ph\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtuw2ph\[ \\t\]+\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m512h res; +volatile __m512i x1; +volatile __mmask32 m32; + +void extern +avx512f_test (void) +{ + res = _mm512_cvtepu16_ph (x1); + res = _mm512_mask_cvtepu16_ph (res, m32, x1); + res = _mm512_maskz_cvtepu16_ph (m32, x1); + res = _mm512_cvt_roundepu16_ph (x1, 4); + res = _mm512_mask_cvt_roundepu16_ph (res, m32, x1, 8); + res = _mm512_maskz_cvt_roundepu16_ph (m32, x1, 11); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtuw2ph-1b.c 
b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtuw2ph-1b.c new file mode 100644 index 00000000000..6d6b6da342f --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtuw2ph-1b.c @@ -0,0 +1,93 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS (AVX512F_LEN / 16) + +void NOINLINE +EMULATE(cvtw2_ph) (V512 * dest, V512 op1, + __mmask32 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + __mmask16 m1, m2; + + m1 = k & 0xffff; + m2 = (k >> 16) & 0xffff; + + unpack_ph_2twops(*dest, &v7, &v8); + + for (i = 0; i < 16; i++) { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + v5.f32[i] = 0; + } + else { + v5.f32[i] = v7.f32[i]; + } + } + else { + v5.f32[i] = op1.u16[i]; + + } + + if (((1 << i) & m2) == 0) { + if (zero_mask) { + v6.f32[i] = 0; + } + else { + v6.f32[i] = v8.f32[i]; + } + } + else { + v6.f32[i] = op1.u16[i+16]; + } + } + + *dest = pack_twops_2ph(v5, v6); +} + +void +TEST (void) +{ + V512 res; + V512 exp; + + init_src(); + + EMULATE(cvtw2_ph)(&exp, src3, NET_MASK, 0); + HF(res) = INTRINSIC (_cvtepu16_ph) (SI(src3)); + CHECK_RESULT (&res, &exp, N_ELEMS, _cvtepu16_ph); + + init_dest(&res, &exp); + EMULATE(cvtw2_ph)(&exp, src3, MASK_VALUE, 0); + HF(res) = INTRINSIC (_mask_cvtepu16_ph) (HF(res), MASK_VALUE, SI(src3)); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvtepu16_ph); + + EMULATE(cvtw2_ph)(&exp, src3, ZMASK_VALUE, 1); + HF(res) = INTRINSIC (_maskz_cvtepu16_ph) (ZMASK_VALUE, SI(src3)); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvtepu16_ph); + +#if AVX512F_LEN == 512 + EMULATE(cvtw2_ph)(&exp, src3, NET_MASK, 0); + HF(res) = INTRINSIC (_cvt_roundepu16_ph) (SI(src3), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _cvt_roundepu16_ph); + + init_dest(&res, &exp); + EMULATE(cvtw2_ph)(&exp, src3, MASK_VALUE, 0); + HF(res) = INTRINSIC (_mask_cvt_roundepu16_ph) (HF(res), MASK_VALUE, SI(src3), _ROUND_NINT); + 
CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvt_roundepu16_ph); + + EMULATE(cvtw2_ph)(&exp, src3, ZMASK_VALUE, 1); + HF(res) = INTRINSIC (_maskz_cvt_roundepu16_ph) (ZMASK_VALUE, SI(src3), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvt_roundepu16_ph); +#endif + + if (n_errs != 0) { + abort (); + } +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtw2ph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtw2ph-1a.c new file mode 100644 index 00000000000..c6eaee1772b --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtw2ph-1a.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vcvtw2ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtw2ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtw2ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vcvtw2ph\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtw2ph\[ \\t\]+\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m512h res; +volatile __m512i x1; +volatile __mmask32 m32; + +void extern +avx512f_test (void) +{ + res = _mm512_cvtepi16_ph (x1); + res = _mm512_mask_cvtepi16_ph (res, m32, x1); + res = _mm512_maskz_cvtepi16_ph (m32, x1); + res = _mm512_cvt_roundepi16_ph (x1, 4); + res = _mm512_mask_cvt_roundepi16_ph (res, m32, x1, 8); + res = _mm512_maskz_cvt_roundepi16_ph (m32, x1, 11); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtw2ph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtw2ph-1b.c new file mode 100644 index 00000000000..e02b6fcdbf7 --- /dev/null 
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtw2ph-1b.c @@ -0,0 +1,92 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS (AVX512F_LEN / 16) + +void NOINLINE +EMULATE(cvtw2_ph) (V512 * dest, V512 op1, + __mmask32 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + __mmask16 m1, m2; + + m1 = k & 0xffff; + m2 = (k >> 16) & 0xffff; + + unpack_ph_2twops(*dest, &v7, &v8); + + for (i = 0; i < 16; i++) { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + v5.f32[i] = 0; + } + else { + v5.f32[i] = v7.f32[i]; + } + } + else { + v5.f32[i] = op1.u16[i]; + + } + + if (((1 << i) & m2) == 0) { + if (zero_mask) { + v6.f32[i] = 0; + } + else { + v6.f32[i] = v8.f32[i]; + } + } + else { + v6.f32[i] = op1.u16[i+16]; + } + } + + *dest = pack_twops_2ph(v5, v6); +} + +void +TEST (void) +{ + V512 res; + V512 exp; + + init_src(); + + EMULATE(cvtw2_ph)(&exp, src3, NET_MASK, 0); + HF(res) = INTRINSIC (_cvtepi16_ph) (SI(src3)); + CHECK_RESULT (&res, &exp, N_ELEMS, _cvtepi16_ph); + + init_dest(&res, &exp); + EMULATE(cvtw2_ph)(&exp, src3, MASK_VALUE, 0); + HF(res) = INTRINSIC (_mask_cvtepi16_ph) (HF(res), MASK_VALUE, SI(src3)); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvtepi16_ph); + + EMULATE(cvtw2_ph)(&exp, src3, ZMASK_VALUE, 1); + HF(res) = INTRINSIC (_maskz_cvtepi16_ph) (ZMASK_VALUE, SI(src3)); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvtepi16_ph); + +#if AVX512F_LEN == 512 + EMULATE(cvtw2_ph)(&exp, src3, NET_MASK, 0); + HF(res) = INTRINSIC (_cvt_roundepi16_ph) (SI(src3), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _cvt_roundepi16_ph); + + init_dest(&res, &exp); + EMULATE(cvtw2_ph)(&exp, src3, MASK_VALUE, 0); + HF(res) = INTRINSIC (_mask_cvt_roundepi16_ph) (HF(res), MASK_VALUE, SI(src3), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvt_roundepi16_ph); + + EMULATE(cvtw2_ph)(&exp, src3, ZMASK_VALUE, 1); + HF(res) = INTRINSIC 
(_maskz_cvt_roundepi16_ph) (ZMASK_VALUE, SI(src3), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvt_roundepi16_ph); +#endif + + if (n_errs != 0) { + abort (); + } +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtdq2ph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtdq2ph-1a.c new file mode 100644 index 00000000000..ab0541dce1a --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtdq2ph-1a.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */ +/* { dg-final { scan-assembler-times "vcvtdq2phy\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtdq2phy\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtdq2phy\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtdq2phx\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtdq2phx\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtdq2phx\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m128h res3; +volatile __m256i x2; +volatile __m128i x3; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res3 = _mm256_cvtepi32_ph (x2); + res3 = _mm256_mask_cvtepi32_ph (res3, m8, x2); + res3 = _mm256_maskz_cvtepi32_ph (m8, x2); + + res3 = _mm_cvtepi32_ph (x3); + res3 = _mm_mask_cvtepi32_ph (res3, m8, x3); + res3 = _mm_maskz_cvtepi32_ph (m8, x3); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtdq2ph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtdq2ph-1b.c new file mode 100644 index 00000000000..033587a6704 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtdq2ph-1b.c @@ 
-0,0 +1,15 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */ + +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vcvtdq2ph-1b.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vcvtdq2ph-1b.c" + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtqq2ph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtqq2ph-1a.c new file mode 100644 index 00000000000..8e42a4b29f7 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtqq2ph-1a.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */ +/* { dg-final { scan-assembler-times "vcvtqq2phy\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtqq2phy\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtqq2phy\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtqq2phx\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtqq2phx\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtqq2phx\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m128h res3; +volatile __m256i x2; +volatile __m128i x3; +volatile __mmask16 m16; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res3 = _mm256_cvtepi64_ph (x2); + res3 = _mm256_mask_cvtepi64_ph (res3, m16, x2); + res3 = _mm256_maskz_cvtepi64_ph (m16, x2); + + res3 = _mm_cvtepi64_ph (x3); + res3 = _mm_mask_cvtepi64_ph (res3, m8, x3); + res3 = _mm_maskz_cvtepi64_ph (m8, x3); +} diff --git 
a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtqq2ph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtqq2ph-1b.c new file mode 100644 index 00000000000..6a4a329f368 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtqq2ph-1b.c @@ -0,0 +1,15 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */ + +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vcvtqq2ph-1b.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vcvtqq2ph-1b.c" + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtudq2ph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtudq2ph-1a.c new file mode 100644 index 00000000000..4fa2ab92245 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtudq2ph-1a.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */ +/* { dg-final { scan-assembler-times "vcvtudq2phy\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtudq2phy\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtudq2phy\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtudq2phx\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtudq2phx\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtudq2phx\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m128h res3; +volatile __m256i x2; +volatile __m128i x3; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res3 = _mm256_cvtepu32_ph (x2); + res3 = 
_mm256_mask_cvtepu32_ph (res3, m8, x2); + res3 = _mm256_maskz_cvtepu32_ph (m8, x2); + + res3 = _mm_cvtepu32_ph (x3); + res3 = _mm_mask_cvtepu32_ph (res3, m8, x3); + res3 = _mm_maskz_cvtepu32_ph (m8, x3); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtudq2ph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtudq2ph-1b.c new file mode 100644 index 00000000000..4ea2c268760 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtudq2ph-1b.c @@ -0,0 +1,15 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */ + +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vcvtudq2ph-1b.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vcvtudq2ph-1b.c" + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtuqq2ph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtuqq2ph-1a.c new file mode 100644 index 00000000000..a3ee951d4c5 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtuqq2ph-1a.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */ +/* { dg-final { scan-assembler-times "vcvtuqq2phy\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtuqq2phy\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtuqq2phy\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtuqq2phx\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtuqq2phx\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtuqq2phx\[ 
\\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +volatile __m128h res3; +volatile __m256i x2; +volatile __m128i x3; +volatile __mmask16 m16; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res3 = _mm256_cvtepu64_ph (x2); + res3 = _mm256_mask_cvtepu64_ph (res3, m16, x2); + res3 = _mm256_maskz_cvtepu64_ph (m16, x2); + + res3 = _mm_cvtepu64_ph (x3); + res3 = _mm_mask_cvtepu64_ph (res3, m8, x3); + res3 = _mm_maskz_cvtepu64_ph (m8, x3); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtuqq2ph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtuqq2ph-1b.c new file mode 100644 index 00000000000..c747e8de0dd --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtuqq2ph-1b.c @@ -0,0 +1,15 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */ + +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vcvtuqq2ph-1b.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vcvtuqq2ph-1b.c" + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtuw2ph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtuw2ph-1a.c new file mode 100644 index 00000000000..59393dc01a7 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtuw2ph-1a.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */ +/* { dg-final { scan-assembler-times "vcvtuw2ph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtuw2ph\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtuw2ph\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtuw2ph\[ 
\\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtuw2ph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtuw2ph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +volatile __m256h res2; +volatile __m128h res3; +volatile __m256i x2; +volatile __m128i x3; +volatile __mmask16 m16; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res2 = _mm256_cvtepu16_ph (x2); + res2 = _mm256_mask_cvtepu16_ph (res2, m16, x2); + res2 = _mm256_maskz_cvtepu16_ph (m16, x2); + + res3 = _mm_cvtepu16_ph (x3); + res3 = _mm_mask_cvtepu16_ph (res3, m8, x3); + res3 = _mm_maskz_cvtepu16_ph (m8, x3); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtuw2ph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtuw2ph-1b.c new file mode 100644 index 00000000000..89d94df57b3 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtuw2ph-1b.c @@ -0,0 +1,15 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */ + +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vcvtuw2ph-1b.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vcvtuw2ph-1b.c" + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtw2ph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtw2ph-1a.c new file mode 100644 index 00000000000..ff5530f60a2 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtw2ph-1a.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */ +/* { dg-final { scan-assembler-times "vcvtw2ph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtw2ph\[ 
\\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtw2ph\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtw2ph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtw2ph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtw2ph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +volatile __m256h res2; +volatile __m128h res3; +volatile __m256i x2; +volatile __m128i x3; +volatile __mmask16 m16; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res2 = _mm256_cvtepi16_ph (x2); + res2 = _mm256_mask_cvtepi16_ph (res2, m16, x2); + res2 = _mm256_maskz_cvtepi16_ph (m16, x2); + + res3 = _mm_cvtepi16_ph (x3); + res3 = _mm_mask_cvtepi16_ph (res3, m8, x3); + res3 = _mm_maskz_cvtepi16_ph (m8, x3); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtw2ph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtw2ph-1b.c new file mode 100644 index 00000000000..243e45bda62 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtw2ph-1b.c @@ -0,0 +1,15 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */ + +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vcvtw2ph-1b.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vcvtw2ph-1b.c" + From patchwork Thu Jul 1 06:16:16 2021 X-Patchwork-Submitter: liuhongt X-Patchwork-Id: 1499339 To: gcc-patches@gcc.gnu.org Subject: [PATCH 30/62] AVX512FP16: Add vcvtsh2si/vcvtsh2usi/vcvtsi2sh/vcvtusi2sh. Date: Thu, 1 Jul 2021 14:16:16 +0800 Message-Id: <20210701061648.9447-31-hongtao.liu@intel.com> In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com> References: <20210701061648.9447-1-hongtao.liu@intel.com> From: liuhongt Reply-To: liuhongt Cc: jakub@redhat.com gcc/ChangeLog: * config/i386/avx512fp16intrin.h (_mm_cvtsh_i32): New intrinsic. (_mm_cvtsh_u32): Likewise. (_mm_cvt_roundsh_i32): Likewise. (_mm_cvt_roundsh_u32): Likewise. (_mm_cvtsh_i64): Likewise. (_mm_cvtsh_u64): Likewise. (_mm_cvt_roundsh_i64): Likewise. (_mm_cvt_roundsh_u64): Likewise. (_mm_cvti32_sh): Likewise. (_mm_cvtu32_sh): Likewise. (_mm_cvt_roundi32_sh): Likewise. (_mm_cvt_roundu32_sh): Likewise. 
(_mm_cvti64_sh): Likewise. (_mm_cvtu64_sh): Likewise. (_mm_cvt_roundi64_sh): Likewise. (_mm_cvt_roundu64_sh): Likewise. * config/i386/i386-builtin-types.def: Add corresponding builtin types. * config/i386/i386-builtin.def: Add corresponding new builtins. * config/i386/i386-expand.c (ix86_expand_round_builtin): Handle new builtin types. * config/i386/sse.md (avx512fp16_vcvtsh2si): New define_insn. (avx512fp16_vcvtsh2si_2): Likewise. (avx512fp16_vcvtsi2sh): Likewise. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add test for new builtins. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/sse-14.c: Add test for new intrinsics. * gcc.target/i386/sse-22.c: Ditto. --- gcc/config/i386/avx512fp16intrin.h | 158 +++++++++++++++++++++++++ gcc/config/i386/i386-builtin-types.def | 8 ++ gcc/config/i386/i386-builtin.def | 8 ++ gcc/config/i386/i386-expand.c | 8 ++ gcc/config/i386/sse.md | 46 +++++++ gcc/testsuite/gcc.target/i386/avx-1.c | 8 ++ gcc/testsuite/gcc.target/i386/sse-13.c | 8 ++ gcc/testsuite/gcc.target/i386/sse-14.c | 10 ++ gcc/testsuite/gcc.target/i386/sse-22.c | 10 ++ gcc/testsuite/gcc.target/i386/sse-23.c | 8 ++ 10 files changed, 272 insertions(+) diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h index bd801942365..7524a8d6a5b 100644 --- a/gcc/config/i386/avx512fp16intrin.h +++ b/gcc/config/i386/avx512fp16intrin.h @@ -3529,6 +3529,164 @@ _mm512_maskz_cvt_roundepu16_ph (__mmask32 __A, __m512i __B, int __C) #endif /* __OPTIMIZE__ */ +/* Intrinsics vcvtsh2si, vcvtsh2usi. 
*/ +extern __inline int +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtsh_i32 (__m128h __A) +{ + return (int) __builtin_ia32_vcvtsh2si32_round (__A, _MM_FROUND_CUR_DIRECTION); +} + +extern __inline unsigned +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtsh_u32 (__m128h __A) +{ + return (int) __builtin_ia32_vcvtsh2usi32_round (__A, + _MM_FROUND_CUR_DIRECTION); +} + +#ifdef __OPTIMIZE__ +extern __inline int +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvt_roundsh_i32 (__m128h __A, const int __R) +{ + return (int) __builtin_ia32_vcvtsh2si32_round (__A, __R); +} + +extern __inline unsigned +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvt_roundsh_u32 (__m128h __A, const int __R) +{ + return (int) __builtin_ia32_vcvtsh2usi32_round (__A, __R); +} + +#else +#define _mm_cvt_roundsh_i32(A, B) \ + ((int)__builtin_ia32_vcvtsh2si32_round ((A), (B))) +#define _mm_cvt_roundsh_u32(A, B) \ + ((int)__builtin_ia32_vcvtsh2usi32_round ((A), (B))) + +#endif /* __OPTIMIZE__ */ + +#ifdef __x86_64__ +extern __inline long long +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtsh_i64 (__m128h __A) +{ + return (long long) + __builtin_ia32_vcvtsh2si64_round (__A, _MM_FROUND_CUR_DIRECTION); +} + +extern __inline unsigned long long +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtsh_u64 (__m128h __A) +{ + return (long long) + __builtin_ia32_vcvtsh2usi64_round (__A, _MM_FROUND_CUR_DIRECTION); +} + +#ifdef __OPTIMIZE__ +extern __inline long long +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvt_roundsh_i64 (__m128h __A, const int __R) +{ + return (long long) __builtin_ia32_vcvtsh2si64_round (__A, __R); +} + +extern __inline unsigned long long +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvt_roundsh_u64 (__m128h __A, const int __R) +{ + return (long long) 
__builtin_ia32_vcvtsh2usi64_round (__A, __R); +} + +#else +#define _mm_cvt_roundsh_i64(A, B) \ + ((long long)__builtin_ia32_vcvtsh2si64_round ((A), (B))) +#define _mm_cvt_roundsh_u64(A, B) \ + ((long long)__builtin_ia32_vcvtsh2usi64_round ((A), (B))) + +#endif /* __OPTIMIZE__ */ +#endif /* __x86_64__ */ + +/* Intrinsics vcvtsi2sh, vcvtusi2sh. */ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvti32_sh (__m128h __A, int __B) +{ + return __builtin_ia32_vcvtsi2sh32_round (__A, __B, _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtu32_sh (__m128h __A, unsigned int __B) +{ + return __builtin_ia32_vcvtusi2sh32_round (__A, __B, _MM_FROUND_CUR_DIRECTION); +} + +#ifdef __OPTIMIZE__ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvt_roundi32_sh (__m128h __A, int __B, const int __R) +{ + return __builtin_ia32_vcvtsi2sh32_round (__A, __B, __R); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvt_roundu32_sh (__m128h __A, unsigned int __B, const int __R) +{ + return __builtin_ia32_vcvtusi2sh32_round (__A, __B, __R); +} + +#else +#define _mm_cvt_roundi32_sh(A, B, C) \ + (__builtin_ia32_vcvtsi2sh32_round ((A), (B), (C))) +#define _mm_cvt_roundu32_sh(A, B, C) \ + (__builtin_ia32_vcvtusi2sh32_round ((A), (B), (C))) + +#endif /* __OPTIMIZE__ */ + +#ifdef __x86_64__ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvti64_sh (__m128h __A, long long __B) +{ + return __builtin_ia32_vcvtsi2sh64_round (__A, __B, _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtu64_sh (__m128h __A, unsigned long long __B) +{ + return __builtin_ia32_vcvtusi2sh64_round (__A, __B, _MM_FROUND_CUR_DIRECTION); +} + +#ifdef __OPTIMIZE__ 
+extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvt_roundi64_sh (__m128h __A, long long __B, const int __R) +{ + return __builtin_ia32_vcvtsi2sh64_round (__A, __B, __R); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvt_roundu64_sh (__m128h __A, unsigned long long __B, const int __R) +{ + return __builtin_ia32_vcvtusi2sh64_round (__A, __B, __R); +} + +#else +#define _mm_cvt_roundi64_sh(A, B, C) \ + (__builtin_ia32_vcvtsi2sh64_round ((A), (B), (C))) +#define _mm_cvt_roundu64_sh(A, B, C) \ + (__builtin_ia32_vcvtusi2sh64_round ((A), (B), (C))) + +#endif /* __OPTIMIZE__ */ +#endif /* __x86_64__ */ + + #ifdef __DISABLE_AVX512FP16__ #undef __DISABLE_AVX512FP16__ #pragma GCC pop_options diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def index 57b9ea786e1..74bda59a65e 100644 --- a/gcc/config/i386/i386-builtin-types.def +++ b/gcc/config/i386/i386-builtin-types.def @@ -1308,9 +1308,17 @@ DEF_FUNCTION_TYPE (V8HF, V8HI) DEF_FUNCTION_TYPE (QI, V8HF, INT, UQI) DEF_FUNCTION_TYPE (HI, V16HF, INT, UHI) DEF_FUNCTION_TYPE (SI, V32HF, INT, USI) +DEF_FUNCTION_TYPE (INT, V8HF, INT) +DEF_FUNCTION_TYPE (INT64, V8HF, INT) +DEF_FUNCTION_TYPE (UINT, V8HF, INT) +DEF_FUNCTION_TYPE (UINT64, V8HF, INT) DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF) DEF_FUNCTION_TYPE (VOID, PCFLOAT16, V8HF, UQI) DEF_FUNCTION_TYPE (V8HF, PCFLOAT16, V8HF, UQI) +DEF_FUNCTION_TYPE (V8HF, V8HF, INT, INT) +DEF_FUNCTION_TYPE (V8HF, V8HF, INT64, INT) +DEF_FUNCTION_TYPE (V8HF, V8HF, UINT, INT) +DEF_FUNCTION_TYPE (V8HF, V8HF, UINT64, INT) DEF_FUNCTION_TYPE (V2DI, V8HF, V2DI, UQI) DEF_FUNCTION_TYPE (V4DI, V8HF, V4DI, UQI) DEF_FUNCTION_TYPE (V4SI, V8HF, V4SI, UQI) diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def index 44c55876e48..3602b40d6d5 100644 --- a/gcc/config/i386/i386-builtin.def +++ b/gcc/config/i386/i386-builtin.def @@ -3094,6 +3094,14 @@ BDESC 
(0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtdq2ph_v16si_mask_ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtudq2ph_v16si_mask_round, "__builtin_ia32_vcvtudq2ph_v16si_mask_round", IX86_BUILTIN_VCVTUDQ2PH_V16SI_MASK_ROUND, UNKNOWN, (int) V16HF_FTYPE_V16SI_V16HF_UHI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtqq2ph_v8di_mask_round, "__builtin_ia32_vcvtqq2ph_v8di_mask_round", IX86_BUILTIN_VCVTQQ2PH_V8DI_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8DI_V8HF_UQI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtuqq2ph_v8di_mask_round, "__builtin_ia32_vcvtuqq2ph_v8di_mask_round", IX86_BUILTIN_VCVTUQQ2PH_V8DI_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8DI_V8HF_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsh2si_round, "__builtin_ia32_vcvtsh2si32_round", IX86_BUILTIN_VCVTSH2SI32_ROUND, UNKNOWN, (int) INT_FTYPE_V8HF_INT) +BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsh2siq_round, "__builtin_ia32_vcvtsh2si64_round", IX86_BUILTIN_VCVTSH2SI64_ROUND, UNKNOWN, (int) INT64_FTYPE_V8HF_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsh2usi_round, "__builtin_ia32_vcvtsh2usi32_round", IX86_BUILTIN_VCVTSH2USI32_ROUND, UNKNOWN, (int) UINT_FTYPE_V8HF_INT) +BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsh2usiq_round, "__builtin_ia32_vcvtsh2usi64_round", IX86_BUILTIN_VCVTSH2USI64_ROUND, UNKNOWN, (int) UINT64_FTYPE_V8HF_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsi2sh_round, "__builtin_ia32_vcvtsi2sh32_round", IX86_BUILTIN_VCVTSI2SH32_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_INT_INT) +BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsi2shq_round, "__builtin_ia32_vcvtsi2sh64_round", IX86_BUILTIN_VCVTSI2SH64_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_INT64_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtusi2sh_round, 
"__builtin_ia32_vcvtusi2sh32_round", IX86_BUILTIN_VCVTUSI2SH32_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_UINT_INT) +BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtusi2shq_round, "__builtin_ia32_vcvtusi2sh64_round", IX86_BUILTIN_VCVTUSI2SH64_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_UINT64_INT) BDESC_END (ROUND_ARGS, MULTI_ARG) diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c index 7d9e1bd6a2d..b83c6d9a92b 100644 --- a/gcc/config/i386/i386-expand.c +++ b/gcc/config/i386/i386-expand.c @@ -10489,16 +10489,24 @@ ix86_expand_round_builtin (const struct builtin_description *d, { case UINT64_FTYPE_V2DF_INT: case UINT64_FTYPE_V4SF_INT: + case UINT64_FTYPE_V8HF_INT: case UINT_FTYPE_V2DF_INT: case UINT_FTYPE_V4SF_INT: + case UINT_FTYPE_V8HF_INT: case INT64_FTYPE_V2DF_INT: case INT64_FTYPE_V4SF_INT: + case INT64_FTYPE_V8HF_INT: case INT_FTYPE_V2DF_INT: case INT_FTYPE_V4SF_INT: + case INT_FTYPE_V8HF_INT: nargs = 2; break; case V32HF_FTYPE_V32HF_V32HF_INT: case V8HF_FTYPE_V8HF_V8HF_INT: + case V8HF_FTYPE_V8HF_INT_INT: + case V8HF_FTYPE_V8HF_UINT_INT: + case V8HF_FTYPE_V8HF_INT64_INT: + case V8HF_FTYPE_V8HF_UINT64_INT: case V4SF_FTYPE_V4SF_UINT_INT: case V4SF_FTYPE_V4SF_UINT64_INT: case V2DF_FTYPE_V2DF_UINT64_INT: diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 8b23048a232..b312d26b806 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -5589,6 +5589,52 @@ (define_insn "*avx512fp16_vcvtqq2ph_v2di_mask_1" (set_attr "prefix" "evex") (set_attr "mode" "TI")]) +(define_insn "avx512fp16_vcvtsh2si" + [(set (match_operand:SWI48 0 "register_operand" "=r,r") + (unspec:SWI48 + [(vec_select:HF + (match_operand:V8HF 1 "" "v,") + (parallel [(const_int 0)]))] + UNSPEC_US_FIX_NOTRUNC))] + "TARGET_AVX512FP16" + "%vcvtsh2si\t{%1, %0|%0, %k1}" + [(set_attr "type" "sseicvt") + (set_attr "athlon_decode" "double,vector") + (set_attr "bdver1_decode" "double,double") + (set_attr "prefix_rep" "1") + (set_attr 
"prefix" "evex") + (set_attr "mode" "")]) + +(define_insn "avx512fp16_vcvtsh2si_2" + [(set (match_operand:SWI48 0 "register_operand" "=r,r") + (unspec:SWI48 [(match_operand:HF 1 "nonimmediate_operand" "v,m")] + UNSPEC_US_FIX_NOTRUNC))] + "TARGET_AVX512FP16" + "%vcvtsh2si\t{%1, %0|%0, %k1}" + [(set_attr "type" "sseicvt") + (set_attr "athlon_decode" "double,vector") + (set_attr "bdver1_decode" "double,double") + (set_attr "prefix_rep" "1") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + +(define_insn "avx512fp16_vcvtsi2sh" + [(set (match_operand:V8HF 0 "register_operand" "=v") + (vec_merge:V8HF + (vec_duplicate:V8HF + (any_float:HF (match_operand:SWI48 2 "" ""))) + (match_operand:V8HF 1 "register_operand" "v") + (const_int 1)))] + "TARGET_AVX512FP16" + "vcvtsi2sh\t{%2, %1, %0|%0, %1, %2}" + [(set_attr "type" "sseicvt") + (set_attr "athlon_decode" "*") + (set_attr "amdfam10_decode" "*") + (set_attr "bdver1_decode" "*") + (set_attr "btver2_decode" "double") + (set_attr "znver1_decode" "double") + (set_attr "prefix" "evex") + (set_attr "mode" "HF")]) ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c index b569cc0bdd9..0aae949097a 100644 --- a/gcc/testsuite/gcc.target/i386/avx-1.c +++ b/gcc/testsuite/gcc.target/i386/avx-1.c @@ -731,6 +731,14 @@ #define __builtin_ia32_vcvtudq2ph_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvtudq2ph_v16si_mask_round(A, B, C, 8) #define __builtin_ia32_vcvtqq2ph_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtqq2ph_v8di_mask_round(A, B, C, 8) #define __builtin_ia32_vcvtuqq2ph_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtuqq2ph_v8di_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvtsh2si32_round(A, B) __builtin_ia32_vcvtsh2si32_round(A, 8) +#define __builtin_ia32_vcvtsh2si64_round(A, B) __builtin_ia32_vcvtsh2si64_round(A, 8) +#define __builtin_ia32_vcvtsh2usi32_round(A, B) __builtin_ia32_vcvtsh2usi32_round(A, 8) 
+#define __builtin_ia32_vcvtsh2usi64_round(A, B) __builtin_ia32_vcvtsh2usi64_round(A, 8) +#define __builtin_ia32_vcvtsi2sh32_round(A, B, C) __builtin_ia32_vcvtsi2sh32_round(A, B, 8) +#define __builtin_ia32_vcvtsi2sh64_round(A, B, C) __builtin_ia32_vcvtsi2sh64_round(A, B, 8) +#define __builtin_ia32_vcvtusi2sh32_round(A, B, C) __builtin_ia32_vcvtusi2sh32_round(A, B, 8) +#define __builtin_ia32_vcvtusi2sh64_round(A, B, C) __builtin_ia32_vcvtusi2sh64_round(A, B, 8) /* avx512fp16vlintrin.h */ #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D) diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c index 07e59118438..997fb733132 100644 --- a/gcc/testsuite/gcc.target/i386/sse-13.c +++ b/gcc/testsuite/gcc.target/i386/sse-13.c @@ -748,6 +748,14 @@ #define __builtin_ia32_vcvtudq2ph_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvtudq2ph_v16si_mask_round(A, B, C, 8) #define __builtin_ia32_vcvtqq2ph_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtqq2ph_v8di_mask_round(A, B, C, 8) #define __builtin_ia32_vcvtuqq2ph_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtuqq2ph_v8di_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvtsh2si32_round(A, B) __builtin_ia32_vcvtsh2si32_round(A, 8) +#define __builtin_ia32_vcvtsh2si64_round(A, B) __builtin_ia32_vcvtsh2si64_round(A, 8) +#define __builtin_ia32_vcvtsh2usi32_round(A, B) __builtin_ia32_vcvtsh2usi32_round(A, 8) +#define __builtin_ia32_vcvtsh2usi64_round(A, B) __builtin_ia32_vcvtsh2usi64_round(A, 8) +#define __builtin_ia32_vcvtsi2sh32_round(A, B, C) __builtin_ia32_vcvtsi2sh32_round(A, B, 8) +#define __builtin_ia32_vcvtsi2sh64_round(A, B, C) __builtin_ia32_vcvtsi2sh64_round(A, B, 8) +#define __builtin_ia32_vcvtusi2sh32_round(A, B, C) __builtin_ia32_vcvtusi2sh32_round(A, B, 8) +#define __builtin_ia32_vcvtusi2sh64_round(A, B, C) __builtin_ia32_vcvtusi2sh64_round(A, B, 8) /* avx512fp16vlintrin.h */ #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) 
__builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D) diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c index 0530192d97e..89a589e0d80 100644 --- a/gcc/testsuite/gcc.target/i386/sse-14.c +++ b/gcc/testsuite/gcc.target/i386/sse-14.c @@ -690,6 +690,14 @@ test_1 (_mm512_cvt_roundepi32_ph, __m256h, __m512i, 8) test_1 (_mm512_cvt_roundepu32_ph, __m256h, __m512i, 8) test_1 (_mm512_cvt_roundepi64_ph, __m128h, __m512i, 8) test_1 (_mm512_cvt_roundepu64_ph, __m128h, __m512i, 8) +test_1 (_mm_cvt_roundsh_i32, int, __m128h, 8) +test_1 (_mm_cvt_roundsh_u32, unsigned, __m128h, 8) +#ifdef __x86_64__ +test_1 (_mm_cvt_roundsh_i64, long long, __m128h, 8) +test_1 (_mm_cvt_roundsh_u64, unsigned long long, __m128h, 8) +test_2 (_mm_cvt_roundi64_sh, __m128h, __m128h, long long, 8) +test_2 (_mm_cvt_roundu64_sh, __m128h, __m128h, unsigned long long, 8) +#endif test_1x (_mm512_reduce_round_ph, __m512h, __m512h, 123, 8) test_1x (_mm512_roundscale_round_ph, __m512h, __m512h, 123, 8) test_1x (_mm512_getmant_ph, __m512h, __m512h, 1, 1) @@ -734,6 +742,8 @@ test_2 (_mm512_maskz_cvt_roundepi32_ph, __m256h, __mmask16, __m512i, 8) test_2 (_mm512_maskz_cvt_roundepu32_ph, __m256h, __mmask16, __m512i, 8) test_2 (_mm512_maskz_cvt_roundepi64_ph, __m128h, __mmask8, __m512i, 8) test_2 (_mm512_maskz_cvt_roundepu64_ph, __m128h, __mmask8, __m512i, 8) +test_2 (_mm_cvt_roundi32_sh, __m128h, __m128h, int, 8) +test_2 (_mm_cvt_roundu32_sh, __m128h, __m128h, unsigned, 8) test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8) test_2x (_mm_cmp_round_sh_mask, __mmask8, __m128h, __m128h, 1, 8) test_2x (_mm_comi_round_sh, int, __m128h, __m128h, 1, 8) diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c index 04e6340516b..fed12744c6c 100644 --- a/gcc/testsuite/gcc.target/i386/sse-22.c +++ b/gcc/testsuite/gcc.target/i386/sse-22.c @@ -795,6 +795,14 @@ test_1 (_mm512_cvt_roundepi32_ph, __m256h, __m512i, 8) test_1 
(_mm512_cvt_roundepu32_ph, __m256h, __m512i, 8) test_1 (_mm512_cvt_roundepi64_ph, __m128h, __m512i, 8) test_1 (_mm512_cvt_roundepu64_ph, __m128h, __m512i, 8) +test_1 (_mm_cvt_roundsh_i32, int, __m128h, 8) +test_1 (_mm_cvt_roundsh_u32, unsigned, __m128h, 8) +#ifdef __x86_64__ +test_1 (_mm_cvt_roundsh_i64, long long, __m128h, 8) +test_1 (_mm_cvt_roundsh_u64, unsigned long long, __m128h, 8) +test_2 (_mm_cvt_roundi64_sh, __m128h, __m128h, long long, 8) +test_2 (_mm_cvt_roundu64_sh, __m128h, __m128h, unsigned long long, 8) +#endif test_1x (_mm512_reduce_round_ph, __m512h, __m512h, 123, 8) test_1x (_mm512_roundscale_round_ph, __m512h, __m512h, 123, 8) test_1x (_mm512_getmant_ph, __m512h, __m512h, 1, 1) @@ -838,6 +846,8 @@ test_2 (_mm512_maskz_cvt_roundepi32_ph, __m256h, __mmask16, __m512i, 8) test_2 (_mm512_maskz_cvt_roundepu32_ph, __m256h, __mmask16, __m512i, 8) test_2 (_mm512_maskz_cvt_roundepi64_ph, __m128h, __mmask8, __m512i, 8) test_2 (_mm512_maskz_cvt_roundepu64_ph, __m128h, __mmask8, __m512i, 8) +test_2 (_mm_cvt_roundi32_sh, __m128h, __m128h, int, 8) +test_2 (_mm_cvt_roundu32_sh, __m128h, __m128h, unsigned, 8) test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8) test_2x (_mm_cmp_round_sh_mask, __mmask8, __m128h, __m128h, 1, 8) test_2x (_mm_comi_round_sh, int, __m128h, __m128h, 1, 8) diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c index 684891cc98b..6e8d8a1833c 100644 --- a/gcc/testsuite/gcc.target/i386/sse-23.c +++ b/gcc/testsuite/gcc.target/i386/sse-23.c @@ -749,6 +749,14 @@ #define __builtin_ia32_vcvtudq2ph_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvtudq2ph_v16si_mask_round(A, B, C, 8) #define __builtin_ia32_vcvtqq2ph_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtqq2ph_v8di_mask_round(A, B, C, 8) #define __builtin_ia32_vcvtuqq2ph_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtuqq2ph_v8di_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvtsh2si32_round(A, B) __builtin_ia32_vcvtsh2si32_round(A, 8) 
+#define __builtin_ia32_vcvtsh2si64_round(A, B) __builtin_ia32_vcvtsh2si64_round(A, 8) +#define __builtin_ia32_vcvtsh2usi32_round(A, B) __builtin_ia32_vcvtsh2usi32_round(A, 8) +#define __builtin_ia32_vcvtsh2usi64_round(A, B) __builtin_ia32_vcvtsh2usi64_round(A, 8) +#define __builtin_ia32_vcvtsi2sh32_round(A, B, C) __builtin_ia32_vcvtsi2sh32_round(A, B, 8) +#define __builtin_ia32_vcvtsi2sh64_round(A, B, C) __builtin_ia32_vcvtsi2sh64_round(A, B, 8) +#define __builtin_ia32_vcvtusi2sh32_round(A, B, C) __builtin_ia32_vcvtusi2sh32_round(A, B, 8) +#define __builtin_ia32_vcvtusi2sh64_round(A, B, C) __builtin_ia32_vcvtusi2sh64_round(A, B, 8) /* avx512fp16vlintrin.h */ #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D) From patchwork Thu Jul 1 06:16:17 2021 X-Patchwork-Submitter: liuhongt X-Patchwork-Id: 1499370 To: gcc-patches@gcc.gnu.org Subject: [PATCH 31/62] AVX512FP16: Add testcase for vcvtsh2si/vcvtsh2usi/vcvtsi2sh/vcvtusi2sh. Date: Thu, 1 Jul 2021 14:16:17 +0800 Message-Id: <20210701061648.9447-32-hongtao.liu@intel.com> In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com> References: <20210701061648.9447-1-hongtao.liu@intel.com> From: liuhongt Reply-To: liuhongt Cc: jakub@redhat.com gcc/testsuite/ChangeLog: * gcc.target/i386/avx512fp16-helper.h (V512): Add int32 component. * gcc.target/i386/avx512fp16-vcvtsh2si-1a.c: New test. * gcc.target/i386/avx512fp16-vcvtsh2si-1b.c: Ditto. * gcc.target/i386/avx512fp16-vcvtsh2si64-1a.c: Ditto. * gcc.target/i386/avx512fp16-vcvtsh2si64-1b.c: Ditto. * gcc.target/i386/avx512fp16-vcvtsh2usi-1a.c: Ditto. * gcc.target/i386/avx512fp16-vcvtsh2usi-1b.c: Ditto. * gcc.target/i386/avx512fp16-vcvtsh2usi64-1a.c: Ditto. * gcc.target/i386/avx512fp16-vcvtsh2usi64-1b.c: Ditto. * gcc.target/i386/avx512fp16-vcvtsi2sh-1a.c: Ditto. * gcc.target/i386/avx512fp16-vcvtsi2sh-1b.c: Ditto. * gcc.target/i386/avx512fp16-vcvtsi2sh64-1a.c: Ditto. * gcc.target/i386/avx512fp16-vcvtsi2sh64-1b.c: Ditto. * gcc.target/i386/avx512fp16-vcvtusi2sh-1a.c: Ditto. * gcc.target/i386/avx512fp16-vcvtusi2sh-1b.c: Ditto. * gcc.target/i386/avx512fp16-vcvtusi2sh64-1a.c: Ditto. * gcc.target/i386/avx512fp16-vcvtusi2sh64-1b.c: Ditto. 
---
 .../gcc.target/i386/avx512fp16-helper.h       |  1 +
 .../gcc.target/i386/avx512fp16-vcvtsh2si-1a.c | 17 ++++++
 .../gcc.target/i386/avx512fp16-vcvtsh2si-1b.c | 54 +++++++++++++++++++
 .../i386/avx512fp16-vcvtsh2si64-1a.c          | 17 ++++++
 .../i386/avx512fp16-vcvtsh2si64-1b.c          | 52 ++++++++++++++++++
 .../i386/avx512fp16-vcvtsh2usi-1a.c           | 17 ++++++
 .../i386/avx512fp16-vcvtsh2usi-1b.c           | 54 +++++++++++++++++++
 .../i386/avx512fp16-vcvtsh2usi64-1a.c         | 16 ++++++
 .../i386/avx512fp16-vcvtsh2usi64-1b.c         | 53 ++++++++++++++++++
 .../gcc.target/i386/avx512fp16-vcvtsi2sh-1a.c | 16 ++++++
 .../gcc.target/i386/avx512fp16-vcvtsi2sh-1b.c | 41 ++++++++++++++
 .../i386/avx512fp16-vcvtsi2sh64-1a.c          | 16 ++++++
 .../i386/avx512fp16-vcvtsi2sh64-1b.c          | 41 ++++++++++++++
 .../i386/avx512fp16-vcvtusi2sh-1a.c           | 16 ++++++
 .../i386/avx512fp16-vcvtusi2sh-1b.c           | 41 ++++++++++++++
 .../i386/avx512fp16-vcvtusi2sh64-1a.c         | 16 ++++++
 .../i386/avx512fp16-vcvtusi2sh64-1b.c         | 41 ++++++++++++++
 17 files changed, 509 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2si-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2si-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2si64-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2si64-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2usi-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2usi-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2usi64-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2usi64-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsi2sh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsi2sh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsi2sh64-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsi2sh64-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtusi2sh-1a.c
 create mode 100644
gcc/testsuite/gcc.target/i386/avx512fp16-vcvtusi2sh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtusi2sh64-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtusi2sh64-1b.c

diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h b/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h
index aa83b66998c..cf1c536d9f7 100644
--- a/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h
@@ -34,6 +34,7 @@ typedef union
     __m128i xmmi[4];
     unsigned short u16[32];
     unsigned int u32[16];
+    int i32[16];
     long long s64[8];
     unsigned long long u64[8];
     float f32[16];
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2si-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2si-1a.c
new file mode 100644
index 00000000000..f29c953572d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2si-1a.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vcvtsh2si\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%eax" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtsh2si\[ \\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%eax" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128h x1;
+volatile int res1;
+
+void extern
+avx512f_test (void)
+{
+  res1 = _mm_cvtsh_i32 (x1);
+  res1 = _mm_cvt_roundsh_i32 (x1, 8);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2si-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2si-1b.c
new file mode 100644
index 00000000000..89c492cfc44
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2si-1b.c
@@ -0,0 +1,54 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS 2
+
+void NOINLINE
+emulate_cvtph2_d(V512 * dest, V512 op1,
+                 __mmask32 k, int zero_mask)
+{
+  V512 v1, v2, v3, v4, v5, v6, v7, v8;
+  int i;
+  __mmask16 m1, m2;
+
+  m1 = k & 0xffff;
+
unpack_ph_2twops(op1, &v1, &v2); + + for (i = 0; i < 16; i++) { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + v5.u32[i] = 0; + } + else { + v5.u32[i] = dest->u32[i]; + } + } + else { + v5.u32[i] = v1.f32[i]; + + } + } + *dest = v5; +} + +void +test_512 (void) +{ + V512 res; + V512 exp; + + init_src(); + emulate_cvtph2_d(&exp, src1, NET_MASK, 0); + res.i32[0] = _mm_cvt_roundsh_i32(src1.xmmh[0], _ROUND_NINT); + check_results(&res, &exp, N_ELEMS, "_mm_cvt_roundsh_i32"); + + if (n_errs != 0) { + abort (); + } +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2si64-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2si64-1a.c new file mode 100644 index 00000000000..0289ebf95ea --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2si64-1a.c @@ -0,0 +1,17 @@ +/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vcvtsh2si\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%rax" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtsh2si\[ \\t\]+\{rz-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%rax" 1 } } */ + + +#include + +volatile __m128h x1; +volatile long long res2; + +void extern +avx512f_test (void) +{ + res2 = _mm_cvtsh_i64 (x1); + res2 = _mm_cvt_roundsh_i64 (x1, 11); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2si64-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2si64-1b.c new file mode 100644 index 00000000000..6a5e836fd7f --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2si64-1b.c @@ -0,0 +1,52 @@ +/* { dg-do run { target { { ! 
ia32 } && avx512fp16 } } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS 4 + +void NOINLINE +emulate_cvtph2_q(V512 * dest, V512 op1, + __mmask32 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + __mmask16 m1, m2; + + m1 = k & 0xffff; + + unpack_ph_2twops(op1, &v1, &v2); + + for (i = 0; i < 8; i++) { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + v5.u64[i] = 0; + } + else { + v5.u64[i] = dest->u64[i]; + } + } + else { + v5.u64[i] = v1.f32[i]; + } + } + *dest = v5; +} + +void +test_512 (void) +{ + V512 res; + V512 exp; + + init_src(); + emulate_cvtph2_q(&exp, src1, NET_MASK, 0); + res.s64[0] = _mm_cvt_roundsh_i64(src1.xmmh[0], _ROUND_NINT); + check_results(&res, &exp, N_ELEMS, "_mm_cvt_roundsh_i64"); + + if (n_errs != 0) { + abort (); + } +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2usi-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2usi-1a.c new file mode 100644 index 00000000000..7d00867247e --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2usi-1a.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vcvtsh2usi\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%eax" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtsh2usi\[ \\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%eax" 1 } } */ + + +#include + +volatile __m128h x1; +volatile unsigned int res1; + +void extern +avx512f_test (void) +{ + res1 = _mm_cvtsh_u32 (x1); + res1 = _mm_cvt_roundsh_u32 (x1, 8); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2usi-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2usi-1b.c new file mode 100644 index 00000000000..466ce6ead83 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2usi-1b.c @@ -0,0 +1,54 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include 
"avx512fp16-helper.h" + +#define N_ELEMS 2 + +void NOINLINE +emulate_cvtph2_d(V512 * dest, V512 op1, + __mmask32 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + __mmask16 m1, m2; + + m1 = k & 0xffff; + + unpack_ph_2twops(op1, &v1, &v2); + + for (i = 0; i < 16; i++) { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + v5.u32[i] = 0; + } + else { + v5.u32[i] = dest->u32[i]; + } + } + else { + v5.u32[i] = v1.f32[i]; + + } + } + *dest = v5; +} + +void +test_512 (void) +{ + V512 res; + V512 exp; + + init_src(); + emulate_cvtph2_d(&exp, src1, NET_MASK, 0); + res.u32[0] = _mm_cvt_roundsh_i32(src1.xmmh[0], _ROUND_NINT); + check_results(&res, &exp, N_ELEMS, "_mm_cvt_roundsh_u32"); + + if (n_errs != 0) { + abort (); + } +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2usi64-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2usi64-1a.c new file mode 100644 index 00000000000..363252d8d5d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2usi64-1a.c @@ -0,0 +1,16 @@ +/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-options "-mavx512fp16 -O2 " } */ +/* { dg-final { scan-assembler-times "vcvtsh2usi\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%rax" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtsh2usi\[ \\t\]+\{rz-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%rax" 1 } } */ + +#include + +volatile __m128h x1; +volatile unsigned long long res2; + +void extern +avx512f_test (void) +{ + res2 = _mm_cvtsh_u64 (x1); + res2 = _mm_cvt_roundsh_u64 (x1, 11); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2usi64-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2usi64-1b.c new file mode 100644 index 00000000000..74643ae2bd6 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2usi64-1b.c @@ -0,0 +1,53 @@ +/* { dg-do run { target { { ! 
ia32 } && avx512fp16 } } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS 4 + +void NOINLINE +emulate_cvtph2_q(V512 * dest, V512 op1, + __mmask32 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + __mmask16 m1, m2; + + m1 = k & 0xffff; + + unpack_ph_2twops(op1, &v1, &v2); + + for (i = 0; i < 8; i++) { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + v5.u64[i] = 0; + } + else { + v5.u64[i] = dest->u64[i]; + } + } + else { + v5.u64[i] = v1.f32[i]; + } + } + *dest = v5; +} + +void +test_512 (void) +{ + V512 res; + V512 exp; + + init_src(); + emulate_cvtph2_q(&exp, src1, NET_MASK, 0); + res.u64[0] = _mm_cvt_roundsh_i64(src1.xmmh[0], _ROUND_NINT); + check_results(&res, &exp, 4, "_mm_cvt_roundsh_u64"); + + if (n_errs != 0) { + abort (); + } +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsi2sh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsi2sh-1a.c new file mode 100644 index 00000000000..4cd69d9b4e5 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsi2sh-1a.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vcvtsi2sh\[ \\t\]+\[^%\n\]*%e\[^\{\n\]*\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtsi2sh\[ \\t\]+\[^%\n\]*%e\[^\{\n\]*\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m128h x; +volatile int n; + +void extern +avx512f_test (void) +{ + x = _mm_cvti32_sh (x, n); + x = _mm_cvt_roundi32_sh (x, n, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsi2sh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsi2sh-1b.c new file mode 100644 index 00000000000..d9c9a853a17 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsi2sh-1b.c @@ -0,0 +1,41 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 
-mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS 8 + +void NOINLINE +emulate_vcvtsi2sh(V512 *dest, V512 op1, + int value_32, __int64_t value_64, int bits) +{ + V512 v1,v2,v5,v6; + unpack_ph_2twops(op1, &v1, &v2); + if (bits == 32) + v5.xmm[0] = _mm_cvt_roundi32_ss (v1.xmm[0], value_32, _ROUND_NINT); +#ifdef __x86_64__ + else + v5.xmm[0] = _mm_cvt_roundi64_ss (v1.xmm[0], value_64, _ROUND_NINT); +#endif + v5.xmm[1] = v1.xmm[1]; + *dest = pack_twops_2ph(v5, v6); +} + +void +test_512 (void) +{ + V512 res; + V512 exp; + + init_src(); + emulate_vcvtsi2sh(&exp, src1, 99, 0, 32); + res.xmmh[0] = _mm_cvt_roundi32_sh(src1.xmmh[0], 99, _ROUND_NINT); + check_results(&res, &exp, N_ELEMS, "_mm_cvt_roundi32_sh"); + + if (n_errs != 0) { + abort (); + } +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsi2sh64-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsi2sh64-1a.c new file mode 100644 index 00000000000..5f3e5520bf1 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsi2sh64-1a.c @@ -0,0 +1,16 @@ +/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vcvtsi2sh\[ \\t\]+\[^%\n\]*%r\[^\{\n\]*\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtsi2sh\[ \\t\]+\[^%\n\]*%r\[^\{\n\]*\{ru-sae\}\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m128h x; +volatile long long n; + +void extern +avx512f_test (void) +{ + x = _mm_cvti64_sh (x, n); + x = _mm_cvt_roundi64_sh (x, n, _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsi2sh64-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsi2sh64-1b.c new file mode 100644 index 00000000000..6f66a87a8e7 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsi2sh64-1b.c @@ -0,0 +1,41 @@ +/* { dg-do run { target { { ! 
ia32 } && avx512fp16 } } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS 8 + +void NOINLINE +emulate_vcvtsi2sh(V512 *dest, V512 op1, + int value_32, __int64_t value_64, int bits) +{ + V512 v1,v2,v5,v6; + unpack_ph_2twops(op1, &v1, &v2); + if (bits == 32) + v5.xmm[0] = _mm_cvt_roundi32_ss (v1.xmm[0], value_32, _ROUND_NINT); +#ifdef __x86_64__ + else + v5.xmm[0] = _mm_cvt_roundi64_ss (v1.xmm[0], value_64, _ROUND_NINT); +#endif + v5.xmm[1] = v1.xmm[1]; + *dest = pack_twops_2ph(v5, v6); +} + +void +test_512 (void) +{ + V512 res; + V512 exp; + + init_src(); + emulate_vcvtsi2sh(&exp, src1, 0, 99, 64); + res.xmmh[0] = _mm_cvt_roundi64_sh(src1.xmmh[0], 99, _ROUND_NINT); + check_results(&res, &exp, N_ELEMS, "_mm_cvt_roundi64_sh"); + + if (n_errs != 0) { + abort (); + } +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtusi2sh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtusi2sh-1a.c new file mode 100644 index 00000000000..9c85da09e29 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtusi2sh-1a.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vcvtusi2sh\[ \\t\]+\[^%\n\]*%e\[^\{\n\]*\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtusi2sh\[ \\t\]+\[^%\n\]*%e\[^\{\n\]*\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m128h x; +volatile unsigned n; + +void extern +avx512f_test (void) +{ + x = _mm_cvtu32_sh (x, n); + x = _mm_cvt_roundu32_sh (x, n, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtusi2sh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtusi2sh-1b.c new file mode 100644 index 00000000000..d339f0a4043 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtusi2sh-1b.c @@ -0,0 +1,41 @@ +/* { dg-do run { target avx512fp16 } } */ +/* 
{ dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS 8 + +void NOINLINE +emulate_vcvtusi2sh(V512 *dest, V512 op1, + int value_32, __int64_t value_64, int bits) +{ + V512 v1,v2,v5,v6; + unpack_ph_2twops(op1, &v1, &v2); + if (bits == 32) + v5.xmm[0] = _mm_cvt_roundu32_ss (v1.xmm[0], value_32, _ROUND_NINT); +#ifdef __x86_64__ + else + v5.xmm[0] = _mm_cvt_roundu64_ss (v1.xmm[0], value_64, _ROUND_NINT); +#endif + v5.xmm[1] = v1.xmm[1]; + *dest = pack_twops_2ph(v5, v6); +} + +void +test_512 (void) +{ + V512 res; + V512 exp; + + init_src(); + emulate_vcvtusi2sh(&exp, src1, 99, 0, 32); + res.xmmh[0] = _mm_cvt_roundu32_sh(src1.xmmh[0], 99, _ROUND_NINT); + check_results(&res, &exp, N_ELEMS, "_mm_cvt_roundu32_sh"); + + if (n_errs != 0) { + abort (); + } +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtusi2sh64-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtusi2sh64-1a.c new file mode 100644 index 00000000000..1f22ac258e0 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtusi2sh64-1a.c @@ -0,0 +1,16 @@ +/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vcvtusi2sh\[ \\t\]+\[^%\n\]*%r\[^\{\n\]*\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtusi2sh\[ \\t\]+\[^%\n\]*%r\[^\{\n\]*\{ru-sae\}\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m128h x; +volatile unsigned long long n; + +void extern +avx512f_test (void) +{ + x = _mm_cvtu64_sh (x, n); + x = _mm_cvt_roundu64_sh (x, n, _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtusi2sh64-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtusi2sh64-1b.c new file mode 100644 index 00000000000..20e711e1b0e --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtusi2sh64-1b.c @@ -0,0 +1,41 @@ +/* { dg-do run { target { { ! 
ia32 } && avx512fp16 } } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS 8
+
+void NOINLINE
+emulate_vcvtusi2sh(V512 *dest, V512 op1,
+                   int value_32, __int64_t value_64, int bits)
+{
+  V512 v1,v2,v5,v6;
+  unpack_ph_2twops(op1, &v1, &v2);
+  if (bits == 32)
+    v5.xmm[0] = _mm_cvt_roundu32_ss (v1.xmm[0], value_32, _ROUND_NINT);
+#ifdef __x86_64__
+  else
+    v5.xmm[0] = _mm_cvt_roundu64_ss (v1.xmm[0], value_64, _ROUND_NINT);
+#endif
+  v5.xmm[1] = v1.xmm[1];
+  *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+test_512 (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+  emulate_vcvtusi2sh(&exp, src1, 0, 99, 64);
+  res.xmmh[0] = _mm_cvt_roundu64_sh(src1.xmmh[0], 99, _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_cvt_roundu64_sh");
+
+  if (n_errs != 0) {
+    abort ();
+  }
+}

From patchwork Thu Jul 1 06:16:18 2021
X-Patchwork-Submitter: liuhongt
X-Patchwork-Id: 1499371
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 32/62] AVX512FP16: Add vcvttph2w/vcvttph2uw/vcvttph2dq/vcvttph2qq/vcvttph2udq/vcvttph2uqq
Date: Thu, 1 Jul 2021 14:16:18 +0800
Message-Id: <20210701061648.9447-33-hongtao.liu@intel.com>
In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com>
References: <20210701061648.9447-1-hongtao.liu@intel.com>
From: liuhongt
Cc: jakub@redhat.com
gcc/ChangeLog:

	* config/i386/avx512fp16intrin.h (_mm512_cvttph_epi32): New intrinsic.
	(_mm512_mask_cvttph_epi32): Likewise.
	(_mm512_maskz_cvttph_epi32): Likewise.
	(_mm512_cvtt_roundph_epi32): Likewise.
	(_mm512_mask_cvtt_roundph_epi32): Likewise.
	(_mm512_maskz_cvtt_roundph_epi32): Likewise.
	(_mm512_cvttph_epu32): Likewise.
	(_mm512_mask_cvttph_epu32): Likewise.
	(_mm512_maskz_cvttph_epu32): Likewise.
	(_mm512_cvtt_roundph_epu32): Likewise.
	(_mm512_mask_cvtt_roundph_epu32): Likewise.
	(_mm512_maskz_cvtt_roundph_epu32): Likewise.
	(_mm512_cvttph_epi64): Likewise.
	(_mm512_mask_cvttph_epi64): Likewise.
	(_mm512_maskz_cvttph_epi64): Likewise.
	(_mm512_cvtt_roundph_epi64): Likewise.
	(_mm512_mask_cvtt_roundph_epi64): Likewise.
	(_mm512_maskz_cvtt_roundph_epi64): Likewise.
	(_mm512_cvttph_epu64): Likewise.
	(_mm512_mask_cvttph_epu64): Likewise.
	(_mm512_maskz_cvttph_epu64): Likewise.
	(_mm512_cvtt_roundph_epu64): Likewise.
	(_mm512_mask_cvtt_roundph_epu64): Likewise.
	(_mm512_maskz_cvtt_roundph_epu64): Likewise.
	(_mm512_cvttph_epi16): Likewise.
	(_mm512_mask_cvttph_epi16): Likewise.
	(_mm512_maskz_cvttph_epi16): Likewise.
	(_mm512_cvtt_roundph_epi16): Likewise.
	(_mm512_mask_cvtt_roundph_epi16): Likewise.
	(_mm512_maskz_cvtt_roundph_epi16): Likewise.
	(_mm512_cvttph_epu16): Likewise.
	(_mm512_mask_cvttph_epu16): Likewise.
	(_mm512_maskz_cvttph_epu16): Likewise.
	(_mm512_cvtt_roundph_epu16): Likewise.
	(_mm512_mask_cvtt_roundph_epu16): Likewise.
	(_mm512_maskz_cvtt_roundph_epu16): Likewise.
	* config/i386/avx512fp16vlintrin.h (_mm_cvttph_epi32): New intrinsic.
	(_mm_mask_cvttph_epi32): Likewise.
	(_mm_maskz_cvttph_epi32): Likewise.
	(_mm256_cvttph_epi32): Likewise.
	(_mm256_mask_cvttph_epi32): Likewise.
	(_mm256_maskz_cvttph_epi32): Likewise.
	(_mm_cvttph_epu32): Likewise.
	(_mm_mask_cvttph_epu32): Likewise.
	(_mm_maskz_cvttph_epu32): Likewise.
	(_mm256_cvttph_epu32): Likewise.
	(_mm256_mask_cvttph_epu32): Likewise.
	(_mm256_maskz_cvttph_epu32): Likewise.
	(_mm_cvttph_epi64): Likewise.
	(_mm_mask_cvttph_epi64): Likewise.
	(_mm_maskz_cvttph_epi64): Likewise.
	(_mm256_cvttph_epi64): Likewise.
	(_mm256_mask_cvttph_epi64): Likewise.
	(_mm256_maskz_cvttph_epi64): Likewise.
	(_mm_cvttph_epu64): Likewise.
	(_mm_mask_cvttph_epu64): Likewise.
	(_mm_maskz_cvttph_epu64): Likewise.
	(_mm256_cvttph_epu64): Likewise.
	(_mm256_mask_cvttph_epu64): Likewise.
	(_mm256_maskz_cvttph_epu64): Likewise.
	(_mm_cvttph_epi16): Likewise.
	(_mm_mask_cvttph_epi16): Likewise.
	(_mm_maskz_cvttph_epi16): Likewise.
	(_mm256_cvttph_epi16): Likewise.
	(_mm256_mask_cvttph_epi16): Likewise.
	(_mm256_maskz_cvttph_epi16): Likewise.
	(_mm_cvttph_epu16): Likewise.
	(_mm_mask_cvttph_epu16): Likewise.
	(_mm_maskz_cvttph_epu16): Likewise.
	(_mm256_cvttph_epu16): Likewise.
	(_mm256_mask_cvttph_epu16): Likewise.
	(_mm256_maskz_cvttph_epu16): Likewise.
	* config/i386/i386-builtin.def: Add new builtins.
	* config/i386/sse.md (avx512fp16_fix_trunc2): New.
	(avx512fp16_fix_trunc2): Ditto.
	(avx512fp16_fix_truncv2di2): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx-1.c: Add test for new builtins.
	* gcc.target/i386/sse-13.c: Ditto.
	* gcc.target/i386/sse-23.c: Ditto.
	* gcc.target/i386/sse-14.c: Add test for new intrinsics.
	* gcc.target/i386/sse-22.c: Ditto.
--- gcc/config/i386/avx512fp16intrin.h | 539 +++++++++++++++++++++++++ gcc/config/i386/avx512fp16vlintrin.h | 365 +++++++++++++++++ gcc/config/i386/i386-builtin.def | 18 + gcc/config/i386/sse.md | 34 ++ gcc/testsuite/gcc.target/i386/avx-1.c | 6 + gcc/testsuite/gcc.target/i386/sse-13.c | 6 + gcc/testsuite/gcc.target/i386/sse-14.c | 18 + gcc/testsuite/gcc.target/i386/sse-22.c | 18 + gcc/testsuite/gcc.target/i386/sse-23.c | 6 + 9 files changed, 1010 insertions(+) diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h index 7524a8d6a5b..66de5b88927 100644 --- a/gcc/config/i386/avx512fp16intrin.h +++ b/gcc/config/i386/avx512fp16intrin.h @@ -2702,6 +2702,201 @@ _mm512_maskz_cvt_roundph_epu32 (__mmask16 __A, __m256h __B, int __C) #endif /* __OPTIMIZE__ */ +/* Intrinsics vcvttph2dq. */ +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvttph_epi32 (__m256h __A) +{ + return (__m512i) + __builtin_ia32_vcvttph2dq_v16si_mask_round (__A, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvttph_epi32 (__m512i __A, __mmask16 __B, __m256h __C) +{ + return (__m512i) + __builtin_ia32_vcvttph2dq_v16si_mask_round (__C, + (__v16si) __A, + __B, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvttph_epi32 (__mmask16 __A, __m256h __B) +{ + return (__m512i) + __builtin_ia32_vcvttph2dq_v16si_mask_round (__B, + (__v16si) + _mm512_setzero_si512 (), + __A, + _MM_FROUND_CUR_DIRECTION); +} + +#ifdef __OPTIMIZE__ +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtt_roundph_epi32 (__m256h __A, int __B) +{ + return (__m512i) + __builtin_ia32_vcvttph2dq_v16si_mask_round (__A, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) 
-1, + __B); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtt_roundph_epi32 (__m512i __A, __mmask16 __B, __m256h __C, int __D) +{ + return (__m512i) + __builtin_ia32_vcvttph2dq_v16si_mask_round (__C, + (__v16si) __A, + __B, + __D); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtt_roundph_epi32 (__mmask16 __A, __m256h __B, int __C) +{ + return (__m512i) + __builtin_ia32_vcvttph2dq_v16si_mask_round (__B, + (__v16si) + _mm512_setzero_si512 (), + __A, + __C); +} + +#else +#define _mm512_cvtt_roundph_epi32(A, B) \ + ((__m512i) \ + __builtin_ia32_vcvttph2dq_v16si_mask_round ((A), \ + (__v16si) \ + (_mm512_setzero_si512 ()), \ + (__mmask16)(-1), (B))) + +#define _mm512_mask_cvtt_roundph_epi32(A, B, C, D) \ + ((__m512i) \ + __builtin_ia32_vcvttph2dq_v16si_mask_round ((C), \ + (__v16si)(A), \ + (B), \ + (D))) + +#define _mm512_maskz_cvtt_roundph_epi32(A, B, C) \ + ((__m512i) \ + __builtin_ia32_vcvttph2dq_v16si_mask_round ((B), \ + (__v16si) \ + _mm512_setzero_si512 (), \ + (A), \ + (C))) + +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vcvttph2udq. 
*/ +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvttph_epu32 (__m256h __A) +{ + return (__m512i) + __builtin_ia32_vcvttph2udq_v16si_mask_round (__A, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvttph_epu32 (__m512i __A, __mmask16 __B, __m256h __C) +{ + return (__m512i) + __builtin_ia32_vcvttph2udq_v16si_mask_round (__C, + (__v16si) __A, + __B, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvttph_epu32 (__mmask16 __A, __m256h __B) +{ + return (__m512i) + __builtin_ia32_vcvttph2udq_v16si_mask_round (__B, + (__v16si) + _mm512_setzero_si512 (), + __A, + _MM_FROUND_CUR_DIRECTION); +} + +#ifdef __OPTIMIZE__ +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtt_roundph_epu32 (__m256h __A, int __B) +{ + return (__m512i) + __builtin_ia32_vcvttph2udq_v16si_mask_round (__A, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) -1, + __B); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtt_roundph_epu32 (__m512i __A, __mmask16 __B, __m256h __C, int __D) +{ + return (__m512i) + __builtin_ia32_vcvttph2udq_v16si_mask_round (__C, + (__v16si) __A, + __B, + __D); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtt_roundph_epu32 (__mmask16 __A, __m256h __B, int __C) +{ + return (__m512i) + __builtin_ia32_vcvttph2udq_v16si_mask_round (__B, + (__v16si) + _mm512_setzero_si512 (), + __A, + __C); +} + +#else +#define _mm512_cvtt_roundph_epu32(A, B) \ + ((__m512i) \ + __builtin_ia32_vcvttph2udq_v16si_mask_round ((A), \ + (__v16si) \ + _mm512_setzero_si512 (), \ + (__mmask16)-1, \ + (B))) + +#define 
_mm512_mask_cvtt_roundph_epu32(A, B, C, D)                      \
+  ((__m512i)                                                    \
+   __builtin_ia32_vcvttph2udq_v16si_mask_round ((C),            \
+                                                (__v16si)(A),   \
+                                                (B),            \
+                                                (D)))
+
+#define _mm512_maskz_cvtt_roundph_epu32(A, B, C)                \
+  ((__m512i)                                                    \
+   __builtin_ia32_vcvttph2udq_v16si_mask_round ((B),            \
+                                                (__v16si)       \
+                                                _mm512_setzero_si512 (), \
+                                                (A),            \
+                                                (C)))
+
+#endif /* __OPTIMIZE__ */
+
 /* Intrinsics vcvtdq2ph. */
 extern __inline __m256h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
@@ -3019,6 +3214,156 @@ _mm512_maskz_cvt_roundph_epu64 (__mmask8 __A, __m128h __B, int __C)
 #endif /* __OPTIMIZE__ */
+/* Intrinsics vcvttph2qq. */
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvttph_epi64 (__m128h __A)
+{
+  return __builtin_ia32_vcvttph2qq_v8di_mask_round (__A,
+                                                    _mm512_setzero_si512 (),
+                                                    (__mmask8) -1,
+                                                    _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvttph_epi64 (__m512i __A, __mmask8 __B, __m128h __C)
+{
+  return __builtin_ia32_vcvttph2qq_v8di_mask_round (__C, __A, __B,
+                                                    _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvttph_epi64 (__mmask8 __A, __m128h __B)
+{
+  return __builtin_ia32_vcvttph2qq_v8di_mask_round (__B,
+                                                    _mm512_setzero_si512 (),
+                                                    __A,
+                                                    _MM_FROUND_CUR_DIRECTION);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvtt_roundph_epi64 (__m128h __A, int __B)
+{
+  return __builtin_ia32_vcvttph2qq_v8di_mask_round (__A,
+                                                    _mm512_setzero_si512 (),
+                                                    (__mmask8) -1,
+                                                    __B);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvtt_roundph_epi64 (__m512i __A, __mmask8 __B, __m128h __C, int __D)
+{
+  return __builtin_ia32_vcvttph2qq_v8di_mask_round (__C, __A, __B, __D);
+}
+
+extern __inline
__m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvtt_roundph_epi64 (__mmask8 __A, __m128h __B, int __C)
+{
+  return __builtin_ia32_vcvttph2qq_v8di_mask_round (__B,
+                                                    _mm512_setzero_si512 (),
+                                                    __A,
+                                                    __C);
+}
+
+#else
+#define _mm512_cvtt_roundph_epi64(A, B)                        \
+  (__builtin_ia32_vcvttph2qq_v8di_mask_round ((A),             \
+                                              _mm512_setzero_si512 (), \
+                                              (__mmask8)-1,    \
+                                              (B)))
+
+#define _mm512_mask_cvtt_roundph_epi64(A, B, C, D)             \
+  __builtin_ia32_vcvttph2qq_v8di_mask_round ((C), (A), (B), (D))
+
+#define _mm512_maskz_cvtt_roundph_epi64(A, B, C)               \
+  (__builtin_ia32_vcvttph2qq_v8di_mask_round ((B),             \
+                                              _mm512_setzero_si512 (), \
+                                              (A),             \
+                                              (C)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vcvttph2uqq. */
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvttph_epu64 (__m128h __A)
+{
+  return __builtin_ia32_vcvttph2uqq_v8di_mask_round (__A,
+                                                     _mm512_setzero_si512 (),
+                                                     (__mmask8) -1,
+                                                     _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvttph_epu64 (__m512i __A, __mmask8 __B, __m128h __C)
+{
+  return __builtin_ia32_vcvttph2uqq_v8di_mask_round (__C, __A, __B,
+                                                     _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvttph_epu64 (__mmask8 __A, __m128h __B)
+{
+  return __builtin_ia32_vcvttph2uqq_v8di_mask_round (__B,
+                                                     _mm512_setzero_si512 (),
+                                                     __A,
+                                                     _MM_FROUND_CUR_DIRECTION);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvtt_roundph_epu64 (__m128h __A, int __B)
+{
+  return __builtin_ia32_vcvttph2uqq_v8di_mask_round (__A,
+                                                     _mm512_setzero_si512 (),
+                                                     (__mmask8) -1,
+                                                     __B);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvtt_roundph_epu64 (__m512i __A, __mmask8 __B,
__m128h __C, int __D)
+{
+  return __builtin_ia32_vcvttph2uqq_v8di_mask_round (__C, __A, __B, __D);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvtt_roundph_epu64 (__mmask8 __A, __m128h __B, int __C)
+{
+  return __builtin_ia32_vcvttph2uqq_v8di_mask_round (__B,
+                                                     _mm512_setzero_si512 (),
+                                                     __A,
+                                                     __C);
+}
+
+#else
+#define _mm512_cvtt_roundph_epu64(A, B)                         \
+  (__builtin_ia32_vcvttph2uqq_v8di_mask_round ((A),             \
+                                               _mm512_setzero_si512 (), \
+                                               (__mmask8)-1,    \
+                                               (B)))
+
+#define _mm512_mask_cvtt_roundph_epu64(A, B, C, D)              \
+  __builtin_ia32_vcvttph2uqq_v8di_mask_round ((C), (A), (B), (D))
+
+#define _mm512_maskz_cvtt_roundph_epu64(A, B, C)                \
+  (__builtin_ia32_vcvttph2uqq_v8di_mask_round ((B),             \
+                                               _mm512_setzero_si512 (), \
+                                               (A),             \
+                                               (C)))
+
+#endif /* __OPTIMIZE__ */
+
 /* Intrinsics vcvtqq2ph. */
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
@@ -3363,6 +3708,200 @@ _mm512_maskz_cvt_roundph_epu16 (__mmask32 __A, __m512h __B, int __C)
 #endif /* __OPTIMIZE__ */
+
+/* Intrinsics vcvttph2w.
  */
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvttph_epi16 (__m512h __A)
+{
+  return (__m512i)
+    __builtin_ia32_vcvttph2w_v32hi_mask_round (__A,
+                                               (__v32hi)
+                                               _mm512_setzero_si512 (),
+                                               (__mmask32) -1,
+                                               _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvttph_epi16 (__m512i __A, __mmask32 __B, __m512h __C)
+{
+  return (__m512i)
+    __builtin_ia32_vcvttph2w_v32hi_mask_round (__C,
+                                               (__v32hi) __A,
+                                               __B,
+                                               _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvttph_epi16 (__mmask32 __A, __m512h __B)
+{
+  return (__m512i)
+    __builtin_ia32_vcvttph2w_v32hi_mask_round (__B,
+                                               (__v32hi)
+                                               _mm512_setzero_si512 (),
+                                               __A,
+                                               _MM_FROUND_CUR_DIRECTION);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvtt_roundph_epi16 (__m512h __A, int __B)
+{
+  return (__m512i)
+    __builtin_ia32_vcvttph2w_v32hi_mask_round (__A,
+                                               (__v32hi)
+                                               _mm512_setzero_si512 (),
+                                               (__mmask32) -1,
+                                               __B);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvtt_roundph_epi16 (__m512i __A, __mmask32 __B, __m512h __C, int __D)
+{
+  return (__m512i)
+    __builtin_ia32_vcvttph2w_v32hi_mask_round (__C,
+                                               (__v32hi) __A,
+                                               __B,
+                                               __D);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvtt_roundph_epi16 (__mmask32 __A, __m512h __B, int __C)
+{
+  return (__m512i)
+    __builtin_ia32_vcvttph2w_v32hi_mask_round (__B,
+                                               (__v32hi)
+                                               _mm512_setzero_si512 (),
+                                               __A,
+                                               __C);
+}
+
+#else
+#define _mm512_cvtt_roundph_epi16(A, B)                        \
+  ((__m512i)                                                   \
+   __builtin_ia32_vcvttph2w_v32hi_mask_round ((A),             \
+                                              (__v32hi)_mm512_setzero_si512 (), \
+                                              (__mmask32)-1,   \
+                                              (B)))
+
+#define
_mm512_mask_cvtt_roundph_epi16(A, B, C, D)                     \
+  ((__m512i)                                                   \
+   __builtin_ia32_vcvttph2w_v32hi_mask_round ((C),             \
+                                              (__v32hi)(A),    \
+                                              (B),             \
+                                              (D)))
+
+#define _mm512_maskz_cvtt_roundph_epi16(A, B, C)               \
+  ((__m512i)                                                   \
+   __builtin_ia32_vcvttph2w_v32hi_mask_round ((B),             \
+                                              (__v32hi)_mm512_setzero_si512 (), \
+                                              (A),             \
+                                              (C)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vcvttph2uw. */
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvttph_epu16 (__m512h __A)
+{
+  return (__m512i)
+    __builtin_ia32_vcvttph2uw_v32hi_mask_round (__A,
+                                                (__v32hi)
+                                                _mm512_setzero_si512 (),
+                                                (__mmask32) -1,
+                                                _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvttph_epu16 (__m512i __A, __mmask32 __B, __m512h __C)
+{
+  return (__m512i)
+    __builtin_ia32_vcvttph2uw_v32hi_mask_round (__C,
+                                                (__v32hi) __A,
+                                                __B,
+                                                _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvttph_epu16 (__mmask32 __A, __m512h __B)
+{
+  return (__m512i)
+    __builtin_ia32_vcvttph2uw_v32hi_mask_round (__B,
+                                                (__v32hi)
+                                                _mm512_setzero_si512 (),
+                                                __A,
+                                                _MM_FROUND_CUR_DIRECTION);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvtt_roundph_epu16 (__m512h __A, int __B)
+{
+  return (__m512i)
+    __builtin_ia32_vcvttph2uw_v32hi_mask_round (__A,
+                                                (__v32hi)
+                                                _mm512_setzero_si512 (),
+                                                (__mmask32) -1,
+                                                __B);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvtt_roundph_epu16 (__m512i __A, __mmask32 __B, __m512h __C, int __D)
+{
+  return (__m512i)
+    __builtin_ia32_vcvttph2uw_v32hi_mask_round (__C,
+                                                (__v32hi) __A,
+                                                __B,
+                                                __D);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvtt_roundph_epu16
(__mmask32 __A, __m512h __B, int __C)
+{
+  return (__m512i)
+    __builtin_ia32_vcvttph2uw_v32hi_mask_round (__B,
+                                                (__v32hi)
+                                                _mm512_setzero_si512 (),
+                                                __A,
+                                                __C);
+}
+
+#else
+#define _mm512_cvtt_roundph_epu16(A, B)                         \
+  ((__m512i)                                                    \
+   __builtin_ia32_vcvttph2uw_v32hi_mask_round ((A),             \
+                                               (__v32hi)        \
+                                               _mm512_setzero_si512 (), \
+                                               (__mmask32)-1,   \
+                                               (B)))
+
+#define _mm512_mask_cvtt_roundph_epu16(A, B, C, D)              \
+  ((__m512i)                                                    \
+   __builtin_ia32_vcvttph2uw_v32hi_mask_round ((C),             \
+                                               (__v32hi)(A),    \
+                                               (B),             \
+                                               (D)))
+
+#define _mm512_maskz_cvtt_roundph_epu16(A, B, C)                \
+  ((__m512i)                                                    \
+   __builtin_ia32_vcvttph2uw_v32hi_mask_round ((B),             \
+                                               (__v32hi)        \
+                                               _mm512_setzero_si512 (), \
+                                               (A),             \
+                                               (C)))
+
+#endif /* __OPTIMIZE__ */
+
 /* Intrinsics vcvtw2ph. */
 extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
diff --git a/gcc/config/i386/avx512fp16vlintrin.h b/gcc/config/i386/avx512fp16vlintrin.h
index 93d9ff8bf3c..e1ee37edde6 100644
--- a/gcc/config/i386/avx512fp16vlintrin.h
+++ b/gcc/config/i386/avx512fp16vlintrin.h
@@ -1050,6 +1050,132 @@ _mm256_maskz_cvtph_epu32 (__mmask8 __A, __m128h __B)
                                          __A);
 }
+/* Intrinsics vcvttph2dq.
  */
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvttph_epi32 (__m128h __A)
+{
+  return (__m128i)
+    __builtin_ia32_vcvttph2dq_v4si_mask (__A,
+                                         (__v4si) _mm_setzero_si128 (),
+                                         (__mmask8) -1);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_cvttph_epi32 (__m128i __A, __mmask8 __B, __m128h __C)
+{
+  return (__m128i)__builtin_ia32_vcvttph2dq_v4si_mask (__C,
+                                                       (__v4si) __A,
+                                                       __B);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_cvttph_epi32 (__mmask8 __A, __m128h __B)
+{
+  return (__m128i)
+    __builtin_ia32_vcvttph2dq_v4si_mask (__B,
+                                         (__v4si) _mm_setzero_si128 (),
+                                         __A);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_cvttph_epi32 (__m128h __A)
+{
+  return (__m256i)
+    __builtin_ia32_vcvttph2dq_v8si_mask (__A,
+                                         (__v8si)
+                                         _mm256_setzero_si256 (),
+                                         (__mmask8) -1);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_cvttph_epi32 (__m256i __A, __mmask8 __B, __m128h __C)
+{
+  return (__m256i)
+    __builtin_ia32_vcvttph2dq_v8si_mask (__C,
+                                         (__v8si) __A,
+                                         __B);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_cvttph_epi32 (__mmask8 __A, __m128h __B)
+{
+  return (__m256i)
+    __builtin_ia32_vcvttph2dq_v8si_mask (__B,
+                                         (__v8si)
+                                         _mm256_setzero_si256 (),
+                                         __A);
+}
+
+/* Intrinsics vcvttph2udq.
  */
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvttph_epu32 (__m128h __A)
+{
+  return (__m128i)
+    __builtin_ia32_vcvttph2udq_v4si_mask (__A,
+                                          (__v4si)
+                                          _mm_setzero_si128 (),
+                                          (__mmask8) -1);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_cvttph_epu32 (__m128i __A, __mmask8 __B, __m128h __C)
+{
+  return (__m128i)
+    __builtin_ia32_vcvttph2udq_v4si_mask (__C,
+                                          (__v4si) __A,
+                                          __B);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_cvttph_epu32 (__mmask8 __A, __m128h __B)
+{
+  return (__m128i)
+    __builtin_ia32_vcvttph2udq_v4si_mask (__B,
+                                          (__v4si)
+                                          _mm_setzero_si128 (),
+                                          __A);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_cvttph_epu32 (__m128h __A)
+{
+  return (__m256i)
+    __builtin_ia32_vcvttph2udq_v8si_mask (__A,
+                                          (__v8si)
+                                          _mm256_setzero_si256 (), (__mmask8) -1);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_cvttph_epu32 (__m256i __A, __mmask8 __B, __m128h __C)
+{
+  return (__m256i)
+    __builtin_ia32_vcvttph2udq_v8si_mask (__C,
+                                          (__v8si) __A,
+                                          __B);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_cvttph_epu32 (__mmask8 __A, __m128h __B)
+{
+  return (__m256i)
+    __builtin_ia32_vcvttph2udq_v8si_mask (__B,
+                                          (__v8si)
+                                          _mm256_setzero_si256 (),
+                                          __A);
+}
+
 /* Intrinsics vcvtdq2ph. */
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
@@ -1257,6 +1383,116 @@ _mm256_maskz_cvtph_epu64 (__mmask8 __A, __m128h __B)
                                     __A);
 }
+/* Intrinsics vcvttph2qq.
  */
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvttph_epi64 (__m128h __A)
+{
+  return __builtin_ia32_vcvttph2qq_v2di_mask (__A,
+                                              _mm_setzero_si128 (),
+                                              (__mmask8) -1);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_cvttph_epi64 (__m128i __A, __mmask8 __B, __m128h __C)
+{
+  return __builtin_ia32_vcvttph2qq_v2di_mask (__C,
+                                              __A,
+                                              __B);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_cvttph_epi64 (__mmask8 __A, __m128h __B)
+{
+  return __builtin_ia32_vcvttph2qq_v2di_mask (__B,
+                                              _mm_setzero_si128 (),
+                                              __A);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_cvttph_epi64 (__m128h __A)
+{
+  return __builtin_ia32_vcvttph2qq_v4di_mask (__A,
+                                              _mm256_setzero_si256 (),
+                                              (__mmask8) -1);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_cvttph_epi64 (__m256i __A, __mmask8 __B, __m128h __C)
+{
+  return __builtin_ia32_vcvttph2qq_v4di_mask (__C,
+                                              __A,
+                                              __B);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_cvttph_epi64 (__mmask8 __A, __m128h __B)
+{
+  return __builtin_ia32_vcvttph2qq_v4di_mask (__B,
+                                              _mm256_setzero_si256 (),
+                                              __A);
+}
+
+/* Intrinsics vcvttph2uqq.
  */
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvttph_epu64 (__m128h __A)
+{
+  return __builtin_ia32_vcvttph2uqq_v2di_mask (__A,
+                                               _mm_setzero_si128 (),
+                                               (__mmask8) -1);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_cvttph_epu64 (__m128i __A, __mmask8 __B, __m128h __C)
+{
+  return __builtin_ia32_vcvttph2uqq_v2di_mask (__C,
+                                               __A,
+                                               __B);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_cvttph_epu64 (__mmask8 __A, __m128h __B)
+{
+  return __builtin_ia32_vcvttph2uqq_v2di_mask (__B,
+                                               _mm_setzero_si128 (),
+                                               __A);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_cvttph_epu64 (__m128h __A)
+{
+  return __builtin_ia32_vcvttph2uqq_v4di_mask (__A,
+                                               _mm256_setzero_si256 (),
+                                               (__mmask8) -1);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_cvttph_epu64 (__m256i __A, __mmask8 __B, __m128h __C)
+{
+  return __builtin_ia32_vcvttph2uqq_v4di_mask (__C,
+                                               __A,
+                                               __B);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_cvttph_epu64 (__mmask8 __A, __m128h __B)
+{
+  return __builtin_ia32_vcvttph2uqq_v4di_mask (__B,
+                                               _mm256_setzero_si256 (),
+                                               __A);
+}
+
 /* Intrinsics vcvtqq2ph. */
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
@@ -1481,6 +1717,135 @@ _mm256_maskz_cvtph_epu16 (__mmask16 __A, __m256h __B)
                                    __A);
 }
+/* Intrinsics vcvttph2w.
  */
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvttph_epi16 (__m128h __A)
+{
+  return (__m128i)
+    __builtin_ia32_vcvttph2w_v8hi_mask (__A,
+                                        (__v8hi)
+                                        _mm_setzero_si128 (),
+                                        (__mmask8) -1);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_cvttph_epi16 (__m128i __A, __mmask8 __B, __m128h __C)
+{
+  return (__m128i)
+    __builtin_ia32_vcvttph2w_v8hi_mask (__C,
+                                        (__v8hi) __A,
+                                        __B);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_cvttph_epi16 (__mmask8 __A, __m128h __B)
+{
+  return (__m128i)
+    __builtin_ia32_vcvttph2w_v8hi_mask (__B,
+                                        (__v8hi)
+                                        _mm_setzero_si128 (),
+                                        __A);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_cvttph_epi16 (__m256h __A)
+{
+  return (__m256i)
+    __builtin_ia32_vcvttph2w_v16hi_mask (__A,
+                                         (__v16hi)
+                                         _mm256_setzero_si256 (),
+                                         (__mmask16) -1);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_cvttph_epi16 (__m256i __A, __mmask16 __B, __m256h __C)
+{
+  return (__m256i)
+    __builtin_ia32_vcvttph2w_v16hi_mask (__C,
+                                         (__v16hi) __A,
+                                         __B);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_cvttph_epi16 (__mmask16 __A, __m256h __B)
+{
+  return (__m256i)
+    __builtin_ia32_vcvttph2w_v16hi_mask (__B,
+                                         (__v16hi)
+                                         _mm256_setzero_si256 (),
+                                         __A);
+}
+
+/* Intrinsics vcvttph2uw.
  */
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvttph_epu16 (__m128h __A)
+{
+  return (__m128i)
+    __builtin_ia32_vcvttph2uw_v8hi_mask (__A,
+                                         (__v8hi)
+                                         _mm_setzero_si128 (),
+                                         (__mmask8) -1);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_cvttph_epu16 (__m128i __A, __mmask8 __B, __m128h __C)
+{
+  return (__m128i)
+    __builtin_ia32_vcvttph2uw_v8hi_mask (__C,
+                                         (__v8hi) __A,
+                                         __B);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_cvttph_epu16 (__mmask8 __A, __m128h __B)
+{
+  return (__m128i)
+    __builtin_ia32_vcvttph2uw_v8hi_mask (__B,
+                                         (__v8hi)
+                                         _mm_setzero_si128 (),
+                                         __A);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_cvttph_epu16 (__m256h __A)
+{
+  return (__m256i)
+    __builtin_ia32_vcvttph2uw_v16hi_mask (__A,
+                                          (__v16hi)
+                                          _mm256_setzero_si256 (),
+                                          (__mmask16) -1);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_cvttph_epu16 (__m256i __A, __mmask16 __B, __m256h __C)
+{
+  return (__m256i)
+    __builtin_ia32_vcvttph2uw_v16hi_mask (__C,
+                                          (__v16hi) __A,
+                                          __B);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_cvttph_epu16 (__mmask16 __A, __m256h __B)
+{
+  return (__m256i)
+    __builtin_ia32_vcvttph2uw_v16hi_mask (__B,
+                                          (__v16hi) _mm256_setzero_si256 (),
+                                          __A);
+}
+
 /* Intrinsics vcvtw2ph.
  */
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 3602b40d6d5..17571e3b4c3 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -2835,14 +2835,26 @@ BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp1
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2dq_v8si_mask, "__builtin_ia32_vcvtph2dq_v8si_mask", IX86_BUILTIN_VCVTPH2DQ_V8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8HF_V8SI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2udq_v4si_mask, "__builtin_ia32_vcvtph2udq_v4si_mask", IX86_BUILTIN_VCVTPH2UDQ_V4SI_MASK, UNKNOWN, (int) V4SI_FTYPE_V8HF_V4SI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2udq_v8si_mask, "__builtin_ia32_vcvtph2udq_v8si_mask", IX86_BUILTIN_VCVTPH2UDQ_V8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8HF_V8SI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fix_truncv4si2_mask, "__builtin_ia32_vcvttph2dq_v4si_mask", IX86_BUILTIN_VCVTTPH2DQ_V4SI_MASK, UNKNOWN, (int) V4SI_FTYPE_V8HF_V4SI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fix_truncv8si2_mask, "__builtin_ia32_vcvttph2dq_v8si_mask", IX86_BUILTIN_VCVTTPH2DQ_V8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8HF_V8SI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fixuns_truncv4si2_mask, "__builtin_ia32_vcvttph2udq_v4si_mask", IX86_BUILTIN_VCVTTPH2UDQ_V4SI_MASK, UNKNOWN, (int) V4SI_FTYPE_V8HF_V4SI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fixuns_truncv8si2_mask, "__builtin_ia32_vcvttph2udq_v8si_mask", IX86_BUILTIN_VCVTTPH2UDQ_V8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8HF_V8SI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16,
CODE_FOR_avx512fp16_vcvtph2qq_v2di_mask, "__builtin_ia32_vcvtph2qq_v2di_mask", IX86_BUILTIN_VCVTPH2QQ_V2DI_MASK, UNKNOWN, (int) V2DI_FTYPE_V8HF_V2DI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2qq_v4di_mask, "__builtin_ia32_vcvtph2qq_v4di_mask", IX86_BUILTIN_VCVTPH2QQ_V4DI_MASK, UNKNOWN, (int) V4DI_FTYPE_V8HF_V4DI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2uqq_v2di_mask, "__builtin_ia32_vcvtph2uqq_v2di_mask", IX86_BUILTIN_VCVTPH2UQQ_V2DI_MASK, UNKNOWN, (int) V2DI_FTYPE_V8HF_V2DI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2uqq_v4di_mask, "__builtin_ia32_vcvtph2uqq_v4di_mask", IX86_BUILTIN_VCVTPH2UQQ_V4DI_MASK, UNKNOWN, (int) V4DI_FTYPE_V8HF_V4DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fix_truncv2di2_mask, "__builtin_ia32_vcvttph2qq_v2di_mask", IX86_BUILTIN_VCVTTPH2QQ_V2DI_MASK, UNKNOWN, (int) V2DI_FTYPE_V8HF_V2DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fix_truncv4di2_mask, "__builtin_ia32_vcvttph2qq_v4di_mask", IX86_BUILTIN_VCVTTPH2QQ_V4DI_MASK, UNKNOWN, (int) V4DI_FTYPE_V8HF_V4DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fixuns_truncv2di2_mask, "__builtin_ia32_vcvttph2uqq_v2di_mask", IX86_BUILTIN_VCVTTPH2UQQ_V2DI_MASK, UNKNOWN, (int) V2DI_FTYPE_V8HF_V2DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fixuns_truncv4di2_mask, "__builtin_ia32_vcvttph2uqq_v4di_mask", IX86_BUILTIN_VCVTTPH2UQQ_V4DI_MASK, UNKNOWN, (int) V4DI_FTYPE_V8HF_V4DI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2w_v8hi_mask, "__builtin_ia32_vcvtph2w_v8hi_mask", IX86_BUILTIN_VCVTPH2W_V8HI_MASK, UNKNOWN, (int) V8HI_FTYPE_V8HF_V8HI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16,
CODE_FOR_avx512fp16_vcvtph2w_v16hi_mask, "__builtin_ia32_vcvtph2w_v16hi_mask", IX86_BUILTIN_VCVTPH2W_V16HI_MASK, UNKNOWN, (int) V16HI_FTYPE_V16HF_V16HI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2uw_v8hi_mask, "__builtin_ia32_vcvtph2uw_v8hi_mask", IX86_BUILTIN_VCVTPH2UW_V8HI_MASK, UNKNOWN, (int) V8HI_FTYPE_V8HF_V8HI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2uw_v16hi_mask, "__builtin_ia32_vcvtph2uw_v16hi_mask", IX86_BUILTIN_VCVTPH2UW_V16HI_MASK, UNKNOWN, (int) V16HI_FTYPE_V16HF_V16HI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fix_truncv8hi2_mask, "__builtin_ia32_vcvttph2w_v8hi_mask", IX86_BUILTIN_VCVTTPH2W_V8HI_MASK, UNKNOWN, (int) V8HI_FTYPE_V8HF_V8HI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fix_truncv16hi2_mask, "__builtin_ia32_vcvttph2w_v16hi_mask", IX86_BUILTIN_VCVTTPH2W_V16HI_MASK, UNKNOWN, (int) V16HI_FTYPE_V16HF_V16HI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fixuns_truncv8hi2_mask, "__builtin_ia32_vcvttph2uw_v8hi_mask", IX86_BUILTIN_VCVTTPH2UW_V8HI_MASK, UNKNOWN, (int) V8HI_FTYPE_V8HF_V8HI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fixuns_truncv16hi2_mask, "__builtin_ia32_vcvttph2uw_v16hi_mask", IX86_BUILTIN_VCVTTPH2UW_V16HI_MASK, UNKNOWN, (int) V16HI_FTYPE_V16HF_V16HI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtw2ph_v8hi_mask, "__builtin_ia32_vcvtw2ph_v8hi_mask", IX86_BUILTIN_VCVTW2PH_V8HI_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HI_V8HF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtw2ph_v16hi_mask, "__builtin_ia32_vcvtw2ph_v16hi_mask", IX86_BUILTIN_VCVTW2PH_V16HI_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HI_V16HF_UHI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16,
CODE_FOR_avx512fp16_vcvtuw2ph_v8hi_mask, "__builtin_ia32_vcvtuw2ph_v8hi_mask", IX86_BUILTIN_VCVTUW2PH_V8HI_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HI_V8HF_UQI)
@@ -3084,10 +3096,16 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_getmantv32hf_mask_round
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vgetmantv8hf_mask_round, "__builtin_ia32_getmantsh_mask_round", IX86_BUILTIN_GETMANTSH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_INT_V8HF_UQI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2dq_v16si_mask_round, "__builtin_ia32_vcvtph2dq_v16si_mask_round", IX86_BUILTIN_VCVTPH2DQ_V16SI_MASK_ROUND, UNKNOWN, (int) V16SI_FTYPE_V16HF_V16SI_UHI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2udq_v16si_mask_round, "__builtin_ia32_vcvtph2udq_v16si_mask_round", IX86_BUILTIN_VCVTPH2UDQ_V16SI_MASK_ROUND, UNKNOWN, (int) V16SI_FTYPE_V16HF_V16SI_UHI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fix_truncv16si2_mask_round, "__builtin_ia32_vcvttph2dq_v16si_mask_round", IX86_BUILTIN_VCVTTPH2DQ_V16SI_MASK_ROUND, UNKNOWN, (int) V16SI_FTYPE_V16HF_V16SI_UHI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fixuns_truncv16si2_mask_round, "__builtin_ia32_vcvttph2udq_v16si_mask_round", IX86_BUILTIN_VCVTTPH2UDQ_V16SI_MASK_ROUND, UNKNOWN, (int) V16SI_FTYPE_V16HF_V16SI_UHI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2qq_v8di_mask_round, "__builtin_ia32_vcvtph2qq_v8di_mask_round", IX86_BUILTIN_VCVTPH2QQ_V8DI_MASK_ROUND, UNKNOWN, (int) V8DI_FTYPE_V8HF_V8DI_UQI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2uqq_v8di_mask_round, "__builtin_ia32_vcvtph2uqq_v8di_mask_round", IX86_BUILTIN_VCVTPH2UQQ_V8DI_MASK_ROUND, UNKNOWN, (int) V8DI_FTYPE_V8HF_V8DI_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fix_truncv8di2_mask_round, "__builtin_ia32_vcvttph2qq_v8di_mask_round", IX86_BUILTIN_VCVTTPH2QQ_V8DI_MASK_ROUND, UNKNOWN, (int)
V8DI_FTYPE_V8HF_V8DI_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fixuns_truncv8di2_mask_round, "__builtin_ia32_vcvttph2uqq_v8di_mask_round", IX86_BUILTIN_VCVTTPH2UQQ_V8DI_MASK_ROUND, UNKNOWN, (int) V8DI_FTYPE_V8HF_V8DI_UQI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2w_v32hi_mask_round, "__builtin_ia32_vcvtph2w_v32hi_mask_round", IX86_BUILTIN_VCVTPH2W_V32HI_MASK_ROUND, UNKNOWN, (int) V32HI_FTYPE_V32HF_V32HI_USI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2uw_v32hi_mask_round, "__builtin_ia32_vcvtph2uw_v32hi_mask_round", IX86_BUILTIN_VCVTPH2UW_V32HI_MASK_ROUND, UNKNOWN, (int) V32HI_FTYPE_V32HF_V32HI_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fix_truncv32hi2_mask_round, "__builtin_ia32_vcvttph2w_v32hi_mask_round", IX86_BUILTIN_VCVTTPH2W_V32HI_MASK_ROUND, UNKNOWN, (int) V32HI_FTYPE_V32HF_V32HI_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fixuns_truncv32hi2_mask_round, "__builtin_ia32_vcvttph2uw_v32hi_mask_round", IX86_BUILTIN_VCVTTPH2UW_V32HI_MASK_ROUND, UNKNOWN, (int) V32HI_FTYPE_V32HF_V32HI_USI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtw2ph_v32hi_mask_round, "__builtin_ia32_vcvtw2ph_v32hi_mask_round", IX86_BUILTIN_VCVTW2PH_V32HI_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HI_V32HF_USI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtuw2ph_v32hi_mask_round, "__builtin_ia32_vcvtuw2ph_v32hi_mask_round", IX86_BUILTIN_VCVTUW2PH_V32HI_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HI_V32HF_USI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtdq2ph_v16si_mask_round, "__builtin_ia32_vcvtdq2ph_v16si_mask_round", IX86_BUILTIN_VCVTDQ2PH_V16SI_MASK_ROUND, UNKNOWN, (int) V16HF_FTYPE_V16SI_V16HF_UHI_INT)
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index b312d26b806..66b4fa61eb5 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -5636,6 +5636,40 @@ (define_insn
"avx512fp16_vcvtsi2sh"
   (set_attr "prefix" "evex")
   (set_attr "mode" "HF")])
+(define_insn "avx512fp16_fix_trunc2"
+  [(set (match_operand:VI2H_AVX512VL 0 "register_operand" "=v")
+	(any_fix:VI2H_AVX512VL
+	  (match_operand: 1 "" "")))]
+  "TARGET_AVX512FP16"
+  "vcvttph2\t{%1, %0|%0, %1}"
+  [(set_attr "type" "ssecvt")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "")])
+
+(define_insn "avx512fp16_fix_trunc2"
+  [(set (match_operand:VI4_128_8_256 0 "register_operand" "=v")
+	(any_fix:VI4_128_8_256
+	  (vec_select:V4HF
+	    (match_operand:V8HF 1 "nonimmediate_operand" "vm")
+	    (parallel [(const_int 0) (const_int 1) (const_int 2) (const_int 3)]))))]
+  "TARGET_AVX512FP16 && TARGET_AVX512VL"
+  "vcvttph2\t{%1, %0|%0, %q1}"
+  [(set_attr "type" "ssecvt")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "")])
+
+(define_insn "avx512fp16_fix_truncv2di2"
+  [(set (match_operand:V2DI 0 "register_operand" "=v")
+	(any_fix:V2DI
+	  (vec_select:V2HF
+	    (match_operand:V8HF 1 "nonimmediate_operand" "vm")
+	    (parallel [(const_int 0) (const_int 1)]))))]
+  "TARGET_AVX512FP16 && TARGET_AVX512VL"
+  "vcvttph2qq\t{%1, %0|%0, %k1}"
+  [(set_attr "type" "ssecvt")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "TI")])
+
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;
 ;; Parallel single-precision floating point conversion operations
diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c
index 0aae949097a..4b6cf7e1ed6 100644
--- a/gcc/testsuite/gcc.target/i386/avx-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx-1.c
@@ -723,8 +723,14 @@
 #define __builtin_ia32_vcvtph2udq_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvtph2udq_v16si_mask_round(A, B, C, 8)
 #define __builtin_ia32_vcvtph2qq_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtph2qq_v8di_mask_round(A, B, C, 8)
 #define __builtin_ia32_vcvtph2uqq_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtph2uqq_v8di_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvttph2dq_v16si_mask_round(A, B, C, D)
__builtin_ia32_vcvttph2dq_v16si_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvttph2udq_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvttph2udq_v16si_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvttph2qq_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvttph2qq_v8di_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvttph2uqq_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvttph2uqq_v8di_mask_round(A, B, C, 8)
 #define __builtin_ia32_vcvtph2w_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtph2w_v32hi_mask_round(A, B, C, 8)
 #define __builtin_ia32_vcvtph2uw_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtph2uw_v32hi_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvttph2w_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvttph2w_v32hi_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvttph2uw_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvttph2uw_v32hi_mask_round(A, B, C, 8)
 #define __builtin_ia32_vcvtw2ph_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtw2ph_v32hi_mask_round(A, B, C, 8)
 #define __builtin_ia32_vcvtuw2ph_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtuw2ph_v32hi_mask_round(A, B, C, 8)
 #define __builtin_ia32_vcvtdq2ph_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvtdq2ph_v16si_mask_round(A, B, C, 8)
diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
index 997fb733132..2e730d554dd 100644
--- a/gcc/testsuite/gcc.target/i386/sse-13.c
+++ b/gcc/testsuite/gcc.target/i386/sse-13.c
@@ -740,8 +740,14 @@
 #define __builtin_ia32_vcvtph2udq_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvtph2udq_v16si_mask_round(A, B, C, 8)
 #define __builtin_ia32_vcvtph2qq_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtph2qq_v8di_mask_round(A, B, C, 8)
 #define __builtin_ia32_vcvtph2uqq_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtph2uqq_v8di_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvttph2dq_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvttph2dq_v16si_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvttph2udq_v16si_mask_round(A, B, C, D)
__builtin_ia32_vcvttph2udq_v16si_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvttph2qq_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvttph2qq_v8di_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvttph2uqq_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvttph2uqq_v8di_mask_round(A, B, C, 8) #define __builtin_ia32_vcvtph2w_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtph2w_v32hi_mask_round(A, B, C, 8) #define __builtin_ia32_vcvtph2uw_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtph2uw_v32hi_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvttph2w_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvttph2w_v32hi_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvttph2uw_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvttph2uw_v32hi_mask_round(A, B, C, 8) #define __builtin_ia32_vcvtw2ph_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtw2ph_v32hi_mask_round(A, B, C, 8) #define __builtin_ia32_vcvtuw2ph_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtuw2ph_v32hi_mask_round(A, B, C, 8) #define __builtin_ia32_vcvtdq2ph_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvtdq2ph_v16si_mask_round(A, B, C, 8) diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c index 89a589e0d80..98e38fb025a 100644 --- a/gcc/testsuite/gcc.target/i386/sse-14.c +++ b/gcc/testsuite/gcc.target/i386/sse-14.c @@ -680,8 +680,14 @@ test_1 (_mm512_roundscale_ph, __m512h, __m512h, 123) test_1 (_mm512_getexp_round_ph, __m512h, __m512h, 8) test_1 (_mm512_cvt_roundph_epi16, __m512i, __m512h, 8) test_1 (_mm512_cvt_roundph_epu16, __m512i, __m512h, 8) +test_1 (_mm512_cvtt_roundph_epi16, __m512i, __m512h, 8) +test_1 (_mm512_cvtt_roundph_epu16, __m512i, __m512h, 8) test_1 (_mm512_cvt_roundph_epi32, __m512i, __m256h, 8) test_1 (_mm512_cvt_roundph_epu32, __m512i, __m256h, 8) +test_1 (_mm512_cvtt_roundph_epi32, __m512i, __m256h, 8) +test_1 (_mm512_cvtt_roundph_epu32, __m512i, __m256h, 8) +test_1 (_mm512_cvtt_roundph_epi64, __m512i, __m128h, 8) +test_1 (_mm512_cvtt_roundph_epu64, 
__m512i, __m128h, 8) test_1 (_mm512_cvt_roundph_epi64, __m512i, __m128h, 8) test_1 (_mm512_cvt_roundph_epu64, __m512i, __m128h, 8) test_1 (_mm512_cvt_roundepi16_ph, __m512h, __m512i, 8) @@ -732,10 +738,16 @@ test_2 (_mm512_maskz_getexp_round_ph, __m512h, __mmask32, __m512h, 8) test_2 (_mm_getexp_round_sh, __m128h, __m128h, __m128h, 8) test_2 (_mm512_maskz_cvt_roundph_epi16, __m512i, __mmask32, __m512h, 8) test_2 (_mm512_maskz_cvt_roundph_epu16, __m512i, __mmask32, __m512h, 8) +test_2 (_mm512_maskz_cvtt_roundph_epi16, __m512i, __mmask32, __m512h, 8) +test_2 (_mm512_maskz_cvtt_roundph_epu16, __m512i, __mmask32, __m512h, 8) test_2 (_mm512_maskz_cvt_roundph_epi32, __m512i, __mmask16, __m256h, 8) test_2 (_mm512_maskz_cvt_roundph_epu32, __m512i, __mmask16, __m256h, 8) test_2 (_mm512_maskz_cvt_roundph_epi64, __m512i, __mmask8, __m128h, 8) test_2 (_mm512_maskz_cvt_roundph_epu64, __m512i, __mmask8, __m128h, 8) +test_2 (_mm512_maskz_cvtt_roundph_epi32, __m512i, __mmask16, __m256h, 8) +test_2 (_mm512_maskz_cvtt_roundph_epu32, __m512i, __mmask16, __m256h, 8) +test_2 (_mm512_maskz_cvtt_roundph_epi64, __m512i, __mmask8, __m128h, 8) +test_2 (_mm512_maskz_cvtt_roundph_epu64, __m512i, __mmask8, __m128h, 8) test_2 (_mm512_maskz_cvt_roundepi16_ph, __m512h, __mmask32, __m512i, 8) test_2 (_mm512_maskz_cvt_roundepu16_ph, __m512h, __mmask32, __m512i, 8) test_2 (_mm512_maskz_cvt_roundepi32_ph, __m256h, __mmask16, __m512i, 8) @@ -784,10 +796,16 @@ test_3 (_mm_maskz_getexp_round_sh, __m128h, __mmask8, __m128h, __m128h, 8) test_3 (_mm512_mask_getexp_round_ph, __m512h, __m512h, __mmask32, __m512h, 8) test_3 (_mm512_mask_cvt_roundph_epi16, __m512i, __m512i, __mmask32, __m512h, 8) test_3 (_mm512_mask_cvt_roundph_epu16, __m512i, __m512i, __mmask32, __m512h, 8) +test_3 (_mm512_mask_cvtt_roundph_epi16, __m512i, __m512i, __mmask32, __m512h, 8) +test_3 (_mm512_mask_cvtt_roundph_epu16, __m512i, __m512i, __mmask32, __m512h, 8) test_3 (_mm512_mask_cvt_roundph_epi32, __m512i, __m512i, __mmask16, 
__m256h, 8) test_3 (_mm512_mask_cvt_roundph_epu32, __m512i, __m512i, __mmask16, __m256h, 8) test_3 (_mm512_mask_cvt_roundph_epi64, __m512i, __m512i, __mmask8, __m128h, 8) test_3 (_mm512_mask_cvt_roundph_epu64, __m512i, __m512i, __mmask8, __m128h, 8) +test_3 (_mm512_mask_cvtt_roundph_epi32, __m512i, __m512i, __mmask16, __m256h, 8) +test_3 (_mm512_mask_cvtt_roundph_epu32, __m512i, __m512i, __mmask16, __m256h, 8) +test_3 (_mm512_mask_cvtt_roundph_epi64, __m512i, __m512i, __mmask8, __m128h, 8) +test_3 (_mm512_mask_cvtt_roundph_epu64, __m512i, __m512i, __mmask8, __m128h, 8) test_3 (_mm512_mask_cvt_roundepi16_ph, __m512h, __m512h, __mmask32, __m512i, 8) test_3 (_mm512_mask_cvt_roundepu16_ph, __m512h, __m512h, __mmask32, __m512i, 8) test_3 (_mm512_mask_cvt_roundepi32_ph, __m256h, __m256h, __mmask16, __m512i, 8) diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c index fed12744c6c..3ad10908d49 100644 --- a/gcc/testsuite/gcc.target/i386/sse-22.c +++ b/gcc/testsuite/gcc.target/i386/sse-22.c @@ -785,10 +785,16 @@ test_1 (_mm512_roundscale_ph, __m512h, __m512h, 123) test_1 (_mm512_getexp_round_ph, __m512h, __m512h, 8) test_1 (_mm512_cvt_roundph_epi16, __m512i, __m512h, 8) test_1 (_mm512_cvt_roundph_epu16, __m512i, __m512h, 8) +test_1 (_mm512_cvtt_roundph_epi16, __m512i, __m512h, 8) +test_1 (_mm512_cvtt_roundph_epu16, __m512i, __m512h, 8) test_1 (_mm512_cvt_roundph_epi32, __m512i, __m256h, 8) test_1 (_mm512_cvt_roundph_epu32, __m512i, __m256h, 8) test_1 (_mm512_cvt_roundph_epi64, __m512i, __m128h, 8) test_1 (_mm512_cvt_roundph_epu64, __m512i, __m128h, 8) +test_1 (_mm512_cvtt_roundph_epi32, __m512i, __m256h, 8) +test_1 (_mm512_cvtt_roundph_epu32, __m512i, __m256h, 8) +test_1 (_mm512_cvtt_roundph_epi64, __m512i, __m128h, 8) +test_1 (_mm512_cvtt_roundph_epu64, __m512i, __m128h, 8) test_1 (_mm512_cvt_roundepi16_ph, __m512h, __m512i, 8) test_1 (_mm512_cvt_roundepu16_ph, __m512h, __m512i, 8) test_1 (_mm512_cvt_roundepi32_ph, __m256h, __m512i, 
8) @@ -836,10 +842,16 @@ test_2 (_mm512_maskz_getexp_round_ph, __m512h, __mmask32, __m512h, 8) test_2 (_mm_getexp_round_sh, __m128h, __m128h, __m128h, 8) test_2 (_mm512_maskz_cvt_roundph_epi16, __m512i, __mmask32, __m512h, 8) test_2 (_mm512_maskz_cvt_roundph_epu16, __m512i, __mmask32, __m512h, 8) +test_2 (_mm512_maskz_cvtt_roundph_epi16, __m512i, __mmask32, __m512h, 8) +test_2 (_mm512_maskz_cvtt_roundph_epu16, __m512i, __mmask32, __m512h, 8) test_2 (_mm512_maskz_cvt_roundph_epi32, __m512i, __mmask16, __m256h, 8) test_2 (_mm512_maskz_cvt_roundph_epu32, __m512i, __mmask16, __m256h, 8) test_2 (_mm512_maskz_cvt_roundph_epi64, __m512i, __mmask8, __m128h, 8) test_2 (_mm512_maskz_cvt_roundph_epu64, __m512i, __mmask8, __m128h, 8) +test_2 (_mm512_maskz_cvtt_roundph_epi32, __m512i, __mmask16, __m256h, 8) +test_2 (_mm512_maskz_cvtt_roundph_epu32, __m512i, __mmask16, __m256h, 8) +test_2 (_mm512_maskz_cvtt_roundph_epi64, __m512i, __mmask8, __m128h, 8) +test_2 (_mm512_maskz_cvtt_roundph_epu64, __m512i, __mmask8, __m128h, 8) test_2 (_mm512_maskz_cvt_roundepi16_ph, __m512h, __mmask32, __m512i, 8) test_2 (_mm512_maskz_cvt_roundepu16_ph, __m512h, __mmask32, __m512i, 8) test_2 (_mm512_maskz_cvt_roundepi32_ph, __m256h, __mmask16, __m512i, 8) @@ -887,10 +899,16 @@ test_3 (_mm_maskz_getexp_round_sh, __m128h, __mmask8, __m128h, __m128h, 8) test_3 (_mm512_mask_getexp_round_ph, __m512h, __m512h, __mmask32, __m512h, 8) test_3 (_mm512_mask_cvt_roundph_epi16, __m512i, __m512i, __mmask32, __m512h, 8) test_3 (_mm512_mask_cvt_roundph_epu16, __m512i, __m512i, __mmask32, __m512h, 8) +test_3 (_mm512_mask_cvtt_roundph_epi16, __m512i, __m512i, __mmask32, __m512h, 8) +test_3 (_mm512_mask_cvtt_roundph_epu16, __m512i, __m512i, __mmask32, __m512h, 8) test_3 (_mm512_mask_cvt_roundph_epi32, __m512i, __m512i, __mmask16, __m256h, 8) test_3 (_mm512_mask_cvt_roundph_epu32, __m512i, __m512i, __mmask16, __m256h, 8) test_3 (_mm512_mask_cvt_roundph_epi64, __m512i, __m512i, __mmask8, __m128h, 8) test_3 
(_mm512_mask_cvt_roundph_epu64, __m512i, __m512i, __mmask8, __m128h, 8) +test_3 (_mm512_mask_cvtt_roundph_epi32, __m512i, __m512i, __mmask16, __m256h, 8) +test_3 (_mm512_mask_cvtt_roundph_epu32, __m512i, __m512i, __mmask16, __m256h, 8) +test_3 (_mm512_mask_cvtt_roundph_epi64, __m512i, __m512i, __mmask8, __m128h, 8) +test_3 (_mm512_mask_cvtt_roundph_epu64, __m512i, __m512i, __mmask8, __m128h, 8) test_3 (_mm512_mask_cvt_roundepi16_ph, __m512h, __m512h, __mmask32, __m512i, 8) test_3 (_mm512_mask_cvt_roundepu16_ph, __m512h, __m512h, __mmask32, __m512i, 8) test_3 (_mm512_mask_cvt_roundepi32_ph, __m256h, __m256h, __mmask16, __m512i, 8) diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c index 6e8d8a1833c..6990f93bfce 100644 --- a/gcc/testsuite/gcc.target/i386/sse-23.c +++ b/gcc/testsuite/gcc.target/i386/sse-23.c @@ -741,8 +741,14 @@ #define __builtin_ia32_vcvtph2udq_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvtph2udq_v16si_mask_round(A, B, C, 8) #define __builtin_ia32_vcvtph2qq_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtph2qq_v8di_mask_round(A, B, C, 8) #define __builtin_ia32_vcvtph2uqq_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtph2uqq_v8di_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvttph2dq_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvttph2dq_v16si_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvttph2udq_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvttph2udq_v16si_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvttph2qq_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvttph2qq_v8di_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvttph2uqq_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvttph2uqq_v8di_mask_round(A, B, C, 8) #define __builtin_ia32_vcvtph2w_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtph2w_v32hi_mask_round(A, B, C, 8) #define __builtin_ia32_vcvtph2uw_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtph2uw_v32hi_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvttph2w_v32hi_mask_round(A, B, C, D) 
__builtin_ia32_vcvttph2w_v32hi_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvttph2uw_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvttph2uw_v32hi_mask_round(A, B, C, 8)
 #define __builtin_ia32_vcvtw2ph_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtw2ph_v32hi_mask_round(A, B, C, 8)
 #define __builtin_ia32_vcvtuw2ph_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtuw2ph_v32hi_mask_round(A, B, C, 8)
 #define __builtin_ia32_vcvtdq2ph_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvtdq2ph_v16si_mask_round(A, B, C, 8)

From patchwork Thu Jul 1 06:16:19 2021
From: liuhongt
To: gcc-patches@gcc.gnu.org
Cc: jakub@redhat.com
Subject: [PATCH 33/62] AVX512FP16: Add testcase for vcvttph2w/vcvttph2uw/vcvttph2dq/vcvttph2udq/vcvttph2qq/vcvttph2uqq.
Date: Thu, 1 Jul 2021 14:16:19 +0800
Message-Id: <20210701061648.9447-34-hongtao.liu@intel.com>
In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com>

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-vcvttph2dq-1a.c: New test.
	* gcc.target/i386/avx512fp16-vcvttph2dq-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvttph2qq-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvttph2qq-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvttph2udq-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvttph2udq-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvttph2uqq-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvttph2uqq-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvttph2uw-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvttph2uw-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvttph2w-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvttph2w-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvttph2dq-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvttph2dq-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvttph2qq-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvttph2qq-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvttph2udq-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvttph2udq-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvttph2uqq-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvttph2uqq-1b.c: Ditto. * gcc.target/i386/avx512fp16vl-vcvttph2uw-1a.c: Ditto. * gcc.target/i386/avx512fp16vl-vcvttph2uw-1b.c: Ditto. * gcc.target/i386/avx512fp16vl-vcvttph2w-1a.c: Ditto. * gcc.target/i386/avx512fp16vl-vcvttph2w-1b.c: Ditto. --- .../i386/avx512fp16-vcvttph2dq-1a.c | 24 ++++++ .../i386/avx512fp16-vcvttph2dq-1b.c | 79 +++++++++++++++++ .../i386/avx512fp16-vcvttph2qq-1a.c | 24 ++++++ .../i386/avx512fp16-vcvttph2qq-1b.c | 78 +++++++++++++++++ .../i386/avx512fp16-vcvttph2udq-1a.c | 24 ++++++ .../i386/avx512fp16-vcvttph2udq-1b.c | 79 +++++++++++++++++ .../i386/avx512fp16-vcvttph2uqq-1a.c | 24 ++++++ .../i386/avx512fp16-vcvttph2uqq-1b.c | 78 +++++++++++++++++ .../i386/avx512fp16-vcvttph2uw-1a.c | 24 ++++++ .../i386/avx512fp16-vcvttph2uw-1b.c | 84 +++++++++++++++++++ .../gcc.target/i386/avx512fp16-vcvttph2w-1a.c | 24 ++++++ .../gcc.target/i386/avx512fp16-vcvttph2w-1b.c | 83 ++++++++++++++++++ .../i386/avx512fp16vl-vcvttph2dq-1a.c | 27 ++++++ .../i386/avx512fp16vl-vcvttph2dq-1b.c | 15 ++++ .../i386/avx512fp16vl-vcvttph2qq-1a.c | 27 ++++++ .../i386/avx512fp16vl-vcvttph2qq-1b.c | 15 ++++ .../i386/avx512fp16vl-vcvttph2udq-1a.c | 27 ++++++ .../i386/avx512fp16vl-vcvttph2udq-1b.c | 15 ++++ .../i386/avx512fp16vl-vcvttph2uqq-1a.c | 27 ++++++ .../i386/avx512fp16vl-vcvttph2uqq-1b.c | 15 ++++ .../i386/avx512fp16vl-vcvttph2uw-1a.c | 29 +++++++ .../i386/avx512fp16vl-vcvttph2uw-1b.c | 15 ++++ .../i386/avx512fp16vl-vcvttph2w-1a.c | 29 +++++++ .../i386/avx512fp16vl-vcvttph2w-1b.c | 15 ++++ 24 files changed, 881 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2dq-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2dq-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2qq-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2qq-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2udq-1a.c create mode 100644 
gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2udq-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2uqq-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2uqq-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2uw-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2uw-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2w-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2w-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2dq-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2dq-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2qq-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2qq-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2udq-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2udq-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2uqq-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2uqq-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2uw-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2uw-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2w-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2w-1b.c diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2dq-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2dq-1a.c new file mode 100644 index 00000000000..0e44aaf1bb5 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2dq-1a.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vcvttph2dq\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vcvttph2dq\[ 
\\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2dq\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2dq\[ \\t\]+\{sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2dq\[ \\t\]+\{sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m512i res, res1, res2; +volatile __m256h x1, x2, x3; +volatile __mmask16 m16; + +void extern +avx512f_test (void) +{ + res = _mm512_cvttph_epi32 (x1); + res1 = _mm512_mask_cvttph_epi32 (res, m16, x2); + res2 = _mm512_maskz_cvttph_epi32 (m16, x3); + res = _mm512_cvtt_roundph_epi32 (x1, 4); + res1 = _mm512_mask_cvtt_roundph_epi32 (res, m16, x2, 8); + res2 = _mm512_maskz_cvtt_roundph_epi32 (m16, x3, 8); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2dq-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2dq-1b.c new file mode 100644 index 00000000000..c18fefbf206 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2dq-1b.c @@ -0,0 +1,79 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS (AVX512F_LEN / 16) + +void NOINLINE +EMULATE(cvtph2_d) (V512 * dest, V512 op1, + __mmask32 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + __mmask16 m1, m2; + + m1 = k & 0xffff; + + unpack_ph_2twops(op1, &v1, &v2); + + for (i = 0; i < 16; i++) { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + v5.u32[i] = 0; + } + else { + v5.u32[i] = dest->u32[i]; + } + } + else { + v5.u32[i] = v1.f32[i]; + + } + } + *dest = v5; +} + +void +TEST (void) +{ + V512 res; + V512 exp; + + init_src(); + + EMULATE(cvtph2_d)(&exp, 
src1, NET_MASK, 0); + SI(res) = INTRINSIC (_cvttph_epi32) (H_HF(src1)); + CHECK_RESULT (&res, &exp, N_ELEMS, _cvttph_epi32); + + init_dest(&res, &exp); + EMULATE(cvtph2_d)(&exp, src1, HALF_MASK, 0); + SI(res) = INTRINSIC (_mask_cvttph_epi32) (SI(res), HALF_MASK, H_HF(src1)); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvttph_epi32); + + EMULATE(cvtph2_d)(&exp, src1, HALF_MASK, 1); + SI(res) = INTRINSIC (_maskz_cvttph_epi32) (HALF_MASK, H_HF(src1)); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvttph_epi32); + +#if AVX512F_LEN == 512 + EMULATE(cvtph2_d)(&exp, src1, NET_MASK, 0); + SI(res) = INTRINSIC (_cvtt_roundph_epi32) (H_HF(src1), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _cvtt_roundph_epi32); + + init_dest(&res, &exp); + EMULATE(cvtph2_d)(&exp, src1, HALF_MASK, 0); + SI(res) = INTRINSIC (_mask_cvtt_roundph_epi32) (SI(res), HALF_MASK, H_HF(src1), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvtt_roundph_epi32); + + EMULATE(cvtph2_d)(&exp, src1, HALF_MASK, 1); + SI(res) = INTRINSIC (_maskz_cvtt_roundph_epi32) (HALF_MASK, H_HF(src1), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvtt_roundph_epi32); +#endif + + if (n_errs != 0) + abort (); +} + + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2qq-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2qq-1a.c new file mode 100644 index 00000000000..124169467ee --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2qq-1a.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vcvttph2qq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vcvttph2qq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2qq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { 
scan-assembler-times "vcvttph2qq\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2qq\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m512i res, res1, res2; +volatile __m128h x1, x2, x3; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res = _mm512_cvttph_epi64 (x1); + res1 = _mm512_mask_cvttph_epi64 (res, m8, x2); + res2 = _mm512_maskz_cvttph_epi64 (m8, x3); + res = _mm512_cvtt_roundph_epi64 (x1, 4); + res1 = _mm512_mask_cvtt_roundph_epi64 (res, m8, x2, 8); + res2 = _mm512_maskz_cvtt_roundph_epi64 (m8, x3, 8); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2qq-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2qq-1b.c new file mode 100644 index 00000000000..2a9a2ca26f9 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2qq-1b.c @@ -0,0 +1,78 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS (AVX512F_LEN / 16) + +void NOINLINE +EMULATE(cvtph2_q) (V512 * dest, V512 op1, + __mmask32 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + __mmask16 m1, m2; + + m1 = k & 0xffff; + + unpack_ph_2twops(op1, &v1, &v2); + + for (i = 0; i < 8; i++) { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + v5.u64[i] = 0; + } + else { + v5.u64[i] = dest->u64[i]; + } + } + else { + v5.u64[i] = v1.f32[i]; + } + } + *dest = v5; +} + +void +TEST (void) +{ + V512 res; + V512 exp; + + init_src(); + + EMULATE(cvtph2_q)(&exp, src1, NET_MASK, 0); + SI(res) = INTRINSIC (_cvttph_epi64) (src1.xmmh[0]); + CHECK_RESULT (&res, &exp, N_ELEMS, _cvttph_epi64); + + init_dest(&res, &exp); + EMULATE(cvtph2_q)(&exp, src1, 0xcc, 0); + SI(res) = INTRINSIC (_mask_cvttph_epi64) (SI(res), 0xcc, src1.xmmh[0]); + CHECK_RESULT 
(&res, &exp, N_ELEMS, _mask_cvttph_epi64); + + EMULATE(cvtph2_q)(&exp, src1, 0xfa, 1); + SI(res) = INTRINSIC (_maskz_cvttph_epi64) (0xfa, src1.xmmh[0]); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvttph_epi64); + +#if AVX512F_LEN == 512 + EMULATE(cvtph2_q)(&exp, src1, NET_MASK, 0); + SI(res) = INTRINSIC (_cvtt_roundph_epi64) (src1.xmmh[0], _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _cvtt_roundph_epi64); + + init_dest(&res, &exp); + EMULATE(cvtph2_q)(&exp, src1, 0xcc, 0); + SI(res) = INTRINSIC (_mask_cvtt_roundph_epi64) (SI(res), 0xcc, src1.xmmh[0], _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvtt_roundph_epi64); + + EMULATE(cvtph2_q)(&exp, src1, 0xfa, 1); + SI(res) = INTRINSIC (_maskz_cvtt_roundph_epi64) (0xfa, src1.xmmh[0], _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvtt_roundph_epi64); +#endif + + if (n_errs != 0) { + abort (); + } +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2udq-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2udq-1a.c new file mode 100644 index 00000000000..0fd60f56777 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2udq-1a.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vcvttph2udq\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vcvttph2udq\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2udq\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2udq\[ \\t\]+\{sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2udq\[ \\t\]+\{sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + 
+#include <immintrin.h>
+
+volatile __m512i res, res1, res2;
+volatile __m256h x1, x2, x3;
+volatile __mmask16 m16;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm512_cvttph_epu32 (x1);
+  res1 = _mm512_mask_cvttph_epu32 (res, m16, x2);
+  res2 = _mm512_maskz_cvttph_epu32 (m16, x3);
+  res = _mm512_cvtt_roundph_epu32 (x1, 4);
+  res1 = _mm512_mask_cvtt_roundph_epu32 (res, m16, x2, 8);
+  res2 = _mm512_maskz_cvtt_roundph_epu32 (m16, x3, 8);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2udq-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2udq-1b.c
new file mode 100644
index 00000000000..98bce374753
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2udq-1b.c
@@ -0,0 +1,79 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS (AVX512F_LEN / 16)
+
+void NOINLINE
+EMULATE(cvtph2_d) (V512 * dest, V512 op1,
+		   __mmask32 k, int zero_mask)
+{
+  V512 v1, v2, v3, v4, v5, v6, v7, v8;
+  int i;
+  __mmask16 m1, m2;
+
+  m1 = k & 0xffff;
+
+  unpack_ph_2twops(op1, &v1, &v2);
+
+  for (i = 0; i < 16; i++) {
+    if (((1 << i) & m1) == 0) {
+      if (zero_mask) {
+	v5.u32[i] = 0;
+      }
+      else {
+	v5.u32[i] = dest->u32[i];
+      }
+    }
+    else {
+      v5.u32[i] = v1.f32[i];
+
+    }
+  }
+  *dest = v5;
+}
+
+void
+TEST (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  EMULATE(cvtph2_d)(&exp, src1, NET_MASK, 0);
+  SI(res) = INTRINSIC (_cvttph_epu32) (H_HF(src1));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _cvttph_epu32);
+
+  init_dest(&res, &exp);
+  EMULATE(cvtph2_d)(&exp, src1, HALF_MASK, 0);
+  SI(res) = INTRINSIC (_mask_cvttph_epu32) (SI(res), HALF_MASK, H_HF(src1));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvttph_epu32);
+
+  EMULATE(cvtph2_d)(&exp, src1, HALF_MASK, 1);
+  SI(res) = INTRINSIC (_maskz_cvttph_epu32) (HALF_MASK, H_HF(src1));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvttph_epu32);
+
+#if AVX512F_LEN == 512
+  EMULATE(cvtph2_d)(&exp, src1, NET_MASK, 0);
+ SI(res) = INTRINSIC (_cvtt_roundph_epu32) (H_HF(src1), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _cvtt_roundph_epu32); + + init_dest(&res, &exp); + EMULATE(cvtph2_d)(&exp, src1, HALF_MASK, 0); + SI(res) = INTRINSIC (_mask_cvtt_roundph_epu32) (SI(res), HALF_MASK, H_HF(src1), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvtt_roundph_epu32); + + EMULATE(cvtph2_d)(&exp, src1, HALF_MASK, 1); + SI(res) = INTRINSIC (_maskz_cvtt_roundph_epu32) (HALF_MASK, H_HF(src1), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvtt_roundph_epu32); +#endif + + if (n_errs != 0) { + abort (); + } +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2uqq-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2uqq-1a.c new file mode 100644 index 00000000000..04fee2936c8 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2uqq-1a.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vcvttph2uqq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vcvttph2uqq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2uqq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2uqq\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2uqq\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m512i res, res1, res2; +volatile __m128h x1, x2, x3; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res = _mm512_cvttph_epu64 (x1); + res1 = _mm512_mask_cvttph_epu64 (res, m8, x2); + res2 = _mm512_maskz_cvttph_epu64 (m8, x3); + 
res = _mm512_cvtt_roundph_epu64 (x1, 4); + res1 = _mm512_mask_cvtt_roundph_epu64 (res, m8, x2, 8); + res2 = _mm512_maskz_cvtt_roundph_epu64 (m8, x3, 8); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2uqq-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2uqq-1b.c new file mode 100644 index 00000000000..31879ef8983 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2uqq-1b.c @@ -0,0 +1,78 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS (AVX512F_LEN / 16) + +void NOINLINE +EMULATE(cvtph2_q) (V512 * dest, V512 op1, + __mmask32 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + __mmask16 m1, m2; + + m1 = k & 0xffff; + + unpack_ph_2twops(op1, &v1, &v2); + + for (i = 0; i < 8; i++) { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + v5.u64[i] = 0; + } + else { + v5.u64[i] = dest->u64[i]; + } + } + else { + v5.u64[i] = v1.f32[i]; + } + } + *dest = v5; +} + +void +TEST (void) +{ + V512 res; + V512 exp; + + init_src(); + + EMULATE(cvtph2_q)(&exp, src1, NET_MASK, 0); + SI(res) = INTRINSIC (_cvttph_epu64) (src1.xmmh[0]); + CHECK_RESULT (&res, &exp, N_ELEMS, _cvttph_epu64); + + init_dest(&res, &exp); + EMULATE(cvtph2_q)(&exp, src1, 0xcc, 0); + SI(res) = INTRINSIC (_mask_cvttph_epu64) (SI(res), 0xcc, src1.xmmh[0]); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvttph_epu64); + + EMULATE(cvtph2_q)(&exp, src1, 0xfc, 1); + SI(res) = INTRINSIC (_maskz_cvttph_epu64) (0xfc, src1.xmmh[0]); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvttph_epu64); + +#if AVX512F_LEN == 512 + EMULATE(cvtph2_q)(&exp, src1, NET_MASK, 0); + SI(res) = INTRINSIC (_cvtt_roundph_epu64) (src1.xmmh[0], _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _cvtt_roundph_epu64); + + init_dest(&res, &exp); + EMULATE(cvtph2_q)(&exp, src1, 0xcc, 0); + SI(res) = INTRINSIC (_mask_cvtt_roundph_epu64) (SI(res), 0xcc, src1.xmmh[0], 
_ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvtt_roundph_epu64); + + EMULATE(cvtph2_q)(&exp, src1, 0xfc, 1); + SI(res) = INTRINSIC (_maskz_cvtt_roundph_epu64) (0xfc, src1.xmmh[0], _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvtt_roundph_epu64); +#endif + + if (n_errs != 0) { + abort (); + } +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2uw-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2uw-1a.c new file mode 100644 index 00000000000..b31af8441a9 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2uw-1a.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vcvttph2uw\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vcvttph2uw\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2uw\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2uw\[ \\t\]+\{sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2uw\[ \\t\]+\{sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m512i res, res1, res2; +volatile __m512h x1, x2, x3; +volatile __mmask32 m32; + +void extern +avx512f_test (void) +{ + res = _mm512_cvttph_epu16 (x1); + res1 = _mm512_mask_cvttph_epu16 (res, m32, x2); + res2 = _mm512_maskz_cvttph_epu16 (m32, x3); + res = _mm512_cvtt_roundph_epu16 (x1, 4); + res1 = _mm512_mask_cvtt_roundph_epu16 (res, m32, x2, 8); + res2 = _mm512_maskz_cvtt_roundph_epu16 (m32, x3, 8); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2uw-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2uw-1b.c new file 
mode 100644 index 00000000000..34e94e8e549 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2uw-1b.c @@ -0,0 +1,84 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS (AVX512F_LEN / 16) + +void NOINLINE +EMULATE(cvtph2_w) (V512 * dest, V512 op1, + __mmask32 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + __mmask16 m1, m2; + + m1 = k & 0xffff; + m2 = (k >> 16) & 0xffff; + + unpack_ph_2twops(op1, &v1, &v2); + + for (i = 0; i < 16; i++) { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + dest->u16[i] = 0; + } + } + else { + dest->u16[i] = v1.f32[i]; + + } + + if (((1 << i) & m2) == 0) { + if (zero_mask) { + dest->u16[i+16] = 0; + } + } + else { + dest->u16[i+16] = v2.f32[i]; + } + } +} + +void +TEST (void) +{ + V512 res; + V512 exp; + + init_src(); + + EMULATE(cvtph2_w)(&exp, src1, NET_MASK, 0); + SI(res) = INTRINSIC (_cvttph_epu16) (HF(src1)); + CHECK_RESULT (&res, &exp, N_ELEMS, _cvttph_epu16); + + init_dest(&res, &exp); + EMULATE(cvtph2_w)(&exp, src1, MASK_VALUE, 0); + SI(res) = INTRINSIC (_mask_cvttph_epu16) (SI(res), MASK_VALUE, HF(src1)); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvttph_epu16); + + EMULATE(cvtph2_w)(&exp, src1, ZMASK_VALUE, 1); + SI(res) = INTRINSIC (_maskz_cvttph_epu16) (ZMASK_VALUE, HF(src1)); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvttph_epu16); + +#if AVX512F_LEN == 512 + EMULATE(cvtph2_w)(&exp, src1, NET_MASK, 0); + SI(res) = INTRINSIC (_cvtt_roundph_epu16) (HF(src1), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _cvtt_roundph_epu16); + + init_dest(&res, &exp); + EMULATE(cvtph2_w)(&exp, src1, MASK_VALUE, 0); + SI(res) = INTRINSIC (_mask_cvtt_roundph_epu16) (SI(res), MASK_VALUE, HF(src1), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvtt_roundph_epu16); + + EMULATE(cvtph2_w)(&exp, src1, ZMASK_VALUE, 1); + SI(res) = INTRINSIC (_maskz_cvtt_roundph_epu16) 
(ZMASK_VALUE, HF(src1), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvtt_roundph_epu16); +#endif + + if (n_errs != 0) + abort (); +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2w-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2w-1a.c new file mode 100644 index 00000000000..a918594d0d8 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2w-1a.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vcvttph2w\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vcvttph2w\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2w\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2w\[ \\t\]+\{sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2w\[ \\t\]+\{sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m512i res, res1, res2; +volatile __m512h x1, x2, x3; +volatile __mmask32 m32; + +void extern +avx512f_test (void) +{ + res = _mm512_cvttph_epi16 (x1); + res1 = _mm512_mask_cvttph_epi16 (res, m32, x2); + res2 = _mm512_maskz_cvttph_epi16 (m32, x3); + res = _mm512_cvtt_roundph_epi16 (x1, 4); + res1 = _mm512_mask_cvtt_roundph_epi16 (res, m32, x2, 8); + res2 = _mm512_maskz_cvtt_roundph_epi16 (m32, x3, 8); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2w-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2w-1b.c new file mode 100644 index 00000000000..23bc8e680c5 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2w-1b.c @@ -0,0 +1,83 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { 
dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS (AVX512F_LEN / 16) + +void NOINLINE +EMULATE(cvtph2_w) (V512 * dest, V512 op1, + __mmask32 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + __mmask16 m1, m2; + + m1 = k & 0xffff; + m2 = (k >> 16) & 0xffff; + + unpack_ph_2twops(op1, &v1, &v2); + + for (i = 0; i < 16; i++) { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + dest->u16[i] = 0; + } + } + else { + dest->u16[i] = v1.f32[i]; + + } + + if (((1 << i) & m2) == 0) { + if (zero_mask) { + dest->u16[i+16] = 0; + } + } + else { + dest->u16[i+16] = v2.f32[i]; + } + } +} + +void +TEST (void) +{ + V512 res, exp; + + init_src(); + + EMULATE(cvtph2_w)(&exp, src1, NET_MASK, 0); + SI(res) = INTRINSIC (_cvttph_epi16) (HF(src1)); + CHECK_RESULT (&res, &exp, N_ELEMS, _cvttph_epi16); + + init_dest(&res, &exp); + EMULATE(cvtph2_w)(&exp, src1, MASK_VALUE, 0); + SI(res) = INTRINSIC (_mask_cvttph_epi16) (SI(res), MASK_VALUE, HF(src1)); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvttph_epi16); + + EMULATE(cvtph2_w)(&exp, src1, ZMASK_VALUE, 1); + SI(res) = INTRINSIC (_maskz_cvttph_epi16) (ZMASK_VALUE, HF(src1)); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvttph_epi16); + +#if AVX512F_LEN == 512 + EMULATE(cvtph2_w)(&exp, src1, NET_MASK, 0); + SI(res) = INTRINSIC (_cvtt_roundph_epi16) (HF(src1), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _cvtt_roundph_epi16); + + init_dest(&res, &exp); + EMULATE(cvtph2_w)(&exp, src1, MASK_VALUE, 0); + SI(res) = INTRINSIC (_mask_cvtt_roundph_epi16) (SI(res), MASK_VALUE, HF(src1), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvtt_roundph_epi16); + + EMULATE(cvtph2_w)(&exp, src1, ZMASK_VALUE, 1); + SI(res) = INTRINSIC (_maskz_cvtt_roundph_epi16) (ZMASK_VALUE, HF(src1), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvtt_roundph_epi16); +#endif + + if (n_errs != 0) { + abort (); + } +} + diff --git 
a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2dq-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2dq-1a.c new file mode 100644 index 00000000000..b4c084020ac --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2dq-1a.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */ +/* { dg-final { scan-assembler-times "vcvttph2dq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2dq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2dq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2dq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2dq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2dq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m256i res1; +volatile __m128i res2; +volatile __m128h x3; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res1 = _mm256_cvttph_epi32 (x3); + res1 = _mm256_mask_cvttph_epi32 (res1, m8, x3); + res1 = _mm256_maskz_cvttph_epi32 (m8, x3); + + res2 = _mm_cvttph_epi32 (x3); + res2 = _mm_mask_cvttph_epi32 (res2, m8, x3); + res2 = _mm_maskz_cvttph_epi32 (m8, x3); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2dq-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2dq-1b.c new file mode 100644 index 00000000000..f9d82f92f4d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2dq-1b.c @@ -0,0 +1,15 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */ + +#define AVX512VL +#define AVX512F_LEN 256 +#define 
AVX512F_LEN_HALF 128 +#include "avx512fp16-vcvttph2dq-1b.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vcvttph2dq-1b.c" + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2qq-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2qq-1a.c new file mode 100644 index 00000000000..421c688ee29 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2qq-1a.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */ +/* { dg-final { scan-assembler-times "vcvttph2qq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2qq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2qq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2qq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2qq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2qq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m256i res1; +volatile __m128i res2; +volatile __m128h x3; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res1 = _mm256_cvttph_epi64 (x3); + res1 = _mm256_mask_cvttph_epi64 (res1, m8, x3); + res1 = _mm256_maskz_cvttph_epi64 (m8, x3); + + res2 = _mm_cvttph_epi64 (x3); + res2 = _mm_mask_cvttph_epi64 (res2, m8, x3); + res2 = _mm_maskz_cvttph_epi64 (m8, x3); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2qq-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2qq-1b.c new file mode 100644 index 00000000000..323ab74fa05 --- /dev/null +++ 
b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2qq-1b.c @@ -0,0 +1,15 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */ + +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vcvttph2qq-1b.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vcvttph2qq-1b.c" + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2udq-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2udq-1a.c new file mode 100644 index 00000000000..60f43189d61 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2udq-1a.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */ +/* { dg-final { scan-assembler-times "vcvttph2udq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2udq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2udq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2udq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2udq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2udq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m256i res1; +volatile __m128i res2; +volatile __m128h x3; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res1 = _mm256_cvttph_epu32 (x3); + res1 = _mm256_mask_cvttph_epu32 (res1, m8, x3); + res1 = _mm256_maskz_cvttph_epu32 (m8, x3); + + res2 = _mm_cvttph_epu32 (x3); + res2 = _mm_mask_cvttph_epu32 (res2, m8, x3); + res2 = 
_mm_maskz_cvttph_epu32 (m8, x3); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2udq-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2udq-1b.c new file mode 100644 index 00000000000..61365d456c2 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2udq-1b.c @@ -0,0 +1,15 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */ + +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vcvttph2udq-1b.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vcvttph2udq-1b.c" + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2uqq-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2uqq-1a.c new file mode 100644 index 00000000000..37008f9d9e9 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2uqq-1a.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */ +/* { dg-final { scan-assembler-times "vcvttph2uqq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2uqq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2uqq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2uqq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2uqq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2uqq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m256i res1; +volatile __m128i res2; +volatile __m128h x3; +volatile __mmask8 m8; + +void extern +avx512f_test 
(void) +{ + res1 = _mm256_cvttph_epu64 (x3); + res1 = _mm256_mask_cvttph_epu64 (res1, m8, x3); + res1 = _mm256_maskz_cvttph_epu64 (m8, x3); + + res2 = _mm_cvttph_epu64 (x3); + res2 = _mm_mask_cvttph_epu64 (res2, m8, x3); + res2 = _mm_maskz_cvttph_epu64 (m8, x3); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2uqq-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2uqq-1b.c new file mode 100644 index 00000000000..6360402e6d6 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2uqq-1b.c @@ -0,0 +1,15 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */ + +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vcvttph2uqq-1b.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vcvttph2uqq-1b.c" + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2uw-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2uw-1a.c new file mode 100644 index 00000000000..eafa31a786b --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2uw-1a.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */ +/* { dg-final { scan-assembler-times "vcvttph2uw\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2uw\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2uw\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2uw\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2uw\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2uw\[ 
\\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m256i res1; +volatile __m128i res2; +volatile __m256h x3; +volatile __m128h x4; +volatile __mmask16 m16; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res1 = _mm256_cvttph_epu16 (x3); + res1 = _mm256_mask_cvttph_epu16 (res1, m16, x3); + res1 = _mm256_maskz_cvttph_epu16 (m16, x3); + + res2 = _mm_cvttph_epu16 (x4); + res2 = _mm_mask_cvttph_epu16 (res2, m8, x4); + res2 = _mm_maskz_cvttph_epu16 (m8, x4); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2uw-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2uw-1b.c new file mode 100644 index 00000000000..dd5ed9d5b38 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2uw-1b.c @@ -0,0 +1,15 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */ + +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vcvttph2uw-1b.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vcvttph2uw-1b.c" + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2w-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2w-1a.c new file mode 100644 index 00000000000..7476d3c1160 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2w-1a.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */ +/* { dg-final { scan-assembler-times "vcvttph2w\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2w\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2w\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times 
"vcvttph2w\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2w\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2w\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m256i res1; +volatile __m128i res2; +volatile __m256h x3; +volatile __m128h x4; +volatile __mmask16 m16; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res1 = _mm256_cvttph_epi16 (x3); + res1 = _mm256_mask_cvttph_epi16 (res1, m16, x3); + res1 = _mm256_maskz_cvttph_epi16 (m16, x3); + + res2 = _mm_cvttph_epi16 (x4); + res2 = _mm_mask_cvttph_epi16 (res2, m8, x4); + res2 = _mm_maskz_cvttph_epi16 (m8, x4); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2w-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2w-1b.c new file mode 100644 index 00000000000..7a04a6a8ebc --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2w-1b.c @@ -0,0 +1,15 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */ + +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vcvttph2w-1b.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vcvttph2w-1b.c" + From patchwork Thu Jul 1 06:16:20 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: liuhongt X-Patchwork-Id: 1499373 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) 
To: gcc-patches@gcc.gnu.org Subject: [PATCH 34/62] AVX512FP16: Add vcvttsh2si/vcvttsh2usi. Date: Thu, 1 Jul 2021 14:16:20 +0800 Message-Id: <20210701061648.9447-35-hongtao.liu@intel.com> X-Mailer: git-send-email 2.18.1 In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com> References: <20210701061648.9447-1-hongtao.liu@intel.com> From: liuhongt Reply-To: liuhongt Cc: jakub@redhat.com gcc/ChangeLog: * config/i386/avx512fp16intrin.h (_mm_cvttsh_i32): New intrinsic. (_mm_cvttsh_u32): Likewise. (_mm_cvtt_roundsh_i32): Likewise. (_mm_cvtt_roundsh_u32): Likewise. (_mm_cvttsh_i64): Likewise. (_mm_cvttsh_u64): Likewise. (_mm_cvtt_roundsh_i64): Likewise. (_mm_cvtt_roundsh_u64): Likewise. * config/i386/i386-builtin.def: Add corresponding new builtins. * config/i386/sse.md (avx512fp16_fix_trunc2): New. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512fp16-vcvttsh2si-1a.c: New test. * gcc.target/i386/avx512fp16-vcvttsh2si-1b.c: Ditto. * gcc.target/i386/avx512fp16-vcvttsh2si64-1a.c: Ditto. * gcc.target/i386/avx512fp16-vcvttsh2si64-1b.c: Ditto. * gcc.target/i386/avx512fp16-vcvttsh2usi-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvttsh2usi-1b.c: Ditto. * gcc.target/i386/avx512fp16-vcvttsh2usi64-1a.c: Ditto. * gcc.target/i386/avx512fp16-vcvttsh2usi64-1b.c: Ditto. * gcc.target/i386/avx-1.c: Add test for new builtins. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/sse-14.c: Add test for new intrinsics. * gcc.target/i386/sse-22.c: Ditto. --- gcc/config/i386/avx512fp16intrin.h | 81 +++++++++++++++++++ gcc/config/i386/i386-builtin.def | 4 + gcc/config/i386/sse.md | 16 ++++ gcc/testsuite/gcc.target/i386/avx-1.c | 4 + .../i386/avx512fp16-vcvttsh2si-1a.c | 16 ++++ .../i386/avx512fp16-vcvttsh2si-1b.c | 54 +++++++++++++ .../i386/avx512fp16-vcvttsh2si64-1a.c | 16 ++++ .../i386/avx512fp16-vcvttsh2si64-1b.c | 52 ++++++++++++ .../i386/avx512fp16-vcvttsh2usi-1a.c | 16 ++++ .../i386/avx512fp16-vcvttsh2usi-1b.c | 54 +++++++++++++ .../i386/avx512fp16-vcvttsh2usi64-1a.c | 16 ++++ .../i386/avx512fp16-vcvttsh2usi64-1b.c | 53 ++++++++++++ gcc/testsuite/gcc.target/i386/sse-13.c | 4 + gcc/testsuite/gcc.target/i386/sse-14.c | 4 + gcc/testsuite/gcc.target/i386/sse-22.c | 4 + gcc/testsuite/gcc.target/i386/sse-23.c | 4 + 16 files changed, 398 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2si-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2si-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2si64-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2si64-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2usi-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2usi-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2usi64-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2usi64-1b.c diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h index 66de5b88927..bcd04f14769 100644 --- a/gcc/config/i386/avx512fp16intrin.h +++ 
b/gcc/config/i386/avx512fp16intrin.h @@ -4148,6 +4148,87 @@ _mm_cvt_roundsh_u64 (__m128h __A, const int __R) #endif /* __OPTIMIZE__ */ #endif /* __x86_64__ */ +/* Intrinsics vcvttsh2si, vcvttsh2us. */ +extern __inline int +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvttsh_i32 (__m128h __A) +{ + return (int) + __builtin_ia32_vcvttsh2si32_round (__A, _MM_FROUND_CUR_DIRECTION); +} + +extern __inline unsigned +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvttsh_u32 (__m128h __A) +{ + return (int) + __builtin_ia32_vcvttsh2usi32_round (__A, _MM_FROUND_CUR_DIRECTION); +} + +#ifdef __OPTIMIZE__ +extern __inline int +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtt_roundsh_i32 (__m128h __A, const int __R) +{ + return (int) __builtin_ia32_vcvttsh2si32_round (__A, __R); +} + +extern __inline unsigned +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtt_roundsh_u32 (__m128h __A, const int __R) +{ + return (int) __builtin_ia32_vcvttsh2usi32_round (__A, __R); +} + +#else +#define _mm_cvtt_roundsh_i32(A, B) \ + ((int)__builtin_ia32_vcvttsh2si32_round ((A), (B))) +#define _mm_cvtt_roundsh_u32(A, B) \ + ((int)__builtin_ia32_vcvttsh2usi32_round ((A), (B))) + +#endif /* __OPTIMIZE__ */ + +#ifdef __x86_64__ +extern __inline long long +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvttsh_i64 (__m128h __A) +{ + return (long long) + __builtin_ia32_vcvttsh2si64_round (__A, _MM_FROUND_CUR_DIRECTION); +} + +extern __inline unsigned long long +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvttsh_u64 (__m128h __A) +{ + return (long long) + __builtin_ia32_vcvttsh2usi64_round (__A, _MM_FROUND_CUR_DIRECTION); +} + +#ifdef __OPTIMIZE__ +extern __inline long long +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtt_roundsh_i64 (__m128h __A, const int __R) +{ + return (long long) 
__builtin_ia32_vcvttsh2si64_round (__A, __R); +} + +extern __inline unsigned long long +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtt_roundsh_u64 (__m128h __A, const int __R) +{ + return (long long) __builtin_ia32_vcvttsh2usi64_round (__A, __R); +} + +#else +#define _mm_cvtt_roundsh_i64(A, B) \ + ((long long)__builtin_ia32_vcvttsh2si64_round ((A), (B))) +#define _mm_cvtt_roundsh_u64(A, B) \ + ((long long)__builtin_ia32_vcvttsh2usi64_round ((A), (B))) + +#endif /* __OPTIMIZE__ */ +#endif /* __x86_64__ */ + /* Intrinsics vcvtsi2sh, vcvtusi2sh. */ extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def index 17571e3b4c3..4e6d08c2d3f 100644 --- a/gcc/config/i386/i386-builtin.def +++ b/gcc/config/i386/i386-builtin.def @@ -3116,6 +3116,10 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsh2si_round, "__b BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsh2siq_round, "__builtin_ia32_vcvtsh2si64_round", IX86_BUILTIN_VCVTSH2SI64_ROUND, UNKNOWN, (int) INT64_FTYPE_V8HF_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsh2usi_round, "__builtin_ia32_vcvtsh2usi32_round", IX86_BUILTIN_VCVTSH2USI32_ROUND, UNKNOWN, (int) UINT_FTYPE_V8HF_INT) BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsh2usiq_round, "__builtin_ia32_vcvtsh2usi64_round", IX86_BUILTIN_VCVTSH2USI64_ROUND, UNKNOWN, (int) UINT64_FTYPE_V8HF_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fix_truncsi2_round, "__builtin_ia32_vcvttsh2si32_round", IX86_BUILTIN_VCVTTSH2SI32_ROUND, UNKNOWN, (int) INT_FTYPE_V8HF_INT) +BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fix_truncdi2_round, "__builtin_ia32_vcvttsh2si64_round", IX86_BUILTIN_VCVTTSH2SI64_ROUND, UNKNOWN, (int) INT64_FTYPE_V8HF_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, 
CODE_FOR_avx512fp16_fixuns_truncsi2_round, "__builtin_ia32_vcvttsh2usi32_round", IX86_BUILTIN_VCVTTSH2USI32_ROUND, UNKNOWN, (int) UINT_FTYPE_V8HF_INT) +BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fixuns_truncdi2_round, "__builtin_ia32_vcvttsh2usi64_round", IX86_BUILTIN_VCVTTSH2USI64_ROUND, UNKNOWN, (int) UINT64_FTYPE_V8HF_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsi2sh_round, "__builtin_ia32_vcvtsi2sh32_round", IX86_BUILTIN_VCVTSI2SH32_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_INT_INT) BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsi2shq_round, "__builtin_ia32_vcvtsi2sh64_round", IX86_BUILTIN_VCVTSI2SH64_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_INT64_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtusi2sh_round, "__builtin_ia32_vcvtusi2sh32_round", IX86_BUILTIN_VCVTUSI2SH32_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_UINT_INT) diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 66b4fa61eb5..c16e0dc46a7 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -5670,6 +5670,22 @@ (define_insn "avx512fp16_fix_truncv2di2" (set_attr "prefix" "evex") (set_attr "mode" "TI")]) +(define_insn "avx512fp16_fix_trunc2" + [(set (match_operand:SWI48 0 "register_operand" "=r,r") + (any_fix:SWI48 + (vec_select:HF + (match_operand:V8HF 1 "" "v,") + (parallel [(const_int 0)]))))] + "TARGET_AVX512FP16" + "%vcvttsh2si\t{%1, %0|%0, %k1}" + [(set_attr "type" "sseicvt") + (set_attr "athlon_decode" "double,vector") + (set_attr "amdfam10_decode" "double,double") + (set_attr "bdver1_decode" "double,double") + (set_attr "prefix_rep" "1") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; ;; Parallel single-precision floating point conversion operations diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c index 4b6cf7e1ed6..595a6ac007a 100644 --- 
a/gcc/testsuite/gcc.target/i386/avx-1.c +++ b/gcc/testsuite/gcc.target/i386/avx-1.c @@ -741,6 +741,10 @@ #define __builtin_ia32_vcvtsh2si64_round(A, B) __builtin_ia32_vcvtsh2si64_round(A, 8) #define __builtin_ia32_vcvtsh2usi32_round(A, B) __builtin_ia32_vcvtsh2usi32_round(A, 8) #define __builtin_ia32_vcvtsh2usi64_round(A, B) __builtin_ia32_vcvtsh2usi64_round(A, 8) +#define __builtin_ia32_vcvttsh2si32_round(A, B) __builtin_ia32_vcvttsh2si32_round(A, 8) +#define __builtin_ia32_vcvttsh2si64_round(A, B) __builtin_ia32_vcvttsh2si64_round(A, 8) +#define __builtin_ia32_vcvttsh2usi32_round(A, B) __builtin_ia32_vcvttsh2usi32_round(A, 8) +#define __builtin_ia32_vcvttsh2usi64_round(A, B) __builtin_ia32_vcvttsh2usi64_round(A, 8) #define __builtin_ia32_vcvtsi2sh32_round(A, B, C) __builtin_ia32_vcvtsi2sh32_round(A, B, 8) #define __builtin_ia32_vcvtsi2sh64_round(A, B, C) __builtin_ia32_vcvtsi2sh64_round(A, B, 8) #define __builtin_ia32_vcvtusi2sh32_round(A, B, C) __builtin_ia32_vcvtusi2sh32_round(A, B, 8) diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2si-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2si-1a.c new file mode 100644 index 00000000000..80d84fce153 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2si-1a.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vcvttsh2si\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%eax" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttsh2si\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%eax" 1 } } */ + +#include <immintrin.h> + +volatile __m128h x1; +volatile int res1; + +void extern +avx512f_test (void) +{ + res1 = _mm_cvttsh_i32 (x1); + res1 = _mm_cvtt_roundsh_i32 (x1, 8); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2si-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2si-1b.c new file mode 100644 index 00000000000..c5b0a64d5f0 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2si-1b.c @@ -0,0 +1,54
@@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS 2 + +void NOINLINE +emulate_cvtph2_d(V512 * dest, V512 op1, + __mmask32 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + __mmask16 m1, m2; + + m1 = k & 0xffff; + + unpack_ph_2twops(op1, &v1, &v2); + + for (i = 0; i < 16; i++) { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + v5.u32[i] = 0; + } + else { + v5.u32[i] = dest->u32[i]; + } + } + else { + v5.u32[i] = v1.f32[i]; + + } + } + *dest = v5; +} + +void +test_512 (void) +{ + V512 res; + V512 exp; + + init_src(); + emulate_cvtph2_d(&exp, src1, NET_MASK, 0); + res.i32[0] = _mm_cvtt_roundsh_i32(src1.xmmh[0], _ROUND_NINT); + check_results(&res, &exp, N_ELEMS, "_mm_cvtt_roundsh_i32"); + + if (n_errs != 0) { + abort (); + } +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2si64-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2si64-1a.c new file mode 100644 index 00000000000..76a9053ef89 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2si64-1a.c @@ -0,0 +1,16 @@ +/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vcvttsh2si\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%rax" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttsh2si\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%rax" 1 } } */ + +#include <immintrin.h> + +volatile __m128h x1; +volatile long long res2; + +void extern +avx512f_test (void) +{ + res2 = _mm_cvttsh_i64 (x1); + res2 = _mm_cvtt_roundsh_i64 (x1, 8); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2si64-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2si64-1b.c new file mode 100644 index 00000000000..4e0fe5bb6bf --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2si64-1b.c @@ -0,0 +1,52 @@ +/* { dg-do run { target { { !
ia32 } && avx512fp16 } } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS 4 + +void NOINLINE +emulate_cvtph2_q(V512 * dest, V512 op1, + __mmask32 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + __mmask16 m1, m2; + + m1 = k & 0xffff; + + unpack_ph_2twops(op1, &v1, &v2); + + for (i = 0; i < 8; i++) { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + v5.u64[i] = 0; + } + else { + v5.u64[i] = dest->u64[i]; + } + } + else { + v5.u64[i] = v1.f32[i]; + } + } + *dest = v5; +} + +void +test_512 (void) +{ + V512 res; + V512 exp; + + init_src(); + emulate_cvtph2_q(&exp, src1, NET_MASK, 0); + res.s64[0] = _mm_cvtt_roundsh_i64(src1.xmmh[0], _ROUND_NINT); + check_results(&res, &exp, N_ELEMS, "_mm_cvtt_roundsh_i64"); + + if (n_errs != 0) { + abort (); + } +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2usi-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2usi-1a.c new file mode 100644 index 00000000000..59564578a4d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2usi-1a.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vcvttsh2usi\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%eax" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttsh2usi\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%eax" 1 } } */ + +#include <immintrin.h> + +volatile __m128h x1; +volatile unsigned int res1; + +void extern +avx512f_test (void) +{ + res1 = _mm_cvttsh_u32 (x1); + res1 = _mm_cvtt_roundsh_u32 (x1, 8); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2usi-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2usi-1b.c new file mode 100644 index 00000000000..214e3e13db7 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2usi-1b.c @@ -0,0 +1,54 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16
+#include "avx512fp16-helper.h" + +#define N_ELEMS 2 + +void NOINLINE +emulate_cvtph2_d(V512 * dest, V512 op1, + __mmask32 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + __mmask16 m1, m2; + + m1 = k & 0xffff; + + unpack_ph_2twops(op1, &v1, &v2); + + for (i = 0; i < 16; i++) { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + v5.u32[i] = 0; + } + else { + v5.u32[i] = dest->u32[i]; + } + } + else { + v5.u32[i] = v1.f32[i]; + + } + } + *dest = v5; +} + +void +test_512 (void) +{ + V512 res; + V512 exp; + + init_src(); + emulate_cvtph2_d(&exp, src1, NET_MASK, 0); + res.u32[0] = _mm_cvtt_roundsh_u32(src1.xmmh[0], _ROUND_NINT); + check_results(&res, &exp, N_ELEMS, "_mm_cvtt_roundsh_u32"); + + if (n_errs != 0) { + abort (); + } +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2usi64-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2usi64-1a.c new file mode 100644 index 00000000000..23e8e70a901 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2usi64-1a.c @@ -0,0 +1,16 @@ +/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-options "-mavx512fp16 -O2 " } */ +/* { dg-final { scan-assembler-times "vcvttsh2usi\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%rax" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttsh2usi\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%rax" 1 } } */ + +#include <immintrin.h> + +volatile __m128h x1; +volatile unsigned long long res2; + +void extern +avx512f_test (void) +{ + res2 = _mm_cvttsh_u64 (x1); + res2 = _mm_cvtt_roundsh_u64 (x1, 8); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2usi64-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2usi64-1b.c new file mode 100644 index 00000000000..863fb6e167d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2usi64-1b.c @@ -0,0 +1,53 @@ +/* { dg-do run { target { { !
ia32 } && avx512fp16 } } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS 4 + +void NOINLINE +emulate_cvtph2_q(V512 * dest, V512 op1, + __mmask32 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + __mmask16 m1, m2; + + m1 = k & 0xffff; + + unpack_ph_2twops(op1, &v1, &v2); + + for (i = 0; i < 8; i++) { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + v5.u64[i] = 0; + } + else { + v5.u64[i] = dest->u64[i]; + } + } + else { + v5.u64[i] = v1.f32[i]; + } + } + *dest = v5; +} + +void +test_512 (void) +{ + V512 res; + V512 exp; + + init_src(); + emulate_cvtph2_q(&exp, src1, NET_MASK, 0); + res.u64[0] = _mm_cvtt_roundsh_u64(src1.xmmh[0], _ROUND_NINT); + check_results(&res, &exp, N_ELEMS, "_mm_cvtt_roundsh_u64"); + + if (n_errs != 0) { + abort (); + } +} + diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c index 2e730d554dd..0d976fb0de4 100644 --- a/gcc/testsuite/gcc.target/i386/sse-13.c +++ b/gcc/testsuite/gcc.target/i386/sse-13.c @@ -758,6 +758,10 @@ #define __builtin_ia32_vcvtsh2si64_round(A, B) __builtin_ia32_vcvtsh2si64_round(A, 8) #define __builtin_ia32_vcvtsh2usi32_round(A, B) __builtin_ia32_vcvtsh2usi32_round(A, 8) #define __builtin_ia32_vcvtsh2usi64_round(A, B) __builtin_ia32_vcvtsh2usi64_round(A, 8) +#define __builtin_ia32_vcvttsh2si32_round(A, B) __builtin_ia32_vcvttsh2si32_round(A, 8) +#define __builtin_ia32_vcvttsh2si64_round(A, B) __builtin_ia32_vcvttsh2si64_round(A, 8) +#define __builtin_ia32_vcvttsh2usi32_round(A, B) __builtin_ia32_vcvttsh2usi32_round(A, 8) +#define __builtin_ia32_vcvttsh2usi64_round(A, B) __builtin_ia32_vcvttsh2usi64_round(A, 8) #define __builtin_ia32_vcvtsi2sh32_round(A, B, C) __builtin_ia32_vcvtsi2sh32_round(A, B, 8) #define __builtin_ia32_vcvtsi2sh64_round(A, B, C) __builtin_ia32_vcvtsi2sh64_round(A, B, 8) #define __builtin_ia32_vcvtusi2sh32_round(A, B, C) __builtin_ia32_vcvtusi2sh32_round(A, B, 8)
diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c index 98e38fb025a..403f3af6067 100644 --- a/gcc/testsuite/gcc.target/i386/sse-14.c +++ b/gcc/testsuite/gcc.target/i386/sse-14.c @@ -698,9 +698,13 @@ test_1 (_mm512_cvt_roundepi64_ph, __m128h, __m512i, 8) test_1 (_mm512_cvt_roundepu64_ph, __m128h, __m512i, 8) test_1 (_mm_cvt_roundsh_i32, int, __m128h, 8) test_1 (_mm_cvt_roundsh_u32, unsigned, __m128h, 8) +test_1 (_mm_cvtt_roundsh_i32, int, __m128h, 8) +test_1 (_mm_cvtt_roundsh_u32, unsigned, __m128h, 8) #ifdef __x86_64__ test_1 (_mm_cvt_roundsh_i64, long long, __m128h, 8) test_1 (_mm_cvt_roundsh_u64, unsigned long long, __m128h, 8) +test_1 (_mm_cvtt_roundsh_i64, long long, __m128h, 8) +test_1 (_mm_cvtt_roundsh_u64, unsigned long long, __m128h, 8) test_2 (_mm_cvt_roundi64_sh, __m128h, __m128h, long long, 8) test_2 (_mm_cvt_roundu64_sh, __m128h, __m128h, unsigned long long, 8) #endif diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c index 3ad10908d49..b980ac3cddd 100644 --- a/gcc/testsuite/gcc.target/i386/sse-22.c +++ b/gcc/testsuite/gcc.target/i386/sse-22.c @@ -803,9 +803,13 @@ test_1 (_mm512_cvt_roundepi64_ph, __m128h, __m512i, 8) test_1 (_mm512_cvt_roundepu64_ph, __m128h, __m512i, 8) test_1 (_mm_cvt_roundsh_i32, int, __m128h, 8) test_1 (_mm_cvt_roundsh_u32, unsigned, __m128h, 8) +test_1 (_mm_cvtt_roundsh_i32, int, __m128h, 8) +test_1 (_mm_cvtt_roundsh_u32, unsigned, __m128h, 8) #ifdef __x86_64__ test_1 (_mm_cvt_roundsh_i64, long long, __m128h, 8) test_1 (_mm_cvt_roundsh_u64, unsigned long long, __m128h, 8) +test_1 (_mm_cvtt_roundsh_i64, long long, __m128h, 8) +test_1 (_mm_cvtt_roundsh_u64, unsigned long long, __m128h, 8) test_2 (_mm_cvt_roundi64_sh, __m128h, __m128h, long long, 8) test_2 (_mm_cvt_roundu64_sh, __m128h, __m128h, unsigned long long, 8) #endif diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c index 6990f93bfce..1bd734a9352 
100644 --- a/gcc/testsuite/gcc.target/i386/sse-23.c +++ b/gcc/testsuite/gcc.target/i386/sse-23.c @@ -759,6 +759,10 @@ #define __builtin_ia32_vcvtsh2si64_round(A, B) __builtin_ia32_vcvtsh2si64_round(A, 8) #define __builtin_ia32_vcvtsh2usi32_round(A, B) __builtin_ia32_vcvtsh2usi32_round(A, 8) #define __builtin_ia32_vcvtsh2usi64_round(A, B) __builtin_ia32_vcvtsh2usi64_round(A, 8) +#define __builtin_ia32_vcvttsh2si32_round(A, B) __builtin_ia32_vcvttsh2si32_round(A, 8) +#define __builtin_ia32_vcvttsh2si64_round(A, B) __builtin_ia32_vcvttsh2si64_round(A, 8) +#define __builtin_ia32_vcvttsh2usi32_round(A, B) __builtin_ia32_vcvttsh2usi32_round(A, 8) +#define __builtin_ia32_vcvttsh2usi64_round(A, B) __builtin_ia32_vcvttsh2usi64_round(A, 8) #define __builtin_ia32_vcvtsi2sh32_round(A, B, C) __builtin_ia32_vcvtsi2sh32_round(A, B, 8) #define __builtin_ia32_vcvtsi2sh64_round(A, B, C) __builtin_ia32_vcvtsi2sh64_round(A, B, 8) #define __builtin_ia32_vcvtusi2sh32_round(A, B, C) __builtin_ia32_vcvtusi2sh32_round(A, B, 8)

From patchwork Thu Jul 1 06:16:21 2021
X-Patchwork-Submitter: liuhongt
X-Patchwork-Id: 1499379
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 35/62] AVX512FP16: Add vcvtph2pd/vcvtph2psx/vcvtpd2ph/vcvtps2phx.
Date: Thu, 1 Jul 2021 14:16:21 +0800
Message-Id: <20210701061648.9447-36-hongtao.liu@intel.com>
In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com>
References: <20210701061648.9447-1-hongtao.liu@intel.com>
From: liuhongt
Cc: jakub@redhat.com

gcc/ChangeLog: * config/i386/avx512fp16intrin.h (_mm512_cvtph_pd): New intrinsic. (_mm512_mask_cvtph_pd): Likewise. (_mm512_maskz_cvtph_pd): Likewise. (_mm512_cvt_roundph_pd): Likewise. (_mm512_mask_cvt_roundph_pd): Likewise. (_mm512_maskz_cvt_roundph_pd): Likewise. (_mm512_cvtxph_ps): Likewise. (_mm512_mask_cvtxph_ps): Likewise. (_mm512_maskz_cvtxph_ps): Likewise. (_mm512_cvtx_roundph_ps): Likewise. (_mm512_mask_cvtx_roundph_ps): Likewise. (_mm512_maskz_cvtx_roundph_ps): Likewise. (_mm512_cvtxps_ph): Likewise. (_mm512_mask_cvtxps_ph): Likewise. (_mm512_maskz_cvtxps_ph): Likewise. (_mm512_cvtx_roundps_ph): Likewise. (_mm512_mask_cvtx_roundps_ph): Likewise. (_mm512_maskz_cvtx_roundps_ph): Likewise. (_mm512_cvtpd_ph): Likewise. (_mm512_mask_cvtpd_ph): Likewise. (_mm512_maskz_cvtpd_ph): Likewise. (_mm512_cvt_roundpd_ph): Likewise. (_mm512_mask_cvt_roundpd_ph): Likewise. (_mm512_maskz_cvt_roundpd_ph): Likewise. * config/i386/avx512fp16vlintrin.h (_mm_cvtph_pd): New intrinsic. (_mm_mask_cvtph_pd): Likewise. (_mm_maskz_cvtph_pd): Likewise.
(_mm256_cvtph_pd): Likewise. (_mm256_mask_cvtph_pd): Likewise. (_mm256_maskz_cvtph_pd): Likewise. (_mm_cvtxph_ps): Likewise. (_mm_mask_cvtxph_ps): Likewise. (_mm_maskz_cvtxph_ps): Likewise. (_mm256_cvtxph_ps): Likewise. (_mm256_mask_cvtxph_ps): Likewise. (_mm256_maskz_cvtxph_ps): Likewise. (_mm_cvtxps_ph): Likewise. (_mm_mask_cvtxps_ph): Likewise. (_mm_maskz_cvtxps_ph): Likewise. (_mm256_cvtxps_ph): Likewise. (_mm256_mask_cvtxps_ph): Likewise. (_mm256_maskz_cvtxps_ph): Likewise. (_mm_cvtpd_ph): Likewise. (_mm_mask_cvtpd_ph): Likewise. (_mm_maskz_cvtpd_ph): Likewise. (_mm256_cvtpd_ph): Likewise. (_mm256_mask_cvtpd_ph): Likewise. (_mm256_maskz_cvtpd_ph): Likewise. * config/i386/i386-builtin.def: Add corresponding new builtins. * config/i386/i386-builtin-types.def: Add corresponding builtin types. * config/i386/i386-expand.c: Handle new builtin types. * config/i386/sse.md (VF4_128_8_256): New. (VF48H_AVX512VL): Ditto. (ssePHmode): Add HF vector modes. (castmode): Add new convertable modes. (qq2phsuff): Ditto. (ph2pssuffix): New. (avx512fp16_vcvt2ph_): Ditto. (avx512fp16_vcvt2ph_): Ditto. (*avx512fp16_vcvt2ph_): Ditto. (avx512fp16_vcvt2ph__mask): Ditto. (*avx512fp16_vcvt2ph__mask): Ditto. (*avx512fp16_vcvt2ph__mask_1): Ditto. (avx512fp16_vcvtpd2ph_v2df): Ditto. (*avx512fp16_vcvtpd2ph_v2df): Ditto. (avx512fp16_vcvtpd2ph_v2df_mask): Ditto. (*avx512fp16_vcvtpd2ph_v2df_mask): Ditto. (*avx512fp16_vcvtpd2ph_v2df_mask_1): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add test for new builtins. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/sse-14.c: Add test for new intrinsics. * gcc.target/i386/sse-22.c: Ditto. 
--- gcc/config/i386/avx512fp16intrin.h | 297 +++++++++++++++++++++++++ gcc/config/i386/avx512fp16vlintrin.h | 200 +++++++++++++++++ gcc/config/i386/i386-builtin-types.def | 12 + gcc/config/i386/i386-builtin.def | 12 + gcc/config/i386/i386-expand.c | 12 + gcc/config/i386/sse.md | 189 +++++++++++++++- gcc/testsuite/gcc.target/i386/avx-1.c | 4 + gcc/testsuite/gcc.target/i386/sse-13.c | 4 + gcc/testsuite/gcc.target/i386/sse-14.c | 12 + gcc/testsuite/gcc.target/i386/sse-22.c | 12 + gcc/testsuite/gcc.target/i386/sse-23.c | 4 + 11 files changed, 755 insertions(+), 3 deletions(-) diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h index bcd04f14769..5a6a0ba83a9 100644 --- a/gcc/config/i386/avx512fp16intrin.h +++ b/gcc/config/i386/avx512fp16intrin.h @@ -4306,6 +4306,303 @@ _mm_cvt_roundu64_sh (__m128h __A, unsigned long long __B, const int __R) #endif /* __OPTIMIZE__ */ #endif /* __x86_64__ */ +/* Intrinsics vcvtph2pd. */ +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtph_pd (__m128h __A) +{ + return __builtin_ia32_vcvtph2pd_v8df_mask_round (__A, + _mm512_setzero_pd (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtph_pd (__m512d __A, __mmask8 __B, __m128h __C) +{ + return __builtin_ia32_vcvtph2pd_v8df_mask_round (__C, __A, __B, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtph_pd (__mmask8 __A, __m128h __B) +{ + return __builtin_ia32_vcvtph2pd_v8df_mask_round (__B, + _mm512_setzero_pd (), + __A, + _MM_FROUND_CUR_DIRECTION); +} + +#ifdef __OPTIMIZE__ +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvt_roundph_pd (__m128h __A, int __B) +{ + return __builtin_ia32_vcvtph2pd_v8df_mask_round (__A, + _mm512_setzero_pd (), + 
(__mmask8) -1, + __B); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvt_roundph_pd (__m512d __A, __mmask8 __B, __m128h __C, int __D) +{ + return __builtin_ia32_vcvtph2pd_v8df_mask_round (__C, __A, __B, __D); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvt_roundph_pd (__mmask8 __A, __m128h __B, int __C) +{ + return __builtin_ia32_vcvtph2pd_v8df_mask_round (__B, + _mm512_setzero_pd (), + __A, + __C); +} + +#else +#define _mm512_cvt_roundph_pd(A, B) \ + (__builtin_ia32_vcvtph2pd_v8df_mask_round ((A), \ + _mm512_setzero_pd (), \ + (__mmask8)-1, \ + (B))) + +#define _mm512_mask_cvt_roundph_pd(A, B, C, D) \ + (__builtin_ia32_vcvtph2pd_v8df_mask_round ((C), (A), (B), (D))) + +#define _mm512_maskz_cvt_roundph_pd(A, B, C) \ + (__builtin_ia32_vcvtph2pd_v8df_mask_round ((B), \ + _mm512_setzero_pd (), \ + (A), \ + (C))) + +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vcvtph2psx. 
*/ +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtxph_ps (__m256h __A) +{ + return __builtin_ia32_vcvtph2ps_v16sf_mask_round (__A, + _mm512_setzero_ps (), + (__mmask16) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtxph_ps (__m512 __A, __mmask16 __B, __m256h __C) +{ + return __builtin_ia32_vcvtph2ps_v16sf_mask_round (__C, __A, __B, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtxph_ps (__mmask16 __A, __m256h __B) +{ + return __builtin_ia32_vcvtph2ps_v16sf_mask_round (__B, + _mm512_setzero_ps (), + __A, + _MM_FROUND_CUR_DIRECTION); +} + +#ifdef __OPTIMIZE__ +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtx_roundph_ps (__m256h __A, int __B) +{ + return __builtin_ia32_vcvtph2ps_v16sf_mask_round (__A, + _mm512_setzero_ps (), + (__mmask16) -1, + __B); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtx_roundph_ps (__m512 __A, __mmask16 __B, __m256h __C, int __D) +{ + return __builtin_ia32_vcvtph2ps_v16sf_mask_round (__C, __A, __B, __D); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtx_roundph_ps (__mmask16 __A, __m256h __B, int __C) +{ + return __builtin_ia32_vcvtph2ps_v16sf_mask_round (__B, + _mm512_setzero_ps (), + __A, + __C); +} + +#else +#define _mm512_cvtx_roundph_ps(A, B) \ + (__builtin_ia32_vcvtph2ps_v16sf_mask_round ((A), \ + _mm512_setzero_ps (), \ + (__mmask16)-1, \ + (B))) + +#define _mm512_mask_cvtx_roundph_ps(A, B, C, D) \ + (__builtin_ia32_vcvtph2ps_v16sf_mask_round ((C), (A), (B), (D))) + +#define _mm512_maskz_cvtx_roundph_ps(A, B, C) \ + (__builtin_ia32_vcvtph2ps_v16sf_mask_round ((B), \ + _mm512_setzero_ps (), \ + (A), 
\ + (C))) +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vcvtps2ph. */ +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtxps_ph (__m512 __A) +{ + return __builtin_ia32_vcvtps2ph_v16sf_mask_round ((__v16sf) __A, + _mm256_setzero_ph (), + (__mmask16) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtxps_ph (__m256h __A, __mmask16 __B, __m512 __C) +{ + return __builtin_ia32_vcvtps2ph_v16sf_mask_round ((__v16sf) __C, + __A, __B, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtxps_ph (__mmask16 __A, __m512 __B) +{ + return __builtin_ia32_vcvtps2ph_v16sf_mask_round ((__v16sf) __B, + _mm256_setzero_ph (), + __A, + _MM_FROUND_CUR_DIRECTION); +} + +#ifdef __OPTIMIZE__ +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtx_roundps_ph (__m512 __A, int __B) +{ + return __builtin_ia32_vcvtps2ph_v16sf_mask_round ((__v16sf) __A, + _mm256_setzero_ph (), + (__mmask16) -1, + __B); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtx_roundps_ph (__m256h __A, __mmask16 __B, __m512 __C, int __D) +{ + return __builtin_ia32_vcvtps2ph_v16sf_mask_round ((__v16sf) __C, + __A, __B, __D); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtx_roundps_ph (__mmask16 __A, __m512 __B, int __C) +{ + return __builtin_ia32_vcvtps2ph_v16sf_mask_round ((__v16sf) __B, + _mm256_setzero_ph (), + __A, __C); +} + +#else +#define _mm512_cvtx_roundps_ph(A, B) \ + (__builtin_ia32_vcvtps2ph_v16sf_mask_round ((__v16sf)(A), \ + _mm256_setzero_ph (), \ + (__mmask16)-1, (B))) + +#define _mm512_mask_cvtx_roundps_ph(A, B, C, D) \ + (__builtin_ia32_vcvtps2ph_v16sf_mask_round ((__v16sf)(C), \ + (A), 
(B), (D))) + +#define _mm512_maskz_cvtx_roundps_ph(A, B, C) \ + (__builtin_ia32_vcvtps2ph_v16sf_mask_round ((__v16sf)(B), \ + _mm256_setzero_ph (), \ + (A), (C))) +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vcvtpd2ph. */ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtpd_ph (__m512d __A) +{ + return __builtin_ia32_vcvtpd2ph_v8df_mask_round ((__v8df) __A, + _mm_setzero_ph (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtpd_ph (__m128h __A, __mmask8 __B, __m512d __C) +{ + return __builtin_ia32_vcvtpd2ph_v8df_mask_round ((__v8df) __C, + __A, __B, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtpd_ph (__mmask8 __A, __m512d __B) +{ + return __builtin_ia32_vcvtpd2ph_v8df_mask_round ((__v8df) __B, + _mm_setzero_ph (), + __A, + _MM_FROUND_CUR_DIRECTION); +} + +#ifdef __OPTIMIZE__ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvt_roundpd_ph (__m512d __A, int __B) +{ + return __builtin_ia32_vcvtpd2ph_v8df_mask_round ((__v8df) __A, + _mm_setzero_ph (), + (__mmask8) -1, + __B); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvt_roundpd_ph (__m128h __A, __mmask8 __B, __m512d __C, int __D) +{ + return __builtin_ia32_vcvtpd2ph_v8df_mask_round ((__v8df) __C, + __A, __B, __D); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvt_roundpd_ph (__mmask8 __A, __m512d __B, int __C) +{ + return __builtin_ia32_vcvtpd2ph_v8df_mask_round ((__v8df) __B, + _mm_setzero_ph (), + __A, __C); +} + +#else +#define _mm512_cvt_roundpd_ph(A, B) \ + (__builtin_ia32_vcvtpd2ph_v8df_mask_round ((__v8df)(A), \ + _mm_setzero_ph (), \ + (__mmask8)-1, (B))) + 
+#define _mm512_mask_cvt_roundpd_ph(A, B, C, D) \ + (__builtin_ia32_vcvtpd2ph_v8df_mask_round ((__v8df)(C), \ + (A), (B), (D))) + +#define _mm512_maskz_cvt_roundpd_ph(A, B, C) \ + (__builtin_ia32_vcvtpd2ph_v8df_mask_round ((__v8df)(B), \ + _mm_setzero_ph (), \ + (A), (C))) + +#endif /* __OPTIMIZE__ */ #ifdef __DISABLE_AVX512FP16__ #undef __DISABLE_AVX512FP16__ diff --git a/gcc/config/i386/avx512fp16vlintrin.h b/gcc/config/i386/avx512fp16vlintrin.h index e1ee37edde6..0124b830dd5 100644 --- a/gcc/config/i386/avx512fp16vlintrin.h +++ b/gcc/config/i386/avx512fp16vlintrin.h @@ -1952,6 +1952,206 @@ _mm256_maskz_cvtepu16_ph (__mmask16 __A, __m256i __B) __A); } +/* Intrinsics vcvtph2pd. */ +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtph_pd (__m128h __A) +{ + return __builtin_ia32_vcvtph2pd_v2df_mask (__A, + _mm_setzero_pd (), + (__mmask8) -1); +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvtph_pd (__m128d __A, __mmask8 __B, __m128h __C) +{ + return __builtin_ia32_vcvtph2pd_v2df_mask (__C, __A, __B); +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvtph_pd (__mmask8 __A, __m128h __B) +{ + return __builtin_ia32_vcvtph2pd_v2df_mask (__B, _mm_setzero_pd (), __A); +} + +extern __inline __m256d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtph_pd (__m128h __A) +{ + return __builtin_ia32_vcvtph2pd_v4df_mask (__A, + _mm256_setzero_pd (), + (__mmask8) -1); +} + +extern __inline __m256d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvtph_pd (__m256d __A, __mmask8 __B, __m128h __C) +{ + return __builtin_ia32_vcvtph2pd_v4df_mask (__C, __A, __B); +} + +extern __inline __m256d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvtph_pd (__mmask8 __A, __m128h __B) +{ + return 
__builtin_ia32_vcvtph2pd_v4df_mask (__B, + _mm256_setzero_pd (), + __A); +} + +/* Intrinsics vcvtph2ps. */ +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtxph_ps (__m128h __A) +{ + return __builtin_ia32_vcvtph2ps_v4sf_mask (__A, + _mm_setzero_ps (), + (__mmask8) -1); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvtxph_ps (__m128 __A, __mmask8 __B, __m128h __C) +{ + return __builtin_ia32_vcvtph2ps_v4sf_mask (__C, __A, __B); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvtxph_ps (__mmask8 __A, __m128h __B) +{ + return __builtin_ia32_vcvtph2ps_v4sf_mask (__B, _mm_setzero_ps (), __A); +} + +extern __inline __m256 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtxph_ps (__m128h __A) +{ + return __builtin_ia32_vcvtph2ps_v8sf_mask (__A, + _mm256_setzero_ps (), + (__mmask8) -1); +} + +extern __inline __m256 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvtxph_ps (__m256 __A, __mmask8 __B, __m128h __C) +{ + return __builtin_ia32_vcvtph2ps_v8sf_mask (__C, __A, __B); +} + +extern __inline __m256 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvtxph_ps (__mmask8 __A, __m128h __B) +{ + return __builtin_ia32_vcvtph2ps_v8sf_mask (__B, + _mm256_setzero_ps (), + __A); +} + +/* Intrinsics vcvtxps2ph. 
*/ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtxps_ph (__m128 __A) +{ + return __builtin_ia32_vcvtps2ph_v4sf_mask ((__v4sf) __A, + _mm_setzero_ph (), + (__mmask8) -1); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvtxps_ph (__m128h __A, __mmask8 __B, __m128 __C) +{ + return __builtin_ia32_vcvtps2ph_v4sf_mask ((__v4sf) __C, __A, __B); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvtxps_ph (__mmask8 __A, __m128 __B) +{ + return __builtin_ia32_vcvtps2ph_v4sf_mask ((__v4sf) __B, + _mm_setzero_ph (), + __A); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtxps_ph (__m256 __A) +{ + return __builtin_ia32_vcvtps2ph_v8sf_mask ((__v8sf) __A, + _mm_setzero_ph (), + (__mmask8) -1); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvtxps_ph (__m128h __A, __mmask8 __B, __m256 __C) +{ + return __builtin_ia32_vcvtps2ph_v8sf_mask ((__v8sf) __C, __A, __B); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvtxps_ph (__mmask8 __A, __m256 __B) +{ + return __builtin_ia32_vcvtps2ph_v8sf_mask ((__v8sf) __B, + _mm_setzero_ph (), + __A); +} + +/* Intrinsics vcvtpd2ph. 
*/ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtpd_ph (__m128d __A) +{ + return __builtin_ia32_vcvtpd2ph_v2df_mask ((__v2df) __A, + _mm_setzero_ph (), + (__mmask8) -1); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvtpd_ph (__m128h __A, __mmask8 __B, __m128d __C) +{ + return __builtin_ia32_vcvtpd2ph_v2df_mask ((__v2df) __C, __A, __B); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvtpd_ph (__mmask8 __A, __m128d __B) +{ + return __builtin_ia32_vcvtpd2ph_v2df_mask ((__v2df) __B, + _mm_setzero_ph (), + __A); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtpd_ph (__m256d __A) +{ + return __builtin_ia32_vcvtpd2ph_v4df_mask ((__v4df) __A, + _mm_setzero_ph (), + (__mmask8) -1); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvtpd_ph (__m128h __A, __mmask8 __B, __m256d __C) +{ + return __builtin_ia32_vcvtpd2ph_v4df_mask ((__v4df) __C, __A, __B); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvtpd_ph (__mmask8 __A, __m256d __B) +{ + return __builtin_ia32_vcvtpd2ph_v4df_mask ((__v4df) __B, + _mm_setzero_ph (), + __A); +} + #ifdef __DISABLE_AVX512FP16VL__ #undef __DISABLE_AVX512FP16VL__ #pragma GCC pop_options diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def index 74bda59a65e..4123e66f7cd 100644 --- a/gcc/config/i386/i386-builtin-types.def +++ b/gcc/config/i386/i386-builtin-types.def @@ -1321,13 +1321,21 @@ DEF_FUNCTION_TYPE (V8HF, V8HF, UINT, INT) DEF_FUNCTION_TYPE (V8HF, V8HF, UINT64, INT) DEF_FUNCTION_TYPE (V2DI, V8HF, V2DI, UQI) DEF_FUNCTION_TYPE (V4DI, V8HF, V4DI, UQI) +DEF_FUNCTION_TYPE (V2DF, V8HF, V2DF, UQI) +DEF_FUNCTION_TYPE (V4DF, V8HF, V4DF, UQI) 
DEF_FUNCTION_TYPE (V4SI, V8HF, V4SI, UQI) +DEF_FUNCTION_TYPE (V4SF, V8HF, V4SF, UQI) DEF_FUNCTION_TYPE (V8SI, V8HF, V8SI, UQI) +DEF_FUNCTION_TYPE (V8SF, V8HF, V8SF, UQI) DEF_FUNCTION_TYPE (V8HI, V8HF, V8HI, UQI) DEF_FUNCTION_TYPE (V8HF, V4SI, V8HF, UQI) +DEF_FUNCTION_TYPE (V8HF, V4SF, V8HF, UQI) DEF_FUNCTION_TYPE (V8HF, V8SI, V8HF, UQI) +DEF_FUNCTION_TYPE (V8HF, V8SF, V8HF, UQI) DEF_FUNCTION_TYPE (V8HF, V2DI, V8HF, UQI) DEF_FUNCTION_TYPE (V8HF, V4DI, V8HF, UQI) +DEF_FUNCTION_TYPE (V8HF, V2DF, V8HF, UQI) +DEF_FUNCTION_TYPE (V8HF, V4DF, V8HF, UQI) DEF_FUNCTION_TYPE (V8HF, V8HI, V8HF, UQI) DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, UQI) DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, INT) @@ -1336,7 +1344,9 @@ DEF_FUNCTION_TYPE (UQI, V8HF, V8HF, INT, UQI) DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, V8HF, UQI) DEF_FUNCTION_TYPE (UQI, V8HF, V8HF, INT, UQI, INT) DEF_FUNCTION_TYPE (V8DI, V8HF, V8DI, UQI, INT) +DEF_FUNCTION_TYPE (V8DF, V8HF, V8DF, UQI, INT) DEF_FUNCTION_TYPE (V8HF, V8DI, V8HF, UQI, INT) +DEF_FUNCTION_TYPE (V8HF, V8DF, V8HF, UQI, INT) DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, V8HF, UQI, INT) DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, INT, V8HF, UQI, INT) DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF) @@ -1344,9 +1354,11 @@ DEF_FUNCTION_TYPE (V16HI, V16HF, V16HI, UHI) DEF_FUNCTION_TYPE (V16HF, V16HI, V16HF, UHI) DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF, UHI) DEF_FUNCTION_TYPE (V16SI, V16HF, V16SI, UHI, INT) +DEF_FUNCTION_TYPE (V16SF, V16HF, V16SF, UHI, INT) DEF_FUNCTION_TYPE (V16HF, V16HF, INT, V16HF, UHI) DEF_FUNCTION_TYPE (UHI, V16HF, V16HF, INT, UHI) DEF_FUNCTION_TYPE (V16HF, V16SI, V16HF, UHI, INT) +DEF_FUNCTION_TYPE (V16HF, V16SF, V16HF, UHI, INT) DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF, V16HF, UHI) DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, USI) DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, INT) diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def index 4e6d08c2d3f..2992bd0383d 100644 --- a/gcc/config/i386/i386-builtin.def +++ b/gcc/config/i386/i386-builtin.def @@ -2867,6 
+2867,14 @@ BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp1 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtqq2ph_v4di_mask, "__builtin_ia32_vcvtqq2ph_v4di_mask", IX86_BUILTIN_VCVTQQ2PH_V4DI_MASK, UNKNOWN, (int) V8HF_FTYPE_V4DI_V8HF_UQI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtuqq2ph_v2di_mask, "__builtin_ia32_vcvtuqq2ph_v2di_mask", IX86_BUILTIN_VCVTUQQ2PH_V2DI_MASK, UNKNOWN, (int) V8HF_FTYPE_V2DI_V8HF_UQI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtuqq2ph_v4di_mask, "__builtin_ia32_vcvtuqq2ph_v4di_mask", IX86_BUILTIN_VCVTUQQ2PH_V4DI_MASK, UNKNOWN, (int) V8HF_FTYPE_V4DI_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_float_extend_phv2df2_mask, "__builtin_ia32_vcvtph2pd_v2df_mask", IX86_BUILTIN_VCVTPH2PD_V2DF_MASK, UNKNOWN, (int) V2DF_FTYPE_V8HF_V2DF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_float_extend_phv4df2_mask, "__builtin_ia32_vcvtph2pd_v4df_mask", IX86_BUILTIN_VCVTPH2PD_V4DF_MASK, UNKNOWN, (int) V4DF_FTYPE_V8HF_V4DF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_float_extend_phv4sf2_mask, "__builtin_ia32_vcvtph2ps_v4sf_mask", IX86_BUILTIN_VCVTPH2PS_V4SF_MASK, UNKNOWN, (int) V4SF_FTYPE_V8HF_V4SF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_float_extend_phv8sf2_mask, "__builtin_ia32_vcvtph2ps_v8sf_mask", IX86_BUILTIN_VCVTPH2PS_V8SF_MASK, UNKNOWN, (int) V8SF_FTYPE_V8HF_V8SF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtps2ph_v4sf_mask, "__builtin_ia32_vcvtps2ph_v4sf_mask", IX86_BUILTIN_VCVTPS2PH_V4SF_MASK, UNKNOWN, (int) V8HF_FTYPE_V4SF_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtps2ph_v8sf_mask, "__builtin_ia32_vcvtps2ph_v8sf_mask", 
IX86_BUILTIN_VCVTPS2PH_V8SF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8SF_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtpd2ph_v2df_mask, "__builtin_ia32_vcvtpd2ph_v2df_mask", IX86_BUILTIN_VCVTPD2PH_V2DF_MASK, UNKNOWN, (int) V8HF_FTYPE_V2DF_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtpd2ph_v4df_mask, "__builtin_ia32_vcvtpd2ph_v4df_mask", IX86_BUILTIN_VCVTPD2PH_V4DF_MASK, UNKNOWN, (int) V8HF_FTYPE_V4DF_V8HF_UQI) /* Builtins with rounding support. */ BDESC_END (ARGS, ROUND_ARGS) @@ -3124,6 +3132,10 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsi2sh_round, "__b BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsi2shq_round, "__builtin_ia32_vcvtsi2sh64_round", IX86_BUILTIN_VCVTSI2SH64_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_INT64_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtusi2sh_round, "__builtin_ia32_vcvtusi2sh32_round", IX86_BUILTIN_VCVTUSI2SH32_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_UINT_INT) BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtusi2shq_round, "__builtin_ia32_vcvtusi2sh64_round", IX86_BUILTIN_VCVTUSI2SH64_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_UINT64_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_float_extend_phv8df2_mask_round, "__builtin_ia32_vcvtph2pd_v8df_mask_round", IX86_BUILTIN_VCVTPH2PD_V8DF_MASK_ROUND, UNKNOWN, (int) V8DF_FTYPE_V8HF_V8DF_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_float_extend_phv16sf2_mask_round, "__builtin_ia32_vcvtph2ps_v16sf_mask_round", IX86_BUILTIN_VCVTPH2PS_V16SF_MASK_ROUND, UNKNOWN, (int) V16SF_FTYPE_V16HF_V16SF_UHI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtpd2ph_v8df_mask_round, "__builtin_ia32_vcvtpd2ph_v8df_mask_round", IX86_BUILTIN_VCVTPD2PH_V8DF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8DF_V8HF_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, 
CODE_FOR_avx512fp16_vcvtps2ph_v16sf_mask_round, "__builtin_ia32_vcvtps2ph_v16sf_mask_round", IX86_BUILTIN_VCVTPS2PH_V16SF_MASK_ROUND, UNKNOWN, (int) V16HF_FTYPE_V16SF_V16HF_UHI_INT) BDESC_END (ROUND_ARGS, MULTI_ARG) diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c index b83c6d9a92b..a216f6f2bf3 100644 --- a/gcc/config/i386/i386-expand.c +++ b/gcc/config/i386/i386-expand.c @@ -9566,9 +9566,11 @@ ix86_expand_args_builtin (const struct builtin_description *d, case V8SF_FTYPE_V8HI_V8SF_UQI: case V4SF_FTYPE_V8HI_V4SF_UQI: case V8SI_FTYPE_V8HF_V8SI_UQI: + case V8SF_FTYPE_V8HF_V8SF_UQI: case V8SI_FTYPE_V8SF_V8SI_UQI: case V4SI_FTYPE_V4SF_V4SI_UQI: case V4SI_FTYPE_V8HF_V4SI_UQI: + case V4SF_FTYPE_V8HF_V4SF_UQI: case V4DI_FTYPE_V8HF_V4DI_UQI: case V4DI_FTYPE_V4SF_V4DI_UQI: case V2DI_FTYPE_V8HF_V2DI_UQI: @@ -9576,12 +9578,18 @@ ix86_expand_args_builtin (const struct builtin_description *d, case V8HF_FTYPE_V8HF_V8HF_UQI: case V8HF_FTYPE_V8HI_V8HF_UQI: case V8HF_FTYPE_V8SI_V8HF_UQI: + case V8HF_FTYPE_V8SF_V8HF_UQI: case V8HF_FTYPE_V4SI_V8HF_UQI: + case V8HF_FTYPE_V4SF_V8HF_UQI: case V8HF_FTYPE_V4DI_V8HF_UQI: + case V8HF_FTYPE_V4DF_V8HF_UQI: case V8HF_FTYPE_V2DI_V8HF_UQI: + case V8HF_FTYPE_V2DF_V8HF_UQI: case V4SF_FTYPE_V4DI_V4SF_UQI: case V4SF_FTYPE_V2DI_V4SF_UQI: case V4DF_FTYPE_V4DI_V4DF_UQI: + case V4DF_FTYPE_V8HF_V4DF_UQI: + case V2DF_FTYPE_V8HF_V2DF_UQI: case V2DF_FTYPE_V2DI_V2DF_UQI: case V16QI_FTYPE_V8HI_V16QI_UQI: case V16QI_FTYPE_V16HI_V16QI_UHI: @@ -10527,6 +10535,8 @@ ix86_expand_round_builtin (const struct builtin_description *d, case V8DI_FTYPE_V8DF_V8DI_QI_INT: case V8SF_FTYPE_V8DI_V8SF_QI_INT: case V8DF_FTYPE_V8DI_V8DF_QI_INT: + case V8DF_FTYPE_V8HF_V8DF_UQI_INT: + case V16SF_FTYPE_V16HF_V16SF_UHI_INT: case V32HF_FTYPE_V32HI_V32HF_USI_INT: case V32HF_FTYPE_V32HF_V32HF_USI_INT: case V16SF_FTYPE_V16SF_V16SF_HI_INT: @@ -10540,6 +10550,8 @@ ix86_expand_round_builtin (const struct builtin_description *d, case V2DF_FTYPE_V2DF_V2DF_V2DF_INT: 
case V4SF_FTYPE_V4SF_V4SF_V4SF_INT: case V8HF_FTYPE_V8DI_V8HF_UQI_INT: + case V8HF_FTYPE_V8DF_V8HF_UQI_INT: + case V16HF_FTYPE_V16SF_V16HF_UHI_INT: nargs = 4; break; case V4SF_FTYPE_V4SF_V4SF_INT_INT: diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index c16e0dc46a7..7447d6b75b5 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -621,6 +621,9 @@ (define_mode_iterator V48_AVX2 (V4SI "TARGET_AVX2") (V2DI "TARGET_AVX2") (V8SI "TARGET_AVX2") (V4DI "TARGET_AVX2")]) +(define_mode_iterator VF4_128_8_256 + [V4DF V4SF]) + (define_mode_iterator VI1_AVX512VLBW [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX512VL") (V16QI "TARGET_AVX512VL")]) @@ -783,6 +786,8 @@ (define_mode_iterator VI48F_256_512 (V4DI "TARGET_AVX512VL") (V4DF "TARGET_AVX512VL")]) (define_mode_iterator VF48_I1248 [V16SI V16SF V8DI V8DF V32HI V64QI]) +(define_mode_iterator VF48H_AVX512VL + [V8DF V16SF (V8SF "TARGET_AVX512VL")]) (define_mode_iterator VI48F [V16SI V16SF V8DI V8DF (V8SI "TARGET_AVX512VL") (V8SF "TARGET_AVX512VL") @@ -957,7 +962,8 @@ (define_mode_attr ssehalfvecmodelower (define_mode_attr ssePHmode [(V32HI "V32HF") (V16HI "V16HF") (V8HI "V8HF") (V16SI "V16HF") (V8SI "V8HF") (V4SI "V8HF") - (V8DI "V8HF") (V4DI "V8HF") (V2DI "V8HF")]) + (V8DI "V8HF") (V4DI "V8HF") (V2DI "V8HF") + (V8DF "V8HF") (V16SF "V16HF") (V8SF "V8HF")]) ;; Mapping of vector modes to packed single mode of the same size (define_mode_attr ssePSmode @@ -1101,7 +1107,8 @@ (define_mode_attr sserotatemax ;; Mapping of mode to cast intrinsic name (define_mode_attr castmode - [(V8SI "si") (V8SF "ps") (V4DF "pd") + [(V4SF "ps") (V2DF "pd") + (V8SI "si") (V8SF "ps") (V4DF "pd") (V16SI "si") (V16SF "ps") (V8DF "pd")]) ;; i128 for integer vectors and TARGET_AVX2, f128 otherwise. 
@@ -5440,7 +5447,9 @@ (define_int_attr sseintconvertsignprefix (define_mode_attr qq2phsuff [(V32HI "") (V16HI "") (V8HI "") (V16SI "") (V8SI "{y}") (V4SI "{x}") - (V8DI "{z}") (V4DI "{y}") (V2DI "{x}")]) + (V8DI "{z}") (V4DI "{y}") (V2DI "{x}") + (V16SF "") (V8SF "{y}") (V4SF "{x}") + (V8DF "{z}") (V4DF "{y}") (V2DF "{x}")]) (define_insn "avx512fp16_vcvtph2_" [(set (match_operand:VI248_AVX512VL 0 "register_operand" "=v") @@ -5686,6 +5695,180 @@ (define_insn "avx512fp16_fix_trunc2" (set_attr "prefix" "evex") (set_attr "mode" "")]) +(define_mode_attr ph2pssuffix + [(V16SF "x") (V8SF "x") (V4SF "x") + (V8DF "") (V4DF "") (V2DF "")]) + +(define_insn "avx512fp16_float_extend_ph2" + [(set (match_operand:VF48H_AVX512VL 0 "register_operand" "=v") + (float_extend:VF48H_AVX512VL + (match_operand: 1 "" "")))] + "TARGET_AVX512FP16" + "vcvtph2\t{%1, %0|%0, %1}" + [(set_attr "type" "ssecvt") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + +(define_insn "avx512fp16_float_extend_ph2" + [(set (match_operand:VF4_128_8_256 0 "register_operand" "=v") + (float_extend:VF4_128_8_256 + (vec_select:V4HF + (match_operand:V8HF 1 "nonimmediate_operand" "vm") + (parallel [(const_int 0) (const_int 1) (const_int 2) (const_int 3)]))))] + "TARGET_AVX512FP16 && TARGET_AVX512VL" + "vcvtph2\t{%1, %0|%0, %q1}" + [(set_attr "type" "ssecvt") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + +(define_insn "avx512fp16_float_extend_phv2df2" + [(set (match_operand:V2DF 0 "register_operand" "=v") + (float_extend:V2DF + (vec_select:V2HF + (match_operand:V8HF 1 "nonimmediate_operand" "vm") + (parallel [(const_int 0) (const_int 1)]))))] + "TARGET_AVX512FP16 && TARGET_AVX512VL" + "vcvtph2pd\t{%1, %0|%0, %k1}" + [(set_attr "type" "ssecvt") + (set_attr "prefix" "evex") + (set_attr "mode" "TI")]) + +(define_insn "avx512fp16_vcvt2ph_" + [(set (match_operand: 0 "register_operand" "=v") + (float_truncate: + (match_operand:VF48H_AVX512VL 1 "" "")))] + "TARGET_AVX512FP16" + "vcvt2ph\t{%1, %0|%0, %1}" + 
[(set_attr "type" "ssecvt") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + +(define_expand "avx512fp16_vcvt2ph_" + [(set (match_operand:V8HF 0 "register_operand" "=v") + (vec_concat:V8HF + (float_truncate:V4HF (match_operand:VF4_128_8_256 1 "vector_operand" "vm")) + (match_dup 2)))] + "TARGET_AVX512FP16 && TARGET_AVX512VL" + "operands[2] = CONST0_RTX (V4HFmode);") + +(define_insn "*avx512fp16_vcvt2ph_" + [(set (match_operand:V8HF 0 "register_operand" "=v") + (vec_concat:V8HF + (float_truncate:V4HF (match_operand:VF4_128_8_256 1 "vector_operand" "vm")) + (match_operand:V4HF 2 "const0_operand" "C")))] + "TARGET_AVX512FP16 && TARGET_AVX512VL" + "vcvt2ph\t{%1, %0|%0, %1}" + [(set_attr "type" "ssecvt") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + +(define_expand "avx512fp16_vcvt2ph__mask" + [(set (match_operand:V8HF 0 "register_operand" "=v") + (vec_concat:V8HF + (vec_merge:V4HF + (float_truncate:V4HF (match_operand:VF4_128_8_256 1 "vector_operand" "vm")) + (vec_select:V4HF + (match_operand:V8HF 2 "nonimm_or_0_operand" "0C") + (parallel [(const_int 0) (const_int 1) (const_int 2) (const_int 3)])) + (match_operand:QI 3 "register_operand" "Yk")) + (match_dup 4)))] + "TARGET_AVX512FP16 && TARGET_AVX512VL" + "operands[4] = CONST0_RTX (V4HFmode);") + +(define_insn "*avx512fp16_vcvt2ph__mask" + [(set (match_operand:V8HF 0 "register_operand" "=v") + (vec_concat:V8HF + (vec_merge:V4HF + (float_truncate:V4HF (match_operand:VF4_128_8_256 1 "vector_operand" "vm")) + (vec_select:V4HF + (match_operand:V8HF 2 "nonimm_or_0_operand" "0C") + (parallel [(const_int 0) (const_int 1) (const_int 2) (const_int 3)])) + (match_operand:QI 3 "register_operand" "Yk")) + (match_operand:V4HF 4 "const0_operand" "C")))] + "TARGET_AVX512FP16 && TARGET_AVX512VL" + "vcvt2ph\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}" + [(set_attr "type" "ssecvt") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + +(define_insn "*avx512fp16_vcvt2ph__mask_1" + [(set (match_operand:V8HF 0 "register_operand" 
"=v") + (vec_concat:V8HF + (vec_merge:V4HF + (float_truncate:V4HF (match_operand:VF4_128_8_256 1 + "vector_operand" "vm")) + (match_operand:V4HF 3 "const0_operand" "C") + (match_operand:QI 2 "register_operand" "Yk")) + (match_operand:V4HF 4 "const0_operand" "C")))] + "TARGET_AVX512FP16 && TARGET_AVX512VL" + "vcvt2ph\t{%1, %0%{%2%}%{z%}|%0%{%2%}%{z%}, %1}" + [(set_attr "type" "ssecvt") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + +(define_expand "avx512fp16_vcvtpd2ph_v2df" + [(set (match_operand:V8HF 0 "register_operand" "=v") + (vec_concat:V8HF + (float_truncate:V2HF (match_operand:V2DF 1 "vector_operand" "vm")) + (match_dup 2)))] + "TARGET_AVX512FP16 && TARGET_AVX512VL" + "operands[2] = CONST0_RTX (V6HFmode);") + +(define_insn "*avx512fp16_vcvtpd2ph_v2df" + [(set (match_operand:V8HF 0 "register_operand" "=v") + (vec_concat:V8HF + (float_truncate:V2HF (match_operand:V2DF 1 "vector_operand" "vm")) + (match_operand:V6HF 2 "const0_operand" "C")))] + "TARGET_AVX512FP16 && TARGET_AVX512VL" + "vcvtpd2ph{x}\t{%1, %0|%0, %1}" + [(set_attr "type" "ssecvt") + (set_attr "prefix" "evex") + (set_attr "mode" "TI")]) + +(define_expand "avx512fp16_vcvtpd2ph_v2df_mask" + [(set (match_operand:V8HF 0 "register_operand" "=v") + (vec_concat:V8HF + (vec_merge:V2HF + (float_truncate:V2HF (match_operand:V2DF 1 "vector_operand" "vm")) + (vec_select:V2HF + (match_operand:V8HF 2 "nonimm_or_0_operand" "0C") + (parallel [(const_int 0) (const_int 1)])) + (match_operand:QI 3 "register_operand" "Yk")) + (match_dup 4)))] + "TARGET_AVX512FP16 && TARGET_AVX512VL" + "operands[4] = CONST0_RTX (V6HFmode);") + +(define_insn "*avx512fp16_vcvtpd2ph_v2df_mask" + [(set (match_operand:V8HF 0 "register_operand" "=v") + (vec_concat:V8HF + (vec_merge:V2HF + (float_truncate:V2HF (match_operand:V2DF 1 "vector_operand" "vm")) + (vec_select:V2HF + (match_operand:V8HF 2 "nonimm_or_0_operand" "0C") + (parallel [(const_int 0) (const_int 1)])) + (match_operand:QI 3 "register_operand" "Yk")) + 
(match_operand:V6HF 4 "const0_operand" "C")))] + "TARGET_AVX512FP16 && TARGET_AVX512VL" + "vcvtpd2ph{x}\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}" + [(set_attr "type" "ssecvt") + (set_attr "prefix" "evex") + (set_attr "mode" "TI")]) + +(define_insn "*avx512fp16_vcvtpd2ph_v2df_mask_1" + [(set (match_operand:V8HF 0 "register_operand" "=v") + (vec_concat:V8HF + (vec_merge:V2HF + (float_truncate:V2HF (match_operand:V2DF 1 + "vector_operand" "vm")) + (match_operand:V2HF 3 "const0_operand" "C") + (match_operand:QI 2 "register_operand" "Yk")) + (match_operand:V6HF 4 "const0_operand" "C")))] + "TARGET_AVX512FP16 && TARGET_AVX512VL" + "vcvtpd2ph{x}\t{%1, %0%{%2%}%{z%}|%0%{%2%}%{z%}, %1}" + [(set_attr "type" "ssecvt") + (set_attr "prefix" "evex") + (set_attr "mode" "TI")]) + ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; ;; Parallel single-precision floating point conversion operations diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c index 595a6ac007a..f186f8c40f3 100644 --- a/gcc/testsuite/gcc.target/i386/avx-1.c +++ b/gcc/testsuite/gcc.target/i386/avx-1.c @@ -749,6 +749,10 @@ #define __builtin_ia32_vcvtsi2sh64_round(A, B, C) __builtin_ia32_vcvtsi2sh64_round(A, B, 8) #define __builtin_ia32_vcvtusi2sh32_round(A, B, C) __builtin_ia32_vcvtusi2sh32_round(A, B, 8) #define __builtin_ia32_vcvtusi2sh64_round(A, B, C) __builtin_ia32_vcvtusi2sh64_round(A, B, 8) +#define __builtin_ia32_vcvtph2pd_v8df_mask_round(A, B, C, D) __builtin_ia32_vcvtph2pd_v8df_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvtph2ps_v16sf_mask_round(A, B, C, D) __builtin_ia32_vcvtph2ps_v16sf_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvtpd2ph_v8df_mask_round(A, B, C, D) __builtin_ia32_vcvtpd2ph_v8df_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvtps2ph_v16sf_mask_round(A, B, C, D) __builtin_ia32_vcvtps2ph_v16sf_mask_round(A, B, C, 8) /* avx512fp16vlintrin.h */ #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) 
__builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D) diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c index 0d976fb0de4..0e88174e636 100644 --- a/gcc/testsuite/gcc.target/i386/sse-13.c +++ b/gcc/testsuite/gcc.target/i386/sse-13.c @@ -766,6 +766,10 @@ #define __builtin_ia32_vcvtsi2sh64_round(A, B, C) __builtin_ia32_vcvtsi2sh64_round(A, B, 8) #define __builtin_ia32_vcvtusi2sh32_round(A, B, C) __builtin_ia32_vcvtusi2sh32_round(A, B, 8) #define __builtin_ia32_vcvtusi2sh64_round(A, B, C) __builtin_ia32_vcvtusi2sh64_round(A, B, 8) +#define __builtin_ia32_vcvtph2pd_v8df_mask_round(A, B, C, D) __builtin_ia32_vcvtph2pd_v8df_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvtph2ps_v16sf_mask_round(A, B, C, D) __builtin_ia32_vcvtph2ps_v16sf_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvtpd2ph_v8df_mask_round(A, B, C, D) __builtin_ia32_vcvtpd2ph_v8df_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvtps2ph_v16sf_mask_round(A, B, C, D) __builtin_ia32_vcvtps2ph_v16sf_mask_round(A, B, C, 8) /* avx512fp16vlintrin.h */ #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D) diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c index 403f3af6067..5c3e370d4a7 100644 --- a/gcc/testsuite/gcc.target/i386/sse-14.c +++ b/gcc/testsuite/gcc.target/i386/sse-14.c @@ -687,6 +687,8 @@ test_1 (_mm512_cvt_roundph_epu32, __m512i, __m256h, 8) test_1 (_mm512_cvtt_roundph_epi32, __m512i, __m256h, 8) test_1 (_mm512_cvtt_roundph_epu32, __m512i, __m256h, 8) test_1 (_mm512_cvtt_roundph_epi64, __m512i, __m128h, 8) +test_1 (_mm512_cvt_roundph_pd, __m512d, __m128h, 8) +test_1 (_mm512_cvtx_roundph_ps, __m512, __m256h, 8) test_1 (_mm512_cvtt_roundph_epu64, __m512i, __m128h, 8) test_1 (_mm512_cvt_roundph_epi64, __m512i, __m128h, 8) test_1 (_mm512_cvt_roundph_epu64, __m512i, __m128h, 8) @@ -696,6 +698,8 @@ test_1 (_mm512_cvt_roundepi32_ph, __m256h, __m512i, 8) test_1 (_mm512_cvt_roundepu32_ph, 
__m256h, __m512i, 8) test_1 (_mm512_cvt_roundepi64_ph, __m128h, __m512i, 8) test_1 (_mm512_cvt_roundepu64_ph, __m128h, __m512i, 8) +test_1 (_mm512_cvtx_roundps_ph, __m256h, __m512, 8) +test_1 (_mm512_cvt_roundpd_ph, __m128h, __m512d, 8) test_1 (_mm_cvt_roundsh_i32, int, __m128h, 8) test_1 (_mm_cvt_roundsh_u32, unsigned, __m128h, 8) test_1 (_mm_cvtt_roundsh_i32, int, __m128h, 8) @@ -751,6 +755,8 @@ test_2 (_mm512_maskz_cvt_roundph_epu64, __m512i, __mmask8, __m128h, 8) test_2 (_mm512_maskz_cvtt_roundph_epi32, __m512i, __mmask16, __m256h, 8) test_2 (_mm512_maskz_cvtt_roundph_epu32, __m512i, __mmask16, __m256h, 8) test_2 (_mm512_maskz_cvtt_roundph_epi64, __m512i, __mmask8, __m128h, 8) +test_2 (_mm512_maskz_cvt_roundph_pd, __m512d, __mmask8, __m128h, 8) +test_2 (_mm512_maskz_cvtx_roundph_ps, __m512, __mmask16, __m256h, 8) test_2 (_mm512_maskz_cvtt_roundph_epu64, __m512i, __mmask8, __m128h, 8) test_2 (_mm512_maskz_cvt_roundepi16_ph, __m512h, __mmask32, __m512i, 8) test_2 (_mm512_maskz_cvt_roundepu16_ph, __m512h, __mmask32, __m512i, 8) @@ -758,6 +764,8 @@ test_2 (_mm512_maskz_cvt_roundepi32_ph, __m256h, __mmask16, __m512i, 8) test_2 (_mm512_maskz_cvt_roundepu32_ph, __m256h, __mmask16, __m512i, 8) test_2 (_mm512_maskz_cvt_roundepi64_ph, __m128h, __mmask8, __m512i, 8) test_2 (_mm512_maskz_cvt_roundepu64_ph, __m128h, __mmask8, __m512i, 8) +test_2 (_mm512_maskz_cvtx_roundps_ph, __m256h, __mmask16, __m512, 8) +test_2 (_mm512_maskz_cvt_roundpd_ph, __m128h, __mmask8, __m512d, 8) test_2 (_mm_cvt_roundi32_sh, __m128h, __m128h, int, 8) test_2 (_mm_cvt_roundu32_sh, __m128h, __m128h, unsigned, 8) test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8) @@ -809,6 +817,8 @@ test_3 (_mm512_mask_cvt_roundph_epu64, __m512i, __m512i, __mmask8, __m128h, 8) test_3 (_mm512_mask_cvtt_roundph_epi32, __m512i, __m512i, __mmask16, __m256h, 8) test_3 (_mm512_mask_cvtt_roundph_epu32, __m512i, __m512i, __mmask16, __m256h, 8) test_3 (_mm512_mask_cvtt_roundph_epi64, __m512i, __m512i, 
__mmask8, __m128h, 8) +test_3 (_mm512_mask_cvt_roundph_pd, __m512d, __m512d, __mmask8, __m128h, 8) +test_3 (_mm512_mask_cvtx_roundph_ps, __m512, __m512, __mmask16, __m256h, 8) test_3 (_mm512_mask_cvtt_roundph_epu64, __m512i, __m512i, __mmask8, __m128h, 8) test_3 (_mm512_mask_cvt_roundepi16_ph, __m512h, __m512h, __mmask32, __m512i, 8) test_3 (_mm512_mask_cvt_roundepu16_ph, __m512h, __m512h, __mmask32, __m512i, 8) @@ -816,6 +826,8 @@ test_3 (_mm512_mask_cvt_roundepi32_ph, __m256h, __m256h, __mmask16, __m512i, 8) test_3 (_mm512_mask_cvt_roundepu32_ph, __m256h, __m256h, __mmask16, __m512i, 8) test_3 (_mm512_mask_cvt_roundepi64_ph, __m128h, __m128h, __mmask8, __m512i, 8) test_3 (_mm512_mask_cvt_roundepu64_ph, __m128h, __m128h, __mmask8, __m512i, 8) +test_3 (_mm512_mask_cvtx_roundps_ph, __m256h, __m256h, __mmask16, __m512, 8) +test_3 (_mm512_mask_cvt_roundpd_ph, __m128h, __m128h, __mmask8, __m512d, 8) test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8) test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8) test_3x (_mm512_mask_reduce_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8) diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c index b980ac3cddd..5bf94d56ce3 100644 --- a/gcc/testsuite/gcc.target/i386/sse-22.c +++ b/gcc/testsuite/gcc.target/i386/sse-22.c @@ -794,6 +794,8 @@ test_1 (_mm512_cvt_roundph_epu64, __m512i, __m128h, 8) test_1 (_mm512_cvtt_roundph_epi32, __m512i, __m256h, 8) test_1 (_mm512_cvtt_roundph_epu32, __m512i, __m256h, 8) test_1 (_mm512_cvtt_roundph_epi64, __m512i, __m128h, 8) +test_1 (_mm512_cvt_roundph_pd, __m512d, __m128h, 8) +test_1 (_mm512_cvtx_roundph_ps, __m512, __m256h, 8) test_1 (_mm512_cvtt_roundph_epu64, __m512i, __m128h, 8) test_1 (_mm512_cvt_roundepi16_ph, __m512h, __m512i, 8) test_1 (_mm512_cvt_roundepu16_ph, __m512h, __m512i, 8) @@ -801,6 +803,8 @@ test_1 (_mm512_cvt_roundepi32_ph, __m256h, __m512i, 8) test_1 
(_mm512_cvt_roundepu32_ph, __m256h, __m512i, 8) test_1 (_mm512_cvt_roundepi64_ph, __m128h, __m512i, 8) test_1 (_mm512_cvt_roundepu64_ph, __m128h, __m512i, 8) +test_1 (_mm512_cvtx_roundps_ph, __m256h, __m512, 8) +test_1 (_mm512_cvt_roundpd_ph, __m128h, __m512d, 8) test_1 (_mm_cvt_roundsh_i32, int, __m128h, 8) test_1 (_mm_cvt_roundsh_u32, unsigned, __m128h, 8) test_1 (_mm_cvtt_roundsh_i32, int, __m128h, 8) @@ -855,6 +859,8 @@ test_2 (_mm512_maskz_cvt_roundph_epu64, __m512i, __mmask8, __m128h, 8) test_2 (_mm512_maskz_cvtt_roundph_epi32, __m512i, __mmask16, __m256h, 8) test_2 (_mm512_maskz_cvtt_roundph_epu32, __m512i, __mmask16, __m256h, 8) test_2 (_mm512_maskz_cvtt_roundph_epi64, __m512i, __mmask8, __m128h, 8) +test_2 (_mm512_maskz_cvt_roundph_pd, __m512d, __mmask8, __m128h, 8) +test_2 (_mm512_maskz_cvtx_roundph_ps, __m512, __mmask16, __m256h, 8) test_2 (_mm512_maskz_cvtt_roundph_epu64, __m512i, __mmask8, __m128h, 8) test_2 (_mm512_maskz_cvt_roundepi16_ph, __m512h, __mmask32, __m512i, 8) test_2 (_mm512_maskz_cvt_roundepu16_ph, __m512h, __mmask32, __m512i, 8) @@ -862,6 +868,8 @@ test_2 (_mm512_maskz_cvt_roundepi32_ph, __m256h, __mmask16, __m512i, 8) test_2 (_mm512_maskz_cvt_roundepu32_ph, __m256h, __mmask16, __m512i, 8) test_2 (_mm512_maskz_cvt_roundepi64_ph, __m128h, __mmask8, __m512i, 8) test_2 (_mm512_maskz_cvt_roundepu64_ph, __m128h, __mmask8, __m512i, 8) +test_2 (_mm512_maskz_cvtx_roundps_ph, __m256h, __mmask16, __m512, 8) +test_2 (_mm512_maskz_cvt_roundpd_ph, __m128h, __mmask8, __m512d, 8) test_2 (_mm_cvt_roundi32_sh, __m128h, __m128h, int, 8) test_2 (_mm_cvt_roundu32_sh, __m128h, __m128h, unsigned, 8) test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8) @@ -912,6 +920,8 @@ test_3 (_mm512_mask_cvt_roundph_epu64, __m512i, __m512i, __mmask8, __m128h, 8) test_3 (_mm512_mask_cvtt_roundph_epi32, __m512i, __m512i, __mmask16, __m256h, 8) test_3 (_mm512_mask_cvtt_roundph_epu32, __m512i, __m512i, __mmask16, __m256h, 8) test_3 
(_mm512_mask_cvtt_roundph_epi64, __m512i, __m512i, __mmask8, __m128h, 8) +test_3 (_mm512_mask_cvt_roundph_pd, __m512d, __m512d, __mmask8, __m128h, 8) +test_3 (_mm512_mask_cvtx_roundph_ps, __m512, __m512, __mmask16, __m256h, 8) test_3 (_mm512_mask_cvtt_roundph_epu64, __m512i, __m512i, __mmask8, __m128h, 8) test_3 (_mm512_mask_cvt_roundepi16_ph, __m512h, __m512h, __mmask32, __m512i, 8) test_3 (_mm512_mask_cvt_roundepu16_ph, __m512h, __m512h, __mmask32, __m512i, 8) @@ -919,6 +929,8 @@ test_3 (_mm512_mask_cvt_roundepi32_ph, __m256h, __m256h, __mmask16, __m512i, 8) test_3 (_mm512_mask_cvt_roundepu32_ph, __m256h, __m256h, __mmask16, __m512i, 8) test_3 (_mm512_mask_cvt_roundepi64_ph, __m128h, __m128h, __mmask8, __m512i, 8) test_3 (_mm512_mask_cvt_roundepu64_ph, __m128h, __m128h, __mmask8, __m512i, 8) +test_3 (_mm512_mask_cvtx_roundps_ph, __m256h, __m256h, __mmask16, __m512, 8) +test_3 (_mm512_mask_cvt_roundpd_ph, __m128h, __m128h, __mmask8, __m512d, 8) test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8) test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8) test_3x (_mm512_mask_reduce_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8) diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c index 1bd734a9352..2f27d9a1e87 100644 --- a/gcc/testsuite/gcc.target/i386/sse-23.c +++ b/gcc/testsuite/gcc.target/i386/sse-23.c @@ -767,6 +767,10 @@ #define __builtin_ia32_vcvtsi2sh64_round(A, B, C) __builtin_ia32_vcvtsi2sh64_round(A, B, 8) #define __builtin_ia32_vcvtusi2sh32_round(A, B, C) __builtin_ia32_vcvtusi2sh32_round(A, B, 8) #define __builtin_ia32_vcvtusi2sh64_round(A, B, C) __builtin_ia32_vcvtusi2sh64_round(A, B, 8) +#define __builtin_ia32_vcvtph2pd_v8df_mask_round(A, B, C, D) __builtin_ia32_vcvtph2pd_v8df_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvtph2ps_v16sf_mask_round(A, B, C, D) __builtin_ia32_vcvtph2ps_v16sf_mask_round(A, B, C, 8) +#define 
__builtin_ia32_vcvtpd2ph_v8df_mask_round(A, B, C, D) __builtin_ia32_vcvtpd2ph_v8df_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvtps2ph_v16sf_mask_round(A, B, C, D) __builtin_ia32_vcvtps2ph_v16sf_mask_round(A, B, C, 8) /* avx512fp16vlintrin.h */ #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)

From patchwork Thu Jul 1 06:16:22 2021
X-Patchwork-Submitter: liuhongt
X-Patchwork-Id: 1499375
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 36/62] AVX512FP16: Add testcase for vcvtph2pd/vcvtph2psx/vcvtpd2ph/vcvtps2phx.
Date: Thu, 1 Jul 2021 14:16:22 +0800
Message-Id: <20210701061648.9447-37-hongtao.liu@intel.com>
In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com>
References: <20210701061648.9447-1-hongtao.liu@intel.com>
From: liuhongt
Cc: jakub@redhat.com

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-helper.h (V512): Add DF contents.
	(src3f): New.
	* gcc.target/i386/avx512fp16-vcvtpd2ph-1a.c: New test.
	* gcc.target/i386/avx512fp16-vcvtpd2ph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvtph2pd-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvtph2pd-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvtph2psx-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvtph2psx-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvtps2ph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvtps2ph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvtpd2ph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvtpd2ph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvtph2pd-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvtph2pd-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvtph2psx-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvtph2psx-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvtps2ph-1a.c: Ditto.
--- .../gcc.target/i386/avx512fp16-helper.h | 25 ++++-- .../gcc.target/i386/avx512fp16-vcvtpd2ph-1a.c | 24 ++++++ .../gcc.target/i386/avx512fp16-vcvtpd2ph-1b.c | 82 ++++++++++++++++++ .../gcc.target/i386/avx512fp16-vcvtph2pd-1a.c | 24 ++++++ .../gcc.target/i386/avx512fp16-vcvtph2pd-1b.c | 78 +++++++++++++++++ .../i386/avx512fp16-vcvtph2psx-1a.c | 24 ++++++ .../i386/avx512fp16-vcvtph2psx-1b.c | 81 ++++++++++++++++++ .../gcc.target/i386/avx512fp16-vcvtps2ph-1a.c | 24 ++++++ .../gcc.target/i386/avx512fp16-vcvtps2ph-1b.c | 84 +++++++++++++++++++ .../i386/avx512fp16vl-vcvtpd2ph-1a.c | 28 +++++++ .../i386/avx512fp16vl-vcvtpd2ph-1b.c | 15 ++++ .../i386/avx512fp16vl-vcvtph2pd-1a.c | 27 ++++++ .../i386/avx512fp16vl-vcvtph2pd-1b.c | 15 ++++ .../i386/avx512fp16vl-vcvtph2psx-1a.c | 27 ++++++ .../i386/avx512fp16vl-vcvtph2psx-1b.c | 15 ++++ .../i386/avx512fp16vl-vcvtps2ph-1a.c | 27 ++++++ .../i386/avx512fp16vl-vcvtps2ph-1b.c | 15 ++++ 17 files changed, 609 insertions(+), 6 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtpd2ph-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtpd2ph-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2pd-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2pd-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2psx-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2psx-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtps2ph-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtps2ph-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtpd2ph-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtpd2ph-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2pd-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2pd-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2psx-1a.c create mode 100644 
gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2psx-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtps2ph-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtps2ph-1b.c diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h b/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h index cf1c536d9f7..ce3cfdc3f6b 100644 --- a/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h @@ -26,23 +26,27 @@ typedef union __m512 zmm; __m512h zmmh; __m512i zmmi; + __m512d zmmd; __m256 ymm[2]; __m256h ymmh[2]; __m256i ymmi[2]; + __m256d ymmd[2]; __m128h xmmh[4]; __m128 xmm[4]; __m128i xmmi[4]; + __m128d xmmd[4]; unsigned short u16[32]; unsigned int u32[16]; int i32[16]; long long s64[8]; unsigned long long u64[8]; + double f64[8]; float f32[16]; _Float16 f16[32]; } V512; /* Global variables. */ -V512 src1, src2, src3; +V512 src1, src2, src3, src3f; int n_errs = 0; /* Helper function for packing/unpacking ph operands. 
*/ @@ -167,12 +171,16 @@ init_src() int i; for (i = 0; i < AVX512F_MAX_ELEM; i++) { - v1.f32[i] = i + 1; - v2.f32[i] = i * 0.5f; - v3.f32[i] = i * 1.5f; - v4.f32[i] = i - 0.5f; + v1.f32[i] = i + 1; + v2.f32[i] = i * 0.5f; + v3.f32[i] = i * 1.5f; + v4.f32[i] = i - 0.5f; - src3.u32[i] = (i + 1) * 10; + src3.u32[i] = (i + 1) * 10; + } + + for (i = 0; i < 8; i++) { + src3f.f64[i] = (i + 1) * 7.5; } src1 = pack_twops_2ph(v1, v2); @@ -223,6 +231,7 @@ init_dest(V512 * res, V512 * exp) #undef HF #undef SF #undef SI +#undef DF #undef H_HF #undef NET_MASK #undef MASK_VALUE @@ -235,10 +244,12 @@ init_dest(V512 * res, V512 * exp) #define HF(x) x.ymmh[0] #define H_HF(x) x.xmmh[0] #define SF(x) x.ymm[0] +#define DF(x) x.ymmd[0] #define SI(x) x.ymmi[0] #elif AVX512F_LEN == 128 #undef HF #undef SF +#undef DF #undef SI #undef H_HF #undef NET_MASK @@ -251,6 +262,7 @@ init_dest(V512 * res, V512 * exp) #define ZMASK_VALUE 0xc1 #define HF(x) x.xmmh[0] #define SF(x) x.xmm[0] +#define DF(x) x.xmmd[0] #define SI(x) x.xmmi[0] #define H_HF(x) x.xmmh[0] #else @@ -260,6 +272,7 @@ init_dest(V512 * res, V512 * exp) #define HALF_MASK 0xcccc #define HF(x) x.zmmh #define SF(x) x.zmm +#define DF(x) x.zmmd #define SI(x) x.zmmi #define H_HF(x) x.ymmh[0] #endif diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtpd2ph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtpd2ph-1a.c new file mode 100644 index 00000000000..8f74405873f --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtpd2ph-1a.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vcvtpd2phz\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vcvtpd2phz\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtpd2phz\[ 
\\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtpd2ph\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtpd2ph\[ \\t\]+\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +volatile __m128h res, res1, res2; +volatile __m512d x1, x2, x3; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res = _mm512_cvtpd_ph (x1); + res1 = _mm512_mask_cvtpd_ph (res, m8, x2); + res2 = _mm512_maskz_cvtpd_ph (m8, x3); + res = _mm512_cvt_roundpd_ph (x1, 4); + res1 = _mm512_mask_cvt_roundpd_ph (res, m8, x2, 8); + res2 = _mm512_maskz_cvt_roundpd_ph (m8, x3, 11); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtpd2ph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtpd2ph-1b.c new file mode 100644 index 00000000000..dde364b65ca --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtpd2ph-1b.c @@ -0,0 +1,82 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS (AVX512F_LEN / 64) + +void NOINLINE +EMULATE(cvtpd2_ph) (V512 * dest, V512 op1, int n_el, + __mmask32 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + __mmask16 m1, m2; + + m1 = k & 0xffff; + unpack_ph_2twops(*dest, &v7, &v8); + + for (i = 0; i < n_el; i++) { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + v5.f32[i] = 0; + } + else { + v5.u32[i] = v7.u32[i]; + } + } + else { + v5.f32[i] = op1.f64[i]; + } + } + *dest = pack_twops_2ph(v5, v5); + for (i = n_el; i < 8; i++) + dest->u16[i] = 0; +} + +void +TEST (void) +{ + V512 res; + V512 exp; + + init_src(); + + EMULATE(cvtpd2_ph)(&exp, src3f, N_ELEMS, NET_MASK, 0); + res.xmmh[0] = INTRINSIC (_cvtpd_ph) (DF(src3f)); + CHECK_RESULT (&res, &exp,
8, _cvtpd_ph); + + init_dest(&res, &exp); + EMULATE(cvtpd2_ph)(&exp, src3f, N_ELEMS, 0xcc, 0); + res.xmmh[0] = INTRINSIC (_mask_cvtpd_ph) (res.xmmh[0], 0xcc, + DF(src3f)); + CHECK_RESULT (&res, &exp, 8, _mask_cvtpd_ph); + + EMULATE(cvtpd2_ph)(&exp, src3f, N_ELEMS, 0xf1, 1); + res.xmmh[0] = INTRINSIC (_maskz_cvtpd_ph) (0xf1, DF(src3f)); + CHECK_RESULT (&res, &exp, 8, _maskz_cvtpd_ph); + +#if AVX512F_LEN == 512 + EMULATE(cvtpd2_ph)(&exp, src3f, N_ELEMS, NET_MASK, 0); + res.xmmh[0] = INTRINSIC (_cvt_roundpd_ph) (DF(src3f), _ROUND_NINT); + CHECK_RESULT (&res, &exp, 8, _cvt_roundpd_ph); + + init_dest(&res, &exp); + EMULATE(cvtpd2_ph)(&exp, src3f, N_ELEMS, 0xcc, 0); + res.xmmh[0] = INTRINSIC (_mask_cvt_roundpd_ph) (res.xmmh[0], 0xcc, + DF(src3f), _ROUND_NINT); + CHECK_RESULT (&res, &exp, 8, _mask_cvt_roundpd_ph); + + EMULATE(cvtpd2_ph)(&exp, src3f, N_ELEMS, 0xf1, 1); + res.xmmh[0] = INTRINSIC (_maskz_cvt_roundpd_ph) (0xf1, DF(src3f), _ROUND_NINT); + CHECK_RESULT (&res, &exp, 8, _maskz_cvt_roundpd_ph); +#endif + + if (n_errs != 0) { + abort (); + } +} + + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2pd-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2pd-1a.c new file mode 100644 index 00000000000..b7bb3b7840f --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2pd-1a.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vcvtph2pd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vcvtph2pd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2pd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2pd\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ 
+/* { dg-final { scan-assembler-times "vcvtph2pd\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +volatile __m512d res, res1, res2; +volatile __m128h x1, x2, x3; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res = _mm512_cvtph_pd (x1); + res1 = _mm512_mask_cvtph_pd (res, m8, x2); + res2 = _mm512_maskz_cvtph_pd (m8, x3); + res = _mm512_cvt_roundph_pd (x1, 4); + res1 = _mm512_mask_cvt_roundph_pd (res, m8, x2, 8); + res2 = _mm512_maskz_cvt_roundph_pd (m8, x3, 8); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2pd-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2pd-1b.c new file mode 100644 index 00000000000..c20888ba534 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2pd-1b.c @@ -0,0 +1,78 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS (AVX512F_LEN / 16) + +void NOINLINE +EMULATE(cvtph2_pd) (V512 * dest, V512 op1, + __mmask32 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + __mmask16 m1, m2; + + m1 = k & 0xffff; + unpack_ph_2twops(op1, &v1, &v2); + + for (i = 0; i < 8; i++) { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + v5.u64[i] = 0; + } + else { + v5.u64[i] = dest->u64[i]; + } + } + else { + v5.f64[i] = v1.f32[i]; + } + } + + *dest = v5; +} + +void +TEST (void) +{ + V512 res; + V512 exp; + + init_src(); + + EMULATE(cvtph2_pd)(&exp, src1, NET_MASK, 0); + DF(res) = INTRINSIC (_cvtph_pd) (src1.xmmh[0]); + CHECK_RESULT (&res, &exp, N_ELEMS, _cvtph_pd); + + init_dest(&res, &exp); + EMULATE(cvtph2_pd)(&exp, src1, 0xcc, 0); + DF(res) = INTRINSIC (_mask_cvtph_pd) (DF(res), 0xcc, src1.xmmh[0]); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvtph_pd); + + EMULATE(cvtph2_pd)(&exp, src1, 0xc1, 1); + DF(res) = INTRINSIC (_maskz_cvtph_pd) (0xc1, src1.xmmh[0]); + CHECK_RESULT (&res,
&exp, N_ELEMS, _maskz_cvtph_pd); + +#if AVX512F_LEN == 512 + EMULATE(cvtph2_pd)(&exp, src1, NET_MASK, 0); + DF(res) = INTRINSIC (_cvt_roundph_pd) (src1.xmmh[0], _ROUND_CUR); + CHECK_RESULT (&res, &exp, N_ELEMS, _cvt_roundph_pd); + + init_dest(&res, &exp); + EMULATE(cvtph2_pd)(&exp, src1, 0xcc, 0); + DF(res) = INTRINSIC (_mask_cvt_roundph_pd) (DF(res), 0xcc, src1.xmmh[0], _ROUND_CUR); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvt_roundph_pd); + + EMULATE(cvtph2_pd)(&exp, src1, 0xc1, 1); + DF(res) = INTRINSIC (_maskz_cvt_roundph_pd) (0xc1, src1.xmmh[0], _ROUND_CUR); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvt_roundph_pd); +#endif + + if (n_errs != 0) { + abort (); +} +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2psx-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2psx-1a.c new file mode 100644 index 00000000000..c79549f67c5 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2psx-1a.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vcvtph2psx\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vcvtph2psx\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2psx\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2psx\[ \\t\]+\{sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2psx\[ \\t\]+\{sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +volatile __m512 res, res1, res2; +volatile __m256h x1, x2, x3; +volatile __mmask16 m16; + +void extern +avx512f_test (void) +{ + res = _mm512_cvtxph_ps (x1); + res1 = _mm512_mask_cvtxph_ps (res,
m16, x2); + res2 = _mm512_maskz_cvtxph_ps (m16, x3); + res = _mm512_cvtx_roundph_ps (x1, 4); + res1 = _mm512_mask_cvtx_roundph_ps (res, m16, x2, 8); + res2 = _mm512_maskz_cvtx_roundph_ps (m16, x3, 8); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2psx-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2psx-1b.c new file mode 100644 index 00000000000..a2f20c099b5 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2psx-1b.c @@ -0,0 +1,81 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS (AVX512F_LEN / 32) +#define CHECK_ELEMS (AVX512F_LEN / 16) + +void NOINLINE +EMULATE(cvtxph2_ps) (V512 * dest, V512 op1, int n_el, + __mmask32 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + __mmask16 m1, m2; + + m1 = k & 0xffff; + unpack_ph_2twops(op1, &v1, &v2); + + for (i = 0; i < n_el; i++) { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + v5.u32[i] = 0; + } + else { + v5.u32[i] = dest->u32[i]; + } + } + else { + v5.f32[i] = v1.f32[i]; + } + } + + for (i = n_el; i < 16; i++) + v5.u32[i] = 0; + + *dest = v5; +} + +void +TEST (void) +{ + V512 res; + V512 exp; + + init_src(); + + EMULATE(cvtxph2_ps)(&exp, src1, N_ELEMS, 0xffff, 0); + SF(res) = INTRINSIC (_cvtxph_ps) (H_HF(src1)); + CHECK_RESULT (&res, &exp, CHECK_ELEMS, _cvtxph_ps); + + init_dest(&res, &exp); + EMULATE(cvtxph2_ps)(&exp, src1, N_ELEMS, 0xcc, 0); + SF(res) = INTRINSIC (_mask_cvtxph_ps) (SF(res), 0xcc, H_HF(src1)); + CHECK_RESULT (&res, &exp, CHECK_ELEMS, _mask_cvtxph_ps); + + EMULATE(cvtxph2_ps)(&exp, src1, N_ELEMS, 0xc1, 1); + SF(res) = INTRINSIC (_maskz_cvtxph_ps) (0xc1, H_HF(src1)); + CHECK_RESULT (&res, &exp, CHECK_ELEMS, _maskz_cvtxph_ps); + +#if AVX512F_LEN == 512 + EMULATE(cvtxph2_ps)(&exp, src1, N_ELEMS, 0xffff, 0); + SF(res) = INTRINSIC (_cvtx_roundph_ps) (H_HF(src1), _ROUND_CUR); + CHECK_RESULT (&res, &exp, 
CHECK_ELEMS, _cvtx_roundph_ps); + + init_dest(&res, &exp); + EMULATE(cvtxph2_ps)(&exp, src1, N_ELEMS, 0xcc, 0); + SF(res) = INTRINSIC (_mask_cvtx_roundph_ps) (SF(res), 0xcc, H_HF(src1), _ROUND_CUR); + CHECK_RESULT (&res, &exp, CHECK_ELEMS, _mask_cvtx_roundph_ps); + + EMULATE(cvtxph2_ps)(&exp, src1, N_ELEMS, 0xc1, 1); + SF(res) = INTRINSIC (_maskz_cvtx_roundph_ps) (0xc1, H_HF(src1), _ROUND_CUR); + CHECK_RESULT (&res, &exp, CHECK_ELEMS, _maskz_cvtx_roundph_ps); +#endif + + if (n_errs != 0) + abort (); +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtps2ph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtps2ph-1a.c new file mode 100644 index 00000000000..cb957f86920 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtps2ph-1a.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vcvtps2phx\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vcvtps2phx\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtps2phx\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtps2phx\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtps2phx\[ \\t\]+\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +volatile __m256h res, res1, res2; +volatile __m512 x1, x2, x3; +volatile __mmask16 m16; + +void extern +avx512f_test (void) +{ + res = _mm512_cvtxps_ph (x1); + res1 = _mm512_mask_cvtxps_ph (res, m16, x2); + res2 = _mm512_maskz_cvtxps_ph (m16, x3); + res = _mm512_cvtx_roundps_ph (x1, 4); + res1 = _mm512_mask_cvtx_roundps_ph (res, m16, x2, 8); + res2 =
_mm512_maskz_cvtx_roundps_ph (m16, x3, 11); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtps2ph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtps2ph-1b.c new file mode 100644 index 00000000000..e316e766f0a --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtps2ph-1b.c @@ -0,0 +1,84 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS (AVX512F_LEN / 32) +#define CHECK_ELEMS (AVX512F_LEN_HALF / 16) + +void NOINLINE +EMULATE(cvtxps2_ph) (V512 * dest, V512 op1, int n_el, + __mmask32 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + __mmask16 m1, m2; + + m1 = k & 0xffff; + + unpack_ph_2twops(*dest, &v7, &v8); + + for (i = 0; i < n_el; i++) { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + v5.f32[i] = 0; + } + else { + v5.u32[i] = v7.u32[i]; + } + } + else { + v5.f32[i] = op1.f32[i]; + } + } + *dest = pack_twops_2ph(v5, v5); + for (i = n_el; i < 16; i++) + dest->u16[i] = 0; +} + +void +TEST (void) +{ + V512 res; + V512 exp; + + init_src(); + + EMULATE(cvtxps2_ph)(&exp, src3f, N_ELEMS, NET_MASK, 0); + H_HF(res) = INTRINSIC (_cvtxps_ph) (SF(src3f)); + CHECK_RESULT (&res, &exp, CHECK_ELEMS, _cvtxps_ph); + + init_dest(&res, &exp); + EMULATE(cvtxps2_ph)(&exp, src3f, N_ELEMS, 0xcc, 0); + H_HF(res) = INTRINSIC (_mask_cvtxps_ph) (H_HF(res), 0xcc, + SF(src3f)); + CHECK_RESULT (&res, &exp, CHECK_ELEMS, _mask_cvtxps_ph); + + EMULATE(cvtxps2_ph)(&exp, src3f, N_ELEMS, 0xf1, 1); + H_HF(res) = INTRINSIC (_maskz_cvtxps_ph) (0xf1, SF(src3f)); + CHECK_RESULT (&res, &exp, CHECK_ELEMS, _maskz_cvtxps_ph); + +#if AVX512F_LEN == 512 + EMULATE(cvtxps2_ph)(&exp, src3f, N_ELEMS, NET_MASK, 0); + H_HF(res) = INTRINSIC (_cvtx_roundps_ph) (SF(src3f), _ROUND_NINT); + CHECK_RESULT (&res, &exp, CHECK_ELEMS, _cvtx_roundps_ph); + + init_dest(&res, &exp); + EMULATE(cvtxps2_ph)(&exp, src3f, N_ELEMS, 0xcc, 0); + H_HF(res) = INTRINSIC 
(_mask_cvtx_roundps_ph) (H_HF(res), 0xcc, + SF(src3f), _ROUND_NINT); + CHECK_RESULT (&res, &exp, CHECK_ELEMS, _mask_cvtx_roundps_ph); + + EMULATE(cvtxps2_ph)(&exp, src3f, N_ELEMS, 0xf1, 1); + H_HF(res) = INTRINSIC (_maskz_cvtx_roundps_ph) (0xf1, SF(src3f), _ROUND_NINT); + CHECK_RESULT (&res, &exp, CHECK_ELEMS, _maskz_cvtx_roundps_ph); +#endif + + if (n_errs != 0) { + abort (); + } +} + + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtpd2ph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtpd2ph-1a.c new file mode 100644 index 00000000000..57604a91334 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtpd2ph-1a.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */ +/* { dg-final { scan-assembler-times "vcvtpd2phy\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtpd2phy\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtpd2phy\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtpd2phx\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtpd2phx\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtpd2phx\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +volatile __m128h res3; +volatile __m256d x2; +volatile __m128d x3; +volatile __mmask16 m16; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res3 = _mm256_cvtpd_ph (x2); + res3 = _mm256_mask_cvtpd_ph (res3, m16, x2); + res3 = _mm256_maskz_cvtpd_ph (m16, x2); + + res3 = _mm_cvtpd_ph (x3); + res3 = _mm_mask_cvtpd_ph (res3, m8, x3); + res3 = _mm_maskz_cvtpd_ph (m8, x3); +} diff --git
a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtpd2ph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtpd2ph-1b.c new file mode 100644 index 00000000000..ea4b200803b --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtpd2ph-1b.c @@ -0,0 +1,15 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */ + +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vcvtpd2ph-1b.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vcvtpd2ph-1b.c" + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2pd-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2pd-1a.c new file mode 100644 index 00000000000..80010c02297 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2pd-1a.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */ +/* { dg-final { scan-assembler-times "vcvtph2pd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2pd\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2pd\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2pd\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2pd\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2pd\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +volatile __m256d res1; +volatile __m128d res2; +volatile __m128h x3; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res1 = _mm256_cvtph_pd (x3); + res1 = _mm256_mask_cvtph_pd (res1,
m8, x3); + res1 = _mm256_maskz_cvtph_pd (m8, x3); + + res2 = _mm_cvtph_pd (x3); + res2 = _mm_mask_cvtph_pd (res2, m8, x3); + res2 = _mm_maskz_cvtph_pd (m8, x3); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2pd-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2pd-1b.c new file mode 100644 index 00000000000..a3849056870 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2pd-1b.c @@ -0,0 +1,15 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */ + +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vcvtph2pd-1b.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vcvtph2pd-1b.c" + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2psx-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2psx-1a.c new file mode 100644 index 00000000000..e8c4c8c70d7 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2psx-1a.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */ +/* { dg-final { scan-assembler-times "vcvtph2psx\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2psx\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2psx\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2psx\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2psx\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2psx\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +volatile __m256
res1; +volatile __m128 res2; +volatile __m128h x3; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res1 = _mm256_cvtxph_ps (x3); + res1 = _mm256_mask_cvtxph_ps (res1, m8, x3); + res1 = _mm256_maskz_cvtxph_ps (m8, x3); + + res2 = _mm_cvtxph_ps (x3); + res2 = _mm_mask_cvtxph_ps (res2, m8, x3); + res2 = _mm_maskz_cvtxph_ps (m8, x3); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2psx-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2psx-1b.c new file mode 100644 index 00000000000..ad91de85370 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2psx-1b.c @@ -0,0 +1,15 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */ + +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vcvtph2psx-1b.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vcvtph2psx-1b.c" + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtps2ph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtps2ph-1a.c new file mode 100644 index 00000000000..a89f8c4fe87 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtps2ph-1a.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */ +/* { dg-final { scan-assembler-times "vcvtps2phxy\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtps2phxy\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtps2phxy\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtps2phxx\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtps2phxx\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ 
\\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtps2phxx\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +volatile __m128h res3; +volatile __m256 x2; +volatile __m128 x3; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res3 = _mm256_cvtxps_ph (x2); + res3 = _mm256_mask_cvtxps_ph (res3, m8, x2); + res3 = _mm256_maskz_cvtxps_ph (m8, x2); + + res3 = _mm_cvtxps_ph (x3); + res3 = _mm_mask_cvtxps_ph (res3, m8, x3); + res3 = _mm_maskz_cvtxps_ph (m8, x3); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtps2ph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtps2ph-1b.c new file mode 100644 index 00000000000..a339d0c933e --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtps2ph-1b.c @@ -0,0 +1,15 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */ + +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vcvtps2ph-1b.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vcvtps2ph-1b.c" +

From patchwork Thu Jul 1 06:16:23 2021
X-Patchwork-Submitter: liuhongt
X-Patchwork-Id: 1499380
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 37/62] AVX512FP16: Add vcvtsh2ss/vcvtsh2sd/vcvtss2sh/vcvtsd2sh.
Date: Thu, 1 Jul 2021 14:16:23 +0800
Message-Id: <20210701061648.9447-38-hongtao.liu@intel.com>
In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com>
References: <20210701061648.9447-1-hongtao.liu@intel.com>
From: liuhongt
Cc: jakub@redhat.com

gcc/ChangeLog:

	* config/i386/avx512fp16intrin.h (_mm_cvtsh_ss): New intrinsic.
	(_mm_mask_cvtsh_ss): Likewise.
	(_mm_maskz_cvtsh_ss): Likewise.
	(_mm_cvtsh_sd): Likewise.
	(_mm_mask_cvtsh_sd): Likewise.
	(_mm_maskz_cvtsh_sd): Likewise.
	(_mm_cvt_roundsh_ss): Likewise.
	(_mm_mask_cvt_roundsh_ss): Likewise.
	(_mm_maskz_cvt_roundsh_ss): Likewise.
	(_mm_cvt_roundsh_sd): Likewise.
	(_mm_mask_cvt_roundsh_sd): Likewise.
	(_mm_maskz_cvt_roundsh_sd): Likewise.
	(_mm_cvtss_sh): Likewise.
	(_mm_mask_cvtss_sh): Likewise.
	(_mm_maskz_cvtss_sh): Likewise.
	(_mm_cvtsd_sh): Likewise.
	(_mm_mask_cvtsd_sh): Likewise.
	(_mm_maskz_cvtsd_sh): Likewise.
	(_mm_cvt_roundss_sh): Likewise.
	(_mm_mask_cvt_roundss_sh): Likewise.
	(_mm_maskz_cvt_roundss_sh): Likewise.
	(_mm_cvt_roundsd_sh): Likewise.
	(_mm_mask_cvt_roundsd_sh): Likewise.
	(_mm_maskz_cvt_roundsd_sh): Likewise.
* config/i386/i386-builtin-types.def (V8HF_FTYPE_V2DF_V8HF_V8HF_UQI_INT, V8HF_FTYPE_V4SF_V8HF_V8HF_UQI_INT, V2DF_FTYPE_V8HF_V2DF_V2DF_UQI_INT, V4SF_FTYPE_V8HF_V4SF_V4SF_UQI_INT): Add new builtin types. * config/i386/i386-builtin.def: Add corresponding new builtins. * config/i386/i386-expand.c: Handle new builtin types. * config/i386/sse.md (VF48_128): New mode iterator. (avx512fp16_vcvtsh2): New. (avx512fp16_vcvt2sh): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add test for new builtins. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/sse-14.c: Add test for new intrinsics. * gcc.target/i386/sse-22.c: Ditto. --- gcc/config/i386/avx512fp16intrin.h | 280 +++++++++++++++++++++++++ gcc/config/i386/i386-builtin-types.def | 4 + gcc/config/i386/i386-builtin.def | 4 + gcc/config/i386/i386-expand.c | 4 + gcc/config/i386/sse.md | 36 ++++ gcc/testsuite/gcc.target/i386/avx-1.c | 4 + gcc/testsuite/gcc.target/i386/sse-13.c | 4 + gcc/testsuite/gcc.target/i386/sse-14.c | 12 ++ gcc/testsuite/gcc.target/i386/sse-22.c | 12 ++ gcc/testsuite/gcc.target/i386/sse-23.c | 4 + 10 files changed, 364 insertions(+) diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h index 5a6a0ba83a9..05efbc5777b 100644 --- a/gcc/config/i386/avx512fp16intrin.h +++ b/gcc/config/i386/avx512fp16intrin.h @@ -4604,6 +4604,286 @@ _mm512_maskz_cvt_roundpd_ph (__mmask8 __A, __m512d __B, int __C) #endif /* __OPTIMIZE__ */ +/* Intrinsics vcvtsh2ss, vcvtsh2sd. 
*/ +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtsh_ss (__m128 __A, __m128h __B) +{ + return __builtin_ia32_vcvtsh2ss_mask_round (__B, __A, + _mm_setzero_ps (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvtsh_ss (__m128 __A, __mmask8 __B, __m128 __C, + __m128h __D) +{ + return __builtin_ia32_vcvtsh2ss_mask_round (__D, __C, __A, __B, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvtsh_ss (__mmask8 __A, __m128 __B, + __m128h __C) +{ + return __builtin_ia32_vcvtsh2ss_mask_round (__C, __B, + _mm_setzero_ps (), + __A, _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtsh_sd (__m128d __A, __m128h __B) +{ + return __builtin_ia32_vcvtsh2sd_mask_round (__B, __A, + _mm_setzero_pd (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvtsh_sd (__m128d __A, __mmask8 __B, __m128d __C, + __m128h __D) +{ + return __builtin_ia32_vcvtsh2sd_mask_round (__D, __C, __A, __B, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvtsh_sd (__mmask8 __A, __m128d __B, __m128h __C) +{ + return __builtin_ia32_vcvtsh2sd_mask_round (__C, __B, + _mm_setzero_pd (), + __A, _MM_FROUND_CUR_DIRECTION); +} + +#ifdef __OPTIMIZE__ +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvt_roundsh_ss (__m128 __A, __m128h __B, const int __R) +{ + return __builtin_ia32_vcvtsh2ss_mask_round (__B, __A, + _mm_setzero_ps (), + (__mmask8) -1, __R); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) 
+_mm_mask_cvt_roundsh_ss (__m128 __A, __mmask8 __B, __m128 __C, + __m128h __D, const int __R) +{ + return __builtin_ia32_vcvtsh2ss_mask_round (__D, __C, __A, __B, __R); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvt_roundsh_ss (__mmask8 __A, __m128 __B, + __m128h __C, const int __R) +{ + return __builtin_ia32_vcvtsh2ss_mask_round (__C, __B, + _mm_setzero_ps (), + __A, __R); +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvt_roundsh_sd (__m128d __A, __m128h __B, const int __R) +{ + return __builtin_ia32_vcvtsh2sd_mask_round (__B, __A, + _mm_setzero_pd (), + (__mmask8) -1, __R); +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvt_roundsh_sd (__m128d __A, __mmask8 __B, __m128d __C, + __m128h __D, const int __R) +{ + return __builtin_ia32_vcvtsh2sd_mask_round (__D, __C, __A, __B, __R); +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvt_roundsh_sd (__mmask8 __A, __m128d __B, __m128h __C, const int __R) +{ + return __builtin_ia32_vcvtsh2sd_mask_round (__C, __B, + _mm_setzero_pd (), + __A, __R); +} + +#else +#define _mm_cvt_roundsh_ss(A, B, R) \ + (__builtin_ia32_vcvtsh2ss_mask_round ((B), (A), \ + _mm_setzero_ps (), \ + (__mmask8) -1, (R))) + +#define _mm_mask_cvt_roundsh_ss(A, B, C, D, R) \ + (__builtin_ia32_vcvtsh2ss_mask_round ((D), (C), (A), (B), (R))) + +#define _mm_maskz_cvt_roundsh_ss(A, B, C, R) \ + (__builtin_ia32_vcvtsh2ss_mask_round((C), (B), \ + _mm_setzero_ps (), \ + (A), (R))) + +#define _mm_cvt_roundsh_sd(A, B, R) \ + (__builtin_ia32_vcvtsh2sd_mask_round((B), (A), \ + _mm_setzero_pd (), \ + (__mmask8) -1, (R))) + +#define _mm_mask_cvt_roundsh_sd(A, B, C, D, R) \ + (__builtin_ia32_vcvtsh2sd_mask_round((D), (C), (A), (B), (R))) + +#define _mm_maskz_cvt_roundsh_sd(A, B, C, R) \ + 
(__builtin_ia32_vcvtsh2sd_mask_round((C), (B), \ + _mm_setzero_pd (), \ + (A), (R))) + +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vcvtss2sh, vcvtsd2sh. */ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtss_sh (__m128h __A, __m128 __B) +{ + return __builtin_ia32_vcvtss2sh_mask_round (__B, __A, + _mm_setzero_ph (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvtss_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128 __D) +{ + return __builtin_ia32_vcvtss2sh_mask_round (__D, __C, __A, __B, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvtss_sh (__mmask8 __A, __m128h __B, __m128 __C) +{ + return __builtin_ia32_vcvtss2sh_mask_round (__C, __B, + _mm_setzero_ph (), + __A, _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtsd_sh (__m128h __A, __m128d __B) +{ + return __builtin_ia32_vcvtsd2sh_mask_round (__B, __A, + _mm_setzero_ph (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvtsd_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128d __D) +{ + return __builtin_ia32_vcvtsd2sh_mask_round (__D, __C, __A, __B, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvtsd_sh (__mmask8 __A, __m128h __B, __m128d __C) +{ + return __builtin_ia32_vcvtsd2sh_mask_round (__C, __B, + _mm_setzero_ph (), + __A, _MM_FROUND_CUR_DIRECTION); +} + +#ifdef __OPTIMIZE__ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvt_roundss_sh (__m128h __A, __m128 __B, const int __R) +{ + return __builtin_ia32_vcvtss2sh_mask_round 
(__B, __A, + _mm_setzero_ph (), + (__mmask8) -1, __R); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvt_roundss_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128 __D, + const int __R) +{ + return __builtin_ia32_vcvtss2sh_mask_round (__D, __C, __A, __B, __R); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvt_roundss_sh (__mmask8 __A, __m128h __B, __m128 __C, + const int __R) +{ + return __builtin_ia32_vcvtss2sh_mask_round (__C, __B, + _mm_setzero_ph (), + __A, __R); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvt_roundsd_sh (__m128h __A, __m128d __B, const int __R) +{ + return __builtin_ia32_vcvtsd2sh_mask_round (__B, __A, + _mm_setzero_ph (), + (__mmask8) -1, __R); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvt_roundsd_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128d __D, + const int __R) +{ + return __builtin_ia32_vcvtsd2sh_mask_round (__D, __C, __A, __B, __R); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvt_roundsd_sh (__mmask8 __A, __m128h __B, __m128d __C, + const int __R) +{ + return __builtin_ia32_vcvtsd2sh_mask_round (__C, __B, + _mm_setzero_ph (), + __A, __R); +} + +#else +#define _mm_cvt_roundss_sh(A, B, R) \ + (__builtin_ia32_vcvtss2sh_mask_round ((B), (A), \ + _mm_setzero_ph (), \ + (__mmask8) -1, R)) + +#define _mm_mask_cvt_roundss_sh(A, B, C, D, R) \ + (__builtin_ia32_vcvtss2sh_mask_round ((D), (C), (A), (B), (R))) + +#define _mm_maskz_cvt_roundss_sh(A, B, C, R) \ + (__builtin_ia32_vcvtss2sh_mask_round ((C), (B), \ + _mm_setzero_ph (), \ + A, R)) + +#define _mm_cvt_roundsd_sh(A, B, R) \ + (__builtin_ia32_vcvtsd2sh_mask_round ((B), (A), \ + _mm_setzero_ph (), \ + (__mmask8) -1, R)) + +#define _mm_mask_cvt_roundsd_sh(A, B, C, D, R) \ + 
(__builtin_ia32_vcvtsd2sh_mask_round ((D), (C), (A), (B), (R))) + +#define _mm_maskz_cvt_roundsd_sh(A, B, C, R) \ + (__builtin_ia32_vcvtsd2sh_mask_round ((C), (B), \ + _mm_setzero_ph (), \ + (A), (R))) + +#endif /* __OPTIMIZE__ */ + #ifdef __DISABLE_AVX512FP16__ #undef __DISABLE_AVX512FP16__ #pragma GCC pop_options diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def index 4123e66f7cd..0cdbf1bc0c0 100644 --- a/gcc/config/i386/i386-builtin-types.def +++ b/gcc/config/i386/i386-builtin-types.def @@ -1348,6 +1348,10 @@ DEF_FUNCTION_TYPE (V8DF, V8HF, V8DF, UQI, INT) DEF_FUNCTION_TYPE (V8HF, V8DI, V8HF, UQI, INT) DEF_FUNCTION_TYPE (V8HF, V8DF, V8HF, UQI, INT) DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, V8HF, UQI, INT) +DEF_FUNCTION_TYPE (V8HF, V2DF, V8HF, V8HF, UQI, INT) +DEF_FUNCTION_TYPE (V8HF, V4SF, V8HF, V8HF, UQI, INT) +DEF_FUNCTION_TYPE (V2DF, V8HF, V2DF, V2DF, UQI, INT) +DEF_FUNCTION_TYPE (V4SF, V8HF, V4SF, V4SF, UQI, INT) DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, INT, V8HF, UQI, INT) DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF) DEF_FUNCTION_TYPE (V16HI, V16HF, V16HI, UHI) diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def index 2992bd0383d..4bb48bc21dc 100644 --- a/gcc/config/i386/i386-builtin.def +++ b/gcc/config/i386/i386-builtin.def @@ -3136,6 +3136,10 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_float_extend_phv8df2_ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_float_extend_phv16sf2_mask_round, "__builtin_ia32_vcvtph2ps_v16sf_mask_round", IX86_BUILTIN_VCVTPH2PS_V16SF_MASK_ROUND, UNKNOWN, (int) V16SF_FTYPE_V16HF_V16SF_UHI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtpd2ph_v8df_mask_round, "__builtin_ia32_vcvtpd2ph_v8df_mask_round", IX86_BUILTIN_VCVTPD2PH_V8DF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8DF_V8HF_UQI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtps2ph_v16sf_mask_round, "__builtin_ia32_vcvtps2ph_v16sf_mask_round", 
IX86_BUILTIN_VCVTPS2PH_V16SF_MASK_ROUND, UNKNOWN, (int) V16HF_FTYPE_V16SF_V16HF_UHI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsh2ss_mask_round, "__builtin_ia32_vcvtsh2ss_mask_round", IX86_BUILTIN_VCVTSH2SS_MASK_ROUND, UNKNOWN, (int) V4SF_FTYPE_V8HF_V4SF_V4SF_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsh2sd_mask_round, "__builtin_ia32_vcvtsh2sd_mask_round", IX86_BUILTIN_VCVTSH2SD_MASK_ROUND, UNKNOWN, (int) V2DF_FTYPE_V8HF_V2DF_V2DF_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtss2sh_mask_round, "__builtin_ia32_vcvtss2sh_mask_round", IX86_BUILTIN_VCVTSS2SH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V4SF_V8HF_V8HF_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsd2sh_mask_round, "__builtin_ia32_vcvtsd2sh_mask_round", IX86_BUILTIN_VCVTSD2SH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V2DF_V8HF_V8HF_UQI_INT) BDESC_END (ROUND_ARGS, MULTI_ARG) diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c index a216f6f2bf3..9233c6cd1e8 100644 --- a/gcc/config/i386/i386-expand.c +++ b/gcc/config/i386/i386-expand.c @@ -10565,8 +10565,10 @@ ix86_expand_round_builtin (const struct builtin_description *d, case V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT: case V2DF_FTYPE_V2DF_V2DF_V2DF_UQI_INT: case V4SF_FTYPE_V4SF_V4SF_V4SF_UQI_INT: + case V4SF_FTYPE_V8HF_V4SF_V4SF_UQI_INT: case V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT: case V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT: + case V2DF_FTYPE_V8HF_V2DF_V2DF_UQI_INT: case V2DF_FTYPE_V2DF_V2DF_V2DF_QI_INT: case V2DF_FTYPE_V2DF_V4SF_V2DF_QI_INT: case V2DF_FTYPE_V2DF_V4SF_V2DF_UQI_INT: @@ -10574,6 +10576,8 @@ ix86_expand_round_builtin (const struct builtin_description *d, case V4SF_FTYPE_V4SF_V2DF_V4SF_QI_INT: case V4SF_FTYPE_V4SF_V2DF_V4SF_UQI_INT: case V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT: + case V8HF_FTYPE_V2DF_V8HF_V8HF_UQI_INT: + case V8HF_FTYPE_V4SF_V8HF_V8HF_UQI_INT: nargs = 5; break; case V32HF_FTYPE_V32HF_INT_V32HF_USI_INT: diff --git 
a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 7447d6b75b5..95f4a82c9cd 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -788,6 +788,10 @@ (define_mode_iterator VF48_I1248 [V16SI V16SF V8DI V8DF V32HI V64QI]) (define_mode_iterator VF48H_AVX512VL [V8DF V16SF (V8SF "TARGET_AVX512VL")]) + +(define_mode_iterator VF48_128 + [V2DF V4SF]) + (define_mode_iterator VI48F [V16SI V16SF V8DI V8DF (V8SI "TARGET_AVX512VL") (V8SF "TARGET_AVX512VL") @@ -5869,6 +5873,38 @@ (define_insn "*avx512fp16_vcvtpd2ph_v2df_mask_1" (set_attr "prefix" "evex") (set_attr "mode" "TI")]) +(define_insn "avx512fp16_vcvtsh2" + [(set (match_operand:VF48_128 0 "register_operand" "=v") + (vec_merge:VF48_128 + (vec_duplicate:VF48_128 + (float_extend: + (vec_select:HF + (match_operand:V8HF 1 "" "") + (parallel [(const_int 0)])))) + (match_operand:VF48_128 2 "register_operand" "v") + (const_int 1)))] + "TARGET_AVX512FP16" + "vcvtsh2\t{%1, %2, %0|%0, %2, %1}" + [(set_attr "type" "ssecvt") + (set_attr "prefix" "evex") + (set_attr "mode" "TI")]) + +(define_insn "avx512fp16_vcvt2sh" + [(set (match_operand:V8HF 0 "register_operand" "=v") + (vec_merge:V8HF + (vec_duplicate:V8HF + (float_truncate:HF + (vec_select: + (match_operand:VF48_128 1 "" "") + (parallel [(const_int 0)])))) + (match_operand:V8HF 2 "register_operand" "v") + (const_int 1)))] +"TARGET_AVX512FP16" +"vcvt2sh\t{%1, %2, %0|%0, %2, %1}" + [(set_attr "type" "ssecvt") + (set_attr "prefix" "evex") + (set_attr "mode" "TI")]) + ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; ;; Parallel single-precision floating point conversion operations diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c index f186f8c40f3..deb25098f25 100644 --- a/gcc/testsuite/gcc.target/i386/avx-1.c +++ b/gcc/testsuite/gcc.target/i386/avx-1.c @@ -753,6 +753,10 @@ #define __builtin_ia32_vcvtph2ps_v16sf_mask_round(A, B, C, D) __builtin_ia32_vcvtph2ps_v16sf_mask_round(A, B, C, 8) #define 
__builtin_ia32_vcvtpd2ph_v8df_mask_round(A, B, C, D) __builtin_ia32_vcvtpd2ph_v8df_mask_round(A, B, C, 8) #define __builtin_ia32_vcvtps2ph_v16sf_mask_round(A, B, C, D) __builtin_ia32_vcvtps2ph_v16sf_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvtsh2ss_mask_round(A, B, C, D, E) __builtin_ia32_vcvtsh2ss_mask_round(A, B, C, D, 8) +#define __builtin_ia32_vcvtsh2sd_mask_round(A, B, C, D, E) __builtin_ia32_vcvtsh2sd_mask_round(A, B, C, D, 8) +#define __builtin_ia32_vcvtss2sh_mask_round(A, B, C, D, E) __builtin_ia32_vcvtss2sh_mask_round(A, B, C, D, 8) +#define __builtin_ia32_vcvtsd2sh_mask_round(A, B, C, D, E) __builtin_ia32_vcvtsd2sh_mask_round(A, B, C, D, 8) /* avx512fp16vlintrin.h */ #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D) diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c index 0e88174e636..dbe206bd1bb 100644 --- a/gcc/testsuite/gcc.target/i386/sse-13.c +++ b/gcc/testsuite/gcc.target/i386/sse-13.c @@ -770,6 +770,10 @@ #define __builtin_ia32_vcvtph2ps_v16sf_mask_round(A, B, C, D) __builtin_ia32_vcvtph2ps_v16sf_mask_round(A, B, C, 8) #define __builtin_ia32_vcvtpd2ph_v8df_mask_round(A, B, C, D) __builtin_ia32_vcvtpd2ph_v8df_mask_round(A, B, C, 8) #define __builtin_ia32_vcvtps2ph_v16sf_mask_round(A, B, C, D) __builtin_ia32_vcvtps2ph_v16sf_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvtsh2ss_mask_round(A, B, C, D, E) __builtin_ia32_vcvtsh2ss_mask_round(A, B, C, D, 8) +#define __builtin_ia32_vcvtsh2sd_mask_round(A, B, C, D, E) __builtin_ia32_vcvtsh2sd_mask_round(A, B, C, D, 8) +#define __builtin_ia32_vcvtss2sh_mask_round(A, B, C, D, E) __builtin_ia32_vcvtss2sh_mask_round(A, B, C, D, 8) +#define __builtin_ia32_vcvtsd2sh_mask_round(A, B, C, D, E) __builtin_ia32_vcvtsd2sh_mask_round(A, B, C, D, 8) /* avx512fp16vlintrin.h */ #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D) diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c 
b/gcc/testsuite/gcc.target/i386/sse-14.c index 5c3e370d4a7..e64321d8afa 100644 --- a/gcc/testsuite/gcc.target/i386/sse-14.c +++ b/gcc/testsuite/gcc.target/i386/sse-14.c @@ -766,6 +766,10 @@ test_2 (_mm512_maskz_cvt_roundepi64_ph, __m128h, __mmask8, __m512i, 8) test_2 (_mm512_maskz_cvt_roundepu64_ph, __m128h, __mmask8, __m512i, 8) test_2 (_mm512_maskz_cvtx_roundps_ph, __m256h, __mmask16, __m512, 8) test_2 (_mm512_maskz_cvt_roundpd_ph, __m128h, __mmask8, __m512d, 8) +test_2 (_mm_cvt_roundsh_ss, __m128, __m128, __m128h, 8) +test_2 (_mm_cvt_roundsh_sd, __m128d, __m128d, __m128h, 8) +test_2 (_mm_cvt_roundss_sh, __m128h, __m128h, __m128, 8) +test_2 (_mm_cvt_roundsd_sh, __m128h, __m128h, __m128d, 8) test_2 (_mm_cvt_roundi32_sh, __m128h, __m128h, int, 8) test_2 (_mm_cvt_roundu32_sh, __m128h, __m128h, unsigned, 8) test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8) @@ -828,6 +832,10 @@ test_3 (_mm512_mask_cvt_roundepi64_ph, __m128h, __m128h, __mmask8, __m512i, 8) test_3 (_mm512_mask_cvt_roundepu64_ph, __m128h, __m128h, __mmask8, __m512i, 8) test_3 (_mm512_mask_cvtx_roundps_ph, __m256h, __m256h, __mmask16, __m512, 8) test_3 (_mm512_mask_cvt_roundpd_ph, __m128h, __m128h, __mmask8, __m512d, 8) +test_3 (_mm_maskz_cvt_roundsh_ss, __m128, __mmask8, __m128, __m128h, 8) +test_3 (_mm_maskz_cvt_roundsh_sd, __m128d, __mmask8, __m128d, __m128h, 8) +test_3 (_mm_maskz_cvt_roundss_sh, __m128h, __mmask8, __m128h, __m128, 8) +test_3 (_mm_maskz_cvt_roundsd_sh, __m128h, __mmask8, __m128h, __m128d, 8) test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8) test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8) test_3x (_mm512_mask_reduce_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8) @@ -856,6 +864,10 @@ test_4 (_mm_mask_scalef_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, test_4 (_mm_mask_reduce_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123) test_4 (_mm_mask_roundscale_sh, __m128h, __m128h, 
__mmask8, __m128h, __m128h, 123) test_4 (_mm_mask_getexp_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8) +test_4 (_mm_mask_cvt_roundsh_ss, __m128, __m128, __mmask8, __m128, __m128h, 8) +test_4 (_mm_mask_cvt_roundsh_sd, __m128d, __m128d, __mmask8, __m128d, __m128h, 8) +test_4 (_mm_mask_cvt_roundss_sh, __m128h, __m128h, __mmask8, __m128h, __m128, 8) +test_4 (_mm_mask_cvt_roundsd_sh, __m128h, __m128h, __mmask8, __m128h, __m128d, 8) test_4x (_mm_mask_reduce_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8) test_4x (_mm_mask_roundscale_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8) test_4x (_mm_mask_getmant_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 1, 1) diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c index 5bf94d56ce3..d92898fdd11 100644 --- a/gcc/testsuite/gcc.target/i386/sse-22.c +++ b/gcc/testsuite/gcc.target/i386/sse-22.c @@ -872,6 +872,10 @@ test_2 (_mm512_maskz_cvtx_roundps_ph, __m256h, __mmask16, __m512, 8) test_2 (_mm512_maskz_cvt_roundpd_ph, __m128h, __mmask8, __m512d, 8) test_2 (_mm_cvt_roundi32_sh, __m128h, __m128h, int, 8) test_2 (_mm_cvt_roundu32_sh, __m128h, __m128h, unsigned, 8) +test_2 (_mm_cvt_roundsh_ss, __m128, __m128, __m128h, 8) +test_2 (_mm_cvt_roundsh_sd, __m128d, __m128d, __m128h, 8) +test_2 (_mm_cvt_roundss_sh, __m128h, __m128h, __m128, 8) +test_2 (_mm_cvt_roundsd_sh, __m128h, __m128h, __m128d, 8) test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8) test_2x (_mm_cmp_round_sh_mask, __mmask8, __m128h, __m128h, 1, 8) test_2x (_mm_comi_round_sh, int, __m128h, __m128h, 1, 8) @@ -931,6 +935,10 @@ test_3 (_mm512_mask_cvt_roundepi64_ph, __m128h, __m128h, __mmask8, __m512i, 8) test_3 (_mm512_mask_cvt_roundepu64_ph, __m128h, __m128h, __mmask8, __m512i, 8) test_3 (_mm512_mask_cvtx_roundps_ph, __m256h, __m256h, __mmask16, __m512, 8) test_3 (_mm512_mask_cvt_roundpd_ph, __m128h, __m128h, __mmask8, __m512d, 8) +test_3 (_mm_maskz_cvt_roundsh_ss, 
__m128, __mmask8, __m128, __m128h, 8) +test_3 (_mm_maskz_cvt_roundsh_sd, __m128d, __mmask8, __m128d, __m128h, 8) +test_3 (_mm_maskz_cvt_roundss_sh, __m128h, __mmask8, __m128h, __m128, 8) +test_3 (_mm_maskz_cvt_roundsd_sh, __m128h, __mmask8, __m128h, __m128d, 8) test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8) test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8) test_3x (_mm512_mask_reduce_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8) @@ -958,6 +966,10 @@ test_4 (_mm512_mask_scalef_round_ph, __m512h, __m512h, __mmask32, __m512h, __m51 test_4 (_mm_mask_reduce_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123) test_4 (_mm_mask_roundscale_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123) test_4 (_mm_mask_getexp_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8) +test_4 (_mm_mask_cvt_roundsh_ss, __m128, __m128, __mmask8, __m128, __m128h, 8) +test_4 (_mm_mask_cvt_roundsh_sd, __m128d, __m128d, __mmask8, __m128d, __m128h, 8) +test_4 (_mm_mask_cvt_roundss_sh, __m128h, __m128h, __mmask8, __m128h, __m128, 8) +test_4 (_mm_mask_cvt_roundsd_sh, __m128h, __m128h, __mmask8, __m128h, __m128d, 8) test_4x (_mm_mask_reduce_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8) test_4x (_mm_mask_roundscale_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8) test_4x (_mm_mask_getmant_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 1, 1) diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c index 2f27d9a1e87..2f5027ba36f 100644 --- a/gcc/testsuite/gcc.target/i386/sse-23.c +++ b/gcc/testsuite/gcc.target/i386/sse-23.c @@ -771,6 +771,10 @@ #define __builtin_ia32_vcvtph2ps_v16sf_mask_round(A, B, C, D) __builtin_ia32_vcvtph2ps_v16sf_mask_round(A, B, C, 8) #define __builtin_ia32_vcvtpd2ph_v8df_mask_round(A, B, C, D) __builtin_ia32_vcvtpd2ph_v8df_mask_round(A, B, C, 8) #define __builtin_ia32_vcvtps2ph_v16sf_mask_round(A, B, C, D) 
__builtin_ia32_vcvtps2ph_v16sf_mask_round(A, B, C, 8) +#define __builtin_ia32_vcvtsh2ss_mask_round(A, B, C, D, E) __builtin_ia32_vcvtsh2ss_mask_round(A, B, C, D, 8) +#define __builtin_ia32_vcvtsh2sd_mask_round(A, B, C, D, E) __builtin_ia32_vcvtsh2sd_mask_round(A, B, C, D, 8) +#define __builtin_ia32_vcvtss2sh_mask_round(A, B, C, D, E) __builtin_ia32_vcvtss2sh_mask_round(A, B, C, D, 8) +#define __builtin_ia32_vcvtsd2sh_mask_round(A, B, C, D, E) __builtin_ia32_vcvtsd2sh_mask_round(A, B, C, D, 8) /* avx512fp16vlintrin.h */ #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
From patchwork Thu Jul 1 06:16:24 2021 From: liuhongt To: gcc-patches@gcc.gnu.org Cc: jakub@redhat.com Subject: [PATCH 38/62] AVX512FP16: Add testcase for vcvtsh2sd/vcvtsh2ss/vcvtsd2sh/vcvtss2sh. Date: Thu, 1 Jul 2021 14:16:24 +0800 Message-Id: <20210701061648.9447-39-hongtao.liu@intel.com> In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com> References: <20210701061648.9447-1-hongtao.liu@intel.com>
gcc/testsuite/ChangeLog: * gcc.target/i386/avx512fp16-vcvtsd2sh-1a.c: New test. * gcc.target/i386/avx512fp16-vcvtsd2sh-1b.c: Ditto. * gcc.target/i386/avx512fp16-vcvtsh2sd-1a.c: Ditto. * gcc.target/i386/avx512fp16-vcvtsh2sd-1b.c: Ditto. * gcc.target/i386/avx512fp16-vcvtsh2ss-1a.c: Ditto. * gcc.target/i386/avx512fp16-vcvtsh2ss-1b.c: Ditto. * gcc.target/i386/avx512fp16-vcvtss2sh-1a.c: Ditto. * gcc.target/i386/avx512fp16-vcvtss2sh-1b.c: Ditto. 
--- .../gcc.target/i386/avx512fp16-vcvtsd2sh-1a.c | 25 ++++++++ .../gcc.target/i386/avx512fp16-vcvtsd2sh-1b.c | 60 +++++++++++++++++++ .../gcc.target/i386/avx512fp16-vcvtsh2sd-1a.c | 25 ++++++++ .../gcc.target/i386/avx512fp16-vcvtsh2sd-1b.c | 57 ++++++++++++++++++ .../gcc.target/i386/avx512fp16-vcvtsh2ss-1a.c | 25 ++++++++ .../gcc.target/i386/avx512fp16-vcvtsh2ss-1b.c | 59 ++++++++++++++++++ .../gcc.target/i386/avx512fp16-vcvtss2sh-1a.c | 25 ++++++++ .../gcc.target/i386/avx512fp16-vcvtss2sh-1b.c | 60 +++++++++++++++++++ 8 files changed, 336 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsd2sh-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsd2sh-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2sd-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2sd-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2ss-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2ss-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtss2sh-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtss2sh-1b.c diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsd2sh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsd2sh-1a.c new file mode 100644 index 00000000000..b663ca507fe --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsd2sh-1a.c @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vcvtsd2sh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtsd2sh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtsd2sh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { 
dg-final { scan-assembler-times "vcvtsd2sh\[ \\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtsd2sh\[ \\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtsd2sh\[ \\t\]+\{rz-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +volatile __m128h res, x1; +volatile __m128d x2; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res = _mm_cvtsd_sh (x1, x2); + res = _mm_mask_cvtsd_sh (res, m8, x1, x2); + res = _mm_maskz_cvtsd_sh (m8, x1, x2); + res = _mm_cvt_roundsd_sh (x1, x2, 8); + res = _mm_mask_cvt_roundsd_sh (res, m8, x1, x2, 8); + res = _mm_maskz_cvt_roundsd_sh (m8, x1, x2, 11); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsd2sh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsd2sh-1b.c new file mode 100644 index 00000000000..552362058c5 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsd2sh-1b.c @@ -0,0 +1,60 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS 8 + +void NOINLINE +emulate_vcvtsd2sh(V512 * dest, V512 op1, V512 op2, + __mmask8 k, int zero_mask) +{ + V512 v1, v2, v5, v6, v7, v8; + int i; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(*dest, &v7, &v8); + + if ((k&1) || !k) + v5.f32[0] = (float)op2.f64[0]; + else if (zero_mask) + v5.f32[0] = 0; + else + v5.f32[0] = v7.f32[0]; + + for (i = 1; i < 8; i++) + v5.f32[i] = v1.f32[i]; + + *dest = pack_twops_2ph(v5, v6); +} + +void +test_512 (void) +{ + V512 res; + V512 exp; + + init_src(); + emulate_vcvtsd2sh(&exp, src1, src2, 0x1, 0); + res.xmmh[0] = _mm_cvt_roundsd_sh(src1.xmmh[0], src2.xmmd[0], + _ROUND_NINT); + check_results(&res, &exp,
N_ELEMS, "_mm_cvt_roundsd_sh"); + + init_dest(&res, &exp); + emulate_vcvtsd2sh(&exp, src1, src2, 0x1, 0); + res.xmmh[0] = _mm_mask_cvt_roundsd_sh(res.xmmh[0], 0x1, src1.xmmh[0], + src2.xmmd[0], _ROUND_NINT); + check_results(&res, &exp, N_ELEMS, "mm_mask_cvt_roundsd_sh"); + + emulate_vcvtsd2sh(&exp, src1, src2, 0x2, 1); + res.xmmh[0] = _mm_maskz_cvt_roundsd_sh(0x2, src1.xmmh[0], + src2.xmmd[0], _ROUND_NINT); + check_results(&res, &exp, N_ELEMS, "mm_maskz_cvt_roundsd_sh"); + + if (n_errs != 0) { + abort (); + } +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2sd-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2sd-1a.c new file mode 100644 index 00000000000..59719ed18e6 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2sd-1a.c @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vcvtsh2sd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtsh2sd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtsh2sd\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtsh2sd\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtsh2sd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 2 } } */ + +#include <immintrin.h> + +volatile __m128d res; +volatile __m128d x1; +volatile __m128h x2; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res = _mm_cvtsh_sd (x1, x2); + res = _mm_mask_cvtsh_sd (res, m8, x1, x2); + res = _mm_maskz_cvtsh_sd (m8, x1, x2); + res = _mm_cvt_roundsh_sd (x1, x2, 8); + res =
_mm_mask_cvt_roundsh_sd (res, m8, x1, x2, 8); + res = _mm_maskz_cvt_roundsh_sd (m8, x1, x2, 4); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2sd-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2sd-1b.c new file mode 100644 index 00000000000..e6bdc9580bb --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2sd-1b.c @@ -0,0 +1,57 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS 8 + +void NOINLINE +emulate_vcvtsh2sd(V512 * dest, V512 op1, V512 op2, + __mmask8 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + + unpack_ph_2twops(op2, &v3, &v4); + + if ((k&1) || !k) + v5.f64[0] = v3.f32[0]; + else if (zero_mask) + v5.f64[0] = 0; + else + v5.f64[0] = dest->f64[0]; + + v5.f64[1] = op1.f64[1]; + + *dest = v5; +} + +void +test_512 (void) +{ + V512 res; + V512 exp; + + init_src(); + emulate_vcvtsh2sd(&exp, src1, src2, 0x1, 0); + res.xmmd[0] = _mm_cvt_roundsh_sd(src1.xmmd[0], src2.xmmh[0], + _ROUND_NINT); + check_results(&res, &exp, N_ELEMS, "_mm_cvt_roundsh_sd"); + + init_dest(&res, &exp); + emulate_vcvtsh2sd(&exp, src1, src2, 0x1, 0); + res.xmmd[0] = _mm_mask_cvt_roundsh_sd(res.xmmd[0], 0x1, src1.xmmd[0], + src2.xmmh[0], _ROUND_NINT); + check_results(&res, &exp, N_ELEMS, "mm_mask_cvt_roundsh_sd"); + + emulate_vcvtsh2sd(&exp, src1, src2, 0x2, 1); + res.xmmd[0] = _mm_maskz_cvt_roundsh_sd(0x2, src1.xmmd[0], + src2.xmmh[0], _ROUND_NINT); + check_results(&res, &exp, N_ELEMS, "mm_maskz_cvt_roundsh_sd"); + + if (n_errs != 0) { + abort (); + } +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2ss-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2ss-1a.c new file mode 100644 index 00000000000..e6c369c067f --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2ss-1a.c @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { 
scan-assembler-times "vcvtsh2ss\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtsh2ss\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtsh2ss\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtsh2ss\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtsh2ss\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 2 } } */ + +#include <immintrin.h> + +volatile __m128 res; +volatile __m128 x1; +volatile __m128h x2; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res = _mm_cvtsh_ss (x1, x2); + res = _mm_mask_cvtsh_ss (res, m8, x1, x2); + res = _mm_maskz_cvtsh_ss (m8, x1, x2); + res = _mm_cvt_roundsh_ss (x1, x2, 8); + res = _mm_mask_cvt_roundsh_ss (res, m8, x1, x2, 8); + res = _mm_maskz_cvt_roundsh_ss (m8, x1, x2, 4); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2ss-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2ss-1b.c new file mode 100644 index 00000000000..319598341cd --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2ss-1b.c @@ -0,0 +1,59 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS 8 + +void NOINLINE +emulate_vcvtsh2ss(V512 * dest, V512 op1, V512 op2, + __mmask8 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + + unpack_ph_2twops(op2, &v3, &v4); + if ((k&1) || !k) + v5.f32[0] = v3.f32[0]; + else if (zero_mask) + v5.f32[0] = 0; + else + v5.f32[0] = dest->f32[0]; + + for (i = 1; i < 4; i++) + v5.f32[i] =
op1.f32[i]; + + *dest = v5; +} + +void +test_512 (void) +{ + V512 res; + V512 exp; + + init_src(); + emulate_vcvtsh2ss(&exp, src1, src2, 0x1, 0); + res.xmm[0] = _mm_cvt_roundsh_ss(src1.xmm[0], src2.xmmh[0], + _ROUND_NINT); + check_results(&res, &exp, N_ELEMS, "_mm_cvt_roundsh_ss"); + + init_dest(&res, &exp); + emulate_vcvtsh2ss(&exp, src1, src2, 0x1, 0); + res.xmm[0] = _mm_mask_cvt_roundsh_ss(res.xmm[0], 0x1, src1.xmm[0], + src2.xmmh[0], _ROUND_NINT); + check_results(&res, &exp, N_ELEMS, "mm_mask_cvt_roundsh_ss"); + + emulate_vcvtsh2ss(&exp, src1, src2, 0x2, 1); + res.xmm[0] = _mm_maskz_cvt_roundsh_ss(0x2, src1.xmm[0], + src2.xmmh[0], _ROUND_NINT); + check_results(&res, &exp, N_ELEMS, "mm_maskz_cvt_roundsh_ss"); + + if (n_errs != 0) { + abort (); + } +} + + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtss2sh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtss2sh-1a.c new file mode 100644 index 00000000000..63ad0906555 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtss2sh-1a.c @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vcvtss2sh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtss2sh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtss2sh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtss2sh\[ \\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtss2sh\[ \\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtss2sh\[ 
\\t\]+\{rz-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +volatile __m128h res, x1; +volatile __m128 x2; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res = _mm_cvtss_sh (x1, x2); + res = _mm_mask_cvtss_sh (res, m8, x1, x2); + res = _mm_maskz_cvtss_sh (m8, x1, x2); + res = _mm_cvt_roundss_sh (x1, x2, 8); + res = _mm_mask_cvt_roundss_sh (res, m8, x1, x2, 8); + res = _mm_maskz_cvt_roundss_sh (m8, x1, x2, 11); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtss2sh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtss2sh-1b.c new file mode 100644 index 00000000000..94981bbb79f --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtss2sh-1b.c @@ -0,0 +1,60 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS 8 + +void NOINLINE +emulate_vcvtss2sh(V512 * dest, V512 op1, V512 op2, + __mmask8 k, int zero_mask) +{ + V512 v1, v2, v5, v6, v7, v8; + int i; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(*dest, &v7, &v8); + + if ((k&1) || !k) + v5.f32[0] = op2.f32[0]; + else if (zero_mask) + v5.f32[0] = 0; + else + v5.f32[0] = v7.f32[0]; + + for (i = 1; i < 8; i++) + v5.f32[i] = v1.f32[i]; + + *dest = pack_twops_2ph(v5, v6); +} + +void +test_512 (void) +{ + V512 res; + V512 exp; + + init_src(); + emulate_vcvtss2sh(&exp, src1, src2, 0x1, 0); + res.xmmh[0] = _mm_cvt_roundss_sh(src1.xmmh[0], src2.xmm[0], + _ROUND_NINT); + check_results(&res, &exp, N_ELEMS, "_mm_cvt_roundss_sh"); + + init_dest(&res, &exp); + emulate_vcvtss2sh(&exp, src1, src2, 0x1, 0); + res.xmmh[0] = _mm_mask_cvt_roundss_sh(res.xmmh[0], 0x1, src1.xmmh[0], + src2.xmm[0], _ROUND_NINT); + check_results(&res, &exp, N_ELEMS, "mm_mask_cvt_roundss_sh"); + + emulate_vcvtss2sh(&exp, src1, src2, 0x2, 1); + res.xmmh[0] = _mm_maskz_cvt_roundss_sh(0x2, src1.xmmh[0], +
src2.xmm[0], _ROUND_NINT); + check_results(&res, &exp, N_ELEMS, "mm_maskz_cvt_roundss_sh"); + + if (n_errs != 0) { + abort (); + } +} + From patchwork Thu Jul 1 06:16:25 2021 To: gcc-patches@gcc.gnu.org Subject: [PATCH 39/62] AVX512FP16: Add intrinsics for casting between vector float16 and vector float32/float64/integer. Date: Thu, 1 Jul 2021 14:16:25 +0800 Message-Id: <20210701061648.9447-40-hongtao.liu@intel.com> In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com> References: <20210701061648.9447-1-hongtao.liu@intel.com> From: liuhongt Cc: jakub@redhat.com gcc/ChangeLog: * config/i386/avx512fp16intrin.h (_mm_undefined_ph): New intrinsic. (_mm256_undefined_ph): Likewise. (_mm512_undefined_ph): Likewise. (_mm_cvtsh_h): Likewise. (_mm256_cvtsh_h): Likewise. (_mm512_cvtsh_h): Likewise. (_mm512_castph_ps): Likewise. (_mm512_castph_pd): Likewise. (_mm512_castph_si512): Likewise. (_mm512_castph512_ph128): Likewise. (_mm512_castph512_ph256): Likewise. (_mm512_castph128_ph512): Likewise. (_mm512_castph256_ph512): Likewise. (_mm512_zextph128_ph512): Likewise. (_mm512_zextph256_ph512): Likewise. (_mm512_castps_ph): Likewise. (_mm512_castpd_ph): Likewise. (_mm512_castsi512_ph): Likewise. * config/i386/avx512fp16vlintrin.h (_mm_castph_ps): New intrinsic. (_mm256_castph_ps): Likewise. (_mm_castph_pd): Likewise. (_mm256_castph_pd): Likewise. (_mm_castph_si128): Likewise. (_mm256_castph_si256): Likewise. (_mm_castps_ph): Likewise. (_mm256_castps_ph): Likewise. (_mm_castpd_ph): Likewise. (_mm256_castpd_ph): Likewise. (_mm_castsi128_ph): Likewise. (_mm256_castsi256_ph): Likewise.
(_mm256_castph256_ph128): Likewise. (_mm256_castph128_ph256): Likewise. (_mm256_zextph128_ph256): Likewise. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512fp16-typecast-1.c: New test. * gcc.target/i386/avx512fp16-typecast-2.c: Ditto. * gcc.target/i386/avx512fp16vl-typecast-1.c: Ditto. * gcc.target/i386/avx512fp16vl-typecast-2.c: Ditto. --- gcc/config/i386/avx512fp16intrin.h | 153 ++++++++++++++++++ gcc/config/i386/avx512fp16vlintrin.h | 117 ++++++++++++++ .../gcc.target/i386/avx512fp16-typecast-1.c | 44 +++++ .../gcc.target/i386/avx512fp16-typecast-2.c | 43 +++++ .../gcc.target/i386/avx512fp16vl-typecast-1.c | 55 +++++++ .../gcc.target/i386/avx512fp16vl-typecast-2.c | 37 +++++ 6 files changed, 449 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-typecast-1.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-typecast-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-typecast-1.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-typecast-2.c diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h index 05efbc5777b..ddb227529fa 100644 --- a/gcc/config/i386/avx512fp16intrin.h +++ b/gcc/config/i386/avx512fp16intrin.h @@ -192,6 +192,159 @@ _mm512_setzero_ph (void) return _mm512_set1_ph (0.0f); } +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_undefined_ph (void) +{ + __m128h __Y = __Y; + return __Y; +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_undefined_ph (void) +{ + __m256h __Y = __Y; + return __Y; +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_undefined_ph (void) +{ + __m512h __Y = __Y; + return __Y; +} + +extern __inline _Float16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtsh_h (__m128h __A) +{ + return __A[0]; +} + +extern __inline _Float16 +__attribute__ 
((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtsh_h (__m256h __A) +{ + return __A[0]; +} + +extern __inline _Float16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtsh_h (__m512h __A) +{ + return __A[0]; +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_castph_ps (__m512h __a) +{ + return (__m512) __a; +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_castph_pd (__m512h __a) +{ + return (__m512d) __a; +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_castph_si512 (__m512h __a) +{ + return (__m512i) __a; +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_castph512_ph128 (__m512h __A) +{ + union + { + __m128h a[4]; + __m512h v; + } u = { .v = __A }; + return u.a[0]; +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_castph512_ph256 (__m512h __A) +{ + union + { + __m256h a[2]; + __m512h v; + } u = { .v = __A }; + return u.a[0]; +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_castph128_ph512 (__m128h __A) +{ + union + { + __m128h a[4]; + __m512h v; + } u; + u.a[0] = __A; + return u.v; +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_castph256_ph512 (__m256h __A) +{ + union + { + __m256h a[2]; + __m512h v; + } u; + u.a[0] = __A; + return u.v; +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_zextph128_ph512 (__m128h __A) +{ + return (__m512h) _mm512_insertf32x4 (_mm512_setzero_ps (), + (__m128) __A, 0); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_zextph256_ph512 (__m256h __A) +{ + return (__m512h) 
_mm512_insertf64x4 (_mm512_setzero_pd (), + (__m256d) __A, 0); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_castps_ph (__m512 __a) +{ + return (__m512h) __a; +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_castpd_ph (__m512d __a) +{ + return (__m512h) __a; +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_castsi512_ph (__m512i __a) +{ + return (__m512h) __a; +} + /* Create a vector with element 0 as F and the rest zero. */ extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) diff --git a/gcc/config/i386/avx512fp16vlintrin.h b/gcc/config/i386/avx512fp16vlintrin.h index 0124b830dd5..bcbe4523357 100644 --- a/gcc/config/i386/avx512fp16vlintrin.h +++ b/gcc/config/i386/avx512fp16vlintrin.h @@ -34,6 +34,123 @@ #define __DISABLE_AVX512FP16VL__ #endif /* __AVX512FP16VL__ */ +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_castph_ps (__m128h __a) +{ + return (__m128) __a; +} + +extern __inline __m256 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_castph_ps (__m256h __a) +{ + return (__m256) __a; +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_castph_pd (__m128h __a) +{ + return (__m128d) __a; +} + +extern __inline __m256d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_castph_pd (__m256h __a) +{ + return (__m256d) __a; +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_castph_si128 (__m128h __a) +{ + return (__m128i) __a; +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_castph_si256 (__m256h __a) +{ + return (__m256i) __a; +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, 
__always_inline__, __artificial__)) +_mm_castps_ph (__m128 __a) +{ + return (__m128h) __a; +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_castps_ph (__m256 __a) +{ + return (__m256h) __a; +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_castpd_ph (__m128d __a) +{ + return (__m128h) __a; +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_castpd_ph (__m256d __a) +{ + return (__m256h) __a; +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_castsi128_ph (__m128i __a) +{ + return (__m128h) __a; +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_castsi256_ph (__m256i __a) +{ + return (__m256h) __a; +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_castph256_ph128 (__m256h __A) +{ + union + { + __m128h a[2]; + __m256h v; + } u = { .v = __A }; + return u.a[0]; +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_castph128_ph256 (__m128h __A) +{ + union + { + __m128h a[2]; + __m256h v; + } u; + u.a[0] = __A; + return u.v; +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_zextph128_ph256 (__m128h __A) +{ + return (__m256h) _mm256_insertf128_ps (_mm256_setzero_ps (), + (__m128) __A, 0); +} + /* Intrinsics v[add,sub,mul,div]ph. 
*/ extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-typecast-1.c b/gcc/testsuite/gcc.target/i386/avx512fp16-typecast-1.c new file mode 100644 index 00000000000..cf0cc7443c0 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-typecast-1.c @@ -0,0 +1,44 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +void +test_512 (void) +{ + V512 res; + + res.ymmh[0] = _mm512_castph512_ph256 (src1.zmmh); + check_results (&res, &src1, 16, "_mm512_castph512_ph256"); + + res.xmmh[0] = _mm512_castph512_ph128 (src1.zmmh); + check_results (&res, &src1, 8, "_mm512_castph512_ph128"); + + res.zmmh = _mm512_castph256_ph512 (src1.ymmh[0]); + check_results (&res, &src1, 16, "_mm512_castph256_ph512"); + + res.zmmh = _mm512_castph128_ph512 (src1.xmmh[0]); + check_results (&res, &src1, 8, "_mm512_castph128_ph512"); + + res.zmm = _mm512_castph_ps (src1.zmmh); + check_results (&res, &src1, 32, "_mm512_castph_ps"); + + res.zmmd = _mm512_castph_pd (src1.zmmh); + check_results (&res, &src1, 32, "_mm512_castph_pd"); + + res.zmmi = _mm512_castph_si512 (src1.zmmh); + check_results (&res, &src1, 32, "_mm512_castph_si512"); + + res.zmmh = _mm512_castps_ph (src1.zmm); + check_results (&res, &src1, 32, "_mm512_castps_ph"); + + res.zmmh = _mm512_castpd_ph (src1.zmmd); + check_results (&res, &src1, 32, "_mm512_castpd_ph"); + + res.zmmh = _mm512_castsi512_ph (src1.zmmi); + check_results (&res, &src1, 32, "_mm512_castsi512_ph"); + + if (n_errs != 0) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-typecast-2.c b/gcc/testsuite/gcc.target/i386/avx512fp16-typecast-2.c new file mode 100644 index 00000000000..a29f1dbd76a --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-typecast-2.c @@ -0,0 +1,43 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 
-mavx512dq" } */ + +static void do_test (void); + +#define DO_TEST do_test +#define AVX512FP16 +#include "avx512f-check.h" + +extern int memcmp (const void *, const void *, __SIZE_TYPE__); + +void +do_test (void) +{ + union512i_d zero; + union512h ad; + union256h b,bd; + union128h c; + + int i; + + for (i = 0; i < 16; i++) + { + b.a[i] = 65.43f + i; + zero.a[i] = 0; + } + + for (i = 0; i < 8; i++) + { + c.a[i] = 32.01f + i; + } + + ad.x = _mm512_zextph256_ph512 (b.x); + if (memcmp (ad.a, b.a, 32) + || memcmp (&ad.a[16], &zero.a, 32)) + abort (); + + ad.x = _mm512_zextph128_ph512 (c.x); + if (memcmp (ad.a, c.a, 16) + || memcmp (&ad.a[8], &zero.a, 48)) + abort (); + +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-typecast-1.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-typecast-1.c new file mode 100644 index 00000000000..3621bb52f08 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-typecast-1.c @@ -0,0 +1,55 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */ + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +void +test_512 (void) +{ + V512 res; + res.xmm[0] = _mm_castph_ps (src1.xmmh[0]); + check_results (&res, &src1, 8, "_mm_castph_ps"); + + res.xmmd[0] = _mm_castph_pd (src1.xmmh[0]); + check_results (&res, &src1, 8, "_mm_castph_pd"); + + res.xmmi[0] = _mm_castph_si128 (src1.xmmh[0]); + check_results (&res, &src1, 8, "_mm_castph_si128"); + + res.xmmh[0] = _mm_castps_ph (src1.xmm[0]); + check_results (&res, &src1, 8, "_mm_castps_ph"); + + res.xmmh[0] = _mm_castpd_ph (src1.xmmd[0]); + check_results (&res, &src1, 8, "_mm_castpd_ph"); + + res.xmmh[0] = _mm_castsi128_ph (src1.xmmi[0]); + check_results (&res, &src1, 8, "_mm_castsi128_ph"); + + res.ymm[0] = _mm256_castph_ps (src1.ymmh[0]); + check_results (&res, &src1, 16, "_mm256_castph_ps"); + + res.ymmd[0] = _mm256_castph_pd (src1.ymmh[0]); + check_results (&res, &src1, 16, "_mm256_castph_pd"); + + res.ymmi[0] = _mm256_castph_si256 
(src1.ymmh[0]); + check_results (&res, &src1, 16, "_mm256_castph_si256"); + + res.ymmh[0] = _mm256_castps_ph (src1.ymm[0]); + check_results (&res, &src1, 16, "_mm256_castps_ph"); + + res.ymmh[0] = _mm256_castpd_ph (src1.ymmd[0]); + check_results (&res, &src1, 16, "_mm256_castpd_ph"); + + res.ymmh[0] = _mm256_castsi256_ph (src1.ymmi[0]); + check_results (&res, &src1, 16, "_mm256_castsi256_ph"); + + res.xmmh[0] = _mm256_castph256_ph128 (src1.ymmh[0]); + check_results (&res, &src1, 8, "_mm256_castph256_ph128"); + + res.ymmh[0] = _mm256_castph128_ph256 (src1.xmmh[0]); + check_results (&res, &src1, 8, "_mm256_castph128_ph256"); + + if (n_errs != 0) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-typecast-2.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-typecast-2.c new file mode 100644 index 00000000000..dce387f1fab --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-typecast-2.c @@ -0,0 +1,37 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */ + +static void do_test (void); + +#define DO_TEST do_test +#define AVX512FP16 +#include "avx512f-check.h" + +extern int memcmp (const void *, const void *, __SIZE_TYPE__); + +void +do_test (void) +{ + union512i_d zero; + union512h ad; + union256h b,bd; + union128h c; + + int i; + + for (i = 0; i < 16; i++) + { + b.a[i] = 65.43f + i; + zero.a[i] = 0; + } + + for (i = 0; i < 8; i++) + { + c.a[i] = 32.01f + i; + } + + bd.x = _mm256_zextph128_ph256 (c.x); + if (memcmp (bd.a, c.a, 16) + || memcmp (&bd.a[8], &zero.a, 16)) + abort (); +} From patchwork Thu Jul 1 06:16:26 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: liuhongt X-Patchwork-Id: 1499383 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; 
helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=wN1Apo3D; dkim-atps=neutral Received: from sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4GFpzx5yjNz9sX5 for ; Thu, 1 Jul 2021 17:02:04 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 32E993945049 for ; Thu, 1 Jul 2021 07:02:02 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 32E993945049 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1625122922; bh=nAnaOcSEbyMy370JM+EegMa9ZDjvu4cGDljH1hd5wDc=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=wN1Apo3DGMekkXmY8wAkO94lQJrLNwVgQ+bt0nPxREwORLJ5mQUtVGBUEIfdkKh/g +ZFt/JN63nTWF+W9mNceh6gr8LvzWKpypfFBbDZZwFzOgtSU0UmN8MnSoRQNO6vjxb QyyqzQ+rGcWT1+QmwAXd+JdhXJVFYDbRtuCdwMGk= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by sourceware.org (Postfix) with ESMTPS id 8ABAF384F02A for ; Thu, 1 Jul 2021 06:17:56 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 8ABAF384F02A X-IronPort-AV: E=McAfee;i="6200,9189,10031"; a="206656520" X-IronPort-AV: E=Sophos;i="5.83,313,1616482800"; d="scan'208";a="206656520" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Jun 2021 23:17:55 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.83,313,1616482800"; 
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 40/62] AVX512FP16: Add vfmaddsub[132, 213, 231]ph/vfmsubadd[132, 213, 231]ph.
Date: Thu, 1 Jul 2021 14:16:26 +0800
Message-Id: <20210701061648.9447-41-hongtao.liu@intel.com>
In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com>
References: <20210701061648.9447-1-hongtao.liu@intel.com>
From: liuhongt
Cc: jakub@redhat.com

gcc/ChangeLog:

	* config/i386/avx512fp16intrin.h (_mm512_fmaddsub_ph): New intrinsic.
	(_mm512_mask_fmaddsub_ph): Likewise.
	(_mm512_mask3_fmaddsub_ph): Likewise.
	(_mm512_maskz_fmaddsub_ph): Likewise.
	(_mm512_fmaddsub_round_ph): Likewise.
	(_mm512_mask_fmaddsub_round_ph): Likewise.
	(_mm512_mask3_fmaddsub_round_ph): Likewise.
	(_mm512_maskz_fmaddsub_round_ph): Likewise.
	(_mm512_fmsubadd_ph): Likewise.
	(_mm512_mask_fmsubadd_ph): Likewise.
	(_mm512_mask3_fmsubadd_ph): Likewise.
	(_mm512_maskz_fmsubadd_ph): Likewise.
	(_mm512_fmsubadd_round_ph): Likewise.
	(_mm512_mask_fmsubadd_round_ph): Likewise.
	(_mm512_mask3_fmsubadd_round_ph): Likewise.
	(_mm512_maskz_fmsubadd_round_ph): Likewise.
	* config/i386/avx512fp16vlintrin.h (_mm256_fmaddsub_ph): New intrinsic.
	(_mm256_mask_fmaddsub_ph): Likewise.
	(_mm256_mask3_fmaddsub_ph): Likewise.
	(_mm256_maskz_fmaddsub_ph): Likewise.
	(_mm_fmaddsub_ph): Likewise.
	(_mm_mask_fmaddsub_ph): Likewise.
	(_mm_mask3_fmaddsub_ph): Likewise.
	(_mm_maskz_fmaddsub_ph): Likewise.
	(_mm256_fmsubadd_ph): Likewise.
	(_mm256_mask_fmsubadd_ph): Likewise.
	(_mm256_mask3_fmsubadd_ph): Likewise.
	(_mm256_maskz_fmsubadd_ph): Likewise.
	(_mm_fmsubadd_ph): Likewise.
	(_mm_mask_fmsubadd_ph): Likewise.
	(_mm_mask3_fmsubadd_ph): Likewise.
	(_mm_maskz_fmsubadd_ph): Likewise.
	* config/i386/i386-builtin.def: Add corresponding new builtins.
	* config/i386/sse.md (VFH_SF_AVX512VL): New mode iterator.
	(<avx512>_fmsubadd_<mode>_maskz<round_expand_name>): New expander.
	(<avx512>_fmaddsub_<mode>_maskz<round_expand_name>): Use
	VFH_SF_AVX512VL.
	(<sd_mask_codefor>fma_fmaddsub_<mode><sd_maskz_name><round_name>):
	Ditto.
	(<avx512>_fmaddsub_<mode>_mask<round_name>): Ditto.
	(<avx512>_fmaddsub_<mode>_mask3<round_name>): Ditto.
	(<sd_mask_codefor>fma_fmsubadd_<mode><sd_maskz_name><round_name>):
	Ditto.
	(<avx512>_fmsubadd_<mode>_mask<round_name>): Ditto.
	(<avx512>_fmsubadd_<mode>_mask3<round_name>): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx-1.c: Add test for new builtins.
	* gcc.target/i386/sse-13.c: Ditto.
	* gcc.target/i386/sse-23.c: Ditto.
	* gcc.target/i386/sse-14.c: Add test for new intrinsics.
	* gcc.target/i386/sse-22.c: Ditto.
--- gcc/config/i386/avx512fp16intrin.h | 228 +++++++++++++++++++++++++ gcc/config/i386/avx512fp16vlintrin.h | 182 ++++++++++++++++++++ gcc/config/i386/i386-builtin.def | 18 ++ gcc/config/i386/sse.md | 103 ++++++----- gcc/testsuite/gcc.target/i386/avx-1.c | 6 + gcc/testsuite/gcc.target/i386/sse-13.c | 6 + gcc/testsuite/gcc.target/i386/sse-14.c | 8 + gcc/testsuite/gcc.target/i386/sse-22.c | 8 + gcc/testsuite/gcc.target/i386/sse-23.c | 6 + 9 files changed, 524 insertions(+), 41 deletions(-) diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h index ddb227529fa..4092663b504 100644 --- a/gcc/config/i386/avx512fp16intrin.h +++ b/gcc/config/i386/avx512fp16intrin.h @@ -5037,6 +5037,234 @@ _mm_maskz_cvt_roundsd_sh (__mmask8 __A, __m128h __B, __m128d __C, #endif /* __OPTIMIZE__ */ +/* Intrinsics vfmaddsub[132,213,231]ph. */ +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fmaddsub_ph (__m512h __A, __m512h __B, __m512h __C) +{ + return (__m512h) + __builtin_ia32_vfmaddsubph512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fmaddsub_ph (__m512h __A, __mmask32 __U, __m512h __B, __m512h __C) +{ + return (__m512h) + __builtin_ia32_vfmaddsubph512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask3_fmaddsub_ph (__m512h __A, __m512h __B, __m512h __C, __mmask32 __U) +{ + return (__m512h) + __builtin_ia32_vfmaddsubph512_mask3 ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_fmaddsub_ph (__mmask32 __U, __m512h __A, 
__m512h __B, __m512h __C) +{ + return (__m512h) + __builtin_ia32_vfmaddsubph512_maskz ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, + _MM_FROUND_CUR_DIRECTION); +} + +#ifdef __OPTIMIZE__ +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fmaddsub_round_ph (__m512h __A, __m512h __B, __m512h __C, const int __R) +{ + return (__m512h) + __builtin_ia32_vfmaddsubph512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) -1, __R); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fmaddsub_round_ph (__m512h __A, __mmask32 __U, __m512h __B, + __m512h __C, const int __R) +{ + return (__m512h) + __builtin_ia32_vfmaddsubph512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, __R); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask3_fmaddsub_round_ph (__m512h __A, __m512h __B, __m512h __C, + __mmask32 __U, const int __R) +{ + return (__m512h) + __builtin_ia32_vfmaddsubph512_mask3 ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, __R); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_fmaddsub_round_ph (__mmask32 __U, __m512h __A, __m512h __B, + __m512h __C, const int __R) +{ + return (__m512h) + __builtin_ia32_vfmaddsubph512_maskz ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, __R); +} + +#else +#define _mm512_fmaddsub_round_ph(A, B, C, R) \ + ((__m512h)__builtin_ia32_vfmaddsubph512_mask ((A), (B), (C), -1, (R))) + +#define _mm512_mask_fmaddsub_round_ph(A, U, B, C, R) \ + ((__m512h)__builtin_ia32_vfmaddsubph512_mask ((A), (B), (C), (U), (R))) + +#define _mm512_mask3_fmaddsub_round_ph(A, B, C, U, R) \ + ((__m512h)__builtin_ia32_vfmaddsubph512_mask3 ((A), (B), (C), (U), (R))) + +#define _mm512_maskz_fmaddsub_round_ph(U, A, B, C, R) \ + 
((__m512h)__builtin_ia32_vfmaddsubph512_maskz((A), (B), (C), (U), (R))) + +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vfmsubadd[132,213,231]ph. */ +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) + _mm512_fmsubadd_ph (__m512h __A, __m512h __B, __m512h __C) +{ + return (__m512h) + __builtin_ia32_vfmsubaddph512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fmsubadd_ph (__m512h __A, __mmask32 __U, + __m512h __B, __m512h __C) +{ + return (__m512h) + __builtin_ia32_vfmsubaddph512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask3_fmsubadd_ph (__m512h __A, __m512h __B, + __m512h __C, __mmask32 __U) +{ + return (__m512h) + __builtin_ia32_vfmsubaddph512_mask3 ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_fmsubadd_ph (__mmask32 __U, __m512h __A, + __m512h __B, __m512h __C) +{ + return (__m512h) + __builtin_ia32_vfmsubaddph512_maskz ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, + _MM_FROUND_CUR_DIRECTION); +} + +#ifdef __OPTIMIZE__ +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fmsubadd_round_ph (__m512h __A, __m512h __B, + __m512h __C, const int __R) +{ + return (__m512h) + __builtin_ia32_vfmsubaddph512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) -1, __R); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fmsubadd_round_ph (__m512h __A, __mmask32 __U, __m512h __B, + __m512h __C, 
const int __R) +{ + return (__m512h) + __builtin_ia32_vfmsubaddph512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, __R); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask3_fmsubadd_round_ph (__m512h __A, __m512h __B, __m512h __C, + __mmask32 __U, const int __R) +{ + return (__m512h) + __builtin_ia32_vfmsubaddph512_mask3 ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, __R); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_fmsubadd_round_ph (__mmask32 __U, __m512h __A, __m512h __B, + __m512h __C, const int __R) +{ + return (__m512h) + __builtin_ia32_vfmsubaddph512_maskz ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, __R); +} + +#else +#define _mm512_fmsubadd_round_ph(A, B, C, R) \ + ((__m512h)__builtin_ia32_vfmsubaddph512_mask ((A), (B), (C), -1, (R))) + +#define _mm512_mask_fmsubadd_round_ph(A, U, B, C, R) \ + ((__m512h)__builtin_ia32_vfmsubaddph512_mask ((A), (B), (C), (U), (R))) + +#define _mm512_mask3_fmsubadd_round_ph(A, B, C, U, R) \ + ((__m512h)__builtin_ia32_vfmsubaddph512_mask3 ((A), (B), (C), (U), (R))) + +#define _mm512_maskz_fmsubadd_round_ph(U, A, B, C, R) \ + ((__m512h)__builtin_ia32_vfmsubaddph512_maskz ((A), (B), (C), (U), (R))) + +#endif /* __OPTIMIZE__ */ + #ifdef __DISABLE_AVX512FP16__ #undef __DISABLE_AVX512FP16__ #pragma GCC pop_options diff --git a/gcc/config/i386/avx512fp16vlintrin.h b/gcc/config/i386/avx512fp16vlintrin.h index bcbe4523357..8825fae52aa 100644 --- a/gcc/config/i386/avx512fp16vlintrin.h +++ b/gcc/config/i386/avx512fp16vlintrin.h @@ -2269,6 +2269,188 @@ _mm256_maskz_cvtpd_ph (__mmask8 __A, __m256d __B) __A); } +/* Intrinsics vfmaddsub[132,213,231]ph. 
*/ +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_fmaddsub_ph (__m256h __A, __m256h __B, __m256h __C) +{ + return (__m256h)__builtin_ia32_vfmaddsubph256_mask ((__v16hf)__A, + (__v16hf)__B, + (__v16hf)__C, + (__mmask16)-1); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_fmaddsub_ph (__m256h __A, __mmask16 __U, __m256h __B, + __m256h __C) +{ + return (__m256h) __builtin_ia32_vfmaddsubph256_mask ((__v16hf) __A, + (__v16hf) __B, + (__v16hf) __C, + (__mmask16) __U); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask3_fmaddsub_ph (__m256h __A, __m256h __B, __m256h __C, + __mmask16 __U) +{ + return (__m256h) __builtin_ia32_vfmaddsubph256_mask3 ((__v16hf) __A, + (__v16hf) __B, + (__v16hf) __C, + (__mmask16) + __U); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_fmaddsub_ph (__mmask16 __U, __m256h __A, __m256h __B, + __m256h __C) +{ + return (__m256h) __builtin_ia32_vfmaddsubph256_maskz ((__v16hf) __A, + (__v16hf) __B, + (__v16hf) __C, + (__mmask16) + __U); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fmaddsub_ph (__m128h __A, __m128h __B, __m128h __C) +{ + return (__m128h)__builtin_ia32_vfmaddsubph128_mask ((__v8hf)__A, + (__v8hf)__B, + (__v8hf)__C, + (__mmask8)-1); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_fmaddsub_ph (__m128h __A, __mmask8 __U, __m128h __B, + __m128h __C) +{ + return (__m128h) __builtin_ia32_vfmaddsubph128_mask ((__v8hf) __A, + (__v8hf) __B, + (__v8hf) __C, + (__mmask8) __U); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask3_fmaddsub_ph (__m128h __A, __m128h __B, __m128h __C, + __mmask8 __U) +{ + return (__m128h) 
__builtin_ia32_vfmaddsubph128_mask3 ((__v8hf) __A, + (__v8hf) __B, + (__v8hf) __C, + (__mmask8) + __U); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_fmaddsub_ph (__mmask8 __U, __m128h __A, __m128h __B, + __m128h __C) +{ + return (__m128h) __builtin_ia32_vfmaddsubph128_maskz ((__v8hf) __A, + (__v8hf) __B, + (__v8hf) __C, + (__mmask8) + __U); +} + +/* Intrinsics vfmsubadd[132,213,231]ph. */ +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_fmsubadd_ph (__m256h __A, __m256h __B, __m256h __C) +{ + return (__m256h) __builtin_ia32_vfmsubaddph256_mask ((__v16hf) __A, + (__v16hf) __B, + (__v16hf) __C, + (__mmask16) -1); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_fmsubadd_ph (__m256h __A, __mmask16 __U, __m256h __B, + __m256h __C) +{ + return (__m256h) __builtin_ia32_vfmsubaddph256_mask ((__v16hf) __A, + (__v16hf) __B, + (__v16hf) __C, + (__mmask16) __U); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask3_fmsubadd_ph (__m256h __A, __m256h __B, __m256h __C, + __mmask16 __U) +{ + return (__m256h) __builtin_ia32_vfmsubaddph256_mask3 ((__v16hf) __A, + (__v16hf) __B, + (__v16hf) __C, + (__mmask16) + __U); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_fmsubadd_ph (__mmask16 __U, __m256h __A, __m256h __B, + __m256h __C) +{ + return (__m256h) __builtin_ia32_vfmsubaddph256_maskz ((__v16hf) __A, + (__v16hf) __B, + (__v16hf) __C, + (__mmask16) + __U); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fmsubadd_ph (__m128h __A, __m128h __B, __m128h __C) +{ + return (__m128h) __builtin_ia32_vfmsubaddph128_mask ((__v8hf) __A, + (__v8hf) __B, + (__v8hf) __C, + (__mmask8) -1); +} + +extern __inline __m128h +__attribute__ 
((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_fmsubadd_ph (__m128h __A, __mmask8 __U, __m128h __B, + __m128h __C) +{ + return (__m128h) __builtin_ia32_vfmsubaddph128_mask ((__v8hf) __A, + (__v8hf) __B, + (__v8hf) __C, + (__mmask8) __U); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask3_fmsubadd_ph (__m128h __A, __m128h __B, __m128h __C, + __mmask8 __U) +{ + return (__m128h) __builtin_ia32_vfmsubaddph128_mask3 ((__v8hf) __A, + (__v8hf) __B, + (__v8hf) __C, + (__mmask8) + __U); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_fmsubadd_ph (__mmask8 __U, __m128h __A, __m128h __B, + __m128h __C) +{ + return (__m128h) __builtin_ia32_vfmsubaddph128_maskz ((__v8hf) __A, + (__v8hf) __B, + (__v8hf) __C, + (__mmask8) + __U); +} + #ifdef __DISABLE_AVX512FP16VL__ #undef __DISABLE_AVX512FP16VL__ #pragma GCC pop_options diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def index 4bb48bc21dc..42bba719ec3 100644 --- a/gcc/config/i386/i386-builtin.def +++ b/gcc/config/i386/i386-builtin.def @@ -2875,6 +2875,18 @@ BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp1 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtps2ph_v8sf_mask, "__builtin_ia32_vcvtps2ph_v8sf_mask", IX86_BUILTIN_VCVTPS2PH_V8SF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8SF_V8HF_UQI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtpd2ph_v2df_mask, "__builtin_ia32_vcvtpd2ph_v2df_mask", IX86_BUILTIN_VCVTPD2PH_V2DF_MASK, UNKNOWN, (int) V8HF_FTYPE_V2DF_V8HF_UQI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtpd2ph_v4df_mask, "__builtin_ia32_vcvtpd2ph_v4df_mask", IX86_BUILTIN_VCVTPD2PH_V4DF_MASK, UNKNOWN, (int) V8HF_FTYPE_V4DF_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, 
CODE_FOR_avx512vl_fmaddsub_v16hf_mask, "__builtin_ia32_vfmaddsubph256_mask", IX86_BUILTIN_VFMADDSUBPH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fmaddsub_v16hf_mask3, "__builtin_ia32_vfmaddsubph256_mask3", IX86_BUILTIN_VFMADDSUBPH256_MASK3, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fmaddsub_v16hf_maskz, "__builtin_ia32_vfmaddsubph256_maskz", IX86_BUILTIN_VFMADDSUBPH256_MASKZ, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmaddsub_v8hf_mask, "__builtin_ia32_vfmaddsubph128_mask", IX86_BUILTIN_VFMADDSUBPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmaddsub_v8hf_mask3, "__builtin_ia32_vfmaddsubph128_mask3", IX86_BUILTIN_VFMADDSUBPH128_MASK3, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmaddsub_v8hf_maskz, "__builtin_ia32_vfmaddsubph128_maskz", IX86_BUILTIN_VFMADDSUBPH128_MASKZ, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fmsubadd_v16hf_mask, "__builtin_ia32_vfmsubaddph256_mask", IX86_BUILTIN_VFMSUBADDPH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fmsubadd_v16hf_mask3, "__builtin_ia32_vfmsubaddph256_mask3", IX86_BUILTIN_VFMSUBADDPH256_MASK3, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fmsubadd_v16hf_maskz, "__builtin_ia32_vfmsubaddph256_maskz", IX86_BUILTIN_VFMSUBADDPH256_MASKZ, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI) +BDESC (OPTION_MASK_ISA_AVX512VL, 
OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmsubadd_v8hf_mask, "__builtin_ia32_vfmsubaddph128_mask", IX86_BUILTIN_VFMSUBADDPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmsubadd_v8hf_mask3, "__builtin_ia32_vfmsubaddph128_mask3", IX86_BUILTIN_VFMSUBADDPH128_MASK3, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmsubadd_v8hf_maskz, "__builtin_ia32_vfmsubaddph128_maskz", IX86_BUILTIN_VFMSUBADDPH128_MASKZ, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) /* Builtins with rounding support. */ BDESC_END (ARGS, ROUND_ARGS) @@ -3140,6 +3152,12 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsh2ss_mask_round, BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsh2sd_mask_round, "__builtin_ia32_vcvtsh2sd_mask_round", IX86_BUILTIN_VCVTSH2SD_MASK_ROUND, UNKNOWN, (int) V2DF_FTYPE_V8HF_V2DF_V2DF_UQI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtss2sh_mask_round, "__builtin_ia32_vcvtss2sh_mask_round", IX86_BUILTIN_VCVTSS2SH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V4SF_V8HF_V8HF_UQI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsd2sh_mask_round, "__builtin_ia32_vcvtsd2sh_mask_round", IX86_BUILTIN_VCVTSD2SH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V2DF_V8HF_V8HF_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmaddsub_v32hf_mask_round, "__builtin_ia32_vfmaddsubph512_mask", IX86_BUILTIN_VFMADDSUBPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmaddsub_v32hf_mask3_round, "__builtin_ia32_vfmaddsubph512_mask3", IX86_BUILTIN_VFMADDSUBPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmaddsub_v32hf_maskz_round, "__builtin_ia32_vfmaddsubph512_maskz", 
IX86_BUILTIN_VFMADDSUBPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmsubadd_v32hf_mask_round, "__builtin_ia32_vfmsubaddph512_mask", IX86_BUILTIN_VFMSUBADDPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmsubadd_v32hf_mask3_round, "__builtin_ia32_vfmsubaddph512_mask3", IX86_BUILTIN_VFMSUBADDPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmsubadd_v32hf_maskz_round, "__builtin_ia32_vfmsubaddph512_maskz", IX86_BUILTIN_VFMSUBADDPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) BDESC_END (ROUND_ARGS, MULTI_ARG) diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 95f4a82c9cd..847684e232e 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -4542,6 +4542,13 @@ (define_mode_iterator VF_SF_AVX512VL [SF V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL") DF V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")]) +(define_mode_iterator VFH_SF_AVX512VL + [(V32HF "TARGET_AVX512FP16") + (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL") + (V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL") + SF V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL") + DF V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")]) + (define_insn "fma_fmadd_" [(set (match_operand:VF_SF_AVX512VL 0 "register_operand" "=v,v,v") (fma:VF_SF_AVX512VL @@ -4848,10 +4855,10 @@ (define_expand "fmaddsub_" "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F") (define_expand "_fmaddsub__maskz" - [(match_operand:VF_AVX512VL 0 "register_operand") - (match_operand:VF_AVX512VL 1 "") - (match_operand:VF_AVX512VL 2 "") - (match_operand:VF_AVX512VL 3 "") + [(match_operand:VFH_AVX512VL 0 "register_operand") + (match_operand:VFH_AVX512VL 1 "") + (match_operand:VFH_AVX512VL 2 "") + (match_operand:VFH_AVX512VL 3 "") (match_operand: 4 "register_operand")] 
"TARGET_AVX512F" { @@ -4861,6 +4868,20 @@ (define_expand "_fmaddsub__maskz" DONE; }) +(define_expand "_fmsubadd__maskz" + [(match_operand:VFH_AVX512VL 0 "register_operand") + (match_operand:VFH_AVX512VL 1 "") + (match_operand:VFH_AVX512VL 2 "") + (match_operand:VFH_AVX512VL 3 "") + (match_operand: 4 "register_operand")] + "TARGET_AVX512F" +{ + emit_insn (gen_fma_fmsubadd__maskz_1 ( + operands[0], operands[1], operands[2], operands[3], + CONST0_RTX (mode), operands[4])); + DONE; +}) + (define_insn "*fma_fmaddsub_" [(set (match_operand:VF_128_256 0 "register_operand" "=v,v,v,x,x") (unspec:VF_128_256 @@ -4880,11 +4901,11 @@ (define_insn "*fma_fmaddsub_" (set_attr "mode" "")]) (define_insn "fma_fmaddsub_" - [(set (match_operand:VF_SF_AVX512VL 0 "register_operand" "=v,v,v") - (unspec:VF_SF_AVX512VL - [(match_operand:VF_SF_AVX512VL 1 "" "%0,0,v") - (match_operand:VF_SF_AVX512VL 2 "" ",v,") - (match_operand:VF_SF_AVX512VL 3 "" "v,,0")] + [(set (match_operand:VFH_SF_AVX512VL 0 "register_operand" "=v,v,v") + (unspec:VFH_SF_AVX512VL + [(match_operand:VFH_SF_AVX512VL 1 "" "%0,0,v") + (match_operand:VFH_SF_AVX512VL 2 "" ",v,") + (match_operand:VFH_SF_AVX512VL 3 "" "v,,0")] UNSPEC_FMADDSUB))] "TARGET_AVX512F && && " "@ @@ -4895,12 +4916,12 @@ (define_insn "fma_fmaddsub_" (set_attr "mode" "")]) (define_insn "_fmaddsub__mask" - [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v,v") - (vec_merge:VF_AVX512VL - (unspec:VF_AVX512VL - [(match_operand:VF_AVX512VL 1 "register_operand" "0,0") - (match_operand:VF_AVX512VL 2 "" ",v") - (match_operand:VF_AVX512VL 3 "" "v,")] + [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v,v") + (vec_merge:VFH_AVX512VL + (unspec:VFH_AVX512VL + [(match_operand:VFH_AVX512VL 1 "register_operand" "0,0") + (match_operand:VFH_AVX512VL 2 "" ",v") + (match_operand:VFH_AVX512VL 3 "" "v,")] UNSPEC_FMADDSUB) (match_dup 1) (match_operand: 4 "register_operand" "Yk,Yk")))] @@ -4912,12 +4933,12 @@ (define_insn "_fmaddsub__mask" (set_attr "mode" "")]) 
(define_insn "_fmaddsub__mask3" - [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v") - (vec_merge:VF_AVX512VL - (unspec:VF_AVX512VL - [(match_operand:VF_AVX512VL 1 "register_operand" "v") - (match_operand:VF_AVX512VL 2 "" "") - (match_operand:VF_AVX512VL 3 "register_operand" "0")] + [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v") + (vec_merge:VFH_AVX512VL + (unspec:VFH_AVX512VL + [(match_operand:VFH_AVX512VL 1 "register_operand" "v") + (match_operand:VFH_AVX512VL 2 "" "") + (match_operand:VFH_AVX512VL 3 "register_operand" "0")] UNSPEC_FMADDSUB) (match_dup 3) (match_operand: 4 "register_operand" "Yk")))] @@ -4946,12 +4967,12 @@ (define_insn "*fma_fmsubadd_" (set_attr "mode" "")]) (define_insn "fma_fmsubadd_" - [(set (match_operand:VF_SF_AVX512VL 0 "register_operand" "=v,v,v") - (unspec:VF_SF_AVX512VL - [(match_operand:VF_SF_AVX512VL 1 "" "%0,0,v") - (match_operand:VF_SF_AVX512VL 2 "" ",v,") - (neg:VF_SF_AVX512VL - (match_operand:VF_SF_AVX512VL 3 "" "v,,0"))] + [(set (match_operand:VFH_SF_AVX512VL 0 "register_operand" "=v,v,v") + (unspec:VFH_SF_AVX512VL + [(match_operand:VFH_SF_AVX512VL 1 "" "%0,0,v") + (match_operand:VFH_SF_AVX512VL 2 "" ",v,") + (neg:VFH_SF_AVX512VL + (match_operand:VFH_SF_AVX512VL 3 "" "v,,0"))] UNSPEC_FMADDSUB))] "TARGET_AVX512F && && " "@ @@ -4962,13 +4983,13 @@ (define_insn "fma_fmsubadd_" (set_attr "mode" "")]) (define_insn "_fmsubadd__mask" - [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v,v") - (vec_merge:VF_AVX512VL - (unspec:VF_AVX512VL - [(match_operand:VF_AVX512VL 1 "register_operand" "0,0") - (match_operand:VF_AVX512VL 2 "" ",v") - (neg:VF_AVX512VL - (match_operand:VF_AVX512VL 3 "" "v,"))] + [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v,v") + (vec_merge:VFH_AVX512VL + (unspec:VFH_AVX512VL + [(match_operand:VFH_AVX512VL 1 "register_operand" "0,0") + (match_operand:VFH_AVX512VL 2 "" ",v") + (neg:VFH_AVX512VL + (match_operand:VFH_AVX512VL 3 "" "v,"))] UNSPEC_FMADDSUB) (match_dup 1) 
(match_operand: 4 "register_operand" "Yk,Yk")))] @@ -4980,13 +5001,13 @@ (define_insn "_fmsubadd__mask" (set_attr "mode" "")]) (define_insn "_fmsubadd__mask3" - [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v") - (vec_merge:VF_AVX512VL - (unspec:VF_AVX512VL - [(match_operand:VF_AVX512VL 1 "register_operand" "v") - (match_operand:VF_AVX512VL 2 "" "") - (neg:VF_AVX512VL - (match_operand:VF_AVX512VL 3 "register_operand" "0"))] + [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v") + (vec_merge:VFH_AVX512VL + (unspec:VFH_AVX512VL + [(match_operand:VFH_AVX512VL 1 "register_operand" "v") + (match_operand:VFH_AVX512VL 2 "" "") + (neg:VFH_AVX512VL + (match_operand:VFH_AVX512VL 3 "register_operand" "0"))] UNSPEC_FMADDSUB) (match_dup 3) (match_operand: 4 "register_operand" "Yk")))] diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c index deb25098f25..51a0cf2fe87 100644 --- a/gcc/testsuite/gcc.target/i386/avx-1.c +++ b/gcc/testsuite/gcc.target/i386/avx-1.c @@ -757,6 +757,12 @@ #define __builtin_ia32_vcvtsh2sd_mask_round(A, B, C, D, E) __builtin_ia32_vcvtsh2sd_mask_round(A, B, C, D, 8) #define __builtin_ia32_vcvtss2sh_mask_round(A, B, C, D, E) __builtin_ia32_vcvtss2sh_mask_round(A, B, C, D, 8) #define __builtin_ia32_vcvtsd2sh_mask_round(A, B, C, D, E) __builtin_ia32_vcvtsd2sh_mask_round(A, B, C, D, 8) +#define __builtin_ia32_vfmaddsubph512_mask(A, B, C, D, E) __builtin_ia32_vfmaddsubph512_mask(A, B, C, D, 8) +#define __builtin_ia32_vfmaddsubph512_mask3(A, B, C, D, E) __builtin_ia32_vfmaddsubph512_mask3(A, B, C, D, 8) +#define __builtin_ia32_vfmaddsubph512_maskz(A, B, C, D, E) __builtin_ia32_vfmaddsubph512_maskz(A, B, C, D, 8) +#define __builtin_ia32_vfmsubaddph512_mask(A, B, C, D, E) __builtin_ia32_vfmsubaddph512_mask(A, B, C, D, 8) +#define __builtin_ia32_vfmsubaddph512_mask3(A, B, C, D, E) __builtin_ia32_vfmsubaddph512_mask3(A, B, C, D, 8) +#define __builtin_ia32_vfmsubaddph512_maskz(A, B, C, D, E) 
__builtin_ia32_vfmsubaddph512_maskz(A, B, C, D, 8) /* avx512fp16vlintrin.h */ #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D) diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c index dbe206bd1bb..a53f4653908 100644 --- a/gcc/testsuite/gcc.target/i386/sse-13.c +++ b/gcc/testsuite/gcc.target/i386/sse-13.c @@ -774,6 +774,12 @@ #define __builtin_ia32_vcvtsh2sd_mask_round(A, B, C, D, E) __builtin_ia32_vcvtsh2sd_mask_round(A, B, C, D, 8) #define __builtin_ia32_vcvtss2sh_mask_round(A, B, C, D, E) __builtin_ia32_vcvtss2sh_mask_round(A, B, C, D, 8) #define __builtin_ia32_vcvtsd2sh_mask_round(A, B, C, D, E) __builtin_ia32_vcvtsd2sh_mask_round(A, B, C, D, 8) +#define __builtin_ia32_vfmaddsubph512_mask(A, B, C, D, E) __builtin_ia32_vfmaddsubph512_mask(A, B, C, D, 8) +#define __builtin_ia32_vfmaddsubph512_mask3(A, B, C, D, E) __builtin_ia32_vfmaddsubph512_mask3(A, B, C, D, 8) +#define __builtin_ia32_vfmaddsubph512_maskz(A, B, C, D, E) __builtin_ia32_vfmaddsubph512_maskz(A, B, C, D, 8) +#define __builtin_ia32_vfmsubaddph512_mask(A, B, C, D, E) __builtin_ia32_vfmsubaddph512_mask(A, B, C, D, 8) +#define __builtin_ia32_vfmsubaddph512_mask3(A, B, C, D, E) __builtin_ia32_vfmsubaddph512_mask3(A, B, C, D, 8) +#define __builtin_ia32_vfmsubaddph512_maskz(A, B, C, D, E) __builtin_ia32_vfmsubaddph512_maskz(A, B, C, D, 8) /* avx512fp16vlintrin.h */ #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D) diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c index e64321d8afa..48895e0dd0d 100644 --- a/gcc/testsuite/gcc.target/i386/sse-14.c +++ b/gcc/testsuite/gcc.target/i386/sse-14.c @@ -836,6 +836,8 @@ test_3 (_mm_maskz_cvt_roundsh_ss, __m128, __mmask8, __m128, __m128h, 8) test_3 (_mm_maskz_cvt_roundsh_sd, __m128d, __mmask8, __m128d, __m128h, 8) test_3 (_mm_maskz_cvt_roundss_sh, __m128h, __mmask8, __m128h, __m128, 8) test_3 
(_mm_maskz_cvt_roundsd_sh, __m128h, __mmask8, __m128h, __m128d, 8) +test_3 (_mm512_fmaddsub_round_ph, __m512h, __m512h, __m512h, __m512h, 9) +test_3 (_mm512_fmsubadd_round_ph, __m512h, __m512h, __m512h, __m512h, 9) test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8) test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8) test_3x (_mm512_mask_reduce_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8) @@ -868,6 +870,12 @@ test_4 (_mm_mask_cvt_roundsh_ss, __m128, __m128, __mmask8, __m128, __m128h, 8) test_4 (_mm_mask_cvt_roundsh_sd, __m128d, __m128d, __mmask8, __m128d, __m128h, 8) test_4 (_mm_mask_cvt_roundss_sh, __m128h, __m128h, __mmask8, __m128h, __m128, 8) test_4 (_mm_mask_cvt_roundsd_sh, __m128h, __m128h, __mmask8, __m128h, __m128d, 8) +test_4 (_mm512_mask_fmaddsub_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 9) +test_4 (_mm512_mask3_fmaddsub_round_ph, __m512h, __m512h, __m512h, __m512h, __mmask32, 9) +test_4 (_mm512_maskz_fmaddsub_round_ph, __m512h, __mmask32, __m512h, __m512h, __m512h, 9) +test_4 (_mm512_mask3_fmsubadd_round_ph, __m512h, __m512h, __m512h, __m512h, __mmask32, 9) +test_4 (_mm512_mask_fmsubadd_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 9) +test_4 (_mm512_maskz_fmsubadd_round_ph, __m512h, __mmask32, __m512h, __m512h, __m512h, 9) test_4x (_mm_mask_reduce_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8) test_4x (_mm_mask_roundscale_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8) test_4x (_mm_mask_getmant_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 1, 1) diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c index d92898fdd11..bc530da388b 100644 --- a/gcc/testsuite/gcc.target/i386/sse-22.c +++ b/gcc/testsuite/gcc.target/i386/sse-22.c @@ -939,6 +939,8 @@ test_3 (_mm_maskz_cvt_roundsh_ss, __m128, __mmask8, __m128, __m128h, 8) test_3 (_mm_maskz_cvt_roundsh_sd, __m128d, __mmask8, __m128d, 
__m128h, 8) test_3 (_mm_maskz_cvt_roundss_sh, __m128h, __mmask8, __m128h, __m128, 8) test_3 (_mm_maskz_cvt_roundsd_sh, __m128h, __mmask8, __m128h, __m128d, 8) +test_3 (_mm512_fmaddsub_round_ph, __m512h, __m512h, __m512h, __m512h, 9) +test_3 (_mm512_fmsubadd_round_ph, __m512h, __m512h, __m512h, __m512h, 9) test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8) test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8) test_3x (_mm512_mask_reduce_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8) @@ -970,6 +972,12 @@ test_4 (_mm_mask_cvt_roundsh_ss, __m128, __m128, __mmask8, __m128, __m128h, 8) test_4 (_mm_mask_cvt_roundsh_sd, __m128d, __m128d, __mmask8, __m128d, __m128h, 8) test_4 (_mm_mask_cvt_roundss_sh, __m128h, __m128h, __mmask8, __m128h, __m128, 8) test_4 (_mm_mask_cvt_roundsd_sh, __m128h, __m128h, __mmask8, __m128h, __m128d, 8) +test_4 (_mm512_mask_fmaddsub_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 9) +test_4 (_mm512_mask3_fmaddsub_round_ph, __m512h, __m512h, __m512h, __m512h, __mmask32, 9) +test_4 (_mm512_maskz_fmaddsub_round_ph, __m512h, __mmask32, __m512h, __m512h, __m512h, 9) +test_4 (_mm512_mask3_fmsubadd_round_ph, __m512h, __m512h, __m512h, __m512h, __mmask32, 9) +test_4 (_mm512_mask_fmsubadd_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 9) +test_4 (_mm512_maskz_fmsubadd_round_ph, __m512h, __mmask32, __m512h, __m512h, __m512h, 9) test_4x (_mm_mask_reduce_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8) test_4x (_mm_mask_roundscale_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8) test_4x (_mm_mask_getmant_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 1, 1) diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c index 2f5027ba36f..df43931ca97 100644 --- a/gcc/testsuite/gcc.target/i386/sse-23.c +++ b/gcc/testsuite/gcc.target/i386/sse-23.c @@ -775,6 +775,12 @@ #define __builtin_ia32_vcvtsh2sd_mask_round(A, 
B, C, D, E) __builtin_ia32_vcvtsh2sd_mask_round(A, B, C, D, 8) #define __builtin_ia32_vcvtss2sh_mask_round(A, B, C, D, E) __builtin_ia32_vcvtss2sh_mask_round(A, B, C, D, 8) #define __builtin_ia32_vcvtsd2sh_mask_round(A, B, C, D, E) __builtin_ia32_vcvtsd2sh_mask_round(A, B, C, D, 8) +#define __builtin_ia32_vfmaddsubph512_mask(A, B, C, D, E) __builtin_ia32_vfmaddsubph512_mask(A, B, C, D, 8) +#define __builtin_ia32_vfmaddsubph512_mask3(A, B, C, D, E) __builtin_ia32_vfmaddsubph512_mask3(A, B, C, D, 8) +#define __builtin_ia32_vfmaddsubph512_maskz(A, B, C, D, E) __builtin_ia32_vfmaddsubph512_maskz(A, B, C, D, 8) +#define __builtin_ia32_vfmsubaddph512_mask(A, B, C, D, E) __builtin_ia32_vfmsubaddph512_mask(A, B, C, D, 8) +#define __builtin_ia32_vfmsubaddph512_mask3(A, B, C, D, E) __builtin_ia32_vfmsubaddph512_mask3(A, B, C, D, 8) +#define __builtin_ia32_vfmsubaddph512_maskz(A, B, C, D, E) __builtin_ia32_vfmsubaddph512_maskz(A, B, C, D, 8) /* avx512fp16vlintrin.h */ #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)

From patchwork Thu Jul 1 06:16:27 2021
X-Patchwork-Submitter: liuhongt
X-Patchwork-Id: 1499385
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 41/62] AVX512FP16: Add testcase for vfmaddsub[132, 213, 231]ph/vfmsubadd[132, 213, 231]ph.
Date: Thu, 1 Jul 2021 14:16:27 +0800
Message-Id: <20210701061648.9447-42-hongtao.liu@intel.com>
In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com>
References: <20210701061648.9447-1-hongtao.liu@intel.com>
From: liuhongt
Cc: jakub@redhat.com

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-vfmaddsubXXXph-1a.c: New test.
	* gcc.target/i386/avx512fp16-vfmaddsubXXXph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vfmsubaddXXXph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vfmsubaddXXXph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vfmaddsubXXXph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vfmaddsubXXXph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vfmsubaddXXXph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vfmsubaddXXXph-1b.c: Ditto.
--- .../i386/avx512fp16-vfmaddsubXXXph-1a.c | 28 +++ .../i386/avx512fp16-vfmaddsubXXXph-1b.c | 171 +++++++++++++++++ .../i386/avx512fp16-vfmsubaddXXXph-1a.c | 28 +++ .../i386/avx512fp16-vfmsubaddXXXph-1b.c | 175 ++++++++++++++++++ .../i386/avx512fp16vl-vfmaddsubXXXph-1a.c | 28 +++ .../i386/avx512fp16vl-vfmaddsubXXXph-1b.c | 15 ++ .../i386/avx512fp16vl-vfmsubaddXXXph-1a.c | 28 +++ .../i386/avx512fp16vl-vfmsubaddXXXph-1b.c | 15 ++ 8 files changed, 488 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddsubXXXph-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddsubXXXph-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubaddXXXph-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubaddXXXph-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddsubXXXph-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddsubXXXph-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmsubaddXXXph-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmsubaddXXXph-1b.c diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddsubXXXph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddsubXXXph-1a.c new file mode 100644 index 00000000000..7063646ef58 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddsubXXXph-1a.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vfmaddsub...ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmaddsub...ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vfmaddsub231ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmaddsub...ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times 
"vfmaddsub...ph\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmaddsub...ph\[ \\t\]+\[^\n\]*\{rd-sae\}\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmaddsub231ph\[ \\t\]+\[^\n\]*\{ru-sae\}\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmaddsub...ph\[ \\t\]+\[^\n\]*\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +volatile __m512h x1, x2, x3; +volatile __mmask32 m; + +void extern +avx512f_test (void) +{ + x1 = _mm512_fmaddsub_ph (x1, x2, x3); + x1 = _mm512_mask_fmaddsub_ph (x1, m, x2, x3); + x3 = _mm512_mask3_fmaddsub_ph (x1, x2, x3, m); + x1 = _mm512_maskz_fmaddsub_ph (m, x1, x2, x3); + x1 = _mm512_fmaddsub_round_ph (x1, x2, x3, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC); + x1 = _mm512_mask_fmaddsub_round_ph (x1, m, x2, x3, _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC); + x3 = _mm512_mask3_fmaddsub_round_ph (x1, x2, x3, m, _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC); + x1 = _mm512_maskz_fmaddsub_round_ph (m, x1, x2, x3, _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddsubXXXph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddsubXXXph-1b.c new file mode 100644 index 00000000000..16cf0af19d6 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddsubXXXph-1b.c @@ -0,0 +1,171 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS (AVX512F_LEN / 16) + +void NOINLINE +EMULATE(fmaddsub_ph) (V512 * dest, V512 op1, V512 op2, + __mmask32 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + __mmask16 m1, m2; + + m1 = k & 0xffff; + m2 = (k >> 16) & 0xffff; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(op2, &v3, &v4); +
unpack_ph_2twops(*dest, &v7, &v8); + + for (i = 0; i < 16; i++) { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + v5.f32[i] = 0; + } + else { + v5.u32[i] = v7.u32[i]; + } + } + else { + if (i % 2 == 1) { + v5.f32[i] = v1.f32[i] * v3.f32[i] + v7.f32[i]; + } + else { + v5.f32[i] = v1.f32[i] * v3.f32[i] - v7.f32[i]; + } + } + + if (((1 << i) & m2) == 0) { + if (zero_mask) { + v6.f32[i] = 0; + } + else { + v6.u32[i] = v8.u32[i]; + } + } + else { + if (i % 2 == 1) { + v6.f32[i] = v2.f32[i] * v4.f32[i] + v8.f32[i]; + } + else { + v6.f32[i] = v2.f32[i] * v4.f32[i] - v8.f32[i]; + } + } + } + *dest = pack_twops_2ph(v5, v6); +} + +void NOINLINE +EMULATE(m_fmaddsub_ph) (V512 * dest, V512 op1, V512 op2, + __mmask32 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + __mmask16 m1, m2; + + m1 = k & 0xffff; + m2 = (k >> 16) & 0xffff; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(op2, &v3, &v4); + unpack_ph_2twops(*dest, &v7, &v8); + + for (i = 0; i < 16; i++) { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + v5.f32[i] = 0; + } + else { + v5.u32[i] = v7.u32[i]; + } + } + else { + if (i % 2 == 1) { + v5.f32[i] = v1.f32[i] * v7.f32[i] + v3.f32[i]; + } + else { + v5.f32[i] = v1.f32[i] * v7.f32[i] - v3.f32[i]; + } + } + + if (((1 << i) & m2) == 0) { + if (zero_mask) { + v6.f32[i] = 0; + } + else { + v6.u32[i] = v8.u32[i]; + } + } + else { + if (i % 2 == 1) { + v6.f32[i] = v2.f32[i] * v8.f32[i] + v4.f32[i]; + } + else { + v6.f32[i] = v2.f32[i] * v8.f32[i] - v4.f32[i]; + } + } + } + *dest = pack_twops_2ph(v5, v6); +} + +void +TEST (void) +{ + V512 res; + V512 exp; + + init_src(); + + init_dest(&res, &exp); + EMULATE(fmaddsub_ph)(&exp, src1, src2, NET_MASK, 0); + HF(res) = INTRINSIC (_fmaddsub_ph) (HF(src1), HF(src2), HF(res)); + CHECK_RESULT (&res, &exp, N_ELEMS, _fmaddsub_ph); + init_dest(&res, &exp); + EMULATE(fmaddsub_ph)(&exp, src1, src2, MASK_VALUE, 0); + HF(res) = INTRINSIC (_mask3_fmaddsub_ph) (HF(src1), HF(src2), + HF(res), MASK_VALUE); + 
CHECK_RESULT (&res, &exp, N_ELEMS, _mask3_fmaddsub_ph); + init_dest(&res, &exp); + EMULATE(m_fmaddsub_ph)(&exp, src1, src2, MASK_VALUE, 0); + HF(res) = INTRINSIC (_mask_fmaddsub_ph) (HF(res), MASK_VALUE, + HF(src1), HF(src2)); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_fmaddsub_ph); + init_dest(&res, &exp); + EMULATE(fmaddsub_ph)(&exp, src1, src2, ZMASK_VALUE, 1); + HF(res) = INTRINSIC (_maskz_fmaddsub_ph) (ZMASK_VALUE, HF(src1), + HF(src2), HF(res)); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_fmaddsub_ph); + + init_dest(&res, &exp); +#if AVX512F_LEN == 512 + EMULATE(fmaddsub_ph)(&exp, src1, src2, NET_MASK, 0); + HF(res) = INTRINSIC (_fmaddsub_round_ph) (HF(src1), HF(src2), + HF(res), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _fmaddsub_ph); + init_dest(&res, &exp); + EMULATE(fmaddsub_ph)(&exp, src1, src2, MASK_VALUE, 0); + HF(res) = INTRINSIC (_mask3_fmaddsub_round_ph) (HF(src1), HF(src2), + HF(res), MASK_VALUE, _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask3_fmaddsub_ph); + init_dest(&res, &exp); + EMULATE(m_fmaddsub_ph)(&exp, src1, src2, MASK_VALUE, 0); + HF(res) = INTRINSIC (_mask_fmaddsub_round_ph) (HF(res), MASK_VALUE, + HF(src1), HF(src2), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_fmaddsub_ph); + init_dest(&res, &exp); + EMULATE(fmaddsub_ph)(&exp, src1, src2, ZMASK_VALUE, 1); + HF(res) = INTRINSIC (_maskz_fmaddsub_round_ph) (ZMASK_VALUE, HF(src1), + HF(src2), HF(res), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_fmaddsub_ph); +#endif + + if (n_errs != 0) { + abort (); + } +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubaddXXXph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubaddXXXph-1a.c new file mode 100644 index 00000000000..87087c9fb42 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubaddXXXph-1a.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vfmsubadd...ph\[ 
\\t\]+\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmsubadd...ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vfmsubadd231ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmsubadd...ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmsubadd...ph\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmsubadd...ph\[ \\t\]+\[^\n\]*\{rd-sae\}\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmsubadd231ph\[ \\t\]+\[^\n\]*\{ru-sae\}\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmsubadd...ph\[ \\t\]+\[^\n\]*\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +volatile __m512h x1, x2, x3; +volatile __mmask32 m; + +void extern +avx512f_test (void) +{ + x1 = _mm512_fmsubadd_ph (x1, x2, x3); + x1 = _mm512_mask_fmsubadd_ph (x1, m, x2, x3); + x3 = _mm512_mask3_fmsubadd_ph (x1, x2, x3, m); + x1 = _mm512_maskz_fmsubadd_ph (m, x1, x2, x3); + x1 = _mm512_fmsubadd_round_ph (x1, x2, x3, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC); + x1 = _mm512_mask_fmsubadd_round_ph (x1, m, x2, x3, _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC); + x3 = _mm512_mask3_fmsubadd_round_ph (x1, x2, x3, m, _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC); + x1 = _mm512_maskz_fmsubadd_round_ph (m, x1, x2, x3, _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubaddXXXph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubaddXXXph-1b.c new file mode 100644 index 00000000000..159cae4bb26 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubaddXXXph-1b.c @@ -0,0 +1,175 @@ +/* { dg-do run { target
avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS (AVX512F_LEN / 16) + +void NOINLINE +EMULATE(fmsubadd_ph) (V512 * dest, V512 op1, V512 op2, + __mmask32 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + __mmask16 m1, m2; + + m1 = k & 0xffff; + m2 = (k >> 16) & 0xffff; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(op2, &v3, &v4); + unpack_ph_2twops(*dest, &v7, &v8); + + for (i = 0; i < 16; i++) { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + v5.f32[i] = 0; + } + else { + v5.u32[i] = v7.u32[i]; + } + } + else { + if (i % 2 == 1) { + v5.f32[i] = v1.f32[i] * v3.f32[i] - v7.f32[i]; + } + else { + v5.f32[i] = v1.f32[i] * v3.f32[i] + v7.f32[i]; + } + } + + if (((1 << i) & m2) == 0) { + if (zero_mask) { + v6.f32[i] = 0; + } + else { + v6.u32[i] = v8.u32[i]; + } + } + else { + if (i % 2 == 1) { + v6.f32[i] = v2.f32[i] * v4.f32[i] - v8.f32[i]; + } + else { + v6.f32[i] = v2.f32[i] * v4.f32[i] + v8.f32[i]; + } + } + } + *dest = pack_twops_2ph(v5, v6); +} + +void NOINLINE +EMULATE(m_fmsubadd_ph) (V512 * dest, V512 op1, V512 op2, + __mmask32 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + __mmask16 m1, m2; + + m1 = k & 0xffff; + m2 = (k >> 16) & 0xffff; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(op2, &v3, &v4); + unpack_ph_2twops(*dest, &v7, &v8); + + for (i = 0; i < 16; i++) { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + v5.f32[i] = 0; + } + else { + v5.u32[i] = v7.u32[i]; + } + } + else { + if (i % 2 == 1) { + v5.f32[i] = v1.f32[i] * v7.f32[i] - v3.f32[i]; + } + else { + v5.f32[i] = v1.f32[i] * v7.f32[i] + v3.f32[i]; + } + } + + if (((1 << i) & m2) == 0) { + if (zero_mask) { + v6.f32[i] = 0; + } + else { + v6.u32[i] = v8.u32[i]; + } + } + else { + if (i % 2 == 1) { + v6.f32[i] = v2.f32[i] * v8.f32[i] - v4.f32[i]; + } + else { + v6.f32[i] = v2.f32[i] * v8.f32[i] + v4.f32[i]; + } + } + } + *dest = 
pack_twops_2ph(v5, v6); +} + +void +TEST (void) +{ + V512 res; + V512 exp; + + init_src(); + + init_dest(&res, &exp); + EMULATE(fmsubadd_ph)(&exp, src1, src2, NET_MASK, 0); + HF(res) = INTRINSIC (_fmsubadd_ph) (HF(src1), HF(src2), HF(res)); + CHECK_RESULT (&res, &exp, N_ELEMS, _fmsubadd_ph); + init_dest(&res, &exp); + EMULATE(fmsubadd_ph)(&exp, src1, src2, MASK_VALUE, 0); + HF(res) = INTRINSIC (_mask3_fmsubadd_ph) (HF(src1), HF(src2), + HF(res), MASK_VALUE); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask3_fmsubadd_ph); + init_dest(&res, &exp); + EMULATE(m_fmsubadd_ph)(&exp, src1, src2, MASK_VALUE, 0); + HF(res) = INTRINSIC (_mask_fmsubadd_ph) (HF(res), MASK_VALUE, + HF(src1), HF(src2)); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_fmsubadd_ph); + init_dest(&res, &exp); + EMULATE(fmsubadd_ph)(&exp, src1, src2, ZMASK_VALUE, 1); + HF(res) = INTRINSIC (_maskz_fmsubadd_ph) (ZMASK_VALUE, HF(src1), + HF(src2), HF(res)); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_fmsubadd_ph); + + init_dest(&res, &exp); +#if AVX512F_LEN == 512 + EMULATE(fmsubadd_ph)(&exp, src1, src2, NET_MASK, 0); + HF(res) = INTRINSIC (_fmsubadd_round_ph) (HF(src1), HF(src2), + HF(res), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _fmsubadd_ph); + init_dest(&res, &exp); + EMULATE(fmsubadd_ph)(&exp, src1, src2, MASK_VALUE, 0); + HF(res) = INTRINSIC (_mask3_fmsubadd_round_ph) (HF(src1), HF(src2), + HF(res), MASK_VALUE, + _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask3_fmsubadd_ph); + init_dest(&res, &exp); + EMULATE(m_fmsubadd_ph)(&exp, src1, src2, MASK_VALUE, 0); + HF(res) = INTRINSIC (_mask_fmsubadd_round_ph) (HF(res), MASK_VALUE, + HF(src1), HF(src2), + _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_fmsubadd_ph); + init_dest(&res, &exp); + EMULATE(fmsubadd_ph)(&exp, src1, src2, ZMASK_VALUE, 1); + HF(res) = INTRINSIC (_maskz_fmsubadd_round_ph) (ZMASK_VALUE, HF(src1), + HF(src2), HF(res), + _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_fmsubadd_ph); +#endif + + if (n_errs 
!= 0) { + abort (); + } +} + + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddsubXXXph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddsubXXXph-1a.c new file mode 100644 index 00000000000..963fbb6af90 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddsubXXXph-1a.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512vl -mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vfmaddsub...ph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vfmaddsub...ph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vfmaddsub231ph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmaddsub231ph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmaddsub...ph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmaddsub...ph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +volatile __m256h yy, y2, y3; +volatile __m128h xx, x2, x3; +volatile __mmask8 m; +volatile __mmask16 m16; + +void extern +avx512vl_test (void) +{ + yy = _mm256_mask_fmaddsub_ph (yy, m16, y2, y3); + xx = _mm_mask_fmaddsub_ph (xx, m, x2, x3); + + y3 = _mm256_mask3_fmaddsub_ph (yy, y2, y3, m16); + x3 = _mm_mask3_fmaddsub_ph (xx, x2, x3, m); + + yy = _mm256_maskz_fmaddsub_ph (m16, yy, y2, y3); + xx = _mm_maskz_fmaddsub_ph (m, xx, x2, x3); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddsubXXXph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddsubXXXph-1b.c new file mode 100644 index 00000000000..7f9748b7e26 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddsubXXXph-1b.c @@ -0,0 +1,15 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl
-mavx512dq" } */ + +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vfmaddsubXXXph-1b.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vfmaddsubXXXph-1b.c" + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmsubaddXXXph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmsubaddXXXph-1a.c new file mode 100644 index 00000000000..0316b8e0714 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmsubaddXXXph-1a.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512vl -mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vfmsubadd...ph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vfmsubadd...ph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vfmsubadd231ph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmsubadd231ph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmsubadd...ph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmsubadd...ph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +volatile __m256h yy, y2, y3; +volatile __m128h xx, x2, x3; +volatile __mmask8 m; +volatile __mmask16 m16; + +void extern +avx512vl_test (void) +{ + yy = _mm256_mask_fmsubadd_ph (yy, m16, y2, y3); + xx = _mm_mask_fmsubadd_ph (xx, m, x2, x3); + + y3 = _mm256_mask3_fmsubadd_ph (yy, y2, y3, m16); + x3 = _mm_mask3_fmsubadd_ph (xx, x2, x3, m); + + yy = _mm256_maskz_fmsubadd_ph (m16, yy, y2, y3); + xx = _mm_maskz_fmsubadd_ph (m, xx, x2, x3); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmsubaddXXXph-1b.c
b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmsubaddXXXph-1b.c new file mode 100644 index 00000000000..c8caca105ad --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmsubaddXXXph-1b.c @@ -0,0 +1,15 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */ + +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vfmsubaddXXXph-1b.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vfmsubaddXXXph-1b.c" +

From patchwork Thu Jul 1 06:16:28 2021
X-Patchwork-Submitter: liuhongt
X-Patchwork-Id: 1499386
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 42/62] AVX512FP16: Add FP16 fma instructions.
Date: Thu, 1 Jul 2021 14:16:28 +0800
Message-Id: <20210701061648.9447-43-hongtao.liu@intel.com>
In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com>
References: <20210701061648.9447-1-hongtao.liu@intel.com>
From: liuhongt
Cc: jakub@redhat.com

Add vfmadd[132,213,231]ph/vfnmadd[132,213,231]ph/vfmsub[132,213,231]ph/
vfnmsub[132,213,231]ph.

gcc/ChangeLog:

	* config/i386/avx512fp16intrin.h (_mm512_mask_fmadd_ph): New intrinsic.
	(_mm512_mask3_fmadd_ph): Likewise.
	(_mm512_maskz_fmadd_ph): Likewise.
	(_mm512_fmadd_round_ph): Likewise.
	(_mm512_mask_fmadd_round_ph): Likewise.
	(_mm512_mask3_fmadd_round_ph): Likewise.
	(_mm512_maskz_fmadd_round_ph): Likewise.
	(_mm512_fnmadd_ph): Likewise.
	(_mm512_mask_fnmadd_ph): Likewise.
	(_mm512_mask3_fnmadd_ph): Likewise.
	(_mm512_maskz_fnmadd_ph): Likewise.
	(_mm512_fnmadd_round_ph): Likewise.
	(_mm512_mask_fnmadd_round_ph): Likewise.
	(_mm512_mask3_fnmadd_round_ph): Likewise.
	(_mm512_maskz_fnmadd_round_ph): Likewise.
	(_mm512_fmsub_ph): Likewise.
	(_mm512_mask_fmsub_ph): Likewise.
	(_mm512_mask3_fmsub_ph): Likewise.
	(_mm512_maskz_fmsub_ph): Likewise.
	(_mm512_fmsub_round_ph): Likewise.
	(_mm512_mask_fmsub_round_ph): Likewise.
	(_mm512_mask3_fmsub_round_ph): Likewise.
	(_mm512_maskz_fmsub_round_ph): Likewise.
	(_mm512_fnmsub_ph): Likewise.
(_mm512_mask_fnmsub_ph): Likewise. (_mm512_mask3_fnmsub_ph): Likewise. (_mm512_maskz_fnmsub_ph): Likewise. (_mm512_fnmsub_round_ph): Likewise. (_mm512_mask_fnmsub_round_ph): Likewise. (_mm512_mask3_fnmsub_round_ph): Likewise. (_mm512_maskz_fnmsub_round_ph): Likewise. * config/i386/avx512fp16vlintrin.h (_mm256_fmadd_ph): New intrinsic. (_mm256_mask_fmadd_ph): Likewise. (_mm256_mask3_fmadd_ph): Likewise. (_mm256_maskz_fmadd_ph): Likewise. (_mm_fmadd_ph): Likewise. (_mm_mask_fmadd_ph): Likewise. (_mm_mask3_fmadd_ph): Likewise. (_mm_maskz_fmadd_ph): Likewise. (_mm256_fnmadd_ph): Likewise. (_mm256_mask_fnmadd_ph): Likewise. (_mm256_mask3_fnmadd_ph): Likewise. (_mm256_maskz_fnmadd_ph): Likewise. (_mm_fnmadd_ph): Likewise. (_mm_mask_fnmadd_ph): Likewise. (_mm_mask3_fnmadd_ph): Likewise. (_mm_maskz_fnmadd_ph): Likewise. (_mm256_fmsub_ph): Likewise. (_mm256_mask_fmsub_ph): Likewise. (_mm256_mask3_fmsub_ph): Likewise. (_mm256_maskz_fmsub_ph): Likewise. (_mm_fmsub_ph): Likewise. (_mm_mask_fmsub_ph): Likewise. (_mm_mask3_fmsub_ph): Likewise. (_mm_maskz_fmsub_ph): Likewise. (_mm256_fnmsub_ph): Likewise. (_mm256_mask_fnmsub_ph): Likewise. (_mm256_mask3_fnmsub_ph): Likewise. (_mm256_maskz_fnmsub_ph): Likewise. (_mm_fnmsub_ph): Likewise. (_mm_mask_fnmsub_ph): Likewise. (_mm_mask3_fnmsub_ph): Likewise. (_mm_maskz_fnmsub_ph): Likewise. * config/i386/i386-builtin.def: Add corresponding new builtins. * config/i386/sse.md (avx512bcst): Add HF vector modes. (_fmadd__maskz): Adjust to support HF vector modes. (fma_fmadd_): Ditto. (*fma_fmadd__bcst_1): Ditto. (*fma_fmadd__bcst_2): Ditto. (*fma_fmadd__bcst_3): Ditto. (_fmadd__mask): Ditto. (_fmadd__mask3): Ditto. (_fmsub__maskz): Ditto. (fma_fmsub_): Ditto. (*fma_fmsub__bcst_1): Ditto. (*fma_fmsub__bcst_2): Ditto. (*fma_fmsub__bcst_3): Ditto. (_fmsub__mask): Ditto. (_fmsub__mask3): Ditto. (fma_fnmadd_): Ditto. (*fma_fnmadd__bcst_1): Ditto. (*fma_fnmadd__bcst_2): Ditto. (*fma_fnmadd__bcst_3): Ditto. (_fnmadd__mask): Ditto. 
	(_fnmadd__mask3): Ditto.
	(_fnmsub__maskz): Ditto.
	(fma_fnmsub_): Ditto.
	(*fma_fnmsub__bcst_1): Ditto.
	(*fma_fnmsub__bcst_2): Ditto.
	(*fma_fnmsub__bcst_3): Ditto.
	(_fnmsub__mask): Ditto.
	(_fnmsub__mask3): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx-1.c: Add test for new builtins.
	* gcc.target/i386/sse-13.c: Ditto.
	* gcc.target/i386/sse-23.c: Ditto.
	* gcc.target/i386/sse-14.c: Add test for new intrinsics.
	* gcc.target/i386/sse-22.c: Ditto.
---
 gcc/config/i386/avx512fp16intrin.h     | 432 +++++++++++++++++++++++++
 gcc/config/i386/avx512fp16vlintrin.h   | 364 +++++++++++++++++++++
 gcc/config/i386/i386-builtin.def       |  36 +++
 gcc/config/i386/sse.md                 | 196 +++------
 gcc/testsuite/gcc.target/i386/avx-1.c  |  12 +
 gcc/testsuite/gcc.target/i386/sse-13.c |  12 +
 gcc/testsuite/gcc.target/i386/sse-14.c |  16 +
 gcc/testsuite/gcc.target/i386/sse-22.c |  16 +
 gcc/testsuite/gcc.target/i386/sse-23.c |  12 +
 9 files changed, 999 insertions(+), 97 deletions(-)

diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
index 4092663b504..f246bab5159 100644
--- a/gcc/config/i386/avx512fp16intrin.h
+++ b/gcc/config/i386/avx512fp16intrin.h
@@ -5265,6 +5265,438 @@ _mm512_maskz_fmsubadd_round_ph (__mmask32 __U, __m512h __A, __m512h __B,
 #endif /* __OPTIMIZE__ */
+/* Intrinsics vfmadd[132,213,231]ph.
*/ +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) + _mm512_fmadd_ph (__m512h __A, __m512h __B, __m512h __C) +{ + return (__m512h) + __builtin_ia32_vfmaddph512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fmadd_ph (__m512h __A, __mmask32 __U, __m512h __B, __m512h __C) +{ + return (__m512h) + __builtin_ia32_vfmaddph512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask3_fmadd_ph (__m512h __A, __m512h __B, __m512h __C, __mmask32 __U) +{ + return (__m512h) + __builtin_ia32_vfmaddph512_mask3 ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_fmadd_ph (__mmask32 __U, __m512h __A, __m512h __B, __m512h __C) +{ + return (__m512h) + __builtin_ia32_vfmaddph512_maskz ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, + _MM_FROUND_CUR_DIRECTION); +} + +#ifdef __OPTIMIZE__ +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fmadd_round_ph (__m512h __A, __m512h __B, __m512h __C, const int __R) +{ + return (__m512h) __builtin_ia32_vfmaddph512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) -1, __R); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fmadd_round_ph (__m512h __A, __mmask32 __U, __m512h __B, + __m512h __C, const int __R) +{ + return (__m512h) __builtin_ia32_vfmaddph512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, __R); +} + +extern __inline __m512h 
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask3_fmadd_round_ph (__m512h __A, __m512h __B, __m512h __C, + __mmask32 __U, const int __R) +{ + return (__m512h) __builtin_ia32_vfmaddph512_mask3 ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, __R); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_fmadd_round_ph (__mmask32 __U, __m512h __A, __m512h __B, + __m512h __C, const int __R) +{ + return (__m512h) __builtin_ia32_vfmaddph512_maskz ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, __R); +} + +#else +#define _mm512_fmadd_round_ph(A, B, C, R) \ + ((__m512h)__builtin_ia32_vfmaddph512_mask ((A), (B), (C), -1, (R))) + +#define _mm512_mask_fmadd_round_ph(A, U, B, C, R) \ + ((__m512h)__builtin_ia32_vfmaddph512_mask ((A), (B), (C), (U), (R))) + +#define _mm512_mask3_fmadd_round_ph(A, B, C, U, R) \ + ((__m512h)__builtin_ia32_vfmaddph512_mask3 ((A), (B), (C), (U), (R))) + +#define _mm512_maskz_fmadd_round_ph(U, A, B, C, R) \ + ((__m512h)__builtin_ia32_vfmaddph512_maskz ((A), (B), (C), (U), (R))) + +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vfnmadd[132,213,231]ph. 
*/ +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fnmadd_ph (__m512h __A, __m512h __B, __m512h __C) +{ + return (__m512h) + __builtin_ia32_vfnmaddph512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fnmadd_ph (__m512h __A, __mmask32 __U, __m512h __B, __m512h __C) +{ + return (__m512h) + __builtin_ia32_vfnmaddph512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask3_fnmadd_ph (__m512h __A, __m512h __B, __m512h __C, __mmask32 __U) +{ + return (__m512h) + __builtin_ia32_vfnmaddph512_mask3 ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_fnmadd_ph (__mmask32 __U, __m512h __A, __m512h __B, __m512h __C) +{ + return (__m512h) + __builtin_ia32_vfnmaddph512_maskz ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, + _MM_FROUND_CUR_DIRECTION); +} + +#ifdef __OPTIMIZE__ +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fnmadd_round_ph (__m512h __A, __m512h __B, __m512h __C, const int __R) +{ + return (__m512h) __builtin_ia32_vfnmaddph512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) -1, __R); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fnmadd_round_ph (__m512h __A, __mmask32 __U, __m512h __B, + __m512h __C, const int __R) +{ + return (__m512h) __builtin_ia32_vfnmaddph512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, __R); +} + +extern __inline __m512h 
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask3_fnmadd_round_ph (__m512h __A, __m512h __B, __m512h __C, + __mmask32 __U, const int __R) +{ + return (__m512h) __builtin_ia32_vfnmaddph512_mask3 ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, __R); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_fnmadd_round_ph (__mmask32 __U, __m512h __A, __m512h __B, + __m512h __C, const int __R) +{ + return (__m512h) __builtin_ia32_vfnmaddph512_maskz ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, __R); +} + +#else +#define _mm512_fnmadd_round_ph(A, B, C, R) \ + ((__m512h)__builtin_ia32_vfnmaddph512_mask ((A), (B), (C), -1, (R))) + +#define _mm512_mask_fnmadd_round_ph(A, U, B, C, R) \ + ((__m512h)__builtin_ia32_vfnmaddph512_mask ((A), (B), (C), (U), (R))) + +#define _mm512_mask3_fnmadd_round_ph(A, B, C, U, R) \ + ((__m512h)__builtin_ia32_vfnmaddph512_mask3 ((A), (B), (C), (U), (R))) + +#define _mm512_maskz_fnmadd_round_ph(U, A, B, C, R) \ + ((__m512h)__builtin_ia32_vfnmaddph512_maskz ((A), (B), (C), (U), (R))) + +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vfmsub[132,213,231]ph. 
*/ +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fmsub_ph (__m512h __A, __m512h __B, __m512h __C) +{ + return (__m512h) + __builtin_ia32_vfmsubph512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fmsub_ph (__m512h __A, __mmask32 __U, __m512h __B, __m512h __C) +{ + return (__m512h) + __builtin_ia32_vfmsubph512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask3_fmsub_ph (__m512h __A, __m512h __B, __m512h __C, __mmask32 __U) +{ + return (__m512h) + __builtin_ia32_vfmsubph512_mask3 ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_fmsub_ph (__mmask32 __U, __m512h __A, __m512h __B, __m512h __C) +{ + return (__m512h) + __builtin_ia32_vfmsubph512_maskz ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, + _MM_FROUND_CUR_DIRECTION); +} + +#ifdef __OPTIMIZE__ +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fmsub_round_ph (__m512h __A, __m512h __B, __m512h __C, const int __R) +{ + return (__m512h) __builtin_ia32_vfmsubph512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) -1, __R); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fmsub_round_ph (__m512h __A, __mmask32 __U, __m512h __B, + __m512h __C, const int __R) +{ + return (__m512h) __builtin_ia32_vfmsubph512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, __R); +} + +extern __inline __m512h +__attribute__ 
((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask3_fmsub_round_ph (__m512h __A, __m512h __B, __m512h __C, + __mmask32 __U, const int __R) +{ + return (__m512h) __builtin_ia32_vfmsubph512_mask3 ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, __R); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_fmsub_round_ph (__mmask32 __U, __m512h __A, __m512h __B, + __m512h __C, const int __R) +{ + return (__m512h) __builtin_ia32_vfmsubph512_maskz ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, __R); +} + +#else +#define _mm512_fmsub_round_ph(A, B, C, R) \ + ((__m512h)__builtin_ia32_vfmsubph512_mask((A), (B), (C), -1, (R))) + +#define _mm512_mask_fmsub_round_ph(A, U, B, C, R) \ + ((__m512h)__builtin_ia32_vfmsubph512_mask((A), (B), (C), (U), (R))) + +#define _mm512_mask3_fmsub_round_ph(A, B, C, U, R) \ + ((__m512h)__builtin_ia32_vfmsubph512_mask3((A), (B), (C), (U), (R))) + +#define _mm512_maskz_fmsub_round_ph(U, A, B, C, R) \ + ((__m512h)__builtin_ia32_vfmsubph512_maskz((A), (B), (C), (U), (R))) + +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vfnmsub[132,213,231]ph. 
*/ +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fnmsub_ph (__m512h __A, __m512h __B, __m512h __C) +{ + return (__m512h) + __builtin_ia32_vfnmsubph512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fnmsub_ph (__m512h __A, __mmask32 __U, __m512h __B, __m512h __C) +{ + return (__m512h) + __builtin_ia32_vfnmsubph512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask3_fnmsub_ph (__m512h __A, __m512h __B, __m512h __C, __mmask32 __U) +{ + return (__m512h) + __builtin_ia32_vfnmsubph512_mask3 ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_fnmsub_ph (__mmask32 __U, __m512h __A, __m512h __B, __m512h __C) +{ + return (__m512h) + __builtin_ia32_vfnmsubph512_maskz ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, + _MM_FROUND_CUR_DIRECTION); +} + +#ifdef __OPTIMIZE__ +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fnmsub_round_ph (__m512h __A, __m512h __B, __m512h __C, const int __R) +{ + return (__m512h) __builtin_ia32_vfnmsubph512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) -1, __R); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fnmsub_round_ph (__m512h __A, __mmask32 __U, __m512h __B, + __m512h __C, const int __R) +{ + return (__m512h) __builtin_ia32_vfnmsubph512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, __R); +} + +extern __inline __m512h 
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask3_fnmsub_round_ph (__m512h __A, __m512h __B, __m512h __C, + __mmask32 __U, const int __R) +{ + return (__m512h) __builtin_ia32_vfnmsubph512_mask3 ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, __R); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_fnmsub_round_ph (__mmask32 __U, __m512h __A, __m512h __B, + __m512h __C, const int __R) +{ + return (__m512h) __builtin_ia32_vfnmsubph512_maskz ((__v32hf) __A, + (__v32hf) __B, + (__v32hf) __C, + (__mmask32) __U, __R); +} + +#else +#define _mm512_fnmsub_round_ph(A, B, C, R) \ + ((__m512h)__builtin_ia32_vfnmsubph512_mask ((A), (B), (C), -1, (R))) + +#define _mm512_mask_fnmsub_round_ph(A, U, B, C, R) \ + ((__m512h)__builtin_ia32_vfnmsubph512_mask ((A), (B), (C), (U), (R))) + +#define _mm512_mask3_fnmsub_round_ph(A, B, C, U, R) \ + ((__m512h)__builtin_ia32_vfnmsubph512_mask3 ((A), (B), (C), (U), (R))) + +#define _mm512_maskz_fnmsub_round_ph(U, A, B, C, R) \ + ((__m512h)__builtin_ia32_vfnmsubph512_maskz ((A), (B), (C), (U), (R))) + +#endif /* __OPTIMIZE__ */ + #ifdef __DISABLE_AVX512FP16__ #undef __DISABLE_AVX512FP16__ #pragma GCC pop_options diff --git a/gcc/config/i386/avx512fp16vlintrin.h b/gcc/config/i386/avx512fp16vlintrin.h index 8825fae52aa..bba98f105ac 100644 --- a/gcc/config/i386/avx512fp16vlintrin.h +++ b/gcc/config/i386/avx512fp16vlintrin.h @@ -2451,6 +2451,370 @@ _mm_maskz_fmsubadd_ph (__mmask8 __U, __m128h __A, __m128h __B, __U); } +/* Intrinsics vfmadd[132,213,231]ph. 
*/ +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_fmadd_ph (__m256h __A, __m256h __B, __m256h __C) +{ + return (__m256h) __builtin_ia32_vfmaddph256_mask ((__v16hf) __A, + (__v16hf) __B, + (__v16hf) __C, + (__mmask16) -1); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_fmadd_ph (__m256h __A, __mmask16 __U, __m256h __B, + __m256h __C) +{ + return (__m256h) __builtin_ia32_vfmaddph256_mask ((__v16hf) __A, + (__v16hf) __B, + (__v16hf) __C, + (__mmask16) __U); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask3_fmadd_ph (__m256h __A, __m256h __B, __m256h __C, + __mmask16 __U) +{ + return (__m256h) __builtin_ia32_vfmaddph256_mask3 ((__v16hf) __A, + (__v16hf) __B, + (__v16hf) __C, + (__mmask16) + __U); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_fmadd_ph (__mmask16 __U, __m256h __A, __m256h __B, + __m256h __C) +{ + return (__m256h) __builtin_ia32_vfmaddph256_maskz ((__v16hf) __A, + (__v16hf) __B, + (__v16hf) __C, + (__mmask16) + __U); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fmadd_ph (__m128h __A, __m128h __B, __m128h __C) +{ + return (__m128h) __builtin_ia32_vfmaddph128_mask ((__v8hf) __A, + (__v8hf) __B, + (__v8hf) __C, + (__mmask8) -1); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_fmadd_ph (__m128h __A, __mmask8 __U, __m128h __B, + __m128h __C) +{ + return (__m128h) __builtin_ia32_vfmaddph128_mask ((__v8hf) __A, + (__v8hf) __B, + (__v8hf) __C, + (__mmask8) __U); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask3_fmadd_ph (__m128h __A, __m128h __B, __m128h __C, + __mmask8 __U) +{ + return (__m128h) __builtin_ia32_vfmaddph128_mask3 ((__v8hf) 
__A, + (__v8hf) __B, + (__v8hf) __C, + (__mmask8) + __U); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_fmadd_ph (__mmask8 __U, __m128h __A, __m128h __B, + __m128h __C) +{ + return (__m128h) __builtin_ia32_vfmaddph128_maskz ((__v8hf) __A, + (__v8hf) __B, + (__v8hf) __C, + (__mmask8) + __U); +} + +/* Intrinsics vfnmadd[132,213,231]ph. */ +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_fnmadd_ph (__m256h __A, __m256h __B, __m256h __C) +{ + return (__m256h) __builtin_ia32_vfnmaddph256_mask ((__v16hf) __A, + (__v16hf) __B, + (__v16hf) __C, + (__mmask16) -1); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_fnmadd_ph (__m256h __A, __mmask16 __U, __m256h __B, + __m256h __C) +{ + return (__m256h) __builtin_ia32_vfnmaddph256_mask ((__v16hf) __A, + (__v16hf) __B, + (__v16hf) __C, + (__mmask16) __U); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask3_fnmadd_ph (__m256h __A, __m256h __B, __m256h __C, + __mmask16 __U) +{ + return (__m256h) __builtin_ia32_vfnmaddph256_mask3 ((__v16hf) __A, + (__v16hf) __B, + (__v16hf) __C, + (__mmask16) + __U); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_fnmadd_ph (__mmask16 __U, __m256h __A, __m256h __B, + __m256h __C) +{ + return (__m256h) __builtin_ia32_vfnmaddph256_maskz ((__v16hf) __A, + (__v16hf) __B, + (__v16hf) __C, + (__mmask16) + __U); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fnmadd_ph (__m128h __A, __m128h __B, __m128h __C) +{ + return (__m128h) __builtin_ia32_vfnmaddph128_mask ((__v8hf) __A, + (__v8hf) __B, + (__v8hf) __C, + (__mmask8) -1); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_fnmadd_ph 
(__m128h __A, __mmask8 __U, __m128h __B, + __m128h __C) +{ + return (__m128h) __builtin_ia32_vfnmaddph128_mask ((__v8hf) __A, + (__v8hf) __B, + (__v8hf) __C, + (__mmask8) __U); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask3_fnmadd_ph (__m128h __A, __m128h __B, __m128h __C, + __mmask8 __U) +{ + return (__m128h) __builtin_ia32_vfnmaddph128_mask3 ((__v8hf) __A, + (__v8hf) __B, + (__v8hf) __C, + (__mmask8) + __U); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_fnmadd_ph (__mmask8 __U, __m128h __A, __m128h __B, + __m128h __C) +{ + return (__m128h) __builtin_ia32_vfnmaddph128_maskz ((__v8hf) __A, + (__v8hf) __B, + (__v8hf) __C, + (__mmask8) + __U); +} + +/* Intrinsics vfmsub[132,213,231]ph. */ +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_fmsub_ph (__m256h __A, __m256h __B, __m256h __C) +{ + return (__m256h) __builtin_ia32_vfmsubph256_mask ((__v16hf) __A, + (__v16hf) __B, + (__v16hf) __C, + (__mmask16) -1); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_fmsub_ph (__m256h __A, __mmask16 __U, __m256h __B, + __m256h __C) +{ + return (__m256h) __builtin_ia32_vfmsubph256_mask ((__v16hf) __A, + (__v16hf) __B, + (__v16hf) __C, + (__mmask16) __U); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask3_fmsub_ph (__m256h __A, __m256h __B, __m256h __C, + __mmask16 __U) +{ + return (__m256h) __builtin_ia32_vfmsubph256_mask3 ((__v16hf) __A, + (__v16hf) __B, + (__v16hf) __C, + (__mmask16) + __U); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_fmsub_ph (__mmask16 __U, __m256h __A, __m256h __B, + __m256h __C) +{ + return (__m256h) __builtin_ia32_vfmsubph256_maskz ((__v16hf) __A, + (__v16hf) __B, + (__v16hf) __C, + 
(__mmask16) + __U); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fmsub_ph (__m128h __A, __m128h __B, __m128h __C) +{ + return (__m128h) __builtin_ia32_vfmsubph128_mask ((__v8hf) __A, + (__v8hf) __B, + (__v8hf) __C, + (__mmask8) -1); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_fmsub_ph (__m128h __A, __mmask8 __U, __m128h __B, + __m128h __C) +{ + return (__m128h) __builtin_ia32_vfmsubph128_mask ((__v8hf) __A, + (__v8hf) __B, + (__v8hf) __C, + (__mmask8) __U); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask3_fmsub_ph (__m128h __A, __m128h __B, __m128h __C, + __mmask8 __U) +{ + return (__m128h) __builtin_ia32_vfmsubph128_mask3 ((__v8hf) __A, + (__v8hf) __B, + (__v8hf) __C, + (__mmask8) + __U); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_fmsub_ph (__mmask8 __U, __m128h __A, __m128h __B, + __m128h __C) +{ + return (__m128h) __builtin_ia32_vfmsubph128_maskz ((__v8hf) __A, + (__v8hf) __B, + (__v8hf) __C, + (__mmask8) + __U); +} + +/* Intrinsics vfnmsub[132,213,231]ph. 
*/ +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_fnmsub_ph (__m256h __A, __m256h __B, __m256h __C) +{ + return (__m256h) __builtin_ia32_vfnmsubph256_mask ((__v16hf) __A, + (__v16hf) __B, + (__v16hf) __C, + (__mmask16) -1); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_fnmsub_ph (__m256h __A, __mmask16 __U, __m256h __B, + __m256h __C) +{ + return (__m256h) __builtin_ia32_vfnmsubph256_mask ((__v16hf) __A, + (__v16hf) __B, + (__v16hf) __C, + (__mmask16) __U); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask3_fnmsub_ph (__m256h __A, __m256h __B, __m256h __C, + __mmask16 __U) +{ + return (__m256h) __builtin_ia32_vfnmsubph256_mask3 ((__v16hf) __A, + (__v16hf) __B, + (__v16hf) __C, + (__mmask16) + __U); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_fnmsub_ph (__mmask16 __U, __m256h __A, __m256h __B, + __m256h __C) +{ + return (__m256h) __builtin_ia32_vfnmsubph256_maskz ((__v16hf) __A, + (__v16hf) __B, + (__v16hf) __C, + (__mmask16) + __U); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fnmsub_ph (__m128h __A, __m128h __B, __m128h __C) +{ + return (__m128h) __builtin_ia32_vfnmsubph128_mask ((__v8hf) __A, + (__v8hf) __B, + (__v8hf) __C, + (__mmask8) -1); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_fnmsub_ph (__m128h __A, __mmask8 __U, __m128h __B, + __m128h __C) +{ + return (__m128h) __builtin_ia32_vfnmsubph128_mask ((__v8hf) __A, + (__v8hf) __B, + (__v8hf) __C, + (__mmask8) __U); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask3_fnmsub_ph (__m128h __A, __m128h __B, __m128h __C, + __mmask8 __U) +{ + return (__m128h) 
__builtin_ia32_vfnmsubph128_mask3 ((__v8hf) __A, + (__v8hf) __B, + (__v8hf) __C, + (__mmask8) + __U); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_fnmsub_ph (__mmask8 __U, __m128h __A, __m128h __B, + __m128h __C) +{ + return (__m128h) __builtin_ia32_vfnmsubph128_maskz ((__v8hf) __A, + (__v8hf) __B, + (__v8hf) __C, + (__mmask8) + __U); +} + #ifdef __DISABLE_AVX512FP16VL__ #undef __DISABLE_AVX512FP16VL__ #pragma GCC pop_options diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def index 42bba719ec3..cf0259843cc 100644 --- a/gcc/config/i386/i386-builtin.def +++ b/gcc/config/i386/i386-builtin.def @@ -2887,6 +2887,30 @@ BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_ BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmsubadd_v8hf_mask, "__builtin_ia32_vfmsubaddph128_mask", IX86_BUILTIN_VFMSUBADDPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmsubadd_v8hf_mask3, "__builtin_ia32_vfmsubaddph128_mask3", IX86_BUILTIN_VFMSUBADDPH128_MASK3, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmsubadd_v8hf_maskz, "__builtin_ia32_vfmsubaddph128_maskz", IX86_BUILTIN_VFMSUBADDPH128_MASKZ, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fmadd_v16hf_mask, "__builtin_ia32_vfmaddph256_mask", IX86_BUILTIN_VFMADDPH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fmadd_v16hf_mask3, "__builtin_ia32_vfmaddph256_mask3", IX86_BUILTIN_VFMADDPH256_MASK3, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fmadd_v16hf_maskz, 
"__builtin_ia32_vfmaddph256_maskz", IX86_BUILTIN_VFMADDPH256_MASKZ, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmadd_v8hf_mask, "__builtin_ia32_vfmaddph128_mask", IX86_BUILTIN_VFMADDPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmadd_v8hf_mask3, "__builtin_ia32_vfmaddph128_mask3", IX86_BUILTIN_VFMADDPH128_MASK3, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmadd_v8hf_maskz, "__builtin_ia32_vfmaddph128_maskz", IX86_BUILTIN_VFMADDPH128_MASKZ, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fnmadd_v16hf_mask, "__builtin_ia32_vfnmaddph256_mask", IX86_BUILTIN_VFNMADDPH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fnmadd_v16hf_mask3, "__builtin_ia32_vfnmaddph256_mask3", IX86_BUILTIN_VFNMADDPH256_MASK3, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fnmadd_v16hf_maskz, "__builtin_ia32_vfnmaddph256_maskz", IX86_BUILTIN_VFNMADDPH256_MASKZ, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fnmadd_v8hf_mask, "__builtin_ia32_vfnmaddph128_mask", IX86_BUILTIN_VFNMADDPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fnmadd_v8hf_mask3, "__builtin_ia32_vfnmaddph128_mask3", IX86_BUILTIN_VFNMADDPH128_MASK3, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fnmadd_v8hf_maskz, "__builtin_ia32_vfnmaddph128_maskz", 
IX86_BUILTIN_VFNMADDPH128_MASKZ, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fmsub_v16hf_mask, "__builtin_ia32_vfmsubph256_mask", IX86_BUILTIN_VFMSUBPH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fmsub_v16hf_mask3, "__builtin_ia32_vfmsubph256_mask3", IX86_BUILTIN_VFMSUBPH256_MASK3, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fmsub_v16hf_maskz, "__builtin_ia32_vfmsubph256_maskz", IX86_BUILTIN_VFMSUBPH256_MASKZ, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmsub_v8hf_mask, "__builtin_ia32_vfmsubph128_mask", IX86_BUILTIN_VFMSUBPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmsub_v8hf_mask3, "__builtin_ia32_vfmsubph128_mask3", IX86_BUILTIN_VFMSUBPH128_MASK3, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmsub_v8hf_maskz, "__builtin_ia32_vfmsubph128_maskz", IX86_BUILTIN_VFMSUBPH128_MASKZ, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fnmsub_v16hf_mask, "__builtin_ia32_vfnmsubph256_mask", IX86_BUILTIN_VFNMSUBPH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fnmsub_v16hf_mask3, "__builtin_ia32_vfnmsubph256_mask3", IX86_BUILTIN_VFNMSUBPH256_MASK3, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fnmsub_v16hf_maskz, "__builtin_ia32_vfnmsubph256_maskz", IX86_BUILTIN_VFNMSUBPH256_MASKZ, UNKNOWN, (int) 
V16HF_FTYPE_V16HF_V16HF_V16HF_UHI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fnmsub_v8hf_mask, "__builtin_ia32_vfnmsubph128_mask", IX86_BUILTIN_VFNMSUBPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fnmsub_v8hf_mask3, "__builtin_ia32_vfnmsubph128_mask3", IX86_BUILTIN_VFNMSUBPH128_MASK3, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fnmsub_v8hf_maskz, "__builtin_ia32_vfnmsubph128_maskz", IX86_BUILTIN_VFNMSUBPH128_MASKZ, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) /* Builtins with rounding support. */ BDESC_END (ARGS, ROUND_ARGS) @@ -3158,6 +3182,18 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmaddsub_v32hf_maskz_ro BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmsubadd_v32hf_mask_round, "__builtin_ia32_vfmsubaddph512_mask", IX86_BUILTIN_VFMSUBADDPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmsubadd_v32hf_mask3_round, "__builtin_ia32_vfmsubaddph512_mask3", IX86_BUILTIN_VFMSUBADDPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmsubadd_v32hf_maskz_round, "__builtin_ia32_vfmsubaddph512_maskz", IX86_BUILTIN_VFMSUBADDPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmadd_v32hf_mask_round, "__builtin_ia32_vfmaddph512_mask", IX86_BUILTIN_VFMADDPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmadd_v32hf_mask3_round, "__builtin_ia32_vfmaddph512_mask3", IX86_BUILTIN_VFMADDPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmadd_v32hf_maskz_round, 
"__builtin_ia32_vfmaddph512_maskz", IX86_BUILTIN_VFMADDPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fnmadd_v32hf_mask_round, "__builtin_ia32_vfnmaddph512_mask", IX86_BUILTIN_VFNMADDPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fnmadd_v32hf_mask3_round, "__builtin_ia32_vfnmaddph512_mask3", IX86_BUILTIN_VFNMADDPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fnmadd_v32hf_maskz_round, "__builtin_ia32_vfnmaddph512_maskz", IX86_BUILTIN_VFNMADDPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmsub_v32hf_mask_round, "__builtin_ia32_vfmsubph512_mask", IX86_BUILTIN_VFMSUBPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmsub_v32hf_mask3_round, "__builtin_ia32_vfmsubph512_mask3", IX86_BUILTIN_VFMSUBPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmsub_v32hf_maskz_round, "__builtin_ia32_vfmsubph512_maskz", IX86_BUILTIN_VFMSUBPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fnmsub_v32hf_mask_round, "__builtin_ia32_vfnmsubph512_mask", IX86_BUILTIN_VFNMSUBPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fnmsub_v32hf_mask3_round, "__builtin_ia32_vfnmsubph512_mask3", IX86_BUILTIN_VFNMSUBPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fnmsub_v32hf_maskz_round, "__builtin_ia32_vfnmsubph512_maskz", IX86_BUILTIN_VFNMSUBPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) BDESC_END (ROUND_ARGS, 
MULTI_ARG) diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 847684e232e..fdcc0515228 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -825,7 +825,9 @@ (define_mode_attr avx512bcst (V16SI "%{1to16%}") (V8DI "%{1to8%}") (V4SF "%{1to4%}") (V2DF "%{1to2%}") (V8SF "%{1to8%}") (V4DF "%{1to4%}") - (V16SF "%{1to16%}") (V8DF "%{1to8%}")]) + (V16SF "%{1to16%}") (V8DF "%{1to8%}") + (V8HF "%{1to8%}") (V16HF "%{1to16%}") + (V32HF "%{1to32%}")]) ;; Mapping from float mode to required SSE level (define_mode_attr sse @@ -4507,10 +4509,10 @@ (define_expand "fma4i_fnmsub_" (match_operand:FMAMODE_AVX512 3 "nonimmediate_operand"))))]) (define_expand "_fmadd__maskz" - [(match_operand:VF_AVX512VL 0 "register_operand") - (match_operand:VF_AVX512VL 1 "") - (match_operand:VF_AVX512VL 2 "") - (match_operand:VF_AVX512VL 3 "") + [(match_operand:VFH_AVX512VL 0 "register_operand") + (match_operand:VFH_AVX512VL 1 "") + (match_operand:VFH_AVX512VL 2 "") + (match_operand:VFH_AVX512VL 3 "") (match_operand: 4 "register_operand")] "TARGET_AVX512F && " { @@ -4550,11 +4552,11 @@ (define_mode_iterator VFH_SF_AVX512VL DF V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")]) (define_insn "fma_fmadd_" - [(set (match_operand:VF_SF_AVX512VL 0 "register_operand" "=v,v,v") - (fma:VF_SF_AVX512VL - (match_operand:VF_SF_AVX512VL 1 "" "%0,0,v") - (match_operand:VF_SF_AVX512VL 2 "" ",v,") - (match_operand:VF_SF_AVX512VL 3 "" "v,,0")))] + [(set (match_operand:VFH_SF_AVX512VL 0 "register_operand" "=v,v,v") + (fma:VFH_SF_AVX512VL + (match_operand:VFH_SF_AVX512VL 1 "" "%0,0,v") + (match_operand:VFH_SF_AVX512VL 2 "" ",v,") + (match_operand:VFH_SF_AVX512VL 3 "" "v,,0")))] "TARGET_AVX512F && && " "@ vfmadd132\t{%2, %3, %0|%0, %3, %2} @@ -4564,12 +4566,12 @@ (define_insn "fma_fmadd_" (set_attr "mode" "")]) (define_insn "_fmadd__mask" - [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v,v") - (vec_merge:VF_AVX512VL - (fma:VF_AVX512VL - (match_operand:VF_AVX512VL 1 
"register_operand" "0,0") - (match_operand:VF_AVX512VL 2 "" ",v") - (match_operand:VF_AVX512VL 3 "" "v,")) + [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v,v") + (vec_merge:VFH_AVX512VL + (fma:VFH_AVX512VL + (match_operand:VFH_AVX512VL 1 "register_operand" "0,0") + (match_operand:VFH_AVX512VL 2 "" ",v") + (match_operand:VFH_AVX512VL 3 "" "v,")) (match_dup 1) (match_operand: 4 "register_operand" "Yk,Yk")))] "TARGET_AVX512F && " @@ -4580,12 +4582,12 @@ (define_insn "_fmadd__mask" (set_attr "mode" "")]) (define_insn "_fmadd__mask3" - [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v") - (vec_merge:VF_AVX512VL - (fma:VF_AVX512VL - (match_operand:VF_AVX512VL 1 "" "%v") - (match_operand:VF_AVX512VL 2 "" "") - (match_operand:VF_AVX512VL 3 "register_operand" "0")) + [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v") + (vec_merge:VFH_AVX512VL + (fma:VFH_AVX512VL + (match_operand:VFH_AVX512VL 1 "" "%v") + (match_operand:VFH_AVX512VL 2 "" "") + (match_operand:VFH_AVX512VL 3 "register_operand" "0")) (match_dup 3) (match_operand: 4 "register_operand" "Yk")))] "TARGET_AVX512F" @@ -4612,10 +4614,10 @@ (define_insn "*fma_fmsub_" (set_attr "mode" "")]) (define_expand "_fmsub__maskz" - [(match_operand:VF_AVX512VL 0 "register_operand") - (match_operand:VF_AVX512VL 1 "") - (match_operand:VF_AVX512VL 2 "") - (match_operand:VF_AVX512VL 3 "") + [(match_operand:VFH_AVX512VL 0 "register_operand") + (match_operand:VFH_AVX512VL 1 "") + (match_operand:VFH_AVX512VL 2 "") + (match_operand:VFH_AVX512VL 3 "") (match_operand: 4 "register_operand")] "TARGET_AVX512F && " { @@ -4626,12 +4628,12 @@ (define_expand "_fmsub__maskz" }) (define_insn "fma_fmsub_" - [(set (match_operand:VF_SF_AVX512VL 0 "register_operand" "=v,v,v") - (fma:VF_SF_AVX512VL - (match_operand:VF_SF_AVX512VL 1 "" "%0,0,v") - (match_operand:VF_SF_AVX512VL 2 "" ",v,") - (neg:VF_SF_AVX512VL - (match_operand:VF_SF_AVX512VL 3 "" "v,,0"))))] + [(set (match_operand:VFH_SF_AVX512VL 0 "register_operand" 
"=v,v,v") + (fma:VFH_SF_AVX512VL + (match_operand:VFH_SF_AVX512VL 1 "" "%0,0,v") + (match_operand:VFH_SF_AVX512VL 2 "" ",v,") + (neg:VFH_SF_AVX512VL + (match_operand:VFH_SF_AVX512VL 3 "" "v,,0"))))] "TARGET_AVX512F && && " "@ vfmsub132\t{%2, %3, %0|%0, %3, %2} @@ -4641,13 +4643,13 @@ (define_insn "fma_fmsub_" (set_attr "mode" "")]) (define_insn "_fmsub__mask" - [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v,v") - (vec_merge:VF_AVX512VL - (fma:VF_AVX512VL - (match_operand:VF_AVX512VL 1 "register_operand" "0,0") - (match_operand:VF_AVX512VL 2 "" ",v") - (neg:VF_AVX512VL - (match_operand:VF_AVX512VL 3 "" "v,"))) + [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v,v") + (vec_merge:VFH_AVX512VL + (fma:VFH_AVX512VL + (match_operand:VFH_AVX512VL 1 "register_operand" "0,0") + (match_operand:VFH_AVX512VL 2 "" ",v") + (neg:VFH_AVX512VL + (match_operand:VFH_AVX512VL 3 "" "v,"))) (match_dup 1) (match_operand: 4 "register_operand" "Yk,Yk")))] "TARGET_AVX512F" @@ -4658,13 +4660,13 @@ (define_insn "_fmsub__mask" (set_attr "mode" "")]) (define_insn "_fmsub__mask3" - [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v") - (vec_merge:VF_AVX512VL - (fma:VF_AVX512VL - (match_operand:VF_AVX512VL 1 "" "%v") - (match_operand:VF_AVX512VL 2 "" "") - (neg:VF_AVX512VL - (match_operand:VF_AVX512VL 3 "register_operand" "0"))) + [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v") + (vec_merge:VFH_AVX512VL + (fma:VFH_AVX512VL + (match_operand:VFH_AVX512VL 1 "" "%v") + (match_operand:VFH_AVX512VL 2 "" "") + (neg:VFH_AVX512VL + (match_operand:VFH_AVX512VL 3 "register_operand" "0"))) (match_dup 3) (match_operand: 4 "register_operand" "Yk")))] "TARGET_AVX512F && " @@ -4691,10 +4693,10 @@ (define_insn "*fma_fnmadd_" (set_attr "mode" "")]) (define_expand "_fnmadd__maskz" - [(match_operand:VF_AVX512VL 0 "register_operand") - (match_operand:VF_AVX512VL 1 "") - (match_operand:VF_AVX512VL 2 "") - (match_operand:VF_AVX512VL 3 "") + [(match_operand:VFH_AVX512VL 0 
"register_operand") + (match_operand:VFH_AVX512VL 1 "") + (match_operand:VFH_AVX512VL 2 "") + (match_operand:VFH_AVX512VL 3 "") (match_operand: 4 "register_operand")] "TARGET_AVX512F && " { @@ -4705,12 +4707,12 @@ (define_expand "_fnmadd__maskz" }) (define_insn "fma_fnmadd_" - [(set (match_operand:VF_SF_AVX512VL 0 "register_operand" "=v,v,v") - (fma:VF_SF_AVX512VL - (neg:VF_SF_AVX512VL - (match_operand:VF_SF_AVX512VL 1 "" "%0,0,v")) - (match_operand:VF_SF_AVX512VL 2 "" ",v,") - (match_operand:VF_SF_AVX512VL 3 "" "v,,0")))] + [(set (match_operand:VFH_SF_AVX512VL 0 "register_operand" "=v,v,v") + (fma:VFH_SF_AVX512VL + (neg:VFH_SF_AVX512VL + (match_operand:VFH_SF_AVX512VL 1 "" "%0,0,v")) + (match_operand:VFH_SF_AVX512VL 2 "" ",v,") + (match_operand:VFH_SF_AVX512VL 3 "" "v,,0")))] "TARGET_AVX512F && && " "@ vfnmadd132\t{%2, %3, %0|%0, %3, %2} @@ -4720,13 +4722,13 @@ (define_insn "fma_fnmadd_" (set_attr "mode" "")]) (define_insn "_fnmadd__mask" - [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v,v") - (vec_merge:VF_AVX512VL - (fma:VF_AVX512VL - (neg:VF_AVX512VL - (match_operand:VF_AVX512VL 1 "register_operand" "0,0")) - (match_operand:VF_AVX512VL 2 "" ",v") - (match_operand:VF_AVX512VL 3 "" "v,")) + [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v,v") + (vec_merge:VFH_AVX512VL + (fma:VFH_AVX512VL + (neg:VFH_AVX512VL + (match_operand:VFH_AVX512VL 1 "register_operand" "0,0")) + (match_operand:VFH_AVX512VL 2 "" ",v") + (match_operand:VFH_AVX512VL 3 "" "v,")) (match_dup 1) (match_operand: 4 "register_operand" "Yk,Yk")))] "TARGET_AVX512F && " @@ -4737,13 +4739,13 @@ (define_insn "_fnmadd__mask" (set_attr "mode" "")]) (define_insn "_fnmadd__mask3" - [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v") - (vec_merge:VF_AVX512VL - (fma:VF_AVX512VL - (neg:VF_AVX512VL - (match_operand:VF_AVX512VL 1 "" "%v")) - (match_operand:VF_AVX512VL 2 "" "") - (match_operand:VF_AVX512VL 3 "register_operand" "0")) + [(set (match_operand:VFH_AVX512VL 0 "register_operand" 
"=v") + (vec_merge:VFH_AVX512VL + (fma:VFH_AVX512VL + (neg:VFH_AVX512VL + (match_operand:VFH_AVX512VL 1 "" "%v")) + (match_operand:VFH_AVX512VL 2 "" "") + (match_operand:VFH_AVX512VL 3 "register_operand" "0")) (match_dup 3) (match_operand: 4 "register_operand" "Yk")))] "TARGET_AVX512F && " @@ -4771,10 +4773,10 @@ (define_insn "*fma_fnmsub_" (set_attr "mode" "")]) (define_expand "_fnmsub__maskz" - [(match_operand:VF_AVX512VL 0 "register_operand") - (match_operand:VF_AVX512VL 1 "") - (match_operand:VF_AVX512VL 2 "") - (match_operand:VF_AVX512VL 3 "") + [(match_operand:VFH_AVX512VL 0 "register_operand") + (match_operand:VFH_AVX512VL 1 "") + (match_operand:VFH_AVX512VL 2 "") + (match_operand:VFH_AVX512VL 3 "") (match_operand: 4 "register_operand")] "TARGET_AVX512F && " { @@ -4785,13 +4787,13 @@ (define_expand "_fnmsub__maskz" }) (define_insn "fma_fnmsub_" - [(set (match_operand:VF_SF_AVX512VL 0 "register_operand" "=v,v,v") - (fma:VF_SF_AVX512VL - (neg:VF_SF_AVX512VL - (match_operand:VF_SF_AVX512VL 1 "" "%0,0,v")) - (match_operand:VF_SF_AVX512VL 2 "" ",v,") - (neg:VF_SF_AVX512VL - (match_operand:VF_SF_AVX512VL 3 "" "v,,0"))))] + [(set (match_operand:VFH_SF_AVX512VL 0 "register_operand" "=v,v,v") + (fma:VFH_SF_AVX512VL + (neg:VFH_SF_AVX512VL + (match_operand:VFH_SF_AVX512VL 1 "" "%0,0,v")) + (match_operand:VFH_SF_AVX512VL 2 "" ",v,") + (neg:VFH_SF_AVX512VL + (match_operand:VFH_SF_AVX512VL 3 "" "v,,0"))))] "TARGET_AVX512F && && " "@ vfnmsub132\t{%2, %3, %0|%0, %3, %2} @@ -4801,14 +4803,14 @@ (define_insn "fma_fnmsub_" (set_attr "mode" "")]) (define_insn "_fnmsub__mask" - [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v,v") - (vec_merge:VF_AVX512VL - (fma:VF_AVX512VL - (neg:VF_AVX512VL - (match_operand:VF_AVX512VL 1 "register_operand" "0,0")) - (match_operand:VF_AVX512VL 2 "" ",v") - (neg:VF_AVX512VL - (match_operand:VF_AVX512VL 3 "" "v,"))) + [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v,v") + (vec_merge:VFH_AVX512VL + (fma:VFH_AVX512VL + 
(neg:VFH_AVX512VL + (match_operand:VFH_AVX512VL 1 "register_operand" "0,0")) + (match_operand:VFH_AVX512VL 2 "" ",v") + (neg:VFH_AVX512VL + (match_operand:VFH_AVX512VL 3 "" "v,"))) (match_dup 1) (match_operand: 4 "register_operand" "Yk,Yk")))] "TARGET_AVX512F && " @@ -4819,14 +4821,14 @@ (define_insn "_fnmsub__mask" (set_attr "mode" "")]) (define_insn "_fnmsub__mask3" - [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v") - (vec_merge:VF_AVX512VL - (fma:VF_AVX512VL - (neg:VF_AVX512VL - (match_operand:VF_AVX512VL 1 "" "%v")) - (match_operand:VF_AVX512VL 2 "" "") - (neg:VF_AVX512VL - (match_operand:VF_AVX512VL 3 "register_operand" "0"))) + [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v") + (vec_merge:VFH_AVX512VL + (fma:VFH_AVX512VL + (neg:VFH_AVX512VL + (match_operand:VFH_AVX512VL 1 "" "%v")) + (match_operand:VFH_AVX512VL 2 "" "") + (neg:VFH_AVX512VL + (match_operand:VFH_AVX512VL 3 "register_operand" "0"))) (match_dup 3) (match_operand: 4 "register_operand" "Yk")))] "TARGET_AVX512F" diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c index 51a0cf2fe87..d2ab16538d8 100644 --- a/gcc/testsuite/gcc.target/i386/avx-1.c +++ b/gcc/testsuite/gcc.target/i386/avx-1.c @@ -763,6 +763,18 @@ #define __builtin_ia32_vfmsubaddph512_mask(A, B, C, D, E) __builtin_ia32_vfmsubaddph512_mask(A, B, C, D, 8) #define __builtin_ia32_vfmsubaddph512_mask3(A, B, C, D, E) __builtin_ia32_vfmsubaddph512_mask3(A, B, C, D, 8) #define __builtin_ia32_vfmsubaddph512_maskz(A, B, C, D, E) __builtin_ia32_vfmsubaddph512_maskz(A, B, C, D, 8) +#define __builtin_ia32_vfmaddph512_mask(A, B, C, D, E) __builtin_ia32_vfmaddph512_mask(A, B, C, D, 8) +#define __builtin_ia32_vfmaddph512_mask3(A, B, C, D, E) __builtin_ia32_vfmaddph512_mask3(A, B, C, D, 8) +#define __builtin_ia32_vfmaddph512_maskz(A, B, C, D, E) __builtin_ia32_vfmaddph512_maskz(A, B, C, D, 8) +#define __builtin_ia32_vfnmaddph512_mask(A, B, C, D, E) __builtin_ia32_vfnmaddph512_mask(A, B, C, D, 8) 
+#define __builtin_ia32_vfnmaddph512_mask3(A, B, C, D, E) __builtin_ia32_vfnmaddph512_mask3(A, B, C, D, 8) +#define __builtin_ia32_vfnmaddph512_maskz(A, B, C, D, E) __builtin_ia32_vfnmaddph512_maskz(A, B, C, D, 8) +#define __builtin_ia32_vfmsubph512_mask(A, B, C, D, E) __builtin_ia32_vfmsubph512_mask(A, B, C, D, 8) +#define __builtin_ia32_vfmsubph512_mask3(A, B, C, D, E) __builtin_ia32_vfmsubph512_mask3(A, B, C, D, 8) +#define __builtin_ia32_vfmsubph512_maskz(A, B, C, D, E) __builtin_ia32_vfmsubph512_maskz(A, B, C, D, 8) +#define __builtin_ia32_vfnmsubph512_mask(A, B, C, D, E) __builtin_ia32_vfnmsubph512_mask(A, B, C, D, 8) +#define __builtin_ia32_vfnmsubph512_mask3(A, B, C, D, E) __builtin_ia32_vfnmsubph512_mask3(A, B, C, D, 8) +#define __builtin_ia32_vfnmsubph512_maskz(A, B, C, D, E) __builtin_ia32_vfnmsubph512_maskz(A, B, C, D, 8) /* avx512fp16vlintrin.h */ #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D) diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c index a53f4653908..49c72f6fcef 100644 --- a/gcc/testsuite/gcc.target/i386/sse-13.c +++ b/gcc/testsuite/gcc.target/i386/sse-13.c @@ -780,6 +780,18 @@ #define __builtin_ia32_vfmsubaddph512_mask(A, B, C, D, E) __builtin_ia32_vfmsubaddph512_mask(A, B, C, D, 8) #define __builtin_ia32_vfmsubaddph512_mask3(A, B, C, D, E) __builtin_ia32_vfmsubaddph512_mask3(A, B, C, D, 8) #define __builtin_ia32_vfmsubaddph512_maskz(A, B, C, D, E) __builtin_ia32_vfmsubaddph512_maskz(A, B, C, D, 8) +#define __builtin_ia32_vfmaddph512_mask(A, B, C, D, E) __builtin_ia32_vfmaddph512_mask(A, B, C, D, 8) +#define __builtin_ia32_vfmaddph512_mask3(A, B, C, D, E) __builtin_ia32_vfmaddph512_mask3(A, B, C, D, 8) +#define __builtin_ia32_vfmaddph512_maskz(A, B, C, D, E) __builtin_ia32_vfmaddph512_maskz(A, B, C, D, 8) +#define __builtin_ia32_vfnmaddph512_mask(A, B, C, D, E) __builtin_ia32_vfnmaddph512_mask(A, B, C, D, 8) +#define 
__builtin_ia32_vfnmaddph512_mask3(A, B, C, D, E) __builtin_ia32_vfnmaddph512_mask3(A, B, C, D, 8) +#define __builtin_ia32_vfnmaddph512_maskz(A, B, C, D, E) __builtin_ia32_vfnmaddph512_maskz(A, B, C, D, 8) +#define __builtin_ia32_vfmsubph512_mask(A, B, C, D, E) __builtin_ia32_vfmsubph512_mask(A, B, C, D, 8) +#define __builtin_ia32_vfmsubph512_mask3(A, B, C, D, E) __builtin_ia32_vfmsubph512_mask3(A, B, C, D, 8) +#define __builtin_ia32_vfmsubph512_maskz(A, B, C, D, E) __builtin_ia32_vfmsubph512_maskz(A, B, C, D, 8) +#define __builtin_ia32_vfnmsubph512_mask(A, B, C, D, E) __builtin_ia32_vfnmsubph512_mask(A, B, C, D, 8) +#define __builtin_ia32_vfnmsubph512_mask3(A, B, C, D, E) __builtin_ia32_vfnmsubph512_mask3(A, B, C, D, 8) +#define __builtin_ia32_vfnmsubph512_maskz(A, B, C, D, E) __builtin_ia32_vfnmsubph512_maskz(A, B, C, D, 8) /* avx512fp16vlintrin.h */ #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D) diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c index 48895e0dd0d..9151e50afd2 100644 --- a/gcc/testsuite/gcc.target/i386/sse-14.c +++ b/gcc/testsuite/gcc.target/i386/sse-14.c @@ -838,6 +838,10 @@ test_3 (_mm_maskz_cvt_roundss_sh, __m128h, __mmask8, __m128h, __m128, 8) test_3 (_mm_maskz_cvt_roundsd_sh, __m128h, __mmask8, __m128h, __m128d, 8) test_3 (_mm512_fmaddsub_round_ph, __m512h, __m512h, __m512h, __m512h, 9) test_3 (_mm512_fmsubadd_round_ph, __m512h, __m512h, __m512h, __m512h, 9) +test_3 (_mm512_fmadd_round_ph, __m512h, __m512h, __m512h, __m512h, 9) +test_3 (_mm512_fnmadd_round_ph, __m512h, __m512h, __m512h, __m512h, 9) +test_3 (_mm512_fmsub_round_ph, __m512h, __m512h, __m512h, __m512h, 9) +test_3 (_mm512_fnmsub_round_ph, __m512h, __m512h, __m512h, __m512h, 9) test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8) test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8) test_3x (_mm512_mask_reduce_round_ph, __m512h, 
__m512h, __mmask32, __m512h, 123, 8) @@ -876,6 +880,18 @@ test_4 (_mm512_maskz_fmaddsub_round_ph, __m512h, __mmask32, __m512h, __m512h, __ test_4 (_mm512_mask3_fmsubadd_round_ph, __m512h, __m512h, __m512h, __m512h, __mmask32, 9) test_4 (_mm512_mask_fmsubadd_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 9) test_4 (_mm512_maskz_fmsubadd_round_ph, __m512h, __mmask32, __m512h, __m512h, __m512h, 9) +test_4 (_mm512_mask_fmadd_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 9) +test_4 (_mm512_mask3_fmadd_round_ph, __m512h, __m512h, __m512h, __m512h, __mmask32, 9) +test_4 (_mm512_maskz_fmadd_round_ph, __m512h, __mmask32, __m512h, __m512h, __m512h, 9) +test_4 (_mm512_mask_fnmadd_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 9) +test_4 (_mm512_mask3_fnmadd_round_ph, __m512h, __m512h, __m512h, __m512h, __mmask32, 9) +test_4 (_mm512_maskz_fnmadd_round_ph, __m512h, __mmask32, __m512h, __m512h, __m512h, 9) +test_4 (_mm512_mask_fmsub_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 9) +test_4 (_mm512_mask3_fmsub_round_ph, __m512h, __m512h, __m512h, __m512h, __mmask32, 9) +test_4 (_mm512_maskz_fmsub_round_ph, __m512h, __mmask32, __m512h, __m512h, __m512h, 9) +test_4 (_mm512_mask_fnmsub_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 9) +test_4 (_mm512_mask3_fnmsub_round_ph, __m512h, __m512h, __m512h, __m512h, __mmask32, 9) +test_4 (_mm512_maskz_fnmsub_round_ph, __m512h, __mmask32, __m512h, __m512h, __m512h, 9) test_4x (_mm_mask_reduce_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8) test_4x (_mm_mask_roundscale_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8) test_4x (_mm_mask_getmant_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 1, 1) diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c index bc530da388b..892b6334ae2 100644 --- a/gcc/testsuite/gcc.target/i386/sse-22.c +++ b/gcc/testsuite/gcc.target/i386/sse-22.c @@ -941,6 +941,10 @@ test_3 
(_mm_maskz_cvt_roundss_sh, __m128h, __mmask8, __m128h, __m128, 8) test_3 (_mm_maskz_cvt_roundsd_sh, __m128h, __mmask8, __m128h, __m128d, 8) test_3 (_mm512_fmaddsub_round_ph, __m512h, __m512h, __m512h, __m512h, 9) test_3 (_mm512_fmsubadd_round_ph, __m512h, __m512h, __m512h, __m512h, 9) +test_3 (_mm512_fmadd_round_ph, __m512h, __m512h, __m512h, __m512h, 9) +test_3 (_mm512_fnmadd_round_ph, __m512h, __m512h, __m512h, __m512h, 9) +test_3 (_mm512_fmsub_round_ph, __m512h, __m512h, __m512h, __m512h, 9) +test_3 (_mm512_fnmsub_round_ph, __m512h, __m512h, __m512h, __m512h, 9) test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8) test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8) test_3x (_mm512_mask_reduce_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8) @@ -978,6 +982,18 @@ test_4 (_mm512_maskz_fmaddsub_round_ph, __m512h, __mmask32, __m512h, __m512h, __ test_4 (_mm512_mask3_fmsubadd_round_ph, __m512h, __m512h, __m512h, __m512h, __mmask32, 9) test_4 (_mm512_mask_fmsubadd_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 9) test_4 (_mm512_maskz_fmsubadd_round_ph, __m512h, __mmask32, __m512h, __m512h, __m512h, 9) +test_4 (_mm512_mask_fmadd_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 9) +test_4 (_mm512_mask3_fmadd_round_ph, __m512h, __m512h, __m512h, __m512h, __mmask32, 9) +test_4 (_mm512_maskz_fmadd_round_ph, __m512h, __mmask32, __m512h, __m512h, __m512h, 9) +test_4 (_mm512_mask_fnmadd_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 9) +test_4 (_mm512_mask3_fnmadd_round_ph, __m512h, __m512h, __m512h, __m512h, __mmask32, 9) +test_4 (_mm512_maskz_fnmadd_round_ph, __m512h, __mmask32, __m512h, __m512h, __m512h, 9) +test_4 (_mm512_mask_fmsub_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 9) +test_4 (_mm512_mask3_fmsub_round_ph, __m512h, __m512h, __m512h, __m512h, __mmask32, 9) +test_4 (_mm512_maskz_fmsub_round_ph, __m512h, __mmask32, __m512h, __m512h, __m512h, 9) +test_4 
(_mm512_mask_fnmsub_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 9) +test_4 (_mm512_mask3_fnmsub_round_ph, __m512h, __m512h, __m512h, __m512h, __mmask32, 9) +test_4 (_mm512_maskz_fnmsub_round_ph, __m512h, __mmask32, __m512h, __m512h, __m512h, 9) test_4x (_mm_mask_reduce_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8) test_4x (_mm_mask_roundscale_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8) test_4x (_mm_mask_getmant_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 1, 1) diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c index df43931ca97..447b83829f3 100644 --- a/gcc/testsuite/gcc.target/i386/sse-23.c +++ b/gcc/testsuite/gcc.target/i386/sse-23.c @@ -781,6 +781,18 @@ #define __builtin_ia32_vfmsubaddph512_mask(A, B, C, D, E) __builtin_ia32_vfmsubaddph512_mask(A, B, C, D, 8) #define __builtin_ia32_vfmsubaddph512_mask3(A, B, C, D, E) __builtin_ia32_vfmsubaddph512_mask3(A, B, C, D, 8) #define __builtin_ia32_vfmsubaddph512_maskz(A, B, C, D, E) __builtin_ia32_vfmsubaddph512_maskz(A, B, C, D, 8) +#define __builtin_ia32_vfmaddph512_mask(A, B, C, D, E) __builtin_ia32_vfmaddph512_mask(A, B, C, D, 8) +#define __builtin_ia32_vfmaddph512_mask3(A, B, C, D, E) __builtin_ia32_vfmaddph512_mask3(A, B, C, D, 8) +#define __builtin_ia32_vfmaddph512_maskz(A, B, C, D, E) __builtin_ia32_vfmaddph512_maskz(A, B, C, D, 8) +#define __builtin_ia32_vfnmaddph512_mask(A, B, C, D, E) __builtin_ia32_vfnmaddph512_mask(A, B, C, D, 8) +#define __builtin_ia32_vfnmaddph512_mask3(A, B, C, D, E) __builtin_ia32_vfnmaddph512_mask3(A, B, C, D, 8) +#define __builtin_ia32_vfnmaddph512_maskz(A, B, C, D, E) __builtin_ia32_vfnmaddph512_maskz(A, B, C, D, 8) +#define __builtin_ia32_vfmsubph512_mask(A, B, C, D, E) __builtin_ia32_vfmsubph512_mask(A, B, C, D, 8) +#define __builtin_ia32_vfmsubph512_mask3(A, B, C, D, E) __builtin_ia32_vfmsubph512_mask3(A, B, C, D, 8) +#define __builtin_ia32_vfmsubph512_maskz(A, B, C, D, E) 
__builtin_ia32_vfmsubph512_maskz(A, B, C, D, 8) +#define __builtin_ia32_vfnmsubph512_mask(A, B, C, D, E) __builtin_ia32_vfnmsubph512_mask(A, B, C, D, 8) +#define __builtin_ia32_vfnmsubph512_mask3(A, B, C, D, E) __builtin_ia32_vfnmsubph512_mask3(A, B, C, D, 8) +#define __builtin_ia32_vfnmsubph512_maskz(A, B, C, D, E) __builtin_ia32_vfnmsubph512_maskz(A, B, C, D, 8) /* avx512fp16vlintrin.h */ #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)

From patchwork Thu Jul 1 06:16:29 2021
From: liuhongt
To: gcc-patches@gcc.gnu.org
Cc: jakub@redhat.com
Subject: [PATCH 43/62] AVX512FP16: Add testcase for fma instructions
Date: Thu, 1 Jul 2021 14:16:29 +0800
Message-Id: <20210701061648.9447-44-hongtao.liu@intel.com>
In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com>
References: <20210701061648.9447-1-hongtao.liu@intel.com>

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-vfmaddXXXph-1a.c: New test.
	* gcc.target/i386/avx512fp16-vfmaddXXXph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vfmsubXXXph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vfmsubXXXph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vfnmaddXXXph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vfnmaddXXXph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vfnmsubXXXph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vfnmsubXXXph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vfmaddXXXph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vfmaddXXXph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vfmsubXXXph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vfmsubXXXph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vfnmaddXXXph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vfnmaddXXXph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vfnmsubXXXph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vfnmsubXXXph-1b.c: Ditto.
---
 .../i386/avx512fp16-vfmaddXXXph-1a.c          |  28 +++
 .../i386/avx512fp16-vfmaddXXXph-1b.c          | 160 ++++++++++++++++++
 .../i386/avx512fp16-vfmsubXXXph-1a.c          |  32 ++++
 .../i386/avx512fp16-vfmsubXXXph-1b.c          | 155 +++++++++++++++++
 .../i386/avx512fp16-vfnmaddXXXph-1a.c         |  28 +++
 .../i386/avx512fp16-vfnmaddXXXph-1b.c         | 159 +++++++++++++++++
 .../i386/avx512fp16-vfnmsubXXXph-1a.c         |  32 ++++
 .../i386/avx512fp16-vfnmsubXXXph-1b.c         | 157 +++++++++++++++++
 .../i386/avx512fp16vl-vfmaddXXXph-1a.c        |  28 +++
 .../i386/avx512fp16vl-vfmaddXXXph-1b.c        |  15 ++
 .../i386/avx512fp16vl-vfmsubXXXph-1a.c        |  28 +++
 .../i386/avx512fp16vl-vfmsubXXXph-1b.c        |  15 ++
 .../i386/avx512fp16vl-vfnmaddXXXph-1a.c       |  28 +++
 .../i386/avx512fp16vl-vfnmaddXXXph-1b.c       |  15 ++
 .../i386/avx512fp16vl-vfnmsubXXXph-1a.c       |  28 +++
 .../i386/avx512fp16vl-vfnmsubXXXph-1b.c       |  15 ++
 16 files changed, 923 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddXXXph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddXXXph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubXXXph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubXXXph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfnmaddXXXph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfnmaddXXXph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfnmsubXXXph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfnmsubXXXph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddXXXph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddXXXph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmsubXXXph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmsubXXXph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfnmaddXXXph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfnmaddXXXph-1b.c
 create mode 100644
gcc/testsuite/gcc.target/i386/avx512fp16vl-vfnmsubXXXph-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfnmsubXXXph-1b.c diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddXXXph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddXXXph-1a.c new file mode 100644 index 00000000000..f9e2777196a --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddXXXph-1a.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vfmadd...ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmadd...ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vfmadd231ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmadd...ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmadd...ph\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmadd...ph\[ \\t\]+\[^\n\]*\{rd-sae\}\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmadd231ph\[ \\t\]+\[^\n\]*\{ru-sae\}\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmadd...ph\[ \\t\]+\[^\n\]*\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m512h x1, x2, x3; +volatile __mmask32 m; + +void extern +avx512f_test (void) +{ + x1 = _mm512_fmadd_ph (x1, x2, x3); + x1 = _mm512_mask_fmadd_ph (x1, m, x2, x3); + x3 = _mm512_mask3_fmadd_ph (x1, x2, x3, m); + x1 = _mm512_maskz_fmadd_ph (m, x1, x2, x3); + x1 = _mm512_fmadd_round_ph (x1, x2, x3, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC); + x1 = _mm512_mask_fmadd_round_ph (x1, m, x2, x3, _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC); + x3 = 
_mm512_mask3_fmadd_round_ph (x1, x2, x3, m, _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC); + x1 = _mm512_maskz_fmadd_round_ph (m, x1, x2, x3, _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddXXXph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddXXXph-1b.c new file mode 100644 index 00000000000..71c2b8fb930 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddXXXph-1b.c @@ -0,0 +1,160 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS (AVX512F_LEN / 16) + +void NOINLINE +EMULATE(fmadd_ph) (V512 * dest, V512 op1, V512 op2, + __mmask32 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + __mmask16 m1, m2; + + m1 = k & 0xffff; + m2 = (k >> 16) & 0xffff; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(op2, &v3, &v4); + unpack_ph_2twops(*dest, &v7, &v8); + + for (i = 0; i < 16; i++) { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + v5.f32[i] = 0; + } + else { + v5.u32[i] = v7.u32[i]; + } + } + else { + v5.f32[i] = v1.f32[i] * v3.f32[i] + v7.f32[i]; + } + + if (((1 << i) & m2) == 0) { + if (zero_mask) { + v6.f32[i] = 0; + } + else { + v6.u32[i] = v8.u32[i]; + } + } + else { + v6.f32[i] = v2.f32[i] * v4.f32[i] + v8.f32[i]; + } + + } + *dest = pack_twops_2ph(v5, v6); +} + +void NOINLINE +EMULATE(m_fmadd_ph) (V512 * dest, V512 op1, V512 op2, + __mmask32 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + __mmask16 m1, m2; + + m1 = k & 0xffff; + m2 = (k >> 16) & 0xffff; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(op2, &v3, &v4); + unpack_ph_2twops(*dest, &v7, &v8); + + for (i = 0; i < 16; i++) { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + v5.f32[i] = 0; + } + else { + v5.u32[i] = v7.u32[i]; + } + } + else { + v5.f32[i] = v7.f32[i] * v1.f32[i] + v3.f32[i]; + } + + if (((1 << i) & m2) == 0) { + if (zero_mask) { + 
v6.f32[i] = 0; + } + else { + v6.u32[i] = v8.u32[i]; + } + } + else { + v6.f32[i] = v8.f32[i] * v2.f32[i] + v4.f32[i]; + } + + } + *dest = pack_twops_2ph(v5, v6); +} + +void +TEST (void) +{ + V512 res; + V512 exp; + + init_src(); + + init_dest(&res, &exp); + EMULATE(fmadd_ph)(&exp, src1, src2, NET_MASK, 0); + HF(res) = INTRINSIC (_fmadd_ph) (HF(src1), HF(src2), + HF(res)); + CHECK_RESULT (&res, &exp, N_ELEMS, _fmadd_ph); + + init_dest(&res, &exp); + EMULATE(m_fmadd_ph)(&exp, src1, src2, MASK_VALUE, 0); + HF(res) = INTRINSIC (_mask_fmadd_ph) (HF(res), MASK_VALUE, + HF(src1), HF(src2)); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_fmadd_ph); + + init_dest(&res, &exp); + EMULATE(fmadd_ph)(&exp, src1, src2, MASK_VALUE, 0); + HF(res) = INTRINSIC (_mask3_fmadd_ph) (HF(src1), HF(src2), + HF(res), MASK_VALUE); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask3_fmadd_ph); + + init_dest(&res, &exp); + EMULATE(fmadd_ph)(&exp, src1, src2, ZMASK_VALUE, 1); + HF(res) = INTRINSIC (_maskz_fmadd_ph) (ZMASK_VALUE, HF(src1), + HF(src2), HF(res)); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_fmadd_ph); + +#if AVX512F_LEN == 512 + init_dest(&res, &exp); + EMULATE(fmadd_ph)(&exp, src1, src2, NET_MASK, 0); + HF(res) = INTRINSIC (_fmadd_round_ph) (HF(src1), HF(src2), + HF(res), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _fmadd_ph); + + init_dest(&res, &exp); + EMULATE(m_fmadd_ph)(&exp, src1, src2, MASK_VALUE, 0); + HF(res) = INTRINSIC (_mask_fmadd_round_ph) (HF(res), MASK_VALUE, HF(src1), + HF(src2), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_fmadd_ph); + + EMULATE(fmadd_ph)(&exp, src1, src2, MASK_VALUE, 0); + HF(res) = INTRINSIC (_mask3_fmadd_round_ph) (HF(src1), HF(src2), HF(res), + MASK_VALUE, _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask3_fmadd_ph); + + init_dest(&res, &exp); + EMULATE(fmadd_ph)(&exp, src1, src2, ZMASK_VALUE, 1); + HF(res) = INTRINSIC (_maskz_fmadd_round_ph) (ZMASK_VALUE, HF(src1), HF(src2), + HF(res), _ROUND_NINT); + CHECK_RESULT (&res, &exp, 
N_ELEMS, _maskz_fmadd_ph); +#endif + + if (n_errs != 0) { + abort (); + } +} + + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubXXXph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubXXXph-1a.c new file mode 100644 index 00000000000..3b1147a41cd --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubXXXph-1a.c @@ -0,0 +1,32 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vfmsub...ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmsub...ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vfmsub231ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmsub...ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmsub...ph\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmsub...ph\[ \\t\]+\[^\n\]*\{rd-sae\}\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmsub231ph\[ \\t\]+\[^\n\]*\{ru-sae\}\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmsub...ph\[ \\t\]+\[^\n\]*\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m512h x1, x2, x3; +volatile __mmask32 m; + +void extern +avx512f_test (void) +{ + x1 = _mm512_fmsub_ph (x1, x2, x3); + x1 = _mm512_mask_fmsub_ph (x1, m, x2, x3); + x3 = _mm512_mask3_fmsub_ph (x1, x2, x3, m); + x1 = _mm512_maskz_fmsub_ph (m, x1, x2, x3); + x1 = _mm512_fmsub_round_ph (x1, x2, x3, _MM_FROUND_TO_NEAREST_INT + | _MM_FROUND_NO_EXC); + x1 = _mm512_mask_fmsub_round_ph (x1, m, x2, x3, _MM_FROUND_TO_NEG_INF + | _MM_FROUND_NO_EXC); + x3 = _mm512_mask3_fmsub_round_ph (x1, x2, x3, m, _MM_FROUND_TO_POS_INF + 
| _MM_FROUND_NO_EXC); + x1 = _mm512_maskz_fmsub_round_ph (m, x1, x2, x3, _MM_FROUND_TO_ZERO + | _MM_FROUND_NO_EXC); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubXXXph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubXXXph-1b.c new file mode 100644 index 00000000000..abb9a9bc826 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubXXXph-1b.c @@ -0,0 +1,155 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS (AVX512F_LEN / 16) + +void NOINLINE +EMULATE(fmsub_ph) (V512 * dest, V512 op1, V512 op2, + __mmask32 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + __mmask16 m1, m2; + + m1 = k & 0xffff; + m2 = (k >> 16) & 0xffff; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(op2, &v3, &v4); + unpack_ph_2twops(*dest, &v7, &v8); + + for (i = 0; i < 16; i++) { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + v5.f32[i] = 0; + } + else { + v5.u32[i] = v7.u32[i]; + } + } + else { + v5.f32[i] = v1.f32[i] * v3.f32[i] - v7.f32[i]; + } + + if (((1 << i) & m2) == 0) { + if (zero_mask) { + v6.f32[i] = 0; + } + else { + v6.u32[i] = v8.u32[i]; + } + } + else { + v6.f32[i] = v2.f32[i] * v4.f32[i] - v8.f32[i]; + } + + } + *dest = pack_twops_2ph(v5, v6); +} + +void NOINLINE +EMULATE(m_fmsub_ph) (V512 * dest, V512 op1, V512 op2, + __mmask32 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + __mmask16 m1, m2; + + m1 = k & 0xffff; + m2 = (k >> 16) & 0xffff; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(op2, &v3, &v4); + unpack_ph_2twops(*dest, &v7, &v8); + + for (i = 0; i < 16; i++) { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + v5.f32[i] = 0; + } + else { + v5.u32[i] = v7.u32[i]; + } + } + else { + v5.f32[i] = v7.f32[i] * v1.f32[i] - v3.f32[i]; + } + + if (((1 << i) & m2) == 0) { + if (zero_mask) { + v6.f32[i] = 0; + } + else { + v6.u32[i] = v8.u32[i]; + } + } + else { + 
v6.f32[i] = v8.f32[i] * v2.f32[i] - v4.f32[i]; + } + + } + *dest = pack_twops_2ph(v5, v6); +} + +void +TEST (void) +{ + V512 res; + V512 exp; + + init_src(); + + init_dest(&res, &exp); + EMULATE(fmsub_ph)(&exp, src1, src2, NET_MASK, 0); + HF(res) = INTRINSIC (_fmsub_ph) (HF(src1), HF(src2), HF(res)); + CHECK_RESULT (&res, &exp, N_ELEMS, _fmsub_ph); + + init_dest(&res, &exp); + EMULATE(m_fmsub_ph)(&exp, src1, src2, MASK_VALUE, 0); + HF(res) = INTRINSIC (_mask_fmsub_ph) (HF(res), MASK_VALUE, + HF(src1), HF(src2)); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_fmsub_ph); + + init_dest(&res, &exp); + EMULATE(fmsub_ph)(&exp, src1, src2, MASK_VALUE, 0); + HF(res) = INTRINSIC (_mask3_fmsub_ph) (HF(src1), HF(src2), HF(res), MASK_VALUE); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask3_fmsub_ph); + + init_dest(&res, &exp); + EMULATE(fmsub_ph)(&exp, src1, src2, ZMASK_VALUE, 1); + HF(res) = INTRINSIC (_maskz_fmsub_ph) (ZMASK_VALUE, HF(src1), HF(src2), HF(res)); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_fmsub_ph); + +#if AVX512F_LEN == 512 + init_dest(&res, &exp); + EMULATE(fmsub_ph)(&exp, src1, src2, NET_MASK, 0); + HF(res) = INTRINSIC (_fmsub_round_ph) (HF(src1), HF(src2), HF(res), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _fmsub_ph); + + init_dest(&res, &exp); + EMULATE(m_fmsub_ph)(&exp, src1, src2, MASK_VALUE, 0); + HF(res) = INTRINSIC (_mask_fmsub_round_ph) (HF(res), MASK_VALUE, + HF(src1), HF(src2), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_fmsub_ph); + + EMULATE(fmsub_ph)(&exp, src1, src2, MASK_VALUE, 0); + HF(res) = INTRINSIC (_mask3_fmsub_round_ph) (HF(src1), HF(src2), + HF(res), MASK_VALUE, _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask3_fmsub_ph); + + init_dest(&res, &exp); + EMULATE(fmsub_ph)(&exp, src1, src2, ZMASK_VALUE, 1); + HF(res) = INTRINSIC (_maskz_fmsub_round_ph) (ZMASK_VALUE, HF(src1), + HF(src2), HF(res), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_fmsub_ph); +#endif + + if (n_errs != 0) { + abort (); + } +} + 
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfnmaddXXXph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfnmaddXXXph-1a.c new file mode 100644 index 00000000000..20e77ce7398 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfnmaddXXXph-1a.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vfnmadd...ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfnmadd...ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vfnmadd231ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfnmadd...ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfnmadd...ph\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfnmadd...ph\[ \\t\]+\[^\n\]*\{rd-sae\}\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfnmadd231ph\[ \\t\]+\[^\n\]*\{ru-sae\}\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfnmadd...ph\[ \\t\]+\[^\n\]*\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m512h x1, x2, x3; +volatile __mmask32 m; + +void extern +avx512f_test (void) +{ + x1 = _mm512_fnmadd_ph (x1, x2, x3); + x1 = _mm512_mask_fnmadd_ph (x1, m, x2, x3); + x3 = _mm512_mask3_fnmadd_ph (x1, x2, x3, m); + x1 = _mm512_maskz_fnmadd_ph (m, x1, x2, x3); + x1 = _mm512_fnmadd_round_ph (x1, x2, x3, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC); + x1 = _mm512_mask_fnmadd_round_ph (x1, m, x2, x3, _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC); + x3 = _mm512_mask3_fnmadd_round_ph (x1, x2, x3, m, _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC); + x1 = _mm512_maskz_fnmadd_round_ph (m, x1, 
x2, x3, _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfnmaddXXXph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfnmaddXXXph-1b.c new file mode 100644 index 00000000000..b15b1bd1149 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfnmaddXXXph-1b.c @@ -0,0 +1,159 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS (AVX512F_LEN / 16) + +void NOINLINE +EMULATE(fnmadd_ph) (V512 * dest, V512 op1, V512 op2, + __mmask32 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + __mmask16 m1, m2; + + m1 = k & 0xffff; + m2 = (k >> 16) & 0xffff; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(op2, &v3, &v4); + unpack_ph_2twops(*dest, &v7, &v8); + + for (i = 0; i < 16; i++) { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + v5.f32[i] = 0; + } + else { + v5.u32[i] = v7.u32[i]; + } + } + else { + v5.f32[i] = -(v1.f32[i] * v3.f32[i]) + v7.f32[i]; + } + + if (((1 << i) & m2) == 0) { + if (zero_mask) { + v6.f32[i] = 0; + } + else { + v6.u32[i] = v8.u32[i]; + } + } + else { + v6.f32[i] = -(v2.f32[i] * v4.f32[i]) + v8.f32[i]; + } + + } + *dest = pack_twops_2ph(v5, v6); +} + +void NOINLINE +EMULATE(m_fnmadd_ph) (V512 * dest, V512 op1, V512 op2, + __mmask32 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + __mmask16 m1, m2; + + m1 = k & 0xffff; + m2 = (k >> 16) & 0xffff; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(op2, &v3, &v4); + unpack_ph_2twops(*dest, &v7, &v8); + + for (i = 0; i < 16; i++) { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + v5.f32[i] = 0; + } + else { + v5.u32[i] = v7.u32[i]; + } + } + else { + v5.f32[i] = -(v1.f32[i] * v7.f32[i]) + v3.f32[i]; + } + + if (((1 << i) & m2) == 0) { + if (zero_mask) { + v6.f32[i] = 0; + } + else { + v6.u32[i] = v8.u32[i]; + } + } + else { + v6.f32[i] = -(v2.f32[i] * v8.f32[i]) + v4.f32[i]; + } 
+ + } + *dest = pack_twops_2ph(v5, v6); +} + +void +TEST (void) +{ + V512 res; + V512 exp; + + init_src(); + + init_dest(&res, &exp); + EMULATE(fnmadd_ph)(&exp, src1, src2, NET_MASK, 0); + HF(res) = INTRINSIC (_fnmadd_ph) (HF(src1), HF(src2), + HF(res)); + CHECK_RESULT (&res, &exp, N_ELEMS, _fnmadd_ph); + + init_dest(&res, &exp); + EMULATE(m_fnmadd_ph)(&exp, src1, src2, MASK_VALUE, 0); + HF(res) = INTRINSIC (_mask_fnmadd_ph) (HF(res), MASK_VALUE, + HF(src1), HF(src2)); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_fnmadd_ph); + + init_dest(&res, &exp); + EMULATE(fnmadd_ph)(&exp, src1, src2, MASK_VALUE, 0); + HF(res) = INTRINSIC (_mask3_fnmadd_ph) (HF(src1), HF(src2), + HF(res), MASK_VALUE); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask3_fnmadd_ph); + + init_dest(&res, &exp); + EMULATE(fnmadd_ph)(&exp, src1, src2, ZMASK_VALUE, 1); + HF(res) = INTRINSIC (_maskz_fnmadd_ph) (ZMASK_VALUE, HF(src1), + HF(src2), HF(res)); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_fnmadd_ph); + +#if AVX512F_LEN == 512 + init_dest(&res, &exp); + EMULATE(fnmadd_ph)(&exp, src1, src2, NET_MASK, 0); + HF(res) = INTRINSIC (_fnmadd_round_ph) (HF(src1), HF(src2), + HF(res), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _fnmadd_ph); + + init_dest(&res, &exp); + EMULATE(m_fnmadd_ph)(&exp, src1, src2, MASK_VALUE, 0); + HF(res) = INTRINSIC (_mask_fnmadd_round_ph) (HF(res), MASK_VALUE, + HF(src1), HF(src2), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_fnmadd_ph); + + EMULATE(fnmadd_ph)(&exp, src1, src2, MASK_VALUE, 0); + HF(res) = INTRINSIC (_mask3_fnmadd_round_ph) (HF(src1), HF(src2), + HF(res), MASK_VALUE, _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask3_fnmadd_ph); + + init_dest(&res, &exp); + EMULATE(fnmadd_ph)(&exp, src1, src2, ZMASK_VALUE, 1); + HF(res) = INTRINSIC (_maskz_fnmadd_round_ph) (ZMASK_VALUE, HF(src1), + HF(src2), HF(res), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_fnmadd_ph); +#endif + + if (n_errs != 0) { + abort (); + } +} + diff --git 
a/gcc/testsuite/gcc.target/i386/avx512fp16-vfnmsubXXXph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfnmsubXXXph-1a.c new file mode 100644 index 00000000000..eb05de46347 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfnmsubXXXph-1a.c @@ -0,0 +1,32 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vfnmsub...ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfnmsub...ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vfnmsub231ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfnmsub...ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfnmsub...ph\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfnmsub...ph\[ \\t\]+\[^\n\]*\{rd-sae\}\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfnmsub231ph\[ \\t\]+\[^\n\]*\{ru-sae\}\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfnmsub...ph\[ \\t\]+\[^\n\]*\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m512h x1, x2, x3; +volatile __mmask32 m; + +void extern +avx512f_test (void) +{ + x1 = _mm512_fnmsub_ph (x1, x2, x3); + x1 = _mm512_mask_fnmsub_ph (x1, m, x2, x3); + x3 = _mm512_mask3_fnmsub_ph (x1, x2, x3, m); + x1 = _mm512_maskz_fnmsub_ph (m, x1, x2, x3); + x1 = _mm512_fnmsub_round_ph (x1, x2, x3, _MM_FROUND_TO_NEAREST_INT + | _MM_FROUND_NO_EXC); + x1 = _mm512_mask_fnmsub_round_ph (x1, m, x2, x3, _MM_FROUND_TO_NEG_INF + | _MM_FROUND_NO_EXC); + x3 = _mm512_mask3_fnmsub_round_ph (x1, x2, x3, m, _MM_FROUND_TO_POS_INF + | _MM_FROUND_NO_EXC); + x1 = _mm512_maskz_fnmsub_round_ph (m, x1, x2, x3, 
_MM_FROUND_TO_ZERO + | _MM_FROUND_NO_EXC); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfnmsubXXXph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfnmsubXXXph-1b.c new file mode 100644 index 00000000000..73f0172ca20 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfnmsubXXXph-1b.c @@ -0,0 +1,157 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS (AVX512F_LEN / 16) + +void NOINLINE +EMULATE(fnmsub_ph) (V512 * dest, V512 op1, V512 op2, + __mmask32 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + __mmask16 m1, m2; + + m1 = k & 0xffff; + m2 = (k >> 16) & 0xffff; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(op2, &v3, &v4); + unpack_ph_2twops(*dest, &v7, &v8); + + for (i = 0; i < 16; i++) { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + v5.f32[i] = 0; + } + else { + v5.u32[i] = v7.u32[i]; + } + } + else { + v5.f32[i] = -(v1.f32[i] * v3.f32[i]) - v7.f32[i]; + } + + if (((1 << i) & m2) == 0) { + if (zero_mask) { + v6.f32[i] = 0; + } + else { + v6.u32[i] = v8.u32[i]; + } + } + else { + v6.f32[i] = -(v2.f32[i] * v4.f32[i]) - v8.f32[i]; + } + + } + *dest = pack_twops_2ph(v5, v6); +} + +void NOINLINE +EMULATE(m_fnmsub_ph) (V512 * dest, V512 op1, V512 op2, + __mmask32 k, int zero_mask) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + __mmask16 m1, m2; + + m1 = k & 0xffff; + m2 = (k >> 16) & 0xffff; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(op2, &v3, &v4); + unpack_ph_2twops(*dest, &v7, &v8); + + for (i = 0; i < 16; i++) { + if (((1 << i) & m1) == 0) { + if (zero_mask) { + v5.f32[i] = 0; + } + else { + v5.u32[i] = v7.u32[i]; + } + } + else { + v5.f32[i] = -(v1.f32[i] * v7.f32[i]) - v3.f32[i]; + } + + if (((1 << i) & m2) == 0) { + if (zero_mask) { + v6.f32[i] = 0; + } + else { + v6.u32[i] = v8.u32[i]; + } + } + else { + v6.f32[i] = -(v2.f32[i] * v8.f32[i]) - v4.f32[i]; + } + + } 
+ *dest = pack_twops_2ph(v5, v6); +} + +void +TEST (void) +{ + V512 res; + V512 exp; + + init_src(); + + init_dest(&res, &exp); + EMULATE(fnmsub_ph)(&exp, src1, src2, NET_MASK, 0); + HF(res) = INTRINSIC (_fnmsub_ph) (HF(src1), HF(src2), + HF(res)); + CHECK_RESULT (&res, &exp, N_ELEMS, _fnmsub_ph); + + init_dest(&res, &exp); + EMULATE(m_fnmsub_ph)(&exp, src1, src2, MASK_VALUE, 0); + HF(res) = INTRINSIC (_mask_fnmsub_ph) (HF(res), MASK_VALUE, + HF(src1), HF(src2)); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_fnmsub_ph); + + init_dest(&res, &exp); + EMULATE(fnmsub_ph)(&exp, src1, src2, MASK_VALUE, 0); + HF(res) = INTRINSIC (_mask3_fnmsub_ph) (HF(src1), HF(src2), HF(res), MASK_VALUE); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask3_fnmsub_ph); + + init_dest(&res, &exp); + EMULATE(fnmsub_ph)(&exp, src1, src2, ZMASK_VALUE, 1); + HF(res) = INTRINSIC (_maskz_fnmsub_ph) (ZMASK_VALUE, HF(src1), HF(src2), HF(res)); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_fnmsub_ph); + +#if AVX512F_LEN == 512 + init_dest(&res, &exp); + EMULATE(fnmsub_ph)(&exp, src1, src2, NET_MASK, 0); + HF(res) = INTRINSIC (_fnmsub_round_ph) (HF(src1), HF(src2), + HF(res), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _fnmsub_ph); + + init_dest(&res, &exp); + EMULATE(m_fnmsub_ph)(&exp, src1, src2, MASK_VALUE, 0); + HF(res) = INTRINSIC (_mask_fnmsub_round_ph) (HF(res), MASK_VALUE, + HF(src1), HF(src2), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_fnmsub_ph); + + EMULATE(fnmsub_ph)(&exp, src1, src2, MASK_VALUE, 0); + HF(res) = INTRINSIC (_mask3_fnmsub_round_ph) (HF(src1), HF(src2), + HF(res), MASK_VALUE, _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask3_fnmsub_ph); + + init_dest(&res, &exp); + EMULATE(fnmsub_ph)(&exp, src1, src2, ZMASK_VALUE, 1); + HF(res) = INTRINSIC (_maskz_fnmsub_round_ph) (ZMASK_VALUE, HF(src1), + HF(src2), HF(res), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_fnmsub_ph); +#endif + + if (n_errs != 0) { + abort (); + } +} + diff --git 
a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddXXXph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddXXXph-1a.c new file mode 100644 index 00000000000..eea38b860ae --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddXXXph-1a.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512vl -mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vfmadd...ph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vfmadd...ph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vfmadd231ph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmadd231ph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmadd...ph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmadd...ph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m256h yy, y2, y3; +volatile __m128h xx, x2, x3; +volatile __mmask8 m; +volatile __mmask16 m16; + +void extern +avx512vl_test (void) +{ + yy = _mm256_mask_fmadd_ph (yy, m16, y2, y3); + xx = _mm_mask_fmadd_ph (xx, m, x2, x3); + + y3 = _mm256_mask3_fmadd_ph (yy, y2, y3, m16); + x3 = _mm_mask3_fmadd_ph (xx, x2, x3, m); + + yy = _mm256_maskz_fmadd_ph (m16, yy, y2, y3); + xx = _mm_maskz_fmadd_ph (m, xx, x2, x3); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddXXXph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddXXXph-1b.c new file mode 100644 index 00000000000..f6e4a9ae128 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddXXXph-1b.c @@ -0,0 +1,15 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */ + +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 
+#include "avx512fp16-vfmaddXXXph-1b.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vfmaddXXXph-1b.c" + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmsubXXXph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmsubXXXph-1a.c new file mode 100644 index 00000000000..add1abc2bea --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmsubXXXph-1a.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512vl -mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vfmsub...ph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vfmsub...ph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vfmsub231ph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmsub231ph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmsub...ph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmsub...ph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m256h yy, y2, y3; +volatile __m128h xx, x2, x3; +volatile __mmask8 m; +volatile __mmask16 m16; + +void extern +avx512vl_test (void) +{ + yy = _mm256_mask_fmsub_ph (yy, m16, y2, y3); + xx = _mm_mask_fmsub_ph (xx, m, x2, x3); + + y3 = _mm256_mask3_fmsub_ph (yy, y2, y3, m16); + x3 = _mm_mask3_fmsub_ph (xx, x2, x3, m); + + yy = _mm256_maskz_fmsub_ph (m16, yy, y2, y3); + xx = _mm_maskz_fmsub_ph (m, xx, x2, x3); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmsubXXXph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmsubXXXph-1b.c new file mode 100644 index 00000000000..b9c2085ecd4 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmsubXXXph-1b.c @@ 
-0,0 +1,15 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */ + +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vfmsubXXXph-1b.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vfmsubXXXph-1b.c" + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfnmaddXXXph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfnmaddXXXph-1a.c new file mode 100644 index 00000000000..6dad9013581 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfnmaddXXXph-1a.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512vl -mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vfnmadd...ph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vfnmadd...ph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vfnmadd231ph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfnmadd231ph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfnmadd...ph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfnmadd...ph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m256h yy, y2, y3; +volatile __m128h xx, x2, x3; +volatile __mmask8 m; +volatile __mmask16 m16; + +void extern +avx512vl_test (void) +{ + yy = _mm256_mask_fnmadd_ph (yy, m16, y2, y3); + xx = _mm_mask_fnmadd_ph (xx, m, x2, x3); + + y3 = _mm256_mask3_fnmadd_ph (yy, y2, y3, m16); + x3 = _mm_mask3_fnmadd_ph (xx, x2, x3, m); + + yy = _mm256_maskz_fnmadd_ph (m16, yy, y2, y3); + xx = _mm_maskz_fnmadd_ph (m, xx, x2, x3); +} diff --git 
a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfnmaddXXXph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfnmaddXXXph-1b.c new file mode 100644 index 00000000000..6c615d6541e --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfnmaddXXXph-1b.c @@ -0,0 +1,15 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */ + +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vfnmaddXXXph-1b.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vfnmaddXXXph-1b.c" + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfnmsubXXXph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfnmsubXXXph-1a.c new file mode 100644 index 00000000000..1a7fd092b73 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfnmsubXXXph-1a.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512vl -mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vfnmsub...ph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vfnmsub...ph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vfnmsub231ph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfnmsub231ph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfnmsub...ph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfnmsub...ph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m256h yy, y2, y3; +volatile __m128h xx, x2, x3; +volatile __mmask8 m; +volatile __mmask16 m16; + +void extern +avx512vl_test (void) +{ + yy = _mm256_mask_fnmsub_ph (yy, m16, y2, y3); + xx = _mm_mask_fnmsub_ph (xx, m, 
x2, x3); + + y3 = _mm256_mask3_fnmsub_ph (yy, y2, y3, m16); + x3 = _mm_mask3_fnmsub_ph (xx, x2, x3, m); + + yy = _mm256_maskz_fnmsub_ph (m16, yy, y2, y3); + xx = _mm_maskz_fnmsub_ph (m, xx, x2, x3); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfnmsubXXXph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfnmsubXXXph-1b.c new file mode 100644 index 00000000000..6d72b3dc220 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfnmsubXXXph-1b.c @@ -0,0 +1,15 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */ + +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vfnmsubXXXph-1b.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx512fp16-vfnmsubXXXph-1b.c" + From patchwork Thu Jul 1 06:16:30 2021 X-Patchwork-Submitter: liuhongt X-Patchwork-Id: 1499391 To: gcc-patches@gcc.gnu.org Subject: [PATCH 44/62] AVX512FP16: Add scalar/vector bitwise operations, including Date: Thu, 1 Jul 2021 14:16:30 +0800 Message-Id: <20210701061648.9447-45-hongtao.liu@intel.com> In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com> References: <20210701061648.9447-1-hongtao.liu@intel.com> Cc: jakub@redhat.com From: "H.J. Lu" 1. FP16 vector xor/ior/and/andnot/abs/neg 2. FP16 scalar abs/neg/copysign/xorsign gcc/ChangeLog: * config/i386/i386-expand.c (ix86_expand_fp_absneg_operator): Handle HFmode. (ix86_expand_copysign): Ditto. (ix86_expand_xorsign): Ditto. * config/i386/i386.c (ix86_build_const_vector): Handle HF vector modes. (ix86_build_signbit_mask): Ditto. (ix86_can_change_mode_class): Ditto. * config/i386/i386.md (SSEMODEF): Add HF mode. (ssevecmodef): Ditto. (2): Use MODEFH. (*2_1): Ditto. (define_split): Ditto. (xorsign3): Ditto. (@xorsign3_1): Ditto. * config/i386/sse.md (VFB): New mode iterator. (VFB_128_256): Ditto. (VFB_512): Ditto. (sseintvecmode2): Support HF vector mode. (2): Use new mode iterator. (*2): Ditto. (copysign3): Ditto. (xorsign3): Ditto. (3): Ditto. (3): Ditto. (_andnot3): Adjust for HF vector mode. (_andnot3): Ditto. (*3): Ditto. (*3): Ditto.
--- gcc/config/i386/i386-expand.c | 12 +++- gcc/config/i386/i386.c | 12 +++- gcc/config/i386/i386.md | 40 ++++++----- gcc/config/i386/sse.md | 128 ++++++++++++++++++++-------------- 4 files changed, 118 insertions(+), 74 deletions(-) diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c index 9233c6cd1e8..006f4bec8db 100644 --- a/gcc/config/i386/i386-expand.c +++ b/gcc/config/i386/i386-expand.c @@ -1781,6 +1781,8 @@ ix86_expand_fp_absneg_operator (enum rtx_code code, machine_mode mode, vmode = V4SFmode; else if (mode == DFmode) vmode = V2DFmode; + else if (mode == HFmode) + vmode = V8HFmode; } dst = operands[0]; @@ -1918,7 +1920,9 @@ ix86_expand_copysign (rtx operands[]) mode = GET_MODE (dest); - if (mode == SFmode) + if (mode == HFmode) + vmode = V8HFmode; + else if (mode == SFmode) vmode = V4SFmode; else if (mode == DFmode) vmode = V2DFmode; @@ -1934,7 +1938,7 @@ ix86_expand_copysign (rtx operands[]) if (real_isneg (CONST_DOUBLE_REAL_VALUE (op0))) op0 = simplify_unary_operation (ABS, mode, op0, mode); - if (mode == SFmode || mode == DFmode) + if (mode == HFmode || mode == SFmode || mode == DFmode) { if (op0 == CONST0_RTX (mode)) op0 = CONST0_RTX (vmode); @@ -2073,7 +2077,9 @@ ix86_expand_xorsign (rtx operands[]) mode = GET_MODE (dest); - if (mode == SFmode) + if (mode == HFmode) + vmode = V8HFmode; + else if (mode == SFmode) vmode = V4SFmode; else if (mode == DFmode) vmode = V2DFmode; diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index dc0d440061b..17e1b5ea874 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -15374,6 +15374,9 @@ ix86_build_const_vector (machine_mode mode, bool vect, rtx value) case E_V2DImode: gcc_assert (vect); /* FALLTHRU */ + case E_V8HFmode: + case E_V16HFmode: + case E_V32HFmode: case E_V16SFmode: case E_V8SFmode: case E_V4SFmode: @@ -15412,6 +15415,13 @@ ix86_build_signbit_mask (machine_mode mode, bool vect, bool invert) switch (mode) { + case E_V8HFmode: + case E_V16HFmode: + case 
E_V32HFmode: + vec_mode = mode; + imode = HImode; + break; + case E_V16SImode: case E_V16SFmode: case E_V8SImode: @@ -19198,7 +19208,7 @@ ix86_can_change_mode_class (machine_mode from, machine_mode to, disallow a change to these modes, reload will assume it's ok to drop the subreg from (subreg:SI (reg:HI 100) 0). This affects the vec_dupv4hi pattern. */ - if (GET_MODE_SIZE (from) < 4) + if (GET_MODE_SIZE (from) < 4 && from != E_HFmode) return false; } diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 014aba187e1..a85c23d74f1 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -1233,9 +1233,10 @@ (define_mode_iterator MODEFH [(HF "TARGET_AVX512FP16") SF DF]) ;; All x87 floating point modes plus HFmode (define_mode_iterator X87MODEFH [HF SF DF XF]) -;; All SSE floating point modes -(define_mode_iterator SSEMODEF [SF DF TF]) -(define_mode_attr ssevecmodef [(SF "V4SF") (DF "V2DF") (TF "TF")]) +;; All SSE floating point modes and HFmode +(define_mode_iterator SSEMODEF [HF SF DF TF]) +(define_mode_attr ssevecmodef [(HF "V8HF") (SF "V4SF") (DF "V2DF") (TF "TF")]) + ;; SSE instruction suffix for various modes (define_mode_attr ssemodesuffix @@ -10529,8 +10530,8 @@ (define_insn_and_split "*nabstf2_1" [(set_attr "isa" "noavx,noavx,avx,avx")]) (define_expand "2" - [(set (match_operand:X87MODEF 0 "register_operand") - (absneg:X87MODEF (match_operand:X87MODEF 1 "register_operand")))] + [(set (match_operand:X87MODEFH 0 "register_operand") + (absneg:X87MODEFH (match_operand:X87MODEFH 1 "register_operand")))] "TARGET_80387 || (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)" "ix86_expand_fp_absneg_operator (, mode, operands); DONE;") @@ -10559,9 +10560,9 @@ (define_split "ix86_split_fp_absneg_operator (, mode, operands); DONE;") (define_insn "*2_1" - [(set (match_operand:MODEF 0 "register_operand" "=x,x,Yv,f,!r") - (absneg:MODEF - (match_operand:MODEF 1 "register_operand" "0,x,Yv,0,0"))) + [(set (match_operand:MODEFH 0 "register_operand" 
"=x,x,Yv,f,!r") + (absneg:MODEFH + (match_operand:MODEFH 1 "register_operand" "0,x,Yv,0,0"))) (use (match_operand: 2 "vector_operand" "xBm,0,Yvm,X,X")) (clobber (reg:CC FLAGS_REG))] "TARGET_80387 || (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)" @@ -10572,7 +10573,8 @@ (define_insn "*2_1" (match_test ("SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH")) (if_then_else (eq_attr "alternative" "3,4") - (symbol_ref "TARGET_MIX_SSE_I387") + (symbol_ref "TARGET_MIX_SSE_I387 + && mode != HFmode") (const_string "*")) (if_then_else (eq_attr "alternative" "3,4") @@ -10580,9 +10582,9 @@ (define_insn "*2_1" (symbol_ref "false"))))]) (define_split - [(set (match_operand:MODEF 0 "sse_reg_operand") - (absneg:MODEF - (match_operand:MODEF 1 "sse_reg_operand"))) + [(set (match_operand:MODEFH 0 "sse_reg_operand") + (absneg:MODEFH + (match_operand:MODEFH 1 "sse_reg_operand"))) (use (match_operand: 2 "vector_operand")) (clobber (reg:CC FLAGS_REG))] "SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH @@ -10706,17 +10708,17 @@ (define_split "ix86_split_copysign_var (operands); DONE;") (define_expand "xorsign3" - [(match_operand:MODEF 0 "register_operand") - (match_operand:MODEF 1 "register_operand") - (match_operand:MODEF 2 "register_operand")] + [(match_operand:MODEFH 0 "register_operand") + (match_operand:MODEFH 1 "register_operand") + (match_operand:MODEFH 2 "register_operand")] "SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH" "ix86_expand_xorsign (operands); DONE;") (define_insn_and_split "@xorsign3_1" - [(set (match_operand:MODEF 0 "register_operand" "=Yv") - (unspec:MODEF - [(match_operand:MODEF 1 "register_operand" "Yv") - (match_operand:MODEF 2 "register_operand" "0") + [(set (match_operand:MODEFH 0 "register_operand" "=Yv") + (unspec:MODEFH + [(match_operand:MODEFH 1 "register_operand" "Yv") + (match_operand:MODEFH 2 "register_operand" "0") (match_operand: 3 "nonimmediate_operand" "Yvm")] UNSPEC_XORSIGN))] "SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH" diff --git a/gcc/config/i386/sse.md 
b/gcc/config/i386/sse.md index fdcc0515228..7c594babcce 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -317,11 +317,26 @@ (define_mode_iterator VFH (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")]) +;; 128-, 256- and 512-bit float vector modes for bitwise operations +(define_mode_iterator VFB + [(V32HF "TARGET_AVX512FP16") + (V16HF "TARGET_AVX512FP16") + (V8HF "TARGET_AVX512FP16") + (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF + (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")]) + ;; 128- and 256-bit float vector modes (define_mode_iterator VF_128_256 [(V8SF "TARGET_AVX") V4SF (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")]) +;; 128- and 256-bit float vector modes for bitwise operations +(define_mode_iterator VFB_128_256 + [(V16HF "TARGET_AVX512FP16") + (V8HF "TARGET_AVX512FP16") + (V8SF "TARGET_AVX") V4SF + (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")]) + ;; All SFmode vector float modes (define_mode_iterator VF1 [(V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF]) @@ -374,6 +389,10 @@ (define_mode_iterator VF_256 (define_mode_iterator VF_512 [V16SF V8DF]) +;; All 512bit vector float modes for bitwise operations +(define_mode_iterator VFB_512 + [(V32HF "TARGET_AVX512FP16") V16SF V8DF]) + (define_mode_iterator VI48_AVX512VL [V16SI (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL") V8DI (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")]) @@ -923,7 +942,8 @@ (define_mode_attr sseintvecmode (define_mode_attr sseintvecmode2 [(V8DF "XI") (V4DF "OI") (V2DF "TI") - (V8SF "OI") (V4SF "TI")]) + (V8SF "OI") (V4SF "TI") + (V16HF "OI") (V8HF "TI")]) (define_mode_attr sseintvecmodelower [(V16SF "v16si") (V8DF "v8di") @@ -1968,22 +1988,22 @@ (define_insn "kunpckdi" ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; (define_expand "2" - [(set (match_operand:VF 0 "register_operand") - (absneg:VF - (match_operand:VF 1 "register_operand")))] + [(set (match_operand:VFB 
0 "register_operand") + (absneg:VFB + (match_operand:VFB 1 "register_operand")))] "TARGET_SSE" "ix86_expand_fp_absneg_operator (, mode, operands); DONE;") (define_insn_and_split "*2" - [(set (match_operand:VF 0 "register_operand" "=x,x,v,v") - (absneg:VF - (match_operand:VF 1 "vector_operand" "0,xBm,v,m"))) - (use (match_operand:VF 2 "vector_operand" "xBm,0,vm,v"))] + [(set (match_operand:VFB 0 "register_operand" "=x,x,v,v") + (absneg:VFB + (match_operand:VFB 1 "vector_operand" "0,xBm,v,m"))) + (use (match_operand:VFB 2 "vector_operand" "xBm,0,vm,v"))] "TARGET_SSE" "#" "&& reload_completed" [(set (match_dup 0) - (:VF (match_dup 1) (match_dup 2)))] + (:VFB (match_dup 1) (match_dup 2)))] { if (TARGET_AVX) { @@ -3893,11 +3913,11 @@ (define_expand "vcond_mask_" ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; (define_insn "_andnot3" - [(set (match_operand:VF_128_256 0 "register_operand" "=x,x,v,v") - (and:VF_128_256 - (not:VF_128_256 - (match_operand:VF_128_256 1 "register_operand" "0,x,v,v")) - (match_operand:VF_128_256 2 "vector_operand" "xBm,xm,vm,vm")))] + [(set (match_operand:VFB_128_256 0 "register_operand" "=x,x,v,v") + (and:VFB_128_256 + (not:VFB_128_256 + (match_operand:VFB_128_256 1 "register_operand" "0,x,v,v")) + (match_operand:VFB_128_256 2 "vector_operand" "xBm,xm,vm,vm")))] "TARGET_SSE && " { char buf[128]; @@ -3920,6 +3940,8 @@ (define_insn "_andnot3" switch (get_attr_mode (insn)) { + case MODE_V16HF: + case MODE_V8HF: case MODE_V8SF: case MODE_V4SF: suffix = "ps"; @@ -3958,11 +3980,11 @@ (define_insn "_andnot3" (const_string "")))]) (define_insn "_andnot3" - [(set (match_operand:VF_512 0 "register_operand" "=v") - (and:VF_512 - (not:VF_512 - (match_operand:VF_512 1 "register_operand" "v")) - (match_operand:VF_512 2 "nonimmediate_operand" "vm")))] + [(set (match_operand:VFB_512 0 "register_operand" "=v") + (and:VFB_512 + (not:VFB_512 + (match_operand:VFB_512 1 "register_operand" "v")) + (match_operand:VFB_512 2 
"nonimmediate_operand" "vm")))] "TARGET_AVX512F" { char buf[128]; @@ -3972,8 +3994,9 @@ (define_insn "_andnot3" suffix = ""; ops = ""; - /* There is no vandnp[sd] in avx512f. Use vpandn[qd]. */ - if (!TARGET_AVX512DQ) + /* Since there are no vandnp[sd] without AVX512DQ nor vandnph, + use vp[dq]. */ + if (!TARGET_AVX512DQ || mode == V32HFmode) { suffix = GET_MODE_INNER (mode) == DFmode ? "q" : "d"; ops = "p"; @@ -3993,26 +4016,26 @@ (define_insn "_andnot3" (const_string "XI")))]) (define_expand "3" - [(set (match_operand:VF_128_256 0 "register_operand") - (any_logic:VF_128_256 - (match_operand:VF_128_256 1 "vector_operand") - (match_operand:VF_128_256 2 "vector_operand")))] + [(set (match_operand:VFB_128_256 0 "register_operand") + (any_logic:VFB_128_256 + (match_operand:VFB_128_256 1 "vector_operand") + (match_operand:VFB_128_256 2 "vector_operand")))] "TARGET_SSE && " "ix86_fixup_binary_operands_no_copy (, mode, operands);") (define_expand "3" - [(set (match_operand:VF_512 0 "register_operand") - (any_logic:VF_512 - (match_operand:VF_512 1 "nonimmediate_operand") - (match_operand:VF_512 2 "nonimmediate_operand")))] + [(set (match_operand:VFB_512 0 "register_operand") + (any_logic:VFB_512 + (match_operand:VFB_512 1 "nonimmediate_operand") + (match_operand:VFB_512 2 "nonimmediate_operand")))] "TARGET_AVX512F" "ix86_fixup_binary_operands_no_copy (, mode, operands);") (define_insn "*3" - [(set (match_operand:VF_128_256 0 "register_operand" "=x,x,v,v") - (any_logic:VF_128_256 - (match_operand:VF_128_256 1 "vector_operand" "%0,x,v,v") - (match_operand:VF_128_256 2 "vector_operand" "xBm,xm,vm,vm")))] + [(set (match_operand:VFB_128_256 0 "register_operand" "=x,x,v,v") + (any_logic:VFB_128_256 + (match_operand:VFB_128_256 1 "vector_operand" "%0,x,v,v") + (match_operand:VFB_128_256 2 "vector_operand" "xBm,xm,vm,vm")))] "TARGET_SSE && && !(MEM_P (operands[1]) && MEM_P (operands[2]))" { @@ -4036,6 +4059,8 @@ (define_insn "*3" switch (get_attr_mode (insn)) { + case MODE_V16HF: 
+ case MODE_V8HF: case MODE_V8SF: case MODE_V4SF: suffix = "ps"; @@ -4074,10 +4099,10 @@ (define_insn "*3" (const_string "")))]) (define_insn "*3" - [(set (match_operand:VF_512 0 "register_operand" "=v") - (any_logic:VF_512 - (match_operand:VF_512 1 "nonimmediate_operand" "%v") - (match_operand:VF_512 2 "nonimmediate_operand" "vm")))] + [(set (match_operand:VFB_512 0 "register_operand" "=v") + (any_logic:VFB_512 + (match_operand:VFB_512 1 "nonimmediate_operand" "%v") + (match_operand:VFB_512 2 "nonimmediate_operand" "vm")))] "TARGET_AVX512F && !(MEM_P (operands[1]) && MEM_P (operands[2]))" { char buf[128]; @@ -4087,8 +4112,9 @@ (define_insn "*3" suffix = ""; ops = ""; - /* There is no vp[sd] in avx512f. Use vp[dq]. */ - if (!TARGET_AVX512DQ) + /* Since there are no vp[sd] without AVX512DQ nor vph, + use vp[dq]. */ + if (!TARGET_AVX512DQ || mode == V32HFmode) { suffix = GET_MODE_INNER (mode) == DFmode ? "q" : "d"; ops = "p"; @@ -4109,14 +4135,14 @@ (define_insn "*3" (define_expand "copysign3" [(set (match_dup 4) - (and:VF - (not:VF (match_dup 3)) - (match_operand:VF 1 "vector_operand"))) + (and:VFB + (not:VFB (match_dup 3)) + (match_operand:VFB 1 "vector_operand"))) (set (match_dup 5) - (and:VF (match_dup 3) - (match_operand:VF 2 "vector_operand"))) - (set (match_operand:VF 0 "register_operand") - (ior:VF (match_dup 4) (match_dup 5)))] + (and:VFB (match_dup 3) + (match_operand:VFB 2 "vector_operand"))) + (set (match_operand:VFB 0 "register_operand") + (ior:VFB (match_dup 4) (match_dup 5)))] "TARGET_SSE" { operands[3] = ix86_build_signbit_mask (mode, 1, 0); @@ -4127,11 +4153,11 @@ (define_expand "copysign3" (define_expand "xorsign3" [(set (match_dup 4) - (and:VF (match_dup 3) - (match_operand:VF 2 "vector_operand"))) - (set (match_operand:VF 0 "register_operand") - (xor:VF (match_dup 4) - (match_operand:VF 1 "vector_operand")))] + (and:VFB (match_dup 3) + (match_operand:VFB 2 "vector_operand"))) + (set (match_operand:VFB 0 "register_operand") + (xor:VFB (match_dup 4) 
+ (match_operand:VFB 1 "vector_operand")))] "TARGET_SSE" { operands[3] = ix86_build_signbit_mask (mode, 1, 0);

From patchwork Thu Jul 1 06:16:31 2021
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 45/62] AVX512FP16: Add testcase for fp16 bitwise operations.
Date: Thu, 1 Jul 2021 14:16:31 +0800
Message-Id: <20210701061648.9447-46-hongtao.liu@intel.com>
In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com>
References: <20210701061648.9447-1-hongtao.liu@intel.com>
From: liuhongt
Cc: jakub@redhat.com

gcc/testsuite/ChangeLog:

	*
gcc.target/i386/avx512fp16-neg-1a.c: New test. * gcc.target/i386/avx512fp16-neg-1b.c: Ditto. * gcc.target/i386/avx512fp16-scalar-bitwise-1a.c: Ditto. * gcc.target/i386/avx512fp16-scalar-bitwise-1b.c: Ditto. * gcc.target/i386/avx512fp16-vector-bitwise-1a.c: Ditto. * gcc.target/i386/avx512fp16-vector-bitwise-1b.c: Ditto. * gcc.target/i386/avx512fp16vl-neg-1a.c: Ditto. * gcc.target/i386/avx512fp16vl-neg-1b.c: Ditto. --- .../gcc.target/i386/avx512fp16-neg-1a.c | 19 +++ .../gcc.target/i386/avx512fp16-neg-1b.c | 33 +++++ .../i386/avx512fp16-scalar-bitwise-1a.c | 31 +++++ .../i386/avx512fp16-scalar-bitwise-1b.c | 82 ++++++++++++ .../i386/avx512fp16-vector-bitwise-1a.c | 124 ++++++++++++++++++ .../i386/avx512fp16-vector-bitwise-1b.c | 119 +++++++++++++++++ .../gcc.target/i386/avx512fp16vl-neg-1a.c | 18 +++ .../gcc.target/i386/avx512fp16vl-neg-1b.c | 33 +++++ 8 files changed, 459 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-neg-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-neg-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-scalar-bitwise-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-scalar-bitwise-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vector-bitwise-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vector-bitwise-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-neg-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-neg-1b.c diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-neg-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-neg-1a.c new file mode 100644 index 00000000000..bf7693e0b1d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-neg-1a.c @@ -0,0 +1,19 @@ +/* { dg-do compile} */ +/* { dg-options "-O2 -mavx512fp16" } */ + +/* { dg-final { scan-assembler-times "vpxord\[ \\t\]+\[^\n\r\]*%zmm0" 1 } } */ +/* { dg-final { scan-assembler-times "vxorps\[ \\t\]+\[^\n\r\]*%xmm0" 1 } } */ + +#include + 
+_Float16 +neghf (_Float16 a) +{ + return -a; +} + +__m512h +neghf512 (__m512h a) +{ + return -a; +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-neg-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-neg-1b.c new file mode 100644 index 00000000000..770f7b283d8 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-neg-1b.c @@ -0,0 +1,33 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +static void +test_512 (void) +{ + V512 v1, v2, v3, v4, exp, res; + int i; + init_src(); + + unpack_ph_2twops(src1, &v1, &v2); + v1.f32[0] = -v1.f32[0]; + exp = pack_twops_2ph(v1, v2); + res.zmmh = src1.zmmh; + res.f16[0] = -res.f16[0]; + check_results(&res, &exp, 32, "neg"); + + unpack_ph_2twops(src1, &v1, &v2); + for (i=0; i<16; i++) + { + v1.f32[i] = -v1.f32[i]; + v2.f32[i] = -v2.f32[i]; + } + exp = pack_twops_2ph(v1, v2); + res.zmmh = -src1.zmmh; + check_results(&res, &exp, 32, "neg"); + if (n_errs != 0) { + abort (); + } +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-scalar-bitwise-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-scalar-bitwise-1a.c new file mode 100644 index 00000000000..1325c341a33 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-scalar-bitwise-1a.c @@ -0,0 +1,31 @@ +/* { dg-do compile } */ +/* { dg-options "-Ofast -mavx512fp16" } */ + +_Float16 +f1 (_Float16 x) +{ + return __builtin_fabsf16 (x); +} + +_Float16 +f2 (_Float16 x, _Float16 y) +{ + return __builtin_copysignf16 (x, y); +} + +_Float16 +f3 (_Float16 x) +{ + return -x; +} + +_Float16 +f4 (_Float16 x, _Float16 y) +{ + return x * __builtin_copysignf16 (1, y); +} + + +/* { dg-final { scan-assembler-times "vandps\[^\n\r\]*xmm\[0-9\]" 4 } } */ +/* { dg-final { scan-assembler-times "vorps\[^\n\r\]*xmm\[0-9\]" 1 } } */ +/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 2 } } */ diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-scalar-bitwise-1b.c 
b/gcc/testsuite/gcc.target/i386/avx512fp16-scalar-bitwise-1b.c new file mode 100644 index 00000000000..7a292519a4e --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-scalar-bitwise-1b.c @@ -0,0 +1,82 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-Ofast -mavx512fp16 -mavx512dq" } */ + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +void NOINLINE +emulate_absneg_ph (V512 * dest, V512 op1, int abs) +{ + V512 v1, v2, v3, v4; + int i; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(*dest, &v3, &v4); + + for (i = 0; i != 16; i++) { + if (abs) { + v3.f32[i] = __builtin_fabsf (v1.f32[i]); + v4.f32[i] = __builtin_fabsf (v2.f32[i]); + } + else { + v3.f32[i] = -v1.f32[i]; + v4.f32[i] = -v2.f32[i]; + } + } + *dest = pack_twops_2ph(v3, v4); +} + +void NOINLINE +emulate_copysign_ph (V512 * dest, V512 op1, V512 op2, int xorsign) +{ + V512 v1, v2, v3, v4, v5, v6; + int i; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(op2, &v3, &v4); + unpack_ph_2twops(*dest, &v5, &v6); + + for (i = 0; i != 16; i++) { + if (xorsign) { + v5.f32[i] = v1.f32[i] * __builtin_copysignf (1, v3.f32[i]); + v6.f32[i] = v2.f32[i] * __builtin_copysignf (1, v4.f32[i]); + } + else { + v5.f32[i] = __builtin_copysignf (v1.f32[i], v3.f32[i]); + v6.f32[i] = __builtin_copysignf (v2.f32[i], v4.f32[i]); + } + } + *dest = pack_twops_2ph(v5, v6); +} + +void +test_512 (void) +{ + V512 res, exp; + + init_src (); + + /* Abs for float16. */ + emulate_absneg_ph (&exp, src1, 1); + res.f16[0] = __builtin_fabsf16 (src1.f16[0]); + check_results (&res, &exp, 1, "abs_float16"); + + /* Neg for float16. */ + emulate_absneg_ph (&exp, src1, 0); + res.f16[0] = -(src1.f16[0]); + check_results (&res, &exp, 1, "neg_float16"); + + /* Copysign for float16. */ + emulate_copysign_ph (&exp, src1, src2, 0); + res.f16[0] = __builtin_copysignf16 (src1.f16[0], src2.f16[0]); + check_results (&res, &exp, 1, "copysign_float16"); + + /* Xorsign for float16. 
*/ + emulate_copysign_ph (&exp, src1, src2, 1); + res.f16[0] = src1.f16[0] * __builtin_copysignf16 (1, src2.f16[0]); + check_results (&res, &exp, 1, "xorsign_float16"); + + if (n_errs != 0) { + abort (); + } +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vector-bitwise-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vector-bitwise-1a.c new file mode 100644 index 00000000000..13c05abc532 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vector-bitwise-1a.c @@ -0,0 +1,124 @@ +/* { dg-do compile } */ +/* { dg-options "-Ofast -mavx512vl -mavx512fp16" } */ + +#include +__m128h +f1 (__m128h x) +{ + int i = 0; + __m128h y; + for (; i != 8; i++) + y[i] = __builtin_fabsf16 (x[i]); + return y; +} + +__m256h +f2 (__m256h x) +{ + int i = 0; + __m256h y; + for (; i != 16; i++) + y[i] = __builtin_fabsf16 (x[i]); + return y; +} + +__m512h +f3 (__m512h x) +{ + int i = 0; + __m512h y; + for (; i != 32; i++) + y[i] = __builtin_fabsf16 (x[i]); + return y; +} + +__m128h +f4 (__m128h x) +{ + return -x; +} + +__m256h +f5 (__m256h x) +{ + return -x; +} + +__m512h +f6 (__m512h x) +{ + return -x; +} + +__m128h +f7 (__m128h x, __m128h y) +{ + int i = 0; + __m128h z; + for (; i != 8; i++) + z[i] = __builtin_copysignf16 (x[i], y[i]); + return z; +} + +__m256h +f8 (__m256h x, __m256h y) +{ + int i = 0; + __m256h z; + for (; i != 16; i++) + z[i] = __builtin_copysignf16 (x[i], y[i]); + return z; +} + +__m512h +f9 (__m512h x, __m512h y) +{ + int i = 0; + __m512h z; + for (; i != 32; i++) + z[i] = __builtin_copysignf16 (x[i], y[i]); + return z; +} + +__m128h +f10 (__m128h x, __m128h y) +{ + int i = 0; + __m128h z; + for (; i != 8; i++) + z[i] = x[i] * __builtin_copysignf16 (1, y[i]); + return z; +} + +__m256h +f11 (__m256h x, __m256h y) +{ + int i = 0; + __m256h z; + for (; i != 16; i++) + z[i] = x[i] * __builtin_copysignf16 (1, y[i]); + return z; +} + +__m512h +f12 (__m512h x, __m512h y) +{ + int i = 0; + __m512h z; + for (; i != 32; i++) + z[i] = x[i] * __builtin_copysignf16 
(1, y[i]); + return z; +} + +/* { dg-final { scan-assembler "vandps\[^\n\r\]*xmm0" } } */ +/* { dg-final { scan-assembler "vandps\[^\n\r\]*ymm0" } } */ +/* { dg-final { scan-assembler "vpandd\[^\n\r\]*zmm0" } } */ +/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm0" 2 } } */ +/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*ymm0" 2 } } */ +/* { dg-final { scan-assembler-times "vpxord\[^\n\r\]*zmm0" 2 } } */ +/* { dg-final { scan-assembler-times "vorps\[^\n\r\]*xmm0" 1 } } */ +/* { dg-final { scan-assembler-times "vorps\[^\n\r\]*ymm0" 1 } } */ +/* { dg-final { scan-assembler-times "vpord\[^\n\r\]*zmm0" 1 } } */ +/* { dg-final { scan-assembler-times "vandnps\[^\n\r\]*xmm0" 1 } } */ +/* { dg-final { scan-assembler-times "vandnps\[^\n\r\]*ymm0" 1 } } */ +/* { dg-final { scan-assembler-times "vpandnd\[^\n\r\]*zmm0" 1 } } */ diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vector-bitwise-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vector-bitwise-1b.c new file mode 100644 index 00000000000..1398b360064 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vector-bitwise-1b.c @@ -0,0 +1,119 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-Ofast -mavx512fp16 -mavx512vl -mavx512dq" } */ + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +void NOINLINE +emulate_absneg_ph (V512 * dest, V512 op1, int abs) +{ + V512 v1, v2, v3, v4; + int i; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(*dest, &v3, &v4); + + for (i = 0; i != 16; i++) { + if (abs) { + v3.f32[i] = __builtin_fabsf (v1.f32[i]); + v4.f32[i] = __builtin_fabsf (v2.f32[i]); + } + else { + v3.f32[i] = -v1.f32[i]; + v4.f32[i] = -v2.f32[i]; + } + } + *dest = pack_twops_2ph(v3, v4); +} + +void NOINLINE +emulate_copysign_ph (V512 * dest, V512 op1, V512 op2, int xorsign) +{ + V512 v1, v2, v3, v4, v5, v6; + int i; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(op2, &v3, &v4); + unpack_ph_2twops(*dest, &v5, &v6); + + for (i = 0; i != 16; i++) { + if (xorsign) { 
+ v5.f32[i] = v1.f32[i] * __builtin_copysignf (1, v3.f32[i]); + v6.f32[i] = v2.f32[i] * __builtin_copysignf (1, v4.f32[i]); + } + else { + v5.f32[i] = __builtin_copysignf (v1.f32[i], v3.f32[i]); + v6.f32[i] = __builtin_copysignf (v2.f32[i], v4.f32[i]); + } + } + *dest = pack_twops_2ph(v5, v6); +} + + +void +test_512 (void) +{ + V512 res, exp; + + init_src (); + + /* Abs for vector float16. */ + emulate_absneg_ph (&exp, src1, 1); + for (int i = 0; i != 8; i++) + res.f16[i] = __builtin_fabsf16 (src1.f16[i]); + check_results (&res, &exp, 8, "abs_m128h"); + + for (int i = 0; i != 16; i++) + res.f16[i] = __builtin_fabsf16 (src1.f16[i]); + check_results (&res, &exp, 16, "abs_m256h"); + + for (int i = 0; i != 32; i++) + res.f16[i] = __builtin_fabsf16 (src1.f16[i]); + check_results (&res, &exp, 32, "abs_m512h"); + + /* Neg for vector float16. */ + emulate_absneg_ph (&exp, src1, 0); + for (int i = 0; i != 8; i++) + res.f16[i] = -(src1.f16[i]); + check_results (&res, &exp, 8, "neg_m128h"); + + for (int i = 0; i != 16; i++) + res.f16[i] = -(src1.f16[i]); + check_results (&res, &exp, 16, "neg_m256h"); + + for (int i = 0; i != 32; i++) + res.f16[i] = -(src1.f16[i]); + check_results (&res, &exp, 32, "neg_m512h"); + + /* Copysign for vector float16. */ + emulate_copysign_ph (&exp, src1, src2, 0); + for (int i = 0; i != 8; i++) + res.f16[i] = __builtin_copysignf16 (src1.f16[i], src2.f16[i]); + check_results (&res, &exp, 8, "copysign_m128h"); + + for (int i = 0; i != 16; i++) + res.f16[i] = __builtin_copysignf16 (src1.f16[i], src2.f16[i]); + check_results (&res, &exp, 16, "copysign_m256h"); + + for (int i = 0; i != 32; i++) + res.f16[i] = __builtin_copysignf16 (src1.f16[i], src2.f16[i]); + check_results (&res, &exp, 32, "copysign_m512h"); + + /* Xorsign for vector float16. 
*/ + emulate_copysign_ph (&exp, src1, src2, 1); + for (int i = 0; i != 8; i++) + res.f16[i] = src1.f16[i] * __builtin_copysignf16 (1, src2.f16[i]); + check_results (&res, &exp, 8, "xorsign_m128h"); + + for (int i = 0; i != 16; i++) + res.f16[i] = src1.f16[i] * __builtin_copysignf16 (1, src2.f16[i]); + check_results (&res, &exp, 16, "xorsign_m256h"); + + for (int i = 0; i != 32; i++) + res.f16[i] = src1.f16[i] * __builtin_copysignf16 (1, src2.f16[i]); + check_results (&res, &exp, 32, "xorsign_m512h"); + + if (n_errs != 0) { + abort (); + } +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-neg-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-neg-1a.c new file mode 100644 index 00000000000..a40a0d88dd2 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-neg-1a.c @@ -0,0 +1,18 @@ +/* { dg-do compile} */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl" } */ + +/* { dg-final { scan-assembler-times "vxorps\[ \\t\]+\[^\n\r\]*%xmm0" 1 } } */ +/* { dg-final { scan-assembler-times "vxorps\[ \\t\]+\[^\n\r\]*%ymm0" 1 } } */ +#include + +__m128h +neghf128 (__m128h a) +{ + return -a; +} + +__m256h +neghf256 (__m256h a) +{ + return -a; +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-neg-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-neg-1b.c new file mode 100644 index 00000000000..d8f65fb3f60 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-neg-1b.c @@ -0,0 +1,33 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */ + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +static void +test_512 (void) +{ + V512 v1, v2, v3, v4, exp, res; + int i; + init_src(); + + unpack_ph_2twops(src1, &v1, &v2); + v1.f32[0] = -v1.f32[0]; + exp = pack_twops_2ph(v1, v2); + res.zmmh = src1.zmmh; + res.f16[0] = -res.f16[0]; + check_results(&res, &exp, 32, "neg"); + + unpack_ph_2twops(src1, &v1, &v2); + for (i=0; i<16; i++) + { + v1.f32[i] = -v1.f32[i]; + v2.f32[i] = -v2.f32[i]; + } + exp = 
pack_twops_2ph(v1, v2); + res.zmmh = -src1.zmmh; + check_results(&res, &exp, 32, "neg"); + if (n_errs != 0) { + abort (); + } +}

From patchwork Thu Jul 1 06:16:32 2021
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 46/62] AVX512FP16: Enable FP16 mask load/store.
Date: Thu, 1 Jul 2021 14:16:32 +0800
Message-Id: <20210701061648.9447-47-hongtao.liu@intel.com>
In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com>
References: <20210701061648.9447-1-hongtao.liu@intel.com>
From: liuhongt
Cc: jakub@redhat.com

From: "H.J.
Lu" gcc/ChangeLog: * config/i386/sse.md (avx512fmaskmodelower): Extend to support HF modes. (maskload): Ditto. (maskstore): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512fp16-xorsign-1.c: New test. --- gcc/config/i386/sse.md | 13 +++--- .../gcc.target/i386/avx512fp16-xorsign-1.c | 41 +++++++++++++++++++ 2 files changed, 48 insertions(+), 6 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-xorsign-1.c diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 7c594babcce..cbf1e75c0b2 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -915,6 +915,7 @@ (define_mode_attr avx512fmaskmodelower (V32HI "si") (V16HI "hi") (V8HI "qi") (V4HI "qi") (V16SI "hi") (V8SI "qi") (V4SI "qi") (V8DI "qi") (V4DI "qi") (V2DI "qi") + (V32HF "si") (V16HF "hi") (V8HF "qi") (V16SF "hi") (V8SF "qi") (V4SF "qi") (V8DF "qi") (V4DF "qi") (V2DF "qi")]) @@ -23106,9 +23107,9 @@ (define_expand "maskload" "TARGET_AVX") (define_expand "maskload" - [(set (match_operand:V48_AVX512VL 0 "register_operand") - (vec_merge:V48_AVX512VL - (match_operand:V48_AVX512VL 1 "memory_operand") + [(set (match_operand:V48H_AVX512VL 0 "register_operand") + (vec_merge:V48H_AVX512VL + (match_operand:V48H_AVX512VL 1 "memory_operand") (match_dup 0) (match_operand: 2 "register_operand")))] "TARGET_AVX512F") @@ -23131,9 +23132,9 @@ (define_expand "maskstore" "TARGET_AVX") (define_expand "maskstore" - [(set (match_operand:V48_AVX512VL 0 "memory_operand") - (vec_merge:V48_AVX512VL - (match_operand:V48_AVX512VL 1 "register_operand") + [(set (match_operand:V48H_AVX512VL 0 "memory_operand") + (vec_merge:V48H_AVX512VL + (match_operand:V48H_AVX512VL 1 "register_operand") (match_dup 0) (match_operand: 2 "register_operand")))] "TARGET_AVX512F") diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-xorsign-1.c b/gcc/testsuite/gcc.target/i386/avx512fp16-xorsign-1.c new file mode 100644 index 00000000000..a22a6ceabff --- /dev/null +++ 
b/gcc/testsuite/gcc.target/i386/avx512fp16-xorsign-1.c @@ -0,0 +1,41 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -ftree-vectorize -fdump-tree-vect-details -save-temps" } */ + +extern void abort (); + +static void do_test (void); + +#define DO_TEST do_test +#define AVX512FP16 +#include "avx512-check.h" + +#define N 16 +_Float16 a[N] = {-0.1f, -3.2f, -6.3f, -9.4f, + -12.5f, -15.6f, -18.7f, -21.8f, + 24.9f, 27.1f, 30.2f, 33.3f, + 36.4f, 39.5f, 42.6f, 45.7f}; +_Float16 b[N] = {-1.2f, 3.4f, -5.6f, 7.8f, + -9.0f, 1.0f, -2.0f, 3.0f, + -4.0f, -5.0f, 6.0f, 7.0f, + -8.0f, -9.0f, 10.0f, 11.0f}; +_Float16 r[N]; + +static void +__attribute__ ((noinline, noclone)) +do_test (void) +{ + int i; + + for (i = 0; i < N; i++) + r[i] = a[i] * __builtin_copysignf16 (1.0f, b[i]); + + /* check results: */ + for (i = 0; i < N; i++) + if (r[i] != a[i] * __builtin_copysignf16 (1.0f, b[i])) + abort (); +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ +/* { dg-final { scan-assembler "\[ \t\]xor" } } */ +/* { dg-final { scan-assembler "\[ \t\]and" } } */ +/* { dg-final { scan-assembler-not "copysign" } } */

From patchwork Thu Jul 1 06:16:33 2021
X-Patchwork-Submitter: liuhongt
X-Patchwork-Id: 1499399
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 47/62] AVX512FP16: Add
scalar fma instructions. Date: Thu, 1 Jul 2021 14:16:33 +0800
Message-Id: <20210701061648.9447-48-hongtao.liu@intel.com>
In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com>
References: <20210701061648.9447-1-hongtao.liu@intel.com>
From: liuhongt
Cc: jakub@redhat.com

Add vfmadd[132,213,231]sh/vfnmadd[132,213,231]sh/ vfmsub[132,213,231]sh/vfnmsub[132,213,231]sh.

gcc/ChangeLog: * config/i386/avx512fp16intrin.h (_mm_fmadd_sh): New intrinsic. (_mm_mask_fmadd_sh): Likewise. (_mm_mask3_fmadd_sh): Likewise. (_mm_maskz_fmadd_sh): Likewise. (_mm_fmadd_round_sh): Likewise. (_mm_mask_fmadd_round_sh): Likewise. (_mm_mask3_fmadd_round_sh): Likewise. (_mm_maskz_fmadd_round_sh): Likewise. (_mm_fnmadd_sh): Likewise. (_mm_mask_fnmadd_sh): Likewise. (_mm_mask3_fnmadd_sh): Likewise. (_mm_maskz_fnmadd_sh): Likewise. (_mm_fnmadd_round_sh): Likewise. (_mm_mask_fnmadd_round_sh): Likewise. (_mm_mask3_fnmadd_round_sh): Likewise. (_mm_maskz_fnmadd_round_sh): Likewise. (_mm_fmsub_sh): Likewise. (_mm_mask_fmsub_sh): Likewise. (_mm_mask3_fmsub_sh): Likewise. (_mm_maskz_fmsub_sh): Likewise. (_mm_fmsub_round_sh): Likewise. (_mm_mask_fmsub_round_sh): Likewise. (_mm_mask3_fmsub_round_sh): Likewise. (_mm_maskz_fmsub_round_sh): Likewise. (_mm_fnmsub_sh): Likewise. (_mm_mask_fnmsub_sh): Likewise.
(_mm_mask3_fnmsub_sh): Likewise. (_mm_maskz_fnmsub_sh): Likewise. (_mm_fnmsub_round_sh): Likewise. (_mm_mask_fnmsub_round_sh): Likewise. (_mm_mask3_fnmsub_round_sh): Likewise. (_mm_maskz_fnmsub_round_sh): Likewise. * config/i386/i386-builtin-types.def (V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT): New builtin type. * config/i386/i386-builtin.def: Add new builtins. * config/i386/i386-expand.c: Handle new builtin type. * config/i386/sse.md (fmai_vmfmadd_): Adjust to support FP16. (fmai_vmfmsub_): Ditto. (fmai_vmfnmadd_): Ditto. (fmai_vmfnmsub_): Ditto. (*fmai_fmadd_): Ditto. (*fmai_fmsub_): Ditto. (*fmai_fnmadd_): Ditto. (*fmai_fnmsub_): Ditto. (avx512f_vmfmadd__mask): Ditto. (avx512f_vmfmadd__mask3): Ditto. (avx512f_vmfmadd__maskz): Ditto. (avx512f_vmfmadd__maskz_1): Ditto. (*avx512f_vmfmsub__mask): Ditto. (avx512f_vmfmsub__mask3): Ditto. (*avx512f_vmfmsub__maskz_1): Ditto. (*avx512f_vmfnmsub__mask): Ditto. (*avx512f_vmfnmsub__mask3): Ditto. (*avx512f_vmfnmsub__mask): Ditto. (*avx512f_vmfnmadd__mask): Renamed to ... (avx512f_vmfnmadd__mask) ... this, and adjust to support FP16. (avx512f_vmfnmadd__mask3): Ditto. (avx512f_vmfnmadd__maskz_1): Ditto. (avx512f_vmfnmadd__maskz): New expander. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add test for new builtins. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/sse-14.c: Add test for new intrinsics. * gcc.target/i386/sse-22.c: Ditto.
--- gcc/config/i386/avx512fp16intrin.h | 412 +++++++++++++++++++++++++ gcc/config/i386/i386-builtin-types.def | 1 + gcc/config/i386/i386-builtin.def | 7 + gcc/config/i386/i386-expand.c | 1 + gcc/config/i386/sse.md | 340 ++++++++++---------- gcc/testsuite/gcc.target/i386/avx-1.c | 12 + gcc/testsuite/gcc.target/i386/sse-13.c | 12 + gcc/testsuite/gcc.target/i386/sse-14.c | 16 + gcc/testsuite/gcc.target/i386/sse-22.c | 16 + gcc/testsuite/gcc.target/i386/sse-23.c | 12 + 10 files changed, 666 insertions(+), 163 deletions(-) diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h index f246bab5159..5c85ec15b22 100644 --- a/gcc/config/i386/avx512fp16intrin.h +++ b/gcc/config/i386/avx512fp16intrin.h @@ -5697,6 +5697,418 @@ _mm512_maskz_fnmsub_round_ph (__mmask32 __U, __m512h __A, __m512h __B, #endif /* __OPTIMIZE__ */ +/* Intrinsics vfmadd[132,213,231]sh. */ +extern __inline __m128h + __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fmadd_sh (__m128h __W, __m128h __A, __m128h __B) +{ + return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W, + (__v8hf) __A, + (__v8hf) __B, + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_fmadd_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B) +{ + return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W, + (__v8hf) __A, + (__v8hf) __B, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask3_fmadd_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U) +{ + return (__m128h) __builtin_ia32_vfmaddsh3_mask3 ((__v8hf) __W, + (__v8hf) __A, + (__v8hf) __B, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_fmadd_sh (__mmask8 __U, __m128h __W, __m128h __A, __m128h __B) +{ 
+ return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W, + (__v8hf) __A, + (__v8hf) __B, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + + +#ifdef __OPTIMIZE__ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fmadd_round_sh (__m128h __W, __m128h __A, __m128h __B, const int __R) +{ + return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W, + (__v8hf) __A, + (__v8hf) __B, + (__mmask8) -1, + __R); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_fmadd_round_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B, + const int __R) +{ + return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W, + (__v8hf) __A, + (__v8hf) __B, + (__mmask8) __U, __R); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask3_fmadd_round_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U, + const int __R) +{ + return (__m128h) __builtin_ia32_vfmaddsh3_mask3 ((__v8hf) __W, + (__v8hf) __A, + (__v8hf) __B, + (__mmask8) __U, __R); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_fmadd_round_sh (__mmask8 __U, __m128h __W, __m128h __A, + __m128h __B, const int __R) +{ + return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W, + (__v8hf) __A, + (__v8hf) __B, + (__mmask8) __U, __R); +} + +#else +#define _mm_fmadd_round_sh(A, B, C, R) \ + ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), (B), (C), (-1), (R))) +#define _mm_mask_fmadd_round_sh(A, U, B, C, R) \ + ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), (B), (C), (U), (R))) +#define _mm_mask3_fmadd_round_sh(A, B, C, U, R) \ + ((__m128h) __builtin_ia32_vfmaddsh3_mask3 ((A), (B), (C), (U), (R))) +#define _mm_maskz_fmadd_round_sh(U, A, B, C, R) \ + ((__m128h) __builtin_ia32_vfmaddsh3_maskz ((A), (B), (C), (U), (R))) + +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vfnmadd[132,213,231]sh. 
*/ +extern __inline __m128h + __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fnmadd_sh (__m128h __W, __m128h __A, __m128h __B) +{ + return (__m128h) __builtin_ia32_vfnmaddsh3_mask ((__v8hf) __W, + (__v8hf) __A, + (__v8hf) __B, + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_fnmadd_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B) +{ + return (__m128h) __builtin_ia32_vfnmaddsh3_mask ((__v8hf) __W, + (__v8hf) __A, + (__v8hf) __B, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask3_fnmadd_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U) +{ + return (__m128h) __builtin_ia32_vfnmaddsh3_mask3 ((__v8hf) __W, + (__v8hf) __A, + (__v8hf) __B, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_fnmadd_sh (__mmask8 __U, __m128h __W, __m128h __A, __m128h __B) +{ + return (__m128h) __builtin_ia32_vfnmaddsh3_maskz ((__v8hf) __W, + (__v8hf) __A, + (__v8hf) __B, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + + +#ifdef __OPTIMIZE__ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fnmadd_round_sh (__m128h __W, __m128h __A, __m128h __B, const int __R) +{ + return (__m128h) __builtin_ia32_vfnmaddsh3_mask ((__v8hf) __W, + (__v8hf) __A, + (__v8hf) __B, + (__mmask8) -1, + __R); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_fnmadd_round_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B, + const int __R) +{ + return (__m128h) __builtin_ia32_vfnmaddsh3_mask ((__v8hf) __W, + (__v8hf) __A, + (__v8hf) __B, + (__mmask8) __U, __R); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, 
__artificial__)) +_mm_mask3_fnmadd_round_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U, + const int __R) +{ + return (__m128h) __builtin_ia32_vfnmaddsh3_mask3 ((__v8hf) __W, + (__v8hf) __A, + (__v8hf) __B, + (__mmask8) __U, __R); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_fnmadd_round_sh (__mmask8 __U, __m128h __W, __m128h __A, + __m128h __B, const int __R) +{ + return (__m128h) __builtin_ia32_vfnmaddsh3_maskz ((__v8hf) __W, + (__v8hf) __A, + (__v8hf) __B, + (__mmask8) __U, __R); +} + +#else +#define _mm_fnmadd_round_sh(A, B, C, R) \ + ((__m128h) __builtin_ia32_vfnmaddsh3_mask ((A), (B), (C), (-1), (R))) +#define _mm_mask_fnmadd_round_sh(A, U, B, C, R) \ + ((__m128h) __builtin_ia32_vfnmaddsh3_mask ((A), (B), (C), (U), (R))) +#define _mm_mask3_fnmadd_round_sh(A, B, C, U, R) \ + ((__m128h) __builtin_ia32_vfnmaddsh3_mask3 ((A), (B), (C), (U), (R))) +#define _mm_maskz_fnmadd_round_sh(U, A, B, C, R) \ + ((__m128h) __builtin_ia32_vfnmaddsh3_maskz ((A), (B), (C), (U), (R))) + +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vfmsub[132,213,231]sh. 
*/ +extern __inline __m128h + __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fmsub_sh (__m128h __W, __m128h __A, __m128h __B) +{ + return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W, + (__v8hf) __A, + -(__v8hf) __B, + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_fmsub_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B) +{ + return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W, + (__v8hf) __A, + -(__v8hf) __B, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask3_fmsub_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U) +{ + return (__m128h) __builtin_ia32_vfmsubsh3_mask3 ((__v8hf) __W, + (__v8hf) __A, + (__v8hf) __B, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_fmsub_sh (__mmask8 __U, __m128h __W, __m128h __A, __m128h __B) +{ + return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W, + (__v8hf) __A, + -(__v8hf) __B, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + + +#ifdef __OPTIMIZE__ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fmsub_round_sh (__m128h __W, __m128h __A, __m128h __B, const int __R) +{ + return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W, + (__v8hf) __A, + -(__v8hf) __B, + (__mmask8) -1, + __R); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_fmsub_round_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B, + const int __R) +{ + return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W, + (__v8hf) __A, + -(__v8hf) __B, + (__mmask8) __U, __R); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) 
+_mm_mask3_fmsub_round_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U, + const int __R) +{ + return (__m128h) __builtin_ia32_vfmsubsh3_mask3 ((__v8hf) __W, + (__v8hf) __A, + (__v8hf) __B, + (__mmask8) __U, __R); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_fmsub_round_sh (__mmask8 __U, __m128h __W, __m128h __A, + __m128h __B, const int __R) +{ + return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W, + (__v8hf) __A, + -(__v8hf) __B, + (__mmask8) __U, __R); +} + +#else +#define _mm_fmsub_round_sh(A, B, C, R) \ + ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), (B), -(C), (-1), (R))) +#define _mm_mask_fmsub_round_sh(A, U, B, C, R) \ + ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), (B), -(C), (U), (R))) +#define _mm_mask3_fmsub_round_sh(A, B, C, U, R) \ + ((__m128h) __builtin_ia32_vfmsubsh3_mask3 ((A), (B), (C), (U), (R))) +#define _mm_maskz_fmsub_round_sh(U, A, B, C, R) \ + ((__m128h) __builtin_ia32_vfmaddsh3_maskz ((A), (B), -(C), (U), (R))) + +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vfnmsub[132,213,231]sh. 
*/ +extern __inline __m128h + __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fnmsub_sh (__m128h __W, __m128h __A, __m128h __B) +{ + return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W, + -(__v8hf) __A, + -(__v8hf) __B, + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_fnmsub_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B) +{ + return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W, + -(__v8hf) __A, + -(__v8hf) __B, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask3_fnmsub_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U) +{ + return (__m128h) __builtin_ia32_vfmsubsh3_mask3 ((__v8hf) __W, + -(__v8hf) __A, + (__v8hf) __B, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_fnmsub_sh (__mmask8 __U, __m128h __W, __m128h __A, __m128h __B) +{ + return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W, + -(__v8hf) __A, + -(__v8hf) __B, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + + +#ifdef __OPTIMIZE__ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fnmsub_round_sh (__m128h __W, __m128h __A, __m128h __B, const int __R) +{ + return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W, + -(__v8hf) __A, + -(__v8hf) __B, + (__mmask8) -1, + __R); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_fnmsub_round_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B, + const int __R) +{ + return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W, + -(__v8hf) __A, + -(__v8hf) __B, + (__mmask8) __U, __R); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, 
__artificial__)) +_mm_mask3_fnmsub_round_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U, + const int __R) +{ + return (__m128h) __builtin_ia32_vfmsubsh3_mask3 ((__v8hf) __W, + -(__v8hf) __A, + (__v8hf) __B, + (__mmask8) __U, __R); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_fnmsub_round_sh (__mmask8 __U, __m128h __W, __m128h __A, + __m128h __B, const int __R) +{ + return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W, + -(__v8hf) __A, + -(__v8hf) __B, + (__mmask8) __U, __R); +} + +#else +#define _mm_fnmsub_round_sh(A, B, C, R) \ + ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), -(B), -(C), (-1), (R))) +#define _mm_mask_fnmsub_round_sh(A, U, B, C, R) \ + ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), -(B), -(C), (U), (R))) +#define _mm_mask3_fnmsub_round_sh(A, B, C, U, R) \ + ((__m128h) __builtin_ia32_vfmsubsh3_mask3 ((A), -(B), (C), (U), (R))) +#define _mm_maskz_fnmsub_round_sh(U, A, B, C, R) \ + ((__m128h) __builtin_ia32_vfmaddsh3_maskz ((A), -(B), -(C), (U), (R))) + +#endif /* __OPTIMIZE__ */ + #ifdef __DISABLE_AVX512FP16__ #undef __DISABLE_AVX512FP16__ #pragma GCC pop_options diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def index 0cdbf1bc0c0..22b924bf98d 100644 --- a/gcc/config/i386/i386-builtin-types.def +++ b/gcc/config/i386/i386-builtin-types.def @@ -1342,6 +1342,7 @@ DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, INT) DEF_FUNCTION_TYPE (V8HF, V8HF, INT, V8HF, UQI) DEF_FUNCTION_TYPE (UQI, V8HF, V8HF, INT, UQI) DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, V8HF, UQI) +DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, V8HF, INT) DEF_FUNCTION_TYPE (UQI, V8HF, V8HF, INT, UQI, INT) DEF_FUNCTION_TYPE (V8DI, V8HF, V8DI, UQI, INT) DEF_FUNCTION_TYPE (V8DF, V8HF, V8DF, UQI, INT) diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def index cf0259843cc..f446a6ce5d3 100644 --- a/gcc/config/i386/i386-builtin.def +++ b/gcc/config/i386/i386-builtin.def @@ 
-3194,6 +3194,13 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmsub_v32hf_maskz_round BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fnmsub_v32hf_mask_round, "__builtin_ia32_vfnmsubph512_mask", IX86_BUILTIN_VFNMSUBPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fnmsub_v32hf_mask3_round, "__builtin_ia32_vfnmsubph512_mask3", IX86_BUILTIN_VFNMSUBPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fnmsub_v32hf_maskz_round, "__builtin_ia32_vfnmsubph512_maskz", IX86_BUILTIN_VFNMSUBPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfmadd_v8hf_mask_round, "__builtin_ia32_vfmaddsh3_mask", IX86_BUILTIN_VFMADDSH3_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfmadd_v8hf_mask3_round, "__builtin_ia32_vfmaddsh3_mask3", IX86_BUILTIN_VFMADDSH3_MASK3, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfmadd_v8hf_maskz_round, "__builtin_ia32_vfmaddsh3_maskz", IX86_BUILTIN_VFMADDSH3_MASKZ, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfnmadd_v8hf_mask_round, "__builtin_ia32_vfnmaddsh3_mask", IX86_BUILTIN_VFNMADDSH3_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfnmadd_v8hf_mask3_round, "__builtin_ia32_vfnmaddsh3_mask3", IX86_BUILTIN_VFNMADDSH3_MASK3, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfnmadd_v8hf_maskz_round, "__builtin_ia32_vfnmaddsh3_maskz", IX86_BUILTIN_VFNMADDSH3_MASKZ, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfmsub_v8hf_mask3_round, 
"__builtin_ia32_vfmsubsh3_mask3", IX86_BUILTIN_VFMSUBSH3_MASK3, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) BDESC_END (ROUND_ARGS, MULTI_ARG) diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c index 006f4bec8db..f6de05c769a 100644 --- a/gcc/config/i386/i386-expand.c +++ b/gcc/config/i386/i386-expand.c @@ -10558,6 +10558,7 @@ ix86_expand_round_builtin (const struct builtin_description *d, case V8HF_FTYPE_V8DI_V8HF_UQI_INT: case V8HF_FTYPE_V8DF_V8HF_UQI_INT: case V16HF_FTYPE_V16SF_V16HF_UHI_INT: + case V8HF_FTYPE_V8HF_V8HF_V8HF_INT: nargs = 4; break; case V4SF_FTYPE_V4SF_V4SF_INT_INT: diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index cbf1e75c0b2..31f8fc68c65 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -5049,60 +5049,60 @@ (define_insn "_fmsubadd__mask3" ;; high-order elements from the destination register. (define_expand "fmai_vmfmadd_" - [(set (match_operand:VF_128 0 "register_operand") - (vec_merge:VF_128 - (fma:VF_128 - (match_operand:VF_128 1 "register_operand") - (match_operand:VF_128 2 "") - (match_operand:VF_128 3 "")) + [(set (match_operand:VFH_128 0 "register_operand") + (vec_merge:VFH_128 + (fma:VFH_128 + (match_operand:VFH_128 1 "register_operand") + (match_operand:VFH_128 2 "") + (match_operand:VFH_128 3 "")) (match_dup 1) (const_int 1)))] "TARGET_FMA") (define_expand "fmai_vmfmsub_" - [(set (match_operand:VF_128 0 "register_operand") - (vec_merge:VF_128 - (fma:VF_128 - (match_operand:VF_128 1 "register_operand") - (match_operand:VF_128 2 "") - (neg:VF_128 - (match_operand:VF_128 3 ""))) + [(set (match_operand:VFH_128 0 "register_operand") + (vec_merge:VFH_128 + (fma:VFH_128 + (match_operand:VFH_128 1 "register_operand") + (match_operand:VFH_128 2 "") + (neg:VFH_128 + (match_operand:VFH_128 3 ""))) (match_dup 1) (const_int 1)))] "TARGET_FMA") (define_expand "fmai_vmfnmadd_" - [(set (match_operand:VF_128 0 "register_operand") - (vec_merge:VF_128 - (fma:VF_128 - (neg:VF_128 - 
(match_operand:VF_128 2 "")) - (match_operand:VF_128 1 "register_operand") - (match_operand:VF_128 3 "")) + [(set (match_operand:VFH_128 0 "register_operand") + (vec_merge:VFH_128 + (fma:VFH_128 + (neg:VFH_128 + (match_operand:VFH_128 2 "")) + (match_operand:VFH_128 1 "register_operand") + (match_operand:VFH_128 3 "")) (match_dup 1) (const_int 1)))] "TARGET_FMA") (define_expand "fmai_vmfnmsub_" - [(set (match_operand:VF_128 0 "register_operand") - (vec_merge:VF_128 - (fma:VF_128 - (neg:VF_128 - (match_operand:VF_128 2 "")) - (match_operand:VF_128 1 "register_operand") - (neg:VF_128 - (match_operand:VF_128 3 ""))) + [(set (match_operand:VFH_128 0 "register_operand") + (vec_merge:VFH_128 + (fma:VFH_128 + (neg:VFH_128 + (match_operand:VFH_128 2 "")) + (match_operand:VFH_128 1 "register_operand") + (neg:VFH_128 + (match_operand:VFH_128 3 ""))) (match_dup 1) (const_int 1)))] "TARGET_FMA") (define_insn "*fmai_fmadd_" - [(set (match_operand:VF_128 0 "register_operand" "=v,v") - (vec_merge:VF_128 - (fma:VF_128 - (match_operand:VF_128 1 "register_operand" "0,0") - (match_operand:VF_128 2 "" ", v") - (match_operand:VF_128 3 "" "v,")) + [(set (match_operand:VFH_128 0 "register_operand" "=v,v") + (vec_merge:VFH_128 + (fma:VFH_128 + (match_operand:VFH_128 1 "register_operand" "0,0") + (match_operand:VFH_128 2 "" ", v") + (match_operand:VFH_128 3 "" "v,")) (match_dup 1) (const_int 1)))] "TARGET_FMA || TARGET_AVX512F" @@ -5113,13 +5113,13 @@ (define_insn "*fmai_fmadd_" (set_attr "mode" "")]) (define_insn "*fmai_fmsub_" - [(set (match_operand:VF_128 0 "register_operand" "=v,v") - (vec_merge:VF_128 - (fma:VF_128 - (match_operand:VF_128 1 "register_operand" "0,0") - (match_operand:VF_128 2 "" ",v") - (neg:VF_128 - (match_operand:VF_128 3 "" "v,"))) + [(set (match_operand:VFH_128 0 "register_operand" "=v,v") + (vec_merge:VFH_128 + (fma:VFH_128 + (match_operand:VFH_128 1 "register_operand" "0,0") + (match_operand:VFH_128 2 "" ",v") + (neg:VFH_128 + (match_operand:VFH_128 3 "" "v,"))) 
(match_dup 1) (const_int 1)))] "TARGET_FMA || TARGET_AVX512F" @@ -5130,13 +5130,13 @@ (define_insn "*fmai_fmsub_" (set_attr "mode" "")]) (define_insn "*fmai_fnmadd_" - [(set (match_operand:VF_128 0 "register_operand" "=v,v") - (vec_merge:VF_128 - (fma:VF_128 - (neg:VF_128 - (match_operand:VF_128 2 "" ",v")) - (match_operand:VF_128 1 "register_operand" "0,0") - (match_operand:VF_128 3 "" "v,")) + [(set (match_operand:VFH_128 0 "register_operand" "=v,v") + (vec_merge:VFH_128 + (fma:VFH_128 + (neg:VFH_128 + (match_operand:VFH_128 2 "" ",v")) + (match_operand:VFH_128 1 "register_operand" "0,0") + (match_operand:VFH_128 3 "" "v,")) (match_dup 1) (const_int 1)))] "TARGET_FMA || TARGET_AVX512F" @@ -5147,14 +5147,14 @@ (define_insn "*fmai_fnmadd_" (set_attr "mode" "")]) (define_insn "*fmai_fnmsub_" - [(set (match_operand:VF_128 0 "register_operand" "=v,v") - (vec_merge:VF_128 - (fma:VF_128 - (neg:VF_128 - (match_operand:VF_128 2 "" ",v")) - (match_operand:VF_128 1 "register_operand" "0,0") - (neg:VF_128 - (match_operand:VF_128 3 "" "v,"))) + [(set (match_operand:VFH_128 0 "register_operand" "=v,v") + (vec_merge:VFH_128 + (fma:VFH_128 + (neg:VFH_128 + (match_operand:VFH_128 2 "" ",v")) + (match_operand:VFH_128 1 "register_operand" "0,0") + (neg:VFH_128 + (match_operand:VFH_128 3 "" "v,"))) (match_dup 1) (const_int 1)))] "TARGET_FMA || TARGET_AVX512F" @@ -5165,13 +5165,13 @@ (define_insn "*fmai_fnmsub_" (set_attr "mode" "")]) (define_insn "avx512f_vmfmadd__mask" - [(set (match_operand:VF_128 0 "register_operand" "=v,v") - (vec_merge:VF_128 - (vec_merge:VF_128 - (fma:VF_128 - (match_operand:VF_128 1 "register_operand" "0,0") - (match_operand:VF_128 2 "" ",v") - (match_operand:VF_128 3 "" "v,")) + [(set (match_operand:VFH_128 0 "register_operand" "=v,v") + (vec_merge:VFH_128 + (vec_merge:VFH_128 + (fma:VFH_128 + (match_operand:VFH_128 1 "register_operand" "0,0") + (match_operand:VFH_128 2 "" ",v") + (match_operand:VFH_128 3 "" "v,")) (match_dup 1) (match_operand:QI 4 
"register_operand" "Yk,Yk")) (match_dup 1) @@ -5184,13 +5184,13 @@ (define_insn "avx512f_vmfmadd__mask" (set_attr "mode" "")]) (define_insn "avx512f_vmfmadd__mask3" - [(set (match_operand:VF_128 0 "register_operand" "=v") - (vec_merge:VF_128 - (vec_merge:VF_128 - (fma:VF_128 - (match_operand:VF_128 1 "" "%v") - (match_operand:VF_128 2 "" "") - (match_operand:VF_128 3 "register_operand" "0")) + [(set (match_operand:VFH_128 0 "register_operand" "=v") + (vec_merge:VFH_128 + (vec_merge:VFH_128 + (fma:VFH_128 + (match_operand:VFH_128 1 "" "%v") + (match_operand:VFH_128 2 "" "") + (match_operand:VFH_128 3 "register_operand" "0")) (match_dup 3) (match_operand:QI 4 "register_operand" "Yk")) (match_dup 3) @@ -5201,10 +5201,10 @@ (define_insn "avx512f_vmfmadd__mask3" (set_attr "mode" "")]) (define_expand "avx512f_vmfmadd__maskz" - [(match_operand:VF_128 0 "register_operand") - (match_operand:VF_128 1 "") - (match_operand:VF_128 2 "") - (match_operand:VF_128 3 "") + [(match_operand:VFH_128 0 "register_operand") + (match_operand:VFH_128 1 "") + (match_operand:VFH_128 2 "") + (match_operand:VFH_128 3 "") (match_operand:QI 4 "register_operand")] "TARGET_AVX512F" { @@ -5215,14 +5215,14 @@ (define_expand "avx512f_vmfmadd__maskz" }) (define_insn "avx512f_vmfmadd__maskz_1" - [(set (match_operand:VF_128 0 "register_operand" "=v,v") - (vec_merge:VF_128 - (vec_merge:VF_128 - (fma:VF_128 - (match_operand:VF_128 1 "register_operand" "0,0") - (match_operand:VF_128 2 "" ",v") - (match_operand:VF_128 3 "" "v,")) - (match_operand:VF_128 4 "const0_operand" "C,C") + [(set (match_operand:VFH_128 0 "register_operand" "=v,v") + (vec_merge:VFH_128 + (vec_merge:VFH_128 + (fma:VFH_128 + (match_operand:VFH_128 1 "register_operand" "0,0") + (match_operand:VFH_128 2 "" ",v") + (match_operand:VFH_128 3 "" "v,")) + (match_operand:VFH_128 4 "const0_operand" "C,C") (match_operand:QI 5 "register_operand" "Yk,Yk")) (match_dup 1) (const_int 1)))] @@ -5234,14 +5234,14 @@ (define_insn "avx512f_vmfmadd__maskz_1" 
(set_attr "mode" "")]) (define_insn "*avx512f_vmfmsub__mask" - [(set (match_operand:VF_128 0 "register_operand" "=v,v") - (vec_merge:VF_128 - (vec_merge:VF_128 - (fma:VF_128 - (match_operand:VF_128 1 "register_operand" "0,0") - (match_operand:VF_128 2 "" ",v") - (neg:VF_128 - (match_operand:VF_128 3 "" "v,"))) + [(set (match_operand:VFH_128 0 "register_operand" "=v,v") + (vec_merge:VFH_128 + (vec_merge:VFH_128 + (fma:VFH_128 + (match_operand:VFH_128 1 "register_operand" "0,0") + (match_operand:VFH_128 2 "" ",v") + (neg:VFH_128 + (match_operand:VFH_128 3 "" "v,"))) (match_dup 1) (match_operand:QI 4 "register_operand" "Yk,Yk")) (match_dup 1) @@ -5254,14 +5254,14 @@ (define_insn "*avx512f_vmfmsub__mask" (set_attr "mode" "")]) (define_insn "avx512f_vmfmsub__mask3" - [(set (match_operand:VF_128 0 "register_operand" "=v") - (vec_merge:VF_128 - (vec_merge:VF_128 - (fma:VF_128 - (match_operand:VF_128 1 "" "%v") - (match_operand:VF_128 2 "" "") - (neg:VF_128 - (match_operand:VF_128 3 "register_operand" "0"))) + [(set (match_operand:VFH_128 0 "register_operand" "=v") + (vec_merge:VFH_128 + (vec_merge:VFH_128 + (fma:VFH_128 + (match_operand:VFH_128 1 "" "%v") + (match_operand:VFH_128 2 "" "") + (neg:VFH_128 + (match_operand:VFH_128 3 "register_operand" "0"))) (match_dup 3) (match_operand:QI 4 "register_operand" "Yk")) (match_dup 3) @@ -5272,15 +5272,15 @@ (define_insn "avx512f_vmfmsub__mask3" (set_attr "mode" "")]) (define_insn "*avx512f_vmfmsub__maskz_1" - [(set (match_operand:VF_128 0 "register_operand" "=v,v") - (vec_merge:VF_128 - (vec_merge:VF_128 - (fma:VF_128 - (match_operand:VF_128 1 "register_operand" "0,0") - (match_operand:VF_128 2 "" ",v") - (neg:VF_128 - (match_operand:VF_128 3 "" "v,"))) - (match_operand:VF_128 4 "const0_operand" "C,C") + [(set (match_operand:VFH_128 0 "register_operand" "=v,v") + (vec_merge:VFH_128 + (vec_merge:VFH_128 + (fma:VFH_128 + (match_operand:VFH_128 1 "register_operand" "0,0") + (match_operand:VFH_128 2 "" ",v") + (neg:VFH_128 + 
(match_operand:VFH_128 3 "" "v,"))) + (match_operand:VFH_128 4 "const0_operand" "C,C") (match_operand:QI 5 "register_operand" "Yk,Yk")) (match_dup 1) (const_int 1)))] @@ -5291,15 +5291,15 @@ (define_insn "*avx512f_vmfmsub__maskz_1" [(set_attr "type" "ssemuladd") (set_attr "mode" "")]) -(define_insn "*avx512f_vmfnmadd__mask" - [(set (match_operand:VF_128 0 "register_operand" "=v,v") - (vec_merge:VF_128 - (vec_merge:VF_128 - (fma:VF_128 - (neg:VF_128 - (match_operand:VF_128 2 "" ",v")) - (match_operand:VF_128 1 "register_operand" "0,0") - (match_operand:VF_128 3 "" "v,")) +(define_insn "avx512f_vmfnmadd__mask" + [(set (match_operand:VFH_128 0 "register_operand" "=v,v") + (vec_merge:VFH_128 + (vec_merge:VFH_128 + (fma:VFH_128 + (neg:VFH_128 + (match_operand:VFH_128 2 "" ",v")) + (match_operand:VFH_128 1 "register_operand" "0,0") + (match_operand:VFH_128 3 "" "v,")) (match_dup 1) (match_operand:QI 4 "register_operand" "Yk,Yk")) (match_dup 1) @@ -5311,15 +5311,15 @@ (define_insn "*avx512f_vmfnmadd__mask" [(set_attr "type" "ssemuladd") (set_attr "mode" "")]) -(define_insn "*avx512f_vmfnmadd__mask3" - [(set (match_operand:VF_128 0 "register_operand" "=v") - (vec_merge:VF_128 - (vec_merge:VF_128 - (fma:VF_128 - (neg:VF_128 - (match_operand:VF_128 2 "" "")) - (match_operand:VF_128 1 "" "%v") - (match_operand:VF_128 3 "register_operand" "0")) +(define_insn "avx512f_vmfnmadd__mask3" + [(set (match_operand:VFH_128 0 "register_operand" "=v") + (vec_merge:VFH_128 + (vec_merge:VFH_128 + (fma:VFH_128 + (neg:VFH_128 + (match_operand:VFH_128 2 "" "")) + (match_operand:VFH_128 1 "" "%v") + (match_operand:VFH_128 3 "register_operand" "0")) (match_dup 3) (match_operand:QI 4 "register_operand" "Yk")) (match_dup 3) @@ -5329,16 +5329,30 @@ (define_insn "*avx512f_vmfnmadd__mask3" [(set_attr "type" "ssemuladd") (set_attr "mode" "")]) -(define_insn "*avx512f_vmfnmadd__maskz_1" - [(set (match_operand:VF_128 0 "register_operand" "=v,v") - (vec_merge:VF_128 - (vec_merge:VF_128 - (fma:VF_128 - 
(neg:VF_128 - (match_operand:VF_128 2 "" ",v")) - (match_operand:VF_128 1 "register_operand" "0,0") - (match_operand:VF_128 3 "" "v,")) - (match_operand:VF_128 4 "const0_operand" "C,C") +(define_expand "avx512f_vmfnmadd__maskz" + [(match_operand:VFH_128 0 "register_operand") + (match_operand:VFH_128 1 "") + (match_operand:VFH_128 2 "") + (match_operand:VFH_128 3 "") + (match_operand:QI 4 "register_operand")] + "TARGET_AVX512F" +{ + emit_insn (gen_avx512f_vmfnmadd__maskz_1 ( + operands[0], operands[1], operands[2], operands[3], + CONST0_RTX (mode), operands[4])); + DONE; +}) + +(define_insn "avx512f_vmfnmadd__maskz_1" + [(set (match_operand:VFH_128 0 "register_operand" "=v,v") + (vec_merge:VFH_128 + (vec_merge:VFH_128 + (fma:VFH_128 + (neg:VFH_128 + (match_operand:VFH_128 2 "" ",v")) + (match_operand:VFH_128 1 "register_operand" "0,0") + (match_operand:VFH_128 3 "" "v,")) + (match_operand:VFH_128 4 "const0_operand" "C,C") (match_operand:QI 5 "register_operand" "Yk,Yk")) (match_dup 1) (const_int 1)))] @@ -5350,15 +5364,15 @@ (define_insn "*avx512f_vmfnmadd__maskz_1" (set_attr "mode" "")]) (define_insn "*avx512f_vmfnmsub__mask" - [(set (match_operand:VF_128 0 "register_operand" "=v,v") - (vec_merge:VF_128 - (vec_merge:VF_128 - (fma:VF_128 - (neg:VF_128 - (match_operand:VF_128 2 "" ",v")) - (match_operand:VF_128 1 "register_operand" "0,0") - (neg:VF_128 - (match_operand:VF_128 3 "" "v,"))) + [(set (match_operand:VFH_128 0 "register_operand" "=v,v") + (vec_merge:VFH_128 + (vec_merge:VFH_128 + (fma:VFH_128 + (neg:VFH_128 + (match_operand:VFH_128 2 "" ",v")) + (match_operand:VFH_128 1 "register_operand" "0,0") + (neg:VFH_128 + (match_operand:VFH_128 3 "" "v,"))) (match_dup 1) (match_operand:QI 4 "register_operand" "Yk,Yk")) (match_dup 1) @@ -5371,15 +5385,15 @@ (define_insn "*avx512f_vmfnmsub__mask" (set_attr "mode" "")]) (define_insn "*avx512f_vmfnmsub__mask3" - [(set (match_operand:VF_128 0 "register_operand" "=v") - (vec_merge:VF_128 - (vec_merge:VF_128 - (fma:VF_128 - 
(neg:VF_128 - (match_operand:VF_128 2 "" "")) - (match_operand:VF_128 1 "" "%v") - (neg:VF_128 - (match_operand:VF_128 3 "register_operand" "0"))) + [(set (match_operand:VFH_128 0 "register_operand" "=v") + (vec_merge:VFH_128 + (vec_merge:VFH_128 + (fma:VFH_128 + (neg:VFH_128 + (match_operand:VFH_128 2 "" "")) + (match_operand:VFH_128 1 "" "%v") + (neg:VFH_128 + (match_operand:VFH_128 3 "register_operand" "0"))) (match_dup 3) (match_operand:QI 4 "register_operand" "Yk")) (match_dup 3) @@ -5390,16 +5404,16 @@ (define_insn "*avx512f_vmfnmsub__mask3" (set_attr "mode" "")]) (define_insn "*avx512f_vmfnmsub__maskz_1" - [(set (match_operand:VF_128 0 "register_operand" "=v,v") - (vec_merge:VF_128 - (vec_merge:VF_128 - (fma:VF_128 - (neg:VF_128 - (match_operand:VF_128 2 "" ",v")) - (match_operand:VF_128 1 "register_operand" "0,0") - (neg:VF_128 - (match_operand:VF_128 3 "" "v,"))) - (match_operand:VF_128 4 "const0_operand" "C,C") + [(set (match_operand:VFH_128 0 "register_operand" "=v,v") + (vec_merge:VFH_128 + (vec_merge:VFH_128 + (fma:VFH_128 + (neg:VFH_128 + (match_operand:VFH_128 2 "" ",v")) + (match_operand:VFH_128 1 "register_operand" "0,0") + (neg:VFH_128 + (match_operand:VFH_128 3 "" "v,"))) + (match_operand:VFH_128 4 "const0_operand" "C,C") (match_operand:QI 5 "register_operand" "Yk,Yk")) (match_dup 1) (const_int 1)))] diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c index d2ab16538d8..6c2d1dc3df4 100644 --- a/gcc/testsuite/gcc.target/i386/avx-1.c +++ b/gcc/testsuite/gcc.target/i386/avx-1.c @@ -775,6 +775,18 @@ #define __builtin_ia32_vfnmsubph512_mask(A, B, C, D, E) __builtin_ia32_vfnmsubph512_mask(A, B, C, D, 8) #define __builtin_ia32_vfnmsubph512_mask3(A, B, C, D, E) __builtin_ia32_vfnmsubph512_mask3(A, B, C, D, 8) #define __builtin_ia32_vfnmsubph512_maskz(A, B, C, D, E) __builtin_ia32_vfnmsubph512_maskz(A, B, C, D, 8) +#define __builtin_ia32_vfmaddsh3_mask(A, B, C, D, E) __builtin_ia32_vfmaddsh3_mask(A, B, C, D, 8) 
+#define __builtin_ia32_vfmaddsh3_mask3(A, B, C, D, E) __builtin_ia32_vfmaddsh3_mask3(A, B, C, D, 8) +#define __builtin_ia32_vfmaddsh3_maskz(A, B, C, D, E) __builtin_ia32_vfmaddsh3_maskz(A, B, C, D, 8) +#define __builtin_ia32_vfnmaddsh3_mask(A, B, C, D, E) __builtin_ia32_vfnmaddsh3_mask(A, B, C, D, 8) +#define __builtin_ia32_vfnmaddsh3_mask3(A, B, C, D, E) __builtin_ia32_vfnmaddsh3_mask3(A, B, C, D, 8) +#define __builtin_ia32_vfnmaddsh3_maskz(A, B, C, D, E) __builtin_ia32_vfnmaddsh3_maskz(A, B, C, D, 8) +#define __builtin_ia32_vfmsubsh3_mask(A, B, C, D, E) __builtin_ia32_vfmsubsh3_mask(A, B, C, D, 8) +#define __builtin_ia32_vfmsubsh3_mask3(A, B, C, D, E) __builtin_ia32_vfmsubsh3_mask3(A, B, C, D, 8) +#define __builtin_ia32_vfmsubsh3_maskz(A, B, C, D, E) __builtin_ia32_vfmsubsh3_maskz(A, B, C, D, 8) +#define __builtin_ia32_vfnmsubsh3_mask(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_mask(A, B, C, D, 8) +#define __builtin_ia32_vfnmsubsh3_mask3(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_mask3(A, B, C, D, 8) +#define __builtin_ia32_vfnmsubsh3_maskz(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_maskz(A, B, C, D, 8) /* avx512fp16vlintrin.h */ #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D) diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c index 49c72f6fcef..f16be008909 100644 --- a/gcc/testsuite/gcc.target/i386/sse-13.c +++ b/gcc/testsuite/gcc.target/i386/sse-13.c @@ -792,6 +792,18 @@ #define __builtin_ia32_vfnmsubph512_mask(A, B, C, D, E) __builtin_ia32_vfnmsubph512_mask(A, B, C, D, 8) #define __builtin_ia32_vfnmsubph512_mask3(A, B, C, D, E) __builtin_ia32_vfnmsubph512_mask3(A, B, C, D, 8) #define __builtin_ia32_vfnmsubph512_maskz(A, B, C, D, E) __builtin_ia32_vfnmsubph512_maskz(A, B, C, D, 8) +#define __builtin_ia32_vfmaddsh3_mask(A, B, C, D, E) __builtin_ia32_vfmaddsh3_mask(A, B, C, D, 8) +#define __builtin_ia32_vfmaddsh3_mask3(A, B, C, D, E) __builtin_ia32_vfmaddsh3_mask3(A, B, C, D, 8) 
+#define __builtin_ia32_vfmaddsh3_maskz(A, B, C, D, E) __builtin_ia32_vfmaddsh3_maskz(A, B, C, D, 8) +#define __builtin_ia32_vfnmaddsh3_mask(A, B, C, D, E) __builtin_ia32_vfnmaddsh3_mask(A, B, C, D, 8) +#define __builtin_ia32_vfnmaddsh3_mask3(A, B, C, D, E) __builtin_ia32_vfnmaddsh3_mask3(A, B, C, D, 8) +#define __builtin_ia32_vfnmaddsh3_maskz(A, B, C, D, E) __builtin_ia32_vfnmaddsh3_maskz(A, B, C, D, 8) +#define __builtin_ia32_vfmsubsh3_mask(A, B, C, D, E) __builtin_ia32_vfmsubsh3_mask(A, B, C, D, 8) +#define __builtin_ia32_vfmsubsh3_mask3(A, B, C, D, E) __builtin_ia32_vfmsubsh3_mask3(A, B, C, D, 8) +#define __builtin_ia32_vfmsubsh3_maskz(A, B, C, D, E) __builtin_ia32_vfmsubsh3_maskz(A, B, C, D, 8) +#define __builtin_ia32_vfnmsubsh3_mask(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_mask(A, B, C, D, 8) +#define __builtin_ia32_vfnmsubsh3_mask3(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_mask3(A, B, C, D, 8) +#define __builtin_ia32_vfnmsubsh3_maskz(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_maskz(A, B, C, D, 8) /* avx512fp16vlintrin.h */ #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D) diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c index 9151e50afd2..01ac4e04173 100644 --- a/gcc/testsuite/gcc.target/i386/sse-14.c +++ b/gcc/testsuite/gcc.target/i386/sse-14.c @@ -842,6 +842,10 @@ test_3 (_mm512_fmadd_round_ph, __m512h, __m512h, __m512h, __m512h, 9) test_3 (_mm512_fnmadd_round_ph, __m512h, __m512h, __m512h, __m512h, 9) test_3 (_mm512_fmsub_round_ph, __m512h, __m512h, __m512h, __m512h, 9) test_3 (_mm512_fnmsub_round_ph, __m512h, __m512h, __m512h, __m512h, 9) +test_3 (_mm_fmadd_round_sh, __m128h, __m128h, __m128h, __m128h, 9) +test_3 (_mm_fnmadd_round_sh, __m128h, __m128h, __m128h, __m128h, 9) +test_3 (_mm_fmsub_round_sh, __m128h, __m128h, __m128h, __m128h, 9) +test_3 (_mm_fnmsub_round_sh, __m128h, __m128h, __m128h, __m128h, 9) test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, 
__m512h, __m512h, 1, 8) test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8) test_3x (_mm512_mask_reduce_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8) @@ -892,6 +896,18 @@ test_4 (_mm512_maskz_fmsub_round_ph, __m512h, __mmask32, __m512h, __m512h, __m51 test_4 (_mm512_mask_fnmsub_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 9) test_4 (_mm512_mask3_fnmsub_round_ph, __m512h, __m512h, __m512h, __m512h, __mmask32, 9) test_4 (_mm512_maskz_fnmsub_round_ph, __m512h, __mmask32, __m512h, __m512h, __m512h, 9) +test_4 (_mm_mask_fmadd_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 9) +test_4 (_mm_mask3_fmadd_round_sh, __m128h, __m128h, __m128h, __m128h, __mmask8, 9) +test_4 (_mm_maskz_fmadd_round_sh, __m128h, __mmask8, __m128h, __m128h, __m128h, 9) +test_4 (_mm_mask_fnmadd_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 9) +test_4 (_mm_mask3_fnmadd_round_sh, __m128h, __m128h, __m128h, __m128h, __mmask8, 9) +test_4 (_mm_maskz_fnmadd_round_sh, __m128h, __mmask8, __m128h, __m128h, __m128h, 9) +test_4 (_mm_mask_fmsub_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 9) +test_4 (_mm_mask3_fmsub_round_sh, __m128h, __m128h, __m128h, __m128h, __mmask8, 9) +test_4 (_mm_maskz_fmsub_round_sh, __m128h, __mmask8, __m128h, __m128h, __m128h, 9) +test_4 (_mm_mask_fnmsub_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 9) +test_4 (_mm_mask3_fnmsub_round_sh, __m128h, __m128h, __m128h, __m128h, __mmask8, 9) +test_4 (_mm_maskz_fnmsub_round_sh, __m128h, __mmask8, __m128h, __m128h, __m128h, 9) test_4x (_mm_mask_reduce_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8) test_4x (_mm_mask_roundscale_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8) test_4x (_mm_mask_getmant_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 1, 1) diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c index 892b6334ae2..79e3f35ab86 100644 --- 
a/gcc/testsuite/gcc.target/i386/sse-22.c +++ b/gcc/testsuite/gcc.target/i386/sse-22.c @@ -945,6 +945,10 @@ test_3 (_mm512_fmadd_round_ph, __m512h, __m512h, __m512h, __m512h, 9) test_3 (_mm512_fnmadd_round_ph, __m512h, __m512h, __m512h, __m512h, 9) test_3 (_mm512_fmsub_round_ph, __m512h, __m512h, __m512h, __m512h, 9) test_3 (_mm512_fnmsub_round_ph, __m512h, __m512h, __m512h, __m512h, 9) +test_3 (_mm_fmadd_round_sh, __m128h, __m128h, __m128h, __m128h, 9) +test_3 (_mm_fnmadd_round_sh, __m128h, __m128h, __m128h, __m128h, 9) +test_3 (_mm_fmsub_round_sh, __m128h, __m128h, __m128h, __m128h, 9) +test_3 (_mm_fnmsub_round_sh, __m128h, __m128h, __m128h, __m128h, 9) test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8) test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8) test_3x (_mm512_mask_reduce_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8) @@ -994,6 +998,18 @@ test_4 (_mm512_maskz_fmsub_round_ph, __m512h, __mmask32, __m512h, __m512h, __m51 test_4 (_mm512_mask_fnmsub_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 9) test_4 (_mm512_mask3_fnmsub_round_ph, __m512h, __m512h, __m512h, __m512h, __mmask32, 9) test_4 (_mm512_maskz_fnmsub_round_ph, __m512h, __mmask32, __m512h, __m512h, __m512h, 9) +test_4 (_mm_mask_fmadd_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 9) +test_4 (_mm_mask3_fmadd_round_sh, __m128h, __m128h, __m128h, __m128h, __mmask8, 9) +test_4 (_mm_maskz_fmadd_round_sh, __m128h, __mmask8, __m128h, __m128h, __m128h, 9) +test_4 (_mm_mask_fnmadd_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 9) +test_4 (_mm_mask3_fnmadd_round_sh, __m128h, __m128h, __m128h, __m128h, __mmask8, 9) +test_4 (_mm_maskz_fnmadd_round_sh, __m128h, __mmask8, __m128h, __m128h, __m128h, 9) +test_4 (_mm_mask_fmsub_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 9) +test_4 (_mm_mask3_fmsub_round_sh, __m128h, __m128h, __m128h, __m128h, __mmask8, 9) +test_4 (_mm_maskz_fmsub_round_sh, __m128h, 
__mmask8, __m128h, __m128h, __m128h, 9) +test_4 (_mm_mask_fnmsub_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 9) +test_4 (_mm_mask3_fnmsub_round_sh, __m128h, __m128h, __m128h, __m128h, __mmask8, 9) +test_4 (_mm_maskz_fnmsub_round_sh, __m128h, __mmask8, __m128h, __m128h, __m128h, 9) test_4x (_mm_mask_reduce_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8) test_4x (_mm_mask_roundscale_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8) test_4x (_mm_mask_getmant_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 1, 1) diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c index 447b83829f3..caf14408b91 100644 --- a/gcc/testsuite/gcc.target/i386/sse-23.c +++ b/gcc/testsuite/gcc.target/i386/sse-23.c @@ -793,6 +793,18 @@ #define __builtin_ia32_vfnmsubph512_mask(A, B, C, D, E) __builtin_ia32_vfnmsubph512_mask(A, B, C, D, 8) #define __builtin_ia32_vfnmsubph512_mask3(A, B, C, D, E) __builtin_ia32_vfnmsubph512_mask3(A, B, C, D, 8) #define __builtin_ia32_vfnmsubph512_maskz(A, B, C, D, E) __builtin_ia32_vfnmsubph512_maskz(A, B, C, D, 8) +#define __builtin_ia32_vfmaddsh3_mask(A, B, C, D, E) __builtin_ia32_vfmaddsh3_mask(A, B, C, D, 8) +#define __builtin_ia32_vfmaddsh3_mask3(A, B, C, D, E) __builtin_ia32_vfmaddsh3_mask3(A, B, C, D, 8) +#define __builtin_ia32_vfmaddsh3_maskz(A, B, C, D, E) __builtin_ia32_vfmaddsh3_maskz(A, B, C, D, 8) +#define __builtin_ia32_vfnmaddsh3_mask(A, B, C, D, E) __builtin_ia32_vfnmaddsh3_mask(A, B, C, D, 8) +#define __builtin_ia32_vfnmaddsh3_mask3(A, B, C, D, E) __builtin_ia32_vfnmaddsh3_mask3(A, B, C, D, 8) +#define __builtin_ia32_vfnmaddsh3_maskz(A, B, C, D, E) __builtin_ia32_vfnmaddsh3_maskz(A, B, C, D, 8) +#define __builtin_ia32_vfmsubsh3_mask(A, B, C, D, E) __builtin_ia32_vfmsubsh3_mask(A, B, C, D, 8) +#define __builtin_ia32_vfmsubsh3_mask3(A, B, C, D, E) __builtin_ia32_vfmsubsh3_mask3(A, B, C, D, 8) +#define __builtin_ia32_vfmsubsh3_maskz(A, B, C, D, E) 
__builtin_ia32_vfmsubsh3_maskz(A, B, C, D, 8) +#define __builtin_ia32_vfnmsubsh3_mask(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_mask(A, B, C, D, 8) +#define __builtin_ia32_vfnmsubsh3_mask3(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_mask3(A, B, C, D, 8) +#define __builtin_ia32_vfnmsubsh3_maskz(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_maskz(A, B, C, D, 8) /* avx512fp16vlintrin.h */ #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)

From patchwork Thu Jul 1 06:16:34 2021
X-Patchwork-Submitter: liuhongt
X-Patchwork-Id: 1499400
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 48/62] AVX512FP16: Add testcase for scalar FMA instructions.
Date: Thu, 1 Jul 2021 14:16:34 +0800
Message-Id: <20210701061648.9447-49-hongtao.liu@intel.com>
In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com>
References: <20210701061648.9447-1-hongtao.liu@intel.com>
From: liuhongt
Cc: jakub@redhat.com

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-vfmaddXXXsh-1a.c: New test.
	* gcc.target/i386/avx512fp16-vfmaddXXXsh-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vfmsubXXXsh-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vfmsubXXXsh-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vfnmaddXXXsh-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vfnmaddXXXsh-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vfnmsubXXXsh-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vfnmsubXXXsh-1b.c: Ditto.
--- .../i386/avx512fp16-vfmaddXXXsh-1a.c | 28 ++++++ .../i386/avx512fp16-vfmaddXXXsh-1b.c | 90 +++++++++++++++++++ .../i386/avx512fp16-vfmsubXXXsh-1a.c | 28 ++++++ .../i386/avx512fp16-vfmsubXXXsh-1b.c | 89 ++++++++++++++++++ .../i386/avx512fp16-vfnmaddXXXsh-1a.c | 32 +++++++ .../i386/avx512fp16-vfnmaddXXXsh-1b.c | 90 +++++++++++++++++++ .../i386/avx512fp16-vfnmsubXXXsh-1a.c | 28 ++++++ .../i386/avx512fp16-vfnmsubXXXsh-1b.c | 90 +++++++++++++++++++ 8 files changed, 475 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddXXXsh-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddXXXsh-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubXXXsh-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubXXXsh-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfnmaddXXXsh-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfnmaddXXXsh-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfnmsubXXXsh-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfnmsubXXXsh-1b.c diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddXXXsh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddXXXsh-1a.c new file mode 100644 index 00000000000..472454d116d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddXXXsh-1a.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vfmadd...sh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmadd...sh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vfmadd231sh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmadd...sh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmadd...sh\[ 
\\t\]+\[^\n\]*\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmadd...sh\[ \\t\]+\[^\n\]*\{rd-sae\}\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmadd231sh\[ \\t\]+\[^\n\]*\{ru-sae\}\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmadd...sh\[ \\t\]+\[^\n\]*\{rz-sae\}\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +volatile __m128h a, b, c; +volatile __mmask8 m; + +void extern +avx512f_test (void) +{ + a = _mm_fmadd_sh (a, b, c); + a = _mm_mask_fmadd_sh (a, m, b, c); + c = _mm_mask3_fmadd_sh (a, b, c, m); + a = _mm_maskz_fmadd_sh (m, a, b, c); + a = _mm_fmadd_round_sh (a, b, c, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC); + a = _mm_mask_fmadd_round_sh (a, m, b, c, _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC); + c = _mm_mask3_fmadd_round_sh (a, b, c, m, _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC); + a = _mm_maskz_fmadd_round_sh (m, a, b, c, _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddXXXsh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddXXXsh-1b.c new file mode 100644 index 00000000000..a0eca9cde3a --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddXXXsh-1b.c @@ -0,0 +1,90 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS 8 + +void NOINLINE +emulate_fmadd_sh(V512 * dest, V512 op1, V512 op2, + __mmask8 k, int zero_mask, int mask3) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(op2, &v3, &v4); + unpack_ph_2twops(*dest, &v7, &v8); + + if ((k&1) || !k) + v5.f32[0] = v1.f32[0] * v3.f32[0] + v7.f32[0]; + else if (zero_mask) + v5.f32[0] = 0; + else + v5.f32[0] = v7.f32[0]; + + for (i = 1; i < 8; i++){ + if (mask3) +
v5.f32[i] = v7.f32[i]; + else + v5.f32[i] = v1.f32[i]; + } + *dest = pack_twops_2ph(v5, v6); +} + +void +test_512 (void) +{ + V512 res; + V512 exp; + + init_src(); + + init_dest(&res, &exp); + emulate_fmadd_sh(&exp, src1, src2, 0x1, 0, 0); + res.xmmh[0] = _mm_fmadd_sh(src1.xmmh[0], src2.xmmh[0], res.xmmh[0]); + check_results(&res, &exp, N_ELEMS, "_mm_fmadd_sh"); + init_dest(&res, &exp); + emulate_fmadd_sh(&exp, src1, src2, 0x1, 0, 1); + res.xmmh[0] = _mm_mask3_fmadd_sh(src1.xmmh[0], src2.xmmh[0], res.xmmh[0], + 0x1); + check_results(&res, &exp, N_ELEMS, "_mm_mask3_fmadd_sh"); + init_dest(&res, &exp); + emulate_fmadd_sh(&exp, src1, src2, 0x1, 0, 0); + res.xmmh[0] = _mm_mask_fmadd_sh(src1.xmmh[0], 0x1, src2.xmmh[0], + res.xmmh[0]); + check_results(&res, &exp, N_ELEMS, "_mm_mask_fmadd_sh"); + init_dest(&res, &exp); + emulate_fmadd_sh(&exp, src1, src2, 0x3, 1, 0); + res.xmmh[0] = _mm_maskz_fmadd_sh(0x3, src1.xmmh[0], src2.xmmh[0], + res.xmmh[0]); + check_results(&res, &exp, N_ELEMS, "_mm_maskz_fmadd_sh"); + + init_dest(&res, &exp); + emulate_fmadd_sh(&exp, src1, src2, 0x1, 0, 0); + res.xmmh[0] = _mm_fmadd_round_sh(src1.xmmh[0], src2.xmmh[0], res.xmmh[0], + _ROUND_NINT); + check_results(&res, &exp, N_ELEMS, "_mm_fmadd_sh"); + init_dest(&res, &exp); + emulate_fmadd_sh(&exp, src1, src2, 0x1, 0, 1); + res.xmmh[0] = _mm_mask3_fmadd_round_sh(src1.xmmh[0], src2.xmmh[0], + res.xmmh[0], 0x1, _ROUND_NINT); + check_results(&res, &exp, N_ELEMS, "_mm_mask3_fmadd_sh"); + init_dest(&res, &exp); + emulate_fmadd_sh(&exp, src1, src2, 0x1, 0, 0); + res.xmmh[0] = _mm_mask_fmadd_round_sh(src1.xmmh[0], 0x1, src2.xmmh[0], + res.xmmh[0], _ROUND_NINT); + check_results(&res, &exp, N_ELEMS, "_mm_mask_fmadd_sh"); + init_dest(&res, &exp); + emulate_fmadd_sh(&exp, src1, src2, 0x3, 1, 0); + res.xmmh[0] = _mm_maskz_fmadd_round_sh(0x3, src1.xmmh[0], src2.xmmh[0], + res.xmmh[0], _ROUND_NINT); + check_results(&res, &exp, N_ELEMS, "_mm_maskz_fmadd_sh"); + + if (n_errs != 0) { + abort (); + } +} + diff 
--git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubXXXsh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubXXXsh-1a.c new file mode 100644 index 00000000000..335b9e21fcf --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubXXXsh-1a.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vfmsub...sh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmsub...sh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vfmsub231sh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmsub...sh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmsub...sh\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmsub...sh\[ \\t\]+\[^\n\]*\{rd-sae\}\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmsub231sh\[ \\t\]+\[^\n\]*\{ru-sae\}\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmsub...sh\[ \\t\]+\[^\n\]*\{rz-sae\}\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +volatile __m128h a, b, c; +volatile __mmask8 m; + +void extern +avx512f_test (void) +{ + a = _mm_fmsub_sh (a, b, c); + a = _mm_mask_fmsub_sh (a, m, b, c); + c = _mm_mask3_fmsub_sh (a, b, c, m); + a = _mm_maskz_fmsub_sh (m, a, b, c); + a = _mm_fmsub_round_sh (a, b, c, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC); + a = _mm_mask_fmsub_round_sh (a, m, b, c, _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC); + c = _mm_mask3_fmsub_round_sh (a, b, c, m, _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC); + a = _mm_maskz_fmsub_round_sh (m, a, b, c, _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC); +} diff --git 
a/gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubXXXsh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubXXXsh-1b.c new file mode 100644 index 00000000000..a2563fa816e --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubXXXsh-1b.c @@ -0,0 +1,89 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS 8 + +void NOINLINE +emulate_fmsub_sh(V512 * dest, V512 op1, V512 op2, + __mmask8 k, int zero_mask, int mask3) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(op2, &v3, &v4); + unpack_ph_2twops(*dest, &v7, &v8); + + if ((k&1) || !k) + v5.f32[0] = v1.f32[0] * v3.f32[0] - v7.f32[0]; + else if (zero_mask) + v5.f32[0] = 0; + else + v5.f32[0] = v7.f32[0]; + for (i = 1; i < 8; i++){ + if (mask3) + v5.f32[i] = v7.f32[i]; + else + v5.f32[i] = v1.f32[i]; + } + *dest = pack_twops_2ph(v5, v6); +} + +void +test_512 (void) +{ + V512 res; + V512 exp; + + init_src(); + init_dest(&res, &exp); + emulate_fmsub_sh(&exp, src1, src2, 0x1, 0, 0); + res.xmmh[0] = _mm_fmsub_sh(src1.xmmh[0], + src2.xmmh[0], res.xmmh[0]); + check_results(&res, &exp, N_ELEMS, "_mm_fmsub_sh"); + init_dest(&res, &exp); + emulate_fmsub_sh(&exp, src1, src2, 0x1, 0, 1); + res.xmmh[0] = _mm_mask3_fmsub_sh(src1.xmmh[0], src2.xmmh[0], res.xmmh[0], + 0x1); + check_results(&res, &exp, N_ELEMS, "_mm_mask3_fmsub_sh"); + init_dest(&res, &exp); + emulate_fmsub_sh(&exp, src1, src2, 0x1, 0, 0); + res.xmmh[0] = _mm_mask_fmsub_sh(src1.xmmh[0], 0x1, src2.xmmh[0], res.xmmh[0]); + check_results(&res, &exp, N_ELEMS, "_mm_mask_fmsub_sh"); + init_dest(&res, &exp); + emulate_fmsub_sh(&exp, src1, src2, 0x3, 1, 0); + res.xmmh[0] = _mm_maskz_fmsub_sh(0x3, src1.xmmh[0], src2.xmmh[0], + res.xmmh[0]); + check_results(&res, &exp, N_ELEMS, "_mm_maskz_fmsub_sh"); + + init_dest(&res, &exp); + emulate_fmsub_sh(&exp, src1, src2, 0x1, 0, 0); + res.xmmh[0] = 
_mm_fmsub_round_sh(src1.xmmh[0], src2.xmmh[0], res.xmmh[0], + _ROUND_NINT); + check_results(&res, &exp, N_ELEMS, "_mm_fmsub_sh"); + init_dest(&res, &exp); + emulate_fmsub_sh(&exp, src1, src2, 0x1, 0, 1); + res.xmmh[0] = _mm_mask3_fmsub_round_sh(src1.xmmh[0], src2.xmmh[0], + res.xmmh[0], 0x1, _ROUND_NINT); + check_results(&res, &exp, N_ELEMS, "_mm_mask3_fmsub_sh"); + init_dest(&res, &exp); + emulate_fmsub_sh(&exp, src1, src2, 0x1, 0, 0); + res.xmmh[0] = _mm_mask_fmsub_round_sh(src1.xmmh[0], 0x1, src2.xmmh[0], + res.xmmh[0], _ROUND_NINT); + check_results(&res, &exp, N_ELEMS, "_mm_mask_fmsub_sh"); + init_dest(&res, &exp); + emulate_fmsub_sh(&exp, src1, src2, 0x3, 1, 0); + res.xmmh[0] = _mm_maskz_fmsub_round_sh(0x3, src1.xmmh[0], src2.xmmh[0], + res.xmmh[0], _ROUND_NINT); + check_results(&res, &exp, N_ELEMS, "_mm_maskz_fmsub_sh"); + + + if (n_errs != 0) { + abort (); + } +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfnmaddXXXsh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfnmaddXXXsh-1a.c new file mode 100644 index 00000000000..77106aaeecb --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfnmaddXXXsh-1a.c @@ -0,0 +1,32 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vfnmadd...sh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfnmadd...sh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vfnmadd231sh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfnmadd...sh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfnmadd...sh\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfnmadd...sh\[ \\t\]+\[^\n\]*\{rd-sae\}\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final 
{ scan-assembler-times "vfnmadd231sh\[ \\t\]+\[^\n\]*\{ru-sae\}\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmadd...sh\[ \\t\]+\[^\n\]*\{rz-sae\}\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128h a, b, c;
+volatile __mmask8 m;
+
+void extern
+avx512f_test (void)
+{
+  a = _mm_fnmadd_sh (a, b, c);
+  a = _mm_mask_fnmadd_sh (a, m, b, c);
+  c = _mm_mask3_fnmadd_sh (a, b, c, m);
+  a = _mm_maskz_fnmadd_sh (m, a, b, c);
+  a = _mm_fnmadd_round_sh (a, b, c, _MM_FROUND_TO_NEAREST_INT
+			   | _MM_FROUND_NO_EXC);
+  a = _mm_mask_fnmadd_round_sh (a, m, b, c, _MM_FROUND_TO_NEG_INF
+				| _MM_FROUND_NO_EXC);
+  c = _mm_mask3_fnmadd_round_sh (a, b, c, m, _MM_FROUND_TO_POS_INF
+				 | _MM_FROUND_NO_EXC);
+  a = _mm_maskz_fnmadd_round_sh (m, a, b, c, _MM_FROUND_TO_ZERO
+				 | _MM_FROUND_NO_EXC);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfnmaddXXXsh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfnmaddXXXsh-1b.c
new file mode 100644
index 00000000000..92001508424
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfnmaddXXXsh-1b.c
@@ -0,0 +1,90 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS 8
+
+void NOINLINE
+emulate_fnmadd_sh(V512 * dest, V512 op1, V512 op2,
+		  __mmask8 k, int zero_mask, int mask3)
+{
+  V512 v1, v2, v3, v4, v5, v6, v7, v8;
+  int i;
+
+  unpack_ph_2twops(op1, &v1, &v2);
+  unpack_ph_2twops(op2, &v3, &v4);
+  unpack_ph_2twops(*dest, &v7, &v8);
+
+  if ((k&1) || !k)
+    v5.f32[0] = -(v1.f32[0] * v3.f32[0]) + v7.f32[0];
+  else if (zero_mask)
+    v5.f32[0] = 0;
+  else
+    v5.f32[0] = v7.f32[0];
+
+  for (i = 1; i < 8; i++){
+    if (mask3)
+      v5.f32[i] = v7.f32[i];
+    else
+      v5.f32[i] = v1.f32[i];
+  }
+  *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+test_512 (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  init_dest(&res, &exp);
+
emulate_fnmadd_sh(&exp, src1, src2, 0x1, 0, 0); + res.xmmh[0] = _mm_fnmadd_sh(src1.xmmh[0], src2.xmmh[0], res.xmmh[0]); + check_results(&res, &exp, N_ELEMS, "_mm_fnmadd_sh"); + init_dest(&res, &exp); + emulate_fnmadd_sh(&exp, src1, src2, 0x1, 0, 1); + res.xmmh[0] = _mm_mask3_fnmadd_sh(src1.xmmh[0], src2.xmmh[0], res.xmmh[0], + 0x1); + check_results(&res, &exp, N_ELEMS, "_mm_mask3_fnmadd_sh"); + init_dest(&res, &exp); + emulate_fnmadd_sh(&exp, src1, src2, 0x1, 0, 0); + res.xmmh[0] = _mm_mask_fnmadd_sh(src1.xmmh[0], 0x1, src2.xmmh[0], + res.xmmh[0]); + check_results(&res, &exp, N_ELEMS, "_mm_mask_fnmadd_sh"); + init_dest(&res, &exp); + emulate_fnmadd_sh(&exp, src1, src2, 0x3, 1, 0); + res.xmmh[0] = _mm_maskz_fnmadd_sh(0x3, src1.xmmh[0], src2.xmmh[0], + res.xmmh[0]); + check_results(&res, &exp, N_ELEMS, "_mm_maskz_fnmadd_sh"); + + init_dest(&res, &exp); + emulate_fnmadd_sh(&exp, src1, src2, 0x1, 0, 0); + res.xmmh[0] = _mm_fnmadd_round_sh(src1.xmmh[0], src2.xmmh[0], res.xmmh[0], + _ROUND_NINT); + check_results(&res, &exp, N_ELEMS, "_mm_fnmadd_sh"); + init_dest(&res, &exp); + emulate_fnmadd_sh(&exp, src1, src2, 0x1, 0, 1); + res.xmmh[0] = _mm_mask3_fnmadd_round_sh(src1.xmmh[0], src2.xmmh[0], + res.xmmh[0], 0x1, _ROUND_NINT); + check_results(&res, &exp, N_ELEMS, "_mm_mask3_fnmadd_sh"); + init_dest(&res, &exp); + emulate_fnmadd_sh(&exp, src1, src2, 0x1, 0, 0); + res.xmmh[0] = _mm_mask_fnmadd_round_sh(src1.xmmh[0], 0x1, src2.xmmh[0], + res.xmmh[0], _ROUND_NINT); + check_results(&res, &exp, N_ELEMS, "_mm_mask_fnmadd_sh"); + init_dest(&res, &exp); + emulate_fnmadd_sh(&exp, src1, src2, 0x3, 1, 0); + res.xmmh[0] = _mm_maskz_fnmadd_round_sh(0x3, src1.xmmh[0], src2.xmmh[0], + res.xmmh[0], _ROUND_NINT); + check_results(&res, &exp, N_ELEMS, "_mm_maskz_fnmadd_sh"); + + if (n_errs != 0) { + abort (); + } +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfnmsubXXXsh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfnmsubXXXsh-1a.c new file mode 100644 index 
00000000000..5d1460838e7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfnmsubXXXsh-1a.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vfnmsub...sh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmsub...sh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vfnmsub231sh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmsub...sh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmsub...sh\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmsub...sh\[ \\t\]+\[^\n\]*\{rd-sae\}\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmsub231sh\[ \\t\]+\[^\n\]*\{ru-sae\}\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmsub...sh\[ \\t\]+\[^\n\]*\{rz-sae\}\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128h a, b, c;
+volatile __mmask8 m;
+
+void extern
+avx512f_test (void)
+{
+  a = _mm_fnmsub_sh (a, b, c);
+  a = _mm_mask_fnmsub_sh (a, m, b, c);
+  c = _mm_mask3_fnmsub_sh (a, b, c, m);
+  a = _mm_maskz_fnmsub_sh (m, a, b, c);
+  a = _mm_fnmsub_round_sh (a, b, c, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
+  a = _mm_mask_fnmsub_round_sh (a, m, b, c, _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC);
+  c = _mm_mask3_fnmsub_round_sh (a, b, c, m, _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC);
+  a = _mm_maskz_fnmsub_round_sh (m, a, b, c, _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfnmsubXXXsh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfnmsubXXXsh-1b.c
new file mode 100644
index
00000000000..7bdb861425f --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfnmsubXXXsh-1b.c @@ -0,0 +1,90 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS 8 + +void NOINLINE +emulate_fnmsub_sh(V512 * dest, V512 op1, V512 op2, + __mmask8 k, int zero_mask, int mask3) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(op2, &v3, &v4); + unpack_ph_2twops(*dest, &v7, &v8); + + if ((k&1) || !k) + v5.f32[0] = -(v1.f32[0] * v3.f32[0]) - v7.f32[0]; + else if (zero_mask) + v5.f32[0] = 0; + else + v5.f32[0] = v7.f32[0]; + + for (i = 1; i < 8; i++){ + if (mask3) + v5.f32[i] = v7.f32[i]; + else + v5.f32[i] = v1.f32[i]; + } + *dest = pack_twops_2ph(v5, v6); +} + +void +test_512 (void) +{ + V512 res; + V512 exp; + + init_src(); + + init_dest(&res, &exp); + emulate_fnmsub_sh(&exp, src1, src2, 0x1, 0, 0); + res.xmmh[0] = _mm_fnmsub_sh(src1.xmmh[0], src2.xmmh[0], res.xmmh[0]); + check_results(&res, &exp, N_ELEMS, "_mm_fnmsub_sh"); + init_dest(&res, &exp); + emulate_fnmsub_sh(&exp, src1, src2, 0x1, 0, 1); + res.xmmh[0] = _mm_mask3_fnmsub_sh(src1.xmmh[0], src2.xmmh[0], res.xmmh[0], + 0x1); + check_results(&res, &exp, N_ELEMS, "_mm_mask3_fnmsub_sh"); + init_dest(&res, &exp); + emulate_fnmsub_sh(&exp, src1, src2, 0x1, 0, 0); + res.xmmh[0] = _mm_mask_fnmsub_sh(src1.xmmh[0], 0x1, src2.xmmh[0], + res.xmmh[0]); + check_results(&res, &exp, N_ELEMS, "_mm_mask_fnmsub_sh"); + init_dest(&res, &exp); + emulate_fnmsub_sh(&exp, src1, src2, 0x3, 1, 0); + res.xmmh[0] = _mm_maskz_fnmsub_sh(0x3, src1.xmmh[0], src2.xmmh[0], + res.xmmh[0]); + check_results(&res, &exp, N_ELEMS, "_mm_maskz_fnmsub_sh"); + + init_dest(&res, &exp); + emulate_fnmsub_sh(&exp, src1, src2, 0x1, 0, 0); + res.xmmh[0] = _mm_fnmsub_round_sh(src1.xmmh[0], src2.xmmh[0], res.xmmh[0], + _ROUND_NINT); + check_results(&res, &exp, N_ELEMS, 
"_mm_fnmsub_sh");
+  init_dest(&res, &exp);
+  emulate_fnmsub_sh(&exp, src1, src2, 0x1, 0, 1);
+  res.xmmh[0] = _mm_mask3_fnmsub_round_sh(src1.xmmh[0], src2.xmmh[0],
+					  res.xmmh[0], 0x1, _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_mask3_fnmsub_sh");
+  init_dest(&res, &exp);
+  emulate_fnmsub_sh(&exp, src1, src2, 0x1, 0, 0);
+  res.xmmh[0] = _mm_mask_fnmsub_round_sh(src1.xmmh[0], 0x1, src2.xmmh[0],
+					 res.xmmh[0], _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_mask_fnmsub_sh");
+  init_dest(&res, &exp);
+  emulate_fnmsub_sh(&exp, src1, src2, 0x3, 1, 0);
+  res.xmmh[0] = _mm_maskz_fnmsub_round_sh(0x3, src1.xmmh[0], src2.xmmh[0],
+					  res.xmmh[0], _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_maskz_fnmsub_sh");
+
+  if (n_errs != 0) {
+    abort ();
+  }
+}

From patchwork Thu Jul 1 06:16:35 2021
From: liuhongt
To: gcc-patches@gcc.gnu.org
Cc: jakub@redhat.com
Subject: [PATCH 49/62] AVX512FP16: Add vfcmaddcph/vfmaddcph/vfcmulcph/vfmulcph
Date: Thu, 1 Jul 2021 14:16:35 +0800
Message-Id: <20210701061648.9447-50-hongtao.liu@intel.com>
In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com>
References: <20210701061648.9447-1-hongtao.liu@intel.com>

gcc/ChangeLog:

	* config/i386/avx512fp16intrin.h (_mm512_fcmadd_pch): New intrinsic.
	(_mm512_mask_fcmadd_pch): Likewise.
	(_mm512_mask3_fcmadd_pch): Likewise.
	(_mm512_maskz_fcmadd_pch): Likewise.
	(_mm512_fmadd_pch): Likewise.
	(_mm512_mask_fmadd_pch): Likewise.
	(_mm512_mask3_fmadd_pch): Likewise.
	(_mm512_maskz_fmadd_pch): Likewise.
	(_mm512_fcmadd_round_pch): Likewise.
	(_mm512_mask_fcmadd_round_pch): Likewise.
	(_mm512_mask3_fcmadd_round_pch): Likewise.
	(_mm512_maskz_fcmadd_round_pch): Likewise.
	(_mm512_fmadd_round_pch): Likewise.
	(_mm512_mask_fmadd_round_pch): Likewise.
	(_mm512_mask3_fmadd_round_pch): Likewise.
	(_mm512_maskz_fmadd_round_pch): Likewise.
	(_mm512_fcmul_pch): Likewise.
	(_mm512_mask_fcmul_pch): Likewise.
	(_mm512_maskz_fcmul_pch): Likewise.
	(_mm512_fmul_pch): Likewise.
	(_mm512_mask_fmul_pch): Likewise.
	(_mm512_maskz_fmul_pch): Likewise.
	(_mm512_fcmul_round_pch): Likewise.
	(_mm512_mask_fcmul_round_pch): Likewise.
	(_mm512_maskz_fcmul_round_pch): Likewise.
	(_mm512_fmul_round_pch): Likewise.
	(_mm512_mask_fmul_round_pch): Likewise.
	(_mm512_maskz_fmul_round_pch): Likewise.
	* config/i386/avx512fp16vlintrin.h (_mm_fmadd_pch): New intrinsic.
	(_mm_mask_fmadd_pch): Likewise.
	(_mm_mask3_fmadd_pch): Likewise.
	(_mm_maskz_fmadd_pch): Likewise.
	(_mm256_fmadd_pch): Likewise.
	(_mm256_mask_fmadd_pch): Likewise.
	(_mm256_mask3_fmadd_pch): Likewise.
	(_mm256_maskz_fmadd_pch): Likewise.
	(_mm_fcmadd_pch): Likewise.
	(_mm_mask_fcmadd_pch): Likewise.
	(_mm_mask3_fcmadd_pch): Likewise.
	(_mm_maskz_fcmadd_pch): Likewise.
	(_mm256_fcmadd_pch): Likewise.
	(_mm256_mask_fcmadd_pch): Likewise.
	(_mm256_mask3_fcmadd_pch): Likewise.
	(_mm256_maskz_fcmadd_pch): Likewise.
	(_mm_fmul_pch): Likewise.
	(_mm_mask_fmul_pch): Likewise.
	(_mm_maskz_fmul_pch): Likewise.
	(_mm256_fmul_pch): Likewise.
	(_mm256_mask_fmul_pch): Likewise.
	(_mm256_maskz_fmul_pch): Likewise.
	(_mm_fcmul_pch): Likewise.
	(_mm_mask_fcmul_pch): Likewise.
	(_mm_maskz_fcmul_pch): Likewise.
	(_mm256_fcmul_pch): Likewise.
	(_mm256_mask_fcmul_pch): Likewise.
	(_mm256_maskz_fcmul_pch): Likewise.
	* config/i386/i386-builtin-types.def (V8HF_FTYPE_V8HF_V8HF_V8HF,
	V8HF_FTYPE_V16HF_V16HF_V16HF, V16HF_FTYPE_V16HF_V16HF_V16HF_UQI,
	V32HF_FTYPE_V32HF_V32HF_V32HF_INT,
	V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT): Add new builtin types.
	* config/i386/i386-builtin.def: Add new builtins.
	* config/i386/i386-expand.c: Handle new builtin types.
	* config/i386/subst.md (SUBST_CV): New.
	(maskc_name): Ditto.
	(maskc_operand3): Ditto.
	(maskc): Ditto.
	(sdc_maskz_name): Ditto.
	(sdc_mask_op4): Ditto.
	(sdc_mask_op5): Ditto.
	(sdc_mask_mode512bit_condition): Ditto.
	(sdc): Ditto.
	(round_maskc_operand3): Ditto.
	(round_sdc_mask_operand4): Ditto.
	(round_maskc_op3): Ditto.
	(round_sdc_mask_op4): Ditto.
	(round_saeonly_sdc_mask_operand5): Ditto.
	* config/i386/sse.md (unspec): Add complex fma unspecs.
	(avx512fmaskcmode): New.
	(UNSPEC_COMPLEX_F_C_MA): Ditto.
	(UNSPEC_COMPLEX_F_C_MUL): Ditto.
	(complexopname): Ditto.
	(_fmaddc__maskz): New expander.
	(_fcmaddc__maskz): Ditto.
	(fma__): New define insn.
	(___mask): Ditto.
	(__): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx-1.c: Add test for new builtins.
	* gcc.target/i386/sse-13.c: Ditto.
	* gcc.target/i386/sse-23.c: Ditto.
	* gcc.target/i386/sse-14.c: Add test for new intrinsics.
	* gcc.target/i386/sse-22.c: Ditto.
--- gcc/config/i386/avx512fp16intrin.h | 386 +++++++++++++++++++++++++ gcc/config/i386/avx512fp16vlintrin.h | 257 ++++++++++++++++ gcc/config/i386/i386-builtin-types.def | 5 + gcc/config/i386/i386-builtin.def | 30 ++ gcc/config/i386/i386-expand.c | 5 + gcc/config/i386/sse.md | 98 +++++++ gcc/config/i386/subst.md | 40 +++ gcc/testsuite/gcc.target/i386/avx-1.c | 10 + gcc/testsuite/gcc.target/i386/sse-13.c | 10 + gcc/testsuite/gcc.target/i386/sse-14.c | 14 + gcc/testsuite/gcc.target/i386/sse-22.c | 14 + gcc/testsuite/gcc.target/i386/sse-23.c | 10 + 12 files changed, 879 insertions(+) diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h index 5c85ec15b22..9dd71019972 100644 --- a/gcc/config/i386/avx512fp16intrin.h +++ b/gcc/config/i386/avx512fp16intrin.h @@ -6109,6 +6109,392 @@ _mm_maskz_fnmsub_round_sh (__mmask8 __U, __m128h __W, __m128h __A, #endif /* __OPTIMIZE__ */ +/* Intrinsics vf[,c]maddcph. */ +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fcmadd_pch (__m512h __A, __m512h __B, __m512h __C) +{ + return (__m512h) + __builtin_ia32_vfcmaddcph_v32hf_round ((__v32hf) __C, + (__v32hf) __A, + (__v32hf) __B, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fcmadd_pch (__m512h __A, __mmask16 __B, __m512h __C, __m512h __D) +{ + return (__m512h) __builtin_ia32_movaps512_mask + ((__v16sf) + __builtin_ia32_vfcmaddcph_v32hf_mask_round ((__v32hf) __D, + (__v32hf) __A, + (__v32hf) __C, __B, + _MM_FROUND_CUR_DIRECTION), + (__v16sf) __A, __B); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask3_fcmadd_pch (__m512h __A, __m512h __B, __m512h __C, __mmask16 __D) +{ + return (__m512h) + __builtin_ia32_vfcmaddcph_v32hf_mask_round ((__v32hf) __C, + (__v32hf) __A, + (__v32hf) __B, + __D, _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512h 
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_fcmadd_pch (__mmask16 __A, __m512h __B, __m512h __C, __m512h __D) +{ + return (__m512h) + __builtin_ia32_vfcmaddcph_v32hf_maskz_round((__v32hf) __D, + (__v32hf) __B, + (__v32hf) __C, + __A, _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fmadd_pch (__m512h __A, __m512h __B, __m512h __C) +{ + return (__m512h) + __builtin_ia32_vfmaddcph_v32hf_round((__v32hf) __C, + (__v32hf) __A, + (__v32hf) __B, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fmadd_pch (__m512h __A, __mmask16 __B, __m512h __C, __m512h __D) +{ + return (__m512h) __builtin_ia32_movaps512_mask + ((__v16sf) + __builtin_ia32_vfmaddcph_v32hf_mask_round ((__v32hf) __D, + (__v32hf) __A, + (__v32hf) __C, __B, + _MM_FROUND_CUR_DIRECTION), + (__v16sf) __A, __B); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask3_fmadd_pch (__m512h __A, __m512h __B, __m512h __C, __mmask16 __D) +{ + return (__m512h) + __builtin_ia32_vfmaddcph_v32hf_mask_round((__v32hf) __C, + (__v32hf) __A, + (__v32hf) __B, + __D, _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_fmadd_pch (__mmask16 __A, __m512h __B, __m512h __C, __m512h __D) +{ + return (__m512h) + __builtin_ia32_vfmaddcph_v32hf_maskz_round((__v32hf) __D, + (__v32hf) __B, + (__v32hf) __C, + __A, _MM_FROUND_CUR_DIRECTION); +} + +#ifdef __OPTIMIZE__ +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fcmadd_round_pch (__m512h __A, __m512h __B, __m512h __C, const int __D) +{ + return (__m512h)__builtin_ia32_vfcmaddcph_v32hf_round((__v32hf) __C, + (__v32hf) __A, + (__v32hf) __B, + __D); +} + +extern __inline __m512h 
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fcmadd_round_pch (__m512h __A, __mmask16 __B, __m512h __C, + __m512h __D, const int __E) +{ + return (__m512h) __builtin_ia32_movaps512_mask + ((__v16sf) + __builtin_ia32_vfcmaddcph_v32hf_mask_round ((__v32hf) __D, + (__v32hf) __A, + (__v32hf) __C, __B, + __E), + (__v16sf) __A, __B); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask3_fcmadd_round_pch (__m512h __A, __m512h __B, __m512h __C, + __mmask16 __D, const int __E) +{ + return (__m512h) + __builtin_ia32_vfcmaddcph_v32hf_mask_round ((__v32hf) __C, + (__v32hf) __A, + (__v32hf) __B, + __D, __E); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_fcmadd_round_pch (__mmask16 __A, __m512h __B, __m512h __C, + __m512h __D, const int __E) +{ + return (__m512h)__builtin_ia32_vfcmaddcph_v32hf_maskz_round((__v32hf) __D, + (__v32hf) __B, + (__v32hf) __C, + __A, + __E); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fmadd_round_pch (__m512h __A, __m512h __B, __m512h __C, const int __D) +{ + return (__m512h) + __builtin_ia32_vfmaddcph_v32hf_round ((__v32hf) __C, + (__v32hf) __A, + (__v32hf) __B, + __D); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fmadd_round_pch (__m512h __A, __mmask16 __B, __m512h __C, + __m512h __D, const int __E) +{ + return (__m512h) __builtin_ia32_movaps512_mask + ((__v16sf) + __builtin_ia32_vfmaddcph_v32hf_mask_round ((__v32hf) __D, + (__v32hf) __A, + (__v32hf) __C, __B, + __E), + (__v16sf) __A, __B); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask3_fmadd_round_pch (__m512h __A, __m512h __B, __m512h __C, + __mmask16 __D, const int __E) +{ + return (__m512h) + __builtin_ia32_vfmaddcph_v32hf_mask_round ((__v32hf) 
__C, + (__v32hf) __A, + (__v32hf) __B, + __D, __E); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_fmadd_round_pch (__mmask16 __A, __m512h __B, __m512h __C, + __m512h __D, const int __E) +{ + return (__m512h)__builtin_ia32_vfmaddcph_v32hf_maskz_round((__v32hf) __D, + (__v32hf) __B, + (__v32hf) __C, + __A, __E); +} + +#else +#define _mm512_fcmadd_round_pch(A, B, C, D) \ + (__m512h) __builtin_ia32_vfcmaddcph_v32hf_round ((C), (A), (B), (D)) + +#define _mm512_mask_fcmadd_round_pch(A, B, C, D, E) \ + ((__m512h) __builtin_ia32_movaps512_mask ( \ + (__v16sf) \ + __builtin_ia32_vfcmaddcph_v32hf_mask_round ((__v32hf) (D), \ + (__v32hf) (A), \ + (__v32hf) (C), \ + (B), (E)), \ + (__v16sf) (A), (B))); + + +#define _mm512_mask3_fcmadd_round_pch(A, B, C, D, E) \ + ((__m512h) \ + __builtin_ia32_vfcmaddcph_v32hf_mask_round ((C), (A), (B), (D), (E))) + +#define _mm512_maskz_fcmadd_round_pch(A, B, C, D, E) \ + (__m512h) \ + __builtin_ia32_vfcmaddcph_v32hf_maskz_round((D), (B), (C), (A), (E)) + +#define _mm512_fmadd_round_pch(A, B, C, D) \ + (__m512h) __builtin_ia32_vfmaddcph_v32hf_round((C), (A), (B), (D)) + +#define _mm512_mask_fmadd_round_pch(A, B, C, D, E) \ + ((__m512h) __builtin_ia32_movaps512_mask ( \ + (__v16sf) \ + __builtin_ia32_vfmaddcph_v32hf_mask_round ((__v32hf) (D), \ + (__v32hf) (A), \ + (__v32hf) (C), \ + (B), (E)), \ + (__v16sf) (A), (B))); + +#define _mm512_mask3_fmadd_round_pch(A, B, C, D, E) \ + (__m512h) \ + __builtin_ia32_vfmaddcph_v32hf_mask_round((C), (A), (B), (D), (E)) + +#define _mm512_maskz_fmadd_round_pch(A, B, C, D, E) \ + (__m512h) \ + __builtin_ia32_vfmaddcph_v32hf_maskz_round((D), (B), (C), (A), (E)) + +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vf[,c]mulcph. 
*/ +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fcmul_pch (__m512h __A, __m512h __B) +{ + return (__m512h) + __builtin_ia32_vfcmulcph_v32hf_round((__v32hf) __A, + (__v32hf) __B, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fcmul_pch (__m512h __A, __mmask16 __B, __m512h __C, __m512h __D) +{ + return (__m512h) + __builtin_ia32_vfcmulcph_v32hf_mask_round((__v32hf) __C, + (__v32hf) __D, + (__v32hf) __A, + __B, _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_fcmul_pch (__mmask16 __A, __m512h __B, __m512h __C) +{ + return (__m512h) + __builtin_ia32_vfcmulcph_v32hf_mask_round((__v32hf) __B, + (__v32hf) __C, + _mm512_setzero_ph (), + __A, _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fmul_pch (__m512h __A, __m512h __B) +{ + return (__m512h) + __builtin_ia32_vfmulcph_v32hf_round((__v32hf) __A, + (__v32hf) __B, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fmul_pch (__m512h __A, __mmask16 __B, __m512h __C, __m512h __D) +{ + return (__m512h) + __builtin_ia32_vfmulcph_v32hf_mask_round((__v32hf) __C, + (__v32hf) __D, + (__v32hf) __A, + __B, _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_fmul_pch (__mmask16 __A, __m512h __B, __m512h __C) +{ + return (__m512h) + __builtin_ia32_vfmulcph_v32hf_mask_round((__v32hf) __B, + (__v32hf) __C, + _mm512_setzero_ph (), + __A, _MM_FROUND_CUR_DIRECTION); +} + +#ifdef __OPTIMIZE__ +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fcmul_round_pch (__m512h __A, __m512h __B, const int 
__D) +{ + return (__m512h)__builtin_ia32_vfcmulcph_v32hf_round((__v32hf) __A, + (__v32hf) __B, __D); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fcmul_round_pch (__m512h __A, __mmask16 __B, __m512h __C, + __m512h __D, const int __E) +{ + return (__m512h)__builtin_ia32_vfcmulcph_v32hf_mask_round((__v32hf) __C, + (__v32hf) __D, + (__v32hf) __A, + __B, __E); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_fcmul_round_pch (__mmask16 __A, __m512h __B, + __m512h __C, const int __E) +{ + return (__m512h)__builtin_ia32_vfcmulcph_v32hf_mask_round((__v32hf) __B, + (__v32hf) __C, + _mm512_setzero_ph (), + __A, __E); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fmul_round_pch (__m512h __A, __m512h __B, const int __D) +{ + return (__m512h)__builtin_ia32_vfmulcph_v32hf_round((__v32hf) __A, + (__v32hf) __B, + __D); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fmul_round_pch (__m512h __A, __mmask16 __B, __m512h __C, + __m512h __D, const int __E) +{ + return (__m512h)__builtin_ia32_vfmulcph_v32hf_mask_round((__v32hf) __C, + (__v32hf) __D, + (__v32hf) __A, + __B, __E); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_fmul_round_pch (__mmask16 __A, __m512h __B, + __m512h __C, const int __E) +{ + return (__m512h)__builtin_ia32_vfmulcph_v32hf_mask_round((__v32hf) __B, + (__v32hf) __C, + _mm512_setzero_ph (), + __A, __E); +} + +#else +#define _mm512_fcmul_round_pch(A, B, D) \ + (__m512h)__builtin_ia32_vfcmulcph_v32hf_round(A, B, D) + +#define _mm512_mask_fcmul_round_pch(A, B, C, D, E) \ + (__m512h)__builtin_ia32_vfcmulcph_v32hf_mask_round(C, D, A, B, E) + +#define _mm512_maskz_fcmul_round_pch(A, B, C, E) \ + (__m512h)__builtin_ia32_vfcmulcph_v32hf_mask_round(B, C, \ 
+ _mm512_setzero_ph(), \ + A, E) + +#define _mm512_fmul_round_pch(A, B, D) \ + (__m512h)__builtin_ia32_vfmulcph_v32hf_round(A, B, D) + +#define _mm512_mask_fmul_round_pch(A, B, C, D, E) \ + (__m512h)__builtin_ia32_vfmulcph_v32hf_mask_round(C, D, A, B, E) + +#define _mm512_maskz_fmul_round_pch(A, B, C, E) \ + (__m512h)__builtin_ia32_vfmulcph_v32hf_mask_round(B, C, \ + _mm512_setzero_ph (), \ + A, E) + +#endif /* __OPTIMIZE__ */ + #ifdef __DISABLE_AVX512FP16__ #undef __DISABLE_AVX512FP16__ #pragma GCC pop_options diff --git a/gcc/config/i386/avx512fp16vlintrin.h b/gcc/config/i386/avx512fp16vlintrin.h index bba98f105ac..c7bdfbc0517 100644 --- a/gcc/config/i386/avx512fp16vlintrin.h +++ b/gcc/config/i386/avx512fp16vlintrin.h @@ -2815,6 +2815,263 @@ _mm_maskz_fnmsub_ph (__mmask8 __U, __m128h __A, __m128h __B, __U); } +/* Intrinsics vf[,c]maddcph. */ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fmadd_pch (__m128h __A, __m128h __B, __m128h __C) +{ + return (__m128h)__builtin_ia32_vfmaddcph_v8hf((__v8hf) __C, (__v8hf) __A, + (__v8hf) __B); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_fmadd_pch (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D) +{ + return (__m128h) __builtin_ia32_movaps128_mask + ((__v4sf) + __builtin_ia32_vfmaddcph_v8hf_mask ((__v8hf) __D, + (__v8hf) __A, + (__v8hf) __C, __B), + (__v4sf) __A, __B); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask3_fmadd_pch (__m128h __A, __m128h __B, __m128h __C, __mmask8 __D) +{ + return (__m128h) __builtin_ia32_vfmaddcph_v8hf_mask ((__v8hf) __C, + (__v8hf) __A, + (__v8hf) __B, __D); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_fmadd_pch (__mmask8 __A, __m128h __B, __m128h __C, __m128h __D) +{ + return (__m128h)__builtin_ia32_vfmaddcph_v8hf_maskz((__v8hf) __D, + (__v8hf) __B, + 
(__v8hf) __C, __A); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_fmadd_pch (__m256h __A, __m256h __B, __m256h __C) +{ + return (__m256h)__builtin_ia32_vfmaddcph_v16hf((__v16hf) __C, (__v16hf) __A, + (__v16hf) __B); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_fmadd_pch (__m256h __A, __mmask8 __B, __m256h __C, __m256h __D) +{ + return (__m256h) __builtin_ia32_movaps256_mask + ((__v8sf) + __builtin_ia32_vfmaddcph_v16hf_mask ((__v16hf) __D, + (__v16hf) __A, + (__v16hf) __C, __B), + (__v8sf) __A, __B); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask3_fmadd_pch (__m256h __A, __m256h __B, __m256h __C, __mmask8 __D) +{ + return (__m256h) __builtin_ia32_vfmaddcph_v16hf_mask ((__v16hf) __C, + (__v16hf) __A, + (__v16hf) __B, __D); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_fmadd_pch (__mmask8 __A, __m256h __B, __m256h __C, __m256h __D) +{ + return (__m256h)__builtin_ia32_vfmaddcph_v16hf_maskz((__v16hf) __D, + (__v16hf) __B, + (__v16hf) __C, __A); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fcmadd_pch (__m128h __A, __m128h __B, __m128h __C) +{ + return (__m128h)__builtin_ia32_vfcmaddcph_v8hf ((__v8hf) __C, + (__v8hf) __A, (__v8hf) __B); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_fcmadd_pch (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D) +{ + return (__m128h)__builtin_ia32_movaps128_mask + ((__v4sf) + __builtin_ia32_vfcmaddcph_v8hf_mask ((__v8hf) __D, + (__v8hf) __A, + (__v8hf) __C, __B), + (__v4sf) __A, __B); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask3_fcmadd_pch (__m128h __A, __m128h __B, __m128h __C, __mmask8 __D) +{ + 
return (__m128h) __builtin_ia32_vfcmaddcph_v8hf_mask ((__v8hf) __C, + (__v8hf) __A, + (__v8hf) __B, __D); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_fcmadd_pch (__mmask8 __A, __m128h __B, __m128h __C, __m128h __D) +{ + return (__m128h)__builtin_ia32_vfcmaddcph_v8hf_maskz ((__v8hf) __D, + (__v8hf) __B, + (__v8hf) __C, __A); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_fcmadd_pch (__m256h __A, __m256h __B, __m256h __C) +{ + return (__m256h)__builtin_ia32_vfcmaddcph_v16hf((__v16hf) __C, + (__v16hf) __A, (__v16hf) __B); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_fcmadd_pch (__m256h __A, __mmask8 __B, __m256h __C, __m256h __D) +{ + return (__m256h) __builtin_ia32_movaps256_mask + ((__v8sf) + __builtin_ia32_vfcmaddcph_v16hf_mask ((__v16hf) __D, + (__v16hf) __A, + (__v16hf) __C, __B), + (__v8sf) __A, __B); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask3_fcmadd_pch (__m256h __A, __m256h __B, __m256h __C, __mmask8 __D) +{ + return (__m256h) __builtin_ia32_vfcmaddcph_v16hf_mask ((__v16hf) __C, + (__v16hf) __A, + (__v16hf) __B, __D); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_fcmadd_pch (__mmask8 __A, __m256h __B, __m256h __C, __m256h __D) +{ + return (__m256h)__builtin_ia32_vfcmaddcph_v16hf_maskz((__v16hf) __D, + (__v16hf) __B, + (__v16hf) __C, __A); +} + +/* Intrinsics vf[,c]mulcph. 
*/ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fmul_pch (__m128h __A, __m128h __B) +{ + return (__m128h)__builtin_ia32_vfmulcph_v8hf((__v8hf) __A, (__v8hf) __B); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_fmul_pch (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D) +{ + return (__m128h)__builtin_ia32_vfmulcph_v8hf_mask((__v8hf) __C, + (__v8hf) __D, + (__v8hf) __A, __B); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_fmul_pch (__mmask8 __A, __m128h __B, __m128h __C) +{ + return (__m128h)__builtin_ia32_vfmulcph_v8hf_mask((__v8hf) __B, + (__v8hf) __C, + _mm_setzero_ph (), + __A); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_fmul_pch (__m256h __A, __m256h __B) +{ + return (__m256h)__builtin_ia32_vfmulcph_v16hf((__v16hf) __A, (__v16hf) __B); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_fmul_pch (__m256h __A, __mmask8 __B, __m256h __C, __m256h __D) +{ + return (__m256h)__builtin_ia32_vfmulcph_v16hf_mask((__v16hf) __C, + (__v16hf) __D, + (__v16hf) __A, __B); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_fmul_pch (__mmask8 __A, __m256h __B, __m256h __C) +{ + return (__m256h)__builtin_ia32_vfmulcph_v16hf_mask((__v16hf) __B, + (__v16hf) __C, + _mm256_setzero_ph (), + __A); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fcmul_pch (__m128h __A, __m128h __B) +{ + return (__m128h)__builtin_ia32_vfcmulcph_v8hf((__v8hf) __A, (__v8hf) __B); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_fcmul_pch (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D) +{ + return 
(__m128h)__builtin_ia32_vfcmulcph_v8hf_mask((__v8hf) __C, (__v8hf) __D, + (__v8hf) __A, __B); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_fcmul_pch (__mmask8 __A, __m128h __B, __m128h __C) +{ + return (__m128h)__builtin_ia32_vfcmulcph_v8hf_mask((__v8hf) __B, + (__v8hf) __C, + _mm_setzero_ph (), + __A); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_fcmul_pch (__m256h __A, __m256h __B) +{ + return (__m256h)__builtin_ia32_vfcmulcph_v16hf((__v16hf) __A, (__v16hf) __B); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_fcmul_pch (__m256h __A, __mmask8 __B, __m256h __C, __m256h __D) +{ + return (__m256h)__builtin_ia32_vfcmulcph_v16hf_mask((__v16hf) __C, + (__v16hf) __D, + (__v16hf) __A, __B); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_fcmul_pch (__mmask8 __A, __m256h __B, __m256h __C) +{ + return (__m256h)__builtin_ia32_vfcmulcph_v16hf_mask((__v16hf) __B, + (__v16hf) __C, + _mm256_setzero_ph (), + __A); +} + #ifdef __DISABLE_AVX512FP16VL__ #undef __DISABLE_AVX512FP16VL__ #pragma GCC pop_options diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def index 22b924bf98d..35bcafd14e3 100644 --- a/gcc/config/i386/i386-builtin-types.def +++ b/gcc/config/i386/i386-builtin-types.def @@ -1348,6 +1348,7 @@ DEF_FUNCTION_TYPE (V8DI, V8HF, V8DI, UQI, INT) DEF_FUNCTION_TYPE (V8DF, V8HF, V8DF, UQI, INT) DEF_FUNCTION_TYPE (V8HF, V8DI, V8HF, UQI, INT) DEF_FUNCTION_TYPE (V8HF, V8DF, V8HF, UQI, INT) +DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, V8HF) DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, V8HF, UQI, INT) DEF_FUNCTION_TYPE (V8HF, V2DF, V8HF, V8HF, UQI, INT) DEF_FUNCTION_TYPE (V8HF, V4SF, V8HF, V8HF, UQI, INT) @@ -1358,12 +1359,14 @@ DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF) DEF_FUNCTION_TYPE (V16HI, V16HF, 
V16HI, UHI) DEF_FUNCTION_TYPE (V16HF, V16HI, V16HF, UHI) DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF, UHI) +DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF, V16HF) DEF_FUNCTION_TYPE (V16SI, V16HF, V16SI, UHI, INT) DEF_FUNCTION_TYPE (V16SF, V16HF, V16SF, UHI, INT) DEF_FUNCTION_TYPE (V16HF, V16HF, INT, V16HF, UHI) DEF_FUNCTION_TYPE (UHI, V16HF, V16HF, INT, UHI) DEF_FUNCTION_TYPE (V16HF, V16SI, V16HF, UHI, INT) DEF_FUNCTION_TYPE (V16HF, V16SF, V16HF, UHI, INT) +DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF, V16HF, UQI) DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF, V16HF, UHI) DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, USI) DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, INT) @@ -1371,7 +1374,9 @@ DEF_FUNCTION_TYPE (V32HI, V32HF, V32HI, USI, INT) DEF_FUNCTION_TYPE (V32HF, V32HI, V32HF, USI, INT) DEF_FUNCTION_TYPE (USI, V32HF, V32HF, INT, USI) DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, USI, INT) +DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, V32HF, INT) DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, V32HF, USI) DEF_FUNCTION_TYPE (USI, V32HF, V32HF, INT, USI, INT) +DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, V32HF, UHI, INT) DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, V32HF, USI, INT) DEF_FUNCTION_TYPE (V32HF, V32HF, INT, V32HF, USI, INT) diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def index f446a6ce5d3..448f9f75fa4 100644 --- a/gcc/config/i386/i386-builtin.def +++ b/gcc/config/i386/i386-builtin.def @@ -2911,6 +2911,26 @@ BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_ BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fnmsub_v8hf_mask, "__builtin_ia32_vfnmsubph128_mask", IX86_BUILTIN_VFNMSUBPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fnmsub_v8hf_mask3, "__builtin_ia32_vfnmsubph128_mask3", IX86_BUILTIN_VFNMSUBPH128_MASK3, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, 
CODE_FOR_avx512fp16_fnmsub_v8hf_maskz, "__builtin_ia32_vfnmsubph128_maskz", IX86_BUILTIN_VFNMSUBPH128_MASKZ, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_fma_fmaddc_v8hf, "__builtin_ia32_vfmaddcph_v8hf", IX86_BUILTIN_VFMADDCPH_V8HF, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmaddc_v8hf_mask, "__builtin_ia32_vfmaddcph_v8hf_mask", IX86_BUILTIN_VFMADDCPH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmaddc_v8hf_maskz, "__builtin_ia32_vfmaddcph_v8hf_maskz", IX86_BUILTIN_VFMADDCPH_V8HF_MASKZ, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_fma_fmaddc_v16hf, "__builtin_ia32_vfmaddcph_v16hf", IX86_BUILTIN_VFMADDCPH_V16HF, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fmaddc_v16hf_mask, "__builtin_ia32_vfmaddcph_v16hf_mask", IX86_BUILTIN_VFMADDCPH_V16HF_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fmaddc_v16hf_maskz, "__builtin_ia32_vfmaddcph_v16hf_maskz", IX86_BUILTIN_VFMADDCPH_V16HF_MASKZ, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_fma_fcmaddc_v8hf, "__builtin_ia32_vfcmaddcph_v8hf", IX86_BUILTIN_VFCMADDCPH_V8HF, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fcmaddc_v8hf_mask, "__builtin_ia32_vfcmaddcph_v8hf_mask", IX86_BUILTIN_VFCMADDCPH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fcmaddc_v8hf_maskz, 
"__builtin_ia32_vfcmaddcph_v8hf_maskz", IX86_BUILTIN_VFCMADDCPH_V8HF_MASKZ, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_fma_fcmaddc_v16hf, "__builtin_ia32_vfcmaddcph_v16hf", IX86_BUILTIN_VFCMADDCPH_V16HF, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fcmaddc_v16hf_mask, "__builtin_ia32_vfcmaddcph_v16hf_mask", IX86_BUILTIN_VFCMADDCPH_V16HF_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fcmaddc_v16hf_maskz, "__builtin_ia32_vfcmaddcph_v16hf_maskz", IX86_BUILTIN_VFCMADDCPH_V16HF_MASKZ, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fcmulc_v8hf, "__builtin_ia32_vfcmulcph_v8hf", IX86_BUILTIN_VFCMULCPH_V8HF, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fcmulc_v8hf_mask, "__builtin_ia32_vfcmulcph_v8hf_mask", IX86_BUILTIN_VFCMULCPH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fcmulc_v16hf, "__builtin_ia32_vfcmulcph_v16hf", IX86_BUILTIN_VFCMULCPH_V16HF, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fcmulc_v16hf_mask, "__builtin_ia32_vfcmulcph_v16hf_mask", IX86_BUILTIN_VFCMULCPH_V16HF_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmulc_v8hf, "__builtin_ia32_vfmulcph_v8hf", IX86_BUILTIN_VFMULCPH_V8HF, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmulc_v8hf_mask, "__builtin_ia32_vfmulcph_v8hf_mask", IX86_BUILTIN_VFMULCPH_V8HF_MASK, UNKNOWN, (int) 
V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fmulc_v16hf, "__builtin_ia32_vfmulcph_v16hf", IX86_BUILTIN_VFMULCPH_V16HF, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF) +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fmulc_v16hf_mask, "__builtin_ia32_vfmulcph_v16hf_mask", IX86_BUILTIN_VFMULCPH_V16HF_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UQI) /* Builtins with rounding support. */ BDESC_END (ARGS, ROUND_ARGS) @@ -3201,6 +3221,16 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfnmadd_v8hf_mask_round BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfnmadd_v8hf_mask3_round, "__builtin_ia32_vfnmaddsh3_mask3", IX86_BUILTIN_VFNMADDSH3_MASK3, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfnmadd_v8hf_maskz_round, "__builtin_ia32_vfnmaddsh3_maskz", IX86_BUILTIN_VFNMADDSH3_MASKZ, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfmsub_v8hf_mask3_round, "__builtin_ia32_vfmsubsh3_mask3", IX86_BUILTIN_VFMSUBSH3_MASK3, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_fma_fmaddc_v32hf_round, "__builtin_ia32_vfmaddcph_v32hf_round", IX86_BUILTIN_VFMADDCPH_V32HF_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmaddc_v32hf_mask_round, "__builtin_ia32_vfmaddcph_v32hf_mask_round", IX86_BUILTIN_VFMADDCPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmaddc_v32hf_maskz_round, "__builtin_ia32_vfmaddcph_v32hf_maskz_round", IX86_BUILTIN_VFMADDCPH_V32HF_MASKZ_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_fma_fcmaddc_v32hf_round, "__builtin_ia32_vfcmaddcph_v32hf_round", 
IX86_BUILTIN_VFCMADDCPH_V32HF_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fcmaddc_v32hf_mask_round, "__builtin_ia32_vfcmaddcph_v32hf_mask_round", IX86_BUILTIN_VFCMADDCPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fcmaddc_v32hf_maskz_round, "__builtin_ia32_vfcmaddcph_v32hf_maskz_round", IX86_BUILTIN_VFCMADDCPH_V32HF_MASKZ_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fcmulc_v32hf_round, "__builtin_ia32_vfcmulcph_v32hf_round", IX86_BUILTIN_VFCMULCPH_V32HF_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fcmulc_v32hf_mask_round, "__builtin_ia32_vfcmulcph_v32hf_mask_round", IX86_BUILTIN_VFCMULCPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmulc_v32hf_round, "__builtin_ia32_vfmulcph_v32hf_round", IX86_BUILTIN_VFMULCPH_V32HF_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmulc_v32hf_mask_round, "__builtin_ia32_vfmulcph_v32hf_mask_round", IX86_BUILTIN_VFMULCPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT) BDESC_END (ROUND_ARGS, MULTI_ARG) diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c index f6de05c769a..f6d74549dc2 100644 --- a/gcc/config/i386/i386-expand.c +++ b/gcc/config/i386/i386-expand.c @@ -9582,6 +9582,7 @@ ix86_expand_args_builtin (const struct builtin_description *d, case V2DI_FTYPE_V8HF_V2DI_UQI: case V2DI_FTYPE_V4SF_V2DI_UQI: case V8HF_FTYPE_V8HF_V8HF_UQI: + case V8HF_FTYPE_V8HF_V8HF_V8HF: case V8HF_FTYPE_V8HI_V8HF_UQI: case V8HF_FTYPE_V8SI_V8HF_UQI: case V8HF_FTYPE_V8SF_V8HF_UQI: @@ -9660,6 +9661,7 @@ ix86_expand_args_builtin (const struct builtin_description *d, case 
V16SF_FTYPE_V8SF_V16SF_UHI: case V16SI_FTYPE_V8SI_V16SI_UHI: case V16HF_FTYPE_V16HI_V16HF_UHI: + case V16HF_FTYPE_V16HF_V16HF_V16HF: case V16HI_FTYPE_V16HF_V16HI_UHI: case V16HI_FTYPE_V16HI_V16HI_UHI: case V8HI_FTYPE_V16QI_V8HI_UQI: @@ -9816,6 +9818,7 @@ ix86_expand_args_builtin (const struct builtin_description *d, case V8HI_FTYPE_V8HI_V8HI_V8HI_UQI: case V8SI_FTYPE_V8SI_V8SI_V8SI_UQI: case V4SI_FTYPE_V4SI_V4SI_V4SI_UQI: + case V16HF_FTYPE_V16HF_V16HF_V16HF_UQI: case V16HF_FTYPE_V16HF_V16HF_V16HF_UHI: case V8SF_FTYPE_V8SF_V8SF_V8SF_UQI: case V16QI_FTYPE_V16QI_V16QI_V16QI_UHI: @@ -10545,6 +10548,7 @@ ix86_expand_round_builtin (const struct builtin_description *d, case V16SF_FTYPE_V16HF_V16SF_UHI_INT: case V32HF_FTYPE_V32HI_V32HF_USI_INT: case V32HF_FTYPE_V32HF_V32HF_USI_INT: + case V32HF_FTYPE_V32HF_V32HF_V32HF_INT: case V16SF_FTYPE_V16SF_V16SF_HI_INT: case V8DI_FTYPE_V8SF_V8DI_QI_INT: case V16SF_FTYPE_V16SI_V16SF_HI_INT: @@ -10574,6 +10578,7 @@ ix86_expand_round_builtin (const struct builtin_description *d, case V4SF_FTYPE_V4SF_V4SF_V4SF_UQI_INT: case V4SF_FTYPE_V8HF_V4SF_V4SF_UQI_INT: case V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT: + case V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT: case V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT: case V2DF_FTYPE_V8HF_V2DF_V2DF_UQI_INT: case V2DF_FTYPE_V2DF_V2DF_V2DF_QI_INT: diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 31f8fc68c65..ddd93f739e3 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -194,6 +194,14 @@ (define_c_enum "unspec" [ UNSPEC_VCVTNE2PS2BF16 UNSPEC_VCVTNEPS2BF16 UNSPEC_VDPBF16PS + + ;; For AVX512FP16 support + UNSPEC_COMPLEX_FMA + UNSPEC_COMPLEX_FCMA + UNSPEC_COMPLEX_FMUL + UNSPEC_COMPLEX_FCMUL + UNSPEC_COMPLEX_MASK + ]) (define_c_enum "unspecv" [ @@ -909,6 +917,10 @@ (define_mode_attr avx512fmaskmode (V16SF "HI") (V8SF "QI") (V4SF "QI") (V8DF "QI") (V4DF "QI") (V2DF "QI")]) +;; Mapping of vector modes to corresponding complex mask size +(define_mode_attr avx512fmaskcmode + [(V32HF "HI") 
(V16HF "QI") (V8HF "QI")]) + ;; Mapping of vector modes to corresponding mask size (define_mode_attr avx512fmaskmodelower [(V64QI "di") (V32QI "si") (V16QI "hi") @@ -5499,6 +5511,92 @@ (define_insn "*fma4i_vmfnmsub_" [(set_attr "type" "ssemuladd") (set_attr "mode" "")]) +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; +;; Complex type operations +;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(define_int_iterator UNSPEC_COMPLEX_F_C_MA + [UNSPEC_COMPLEX_FMA UNSPEC_COMPLEX_FCMA]) + +(define_int_iterator UNSPEC_COMPLEX_F_C_MUL + [UNSPEC_COMPLEX_FMUL UNSPEC_COMPLEX_FCMUL]) + +(define_int_attr complexopname + [(UNSPEC_COMPLEX_FMA "fmaddc") + (UNSPEC_COMPLEX_FCMA "fcmaddc") + (UNSPEC_COMPLEX_FMUL "fmulc") + (UNSPEC_COMPLEX_FCMUL "fcmulc")]) + +(define_expand "_fmaddc__maskz" + [(match_operand:VF_AVX512FP16VL 0 "register_operand") + (match_operand:VF_AVX512FP16VL 1 "") + (match_operand:VF_AVX512FP16VL 2 "") + (match_operand:VF_AVX512FP16VL 3 "") + (match_operand: 4 "register_operand")] + "TARGET_AVX512FP16 && " +{ + emit_insn (gen_fma_fmaddc__maskz_1 ( + operands[0], operands[1], operands[2], operands[3], + CONST0_RTX (mode), operands[4])); + DONE; +}) + +(define_expand "_fcmaddc__maskz" + [(match_operand:VF_AVX512FP16VL 0 "register_operand") + (match_operand:VF_AVX512FP16VL 1 "") + (match_operand:VF_AVX512FP16VL 2 "") + (match_operand:VF_AVX512FP16VL 3 "") + (match_operand: 4 "register_operand")] + "TARGET_AVX512FP16 && " +{ + emit_insn (gen_fma_fcmaddc__maskz_1 ( + operands[0], operands[1], operands[2], operands[3], + CONST0_RTX (mode), operands[4])); + DONE; +}) + +(define_insn "fma__" + [(set (match_operand:VF_AVX512FP16VL 0 "register_operand" "=v") + (unspec:VF_AVX512FP16VL + [(match_operand:VF_AVX512FP16VL 1 "" "0") + (match_operand:VF_AVX512FP16VL 2 "" "%v") + (match_operand:VF_AVX512FP16VL 3 "" "")] + UNSPEC_COMPLEX_F_C_MA))] + "TARGET_AVX512FP16 && && " + "v\t{%3, %2, %0|%0, %2, %3}" + [(set_attr "type" 
"ssemuladd") + (set_attr "mode" "")]) + +(define_insn "___mask" + [(set (match_operand:VF_AVX512FP16VL 0 "register_operand" "=v") + (vec_merge:VF_AVX512FP16VL + (unspec:VF_AVX512FP16VL + [(match_operand:VF_AVX512FP16VL 1 "register_operand" "0") + (match_operand:VF_AVX512FP16VL 2 "nonimmediate_operand" "%v") + (match_operand:VF_AVX512FP16VL 3 "nonimmediate_operand" "")] + UNSPEC_COMPLEX_F_C_MA) + (match_dup 1) + (unspec: + [(match_operand: 4 "register_operand" "Yk")] + UNSPEC_COMPLEX_MASK)))] + "TARGET_AVX512FP16 && " + "v\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, %3}" + [(set_attr "type" "ssemuladd") + (set_attr "mode" "")]) + +(define_insn "__" + [(set (match_operand:VF_AVX512FP16VL 0 "register_operand" "=v") + (unspec:VF_AVX512FP16VL + [(match_operand:VF_AVX512FP16VL 1 "nonimmediate_operand" "%v") + (match_operand:VF_AVX512FP16VL 2 "nonimmediate_operand" "")] + UNSPEC_COMPLEX_F_C_MUL))] + "TARGET_AVX512FP16 && " + "v\t{%2, %1, %0|%0, %1, %2}" + [(set_attr "type" "ssemul") + (set_attr "mode" "")]) + ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; ;; Parallel half-precision floating point conversion operations diff --git a/gcc/config/i386/subst.md b/gcc/config/i386/subst.md index 2e9c2b38e25..3a1f554e9b9 100644 --- a/gcc/config/i386/subst.md +++ b/gcc/config/i386/subst.md @@ -28,6 +28,9 @@ (define_mode_iterator SUBST_V V16SF V8SF V4SF V8DF V4DF V2DF]) +(define_mode_iterator SUBST_CV + [V32HF V16HF V8HF]) + (define_mode_iterator SUBST_S [QI HI SI DI]) @@ -42,9 +45,11 @@ (define_mode_iterator SUBST_A QI HI SI DI SF DF]) (define_subst_attr "mask_name" "mask" "" "_mask") +(define_subst_attr "maskc_name" "maskc" "" "_mask") (define_subst_attr "mask_applied" "mask" "false" "true") (define_subst_attr "mask_operand2" "mask" "" "%{%3%}%N2") (define_subst_attr "mask_operand3" "mask" "" "%{%4%}%N3") +(define_subst_attr "maskc_operand3" "maskc" "" "%{%4%}%N3") (define_subst_attr "mask_operand3_1" "mask" "" "%%{%%4%%}%%N3") ;; for sprintf (define_subst_attr 
"mask_operand4" "mask" "" "%{%5%}%N4") (define_subst_attr "mask_operand6" "mask" "" "%{%7%}%N6") @@ -89,6 +94,18 @@ (define_subst "merge_mask" (match_dup 0) (match_operand: 2 "register_operand" "Yk")))]) +(define_subst "maskc" + [(set (match_operand:SUBST_CV 0) + (match_operand:SUBST_CV 1))] + "TARGET_AVX512F" + [(set (match_dup 0) + (vec_merge:SUBST_CV + (match_dup 1) + (match_operand:SUBST_CV 2 "nonimm_or_0_operand" "0C") + (unspec: + [(match_operand: 3 "register_operand" "Yk")] + UNSPEC_COMPLEX_MASK)))]) + (define_subst_attr "mask_scalar_merge_name" "mask_scalar_merge" "" "_mask") (define_subst_attr "mask_scalar_merge_operand3" "mask_scalar_merge" "" "%{%3%}") (define_subst_attr "mask_scalar_merge_operand4" "mask_scalar_merge" "" "%{%4%}") @@ -119,11 +136,31 @@ (define_subst "sd" (match_operand: 3 "register_operand" "Yk"))) ]) +(define_subst_attr "sdc_maskz_name" "sdc" "" "_maskz_1") +(define_subst_attr "sdc_mask_op4" "sdc" "" "%{%5%}%N4") +(define_subst_attr "sdc_mask_op5" "sdc" "" "%{%6%}%N5") +(define_subst_attr "sdc_mask_mode512bit_condition" "sdc" "1" "( == 64 || TARGET_AVX512VL)") + +(define_subst "sdc" + [(set (match_operand:SUBST_CV 0) + (match_operand:SUBST_CV 1))] + "" + [(set (match_dup 0) + (vec_merge:SUBST_CV + (match_dup 1) + (match_operand:SUBST_CV 2 "const0_operand" "C") + (unspec: + [(match_operand: 3 "register_operand" "Yk")] + UNSPEC_COMPLEX_MASK))) +]) + (define_subst_attr "round_name" "round" "" "_round") (define_subst_attr "round_mask_operand2" "mask" "%R2" "%R4") (define_subst_attr "round_mask_operand3" "mask" "%R3" "%R5") +(define_subst_attr "round_maskc_operand3" "maskc" "%R3" "%R5") (define_subst_attr "round_mask_operand4" "mask" "%R4" "%R6") (define_subst_attr "round_sd_mask_operand4" "sd" "%R4" "%R6") +(define_subst_attr "round_sdc_mask_operand4" "sdc" "%R4" "%R6") (define_subst_attr "round_op2" "round" "" "%R2") (define_subst_attr "round_op3" "round" "" "%R3") (define_subst_attr "round_op4" "round" "" "%R4") @@ -131,8 +168,10 @@ 
(define_subst_attr "round_op5" "round" "" "%R5") (define_subst_attr "round_op6" "round" "" "%R6") (define_subst_attr "round_mask_op2" "round" "" "") (define_subst_attr "round_mask_op3" "round" "" "") +(define_subst_attr "round_maskc_op3" "round" "" "") (define_subst_attr "round_mask_op4" "round" "" "") (define_subst_attr "round_sd_mask_op4" "round" "" "") +(define_subst_attr "round_sdc_mask_op4" "round" "" "") (define_subst_attr "round_constraint" "round" "vm" "v") (define_subst_attr "round_qq2phsuff" "round" "" "") (define_subst_attr "bcst_round_constraint" "round" "vmBr" "v") @@ -169,6 +208,7 @@ (define_subst_attr "round_saeonly_mask_operand3" "mask" "%r3" "%r5") (define_subst_attr "round_saeonly_mask_operand4" "mask" "%r4" "%r6") (define_subst_attr "round_saeonly_mask_scalar_merge_operand4" "mask_scalar_merge" "%r4" "%r5") (define_subst_attr "round_saeonly_sd_mask_operand5" "sd" "%r5" "%r7") +(define_subst_attr "round_saeonly_sdc_mask_operand5" "sdc" "%r5" "%r7") (define_subst_attr "round_saeonly_op2" "round_saeonly" "" "%r2") (define_subst_attr "round_saeonly_op3" "round_saeonly" "" "%r3") (define_subst_attr "round_saeonly_op4" "round_saeonly" "" "%r4") diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c index 6c2d1dc3df4..56e90d9f9a5 100644 --- a/gcc/testsuite/gcc.target/i386/avx-1.c +++ b/gcc/testsuite/gcc.target/i386/avx-1.c @@ -787,6 +787,16 @@ #define __builtin_ia32_vfnmsubsh3_mask(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_mask(A, B, C, D, 8) #define __builtin_ia32_vfnmsubsh3_mask3(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_mask3(A, B, C, D, 8) #define __builtin_ia32_vfnmsubsh3_maskz(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_maskz(A, B, C, D, 8) +#define __builtin_ia32_vfcmaddcph_v32hf_round(A, B, C, D) __builtin_ia32_vfcmaddcph_v32hf_round(A, B, C, 8) +#define __builtin_ia32_vfcmaddcph_v32hf_mask_round(A, C, D, B, E) __builtin_ia32_vfcmaddcph_v32hf_mask_round(A, C, D, B, 8) +#define 
__builtin_ia32_vfcmaddcph_v32hf_maskz_round(B, C, D, A, E) __builtin_ia32_vfcmaddcph_v32hf_maskz_round(B, C, D, A, 8) +#define __builtin_ia32_vfmaddcph_v32hf_round(A, B, C, D) __builtin_ia32_vfmaddcph_v32hf_round(A, B, C, 8) +#define __builtin_ia32_vfmaddcph_v32hf_mask_round(A, C, D, B, E) __builtin_ia32_vfmaddcph_v32hf_mask_round(A, C, D, B, 8) +#define __builtin_ia32_vfmaddcph_v32hf_maskz_round(B, C, D, A, E) __builtin_ia32_vfmaddcph_v32hf_maskz_round(B, C, D, A, 8) +#define __builtin_ia32_vfmulcph_v32hf_round(A, B, C) __builtin_ia32_vfmulcph_v32hf_round(A, B, 8) +#define __builtin_ia32_vfmulcph_v32hf_mask_round(A, C, D, B, E) __builtin_ia32_vfmulcph_v32hf_mask_round(A, C, D, B, 8) +#define __builtin_ia32_vfcmulcph_v32hf_round(A, B, C) __builtin_ia32_vfcmulcph_v32hf_round(A, B, 8) +#define __builtin_ia32_vfcmulcph_v32hf_mask_round(A, C, D, B, E) __builtin_ia32_vfcmulcph_v32hf_mask_round(A, C, D, B, 8) /* avx512fp16vlintrin.h */ #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D) diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c index f16be008909..ef9f8aad853 100644 --- a/gcc/testsuite/gcc.target/i386/sse-13.c +++ b/gcc/testsuite/gcc.target/i386/sse-13.c @@ -804,6 +804,16 @@ #define __builtin_ia32_vfnmsubsh3_mask(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_mask(A, B, C, D, 8) #define __builtin_ia32_vfnmsubsh3_mask3(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_mask3(A, B, C, D, 8) #define __builtin_ia32_vfnmsubsh3_maskz(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_maskz(A, B, C, D, 8) +#define __builtin_ia32_vfcmaddcph_v32hf_round(A, B, C, D) __builtin_ia32_vfcmaddcph_v32hf_round(A, B, C, 8) +#define __builtin_ia32_vfcmaddcph_v32hf_mask_round(A, C, D, B, E) __builtin_ia32_vfcmaddcph_v32hf_mask_round(A, C, D, B, 8) +#define __builtin_ia32_vfcmaddcph_v32hf_maskz_round(B, C, D, A, E) __builtin_ia32_vfcmaddcph_v32hf_maskz_round(B, C, D, A, 8) +#define __builtin_ia32_vfmaddcph_v32hf_round(A, B, 
C, D) __builtin_ia32_vfmaddcph_v32hf_round(A, B, C, 8) +#define __builtin_ia32_vfmaddcph_v32hf_mask_round(A, C, D, B, E) __builtin_ia32_vfmaddcph_v32hf_mask_round(A, C, D, B, 8) +#define __builtin_ia32_vfmaddcph_v32hf_maskz_round(B, C, D, A, E) __builtin_ia32_vfmaddcph_v32hf_maskz_round(B, C, D, A, 8) +#define __builtin_ia32_vfmulcph_v32hf_round(A, B, C) __builtin_ia32_vfmulcph_v32hf_round(A, B, 8) +#define __builtin_ia32_vfmulcph_v32hf_mask_round(A, C, D, B, E) __builtin_ia32_vfmulcph_v32hf_mask_round(A, C, D, B, 8) +#define __builtin_ia32_vfcmulcph_v32hf_round(A, B, C) __builtin_ia32_vfcmulcph_v32hf_round(A, B, 8) +#define __builtin_ia32_vfcmulcph_v32hf_mask_round(A, C, D, B, E) __builtin_ia32_vfcmulcph_v32hf_mask_round(A, C, D, B, 8) /* avx512fp16vlintrin.h */ #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D) diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c index 01ac4e04173..f27c73fd4cc 100644 --- a/gcc/testsuite/gcc.target/i386/sse-14.c +++ b/gcc/testsuite/gcc.target/i386/sse-14.c @@ -772,6 +772,8 @@ test_2 (_mm_cvt_roundss_sh, __m128h, __m128h, __m128, 8) test_2 (_mm_cvt_roundsd_sh, __m128h, __m128h, __m128d, 8) test_2 (_mm_cvt_roundi32_sh, __m128h, __m128h, int, 8) test_2 (_mm_cvt_roundu32_sh, __m128h, __m128h, unsigned, 8) +test_2 (_mm512_fmul_round_pch, __m512h, __m512h, __m512h, 8) +test_2 (_mm512_fcmul_round_pch, __m512h, __m512h, __m512h, 8) test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8) test_2x (_mm_cmp_round_sh_mask, __mmask8, __m128h, __m128h, 1, 8) test_2x (_mm_comi_round_sh, int, __m128h, __m128h, 1, 8) @@ -846,6 +848,10 @@ test_3 (_mm_fmadd_round_sh, __m128h, __m128h, __m128h, __m128h, 9) test_3 (_mm_fnmadd_round_sh, __m128h, __m128h, __m128h, __m128h, 9) test_3 (_mm_fmsub_round_sh, __m128h, __m128h, __m128h, __m128h, 9) test_3 (_mm_fnmsub_round_sh, __m128h, __m128h, __m128h, __m128h, 9) +test_3 (_mm512_fmadd_round_pch, __m512h, 
__m512h, __m512h, __m512h, 8) +test_3 (_mm512_fcmadd_round_pch, __m512h, __m512h, __m512h, __m512h, 8) +test_3 (_mm512_maskz_fmul_round_pch, __m512h, __mmask16, __m512h, __m512h, 8) +test_3 (_mm512_maskz_fcmul_round_pch, __m512h, __mmask16, __m512h, __m512h, 8) test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8) test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8) test_3x (_mm512_mask_reduce_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8) @@ -908,6 +914,14 @@ test_4 (_mm_maskz_fmsub_round_sh, __m128h, __mmask8, __m128h, __m128h, __m128h, test_4 (_mm_mask_fnmsub_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 9) test_4 (_mm_mask3_fnmsub_round_sh, __m128h, __m128h, __m128h, __m128h, __mmask8, 9) test_4 (_mm_maskz_fnmsub_round_sh, __m128h, __mmask8, __m128h, __m128h, __m128h, 9) +test_4 (_mm512_mask_fmadd_round_pch, __m512h, __m512h, __mmask16, __m512h, __m512h, 8) +test_4 (_mm512_mask_fcmadd_round_pch, __m512h, __m512h, __mmask16, __m512h, __m512h, 8) +test_4 (_mm512_mask3_fmadd_round_pch, __m512h, __m512h, __m512h, __m512h, __mmask16, 8) +test_4 (_mm512_mask3_fcmadd_round_pch, __m512h, __m512h, __m512h, __m512h, __mmask16, 8) +test_4 (_mm512_maskz_fmadd_round_pch, __m512h, __mmask16, __m512h, __m512h, __m512h, 8) +test_4 (_mm512_maskz_fcmadd_round_pch, __m512h, __mmask16, __m512h, __m512h, __m512h, 8) +test_4 (_mm512_mask_fmul_round_pch, __m512h, __m512h, __mmask16, __m512h, __m512h, 8) +test_4 (_mm512_mask_fcmul_round_pch, __m512h, __m512h, __mmask16, __m512h, __m512h, 8) test_4x (_mm_mask_reduce_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8) test_4x (_mm_mask_roundscale_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8) test_4x (_mm_mask_getmant_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 1, 1) diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c index 79e3f35ab86..ccf8c3a6c03 100644 --- 
a/gcc/testsuite/gcc.target/i386/sse-22.c +++ b/gcc/testsuite/gcc.target/i386/sse-22.c @@ -876,6 +876,8 @@ test_2 (_mm_cvt_roundsh_ss, __m128, __m128, __m128h, 8) test_2 (_mm_cvt_roundsh_sd, __m128d, __m128d, __m128h, 8) test_2 (_mm_cvt_roundss_sh, __m128h, __m128h, __m128, 8) test_2 (_mm_cvt_roundsd_sh, __m128h, __m128h, __m128d, 8) +test_2 (_mm512_fmul_round_pch, __m512h, __m512h, __m512h, 8) +test_2 (_mm512_fcmul_round_pch, __m512h, __m512h, __m512h, 8) test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8) test_2x (_mm_cmp_round_sh_mask, __mmask8, __m128h, __m128h, 1, 8) test_2x (_mm_comi_round_sh, int, __m128h, __m128h, 1, 8) @@ -949,6 +951,10 @@ test_3 (_mm_fmadd_round_sh, __m128h, __m128h, __m128h, __m128h, 9) test_3 (_mm_fnmadd_round_sh, __m128h, __m128h, __m128h, __m128h, 9) test_3 (_mm_fmsub_round_sh, __m128h, __m128h, __m128h, __m128h, 9) test_3 (_mm_fnmsub_round_sh, __m128h, __m128h, __m128h, __m128h, 9) +test_3 (_mm512_fmadd_round_pch, __m512h, __m512h, __m512h, __m512h, 8) +test_3 (_mm512_fcmadd_round_pch, __m512h, __m512h, __m512h, __m512h, 8) +test_3 (_mm512_maskz_fmul_round_pch, __m512h, __mmask16, __m512h, __m512h, 8) +test_3 (_mm512_maskz_fcmul_round_pch, __m512h, __mmask16, __m512h, __m512h, 8) test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8) test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8) test_3x (_mm512_mask_reduce_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8) @@ -1010,6 +1016,14 @@ test_4 (_mm_maskz_fmsub_round_sh, __m128h, __mmask8, __m128h, __m128h, __m128h, test_4 (_mm_mask_fnmsub_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 9) test_4 (_mm_mask3_fnmsub_round_sh, __m128h, __m128h, __m128h, __m128h, __mmask8, 9) test_4 (_mm_maskz_fnmsub_round_sh, __m128h, __mmask8, __m128h, __m128h, __m128h, 9) +test_4 (_mm512_mask_fmadd_round_pch, __m512h, __m512h, __mmask16, __m512h, __m512h, 8) +test_4 (_mm512_mask_fcmadd_round_pch, __m512h, __m512h, 
__mmask16, __m512h, __m512h, 8) +test_4 (_mm512_mask3_fmadd_round_pch, __m512h, __m512h, __m512h, __m512h, __mmask16, 8) +test_4 (_mm512_mask3_fcmadd_round_pch, __m512h, __m512h, __m512h, __m512h, __mmask16, 8) +test_4 (_mm512_maskz_fmadd_round_pch, __m512h, __mmask16, __m512h, __m512h, __m512h, 8) +test_4 (_mm512_maskz_fcmadd_round_pch, __m512h, __mmask16, __m512h, __m512h, __m512h, 8) +test_4 (_mm512_mask_fmul_round_pch, __m512h, __m512h, __mmask16, __m512h, __m512h, 8) +test_4 (_mm512_mask_fcmul_round_pch, __m512h, __m512h, __mmask16, __m512h, __m512h, 8) test_4x (_mm_mask_reduce_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8) test_4x (_mm_mask_roundscale_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8) test_4x (_mm_mask_getmant_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 1, 1) diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c index caf14408b91..dc39d7e2012 100644 --- a/gcc/testsuite/gcc.target/i386/sse-23.c +++ b/gcc/testsuite/gcc.target/i386/sse-23.c @@ -805,6 +805,16 @@ #define __builtin_ia32_vfnmsubsh3_mask(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_mask(A, B, C, D, 8) #define __builtin_ia32_vfnmsubsh3_mask3(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_mask3(A, B, C, D, 8) #define __builtin_ia32_vfnmsubsh3_maskz(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_maskz(A, B, C, D, 8) +#define __builtin_ia32_vfcmaddcph_v32hf_round(A, B, C, D) __builtin_ia32_vfcmaddcph_v32hf_round(A, B, C, 8) +#define __builtin_ia32_vfcmaddcph_v32hf_mask_round(A, C, D, B, E) __builtin_ia32_vfcmaddcph_v32hf_mask_round(A, C, D, B, 8) +#define __builtin_ia32_vfcmaddcph_v32hf_maskz_round(B, C, D, A, E) __builtin_ia32_vfcmaddcph_v32hf_maskz_round(B, C, D, A, 8) +#define __builtin_ia32_vfmaddcph_v32hf_round(A, B, C, D) __builtin_ia32_vfmaddcph_v32hf_round(A, B, C, 8) +#define __builtin_ia32_vfmaddcph_v32hf_mask_round(A, C, D, B, E) __builtin_ia32_vfmaddcph_v32hf_mask_round(A, C, D, B, 8) +#define 
__builtin_ia32_vfmaddcph_v32hf_maskz_round(B, C, D, A, E) __builtin_ia32_vfmaddcph_v32hf_maskz_round(B, C, D, A, 8) +#define __builtin_ia32_vfmulcph_v32hf_round(A, B, C) __builtin_ia32_vfmulcph_v32hf_round(A, B, 8) +#define __builtin_ia32_vfmulcph_v32hf_mask_round(A, C, D, B, E) __builtin_ia32_vfmulcph_v32hf_mask_round(A, C, D, B, 8) +#define __builtin_ia32_vfcmulcph_v32hf_round(A, B, C) __builtin_ia32_vfcmulcph_v32hf_round(A, B, 8) +#define __builtin_ia32_vfcmulcph_v32hf_mask_round(A, C, D, B, E) __builtin_ia32_vfcmulcph_v32hf_mask_round(A, C, D, B, 8) /* avx512fp16vlintrin.h */ #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)

From patchwork Thu Jul 1 06:16:36 2021 X-Patchwork-Submitter: liuhongt X-Patchwork-Id: 1499401 To: gcc-patches@gcc.gnu.org Subject: [PATCH 50/62] AVX512FP16: Add testcases for vfcmaddcph/vfmaddcph/vfcmulcph/vfmulcph. Date: Thu, 1 Jul 2021 14:16:36 +0800 Message-Id: <20210701061648.9447-51-hongtao.liu@intel.com> In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com> References: <20210701061648.9447-1-hongtao.liu@intel.com> From: liuhongt Cc: jakub@redhat.com

gcc/testsuite/ChangeLog: * gcc.target/i386/avx512fp16-helper.h (init_src): Adjust init value. (NET_CMASK): New net mask for complex input. * gcc.target/i386/avx512fp16-vfcmaddcph-1a.c: New test. * gcc.target/i386/avx512fp16-vfcmaddcph-1b.c: Ditto. * gcc.target/i386/avx512fp16-vfcmulcph-1a.c: Ditto. * gcc.target/i386/avx512fp16-vfcmulcph-1b.c: Ditto. * gcc.target/i386/avx512fp16-vfmaddcph-1a.c: Ditto. * gcc.target/i386/avx512fp16-vfmaddcph-1b.c: Ditto. * gcc.target/i386/avx512fp16-vfmulcph-1a.c: Ditto. * gcc.target/i386/avx512fp16-vfmulcph-1b.c: Ditto. * gcc.target/i386/avx512fp16vl-vfcmaddcph-1a.c: Ditto. * gcc.target/i386/avx512fp16vl-vfcmaddcph-1b.c: Ditto. * gcc.target/i386/avx512fp16vl-vfcmulcph-1a.c: Ditto. * gcc.target/i386/avx512fp16vl-vfcmulcph-1b.c: Ditto. * gcc.target/i386/avx512fp16vl-vfmaddcph-1a.c: Ditto. * gcc.target/i386/avx512fp16vl-vfmaddcph-1b.c: Ditto. * gcc.target/i386/avx512fp16vl-vfmulcph-1a.c: Ditto. * gcc.target/i386/avx512fp16vl-vfmulcph-1b.c: Ditto.
--- .../gcc.target/i386/avx512fp16-helper.h | 9 +- .../i386/avx512fp16-vfcmaddcph-1a.c | 27 ++++ .../i386/avx512fp16-vfcmaddcph-1b.c | 133 ++++++++++++++++++ .../gcc.target/i386/avx512fp16-vfcmulcph-1a.c | 25 ++++ .../gcc.target/i386/avx512fp16-vfcmulcph-1b.c | 111 +++++++++++++++ .../gcc.target/i386/avx512fp16-vfmaddcph-1a.c | 27 ++++ .../gcc.target/i386/avx512fp16-vfmaddcph-1b.c | 131 +++++++++++++++++ .../gcc.target/i386/avx512fp16-vfmulcph-1a.c | 25 ++++ .../gcc.target/i386/avx512fp16-vfmulcph-1b.c | 115 +++++++++++++++ .../i386/avx512fp16vl-vfcmaddcph-1a.c | 30 ++++ .../i386/avx512fp16vl-vfcmaddcph-1b.c | 15 ++ .../i386/avx512fp16vl-vfcmulcph-1a.c | 28 ++++ .../i386/avx512fp16vl-vfcmulcph-1b.c | 15 ++ .../i386/avx512fp16vl-vfmaddcph-1a.c | 30 ++++ .../i386/avx512fp16vl-vfmaddcph-1b.c | 15 ++ .../i386/avx512fp16vl-vfmulcph-1a.c | 28 ++++ .../i386/avx512fp16vl-vfmulcph-1b.c | 15 ++ 17 files changed, 777 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcph-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcph-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfcmulcph-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfcmulcph-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddcph-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddcph-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmulcph-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmulcph-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfcmaddcph-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfcmaddcph-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfcmulcph-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfcmulcph-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddcph-1a.c create mode 100644 
gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddcph-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmulcph-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmulcph-1b.c diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h b/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h index ce3cfdc3f6b..69948f8ee4f 100644 --- a/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h @@ -172,9 +172,9 @@ init_src() for (i = 0; i < AVX512F_MAX_ELEM; i++) { v1.f32[i] = i + 1; - v2.f32[i] = i * 0.5f; + v2.f32[i] = (i + 2) * 0.5f; v3.f32[i] = i * 1.5f; - v4.f32[i] = i - 0.5f; + v4.f32[i] = i - 1.5f; src3.u32[i] = (i + 1) * 10; } @@ -234,10 +234,12 @@ init_dest(V512 * res, V512 * exp) #undef DF #undef H_HF #undef NET_MASK +#undef NET_CMASK #undef MASK_VALUE #undef HALF_MASK #undef ZMASK_VALUE #define NET_MASK 0xffff +#define NET_CMASK 0xff #define MASK_VALUE 0xcccc #define ZMASK_VALUE 0xfcc1 #define HALF_MASK 0xcc @@ -253,10 +255,12 @@ init_dest(V512 * res, V512 * exp) #undef SI #undef H_HF #undef NET_MASK +#undef NET_CMASK #undef MASK_VALUE #undef ZMASK_VALUE #undef HALF_MASK #define NET_MASK 0xff +#define NET_CMASK 0xff #define MASK_VALUE 0xcc #define HALF_MASK MASK_VALUE #define ZMASK_VALUE 0xc1 @@ -267,6 +271,7 @@ init_dest(V512 * res, V512 * exp) #define H_HF(x) x.xmmh[0] #else #define NET_MASK 0xffffffff +#define NET_CMASK 0xffff #define MASK_VALUE 0xcccccccc #define ZMASK_VALUE 0xfcc1fcc1 #define HALF_MASK 0xcccc diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcph-1a.c new file mode 100644 index 00000000000..6c2c34c1731 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcph-1a.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vfcmaddcph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ 
\\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfcmaddcph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vfcmaddcph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfcmaddcph\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfcmaddcph\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vfcmaddcph\[ \\t\]+\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m512h res, res1, res2; +volatile __m512h x1, x2, x3; +volatile __mmask16 m16; + +void extern +avx512f_test (void) +{ + res = _mm512_fcmadd_pch (x1, x2, x3); + res1 = _mm512_mask_fcmadd_pch (res1, m16, x1, x2); + res1 = _mm512_mask3_fcmadd_pch (res1, x1, x2, m16); + res2 = _mm512_maskz_fcmadd_pch (m16, x1, x2, x3); + res = _mm512_fcmadd_round_pch (x1, x2, x3, 8); + res1 = _mm512_mask_fcmadd_round_pch (res1, m16, x1, x2, 8); + res1 = _mm512_mask3_fcmadd_round_pch (res1, x1, x2, m16, 8); + res2 = _mm512_maskz_fcmadd_round_pch (m16, x1, x2, x3, 11); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcph-1b.c new file mode 100644 index 00000000000..835699b834d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcph-1b.c @@ -0,0 +1,133 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS (AVX512F_LEN / 16) + +void NOINLINE +EMULATE(c_fmadd_pch) (V512 * dest, 
V512 op1, V512 op2, + __mmask16 k, int zero_mask, int c_flag, + int is_mask3) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + int invert = 1; + if (c_flag == 1) + invert = -1; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(op2, &v3, &v4); + unpack_ph_2twops(*dest, &v7, &v8); + + for (i = 0; i < 16; i++) { + if (((1 << (i / 2)) & k) == 0) { + if (zero_mask) { + v5.f32[i] = 0; + } + else { + v5.u32[i] = is_mask3 ? v3.u32[i] : v7.u32[i]; + } + } + else { + if ((i % 2) == 0) { + v5.f32[i] = v1.f32[i] * v7.f32[i] + - invert * (v1.f32[i+1] * v7.f32[i+1]) + v3.f32[i]; + } + else { + v5.f32[i] = v1.f32[i-1] * v7.f32[i] + + invert * (v1.f32[i] * v7.f32[i-1]) + v3.f32[i]; + + } + } + if (((1 << (i / 2 + 8)) & k) == 0) { + if (zero_mask) { + v6.f32[i] = 0; + } + else { + v6.u32[i] = is_mask3 ? v4.u32[i] : v8.u32[i]; + } + } + else { + if ((i % 2) == 0) { + v6.f32[i] = v2.f32[i] * v8.f32[i] + - invert * (v2.f32[i+1] * v8.f32[i+1]) + v4.f32[i]; + } + else { + v6.f32[i] = v2.f32[i-1] * v8.f32[i] + + invert * (v2.f32[i] * v8.f32[i-1]) + v4.f32[i]; + } + + } + } + + *dest = pack_twops_2ph(v5, v6); +} + +void +TEST (void) +{ + V512 res; + V512 exp; + + init_src(); + + init_dest(&res, &exp); + EMULATE(c_fmadd_pch)(&exp, src1, src2, NET_CMASK, 0, 1, 0); + HF(res) = INTRINSIC (_fcmadd_pch) (HF(res), HF(src1), + HF(src2)); + CHECK_RESULT (&res, &exp, N_ELEMS, _fcmadd_pch); + + init_dest(&res, &exp); + EMULATE(c_fmadd_pch)(&exp, src1, src2, HALF_MASK, 0, 1, 0); + HF(res) = INTRINSIC (_mask_fcmadd_pch) (HF(res) ,HALF_MASK, HF(src1), + HF(src2)); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_fcmadd_pch); + + init_dest(&res, &exp); + EMULATE(c_fmadd_pch)(&exp, src1, src2, HALF_MASK, 0, 1, 1); + HF(res) = INTRINSIC (_mask3_fcmadd_pch) (HF(res), HF(src1), + HF(src2), HALF_MASK); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask3_fcmadd_pch); + + + init_dest(&res, &exp); + EMULATE(c_fmadd_pch)(&exp, src1, src2, HALF_MASK, 1, 1, 0); + HF(res) = INTRINSIC (_maskz_fcmadd_pch) (HALF_MASK, 
HF(res), + HF(src1), HF(src2)); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_fcmadd_pch); + +#if AVX512F_LEN == 512 + init_dest(&res, &exp); + EMULATE(c_fmadd_pch)(&exp, src1, src2, NET_CMASK, 0, 1, 0); + HF(res) = INTRINSIC (_fcmadd_round_pch) (HF(res), HF(src1), + HF(src2), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _fcmadd_pch); + + init_dest(&res, &exp); + EMULATE(c_fmadd_pch)(&exp, src1, src2, HALF_MASK, 0, 1, 0); + HF(res) = INTRINSIC (_mask_fcmadd_round_pch) (HF(res) ,HALF_MASK, HF(src1), + HF(src2), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_fcmadd_pch); + + init_dest(&res, &exp); + EMULATE(c_fmadd_pch)(&exp, src1, src2, HALF_MASK, 0, 1, 1); + HF(res) = INTRINSIC (_mask3_fcmadd_round_pch) (HF(res), HF(src1), + HF(src2), HALF_MASK, _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask3_fcmadd_pch); + + + init_dest(&res, &exp); + EMULATE(c_fmadd_pch)(&exp, src1, src2, HALF_MASK, 1, 1, 0); + HF(res) = INTRINSIC (_maskz_fcmadd_round_pch) (HALF_MASK, HF(res), + HF(src1), HF(src2), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_fcmadd_pch); +#endif + + if (n_errs != 0) { + abort (); + } +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmulcph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmulcph-1a.c new file mode 100644 index 00000000000..ca2f14072ba --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmulcph-1a.c @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vfcmulcph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfcmulcph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfcmulcph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { 
scan-assembler-times "vfcmulcph\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfcmulcph\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfcmulcph\[ \\t\]+\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m512h res, res1, res2; +volatile __m512h x1, x2, x3; +volatile __mmask16 m16; + +void extern +avx512f_test (void) +{ + res = _mm512_fcmul_pch (x1, x2); + res1 = _mm512_mask_fcmul_pch (res1, m16, x1, x2); + res2 = _mm512_maskz_fcmul_pch (m16, x1, x2); + res = _mm512_fcmul_round_pch (x1, x2, 8); + res1 = _mm512_mask_fcmul_round_pch (res1, m16, x1, x2, 8); + res2 = _mm512_maskz_fcmul_round_pch (m16, x1, x2, 11); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmulcph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmulcph-1b.c new file mode 100644 index 00000000000..ee41f6c58d4 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmulcph-1b.c @@ -0,0 +1,111 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS (AVX512F_LEN / 16) + +void NOINLINE +EMULATE(c_fmul_pch) (V512 * dest, V512 op1, V512 op2, + __mmask16 k, int zero_mask, int c_flag) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + int invert = 1; + if (c_flag == 1) + invert = -1; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(op2, &v3, &v4); + unpack_ph_2twops(*dest, &v7, &v8); + + for (i = 0; i < 16; i++) { + if (((1 << (i / 2)) & k) == 0) { + if (zero_mask) { + v5.f32[i] = 0; + } + else { + v5.u32[i] = v7.u32[i]; + } + } + else { + if ((i % 2) == 0) { + v5.f32[i] = v1.f32[i] * v3.f32[i] + - invert * (v1.f32[i+1] * v3.f32[i+1]); + } + 
else { + v5.f32[i] = v1.f32[i] * v3.f32[i-1] + + invert * (v1.f32[i-1] * v3.f32[i]); + + } + } + if (((1 << (i / 2 + 8)) & k) == 0) { + if (zero_mask) { + v6.f32[i] = 0; + } + else { + v6.u32[i] = v8.u32[i]; + } + } + else { + if ((i % 2) == 0) { + v6.f32[i] = v2.f32[i] * v4.f32[i] + - invert * (v2.f32[i+1] * v4.f32[i+1]); + } + else { + v6.f32[i] = v2.f32[i] * v4.f32[i-1] + + invert * (v2.f32[i-1] * v4.f32[i]); + } + + } + } + + *dest = pack_twops_2ph(v5, v6); +} + +void +TEST (void) +{ + V512 res; + V512 exp; + + init_src(); + + EMULATE(c_fmul_pch)(&exp, src1, src2, NET_CMASK, 0, 1); + HF(res) = INTRINSIC (_fcmul_pch) (HF(src1), HF(src2)); + CHECK_RESULT (&res, &exp, N_ELEMS, _fcmul_pch); + + init_dest(&res, &exp); + EMULATE(c_fmul_pch)(&exp, src1, src2, HALF_MASK, 0, 1); + HF(res) = INTRINSIC (_mask_fcmul_pch) (HF(res) ,HALF_MASK, HF(src1), + HF(src2)); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_fcmul_pch); + + EMULATE(c_fmul_pch)(&exp, src1, src2, HALF_MASK, 1, 1); + HF(res) = INTRINSIC (_maskz_fcmul_pch) ( HALF_MASK, HF(src1), + HF(src2)); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_fcmul_pch); + +#if AVX512F_LEN == 512 + EMULATE(c_fmul_pch)(&exp, src1, src2, NET_CMASK, 0, 1); + HF(res) = INTRINSIC (_fcmul_round_pch) (HF(src1), HF(src2), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _fcmul_round_pch); + + init_dest(&res, &exp); + EMULATE(c_fmul_pch)(&exp, src1, src2, HALF_MASK, 0, 1); + HF(res) = INTRINSIC (_mask_fcmul_round_pch) (HF(res) ,HALF_MASK, HF(src1), + HF(src2), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_fcmul_round_pch); + + EMULATE(c_fmul_pch)(&exp, src1, src2, HALF_MASK, 1, 1); + HF(res) = INTRINSIC (_maskz_fcmul_round_pch) ( HALF_MASK, HF(src1), + HF(src2), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_fcmul_round_pch); +#endif + + if (n_errs != 0) { + abort (); + } +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddcph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddcph-1a.c new file mode 100644 
index 00000000000..4dae5f02dc6 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddcph-1a.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vfmaddcph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmaddcph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vfmaddcph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmaddcph\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmaddcph\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vfmaddcph\[ \\t\]+\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m512h res, res1, res2; +volatile __m512h x1, x2, x3; +volatile __mmask16 m16; + +void extern +avx512f_test (void) +{ + res = _mm512_fmadd_pch (x1, x2, x3); + res1 = _mm512_mask_fmadd_pch (res1, m16, x1, x2); + res1 = _mm512_mask3_fmadd_pch (res1, x1, x2, m16); + res2 = _mm512_maskz_fmadd_pch (m16, x1, x2, x3); + res = _mm512_fmadd_round_pch (x1, x2, x3, 8); + res1 = _mm512_mask_fmadd_round_pch (res1, m16, x1, x2, 8); + res1 = _mm512_mask3_fmadd_round_pch (res1, x1, x2, m16, 8); + res2 = _mm512_maskz_fmadd_round_pch (m16, x1, x2, x3, 11); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddcph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddcph-1b.c new file mode 100644 index 00000000000..1da6f01e139 --- /dev/null +++ 
b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddcph-1b.c @@ -0,0 +1,131 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS (AVX512F_LEN / 16) + +void NOINLINE +EMULATE(c_fmadd_pch) (V512 * dest, V512 op1, V512 op2, + __mmask16 k, int zero_mask, int c_flag, + int is_mask3) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + int invert = 1; + if (c_flag == 1) + invert = -1; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(op2, &v3, &v4); + unpack_ph_2twops(*dest, &v7, &v8); + + for (i = 0; i < 16; i++) { + if (((1 << (i / 2)) & k) == 0) { + if (zero_mask) { + v5.f32[i] = 0; + } + else { + v5.u32[i] = is_mask3 ? v3.u32[i] : v7.u32[i]; + } + } + else { + if ((i % 2) == 0) { + v5.f32[i] = v1.f32[i] * v7.f32[i] + - invert * (v1.f32[i+1] * v7.f32[i+1]) + v3.f32[i]; + } + else { + v5.f32[i] = v1.f32[i-1] * v7.f32[i] + + invert * (v1.f32[i] * v7.f32[i-1]) + v3.f32[i]; + + } + } + if (((1 << (i / 2 + 8)) & k) == 0) { + if (zero_mask) { + v6.f32[i] = 0; + } + else { + v6.u32[i] = is_mask3 ? 
v4.u32[i] : v8.u32[i]; + } + } + else { + if ((i % 2) == 0) { + v6.f32[i] = v2.f32[i] * v8.f32[i] + - invert * (v2.f32[i+1] * v8.f32[i+1]) + v4.f32[i]; + } + else { + v6.f32[i] = v2.f32[i-1] * v8.f32[i] + + invert * (v2.f32[i] * v8.f32[i-1]) + v4.f32[i]; + } + + } + } + + *dest = pack_twops_2ph(v5, v6); +} + +void +TEST (void) +{ + V512 res; + V512 exp; + + init_src(); + + init_dest(&res, &exp); + EMULATE(c_fmadd_pch)(&exp, src1, src2, NET_CMASK, 0, 0, 0); + HF(res) = INTRINSIC (_fmadd_pch) (HF(res), HF(src1), + HF(src2)); + CHECK_RESULT (&res, &exp, N_ELEMS, _fmadd_pch); + + init_dest(&res, &exp); + EMULATE(c_fmadd_pch)(&exp, src1, src2, HALF_MASK, 0, 0, 0); + HF(res) = INTRINSIC (_mask_fmadd_pch) (HF(res), HALF_MASK, HF(src1), + HF(src2)); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_fmadd_pch); + + init_dest(&res, &exp); + EMULATE(c_fmadd_pch)(&exp, src1, src2, HALF_MASK, 0, 0, 1); + HF(res) = INTRINSIC (_mask3_fmadd_pch) (HF(res), HF(src1), HF(src2), + HALF_MASK); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask3_fmadd_pch); + + init_dest(&res, &exp); + EMULATE(c_fmadd_pch)(&exp, src1, src2, HALF_MASK, 1, 0, 0); + HF(res) = INTRINSIC (_maskz_fmadd_pch) (HALF_MASK, HF(res), HF(src1), + HF(src2)); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_fmadd_pch); + +#if AVX512F_LEN == 512 + init_dest(&res, &exp); + EMULATE(c_fmadd_pch)(&exp, src1, src2, NET_CMASK, 0, 0, 0); + HF(res) = INTRINSIC (_fmadd_round_pch) (HF(res), HF(src1), + HF(src2), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _fmadd_pch); + + init_dest(&res, &exp); + EMULATE(c_fmadd_pch)(&exp, src1, src2, HALF_MASK, 0, 0, 0); + HF(res) = INTRINSIC (_mask_fmadd_round_pch) (HF(res), HALF_MASK, HF(src1), + HF(src2), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_fmadd_pch); + + init_dest(&res, &exp); + EMULATE(c_fmadd_pch)(&exp, src1, src2, HALF_MASK, 0, 0, 1); + HF(res) = INTRINSIC (_mask3_fmadd_round_pch) (HF(res), HF(src1), HF(src2), + HALF_MASK, _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, 
_mask3_fmadd_pch); + + init_dest(&res, &exp); + EMULATE(c_fmadd_pch)(&exp, src1, src2, HALF_MASK, 1, 0, 0); + HF(res) = INTRINSIC (_maskz_fmadd_round_pch) (HALF_MASK, HF(res), HF(src1), + HF(src2), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_fmadd_pch); +#endif + + if (n_errs != 0) { + abort (); + } +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfmulcph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmulcph-1a.c new file mode 100644 index 00000000000..f31cbca368e --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmulcph-1a.c @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vfmulcph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmulcph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmulcph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmulcph\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmulcph\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmulcph\[ \\t\]+\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m512h res, res1, res2; +volatile __m512h x1, x2, x3; +volatile __mmask16 m16; + +void extern +avx512f_test (void) +{ + res = _mm512_fmul_pch (x1, x2); + res1 = _mm512_mask_fmul_pch (res1, m16, x1, x2); + res2 = _mm512_maskz_fmul_pch (m16, x1, x2); + res = _mm512_fmul_round_pch (x1, x2, 8); + res1 = 
_mm512_mask_fmul_round_pch (res1, m16, x1, x2, 8); + res2 = _mm512_maskz_fmul_round_pch (m16, x1, x2, 11); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfmulcph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmulcph-1b.c new file mode 100644 index 00000000000..d9bb1b0ec12 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmulcph-1b.c @@ -0,0 +1,115 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS (AVX512F_LEN / 16) + +void NOINLINE +EMULATE(c_fmul_pch) (V512 * dest, V512 op1, V512 op2, + __mmask16 k, int zero_mask, int c_flag) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + int invert = 1; + if (c_flag == 1) + invert = -1; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(op2, &v3, &v4); + unpack_ph_2twops(*dest, &v7, &v8); + + for (i = 0; i < 16; i++) { + if (((1 << (i / 2)) & k) == 0) { + if (zero_mask) { + v5.f32[i] = 0; + } + else { + v5.u32[i] = v7.u32[i]; + } + } + else { + if ((i % 2) == 0) { + v5.f32[i] = v1.f32[i] * v3.f32[i] + - invert * (v1.f32[i+1] * v3.f32[i+1]); + } + else { + v5.f32[i] = v1.f32[i-1] * v3.f32[i] + + invert * (v1.f32[i] * v3.f32[i-1]); + + } + } + if (((1 << (i / 2 + 8)) & k) == 0) { + if (zero_mask) { + v6.f32[i] = 0; + } + else { + v6.u32[i] = v8.u32[i]; + } + } + else { + if ((i % 2) == 0) { + v6.f32[i] = v2.f32[i] * v4.f32[i] + - invert * (v2.f32[i+1] * v4.f32[i+1]); + } + else { + v6.f32[i] = v2.f32[i-1] * v4.f32[i] + + invert * (v2.f32[i] * v4.f32[i-1]); + } + + } + } + + *dest = pack_twops_2ph(v5, v6); +} + +void +TEST (void) +{ + V512 res; + V512 exp; + + init_src(); + + EMULATE(c_fmul_pch)(&exp, src1, src2, NET_CMASK, 0, 0); + HF(res) = INTRINSIC (_fmul_pch) (HF(src1), HF(src2)); + CHECK_RESULT (&res, &exp, N_ELEMS, _fmul_pch); + + init_dest(&res, &exp); + EMULATE(c_fmul_pch)(&exp, src1, src2, HALF_MASK, 0, 0); + HF(res) = INTRINSIC (_mask_fmul_pch) (HF(res),HALF_MASK, 
HF(src1), + HF(src2)); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_fmul_pch); + + init_dest(&res, &exp); + EMULATE(c_fmul_pch)(&exp, src1, src2, HALF_MASK, 1, 0); + HF(res) = INTRINSIC (_maskz_fmul_pch) (HALF_MASK, HF(src1), + HF(src2)); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_fmul_pch); + +#if AVX512F_LEN == 512 + init_dest(&res, &exp); + EMULATE(c_fmul_pch)(&exp, src1, src2, NET_CMASK, 0, 0); + HF(res) = INTRINSIC (_fmul_round_pch) (HF(src1), HF(src2), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _fmul_pch); + + init_dest(&res, &exp); + EMULATE(c_fmul_pch)(&exp, src1, src2, HALF_MASK, 0, 0); + HF(res) = INTRINSIC (_mask_fmul_round_pch) (HF(res),HALF_MASK, HF(src1), + HF(src2), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _mask_fmul_pch); + + init_dest(&res, &exp); + EMULATE(c_fmul_pch)(&exp, src1, src2, HALF_MASK, 1, 0); + HF(res) = INTRINSIC (_maskz_fmul_round_pch) (HALF_MASK, HF(src1), + HF(src2), _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_fmul_pch); +#endif + + if (n_errs != 0) { + abort (); + } +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfcmaddcph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfcmaddcph-1a.c new file mode 100644 index 00000000000..eff13812c87 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfcmaddcph-1a.c @@ -0,0 +1,30 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */ +/* { dg-final { scan-assembler-times "vfcmaddcph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfcmaddcph\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vfcmaddcph\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfcmaddcph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ 
+/* { dg-final { scan-assembler-times "vfcmaddcph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vfcmaddcph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256h res1;
+volatile __m128h res2;
+volatile __m256h x1, x2, x3;
+volatile __m128h x4, x5, x6;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res1 = _mm256_fcmadd_pch (x1, x2, x3);
+  res1 = _mm256_mask_fcmadd_pch (res1, m8, x1, x2);
+  res1 = _mm256_mask3_fcmadd_pch (res1, x1, x2, m8);
+  res1 = _mm256_maskz_fcmadd_pch (m8, x1, x2, x3);
+
+  res2 = _mm_fcmadd_pch (x4, x5, x6);
+  res2 = _mm_mask_fcmadd_pch (res2, m8, x4, x5);
+  res2 = _mm_mask3_fcmadd_pch (res2, x4, x5, m8);
+  res2 = _mm_maskz_fcmadd_pch (m8, x4, x5, x6);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfcmaddcph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfcmaddcph-1b.c
new file mode 100644
index 00000000000..5e3a54ecaae
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfcmaddcph-1b.c
@@ -0,0 +1,15 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vfcmaddcph-1b.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vfcmaddcph-1b.c"
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfcmulcph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfcmulcph-1a.c
new file mode 100644
index 00000000000..4e48e9c7f85
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfcmulcph-1a.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -mavx512fp16 -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vfcmulcph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfcmulcph\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfcmulcph\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfcmulcph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfcmulcph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfcmulcph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256h res1;
+volatile __m128h res2;
+volatile __m256h x1, x2, x3;
+volatile __m128h x4, x5, x6;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res1 = _mm256_fcmul_pch (x1, x2);
+  res1 = _mm256_mask_fcmul_pch (res1, m8, x1, x2);
+  res1 = _mm256_maskz_fcmul_pch (m8, x1, x2);
+
+  res2 = _mm_fcmul_pch (x4, x5);
+  res2 = _mm_mask_fcmul_pch (res2, m8, x4, x5);
+  res2 = _mm_maskz_fcmul_pch (m8, x4, x5);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfcmulcph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfcmulcph-1b.c
new file mode 100644
index 00000000000..19564a1955d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfcmulcph-1b.c
@@ -0,0 +1,15 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vfcmulcph-1b.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vfcmulcph-1b.c"
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddcph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddcph-1a.c
new file mode 100644
index 00000000000..b9a24d0b9d8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddcph-1a.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vfmaddcph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmaddcph\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vfmaddcph\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmaddcph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmaddcph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vfmaddcph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256h res1;
+volatile __m128h res2;
+volatile __m256h x1, x2, x3;
+volatile __m128h x4, x5, x6;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res1 = _mm256_fmadd_pch (x1, x2, x3);
+  res1 = _mm256_mask_fmadd_pch (res1, m8, x1, x2);
+  res1 = _mm256_mask3_fmadd_pch (res1, x1, x2, m8);
+  res1 = _mm256_maskz_fmadd_pch (m8, x1, x2, x3);
+
+  res2 = _mm_fmadd_pch (x4, x5, x6);
+  res2 = _mm_mask_fmadd_pch (res2, m8, x4, x5);
+  res2 = _mm_mask3_fmadd_pch (res2, x4, x5, m8);
+  res2 = _mm_maskz_fmadd_pch (m8, x4, x5, x6);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddcph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddcph-1b.c
new file mode 100644
index 00000000000..bf85fea75ab
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddcph-1b.c
@@ -0,0 +1,15 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vfmaddcph-1b.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vfmaddcph-1b.c"
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmulcph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmulcph-1a.c
new file mode 100644
index 00000000000..54e58c66edb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmulcph-1a.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -mavx512fp16 -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vfmulcph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmulcph\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmulcph\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmulcph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmulcph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmulcph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256h res1;
+volatile __m128h res2;
+volatile __m256h x1, x2, x3;
+volatile __m128h x4, x5, x6;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res1 = _mm256_fmul_pch (x1, x2);
+  res1 = _mm256_mask_fmul_pch (res1, m8, x1, x2);
+  res1 = 
_mm256_maskz_fmul_pch (m8, x1, x2);
+
+  res2 = _mm_fmul_pch (x4, x5);
+  res2 = _mm_mask_fmul_pch (res2, m8, x4, x5);
+  res2 = _mm_maskz_fmul_pch (m8, x4, x5);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmulcph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmulcph-1b.c
new file mode 100644
index 00000000000..f88d8423965
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmulcph-1b.c
@@ -0,0 +1,15 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vfmulcph-1b.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vfmulcph-1b.c"
+

From patchwork Thu Jul 1 06:16:37 2021
X-Patchwork-Submitter: liuhongt
X-Patchwork-Id: 1499404
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 51/62] AVX512FP16: Add vfcmaddcsh/vfmaddcsh/vfcmulcsh/vfmulcsh.
Date: Thu, 1 Jul 2021 14:16:37 +0800
Message-Id: <20210701061648.9447-52-hongtao.liu@intel.com>
In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com>
References: <20210701061648.9447-1-hongtao.liu@intel.com>
From: liuhongt
Cc: jakub@redhat.com

gcc/ChangeLog:

	* config/i386/avx512fp16intrin.h (_mm_mask_fcmadd_sch):
	New intrinsic.
	(_mm_mask3_fcmadd_sch): Likewise.
	(_mm_maskz_fcmadd_sch): Likewise.
	(_mm_fcmadd_sch): Likewise.
	(_mm_mask_fmadd_sch): Likewise.
	(_mm_mask3_fmadd_sch): Likewise.
	(_mm_maskz_fmadd_sch): Likewise.
	(_mm_fmadd_sch): Likewise.
	(_mm_mask_fcmadd_round_sch): Likewise.
	(_mm_mask3_fcmadd_round_sch): Likewise.
	(_mm_maskz_fcmadd_round_sch): Likewise.
	(_mm_fcmadd_round_sch): Likewise.
	(_mm_mask_fmadd_round_sch): Likewise.
	(_mm_mask3_fmadd_round_sch): Likewise.
	(_mm_maskz_fmadd_round_sch): Likewise.
	(_mm_fmadd_round_sch): Likewise.
	(_mm_fcmul_sch): Likewise.
	(_mm_mask_fcmul_sch): Likewise.
	(_mm_maskz_fcmul_sch): Likewise.
	(_mm_fmul_sch): Likewise.
	(_mm_mask_fmul_sch): Likewise.
	(_mm_maskz_fmul_sch): Likewise.
	(_mm_fcmul_round_sch): Likewise.
	(_mm_mask_fcmul_round_sch): Likewise.
	(_mm_maskz_fcmul_round_sch): Likewise.
	(_mm_fmul_round_sch): Likewise.
	(_mm_mask_fmul_round_sch): Likewise.
	(_mm_maskz_fmul_round_sch): Likewise.
* config/i386/i386-builtin.def: Add corresponding new builtins. * config/i386/sse.md (avx512fp16_fmaddcsh_v8hf_maskz): New expander. (avx512fp16_fcmaddcsh_v8hf_maskz): Ditto. (avx512fp16_fma_sh_v8hf): New define insn. (avx512fp16_sh_v8hf_mask): Ditto. (avx512fp16_sh_v8hf): Ditto. * config/i386/subst.md (mask_scalarcz_name): New. (mask_scalarc_name): Ditto. (mask_scalarc_operand3): Ditto. (mask_scalarcz_operand4): Ditto. (round_scalarcz_name): Ditto. (round_scalarc_mask_operand3): Ditto. (round_scalarcz_mask_operand4): Ditto. (round_scalarc_mask_op3): Ditto. (round_scalarcz_mask_op4): Ditto. (round_scalarcz_constraint): Ditto. (round_scalarcz_nimm_predicate): Ditto. (mask_scalarcz): Ditto. (mask_scalarc): Ditto. (round_scalarcz): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add test for new builtins. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/sse-14.c: Add test for new intrinsics. * gcc.target/i386/sse-22.c: Ditto. --- gcc/config/i386/avx512fp16intrin.h | 464 +++++++++++++++++++++++++ gcc/config/i386/i386-builtin.def | 10 + gcc/config/i386/sse.md | 76 ++++ gcc/config/i386/subst.md | 63 ++++ gcc/testsuite/gcc.target/i386/avx-1.c | 10 + gcc/testsuite/gcc.target/i386/sse-13.c | 10 + gcc/testsuite/gcc.target/i386/sse-14.c | 14 + gcc/testsuite/gcc.target/i386/sse-22.c | 14 + gcc/testsuite/gcc.target/i386/sse-23.c | 10 + 9 files changed, 671 insertions(+) diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h index 9dd71019972..39c10beb1de 100644 --- a/gcc/config/i386/avx512fp16intrin.h +++ b/gcc/config/i386/avx512fp16intrin.h @@ -6495,6 +6495,470 @@ _mm512_maskz_fmul_round_pch (__mmask16 __A, __m512h __B, #endif /* __OPTIMIZE__ */ +/* Intrinsics vf[,c]maddcsh. 
*/ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_fcmadd_sch (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D) +{ +#ifdef __AVX512VL__ + return (__m128h) __builtin_ia32_movaps128_mask ( + (__v4sf) + __builtin_ia32_vfcmaddcsh_v8hf_mask_round ((__v8hf) __D, + (__v8hf) __A, + (__v8hf) __C, __B, + _MM_FROUND_CUR_DIRECTION), + (__v4sf) __A, __B); +#else + return (__m128h) __builtin_ia32_blendvps ((__v4sf) __A, + (__v4sf) + __builtin_ia32_vfcmaddcsh_v8hf_mask_round ((__v8hf) __D, + (__v8hf) __A, + (__v8hf) __C, __B, + _MM_FROUND_CUR_DIRECTION), + (__v4sf) _mm_set_ss ((float) ((int) __B << 31))); +#endif +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask3_fcmadd_sch (__m128h __A, __m128h __B, __m128h __C, __mmask8 __D) +{ + return (__m128h) _mm_move_ss ((__m128) __C, + (__m128) + __builtin_ia32_vfcmaddcsh_v8hf_mask_round ((__v8hf) __C, + (__v8hf) __A, + (__v8hf) __B, __D, + _MM_FROUND_CUR_DIRECTION)); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_fcmadd_sch (__mmask8 __A, __m128h __B, __m128h __C, __m128h __D) +{ + return (__m128h) + __builtin_ia32_vfcmaddcsh_v8hf_maskz_round((__v8hf) __D, + (__v8hf) __B, + (__v8hf) __C, + __A, _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fcmadd_sch (__m128h __A, __m128h __B, __m128h __C) +{ + return (__m128h) + __builtin_ia32_vfcmaddcsh_v8hf_round((__v8hf) __C, + (__v8hf) __A, + (__v8hf) __B, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_fmadd_sch (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D) +{ +#ifdef __AVX512VL__ + return (__m128h) __builtin_ia32_movaps128_mask ( + (__v4sf) + __builtin_ia32_vfmaddcsh_v8hf_mask_round ((__v8hf) __D, + (__v8hf) __A, + (__v8hf) __C, __B, + 
_MM_FROUND_CUR_DIRECTION), + (__v4sf) __A, __B); +#else + return (__m128h) __builtin_ia32_blendvps ((__v4sf) __A, + (__v4sf) + __builtin_ia32_vfmaddcsh_v8hf_mask_round ((__v8hf) __D, + (__v8hf) __A, + (__v8hf) __C, __B, + _MM_FROUND_CUR_DIRECTION), + (__v4sf) _mm_set_ss ((float) ((int) __B << 31))); +#endif +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask3_fmadd_sch (__m128h __A, __m128h __B, __m128h __C, __mmask8 __D) +{ + return (__m128h) _mm_move_ss ((__m128) __C, + (__m128) + __builtin_ia32_vfmaddcsh_v8hf_mask_round ((__v8hf) __C, + (__v8hf) __A, + (__v8hf) __B, __D, + _MM_FROUND_CUR_DIRECTION)); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_fmadd_sch (__mmask8 __A, __m128h __B, __m128h __C, __m128h __D) +{ + return (__m128h) + __builtin_ia32_vfmaddcsh_v8hf_maskz_round((__v8hf) __D, + (__v8hf) __B, + (__v8hf) __C, + __A, _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fmadd_sch (__m128h __A, __m128h __B, __m128h __C) +{ + return (__m128h) + __builtin_ia32_vfmaddcsh_v8hf_round((__v8hf) __C, + (__v8hf) __A, + (__v8hf) __B, + _MM_FROUND_CUR_DIRECTION); +} + +#ifdef __OPTIMIZE__ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_fcmadd_round_sch (__m128h __A, __mmask8 __B, __m128h __C, + __m128h __D, const int __E) +{ +#ifdef __AVX512VL__ + return (__m128h) __builtin_ia32_movaps128_mask ( + (__v4sf) + __builtin_ia32_vfcmaddcsh_v8hf_mask_round ((__v8hf) __D, + (__v8hf) __A, + (__v8hf) __C, + __B, __E), + (__v4sf) __A, __B); +#else + return (__m128h) __builtin_ia32_blendvps ((__v4sf) __A, + (__v4sf) + __builtin_ia32_vfcmaddcsh_v8hf_mask_round ((__v8hf) __D, + (__v8hf) __A, + (__v8hf) __C, + __B, __E), + (__v4sf) _mm_set_ss ((float) ((int) __B << 31))); +#endif +} + +extern __inline __m128h +__attribute__ 
((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask3_fcmadd_round_sch (__m128h __A, __m128h __B, __m128h __C, + __mmask8 __D, const int __E) +{ + return (__m128h) _mm_move_ss ((__m128) __C, + (__m128) + __builtin_ia32_vfcmaddcsh_v8hf_mask_round ((__v8hf) __C, + (__v8hf) __A, + (__v8hf) __B, + __D, __E)); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_fcmadd_round_sch (__mmask8 __A, __m128h __B, __m128h __C, + __m128h __D, const int __E) +{ + return (__m128h)__builtin_ia32_vfcmaddcsh_v8hf_maskz_round((__v8hf) __D, + (__v8hf) __B, + (__v8hf) __C, + __A, __E); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fcmadd_round_sch (__m128h __A, __m128h __B, __m128h __C, const int __D) +{ + return (__m128h)__builtin_ia32_vfcmaddcsh_v8hf_round((__v8hf) __C, + (__v8hf) __A, + (__v8hf) __B, + __D); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_fmadd_round_sch (__m128h __A, __mmask8 __B, __m128h __C, + __m128h __D, const int __E) +{ +#ifdef __AVX512VL__ + return (__m128h) __builtin_ia32_movaps128_mask ( + (__v4sf) + __builtin_ia32_vfmaddcsh_v8hf_mask_round ((__v8hf) __D, + (__v8hf) __A, + (__v8hf) __C, + __B, __E), + (__v4sf) __A, __B); +#else + return (__m128h) __builtin_ia32_blendvps ((__v4sf) __A, + (__v4sf) + __builtin_ia32_vfmaddcsh_v8hf_mask_round ((__v8hf) __D, + (__v8hf) __A, + (__v8hf) __C, + __B, __E), + (__v4sf) _mm_set_ss ((float) ((int) __B << 31))); +#endif +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask3_fmadd_round_sch (__m128h __A, __m128h __B, __m128h __C, + __mmask8 __D, const int __E) +{ + return (__m128h) _mm_move_ss ((__m128) __C, + (__m128) + __builtin_ia32_vfmaddcsh_v8hf_mask_round ((__v8hf) __C, + (__v8hf) __A, + (__v8hf) __B, + __D, __E)); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, 
__always_inline__, __artificial__)) +_mm_maskz_fmadd_round_sch (__mmask8 __A, __m128h __B, __m128h __C, + __m128h __D, const int __E) +{ + return (__m128h)__builtin_ia32_vfmaddcsh_v8hf_maskz_round((__v8hf) __D, + (__v8hf) __B, + (__v8hf) __C, + __A, __E); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fmadd_round_sch (__m128h __A, __m128h __B, __m128h __C, const int __D) +{ + return (__m128h)__builtin_ia32_vfmaddcsh_v8hf_round((__v8hf) __C, + (__v8hf) __A, + (__v8hf) __B, + __D); +} + +#else +#ifdef __AVX512VL__ +#define _mm_mask_fcmadd_round_sch(A, B, C, D, E) \ + ((__m128h) __builtin_ia32_movaps128_mask ( \ + (__v4sf) \ + __builtin_ia32_vfcmaddcsh_v8hf_mask_round ((__v8hf) (D), \ + (__v8hf) (A), \ + (__v8hf) (C), \ + (B), (E)), \ + (__v4sf) (A), (B))) + +#else +#define _mm_mask_fcmadd_round_sch(A, B, C, D, E) \ + ((__m128h) __builtin_ia32_blendvps ((__v4sf) (A), \ + (__v4sf) \ + __builtin_ia32_vfcmaddcsh_v8hf_mask_round ((__v8hf) (D), \ + (__v8hf) (A), \ + (__v8hf) (C), \ + (B), (E)), \ + (__v4sf) _mm_set_ss ((float) ((int) (B) << 31)))) +#endif + +#define _mm_mask3_fcmadd_round_sch(A, B, C, D, E) \ + ((__m128h) _mm_move_ss ((__m128) (C), \ + (__m128) \ + __builtin_ia32_vfcmaddcsh_v8hf_mask_round ((__v8hf) (C), \ + (__v8hf) (A), \ + (__v8hf) (B), \ + (D), (E)))) + +#define _mm_maskz_fcmadd_round_sch(A, B, C, D, E) \ + __builtin_ia32_vfcmaddcsh_v8hf_maskz_round ((D), (B), (C), (A), (E)) + +#define _mm_fcmadd_round_sch(A, B, C, D) \ + __builtin_ia32_vfcmaddcsh_v8hf_round ((C), (A), (B), (D)) + +#ifdef __AVX512VL__ +#define _mm_mask_fmadd_round_sch(A, B, C, D, E) \ + ((__m128h) __builtin_ia32_movaps128_mask ( \ + (__v4sf) \ + __builtin_ia32_vfmaddcsh_v8hf_mask_round ((__v8hf) (D), \ + (__v8hf) (A), \ + (__v8hf) (C), \ + (B), (E)), \ + (__v4sf) (A), (B))) + +#else +#define _mm_mask_fmadd_round_sch(A, B, C, D, E) \ + ((__m128h) __builtin_ia32_blendvps ((__v4sf) (A), \ + (__v4sf) \ + 
__builtin_ia32_vfmaddcsh_v8hf_mask_round ((__v8hf) (D), \ + (__v8hf) (A), \ + (__v8hf) (C), \ + (B), (E)), \ + (__v4sf) _mm_set_ss ((float) ((int) (B) << 31)))) +#endif + +#define _mm_mask3_fmadd_round_sch(A, B, C, D, E) \ + ((__m128h) _mm_move_ss ((__m128) (C), \ + (__m128) \ + __builtin_ia32_vfmaddcsh_v8hf_mask_round ((__v8hf) (C), \ + (__v8hf) (A), \ + (__v8hf) (B), \ + (D), (E)))) + +#define _mm_maskz_fmadd_round_sch(A, B, C, D, E) \ + __builtin_ia32_vfmaddcsh_v8hf_maskz_round ((D), (B), (C), (A), (E)) + +#define _mm_fmadd_round_sch(A, B, C, D) \ + __builtin_ia32_vfmaddcsh_v8hf_round ((C), (A), (B), (D)) + +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vf[,c]mulcsh. */ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fcmul_sch (__m128h __A, __m128h __B) +{ + return (__m128h) + __builtin_ia32_vfcmulcsh_v8hf_round((__v8hf) __A, + (__v8hf) __B, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_fcmul_sch (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D) +{ + return (__m128h) + __builtin_ia32_vfcmulcsh_v8hf_mask_round((__v8hf) __C, + (__v8hf) __D, + (__v8hf) __A, + __B, _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_fcmul_sch (__mmask8 __A, __m128h __B, __m128h __C) +{ + return (__m128h) + __builtin_ia32_vfcmulcsh_v8hf_mask_round((__v8hf) __B, + (__v8hf) __C, + _mm_setzero_ph (), + __A, _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fmul_sch (__m128h __A, __m128h __B) +{ + return (__m128h) + __builtin_ia32_vfmulcsh_v8hf_round((__v8hf) __A, + (__v8hf) __B, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_fmul_sch (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D) +{ + 
return (__m128h) + __builtin_ia32_vfmulcsh_v8hf_mask_round((__v8hf) __C, + (__v8hf) __D, + (__v8hf) __A, + __B, _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_fmul_sch (__mmask8 __A, __m128h __B, __m128h __C) +{ + return (__m128h) + __builtin_ia32_vfmulcsh_v8hf_mask_round((__v8hf) __B, + (__v8hf) __C, + _mm_setzero_ph (), + __A, _MM_FROUND_CUR_DIRECTION); +} + +#ifdef __OPTIMIZE__ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fcmul_round_sch (__m128h __A, __m128h __B, const int __D) +{ + return (__m128h)__builtin_ia32_vfcmulcsh_v8hf_round((__v8hf) __A, + (__v8hf) __B, + __D); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_fcmul_round_sch (__m128h __A, __mmask8 __B, __m128h __C, + __m128h __D, const int __E) +{ + return (__m128h)__builtin_ia32_vfcmulcsh_v8hf_mask_round((__v8hf) __C, + (__v8hf) __D, + (__v8hf) __A, + __B, __E); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_fcmul_round_sch (__mmask8 __A, __m128h __B, __m128h __C, + const int __E) +{ + return (__m128h)__builtin_ia32_vfcmulcsh_v8hf_mask_round((__v8hf) __B, + (__v8hf) __C, + _mm_setzero_ph (), + __A, __E); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fmul_round_sch (__m128h __A, __m128h __B, const int __D) +{ + return (__m128h)__builtin_ia32_vfmulcsh_v8hf_round((__v8hf) __A, + (__v8hf) __B, __D); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_fmul_round_sch (__m128h __A, __mmask8 __B, __m128h __C, + __m128h __D, const int __E) +{ + return (__m128h)__builtin_ia32_vfmulcsh_v8hf_mask_round((__v8hf) __C, + (__v8hf) __D, + (__v8hf) __A, + __B, __E); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, 
__always_inline__, __artificial__)) +_mm_maskz_fmul_round_sch (__mmask8 __A, __m128h __B, __m128h __C, const int __E) +{ + return (__m128h)__builtin_ia32_vfmulcsh_v8hf_mask_round((__v8hf) __B, + (__v8hf) __C, + _mm_setzero_ph (), + __A, __E); +} + +#else +#define _mm_fcmul_round_sch(__A, __B, __D) \ + (__m128h)__builtin_ia32_vfcmulcsh_v8hf_round((__v8hf) __A,(__v8hf) __B, __D) + +#define _mm_mask_fcmul_round_sch(__A, __B, __C, __D, __E) \ + (__m128h)__builtin_ia32_vfcmulcsh_v8hf_mask_round((__v8hf) __C, \ + (__v8hf) __D, \ + (__v8hf) __A, \ + __B, __E) + +#define _mm_maskz_fcmul_round_sch(__A, __B, __C, __E) \ + (__m128h)__builtin_ia32_vfcmulcsh_v8hf_mask_round((__v8hf) __B, \ + (__v8hf) __C, \ + _mm_setzero_ph(), \ + __A, __E) + +#define _mm_fmul_round_sch(__A, __B, __D) \ + (__m128h)__builtin_ia32_vfmulcsh_v8hf_round((__v8hf) __A,(__v8hf) __B, __D) + +#define _mm_mask_fmul_round_sch(__A, __B, __C, __D, __E) \ + (__m128h)__builtin_ia32_vfmulcsh_v8hf_mask_round((__v8hf) __C, \ + (__v8hf) __D, \ + (__v8hf) __A, \ + __B, __E) + +#define _mm_maskz_fmul_round_sch(__A, __B, __C, __E) \ + (__m128h)__builtin_ia32_vfmulcsh_v8hf_mask_round((__v8hf) __B, \ + (__v8hf) __C, \ + _mm_setzero_ph (), \ + __A, __E) + +#endif /* __OPTIMIZE__ */ + #ifdef __DISABLE_AVX512FP16__ #undef __DISABLE_AVX512FP16__ #pragma GCC pop_options diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def index 448f9f75fa4..8d57413153f 100644 --- a/gcc/config/i386/i386-builtin.def +++ b/gcc/config/i386/i386-builtin.def @@ -3231,6 +3231,16 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fcmulc_v32hf_round, "__ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fcmulc_v32hf_mask_round, "__builtin_ia32_vfcmulcph_v32hf_mask_round", IX86_BUILTIN_VFCMULCPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmulc_v32hf_round, "__builtin_ia32_vfmulcph_v32hf_round", 
IX86_BUILTIN_VFMULCPH_V32HF_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_INT) BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmulc_v32hf_mask_round, "__builtin_ia32_vfmulcph_v32hf_mask_round", IX86_BUILTIN_VFMULCPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fma_fcmaddcsh_v8hf_round, "__builtin_ia32_vfcmaddcsh_v8hf_round", IX86_BUILTIN_VFCMADDCSH_V8HF_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fcmaddcsh_v8hf_mask_round, "__builtin_ia32_vfcmaddcsh_v8hf_mask_round", IX86_BUILTIN_VFCMADDCSH_V8HF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fcmaddcsh_v8hf_maskz_round, "__builtin_ia32_vfcmaddcsh_v8hf_maskz_round", IX86_BUILTIN_VFCMADDCSH_V8HF_MASKZ_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fma_fmaddcsh_v8hf_round, "__builtin_ia32_vfmaddcsh_v8hf_round", IX86_BUILTIN_VFMADDCSH_V8HF_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmaddcsh_v8hf_mask_round, "__builtin_ia32_vfmaddcsh_v8hf_mask_round", IX86_BUILTIN_VFMADDCSH_V8HF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmaddcsh_v8hf_maskz_round, "__builtin_ia32_vfmaddcsh_v8hf_maskz_round", IX86_BUILTIN_VFMADDCSH_V8HF_MASKZ_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fcmulcsh_v8hf_round, "__builtin_ia32_vfcmulcsh_v8hf_round", IX86_BUILTIN_VFCMULCSH_V8HF_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fcmulcsh_v8hf_mask_round, "__builtin_ia32_vfcmulcsh_v8hf_mask_round", IX86_BUILTIN_VFCMULCSH_V8HF_MASK_ROUND, UNKNOWN, (int) 
V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmulcsh_v8hf_round, "__builtin_ia32_vfmulcsh_v8hf_round", IX86_BUILTIN_VFMULCSH_V8HF_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_INT) +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmulcsh_v8hf_mask_round, "__builtin_ia32_vfmulcsh_v8hf_mask_round", IX86_BUILTIN_VFMULCSH_V8HF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT) BDESC_END (ROUND_ARGS, MULTI_ARG) diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index ddd93f739e3..2c3dba5bdb0 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -5597,6 +5597,82 @@ (define_insn "__" [(set_attr "type" "ssemul") (set_attr "mode" "")]) +(define_expand "avx512fp16_fmaddcsh_v8hf_maskz" + [(match_operand:V8HF 0 "register_operand") + (match_operand:V8HF 1 "") + (match_operand:V8HF 2 "") + (match_operand:V8HF 3 "") + (match_operand:QI 4 "register_operand")] + "TARGET_AVX512FP16 && " +{ + emit_insn (gen_avx512fp16_fma_fmaddcsh_v8hf_maskz ( + operands[0], operands[1], operands[2], operands[3], + CONST0_RTX (V8HFmode), operands[4])); + DONE; +}) + +(define_expand "avx512fp16_fcmaddcsh_v8hf_maskz" + [(match_operand:V8HF 0 "register_operand") + (match_operand:V8HF 1 "") + (match_operand:V8HF 2 "") + (match_operand:V8HF 3 "") + (match_operand:QI 4 "register_operand")] + "TARGET_AVX512FP16 && " +{ + emit_insn (gen_avx512fp16_fma_fcmaddcsh_v8hf_maskz ( + operands[0], operands[1], operands[2], operands[3], + CONST0_RTX (V8HFmode), operands[4])); + DONE; +}) + +(define_insn "avx512fp16_fma_sh_v8hf" + [(set (match_operand:V8HF 0 "register_operand" "=v") + (vec_merge:V8HF + (unspec:V8HF + [(match_operand:V8HF 1 "" "0") + (match_operand:V8HF 2 "" "v") + (match_operand:V8HF 3 "" "")] + UNSPEC_COMPLEX_F_C_MA) + (match_dup 2) + (const_int 3)))] + "TARGET_AVX512FP16" + "vsh\t{%3, %2, %0|%0, %2, %3}" + [(set_attr "type" "ssemuladd") + (set_attr "mode" "V8HF")]) + +(define_insn "avx512fp16_sh_v8hf_mask" + 
[(set (match_operand:V8HF 0 "register_operand" "=v") + (vec_merge:V8HF + (vec_merge:V8HF + (unspec:V8HF + [(match_operand:V8HF 1 "" "0") + (match_operand:V8HF 2 "" "v") + (match_operand:V8HF 3 "" "")] + UNSPEC_COMPLEX_F_C_MA) + (match_dup 1) + (unspec:QI [(match_operand:QI 4 "register_operand" "Yk")] + UNSPEC_COMPLEX_MASK)) + (match_dup 2) + (const_int 3)))] + "TARGET_AVX512FP16" + "vsh\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, %3}" + [(set_attr "type" "ssemuladd") + (set_attr "mode" "V8HF")]) + +(define_insn "avx512fp16_sh_v8hf" + [(set (match_operand:V8HF 0 "register_operand" "=v") + (vec_merge:V8HF + (unspec:V8HF + [(match_operand:V8HF 1 "nonimmediate_operand" "v") + (match_operand:V8HF 2 "" "")] + UNSPEC_COMPLEX_F_C_MUL) + (match_dup 1) + (const_int 3)))] + "TARGET_AVX512FP16" + "vsh\t{%2, %1, %0|%0, %1, %2}" + [(set_attr "type" "ssemul") + (set_attr "mode" "V8HF")]) + ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; ;; Parallel half-precision floating point conversion operations diff --git a/gcc/config/i386/subst.md b/gcc/config/i386/subst.md index 3a1f554e9b9..5b14a632111 100644 --- a/gcc/config/i386/subst.md +++ b/gcc/config/i386/subst.md @@ -308,8 +308,12 @@ (define_subst "mask_expand4" (match_operand: 5 "register_operand")]) (define_subst_attr "mask_scalar_name" "mask_scalar" "" "_mask") +(define_subst_attr "mask_scalarcz_name" "mask_scalarcz" "" "_maskz") +(define_subst_attr "mask_scalarc_name" "mask_scalarc" "" "_mask") +(define_subst_attr "mask_scalarc_operand3" "mask_scalarc" "" "%{%4%}%N3") (define_subst_attr "mask_scalar_operand3" "mask_scalar" "" "%{%4%}%N3") (define_subst_attr "mask_scalar_operand4" "mask_scalar" "" "%{%5%}%N4") +(define_subst_attr "mask_scalarcz_operand4" "mask_scalarcz" "" "%{%5%}%N4") (define_subst "mask_scalar" [(set (match_operand:SUBST_V 0) @@ -327,12 +331,55 @@ (define_subst "mask_scalar" (match_dup 2) (const_int 1)))]) +(define_subst "mask_scalarcz" + [(set (match_operand:SUBST_CV 0) + (vec_merge:SUBST_CV + 
(match_operand:SUBST_CV 1) + (match_operand:SUBST_CV 2) + (const_int 3)))] + "TARGET_AVX512F" + [(set (match_dup 0) + (vec_merge:SUBST_CV + (vec_merge:SUBST_CV + (match_dup 1) + (match_operand:SUBST_CV 3 "const0_operand" "C") + (unspec: + [(match_operand: 4 "register_operand" "Yk")] + UNSPEC_COMPLEX_MASK)) + (match_dup 2) + (const_int 3)))]) + +(define_subst "mask_scalarc" + [(set (match_operand:SUBST_CV 0) + (vec_merge:SUBST_CV + (match_operand:SUBST_CV 1) + (match_operand:SUBST_CV 2) + (const_int 3)))] + "TARGET_AVX512F" + [(set (match_dup 0) + (vec_merge:SUBST_CV + (vec_merge:SUBST_CV + (match_dup 1) + (match_operand:SUBST_CV 3 "nonimm_or_0_operand" "0C") + (unspec: + [(match_operand: 4 "register_operand" "Yk")] + UNSPEC_COMPLEX_MASK)) + (match_dup 2) + (const_int 3)))]) + (define_subst_attr "round_scalar_name" "round_scalar" "" "_round") +(define_subst_attr "round_scalarcz_name" "round_scalarcz" "" "_round") (define_subst_attr "round_scalar_mask_operand3" "mask_scalar" "%R3" "%R5") +(define_subst_attr "round_scalarc_mask_operand3" "mask_scalarc" "%R3" "%R5") +(define_subst_attr "round_scalarcz_mask_operand4" "mask_scalarcz" "%R4" "%R6") (define_subst_attr "round_scalar_mask_op3" "round_scalar" "" "") +(define_subst_attr "round_scalarc_mask_op3" "round_scalarcz" "" "") +(define_subst_attr "round_scalarcz_mask_op4" "round_scalarcz" "" "") (define_subst_attr "round_scalar_constraint" "round_scalar" "vm" "v") +(define_subst_attr "round_scalarcz_constraint" "round_scalarcz" "vm" "v") (define_subst_attr "round_scalar_prefix" "round_scalar" "vex" "evex") (define_subst_attr "round_scalar_nimm_predicate" "round_scalar" "nonimmediate_operand" "register_operand") +(define_subst_attr "round_scalarcz_nimm_predicate" "round_scalarcz" "vector_operand" "register_operand") (define_subst "round_scalar" [(set (match_operand:SUBST_V 0) @@ -350,6 +397,22 @@ (define_subst "round_scalar" (match_operand:SI 3 "const_4_or_8_to_11_operand")] UNSPEC_EMBEDDED_ROUNDING))]) +(define_subst 
"round_scalarcz" + [(set (match_operand:SUBST_V 0) + (vec_merge:SUBST_V + (match_operand:SUBST_V 1) + (match_operand:SUBST_V 2) + (const_int 3)))] + "TARGET_AVX512F" + [(set (match_dup 0) + (unspec:SUBST_V [ + (vec_merge:SUBST_V + (match_dup 1) + (match_dup 2) + (const_int 3)) + (match_operand:SI 3 "const_4_or_8_to_11_operand")] + UNSPEC_EMBEDDED_ROUNDING))]) + (define_subst_attr "round_saeonly_scalar_name" "round_saeonly_scalar" "" "_round") (define_subst_attr "round_saeonly_scalar_mask_operand3" "mask_scalar" "%r3" "%r5") (define_subst_attr "round_saeonly_scalar_mask_operand4" "mask_scalar" "%r4" "%r6") diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c index 56e90d9f9a5..69de37a0087 100644 --- a/gcc/testsuite/gcc.target/i386/avx-1.c +++ b/gcc/testsuite/gcc.target/i386/avx-1.c @@ -797,6 +797,16 @@ #define __builtin_ia32_vfmulcph_v32hf_mask_round(A, C, D, B, E) __builtin_ia32_vfmulcph_v32hf_mask_round(A, C, D, B, 8) #define __builtin_ia32_vfcmulcph_v32hf_round(A, B, C) __builtin_ia32_vfcmulcph_v32hf_round(A, B, 8) #define __builtin_ia32_vfcmulcph_v32hf_mask_round(A, C, D, B, E) __builtin_ia32_vfcmulcph_v32hf_mask_round(A, C, D, B, 8) +#define __builtin_ia32_vfmaddcsh_v8hf_round(A, B, C, D) __builtin_ia32_vfmaddcsh_v8hf_round(A, B, C, 8) +#define __builtin_ia32_vfmaddcsh_v8hf_mask_round(A, C, D, B, E) __builtin_ia32_vfmaddcsh_v8hf_mask_round(A, C, D, B, 8) +#define __builtin_ia32_vfmaddcsh_v8hf_maskz_round(B, C, D, A, E) __builtin_ia32_vfmaddcsh_v8hf_maskz_round(B, C, D, A, 8) +#define __builtin_ia32_vfcmaddcsh_v8hf_round(A, B, C, D) __builtin_ia32_vfcmaddcsh_v8hf_round(A, B, C, 8) +#define __builtin_ia32_vfcmaddcsh_v8hf_mask_round(A, C, D, B, E) __builtin_ia32_vfcmaddcsh_v8hf_mask_round(A, C, D, B, 8) +#define __builtin_ia32_vfcmaddcsh_v8hf_maskz_round(B, C, D, A, E) __builtin_ia32_vfcmaddcsh_v8hf_maskz_round(B, C, D, A, 8) +#define __builtin_ia32_vfmulcsh_v8hf_round(A, B, C) __builtin_ia32_vfmulcsh_v8hf_round(A, B, 8) 
+#define __builtin_ia32_vfmulcsh_v8hf_mask_round(A, C, D, B, E) __builtin_ia32_vfmulcsh_v8hf_mask_round(A, C, D, B, 8) +#define __builtin_ia32_vfcmulcsh_v8hf_round(A, B, C) __builtin_ia32_vfcmulcsh_v8hf_round(A, B, 8) +#define __builtin_ia32_vfcmulcsh_v8hf_mask_round(A, C, D, B, E) __builtin_ia32_vfcmulcsh_v8hf_mask_round(A, C, D, B, 8) /* avx512fp16vlintrin.h */ #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D) diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c index ef9f8aad853..60adfcc1c67 100644 --- a/gcc/testsuite/gcc.target/i386/sse-13.c +++ b/gcc/testsuite/gcc.target/i386/sse-13.c @@ -814,6 +814,16 @@ #define __builtin_ia32_vfmulcph_v32hf_mask_round(A, C, D, B, E) __builtin_ia32_vfmulcph_v32hf_mask_round(A, C, D, B, 8) #define __builtin_ia32_vfcmulcph_v32hf_round(A, B, C) __builtin_ia32_vfcmulcph_v32hf_round(A, B, 8) #define __builtin_ia32_vfcmulcph_v32hf_mask_round(A, C, D, B, E) __builtin_ia32_vfcmulcph_v32hf_mask_round(A, C, D, B, 8) +#define __builtin_ia32_vfmaddcsh_v8hf_round(A, B, C, D) __builtin_ia32_vfmaddcsh_v8hf_round(A, B, C, 8) +#define __builtin_ia32_vfmaddcsh_v8hf_mask_round(A, C, D, B, E) __builtin_ia32_vfmaddcsh_v8hf_mask_round(A, C, D, B, 8) +#define __builtin_ia32_vfmaddcsh_v8hf_maskz_round(B, C, D, A, E) __builtin_ia32_vfmaddcsh_v8hf_maskz_round(B, C, D, A, 8) +#define __builtin_ia32_vfcmaddcsh_v8hf_round(A, B, C, D) __builtin_ia32_vfcmaddcsh_v8hf_round(A, B, C, 8) +#define __builtin_ia32_vfcmaddcsh_v8hf_mask_round(A, C, D, B, E) __builtin_ia32_vfcmaddcsh_v8hf_mask_round(A, C, D, B, 8) +#define __builtin_ia32_vfcmaddcsh_v8hf_maskz_round(B, C, D, A, E) __builtin_ia32_vfcmaddcsh_v8hf_maskz_round(B, C, D, A, 8) +#define __builtin_ia32_vfmulcsh_v8hf_round(A, B, C) __builtin_ia32_vfmulcsh_v8hf_round(A, B, 8) +#define __builtin_ia32_vfmulcsh_v8hf_mask_round(A, C, D, B, E) __builtin_ia32_vfmulcsh_v8hf_mask_round(A, C, D, B, 8) +#define 
__builtin_ia32_vfcmulcsh_v8hf_round(A, B, C) __builtin_ia32_vfcmulcsh_v8hf_round(A, B, 8) +#define __builtin_ia32_vfcmulcsh_v8hf_mask_round(A, C, D, B, E) __builtin_ia32_vfcmulcsh_v8hf_mask_round(A, C, D, B, 8) /* avx512fp16vlintrin.h */ #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D) diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c index f27c73fd4cc..956a9d16f84 100644 --- a/gcc/testsuite/gcc.target/i386/sse-14.c +++ b/gcc/testsuite/gcc.target/i386/sse-14.c @@ -774,6 +774,8 @@ test_2 (_mm_cvt_roundi32_sh, __m128h, __m128h, int, 8) test_2 (_mm_cvt_roundu32_sh, __m128h, __m128h, unsigned, 8) test_2 (_mm512_fmul_round_pch, __m512h, __m512h, __m512h, 8) test_2 (_mm512_fcmul_round_pch, __m512h, __m512h, __m512h, 8) +test_2 (_mm_fmul_round_sch, __m128h, __m128h, __m128h, 8) +test_2 (_mm_fcmul_round_sch, __m128h, __m128h, __m128h, 8) test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8) test_2x (_mm_cmp_round_sh_mask, __mmask8, __m128h, __m128h, 1, 8) test_2x (_mm_comi_round_sh, int, __m128h, __m128h, 1, 8) @@ -850,8 +852,12 @@ test_3 (_mm_fmsub_round_sh, __m128h, __m128h, __m128h, __m128h, 9) test_3 (_mm_fnmsub_round_sh, __m128h, __m128h, __m128h, __m128h, 9) test_3 (_mm512_fmadd_round_pch, __m512h, __m512h, __m512h, __m512h, 8) test_3 (_mm512_fcmadd_round_pch, __m512h, __m512h, __m512h, __m512h, 8) +test_3 (_mm_fmadd_round_sch, __m128h, __m128h, __m128h, __m128h, 8) +test_3 (_mm_fcmadd_round_sch, __m128h, __m128h, __m128h, __m128h, 8) test_3 (_mm512_maskz_fmul_round_pch, __m512h, __mmask16, __m512h, __m512h, 8) test_3 (_mm512_maskz_fcmul_round_pch, __m512h, __mmask16, __m512h, __m512h, 8) +test_3 (_mm_maskz_fmul_round_sch, __m128h, __mmask8, __m128h, __m128h, 8) +test_3 (_mm_maskz_fcmul_round_sch, __m128h, __mmask8, __m128h, __m128h, 8) test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8) test_3x (_mm_mask_cmp_round_sh_mask, 
__mmask8, __mmask8, __m128h, __m128h, 1, 8) test_3x (_mm512_mask_reduce_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8) @@ -920,8 +926,16 @@ test_4 (_mm512_mask3_fmadd_round_pch, __m512h, __m512h, __m512h, __m512h, __mmas test_4 (_mm512_mask3_fcmadd_round_pch, __m512h, __m512h, __m512h, __m512h, __mmask16, 8) test_4 (_mm512_maskz_fmadd_round_pch, __m512h, __mmask16, __m512h, __m512h, __m512h, 8) test_4 (_mm512_maskz_fcmadd_round_pch, __m512h, __mmask16, __m512h, __m512h, __m512h, 8) +test_4 (_mm_mask_fmadd_round_sch, __m128h, __m128h, __mmask8, __m128h, __m128h, 8) +test_4 (_mm_mask_fcmadd_round_sch, __m128h, __m128h, __mmask8, __m128h, __m128h, 8) +test_4 (_mm_mask3_fmadd_round_sch, __m128h, __m128h, __m128h, __m128h, __mmask8, 8) +test_4 (_mm_mask3_fcmadd_round_sch, __m128h, __m128h, __m128h, __m128h, __mmask8, 8) +test_4 (_mm_maskz_fmadd_round_sch, __m128h, __mmask8, __m128h, __m128h, __m128h, 8) +test_4 (_mm_maskz_fcmadd_round_sch, __m128h, __mmask8, __m128h, __m128h, __m128h, 8) test_4 (_mm512_mask_fmul_round_pch, __m512h, __m512h, __mmask16, __m512h, __m512h, 8) test_4 (_mm512_mask_fcmul_round_pch, __m512h, __m512h, __mmask16, __m512h, __m512h, 8) +test_4 (_mm_mask_fmul_round_sch, __m128h, __m128h, __mmask8, __m128h, __m128h, 8) +test_4 (_mm_mask_fcmul_round_sch, __m128h, __m128h, __mmask8, __m128h, __m128h, 8) test_4x (_mm_mask_reduce_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8) test_4x (_mm_mask_roundscale_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8) test_4x (_mm_mask_getmant_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 1, 1) diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c index ccf8c3a6c03..31492ef3697 100644 --- a/gcc/testsuite/gcc.target/i386/sse-22.c +++ b/gcc/testsuite/gcc.target/i386/sse-22.c @@ -878,6 +878,8 @@ test_2 (_mm_cvt_roundss_sh, __m128h, __m128h, __m128, 8) test_2 (_mm_cvt_roundsd_sh, __m128h, __m128h, __m128d, 8) test_2 (_mm512_fmul_round_pch, 
__m512h, __m512h, __m512h, 8) test_2 (_mm512_fcmul_round_pch, __m512h, __m512h, __m512h, 8) +test_2 (_mm_fmul_round_sch, __m128h, __m128h, __m128h, 8) +test_2 (_mm_fcmul_round_sch, __m128h, __m128h, __m128h, 8) test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8) test_2x (_mm_cmp_round_sh_mask, __mmask8, __m128h, __m128h, 1, 8) test_2x (_mm_comi_round_sh, int, __m128h, __m128h, 1, 8) @@ -954,6 +956,10 @@ test_3 (_mm_fnmsub_round_sh, __m128h, __m128h, __m128h, __m128h, 9) test_3 (_mm512_fmadd_round_pch, __m512h, __m512h, __m512h, __m512h, 8) test_3 (_mm512_fcmadd_round_pch, __m512h, __m512h, __m512h, __m512h, 8) test_3 (_mm512_maskz_fmul_round_pch, __m512h, __mmask16, __m512h, __m512h, 8) +test_3 (_mm_maskz_fmul_round_sch, __m128h, __mmask8, __m128h, __m128h, 8) +test_3 (_mm_maskz_fcmul_round_sch, __m128h, __mmask8, __m128h, __m128h, 8) +test_3 (_mm_fmadd_round_sch, __m128h, __m128h, __m128h, __m128h, 8) +test_3 (_mm_fcmadd_round_sch, __m128h, __m128h, __m128h, __m128h, 8) test_3 (_mm512_maskz_fcmul_round_pch, __m512h, __mmask16, __m512h, __m512h, 8) test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8) test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8) @@ -1022,8 +1028,16 @@ test_4 (_mm512_mask3_fmadd_round_pch, __m512h, __m512h, __m512h, __m512h, __mmas test_4 (_mm512_mask3_fcmadd_round_pch, __m512h, __m512h, __m512h, __m512h, __mmask16, 8) test_4 (_mm512_maskz_fmadd_round_pch, __m512h, __mmask16, __m512h, __m512h, __m512h, 8) test_4 (_mm512_maskz_fcmadd_round_pch, __m512h, __mmask16, __m512h, __m512h, __m512h, 8) +test_4 (_mm_mask_fmadd_round_sch, __m128h, __m128h, __mmask8, __m128h, __m128h, 8) +test_4 (_mm_mask_fcmadd_round_sch, __m128h, __m128h, __mmask8, __m128h, __m128h, 8) +test_4 (_mm_mask3_fmadd_round_sch, __m128h, __m128h, __m128h, __m128h, __mmask8, 8) +test_4 (_mm_mask3_fcmadd_round_sch, __m128h, __m128h, __m128h, __m128h, __mmask8, 8) +test_4 (_mm_maskz_fmadd_round_sch, 
__m128h, __mmask8, __m128h, __m128h, __m128h, 8) +test_4 (_mm_maskz_fcmadd_round_sch, __m128h, __mmask8, __m128h, __m128h, __m128h, 8) test_4 (_mm512_mask_fmul_round_pch, __m512h, __m512h, __mmask16, __m512h, __m512h, 8) test_4 (_mm512_mask_fcmul_round_pch, __m512h, __m512h, __mmask16, __m512h, __m512h, 8) +test_4 (_mm_mask_fmul_round_sch, __m128h, __m128h, __mmask8, __m128h, __m128h, 8) +test_4 (_mm_mask_fcmul_round_sch, __m128h, __m128h, __mmask8, __m128h, __m128h, 8) test_4x (_mm_mask_reduce_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8) test_4x (_mm_mask_roundscale_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8) test_4x (_mm_mask_getmant_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 1, 1) diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c index dc39d7e2012..4a110e86855 100644 --- a/gcc/testsuite/gcc.target/i386/sse-23.c +++ b/gcc/testsuite/gcc.target/i386/sse-23.c @@ -815,6 +815,16 @@ #define __builtin_ia32_vfmulcph_v32hf_mask_round(A, C, D, B, E) __builtin_ia32_vfmulcph_v32hf_mask_round(A, C, D, B, 8) #define __builtin_ia32_vfcmulcph_v32hf_round(A, B, C) __builtin_ia32_vfcmulcph_v32hf_round(A, B, 8) #define __builtin_ia32_vfcmulcph_v32hf_mask_round(A, C, D, B, E) __builtin_ia32_vfcmulcph_v32hf_mask_round(A, C, D, B, 8) +#define __builtin_ia32_vfmaddcsh_v8hf_round(A, B, C, D) __builtin_ia32_vfmaddcsh_v8hf_round(A, B, C, 8) +#define __builtin_ia32_vfmaddcsh_v8hf_mask_round(A, C, D, B, E) __builtin_ia32_vfmaddcsh_v8hf_mask_round(A, C, D, B, 8) +#define __builtin_ia32_vfmaddcsh_v8hf_maskz_round(B, C, D, A, E) __builtin_ia32_vfmaddcsh_v8hf_maskz_round(B, C, D, A, 8) +#define __builtin_ia32_vfcmaddcsh_v8hf_round(A, B, C, D) __builtin_ia32_vfcmaddcsh_v8hf_round(A, B, C, 8) +#define __builtin_ia32_vfcmaddcsh_v8hf_mask_round(A, C, D, B, E) __builtin_ia32_vfcmaddcsh_v8hf_mask_round(A, C, D, B, 8) +#define __builtin_ia32_vfcmaddcsh_v8hf_maskz_round(B, C, D, A, E) 
__builtin_ia32_vfcmaddcsh_v8hf_maskz_round(B, C, D, A, 8)
+#define __builtin_ia32_vfmulcsh_v8hf_round(A, B, C) __builtin_ia32_vfmulcsh_v8hf_round(A, B, 8)
+#define __builtin_ia32_vfmulcsh_v8hf_mask_round(A, C, D, B, E) __builtin_ia32_vfmulcsh_v8hf_mask_round(A, C, D, B, 8)
+#define __builtin_ia32_vfcmulcsh_v8hf_round(A, B, C) __builtin_ia32_vfcmulcsh_v8hf_round(A, B, 8)
+#define __builtin_ia32_vfcmulcsh_v8hf_mask_round(A, C, D, B, E) __builtin_ia32_vfcmulcsh_v8hf_mask_round(A, C, D, B, 8)
 
 /* avx512fp16vlintrin.h */
 #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)

From patchwork Thu Jul 1 06:16:38 2021
From: liuhongt
To: gcc-patches@gcc.gnu.org
Cc: jakub@redhat.com
Subject: [PATCH 52/62] AVX512FP16: Add testcases for vfcmaddcsh/vfmaddcsh/vfcmulcsh/vfmulcsh.
Date: Thu, 1 Jul 2021 14:16:38 +0800
Message-Id: <20210701061648.9447-53-hongtao.liu@intel.com>
In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com>
References: <20210701061648.9447-1-hongtao.liu@intel.com>

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-vfcmaddcsh-1a.c: New test.
	* gcc.target/i386/avx512fp16-vfcmaddcsh-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vfcmulcsh-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vfcmulcsh-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vfmaddcsh-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vfmaddcsh-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vfmulcsh-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vfmulcsh-1b.c: Ditto.
---
 .../i386/avx512fp16-vfcmaddcsh-1a.c           | 27 +++++++
 .../i386/avx512fp16-vfcmaddcsh-1b.c           | 78 +++++++++++++++++++
 .../gcc.target/i386/avx512fp16-vfcmulcsh-1a.c | 25 ++++++
 .../gcc.target/i386/avx512fp16-vfcmulcsh-1b.c | 71 +++++++++++++++++
 .../gcc.target/i386/avx512fp16-vfmaddcsh-1a.c | 27 +++++++
 .../gcc.target/i386/avx512fp16-vfmaddcsh-1b.c | 77 ++++++++++++++++++
 .../gcc.target/i386/avx512fp16-vfmulcsh-1a.c  | 25 ++++++
 .../gcc.target/i386/avx512fp16-vfmulcsh-1b.c  | 71 +++++++++++++++++
 8 files changed, 401 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcsh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcsh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfcmulcsh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfcmulcsh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddcsh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddcsh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmulcsh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmulcsh-1b.c

diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcsh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcsh-1a.c
new file mode 100644
index 00000000000..8bd8eebd8df
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcsh-1a.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vfcmaddcsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfcmaddcsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vfcmaddcsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final {
scan-assembler-times "vfcmaddcsh\[ \\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfcmaddcsh\[ \\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vfcmaddcsh\[ \\t\]+\{rz-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m128h res, res1, res2; +volatile __m128h x1, x2, x3; +volatile __mmask8 m8; + +void extern +avx128f_test (void) +{ + res = _mm_fcmadd_sch (x1, x2, x3); + res1 = _mm_mask_fcmadd_sch (res1, m8, x1, x2); + res1 = _mm_mask3_fcmadd_sch (res1, x1, x2, m8); + res2 = _mm_maskz_fcmadd_sch (m8, x1, x2, x3); + res = _mm_fcmadd_round_sch (x1, x2, x3, 8); + res1 = _mm_mask_fcmadd_round_sch (res1, m8, x1, x2, 8); + res1 = _mm_mask3_fcmadd_round_sch (res1, x1, x2, m8, 8); + res2 = _mm_maskz_fcmadd_round_sch (m8, x1, x2, x3, 11); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcsh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcsh-1b.c new file mode 100644 index 00000000000..c4790684b66 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcsh-1b.c @@ -0,0 +1,78 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS 8 + +void NOINLINE +EMULATE(c_fmadd_csh) (V512 * dest, V512 op1, V512 op2, + __mmask8 k, int zero_mask, int c_flag, + int is_mask3) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + int invert = 1; + if (c_flag == 1) + invert = -1; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(op2, &v3, &v4); + unpack_ph_2twops(*dest, &v7, &v8); + + if ((k&1) || !k) { + v5.f32[0] = v1.f32[0] * v7.f32[0] + - invert * (v1.f32[1] * v7.f32[1]) + v3.f32[0]; + v5.f32[1] = v1.f32[0] * v7.f32[1] + + 
invert * (v1.f32[1] * v7.f32[0]) + v3.f32[1]; + } + else if (zero_mask) + v5.f32[0] = 0; + else + v5.f32[0] = v7.f32[0]; + + for (i = 2; i < 8; i++) + v5.f32[i] = is_mask3? v3.f32[i] : v7.f32[i]; + + *dest = pack_twops_2ph(v5, v6); +} + +void +TEST (void) +{ + V512 res; + V512 exp; + + init_src(); + + init_dest(&res, &exp); + EMULATE(c_fmadd_csh)(&exp, src1, src2, 0x1, 0, 1, 0); + res.xmmh[0] = _mm_fcmadd_round_sch(res.xmmh[0], src1.xmmh[0], + src2.xmmh[0], _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _mm_fcmadd_sch); + + init_dest(&res, &exp); + EMULATE(c_fmadd_csh)(&exp, src1, src2, 0x1, 0, 1, 0); + res.xmmh[0] = _mm_mask_fcmadd_round_sch(res.xmmh[0], 0x1, + src1.xmmh[0], src2.xmmh[0], _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _mm_mask_fcmadd_sch); + + init_dest(&res, &exp); + EMULATE(c_fmadd_csh)(&exp, src1, src2, 0x1, 0, 1, 1); + res.xmmh[0] = _mm_mask3_fcmadd_round_sch(res.xmmh[0], src1.xmmh[0], src2.xmmh[0], + 0x1, _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _mm_mask3_fcmadd_sch); + + init_dest(&res, &exp); + EMULATE(c_fmadd_csh)(&exp, src1, src2, 0x3, 1, 1, 0); + res.xmmh[0] = _mm_maskz_fcmadd_round_sch(0x3, res.xmmh[0], src1.xmmh[0], + src2.xmmh[0], _ROUND_NINT); + CHECK_RESULT (&res, &exp, N_ELEMS, _mm_maskz_fcmadd_sch); + + if (n_errs != 0) { + abort (); + } +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmulcsh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmulcsh-1a.c new file mode 100644 index 00000000000..872d91ac257 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmulcsh-1a.c @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -O2" } */ +/* { dg-final { scan-assembler-times "vfcmulcsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfcmulcsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { 
scan-assembler-times "vfcmulcsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfcmulcsh\[ \\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfcmulcsh\[ \\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfcmulcsh\[ \\t\]+\{rz-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m128h res, res1, res2; +volatile __m128h x1, x2, x3; +volatile __mmask8 m8; + +void extern +avx512f_test (void) +{ + res = _mm_fcmul_sch (x1, x2); + res1 = _mm_mask_fcmul_sch (res1, m8, x1, x2); + res2 = _mm_maskz_fcmul_sch (m8, x1, x2); + res = _mm_fcmul_round_sch (x1, x2, 8); + res1 = _mm_mask_fcmul_round_sch (res1, m8, x1, x2, 8); + res2 = _mm_maskz_fcmul_round_sch (m8, x1, x2, 11); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmulcsh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmulcsh-1b.c new file mode 100644 index 00000000000..995df8422f4 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmulcsh-1b.c @@ -0,0 +1,71 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */ + + +#define AVX512FP16 +#include "avx512fp16-helper.h" + +#define N_ELEMS 8 + +void NOINLINE +EMULATE(c_fmul_csh) (V512 * dest, V512 op1, V512 op2, + __mmask8 k, int zero_mask, int c_flag) +{ + V512 v1, v2, v3, v4, v5, v6, v7, v8; + int i; + int invert = 1; + if (c_flag == 1) + invert = -1; + + unpack_ph_2twops(op1, &v1, &v2); + unpack_ph_2twops(op2, &v3, &v4); + unpack_ph_2twops(*dest, &v7, &v8); + + if ((k&1) || !k) { + v5.f32[0] = v1.f32[0] * v3.f32[0] + - invert * (v1.f32[1] * v3.f32[1]); + v5.f32[1] = v1.f32[1] * 
v3.f32[0]
+               + invert * (v1.f32[0] * v3.f32[1]);
+  }
+  else if (zero_mask)
+    v5.f32[0] = 0;
+  else
+    v5.f32[0] = v7.f32[0];
+
+  for (i = 2; i < 8; i++)
+    v5.f32[i] = v1.f32[i];
+
+  *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+TEST (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  init_dest(&res, &exp);
+  EMULATE(c_fmul_csh)(&exp, src1, src2, 0x1, 0 , 1);
+  res.xmmh[0] = _mm_fcmul_round_sch(src1.xmmh[0], src2.xmmh[0], _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mm_fcmul_sch);
+
+  init_dest(&res, &exp);
+  EMULATE(c_fmul_csh)(&exp, src1, src2, 0x1, 0, 1);
+  res.xmmh[0] = _mm_mask_fcmul_round_sch(res.xmmh[0], 0x1,
+                                         src1.xmmh[0], src2.xmmh[0],
+                                         _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mm_mask_fcmul_sch);
+
+  init_dest(&res, &exp);
+  EMULATE(c_fmul_csh)(&exp, src1, src2, 0x3, 1, 1);
+  res.xmmh[0] = _mm_maskz_fcmul_round_sch(0x3, src1.xmmh[0],
+                                          src2.xmmh[0], _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mm_maskz_fcmul_sch);
+
+  if (n_errs != 0) {
+    abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddcsh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddcsh-1a.c
new file mode 100644
index 00000000000..1e376b4a2bb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddcsh-1a.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vfmaddcsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmaddcsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vfmaddcsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmaddcsh\[ \\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmaddcsh\[ \\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vfmaddcsh\[ \\t\]+\{rz-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include
+
+volatile __m128h res, res1, res2;
+volatile __m128h x1, x2, x3;
+volatile __mmask8 m8;
+
+void extern
+avx128f_test (void)
+{
+  res = _mm_fmadd_sch (x1, x2, x3);
+  res1 = _mm_mask_fmadd_sch (res1, m8, x1, x2);
+  res1 = _mm_mask3_fmadd_sch (res1, x1, x2, m8);
+  res2 = _mm_maskz_fmadd_sch (m8, x1, x2, x3);
+  res = _mm_fmadd_round_sch (x1, x2, x3, 8);
+  res1 = _mm_mask_fmadd_round_sch (res1, m8, x1, x2, 8);
+  res1 = _mm_mask3_fmadd_round_sch (res1, x1, x2, m8, 8);
+  res2 = _mm_maskz_fmadd_round_sch (m8, x1, x2, x3, 11);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddcsh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddcsh-1b.c
new file mode 100644
index 00000000000..4c74e01d8a0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddcsh-1b.c
@@ -0,0 +1,77 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS 8
+
+void NOINLINE
+EMULATE(c_fmadd_csh) (V512 * dest, V512 op1, V512 op2,
+                      __mmask8 k, int zero_mask, int c_flag,
+                      int is_mask3)
+{
+  V512 v1, v2, v3, v4, v5, v6, v7, v8;
+  int i;
+  int invert = 1;
+  if (c_flag == 1)
+    invert = -1;
+
+  unpack_ph_2twops(op1, &v1, &v2);
+  unpack_ph_2twops(op2, &v3, &v4);
+  unpack_ph_2twops(*dest, &v7, &v8);
+
+  if ((k&1) || !k) {
+    v5.f32[0] = v1.f32[0] * v7.f32[0]
+                - invert * (v1.f32[1] * v7.f32[1]) + v3.f32[0];
+    v5.f32[1] = v1.f32[0] * v7.f32[1]
+                + invert * (v1.f32[1] * v7.f32[0]) + v3.f32[1];
+  }
+  else if (zero_mask)
+    v5.f32[0] = 0;
+  else
+    v5.f32[0] = v7.f32[0];
+
+  for (i = 2; i < 8; i++)
+    v5.f32[i] = is_mask3? v3.f32[i] : v7.f32[i];
+
+  *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+TEST (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  init_dest(&res, &exp);
+  EMULATE(c_fmadd_csh)(&exp, src1, src2, 0x1, 0, 0, 0);
+  res.xmmh[0] = _mm_fmadd_round_sch(res.xmmh[0], src1.xmmh[0],
+                                    src2.xmmh[0], _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mm_fmadd_sch);
+
+  init_dest(&res, &exp);
+  EMULATE(c_fmadd_csh)(&exp, src1, src2, 0x1, 0, 0, 0);
+  res.xmmh[0] = _mm_mask_fmadd_round_sch(res.xmmh[0], 0x1, src1.xmmh[0],
+                                         src2.xmmh[0], _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mm_mask_fmadd_sch);
+  init_dest(&res, &exp);
+  EMULATE(c_fmadd_csh)(&exp, src1, src2, 0x1, 0, 0, 1);
+  res.xmmh[0] = _mm_mask3_fmadd_round_sch(res.xmmh[0], src1.xmmh[0], src2.xmmh[0],
+                                          0x1, _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mm_mask3_fmadd_sch);
+
+  init_dest(&res, &exp);
+  EMULATE(c_fmadd_csh)(&exp, src1, src2, 0x3, 1, 0, 0);
+  res.xmmh[0] = _mm_maskz_fmadd_round_sch(0x3, res.xmmh[0], src1.xmmh[0],
+                                          src2.xmmh[0], _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mm_maskz_fmadd_sch);
+
+  if (n_errs != 0) {
+    abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfmulcsh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmulcsh-1a.c
new file mode 100644
index 00000000000..5d48874b760
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmulcsh-1a.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vfmulcsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmulcsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmulcsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmulcsh\[ \\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmulcsh\[ \\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmulcsh\[ \\t\]+\{rz-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include
+
+volatile __m128h res, res1, res2;
+volatile __m128h x1, x2, x3;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm_fmul_sch (x1, x2);
+  res1 = _mm_mask_fmul_sch (res1, m8, x1, x2);
+  res2 = _mm_maskz_fmul_sch (m8, x1, x2);
+  res = _mm_fmul_round_sch (x1, x2, 8);
+  res1 = _mm_mask_fmul_round_sch (res1, m8, x1, x2, 8);
+  res2 = _mm_maskz_fmul_round_sch (m8, x1, x2, 11);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfmulcsh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmulcsh-1b.c
new file mode 100644
index 00000000000..45840d62f67
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmulcsh-1b.c
@@ -0,0 +1,71 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS 8
+
+void NOINLINE
+EMULATE(c_fmul_csh) (V512 * dest, V512 op1, V512 op2,
+                     __mmask8 k, int zero_mask, int c_flag)
+{
+  V512 v1, v2, v3, v4, v5, v6, v7, v8;
+  int i;
+  int invert = 1;
+  if (c_flag == 1)
+    invert = -1;
+
+  unpack_ph_2twops(op1, &v1, &v2);
+  unpack_ph_2twops(op2, &v3, &v4);
+  unpack_ph_2twops(*dest, &v7, &v8);
+
+  if ((k&1) || !k) {
+    v5.f32[0] = v1.f32[0] * v3.f32[0]
+                - invert * (v1.f32[1] * v3.f32[1]);
+    v5.f32[1] = v1.f32[0] * v3.f32[1]
+                + invert * (v1.f32[1] * v3.f32[0]);
+  }
+  else if (zero_mask)
+    v5.f32[0] = 0;
+  else
+    v5.f32[0] = v7.f32[0];
+
+  for (i = 2; i < 8; i++)
+    v5.f32[i] = v1.f32[i];
+
+  *dest =
pack_twops_2ph(v5, v6);
+}
+
+void
+TEST (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  init_dest(&res, &exp);
+  EMULATE(c_fmul_csh)(&exp, src1, src2, 0x1, 0 , 0);
+  res.xmmh[0] = _mm_fmul_round_sch(src1.xmmh[0], src2.xmmh[0], _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mm_fmul_sch);
+
+  init_dest(&res, &exp);
+  EMULATE(c_fmul_csh)(&exp, src1, src2, 0x1, 0, 0);
+  res.xmmh[0] = _mm_mask_fmul_round_sch(res.xmmh[0], 0x1,
+                                        src1.xmmh[0], src2.xmmh[0],
+                                        _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mm_mask_fmul_sch);
+
+  init_dest(&res, &exp);
+  EMULATE(c_fmul_csh)(&exp, src1, src2, 0x3, 1, 0);
+  res.xmmh[0] = _mm_maskz_fmul_round_sch(0x3, src1.xmmh[0],
+                                         src2.xmmh[0], _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mm_maskz_fmul_sch);
+
+  if (n_errs != 0) {
+    abort ();
+  }
+}
+

From patchwork Thu Jul 1 06:16:39 2021
From: liuhongt
To: gcc-patches@gcc.gnu.org
Cc: jakub@redhat.com
Subject: [PATCH 53/62] AVX512FP16: Add expander for sqrthf2.
Date: Thu, 1 Jul 2021 14:16:39 +0800
Message-Id: <20210701061648.9447-54-hongtao.liu@intel.com>
In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com>
References: <20210701061648.9447-1-hongtao.liu@intel.com>

gcc/ChangeLog:

	* config/i386/i386-features.c (i386-features.c): Handle E_HFmode.
	* config/i386/i386.md (sqrthf2): New expander.
	(*sqrt2_sse): Extend to MODEFH.
	* config/i386/sse.md (*_vmsqrt2): Extend to VFH_128.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-builtin-sqrt-1.c: New test.
	* gcc.target/i386/avx512fp16vl-builtin-sqrt-1.c: New test.
---
 gcc/config/i386/i386-features.c                | 15 +++++++++++----
 gcc/config/i386/i386.md                        | 12 +++++++++---
 gcc/config/i386/sse.md                         |  8 ++++----
 .../i386/avx512fp16-builtin-sqrt-1.c           | 18 ++++++++++++++++++
 .../i386/avx512fp16vl-builtin-sqrt-1.c         | 19 +++++++++++++++++++
 5 files changed, 61 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-builtin-sqrt-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-builtin-sqrt-1.c

diff --git a/gcc/config/i386/i386-features.c b/gcc/config/i386/i386-features.c
index a25769ae478..0b5a1a3af53 100644
--- a/gcc/config/i386/i386-features.c
+++ b/gcc/config/i386/i386-features.c
@@ -2238,15 +2238,22 @@ remove_partial_avx_dependency (void)
       rtx zero;
       machine_mode dest_vecmode;
-      if (dest_mode == E_SFmode)
+      switch (dest_mode)
 	{
+	case E_HFmode:
+	  dest_vecmode = V8HFmode;
+	  zero = gen_rtx_SUBREG (V8HFmode, v4sf_const0, 0);
+	  break;
+	case E_SFmode:
 	  dest_vecmode = V4SFmode;
 	  zero = v4sf_const0;
-	}
-      else
-	{
+	  break;
+	case E_DFmode:
 	  dest_vecmode = V2DFmode;
 	  zero = gen_rtx_SUBREG (V2DFmode, v4sf_const0, 0);
+	  break;
+	default:
+	  gcc_unreachable ();
 	}
 
       /* Change source to vector mode.  */
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index a85c23d74f1..81c893c60de 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -16561,9 +16561,9 @@ (define_expand "rsqrtsf2"
 })
 
 (define_insn "*sqrt2_sse"
-  [(set (match_operand:MODEF 0 "register_operand" "=v,v,v")
-	(sqrt:MODEF
-	  (match_operand:MODEF 1 "nonimmediate_operand" "0,v,m")))]
+  [(set (match_operand:MODEFH 0 "register_operand" "=v,v,v")
+	(sqrt:MODEFH
+	  (match_operand:MODEFH 1 "nonimmediate_operand" "0,v,m")))]
   "SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH"
   "@
   %vsqrt\t{%d1, %0|%0, %d1}
@@ -16583,6 +16583,12 @@ (define_insn "*sqrt2_sse"
      ]
     (symbol_ref "true")))])
 
+(define_expand "sqrthf2"
+  [(set (match_operand:HF 0 "register_operand")
+	(sqrt:HF
+	  (match_operand:HF 1 "nonimmediate_operand")))]
+  "TARGET_AVX512FP16")
+
 (define_expand "sqrt2"
   [(set (match_operand:MODEF 0 "register_operand")
 	(sqrt:MODEF
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 2c3dba5bdb0..b47e7f0b82a 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -2389,12 +2389,12 @@ (define_insn "_vmsqrt2"
    (set_attr "mode" "")])
 
 (define_insn "*_vmsqrt2"
-  [(set (match_operand:VF_128 0 "register_operand" "=x,v")
-	(vec_merge:VF_128
-	  (vec_duplicate:VF_128
+  [(set (match_operand:VFH_128 0 "register_operand" "=x,v")
+	(vec_merge:VFH_128
+	  (vec_duplicate:VFH_128
 	    (sqrt:
 	      (match_operand: 1 "nonimmediate_operand" "xm,")))
-	  (match_operand:VF_128 2 "register_operand" "0,v")
+	  (match_operand:VFH_128 2 "register_operand" "0,v")
 	  (const_int 1)))]
   "TARGET_SSE"
   "@
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-builtin-sqrt-1.c b/gcc/testsuite/gcc.target/i386/avx512fp16-builtin-sqrt-1.c
new file mode 100644
index 00000000000..38cdf23fef7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-builtin-sqrt-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -mavx512fp16" } */
+
+_Float16
+f1 (_Float16 x)
+{
+  return __builtin_sqrtf16 (x);
+}
+
+void
+f2
(_Float16* __restrict psrc, _Float16* __restrict pdst)
+{
+  for (int i = 0; i != 32; i++)
+    pdst[i] = __builtin_sqrtf16 (psrc[i]);
+}
+
+/* { dg-final { scan-assembler-times "vsqrtsh\[^\n\r\]*xmm\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-times "vsqrtph\[^\n\r\]*zmm\[0-9\]" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-builtin-sqrt-1.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-builtin-sqrt-1.c
new file mode 100644
index 00000000000..08deb3ea470
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-builtin-sqrt-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -mavx512fp16 -mavx512vl" } */
+
+void
+f1 (_Float16* __restrict psrc, _Float16* __restrict pdst)
+{
+  for (int i = 0; i != 8; i++)
+    pdst[i] = __builtin_sqrtf16 (psrc[i]);
+}
+
+void
+f2 (_Float16* __restrict psrc, _Float16* __restrict pdst)
+{
+  for (int i = 0; i != 16; i++)
+    pdst[i] = __builtin_sqrtf16 (psrc[i]);
+}
+
+/* { dg-final { scan-assembler-times "vsqrtph\[^\n\r\]*xmm\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-times "vsqrtph\[^\n\r\]*ymm\[0-9\]" 1 } } */

From patchwork Thu Jul 1 06:16:40 2021
From: liuhongt
To: gcc-patches@gcc.gnu.org
Cc: jakub@redhat.com
Subject: [PATCH 54/62] AVX512FP16: Add expander for ceil/floor/trunc/roundeven.
Date: Thu, 1 Jul 2021 14:16:40 +0800
Message-Id: <20210701061648.9447-55-hongtao.liu@intel.com>
In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com>
References: <20210701061648.9447-1-hongtao.liu@intel.com>

gcc/ChangeLog:

	* config/i386/i386.md (hf2): New expander.
	(sse4_1_round2): Extend from MODEF to MODEFH.
	* config/i386/sse.md (*sse4_1_round): Extend from VF_128
	to VFH_128.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-builtin-round-1.c: New test.
---
 gcc/config/i386/i386.md                        | 19 ++++++++++--
 gcc/config/i386/sse.md                         |  8 ++---
 .../i386/avx512fp16-builtin-round-1.c          | 31 +++++++++++++++++++
 3 files changed, 51 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-builtin-round-1.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 81c893c60de..247a6e489ef 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -17721,9 +17721,9 @@ (define_expand "significand2"
 
 (define_insn "sse4_1_round2"
-  [(set (match_operand:MODEF 0 "register_operand" "=x,x,x,v,v")
-	(unspec:MODEF
-	  [(match_operand:MODEF 1 "nonimmediate_operand" "0,x,m,v,m")
+  [(set (match_operand:MODEFH 0 "register_operand" "=x,x,x,v,v")
+	(unspec:MODEFH
+	  [(match_operand:MODEFH 1 "nonimmediate_operand" "0,x,m,v,m")
 	   (match_operand:SI 2 "const_0_to_15_operand" "n,n,n,n,n")]
 	  UNSPEC_ROUND))]
   "TARGET_SSE4_1"
@@ -17980,6 +17980,19 @@ (define_expand "xf2"
   "TARGET_USE_FANCY_MATH_387
    && (flag_fp_int_builtin_inexact || !flag_trapping_math)")
 
+(define_expand "hf2"
+  [(parallel [(set (match_operand:HF 0 "register_operand")
+		   (unspec:HF [(match_operand:HF 1 "register_operand")]
+			      FRNDINT_ROUNDING))
+	      (clobber (reg:CC FLAGS_REG))])]
+  "TARGET_AVX512FP16"
+{
+  emit_insn (gen_sse4_1_roundhf2
+	     (operands[0], operands[1],
+	      GEN_INT (ROUND_ | ROUND_NO_EXC)));
+  DONE;
+})
+
 (define_expand "2"
   [(parallel [(set (match_operand:MODEF 0 "register_operand")
 		   (unspec:MODEF [(match_operand:MODEF 1 "register_operand")]
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index b47e7f0b82a..a76c30c75cb 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -20202,14 +20202,14 @@ (define_insn "sse4_1_round"
    (set_attr "mode" "")])
 
 (define_insn "*sse4_1_round"
-  [(set (match_operand:VF_128 0 "register_operand" "=Yr,*x,x,v")
-	(vec_merge:VF_128
-	  (vec_duplicate:VF_128
+  [(set (match_operand:VFH_128 0 "register_operand" "=Yr,*x,x,v")
+	(vec_merge:VFH_128
+	  (vec_duplicate:VFH_128
 	    (unspec:
 	      [(match_operand:
2 "nonimmediate_operand" "Yrm,*xm,xm,vm")
 	       (match_operand:SI 3 "const_0_to_15_operand" "n,n,n,n")]
 	      UNSPEC_ROUND))
-	  (match_operand:VF_128 1 "register_operand" "0,0,x,v")
+	  (match_operand:VFH_128 1 "register_operand" "0,0,x,v")
 	  (const_int 1)))]
   "TARGET_SSE4_1"
   "@
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-builtin-round-1.c b/gcc/testsuite/gcc.target/i386/avx512fp16-builtin-round-1.c
new file mode 100644
index 00000000000..3cab1526967
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-builtin-round-1.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -mavx512fp16" } */
+
+_Float16
+f1 (_Float16 x)
+{
+  return __builtin_truncf16 (x);
+}
+
+_Float16
+f2 (_Float16 x)
+{
+  return __builtin_floorf16 (x);
+}
+
+_Float16
+f3 (_Float16 x)
+{
+  return __builtin_ceilf16 (x);
+}
+
+_Float16
+f4 (_Float16 x)
+{
+  return __builtin_roundevenf16 (x);
+}
+
+/* { dg-final { scan-assembler-times "vrndscalesh\[ \\t\]+\\\$11\[^\n\r\]*xmm\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-times "vrndscalesh\[ \\t\]+\\\$10\[^\n\r\]*xmm\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-times "vrndscalesh\[ \\t\]+\\\$9\[^\n\r\]*xmm\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-times "vrndscalesh\[ \\t\]+\\\$8\[^\n\r\]*xmm\[0-9\]" 1 } } */

From patchwork Thu Jul 1 06:16:41 2021
From: liuhongt
To: gcc-patches@gcc.gnu.org
Cc: jakub@redhat.com
Subject: [PATCH 55/62] AVX512FP16: Add expander for cstorehf4.
Date: Thu, 1 Jul 2021 14:16:41 +0800
Message-Id: <20210701061648.9447-56-hongtao.liu@intel.com>
In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com>
References: <20210701061648.9447-1-hongtao.liu@intel.com>

gcc/ChangeLog:

	* config/i386/i386.md (cstore4): Extend from MODEF to MODEFH.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-builtin-fpcompare-1.c: New test.
	* gcc.target/i386/avx512fp16-builtin-fpcompare-2.c: New test.
---
 gcc/config/i386/i386.md                        |  4 +-
 .../i386/avx512fp16-builtin-fpcompare-1.c      | 40 +++++++++++++++++++
 .../i386/avx512fp16-builtin-fpcompare-2.c      | 29 ++++++++++++++
 3 files changed, 71 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-builtin-fpcompare-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-builtin-fpcompare-2.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 247a6e489ef..5f45c4ff583 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -1524,8 +1524,8 @@ (define_expand "cbranch4"
 
 (define_expand "cstore4"
   [(set (reg:CC FLAGS_REG)
-	(compare:CC (match_operand:MODEF 2 "cmp_fp_expander_operand")
-		    (match_operand:MODEF 3 "cmp_fp_expander_operand")))
+	(compare:CC (match_operand:MODEFH 2 "cmp_fp_expander_operand")
+		    (match_operand:MODEFH 3 "cmp_fp_expander_operand")))
    (set (match_operand:QI 0 "register_operand")
 	(match_operator 1 "ix86_fp_comparison_operator"
 	  [(reg:CC FLAGS_REG)
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-builtin-fpcompare-1.c b/gcc/testsuite/gcc.target/i386/avx512fp16-builtin-fpcompare-1.c
new file mode 100644
index 00000000000..62115f15f30
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-builtin-fpcompare-1.c
@@ -0,0 +1,40 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -mavx512fp16" } */
+
+int
+f1 (_Float16 x, _Float16 y)
+{
+  return x > y;
+}
+
+int
+f2 (_Float16 x, _Float16 y)
+{
+  return x < y;
+}
+
+/* { dg-final { scan-assembler-times "seta" 2 } } */
+
+int
+f3 (_Float16 x, _Float16 y)
+{
+  return x >= y;
+}
+
+int
+f4 (_Float16 x, _Float16 y)
+{
+  return x <= y;
+}
+
+/* { dg-final { scan-assembler-times "setnb" 2 } } */
+
+int
+f5 (_Float16 x, _Float16 y)
+{
+  return __builtin_isunordered (x, y);
+}
+
+/* { dg-final { scan-assembler-not "vcvtsh2s\[sd\]" } } */
+/* { dg-final { scan-assembler-times "xorl" 5 } } */
+/* { dg-final { scan-assembler-times "vcomish\[^\n\r\]*xmm\[0-9\]" 4 } } */
diff --git
a/gcc/testsuite/gcc.target/i386/avx512fp16-builtin-fpcompare-2.c b/gcc/testsuite/gcc.target/i386/avx512fp16-builtin-fpcompare-2.c
new file mode 100644
index 00000000000..150c351e784
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-builtin-fpcompare-2.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mfpmath=sse -mavx512fp16" } */
+
+int
+foo (_Float16 y)
+{
+  return __builtin_isinf (y);
+}
+
+int
+foo2 (_Float16 y)
+{
+  return __builtin_isfinite (y);
+}
+
+int
+foo3 (_Float16 y)
+{
+  return __builtin_signbit(y);
+}
+
+int
+foo4 (_Float16 y)
+{
+  return __builtin_isnormal (y);
+}
+
+/* { dg-final { scan-assembler-not "vcvtsh2s\[sd\]" } } */
+/* { dg-final { scan-assembler-times "vucomish\[^\n\r\]*xmm\[0-9\]" 4 } } */

From patchwork Thu Jul 1 06:16:42 2021
From: liuhongt
To: gcc-patches@gcc.gnu.org
Cc: jakub@redhat.com
Subject: [PATCH 56/62] AVX512FP16: Optimize (_Float16) sqrtf ((float) f16) to
 sqrtf16 (f16).
Date: Thu, 1 Jul 2021 14:16:42 +0800
Message-Id: <20210701061648.9447-57-hongtao.liu@intel.com>
In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com>
References: <20210701061648.9447-1-hongtao.liu@intel.com>

gcc/ChangeLog:

	* config/i386/i386.md (*sqrthf2): New define_insn.
	* config/i386/sse.md (*avx512fp16_vmsqrthf2): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-builtin-sqrt-2.c: New test.
---
 gcc/config/i386/i386.md                        | 18 ++++++++++++++++++
 gcc/config/i386/sse.md                         | 18 ++++++++++++++++++
 .../i386/avx512fp16-builtin-sqrt-2.c           | 18 ++++++++++++++++++
 3 files changed, 54 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-builtin-sqrt-2.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 5f45c4ff583..684b2080a93 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -16583,6 +16583,24 @@ (define_insn "*sqrt2_sse"
      ]
     (symbol_ref "true")))])
 
+/* Optimize for code like (_Float16) __builtin_sqrtf ((float) f16)
+   since it's not handled in frontend.
*/ +(define_insn "*sqrthf2" + [(set (match_operand:HF 0 "register_operand" "=v,v") + (float_truncate:HF + (sqrt:MODEF + (float_extend:MODEF + (match_operand:HF 1 "nonimmediate_operand" "v,m")))))] + "TARGET_AVX512FP16" + "@ + vsqrtsh\t{%d1, %0|%0, %d1} + vsqrtsh\t{%1, %d0|%d0, %1}" + [(set_attr "type" "sse") + (set_attr "atom_sse_attr" "sqrt") + (set_attr "prefix" "evex") + (set_attr "avx_partial_xmm_update" "false,true") + (set_attr "mode" "HF")]) + (define_expand "sqrthf2" [(set (match_operand:HF 0 "register_operand") (sqrt:HF diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index a76c30c75cb..f87f6893835 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -2407,6 +2407,24 @@ (define_insn "*_vmsqrt2" (set_attr "btver2_sse_attr" "sqrt") (set_attr "mode" "")]) +(define_insn "*avx512fp16_vmsqrthf2" + [(set (match_operand:V8HF 0 "register_operand" "=v") + (vec_merge:V8HF + (vec_duplicate:V8HF + (float_truncate:HF + (sqrt:MODEF + (float_extend:MODEF + (match_operand:HF 1 "nonimmediate_operand" ""))))) + (match_operand:VFH_128 2 "register_operand" "v") + (const_int 1)))] + "TARGET_AVX512FP16" + "vsqrtsh\t{%1, %2, %0|%0, %2, %1}" + [(set_attr "type" "sse") + (set_attr "atom_sse_attr" "sqrt") + (set_attr "prefix" "evex") + (set_attr "mode" "HF")]) + + (define_expand "rsqrt2" [(set (match_operand:VF1_AVX512ER_128_256 0 "register_operand") (unspec:VF1_AVX512ER_128_256 diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-builtin-sqrt-2.c b/gcc/testsuite/gcc.target/i386/avx512fp16-builtin-sqrt-2.c new file mode 100644 index 00000000000..4fefee179af --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-builtin-sqrt-2.c @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-options "-Ofast -mavx512fp16" } */ + +#include +_Float16 +foo (_Float16 f16) +{ + return sqrtf (f16); +} + +_Float16 +foo1 (_Float16 f16) +{ + return sqrt (f16); +} + +/* { dg-final { scan-assembler-not "vcvtsh2s\[sd\]" } } */ +/* { dg-final { scan-assembler-times 
"vsqrtsh\[^\n\r\]*xmm\[0-9\]" 2 } } */
From patchwork Thu Jul 1 06:16:43 2021
X-Patchwork-Submitter: liuhongt
X-Patchwork-Id: 1499413
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 57/62] AVX512FP16: Add expander for fmahf4
Date: Thu, 1 Jul 2021 14:16:43 +0800
Message-Id: <20210701061648.9447-58-hongtao.liu@intel.com>
In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com>
From: liuhongt
Cc: jakub@redhat.com

gcc/ChangeLog: * config/i386/sse.md (FMAMODEM): Extend to handle FP16. (VFH_SF_AVX512VL): Extend to handle HFmode. (VF_SF_AVX512VL): Deleted.
gcc/testsuite/ChangeLog: * gcc.target/i386/avx512fp16-fma-1.c: New test. * gcc.target/i386/avx512fp16vl-fma-1.c: New test. * gcc.target/i386/avx512fp16vl-fma-vectorize-1.c: New test. --- gcc/config/i386/sse.md | 11 +-- .../gcc.target/i386/avx512fp16-fma-1.c | 69 ++++++++++++++++++ .../gcc.target/i386/avx512fp16vl-fma-1.c | 70 +++++++++++++++++++ .../i386/avx512fp16vl-fma-vectorize-1.c | 45 ++++++++++++ 4 files changed, 190 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-fma-1.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-fma-1.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-fma-vectorize-1.c diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index f87f6893835..2b8d12086f4 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -4489,7 +4489,11 @@ (define_mode_iterator FMAMODEM (V8SF "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512VL") (V4DF "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512VL") (V16SF "TARGET_AVX512F") - (V8DF "TARGET_AVX512F")]) + (V8DF "TARGET_AVX512F") + (HF "TARGET_AVX512FP16") + (V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL") + (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL") + (V32HF "TARGET_AVX512FP16")]) (define_expand "fma4" [(set (match_operand:FMAMODEM 0 "register_operand") @@ -4597,14 +4601,11 @@ (define_insn "*fma_fmadd_" (set_attr "mode" "")]) ;; Suppose AVX-512F as baseline -(define_mode_iterator VF_SF_AVX512VL - [SF V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL") - DF V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")]) - (define_mode_iterator VFH_SF_AVX512VL [(V32HF "TARGET_AVX512FP16") (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL") (V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL") + (HF "TARGET_AVX512FP16") SF V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL") DF V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")]) diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-fma-1.c b/gcc/testsuite/gcc.target/i386/avx512fp16-fma-1.c new 
file mode 100644 index 00000000000..d78d7629838 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-fma-1.c @@ -0,0 +1,69 @@ +/* { dg-do compile } */ +/* { dg-options "-Ofast -mavx512fp16" } */ + +typedef _Float16 v32hf __attribute__ ((__vector_size__ (64))); + +_Float16 +foo1 (_Float16 a, _Float16 b, _Float16 c) +{ + return a * b + c; +} + +/* { dg-final { scan-assembler-times "vfmadd132sh\[^\n\r\]*xmm\[0-9\]" 1 } } */ + +_Float16 +foo2 (_Float16 a, _Float16 b, _Float16 c) +{ + return -a * b + c; +} + +/* { dg-final { scan-assembler-times "vfnmadd132sh\[^\n\r\]*xmm\[0-9\]" 1 } } */ + +_Float16 +foo3 (_Float16 a, _Float16 b, _Float16 c) +{ + return a * b - c; +} + +/* { dg-final { scan-assembler-times "vfmsub132sh\[^\n\r\]*xmm\[0-9\]" 1 } } */ + +_Float16 +foo4 (_Float16 a, _Float16 b, _Float16 c) +{ + return -a * b - c; +} + +/* { dg-final { scan-assembler-times "vfnmsub132sh\[^\n\r\]*xmm\[0-9\]" 1 } } */ + +v32hf +foo5 (v32hf a, v32hf b, v32hf c) +{ + return a * b + c; +} + +/* { dg-final { scan-assembler-times "vfmadd132ph\[^\n\r\]*zmm\[0-9\]" 1 } } */ + +v32hf +foo6 (v32hf a, v32hf b, v32hf c) +{ + return -a * b + c; +} + +/* { dg-final { scan-assembler-times "vfnmadd132ph\[^\n\r\]*zmm\[0-9\]" 1 } } */ + +v32hf +foo7 (v32hf a, v32hf b, v32hf c) +{ + return a * b - c; +} + +/* { dg-final { scan-assembler-times "vfmsub132ph\[^\n\r\]*zmm\[0-9\]" 1 } } */ + +v32hf +foo8 (v32hf a, v32hf b, v32hf c) +{ + return -a * b - c; +} + +/* { dg-final { scan-assembler-times "vfnmsub132ph\[^\n\r\]*zmm\[0-9\]" 1 } } */ + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-fma-1.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-fma-1.c new file mode 100644 index 00000000000..1a832f37d6c --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-fma-1.c @@ -0,0 +1,70 @@ +/* { dg-do compile } */ +/* { dg-options "-Ofast -mavx512fp16 -mavx512vl" } */ + +typedef _Float16 v8hf __attribute__ ((__vector_size__ (16))); +typedef _Float16 v16hf __attribute__ 
((__vector_size__ (32))); + +v8hf +foo1 (v8hf a, v8hf b, v8hf c) +{ + return a * b + c; +} + +/* { dg-final { scan-assembler-times "vfmadd132ph\[^\n\r\]*xmm\[0-9\]" 1 } } */ + +v8hf +foo2 (v8hf a, v8hf b, v8hf c) +{ + return -a * b + c; +} + +/* { dg-final { scan-assembler-times "vfnmadd132ph\[^\n\r\]*xmm\[0-9\]" 1 } } */ + +v8hf +foo3 (v8hf a, v8hf b, v8hf c) +{ + return a * b - c; +} + +/* { dg-final { scan-assembler-times "vfmsub132ph\[^\n\r\]*xmm\[0-9\]" 1 } } */ + +v8hf +foo4 (v8hf a, v8hf b, v8hf c) +{ + return -a * b - c; +} + +/* { dg-final { scan-assembler-times "vfnmsub132ph\[^\n\r\]*xmm\[0-9\]" 1 } } */ + +v16hf +foo5 (v16hf a, v16hf b, v16hf c) +{ + return a * b + c; +} + +/* { dg-final { scan-assembler-times "vfmadd132ph\[^\n\r\]*ymm\[0-9\]" 1 } } */ + +v16hf +foo6 (v16hf a, v16hf b, v16hf c) +{ + return -a * b + c; +} + +/* { dg-final { scan-assembler-times "vfnmadd132ph\[^\n\r\]*ymm\[0-9\]" 1 } } */ + +v16hf +foo7 (v16hf a, v16hf b, v16hf c) +{ + return a * b - c; +} + +/* { dg-final { scan-assembler-times "vfmsub132ph\[^\n\r\]*ymm\[0-9\]" 1 } } */ + +v16hf +foo8 (v16hf a, v16hf b, v16hf c) +{ + return -a * b - c; +} + +/* { dg-final { scan-assembler-times "vfnmsub132ph\[^\n\r\]*ymm\[0-9\]" 1 } } */ + diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-fma-vectorize-1.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-fma-vectorize-1.c new file mode 100644 index 00000000000..d0b8bec34f1 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-fma-vectorize-1.c @@ -0,0 +1,45 @@ +/* { dg-do compile } */ +/* { dg-options "-Ofast -mavx512fp16 -mavx512vl" } */ + +typedef _Float16 v8hf __attribute__ ((__vector_size__ (16))); +typedef _Float16 v16hf __attribute__ ((__vector_size__ (32))); + +void +foo1 (_Float16* __restrict pa, _Float16* __restrict pb, + _Float16* __restrict pc, _Float16* __restrict pd) +{ + for (int i = 0; i != 8; i++) + pd[i] = pa[i] * pb[i] + pc[i]; +} + +/* { dg-final { scan-assembler-times "vfmadd132ph\[^\n\r\]*xmm\[0-9\]" 1 } } 
*/ + +void +foo2 (_Float16* __restrict pa, _Float16* __restrict pb, + _Float16* __restrict pc, _Float16* __restrict pd) +{ + for (int i = 0; i != 8; i++) + pd[i] = -pa[i] * pb[i] + pc[i]; +} + +/* { dg-final { scan-assembler-times "vfnmadd132ph\[^\n\r\]*xmm\[0-9\]" 1 } } */ + +void +foo3 (_Float16* __restrict pa, _Float16* __restrict pb, + _Float16* __restrict pc, _Float16* __restrict pd) +{ + for (int i = 0; i != 8; i++) + pd[i] = pa[i] * pb[i] - pc[i]; +} + +/* { dg-final { scan-assembler-times "vfmsub132ph\[^\n\r\]*xmm\[0-9\]" 1 } } */ + +void +foo4 (_Float16* __restrict pa, _Float16* __restrict pb, + _Float16* __restrict pc, _Float16* __restrict pd) +{ + for (int i = 0; i != 8; i++) + pd[i] = -pa[i] * pb[i] - pc[i]; +} + +/* { dg-final { scan-assembler-times "vfnmsub132ph\[^\n\r\]*xmm\[0-9\]" 1 } } */
From patchwork Thu Jul 1 06:16:44 2021
X-Patchwork-Submitter: liuhongt
X-Patchwork-Id: 1499414
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 58/62] AVX512FP16: Optimize for code like (_Float16) __builtin_ceilf ((float) f16).
Date: Thu, 1 Jul 2021 14:16:44 +0800
Message-Id: <20210701061648.9447-59-hongtao.liu@intel.com>
In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com>
From: liuhongt
Cc: jakub@redhat.com

gcc/ChangeLog: * config/i386/i386.md (*avx512fp16_1_roundhf2): New define_insn. * config/i386/sse.md (*avx512fp16_1_roundhf): New define_insn. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512fp16-builtin-round-2.c: New test. --- gcc/config/i386/i386.md | 22 ++++++++++++++ gcc/config/i386/sse.md | 20 +++++++++++++ .../i386/avx512fp16-builtin-round-2.c | 29 +++++++++++++++++++ 3 files changed, 71 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-builtin-round-2.c diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 684b2080a93..457f37dcb61 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -17738,6 +17738,28 @@ (define_expand "significand2" }) +/* Optimize for code like (_Float16) __builtin_ceilf ((float) f16) + since it's not handled in frontend.
*/ + +(define_insn "*avx512fp16_1_roundhf2" + [(set (match_operand:HF 0 "register_operand" "=v,v") + (float_truncate:HF + (unspec:MODEF + [(float_extend:MODEF + (match_operand:HF 1 "nonimmediate_operand" "v,m")) + (match_operand:SI 2 "const_0_to_15_operand" "n,n")] + UNSPEC_ROUND)))] + "TARGET_AVX512FP16" + "@ + vrndscalesh\t{%2, %d1, %0|%0, %d1, %2} + vrndscalesh\t{%2, %1, %d0|%d0, %1, %2}" + [(set_attr "type" "ssecvt") + (set_attr "length_immediate" "1,1") + (set_attr "prefix" "evex") + (set_attr "avx_partial_xmm_update" "false,true") + (set_attr "mode" "HF")]) + + (define_insn "sse4_1_round2" [(set (match_operand:MODEFH 0 "register_operand" "=x,x,x,v,v") (unspec:MODEFH diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 2b8d12086f4..b3d8ffb4f8e 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -20220,6 +20220,26 @@ (define_insn "sse4_1_round" (set_attr "prefix" "orig,orig,vex,evex") (set_attr "mode" "")]) +(define_insn "*avx512fp16_1_roundhf" + [(set (match_operand:V8HF 0 "register_operand" "=v") + (vec_merge:V8HF + (vec_duplicate:V8HF + (float_truncate:HF + (unspec:MODEF + [(float_extend:MODEF + (match_operand:HF 2 "nonimmediate_operand" "vm")) + (match_operand:SI 3 "const_0_to_15_operand" "n")] + UNSPEC_ROUND))) + (match_operand:V8HF 1 "register_operand" "v") + (const_int 1)))] + "TARGET_AVX512FP16" + "vrndscalesh\t{%3, %2, %1, %0|%0, %1, %2, %3}" + [(set_attr "type" "ssecvt") + (set_attr "length_immediate" "1") + (set_attr "prefix_extra" "1") + (set_attr "prefix" "evex") + (set_attr "mode" "HF")]) + (define_insn "*sse4_1_round" [(set (match_operand:VFH_128 0 "register_operand" "=Yr,*x,x,v") (vec_merge:VFH_128 diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-builtin-round-2.c b/gcc/testsuite/gcc.target/i386/avx512fp16-builtin-round-2.c new file mode 100644 index 00000000000..bcd41929637 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-builtin-round-2.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-options 
"-O2 -mavx512fp16" } */ + +_Float16 +foo1 (_Float16 a) +{ + return __builtin_roundeven (a); +} + +_Float16 +foo2 (_Float16 a) +{ + return __builtin_trunc (a); +} + +_Float16 +foo3 (_Float16 a) +{ + return __builtin_ceil (a); +} + +_Float16 +foo4 (_Float16 a) +{ + return __builtin_floor (a); +} + +/* { dg-final { scan-assembler-not "vcvtsh2s\[sd\]" } } */ +/* { dg-final { scan-assembler-times "vrndscalesh\[^\n\r\]*xmm\[0-9\]" 4 } } */
From patchwork Thu Jul 1 06:16:45 2021
X-Patchwork-Submitter: liuhongt
X-Patchwork-Id: 1499415
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 59/62] AVX512FP16: Support load/store/abs intrinsics.
Date: Thu, 1 Jul 2021 14:16:45 +0800
Message-Id: <20210701061648.9447-60-hongtao.liu@intel.com>
In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com>
From: liuhongt
Cc: jakub@redhat.com, dianhong xu

From: dianhong xu gcc/ChangeLog: * config/i386/avx512fp16intrin.h (__m512h_u, __m256h_u, __m128h_u): New typedef. (_mm512_load_ph): New intrinsic. (_mm256_load_ph): Ditto. (_mm_load_ph): Ditto. (_mm512_loadu_ph): Ditto. (_mm256_loadu_ph): Ditto. (_mm_loadu_ph): Ditto. (_mm512_store_ph): Ditto. (_mm256_store_ph): Ditto. (_mm_store_ph): Ditto. (_mm512_storeu_ph): Ditto. (_mm256_storeu_ph): Ditto. (_mm_storeu_ph): Ditto. (_mm512_abs_ph): Ditto. * config/i386/avx512fp16vlintrin.h (_mm_abs_ph): Ditto. (_mm256_abs_ph): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512fp16-13.c: New test.
--- gcc/config/i386/avx512fp16intrin.h | 97 ++++++++++++ gcc/config/i386/avx512fp16vlintrin.h | 16 ++ gcc/testsuite/gcc.target/i386/avx512fp16-13.c | 143 ++++++++++++++++++ 3 files changed, 256 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-13.c diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h index 39c10beb1de..b8ca9201828 100644 --- a/gcc/config/i386/avx512fp16intrin.h +++ b/gcc/config/i386/avx512fp16intrin.h @@ -45,6 +45,11 @@ typedef _Float16 __m128h __attribute__ ((__vector_size__ (16), __may_alias__)); typedef _Float16 __m256h __attribute__ ((__vector_size__ (32), __may_alias__)); typedef _Float16 __m512h __attribute__ ((__vector_size__ (64), __may_alias__)); +/* Unaligned version of the same type. */ +typedef _Float16 __m128h_u __attribute__ ((__vector_size__ (16), __may_alias__, __aligned__ (1))); +typedef _Float16 __m256h_u __attribute__ ((__vector_size__ (32), __may_alias__, __aligned__ (1))); +typedef _Float16 __m512h_u __attribute__ ((__vector_size__ (64), __may_alias__, __aligned__ (1))); + extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm_set_ph (_Float16 __A7, _Float16 __A6, _Float16 __A5, @@ -362,6 +367,48 @@ _mm_load_sh (void const *__P) *(_Float16 const *) __P); } +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_load_ph (void const *__P) +{ + return *(const __m512h *) __P; +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_load_ph (void const *__P) +{ + return *(const __m256h *) __P; +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_load_ph (void const *__P) +{ + return *(const __m128h *) __P; +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_loadu_ph (void const *__P) +{ + return *(const __m512h_u *) __P; +} + +extern 
__inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_loadu_ph (void const *__P) +{ + return *(const __m256h_u *) __P; +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_loadu_ph (void const *__P) +{ + return *(const __m128h_u *) __P; +} + /* Stores the lower _Float16 value. */ extern __inline void __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) @@ -370,6 +417,56 @@ _mm_store_sh (void *__P, __m128h __A) *(_Float16 *) __P = ((__v8hf)__A)[0]; } +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_store_ph (void *__P, __m512h __A) +{ + *(__m512h *) __P = __A; +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_store_ph (void *__P, __m256h __A) +{ + *(__m256h *) __P = __A; +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_store_ph (void *__P, __m128h __A) +{ + *(__m128h *) __P = __A; +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_storeu_ph (void *__P, __m512h __A) +{ + *(__m512h_u *) __P = __A; +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_storeu_ph (void *__P, __m256h __A) +{ + *(__m256h_u *) __P = __A; +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_storeu_ph (void *__P, __m128h __A) +{ + *(__m128h_u *) __P = __A; +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_abs_ph(__m512h __A) +{ + return (__m512h) _mm512_and_epi32 ( _mm512_set1_epi32(0x7FFF7FFF), + (__m512i) __A); +} + /* Intrinsics v[add,sub,mul,div]ph. 
*/ extern __inline __m512h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) diff --git a/gcc/config/i386/avx512fp16vlintrin.h b/gcc/config/i386/avx512fp16vlintrin.h index c7bdfbc0517..d4aa9928406 100644 --- a/gcc/config/i386/avx512fp16vlintrin.h +++ b/gcc/config/i386/avx512fp16vlintrin.h @@ -425,6 +425,22 @@ _mm256_maskz_min_ph (__mmask16 __A, __m256h __B, __m256h __C) _mm256_setzero_ph (), __A); } +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_abs_ph (__m128h __A) +{ + return (__m128h) _mm_and_si128 ( _mm_set1_epi32(0x7FFF7FFF), + (__m128i) __A); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_abs_ph (__m256h __A) +{ + return (__m256h) _mm256_and_si256 ( _mm256_set1_epi32(0x7FFF7FFF), + (__m256i) __A); +} + /* vcmpph */ #ifdef __OPTIMIZE extern __inline __mmask8 diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-13.c b/gcc/testsuite/gcc.target/i386/avx512fp16-13.c new file mode 100644 index 00000000000..3b6219e493f --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-13.c @@ -0,0 +1,143 @@ +/* { dg-do compile} */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl" } */ + +#include +void +__attribute__ ((noinline, noclone)) +store512_ph (void *p, __m512h a) +{ + _mm512_store_ph (p, a); +} + +/* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\]*\\)" 1 } } */ + +void +__attribute__ ((noinline, noclone)) +store256_ph (void *p, __m256h a) +{ + _mm256_store_ph (p, a); +} + +/* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\]*\\)" 1 } } */ + +void +__attribute__ ((noinline, noclone)) +store_ph (void *p, __m128h a) +{ + _mm_store_ph (p, a); +} + +/* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*\\)" 1 } } */ + +__m512h +__attribute__ ((noinline, noclone)) +load512_ph (void const *p) +{ + return _mm512_load_ph (p); +} + +/* { dg-final { 
scan-assembler-times "vmovdqa64\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\]*\\)" 1 } } */ + +__m256h +__attribute__ ((noinline, noclone)) +load256_ph (void const *p) +{ + return _mm256_load_ph (p); +} + +/* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\]*\\)" 1 } } */ + +__m128h +__attribute__ ((noinline, noclone)) +load_ph (void const *p) +{ + return _mm_load_ph (p); +} +/* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*\\)" 1 } } */ + +__m512h +__attribute__ ((noinline, noclone)) +load512u_ph (void const *p) +{ + return _mm512_loadu_ph (p); +} + +/* { dg-final { scan-assembler-times "vmovdqu16\[ \\t\]*\[^,\]*,\[^\{\n\]*%zmm\[0-9\]" 1 } } */ + +__m256h +__attribute__ ((noinline, noclone)) +load256u_ph (void const *p) +{ + return _mm256_loadu_ph (p); +} + +/* { dg-final { scan-assembler-times "vmovdqu16\[ \\t\]*\[^,\]*,\[^\{\n\]*%ymm\[0-9\]" 1 } } */ + +__m128h +__attribute__ ((noinline, noclone)) +load128u_ph (void const *p) +{ + return _mm_loadu_ph (p); +} + +/* { dg-final { scan-assembler-times "vmovdqu16\[ \\t\]*\[^,\]*,\[^\{\n\]*%xmm\[0-9\]" 1 } } */ + +void +__attribute__ ((noinline, noclone)) +store512u_ph (void *p, __m512h a) +{ + return _mm512_storeu_ph (p, a); +} + +/* { dg-final { scan-assembler-times "vmovdqu16\[ \\t\]*\[^\{\n\]*%zmm\[0-9\], *\[^,\]*" 1 } } */ + +void +__attribute__ ((noinline, noclone)) +store256u_ph (void *p, __m256h a) +{ + return _mm256_storeu_ph (p, a); +} + +/* { dg-final { scan-assembler-times "vmovdqu16\[ \\t\]*\[^\{\n\]*%ymm\[0-9\], *\[^,\]*" 1 } } */ + +void +__attribute__ ((noinline, noclone)) +storeu_ph (void *p, __m128h a) +{ + return _mm_storeu_ph (p, a); +} + +/* { dg-final { scan-assembler-times "vmovdqu16\[ \\t\]*\[^\{\n\]*%xmm\[0-9\], *\[^,\]*" 1 } } */ + +__m512h +__attribute__ ((noinline, noclone)) +abs512_ph (__m512h a) +{ + return _mm512_abs_ph (a); +} + +/* { dg-final { scan-assembler-times "vpandd\[ \\t\]+\[^\n\]*\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, 
%zmm0" 1 { target {! ia32 } } } } */ +/* { dg-final { scan-assembler-times "vpbroadcastd\[^\n\]*%zmm\[0-9\]+" 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "vpandd\[^\n\]*%zmm\[0-9\]+" 1 { target ia32 } } } */ + +__m256h +__attribute__ ((noinline, noclone)) +abs256_ph (__m256h a) +{ + return _mm256_abs_ph (a); +} + +/* { dg-final { scan-assembler-times "vpandq\[ \\t\]+\[^\n\]*\\\{1to\[1-4\]+\\\}, %ymm\[0-9\]+, %ymm0" 1 { target {! ia32 } } } } */ +/* { dg-final { scan-assembler-times "vpbroadcastq\[^\n\]*%ymm\[0-9\]+" 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "vpand\[^\n\]*%ymm\[0-9\]+" 1 { target ia32 } } } */ + +__m128h +__attribute__ ((noinline, noclone)) +abs_ph (__m128h a) +{ + return _mm_abs_ph (a); +} + +/* { dg-final { scan-assembler-times "vpandq\[ \\t\]+\[^\n\]*\\\{1to\[1-2\]+\\\}, %xmm\[0-9\]+, %xmm0" 1 { target {! ia32 } } } } */ +/* { dg-final { scan-assembler-times "vpbroadcastq\[^\n\]*%xmm\[0-9\]+" 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "vpand\[^\n\]*%xmm\[0-9\]+" 1 { target ia32 } } } */ From patchwork Thu Jul 1 06:16:46 2021
To: gcc-patches@gcc.gnu.org Subject: [PATCH 60/62] AVX512FP16: Add reduce operators(add/mul/min/max).
Date: Thu, 1 Jul 2021 14:16:46 +0800 Message-Id: <20210701061648.9447-61-hongtao.liu@intel.com> In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com> References: <20210701061648.9447-1-hongtao.liu@intel.com> From: liuhongt Cc: jakub@redhat.com, dianhong xu From: dianhong xu gcc/ChangeLog: * config/i386/avx512fp16intrin.h (_MM512_REDUCE_OP): New macro. (_mm512_reduce_add_ph): New intrinsic. (_mm512_reduce_mul_ph): Ditto. (_mm512_reduce_min_ph): Ditto. (_mm512_reduce_max_ph): Ditto. * config/i386/avx512fp16vlintrin.h (_MM256_REDUCE_OP/_MM_REDUCE_OP): New macro. (_mm256_reduce_add_ph): New intrinsic. (_mm256_reduce_mul_ph): Ditto. (_mm256_reduce_min_ph): Ditto. (_mm256_reduce_max_ph): Ditto. (_mm_reduce_add_ph): Ditto. (_mm_reduce_mul_ph): Ditto. (_mm_reduce_min_ph): Ditto. (_mm_reduce_max_ph): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512fp16-reduce-op-1.c: New test. * gcc.target/i386/avx512fp16vl-reduce-op-1.c: Ditto.
--- gcc/config/i386/avx512fp16intrin.h | 69 +++++ gcc/config/i386/avx512fp16vlintrin.h | 105 ++++++++ .../gcc.target/i386/avx512fp16-reduce-op-1.c | 132 ++++++++++ .../i386/avx512fp16vl-reduce-op-1.c | 244 ++++++++++++++++++ 4 files changed, 550 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-reduce-op-1.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-reduce-op-1.c diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h index b8ca9201828..6e0f3a80e54 100644 --- a/gcc/config/i386/avx512fp16intrin.h +++ b/gcc/config/i386/avx512fp16intrin.h @@ -7056,6 +7056,75 @@ _mm_maskz_fmul_round_sch (__mmask8 __A, __m128h __B, __m128h __C, const int __E) #endif /* __OPTIMIZE__ */ +#define _MM512_REDUCE_OP(op) \ + __m256h __T1 = (__m256h) _mm512_extractf64x4_pd ((__m512d) __A, 0); \ + __m256h __T2 = (__m256h) _mm512_extractf64x4_pd ((__m512d) __A, 1); \ + __m256h __T3 = (__T1 op __T2); \ + __m128h __T4 = (__m128h) _mm256_extractf128_pd ((__m256d) __T3, 0); \ + __m128h __T5 = (__m128h) _mm256_extractf128_pd ((__m256d) __T3, 1); \ + __m128h __T6 = (__T4 op __T5); \ + __m128h __T7 = (__m128h) __builtin_shuffle ((__m128h)__T6, \ + (__v8hi) {4, 5, 6, 7, 0, 1, 2, 3}); \ + __m128h __T8 = (__T6 op __T7); \ + __m128h __T9 = (__m128h) __builtin_shuffle ((__m128h)__T8, \ + (__v8hi) {2, 3, 0, 1, 4, 5, 6, 7}); \ + __m128h __T10 = __T8 op __T9; \ + return __T10[0] op __T10[1] + +// TODO reduce +extern __inline _Float16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_reduce_add_ph (__m512h __A) +{ + _MM512_REDUCE_OP(+); +} + +extern __inline _Float16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_reduce_mul_ph (__m512h __A) +{ + _MM512_REDUCE_OP(*); +} + +#undef _MM512_REDUCE_OP +#define _MM512_REDUCE_OP(op) \ + __m512h __T1 = (__m512h) __builtin_shuffle ((__m512d) __A, \ + (__v8di) {4,5,6,7,0,0,0,0}); \ + __m512h __T2 = _mm512_##op(__A, __T1); \ + __m512h __T3 = 
(__m512h) __builtin_shuffle ((__m512d) __T2, \ + (__v8di) {2,3,0,0,0,0,0,0}); \ + __m512h __T4 = _mm512_##op(__T2, __T3); \ + __m512h __T5 = (__m512h) __builtin_shuffle ((__m512d) __T4, \ + (__v8di) {1,0,0,0,0,0,0,0}); \ + __m512h __T6 = _mm512_##op(__T4, __T5); \ + __m512h __T7 = (__m512h) __builtin_shuffle ((__m512) __T6, \ + (__v16si) {1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0}); \ + __m512h __T8 = _mm512_##op(__T6, __T7); \ + __m512h __T9 = (__m512h) __builtin_shuffle (__T8, \ + (__v32hi) {1,0,0,0,0,0,0,0,\ + 0,0,0,0,0,0,0,0,\ + 0,0,0,0,0,0,0,0,\ + 0,0,0,0,0,0,0,0}\ + ); \ + __m512h __T10 = _mm512_##op(__T8, __T9); \ + return __T10[0] + +extern __inline _Float16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_reduce_min_ph (__m512h __A) +{ + _MM512_REDUCE_OP(min_ph); +} + +extern __inline _Float16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_reduce_max_ph (__m512h __A) +{ + _MM512_REDUCE_OP(max_ph); +} + +#undef _MM512_REDUCE_OP + #ifdef __DISABLE_AVX512FP16__ #undef __DISABLE_AVX512FP16__ #pragma GCC pop_options diff --git a/gcc/config/i386/avx512fp16vlintrin.h b/gcc/config/i386/avx512fp16vlintrin.h index d4aa9928406..eea1941617f 100644 --- a/gcc/config/i386/avx512fp16vlintrin.h +++ b/gcc/config/i386/avx512fp16vlintrin.h @@ -3088,6 +3088,111 @@ _mm256_maskz_fcmul_pch (__mmask8 __A, __m256h __B, __m256h __C) __A); } +#define _MM256_REDUCE_OP(op) \ + __m128h __T1 = (__m128h) _mm256_extractf128_pd ((__m256d) __A, 0); \ + __m128h __T2 = (__m128h) _mm256_extractf128_pd ((__m256d) __A, 1); \ + __m128h __T3 = (__T1 op __T2); \ + __m128h __T4 = (__m128h) __builtin_shuffle (__T3, \ + (__v8hi) {4, 5, 6, 7, 0, 1, 2, 3}); \ + __m128h __T5 = (__T3) op (__T4); \ + __m128h __T6 = (__m128h) __builtin_shuffle (__T5, \ + (__v8hi) {2, 3, 0, 1, 4, 5, 6, 7}); \ + __m128h __T7 = __T5 op __T6; \ + return __T7[0] op __T7[1] + +extern __inline _Float16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) 
+_mm256_reduce_add_ph (__m256h __A) +{ + _MM256_REDUCE_OP(+); +} + +extern __inline _Float16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_reduce_mul_ph (__m256h __A) +{ + _MM256_REDUCE_OP(*); +} + +#undef _MM256_REDUCE_OP +#define _MM256_REDUCE_OP(op) \ + __m128h __T1 = (__m128h) _mm256_extractf128_pd ((__m256d) __A, 0); \ + __m128h __T2 = (__m128h) _mm256_extractf128_pd ((__m256d) __A, 1); \ + __m128h __T3 = _mm_##op (__T1, __T2); \ + __m128h __T4 = (__m128h) __builtin_shuffle (__T3, \ + (__v8hi) {2, 3, 0, 1, 6, 7, 4, 5}); \ + __m128h __T5 = _mm_##op (__T3, __T4); \ + __m128h __T6 = (__m128h) __builtin_shuffle (__T5, (__v8hi) {4, 5}); \ + __m128h __T7 = _mm_##op (__T5, __T6); \ + __m128h __T8 = (__m128h) __builtin_shuffle (__T7, (__v8hi) {1, 0}); \ + __m128h __T9 = _mm_##op (__T7, __T8); \ + return __T9[0] + +extern __inline _Float16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_reduce_min_ph (__m256h __A) +{ + _MM256_REDUCE_OP(min_ph); +} + +extern __inline _Float16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_reduce_max_ph (__m256h __A) +{ + _MM256_REDUCE_OP(max_ph); +} + +#define _MM_REDUCE_OP(op) \ + __m128h __T1 = (__m128h) __builtin_shuffle (__A, \ + (__v8hi) {4, 5, 6, 7, 0, 1, 2, 3}); \ + __m128h __T2 = (__A) op (__T1); \ + __m128h __T3 = (__m128h) __builtin_shuffle (__T2, \ + (__v8hi){2, 3, 0, 1, 4, 5, 6, 7}); \ + __m128h __T4 = __T2 op __T3; \ + return __T4[0] op __T4[1] + +extern __inline _Float16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_reduce_add_ph (__m128h __A) +{ + _MM_REDUCE_OP(+); +} + +extern __inline _Float16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_reduce_mul_ph (__m128h __A) +{ + _MM_REDUCE_OP(*); +} + +#undef _MM_REDUCE_OP +#define _MM_REDUCE_OP(op) \ + __m128h __T1 = (__m128h) __builtin_shuffle (__A, \ + (__v8hi) {2, 3, 0, 1, 6, 7, 4, 5}); \ + __m128h __T2 = _mm_##op (__A, __T1); \ + 
__m128h __T3 = (__m128h) __builtin_shuffle (__T2, (__v8hi){4, 5}); \ + __m128h __T4 = _mm_##op (__T2, __T3); \ + __m128h __T5 = (__m128h) __builtin_shuffle (__T4, (__v8hi){1, 0}); \ + __m128h __T6 = _mm_##op (__T4, __T5); \ + return __T6[0] + +extern __inline _Float16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_reduce_min_ph (__m128h __A) +{ + _MM_REDUCE_OP(min_ph); +} + +extern __inline _Float16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_reduce_max_ph (__m128h __A) +{ + _MM_REDUCE_OP(max_ph); +} + +#undef _MM256_REDUCE_OP +#undef _MM_REDUCE_OP + #ifdef __DISABLE_AVX512FP16VL__ #undef __DISABLE_AVX512FP16VL__ #pragma GCC pop_options diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-reduce-op-1.c b/gcc/testsuite/gcc.target/i386/avx512fp16-reduce-op-1.c new file mode 100644 index 00000000000..35563166536 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-reduce-op-1.c @@ -0,0 +1,132 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16" } */ + +static void do_test (void); + +#define DO_TEST do_test +#define AVX512FP16 + +#include <immintrin.h> +#include "avx512-check.h" + +__m512h a1 = { -39.3f16, -180.9f16, 13.4f16, 35.4f16, -41.1f16, -14.4f16, 24.5f16, 53.54f16, + 238.4f16, -134.8f16, 24.5f16, 35.6f16, -346.7f16, -43.4f16, -535.3f16, 324.7f16, + 82.5f16, 21.4f16, 24.4f16, 53.4f16, 23.5f16, -24.4f16, -34.5f16, -32.5f16, + 23.6f16, -13.4f16, 24.5f16, 35.5f16, -34.4f16, -24.5f16, -34.5f16, 13.5f16 }; + +__m512h a2 = { 1.25f16, 2.25f16, -0.25f16, 4.0f16, -2.0f16, 4.0f16, -3.0f16, 2.0f16, + -0.5f16, -1.0f16, 1.0f16, -1.0f16, 1.0f16, 1.0f16, 2.0f16, 4.0f16, + 1.25f16, 2.25f16, -4.25f16, 4.0f16, -2.4f16, 4.0f16, -3.0f16, 2.0f16, + -4.5f16, 7.6f16, 0.7f16, -8.2f16, 2.1f16, 2.4f16, -2.0f16, 19.4f16 }; + +__attribute__((noinline, noclone)) _Float16 +test_reduce_add_ph (__m512h a) +{ + return _mm512_reduce_add_ph (a); +} + +__attribute__((noinline, noclone)) _Float16 +test_reduce_mul_ph 
(__m512h a) +{ + return _mm512_reduce_mul_ph (a); +} + +__attribute__((noinline, noclone)) _Float16 +test_reduce_max_ph (__m512h a) +{ + return _mm512_reduce_max_ph (a); +} + +__attribute__((noinline, noclone)) _Float16 +test_reduce_min_ph (__m512h a) +{ + return _mm512_reduce_min_ph (a); +} + +#define SIZE 32 +#define REF_ADDMUL(op, a) \ + __m256h __a1 = _mm256_setzero_ph (); \ + for (int i =0; i < 16; i++) { \ + __a1[i] = (_Float16) a[i] op (_Float16) a[i + 16]; \ + } \ + __m128h __a2 = _mm_setzero_ph (); \ + for (int i =0; i < 8; i++) { \ + __a2[i] = (_Float16) __a1[i] op (_Float16) __a1[i + 8]; \ + } \ + _Float16 __c0 = __a2[0] op __a2[4]; \ + _Float16 __c1 = __a2[1] op __a2[5]; \ + _Float16 __c2 = __a2[2] op __a2[6]; \ + _Float16 __c3 = __a2[3] op __a2[7]; \ + _Float16 __d0 = __c0 op __c2; \ + _Float16 __d1 = __c1 op __c3; \ + _Float16 __e0 = __d0 op __d1; \ + r3 = __e0 + +#define TESTOP(opname, op, a) \ + do { \ + _Float16 r1 = _mm512_reduce_##opname##_ph (a); \ + _Float16 r2 = test_reduce_##opname##_ph (a); \ + _Float16 r3 = a[0]; \ + if (r1 != r2) { \ + __builtin_abort (); \ + } \ + REF_ADDMUL (op, a); \ + if (r1 != r3) { \ + __builtin_abort (); \ + } \ + } while (0) + +#define TEST_ADDMUL_PH(a) \ + do { \ + TESTOP (add, +, a); \ + TESTOP (mul, *, a); \ + } while (0) + + static void + test_512_addmul_ph (void) + { + TEST_ADDMUL_PH (a1); + TEST_ADDMUL_PH (a2); + } + +#undef TESTOP +#define TESTOP(opname, op, a) \ + do { \ + _Float16 r1 = _mm512_reduce_##opname##_ph (a); \ + _Float16 r2 = test_reduce_##opname##_ph (a); \ + _Float16 r3 = a[0]; \ + if (r1 != r2) { \ + __builtin_abort (); \ + } \ + for (int i = 1; i < SIZE; i++) \ + r3 = r3 op a[i]; \ + if (r1 != r3) { \ + __builtin_abort (); \ + } \ + } while (0) + +#define TEST_MINMAX_PH(a) \ + do { \ + TESTOP (min, < a[i] ? r3 :, a); \ + TESTOP (max, > a[i] ? 
r3 :, a); \ + } while (0) + +static void +test_512_minmax_ph (void) +{ + TEST_MINMAX_PH (a1); + TEST_MINMAX_PH (a2); +} + +static void +do_test (void) +{ + test_512_addmul_ph(); + test_512_minmax_ph(); +} + +#undef SIZE +#undef REF_ADDMUL +#undef TESTOP +#undef TEST_ADDMUL_PH +#undef TEST_MINMAX_PH diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-reduce-op-1.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-reduce-op-1.c new file mode 100644 index 00000000000..70485d89720 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-reduce-op-1.c @@ -0,0 +1,244 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl" } */ + +static void do_test (void); + +#define DO_TEST do_test +#define AVX512FP16 + +#include <immintrin.h> +#include "avx512-check.h" + +__m256h a1 = { -39.3f16, -180.9f16, 13.4f16, 35.4f16, -41.1f16, -14.4f16, 24.5f16, 53.54f16, + 238.4f16, -134.8f16, 24.5f16, 35.6f16, -346.7f16, -43.4f16, -535.3f16, 324.7f16 }; +__m256h a2 = { 82.5f16, 21.4f16, 24.4f16, 53.4f16, 23.5f16, -24.4f16, -34.5f16, -32.5f16, + 23.6f16, -13.4f16, 24.5f16, 35.5f16, -34.4f16, -24.5f16, -34.5f16, 13.5f16 }; + +__m128h b1 = { 1.25f16, 2.25f16, -0.25f16, 4.0f16, -2.0f16, 4.0f16, -3.0f16, 2.0f16 }; +__m128h b2 = { -0.5f16, -1.0f16, 1.0f16, -1.0f16, 1.0f16, 1.0f16, 2.0f16, 4.0f16 }; +__m128h b3 = { 1.25f16, 2.25f16, -4.25f16, 4.0f16, -2.4f16, 4.0f16, -3.0f16, 2.0f16 }; +__m128h b4 = { -4.5f16, 7.6f16, 0.7f16, -8.2f16, 2.1f16, 2.4f16, -2.0f16, 1.4f16 }; + +__attribute__((noinline, noclone)) _Float16 +test_reduce_256_add_ph (__m256h a) +{ + return _mm256_reduce_add_ph (a); +} + +__attribute__((noinline, noclone)) _Float16 +test_reduce_256_mul_ph (__m256h a) +{ + return _mm256_reduce_mul_ph (a); +} + +__attribute__((noinline, noclone)) _Float16 +test_reduce_256_max_ph (__m256h a) +{ + return _mm256_reduce_max_ph (a); +} + +__attribute__((noinline, noclone)) _Float16 +test_reduce_256_min_ph (__m256h a) +{ + return _mm256_reduce_min_ph (a); +} + 
+__attribute__((noinline, noclone)) _Float16 +test_reduce_add_ph (__m128h b) +{ + return _mm_reduce_add_ph (b); +} + +__attribute__((noinline, noclone)) _Float16 +test_reduce_mul_ph (__m128h b) +{ + return _mm_reduce_mul_ph (b); +} + +__attribute__((noinline, noclone)) _Float16 +test_reduce_max_ph (__m128h b) +{ + return _mm_reduce_max_ph (b); +} + +__attribute__((noinline, noclone)) _Float16 +test_reduce_min_ph (__m128h b) +{ + return _mm_reduce_min_ph (b); +} + +#define SIZE 16 +#define REF_ADDMUL(op, a) \ + __m128h __a1 = _mm_setzero_ph (); \ + for (int i = 0; i < 8; i++) { \ + __a1[i] = (_Float16) a[i] op (_Float16) a[i + 8]; \ + } \ + _Float16 __c0 = __a1[0] op __a1[4]; \ + _Float16 __c1 = __a1[1] op __a1[5]; \ + _Float16 __c2 = __a1[2] op __a1[6]; \ + _Float16 __c3 = __a1[3] op __a1[7]; \ + _Float16 __d0 = __c0 op __c2; \ + _Float16 __d1 = __c1 op __c3; \ + _Float16 __e0 = __d0 op __d1; \ + r3 = __e0 + +#define TESTOP(opname, op, a) \ + do { \ + _Float16 r1 = _mm256_reduce_##opname##_ph (a); \ + _Float16 r2 = test_reduce_256_##opname##_ph (a); \ + _Float16 r3 = a[0]; \ + if (r1 != r2) { \ + __builtin_abort (); \ + } \ + REF_ADDMUL (op, a); \ + if (r1 != r3) { \ + __builtin_abort (); \ + } \ + } while (0) + +#define TEST_ADDMUL_PH(a) \ + do { \ + TESTOP (add, +, a); \ + TESTOP (mul, *, a); \ + } while (0) + +static void +test_256_addmul_ph (void) +{ + TEST_ADDMUL_PH (a1); + TEST_ADDMUL_PH (a2); +} + +#undef TESTOP +#define TESTOP(opname, op, a) \ + do { \ + _Float16 r1 = _mm256_reduce_##opname##_ph (a); \ + _Float16 r2 = test_reduce_256_##opname##_ph (a); \ + _Float16 r3 = a[0]; \ + if (r1 != r2) { \ + __builtin_abort (); \ + } \ + for (int i = 1; i < SIZE; i++) \ + r3 = r3 op a[i]; \ + if (r1 != r3) { \ + __builtin_abort (); \ + } \ + } while (0) + +#define TEST_MINMAX_PH(a) \ + do { \ + TESTOP (min, < a[i] ? r3 :, a); \ + TESTOP (max, > a[i] ? 
r3 :, a); \ + } while (0) + +static void +test_256_minmax_ph (void) +{ + TEST_MINMAX_PH (a1); + TEST_MINMAX_PH (a2); +} + +static void +test_256_ph (void) +{ + test_256_addmul_ph (); + test_256_minmax_ph (); +} + +#undef SIZE +#define SIZE 8 + +#undef REF_ADDMUL +#define REF_ADDMUL(op, a) \ + _Float16 __c0 = a[0] op a[4]; \ + _Float16 __c1 = a[1] op a[5]; \ + _Float16 __c2 = a[2] op a[6]; \ + _Float16 __c3 = a[3] op a[7]; \ + _Float16 __d0 = __c0 op __c2; \ + _Float16 __d1 = __c1 op __c3; \ + _Float16 __e0 = __d0 op __d1; \ + r3 = __e0 + +#undef TESTOP +#define TESTOP(opname, op, a) \ + do { \ + _Float16 r1 = _mm_reduce_##opname##_ph (a); \ + _Float16 r2 = test_reduce_##opname##_ph (a); \ + _Float16 r3 = a[0]; \ + if (r1 != r2) { \ + __builtin_abort (); \ + } \ + REF_ADDMUL (op, a); \ + if (r1 != r3) { \ + __builtin_abort (); \ + } \ + } while (0) + +#undef TEST_ADDMUL_PH +#define TEST_ADDMUL_PH(a) \ + do { \ + TESTOP (add, +, a); \ + TESTOP (mul, *, a); \ + } while (0) + +static void +test_128_addmul_ph (void) +{ + TEST_ADDMUL_PH (b1); + TEST_ADDMUL_PH (b2); + TEST_ADDMUL_PH (b3); + TEST_ADDMUL_PH (b4); +} + +#undef TESTOP +#define TESTOP(opname, op, b) \ + do { \ + _Float16 r1 = _mm_reduce_##opname##_ph (b); \ + _Float16 r2 = test_reduce_##opname##_ph (b); \ + _Float16 r3 = b[0]; \ + if (r1 != r2) { \ + __builtin_abort (); \ + } \ + for (int i = 1; i < SIZE; i++) \ + r3 = r3 op b[i]; \ + if (r1 != r3) { \ + __builtin_abort (); \ + } \ + } while (0) + +#undef TEST_MINMAX_PH +#define TEST_MINMAX_PH(b) \ + do { \ + TESTOP (min, < b[i] ? r3 :, b); \ + TESTOP (max, > b[i] ? 
r3 :, b); \ + } while (0) + +static void +test_128_minmax_ph (void) +{ + TEST_MINMAX_PH (b1); + TEST_MINMAX_PH (b2); + TEST_MINMAX_PH (b3); + TEST_MINMAX_PH (b4); +} + +static void +test_128_ph (void) +{ + test_128_addmul_ph (); + test_128_minmax_ph (); +} + +static void +do_test (void) +{ + test_256_ph (); + test_128_ph (); +} + + +#undef SIZE +#undef REF_ADDMUL +#undef TESTOP +#undef TEST_ADDMUL_PH +#undef TEST_MINMAX_PH From patchwork Thu Jul 1 06:16:47 2021
To: gcc-patches@gcc.gnu.org Subject: [PATCH 61/62] AVX512FP16: Add complex conjugation intrinsic instructions.
Date: Thu, 1 Jul 2021 14:16:47 +0800 Message-Id: <20210701061648.9447-62-hongtao.liu@intel.com> In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com> References: <20210701061648.9447-1-hongtao.liu@intel.com> From: liuhongt Cc: jakub@redhat.com, dianhong xu From: dianhong xu gcc/ChangeLog: * config/i386/avx512fp16intrin.h: Add new intrinsics. (_mm512_conj_pch): New intrinsic. (_mm512_mask_conj_pch): Ditto. (_mm512_maskz_conj_pch): Ditto. * config/i386/avx512fp16vlintrin.h: Add new intrinsics. (_mm256_conj_pch): New intrinsic. (_mm256_mask_conj_pch): Ditto. (_mm256_maskz_conj_pch): Ditto. (_mm_conj_pch): Ditto. (_mm_mask_conj_pch): Ditto. (_mm_maskz_conj_pch): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512fp16-conjugation-1.c: New test. * gcc.target/i386/avx512fp16vl-conjugation-1.c: New test.
--- gcc/config/i386/avx512fp16intrin.h | 25 +++++++ gcc/config/i386/avx512fp16vlintrin.h | 53 +++++++++++++++ .../i386/avx512fp16-conjugation-1.c | 34 ++++++++++ .../i386/avx512fp16vl-conjugation-1.c | 65 +++++++++++++++++++ 4 files changed, 177 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-conjugation-1.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-conjugation-1.c diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h index 6e0f3a80e54..38767ef270b 100644 --- a/gcc/config/i386/avx512fp16intrin.h +++ b/gcc/config/i386/avx512fp16intrin.h @@ -718,6 +718,31 @@ _mm512_maskz_div_round_ph (__mmask32 __A, __m512h __B, __m512h __C, (A), (D))) #endif /* __OPTIMIZE__ */ +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_conj_pch (__m512h __A) +{ + return (__m512h) _mm512_xor_epi32 ((__m512i) __A, _mm512_set1_epi32 (1<<31)); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_conj_pch (__m512h __W, __mmask16 __U, __m512h __A) +{ + return (__m512h) __builtin_ia32_movaps512_mask ((__v16sf) _mm512_conj_pch (__A), + (__v16sf) __W, + (__mmask16) __U); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_conj_pch (__mmask16 __U, __m512h __A) +{ + return (__m512h) __builtin_ia32_movaps512_mask ((__v16sf) _mm512_conj_pch (__A), + (__v16sf) _mm512_setzero_ps (), + (__mmask16) __U); +} + /* Intrinsics of v[add,sub,mul,div]sh. 
*/ extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) diff --git a/gcc/config/i386/avx512fp16vlintrin.h b/gcc/config/i386/avx512fp16vlintrin.h index eea1941617f..9bbd5c5a5f4 100644 --- a/gcc/config/i386/avx512fp16vlintrin.h +++ b/gcc/config/i386/avx512fp16vlintrin.h @@ -151,6 +151,59 @@ _mm256_zextph128_ph256 (__m128h __A) (__m128) __A, 0); } +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_conj_pch (__m256h __A) +{ + return (__m256h) _mm256_xor_epi32 ((__m256i) __A, _mm256_set1_epi32 (1<<31)); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_conj_pch (__m256h __W, __mmask8 __U, __m256h __A) +{ + return (__m256h) __builtin_ia32_movaps256_mask ((__v8sf) + _mm256_conj_pch (__A), + (__v8sf) __W, + (__mmask8) __U); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_conj_pch (__mmask8 __U, __m256h __A) +{ + return (__m256h) __builtin_ia32_movaps256_mask ((__v8sf) + _mm256_conj_pch (__A), + (__v8sf) + _mm256_setzero_ps (), + (__mmask8) __U); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_conj_pch (__m128h __A) +{ + return (__m128h) _mm_xor_epi32 ((__m128i) __A, _mm_set1_epi32 (1<<31)); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_conj_pch (__m128h __W, __mmask8 __U, __m128h __A) +{ + return (__m128h) __builtin_ia32_movaps128_mask ((__v4sf) _mm_conj_pch (__A), + (__v4sf) __W, + (__mmask8) __U); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_conj_pch (__mmask8 __U, __m128h __A) +{ + return (__m128h) __builtin_ia32_movaps128_mask ((__v4sf) _mm_conj_pch (__A), + (__v4sf) _mm_setzero_ps (), + (__mmask8) __U); +} + /* Intrinsics v[add,sub,mul,div]ph. 
*/ extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-conjugation-1.c b/gcc/testsuite/gcc.target/i386/avx512fp16-conjugation-1.c new file mode 100644 index 00000000000..662b23ca43d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-conjugation-1.c @@ -0,0 +1,34 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx512fp16" } */ + +#include <immintrin.h> +__m512h +__attribute__ ((noinline, noclone)) +test_mm512_conj_pch (__m512h __A) +{ + return _mm512_conj_pch (__A); +} + +/* { dg-final { scan-assembler-times "vpxord\[^\n\]*%zmm\[0-9\]+" 3 } } */ + +__m512h +__attribute__ ((noinline, noclone)) +test_mm512_mask_conj_pch (__m512h __W, __mmask16 __U, __m512h __A) +{ + return _mm512_mask_conj_pch (__W, __U, __A); +} + +/* { dg-final { scan-assembler-times "vpxord\[^\n\]*%zmm\[0-9\]+" 3 } } */ +/* { dg-final { scan-assembler-times "kmovw\[^\n\]*%k\[1-9\]+" 2 } } */ +/* { dg-final { scan-assembler-times "vmovaps\[^\n]" 2 } } */ + +__m512h +__attribute__ ((noinline, noclone)) +test_mm512_maskz_conj_pch (__mmask16 __U, __m512h __A) +{ + return _mm512_maskz_conj_pch (__U, __A); +} + +/* { dg-final { scan-assembler-times "vpxord\[^\n\]*%zmm\[0-9\]+" 3 } } */ +/* { dg-final { scan-assembler-times "kmovw\[^\n\]*%k\[1-9\]+" 2 } } */ +/* { dg-final { scan-assembler-times "vmovaps\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-conjugation-1.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-conjugation-1.c new file mode 100644 index 00000000000..0bce99790c6 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-conjugation-1.c @@ -0,0 +1,65 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */ + +#include <immintrin.h> +__m256h +__attribute__ ((noinline, noclone)) +test_mm256_conj_pch (__m256h __A) +{ + return _mm256_conj_pch (__A); +} + +/* { dg-final { scan-assembler-times 
"vpxord\[^\n\]*%ymm\[0-9\]+" 3 {target { ! ia32} } } } */ +/* { dg-final { scan-assembler-times "vpxor\[^\n\]*%ymm\[0-9\]+" 3 {target ia32} } } */ + +__m128h +__attribute__ ((noinline, noclone)) +test_mm_conj_pch (__m128h __A) +{ + return _mm_conj_pch (__A); +} + +/* { dg-final { scan-assembler-times "vpxord\[^\n\]*%xmm\[0-9\]+" 3 {target { ! ia32} } } } */ +/* { dg-final { scan-assembler-times "vpxor\[^\n\]*%xmm\[0-9\]+" 3 {target ia32} } } */ + +__m256h +__attribute__ ((noinline, noclone)) +test_mm256_mask_conj_pch (__m256h __W, __mmask8 __U, __m256h __A) +{ + return _mm256_mask_conj_pch (__W, __U, __A); +} + +/* { dg-final { scan-assembler-times "vpxord\[^\n\]*%ymm\[0-9\]+" 3 {target { ! ia32} } } } */ +/* { dg-final { scan-assembler-times "vpxor\[^\n\]*%ymm\[0-9\]+" 3 {target ia32} } } */ +/* { dg-final { scan-assembler-times "vmovaps\[^\n\]*%ymm\[0-9\]+" 2 } } */ + +__m128h +__attribute__ ((noinline, noclone)) +test_mm_mask_conj_pch (__m128h __W, __mmask8 __U, __m128h __A) +{ + return _mm_mask_conj_pch (__W, __U, __A); +} + +/* { dg-final { scan-assembler-times "vpxord\[^\n\]*%xmm\[0-9\]+" 3 {target { ! ia32} } } } */ +/* { dg-final { scan-assembler-times "vpxor\[^\n\]*%xmm\[0-9\]+" 3 {target ia32} } } */ +/* { dg-final { scan-assembler-times "vmovaps\[^\n\]*%xmm\[0-9\]+" 2 } } */ + +__m256h +__attribute__ ((noinline, noclone)) +test_mm256_maskz_conj_pch (__mmask8 __U, __m256h __A) +{ + return _mm256_maskz_conj_pch (__U, __A); +} +/* { dg-final { scan-assembler-times "vpxord\[^\n\]*%ymm\[0-9\]+" 3 {target { ! ia32} } } } */ +/* { dg-final { scan-assembler-times "vpxor\[^\n\]*%ymm\[0-9\]+" 3 {target ia32} } } */ +/* { dg-final { scan-assembler-times "vmovaps\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ + +__m128h +__attribute__ ((noinline, noclone)) +test_mm_maskz_conj_pch (__mmask8 __U, __m128h __A) { + return _mm_maskz_conj_pch (__U, __A); +} + +/* { dg-final { scan-assembler-times "vpxord\[^\n\]*%xmm\[0-9\]+" 3 {target { ! 
ia32} } } } */ +/* { dg-final { scan-assembler-times "vpxor\[^\n\]*%xmm\[0-9\]+" 3 {target ia32} } } */ +/* { dg-final { scan-assembler-times "vmovaps\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ From patchwork Thu Jul 1 06:16:48 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: liuhongt X-Patchwork-Id: 1499417 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=j54gw1O0; dkim-atps=neutral Received: from sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4GFqWC4dw9z9sWw for ; Thu, 1 Jul 2021 17:25:42 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 44D09393C843 for ; Thu, 1 Jul 2021 07:25:39 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 44D09393C843 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1625124339; bh=oPFPBQT4FO/RUdCyVyI6YgE0Lsni7y4CVExV5HVLbCw=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=j54gw1O0ZYOW6zLNr3ZGV8KeTTKfwlUhsGrKV2ZK/rlhNUv7K9c7NmQnLEdgTpHl0 uCp4UVGV8Hd2YiUMkQZxPm+r/Z/BVr6mNkTeiZLeY3NNwbOWFrn1EQyH542Y++bFJY 
5ep/5q/39tXhulDdct5/ywN88ZPiWByCsZuXclw0= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by sourceware.org (Postfix) with ESMTPS id 2B941384F009 for ; Thu, 1 Jul 2021 06:18:33 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 2B941384F009 X-IronPort-AV: E=McAfee;i="6200,9189,10031"; a="195769992" X-IronPort-AV: E=Sophos;i="5.83,313,1616482800"; d="scan'208";a="195769992" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Jun 2021 23:18:32 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.83,313,1616482800"; d="scan'208";a="641962450" Received: from scymds01.sc.intel.com ([10.148.94.138]) by fmsmga006.fm.intel.com with ESMTP; 30 Jun 2021 23:18:31 -0700 Received: from shliclel320.sh.intel.com (shliclel320.sh.intel.com [10.239.236.50]) by scymds01.sc.intel.com with ESMTP id 1616Gmff031625; Wed, 30 Jun 2021 23:18:30 -0700 To: gcc-patches@gcc.gnu.org Subject: [PATCH 62/62] AVX512FP16: Add permutation and mask blend intrinsics. 
Date: Thu, 1 Jul 2021 14:16:48 +0800 Message-Id: <20210701061648.9447-63-hongtao.liu@intel.com> X-Mailer: git-send-email 2.18.1 In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com> References: <20210701061648.9447-1-hongtao.liu@intel.com> X-Spam-Status: No, score=-12.3 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_NONE, KAM_DMARC_STATUS, KAM_LAZY_DOMAIN_SECURITY, KAM_SHORT, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: liuhongt via Gcc-patches From: liuhongt Reply-To: liuhongt Cc: jakub@redhat.com, dianhong xu Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" From: dianhong xu gcc/ChangeLog: * config/i386/avx512fp16intrin.h: (_mm512_mask_blend_ph): New intrinsic. (_mm512_permutex2var_ph): Ditto. (_mm512_permutexvar_ph): Ditto. * config/i386/avx512fp16vlintrin.h: (_mm256_mask_blend_ph): New intrinsic. (_mm256_permutex2var_ph): Ditto. (_mm256_permutexvar_ph): Ditto. (_mm_mask_blend_ph): Ditto. (_mm_permutex2var_ph): Ditto. (_mm_permutexvar_ph): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512fp16-14.c: New test. 
---
 gcc/config/i386/avx512fp16intrin.h            | 31 +++++++
 gcc/config/i386/avx512fp16vlintrin.h          | 62 +++++++++++
 gcc/testsuite/gcc.target/i386/avx512fp16-14.c | 91 +++++++++++++++++++
 3 files changed, 184 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-14.c

diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
index 38767ef270b..2a2cb7b6348 100644
--- a/gcc/config/i386/avx512fp16intrin.h
+++ b/gcc/config/i386/avx512fp16intrin.h
@@ -7150,6 +7150,37 @@ _mm512_reduce_max_ph (__m512h __A)
 
 #undef _MM512_REDUCE_OP
 
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_blend_ph (__mmask32 __U, __m512h __A, __m512h __W)
+{
+  return (__m512h) __builtin_ia32_movdquhi512_mask ((__v32hi) __W,
+						    (__v32hi) __A,
+						    (__mmask32) __U);
+
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_permutex2var_ph (__m512h __A, __m512i __I, __m512h __B)
+{
+  return (__m512h) __builtin_ia32_vpermi2varhi512_mask ((__v32hi) __A,
+							(__v32hi) __I,
+							(__v32hi) __B,
+							(__mmask32)-1);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_permutexvar_ph (__m512i __A, __m512h __B)
+{
+  return (__m512h) __builtin_ia32_permvarhi512_mask ((__v32hi) __B,
+						     (__v32hi) __A,
+						     (__v32hi)
+						     (_mm512_setzero_ph ()),
+						     (__mmask32)-1);
+}
+
 #ifdef __DISABLE_AVX512FP16__
 #undef __DISABLE_AVX512FP16__
 #pragma GCC pop_options
diff --git a/gcc/config/i386/avx512fp16vlintrin.h b/gcc/config/i386/avx512fp16vlintrin.h
index 9bbd5c5a5f4..bc691ee61b7 100644
--- a/gcc/config/i386/avx512fp16vlintrin.h
+++ b/gcc/config/i386/avx512fp16vlintrin.h
@@ -3246,6 +3246,68 @@ _mm_reduce_max_ph (__m128h __A)
 #undef _MM256_REDUCE_OP
 #undef _MM_REDUCE_OP
 
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_blend_ph (__mmask16 __U, __m256h __A, __m256h __W)
+{
+  return (__m256h) __builtin_ia32_movdquhi256_mask ((__v16hi) __W,
+						    (__v16hi) __A,
+						    (__mmask16) __U);
+
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_permutex2var_ph (__m256h __A, __m256i __I, __m256h __B)
+{
+  return (__m256h) __builtin_ia32_vpermi2varhi256_mask ((__v16hi) __A,
+							(__v16hi) __I,
+							(__v16hi) __B,
+							(__mmask16)-1);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_permutexvar_ph (__m256i __A, __m256h __B)
+{
+  return (__m256h) __builtin_ia32_permvarhi256_mask ((__v16hi) __B,
+						     (__v16hi) __A,
+						     (__v16hi)
+						     (_mm256_setzero_ph ()),
+						     (__mmask16)-1);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_blend_ph (__mmask8 __U, __m128h __A, __m128h __W)
+{
+  return (__m128h) __builtin_ia32_movdquhi128_mask ((__v8hi) __W,
+						    (__v8hi) __A,
+						    (__mmask8) __U);
+
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_permutex2var_ph (__m128h __A, __m128i __I, __m128h __B)
+{
+  return (__m128h) __builtin_ia32_vpermi2varhi128_mask ((__v8hi) __A,
+							(__v8hi) __I,
+							(__v8hi) __B,
+							(__mmask8)-1);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_permutexvar_ph (__m128i __A, __m128h __B)
+{
+  return (__m128h) __builtin_ia32_permvarhi128_mask ((__v8hi) __B,
+						     (__v8hi) __A,
+						     (__v8hi)
+						     (_mm_setzero_ph ()),
+						     (__mmask8)-1);
+}
+
 #ifdef __DISABLE_AVX512FP16VL__
 #undef __DISABLE_AVX512FP16VL__
 #pragma GCC pop_options
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-14.c b/gcc/testsuite/gcc.target/i386/avx512fp16-14.c
new file mode 100644
index 00000000000..b2321fbcbab
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-14.c
@@ -0,0 +1,91 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512bw" } */
+
+#include <immintrin.h>
+
+__m512h
+__attribute__ ((noinline, noclone))
+test_mm512_mask_blend_ph (__mmask32 U, __m512h A, __m512h B)
+{
+  return _mm512_mask_blend_ph (U, A, B);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu16\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vpblendmw\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 { target ia32 } } } */
+
+__m512h
+__attribute__ ((noinline, noclone))
+test_mm512_permutex2var_ph (__m512h A, __m512i I, __m512h B)
+{
+  return _mm512_permutex2var_ph (A, I, B);
+}
+
+/* { dg-final { scan-assembler-times "vpermt2w\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vpermi2w\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+" 1 { target ia32 } } } */
+
+__m512h
+__attribute__ ((noinline, noclone))
+test_mm512_permutexvar_ph (__m512i A, __m512h B)
+{
+  return _mm512_permutexvar_ph (A, B);
+}
+
+/* { dg-final { scan-assembler-times "vpermw\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+" 1 } } */
+
+__m256h
+__attribute__ ((noinline, noclone))
+test_mm256_mask_blend_ph (__mmask16 U, __m256h A, __m256h B)
+{
+  return _mm256_mask_blend_ph (U, A, B);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu16\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vpblendmw\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 { target ia32 } } } */
+
+__m256h
+__attribute__ ((noinline, noclone))
+test_mm256_permutex2var_ph (__m256h A, __m256i I, __m256h B)
+{
+  return _mm256_permutex2var_ph (A, I, B);
+}
+
+/* { dg-final { scan-assembler-times "vpermt2w\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vpermi2w\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+" 1 { target ia32 } } } */
+
+__m256h
+__attribute__ ((noinline, noclone))
+test_mm256_permutexvar_ph (__m256i A, __m256h B)
+{
+  return _mm256_permutexvar_ph (A, B);
+}
+
+/* { dg-final { scan-assembler-times "vpermw\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+" 1 } } */
+
+__m128h
+__attribute__ ((noinline, noclone))
+test_mm_mask_blend_ph (__mmask8 U, __m128h A, __m128h B)
+{
+  return _mm_mask_blend_ph (U, A, B);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vpblendmw\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 { target ia32 } } } */
+
+__m128h
+__attribute__ ((noinline, noclone))
+test_mm_permutex2var_ph (__m128h A, __m128i I, __m128h B)
+{
+  return _mm_permutex2var_ph (A, I, B);
+}
+
+/* { dg-final { scan-assembler-times "vpermt2w\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vpermi2w\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+" 1 { target ia32 } } } */
+
+__m128h
+__attribute__ ((noinline, noclone))
+test_mm_permutexvar_ph (__m128i A, __m128h B)
+{
+  return _mm_permutexvar_ph (A, B);
+}
+
+/* { dg-final { scan-assembler-times "vpermw\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+" 1 } } */