From patchwork Mon Aug 19 08:56:48 2024
X-Patchwork-Submitter: Haochen Jiang
X-Patchwork-Id: 1973731
From: Haochen Jiang <haochen.jiang@intel.com>
To: gcc-patches@gcc.gnu.org
Cc: hongtao.liu@intel.com, zewei.mo@pitt.edu, ubizjak@gmail.com,
 Levy Hsu, Kong Lingling
Subject: [PATCH 04/12] AVX10.2: Support convert instructions
Date: Mon, 19 Aug 2024 01:56:48 -0700
Message-ID: <20240819085717.193256-5-haochen.jiang@intel.com>
X-Mailer: git-send-email 2.43.5
In-Reply-To: <20240819085717.193256-1-haochen.jiang@intel.com>
References: <20240819085717.193256-1-haochen.jiang@intel.com>

From: Levy Hsu

gcc/ChangeLog:

	* config.gcc: Add avx10_2-512convertintrin.h and
	avx10_2convertintrin.h.
	* config/i386/i386-builtin-types.def: Add new DEF_POINTER_TYPE
	and DEF_FUNCTION_TYPE.
	* config/i386/i386-builtin.def (BDESC): Add new builtins.
	* config/i386/i386-expand.cc (ix86_expand_args_builtin): Handle
	AVX10.2.
	(ix86_expand_round_builtin): Ditto.
	* config/i386/immintrin.h: Include avx10_2-512convertintrin.h,
	avx10_2convertintrin.h.
	* config/i386/sse.md (VHF_AVX10_2): New iterator.
	(avx10_2_cvtne2ps2ph_): New define_insn.
	(vcvt): Ditto.
	(vcvtv8hf): Ditto.
	(*vcvtv8hf): Ditto.
	(vcvtv8hf_mask): Ditto.
	(*vcvtv8hf_mask): Ditto.
	(vcvt): Ditto.
	(vcvthf82ph): Ditto.
	* config/i386/avx10_2-512convertintrin.h: New file.
	* config/i386/avx10_2convertintrin.h: Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx-1.c: Add macros for const.
	* gcc.target/i386/sse-13.c: Ditto.
	* gcc.target/i386/sse-14.c: Ditto.
	* gcc.target/i386/sse-22.c: Ditto.
	* gcc.target/i386/sse-23.c: Ditto.
	* gcc.target/i386/avx10_2-512-convert-1.c: New test.
	* gcc.target/i386/avx10_2-512-vcvt2ps2phx-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vcvtbiasph2bf8-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vcvtbiasph2bf8s-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vcvtbiasph2hf8-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vcvtbiasph2hf8s-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vcvthf82ph-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vcvtne2ph2bf8-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vcvtne2ph2bf8s-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vcvtne2ph2hf8-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vcvtne2ph2hf8s-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vcvtneph2bf8-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vcvtneph2bf8s-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vcvtneph2hf8-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vcvtneph2hf8s-2.c: Ditto.
	* gcc.target/i386/avx10_2-convert-1.c: Ditto.
	* gcc.target/i386/avx10_2-vcvt2ps2phx-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcvtbiasph2bf8-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcvtbiasph2bf8s-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcvtbiasph2hf8-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcvtbiasph2hf8s-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcvthf82ph-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcvtne2ph2bf8-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcvtne2ph2bf8s-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcvtne2ph2hf8-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcvtne2ph2hf8s-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcvtneph2bf8-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcvtneph2bf8s-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcvtneph2hf8-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcvtneph2hf8s-2.c: Ditto.
	* gcc.target/i386/fp8-helper.h: New helper file.

Co-authored-by: Levy Hsu
Co-authored-by: Kong Lingling
---
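A quick usage sketch for reviewers (not part of the patch; assumes a
compiler built with this series and -mavx10.2-512; the function and
buffer names here are illustrative only):

  #include <immintrin.h>

  /* Pack two 16-element float32 vectors into one 32-element float16
     vector with vcvt2ps2phx, then narrow the float16 elements to the
     8-bit hf8 format.  */
  void
  f32_to_hf8 (const float *a, const float *b, unsigned char *dst)
  {
    __m512 va = _mm512_loadu_ps (a);
    __m512 vb = _mm512_loadu_ps (b);
    __m512h vh = _mm512_cvtx2ps_ph (va, vb);
    __m256i v8 = _mm512_cvtneph_phf8 (vh);
    _mm256_storeu_si256 ((__m256i *) dst, v8);
  }

The _round forms take the usual _MM_FROUND_* constants; without
__OPTIMIZE__ they are provided as macros, as in the header below.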
 gcc/config.gcc                                |   3 +-
 gcc/config/i386/avx10_2-512convertintrin.h    | 548 ++++++++++
 gcc/config/i386/avx10_2convertintrin.h        | 978 ++++++++++++++++++
 gcc/config/i386/i386-builtin-types.def        |  21 +
 gcc/config/i386/i386-builtin.def              |  46 +
 gcc/config/i386/i386-expand.cc                |  21 +
 gcc/config/i386/immintrin.h                   |   4 +
 gcc/config/i386/sse.md                        | 235 ++++-
 gcc/testsuite/gcc.target/i386/avx-1.c         |   6 +
 gcc/testsuite/gcc.target/i386/avx-2.c         |   3 +-
 .../gcc.target/i386/avx10_2-512-convert-1.c   | 176 ++++
 .../i386/avx10_2-512-vcvt2ps2phx-2.c          |  51 +
 .../i386/avx10_2-512-vcvtbiasph2bf8-2.c       |  59 ++
 .../i386/avx10_2-512-vcvtbiasph2bf8s-2.c      |  59 ++
 .../i386/avx10_2-512-vcvtbiasph2hf8-2.c       |  59 ++
 .../i386/avx10_2-512-vcvtbiasph2hf8s-2.c      |  59 ++
 .../i386/avx10_2-512-vcvthf82ph-2.c           |  45 +
 .../i386/avx10_2-512-vcvtne2ph2bf8-2.c        |  65 ++
 .../i386/avx10_2-512-vcvtne2ph2bf8s-2.c       |  65 ++
 .../i386/avx10_2-512-vcvtne2ph2hf8-2.c        |  65 ++
 .../i386/avx10_2-512-vcvtne2ph2hf8s-2.c       |  65 ++
 .../i386/avx10_2-512-vcvtneph2bf8-2.c         |  58 ++
 .../i386/avx10_2-512-vcvtneph2bf8s-2.c        |  56 +
 .../i386/avx10_2-512-vcvtneph2hf8-2.c         |  56 +
 .../i386/avx10_2-512-vcvtneph2hf8s-2.c        |  56 +
 .../gcc.target/i386/avx10_2-convert-1.c       | 274 +++++
 .../gcc.target/i386/avx10_2-vcvt2ps2phx-2.c   |  16 +
 .../i386/avx10_2-vcvtbiasph2bf8-2.c           |  16 +
 .../i386/avx10_2-vcvtbiasph2bf8s-2.c          |  16 +
 .../i386/avx10_2-vcvtbiasph2hf8-2.c           |  16 +
 .../i386/avx10_2-vcvtbiasph2hf8s-2.c          |  16 +
 .../gcc.target/i386/avx10_2-vcvthf82ph-2.c    |  16 +
 .../gcc.target/i386/avx10_2-vcvtne2ph2bf8-2.c |  16 +
 .../i386/avx10_2-vcvtne2ph2bf8s-2.c           |  16 +
 .../gcc.target/i386/avx10_2-vcvtne2ph2hf8-2.c |  16 +
 .../i386/avx10_2-vcvtne2ph2hf8s-2.c           |  16 +
 .../gcc.target/i386/avx10_2-vcvtneph2bf8-2.c  |  16 +
 .../gcc.target/i386/avx10_2-vcvtneph2bf8s-2.c |  16 +
 .../gcc.target/i386/avx10_2-vcvtneph2hf8-2.c  |  16 +
 .../gcc.target/i386/avx10_2-vcvtneph2hf8s-2.c |  16 +
 gcc/testsuite/gcc.target/i386/fp8-helper.h    | 135 +++
 gcc/testsuite/gcc.target/i386/sse-13.c        |   6 +
 gcc/testsuite/gcc.target/i386/sse-14.c        |   6 +
 gcc/testsuite/gcc.target/i386/sse-22.c        |   6 +
 gcc/testsuite/gcc.target/i386/sse-23.c        |   6 +
 45 files changed, 3511 insertions(+), 5 deletions(-)
 create mode 100644 gcc/config/i386/avx10_2-512convertintrin.h
 create mode 100644 gcc/config/i386/avx10_2convertintrin.h
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-convert-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vcvt2ps2phx-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtbiasph2bf8-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtbiasph2bf8s-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtbiasph2hf8-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtbiasph2hf8s-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vcvthf82ph-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2bf8-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2bf8s-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2hf8-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2hf8s-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtneph2bf8-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtneph2bf8s-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtneph2hf8-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtneph2hf8s-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-convert-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcvt2ps2phx-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcvtbiasph2bf8-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcvtbiasph2bf8s-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcvtbiasph2hf8-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcvtbiasph2hf8s-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcvthf82ph-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcvtne2ph2bf8-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcvtne2ph2bf8s-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcvtne2ph2hf8-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcvtne2ph2hf8s-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcvtneph2bf8-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcvtneph2bf8s-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcvtneph2hf8-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcvtneph2hf8s-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/fp8-helper.h

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 22353f2d69e..5e9c36a2aad 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -453,7 +453,8 @@ i[34567]86-*-* | x86_64-*-*)
 		       raointintrin.h amxcomplexintrin.h avxvnniint16intrin.h
 		       sm3intrin.h sha512intrin.h sm4intrin.h usermsrintrin.h
 		       avx10_2roundingintrin.h
-		       avx10_2mediaintrin.h avx10_2-512mediaintrin.h"
+		       avx10_2mediaintrin.h avx10_2-512mediaintrin.h
+		       avx10_2convertintrin.h avx10_2-512convertintrin.h"
 		       ;;
 ia64-*-*)
 	extra_headers=ia64intrin.h
diff --git a/gcc/config/i386/avx10_2-512convertintrin.h b/gcc/config/i386/avx10_2-512convertintrin.h
new file mode 100644
index 00000000000..4ad339bbbf9
--- /dev/null
+++ b/gcc/config/i386/avx10_2-512convertintrin.h
@@ -0,0 +1,548 @@
+/* Copyright (C) 2024 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _IMMINTRIN_H_INCLUDED
+#error "Never use <avx10_2-512convertintrin.h> directly; include <immintrin.h> instead."
+#endif // _IMMINTRIN_H_INCLUDED
+
+#ifndef __AVX10_2_512CONVERTINTRIN_H_INCLUDED
+#define __AVX10_2_512CONVERTINTRIN_H_INCLUDED
+
+#ifndef __AVX10_2_512__
+#pragma GCC push_options
+#pragma GCC target("avx10.2-512")
+#define __DISABLE_AVX10_2_512__
+#endif /* __AVX10_2_512__ */
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvtx2ps_ph (__m512 __A, __m512 __B)
+{
+  return (__m512h) __builtin_ia32_vcvt2ps2phx512_mask_round ((__v16sf) __A,
+							     (__v16sf) __B,
+							     (__v32hf)
+							     _mm512_setzero_ph (),
+							     (__mmask32) -1,
+							     _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvtx2ps_ph (__m512h __W, __mmask32 __U, __m512 __A,
+			__m512 __B)
+{
+  return (__m512h) __builtin_ia32_vcvt2ps2phx512_mask_round ((__v16sf) __A,
+							     (__v16sf) __B,
+							     (__v32hf) __W,
+							     (__mmask32) __U,
+							     _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvtx2ps_ph (__mmask32 __U, __m512 __A, __m512 __B)
+{
+  return (__m512h) __builtin_ia32_vcvt2ps2phx512_mask_round ((__v16sf) __A,
+							     (__v16sf) __B,
+							     (__v32hf)
+							     _mm512_setzero_ph (),
+							     (__mmask32) __U,
+							     _MM_FROUND_CUR_DIRECTION);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvtx_round2ps_ph (__m512 __A, __m512 __B, const int __R)
+{
+  return (__m512h) __builtin_ia32_vcvt2ps2phx512_mask_round ((__v16sf) __A,
+							     (__v16sf) __B,
+							     (__v32hf)
+							     _mm512_setzero_ph (),
+							     (__mmask32) -1,
+							     __R);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvtx_round2ps_ph (__m512h __W, __mmask32 __U, __m512 __A,
+			      __m512 __B, const int __R)
+{
+  return (__m512h) __builtin_ia32_vcvt2ps2phx512_mask_round ((__v16sf) __A,
+							     (__v16sf) __B,
+							     (__v32hf) __W,
+							     (__mmask32) __U,
+							     __R);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvtx_round2ps_ph (__mmask32 __U, __m512 __A,
+			       __m512 __B, const int __R)
+{
+  return (__m512h) __builtin_ia32_vcvt2ps2phx512_mask_round ((__v16sf) __A,
+							     (__v16sf) __B,
+							     (__v32hf)
+							     _mm512_setzero_ph (),
+							     (__mmask32) __U,
+							     __R);
+}
+
+#else
+#define _mm512_cvtx_round2ps_ph(A, B, R) \
+  ((__m512h) __builtin_ia32_vcvt2ps2phx512_mask_round ((__v16sf) (A), \
+						       (__v16sf) (B), \
+						       (__v32hf) \
+						       (_mm512_setzero_ph ()), \
+						       (__mmask32) (-1), \
+						       (R)))
+#define _mm512_mask_cvtx_round2ps_ph(W, U, A, B, R) \
+  ((__m512h) __builtin_ia32_vcvt2ps2phx512_mask_round ((__v16sf) (A), \
+						       (__v16sf) (B), \
+						       (__v32hf) (W), \
+						       (__mmask32) (U), \
+						       (R)))
+#define _mm512_maskz_cvtx_round2ps_ph(U, A, B, R) \
+  ((__m512h) __builtin_ia32_vcvt2ps2phx512_mask_round ((__v16sf) (A), \
+						       (__v16sf) (B), \
+						       (__v32hf) \
+						       (_mm512_setzero_ph ()), \
+						       (__mmask32) (U), \
+						       (R)))
+#endif /* __OPTIMIZE__ */
+
+extern __inline__ __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvtbiasph_pbf8 (__m512i __A, __m512h __B)
+{
+  return (__m256i) __builtin_ia32_vcvtbiasph2bf8512_mask ((__v64qi) __A,
+							  (__v32hf) __B,
+							  (__v32qi)(__m256i)
+							  _mm256_undefined_si256 (),
+							  (__mmask32) -1);
+}
+
+extern __inline__ __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvtbiasph_pbf8 (__m256i __W, __mmask32 __U,
+			    __m512i __A, __m512h __B)
+{
return (__m256i) __builtin_ia32_vcvtbiasph2bf8512_mask ((__v64qi) __A, + (__v32hf) __B, + (__v32qi)(__m256i) __W, + (__mmask32) __U); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtbiasph_pbf8 (__mmask32 __U, __m512i __A, __m512h __B) +{ + return (__m256i) __builtin_ia32_vcvtbiasph2bf8512_mask ((__v64qi) __A, + (__v32hf) __B, + (__v32qi)(__m256i) + _mm256_setzero_si256 (), + (__mmask32) __U); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtbiassph_pbf8 (__m512i __A, __m512h __B) +{ + return (__m256i) __builtin_ia32_vcvtbiasph2bf8s512_mask ((__v64qi) __A, + (__v32hf) __B, + (__v32qi)(__m256i) + _mm256_undefined_si256 (), + (__mmask32) -1); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtbiassph_pbf8 (__m256i __W, __mmask32 __U, + __m512i __A, __m512h __B) +{ + return (__m256i) __builtin_ia32_vcvtbiasph2bf8s512_mask ((__v64qi) __A, + (__v32hf) __B, + (__v32qi)(__m256i) __W, + (__mmask32) __U); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtbiassph_pbf8 (__mmask32 __U, __m512i __A, __m512h __B) +{ + return (__m256i) __builtin_ia32_vcvtbiasph2bf8s512_mask ((__v64qi) __A, + (__v32hf) __B, + (__v32qi)(__m256i) + _mm256_setzero_si256 (), + (__mmask32) __U); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtbiasph_phf8 (__m512i __A, __m512h __B) +{ + return (__m256i) __builtin_ia32_vcvtbiasph2hf8512_mask ((__v64qi) __A, + (__v32hf) __B, + (__v32qi)(__m256i) + _mm256_undefined_si256 (), + (__mmask32) -1); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtbiasph_phf8 (__m256i __W, __mmask32 __U, __m512i __A, + __m512h __B) +{ + return (__m256i) __builtin_ia32_vcvtbiasph2hf8512_mask ((__v64qi) __A, + (__v32hf) __B, + (__v32qi)(__m256i) __W, + (__mmask32) __U); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtbiasph_phf8 (__mmask32 __U, __m512i __A, __m512h __B) +{ + return (__m256i) __builtin_ia32_vcvtbiasph2hf8512_mask ((__v64qi) __A, + (__v32hf) __B, + (__v32qi)(__m256i) + _mm256_setzero_si256 (), + (__mmask32) __U); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtbiassph_phf8 (__m512i __A, __m512h __B) +{ + return (__m256i) __builtin_ia32_vcvtbiasph2hf8s512_mask ((__v64qi) __A, + (__v32hf) __B, + (__v32qi)(__m256i) + _mm256_undefined_si256 (), + (__mmask32) -1); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtbiassph_phf8 (__m256i __W, __mmask32 __U, + __m512i __A, __m512h __B) +{ + return (__m256i) __builtin_ia32_vcvtbiasph2hf8s512_mask ((__v64qi) __A, + (__v32hf) __B, + (__v32qi)(__m256i) __W, + (__mmask32) __U); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtbiassph_phf8 (__mmask32 __U, __m512i __A, __m512h __B) +{ + return (__m256i) __builtin_ia32_vcvtbiasph2hf8s512_mask ((__v64qi) __A, + (__v32hf) __B, + (__v32qi)(__m256i) + _mm256_setzero_si256 (), + (__mmask32) __U); +} + +extern __inline__ __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtne2ph_pbf8 (__m512h __A, __m512h __B) +{ + return 
(__m512i) __builtin_ia32_vcvtne2ph2bf8512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v64qi) + _mm512_setzero_si512 (), + (__mmask64) -1); +} + +extern __inline__ __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtne2ph_pbf8 (__m512i __W, __mmask64 __U, + __m512h __A, __m512h __B) +{ + return (__m512i) __builtin_ia32_vcvtne2ph2bf8512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v64qi) __W, + (__mmask64) __U); +} + +extern __inline__ __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtne2ph_pbf8 (__mmask64 __U, __m512h __A, __m512h __B) +{ + return (__m512i) __builtin_ia32_vcvtne2ph2bf8512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v64qi) + _mm512_setzero_si512 (), + (__mmask64) __U); +} + +extern __inline__ __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtnes2ph_pbf8 (__m512h __A, __m512h __B) +{ + return (__m512i) __builtin_ia32_vcvtne2ph2bf8s512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v64qi) + _mm512_setzero_si512 (), + (__mmask64) -1); +} + +extern __inline__ __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtnes2ph_pbf8 (__m512i __W, __mmask64 __U, + __m512h __A, __m512h __B) +{ + return (__m512i) __builtin_ia32_vcvtne2ph2bf8s512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v64qi) __W, + (__mmask64) __U); +} + +extern __inline__ __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtnes2ph_pbf8 (__mmask64 __U, __m512h __A, __m512h __B) +{ + return (__m512i) __builtin_ia32_vcvtne2ph2bf8s512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v64qi) + _mm512_setzero_si512 (), + (__mmask64) __U); +} + +extern __inline__ __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtne2ph_phf8 (__m512h __A, __m512h __B) +{ + return (__m512i) __builtin_ia32_vcvtne2ph2hf8512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v64qi) + _mm512_setzero_si512 (), + (__mmask64) -1); +} + +extern __inline__ __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtne2ph_phf8 (__m512i __W, __mmask64 __U, + __m512h __A, __m512h __B) +{ + return (__m512i) __builtin_ia32_vcvtne2ph2hf8512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v64qi) __W, + (__mmask64) __U); +} + +extern __inline__ __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtne2ph_phf8 (__mmask64 __U, __m512h __A, __m512h __B) +{ + return (__m512i) __builtin_ia32_vcvtne2ph2hf8512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v64qi) + _mm512_setzero_si512 (), + (__mmask64) __U); +} + +extern __inline__ __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtnes2ph_phf8 (__m512h __A, __m512h __B) +{ + return (__m512i) __builtin_ia32_vcvtne2ph2hf8s512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v64qi) + _mm512_setzero_si512 (), + (__mmask64) -1); +} + +extern __inline__ __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtnes2ph_phf8 (__m512i __W, __mmask64 __U, + __m512h __A, __m512h __B) +{ + return (__m512i) __builtin_ia32_vcvtne2ph2hf8s512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v64qi) __W, + (__mmask64) __U); +} + +extern __inline__ __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtnes2ph_phf8 (__mmask64 __U, __m512h __A, __m512h __B) +{ + return (__m512i) __builtin_ia32_vcvtne2ph2hf8s512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v64qi) + 
_mm512_setzero_si512 (), + (__mmask64) __U); +} + +extern __inline__ __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvthf8_ph (__m256i __A) +{ + return (__m512h) __builtin_ia32_vcvthf82ph512_mask ((__v32qi) __A, + (__v32hf) (__m512h) + _mm512_undefined_ph (), + (__mmask32) -1); +} + +extern __inline__ __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvthf8_ph (__m512h __W, __mmask32 __U, __m256i __A) +{ + return (__m512h) __builtin_ia32_vcvthf82ph512_mask ((__v32qi) __A, + (__v32hf) (__m512h) __W, + (__mmask32) __U); +} + +extern __inline__ __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvthf8_ph (__mmask32 __U, __m256i __A) +{ + return (__m512h) __builtin_ia32_vcvthf82ph512_mask ((__v32qi) __A, + (__v32hf) (__m512h) + _mm512_setzero_ph (), + (__mmask32) __U); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtneph_pbf8 (__m512h __A) +{ + return (__m256i) __builtin_ia32_vcvtneph2bf8512_mask ((__v32hf) __A, + (__v32qi) (__m256i) + _mm256_undefined_si256 (), + (__mmask32) -1); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtneph_pbf8 (__m256i __W, __mmask32 __U, __m512h __A) +{ + return (__m256i) __builtin_ia32_vcvtneph2bf8512_mask ((__v32hf) __A, + (__v32qi) (__m256i) __W, + (__mmask32) __U); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtneph_pbf8 (__mmask32 __U, __m512h __A) +{ + return (__m256i) __builtin_ia32_vcvtneph2bf8512_mask ((__v32hf) __A, + (__v32qi) (__m256i) + _mm256_setzero_si256 (), + (__mmask32) __U); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtnesph_pbf8 (__m512h __A) +{ + return (__m256i) __builtin_ia32_vcvtneph2bf8s512_mask ((__v32hf) __A, + (__v32qi) (__m256i) + _mm256_undefined_si256 (), + (__mmask32) -1); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtnesph_pbf8 (__m256i __W, __mmask32 __U, __m512h __A) +{ + return (__m256i) __builtin_ia32_vcvtneph2bf8s512_mask ((__v32hf) __A, + (__v32qi) (__m256i) __W, + (__mmask32) __U); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtnesph_pbf8 (__mmask32 __U, __m512h __A) +{ + return (__m256i) __builtin_ia32_vcvtneph2bf8s512_mask ((__v32hf) __A, + (__v32qi) (__m256i) + _mm256_setzero_si256 (), + (__mmask32) __U); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtneph_phf8 (__m512h __A) +{ + return (__m256i) __builtin_ia32_vcvtneph2hf8512_mask ((__v32hf) __A, + (__v32qi) (__m256i) + _mm256_undefined_si256 (), + (__mmask32) -1); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtneph_phf8 (__m256i __W, __mmask32 __U, __m512h __A) +{ + return (__m256i) __builtin_ia32_vcvtneph2hf8512_mask ((__v32hf) __A, + (__v32qi)(__m256i) __W, + (__mmask32) __U); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtneph_phf8 (__mmask32 __U, __m512h __A) +{ + return (__m256i) __builtin_ia32_vcvtneph2hf8512_mask ((__v32hf) __A, + (__v32qi) (__m256i) + _mm256_setzero_si256 (), + (__mmask32) __U); +} + +extern __inline__ __m256i 
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtnesph_phf8 (__m512h __A) +{ + return (__m256i) __builtin_ia32_vcvtneph2hf8s512_mask ((__v32hf) __A, + (__v32qi) (__m256i) + _mm256_undefined_si256 (), + (__mmask32) -1); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtnesph_phf8 (__m256i __W, __mmask32 __U, __m512h __A) +{ + return (__m256i) __builtin_ia32_vcvtneph2hf8s512_mask ((__v32hf) __A, + (__v32qi) (__m256i) __W, + (__mmask32) __U); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtnesph_phf8 (__mmask32 __U, __m512h __A) +{ + return (__m256i) __builtin_ia32_vcvtneph2hf8s512_mask ((__v32hf) __A, + (__v32qi) (__m256i) + _mm256_setzero_si256 (), + (__mmask32) __U); +} + +#ifdef __DISABLE_AVX10_2_512__ +#undef __DISABLE_AVX10_2_512__ +#pragma GCC pop_options +#endif /* __DISABLE_AVX10_2_512__ */ + +#endif /* __AVX10_2_512CONVERTINTRIN_H_INCLUDED */ diff --git a/gcc/config/i386/avx10_2convertintrin.h b/gcc/config/i386/avx10_2convertintrin.h new file mode 100644 index 00000000000..ac62d1290a5 --- /dev/null +++ b/gcc/config/i386/avx10_2convertintrin.h @@ -0,0 +1,978 @@ +/* Copyright (C) 2024 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + +#if !defined _IMMINTRIN_H_INCLUDED +#error "Never use directly; include instead." 
+#endif + +#ifndef _AVX10_2CONVERTINTRIN_H_INCLUDED +#define _AVX10_2CONVERTINTRIN_H_INCLUDED + +#if !defined(__AVX10_2_256__) +#pragma GCC push_options +#pragma GCC target("avx10.2") +#define __DISABLE_AVX10_2_256__ +#endif /* __AVX10_2__ */ + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtx2ps_ph (__m128 __A, __m128 __B) +{ + return (__m128h) __builtin_ia32_vcvt2ps2phx128_mask ((__v4sf) __A, + (__v4sf) __B, + (__v8hf) + _mm_setzero_ph (), + (__mmask8) -1); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvtx2ps_ph (__m128h __W, __mmask8 __U, __m128 __A, __m128 __B) +{ + return (__m128h) __builtin_ia32_vcvt2ps2phx128_mask ((__v4sf) __A, + (__v4sf) __B, + (__v8hf) __W, + (__mmask8) __U); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvtx2ps_ph (__mmask8 __U, __m128 __A, __m128 __B) +{ + return (__m128h) __builtin_ia32_vcvt2ps2phx128_mask ((__v4sf) __A, + (__v4sf) __B, + (__v8hf) + _mm_setzero_ph (), + (__mmask8) __U); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtx2ps_ph (__m256 __A, __m256 __B) +{ + return (__m256h) __builtin_ia32_vcvt2ps2phx256_mask_round ((__v8sf) __A, + (__v8sf) __B, + (__v16hf) + _mm256_setzero_ph (), + (__mmask16) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvtx2ps_ph (__m256h __W, __mmask16 __U, __m256 __A, __m256 __B) +{ + return (__m256h) __builtin_ia32_vcvt2ps2phx256_mask_round ((__v8sf) __A, + (__v8sf) __B, + (__v16hf) __W, + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvtx2ps_ph ( __mmask16 __U, __m256 __A, __m256 __B) +{ + return (__m256h) __builtin_ia32_vcvt2ps2phx256_mask_round ((__v8sf) __A, + (__v8sf) __B, + (__v16hf) + _mm256_setzero_ph (), + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +#ifdef __OPTIMIZE__ +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtx_round2ps_ph (__m256 __A, __m256 __B, const int __R) +{ + return (__m256h) __builtin_ia32_vcvt2ps2phx256_mask_round ((__v8sf) __A, + (__v8sf) __B, + (__v16hf) + _mm256_setzero_ph (), + (__mmask16) -1, + __R); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvtx_round2ps_ph (__m256h __W, __mmask16 __U, __m256 __A, + __m256 __B, const int __R) +{ + return (__m256h) __builtin_ia32_vcvt2ps2phx256_mask_round ((__v8sf) __A, + (__v8sf) __B, + (__v16hf) __W, + (__mmask16) __U, + __R); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvtx_round2ps_ph (__mmask16 __U, __m256 __A, + __m256 __B, const int __R) +{ + return (__m256h) __builtin_ia32_vcvt2ps2phx256_mask_round ((__v8sf) __A, + (__v8sf) __B, + (__v16hf) + _mm256_setzero_ph (), + (__mmask16) __U, + __R); +} + +#else +#define _mm256_cvtx_round2ps_ph(A, B, R) \ + ((__m256h) __builtin_ia32_vcvt2ps2phx256_mask_round ((__v8sf) (A), \ + (__v8sf) (B), \ + (__v16hf) \ + (_mm256_setzero_ph ()), \ + (__mmask16) (-1), \ + (R))) + +#define _mm256_mask_cvtx_round2ps_ph(W, U, A, B, R) \ + ((__m256h) __builtin_ia32_vcvt2ps2phx256_mask_round ((__v8sf) (A), \ + (__v8sf) (B), \ + (__v16hf) (W), \ + (__mmask16) (U), \ + (R))) + +#define 
_mm256_maskz_cvtx_round2ps_ph(U, A, B, R) \ + ((__m256h) __builtin_ia32_vcvt2ps2phx256_mask_round ((__v8sf) (A), \ + (__v8sf) (B), \ + (__v16hf) \ + (_mm256_setzero_ph ()), \ + (__mmask16) (U), \ + (R))) +#endif /* __OPTIMIZE__ */ + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtbiasph_pbf8 (__m128i __A, __m128h __B) +{ + return (__m128i) __builtin_ia32_vcvtbiasph2bf8128 ((__v16qi) __A, + (__v8hf) __B); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvtbiasph_pbf8 (__m128i __W, __mmask8 __U, __m128i __A, + __m128h __B) +{ + return (__m128i) __builtin_ia32_vcvtbiasph2bf8128_mask ((__v16qi) __A, + (__v8hf) __B, + (__v16qi)(__m128i) __W, + (__mmask8) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvtbiasph_pbf8 (__mmask8 __U, __m128i __A, __m128h __B) +{ + return (__m128i) __builtin_ia32_vcvtbiasph2bf8128_mask ((__v16qi) __A, + (__v8hf) __B, + (__v16qi)(__m128i) + _mm_setzero_si128 (), + (__mmask8) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtbiasph_pbf8 (__m256i __A, __m256h __B) +{ + return (__m128i) __builtin_ia32_vcvtbiasph2bf8256_mask ((__v32qi) __A, + (__v16hf) __B, + (__v16qi)(__m128i) + _mm_undefined_si128 (), + (__mmask16) -1); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvtbiasph_pbf8 (__m128i __W, __mmask16 __U, __m256i __A, + __m256h __B) +{ + return (__m128i) __builtin_ia32_vcvtbiasph2bf8256_mask ((__v32qi) __A, + (__v16hf) __B, + (__v16qi)(__m128i) __W, + (__mmask16) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvtbiasph_pbf8 (__mmask16 __U, __m256i __A, __m256h __B) +{ + return (__m128i) __builtin_ia32_vcvtbiasph2bf8256_mask ((__v32qi) __A, + (__v16hf) __B, + (__v16qi)(__m128i) + _mm_setzero_si128 (), + (__mmask16) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtbiassph_pbf8 (__m128i __A, __m128h __B) +{ + return (__m128i) __builtin_ia32_vcvtbiasph2bf8s128 ((__v16qi) __A, + (__v8hf) __B); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvtbiassph_pbf8 (__m128i __W, __mmask8 __U, + __m128i __A, __m128h __B) +{ + return (__m128i) __builtin_ia32_vcvtbiasph2bf8s128_mask ((__v16qi) __A, + (__v8hf) __B, + (__v16qi)(__m128i) __W, + (__mmask8) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvtbiassph_pbf8 (__mmask8 __U, __m128i __A, __m128h __B) +{ + return (__m128i) __builtin_ia32_vcvtbiasph2bf8s128_mask ((__v16qi) __A, + (__v8hf) __B, + (__v16qi)(__m128i) + _mm_setzero_si128 (), + (__mmask8) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtbiassph_pbf8 (__m256i __A, __m256h __B) +{ + return (__m128i) __builtin_ia32_vcvtbiasph2bf8s256_mask ((__v32qi) __A, + (__v16hf) __B, + (__v16qi)(__m128i) + _mm_undefined_si128 (), + (__mmask16) -1); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvtbiassph_pbf8 (__m128i __W, __mmask16 __U, + __m256i __A, __m256h __B) +{ + return (__m128i) __builtin_ia32_vcvtbiasph2bf8s256_mask ((__v32qi) __A, + (__v16hf) __B, + 
(__v16qi)(__m128i) __W, + (__mmask16) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvtbiassph_pbf8 (__mmask16 __U, __m256i __A, __m256h __B) +{ + return (__m128i) __builtin_ia32_vcvtbiasph2bf8s256_mask ((__v32qi) __A, + (__v16hf) __B, + (__v16qi)(__m128i) + _mm_setzero_si128 (), + (__mmask16) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtbiasph_phf8 (__m128i __A, __m128h __B) +{ + return (__m128i) __builtin_ia32_vcvtbiasph2hf8128 ((__v16qi) __A, + (__v8hf) __B); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvtbiasph_phf8 (__m128i __W, __mmask8 __U, __m128i __A, + __m128h __B) +{ + return (__m128i) __builtin_ia32_vcvtbiasph2hf8128_mask ((__v16qi) __A, + (__v8hf) __B, + (__v16qi)(__m128i) __W, + (__mmask8) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvtbiasph_phf8 (__mmask8 __U, __m128i __A, __m128h __B) +{ + return (__m128i) __builtin_ia32_vcvtbiasph2hf8128_mask ((__v16qi) __A, + (__v8hf) __B, + (__v16qi)(__m128i) + _mm_setzero_si128 (), + (__mmask8) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtbiasph_phf8 (__m256i __A, __m256h __B) +{ + return (__m128i) __builtin_ia32_vcvtbiasph2hf8256_mask ((__v32qi) __A, + (__v16hf) __B, + (__v16qi)(__m128i) + _mm_undefined_si128 (), + (__mmask16) -1); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvtbiasph_phf8 (__m128i __W, __mmask16 __U, + __m256i __A, __m256h __B) +{ + return (__m128i) __builtin_ia32_vcvtbiasph2hf8256_mask ((__v32qi) __A, + (__v16hf) __B, + (__v16qi)(__m128i) __W, + (__mmask16) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvtbiasph_phf8 (__mmask16 __U, __m256i __A, __m256h __B) +{ + return (__m128i) __builtin_ia32_vcvtbiasph2hf8256_mask ((__v32qi) __A, + (__v16hf) __B, + (__v16qi)(__m128i) + _mm_setzero_si128 (), + (__mmask16) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtbiassph_phf8 (__m128i __A, __m128h __B) +{ + return (__m128i) __builtin_ia32_vcvtbiasph2hf8s128 ((__v16qi) __A, + (__v8hf) __B); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvtbiassph_phf8 (__m128i __W, __mmask8 __U, + __m128i __A, __m128h __B) +{ + return (__m128i) __builtin_ia32_vcvtbiasph2hf8s128_mask ((__v16qi) __A, + (__v8hf) __B, + (__v16qi)(__m128i) __W, + (__mmask8) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvtbiassph_phf8 (__mmask8 __U, __m128i __A, __m128h __B) +{ + return (__m128i) __builtin_ia32_vcvtbiasph2hf8s128_mask ((__v16qi) __A, + (__v8hf) __B, + (__v16qi)(__m128i) + _mm_setzero_si128 (), + (__mmask8) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtbiassph_phf8 (__m256i __A, __m256h __B) +{ + return (__m128i) __builtin_ia32_vcvtbiasph2hf8s256_mask ((__v32qi) __A, + (__v16hf) __B, + (__v16qi)(__m128i) + _mm_undefined_si128 (), + (__mmask16) -1); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvtbiassph_phf8 
(__m128i __W, __mmask16 __U, + __m256i __A, __m256h __B) +{ + return (__m128i) __builtin_ia32_vcvtbiasph2hf8s256_mask ((__v32qi) __A, + (__v16hf) __B, + (__v16qi)(__m128i) __W, + (__mmask16) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvtbiassph_phf8 (__mmask16 __U, __m256i __A, __m256h __B) +{ + return (__m128i) __builtin_ia32_vcvtbiasph2hf8s256_mask ((__v32qi) __A, + (__v16hf) __B, + (__v16qi)(__m128i) + _mm_setzero_si128 (), + (__mmask16) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtne2ph_pbf8 (__m128h __A, __m128h __B) +{ + return (__m128i) __builtin_ia32_vcvtne2ph2bf8128_mask ((__v8hf) __A, + (__v8hf) __B, + (__v16qi) + _mm_setzero_si128 (), + (__mmask16) -1); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvtne2ph_pbf8 (__m128i __W, __mmask16 __U, + __m128h __A, __m128h __B) +{ + return (__m128i) __builtin_ia32_vcvtne2ph2bf8128_mask ((__v8hf) __A, + (__v8hf) __B, + (__v16qi) __W, + (__mmask16) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvtne2ph_pbf8 (__mmask16 __U, __m128h __A, __m128h __B) +{ + return (__m128i) __builtin_ia32_vcvtne2ph2bf8128_mask ((__v8hf) __A, + (__v8hf) __B, + (__v16qi) + _mm_setzero_si128 (), + (__mmask16) __U); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtne2ph_pbf8 (__m256h __A, __m256h __B) +{ + return (__m256i) __builtin_ia32_vcvtne2ph2bf8256_mask ((__v16hf) __A, + (__v16hf) __B, + (__v32qi) + _mm256_setzero_si256 (), + (__mmask32) -1); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvtne2ph_pbf8 (__m256i __W, __mmask32 __U, + __m256h __A, __m256h __B) +{ + return (__m256i) __builtin_ia32_vcvtne2ph2bf8256_mask ((__v16hf) __A, + (__v16hf) __B, + (__v32qi) __W, + (__mmask32) __U); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvtne2ph_pbf8 (__mmask32 __U, __m256h __A, __m256h __B) +{ + return (__m256i) __builtin_ia32_vcvtne2ph2bf8256_mask ((__v16hf) __A, + (__v16hf) __B, + (__v32qi) + _mm256_setzero_si256 (), + (__mmask32) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtnes2ph_pbf8 (__m128h __A, __m128h __B) +{ + return (__m128i) __builtin_ia32_vcvtne2ph2bf8s128_mask ((__v8hf) __A, + (__v8hf) __B, + (__v16qi) + _mm_setzero_si128 (), + (__mmask16) -1); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvtnes2ph_pbf8 (__m128i __W, __mmask16 __U, + __m128h __A, __m128h __B) +{ + return (__m128i) __builtin_ia32_vcvtne2ph2bf8s128_mask ((__v8hf) __A, + (__v8hf) __B, + (__v16qi) __W, + (__mmask16) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvtnes2ph_pbf8 (__mmask16 __U, __m128h __A, __m128h __B) +{ + return (__m128i) __builtin_ia32_vcvtne2ph2bf8s128_mask ((__v8hf) __A, + (__v8hf) __B, + (__v16qi) + _mm_setzero_si128 (), + (__mmask16) __U); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtnes2ph_pbf8 (__m256h __A, __m256h __B) +{ + return (__m256i) __builtin_ia32_vcvtne2ph2bf8s256_mask ((__v16hf) __A, + (__v16hf) __B, + 
(__v32qi) + _mm256_setzero_si256 (), + (__mmask32) -1); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvtnes2ph_pbf8 (__m256i __W, __mmask32 __U, + __m256h __A, __m256h __B) +{ + return (__m256i) __builtin_ia32_vcvtne2ph2bf8s256_mask ((__v16hf) __A, + (__v16hf) __B, + (__v32qi) __W, + (__mmask32) __U); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvtnes2ph_pbf8 (__mmask32 __U, __m256h __A, __m256h __B) +{ + return (__m256i) __builtin_ia32_vcvtne2ph2bf8s256_mask ((__v16hf) __A, + (__v16hf) __B, + (__v32qi) + _mm256_setzero_si256 (), + (__mmask32) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtne2ph_phf8 (__m128h __A, __m128h __B) +{ + return (__m128i) __builtin_ia32_vcvtne2ph2hf8128_mask ((__v8hf) __A, + (__v8hf) __B, + (__v16qi) + _mm_setzero_si128 (), + (__mmask16) -1); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvtne2ph_phf8 (__m128i __W, __mmask16 __U, + __m128h __A, __m128h __B) +{ + return (__m128i) __builtin_ia32_vcvtne2ph2hf8128_mask ((__v8hf) __A, + (__v8hf) __B, + (__v16qi) __W, + (__mmask16) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvtne2ph_phf8 (__mmask16 __U, __m128h __A, __m128h __B) +{ + return (__m128i) __builtin_ia32_vcvtne2ph2hf8128_mask ((__v8hf) __A, + (__v8hf) __B, + (__v16qi) + _mm_setzero_si128 (), + (__mmask16) __U); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtne2ph_phf8 (__m256h __A, __m256h __B) +{ + return (__m256i) __builtin_ia32_vcvtne2ph2hf8256_mask ((__v16hf) __A, + (__v16hf) __B, + (__v32qi) + _mm256_setzero_si256 (), + (__mmask32) -1); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvtne2ph_phf8 (__m256i __W, __mmask32 __U, + __m256h __A, __m256h __B) +{ + return (__m256i) __builtin_ia32_vcvtne2ph2hf8256_mask ((__v16hf) __A, + (__v16hf) __B, + (__v32qi) __W, + (__mmask32) __U); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvtne2ph_phf8 (__mmask32 __U, __m256h __A, __m256h __B) +{ + return (__m256i) __builtin_ia32_vcvtne2ph2hf8256_mask ((__v16hf) __A, + (__v16hf) __B, + (__v32qi) + _mm256_setzero_si256 (), + (__mmask32) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtnes2ph_phf8 (__m128h __A, __m128h __B) +{ + return (__m128i) __builtin_ia32_vcvtne2ph2hf8s128_mask ((__v8hf) __A, + (__v8hf) __B, + (__v16qi) + _mm_setzero_si128 (), + (__mmask16) -1); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvtnes2ph_phf8 (__m128i __W, __mmask16 __U, + __m128h __A, __m128h __B) +{ + return (__m128i) __builtin_ia32_vcvtne2ph2hf8s128_mask ((__v8hf) __A, + (__v8hf) __B, + (__v16qi) __W, + (__mmask16) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvtnes2ph_phf8 (__mmask16 __U, __m128h __A, __m128h __B) +{ + return (__m128i) __builtin_ia32_vcvtne2ph2hf8s128_mask ((__v8hf) __A, + (__v8hf) __B, + (__v16qi) + _mm_setzero_si128 (), + (__mmask16) __U); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, 
__artificial__)) +_mm256_cvtnes2ph_phf8 (__m256h __A, __m256h __B) +{ + return (__m256i) __builtin_ia32_vcvtne2ph2hf8s256_mask ((__v16hf) __A, + (__v16hf) __B, + (__v32qi) + _mm256_setzero_si256 (), + (__mmask32) -1); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvtnes2ph_phf8 (__m256i __W, __mmask32 __U, + __m256h __A, __m256h __B) +{ + return (__m256i) __builtin_ia32_vcvtne2ph2hf8s256_mask ((__v16hf) __A, + (__v16hf) __B, + (__v32qi) __W, + (__mmask32) __U); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvtnes2ph_phf8 (__mmask32 __U, __m256h __A, __m256h __B) +{ + return (__m256i) __builtin_ia32_vcvtne2ph2hf8s256_mask ((__v16hf) __A, + (__v16hf) __B, + (__v32qi) + _mm256_setzero_si256 (), + (__mmask32) __U); +} + +extern __inline__ __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvthf8_ph (__m128i __A) +{ + return (__m128h) __builtin_ia32_vcvthf82ph128_mask ((__v16qi) __A, + (__v8hf)(__m128h) + _mm_undefined_ph (), + (__mmask8) -1); +} + +extern __inline__ __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvthf8_ph (__m128h __W, __mmask8 __U, __m128i __A) +{ + return (__m128h) __builtin_ia32_vcvthf82ph128_mask ((__v16qi) __A, + (__v8hf)(__m128h) __W, + (__mmask8) __U); +} + +extern __inline__ __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvthf8_ph (__mmask8 __U, __m128i __A) +{ + return (__m128h) __builtin_ia32_vcvthf82ph128_mask ((__v16qi) __A, + (__v8hf)(__m128h) + _mm_setzero_ph (), + (__mmask8) __U); +} + +extern __inline__ __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvthf8_ph (__m128i __A) +{ + return (__m256h) __builtin_ia32_vcvthf82ph256_mask ((__v16qi) __A, + (__v16hf)(__m256h) + _mm256_undefined_ph (), + (__mmask16) -1); +} + +extern __inline__ __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvthf8_ph (__m256h __W, __mmask16 __U, __m128i __A) +{ + return (__m256h) __builtin_ia32_vcvthf82ph256_mask ((__v16qi) __A, + (__v16hf)(__m256h) __W, + (__mmask16) __U); +} + +extern __inline__ __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvthf8_ph (__mmask16 __U, __m128i __A) +{ + return (__m256h) __builtin_ia32_vcvthf82ph256_mask ((__v16qi) __A, + (__v16hf)(__m256h) + _mm256_setzero_ph (), + (__mmask16) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtneph_pbf8 (__m128h __A) +{ + return (__m128i) __builtin_ia32_vcvtneph2bf8128_mask ((__v8hf) __A, + (__v16qi)(__m128i) + _mm_undefined_si128 (), + (__mmask8) -1); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvtneph_pbf8 (__m128i __W, __mmask8 __U, __m128h __A) +{ + return (__m128i) __builtin_ia32_vcvtneph2bf8128_mask ((__v8hf) __A, + (__v16qi)(__m128i) __W, + (__mmask8) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvtneph_pbf8 (__mmask8 __U, __m128h __A) +{ + return (__m128i) __builtin_ia32_vcvtneph2bf8128_mask ((__v8hf) __A, + (__v16qi)(__m128i) + _mm_setzero_si128 (), + (__mmask8) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtneph_pbf8 (__m256h __A) +{ + return (__m128i) 
__builtin_ia32_vcvtneph2bf8256_mask ((__v16hf) __A, + (__v16qi)(__m128i) + _mm_undefined_si128 (), + (__mmask16) -1); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvtneph_pbf8 (__m128i __W, __mmask16 __U, __m256h __A) +{ + return (__m128i) __builtin_ia32_vcvtneph2bf8256_mask ((__v16hf) __A, + (__v16qi)(__m128i) __W, + (__mmask16) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvtneph_pbf8 (__mmask16 __U, __m256h __A) +{ + return (__m128i) __builtin_ia32_vcvtneph2bf8256_mask ((__v16hf) __A, + (__v16qi)(__m128i) + _mm_setzero_si128 (), + (__mmask16) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtnesph_pbf8 (__m128h __A) +{ + return (__m128i) __builtin_ia32_vcvtneph2bf8s128_mask ((__v8hf) __A, + (__v16qi)(__m128i) + _mm_undefined_si128 (), + (__mmask8) -1); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvtnesph_pbf8 (__m128i __W, __mmask8 __U, __m128h __A) +{ + return (__m128i) __builtin_ia32_vcvtneph2bf8s128_mask ((__v8hf) __A, + (__v16qi)(__m128i) __W, + (__mmask8) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvtnesph_pbf8 (__mmask8 __U, __m128h __A) +{ + return (__m128i) __builtin_ia32_vcvtneph2bf8s128_mask ((__v8hf) __A, + (__v16qi)(__m128i) + _mm_setzero_si128 (), + (__mmask8) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtnesph_pbf8 (__m256h __A) +{ + return (__m128i) __builtin_ia32_vcvtneph2bf8s256_mask ((__v16hf) __A, + (__v16qi)(__m128i) + _mm_undefined_si128 (), + (__mmask16) -1); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvtnesph_pbf8 (__m128i __W, __mmask16 __U, __m256h __A) +{ + return (__m128i) __builtin_ia32_vcvtneph2bf8s256_mask ((__v16hf) __A, + (__v16qi)(__m128i) __W, + (__mmask16) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvtnesph_pbf8 (__mmask16 __U, __m256h __A) +{ + return (__m128i) __builtin_ia32_vcvtneph2bf8s256_mask ((__v16hf) __A, + (__v16qi)(__m128i) + _mm_setzero_si128 (), + (__mmask16) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtneph_phf8 (__m128h __A) +{ + return (__m128i) __builtin_ia32_vcvtneph2hf8128_mask ((__v8hf) __A, + (__v16qi)(__m128i) + _mm_undefined_si128 (), + (__mmask8) -1); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvtneph_phf8 (__m128i __W, __mmask8 __U, __m128h __A) +{ + return (__m128i) __builtin_ia32_vcvtneph2hf8128_mask ((__v8hf) __A, + (__v16qi)(__m128i) __W, + (__mmask8) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvtneph_phf8 (__mmask8 __U, __m128h __A) +{ + return (__m128i) __builtin_ia32_vcvtneph2hf8128_mask ((__v8hf) __A, + (__v16qi)(__m128i) + _mm_setzero_si128 (), + (__mmask8) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtneph_phf8 (__m256h __A) +{ + return (__m128i) __builtin_ia32_vcvtneph2hf8256_mask ((__v16hf) __A, + (__v16qi)(__m128i) + _mm_undefined_si128 (), + (__mmask16) -1); +} + +extern 
__inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvtneph_phf8 (__m128i __W, __mmask16 __U, __m256h __A) +{ + return (__m128i) __builtin_ia32_vcvtneph2hf8256_mask ((__v16hf) __A, + (__v16qi)(__m128i) __W, + (__mmask16) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvtneph_phf8 (__mmask16 __U, __m256h __A) +{ + return (__m128i) __builtin_ia32_vcvtneph2hf8256_mask ((__v16hf) __A, + (__v16qi)(__m128i) + _mm_setzero_si128 (), + (__mmask16) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtnesph_phf8 (__m128h __A) +{ + return (__m128i) __builtin_ia32_vcvtneph2hf8s128_mask ((__v8hf) __A, + (__v16qi)(__m128i) + _mm_undefined_si128 (), + (__mmask8) -1); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvtnesph_phf8 (__m128i __W, __mmask8 __U, __m128h __A) +{ + return (__m128i) __builtin_ia32_vcvtneph2hf8s128_mask ((__v8hf) __A, + (__v16qi)(__m128i) __W, + (__mmask8) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvtnesph_phf8 (__mmask8 __U, __m128h __A) +{ + return (__m128i) __builtin_ia32_vcvtneph2hf8s128_mask ((__v8hf) __A, + (__v16qi)(__m128i) + _mm_setzero_si128 (), + (__mmask8) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtnesph_phf8 (__m256h __A) +{ + return (__m128i) __builtin_ia32_vcvtneph2hf8s256_mask ((__v16hf) __A, + (__v16qi)(__m128i) + _mm_undefined_si128 (), + (__mmask16) -1); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvtnesph_phf8 (__m128i __W, __mmask16 __U, __m256h __A) +{ + return (__m128i) __builtin_ia32_vcvtneph2hf8s256_mask ((__v16hf) __A, + (__v16qi)(__m128i) __W, + (__mmask16) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvtnesph_phf8 (__mmask16 __U, __m256h __A) +{ + return (__m128i) __builtin_ia32_vcvtneph2hf8s256_mask ((__v16hf) __A, + (__v16qi)(__m128i) + _mm_setzero_si128 (), + (__mmask16) __U); +} + +#ifdef __DISABLE_AVX10_2_256__ +#undef __DISABLE_AVX10_2_256__ +#pragma GCC pop_options +#endif /* __DISABLE_AVX10_2_256__ */ + +#endif /* __AVX10_2CONVERTINTRIN_H_INCLUDED */ diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def index f5fa2544cc5..63b65846c8f 100644 --- a/gcc/config/i386/i386-builtin-types.def +++ b/gcc/config/i386/i386-builtin-types.def @@ -1453,3 +1453,24 @@ DEF_FUNCTION_TYPE (V16HF, V16HF, INT, V16HF, UHI, INT) DEF_FUNCTION_TYPE (V8SF, V8SF, INT, V8SF, UQI, INT) DEF_FUNCTION_TYPE (V4DF, V4DF, V4DF, INT, V4DF, UQI, INT) DEF_FUNCTION_TYPE (V8SF, V8SF, V8SF, INT, V8SF, UQI, INT) +DEF_FUNCTION_TYPE (V32HF, V16SF, V16SF, V32HF, USI, INT) +DEF_FUNCTION_TYPE (V16HF, V8SF, V8SF, V16HF, UHI, INT) +DEF_FUNCTION_TYPE (V32HF, V16SF, V16SF, V32HF, USI) +DEF_FUNCTION_TYPE (V16HF, V8SF, V8SF, V16HF, UHI) +DEF_FUNCTION_TYPE (V8HF, V4SF, V4SF, V8HF, UQI) +DEF_FUNCTION_TYPE (V16QI, V16QI, V8HF) +DEF_FUNCTION_TYPE (V16QI, V16QI, V8HF, V16QI, UHI) +DEF_FUNCTION_TYPE (V16QI, V32QI, V16HF, V16QI, UHI) +DEF_FUNCTION_TYPE (V32QI, V64QI, V32HF, V32QI, USI) +DEF_FUNCTION_TYPE (V64QI, V64QI, V32HF, V32HF) +DEF_FUNCTION_TYPE (V32HF, V32QI, V32HF, USI) +DEF_FUNCTION_TYPE (V32QI, V32QI, V16HF, V16HF) 
+DEF_FUNCTION_TYPE (V16QI, V16QI, V8HF, V8HF)
+DEF_FUNCTION_TYPE (V8HF, V16QI, V8HF, UQI)
+DEF_FUNCTION_TYPE (V16HF, V16QI, V16HF, UHI)
+DEF_FUNCTION_TYPE (V16QI, V8HF, V8HF, V16QI, UHI)
+DEF_FUNCTION_TYPE (V32QI, V16HF, V16HF, V32QI, USI)
+DEF_FUNCTION_TYPE (V64QI, V32HF, V32HF, V64QI, UDI)
+DEF_FUNCTION_TYPE (V16QI, V8HF, V16QI, UQI)
+DEF_FUNCTION_TYPE (V16QI, V16HF, V16QI, UHI)
+DEF_FUNCTION_TYPE (V32QI, V32HF, V32QI, USI)
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index cdf28cd261c..6f5ab32dd0d 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -3115,6 +3115,50 @@ BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_mpsadbw, "__builtin_ia3
 BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_mpsadbw_mask, "__builtin_ia32_mpsadbw512_mask", IX86_BUILTIN_VMPSADBW_V32HI_MASK, UNKNOWN, (int) V32HI_FTYPE_V64QI_V64QI_INT_V32HI_USI)
 BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx2_mpsadbw_mask, "__builtin_ia32_mpsadbw256_mask", IX86_BUILTIN_VMPSADBW_V16HI_MASK, UNKNOWN, (int) V16HI_FTYPE_V32QI_V32QI_INT_V16HI_UHI)
 BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_sse4_1_mpsadbw_mask, "__builtin_ia32_mpsadbw128_mask", IX86_BUILTIN_VMPSADBW_V8HI_MASK, UNKNOWN, (int) V8HI_FTYPE_V16QI_V16QI_INT_V8HI_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_cvt2ps2phx_v8hf_mask, "__builtin_ia32_vcvt2ps2phx128_mask", IX86_BUILTIN_VCVT2PS2PHX_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V4SF_V4SF_V8HF_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtbiasph2bf8v8hf, "__builtin_ia32_vcvtbiasph2bf8128", IX86_BUILTIN_VCVTBIASPH2BF8128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V8HF)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtbiasph2bf8v8hf_mask, "__builtin_ia32_vcvtbiasph2bf8128_mask", IX86_BUILTIN_VCVTBIASPH2BF8128_MASK, UNKNOWN, (int) V16QI_FTYPE_V16QI_V8HF_V16QI_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtbiasph2bf8v16hf_mask, "__builtin_ia32_vcvtbiasph2bf8256_mask", IX86_BUILTIN_VCVTBIASPH2BF8256_MASK, UNKNOWN, (int) V16QI_FTYPE_V32QI_V16HF_V16QI_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vcvtbiasph2bf8v32hf_mask, "__builtin_ia32_vcvtbiasph2bf8512_mask", IX86_BUILTIN_VCVTBIASPH2BF8512_MASK, UNKNOWN, (int) V32QI_FTYPE_V64QI_V32HF_V32QI_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtbiasph2bf8sv8hf, "__builtin_ia32_vcvtbiasph2bf8s128", IX86_BUILTIN_VCVTBIASPH2BF8S128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V8HF)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtbiasph2bf8sv8hf_mask, "__builtin_ia32_vcvtbiasph2bf8s128_mask", IX86_BUILTIN_VCVTBIASPH2BF8S128_MASK, UNKNOWN, (int) V16QI_FTYPE_V16QI_V8HF_V16QI_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtbiasph2bf8sv16hf_mask, "__builtin_ia32_vcvtbiasph2bf8s256_mask", IX86_BUILTIN_VCVTBIASPH2BF8S256_MASK, UNKNOWN, (int) V16QI_FTYPE_V32QI_V16HF_V16QI_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vcvtbiasph2bf8sv32hf_mask, "__builtin_ia32_vcvtbiasph2bf8s512_mask", IX86_BUILTIN_VCVTBIASPH2BF8S512_MASK, UNKNOWN, (int) V32QI_FTYPE_V64QI_V32HF_V32QI_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtbiasph2hf8v8hf, "__builtin_ia32_vcvtbiasph2hf8128", IX86_BUILTIN_VCVTBIASPH2HF8128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V8HF)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtbiasph2hf8v8hf_mask, "__builtin_ia32_vcvtbiasph2hf8128_mask", IX86_BUILTIN_VCVTBIASPH2HF8128_MASK, UNKNOWN, (int) V16QI_FTYPE_V16QI_V8HF_V16QI_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtbiasph2hf8v16hf_mask, "__builtin_ia32_vcvtbiasph2hf8256_mask", IX86_BUILTIN_VCVTBIASPH2HF8256_MASK, UNKNOWN, (int) V16QI_FTYPE_V32QI_V16HF_V16QI_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vcvtbiasph2hf8v32hf_mask, "__builtin_ia32_vcvtbiasph2hf8512_mask", IX86_BUILTIN_VCVTBIASPH2HF8512_MASK, UNKNOWN, (int) V32QI_FTYPE_V64QI_V32HF_V32QI_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtbiasph2hf8sv8hf, "__builtin_ia32_vcvtbiasph2hf8s128", IX86_BUILTIN_VCVTBIASPH2HF8S128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V8HF)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtbiasph2hf8sv8hf_mask, "__builtin_ia32_vcvtbiasph2hf8s128_mask", IX86_BUILTIN_VCVTBIASPH2HF8S128_MASK, UNKNOWN, (int) V16QI_FTYPE_V16QI_V8HF_V16QI_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtbiasph2hf8sv16hf_mask, "__builtin_ia32_vcvtbiasph2hf8s256_mask", IX86_BUILTIN_VCVTBIASPH2HF8S256_MASK, UNKNOWN, (int) V16QI_FTYPE_V32QI_V16HF_V16QI_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vcvtbiasph2hf8sv32hf_mask, "__builtin_ia32_vcvtbiasph2hf8s512_mask", IX86_BUILTIN_VCVTBIASPH2HF8S512_MASK, UNKNOWN, (int) V32QI_FTYPE_V64QI_V32HF_V32QI_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtne2ph2bf8v8hf_mask, "__builtin_ia32_vcvtne2ph2bf8128_mask", IX86_BUILTIN_VCVTNE2PH2BF8128_MASK, UNKNOWN, (int) V16QI_FTYPE_V8HF_V8HF_V16QI_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtne2ph2bf8v16hf_mask, "__builtin_ia32_vcvtne2ph2bf8256_mask", IX86_BUILTIN_VCVTNE2PH2BF8256_MASK, UNKNOWN, (int) V32QI_FTYPE_V16HF_V16HF_V32QI_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vcvtne2ph2bf8v32hf_mask, "__builtin_ia32_vcvtne2ph2bf8512_mask", IX86_BUILTIN_VCVTNE2PH2BF8512_MASK, UNKNOWN, (int) V64QI_FTYPE_V32HF_V32HF_V64QI_UDI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtne2ph2bf8sv8hf_mask, "__builtin_ia32_vcvtne2ph2bf8s128_mask", IX86_BUILTIN_VCVTNE2PH2BF8S128_MASK, UNKNOWN, (int) V16QI_FTYPE_V8HF_V8HF_V16QI_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtne2ph2bf8sv16hf_mask, "__builtin_ia32_vcvtne2ph2bf8s256_mask", IX86_BUILTIN_VCVTNE2PH2BF8S256_MASK, UNKNOWN, (int) V32QI_FTYPE_V16HF_V16HF_V32QI_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vcvtne2ph2bf8sv32hf_mask, "__builtin_ia32_vcvtne2ph2bf8s512_mask", IX86_BUILTIN_VCVTNE2PH2BF8S512_MASK, UNKNOWN, (int) V64QI_FTYPE_V32HF_V32HF_V64QI_UDI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtne2ph2hf8v8hf_mask, "__builtin_ia32_vcvtne2ph2hf8128_mask", IX86_BUILTIN_VCVTNE2PH2HF8128_MASK, UNKNOWN, (int) V16QI_FTYPE_V8HF_V8HF_V16QI_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtne2ph2hf8v16hf_mask, "__builtin_ia32_vcvtne2ph2hf8256_mask", IX86_BUILTIN_VCVTNE2PH2HF8256_MASK, UNKNOWN, (int) V32QI_FTYPE_V16HF_V16HF_V32QI_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vcvtne2ph2hf8v32hf_mask, "__builtin_ia32_vcvtne2ph2hf8512_mask", IX86_BUILTIN_VCVTNE2PH2HF8512_MASK, UNKNOWN, (int) V64QI_FTYPE_V32HF_V32HF_V64QI_UDI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtne2ph2hf8sv8hf_mask, "__builtin_ia32_vcvtne2ph2hf8s128_mask", IX86_BUILTIN_VCVTNE2PH2HF8S128_MASK, UNKNOWN, (int) V16QI_FTYPE_V8HF_V8HF_V16QI_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtne2ph2hf8sv16hf_mask, "__builtin_ia32_vcvtne2ph2hf8s256_mask", IX86_BUILTIN_VCVTNE2PH2HF8S256_MASK, UNKNOWN, (int) V32QI_FTYPE_V16HF_V16HF_V32QI_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vcvtne2ph2hf8sv32hf_mask, "__builtin_ia32_vcvtne2ph2hf8s512_mask", IX86_BUILTIN_VCVTNE2PH2HF8S512_MASK, UNKNOWN, (int) V64QI_FTYPE_V32HF_V32HF_V64QI_UDI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtneph2bf8v8hf_mask, "__builtin_ia32_vcvtneph2bf8128_mask", IX86_BUILTIN_VCVTNEPH2BF8128_MASK, UNKNOWN, (int) V16QI_FTYPE_V8HF_V16QI_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtneph2bf8v16hf_mask, "__builtin_ia32_vcvtneph2bf8256_mask", IX86_BUILTIN_VCVTNEPH2BF8256_MASK, UNKNOWN, (int) V16QI_FTYPE_V16HF_V16QI_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vcvtneph2bf8v32hf_mask, "__builtin_ia32_vcvtneph2bf8512_mask", IX86_BUILTIN_VCVTNEPH2BF8512_MASK, UNKNOWN, (int) V32QI_FTYPE_V32HF_V32QI_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtneph2bf8sv8hf_mask, "__builtin_ia32_vcvtneph2bf8s128_mask", IX86_BUILTIN_VCVTNEPH2BF8S128_MASK, UNKNOWN, (int) V16QI_FTYPE_V8HF_V16QI_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtneph2bf8sv16hf_mask, "__builtin_ia32_vcvtneph2bf8s256_mask", IX86_BUILTIN_VCVTNEPH2BF8S256_MASK, UNKNOWN, (int) V16QI_FTYPE_V16HF_V16QI_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vcvtneph2bf8sv32hf_mask, "__builtin_ia32_vcvtneph2bf8s512_mask", IX86_BUILTIN_VCVTNEPH2BF8S512_MASK, UNKNOWN, (int) V32QI_FTYPE_V32HF_V32QI_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtneph2hf8v8hf_mask, "__builtin_ia32_vcvtneph2hf8128_mask", IX86_BUILTIN_VCVTNEPH2HF8128_MASK, UNKNOWN, (int) V16QI_FTYPE_V8HF_V16QI_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtneph2hf8v16hf_mask, "__builtin_ia32_vcvtneph2hf8256_mask", IX86_BUILTIN_VCVTNEPH2HF8256_MASK, UNKNOWN, (int) V16QI_FTYPE_V16HF_V16QI_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vcvtneph2hf8v32hf_mask, "__builtin_ia32_vcvtneph2hf8512_mask", IX86_BUILTIN_VCVTNEPH2HF8512_MASK, UNKNOWN, (int) V32QI_FTYPE_V32HF_V32QI_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtneph2hf8sv8hf_mask, "__builtin_ia32_vcvtneph2hf8s128_mask", IX86_BUILTIN_VCVTNEPH2HF8S128_MASK, UNKNOWN, (int) V16QI_FTYPE_V8HF_V16QI_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtneph2hf8sv16hf_mask, "__builtin_ia32_vcvtneph2hf8s256_mask", IX86_BUILTIN_VCVTNEPH2HF8S256_MASK, UNKNOWN, (int) V16QI_FTYPE_V16HF_V16QI_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vcvtneph2hf8sv32hf_mask, "__builtin_ia32_vcvtneph2hf8s512_mask", IX86_BUILTIN_VCVTNEPH2HF8S512_MASK, UNKNOWN, (int) V32QI_FTYPE_V32HF_V32QI_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvthf82phv8hf_mask, "__builtin_ia32_vcvthf82ph128_mask", IX86_BUILTIN_VCVTHF82PH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V16QI_V8HF_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvthf82phv16hf_mask, "__builtin_ia32_vcvthf82ph256_mask", IX86_BUILTIN_VCVTHF82PH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16QI_V16HF_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vcvthf82phv32hf_mask, "__builtin_ia32_vcvthf82ph512_mask", IX86_BUILTIN_VCVTHF82PH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32QI_V32HF_USI)
 
 /* Builtins with rounding support.  */
 BDESC_END (ARGS, ROUND_ARGS)
@@ -3573,6 +3617,8 @@ BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx_sqrtv8sf2_mask_round, "__b
 BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_subv4df3_mask_round, "__builtin_ia32_subpd256_mask_round", IX86_BUILTIN_VSUBPD256_MASK_ROUND, UNKNOWN, (int) V4DF_FTYPE_V4DF_V4DF_V4DF_UQI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_subv16hf3_mask_round, "__builtin_ia32_subph256_mask_round", IX86_BUILTIN_VSUBPH256_MASK_ROUND, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_subv8sf3_mask_round, "__builtin_ia32_subps256_mask_round", IX86_BUILTIN_VSUBPS256_MASK_ROUND, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF_V8SF_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_cvt2ps2phx_v32hf_mask_round, "__builtin_ia32_vcvt2ps2phx512_mask_round", IX86_BUILTIN_VCVT2PS2PHX_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V16SF_V16SF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_cvt2ps2phx_v16hf_mask_round, "__builtin_ia32_vcvt2ps2phx256_mask_round", IX86_BUILTIN_VCVT2PS2PHX_V16HF_MASK_ROUND, UNKNOWN, (int) V16HF_FTYPE_V8SF_V8SF_V16HF_UHI_INT)
 
 BDESC_END (ROUND_ARGS, MULTI_ARG)
diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index f1e6bc11f86..c5305395a64 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -11408,6 +11408,7 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case V16BF_FTYPE_V16SF_UHI:
     case V8BF_FTYPE_V8SF_UQI:
     case V8BF_FTYPE_V4SF_UQI:
+    case V16QI_FTYPE_V16QI_V8HF:
       nargs = 2;
       break;
     case V2DI_FTYPE_V2DI_INT_CONVERT:
@@ -11623,6 +11624,15 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case V16SF_FTYPE_V16SF_V32BF_V32BF:
     case V8SF_FTYPE_V8SF_V16BF_V16BF:
     case V4SF_FTYPE_V4SF_V8BF_V8BF:
+    case V16QI_FTYPE_V16QI_V8HF_V8HF:
+    case V32QI_FTYPE_V32QI_V16HF_V16HF:
+    case V64QI_FTYPE_V64QI_V32HF_V32HF:
+    case V16QI_FTYPE_V8HF_V16QI_UQI:
+    case V16QI_FTYPE_V16HF_V16QI_UHI:
+    case V32QI_FTYPE_V32HF_V32QI_USI:
+    case V8HF_FTYPE_V16QI_V8HF_UQI:
+    case V16HF_FTYPE_V16QI_V16HF_UHI:
+    case V32HF_FTYPE_V32QI_V32HF_USI:
       nargs = 3;
       break;
     case V32QI_FTYPE_V32QI_V32QI_INT:
@@ -11772,6 +11782,15 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case V32BF_FTYPE_V16SF_V16SF_V32BF_USI:
     case V16BF_FTYPE_V8SF_V8SF_V16BF_UHI:
     case V8BF_FTYPE_V4SF_V4SF_V8BF_UQI:
+    case V32HF_FTYPE_V16SF_V16SF_V32HF_USI:
+    case V16HF_FTYPE_V8SF_V8SF_V16HF_UHI:
+    case V8HF_FTYPE_V4SF_V4SF_V8HF_UQI:
+    case V16QI_FTYPE_V8HF_V8HF_V16QI_UHI:
+    case V32QI_FTYPE_V16HF_V16HF_V32QI_USI:
+    case V64QI_FTYPE_V32HF_V32HF_V64QI_UDI:
+    case V16QI_FTYPE_V16QI_V8HF_V16QI_UHI:
+    case V16QI_FTYPE_V32QI_V16HF_V16QI_UHI:
+    case V32QI_FTYPE_V64QI_V32HF_V32QI_USI:
       nargs = 4;
       break;
     case V2DF_FTYPE_V2DF_V2DF_V2DI_INT:
@@ -12525,6 +12544,8 @@ ix86_expand_round_builtin (const struct builtin_description *d,
     case V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT:
     case V8HF_FTYPE_V2DF_V8HF_V8HF_UQI_INT:
     case V8HF_FTYPE_V4SF_V8HF_V8HF_UQI_INT:
+    case V16HF_FTYPE_V8SF_V8SF_V16HF_UHI_INT:
+    case V32HF_FTYPE_V16SF_V16SF_V32HF_USI_INT:
       nargs = 5;
       break;
     case V32HF_FTYPE_V32HF_INT_V32HF_USI_INT:
diff --git a/gcc/config/i386/immintrin.h b/gcc/config/i386/immintrin.h
index ce8437d00c2..fea55a298fc 100644
--- a/gcc/config/i386/immintrin.h
+++ b/gcc/config/i386/immintrin.h
@@ -144,4 +144,8 @@
 #include
 
+#include <avx10_2convertintrin.h>
+
+#include <avx10_2-512convertintrin.h>
+
 #endif /* _IMMINTRIN_H_INCLUDED */
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 6f76e8f50ad..1d62f96dcc5 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -216,6 +216,19 @@
 
 ;; For AVX10.2 support
   UNSPEC_VDPPHPS
+  UNSPEC_VCVTBIASPH2BF8
+  UNSPEC_VCVTBIASPH2BF8S
+  UNSPEC_VCVTBIASPH2HF8
+  UNSPEC_VCVTBIASPH2HF8S
+  UNSPEC_VCVTNE2PH2BF8
+  UNSPEC_VCVTNE2PH2BF8S
+  UNSPEC_VCVTNE2PH2HF8
+  UNSPEC_VCVTNE2PH2HF8S
+  UNSPEC_VCVTNEPH2BF8
+  UNSPEC_VCVTNEPH2BF8S
+  UNSPEC_VCVTNEPH2HF8
+  UNSPEC_VCVTNEPH2HF8S
+  UNSPEC_VCVTHF82PH
 ])
 
 (define_c_enum "unspecv" [
@@ -483,6 +496,9 @@
   [(V32HF "TARGET_EVEX512") (V16HF "TARGET_AVX512VL") (V8HF "TARGET_AVX512VL")
    (V32BF "TARGET_EVEX512") (V16BF "TARGET_AVX512VL") (V8BF "TARGET_AVX512VL")])
 
+(define_mode_iterator VHF_AVX10_2
+  [(V32HF "TARGET_AVX10_2_512") V16HF V8HF])
+
 ;; All vector integer modes
 (define_mode_iterator VI
   [(V16SI "TARGET_AVX512F && TARGET_EVEX512")
@@ -31359,8 +31375,8 @@
    (set_attr "mode" "")])
 
 (define_mode_attr bf16_ph
-  [(V8HF "ph") (V16HF "ph")
-   (V8BF "bf16") (V16BF "bf16")])
+  [(V8HF "ph") (V16HF "ph") (V32HF "ph")
+   (V8BF "bf16") (V16BF "bf16") (V32BF "bf16")])
 
 (define_insn "vcvtnee2ps_"
   [(set (match_operand:V4SF 0 "register_operand" "=x")
@@ -31418,6 +31434,221 @@
    (set_attr "addr" "gpr16")
    (set_attr "mode" "")])
 
+(define_insn "avx10_2_cvt2ps2phx_"
+  [(set (match_operand:VHF_AVX10_2 0 "register_operand" "=v")
+	(vec_concat:VHF_AVX10_2
+	  (float_truncate:
+	    (match_operand: 2 "" ""))
+	  (float_truncate:
+	    (match_operand: 1 "register_operand" "v"))))]
+  "TARGET_AVX10_2_256 && "
+  "vcvt2ps2phx\t{%2, %1, %0|%0, %1, %2}")
+
+(define_mode_attr ssebvecmode
+  [(V8HF "V16QI") (V16HF "V32QI") (V32HF "V64QI")])
+
+(define_int_iterator UNSPEC_NECONVERTFP8_PACK
+  [UNSPEC_VCVTNE2PH2BF8 UNSPEC_VCVTNE2PH2BF8S
+   UNSPEC_VCVTNE2PH2HF8 UNSPEC_VCVTNE2PH2HF8S])
+
+(define_int_attr neconvertfp8_pack
+  [(UNSPEC_VCVTNE2PH2BF8 "ne2ph2bf8")
+   (UNSPEC_VCVTNE2PH2BF8S "ne2ph2bf8s")
+   (UNSPEC_VCVTNE2PH2HF8 "ne2ph2hf8")
+   (UNSPEC_VCVTNE2PH2HF8S "ne2ph2hf8s")])
+
+(define_insn "vcvt"
+  [(set (match_operand: 0 "register_operand" "=v")
+	(unspec:
+	  [(match_operand:VHF_AVX10_2 1 "register_operand" "v")
+	   (match_operand:VHF_AVX10_2 2 "nonimmediate_operand" "vm")]
+	  UNSPEC_NECONVERTFP8_PACK))]
+  "TARGET_AVX10_2_256"
+  "vcvt\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "prefix" "evex")])
+
+(define_mode_attr ssebvecmode_2
+  [(V8HF "V16QI") (V16HF "V16QI") (V32HF "V32QI")])
+
+(define_int_iterator UNSPEC_VCVTBIASPH2FP8_PACK
+  [UNSPEC_VCVTBIASPH2BF8 UNSPEC_VCVTBIASPH2BF8S
+   UNSPEC_VCVTBIASPH2HF8 UNSPEC_VCVTBIASPH2HF8S])
+
+(define_int_attr biasph2fp8_pack
+  [(UNSPEC_VCVTBIASPH2BF8 "biasph2bf8")
+   (UNSPEC_VCVTBIASPH2BF8S "biasph2bf8s")
+   (UNSPEC_VCVTBIASPH2HF8 "biasph2hf8")
+   (UNSPEC_VCVTBIASPH2HF8S "biasph2hf8s")])
+
+(define_expand "vcvtv8hf"
+  [(set (match_operand:V16QI 0 "register_operand")
+	(vec_concat:V16QI
+	  (unspec:V8QI
+	    [(match_operand:V16QI 1 "register_operand")
+	     (match_operand:V8HF 2 "nonimmediate_operand")]
+	    UNSPEC_VCVTBIASPH2FP8_PACK)
+	  (match_dup 3)))]
+  "TARGET_AVX10_2_256"
+  "operands[3] = CONST0_RTX (V8QImode);")
+
+(define_insn "*vcvtv8hf"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+	(vec_concat:V16QI
+	  (unspec:V8QI
+	    [(match_operand:V16QI 1 "register_operand" "v")
+	     (match_operand:V8HF 2 "nonimmediate_operand" "vm")]
+	    UNSPEC_VCVTBIASPH2FP8_PACK)
+	  (match_operand:V8QI 3 "const0_operand")))]
+  "TARGET_AVX10_2_256"
+  "vcvt\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "prefix" "evex")
+   (set_attr "mode" "HF")])
+
+(define_expand "vcvtv8hf_mask"
+  [(set (match_operand:V16QI 0 "register_operand")
+	(vec_concat:V16QI
+	  (vec_merge:V8QI
+	    (unspec:V8QI
+	      [(match_operand:V16QI 1 "register_operand")
+	       (match_operand:V8HF 2 "nonimmediate_operand")]
+	      UNSPEC_VCVTBIASPH2FP8_PACK)
+	    (vec_select:V8QI
+	      (match_operand:V16QI 3 "nonimm_or_0_operand")
+	      (parallel [(const_int 0) (const_int 1)
+			 (const_int 2) (const_int 3)
+			 (const_int 4) (const_int 5)
+			 (const_int 6) (const_int 7)]))
+	    (match_operand:QI 4 "register_operand" "C"))
+	  (match_dup 5)))]
+  "TARGET_AVX10_2_256"
+  "operands[5] = CONST0_RTX (V8QImode);")
+
+(define_insn "*vcvtv8hf_mask"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+	(vec_concat:V16QI
+	  (vec_merge:V8QI
+	    (unspec:V8QI
+	      [(match_operand:V16QI 1 "register_operand" "v")
+	       (match_operand:V8HF 2 "nonimmediate_operand" "vm")]
+	      UNSPEC_VCVTBIASPH2FP8_PACK)
+	    (vec_select:V8QI
+	      (match_operand:V16QI 3 "nonimm_or_0_operand" "0C")
+	      (parallel [(const_int 0) (const_int 1)
+			 (const_int 2) (const_int 3)
+			 (const_int 4) (const_int 5)
+			 (const_int 6) (const_int 7)]))
+	    (match_operand:QI 4 "register_operand" "Yk"))
+	  (match_operand:V8QI 5 "const0_operand")))]
+  "TARGET_AVX10_2_256"
+  "vcvt\t{%2, %1, %0%{%4%}%N3|%0%{%4%}%N3, %1, %2}"
+  [(set_attr "prefix" "evex")])
+
+(define_mode_iterator VHF_AVX10_2_2
+  [(V32HF "TARGET_AVX10_2_512") V16HF])
+
+(define_insn "vcvt"
+  [(set (match_operand: 0 "register_operand" "=v")
+	(unspec:
+	  [(match_operand: 1 "register_operand" "v")
+	   (match_operand:VHF_AVX10_2_2 2 "nonimmediate_operand" "vm")]
+	  UNSPEC_VCVTBIASPH2FP8_PACK))]
+  "TARGET_AVX10_2_256"
+  "vcvt\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "prefix" "evex")])
+
+(define_mode_iterator VHF_256_512
+  [V16HF (V32HF "TARGET_AVX10_2_512")])
+
+(define_mode_attr ph2fp8suff
+  [(V32HF "") (V16HF "{y}") (V8HF "{x}")])
+
+(define_int_iterator UNSPEC_NECONVERTPH2FP8
+  [UNSPEC_VCVTNEPH2BF8 UNSPEC_VCVTNEPH2BF8S
+   UNSPEC_VCVTNEPH2HF8 UNSPEC_VCVTNEPH2HF8S])
+
+(define_int_attr neconvertph2fp8
+  [(UNSPEC_VCVTNEPH2BF8 "neph2bf8")
+   (UNSPEC_VCVTNEPH2BF8S "neph2bf8s")
+   (UNSPEC_VCVTNEPH2HF8 "neph2hf8")
+   (UNSPEC_VCVTNEPH2HF8S "neph2hf8s")])
+
+(define_expand "vcvtv8hf"
+  [(set (match_operand:V16QI 0 "register_operand")
+	(vec_concat:V16QI
+	  (unspec:V8QI
+	    [(match_operand:V8HF 1 "nonimmediate_operand")]
+	    UNSPEC_NECONVERTPH2FP8)
+	  (match_dup 2)))]
+  "TARGET_AVX10_2_256"
+  "operands[2] = CONST0_RTX (V8QImode);")
+
+(define_insn "*vcvtv8hf"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+	(vec_concat:V16QI
+	  (unspec:V8QI
+	    [(match_operand:V8HF 1 "nonimmediate_operand" "vm")]
+	    UNSPEC_NECONVERTPH2FP8)
+	  (match_operand:V8QI 2 "const0_operand")))]
+  "TARGET_AVX10_2_256"
+  "vcvt{x}\t{%1, %0|%0, %1}"
+  [(set_attr "prefix" "evex")
+   (set_attr "mode" "HF")])
+
+(define_expand "vcvtv8hf_mask"
+  [(set (match_operand:V16QI 0 "register_operand")
+	(vec_concat:V16QI
+	  (vec_merge:V8QI
+	    (unspec:V8QI
+	      [(match_operand:V8HF 1 "nonimmediate_operand")]
+	      UNSPEC_NECONVERTPH2FP8)
+	    (vec_select:V8QI
+	      (match_operand:V16QI 2 "nonimm_or_0_operand")
+	      (parallel [(const_int 0) (const_int 1)
+			 (const_int 2) (const_int 3)
+			 (const_int 4) (const_int 5)
+			 (const_int 6) (const_int 7)]))
+	    (match_operand:QI 3 "register_operand"))
+	  (match_dup 4)))]
+  "TARGET_AVX10_2_256"
+  "operands[4] = CONST0_RTX (V8QImode);")
+
+(define_insn "*vcvtv8hf_mask"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+	(vec_concat:V16QI
+	  (vec_merge:V8QI
+	    (unspec:V8QI
+	      [(match_operand:V8HF 1 "nonimmediate_operand" "vm")]
+	      UNSPEC_NECONVERTPH2FP8)
+	    (vec_select:V8QI
+	      (match_operand:V16QI 2 "nonimm_or_0_operand" "0C")
+	      (parallel [(const_int 0) (const_int 1)
+			 (const_int 2) (const_int 3)
+			 (const_int 4) (const_int 5)
+			 (const_int 6) (const_int 7)]))
+	    (match_operand:QI 3 "register_operand" "Yk"))
+	  (match_operand:V8QI 4 "const0_operand")))]
+  "TARGET_AVX10_2_256"
+  "vcvt{x}\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}"
+  [(set_attr "prefix" "evex")])
+
+(define_insn "vcvt"
+  [(set (match_operand: 0 "register_operand" "=v")
+	(unspec:
+	  [(match_operand:VHF_256_512 1 "nonimmediate_operand" "vm")]
+	  UNSPEC_NECONVERTPH2FP8))]
+  "TARGET_AVX10_2_256"
+  "vcvt\t{%1, %0|%0, %1}"
+  [(set_attr "prefix" "evex")])
+
+(define_insn "vcvthf82ph"
+  [(set (match_operand:VHF_AVX10_2 0 "register_operand" "=v")
+	(unspec:VHF_AVX10_2
+	  [(match_operand: 1 "nonimmediate_operand" "vm")]
+	  UNSPEC_VCVTHF82PH))]
+  "TARGET_AVX10_2_256"
+  "vcvthf82ph\t{%1, %0|%0, %1}"
+  [(set_attr "prefix" "evex")])
+
 (define_int_iterator VPDPWPROD
   [UNSPEC_VPDPWUSD
    UNSPEC_VPDPWUSDS
diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c
index 5fc84234b57..4a47e313096 100644
--- a/gcc/testsuite/gcc.target/i386/avx-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx-1.c
@@ -1010,6 +1010,12 @@
 #define __builtin_ia32_mpsadbw128_mask(A, B, C, D, E) __builtin_ia32_mpsadbw128_mask (A, B, 1, D, E)
 #define __builtin_ia32_mpsadbw256_mask(A, B, C, D, E) __builtin_ia32_mpsadbw256_mask (A, B, 1, D, E)
 
+/* avx10_2convertintrin.h */
+#define __builtin_ia32_vcvt2ps2phx256_mask_round(A, B, C, D, E) __builtin_ia32_vcvt2ps2phx256_mask_round(A, B, C, D, 8)
+
+/* avx10_2-512convertintrin.h */
+#define __builtin_ia32_vcvt2ps2phx512_mask_round(A, B, C, D, E) __builtin_ia32_vcvt2ps2phx512_mask_round(A, B, C, D, 8)
+
 #include
 #include
 #include
diff --git a/gcc/testsuite/gcc.target/i386/avx-2.c b/gcc/testsuite/gcc.target/i386/avx-2.c
index fb0ef9e2aa5..3f4d7353c62 100644
--- a/gcc/testsuite/gcc.target/i386/avx-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -msse4a -maes -mpclmul -mavx10.2-512" } */
+/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -msse4a -maes -mpclmul" } */
 /* { dg-add-options bind_pic_locally } */
 
 #include
@@ -160,4 +160,3 @@ test_2 (_m_pinsrw, __m64, __m64, int, 1)
 test_1 (_mm_shuffle_pi16, __m64, __m64, 1)
 test_1 (_m_pshufw, __m64, __m64, 1)
 test_1 (_mm_prefetch, void, void *, _MM_HINT_NTA)
-
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-convert-1.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-convert-1.c
new file mode 100644
index 00000000000..bbbff186d0a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-convert-1.c
@@ -0,0 +1,176 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx10.2-512 -O2" } */
+/* { dg-final { scan-assembler-times "vcvt2ps2phx\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvt2ps2phx\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvt2ps2phx\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvt2ps2phx\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvt2ps2phx\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvt2ps2phx\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtbiasph2bf8\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtbiasph2bf8\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtbiasph2bf8\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtbiasph2bf8s\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtbiasph2bf8s\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtbiasph2bf8s\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtbiasph2hf8\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtbiasph2hf8\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtbiasph2hf8\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtbiasph2hf8s\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtbiasph2hf8s\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtbiasph2hf8s\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtne2ph2bf8\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtne2ph2bf8\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtne2ph2bf8\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtne2ph2bf8s\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtne2ph2bf8s\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtne2ph2bf8s\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtne2ph2hf8\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtne2ph2hf8\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtne2ph2hf8\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtne2ph2hf8s\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtne2ph2hf8s\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtne2ph2hf8s\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvthf82ph\[ \\t\]*%ymm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvthf82ph\[ \\t\]*%ymm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvthf82ph\[ \\t\]*%ymm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneph2bf8\[ \\t\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneph2bf8\[ \\t\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneph2bf8\[ \\t\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneph2bf8s\[ \\t\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneph2bf8s\[ \\t\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneph2bf8s\[ \\t\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneph2hf8\[ \\t\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneph2hf8\[ \\t\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneph2hf8\[ \\t\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneph2hf8s\[ \\t\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneph2hf8s\[ \\t\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneph2hf8s\[ \\t\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256i x256i;
+volatile __m512i x512i;
+volatile __m512 x, a1, b1;
+volatile __m512h y, x512h;
+volatile __mmask16 m16;
+volatile __mmask32 m32;
+volatile __mmask64 m64;
+const void *a;
+__m512bh *c;
+__m512h *d;
+
+void extern
+avx10_2_512_test (void)
+{
+  y = _mm512_cvtx2ps_ph (a1, b1);
+  y = _mm512_mask_cvtx2ps_ph (y, m32, a1, b1);
+  y = _mm512_maskz_cvtx2ps_ph (m32, a1, b1);
+
+  y = _mm512_cvtx_round2ps_ph (a1, b1, 8);
+  y = _mm512_mask_cvtx_round2ps_ph (y, m32, a1, b1, 8);
+  y = _mm512_maskz_cvtx_round2ps_ph (m32, a1, b1, 8);
+}
+
+void extern
+avx10_2_512_vcvtbiasph2bf8_test (void)
+{
+  x256i = _mm512_cvtbiasph_pbf8 (x512i, x512h);
+  x256i = _mm512_mask_cvtbiasph_pbf8 (x256i, m32, x512i, x512h);
+  x256i = _mm512_maskz_cvtbiasph_pbf8 (m32, x512i, x512h);
+}
+
+void extern
+avx10_2_512_vcvtbiasph2bf8s_test (void)
+{
+  x256i = _mm512_cvtbiassph_pbf8 (x512i, x512h);
+  x256i = _mm512_mask_cvtbiassph_pbf8 (x256i, m32, x512i, x512h);
+  x256i = _mm512_maskz_cvtbiassph_pbf8 (m32, x512i, x512h);
+}
+
+void extern
+avx10_2_512_vcvtbiasph2hf8_test (void)
+{
+  x256i = _mm512_cvtbiasph_phf8 (x512i, x512h);
+  x256i = _mm512_mask_cvtbiasph_phf8 (x256i, m32, x512i, x512h);
+  x256i = _mm512_maskz_cvtbiasph_phf8 (m32, x512i, x512h);
+}
+
+void extern
+avx10_2_512_vcvtbiasph2hf8s_test (void)
+{
+  x256i = _mm512_cvtbiassph_phf8 (x512i, x512h);
+  x256i = _mm512_mask_cvtbiassph_phf8 (x256i, m32, x512i, x512h);
+  x256i = _mm512_maskz_cvtbiassph_phf8 (m32, x512i, x512h);
+}
+
+void extern
+avx10_2_512_vcvtne2ph2bf8_test (void)
+{
+  x512i = _mm512_cvtne2ph_pbf8 (x512h, x512h);
+  x512i = _mm512_mask_cvtne2ph_pbf8 (x512i, m64, x512h, x512h);
+  x512i = _mm512_maskz_cvtne2ph_pbf8 (m64, x512h, x512h);
+}
+
+void extern
+avx10_2_512_vcvtne2ph2bf8s_test (void)
+{
+  x512i = _mm512_cvtnes2ph_pbf8 (x512h, x512h);
+  x512i = _mm512_mask_cvtnes2ph_pbf8 (x512i, m64, x512h, x512h);
+  x512i = _mm512_maskz_cvtnes2ph_pbf8 (m64, x512h, x512h);
+}
+
+void extern
+avx10_2_512_vcvtne2ph2hf8_test (void)
+{
+  x512i = _mm512_cvtne2ph_phf8 (x512h, x512h);
+  x512i = _mm512_mask_cvtne2ph_phf8 (x512i, m64, x512h, x512h);
+  x512i = _mm512_maskz_cvtne2ph_phf8 (m64, x512h, x512h);
+}
+
+void extern
+avx10_2_512_vcvtne2ph2hf8s_test (void)
+{
+  x512i = _mm512_cvtnes2ph_phf8 (x512h, x512h);
+  x512i = _mm512_mask_cvtnes2ph_phf8 (x512i, m64, x512h, x512h);
+  x512i = _mm512_maskz_cvtnes2ph_phf8 (m64, x512h, x512h);
+}
+
+void extern
+avx10_2_512_vcvthf82ph_test (void)
+{
+  x512h = _mm512_cvthf8_ph (x256i);
+  x512h = _mm512_mask_cvthf8_ph (x512h, m32, x256i);
+  x512h = _mm512_maskz_cvthf8_ph (m32, x256i);
+}
+
+void extern
+avx10_2_512_vcvtneph2bf8_test (void)
+{
+  x256i = _mm512_cvtneph_pbf8 (x512h);
+  x256i = _mm512_mask_cvtneph_pbf8 (x256i, m32, x512h);
+  x256i = _mm512_maskz_cvtneph_pbf8 (m32, x512h);
+}
+
+void extern
+avx10_2_512_vcvtneph2bf8s_test (void)
+{
+  x256i = _mm512_cvtnesph_pbf8 (x512h);
+  x256i = _mm512_mask_cvtnesph_pbf8 (x256i, m32, x512h);
+  x256i = _mm512_maskz_cvtnesph_pbf8 (m32, x512h);
+}
+
+void extern
+avx10_2_512_vcvtneph2hf8_test (void)
+{
+  x256i = _mm512_cvtneph_phf8 (x512h);
+  x256i = _mm512_mask_cvtneph_phf8 (x256i, m32, x512h);
+  x256i = _mm512_maskz_cvtneph_phf8 (m32, x512h);
+}
+
+void extern
+avx10_2_512_vcvtneph2hf8s_test (void)
+{
+  x256i = _mm512_cvtnesph_phf8 (x512h);
+  x256i = _mm512_mask_cvtnesph_phf8 (x256i, m32, x512h);
+  x256i = _mm512_maskz_cvtnesph_phf8 (m32, x512h);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvt2ps2phx-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvt2ps2phx-2.c
new file mode 100644
index 00000000000..40dbe18abbe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvt2ps2phx-2.c
@@ -0,0 +1,51 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2-512" } */
+/* { dg-require-effective-target avx10_2_512 } */
+
+#ifndef AVX10_2
+#define AVX10_2
+#define AVX10_2_512
+#define AVX10_512BIT
+#endif
+#include "avx10-helper.h"
+#include
+
+#define SIZE_RES (AVX512F_LEN / 16)
+
+static void
+CALC (_Float16 *res_ref, float *src1, float *src2)
+{
+  float fp32;
+  int i;
+  for (i = 0; i < SIZE_RES / 2; i++)
+    {
+      fp32 = (float) 2 * i + 7 + i * 0.5;
+      res_ref[i] = fp32;
+      src2[i] = fp32;
+    }
+  for (i = SIZE_RES / 2; i < SIZE_RES; i++)
+    {
+      fp32 = (float) 2 * i + 7 + i * 0.5;
+      res_ref[i] = fp32;
+      src1[i - (SIZE_RES / 2)] = fp32;
+    }
+}
+
+void
+TEST (void)
+{
+  int i;
+  UNION_TYPE (AVX512F_LEN, h) res1;
+  UNION_TYPE (AVX512F_LEN, ) src1, src2;
+  _Float16 res_ref[SIZE_RES];
+  float fp32;
+
+  for (i = 0; i < SIZE_RES; i++)
+    res1.a[i] = 5;
+
+  CALC (res_ref, src1.a, src2.a);
+
+  res1.x = INTRINSIC (_cvtx2ps_ph) (src1.x, src2.x);
+  if (UNION_CHECK (AVX512F_LEN, h) (res1, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtbiasph2bf8-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtbiasph2bf8-2.c
new file mode 100644
index 00000000000..9ce3c9059f1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtbiasph2bf8-2.c
@@ -0,0 +1,59 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2-512" } */
+/* { dg-require-effective-target avx10_2_512 } */
+
+#ifndef AVX10_2
+#define AVX10_2
+#define AVX10_2_512
+#define AVX10_512BIT
+#endif
+
+#include "avx10-helper.h"
+#include "fp8-helper.h"
+
+#define SRC_F8_I8 (AVX512F_LEN / 8)
+#define SRC_F16 (AVX512F_LEN / 16)
+#define DST_F8_I8 (AVX512F_LEN_HALF / 8)
+#define DST_F16 (AVX512F_LEN_HALF / 16)
+
+void
+CALC (unsigned char *r, char *src1, _Float16 *src2)
+{
+  int i, hf8_bf8, saturate;
+
+  hf8_bf8 = 1;
+  saturate = 0;
+
+  for (i = 0; i < DST_F8_I8; i++)
+    {
+      Float16Union usrc = { .f16 = src2[i] };
+      r[i] = convert_fp16_to_fp8 (usrc.f16, src1[2 * i], hf8_bf8, saturate);
+    }
+
+  if (AVX512F_LEN == 128)
+    for (i = DST_F16; i < DST_F8_I8; i++)
+      r[i] = 0;
+}
+
+void
+TEST (void)
+{
+  int i, sign;
+  UNION_TYPE (AVX512F_LEN_HALF, i_b) res;
+  UNION_TYPE (AVX512F_LEN, i_b) src1;
+  UNION_TYPE (AVX512F_LEN, h) src2;
+  unsigned char res_ref[DST_F8_I8];
+
+  sign = 1;
+  for (i = 0; i < SRC_F16; i++)
+    {
+      src2.a[i] = (_Float16) (sign * (2.5 * (1 << (i % 3))));
+      sign = -sign;
+    }
+
+  res.x = INTRINSIC (_cvtbiasph_pbf8) (src1.x, src2.x);
+  CALC (res_ref, src1.a, src2.a);
+
+  if (UNION_CHECK (AVX512F_LEN_HALF, i_b) (res, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtbiasph2bf8s-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtbiasph2bf8s-2.c
new file mode 100644
index 00000000000..5e33b8dc498
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtbiasph2bf8s-2.c
@@ -0,0 +1,59 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2-512" } */
+/* { dg-require-effective-target avx10_2_512 } */
+
+#ifndef AVX10_2
+#define AVX10_2
+#define AVX10_2_512
+#define AVX10_512BIT
+#endif
+
+#include "avx10-helper.h"
+#include "fp8-helper.h"
+
+#define SRC_F8_I8 (AVX512F_LEN / 8)
+#define SRC_F16 (AVX512F_LEN / 16)
+#define DST_F8_I8 (AVX512F_LEN_HALF / 8)
+#define DST_F16 (AVX512F_LEN_HALF / 16)
+
+void
+CALC (unsigned char *r, char *src1, _Float16 *src2)
+{
+  int i, hf8_bf8, saturate;
+
+  hf8_bf8 = 1;
+  saturate = 1;
+
+  for (i = 0; i < DST_F8_I8; i++)
+    {
+      Float16Union usrc = { .f16 = src2[i] };
+      r[i] = convert_fp16_to_fp8 (usrc.f16, src1[2 * i], hf8_bf8, saturate);
+    }
+
+  if (AVX512F_LEN == 128)
+    for (i = DST_F16; i < DST_F8_I8; i++)
+      r[i] = 0;
+}
+
+void
+TEST (void)
+{
+  int i, sign;
+  UNION_TYPE (AVX512F_LEN_HALF, i_b) res;
+  UNION_TYPE (AVX512F_LEN, i_b) src1;
+  UNION_TYPE (AVX512F_LEN, h) src2;
+  unsigned char res_ref[DST_F8_I8];
+
+  sign = 1;
+  for (i = 0; i < SRC_F16; i++)
+    {
+      src2.a[i] = (_Float16) (sign * (2.5 * (1 << (i % 3))));
+      sign = -sign;
+    }
+
+  res.x = INTRINSIC (_cvtbiassph_pbf8) (src1.x, src2.x);
+  CALC (res_ref, src1.a, src2.a);
+
+  if (UNION_CHECK (AVX512F_LEN_HALF, i_b) (res, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtbiasph2hf8-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtbiasph2hf8-2.c
new file mode 100644
index 00000000000..96d1a33adcd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtbiasph2hf8-2.c
@@ -0,0 +1,59 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2-512" } */
+/* { dg-require-effective-target avx10_2_512 } */
+
+#ifndef AVX10_2
+#define AVX10_2
+#define AVX10_2_512
+#define AVX10_512BIT
+#endif
+
+#include "avx10-helper.h"
+#include "fp8-helper.h"
+
+#define SRC_F8_I8 (AVX512F_LEN / 8)
+#define SRC_F16 (AVX512F_LEN / 16)
+#define DST_F8_I8 (AVX512F_LEN_HALF / 8)
+#define DST_F16 (AVX512F_LEN_HALF / 16)
+
+void
+CALC (unsigned char *r, char *src1, _Float16 *src2)
+{
+  int i, hf8_bf8, saturate;
+
+  hf8_bf8 = 0;
+  saturate = 0;
+
+  for (i = 0; i < DST_F8_I8; i++)
+    {
+      Float16Union usrc = { .f16 = src2[i] };
+      r[i] = convert_fp16_to_fp8 (usrc.f16, src1[2 * i], hf8_bf8, saturate);
+    }
+
+  if (AVX512F_LEN == 128)
+    for (i = DST_F16; i < DST_F8_I8; i++)
+      r[i] = 0;
+}
+
+void
+TEST (void)
+{
+  int i, sign;
+  UNION_TYPE (AVX512F_LEN_HALF, i_b) res;
+  UNION_TYPE (AVX512F_LEN, i_b) src1;
+  UNION_TYPE (AVX512F_LEN, h) src2;
+  unsigned char res_ref[DST_F8_I8];
+
+  sign = 1;
+  for (i = 0; i < SRC_F16; i++)
+    {
+      src2.a[i] = (_Float16) (sign * (2.5 * (1 << (i % 3))));
+      sign = -sign;
+    }
+
+  res.x = INTRINSIC (_cvtbiasph_phf8) (src1.x, src2.x);
+  CALC (res_ref, src1.a, src2.a);
+
+  if (UNION_CHECK (AVX512F_LEN_HALF, i_b) (res, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtbiasph2hf8s-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtbiasph2hf8s-2.c
new file mode 100644
index 00000000000..e66b952a45e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtbiasph2hf8s-2.c
@@ -0,0 +1,59 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2-512" } */
+/* { dg-require-effective-target avx10_2_512 } */
+
+#ifndef AVX10_2
+#define AVX10_2
+#define AVX10_2_512
+#define AVX10_512BIT
+#endif
+
+#include "avx10-helper.h"
+#include "fp8-helper.h"
+
+#define SRC_F8_I8 (AVX512F_LEN / 8)
+#define SRC_F16 (AVX512F_LEN / 16)
+#define DST_F8_I8 (AVX512F_LEN_HALF / 8)
+#define DST_F16 (AVX512F_LEN_HALF / 16)
+
+void
+CALC (unsigned char *r, char *src1, _Float16 *src2)
+{
+  int i, hf8_bf8, saturate;
+
+  hf8_bf8 = 0;
+  saturate = 1;
+
+  for (i = 0; i < DST_F8_I8; i++)
+    {
+      Float16Union usrc = { .f16 = src2[i] };
+      r[i] = convert_fp16_to_fp8 (usrc.f16, src1[2 * i], hf8_bf8, saturate);
+    }
+
+  if (AVX512F_LEN == 128)
+    for (i = DST_F16; i < DST_F8_I8; i++)
+      r[i] = 0;
+}
+
+void
+TEST (void)
+{
+  int i, sign;
+  UNION_TYPE (AVX512F_LEN_HALF, i_b) res;
+  UNION_TYPE (AVX512F_LEN, i_b) src1;
+  UNION_TYPE (AVX512F_LEN, h) src2;
+  unsigned char res_ref[DST_F8_I8];
+
+  sign = 1;
+  for (i = 0; i < SRC_F16; i++)
+    {
+      src2.a[i] = (_Float16) (sign * (2.5 * (1 << (i % 3))));
+      sign = -sign;
+    }
+
+  res.x = INTRINSIC (_cvtbiassph_phf8) (src1.x, src2.x);
+  CALC (res_ref, src1.a, src2.a);
+
+  if (UNION_CHECK (AVX512F_LEN_HALF, i_b) (res, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvthf82ph-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvthf82ph-2.c
new file mode 100644
index 00000000000..6b9f07ff86a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvthf82ph-2.c
@@ -0,0 +1,45 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2-512" } */
+/* { dg-require-effective-target avx10_2_512 } */
+
+#ifndef AVX10_2
+#define AVX10_2
+#define AVX10_2_512
+#define AVX10_512BIT
+#endif
+
+#include "avx10-helper.h"
+#include "fp8-helper.h"
+
+#define SIZE_SRC (AVX512F_LEN_HALF / 8)
+#define SIZE_RES (AVX512F_LEN / 16)
+
+void
+CALC (_Float16 *r, unsigned char *s)
+{
+  int i;
+  for (i = 0; i < SIZE_RES; i++)
+    r[i] = convert_hf8_to_fp16 (s[i]);
+}
+
+void
+TEST (void)
+{
+  int i, sign;
+  UNION_TYPE (AVX512F_LEN, h) res;
+  UNION_TYPE (AVX512F_LEN_HALF, i_b) src;
+  _Float16 res_ref[SIZE_RES];
+
+  sign = 1;
+  for (i = 0; i < SIZE_SRC; i++)
+    {
+      src.a[i] = sign * (2.5 * (1 << (i % 3)));
+      sign = -sign;
+    }
+
+  res.x = INTRINSIC (_cvthf8_ph) (src.x);
+  CALC (res_ref, src.a);
+
+  if (UNION_ROUGH_CHECK (AVX512F_LEN, h) (res, res_ref, 0.0009765625))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2bf8-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2bf8-2.c
new file mode 100644
index 00000000000..96fa7c1634d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2bf8-2.c
@@ -0,0 +1,65 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2-512" } */
+/* { dg-require-effective-target avx10_2_512 } */
+
+#ifndef AVX10_2
+#define AVX10_2
+#define AVX10_2_512
+#define AVX10_512BIT
+#endif
+
+#include "avx10-helper.h"
+#include "fp8-helper.h"
+
+#define SIZE_SRC (AVX512F_LEN / 16)
+#define SIZE_RES (AVX512F_LEN / 8)
+
+void
+CALC (unsigned char *r, _Float16 *s1, _Float16 *s2)
+{
+  _Float16 temp;
+  Float16Union ut = { .f16 = temp };
+  int i, hf8_bf8, saturate;
+
+  hf8_bf8 = 1;
+  saturate = 0;
+
+  for (i = 0; i < SIZE_RES; i++)
+    {
+      r[i] = 0;
+      if (i < SIZE_SRC)
+	{
+	  Float16Union usrc2 = { .f16 = s2[i] };
+	  ut.u16 = usrc2.u16;
+	}
+      else
+	{
+	  Float16Union usrc1 = { .f16 = s1[i - SIZE_SRC] };
+	  ut.u16 = usrc1.u16;
+	}
+      r[i] = convert_fp16_to_fp8 (ut.f16, 0, hf8_bf8, saturate);
+    }
+}
+
+void
+TEST (void)
+{
+  int i, sign;
+  UNION_TYPE (AVX512F_LEN, i_b) res;
+  UNION_TYPE (AVX512F_LEN, h) src1, src2;
+  unsigned char res_ref[SIZE_RES];
+
+  sign = 1;
+  for (i = 0; i < SIZE_SRC; i++)
+    {
+      src1.a[i] = (_Float16) (sign * (1.5 * (1 << (i % 3))));
+      src2.a[i] = (_Float16) (-sign * (2.5 * (1 << (i % 3))));
+      sign = -sign;
+    }
+
+  res.x = INTRINSIC (_cvtne2ph_pbf8) (src1.x, src2.x);
+  CALC (res_ref, src1.a, src2.a);
+
+  if (UNION_CHECK (AVX512F_LEN, i_b) (res, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2bf8s-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2bf8s-2.c
new file mode 100644
index 00000000000..cead411e178
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2bf8s-2.c
@@ -0,0 +1,65 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2-512" } */
+/* { dg-require-effective-target avx10_2_512 } */
+
+#ifndef AVX10_2
+#define AVX10_2
+#define AVX10_2_512
+#define AVX10_512BIT
+#endif
+
+#include "avx10-helper.h"
+#include "fp8-helper.h"
+
+#define SIZE_SRC (AVX512F_LEN / 16)
+#define SIZE_RES (AVX512F_LEN / 8)
+
+void
+CALC (unsigned char *r, _Float16 *s1, _Float16 *s2)
+{
+  _Float16 temp;
+  Float16Union ut = { .f16 = temp };
+  int i, hf8_bf8, saturate;
+
+  hf8_bf8 = 1;
+  saturate = 1;
+
+  for (i = 0; i < SIZE_RES; i++)
+    {
+      r[i] = 0;
+      if (i < SIZE_SRC)
+	{
+	  Float16Union usrc2 = { .f16 = s2[i] };
+	  ut.u16 = usrc2.u16;
+	}
+      else
+	{
+	  Float16Union usrc1 = { .f16 = s1[i - SIZE_SRC] };
+	  ut.u16 = usrc1.u16;
+	}
+      r[i] = convert_fp16_to_fp8 (ut.f16, 0, hf8_bf8, saturate);
+    }
+}
+
+void
+TEST (void)
+{
+  int i, sign;
+  UNION_TYPE (AVX512F_LEN, i_b) res;
+  UNION_TYPE (AVX512F_LEN, h) src1, src2;
+  unsigned char res_ref[SIZE_RES];
+
+  sign = 1;
+  for (i = 0; i < SIZE_SRC; i++)
+    {
+      src1.a[i] = (_Float16) (sign * (1.5 * (1 << (i % 3))));
+      src2.a[i] = (_Float16) (-sign * (2.5 * (1 << (i % 3))));
+      sign = -sign;
+    }
+
+  res.x = INTRINSIC (_cvtnes2ph_pbf8) (src1.x, src2.x);
+  CALC (res_ref, src1.a, src2.a);
+
+  if (UNION_CHECK (AVX512F_LEN, i_b) (res, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2hf8-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2hf8-2.c
new file mode 100644
index 00000000000..6887b4085f5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2hf8-2.c
@@ -0,0 +1,65 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2-512" } */
+/* { dg-require-effective-target avx10_2_512 } */
+
+#ifndef AVX10_2
+#define AVX10_2
+#define AVX10_2_512
+#define AVX10_512BIT
+#endif
+
+#include "avx10-helper.h"
+#include "fp8-helper.h"
+
+#define SIZE_SRC (AVX512F_LEN / 16)
+#define SIZE_RES (AVX512F_LEN / 8)
+
+void
+CALC (unsigned char *r, _Float16 *s1, _Float16 *s2)
+{
+  _Float16 temp;
+  Float16Union ut = { .f16 = temp };
+  int i, hf8_bf8, saturate;
+
+  hf8_bf8 = 0;
+  saturate = 0;
+
+  for (i = 0; i < SIZE_RES; i++)
+    {
+      r[i] = 0;
+      if (i < SIZE_SRC)
+	{
+	  Float16Union usrc2 = { .f16 = s2[i] };
+	  ut.u16 = usrc2.u16;
+	}
+      else
+	{
+	  Float16Union usrc1 = { .f16 = s1[i - SIZE_SRC] };
+	  ut.u16 = usrc1.u16;
+	}
+      r[i] = convert_fp16_to_fp8 (ut.f16, 0, hf8_bf8, saturate);
+    }
+}
+
+void
+TEST (void)
+{
+  int i, sign;
+  UNION_TYPE (AVX512F_LEN, i_b) res;
+  UNION_TYPE (AVX512F_LEN, h) src1, src2;
+  unsigned char res_ref[SIZE_RES];
+
+  sign = 1;
+  for (i = 0; i < SIZE_SRC; i++)
+    {
+      src1.a[i] = (_Float16) (sign * (1.5 * (1 << (i % 3))));
+      src2.a[i] = (_Float16) (-sign * (2.5 * (1 << (i % 3))));
+      sign = -sign;
+    }
+
+  res.x = INTRINSIC (_cvtne2ph_phf8) (src1.x, src2.x);
+  CALC (res_ref, src1.a, src2.a);
+
+  if (UNION_CHECK (AVX512F_LEN, i_b) (res, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2hf8s-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2hf8s-2.c
new file mode 100644
index 00000000000..6637d5e726f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2hf8s-2.c
@@ -0,0 +1,65 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2-512" } */
+/* { dg-require-effective-target avx10_2_512 } */
+
+#ifndef AVX10_2
+#define AVX10_2
+#define AVX10_2_512
+#define AVX10_512BIT
+#endif
+
+#include "avx10-helper.h"
+#include "fp8-helper.h"
+
+#define SIZE_SRC (AVX512F_LEN / 16)
+#define SIZE_RES (AVX512F_LEN / 8)
+
+void
+CALC (unsigned char *r, _Float16 *s1, _Float16 *s2)
+{
+  _Float16 temp;
+  Float16Union ut = { .f16 = temp };
+  int i, hf8_bf8, saturate;
+
+  hf8_bf8 = 0;
+  saturate = 1;
+
+  for (i = 0; i < SIZE_RES; i++)
+    {
+      r[i] = 0;
+      if (i < SIZE_SRC)
+	{
+	  Float16Union usrc2 = { .f16 = s2[i] };
+	  ut.u16 = usrc2.u16;
+	}
+      else
+	{
+	  Float16Union usrc1 = { .f16 = s1[i - SIZE_SRC] };
+	  ut.u16 = usrc1.u16;
+	}
+      r[i] = convert_fp16_to_fp8 (ut.f16, 0, hf8_bf8, saturate);
+    }
+}
+
+void
+TEST (void)
+{
+  int i, sign;
+  UNION_TYPE (AVX512F_LEN, i_b) res;
+  UNION_TYPE (AVX512F_LEN, h) src1, src2;
+  unsigned char res_ref[SIZE_RES];
+
+  sign = 1;
+  for (i = 0; i < SIZE_SRC; i++)
+    {
+      src1.a[i] = (_Float16) (sign * (1.5 * (1 << (i % 3))));
+      src2.a[i] = (_Float16) (-sign * (2.5 * (1 << (i % 3))));
+      sign *= -1;
+    }
+
+  res.x = INTRINSIC (_cvtnes2ph_phf8) (src1.x, src2.x);
+  CALC (res_ref, src1.a, src2.a);
+
+  if (UNION_CHECK (AVX512F_LEN, i_b) (res, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtneph2bf8-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtneph2bf8-2.c
new file mode 100644
index 00000000000..253b8424ee2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtneph2bf8-2.c
@@ -0,0 +1,58 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2-512" } */
+/* { dg-require-effective-target avx10_2_512 } */
+
+#ifndef AVX10_2
+#define AVX10_2
+#define AVX10_2_512
+#define AVX10_512BIT
+#define AVX512F_LEN 512
+#define AVX512F_LEN_HALF 256
+#endif
+
+#include "avx10-helper.h"
+#include "fp8-helper.h"
+
+#define SIZE_SRC (AVX512F_LEN / 16)
+#define SIZE_RES (AVX512F_LEN_HALF / 8)
+
+void
+CALC (unsigned char *r, _Float16 *s)
+{
+  int i, hf8_bf8, saturate;
+
+  hf8_bf8 = 1;
+  saturate = 0;
+
+  for (i = 0; i < SIZE_RES; i++)
+    {
+      r[i] = 0;
+      if (i < SIZE_SRC)
+	{
+	  Float16Union usrc = { .f16 = s[i] };
+	  r[i] = convert_fp16_to_fp8 (usrc.f16, 0, hf8_bf8, saturate);
+	}
+    }
+}
+
+void
+TEST (void)
+{
+  int i, sign;
+  UNION_TYPE (AVX512F_LEN_HALF, i_b) res;
+  UNION_TYPE (AVX512F_LEN, h) src;
+  unsigned char res_ref[SIZE_RES];
+
+  sign = 1;
+  for (i = 0; i < SIZE_SRC; i++)
+    {
+      src.a[i] = (_Float16) (sign * (2.5 * (1 << (i % 3))));
+      sign = -sign;
+    }
+
+  res.x = INTRINSIC (_cvtneph_pbf8) (src.x);
+  CALC (res_ref, src.a);
+
+  if (UNION_CHECK (AVX512F_LEN_HALF, i_b) (res, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtneph2bf8s-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtneph2bf8s-2.c
new file mode 100644
index 00000000000..b7f9944f1c9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtneph2bf8s-2.c
@@ -0,0 +1,56 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2-512" } */
+/* { dg-require-effective-target avx10_2_512 } */
+
+#ifndef AVX10_2
+#define AVX10_2
+#define AVX10_2_512
+#define AVX10_512BIT
+#endif
+
+#include "avx10-helper.h"
+#include "fp8-helper.h"
+
+#define SIZE_SRC (AVX512F_LEN / 16)
+#define SIZE_RES (AVX512F_LEN_HALF / 8)
+
+void
+CALC (unsigned char *r, _Float16 *s)
+{
+  int i, hf8_bf8, saturate;
+
+  hf8_bf8 = 1;
+  saturate = 1;
+
+  for (i = 0; i < SIZE_RES; i++)
+    {
+      r[i] = 0;
+      if (i < SIZE_SRC)
+	{
+	  Float16Union usrc = { .f16 = s[i] };
+	  r[i] = convert_fp16_to_fp8 (usrc.f16, 0, hf8_bf8, saturate);
+	}
+    }
+}
+
+void
+TEST (void)
+{
+  int i, sign;
+  UNION_TYPE (AVX512F_LEN_HALF, i_b) res;
+  UNION_TYPE (AVX512F_LEN, h) src;
+  unsigned char res_ref[SIZE_RES];
+
+  sign = 1;
+  for (i = 0; i < SIZE_SRC; i++)
+    {
+      src.a[i] = (_Float16) (sign * (2.5 * (1 << (i % 3))));
+      sign = -sign;
+    }
+
+  res.x = INTRINSIC (_cvtnesph_pbf8) (src.x);
+  CALC (res_ref, src.a);
+
+  if (UNION_CHECK (AVX512F_LEN_HALF, i_b) (res, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtneph2hf8-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtneph2hf8-2.c
new file mode 100644
index 00000000000..75f1292a33c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtneph2hf8-2.c
@@ -0,0 +1,56 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2-512" } */
+/* { dg-require-effective-target avx10_2_512 } */
+
+#ifndef AVX10_2
+#define AVX10_2
+#define AVX10_2_512
+#define AVX10_512BIT
+#endif
+
+#include "avx10-helper.h"
+#include "fp8-helper.h"
+
+#define SIZE_SRC (AVX512F_LEN / 16)
+#define SIZE_RES (AVX512F_LEN_HALF / 8)
+
+void
+CALC (unsigned char *r, _Float16 *s)
+{
+  int i, hf8_bf8, saturate;
+
+  hf8_bf8 = 0;
+  saturate = 0;
+
+  for (i = 0; i < SIZE_RES; i++)
+    {
+      r[i] = 0;
+      if (i < SIZE_SRC)
+	{
+	  Float16Union usrc = { .f16 = s[i] };
+	  r[i] = convert_fp16_to_fp8 (usrc.f16, 0, hf8_bf8, saturate);
+	}
+    }
+}
+
+void
+TEST (void)
+{
+  int i, sign;
+  UNION_TYPE (AVX512F_LEN_HALF, i_b) res;
+  UNION_TYPE (AVX512F_LEN, h) src;
+  unsigned char res_ref[SIZE_RES];
+
+  sign = 1;
+  for (i = 0; i < SIZE_SRC; i++)
+    {
+      src.a[i] = (_Float16) (sign * (2.5 * (1 << (i % 3))));
+      sign = -sign;
+    }
+
+  res.x = INTRINSIC (_cvtneph_phf8) (src.x);
+  CALC (res_ref, src.a);
+
+  if (UNION_CHECK (AVX512F_LEN_HALF, i_b) (res, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtneph2hf8s-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtneph2hf8s-2.c
new file mode 100644
index 00000000000..b0f3cb07019
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtneph2hf8s-2.c
@@ -0,0 +1,56 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2-512" } */
+/* { dg-require-effective-target avx10_2_512 } */
+
+#ifndef AVX10_2
+#define AVX10_2
+#define AVX10_2_512
+#define AVX10_512BIT
+#endif
+
+#include "avx10-helper.h"
+#include "fp8-helper.h"
+
+#define SIZE_SRC (AVX512F_LEN / 16)
+#define SIZE_RES (AVX512F_LEN_HALF / 8)
+
+void
+CALC (unsigned char *r, _Float16 *s)
+{
+  int i, hf8_bf8, saturate;
+
+  hf8_bf8 = 0;
+  saturate = 1;
+
+  for (i = 0; i < SIZE_RES; i++)
+    {
+      r[i] = 0;
+      if (i < SIZE_SRC)
+	{
+	  Float16Union usrc = { .f16 = s[i] };
+	  r[i] = convert_fp16_to_fp8 (usrc.f16, 0, hf8_bf8, saturate);
+	}
+    }
+}
+
+void
+TEST (void)
+{
+  int i, sign;
+  UNION_TYPE (AVX512F_LEN_HALF, i_b) res;
+  UNION_TYPE (AVX512F_LEN, h) src;
+  unsigned char res_ref[SIZE_RES];
+
+  sign = 1;
+  for (i = 0; i < SIZE_SRC; i++)
+    {
+      src.a[i] = (_Float16) (sign * (2.5 * (1 << (i % 3))));
+      sign = -sign;
+    }
+
+  res.x = INTRINSIC (_cvtnesph_phf8) (src.x);
+  CALC (res_ref, src.a);
+
+  if (UNION_CHECK (AVX512F_LEN_HALF, i_b) (res, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-convert-1.c b/gcc/testsuite/gcc.target/i386/avx10_2-convert-1.c
new file mode 100644
index 00000000000..015474f8cf3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-convert-1.c
@@ -0,0 +1,274 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx10.2 -O2" } */
+/* { dg-final { scan-assembler-times "vcvt2ps2phx\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvt2ps2phx\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvt2ps2phx\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvt2ps2phx\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvt2ps2phx\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvt2ps2phx\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvt2ps2phx\[ \\t\]+\{rn-sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvt2ps2phx\[ \\t\]+\{rn-sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvt2ps2phx\[ \\t\]+\{rn-sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtbiasph2bf8\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtbiasph2bf8\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtbiasph2bf8\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtbiasph2bf8\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtbiasph2bf8\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtbiasph2bf8\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtbiasph2bf8s\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtbiasph2bf8s\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtbiasph2bf8s\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtbiasph2bf8s\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtbiasph2bf8s\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtbiasph2bf8s\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtbiasph2hf8\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtbiasph2hf8\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtbiasph2hf8\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtbiasph2hf8\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtbiasph2hf8\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtbiasph2hf8\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtbiasph2hf8s\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtbiasph2hf8s\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtbiasph2hf8s\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtbiasph2hf8s\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtbiasph2hf8s\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtbiasph2hf8s\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtne2ph2bf8\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtne2ph2bf8\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtne2ph2bf8\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtne2ph2bf8\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtne2ph2bf8\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtne2ph2bf8\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtne2ph2bf8s\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtne2ph2bf8s\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtne2ph2bf8s\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtne2ph2bf8s\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtne2ph2bf8s\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtne2ph2bf8s\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtne2ph2hf8\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtne2ph2hf8\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtne2ph2hf8\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtne2ph2hf8\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtne2ph2hf8\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtne2ph2hf8\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtne2ph2hf8s\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtne2ph2hf8s\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtne2ph2hf8s\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtne2ph2hf8s\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtne2ph2hf8s\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtne2ph2hf8s\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvthf82ph\[ \\t\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvthf82ph\[ \\t\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvthf82ph\[ \\t\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvthf82ph\[ \\t\]*%xmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvthf82ph\[ \\t\]*%xmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvthf82ph\[ \\t\]*%xmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneph2bf8x\[ \\t\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneph2bf8x\[ \\t\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneph2bf8x\[ \\t\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneph2bf8y\[ \\t\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneph2bf8y\[ \\t\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneph2bf8y\[ \\t\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneph2bf8sx\[ \\t\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneph2bf8sx\[ \\t\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneph2bf8sx\[ \\t\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneph2bf8sy\[ \\t\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneph2bf8sy\[ \\t\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneph2bf8sy\[ \\t\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneph2hf8x\[ \\t\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneph2hf8x\[
\\t\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtneph2hf8x\[ \\t\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtneph2hf8y\[ \\t\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtneph2hf8y\[ \\t\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtneph2hf8y\[ \\t\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtneph2hf8sx\[ \\t\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtneph2hf8sx\[ \\t\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtneph2hf8sx\[ \\t\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtneph2hf8sy\[ \\t\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtneph2hf8sy\[ \\t\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtneph2hf8sy\[ \\t\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ + +#include <immintrin.h> + +volatile __m128 x1, a1, b1; +volatile __m256 x2, a2, b2; +volatile __m128h y, x128h; +volatile __m256h y2, x256h; +volatile __m128i x128i; +volatile __m256i x256i; +volatile __mmask8 m8; +volatile __mmask16 m16; +volatile __mmask32 m32; +const void *a; +__m128bh *b; +__m256bh *c; +__m128h *d; +__m256h *e; + +void extern +avx10_2_test (void) +{ + y = _mm_cvtx2ps_ph (a1, b1); + y = _mm_mask_cvtx2ps_ph (y, m8, a1, b1); + y = _mm_maskz_cvtx2ps_ph (m8, a1, b1); + + y2 = _mm256_cvtx2ps_ph (a2, b2); + y2 = _mm256_mask_cvtx2ps_ph (y2, m16, a2, b2); + y2 = _mm256_maskz_cvtx2ps_ph (m16, a2, b2); + + y2 = _mm256_cvtx_round2ps_ph (a2, b2, 8); + y2 = _mm256_mask_cvtx_round2ps_ph (y2, m16, a2, b2, 8); + y2 = _mm256_maskz_cvtx_round2ps_ph (m16, a2, b2, 8); +} + +void extern +avx10_2_vcvtbiasph2bf8_test (void) +{ + x128i = _mm_cvtbiasph_pbf8 (x128i, x128h); + x128i = _mm_mask_cvtbiasph_pbf8 (x128i, m8, x128i, x128h); + x128i = _mm_maskz_cvtbiasph_pbf8 (m8, x128i, x128h); + + x128i = _mm256_cvtbiasph_pbf8 (x256i, x256h); + x128i = _mm256_mask_cvtbiasph_pbf8 (x128i, m16, x256i, x256h); + x128i = _mm256_maskz_cvtbiasph_pbf8 (m16, x256i, x256h); +} + +void extern +avx10_2_vcvtbiasph2bf8s_test (void) +{ + x128i = _mm_cvtbiassph_pbf8 (x128i, x128h); + x128i = _mm_mask_cvtbiassph_pbf8 (x128i, m8, x128i, x128h); + x128i = _mm_maskz_cvtbiassph_pbf8 (m8, x128i, x128h); + + x128i = _mm256_cvtbiassph_pbf8 (x256i, x256h); + x128i = _mm256_mask_cvtbiassph_pbf8 (x128i, m16, x256i, x256h); + x128i = _mm256_maskz_cvtbiassph_pbf8 (m16, x256i, x256h); +} + +void extern +avx10_2_vcvtbiasph2hf8_test (void) +{ + x128i = _mm_cvtbiasph_phf8 (x128i, x128h); + x128i = _mm_mask_cvtbiasph_phf8 (x128i, m8, x128i, x128h); + x128i = _mm_maskz_cvtbiasph_phf8 (m8, x128i, x128h); + + x128i = _mm256_cvtbiasph_phf8 (x256i, x256h); + x128i = _mm256_mask_cvtbiasph_phf8 (x128i, m16, x256i, x256h); + x128i = _mm256_maskz_cvtbiasph_phf8 (m16, x256i, x256h); +} + +void extern +avx10_2_vcvtbiasph2hf8s_test (void) +{ + x128i = _mm_cvtbiassph_phf8 (x128i, x128h); + x128i = _mm_mask_cvtbiassph_phf8 (x128i, m8, x128i, x128h); +
x128i = _mm_maskz_cvtbiassph_phf8 (m8, x128i, x128h); + + x128i = _mm256_cvtbiassph_phf8 (x256i, x256h); + x128i = _mm256_mask_cvtbiassph_phf8 (x128i, m16, x256i, x256h); + x128i = _mm256_maskz_cvtbiassph_phf8 (m16, x256i, x256h); +} + +void extern +avx10_2_vcvtne2ph2bf8_test (void) +{ + x128i = _mm_cvtne2ph_pbf8 (x128h, x128h); + x128i = _mm_mask_cvtne2ph_pbf8 (x128i, m16, x128h, x128h); + x128i = _mm_maskz_cvtne2ph_pbf8 (m16, x128h, x128h); + x256i = _mm256_cvtne2ph_pbf8 (x256h, x256h); + x256i = _mm256_mask_cvtne2ph_pbf8 (x256i, m32, x256h, x256h); + x256i = _mm256_maskz_cvtne2ph_pbf8 (m32, x256h, x256h); +} + +void extern +avx10_2_vcvtne2ph2bf8s_test (void) +{ + x128i = _mm_cvtnes2ph_pbf8 (x128h, x128h); + x128i = _mm_mask_cvtnes2ph_pbf8 (x128i, m16, x128h, x128h); + x128i = _mm_maskz_cvtnes2ph_pbf8 (m16, x128h, x128h); + x256i = _mm256_cvtnes2ph_pbf8 (x256h, x256h); + x256i = _mm256_mask_cvtnes2ph_pbf8 (x256i, m32, x256h, x256h); + x256i = _mm256_maskz_cvtnes2ph_pbf8 (m32, x256h, x256h); +} + +void extern +avx10_2_vcvtne2ph2hf8_test (void) +{ + x128i = _mm_cvtne2ph_phf8 (x128h, x128h); + x128i = _mm_mask_cvtne2ph_phf8 (x128i, m16, x128h, x128h); + x128i = _mm_maskz_cvtne2ph_phf8 (m16, x128h, x128h); + x256i = _mm256_cvtne2ph_phf8 (x256h, x256h); + x256i = _mm256_mask_cvtne2ph_phf8 (x256i, m32, x256h, x256h); + x256i = _mm256_maskz_cvtne2ph_phf8 (m32, x256h, x256h); +} + +void extern +avx10_2_vcvtne2ph2hf8s_test (void) +{ + x128i = _mm_cvtnes2ph_phf8 (x128h, x128h); + x128i = _mm_mask_cvtnes2ph_phf8 (x128i, m16, x128h, x128h); + x128i = _mm_maskz_cvtnes2ph_phf8 (m16, x128h, x128h); + x256i = _mm256_cvtnes2ph_phf8 (x256h, x256h); + x256i = _mm256_mask_cvtnes2ph_phf8 (x256i, m32, x256h, x256h); + x256i = _mm256_maskz_cvtnes2ph_phf8 (m32, x256h, x256h); +} + +void extern +avx10_2_vcvthf82ph_test (void) +{ + x128h = _mm_cvthf8_ph (x128i); + x128h = _mm_mask_cvthf8_ph (x128h, m8, x128i); + x128h = _mm_maskz_cvthf8_ph (m8, x128i); + + x256h = _mm256_cvthf8_ph (x128i); + x256h = _mm256_mask_cvthf8_ph (x256h, m16, x128i); + x256h = _mm256_maskz_cvthf8_ph (m16, x128i); +} + +void extern +avx10_2_vcvtneph2bf8_test (void) +{ + x128i = _mm_cvtneph_pbf8 (x128h); + x128i = _mm_mask_cvtneph_pbf8 (x128i, m8, x128h); + x128i = _mm_maskz_cvtneph_pbf8 (m8, x128h); + + x128i = _mm256_cvtneph_pbf8 (x256h); + x128i = _mm256_mask_cvtneph_pbf8 (x128i, m16, x256h); + x128i = _mm256_maskz_cvtneph_pbf8 (m16, x256h); +} + +void extern +avx10_2_vcvtneph2bf8s_test (void) +{ + x128i = _mm_cvtnesph_pbf8 (x128h); + x128i = _mm_mask_cvtnesph_pbf8 (x128i, m8, x128h); + x128i = _mm_maskz_cvtnesph_pbf8 (m8, x128h); + + x128i = _mm256_cvtnesph_pbf8 (x256h); + x128i = _mm256_mask_cvtnesph_pbf8 (x128i, m16, x256h); + x128i = _mm256_maskz_cvtnesph_pbf8 (m16, x256h); +} + +void extern +avx10_2_vcvtneph2hf8_test (void) +{ + x128i = _mm_cvtneph_phf8 (x128h); + x128i = _mm_mask_cvtneph_phf8 (x128i, m8, x128h); + x128i = _mm_maskz_cvtneph_phf8 (m8, x128h); + + x128i = _mm256_cvtneph_phf8 (x256h); + x128i = _mm256_mask_cvtneph_phf8 (x128i, m16, x256h); + x128i = _mm256_maskz_cvtneph_phf8 (m16, x256h); +} + +void extern +avx10_2_vcvtneph2hf8s_test (void) +{ + x128i = _mm_cvtnesph_phf8 (x128h); + x128i = _mm_mask_cvtnesph_phf8 (x128i, m8, x128h); + x128i = _mm_maskz_cvtnesph_phf8 (m8, x128h); + + x128i = _mm256_cvtnesph_phf8 (x256h); + x128i = _mm256_mask_cvtnesph_phf8 (x128i, m16, x256h); + x128i = _mm256_maskz_cvtnesph_phf8 (m16, x256h); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcvt2ps2phx-2.c 
b/gcc/testsuite/gcc.target/i386/avx10_2-vcvt2ps2phx-2.c new file mode 100644 index 00000000000..ba3a30c9317 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcvt2ps2phx-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvt2ps2phx-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvt2ps2phx-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcvtbiasph2bf8-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtbiasph2bf8-2.c new file mode 100644 index 00000000000..b33d465f465 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtbiasph2bf8-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvtbiasph2bf8-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvtbiasph2bf8-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcvtbiasph2bf8s-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtbiasph2bf8s-2.c new file mode 100644 index 00000000000..dcf0d39a54c --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtbiasph2bf8s-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvtbiasph2bf8s-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvtbiasph2bf8s-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcvtbiasph2hf8-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtbiasph2hf8-2.c new file mode 100644 index 00000000000..93b80c7cecb --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtbiasph2hf8-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvtbiasph2hf8-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvtbiasph2hf8-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcvtbiasph2hf8s-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtbiasph2hf8s-2.c new file mode 100644 index 00000000000..ed35bf08e12 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtbiasph2hf8s-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvtbiasph2hf8s-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvtbiasph2hf8s-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcvthf82ph-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vcvthf82ph-2.c new file mode 100644 index 00000000000..d0d9a8d6cff --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcvthf82ph-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 
-mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvthf82ph-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvthf82ph-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcvtne2ph2bf8-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtne2ph2bf8-2.c new file mode 100644 index 00000000000..50948cfd00a --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtne2ph2bf8-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvtne2ph2bf8-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvtne2ph2bf8-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcvtne2ph2bf8s-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtne2ph2bf8s-2.c new file mode 100644 index 00000000000..dda859c5def --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtne2ph2bf8s-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvtne2ph2bf8s-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvtne2ph2bf8s-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcvtne2ph2hf8-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtne2ph2hf8-2.c new file mode 100644 index 00000000000..5db139f005a --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtne2ph2hf8-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvtne2ph2hf8-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvtne2ph2hf8-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcvtne2ph2hf8s-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtne2ph2hf8s-2.c new file mode 100644 index 00000000000..84bd9b2de2e --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtne2ph2hf8s-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvtne2ph2hf8s-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvtne2ph2hf8s-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcvtneph2bf8-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtneph2bf8-2.c new file mode 100644 index 00000000000..96deb4c4b55 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtneph2bf8-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvtneph2bf8-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define 
AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvtneph2bf8-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcvtneph2bf8s-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtneph2bf8s-2.c new file mode 100644 index 00000000000..ea34459afbe --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtneph2bf8s-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvtneph2bf8s-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvtneph2bf8s-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcvtneph2hf8-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtneph2hf8-2.c new file mode 100644 index 00000000000..e43c6080309 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtneph2hf8-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvtneph2hf8-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvtneph2hf8-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcvtneph2hf8s-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtneph2hf8s-2.c new file mode 100644 index 00000000000..109df51b4d1 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtneph2hf8s-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvtneph2hf8s-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvtneph2hf8s-2.c" diff --git a/gcc/testsuite/gcc.target/i386/fp8-helper.h b/gcc/testsuite/gcc.target/i386/fp8-helper.h new file mode 100644 index 00000000000..b486db5bae8 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/fp8-helper.h @@ -0,0 +1,135 @@ +#ifndef FP8_HELPER_INCLUDED +#define FP8_HELPER_INCLUDED + +typedef union +{ + _Float16 f16; + unsigned short u16; +} Float16Union; + +static unsigned char +convert_fp16_to_hf8 (_Float16 x, unsigned char b, int s) +{ + Float16Union ux = { .f16 = x }; + const unsigned short fp16_bias = 15, hf8_bias = 7; + unsigned short sign = (ux.u16 & 0x8000) >> 8; + unsigned short e_fp16 = (ux.u16 & 0x7c00) >> 10; + unsigned short m_fp16 = ux.u16 & 0x03ff; + + /* Add the optional bias operand before extracting the fields.  */ + unsigned short x_bias = b ? ux.u16 + (b >> 1) : ux.u16; + unsigned short e = (x_bias & 0x7c00) >> 10; + unsigned short m = (x_bias & 0x03ff) >> 7; + + if (e_fp16 == 0x1f) + { + /* Special value: NaN or Infinity. */ + return (0xf << 3) | 0x7 | sign; + } + else if ((e_fp16 > (fp16_bias - hf8_bias + 15)) + || ((e_fp16 == (fp16_bias - hf8_bias + 15)) + && (m_fp16 > 0x0300))) + { + /* Overflow: Return Max or NaN. */ + return (0xf << 3) | (s ? 0x6 : 0x7) | sign; + } + else if (e_fp16 < fp16_bias - hf8_bias - 3) + { + /* Value too small: Return zero. */ + return sign; + } + else if (e_fp16 <= fp16_bias - hf8_bias) + { + /* Denormalized value: Adjust mantissa. */ + m = ((m_fp16 | 0x0400) >> ((fp16_bias - hf8_bias) + 1 - e_fp16)) + | (((m_fp16 & 0x007f) + 0x007f) >> 7); + return sign; + } + else + { + /* Normal value: Adjust exponent and mantissa. */ + e -= (fp16_bias - hf8_bias); + return (e << 3) | m | sign; + } +} + +static unsigned char +convert_fp16_to_bf8 (_Float16 x, unsigned char b, int s) +{ + Float16Union ux = { .f16 = x }; + unsigned short temp; + unsigned short fp8_res = 0; + + if (__builtin_isinf (x) || __builtin_isnan (x)) + { + /* Special value: NaN or Infinity. */ + fp8_res = (ux.u16 >> 8) & 0xFF; + if (__builtin_isnan (x)) + fp8_res |= 0x02; + } + else + { + unsigned short rounding_bias = b ? b & 0xFF + : ((ux.u16 >> 8) & 0x1) + 0x7F; + temp = ux.u16 + rounding_bias; + fp8_res = (temp >> 8) & 0xFF; + if (((temp >> 8) & 0x7F) == 0x7C && s) + fp8_res = (fp8_res & 0x80) | 0x7B; + } + return fp8_res; +} + +static unsigned char +convert_fp16_to_fp8 (_Float16 x, unsigned char b, int y, int s) +{ + return y ? convert_fp16_to_bf8 (x, b, s) + : convert_fp16_to_hf8 (x, b, s); +} + +static _Float16 +convert_bf8_to_fp16 (unsigned char x) +{ + Float16Union u = { .u16 = (x << 8) & 0xff00 }; + return u.f16; +} + +static _Float16 +convert_hf8_to_fp16 (unsigned char x) +{ + unsigned char hf8_bias; + Float16Union res; + unsigned short fp_16bias, s, e, m, e_norm, lz_cnt; + + fp_16bias = 15; + hf8_bias = 7; + s = (x & 0x80) << 8; + e = (x & 0x78) >> 3; + m = x & 0x07; + e_norm = e + fp_16bias - hf8_bias; + + /* Convert a denormal hf8 number into a normal fp16 number.  */ + if ((e == 0) && (m != 0)) + { + lz_cnt = 2; + lz_cnt = (m > 0x1) ? 1 : lz_cnt; + lz_cnt = (m > 0x3) ? 0 : lz_cnt; + e_norm -= lz_cnt; + m = (m << (lz_cnt + 1)) & 0x07; + } + else if ((e == 0) && (m == 0)) + e_norm = 0; + else if ((e == 0xf) && (m == 0x7)) + { + e_norm = 0x1f; + m = 0x4; + } + + res.u16 = 0; + res.u16 |= e_norm << 10; + res.u16 |= m << 7; + res.u16 |= s; + + return res.f16; +} + +#endif diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c index 6b1c9e545f0..a5ba3decc97 100644 --- a/gcc/testsuite/gcc.target/i386/sse-13.c +++ b/gcc/testsuite/gcc.target/i386/sse-13.c @@ -1018,4 +1018,10 @@ #define __builtin_ia32_mpsadbw128_mask(A, B, C, D, E) __builtin_ia32_mpsadbw128_mask (A, B, 1, D, E) #define __builtin_ia32_mpsadbw256_mask(A, B, C, D, E) __builtin_ia32_mpsadbw256_mask (A, B, 1, D, E) +/* avx10_2convertintrin.h */ +#define __builtin_ia32_vcvt2ps2phx256_mask_round(A, B, C, D, E) __builtin_ia32_vcvt2ps2phx256_mask_round(A, B, C, D, 8) + +/* avx10_2-512convertintrin.h */ +#define __builtin_ia32_vcvt2ps2phx512_mask_round(A, B, C, D, E) __builtin_ia32_vcvt2ps2phx512_mask_round(A, B, C, D, 8) + #include <x86intrin.h> diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c index 6dfdaa96c76..9253e5eb905 100644 --- a/gcc/testsuite/gcc.target/i386/sse-14.c +++ b/gcc/testsuite/gcc.target/i386/sse-14.c @@ -1382,3 +1382,9 @@ test_3 (_mm_maskz_mpsadbw_epu8, __m128i, __mmask8, __m128i, __m128i, 1) test_3 (_mm256_maskz_mpsadbw_epu8, __m256i, __mmask16, __m256i, __m256i, 1) test_4 (_mm_mask_mpsadbw_epu8, __m128i, __m128i, __mmask8, __m128i, __m128i, 1) test_4 (_mm256_mask_mpsadbw_epu8, __m256i, __m256i, __mmask16, __m256i, __m256i, 1) + +/* avx10_2convertintrin.h */ +test_2 (_mm256_cvtx_round2ps_ph, __m256h, __m256, __m256, 4) + +/* avx10_2-512convertintrin.h */ +test_2 (_mm512_cvtx_round2ps_ph, __m512h, __m512, __m512, 4) diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c index 
102b6b878c8..d57bbc41a49 100644 --- a/gcc/testsuite/gcc.target/i386/sse-22.c +++ b/gcc/testsuite/gcc.target/i386/sse-22.c @@ -1421,3 +1421,9 @@ test_3 (_mm_maskz_mpsadbw_epu8, __m128i, __mmask8, __m128i, __m128i, 1) test_3 (_mm256_maskz_mpsadbw_epu8, __m256i, __mmask16, __m256i, __m256i, 1) test_4 (_mm_mask_mpsadbw_epu8, __m128i, __m128i, __mmask8, __m128i, __m128i, 1) test_4 (_mm256_mask_mpsadbw_epu8, __m256i, __m256i, __mmask16, __m256i, __m256i, 1) + +/* avx10_2convertintrin.h */ +test_2 (_mm256_cvtx_round2ps_ph, __m256h, __m256, __m256, 4) + +/* avx10_2-512convertintrin.h */ +test_2 (_mm512_cvtx_round2ps_ph, __m512h, __m512, __m512, 4) diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c index 962b9507283..438974cb0c6 100644 --- a/gcc/testsuite/gcc.target/i386/sse-23.c +++ b/gcc/testsuite/gcc.target/i386/sse-23.c @@ -992,6 +992,12 @@ #define __builtin_ia32_mpsadbw128_mask(A, B, C, D, E) __builtin_ia32_mpsadbw128_mask (A, B, 1, D, E) #define __builtin_ia32_mpsadbw256_mask(A, B, C, D, E) __builtin_ia32_mpsadbw256_mask (A, B, 1, D, E) +/* avx10_2convertintrin.h */ +#define __builtin_ia32_vcvt2ps2phx256_mask_round(A, B, C, D, E) __builtin_ia32_vcvt2ps2phx256_mask_round(A, B, C, D, 8) + +/* avx10_2-512convertintrin.h */ +#define __builtin_ia32_vcvt2ps2phx512_mask_round(A, B, C, D, E) __builtin_ia32_vcvt2ps2phx512_mask_round(A, B, C, D, 8) + #pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,sha,xsavec,xsaves,clflushopt,clwb,mwaitx,clzero,pku,sgx,rdpid,gfni,vpclmulqdq,pconfig,wbnoinvd,enqcmd,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avxifma,avxvnniint8,avxneconvert,cmpccxadd,amx-fp16,prefetchi,raoint,amx-complex,avxvnniint16,sm3,sha512,sm4,avx10.2-512") #include <x86intrin.h>
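
In case it helps with review of the fp8-helper.h encodings above, here is a small standalone sketch (not part of the patch; the file name and main wrapper are illustrative only, and it assumes a compiler with _Float16 support, e.g. GCC on x86). It mirrors the no-bias round-to-nearest-even path of convert_fp16_to_bf8 (bf8 is the high byte of fp16) and the normal-value path of convert_fp16_to_hf8 (1 sign, 4 exponent, 3 mantissa bits, exponent bias 7) for a single positive input:

/* fp8-demo.c: illustration only, not part of the patch.  */
#include <stdio.h>

typedef union
{
  _Float16 f16;
  unsigned short u16;
} Float16Union;

int
main (void)
{
  Float16Union ux = { .f16 = (_Float16) 2.5 };	/* Bit pattern 0x4100.  */

  /* bf8 keeps the high byte of fp16; round to nearest even by adding
     0x7F plus the lowest kept bit, then truncate (the no-bias path of
     convert_fp16_to_bf8).  */
  unsigned short rnd = ((ux.u16 >> 8) & 0x1) + 0x7F;
  unsigned char bf8 = (ux.u16 + rnd) >> 8;	/* 0x41.  */

  /* hf8 normal-value path of convert_fp16_to_hf8: rebias the exponent
     from 15 to 7 and keep the top three mantissa bits.  The input is
     positive, so the sign bit is zero and omitted here.  */
  unsigned short e = ((ux.u16 & 0x7c00) >> 10) - 15 + 7;
  unsigned short m = (ux.u16 & 0x03ff) >> 7;
  unsigned char hf8 = (e << 3) | m;		/* 0x42.  */

  /* Decoding bf8 just places the byte back in the high half, as
     convert_bf8_to_fp16 does.  */
  Float16Union back = { .u16 = (unsigned short) (bf8 << 8) };

  /* Prints: bf8 = 0x41, hf8 = 0x42, bf8 decoded = 2.500000.  */
  printf ("bf8 = 0x%02x, hf8 = 0x%02x, bf8 decoded = %f\n",
	  bf8, hf8, (double) back.f16);
  return 0;
}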