From patchwork Mon Aug 19 08:56:45 2024
From: Haochen Jiang
To: gcc-patches@gcc.gnu.org
Cc: hongtao.liu@intel.com, zewei.mo@pitt.edu, ubizjak@gmail.com
Subject: [PATCH 01/12] i386: Refactor m512-check.h
Date: Mon, 19 Aug 2024 01:56:45 -0700
Message-ID: <20240819085717.193256-2-haochen.jiang@intel.com>
In-Reply-To: <20240819085717.193256-1-haochen.jiang@intel.com>
References: <20240819085717.193256-1-haochen.jiang@intel.com>

After the AVX10 introduction, we still want to use the AVX512 helper
functions to avoid duplicated code. In order to reuse them, we need
some refactoring to make sure each function definition appears under
the correct ISA, avoiding ABI warnings.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/m512-check.h: Wrap the function definitions
	with the correct vector size.
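To make the resulting layout concrete, here is a condensed sketch of how
the helpers end up guarded (illustrative excerpt only; the authoritative
change is the diff below). The idea is that AVX10 wrapper headers
re-include this file with AVX512F_LEN defined, so the 512-bit helpers
must only be instantiated on direct inclusion:

  /* Condensed sketch of the reorganized m512-check.h guards.  */
  #ifndef AVX512F_LEN
  /* 512-bit helpers: only defined on direct inclusion, so tests
     limited to 256-bit vectors never emit 512-bit ABI types.  */
  CHECK_EXP (union512i_d, int, "0x%x")
  CHECK_FP_EXP (union512, float, ESP_FLOAT, "%f")
  # if defined(AVX512FP16)
  CHECK_EXP (union512h, _Float16, "%f")  /* FP16 also needs AVX512FP16.  */
  # endif
  #endif

  /* 128/256-bit FP16 helpers remain guarded by AVX512FP16 alone.  */
  #if defined(AVX512FP16)
  CHECK_EXP (union128h, _Float16, "%f")
  CHECK_EXP (union256h, _Float16, "%f")
  #endif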
---
 gcc/testsuite/gcc.target/i386/m512-check.h | 66 ++++++++++++----------
 1 file changed, 35 insertions(+), 31 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/m512-check.h b/gcc/testsuite/gcc.target/i386/m512-check.h
index 68e74fce68d..d5d18372947 100644
--- a/gcc/testsuite/gcc.target/i386/m512-check.h
+++ b/gcc/testsuite/gcc.target/i386/m512-check.h
@@ -61,6 +61,12 @@ typedef union
   unsigned long long a[8];
 } union512i_uq;
 
+typedef union
+{
+  __m512h x;
+  _Float16 a[32];
+} union512h;
+
 typedef union
 {
   __m128h x;
@@ -73,27 +79,6 @@ typedef union
   _Float16 a[16];
 } union256h;
 
-typedef union
-{
-  __m512h x;
-  _Float16 a[32];
-} union512h;
-
-CHECK_EXP (union512i_b, char, "%d")
-CHECK_EXP (union512i_w, short, "%d")
-CHECK_EXP (union512i_d, int, "0x%x")
-CHECK_EXP (union512i_q, long long, "0x%llx")
-CHECK_EXP (union512, float, "%f")
-CHECK_EXP (union512d, double, "%f")
-CHECK_EXP (union512i_ub, unsigned char, "%d")
-CHECK_EXP (union512i_uw, unsigned short, "%d")
-CHECK_EXP (union512i_ud, unsigned int, "0x%x")
-CHECK_EXP (union512i_uq, unsigned long long, "0x%llx")
-
-
-CHECK_FP_EXP (union512, float, ESP_FLOAT, "%f")
-CHECK_FP_EXP (union512d, double, ESP_DOUBLE, "%f")
-
 #define CHECK_ROUGH_EXP(UNION_TYPE, VALUE_TYPE, FMT) \
 static int \
 __attribute__((noinline, unused)) \
@@ -126,28 +111,47 @@ check_rough_##UNION_TYPE (UNION_TYPE u, const VALUE_TYPE *v, \
   return err; \
 }
 
-CHECK_ROUGH_EXP (union512, float, "%f")
-CHECK_ROUGH_EXP (union512d, double, "%f")
+#ifndef ESP_FLOAT16
+#define ESP_FLOAT16 0.27
+#endif
+
 CHECK_ROUGH_EXP (union256, float, "%f")
 CHECK_ROUGH_EXP (union256d, double, "%f")
 CHECK_ROUGH_EXP (union128, float, "%f")
 CHECK_ROUGH_EXP (union128d, double, "%f")
 
-#ifdef AVX512FP16
+#ifndef AVX512F_LEN
+CHECK_EXP (union512i_b, char, "%d")
+CHECK_EXP (union512i_w, short, "%d")
+CHECK_EXP (union512i_d, int, "0x%x")
+CHECK_EXP (union512i_q, long long, "0x%llx")
+CHECK_EXP (union512, float, "%f")
+CHECK_EXP (union512d, double, "%f")
+CHECK_EXP (union512i_ub, unsigned char, "%d")
+CHECK_EXP (union512i_uw, unsigned short, "%d")
+CHECK_EXP (union512i_ud, unsigned int, "0x%x")
+CHECK_EXP (union512i_uq, unsigned long long, "0x%llx")
+
+CHECK_FP_EXP (union512, float, ESP_FLOAT, "%f")
+CHECK_FP_EXP (union512d, double, ESP_DOUBLE, "%f")
 
-CHECK_EXP (union128h, _Float16, "%f")
-CHECK_EXP (union256h, _Float16, "%f")
-CHECK_EXP (union512h, _Float16, "%f")
+CHECK_ROUGH_EXP (union512, float, "%f")
+CHECK_ROUGH_EXP (union512d, double, "%f")
 
-#ifndef ESP_FLOAT16
-#define ESP_FLOAT16 0.27
+#if defined(AVX512FP16)
+CHECK_EXP (union512h, _Float16, "%f")
+CHECK_FP_EXP (union512h, _Float16, ESP_FLOAT16, "%f")
+CHECK_ROUGH_EXP (union512h, _Float16, "%f")
+#endif
 #endif
 
+#if defined(AVX512FP16)
+CHECK_EXP (union128h, _Float16, "%f")
+CHECK_EXP (union256h, _Float16, "%f")
+
 CHECK_FP_EXP (union128h, _Float16, ESP_FLOAT16, "%f")
 CHECK_FP_EXP (union256h, _Float16, ESP_FLOAT16, "%f")
-CHECK_FP_EXP (union512h, _Float16, ESP_FLOAT16, "%f")
 
 CHECK_ROUGH_EXP (union128h, _Float16, "%f")
 CHECK_ROUGH_EXP (union256h, _Float16, "%f")
-CHECK_ROUGH_EXP (union512h, _Float16, "%f")
 #endif

From patchwork Mon Aug 19 08:56:46 2024
From: Haochen Jiang
To: gcc-patches@gcc.gnu.org
Cc: hongtao.liu@intel.com, zewei.mo@pitt.edu, ubizjak@gmail.com, Hongyu Wang
Subject: [PATCH 02/12] [PATCH 1/2] AVX10.2: Support media instructions
Date: Mon, 19 Aug 2024 01:56:46 -0700
Message-ID: <20240819085717.193256-3-haochen.jiang@intel.com>
In-Reply-To: <20240819085717.193256-1-haochen.jiang@intel.com>
References: <20240819085717.193256-1-haochen.jiang@intel.com>

From: Hongyu Wang

gcc/ChangeLog:

	* config.gcc: Add avx10_2mediaintrin.h and
	avx10_2-512mediaintrin.h.
	* config/i386/i386-builtin.def: Add new builtins.
	* config/i386/i386-builtins.cc (def_builtin): Handle shared
	builtins between AVXVNNIINT8 and AVX10.2.
	* config/i386/i386-expand.cc (ix86_check_builtin_isa_match):
	Ditto.
	* config/i386/immintrin.h: Include avx10_2mediaintrin.h and
	avx10_2-512mediaintrin.h.
	* config/i386/sse.md (VI4_AVX10_2): New.
	(vpdp_): Add AVX10_2_256.
	(vpdp_v16si): New define_insn.
	(vpdp__mask): Ditto.
	(*vpdp__maskz): Ditto.
	(vpdp__maskz): New expander.
	* config/i386/avx10_2-512mediaintrin.h: New file.
	* config/i386/avx10_2mediaintrin.h: Ditto.

gcc/testsuite/ChangeLog:

	* g++.dg/other/i386-2.C: Add -mavx10.2-512.
	* g++.dg/other/i386-3.C: Ditto.
	* gcc.target/i386/avx512f-helper.h: Reuse AVX512F macros for
	AVX10.
	* gcc.target/i386/funcspec-56.inc: Add new target attribute.
	* lib/target-supports.exp (check_effective_target_avx10_2): New.
	(check_effective_target_avx10_2_512): Ditto.
	* gcc.target/i386/avx10-check.h: New.
	* gcc.target/i386/avx10-helper.h: New.
	* gcc.target/i386/avx10-os-support.h: New.
	* gcc.target/i386/avx10_2-builtin-1.c: Ditto.
	* gcc.target/i386/avx10_2-512-media-1.c: Ditto.
	* gcc.target/i386/avx10_2-media-1.c: Ditto.
	* gcc.target/i386/avxvnniint8-builtin.c: Ditto.
	* gcc.target/i386/avx10_2-512-vpdpbssd-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vpdpbssds-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vpdpbsud-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vpdpbsuds-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vpdpbuud-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vpdpbuuds-2.c: Ditto.
	* gcc.target/i386/avx10_2-vpdpbssd-2.c: Ditto.
	* gcc.target/i386/avx10_2-vpdpbssds-2.c: Ditto.
	* gcc.target/i386/avx10_2-vpdpbsud-2.c: Ditto.
	* gcc.target/i386/avx10_2-vpdpbsuds-2.c: Ditto.
	* gcc.target/i386/avx10_2-vpdpbuud-2.c: Ditto.
	* gcc.target/i386/avx10_2-vpdpbuuds-2.c: Ditto.
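For reviewers, a minimal usage sketch of the new 512-bit intrinsics
(illustrative only, assuming -mavx10.2-512; the function name is made
up, while the intrinsics and their masking behavior are the ones added
by this patch and exercised in avx10_2-512-media-1.c below):

  #include <immintrin.h>

  /* Each intrinsic accumulates four 8-bit products per 32-bit lane;
     the suffix encodes operand signedness (bssd: signed x signed,
     bsud: signed x unsigned, buud: unsigned x unsigned), with
     optional merge (mask) or zero (maskz) masking.  */
  __m512i
  dot_example (__m512i acc, __m512i a, __m512i b, __mmask16 m)
  {
    acc = _mm512_dpbssd_epi32 (acc, a, b);
    acc = _mm512_mask_dpbsud_epi32 (acc, m, a, b);
    return _mm512_maskz_dpbuud_epi32 (m, acc, a, b);
  }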
Co-authored-by: Haochen Jiang --- gcc/config.gcc | 3 +- gcc/config/i386/avx10_2-512mediaintrin.h | 234 +++++++++++ gcc/config/i386/avx10_2mediaintrin.h | 367 ++++++++++++++++++ gcc/config/i386/i386-builtin.def | 68 +++- gcc/config/i386/i386-builtins.cc | 10 +- gcc/config/i386/i386-expand.cc | 3 + gcc/config/i386/immintrin.h | 4 + gcc/config/i386/sse.md | 66 +++- gcc/testsuite/gcc.target/i386/avx10-check.h | 61 +++ gcc/testsuite/gcc.target/i386/avx10-helper.h | 23 ++ .../gcc.target/i386/avx10-os-support.h | 23 ++ .../gcc.target/i386/avx10_2-512-media-1.c | 52 +++ .../gcc.target/i386/avx10_2-512-vpdpbssd-2.c | 71 ++++ .../gcc.target/i386/avx10_2-512-vpdpbssds-2.c | 74 ++++ .../gcc.target/i386/avx10_2-512-vpdpbsud-2.c | 71 ++++ .../gcc.target/i386/avx10_2-512-vpdpbsuds-2.c | 74 ++++ .../gcc.target/i386/avx10_2-512-vpdpbuud-2.c | 70 ++++ .../gcc.target/i386/avx10_2-512-vpdpbuuds-2.c | 73 ++++ .../gcc.target/i386/avx10_2-builtin-1.c | 8 + .../gcc.target/i386/avx10_2-media-1.c | 96 +++++ .../gcc.target/i386/avx10_2-vpdpbssd-2.c | 16 + .../gcc.target/i386/avx10_2-vpdpbssds-2.c | 16 + .../gcc.target/i386/avx10_2-vpdpbsud-2.c | 16 + .../gcc.target/i386/avx10_2-vpdpbsuds-2.c | 16 + .../gcc.target/i386/avx10_2-vpdpbuud-2.c | 16 + .../gcc.target/i386/avx10_2-vpdpbuuds-2.c | 16 + .../gcc.target/i386/avx512f-helper.h | 6 +- .../gcc.target/i386/avxvnniint8-builtin.c | 8 + gcc/testsuite/gcc.target/i386/funcspec-56.inc | 4 + gcc/testsuite/lib/target-supports.exp | 36 ++ 30 files changed, 1577 insertions(+), 24 deletions(-) create mode 100644 gcc/config/i386/avx10_2-512mediaintrin.h create mode 100644 gcc/config/i386/avx10_2mediaintrin.h create mode 100644 gcc/testsuite/gcc.target/i386/avx10-check.h create mode 100644 gcc/testsuite/gcc.target/i386/avx10-helper.h create mode 100644 gcc/testsuite/gcc.target/i386/avx10-os-support.h create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-media-1.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpbssd-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpbssds-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpbsud-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpbsuds-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpbuud-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpbuuds-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-builtin-1.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-media-1.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vpdpbssd-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vpdpbssds-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vpdpbsud-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vpdpbsuds-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vpdpbuud-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vpdpbuuds-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avxvnniint8-builtin.c diff --git a/gcc/config.gcc b/gcc/config.gcc index 2c0f4518638..22353f2d69e 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -452,7 +452,8 @@ i[34567]86-*-* | x86_64-*-*) cmpccxaddintrin.h amxfp16intrin.h prfchiintrin.h raointintrin.h amxcomplexintrin.h avxvnniint16intrin.h sm3intrin.h sha512intrin.h sm4intrin.h - usermsrintrin.h avx10_2roundingintrin.h" + usermsrintrin.h avx10_2roundingintrin.h + avx10_2mediaintrin.h avx10_2-512mediaintrin.h" ;; ia64-*-*) extra_headers=ia64intrin.h diff --git a/gcc/config/i386/avx10_2-512mediaintrin.h 
b/gcc/config/i386/avx10_2-512mediaintrin.h new file mode 100644 index 00000000000..02d826b24cd --- /dev/null +++ b/gcc/config/i386/avx10_2-512mediaintrin.h @@ -0,0 +1,234 @@ +/* Copyright (C) 2024 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + +#if !defined _IMMINTRIN_H_INCLUDED +#error "Never use directly; include instead." +#endif + +#ifndef _AVX10_2_512MEDIAINTRIN_H_INCLUDED +#define _AVX10_2_512MEDIAINTRIN_H_INCLUDED + +#if !defined(__AVX10_2_512__) +#pragma GCC push_options +#pragma GCC target("avx10.2-512") +#define __DISABLE_AVX10_2_512__ +#endif /* __AVX10_2_512__ */ + +extern __inline __m512i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_dpbssd_epi32 (__m512i __W, __m512i __A, __m512i __B) +{ + return (__m512i) + __builtin_ia32_vpdpbssd512 ((__v16si) __W, (__v16si) __A, (__v16si) __B); +} + +extern __inline __m512i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_dpbssd_epi32 (__m512i __W, __mmask16 __U, + __m512i __A, __m512i __B) +{ + return (__m512i) + __builtin_ia32_vpdpbssd_v16si_mask ((__v16si) __W, + (__v16si) __A, + (__v16si) __B, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_dpbssd_epi32 (__mmask16 __U, __m512i __W, + __m512i __A, __m512i __B) +{ + return (__m512i) + __builtin_ia32_vpdpbssd_v16si_maskz ((__v16si) __W, + (__v16si) __A, + (__v16si) __B, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_dpbssds_epi32 (__m512i __W, __m512i __A, __m512i __B) +{ + return (__m512i) + __builtin_ia32_vpdpbssds512 ((__v16si) __W, (__v16si) __A, (__v16si) __B); +} + +extern __inline __m512i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_dpbssds_epi32 (__m512i __W, __mmask16 __U, + __m512i __A, __m512i __B) +{ + return (__m512i) + __builtin_ia32_vpdpbssds_v16si_mask ((__v16si) __W, + (__v16si) __A, + (__v16si) __B, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_dpbssds_epi32 (__mmask16 __U, __m512i __W, + __m512i __A, __m512i __B) +{ + return (__m512i) + __builtin_ia32_vpdpbssds_v16si_maskz ((__v16si) __W, + (__v16si) __A, + (__v16si) __B, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_dpbsud_epi32 (__m512i __W, __m512i __A, __m512i __B) +{ + return (__m512i) + __builtin_ia32_vpdpbsud512 ((__v16si) __W, (__v16si) __A, (__v16si) __B); +} + +extern __inline __m512i +__attribute__((__gnu_inline__, 
__always_inline__, __artificial__)) +_mm512_mask_dpbsud_epi32 (__m512i __W, __mmask16 __U, + __m512i __A, __m512i __B) +{ + return (__m512i) + __builtin_ia32_vpdpbsud_v16si_mask ((__v16si) __W, + (__v16si) __A, + (__v16si) __B, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_dpbsud_epi32 (__mmask16 __U, __m512i __W, + __m512i __A, __m512i __B) +{ + return (__m512i) + __builtin_ia32_vpdpbsud_v16si_maskz ((__v16si) __W, + (__v16si) __A, + (__v16si) __B, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_dpbsuds_epi32 (__m512i __W, __m512i __A, __m512i __B) +{ + return (__m512i) + __builtin_ia32_vpdpbsuds512 ((__v16si) __W, (__v16si) __A, (__v16si) __B); +} + +extern __inline __m512i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_dpbsuds_epi32 (__m512i __W, __mmask16 __U, + __m512i __A, __m512i __B) +{ + return (__m512i) + __builtin_ia32_vpdpbsuds_v16si_mask ((__v16si) __W, + (__v16si) __A, + (__v16si) __B, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_dpbsuds_epi32 (__mmask16 __U, __m512i __W, + __m512i __A, __m512i __B) +{ + return (__m512i) + __builtin_ia32_vpdpbsuds_v16si_maskz ((__v16si) __W, + (__v16si) __A, + (__v16si) __B, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_dpbuud_epi32 (__m512i __W, __m512i __A, __m512i __B) +{ + return (__m512i) + __builtin_ia32_vpdpbuud512 ((__v16si) __W, (__v16si) __A, (__v16si) __B); +} + +extern __inline __m512i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_dpbuud_epi32 (__m512i __W, __mmask16 __U, + __m512i __A, __m512i __B) +{ + return (__m512i) + __builtin_ia32_vpdpbuud_v16si_mask ((__v16si) __W, + (__v16si) __A, + (__v16si) __B, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_dpbuud_epi32 (__mmask16 __U, __m512i __W, + __m512i __A, __m512i __B) +{ + return (__m512i) + __builtin_ia32_vpdpbuud_v16si_maskz ((__v16si) __W, + (__v16si) __A, + (__v16si) __B, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_dpbuuds_epi32 (__m512i __W, __m512i __A, __m512i __B) +{ + return (__m512i) + __builtin_ia32_vpdpbuuds512 ((__v16si) __W, (__v16si) __A, (__v16si) __B); +} + +extern __inline __m512i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_dpbuuds_epi32 (__m512i __W, __mmask16 __U, + __m512i __A, __m512i __B) +{ + return (__m512i) + __builtin_ia32_vpdpbuuds_v16si_mask ((__v16si) __W, + (__v16si) __A, + (__v16si) __B, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_dpbuuds_epi32 (__mmask16 __U, __m512i __W, + __m512i __A, __m512i __B) +{ + return (__m512i) + __builtin_ia32_vpdpbuuds_v16si_maskz ((__v16si) __W, + (__v16si) __A, + (__v16si) __B, + (__mmask16) __U); +} + +#ifdef __DISABLE_AVX10_2_512__ +#undef __DISABLE_AVX10_2_512__ +#pragma GCC pop_options +#endif /* __DISABLE_AVX10_2_512__ */ + +#endif /* __AVX10_2_512MEDIAINTRIN_H_INCLUDED */ diff --git a/gcc/config/i386/avx10_2mediaintrin.h b/gcc/config/i386/avx10_2mediaintrin.h new file mode 100644 index 00000000000..e668af62e36 --- /dev/null +++ 
b/gcc/config/i386/avx10_2mediaintrin.h @@ -0,0 +1,367 @@ +/* Copyright (C) 2024 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + +#if !defined _IMMINTRIN_H_INCLUDED +#error "Never use directly; include instead." +#endif + +#ifndef _AVX10_2MEDIAINTRIN_H_INCLUDED +#define _AVX10_2MEDIAINTRIN_H_INCLUDED + +#if !defined(__AVX10_2_256__) +#pragma GCC push_options +#pragma GCC target("avx10.2-256") +#define __DISABLE_AVX10_2_256__ +#endif /* __AVX10_2_256__ */ + +#define _mm_dpbssd_epi32(W, A, B) \ + (__m128i) __builtin_ia32_vpdpbssd128 ((__v4si) (W), (__v4si) (A), (__v4si) (B)) + +#define _mm_dpbssds_epi32(W, A, B) \ + (__m128i) __builtin_ia32_vpdpbssds128 ((__v4si) (W), (__v4si) (A), (__v4si) (B)) + +#define _mm_dpbsud_epi32(W, A, B) \ + (__m128i) __builtin_ia32_vpdpbsud128 ((__v4si) (W), (__v4si) (A), (__v4si) (B)) + +#define _mm_dpbsuds_epi32(W, A, B) \ + (__m128i) __builtin_ia32_vpdpbsuds128 ((__v4si) (W), (__v4si) (A), (__v4si) (B)) + +#define _mm_dpbuud_epi32(W, A, B) \ + (__m128i) __builtin_ia32_vpdpbuud128 ((__v4si) (W), (__v4si) (A), (__v4si) (B)) + +#define _mm_dpbuuds_epi32(W, A, B) \ + (__m128i) __builtin_ia32_vpdpbuuds128 ((__v4si) (W), (__v4si) (A), (__v4si) (B)) + +#define _mm256_dpbssd_epi32(W, A, B) \ + (__m256i) __builtin_ia32_vpdpbssd256 ((__v8si) (W), (__v8si) (A), (__v8si) (B)) + +#define _mm256_dpbssds_epi32(W, A, B) \ + (__m256i) __builtin_ia32_vpdpbssds256 ((__v8si) (W), (__v8si) (A), (__v8si) (B)) + +#define _mm256_dpbsud_epi32(W, A, B) \ + (__m256i) __builtin_ia32_vpdpbsud256 ((__v8si) (W), (__v8si) (A), (__v8si) (B)) + +#define _mm256_dpbsuds_epi32(W, A, B) \ + (__m256i) __builtin_ia32_vpdpbsuds256 ((__v8si) (W), (__v8si) (A), (__v8si) (B)) + +#define _mm256_dpbuud_epi32(W, A, B) \ + (__m256i) __builtin_ia32_vpdpbuud256 ((__v8si) (W), (__v8si) (A), (__v8si) (B)) + +#define _mm256_dpbuuds_epi32(W, A, B) \ + (__m256i) __builtin_ia32_vpdpbuuds256 ((__v8si) (W), (__v8si) (A), (__v8si) (B)) + +extern __inline __m128i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_dpbssd_epi32 (__m128i __W, __mmask8 __U, + __m128i __A, __m128i __B) +{ + return (__m128i) + __builtin_ia32_vpdpbssd_v4si_mask ((__v4si) __W, + (__v4si) __A, + (__v4si) __B, + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_dpbssd_epi32 (__mmask8 __U, __m128i __W, + __m128i __A, __m128i __B) +{ + return (__m128i) + __builtin_ia32_vpdpbssd_v4si_maskz ((__v4si) __W, + (__v4si) __A, + (__v4si) __B, + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_dpbssds_epi32 
(__m128i __W, __mmask8 __U, + __m128i __A, __m128i __B) +{ + return (__m128i) + __builtin_ia32_vpdpbssds_v4si_mask ((__v4si) __W, + (__v4si) __A, + (__v4si) __B, + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_dpbssds_epi32 (__mmask8 __U, __m128i __W, + __m128i __A, __m128i __B) +{ + return (__m128i) + __builtin_ia32_vpdpbssds_v4si_maskz ((__v4si) __W, + (__v4si) __A, + (__v4si) __B, + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_dpbsud_epi32 (__m128i __W, __mmask8 __U, + __m128i __A, __m128i __B) +{ + return (__m128i) + __builtin_ia32_vpdpbsud_v4si_mask ((__v4si) __W, + (__v4si) __A, + (__v4si) __B, + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_dpbsud_epi32 (__mmask8 __U, __m128i __W, + __m128i __A, __m128i __B) +{ + return (__m128i) + __builtin_ia32_vpdpbsud_v4si_maskz ((__v4si) __W, + (__v4si) __A, + (__v4si) __B, + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_dpbsuds_epi32 (__m128i __W, __mmask8 __U, + __m128i __A, __m128i __B) +{ + return (__m128i) + __builtin_ia32_vpdpbsuds_v4si_mask ((__v4si) __W, + (__v4si) __A, + (__v4si) __B, + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_dpbsuds_epi32 (__mmask8 __U, __m128i __W, + __m128i __A, __m128i __B) +{ + return (__m128i) + __builtin_ia32_vpdpbsuds_v4si_maskz ((__v4si) __W, + (__v4si) __A, + (__v4si) __B, + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_dpbuud_epi32 (__m128i __W, __mmask8 __U, + __m128i __A, __m128i __B) +{ + return (__m128i) + __builtin_ia32_vpdpbuud_v4si_mask ((__v4si) __W, + (__v4si) __A, + (__v4si) __B, + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_dpbuud_epi32 (__mmask8 __U, __m128i __W, + __m128i __A, __m128i __B) +{ + return (__m128i) + __builtin_ia32_vpdpbuud_v4si_maskz ((__v4si) __W, + (__v4si) __A, + (__v4si) __B, + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_dpbuuds_epi32 (__m128i __W, __mmask8 __U, + __m128i __A, __m128i __B) +{ + return (__m128i) + __builtin_ia32_vpdpbuuds_v4si_mask ((__v4si) __W, + (__v4si) __A, + (__v4si) __B, + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_dpbuuds_epi32 (__mmask8 __U, __m128i __W, + __m128i __A, __m128i __B) +{ + return (__m128i) + __builtin_ia32_vpdpbuuds_v4si_maskz ((__v4si) __W, + (__v4si) __A, + (__v4si) __B, + (__mmask8) __U); +} + +extern __inline __m256i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_dpbssd_epi32 (__m256i __W, __mmask8 __U, + __m256i __A, __m256i __B) +{ + return (__m256i) + __builtin_ia32_vpdpbssd_v8si_mask ((__v8si) __W, + (__v8si) __A, + (__v8si) __B, + (__mmask8) __U); +} + +extern __inline __m256i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_dpbssd_epi32 (__mmask8 __U, __m256i __W, + __m256i __A, __m256i __B) +{ + return (__m256i) + __builtin_ia32_vpdpbssd_v8si_maskz ((__v8si) __W, + (__v8si) __A, + (__v8si) __B, + (__mmask8) __U); +} + +extern __inline __m256i 
+__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_dpbssds_epi32 (__m256i __W, __mmask8 __U, + __m256i __A, __m256i __B) +{ + return (__m256i) + __builtin_ia32_vpdpbssds_v8si_mask ((__v8si) __W, + (__v8si) __A, + (__v8si) __B, + (__mmask8) __U); +} + +extern __inline __m256i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_dpbssds_epi32 (__mmask8 __U, __m256i __W, + __m256i __A, __m256i __B) +{ + return (__m256i) + __builtin_ia32_vpdpbssds_v8si_maskz ((__v8si) __W, + (__v8si) __A, + (__v8si) __B, + (__mmask8) __U); +} + +extern __inline __m256i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_dpbsud_epi32 (__m256i __W, __mmask8 __U, + __m256i __A, __m256i __B) +{ + return (__m256i) + __builtin_ia32_vpdpbsud_v8si_mask ((__v8si) __W, + (__v8si) __A, + (__v8si) __B, + (__mmask8) __U); +} + +extern __inline __m256i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_dpbsud_epi32 (__mmask8 __U, __m256i __W, + __m256i __A, __m256i __B) +{ + return (__m256i) + __builtin_ia32_vpdpbsud_v8si_maskz ((__v8si) __W, + (__v8si) __A, + (__v8si) __B, + (__mmask8) __U); +} + +extern __inline __m256i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_dpbsuds_epi32 (__m256i __W, __mmask8 __U, + __m256i __A, __m256i __B) +{ + return (__m256i) + __builtin_ia32_vpdpbsuds_v8si_mask ((__v8si) __W, + (__v8si) __A, + (__v8si) __B, + (__mmask8) __U); +} + +extern __inline __m256i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_dpbsuds_epi32 (__mmask8 __U, __m256i __W, + __m256i __A, __m256i __B) +{ + return (__m256i) + __builtin_ia32_vpdpbsuds_v8si_maskz ((__v8si) __W, + (__v8si) __A, + (__v8si) __B, + (__mmask8) __U); +} + +extern __inline __m256i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_dpbuud_epi32 (__m256i __W, __mmask8 __U, + __m256i __A, __m256i __B) +{ + return (__m256i) + __builtin_ia32_vpdpbuud_v8si_mask ((__v8si) __W, + (__v8si) __A, + (__v8si) __B, + (__mmask8) __U); +} + +extern __inline __m256i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_dpbuud_epi32 (__mmask8 __U, __m256i __W, + __m256i __A, __m256i __B) +{ + return (__m256i) + __builtin_ia32_vpdpbuud_v8si_maskz ((__v8si) __W, + (__v8si) __A, + (__v8si) __B, + (__mmask8) __U); +} + +extern __inline __m256i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_dpbuuds_epi32 (__m256i __W, __mmask8 __U, + __m256i __A, __m256i __B) +{ + return (__m256i) + __builtin_ia32_vpdpbuuds_v8si_mask ((__v8si) __W, + (__v8si) __A, + (__v8si) __B, + (__mmask8) __U); +} + +extern __inline __m256i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_dpbuuds_epi32 (__mmask8 __U, __m256i __W, + __m256i __A, __m256i __B) +{ + return (__m256i) + __builtin_ia32_vpdpbuuds_v8si_maskz ((__v8si) __W, + (__v8si) __A, + (__v8si) __B, + (__mmask8) __U); +} + + +#ifdef __DISABLE_AVX10_2_256__ +#undef __DISABLE_AVX10_2_256__ +#pragma GCC pop_options +#endif /* __DISABLE_AVX10_2_256__ */ + +#endif /* __AVX10_2MEDIAINTRIN_H_INCLUDED */ diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def index 416944f8b5b..5bd9aabdc52 100644 --- a/gcc/config/i386/i386-builtin.def +++ b/gcc/config/i386/i386-builtin.def @@ -2748,18 +2748,18 @@ BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpdpws BDESC (OPTION_MASK_ISA_AVX512VNNI | 
OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpdpwssds_v4si_maskz, "__builtin_ia32_vpdpwssds_v4si_maskz", IX86_BUILTIN_VPDPWSSDSV4SI_MASKZ, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI) /* AVXVNNIINT8 */ -BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT8, CODE_FOR_vpdpbssd_v8si, "__builtin_ia32_vpdpbssd256", IX86_BUILTIN_VPDPBSSDV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI) -BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT8, CODE_FOR_vpdpbssds_v8si, "__builtin_ia32_vpdpbssds256", IX86_BUILTIN_VPDPBSSDSV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI) -BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT8, CODE_FOR_vpdpbsud_v8si, "__builtin_ia32_vpdpbsud256", IX86_BUILTIN_VPDPBSUDV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI) -BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT8, CODE_FOR_vpdpbsuds_v8si, "__builtin_ia32_vpdpbsuds256", IX86_BUILTIN_VPDPBSUDSV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI) -BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT8, CODE_FOR_vpdpbuud_v8si, "__builtin_ia32_vpdpbuud256", IX86_BUILTIN_VPDPBUUDV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI) -BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT8, CODE_FOR_vpdpbuuds_v8si, "__builtin_ia32_vpdpbuuds256", IX86_BUILTIN_VPDPBUUDSV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI) -BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT8, CODE_FOR_vpdpbssd_v4si, "__builtin_ia32_vpdpbssd128", IX86_BUILTIN_VPDPBSSDV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI) -BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT8, CODE_FOR_vpdpbssds_v4si, "__builtin_ia32_vpdpbssds128", IX86_BUILTIN_VPDPBSSDSV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI) -BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT8, CODE_FOR_vpdpbsud_v4si, "__builtin_ia32_vpdpbsud128", IX86_BUILTIN_VPDPBSUDV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI) -BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT8, CODE_FOR_vpdpbsuds_v4si, "__builtin_ia32_vpdpbsuds128", IX86_BUILTIN_VPDPBSUDSV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI) -BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT8, CODE_FOR_vpdpbuud_v4si, "__builtin_ia32_vpdpbuud128", IX86_BUILTIN_VPDPBUUDV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI) -BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT8, CODE_FOR_vpdpbuuds_v4si, "__builtin_ia32_vpdpbuuds128", IX86_BUILTIN_VPDPBUUDSV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI) +BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT8 | OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpbssd_v8si, "__builtin_ia32_vpdpbssd256", IX86_BUILTIN_VPDPBSSDV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI) +BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT8 | OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpbssds_v8si, "__builtin_ia32_vpdpbssds256", IX86_BUILTIN_VPDPBSSDSV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI) +BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT8 | OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpbsud_v8si, "__builtin_ia32_vpdpbsud256", IX86_BUILTIN_VPDPBSUDV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI) +BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT8 | OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpbsuds_v8si, "__builtin_ia32_vpdpbsuds256", IX86_BUILTIN_VPDPBSUDSV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI) +BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT8 | OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpbuud_v8si, "__builtin_ia32_vpdpbuud256", IX86_BUILTIN_VPDPBUUDV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI) +BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT8 | OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpbuuds_v8si, "__builtin_ia32_vpdpbuuds256", IX86_BUILTIN_VPDPBUUDSV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI) +BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT8 | OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpbssd_v4si, "__builtin_ia32_vpdpbssd128", IX86_BUILTIN_VPDPBSSDV4SI, UNKNOWN, (int) 
V4SI_FTYPE_V4SI_V4SI_V4SI) +BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT8 | OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpbssds_v4si, "__builtin_ia32_vpdpbssds128", IX86_BUILTIN_VPDPBSSDSV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI) +BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT8 | OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpbsud_v4si, "__builtin_ia32_vpdpbsud128", IX86_BUILTIN_VPDPBSUDV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI) +BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT8 | OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpbsuds_v4si, "__builtin_ia32_vpdpbsuds128", IX86_BUILTIN_VPDPBSUDSV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI) +BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT8 | OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpbuud_v4si, "__builtin_ia32_vpdpbuud128", IX86_BUILTIN_VPDPBUUDV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI) +BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT8 | OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpbuuds_v4si, "__builtin_ia32_vpdpbuuds128", IX86_BUILTIN_VPDPBUUDSV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI) /* AVXVNNIINT16 */ BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT16, CODE_FOR_vpdpwusd_v8si, "__builtin_ia32_vpdpwusd256", IX86_BUILTIN_VPDPWUSDV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI) @@ -3020,6 +3020,50 @@ BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp1 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fmulc_v16hf, "__builtin_ia32_vfmulcph256", IX86_BUILTIN_VFMULCPH_V16HF, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fmulc_v16hf_mask, "__builtin_ia32_vfmulcph256_mask", IX86_BUILTIN_VFMULCPH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UQI) +/* AVX10.2. */ +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpbssd_v16si, "__builtin_ia32_vpdpbssd512", IX86_BUILTIN_VPDPBSSDV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpbssds_v16si, "__builtin_ia32_vpdpbssds512", IX86_BUILTIN_VPDPBSSDSV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpbsud_v16si, "__builtin_ia32_vpdpbsud512", IX86_BUILTIN_VPDPBSUDV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpbsuds_v16si, "__builtin_ia32_vpdpbsuds512", IX86_BUILTIN_VPDPBSUDSV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpbuud_v16si, "__builtin_ia32_vpdpbuud512", IX86_BUILTIN_VPDPBUUDV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpbuuds_v16si, "__builtin_ia32_vpdpbuuds512", IX86_BUILTIN_VPDPBUUDSV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpbssd_v16si_mask, "__builtin_ia32_vpdpbssd_v16si_mask", IX86_BUILTIN_VPDPBSSDV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpbssd_v16si_maskz, "__builtin_ia32_vpdpbssd_v16si_maskz", IX86_BUILTIN_VPDPBSSDV16SI_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpbssds_v16si_mask, "__builtin_ia32_vpdpbssds_v16si_mask", IX86_BUILTIN_VPDPBSSDSV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpbssds_v16si_maskz, "__builtin_ia32_vpdpbssds_v16si_maskz", IX86_BUILTIN_VPDPBSSDSV16SI_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) +BDESC (0, 
OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpbsud_v16si_mask, "__builtin_ia32_vpdpbsud_v16si_mask", IX86_BUILTIN_VPDPBSUDV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpbsud_v16si_maskz, "__builtin_ia32_vpdpbsud_v16si_maskz", IX86_BUILTIN_VPDPBSUDV16SI_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpbsuds_v16si_mask, "__builtin_ia32_vpdpbsuds_v16si_mask", IX86_BUILTIN_VPDPBSUDSV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpbsuds_v16si_maskz, "__builtin_ia32_vpdpbsuds_v16si_maskz", IX86_BUILTIN_VPDPBSUDSV16SI_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpbuud_v16si_mask, "__builtin_ia32_vpdpbuud_v16si_mask", IX86_BUILTIN_VPDPBUUDV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpbuud_v16si_maskz, "__builtin_ia32_vpdpbuud_v16si_maskz", IX86_BUILTIN_VPDPBUUDV16SI_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpbuuds_v16si_mask, "__builtin_ia32_vpdpbuuds_v16si_mask", IX86_BUILTIN_VPDPBUUDSV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpbuuds_v16si_maskz, "__builtin_ia32_vpdpbuuds_v16si_maskz", IX86_BUILTIN_VPDPBUUDSV16SI_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpbssd_v8si_mask, "__builtin_ia32_vpdpbssd_v8si_mask", IX86_BUILTIN_VPDPBSSDV8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpbssd_v8si_maskz, "__builtin_ia32_vpdpbssd_v8si_maskz", IX86_BUILTIN_VPDPBSSDV8SI_MASKZ, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpbssds_v8si_mask, "__builtin_ia32_vpdpbssds_v8si_mask", IX86_BUILTIN_VPDPBSSDSV8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpbssds_v8si_maskz, "__builtin_ia32_vpdpbssds_v8si_maskz", IX86_BUILTIN_VPDPBSSDSV8SI_MASKZ, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpbsud_v8si_mask, "__builtin_ia32_vpdpbsud_v8si_mask", IX86_BUILTIN_VPDPBSUDV8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpbsud_v8si_maskz, "__builtin_ia32_vpdpbsud_v8si_maskz", IX86_BUILTIN_VPDPBSUDV8SI_MASKZ, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpbsuds_v8si_mask, "__builtin_ia32_vpdpbsuds_v8si_mask", IX86_BUILTIN_VPDPBSUDSV8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpbsuds_v8si_maskz, "__builtin_ia32_vpdpbsuds_v8si_maskz", IX86_BUILTIN_VPDPBSUDSV8SI_MASKZ, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpbuud_v8si_mask, "__builtin_ia32_vpdpbuud_v8si_mask", IX86_BUILTIN_VPDPBUUDV8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpbuud_v8si_maskz, "__builtin_ia32_vpdpbuud_v8si_maskz", IX86_BUILTIN_VPDPBUUDV8SI_MASKZ, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpbuuds_v8si_mask, "__builtin_ia32_vpdpbuuds_v8si_mask", 
IX86_BUILTIN_VPDPBUUDSV8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpbuuds_v8si_maskz, "__builtin_ia32_vpdpbuuds_v8si_maskz", IX86_BUILTIN_VPDPBUUDSV8SI_MASKZ, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpbssd_v4si_mask, "__builtin_ia32_vpdpbssd_v4si_mask", IX86_BUILTIN_VPDPBSSDV4SI_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpbssd_v4si_maskz, "__builtin_ia32_vpdpbssd_v4si_maskz", IX86_BUILTIN_VPDPBSSDV4SI_MASKZ, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpbssds_v4si_mask, "__builtin_ia32_vpdpbssds_v4si_mask", IX86_BUILTIN_VPDPBSSDSV4SI_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpbssds_v4si_maskz, "__builtin_ia32_vpdpbssds_v4si_maskz", IX86_BUILTIN_VPDPBSSDSV4SI_MASKZ, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpbsud_v4si_mask, "__builtin_ia32_vpdpbsud_v4si_mask", IX86_BUILTIN_VPDPBSUDV4SI_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpbsud_v4si_maskz, "__builtin_ia32_vpdpbsud_v4si_maskz", IX86_BUILTIN_VPDPBSUDV4SI_MASKZ, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpbsuds_v4si_mask, "__builtin_ia32_vpdpbsuds_v4si_mask", IX86_BUILTIN_VPDPBSUDSV4SI_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpbsuds_v4si_maskz, "__builtin_ia32_vpdpbsuds_v4si_maskz", IX86_BUILTIN_VPDPBSUDSV4SI_MASKZ, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpbuud_v4si_mask, "__builtin_ia32_vpdpbuud_v4si_mask", IX86_BUILTIN_VPDPBUUDV4SI_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpbuud_v4si_maskz, "__builtin_ia32_vpdpbuud_v4si_maskz", IX86_BUILTIN_VPDPBUUDV4SI_MASKZ, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpbuuds_v4si_mask, "__builtin_ia32_vpdpbuuds_v4si_mask", IX86_BUILTIN_VPDPBUUDSV4SI_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpbuuds_v4si_maskz, "__builtin_ia32_vpdpbuuds_v4si_maskz", IX86_BUILTIN_VPDPBUUDSV4SI_MASKZ, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI) + /* Builtins with rounding support. */ BDESC_END (ARGS, ROUND_ARGS) diff --git a/gcc/config/i386/i386-builtins.cc b/gcc/config/i386/i386-builtins.cc index a76451f5949..130ba853125 100644 --- a/gcc/config/i386/i386-builtins.cc +++ b/gcc/config/i386/i386-builtins.cc @@ -280,15 +280,17 @@ def_builtin (HOST_WIDE_INT mask, HOST_WIDE_INT mask2, if (((mask2 == 0 || (mask2 & ix86_isa_flags2) != 0) && (mask == 0 || (mask & ix86_isa_flags) != 0)) || ((mask & OPTION_MASK_ISA_MMX) != 0 && TARGET_MMX_WITH_SSE) - /* "Unified" builtin used by either AVXVNNI/AVXIFMA/AES intrinsics - or AVX512VNNIVL/AVX512IFMAVL/VAESVL non-mask intrinsics should be - defined whenever avxvnni/avxifma/aes or avx512vnni/avx512ifma/vaes - && avx512vl exist. */ + /* "Unified" builtin used by either AVXVNNI/AVXIFMA/AES/AVXVNNIINT8 + intrinsics or AVX512VNNIVL/AVX512IFMAVL/VAESVL/AVX10.2 non-mask + intrinsics should be defined whenever avxvnni/avxifma/aes/ + avxvnniint8 or avx512vnni && avx512vl/avx512ifma && avx512vl/vaes + && avx512vl/avx10.2 exist. 
*/ || (mask2 == OPTION_MASK_ISA2_AVXVNNI) || (mask2 == OPTION_MASK_ISA2_AVXIFMA) || (mask2 == (OPTION_MASK_ISA2_AVXNECONVERT | OPTION_MASK_ISA2_AVX512BF16)) || ((mask2 & OPTION_MASK_ISA2_VAES) != 0) + || ((mask2 & OPTION_MASK_ISA2_AVXVNNIINT8) != 0) || (lang_hooks.builtin_function == lang_hooks.builtin_function_ext_scope)) { diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc index bc8f16bfb65..200b768f5d9 100644 --- a/gcc/config/i386/i386-expand.cc +++ b/gcc/config/i386/i386-expand.cc @@ -13298,6 +13298,7 @@ ix86_check_builtin_isa_match (unsigned int fcode, (OPTION_MASK_ISA_AVX512VL | OPTION_MASK_ISA2_AVX512BF16) or OPTION_MASK_ISA2_AVXNECONVERT OPTION_MASK_ISA_AES or (OPTION_MASK_ISA_AVX512VL | OPTION_MASK_ISA2_VAES) + OPTION_MASK_ISA2_AVX10_2 or OPTION_MASK_ISA2_AVXVNNIINT8 where for each such pair it is sufficient if either of the ISAs is enabled, plus if it is ored with other options also those others. OPTION_MASK_ISA_MMX in bisa is satisfied also if TARGET_MMX_WITH_SSE. */ @@ -13323,6 +13324,8 @@ ix86_check_builtin_isa_match (unsigned int fcode, OPTION_MASK_ISA2_AVXNECONVERT); SHARE_BUILTIN (OPTION_MASK_ISA_AES, 0, OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_VAES); + SHARE_BUILTIN (0, OPTION_MASK_ISA2_AVXVNNIINT8, 0, + OPTION_MASK_ISA2_AVX10_2_256); isa = tmp_isa; isa2 = tmp_isa2; diff --git a/gcc/config/i386/immintrin.h b/gcc/config/i386/immintrin.h index 80357d563ee..ce8437d00c2 100644 --- a/gcc/config/i386/immintrin.h +++ b/gcc/config/i386/immintrin.h @@ -140,4 +140,8 @@ #include +#include + +#include + #endif /* _IMMINTRIN_H_INCLUDED */ diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 8f34c9300d0..41d448f57cb 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -578,6 +578,9 @@ (define_mode_iterator VI4_AVX512VL [(V16SI "TARGET_EVEX512") (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")]) +(define_mode_iterator VI4_AVX10_2 + [(V16SI "TARGET_AVX10_2_512") V8SI V4SI]) + (define_mode_iterator VI48_AVX512F_AVX512VL [V4SI V8SI (V16SI "TARGET_AVX512F && TARGET_EVEX512") (V2DI "TARGET_AVX512VL") (V4DI "TARGET_AVX512VL") @@ -31241,16 +31244,67 @@ }) (define_insn "vpdp_" - [(set (match_operand:VI4_AVX 0 "register_operand" "=x") + [(set (match_operand:VI4_AVX 0 "register_operand" "=v") (unspec:VI4_AVX [(match_operand:VI4_AVX 1 "register_operand" "0") - (match_operand:VI4_AVX 2 "register_operand" "x") - (match_operand:VI4_AVX 3 "nonimmediate_operand" "xjm")] + (match_operand:VI4_AVX 2 "register_operand" "v") + (match_operand:VI4_AVX 3 "nonimmediate_operand" "vm")] VPDOTPROD))] - "TARGET_AVXVNNIINT8" + "TARGET_AVXVNNIINT8 || TARGET_AVX10_2_256" "vpdp\t{%3, %2, %0|%0, %2, %3}" - [(set_attr "prefix" "vex") - (set_attr "addr" "gpr16")]) + [(set_attr "prefix" "maybe_evex")]) + +(define_insn "vpdp_v16si" + [(set (match_operand:V16SI 0 "register_operand" "=v") + (unspec:V16SI + [(match_operand:V16SI 1 "register_operand" "0") + (match_operand:V16SI 2 "register_operand" "v") + (match_operand:V16SI 3 "nonimmediate_operand" "vm")] + VPDOTPROD))] + "TARGET_AVX10_2_512" + "vpdp\t{%3, %2, %0|%0, %2, %3}" + [(set_attr "prefix" "evex")]) + +(define_insn "vpdp__mask" + [(set (match_operand:VI4_AVX10_2 0 "register_operand" "=v") + (vec_merge:VI4_AVX10_2 + (unspec:VI4_AVX10_2 + [(match_operand:VI4_AVX10_2 1 "register_operand" "0") + (match_operand:VI4_AVX10_2 2 "register_operand" "v") + (match_operand:VI4_AVX10_2 3 "nonimmediate_operand" "vm")] + VPDOTPROD) + (match_dup 1) + (match_operand: 4 "register_operand" "Yk")))] + "TARGET_AVX10_2_256" + 
"vpdp\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, %3}" + [(set_attr "prefix" "evex")]) + +(define_expand "vpdp__maskz" + [(set (match_operand:VI4_AVX10_2 0 "register_operand") + (vec_merge:VI4_AVX10_2 + (unspec:VI4_AVX10_2 + [(match_operand:VI4_AVX10_2 1 "register_operand") + (match_operand:VI4_AVX10_2 2 "register_operand") + (match_operand:VI4_AVX10_2 3 "nonimmediate_operand")] + VPDOTPROD) + (match_dup 5) + (match_operand: 4 "register_operand")))] + "TARGET_AVX10_2_256" + "operands[5] = CONST0_RTX (mode);") + +(define_insn "*vpdp__maskz" + [(set (match_operand:VI4_AVX10_2 0 "register_operand" "=v") + (vec_merge:VI4_AVX10_2 + (unspec:VI4_AVX10_2 + [(match_operand:VI4_AVX10_2 1 "register_operand" "0") + (match_operand:VI4_AVX10_2 2 "register_operand" "v") + (match_operand:VI4_AVX10_2 3 "nonimmediate_operand" "vm")] + VPDOTPROD) + (match_operand:VI4_AVX10_2 5 "const0_operand" "C") + (match_operand: 4 "register_operand" "Yk")))] + "TARGET_AVX10_2_256" + "vpdp\t{%3, %2, %0%{%4%}%N5|%0%{%4%}%N5, %2, %3}" + [(set_attr "prefix" "evex")]) (define_insn "vbcstnebf162ps_" [(set (match_operand:VF1_128_256 0 "register_operand" "=x") diff --git a/gcc/testsuite/gcc.target/i386/avx10-check.h b/gcc/testsuite/gcc.target/i386/avx10-check.h new file mode 100644 index 00000000000..76c32d7acaa --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10-check.h @@ -0,0 +1,61 @@ +#include +#include "cpuid.h" +#include "m512-check.h" +#include "avx10-os-support.h" + +#ifndef DO_TEST +#define DO_TEST do_test +#if defined(AVX10_512BIT) +static void test_512 (void); +#else +static void test_256 (void); +static void test_128 (void); +#endif + +__attribute__ ((noinline)) +static void +do_test (void) +{ +#if defined(AVX10_512BIT) + test_512 (); +#else + test_256 (); + test_128 (); +#endif +} +#endif + +static int +check_osxsave (void) +{ + unsigned int eax, ebx, ecx, edx; + + __cpuid (1, eax, ebx, ecx, edx); + return (ecx & bit_OSXSAVE) != 0; +} + +int +main () +{ + /* Run AVX10 test only if host has ISA support. */ + if (__builtin_cpu_supports ("avx10.1") +#ifdef AVX10_2 + && __builtin_cpu_supports ("avx10.2") +#endif +#ifdef AVX10_2_512 + && __builtin_cpu_supports ("avx10.2-512") +#endif + && avx10_os_support ()) + { + DO_TEST (); +#ifdef DEBUG + printf ("PASSED\n"); +#endif + return 0; + } + +#ifdef DEBUG + printf ("SKIPPED\n"); +#endif + return 0; +} diff --git a/gcc/testsuite/gcc.target/i386/avx10-helper.h b/gcc/testsuite/gcc.target/i386/avx10-helper.h new file mode 100644 index 00000000000..385c7446979 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10-helper.h @@ -0,0 +1,23 @@ +#ifndef AVX10_HELPER_INCLUDED +#define AVX10_HELPER_INCLUDED + +#define AVX10 +#define AVX512FP16 + +#include "avx512f-helper.h" +#include "avx512f-mask-type.h" + +#endif /* AVX10_HELPER_INCLUDED */ + +/* Intrinsic being tested. It has different deffinitions, + depending on AVX512F_LEN, so it's outside include guards + and in undefed away to silence warnings. */ +#if defined INTRINSIC +#undef INTRINSIC +#endif + +#if AVX512F_LEN != 128 +#define INTRINSIC(NAME) EVAL(_mm, AVX512F_LEN, NAME) +#else +#define INTRINSIC(NAME) _mm ## NAME +#endif diff --git a/gcc/testsuite/gcc.target/i386/avx10-os-support.h b/gcc/testsuite/gcc.target/i386/avx10-os-support.h new file mode 100644 index 00000000000..ea6899bbcab --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10-os-support.h @@ -0,0 +1,23 @@ +/* Check if the OS supports executing AVX10 instructions. 
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-media-1.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-media-1.c
new file mode 100644
index 00000000000..d4145c41a99
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-media-1.c
@@ -0,0 +1,52 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx10.2-512 -O2" } */
+/* { dg-final { scan-assembler-times "vpdpbssd\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpbssd\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\\n\\r]*%zmm\[0-9\]+\[^\\n\\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpbssd\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\\n\\r]*%zmm\[0-9\]+\[^\\n\\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpbssds\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpbssds\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\\n\\r]*%zmm\[0-9\]+\[^\\n\\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpbssds\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\\n\\r]*%zmm\[0-9\]+\[^\\n\\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpbsud\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpbsud\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\\n\\r]*%zmm\[0-9\]+\[^\\n\\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpbsud\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\\n\\r]*%zmm\[0-9\]+\[^\\n\\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpbsuds\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpbsuds\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\\n\\r]*%zmm\[0-9\]+\[^\\n\\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpbsuds\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\\n\\r]*%zmm\[0-9\]+\[^\\n\\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpbuud\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpbuud\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\\n\\r]*%zmm\[0-9\]+\[^\\n\\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpbuud\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\\n\\r]*%zmm\[0-9\]+\[^\\n\\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpbuuds\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpbuuds\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\\n\\r]*%zmm\[0-9\]+\[^\\n\\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpbuuds\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\\n\\r]*%zmm\[0-9\]+\[^\\n\\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m512i x,y,z,z1;
+volatile __mmask16 m16;
+
+void avx10_2_512_test (void)
+{
+  x = _mm512_dpbssd_epi32 (x, y, z);
+  x = _mm512_mask_dpbssd_epi32 (x, m16, y, z);
+  x = _mm512_maskz_dpbssd_epi32 (m16, x, y, z);
+
+  x = _mm512_dpbssds_epi32 (x, y, z);
+  x = _mm512_mask_dpbssds_epi32 (x, m16, y, z);
+  x = _mm512_maskz_dpbssds_epi32 (m16, x, y, z);
+
+  x = _mm512_dpbsud_epi32 (x, y, z);
+  x = _mm512_mask_dpbsud_epi32 (x, m16, y, z);
+  x = _mm512_maskz_dpbsud_epi32 (m16, x, y, z);
+
+  x = _mm512_dpbsuds_epi32 (x, y, z);
+  x = _mm512_mask_dpbsuds_epi32 (x, m16, y, z);
+  x = _mm512_maskz_dpbsuds_epi32 (m16, x, y, z);
+
+  x = _mm512_dpbuud_epi32 (x, y, z);
+  x = _mm512_mask_dpbuud_epi32 (x, m16, y, z);
+  x = _mm512_maskz_dpbuud_epi32 (m16, x, y, z);
+
+  x = _mm512_dpbuuds_epi32 (x, y, z);
+  x = _mm512_mask_dpbuuds_epi32 (x, m16, y, z);
+  x = _mm512_maskz_dpbuuds_epi32 (m16, x, y, z);
+}
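Reader's note, not part of the patch: the runtime test just below
(avx10_2-512-vpdpbssd-2.c) fills src1 with sign * (10 + 3*i*i), src2 with
sign * 10*i*i, and seeds the accumulator with 0x7FFFFFFF.  For lane 0 the
four byte pairs are (-10,0), (13,10), (-22,-40) and (37,90), so the lane dot
product is 0 + 130 + 880 + 3330 = 4340 and the non-saturating vpdpbssd
simply wraps past INT_MAX.  A standalone scalar check of that one lane:

#include <stdio.h>
#include <stdint.h>

int
main (void)
{
  int8_t s1[4] = { -10, 13, -22, 37 };
  int8_t s2[4] = { 0, 10, -40, 90 };
  int64_t sum = 0x7FFFFFFF;		/* accumulator seed used by res1 */
  for (int k = 0; k < 4; k++)
    sum += (int16_t) s1[k] * (int16_t) s2[k];
  /* Truncation is implementation-defined; GCC wraps, matching CALC.  */
  printf ("%d\n", (int32_t) sum);	/* prints -2147479309 */
  return 0;
}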
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpbssd-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpbssd-2.c
new file mode 100644
index 00000000000..969a5ff844e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpbssd-2.c
@@ -0,0 +1,71 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2-512" } */
+/* { dg-require-effective-target avx10_2_512 } */
+
+#ifndef AVX10_2
+#define AVX10_2
+#define AVX10_2_512
+#define AVX10_512BIT
+#endif
+
+#include "avx10-helper.h"
+
+#define SIZE (AVX512F_LEN / 8)
+#define SIZE_RES (AVX512F_LEN / 32)
+
+static void
+CALC (int *r, int *dst, char *s1, char *s2)
+{
+  short tempres[SIZE];
+  for (int i = 0; i < SIZE; i++)
+    tempres[i] = (short) s1[i] * (short) s2[i];
+  for (int i = 0; i < SIZE_RES; i++)
+    {
+      long long test = (long long) dst[i] + tempres[i * 4] + tempres[i * 4 + 1]
+		       + tempres[i * 4 + 2] + tempres[i * 4 + 3];
+      r[i] = test;
+    }
+}
+
+void
+TEST (void)
+{
+  int i;
+  UNION_TYPE (AVX512F_LEN, i_d) res1, res2, res3;
+  UNION_TYPE (AVX512F_LEN, i_b) src1;
+  UNION_TYPE (AVX512F_LEN, i_b) src2;
+  MASK_TYPE mask = MASK_VALUE;
+  int res_ref[SIZE_RES], res_ref2[SIZE_RES];
+
+  for (i = 0; i < SIZE; i++)
+    {
+      int sign = i % 2 ? 1 : -1;
+      src1.a[i] = sign * (10 + 3 * i * i);
+      src2.a[i] = sign * 10 * i * i;
+    }
+
+  for (i = 0; i < SIZE_RES; i++)
+    {
+      res1.a[i] = 0x7FFFFFFF;
+      res2.a[i] = DEFAULT_VALUE;
+      res3.a[i] = DEFAULT_VALUE;
+    }
+
+  CALC (res_ref, res1.a, src1.a, src2.a);
+  CALC (res_ref2, res2.a, src1.a, src2.a);
+
+  res1.x = INTRINSIC (_dpbssd_epi32) (res1.x, src1.x, src2.x);
+  res2.x = INTRINSIC (_mask_dpbssd_epi32) (res2.x, mask, src1.x, src2.x);
+  res3.x = INTRINSIC (_maskz_dpbssd_epi32) (mask, res3.x, src1.x, src2.x);
+
+  if (UNION_CHECK (AVX512F_LEN, i_d) (res1, res_ref))
+    abort ();
+
+  MASK_MERGE (i_d) (res_ref2, mask, SIZE_RES);
+  if (UNION_CHECK (AVX512F_LEN, i_d) (res2, res_ref2))
+    abort ();
+
+  MASK_ZERO (i_d) (res_ref2, mask, SIZE_RES);
+  if (UNION_CHECK (AVX512F_LEN, i_d) (res3, res_ref2))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpbssds-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpbssds-2.c
new file mode 100644
index 00000000000..1f147009186
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpbssds-2.c
@@ -0,0 +1,74 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2-512" } */
+/* { dg-require-effective-target avx10_2_512 } */
+
+#ifndef AVX10_2
+#define AVX10_2
+#define AVX10_2_512
+#define AVX10_512BIT
+#endif
+
+#include "avx10-helper.h"
+
+#define SIZE (AVX512F_LEN / 8)
+#define SIZE_RES (AVX512F_LEN / 32)
+
+static void
+CALC (int *r, int *dst, char *s1, char *s2)
+{
+  short tempres[SIZE];
+  for (int i = 0; i < SIZE; i++)
+    tempres[i] = (short) s1[i] * (short) s2[i];
+  for (int i = 0; i < SIZE_RES; i++)
+    {
+      long long test = (long long) dst[i] + tempres[i * 4] + tempres[i * 4 + 1]
+		       + tempres[i * 4 + 2] + tempres[i * 4 + 3];
+      long long max_int = 0x7FFFFFFF;
+      if (test > max_int)
+	test = max_int;
+      r[i] = test;
+    }
+}
+
+void
+TEST (void)
+{
+  int i;
+  UNION_TYPE (AVX512F_LEN, i_d) res1, res2, res3;
+  UNION_TYPE (AVX512F_LEN, i_b) src1;
+  UNION_TYPE (AVX512F_LEN, i_b) src2;
+  MASK_TYPE mask = MASK_VALUE;
+  int res_ref[SIZE_RES], res_ref2[SIZE_RES];
+
+  for (i = 0; i < SIZE; i++)
+    {
+      int sign = i % 2 ?
1 : -1; + src1.a[i] = sign * (10 + 3 * i * i); + src2.a[i] = sign * 10 * i * i; + } + + for (i = 0; i < SIZE_RES; i++) + { + res1.a[i] = 0x7FFFFFFF; + res2.a[i] = DEFAULT_VALUE; + res3.a[i] = DEFAULT_VALUE; + } + + CALC (res_ref, res1.a, src1.a, src2.a); + CALC (res_ref2, res2.a, src1.a, src2.a); + + res1.x = INTRINSIC (_dpbssds_epi32) (res1.x, src1.x, src2.x); + res2.x = INTRINSIC (_mask_dpbssds_epi32) (res2.x, mask, src1.x, src2.x); + res3.x = INTRINSIC (_maskz_dpbssds_epi32) (mask, res3.x, src1.x, src2.x); + + if (UNION_CHECK (AVX512F_LEN, i_d) (res1, res_ref)) + abort (); + + MASK_MERGE (i_d) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, i_d) (res2, res_ref2)) + abort (); + + MASK_ZERO (i_d) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, i_d) (res3, res_ref2)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpbsud-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpbsud-2.c new file mode 100644 index 00000000000..81653b223c7 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpbsud-2.c @@ -0,0 +1,71 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif + +#include "avx10-helper.h" + +#define SIZE (AVX512F_LEN / 8) +#define SIZE_RES (AVX512F_LEN / 32) + +static void +CALC (int *r, int *dst, char *s1, unsigned char *s2) +{ + short tempres[SIZE]; + for (int i = 0; i < SIZE; i++) + tempres[i] = (short) s1[i] * (unsigned short) s2[i]; + for (int i = 0; i < SIZE_RES; i++) + { + long long test = (long long) dst[i] + tempres[i * 4] + tempres[i * 4 + 1] + + tempres[i * 4 + 2] + tempres[i * 4 + 3]; + r[i] = test; + } +} + +void +TEST (void) +{ + int i; + UNION_TYPE (AVX512F_LEN, i_d) res1, res2, res3; + UNION_TYPE (AVX512F_LEN, i_b) src1; + UNION_TYPE (AVX512F_LEN, i_ub) src2; + MASK_TYPE mask = MASK_VALUE; + int res_ref[SIZE_RES], res_ref2[SIZE_RES]; + + for (i = 0; i < SIZE; i++) + { + int sign = i % 2 ? 
1 : -1;
+      src1.a[i] = sign * 10 * i * i;
+      src2.a[i] = 10 + 3 * i * i + sign;
+    }
+
+  for (i = 0; i < SIZE_RES; i++)
+    {
+      res1.a[i] = 0x7FFFFFFF;
+      res2.a[i] = DEFAULT_VALUE;
+      res3.a[i] = DEFAULT_VALUE;
+    }
+
+  CALC (res_ref, res1.a, src1.a, src2.a);
+  CALC (res_ref2, res2.a, src1.a, src2.a);
+
+  res1.x = INTRINSIC (_dpbsud_epi32) (res1.x, src1.x, src2.x);
+  res2.x = INTRINSIC (_mask_dpbsud_epi32) (res2.x, mask, src1.x, src2.x);
+  res3.x = INTRINSIC (_maskz_dpbsud_epi32) (mask, res3.x, src1.x, src2.x);
+
+  if (UNION_CHECK (AVX512F_LEN, i_d) (res1, res_ref))
+    abort ();
+
+  MASK_MERGE (i_d) (res_ref2, mask, SIZE_RES);
+  if (UNION_CHECK (AVX512F_LEN, i_d) (res2, res_ref2))
+    abort ();
+
+  MASK_ZERO (i_d) (res_ref2, mask, SIZE_RES);
+  if (UNION_CHECK (AVX512F_LEN, i_d) (res3, res_ref2))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpbsuds-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpbsuds-2.c
new file mode 100644
index 00000000000..70a00aa76f2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpbsuds-2.c
@@ -0,0 +1,74 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2-512" } */
+/* { dg-require-effective-target avx10_2_512 } */
+
+#ifndef AVX10_2
+#define AVX10_2
+#define AVX10_2_512
+#define AVX10_512BIT
+#endif
+
+#include "avx10-helper.h"
+
+#define SIZE (AVX512F_LEN / 8)
+#define SIZE_RES (AVX512F_LEN / 32)
+
+static void
+CALC (int *r, int *dst, char *s1, unsigned char *s2)
+{
+  short tempres[SIZE];
+  for (int i = 0; i < SIZE; i++)
+    tempres[i] = (short) s1[i] * (unsigned short) s2[i];
+  for (int i = 0; i < SIZE_RES; i++)
+    {
+      long long test = (long long) dst[i] + tempres[i * 4] + tempres[i * 4 + 1]
+		       + tempres[i * 4 + 2] + tempres[i * 4 + 3];
+      long long max_int = 0x7FFFFFFF;
+      if (test > max_int)
+	test = max_int;
+      r[i] = test;
+    }
+}
+
+void
+TEST (void)
+{
+  int i;
+  UNION_TYPE (AVX512F_LEN, i_d) res1, res2, res3;
+  UNION_TYPE (AVX512F_LEN, i_b) src1;
+  UNION_TYPE (AVX512F_LEN, i_ub) src2;
+  MASK_TYPE mask = MASK_VALUE;
+  int res_ref[SIZE_RES], res_ref2[SIZE_RES];
+
+  for (i = 0; i < SIZE; i++)
+    {
+      int sign = i % 2 ?
1 : -1; + src1.a[i] = sign * 10 * i * i; + src2.a[i] = 10 + 3 * i * i + sign; + } + + for (i = 0; i < SIZE_RES; i++) + { + res1.a[i] = 0x7FFFFFFF; + res2.a[i] = DEFAULT_VALUE; + res3.a[i] = DEFAULT_VALUE; + } + + CALC (res_ref, res1.a, src1.a, src2.a); + CALC (res_ref2, res2.a, src1.a, src2.a); + + res1.x = INTRINSIC (_dpbsuds_epi32) (res1.x, src1.x, src2.x); + res2.x = INTRINSIC (_mask_dpbsuds_epi32) (res2.x, mask, src1.x, src2.x); + res3.x = INTRINSIC (_maskz_dpbsuds_epi32) (mask, res3.x, src1.x, src2.x); + + if (UNION_CHECK (AVX512F_LEN, i_d) (res1, res_ref)) + abort (); + + MASK_MERGE (i_d) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, i_d) (res2, res_ref2)) + abort (); + + MASK_ZERO (i_d) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, i_d) (res3, res_ref2)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpbuud-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpbuud-2.c new file mode 100644 index 00000000000..84ef32f1b01 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpbuud-2.c @@ -0,0 +1,70 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif + +#include "avx10-helper.h" + +#define SIZE (AVX512F_LEN / 8) +#define SIZE_RES (AVX512F_LEN / 32) + +static void +CALC (int *r, int *dst, unsigned char *s1, unsigned char *s2) +{ + unsigned short tempres[SIZE]; + for (int i = 0; i < SIZE; i++) + tempres[i] = (unsigned short) s1[i] * (unsigned short) s2[i]; + for (int i = 0; i < SIZE_RES; i++) + { + long long test = (long long) dst[i] + tempres[i * 4] + tempres[i * 4 + 1] + + tempres[i * 4 + 2] + tempres[i * 4 + 3]; + r[i] = test; + } +} + +void +TEST (void) +{ + int i; + UNION_TYPE (AVX512F_LEN, i_d) res1, res2, res3; + UNION_TYPE (AVX512F_LEN, i_ub) src1; + UNION_TYPE (AVX512F_LEN, i_ub) src2; + MASK_TYPE mask = MASK_VALUE; + int res_ref[SIZE_RES], res_ref2[SIZE_RES]; + + for (i = 0; i < SIZE; i++) + { + src1.a[i] = 10 + 3 * i * i; + src2.a[i] = 10 * i * i; + } + + for (i = 0; i < SIZE_RES; i++) + { + res1.a[i] = 0x7FFFFFFF; + res2.a[i] = DEFAULT_VALUE; + res3.a[i] = DEFAULT_VALUE; + } + + CALC (res_ref, res1.a, src1.a, src2.a); + CALC (res_ref2, res2.a, src1.a, src2.a); + + res1.x = INTRINSIC (_dpbuud_epi32) (res1.x, src1.x, src2.x); + res2.x = INTRINSIC (_mask_dpbuud_epi32) (res2.x, mask, src1.x, src2.x); + res3.x = INTRINSIC (_maskz_dpbuud_epi32) (mask, res3.x, src1.x, src2.x); + + if (UNION_CHECK (AVX512F_LEN, i_d) (res1, res_ref)) + abort (); + + MASK_MERGE (i_d) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, i_d) (res2, res_ref2)) + abort (); + + MASK_ZERO (i_d) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, i_d) (res3, res_ref2)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpbuuds-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpbuuds-2.c new file mode 100644 index 00000000000..98fe36d6b66 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpbuuds-2.c @@ -0,0 +1,73 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif + +#include "avx10-helper.h" + +#define SIZE (AVX512F_LEN / 8) +#define SIZE_RES (AVX512F_LEN / 32) + +static void +CALC (int *r, int *dst, unsigned char *s1, unsigned char *s2) +{ + unsigned short tempres[SIZE]; + for (int i = 0; i < SIZE; 
i++) + tempres[i] = (unsigned short) s1[i] * (unsigned short) s2[i]; + for (int i = 0; i < SIZE_RES; i++) + { + long long test = (long long) dst[i] + tempres[i * 4] + tempres[i * 4 + 1] + + tempres[i * 4 + 2] + tempres[i * 4 + 3]; + long long max_uint = 0xFFFFFFFF; + if (test > max_uint) + test = max_uint; + r[i] = test; + } +} + +void +TEST (void) +{ + int i; + UNION_TYPE (AVX512F_LEN, i_d) res1, res2, res3; + UNION_TYPE (AVX512F_LEN, i_ub) src1; + UNION_TYPE (AVX512F_LEN, i_ub) src2; + MASK_TYPE mask = MASK_VALUE; + int res_ref[SIZE_RES], res_ref2[SIZE_RES]; + + for (i = 0; i < SIZE; i++) + { + src1.a[i] = 10 + 3 * i * i; + src2.a[i] = 10 * i * i; + } + + for (i = 0; i < SIZE_RES; i++) + { + res1.a[i] = 0x7FFFFFFF; + res2.a[i] = DEFAULT_VALUE; + res3.a[i] = DEFAULT_VALUE; + } + + CALC (res_ref, res1.a, src1.a, src2.a); + CALC (res_ref2, res2.a, src1.a, src2.a); + + res1.x = INTRINSIC (_dpbuuds_epi32) (res1.x, src1.x, src2.x); + res2.x = INTRINSIC (_mask_dpbuuds_epi32) (res2.x, mask, src1.x, src2.x); + res3.x = INTRINSIC (_maskz_dpbuuds_epi32) (mask, res3.x, src1.x, src2.x); + + if (UNION_CHECK (AVX512F_LEN, i_d) (res1, res_ref)) + abort (); + + MASK_MERGE (i_d) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, i_d) (res2, res_ref2)) + abort (); + + MASK_ZERO (i_d) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, i_d) (res3, res_ref2)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-builtin-1.c b/gcc/testsuite/gcc.target/i386/avx10_2-builtin-1.c new file mode 100644 index 00000000000..daf61c785a2 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-builtin-1.c @@ -0,0 +1,8 @@ +/* { dg-do compile } */ +/* { dg-options "-O0 -mavx10.2 -mno-avxvnniint8" } */ +typedef int v8si __attribute__ ((vector_size (32))); +v8si +foo (v8si a, v8si b, v8si c) +{ + return __builtin_ia32_vpdpbssd256 (a, b, c); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-media-1.c b/gcc/testsuite/gcc.target/i386/avx10_2-media-1.c new file mode 100644 index 00000000000..c2b3e5527d9 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-media-1.c @@ -0,0 +1,96 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx10.2 -O2" } */ +/* { dg-final { scan-assembler-times "vpdpbssd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpbssd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\\n\\r]*%ymm\[0-9\]+\[^\\n\\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpbssd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\\n\\r]*%ymm\[0-9\]+\[^\\n\\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpbssd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpbssd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\\n\\r]*%xmm\[0-9\]+\[^\\n\\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpbssd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\\n\\r]*%xmm\[0-9\]+\[^\\n\\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpbssds\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpbssds\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\\n\\r]*%ymm\[0-9\]+\[^\\n\\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpbssds\[ 
\\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\\n\\r]*%ymm\[0-9\]+\[^\\n\\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpbssds\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpbssds\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\\n\\r]*%xmm\[0-9\]+\[^\\n\\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpbssds\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\\n\\r]*%xmm\[0-9\]+\[^\\n\\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpbsud\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpbsud\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\\n\\r]*%ymm\[0-9\]+\[^\\n\\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpbsud\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\\n\\r]*%ymm\[0-9\]+\[^\\n\\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpbsud\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpbsud\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\\n\\r]*%xmm\[0-9\]+\[^\\n\\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpbsud\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\\n\\r]*%xmm\[0-9\]+\[^\\n\\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpbsuds\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpbsuds\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\\n\\r]*%ymm\[0-9\]+\[^\\n\\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpbsuds\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\\n\\r]*%ymm\[0-9\]+\[^\\n\\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpbsuds\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpbsuds\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\\n\\r]*%xmm\[0-9\]+\[^\\n\\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpbsuds\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\\n\\r]*%xmm\[0-9\]+\[^\\n\\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpbuud\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpbuud\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\\n\\r]*%ymm\[0-9\]+\[^\\n\\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpbuud\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\\n\\r]*%ymm\[0-9\]+\[^\\n\\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpbuud\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpbuud\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\\n\\r]*%xmm\[0-9\]+\[^\\n\\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpbuud\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\\n\\r]*%xmm\[0-9\]+\[^\\n\\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { 
scan-assembler-times "vpdpbuuds\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpbuuds\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\\n\\r]*%ymm\[0-9\]+\[^\\n\\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpbuuds\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\\n\\r]*%ymm\[0-9\]+\[^\\n\\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpbuuds\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpbuuds\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\\n\\r]*%xmm\[0-9\]+\[^\\n\\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpbuuds\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\\n\\r]*%xmm\[0-9\]+\[^\\n\\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256i x,y,z;
+volatile __m128i x_,y_,z_;
+volatile __mmask8 m;
+
+void extern
+avx10_2_test (void)
+{
+  x = _mm256_dpbssd_epi32 (x, y, z);
+  x = _mm256_mask_dpbssd_epi32 (x, m, y, z);
+  x = _mm256_maskz_dpbssd_epi32 (m, x, y, z);
+
+  x_ = _mm_dpbssd_epi32 (x_, y_, z_);
+  x_ = _mm_mask_dpbssd_epi32 (x_, m, y_, z_);
+  x_ = _mm_maskz_dpbssd_epi32 (m, x_, y_, z_);
+
+  x = _mm256_dpbssds_epi32 (x, y, z);
+  x = _mm256_mask_dpbssds_epi32 (x, m, y, z);
+  x = _mm256_maskz_dpbssds_epi32 (m, x, y, z);
+
+  x_ = _mm_dpbssds_epi32 (x_, y_, z_);
+  x_ = _mm_mask_dpbssds_epi32 (x_, m, y_, z_);
+  x_ = _mm_maskz_dpbssds_epi32 (m, x_, y_, z_);
+
+  x = _mm256_dpbsud_epi32 (x, y, z);
+  x = _mm256_mask_dpbsud_epi32 (x, m, y, z);
+  x = _mm256_maskz_dpbsud_epi32 (m, x, y, z);
+
+  x_ = _mm_dpbsud_epi32 (x_, y_, z_);
+  x_ = _mm_mask_dpbsud_epi32 (x_, m, y_, z_);
+  x_ = _mm_maskz_dpbsud_epi32 (m, x_, y_, z_);
+
+  x = _mm256_dpbsuds_epi32 (x, y, z);
+  x = _mm256_mask_dpbsuds_epi32 (x, m, y, z);
+  x = _mm256_maskz_dpbsuds_epi32 (m, x, y, z);
+
+  x_ = _mm_dpbsuds_epi32 (x_, y_, z_);
+  x_ = _mm_mask_dpbsuds_epi32 (x_, m, y_, z_);
+  x_ = _mm_maskz_dpbsuds_epi32 (m, x_, y_, z_);
+
+  x = _mm256_dpbuud_epi32 (x, y, z);
+  x = _mm256_mask_dpbuud_epi32 (x, m, y, z);
+  x = _mm256_maskz_dpbuud_epi32 (m, x, y, z);
+
+  x_ = _mm_dpbuud_epi32 (x_, y_, z_);
+  x_ = _mm_mask_dpbuud_epi32 (x_, m, y_, z_);
+  x_ = _mm_maskz_dpbuud_epi32 (m, x_, y_, z_);
+
+  x = _mm256_dpbuuds_epi32 (x, y, z);
+  x = _mm256_mask_dpbuuds_epi32 (x, m, y, z);
+  x = _mm256_maskz_dpbuuds_epi32 (m, x, y, z);
+
+  x_ = _mm_dpbuuds_epi32 (x_, y_, z_);
+  x_ = _mm_mask_dpbuuds_epi32 (x_, m, y_, z_);
+  x_ = _mm_maskz_dpbuuds_epi32 (m, x_, y_, z_);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vpdpbssd-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vpdpbssd-2.c
new file mode 100644
index 00000000000..510216a7be2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-vpdpbssd-2.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2" } */
+/* { dg-require-effective-target avx10_2 } */
+
+#define AVX10_2
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vpdpbssd-2.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vpdpbssd-2.c"
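Reader's note, not part of the patch: the wrappers above and below do not
duplicate the test body; they re-include the 512-bit source after redefining
AVX512F_LEN, and avx512f-helper.h pastes the length into the type and
intrinsic names.  The idiom, reduced to a minimal sketch with hypothetical
file and macro names:

/* vec-test.h -- deliberately no include guard; LEN is set by the includer.  */
#define GLUE2(a, b) a##b
#define GLUE(a, b) GLUE2 (a, b)
#define TEST_FN GLUE (test_, LEN)

static void
TEST_FN (void)
{
  /* ... one test body, written in terms of LEN ...  */
}
#undef TEST_FN

/* driver.c -- one body, two instantiations, like the wrappers here.  */
#define LEN 256
#include "vec-test.h"
#undef LEN

#define LEN 128
#include "vec-test.h"
#undef LEN

int
main (void)
{
  test_256 ();
  test_128 ();
  return 0;
}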
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vpdpbssds-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vpdpbssds-2.c
new file mode 100644
index 00000000000..4b84105c202
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-vpdpbssds-2.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2" } */
+/* { dg-require-effective-target avx10_2 } */
+
+#define AVX10_2
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vpdpbssds-2.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vpdpbssds-2.c"
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vpdpbsud-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vpdpbsud-2.c
new file mode 100644
index 00000000000..e4f0f415a1a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-vpdpbsud-2.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2" } */
+/* { dg-require-effective-target avx10_2 } */
+
+#define AVX10_2
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vpdpbsud-2.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vpdpbsud-2.c"
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vpdpbsuds-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vpdpbsuds-2.c
new file mode 100644
index 00000000000..ca7942e288e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-vpdpbsuds-2.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2" } */
+/* { dg-require-effective-target avx10_2 } */
+
+#define AVX10_2
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vpdpbsuds-2.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vpdpbsuds-2.c"
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vpdpbuud-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vpdpbuud-2.c
new file mode 100644
index 00000000000..9664c99baa2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-vpdpbuud-2.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2" } */
+/* { dg-require-effective-target avx10_2 } */
+
+#define AVX10_2
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vpdpbuud-2.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vpdpbuud-2.c"
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vpdpbuuds-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vpdpbuuds-2.c
new file mode 100644
index 00000000000..285637bbc13
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-vpdpbuuds-2.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2" } */
+/* { dg-require-effective-target avx10_2 } */
+
+#define AVX10_2
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vpdpbuuds-2.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vpdpbuuds-2.c"
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-helper.h b/gcc/testsuite/gcc.target/i386/avx512f-helper.h
index 72982f95aed..3cd6751af26 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-helper.h
+++ b/gcc/testsuite/gcc.target/i386/avx512f-helper.h
@@ -8,7 +8,11 @@
 #ifndef AVX512F_HELPER_INCLUDED
 #define AVX512F_HELPER_INCLUDED

+#if defined(AVX10)
+#include "avx10-check.h"
+#else
 #include "avx512-check.h"
+#endif

 /* Macros expansion.  */
 #define CONCAT(a,b,c) a ## b ## c
@@ -87,7 +91,7 @@ MAKE_MASK_ZERO(i_uq, unsigned long long)
 /* Function which calculates result.  */
 #define CALC EVAL(calc_, AVX512F_LEN,)

-#ifndef AVX512VL
+#if !defined(AVX512VL) || defined(AVX10_512BIT)
 #define AVX512F_LEN 512
 #define AVX512F_LEN_HALF 256
 #endif
diff --git a/gcc/testsuite/gcc.target/i386/avxvnniint8-builtin.c b/gcc/testsuite/gcc.target/i386/avxvnniint8-builtin.c
new file mode 100644
index 00000000000..9d61af6df54
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avxvnniint8-builtin.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O0 -mavxvnniint8 -mno-avx10.2" } */
+typedef int v8si __attribute__ ((vector_size (32)));
+v8si
+foo (v8si a, v8si b, v8si c)
+{
+  return __builtin_ia32_vpdpbssd256 (a, b, c);
+}
diff --git a/gcc/testsuite/gcc.target/i386/funcspec-56.inc b/gcc/testsuite/gcc.target/i386/funcspec-56.inc
index e4713eaa88d..0852e539cd7 100644
--- a/gcc/testsuite/gcc.target/i386/funcspec-56.inc
+++ b/gcc/testsuite/gcc.target/i386/funcspec-56.inc
@@ -87,6 +87,8 @@
 extern void test_sm3 (void) __attribute__((__target__("sm3")));
 extern void test_sha512 (void) __attribute__((__target__("sha512")));
 extern void test_sm4 (void) __attribute__((__target__("sm4")));
 extern void test_user_msr (void) __attribute__((__target__("usermsr")));
+extern void test_avx10_2 (void) __attribute__((__target__("avx10.2")));
+extern void test_avx10_2_512 (void) __attribute__((__target__("avx10.2-512")));

 extern void test_no_sgx (void) __attribute__((__target__("no-sgx")));
 extern void test_no_avx512vpopcntdq(void) __attribute__((__target__("no-avx512vpopcntdq")));
@@ -175,6 +177,8 @@
 extern void test_no_sm3 (void) __attribute__((__target__("no-sm3")));
 extern void test_no_sha512 (void) __attribute__((__target__("no-sha512")));
 extern void test_no_sm4 (void) __attribute__((__target__("no-sm4")));
 extern void test_no_user_msr (void) __attribute__((__target__("no-usermsr")));
+extern void test_no_avx10_2 (void) __attribute__((__target__("no-avx10.2")));
+extern void test_no_avx10_2_512 (void) __attribute__((__target__("no-avx10.2-512")));

 extern void test_arch_nocona (void) __attribute__((__target__("arch=nocona")));
 extern void test_arch_core2 (void) __attribute__((__target__("arch=core2")));
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 11ba77ca404..67362a5e9ca 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -10638,6 +10638,42 @@ proc check_effective_target_apxf { } {
     } "-mapxf" ]
 }

+# Return 1 if avx10.2 instructions can be compiled.
+proc check_effective_target_avx10_2 { } {
+    return [check_no_compiler_messages avx10.2 object {
+	typedef int __v8si __attribute__ ((__vector_size__ (32)));
+	typedef char __mmask8;
+
+	__v8si
+	_mm256_mask_vpdpbssd_epi32 (__v8si __A, __mmask8 __U,
+				    __v8si __B, __v8si __C)
+	{
+	  return (__v8si) __builtin_ia32_vpdpbssd_v8si_mask ((__v8si)__A,
+							     (__v8si)__B,
+							     (__v8si)__C,
+							     (__mmask8)__U);
+	}
+    } "-mavx10.2" ]
+}
+
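Reader's note, not part of the patch: DejaGnu evaluates the proc above by
compiling the embedded probe with -mavx10.2; only if that produces no
diagnostics do tests guarded by the keyword run, otherwise they are reported
as UNSUPPORTED rather than FAIL.  A hypothetical compile test consuming the
new keyword (the runtime tests additionally gate on CPUID/XGETBV through
avx10-check.h):

/* { dg-do compile } */
/* { dg-options "-O2 -mavx10.2" } */
/* { dg-require-effective-target avx10_2 } */
/* { dg-final { scan-assembler "vpdpbssd" } } */

#include <immintrin.h>

__m256i
foo (__m256i a, __m256i b, __m256i c)
{
  return _mm256_dpbssd_epi32 (a, b, c);
}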
+# Return 1 if avx10.2-512 instructions can be compiled.
+proc check_effective_target_avx10_2_512 { } {
+    return [check_no_compiler_messages avx10.2-512 object {
+	typedef int __v16si __attribute__ ((__vector_size__ (64)));
+	typedef short __mmask16;
+
+	__v16si
+	_mm512_vpdpbssd_epi32 (__v16si __A, __mmask16 __U,
+			       __v16si __B, __v16si __C)
+	{
+	  return (__v16si) __builtin_ia32_vpdpbssd_v16si_mask ((__v16si)__A,
+							       (__v16si)__B,
+							       (__v16si)__C,
+							       (__mmask16)__U);
+	}
+    } "-mavx10.2-512" ]
+}
+
 # Return 1 if sse instructions can be compiled.
 proc check_effective_target_sse { } {
     return [check_no_compiler_messages sse object {

From patchwork Mon Aug 19 08:56:47 2024
From: Haochen Jiang
To: gcc-patches@gcc.gnu.org
Cc: hongtao.liu@intel.com, zewei.mo@pitt.edu, ubizjak@gmail.com, Hongyu Wang
Subject: [PATCH 03/12] [PATCH 2/2] AVX10.2: Support media instructions
Date: Mon, 19 Aug 2024 01:56:47 -0700
Message-ID: <20240819085717.193256-4-haochen.jiang@intel.com>
In-Reply-To: <20240819085717.193256-1-haochen.jiang@intel.com>
References: <20240819085717.193256-1-haochen.jiang@intel.com>

gcc/ChangeLog:

	* config/i386/avx10_2-512mediaintrin.h: Add new intrins.
	* config/i386/avx10_2mediaintrin.h: Ditto.
	* config/i386/i386-builtin.def: Add new builtins.
	* config/i386/i386-builtins.cc (def_builtin): Handle shared
	builtins between AVXVNNIINT16 and AVX10.2.
	* config/i386/i386-expand.cc (ix86_check_builtin_isa_match):
	Ditto.
	* config/i386/sse.md (unspec): Add UNSPEC_VDPPHPS.
	(_mpsadbw): New define_insn.
	(avx10_2_mpsadbw): Ditto.
	(vpdp<vpdotprodtype>_<mode>): Add AVX10_2_256.
	(vpdp<vpdotprodtype>_v16si): New define_insn.
	(vpdp<vpdotprodtype>_<mode>_mask): Ditto.
	(*vpdp<vpdotprodtype>_<mode>_maskz): Ditto.
	(vpdp<vpdotprodtype>_<mode>_maskz): New expander.
	(vdpphps_<mode>): New define_insn.
	(vdpphps_<mode>_mask): Ditto.
	(*vdpphps_<mode>_maskz): Ditto.
	(vdpphps_<mode>_maskz): New expander.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avxvnniint16-1.c: Add new macro test.
	* gcc.target/i386/avx-1.c: Ditto.
	* gcc.target/i386/sse-13.c: Ditto.
	* gcc.target/i386/sse-14.c: Ditto.
	* gcc.target/i386/sse-22.c: Ditto.
	* gcc.target/i386/sse-23.c: Ditto.
	* gcc.target/i386/avx10_2-512-media-1.c: Add test.
	* gcc.target/i386/avx10_2-media-1.c: Ditto.
	* gcc.target/i386/avxvnniint16-builtin.c: New test.
	* gcc.target/i386/avx10_2-512-vdpphps-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vmpsadbw-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vpdpwsud-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vpdpwsuds-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vpdpwusd-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vpdpwusds-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vpdpwuud-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vpdpwuuds-2.c: Ditto.
	* gcc.target/i386/avx10_2-builtin-2.c: Ditto.
	* gcc.target/i386/avx10_2-vdpphps-2.c: Ditto.
	* gcc.target/i386/avx10_2-vmpsadbw-2.c: Ditto.
	* gcc.target/i386/avx10_2-vpdpwsud-2.c: Ditto.
	* gcc.target/i386/avx10_2-vpdpwsuds-2.c: Ditto.
	* gcc.target/i386/avx10_2-vpdpwusd-2.c: Ditto.
	* gcc.target/i386/avx10_2-vpdpwusds-2.c: Ditto.
	* gcc.target/i386/avx10_2-vpdpwuud-2.c: Ditto.
	* gcc.target/i386/avx10_2-vpdpwuuds-2.c: Ditto.

Co-authored-by: Hongyu Wang
---
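A note for reviewers, not part of the patch: one 32-bit lane of the new
vpdpw* instructions accumulates two 16-bit products (the vpdpb* forms use
four 8-bit products), and only the trailing-s variants saturate.  A scalar
sketch of a single signed-by-unsigned lane, with hypothetical names:

#include <stdint.h>

static int32_t
dpwsud_lane (int32_t acc, const int16_t a[2], const uint16_t b[2],
	     int saturate)
{
  int64_t sum = acc;
  for (int k = 0; k < 2; k++)
    sum += (int64_t) a[k] * b[k];
  if (saturate)			/* vpdpwsuds clamps to the int32 range */
    {
      if (sum > INT32_MAX) sum = INT32_MAX;
      if (sum < INT32_MIN) sum = INT32_MIN;
    }
  return (int32_t) sum;		/* vpdpwsud wraps instead */
}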
 gcc/config/i386/avx10_2-512mediaintrin.h      | 280 +++++++++++
 gcc/config/i386/avx10_2mediaintrin.h          | 472 ++++++++++++++++++
 gcc/config/i386/i386-builtin.def              |  76 ++-
 gcc/config/i386/i386-builtins.cc              |  11 +-
 gcc/config/i386/i386-expand.cc                |   3 +
 gcc/config/i386/sse.md                        | 145 +++++-
 gcc/testsuite/gcc.target/i386/avx-1.c         |   8 +
 .../gcc.target/i386/avx10_2-512-media-1.c     |  60 +++
 .../gcc.target/i386/avx10_2-512-vdpphps-2.c   |  71 +++
 .../gcc.target/i386/avx10_2-512-vmpsadbw-2.c  |  93 ++++
 .../gcc.target/i386/avx10_2-512-vpdpwsud-2.c  |  71 +++
 .../gcc.target/i386/avx10_2-512-vpdpwsuds-2.c |  74 +++
 .../gcc.target/i386/avx10_2-512-vpdpwusd-2.c  |  71 +++
 .../gcc.target/i386/avx10_2-512-vpdpwusds-2.c |  74 +++
 .../gcc.target/i386/avx10_2-512-vpdpwuud-2.c  |  70 +++
 .../gcc.target/i386/avx10_2-512-vpdpwuuds-2.c |  73 +++
 .../gcc.target/i386/avx10_2-builtin-2.c       |   8 +
 .../gcc.target/i386/avx10_2-media-1.c         | 112 +++++
 .../gcc.target/i386/avx10_2-vdpphps-2.c       |  16 +
 .../gcc.target/i386/avx10_2-vmpsadbw-2.c      |  16 +
 .../gcc.target/i386/avx10_2-vpdpwsud-2.c      |  16 +
 .../gcc.target/i386/avx10_2-vpdpwsuds-2.c     |  16 +
 .../gcc.target/i386/avx10_2-vpdpwusd-2.c      |  16 +
 .../gcc.target/i386/avx10_2-vpdpwusds-2.c     |  16 +
 .../gcc.target/i386/avx10_2-vpdpwuud-2.c      |  16 +
 .../gcc.target/i386/avx10_2-vpdpwuuds-2.c     |  16 +
 .../gcc.target/i386/avxvnniint16-1.c          |  42 +-
 .../gcc.target/i386/avxvnniint16-builtin.c    |   8 +
 gcc/testsuite/gcc.target/i386/sse-13.c        |   8 +
 gcc/testsuite/gcc.target/i386/sse-14.c        |  11 +
 gcc/testsuite/gcc.target/i386/sse-22.c        |  11 +
 gcc/testsuite/gcc.target/i386/sse-23.c        |   8 +
 32 files changed, 1953 insertions(+), 35 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vdpphps-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vmpsadbw-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpwsud-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpwsuds-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpwusd-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpwusds-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpwuud-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpwuuds-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-builtin-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vdpphps-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vmpsadbw-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vpdpwsud-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vpdpwsuds-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vpdpwusd-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vpdpwusds-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vpdpwuud-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vpdpwuuds-2.c
 create mode 100644
gcc/testsuite/gcc.target/i386/avxvnniint16-builtin.c diff --git a/gcc/config/i386/avx10_2-512mediaintrin.h b/gcc/config/i386/avx10_2-512mediaintrin.h index 02d826b24cd..e471c83b1c4 100644 --- a/gcc/config/i386/avx10_2-512mediaintrin.h +++ b/gcc/config/i386/avx10_2-512mediaintrin.h @@ -226,6 +226,286 @@ _mm512_maskz_dpbuuds_epi32 (__mmask16 __U, __m512i __W, (__mmask16) __U); } +extern __inline __m512i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_dpwsud_epi32 (__m512i __W, __m512i __A, __m512i __B) +{ + return (__m512i) + __builtin_ia32_vpdpwsud512 ((__v16si) __W, (__v16si) __A, (__v16si) __B); +} + +extern __inline __m512i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_dpwsud_epi32 (__m512i __W, __mmask16 __U, + __m512i __A, __m512i __B) +{ + return (__m512i) + __builtin_ia32_vpdpwsud_v16si_mask ((__v16si) __W, + (__v16si) __A, + (__v16si) __B, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_dpwsud_epi32 (__mmask16 __U, __m512i __W, + __m512i __A, __m512i __B) +{ + return (__m512i) + __builtin_ia32_vpdpwsud_v16si_maskz ((__v16si) __W, + (__v16si) __A, + (__v16si) __B, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_dpwsuds_epi32 (__m512i __W, __m512i __A, __m512i __B) +{ + return (__m512i) + __builtin_ia32_vpdpwsuds512 ((__v16si) __W, (__v16si) __A, (__v16si) __B); +} + +extern __inline __m512i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_dpwsuds_epi32 (__m512i __W, __mmask16 __U, + __m512i __A, __m512i __B) +{ + return (__m512i) + __builtin_ia32_vpdpwsuds_v16si_mask ((__v16si) __W, + (__v16si) __A, + (__v16si) __B, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_dpwsuds_epi32 (__mmask16 __U, __m512i __W, + __m512i __A, __m512i __B) +{ + return (__m512i) + __builtin_ia32_vpdpwsuds_v16si_maskz ((__v16si) __W, + (__v16si) __A, + (__v16si) __B, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_dpwusd_epi32 (__m512i __W, __m512i __A, __m512i __B) +{ + return (__m512i) + __builtin_ia32_vpdpwusd512 ((__v16si) __W, (__v16si) __A, (__v16si) __B); +} + +extern __inline __m512i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_dpwusd_epi32 (__m512i __W, __mmask16 __U, + __m512i __A, __m512i __B) +{ + return (__m512i) + __builtin_ia32_vpdpwusd_v16si_mask ((__v16si) __W, + (__v16si) __A, + (__v16si) __B, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_dpwusd_epi32 (__mmask16 __U, __m512i __W, + __m512i __A, __m512i __B) +{ + return (__m512i) + __builtin_ia32_vpdpwusd_v16si_maskz ((__v16si) __W, + (__v16si) __A, + (__v16si) __B, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_dpwusds_epi32 (__m512i __W, __m512i __A, __m512i __B) +{ + return (__m512i) + __builtin_ia32_vpdpwusds512 ((__v16si) __W, (__v16si) __A, (__v16si) __B); +} + +extern __inline __m512i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_dpwusds_epi32 (__m512i __W, __mmask16 __U, + __m512i __A, __m512i __B) +{ + return (__m512i) + __builtin_ia32_vpdpwusds_v16si_mask ((__v16si) __W, + (__v16si) __A, + 
(__v16si) __B, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_dpwusds_epi32 (__mmask16 __U, __m512i __W, + __m512i __A, __m512i __B) +{ + return (__m512i) + __builtin_ia32_vpdpwusds_v16si_maskz ((__v16si) __W, + (__v16si) __A, + (__v16si) __B, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_dpwuud_epi32 (__m512i __W, __m512i __A, __m512i __B) +{ + return (__m512i) + __builtin_ia32_vpdpwuud512 ((__v16si) __W, (__v16si) __A, (__v16si) __B); +} + +extern __inline __m512i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_dpwuud_epi32 (__m512i __W, __mmask16 __U, + __m512i __A, __m512i __B) +{ + return (__m512i) + __builtin_ia32_vpdpwuud_v16si_mask ((__v16si) __W, + (__v16si) __A, + (__v16si) __B, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_dpwuud_epi32 (__mmask16 __U, __m512i __W, + __m512i __A, __m512i __B) +{ + return (__m512i) + __builtin_ia32_vpdpwuud_v16si_maskz ((__v16si) __W, + (__v16si) __A, + (__v16si) __B, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_dpwuuds_epi32 (__m512i __W, __m512i __A, __m512i __B) +{ + return (__m512i) + __builtin_ia32_vpdpwuuds512 ((__v16si) __W, (__v16si) __A, (__v16si) __B); +} + +extern __inline __m512i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_dpwuuds_epi32 (__m512i __W, __mmask16 __U, + __m512i __A, __m512i __B) +{ + return (__m512i) + __builtin_ia32_vpdpwuuds_v16si_mask ((__v16si) __W, + (__v16si) __A, + (__v16si) __B, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_dpwuuds_epi32 (__mmask16 __U, __m512i __W, + __m512i __A, __m512i __B) +{ + return (__m512i) + __builtin_ia32_vpdpwuuds_v16si_maskz ((__v16si) __W, + (__v16si) __A, + (__v16si) __B, + (__mmask16) __U); +} + +extern __inline __m512 +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_dpph_ps (__m512 __W, __m512h __A, __m512h __B) +{ + return (__m512) + __builtin_ia32_vdpphps512_mask ((__v16sf) __W, + (__v16sf) __A, + (__v16sf) __B, + (__mmask16) -1); +} + +extern __inline __m512 +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_dpph_ps (__m512 __W, __mmask16 __U, __m512h __A, + __m512h __B) +{ + return (__m512) + __builtin_ia32_vdpphps512_mask ((__v16sf) __W, + (__v16sf) __A, + (__v16sf) __B, + (__mmask16) __U); +} + +extern __inline __m512 +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_dpph_ps (__mmask16 __U, __m512 __W, __m512h __A, + __m512h __B) +{ + return (__m512) + __builtin_ia32_vdpphps512_maskz ((__v16sf) __W, + (__v16sf) __A, + (__v16sf) __B, + (__mmask16) __U); +} + +#ifdef __OPTIMIZE__ +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mpsadbw_epu8 (__m512i __X, __m512i __Y, const int __M) +{ + return (__m512i) __builtin_ia32_mpsadbw512 ((__v64qi) __X, + (__v64qi) __Y, + __M); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_mpsadbw_epu8 (__m512i __W, __mmask32 __U, __m512i __X, + __m512i __Y, const int __M) +{ + return (__m512i) __builtin_ia32_mpsadbw512_mask ((__v64qi) __X, + (__v64qi) __Y, + __M, + 
(__v32hi) __W, + __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_mpsadbw_epu8 (__mmask32 __U, __m512i __X, + __m512i __Y, const int __M) +{ + return (__m512i) __builtin_ia32_mpsadbw512_mask ((__v64qi) __X, + (__v64qi) __Y, + __M, + (__v32hi) _mm512_setzero_epi32 (), + __U); +} +#else +#define _mm512_mpsadbw_epu8(X, Y, M) \ + (__m512i) __builtin_ia32_mpsadbw512 ((__v64qi)(__m512i)(X), \ + (__v64qi)(__m512i)(Y), (int)(M)) + +#define _mm512_mask_mpsadbw_epu8(W, U, X, Y, M) \ + (__m512i) __builtin_ia32_mpsadbw512_mask ((__v64qi)(__m512i)(X), \ + (__v64qi)(__m512i)(Y), \ + (int)(M), \ + (__v32hi)(__m512i)(W), \ + (__mmask32)(U)) + +#define _mm512_maskz_mpsadbw_epu8(U, X, Y, M) \ + (__m512i) __builtin_ia32_mpsadbw512_mask ((__v64qi)(__m512i)(X), \ + (__v64qi)(__m512i)(Y), \ + (int)(M), \ + (__v32hi) _mm512_setzero_epi32 (), \ + (__mmask32)(U)) +#endif + #ifdef __DISABLE_AVX10_2_512__ #undef __DISABLE_AVX10_2_512__ #pragma GCC pop_options diff --git a/gcc/config/i386/avx10_2mediaintrin.h b/gcc/config/i386/avx10_2mediaintrin.h index e668af62e36..5456c185284 100644 --- a/gcc/config/i386/avx10_2mediaintrin.h +++ b/gcc/config/i386/avx10_2mediaintrin.h @@ -70,6 +70,42 @@ #define _mm256_dpbuuds_epi32(W, A, B) \ (__m256i) __builtin_ia32_vpdpbuuds256 ((__v8si) (W), (__v8si) (A), (__v8si) (B)) +#define _mm_dpwsud_epi32(W, A, B) \ + (__m128i) __builtin_ia32_vpdpwsud128 ((__v4si) (W), (__v4si) (A), (__v4si) (B)) + +#define _mm_dpwsuds_epi32(W, A, B) \ + (__m128i) __builtin_ia32_vpdpwsuds128 ((__v4si) (W), (__v4si) (A), (__v4si) (B)) + +#define _mm_dpwusd_epi32(W, A, B) \ + (__m128i) __builtin_ia32_vpdpwusd128 ((__v4si) (W), (__v4si) (A), (__v4si) (B)) + +#define _mm_dpwusds_epi32(W, A, B) \ + (__m128i) __builtin_ia32_vpdpwusds128 ((__v4si) (W), (__v4si) (A), (__v4si) (B)) + +#define _mm_dpwuud_epi32(W, A, B) \ + (__m128i) __builtin_ia32_vpdpwuud128 ((__v4si) (W), (__v4si) (A), (__v4si) (B)) + +#define _mm_dpwuuds_epi32(W, A, B) \ + (__m128i) __builtin_ia32_vpdpwuuds128 ((__v4si) (W), (__v4si) (A), (__v4si) (B)) + +#define _mm256_dpwsud_epi32(W, A, B) \ + (__m256i) __builtin_ia32_vpdpwsud256 ((__v8si) (W), (__v8si) (A), (__v8si) (B)) + +#define _mm256_dpwsuds_epi32(W, A, B) \ + (__m256i) __builtin_ia32_vpdpwsuds256 ((__v8si) (W), (__v8si) (A), (__v8si) (B)) + +#define _mm256_dpwusd_epi32(W, A, B) \ + (__m256i) __builtin_ia32_vpdpwusd256 ((__v8si) (W), (__v8si) (A), (__v8si) (B)) + +#define _mm256_dpwusds_epi32(W, A, B) \ + (__m256i) __builtin_ia32_vpdpwusds256 ((__v8si) (W), (__v8si) (A), (__v8si) (B)) + +#define _mm256_dpwuud_epi32(W, A, B) \ + (__m256i) __builtin_ia32_vpdpwuud256 ((__v8si) (W), (__v8si) (A), (__v8si) (B)) + +#define _mm256_dpwuuds_epi32(W, A, B) \ + (__m256i) __builtin_ia32_vpdpwuuds256 ((__v8si) (W), (__v8si) (A), (__v8si) (B)) + extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_mask_dpbssd_epi32 (__m128i __W, __mmask8 __U, @@ -358,6 +394,442 @@ _mm256_maskz_dpbuuds_epi32 (__mmask8 __U, __m256i __W, (__mmask8) __U); } +extern __inline __m128i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_dpwsud_epi32 (__m128i __W, __mmask8 __U, + __m128i __A, __m128i __B) +{ + return (__m128i) + __builtin_ia32_vpdpwsud_v4si_mask ((__v4si) __W, + (__v4si) __A, + (__v4si) __B, + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_dpwsud_epi32 (__mmask8 __U, __m128i __W, + __m128i __A, 
__m128i __B) +{ + return (__m128i) + __builtin_ia32_vpdpwsud_v4si_maskz ((__v4si) __W, + (__v4si) __A, + (__v4si) __B, + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_dpwsuds_epi32 (__m128i __W, __mmask8 __U, + __m128i __A, __m128i __B) +{ + return (__m128i) + __builtin_ia32_vpdpwsuds_v4si_mask ((__v4si) __W, + (__v4si) __A, + (__v4si) __B, + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_dpwsuds_epi32 (__mmask8 __U, __m128i __W, + __m128i __A, __m128i __B) +{ + return (__m128i) + __builtin_ia32_vpdpwsuds_v4si_maskz ((__v4si) __W, + (__v4si) __A, + (__v4si) __B, + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_dpwusd_epi32 (__m128i __W, __mmask8 __U, + __m128i __A, __m128i __B) +{ + return (__m128i) + __builtin_ia32_vpdpwusd_v4si_mask ((__v4si) __W, + (__v4si) __A, + (__v4si) __B, + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_dpwusd_epi32 (__mmask8 __U, __m128i __W, + __m128i __A, __m128i __B) +{ + return (__m128i) + __builtin_ia32_vpdpwusd_v4si_maskz ((__v4si) __W, + (__v4si) __A, + (__v4si) __B, + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_dpwusds_epi32 (__m128i __W, __mmask8 __U, + __m128i __A, __m128i __B) +{ + return (__m128i) + __builtin_ia32_vpdpwusds_v4si_mask ((__v4si) __W, + (__v4si) __A, + (__v4si) __B, + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_dpwusds_epi32 (__mmask8 __U, __m128i __W, + __m128i __A, __m128i __B) +{ + return (__m128i) + __builtin_ia32_vpdpwusds_v4si_maskz ((__v4si) __W, + (__v4si) __A, + (__v4si) __B, + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_dpwuud_epi32 (__m128i __W, __mmask8 __U, + __m128i __A, __m128i __B) +{ + return (__m128i) + __builtin_ia32_vpdpwuud_v4si_mask ((__v4si) __W, + (__v4si) __A, + (__v4si) __B, + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_dpwuud_epi32 (__mmask8 __U, __m128i __W, + __m128i __A, __m128i __B) +{ + return (__m128i) + __builtin_ia32_vpdpwuud_v4si_maskz ((__v4si) __W, + (__v4si) __A, + (__v4si) __B, + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_dpwuuds_epi32 (__m128i __W, __mmask8 __U, + __m128i __A, __m128i __B) +{ + return (__m128i) + __builtin_ia32_vpdpwuuds_v4si_mask ((__v4si) __W, + (__v4si) __A, + (__v4si) __B, + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_dpwuuds_epi32 (__mmask8 __U, __m128i __W, + __m128i __A, __m128i __B) +{ + return (__m128i) + __builtin_ia32_vpdpwuuds_v4si_maskz ((__v4si) __W, + (__v4si) __A, + (__v4si) __B, + (__mmask8) __U); +} + +extern __inline __m256i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_dpwsud_epi32 (__m256i __W, __mmask8 __U, + __m256i __A, __m256i __B) +{ + return (__m256i) + __builtin_ia32_vpdpwsud_v8si_mask ((__v8si) __W, + (__v8si) __A, + (__v8si) __B, + (__mmask8) __U); +} + +extern __inline __m256i +__attribute__((__gnu_inline__, __always_inline__, 
__artificial__)) +_mm256_maskz_dpwsud_epi32 (__mmask8 __U, __m256i __W, + __m256i __A, __m256i __B) +{ + return (__m256i) + __builtin_ia32_vpdpwsud_v8si_maskz ((__v8si) __W, + (__v8si) __A, + (__v8si) __B, + (__mmask8) __U); +} + +extern __inline __m256i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_dpwsuds_epi32 (__m256i __W, __mmask8 __U, + __m256i __A, __m256i __B) +{ + return (__m256i) + __builtin_ia32_vpdpwsuds_v8si_mask ((__v8si) __W, + (__v8si) __A, + (__v8si) __B, + (__mmask8) __U); +} + +extern __inline __m256i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_dpwsuds_epi32 (__mmask8 __U, __m256i __W, + __m256i __A, __m256i __B) +{ + return (__m256i) + __builtin_ia32_vpdpwsuds_v8si_maskz ((__v8si) __W, + (__v8si) __A, + (__v8si) __B, + (__mmask8) __U); +} + +extern __inline __m256i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_dpwusd_epi32 (__m256i __W, __mmask8 __U, + __m256i __A, __m256i __B) +{ + return (__m256i) + __builtin_ia32_vpdpwusd_v8si_mask ((__v8si) __W, + (__v8si) __A, + (__v8si) __B, + (__mmask8) __U); +} + +extern __inline __m256i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_dpwusd_epi32 (__mmask8 __U, __m256i __W, + __m256i __A, __m256i __B) +{ + return (__m256i) + __builtin_ia32_vpdpwusd_v8si_maskz ((__v8si) __W, + (__v8si) __A, + (__v8si) __B, + (__mmask8) __U); +} + +extern __inline __m256i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_dpwusds_epi32 (__m256i __W, __mmask8 __U, + __m256i __A, __m256i __B) +{ + return (__m256i) + __builtin_ia32_vpdpwusds_v8si_mask ((__v8si) __W, + (__v8si) __A, + (__v8si) __B, + (__mmask8) __U); +} + +extern __inline __m256i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_dpwusds_epi32 (__mmask8 __U, __m256i __W, + __m256i __A, __m256i __B) +{ + return (__m256i) + __builtin_ia32_vpdpwusds_v8si_maskz ((__v8si) __W, + (__v8si) __A, + (__v8si) __B, + (__mmask8) __U); +} + +extern __inline __m256i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_dpwuud_epi32 (__m256i __W, __mmask8 __U, + __m256i __A, __m256i __B) +{ + return (__m256i) + __builtin_ia32_vpdpwuud_v8si_mask ((__v8si) __W, + (__v8si) __A, + (__v8si) __B, + (__mmask8) __U); +} + +extern __inline __m256i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_dpwuud_epi32 (__mmask8 __U, __m256i __W, + __m256i __A, __m256i __B) +{ + return (__m256i) + __builtin_ia32_vpdpwuud_v8si_maskz ((__v8si) __W, + (__v8si) __A, + (__v8si) __B, + (__mmask8) __U); +} + +extern __inline __m256i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_dpwuuds_epi32 (__m256i __W, __mmask8 __U, + __m256i __A, __m256i __B) +{ + return (__m256i) + __builtin_ia32_vpdpwuuds_v8si_mask ((__v8si) __W, + (__v8si) __A, + (__v8si) __B, + (__mmask8) __U); +} + +extern __inline __m256i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_dpwuuds_epi32 (__mmask8 __U, __m256i __W, + __m256i __A, __m256i __B) +{ + return (__m256i) + __builtin_ia32_vpdpwuuds_v8si_maskz ((__v8si) __W, + (__v8si) __A, + (__v8si) __B, + (__mmask8) __U); +} + +extern __inline __m256 +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_dpph_ps (__m256 __W, __m256h __A, __m256h __B) +{ + return (__m256) + __builtin_ia32_vdpphps256_mask ((__v8sf) __W, + (__v8sf) __A, + (__v8sf) __B, + (__mmask8) 
-1); +} + +extern __inline __m256 +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_dpph_ps (__m256 __W, __mmask8 __U, __m256h __A, + __m256h __B) +{ + return (__m256) + __builtin_ia32_vdpphps256_mask ((__v8sf) __W, + (__v8sf) __A, + (__v8sf) __B, + (__mmask8) __U); +} + +extern __inline __m256 +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_dpph_ps (__mmask8 __U, __m256 __W, __m256h __A, + __m256h __B) +{ + return (__m256) + __builtin_ia32_vdpphps256_maskz ((__v8sf) __W, + (__v8sf) __A, + (__v8sf) __B, + (__mmask8) __U); +} + +extern __inline __m128 +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_dpph_ps (__m128 __W, __m128h __A, __m128h __B) +{ + return (__m128) + __builtin_ia32_vdpphps128_mask ((__v4sf) __W, + (__v4sf) __A, + (__v4sf) __B, + (__mmask8) -1); +} + +extern __inline __m128 +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_dpph_ps (__m128 __W, __mmask8 __U, __m128h __A, + __m128h __B) +{ + return (__m128) + __builtin_ia32_vdpphps128_mask ((__v4sf) __W, + (__v4sf) __A, + (__v4sf) __B, + (__mmask8) __U); +} + +extern __inline __m128 +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_dpph_ps (__mmask8 __U, __m128 __W, __m128h __A, + __m128h __B) +{ + return (__m128) + __builtin_ia32_vdpphps128_maskz ((__v4sf) __W, + (__v4sf) __A, + (__v4sf) __B, + (__mmask8) __U); +} + +#ifdef __OPTIMIZE__ +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_mpsadbw_epu8 (__m128i __W, __mmask8 __U, __m128i __X, + __m128i __Y, const int __M) +{ + return (__m128i) __builtin_ia32_mpsadbw128_mask ((__v16qi) __X, + (__v16qi) __Y, + __M, + (__v8hi) __W, + __U); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_mpsadbw_epu8 (__mmask8 __U, __m128i __X, + __m128i __Y, const int __M) +{ + return (__m128i) __builtin_ia32_mpsadbw128_mask ((__v16qi) __X, + (__v16qi) __Y, + __M, + (__v8hi) _mm_setzero_si128 (), + __U); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_mpsadbw_epu8 (__m256i __W, __mmask16 __U, __m256i __X, + __m256i __Y, const int __M) +{ + return (__m256i) __builtin_ia32_mpsadbw256_mask ((__v32qi) __X, + (__v32qi) __Y, + __M, + (__v16hi) __W, + __U); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_mpsadbw_epu8 (__mmask16 __U, __m256i __X, + __m256i __Y, const int __M) +{ + return (__m256i) __builtin_ia32_mpsadbw256_mask ((__v32qi) __X, + (__v32qi) __Y, + __M, + (__v16hi) _mm256_setzero_si256 (), + __U); +} +#else +#define _mm_mask_mpsadbw_epu8(W, U, X, Y, M) \ + (__m128i) __builtin_ia32_mpsadbw128_mask ((__v16qi)(__m128i)(X), \ + (__v16qi)(__m128i)(Y), \ + (int)(M), \ + (__v8hi)(__m128i)(W), \ + (__mmask8)(U)) + +#define _mm_maskz_mpsadbw_epu8(U, X, Y, M) \ + (__m128i) __builtin_ia32_mpsadbw128_mask ((__v16qi)(__m128i)(X), \ + (__v16qi)(__m128i)(Y), \ + (int)(M), \ + (__v8hi) _mm_setzero_si128 (), \ + (__mmask8)(U)) + +#define _mm256_mask_mpsadbw_epu8(W, U, X, Y, M) \ + (__m256i) __builtin_ia32_mpsadbw256_mask ((__v32qi)(__m256i)(X), \ + (__v32qi)(__m256i)(Y), \ + (int)(M), \ + (__v16hi)(__m256i)(W), \ + (__mmask16)(U)) + +#define _mm256_maskz_mpsadbw_epu8(U, X, Y, M) \ + (__m256i) __builtin_ia32_mpsadbw256_mask ((__v32qi)(__m256i)(X), \ + (__v32qi)(__m256i)(Y), \ + (int)(M), \ + (__v16hi) _mm256_setzero_si256 (), 
\ + (__mmask16)(U)) + +#endif #ifdef __DISABLE_AVX10_2_256__ #undef __DISABLE_AVX10_2_256__ diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def index 5bd9aabdc52..cdf28cd261c 100644 --- a/gcc/config/i386/i386-builtin.def +++ b/gcc/config/i386/i386-builtin.def @@ -2762,18 +2762,18 @@ BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT8 | OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_ BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT8 | OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpbuuds_v4si, "__builtin_ia32_vpdpbuuds128", IX86_BUILTIN_VPDPBUUDSV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI) /* AVXVNNIINT16 */ -BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT16, CODE_FOR_vpdpwusd_v8si, "__builtin_ia32_vpdpwusd256", IX86_BUILTIN_VPDPWUSDV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI) -BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT16, CODE_FOR_vpdpwusds_v8si, "__builtin_ia32_vpdpwusds256", IX86_BUILTIN_VPDPWUSDSV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI) -BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT16, CODE_FOR_vpdpwsud_v8si, "__builtin_ia32_vpdpwsud256", IX86_BUILTIN_VPDPWSUDV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI) -BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT16, CODE_FOR_vpdpwsuds_v8si, "__builtin_ia32_vpdpwsuds256", IX86_BUILTIN_VPDPWSUDSV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI) -BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT16, CODE_FOR_vpdpwuud_v8si, "__builtin_ia32_vpdpwuud256", IX86_BUILTIN_VPDPWUUDV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI) -BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT16, CODE_FOR_vpdpwuuds_v8si, "__builtin_ia32_vpdpwuuds256", IX86_BUILTIN_VPDPWUUDSV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI) -BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT16, CODE_FOR_vpdpwusd_v4si, "__builtin_ia32_vpdpwusd128", IX86_BUILTIN_VPDPWUSDV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI) -BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT16, CODE_FOR_vpdpwusds_v4si, "__builtin_ia32_vpdpwusds128", IX86_BUILTIN_VPDPWUSDSV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI) -BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT16, CODE_FOR_vpdpwsud_v4si, "__builtin_ia32_vpdpwsud128", IX86_BUILTIN_VPDPWSUDV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI) -BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT16, CODE_FOR_vpdpwsuds_v4si, "__builtin_ia32_vpdpwsuds128", IX86_BUILTIN_VPDPWSUDSV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI) -BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT16, CODE_FOR_vpdpwuud_v4si, "__builtin_ia32_vpdpwuud128", IX86_BUILTIN_VPDPWUUDV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI) -BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT16, CODE_FOR_vpdpwuuds_v4si, "__builtin_ia32_vpdpwuuds128", IX86_BUILTIN_VPDPWUUDSV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI) +BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT16 | OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwusd_v8si, "__builtin_ia32_vpdpwusd256", IX86_BUILTIN_VPDPWUSDV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI) +BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT16 | OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwusds_v8si, "__builtin_ia32_vpdpwusds256", IX86_BUILTIN_VPDPWUSDSV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI) +BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT16 | OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwsud_v8si, "__builtin_ia32_vpdpwsud256", IX86_BUILTIN_VPDPWSUDV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI) +BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT16 | OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwsuds_v8si, "__builtin_ia32_vpdpwsuds256", IX86_BUILTIN_VPDPWSUDSV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI) +BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT16 | OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwuud_v8si, "__builtin_ia32_vpdpwuud256", IX86_BUILTIN_VPDPWUUDV8SI, 
UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI) +BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT16 | OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwuuds_v8si, "__builtin_ia32_vpdpwuuds256", IX86_BUILTIN_VPDPWUUDSV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI) +BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT16 | OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwusd_v4si, "__builtin_ia32_vpdpwusd128", IX86_BUILTIN_VPDPWUSDV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI) +BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT16 | OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwusds_v4si, "__builtin_ia32_vpdpwusds128", IX86_BUILTIN_VPDPWUSDSV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI) +BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT16 | OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwsud_v4si, "__builtin_ia32_vpdpwsud128", IX86_BUILTIN_VPDPWSUDV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI) +BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT16 | OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwsuds_v4si, "__builtin_ia32_vpdpwsuds128", IX86_BUILTIN_VPDPWSUDSV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI) +BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT16 | OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwuud_v4si, "__builtin_ia32_vpdpwuud128", IX86_BUILTIN_VPDPWUUDV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI) +BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT16 | OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwuuds_v4si, "__builtin_ia32_vpdpwuuds128", IX86_BUILTIN_VPDPWUUDSV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI) /* VPCLMULQDQ */ BDESC (OPTION_MASK_ISA_VPCLMULQDQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpclmulqdq_v2di, "__builtin_ia32_vpclmulqdq_v2di", IX86_BUILTIN_VPCLMULQDQ2, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_INT) @@ -3063,6 +3063,58 @@ BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpbuud_v4si_mask, "__builtin_ BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpbuud_v4si_maskz, "__builtin_ia32_vpdpbuud_v4si_maskz", IX86_BUILTIN_VPDPBUUDV4SI_MASKZ, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI) BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpbuuds_v4si_mask, "__builtin_ia32_vpdpbuuds_v4si_mask", IX86_BUILTIN_VPDPBUUDSV4SI_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI) BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpbuuds_v4si_maskz, "__builtin_ia32_vpdpbuuds_v4si_maskz", IX86_BUILTIN_VPDPBUUDSV4SI_MASKZ, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpwsud_v16si, "__builtin_ia32_vpdpwsud512", IX86_BUILTIN_VPDPWSUDV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpwsuds_v16si, "__builtin_ia32_vpdpwsuds512", IX86_BUILTIN_VPDPWSUDSV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpwusd_v16si, "__builtin_ia32_vpdpwusd512", IX86_BUILTIN_VPDPWUSDV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpwusds_v16si, "__builtin_ia32_vpdpwusds512", IX86_BUILTIN_VPDPWUSDSV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpwuud_v16si, "__builtin_ia32_vpdpwuud512", IX86_BUILTIN_VPDPWUUDV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpwuuds_v16si, "__builtin_ia32_vpdpwuuds512", IX86_BUILTIN_VPDPWUUDSV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpwsud_v16si_mask, "__builtin_ia32_vpdpwsud_v16si_mask", IX86_BUILTIN_VPDPWSUDV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) +BDESC (0, 
OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpwsud_v16si_maskz, "__builtin_ia32_vpdpwsud_v16si_maskz", IX86_BUILTIN_VPDPWSUDV16SI_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpwsuds_v16si_mask, "__builtin_ia32_vpdpwsuds_v16si_mask", IX86_BUILTIN_VPDPWSUDSV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpwsuds_v16si_maskz, "__builtin_ia32_vpdpwsuds_v16si_maskz", IX86_BUILTIN_VPDPWSUDSV16SI_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpwusd_v16si_mask, "__builtin_ia32_vpdpwusd_v16si_mask", IX86_BUILTIN_VPDPWUSDV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpwusd_v16si_maskz, "__builtin_ia32_vpdpwusd_v16si_maskz", IX86_BUILTIN_VPDPWUSDV16SI_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpwusds_v16si_mask, "__builtin_ia32_vpdpwusds_v16si_mask", IX86_BUILTIN_VPDPWUSDSV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpwusds_v16si_maskz, "__builtin_ia32_vpdpwusds_v16si_maskz", IX86_BUILTIN_VPDPWUSDSV16SI_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpwuud_v16si_mask, "__builtin_ia32_vpdpwuud_v16si_mask", IX86_BUILTIN_VPDPWUUDV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpwuud_v16si_maskz, "__builtin_ia32_vpdpwuud_v16si_maskz", IX86_BUILTIN_VPDPWUUDV16SI_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpwuuds_v16si_mask, "__builtin_ia32_vpdpwuuds_v16si_mask", IX86_BUILTIN_VPDPWUUDSV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpwuuds_v16si_maskz, "__builtin_ia32_vpdpwuuds_v16si_maskz", IX86_BUILTIN_VPDPWUUDSV16SI_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwsud_v8si_mask, "__builtin_ia32_vpdpwsud_v8si_mask", IX86_BUILTIN_VPDPWSUDV8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwsud_v8si_maskz, "__builtin_ia32_vpdpwsud_v8si_maskz", IX86_BUILTIN_VPDPWSUDV8SI_MASKZ, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwsuds_v8si_mask, "__builtin_ia32_vpdpwsuds_v8si_mask", IX86_BUILTIN_VPDPWSUDSV8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwsuds_v8si_maskz, "__builtin_ia32_vpdpwsuds_v8si_maskz", IX86_BUILTIN_VPDPWSUDSV8SI_MASKZ, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwusd_v8si_mask, "__builtin_ia32_vpdpwusd_v8si_mask", IX86_BUILTIN_VPDPWUSDV8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwusd_v8si_maskz, "__builtin_ia32_vpdpwusd_v8si_maskz", IX86_BUILTIN_VPDPWUSDV8SI_MASKZ, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwusds_v8si_mask, "__builtin_ia32_vpdpwusds_v8si_mask", IX86_BUILTIN_VPDPWUSDSV8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwusds_v8si_maskz, 
"__builtin_ia32_vpdpwusds_v8si_maskz", IX86_BUILTIN_VPDPWUSDSV8SI_MASKZ, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwuud_v8si_mask, "__builtin_ia32_vpdpwuud_v8si_mask", IX86_BUILTIN_VPDPWUUDV8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwuud_v8si_maskz, "__builtin_ia32_vpdpwuud_v8si_maskz", IX86_BUILTIN_VPDPWUUDV8SI_MASKZ, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwuuds_v8si_mask, "__builtin_ia32_vpdpwuuds_v8si_mask", IX86_BUILTIN_VPDPWUUDSV8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwuuds_v8si_maskz, "__builtin_ia32_vpdpwuuds_v8si_maskz", IX86_BUILTIN_VPDPWUUDSV8SI_MASKZ, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwsud_v4si_mask, "__builtin_ia32_vpdpwsud_v4si_mask", IX86_BUILTIN_VPDPWSUDV4SI_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwsud_v4si_maskz, "__builtin_ia32_vpdpwsud_v4si_maskz", IX86_BUILTIN_VPDPWSUDV4SI_MASKZ, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwsuds_v4si_mask, "__builtin_ia32_vpdpwsuds_v4si_mask", IX86_BUILTIN_VPDPWSUDSV4SI_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwsuds_v4si_maskz, "__builtin_ia32_vpdpwsuds_v4si_maskz", IX86_BUILTIN_VPDPWSUDSV4SI_MASKZ, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwusd_v4si_mask, "__builtin_ia32_vpdpwusd_v4si_mask", IX86_BUILTIN_VPDPWUSDV4SI_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwusd_v4si_maskz, "__builtin_ia32_vpdpwusd_v4si_maskz", IX86_BUILTIN_VPDPWUSDV4SI_MASKZ, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwusds_v4si_mask, "__builtin_ia32_vpdpwusds_v4si_mask", IX86_BUILTIN_VPDPWUSDSV4SI_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwusds_v4si_maskz, "__builtin_ia32_vpdpwusds_v4si_maskz", IX86_BUILTIN_VPDPWUSDSV4SI_MASKZ, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwuud_v4si_mask, "__builtin_ia32_vpdpwuud_v4si_mask", IX86_BUILTIN_VPDPWUUDV4SI_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwuud_v4si_maskz, "__builtin_ia32_vpdpwuud_v4si_maskz", IX86_BUILTIN_VPDPWUUDV4SI_MASKZ, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwuuds_v4si_mask, "__builtin_ia32_vpdpwuuds_v4si_mask", IX86_BUILTIN_VPDPWUUDSV4SI_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwuuds_v4si_maskz, "__builtin_ia32_vpdpwuuds_v4si_maskz", IX86_BUILTIN_VPDPWUUDSV4SI_MASKZ, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vdpphps_v16sf_mask, "__builtin_ia32_vdpphps512_mask", IX86_BUILTIN_VDPPHPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vdpphps_v16sf_maskz, "__builtin_ia32_vdpphps512_maskz", IX86_BUILTIN_VDPPHPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, 
CODE_FOR_vdpphps_v8sf_mask, "__builtin_ia32_vdpphps256_mask", IX86_BUILTIN_VDPPHPS256_MASK, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF_V8SF_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vdpphps_v8sf_maskz, "__builtin_ia32_vdpphps256_maskz", IX86_BUILTIN_VDPPHPS256_MASKZ, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF_V8SF_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vdpphps_v4sf_mask, "__builtin_ia32_vdpphps128_mask", IX86_BUILTIN_VDPPHPS128_MASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vdpphps_v4sf_maskz, "__builtin_ia32_vdpphps128_maskz", IX86_BUILTIN_VDPPHPS128_MASKZ, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_mpsadbw, "__builtin_ia32_mpsadbw512", IX86_BUILTIN_AVX10_2_MPSADBW, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_mpsadbw_mask, "__builtin_ia32_mpsadbw512_mask", IX86_BUILTIN_VMPSADBW_V32HI_MASK, UNKNOWN, (int) V32HI_FTYPE_V64QI_V64QI_INT_V32HI_USI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx2_mpsadbw_mask, "__builtin_ia32_mpsadbw256_mask", IX86_BUILTIN_VMPSADBW_V16HI_MASK, UNKNOWN, (int) V16HI_FTYPE_V32QI_V32QI_INT_V16HI_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_sse4_1_mpsadbw_mask, "__builtin_ia32_mpsadbw128_mask", IX86_BUILTIN_VMPSADBW_V8HI_MASK, UNKNOWN, (int) V8HI_FTYPE_V16QI_V16QI_INT_V8HI_UQI) /* Builtins with rounding support. */ BDESC_END (ARGS, ROUND_ARGS) diff --git a/gcc/config/i386/i386-builtins.cc b/gcc/config/i386/i386-builtins.cc index 130ba853125..4286eeb80e6 100644 --- a/gcc/config/i386/i386-builtins.cc +++ b/gcc/config/i386/i386-builtins.cc @@ -280,17 +280,18 @@ def_builtin (HOST_WIDE_INT mask, HOST_WIDE_INT mask2, if (((mask2 == 0 || (mask2 & ix86_isa_flags2) != 0) && (mask == 0 || (mask & ix86_isa_flags) != 0)) || ((mask & OPTION_MASK_ISA_MMX) != 0 && TARGET_MMX_WITH_SSE) - /* "Unified" builtin used by either AVXVNNI/AVXIFMA/AES/AVXVNNIINT8 - intrinsics or AVX512VNNIVL/AVX512IFMAVL/VAESVL/AVX10.2 non-mask - intrinsics should be defined whenever avxvnni/avxifma/aes/ - avxvnniint8 or avx512vnni && avx512vl/avx512ifma && avx512vl/vaes - && avx512vl/avx10.2 exist. */ + /* "Unified" builtin used by either AVXVNNI/AVXIFMA/AES/ + AVXVNNIINT{8,16} intrinsics or AVX512VNNIVL/AVX512IFMAVL/VAESVL/ + AVX10.2 non-mask intrinsics should be defined whenever avxvnni/ + avxifma/aes/avxvnniint{8,16} or avx512vnni && avx512vl/avx512ifma + && avx512vl/vaes && avx512vl/avx10.2 exist. */ || (mask2 == OPTION_MASK_ISA2_AVXVNNI) || (mask2 == OPTION_MASK_ISA2_AVXIFMA) || (mask2 == (OPTION_MASK_ISA2_AVXNECONVERT | OPTION_MASK_ISA2_AVX512BF16)) || ((mask2 & OPTION_MASK_ISA2_VAES) != 0) || ((mask2 & OPTION_MASK_ISA2_AVXVNNIINT8) != 0) + || ((mask2 & OPTION_MASK_ISA2_AVXVNNIINT16) != 0) || (lang_hooks.builtin_function == lang_hooks.builtin_function_ext_scope)) { diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc index 200b768f5d9..f1e6bc11f86 100644 --- a/gcc/config/i386/i386-expand.cc +++ b/gcc/config/i386/i386-expand.cc @@ -13299,6 +13299,7 @@ ix86_check_builtin_isa_match (unsigned int fcode, OPTION_MASK_ISA2_AVXNECONVERT OPTION_MASK_ISA_AES or (OPTION_MASK_ISA_AVX512VL | OPTION_MASK_ISA2_VAES) OPTION_MASK_ISA2_AVX10_2 or OPTION_MASK_ISA2_AVXVNNIINT8 + OPTION_MASK_ISA2_AVX10_2 or OPTION_MASK_ISA2_AVXVNNIINT16 where for each such pair it is sufficient if either of the ISAs is enabled, plus if it is ored with other options also those others. 
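(For example, __builtin_ia32_vpdpwsud256 remains usable when only one of -mavxvnniint16 or -mavx10.2-256 is enabled; the new avx10_2-builtin-2.c test below compiles it with -mavx10.2 -mno-avxvnniint16.)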
OPTION_MASK_ISA_MMX in bisa is satisfied also if TARGET_MMX_WITH_SSE. */ @@ -13326,6 +13327,8 @@ ix86_check_builtin_isa_match (unsigned int fcode, OPTION_MASK_ISA2_VAES); SHARE_BUILTIN (0, OPTION_MASK_ISA2_AVXVNNIINT8, 0, OPTION_MASK_ISA2_AVX10_2_256); + SHARE_BUILTIN (0, OPTION_MASK_ISA2_AVXVNNIINT16, 0, + OPTION_MASK_ISA2_AVX10_2_256); isa = tmp_isa; isa2 = tmp_isa2; diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 41d448f57cb..6f76e8f50ad 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -214,6 +214,8 @@ UNSPEC_SM4KEY4 UNSPEC_SM4RNDS4 + ;; For AVX10.2 support + UNSPEC_VDPPHPS ]) (define_c_enum "unspecv" [ @@ -465,6 +467,9 @@ (define_mode_iterator VF1_AVX512VL [(V16SF "TARGET_EVEX512") (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")]) +(define_mode_iterator VF1_AVX10_2 + [(V16SF "TARGET_AVX10_2_512") V8SF V4SF]) + (define_mode_iterator VHFBF [(V32HF "TARGET_EVEX512") V16HF V8HF (V32BF "TARGET_EVEX512") V16BF V8BF]) @@ -23555,6 +23560,31 @@ (set_attr "znver1_decode" "vector,vector,vector") (set_attr "mode" "")]) +(define_insn "avx10_2_mpsadbw" + [(set (match_operand:V64QI 0 "register_operand" "=v") + (unspec:V64QI + [(match_operand:V64QI 1 "register_operand" "v") + (match_operand:V64QI 2 "vector_operand" "vm") + (match_operand:SI 3 "const_0_to_255_operand" "n")] + UNSPEC_MPSADBW))] + "TARGET_AVX10_2_512" + "vmpsadbw\t{%3, %2, %1, %0|%0, %1, %2, %3}" + [(set_attr "length_immediate" "1") + (set_attr "prefix" "evex")]) + +(define_insn "_mpsadbw" + [(set (match_operand:VI1 0 "register_operand" "=v") + (unspec:VI1 + [(match_operand:VI1 1 "register_operand" "v") + (match_operand:VI1 2 "vector_operand" "vm") + (match_operand:SI 3 "const_0_to_255_operand" "n")] + UNSPEC_MPSADBW))] + "TARGET_AVX10_2_256" + "vmpsadbw\t{%3, %2, %1, %0|%0, %1, %2, %3}" + [(set_attr "length_immediate" "1") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + (define_insn "_packusdw" [(set (match_operand:VI2_AVX2_AVX512BW 0 "register_operand" "=Yr,*x,") (unspec:VI2_AVX2_AVX512BW @@ -31438,13 +31468,116 @@ }) (define_insn "vpdp_" - [(set (match_operand:VI4_AVX 0 "register_operand" "=x") + [(set (match_operand:VI4_AVX 0 "register_operand" "=v") (unspec:VI4_AVX [(match_operand:VI4_AVX 1 "register_operand" "0") - (match_operand:VI4_AVX 2 "register_operand" "x") - (match_operand:VI4_AVX 3 "nonimmediate_operand" "xjm")] + (match_operand:VI4_AVX 2 "register_operand" "v") + (match_operand:VI4_AVX 3 "nonimmediate_operand" "vm")] VPDPWPROD))] - "TARGET_AVXVNNIINT16" + "TARGET_AVXVNNIINT16 || TARGET_AVX10_2_256" + "vpdp\t{%3, %2, %0|%0, %2, %3}" + [(set_attr "prefix" "maybe_evex")]) + +(define_insn "vpdp_v16si" + [(set (match_operand:V16SI 0 "register_operand" "=v") + (unspec:V16SI + [(match_operand:V16SI 1 "register_operand" "0") + (match_operand:V16SI 2 "register_operand" "v") + (match_operand:V16SI 3 "nonimmediate_operand" "vm")] + VPDPWPROD))] + "TARGET_AVX10_2_512" "vpdp\t{%3, %2, %0|%0, %2, %3}" - [(set_attr "prefix" "vex") - (set_attr "addr" "gpr16")]) + +(define_insn "vpdp__mask" + [(set (match_operand:VI4_AVX10_2 0 "register_operand" "=v") + (vec_merge:VI4_AVX10_2 + (unspec:VI4_AVX10_2 + [(match_operand:VI4_AVX10_2 1 "register_operand" "0") + (match_operand:VI4_AVX10_2 2 "register_operand" "v") + (match_operand:VI4_AVX10_2 3 "nonimmediate_operand" "vm")] + VPDPWPROD) + (match_dup 1) + (match_operand: 4 "register_operand" "Yk")))] + "TARGET_AVX10_2_256" + "vpdp\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, %3}" + [(set_attr "prefix" "evex")]) + +(define_expand
"vpdp__maskz" + [(set (match_operand:VI4_AVX10_2 0 "register_operand") + (vec_merge:VI4_AVX10_2 + (unspec:VI4_AVX10_2 + [(match_operand:VI4_AVX10_2 1 "register_operand") + (match_operand:VI4_AVX10_2 2 "register_operand") + (match_operand:VI4_AVX10_2 3 "nonimmediate_operand")] + VPDPWPROD) + (match_dup 5) + (match_operand: 4 "register_operand")))] + "TARGET_AVX10_2_256" + "operands[5] = CONST0_RTX (mode);") + +(define_insn "*vpdp__maskz" + [(set (match_operand:VI4_AVX10_2 0 "register_operand" "=v") + (vec_merge:VI4_AVX10_2 + (unspec:VI4_AVX10_2 + [(match_operand:VI4_AVX10_2 1 "register_operand" "0") + (match_operand:VI4_AVX10_2 2 "register_operand" "v") + (match_operand:VI4_AVX10_2 3 "nonimmediate_operand" "vm")] + VPDPWPROD) + (match_operand:VI4_AVX10_2 5 "const0_operand" "C") + (match_operand: 4 "register_operand" "Yk")))] + "TARGET_AVX10_2_256" + "vpdp\t{%3, %2, %0%{%4%}%N5|%0%{%4%}%N5, %2, %3}" + [(set_attr "prefix" "evex")]) + +(define_insn "vdpphps_" + [(set (match_operand:VF1_AVX10_2 0 "register_operand" "=v") + (unspec:VF1_AVX10_2 + [(match_operand:VF1_AVX10_2 1 "register_operand" "0") + (match_operand:VF1_AVX10_2 2 "register_operand" "v") + (match_operand:VF1_AVX10_2 3 "nonimmediate_operand" "vm")] + UNSPEC_VDPPHPS))] + "TARGET_AVX10_2_256" + "vdpphps\t{%3, %2, %0|%0, %2, %3}" + [(set_attr "prefix" "evex")]) + +(define_insn "vdpphps__mask" + [(set (match_operand:VF1_AVX10_2 0 "register_operand" "=v") + (vec_merge:VF1_AVX10_2 + (unspec:VF1_AVX10_2 + [(match_operand:VF1_AVX10_2 1 "register_operand" "0") + (match_operand:VF1_AVX10_2 2 "register_operand" "v") + (match_operand:VF1_AVX10_2 3 "nonimmediate_operand" "vm")] + UNSPEC_VDPPHPS) + (match_dup 1) + (match_operand: 4 "register_operand" "Yk")))] + "TARGET_AVX10_2_256" + "vdpphps\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, %3}" + [(set_attr "prefix" "evex")]) + +(define_expand "vdpphps__maskz" + [(match_operand:VF1_AVX10_2 0 "register_operand") + (match_operand:VF1_AVX10_2 1 "register_operand") + (match_operand:VF1_AVX10_2 2 "register_operand") + (match_operand:VF1_AVX10_2 3 "nonimmediate_operand") + (match_operand: 4 "register_operand")] + "TARGET_AVX10_2_256" +{ + emit_insn (gen_vdpphps__maskz_1 (operands[0], operands[1], + operands[2], operands[3], CONST0_RTX(mode), operands[4])); + DONE; +}) + +(define_insn "vdpphps__maskz_1" + [(set (match_operand:VF1_AVX10_2 0 "register_operand" "=v") + (vec_merge:VF1_AVX10_2 + (unspec:VF1_AVX10_2 + [(match_operand:VF1_AVX10_2 1 "register_operand" "0") + (match_operand:VF1_AVX10_2 2 "register_operand" "v") + (match_operand:VF1_AVX10_2 3 "nonimmediate_operand" "vm")] + UNSPEC_VDPPHPS) + (match_operand:VF1_AVX10_2 4 "const0_operand" "C") + (match_operand: 5 "register_operand" "Yk")))] + "TARGET_AVX10_2_256" + "vdpphps\t{%3, %2, %0%{%5%}%N4|%0%{%5%}%N4, %2, %3}" + [(set_attr "prefix" "evex")]) diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c index f64d0c88264..5fc84234b57 100644 --- a/gcc/testsuite/gcc.target/i386/avx-1.c +++ b/gcc/testsuite/gcc.target/i386/avx-1.c @@ -1002,6 +1002,14 @@ #define __builtin_ia32_subph256_mask_round(A, B, C, D, E) __builtin_ia32_subph256_mask_round(A, B, C, D, 8) #define __builtin_ia32_subps256_mask_round(A, B, C, D, E) __builtin_ia32_subps256_mask_round(A, B, C, D, 8) +/* avx10_2-512mediaintrin.h */ +#define __builtin_ia32_mpsadbw512(A, B, C) __builtin_ia32_mpsadbw512 (A, B, 1) +#define __builtin_ia32_mpsadbw512_mask(A, B, C, D, E) __builtin_ia32_mpsadbw512_mask (A, B, 1, D, E) + +/* avx10_2mediaintrin.h */ +#define 
__builtin_ia32_mpsadbw128_mask(A, B, C, D, E) __builtin_ia32_mpsadbw128_mask (A, B, 1, D, E) +#define __builtin_ia32_mpsadbw256_mask(A, B, C, D, E) __builtin_ia32_mpsadbw256_mask (A, B, 1, D, E) + #include #include #include diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-media-1.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-media-1.c index d4145c41a99..00df32194e5 100644 --- a/gcc/testsuite/gcc.target/i386/avx10_2-512-media-1.c +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-media-1.c @@ -18,11 +18,39 @@ /* { dg-final { scan-assembler-times "vpdpbuuds\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ /* { dg-final { scan-assembler-times "vpdpbuuds\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\\n\\r]*%zmm\[0-9\]+\[^\\n\\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ /* { dg-final { scan-assembler-times "vpdpbuuds\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\\n\\r]*%zmm\[0-9\]+\[^\\n\\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpwsud\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpwsud\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\\n\\r]*%zmm\[0-9\]+\[^\\n\\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpwsud\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\\n\\r]*%zmm\[0-9\]+\[^\\n\\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpwsuds\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpwsuds\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\\n\\r]*%zmm\[0-9\]+\[^\\n\\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpwsuds\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\\n\\r]*%zmm\[0-9\]+\[^\\n\\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpwusd\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpwusd\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\\n\\r]*%zmm\[0-9\]+\[^\\n\\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpwusd\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\\n\\r]*%zmm\[0-9\]+\[^\\n\\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpwusds\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpwusds\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\\n\\r]*%zmm\[0-9\]+\[^\\n\\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpwusds\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\\n\\r]*%zmm\[0-9\]+\[^\\n\\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpwuud\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpwuud\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\\n\\r]*%zmm\[0-9\]+\[^\\n\\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpwuud\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\\n\\r]*%zmm\[0-9\]+\[^\\n\\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpwuuds\[ 
\\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpwuuds\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\\n\\r]*%zmm\[0-9\]+\[^\\n\\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpwuuds\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\\n\\r]*%zmm\[0-9\]+\[^\\n\\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vdpphps\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vdpphps\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\\n\\r]*%zmm\[0-9\]+\[^\\n\\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vdpphps\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\\n\\r]*%zmm\[0-9\]+\[^\\n\\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vmpsadbw\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vmpsadbw\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\\n\\r]*%zmm\[0-9\]+\[^\\n\\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vmpsadbw\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\\n\\r]*%zmm\[0-9\]+\[^\\n\\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ + #include +volatile __m512 a; +volatile __m512h b,c; volatile __m512i x,y,z,z1; volatile __mmask16 m16; +volatile __mmask32 m32; void avx10_2_512_test (void) { @@ -49,4 +77,36 @@ void avx10_2_512_test (void) x = _mm512_dpbuuds_epi32 (x, y, z); x = _mm512_mask_dpbuuds_epi32 (x, m16, y, z); x = _mm512_maskz_dpbuuds_epi32 (m16, x, y, z); + + x = _mm512_dpwsud_epi32 (x, y, z); + x = _mm512_mask_dpwsud_epi32 (x, m16, y, z); + x = _mm512_maskz_dpwsud_epi32 (m16, x, y, z); + + x = _mm512_dpwsuds_epi32 (x, y, z); + x = _mm512_mask_dpwsuds_epi32 (x, m16, y, z); + x = _mm512_maskz_dpwsuds_epi32 (m16, x, y, z); + + x = _mm512_dpwusd_epi32 (x, y, z); + x = _mm512_mask_dpwusd_epi32 (x, m16, y, z); + x = _mm512_maskz_dpwusd_epi32 (m16, x, y, z); + + x = _mm512_dpwusds_epi32 (x, y, z); + x = _mm512_mask_dpwusds_epi32 (x, m16, y, z); + x = _mm512_maskz_dpwusds_epi32 (m16, x, y, z); + + x = _mm512_dpwuud_epi32 (x, y, z); + x = _mm512_mask_dpwuud_epi32 (x, m16, y, z); + x = _mm512_maskz_dpwuud_epi32 (m16, x, y, z); + + x = _mm512_dpwuuds_epi32 (x, y, z); + x = _mm512_mask_dpwuuds_epi32 (x, m16, y, z); + x = _mm512_maskz_dpwuuds_epi32 (m16, x, y, z); + + a = _mm512_dpph_ps (a, b, c); + a = _mm512_mask_dpph_ps (a, m16, b, c); + a = _mm512_maskz_dpph_ps (m16, a, b, c); + + x = _mm512_mpsadbw_epu8 (x, y, 1); + x = _mm512_mask_mpsadbw_epu8 (x, m32, y, z, 1); + x = _mm512_maskz_mpsadbw_epu8 (m32, x, y, 1); } diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vdpphps-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vdpphps-2.c new file mode 100644 index 00000000000..9b73a298fb9 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vdpphps-2.c @@ -0,0 +1,71 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif + +#include "avx10-helper.h" + +#define SRC_SIZE AVX512F_LEN / 16 +#define SIZE AVX512F_LEN / 32 + +static void +CALC (float *dest, _Float16 *src1, _Float16 *src2) +{ + int i; + + for (i = 0; i < SIZE; i++) + { + dest[i] += (float) src1[2 * i + 1] * (float) src2[2 * i + 1]; 
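+      /* Together with the odd-indexed product above, the even-indexed
+	 product below models VDPPHPS: each 32-bit float lane accumulates
+	 both widened FP16 products of its element pair.  */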
+ dest[i] += (float) src1[2 * i] * (float) src2[2 * i]; + } +} + +void +TEST(void) +{ + UNION_TYPE (AVX512F_LEN, h) src1, src2; + UNION_TYPE (AVX512F_LEN,) res1, res2, res3; + MASK_TYPE mask = MASK_VALUE; + float res_ref[SIZE], res_ref2[SIZE], res_ref3[SIZE]; + + for (int i = 0; i < SRC_SIZE; i++) + { + src1.a[i] = (_Float16) (i * 4) + 1.25f16; + src2.a[i] = (_Float16) (i * 2) + 2.5f16; + } + + for (int i = 0; i < SIZE; i++) + { + res1.a[i] = 3.125f + 2 * i; + res_ref[i] = 3.125f + 2 * i; + res2.a[i] = DEFAULT_VALUE; + res3.a[i] = DEFAULT_VALUE; + res_ref2[i] = DEFAULT_VALUE; + res_ref3[i] = DEFAULT_VALUE; + } + + res1.x = INTRINSIC (_dpph_ps) (res1.x, src1.x, src2.x); + res2.x = INTRINSIC (_mask_dpph_ps) (res2.x, mask, src1.x, src2.x); + res3.x = INTRINSIC (_maskz_dpph_ps) (mask, res3.x, src1.x, src2.x); + + CALC(res_ref, src1.a, src2.a); + CALC(res_ref2, src1.a, src2.a); + CALC(res_ref3, src1.a, src2.a); + + if (UNION_CHECK(AVX512F_LEN,) (res1, res_ref)) + abort (); + + MASK_MERGE () (res_ref2, mask, SIZE); + if (UNION_CHECK(AVX512F_LEN,) (res2, res_ref2)) + abort (); + + MASK_ZERO () (res_ref3, mask, SIZE); + if (UNION_CHECK(AVX512F_LEN,) (res3, res_ref3)) + abort (); +} + diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vmpsadbw-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vmpsadbw-2.c new file mode 100644 index 00000000000..3cedab490fa --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vmpsadbw-2.c @@ -0,0 +1,93 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif +#include "avx10-helper.h" + +#define SIZE (AVX512F_LEN / 8) +#define SIZE_RES (AVX512F_LEN / 16) + + +static void +CALC (short* dst, char* src1, char* src2, int cont) +{ + int blk2_pos, blk1_pos, i, j, k, c; + char blk1[12], blk2[4], x; + short tmp[4], s; + + for (k = 0; k < AVX512F_LEN / 128; k++) + { + c = cont & 0xff; + if (k % 2 == 1) + c >>= 3; + blk2_pos = (c & 3) * 4; + blk1_pos = ((c >> 2) & 1) * 4; + + for (i = 0; i < 11; i++) + blk1[i] = src1[16 * k + i + blk1_pos]; + + for (i = 0; i < 4; i++) + blk2[i] = src2[16 * k + i + blk2_pos]; + + for (i = 0; i < 8; i++) + { + for (j = 0; j < 4; j++) + { + x = blk1[j + i] - blk2[j]; + tmp[j] = x > 0 ? 
x : -x; + } + + s = 0; + for (j = 0; j < 4; j++) + s += tmp[j]; + dst[8 * k + i] = s; + } + } +} + +void +TEST (void) +{ + int i; + UNION_TYPE (AVX512F_LEN, i_w) res1, res2, res3; + UNION_TYPE (AVX512F_LEN, i_b) src1; + UNION_TYPE (AVX512F_LEN, i_b) src2; + MASK_TYPE mask = MASK_VALUE; + short res_ref[SIZE_RES], res_ref2[SIZE_RES]; + + for (i = 0; i < SIZE; i++) + { + src1.a[i] = 10 + 2 * i; + src2.a[i] = 3 * i; + } + + for (i = 0; i < SIZE_RES; i++) + { + res1.a[i] = 0x7FFF; + res2.a[i] = DEFAULT_VALUE; + res3.a[i] = DEFAULT_VALUE; + } + + CALC (res_ref, src1.a, src2.a, 0x21); + CALC (res_ref2, src1.a, src2.a, 0x21); + + res1.x = INTRINSIC (_mpsadbw_epu8) (src1.x, src2.x, 0x21); + res2.x = INTRINSIC (_mask_mpsadbw_epu8) (res2.x, mask, src1.x, src2.x, 0x21); + res3.x = INTRINSIC (_maskz_mpsadbw_epu8) (mask, src1.x, src2.x, 0x21); + + if (UNION_CHECK (AVX512F_LEN, i_w) (res1, res_ref)) + abort (); + + MASK_MERGE (i_w) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, i_w) (res2, res_ref2)) + abort (); + + MASK_ZERO (i_w) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, i_w) (res3, res_ref2)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpwsud-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpwsud-2.c new file mode 100644 index 00000000000..1643f6f0803 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpwsud-2.c @@ -0,0 +1,71 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif + +#include "avx10-helper.h" + +#define SIZE (AVX512F_LEN / 16) +#define SIZE_RES (AVX512F_LEN / 32) + + +static void +CALC (int *r, int *dst, short *s1, unsigned short *s2) +{ + int tempres[SIZE]; + for (int i = 0; i < SIZE; i++) + tempres[i] = (int) s1[i] * (unsigned int) s2[i]; + for (int i = 0; i < SIZE_RES; i++) + { + long long test = (long long) dst[i] + tempres[i * 2] + tempres[i * 2 + 1]; + r[i] = test; + } +} + +void +TEST (void) +{ + int i; + UNION_TYPE (AVX512F_LEN, i_d) res1, res2, res3; + UNION_TYPE (AVX512F_LEN, i_w) src1; + UNION_TYPE (AVX512F_LEN, i_uw) src2; + MASK_TYPE mask = MASK_VALUE; + int res_ref[SIZE_RES], res_ref2[SIZE_RES]; + + for (i = 0; i < SIZE; i++) + { + int sign = i % 2 ? 
1 : -1; + src1.a[i] = sign * (10 + 3 * i * i); + src2.a[i] = sign * 10 * i * i; + } + + for (i = 0; i < SIZE_RES; i++) + { + res1.a[i] = 0x7FFFFFFF; + res2.a[i] = DEFAULT_VALUE; + res3.a[i] = DEFAULT_VALUE; + } + + CALC (res_ref, res1.a, src1.a, src2.a); + CALC (res_ref2, res2.a, src1.a, src2.a); + + res1.x = INTRINSIC (_dpwsud_epi32) (res1.x, src1.x, src2.x); + res2.x = INTRINSIC (_mask_dpwsud_epi32) (res2.x, mask, src1.x, src2.x); + res3.x = INTRINSIC (_maskz_dpwsud_epi32) (mask, res3.x, src1.x, src2.x); + + if (UNION_CHECK (AVX512F_LEN, i_d) (res1, res_ref)) + abort (); + + MASK_MERGE (i_d) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, i_d) (res2, res_ref2)) + abort (); + + MASK_ZERO (i_d) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, i_d) (res3, res_ref2)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpwsuds-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpwsuds-2.c new file mode 100644 index 00000000000..7c959119a2a --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpwsuds-2.c @@ -0,0 +1,74 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif + +#include "avx10-helper.h" + +#define SIZE (AVX512F_LEN / 16) +#define SIZE_RES (AVX512F_LEN / 32) + + +static void +CALC (int *r, int *dst, short *s1, unsigned short *s2) +{ + int tempres[SIZE]; + for (int i = 0; i < SIZE; i++) + tempres[i] = (int) s1[i] * (unsigned int) s2[i]; + for (int i = 0; i < SIZE_RES; i++) + { + long long test = (long long) dst[i] + tempres[i * 2] + tempres[i * 2 + 1]; + long long max_int = 0x7FFFFFFF; + if (test > max_int) + test = max_int; + r[i] = test; + } +} + +void +TEST (void) +{ + int i; + UNION_TYPE (AVX512F_LEN, i_d) res1, res2, res3; + UNION_TYPE (AVX512F_LEN, i_w) src1; + UNION_TYPE (AVX512F_LEN, i_uw) src2; + MASK_TYPE mask = MASK_VALUE; + int res_ref[SIZE_RES], res_ref2[SIZE_RES]; + + for (i = 0; i < SIZE; i++) + { + int sign = i % 2 ? 
1 : -1; + src1.a[i] = sign * (10 + 3 * i * i); + src2.a[i] = sign * 10 * i * i; + } + + for (i = 0; i < SIZE_RES; i++) + { + res1.a[i] = 0x7FFFFFFF; + res2.a[i] = DEFAULT_VALUE; + res3.a[i] = DEFAULT_VALUE; + } + + CALC (res_ref, res1.a, src1.a, src2.a); + CALC (res_ref2, res2.a, src1.a, src2.a); + + res1.x = INTRINSIC (_dpwsuds_epi32) (res1.x, src1.x, src2.x); + res2.x = INTRINSIC (_mask_dpwsuds_epi32) (res2.x, mask, src1.x, src2.x); + res3.x = INTRINSIC (_maskz_dpwsuds_epi32) (mask, res3.x, src1.x, src2.x); + + if (UNION_CHECK (AVX512F_LEN, i_d) (res1, res_ref)) + abort (); + + MASK_MERGE (i_d) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, i_d) (res2, res_ref2)) + abort (); + + MASK_ZERO (i_d) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, i_d) (res3, res_ref2)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpwusd-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpwusd-2.c new file mode 100644 index 00000000000..b780e41bfba --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpwusd-2.c @@ -0,0 +1,71 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif + +#include "avx10-helper.h" + +#define SIZE (AVX512F_LEN / 16) +#define SIZE_RES (AVX512F_LEN / 32) + + +static void +CALC (int *r, int *dst, unsigned short *s1, short *s2) +{ + int tempres[SIZE]; + for (int i = 0; i < SIZE; i++) + tempres[i] = (unsigned int) s1[i] * (int) s2[i]; + for (int i = 0; i < SIZE_RES; i++) + { + long long test = (long long) dst[i] + tempres[i * 2] + tempres[i * 2 + 1]; + r[i] = test; + } +} + +void +TEST (void) +{ + int i; + UNION_TYPE (AVX512F_LEN, i_d) res1, res2, res3; + UNION_TYPE (AVX512F_LEN, i_uw) src1; + UNION_TYPE (AVX512F_LEN, i_w) src2; + MASK_TYPE mask = MASK_VALUE; + int res_ref[SIZE_RES], res_ref2[SIZE_RES]; + + for (i = 0; i < SIZE; i++) + { + int sign = i % 2 ? 
1 : -1; + src1.a[i] = sign * 10 * i * i; + src2.a[i] = 10 + 3 * i * i + sign; + } + + for (i = 0; i < SIZE_RES; i++) + { + res1.a[i] = 0x7FFFFFFF; + res2.a[i] = DEFAULT_VALUE; + res3.a[i] = DEFAULT_VALUE; + } + + CALC (res_ref, res1.a, src1.a, src2.a); + CALC (res_ref2, res2.a, src1.a, src2.a); + + res1.x = INTRINSIC (_dpwusd_epi32) (res1.x, src1.x, src2.x); + res2.x = INTRINSIC (_mask_dpwusd_epi32) (res2.x, mask, src1.x, src2.x); + res3.x = INTRINSIC (_maskz_dpwusd_epi32) (mask, res3.x, src1.x, src2.x); + + if (UNION_CHECK (AVX512F_LEN, i_d) (res1, res_ref)) + abort (); + + MASK_MERGE (i_d) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, i_d) (res2, res_ref2)) + abort (); + + MASK_ZERO (i_d) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, i_d) (res3, res_ref2)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpwusds-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpwusds-2.c new file mode 100644 index 00000000000..922d4b37ab8 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpwusds-2.c @@ -0,0 +1,74 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif + +#include "avx10-helper.h" + +#define SIZE (AVX512F_LEN / 16) +#define SIZE_RES (AVX512F_LEN / 32) + + +static void +CALC (int *r, int *dst, unsigned short *s1, short *s2) +{ + int tempres[SIZE]; + for (int i = 0; i < SIZE; i++) + tempres[i] = (unsigned int) s1[i] * (int) s2[i]; + for (int i = 0; i < SIZE_RES; i++) + { + long long test = (long long) dst[i] + tempres[i * 2] + tempres[i * 2 + 1]; + long long max_int = 0x7FFFFFFF; + if (test > max_int) + test = max_int; + r[i] = test; + } +} + +void +TEST (void) +{ + int i; + UNION_TYPE (AVX512F_LEN, i_d) res1, res2, res3; + UNION_TYPE (AVX512F_LEN, i_uw) src1; + UNION_TYPE (AVX512F_LEN, i_w) src2; + MASK_TYPE mask = MASK_VALUE; + int res_ref[SIZE_RES], res_ref2[SIZE_RES]; + + for (i = 0; i < SIZE; i++) + { + int sign = i % 2 ? 
1 : -1; + src1.a[i] = sign * 10 * i * i; + src2.a[i] = 10 + 3 * i * i + sign; + } + + for (i = 0; i < SIZE_RES; i++) + { + res1.a[i] = 0x7FFFFFFF; + res2.a[i] = DEFAULT_VALUE; + res3.a[i] = DEFAULT_VALUE; + } + + CALC (res_ref, res1.a, src1.a, src2.a); + CALC (res_ref2, res2.a, src1.a, src2.a); + + res1.x = INTRINSIC (_dpwusds_epi32) (res1.x, src1.x, src2.x); + res2.x = INTRINSIC (_mask_dpwusds_epi32) (res2.x, mask, src1.x, src2.x); + res3.x = INTRINSIC (_maskz_dpwusds_epi32) (mask, res3.x, src1.x, src2.x); + + if (UNION_CHECK (AVX512F_LEN, i_d) (res1, res_ref)) + abort (); + + MASK_MERGE (i_d) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, i_d) (res2, res_ref2)) + abort (); + + MASK_ZERO (i_d) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, i_d) (res3, res_ref2)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpwuud-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpwuud-2.c new file mode 100644 index 00000000000..d9f5dba8dff --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpwuud-2.c @@ -0,0 +1,70 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif + +#include "avx10-helper.h" + +#define SIZE (AVX512F_LEN / 16) +#define SIZE_RES (AVX512F_LEN / 32) + + +static void +CALC (int *r, int *dst, unsigned short *s1, unsigned short *s2) +{ + unsigned int tempres[SIZE]; + for (int i = 0; i < SIZE; i++) + tempres[i] = (unsigned int) s1[i] * (unsigned int) s2[i]; + for (int i = 0; i < SIZE_RES; i++) + { + long long test = (long long) dst[i] + tempres[i * 2] + tempres[i * 2 + 1]; + r[i] = test; + } +} + +void +TEST (void) +{ + int i; + UNION_TYPE (AVX512F_LEN, i_d) res1, res2, res3; + UNION_TYPE (AVX512F_LEN, i_uw) src1; + UNION_TYPE (AVX512F_LEN, i_uw) src2; + MASK_TYPE mask = MASK_VALUE; + int res_ref[SIZE_RES], res_ref2[SIZE_RES]; + + for (i = 0; i < SIZE; i++) + { + src1.a[i] = 10 + 3 * i * i; + src2.a[i] = 10 * i * i; + } + + for (i = 0; i < SIZE_RES; i++) + { + res1.a[i] = 0x7FFFFFFF; + res2.a[i] = DEFAULT_VALUE; + res3.a[i] = DEFAULT_VALUE; + } + + CALC (res_ref, res1.a, src1.a, src2.a); + CALC (res_ref2, res2.a, src1.a, src2.a); + + res1.x = INTRINSIC (_dpwuud_epi32) (res1.x, src1.x, src2.x); + res2.x = INTRINSIC (_mask_dpwuud_epi32) (res2.x, mask, src1.x, src2.x); + res3.x = INTRINSIC (_maskz_dpwuud_epi32) (mask, res3.x, src1.x, src2.x); + + if (UNION_CHECK (AVX512F_LEN, i_d) (res1, res_ref)) + abort (); + + MASK_MERGE (i_d) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, i_d) (res2, res_ref2)) + abort (); + + MASK_ZERO (i_d) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, i_d) (res3, res_ref2)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpwuuds-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpwuuds-2.c new file mode 100644 index 00000000000..da3c82bd4cc --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpwuuds-2.c @@ -0,0 +1,73 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif + +#include "avx10-helper.h" + +#define SIZE (AVX512F_LEN / 16) +#define SIZE_RES (AVX512F_LEN / 32) + + +static void +CALC (int *r, int *dst, unsigned short *s1, unsigned short *s2) +{ + unsigned int tempres[SIZE]; + for (int i = 0; i < SIZE; i++) + tempres[i] = (unsigned int) s1[i] 
* (unsigned int) s2[i]; + for (int i = 0; i < SIZE_RES; i++) + { + long long test = (long long) dst[i] + tempres[i * 2] + tempres[i * 2 + 1]; + long long max_uint = 0xFFFFFFFF; + if (test > max_uint) + test = max_uint; + r[i] = test; + } +} + +void +TEST (void) +{ + int i; + UNION_TYPE (AVX512F_LEN, i_d) res1, res2, res3; + UNION_TYPE (AVX512F_LEN, i_uw) src1; + UNION_TYPE (AVX512F_LEN, i_uw) src2; + MASK_TYPE mask = MASK_VALUE; + int res_ref[SIZE_RES], res_ref2[SIZE_RES]; + + for (i = 0; i < SIZE; i++) + { + src1.a[i] = 10 + 3 * i * i; + src2.a[i] = 10 * i * i; + } + + for (i = 0; i < SIZE_RES; i++) + { + res1.a[i] = 0x7FFFFFFF; + res2.a[i] = DEFAULT_VALUE; + res3.a[i] = DEFAULT_VALUE; + } + + CALC (res_ref, res1.a, src1.a, src2.a); + CALC (res_ref2, res2.a, src1.a, src2.a); + + res1.x = INTRINSIC (_dpwuuds_epi32) (res1.x, src1.x, src2.x); + res2.x = INTRINSIC (_mask_dpwuuds_epi32) (res2.x, mask, src1.x, src2.x); + res3.x = INTRINSIC (_maskz_dpwuuds_epi32) (mask, res3.x, src1.x, src2.x); + + if (UNION_CHECK (AVX512F_LEN, i_d) (res1, res_ref)) + abort (); + + MASK_MERGE (i_d) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, i_d) (res2, res_ref2)) + abort (); + + MASK_ZERO (i_d) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, i_d) (res3, res_ref2)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-builtin-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-builtin-2.c new file mode 100644 index 00000000000..521768e92b6 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-builtin-2.c @@ -0,0 +1,8 @@ +/* { dg-do compile } */ +/* { dg-options "-O0 -mavx10.2 -mno-avxvnniint16" } */ +typedef int v8si __attribute__ ((vector_size (32))); +v8si +foo (v8si a, v8si b, v8si c) +{ + return __builtin_ia32_vpdpwsud256 (a, b, c); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-media-1.c b/gcc/testsuite/gcc.target/i386/avx10_2-media-1.c index c2b3e5527d9..1be3605b81c 100644 --- a/gcc/testsuite/gcc.target/i386/avx10_2-media-1.c +++ b/gcc/testsuite/gcc.target/i386/avx10_2-media-1.c @@ -36,11 +36,62 @@ /* { dg-final { scan-assembler-times "vpdpbuuds\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ /* { dg-final { scan-assembler-times "vpdpbuuds\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\\n\\r]*%xmm\[0-9\]+\[^\\n\\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ /* { dg-final { scan-assembler-times "vpdpbuuds\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\\n\\r]*%xmm\[0-9\]+\[^\\n\\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpwsud\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpwsud\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\\n\\r]*%ymm\[0-9\]+\[^\\n\\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpwsud\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\\n\\r]*%ymm\[0-9\]+\[^\\n\\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpwsud\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpwsud\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\\n\\r]*%xmm\[0-9\]+\[^\\n\\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpwsud\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\\n\\r]*%xmm\[0-9\]+\[^\\n\\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times 
"vpdpwsuds\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpwsuds\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\\n\\r]*%ymm\[0-9\]+\[^\\n\\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpwsuds\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\\n\\r]*%ymm\[0-9\]+\[^\\n\\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpwsuds\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpwsuds\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\\n\\r]*%xmm\[0-9\]+\[^\\n\\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpwsuds\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\\n\\r]*%xmm\[0-9\]+\[^\\n\\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpwusd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpwusd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\\n\\r]*%ymm\[0-9\]+\[^\\n\\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpwusd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\\n\\r]*%ymm\[0-9\]+\[^\\n\\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpwusd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpwusd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\\n\\r]*%xmm\[0-9\]+\[^\\n\\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpwusd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\\n\\r]*%xmm\[0-9\]+\[^\\n\\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpwusds\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpwusds\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\\n\\r]*%ymm\[0-9\]+\[^\\n\\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpwusds\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\\n\\r]*%ymm\[0-9\]+\[^\\n\\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpwusds\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpwusds\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\\n\\r]*%xmm\[0-9\]+\[^\\n\\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpwusds\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\\n\\r]*%xmm\[0-9\]+\[^\\n\\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpwuud\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpwuud\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\\n\\r]*%ymm\[0-9\]+\[^\\n\\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpwuud\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\\n\\r]*%ymm\[0-9\]+\[^\\n\\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpwuud\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times 
"vpdpwuud\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\\n\\r]*%xmm\[0-9\]+\[^\\n\\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpwuud\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\\n\\r]*%xmm\[0-9\]+\[^\\n\\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpwuuds\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpwuuds\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\\n\\r]*%ymm\[0-9\]+\[^\\n\\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpwuuds\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\\n\\r]*%ymm\[0-9\]+\[^\\n\\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpwuuds\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpwuuds\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\\n\\r]*%xmm\[0-9\]+\[^\\n\\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpwuuds\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\\n\\r]*%xmm\[0-9\]+\[^\\n\\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vdpphps\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vdpphps\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\\n\\r]*%ymm\[0-9\]+\[^\\n\\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vdpphps\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\\n\\r]*%ymm\[0-9\]+\[^\\n\\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vdpphps\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vdpphps\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\\n\\r]*%xmm\[0-9\]+\[^\\n\\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vdpphps\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\\n\\r]*%xmm\[0-9\]+\[^\\n\\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vmpsadbw\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\\n\\r]*%ymm\[0-9\]+\[^\\n\\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vmpsadbw\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\\n\\r]*%ymm\[0-9\]+\[^\\n\\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vmpsadbw\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\\n\\r]*%xmm\[0-9\]+\[^\\n\\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vmpsadbw\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\\n\\r]*%xmm\[0-9\]+\[^\\n\\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ #include +volatile __m256 a; +volatile __m256h b,c; volatile __m256i x,y,z; +volatile __m128 a_; +volatile __m128h b_,c_; volatile __m128i x_,y_,z_; +volatile __mmask16 m16; volatile __mmask8 m; void extern @@ -93,4 +144,65 @@ avx10_2_test (void) x_ = _mm_dpbuuds_epi32 (x_, y_, z_); x_ = _mm_mask_dpbuuds_epi32 (x_, m, y_, z_); x_ = _mm_maskz_dpbuuds_epi32 (m, x_, y_, z_); + + x = _mm256_dpwsud_epi32 (x, y, z); + x = _mm256_mask_dpwsud_epi32 (x, m, y, z); + x = _mm256_maskz_dpwsud_epi32 (m, x, y, z); + + x_ = _mm_dpwsud_epi32 (x_, y_, z_); + x_ = _mm_mask_dpwsud_epi32 (x_, m, y_, z_); + x_ = _mm_maskz_dpwsud_epi32 (m, x_, y_, z_); + + 
x = _mm256_dpwsuds_epi32 (x, y, z); + x = _mm256_mask_dpwsuds_epi32 (x, m, y, z); + x = _mm256_maskz_dpwsuds_epi32 (m, x, y, z); + + x_ = _mm_dpwsuds_epi32 (x_, y_, z_); + x_ = _mm_mask_dpwsuds_epi32 (x_, m, y_, z_); + x_ = _mm_maskz_dpwsuds_epi32 (m, x_, y_, z_); + + x = _mm256_dpwusd_epi32 (x, y, z); + x = _mm256_mask_dpwusd_epi32 (x, m, y, z); + x = _mm256_maskz_dpwusd_epi32 (m, x, y, z); + + x_ = _mm_dpwusd_epi32 (x_, y_, z_); + x_ = _mm_mask_dpwusd_epi32 (x_, m, y_, z_); + x_ = _mm_maskz_dpwusd_epi32 (m, x_, y_, z_); + + x = _mm256_dpwusds_epi32 (x, y, z); + x = _mm256_mask_dpwusds_epi32 (x, m, y, z); + x = _mm256_maskz_dpwusds_epi32 (m, x, y, z); + + x_ = _mm_dpwusds_epi32 (x_, y_, z_); + x_ = _mm_mask_dpwusds_epi32 (x_, m, y_, z_); + x_ = _mm_maskz_dpwusds_epi32 (m, x_, y_, z_); + + x = _mm256_dpwuud_epi32 (x, y, z); + x = _mm256_mask_dpwuud_epi32 (x, m, y, z); + x = _mm256_maskz_dpwuud_epi32 (m, x, y, z); + + x_ = _mm_dpwuud_epi32 (x_, y_, z_); + x_ = _mm_mask_dpwuud_epi32 (x_, m, y_, z_); + x_ = _mm_maskz_dpwuud_epi32 (m, x_, y_, z_); + + x = _mm256_dpwuuds_epi32 (x, y, z); + x = _mm256_mask_dpwuuds_epi32 (x, m, y, z); + x = _mm256_maskz_dpwuuds_epi32 (m, x, y, z); + + x_ = _mm_dpwuuds_epi32 (x_, y_, z_); + x_ = _mm_mask_dpwuuds_epi32 (x_, m, y_, z_); + x_ = _mm_maskz_dpwuuds_epi32 (m, x_, y_, z_); + + a = _mm256_dpph_ps (a, b, c); + a = _mm256_mask_dpph_ps (a, m, b, c); + a = _mm256_maskz_dpph_ps (m, a, b, c); + + a_ = _mm_dpph_ps (a_, b_, c_); + a_ = _mm_mask_dpph_ps (a_, m, b_, c_); + a_ = _mm_maskz_dpph_ps (m, a_, b_, c_); + + x = _mm256_mask_mpsadbw_epu8 (x, m16, y, z, 1); + x = _mm256_maskz_mpsadbw_epu8 (m16, x, y, 1); + x_ = _mm_mask_mpsadbw_epu8 (x_, m, y_, z_, 1); + x_ = _mm_maskz_mpsadbw_epu8 (m, x_, y_, 1); } diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vdpphps-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vdpphps-2.c new file mode 100644 index 00000000000..26d98b70590 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vdpphps-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vdpphps-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vdpphps-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vmpsadbw-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vmpsadbw-2.c new file mode 100644 index 00000000000..746ea7baacb --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vmpsadbw-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vmpsadbw-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vmpsadbw-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vpdpwsud-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vpdpwsud-2.c new file mode 100644 index 00000000000..e1c7a81b54f --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vpdpwsud-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vpdpwsud-2.c" + +#undef AVX512F_LEN +#undef 
AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vpdpwsud-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vpdpwsuds-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vpdpwsuds-2.c new file mode 100644 index 00000000000..d046fd8747a --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vpdpwsuds-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vpdpwsuds-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vpdpwsuds-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vpdpwusd-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vpdpwusd-2.c new file mode 100644 index 00000000000..5a8af9b8728 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vpdpwusd-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vpdpwusd-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vpdpwusd-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vpdpwusds-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vpdpwusds-2.c new file mode 100644 index 00000000000..88d877f381a --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vpdpwusds-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vpdpwusds-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vpdpwusds-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vpdpwuud-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vpdpwuud-2.c new file mode 100644 index 00000000000..aaefe02d29d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vpdpwuud-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vpdpwuud-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vpdpwuud-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vpdpwuuds-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vpdpwuuds-2.c new file mode 100644 index 00000000000..6a61112e161 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vpdpwuuds-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vpdpwuuds-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vpdpwuuds-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avxvnniint16-1.c b/gcc/testsuite/gcc.target/i386/avxvnniint16-1.c index 6ae57b150fe..5a093c97351 100644 --- a/gcc/testsuite/gcc.target/i386/avxvnniint16-1.c +++ b/gcc/testsuite/gcc.target/i386/avxvnniint16-1.c @@ -1,17 
+1,17 @@ /* { dg-do compile } */ /* { dg-options "-mavxvnniint16 -O2" } */ -/* { dg-final { scan-assembler-times "vpdpwusd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ -/* { dg-final { scan-assembler-times "vpdpwusd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ -/* { dg-final { scan-assembler-times "vpdpwusds\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ -/* { dg-final { scan-assembler-times "vpdpwusds\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ -/* { dg-final { scan-assembler-times "vpdpwsud\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ -/* { dg-final { scan-assembler-times "vpdpwsud\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ -/* { dg-final { scan-assembler-times "vpdpwsuds\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ -/* { dg-final { scan-assembler-times "vpdpwsuds\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ -/* { dg-final { scan-assembler-times "vpdpwuud\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ -/* { dg-final { scan-assembler-times "vpdpwuud\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ -/* { dg-final { scan-assembler-times "vpdpwuuds\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ -/* { dg-final { scan-assembler-times "vpdpwuuds\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpwusd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vpdpwusd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vpdpwusds\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vpdpwusds\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vpdpwsud\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vpdpwsud\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vpdpwsuds\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vpdpwsuds\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vpdpwuud\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vpdpwuud\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vpdpwuuds\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vpdpwuuds\[ 
\\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */ #include @@ -40,4 +40,22 @@ avxvnniint16_test (void) x = _mm256_dpwuuds_avx_epi32 (x, y, z); x_ = _mm_dpwuuds_avx_epi32 (x_, y_, z_); + + x = _mm256_dpwusd_epi32 (x, y, z); + x_ = _mm_dpwusd_epi32 (x_, y_, z_); + + x = _mm256_dpwusds_epi32 (x, y, z); + x_ = _mm_dpwusds_epi32 (x_, y_, z_); + + x = _mm256_dpwsud_epi32 (x, y, z); + x_ = _mm_dpwsud_epi32 (x_, y_, z_); + + x = _mm256_dpwsuds_epi32 (x, y, z); + x_ = _mm_dpwsuds_epi32 (x_, y_, z_); + + x = _mm256_dpwuud_epi32 (x, y, z); + x_ = _mm_dpwuud_epi32 (x_, y_, z_); + + x = _mm256_dpwuuds_epi32 (x, y, z); + x_ = _mm_dpwuuds_epi32 (x_, y_, z_); } diff --git a/gcc/testsuite/gcc.target/i386/avxvnniint16-builtin.c b/gcc/testsuite/gcc.target/i386/avxvnniint16-builtin.c new file mode 100644 index 00000000000..10e9b643920 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avxvnniint16-builtin.c @@ -0,0 +1,8 @@ +/* { dg-do compile } */ +/* { dg-options "-O0 -mavxvnniint16 -mno-avx10.2" } */ +typedef int v8si __attribute__ ((vector_size (32))); +v8si +foo (v8si a, v8si b, v8si c) +{ + return __builtin_ia32_vpdpwsud256 (a, b, c); +} diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c index a5b1775ed2d..6b1c9e545f0 100644 --- a/gcc/testsuite/gcc.target/i386/sse-13.c +++ b/gcc/testsuite/gcc.target/i386/sse-13.c @@ -1010,4 +1010,12 @@ #define __builtin_ia32_subph256_mask_round(A, B, C, D, E) __builtin_ia32_subph256_mask_round(A, B, C, D, 8) #define __builtin_ia32_subps256_mask_round(A, B, C, D, E) __builtin_ia32_subps256_mask_round(A, B, C, D, 8) +/* avx10_2-512mediaintrin.h */ +#define __builtin_ia32_mpsadbw512(A, B, C) __builtin_ia32_mpsadbw512 (A, B, 1) +#define __builtin_ia32_mpsadbw512_mask(A, B, C, D, E) __builtin_ia32_mpsadbw512_mask (A, B, 1, D, E) + +/* avx10_2mediaintrin.h */ +#define __builtin_ia32_mpsadbw128_mask(A, B, C, D, E) __builtin_ia32_mpsadbw128_mask (A, B, 1, D, E) +#define __builtin_ia32_mpsadbw256_mask(A, B, C, D, E) __builtin_ia32_mpsadbw256_mask (A, B, 1, D, E) + #include diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c index 4736b2a5d52..6dfdaa96c76 100644 --- a/gcc/testsuite/gcc.target/i386/sse-14.c +++ b/gcc/testsuite/gcc.target/i386/sse-14.c @@ -1371,3 +1371,14 @@ test_4x (_mm256_mask_fixupimm_round_pd, __m256d, __m256d, __mmask8, __m256d, __m test_4x (_mm256_mask_fixupimm_round_ps, __m256, __m256, __mmask8, __m256, __m256i, 3, 8) test_4x (_mm256_mask_range_round_pd, __m256d, __m256d, __mmask8, __m256d, __m256d, 15, 8) test_4x (_mm256_mask_range_round_ps, __m256, __m256, __mmask8, __m256, __m256, 15, 8) + +/* avx10_2-512mediaintrin.h */ +test_2 (_mm512_mpsadbw_epu8, __m512i, __m512i, __m512i, 1) +test_3 (_mm512_maskz_mpsadbw_epu8, __m512i, __mmask32, __m512i, __m512i, 1) +test_4 (_mm512_mask_mpsadbw_epu8, __m512i, __m512i, __mmask32, __m512i, __m512i, 1) + +/* avx10_2mediaintrin.h */ +test_3 (_mm_maskz_mpsadbw_epu8, __m128i, __mmask8, __m128i, __m128i, 1) +test_3 (_mm256_maskz_mpsadbw_epu8, __m256i, __mmask16, __m256i, __m256i, 1) +test_4 (_mm_mask_mpsadbw_epu8, __m128i, __m128i, __mmask8, __m128i, __m128i, 1) +test_4 (_mm256_mask_mpsadbw_epu8, __m256i, __m256i, __mmask16, __m256i, __m256i, 1) diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c index 5bfccd52630..102b6b878c8 100644 --- a/gcc/testsuite/gcc.target/i386/sse-22.c +++ b/gcc/testsuite/gcc.target/i386/sse-22.c @@ -1410,3 +1410,14 @@ test_4x 
(_mm256_mask_fixupimm_round_pd, __m256d, __m256d, __mmask8, __m256d, __m
 test_4x (_mm256_mask_fixupimm_round_ps, __m256, __m256, __mmask8, __m256, __m256i, 3, 8)
 test_4x (_mm256_mask_range_round_pd, __m256d, __m256d, __mmask8, __m256d, __m256d, 15, 8)
 test_4x (_mm256_mask_range_round_ps, __m256, __m256, __mmask8, __m256, __m256, 15, 8)
+
+/* avx10_2-512mediaintrin.h */
+test_2 (_mm512_mpsadbw_epu8, __m512i, __m512i, __m512i, 1)
+test_3 (_mm512_maskz_mpsadbw_epu8, __m512i, __mmask32, __m512i, __m512i, 1)
+test_4 (_mm512_mask_mpsadbw_epu8, __m512i, __m512i, __mmask32, __m512i, __m512i, 1)
+
+/* avx10_2mediaintrin.h */
+test_3 (_mm_maskz_mpsadbw_epu8, __m128i, __mmask8, __m128i, __m128i, 1)
+test_3 (_mm256_maskz_mpsadbw_epu8, __m256i, __mmask16, __m256i, __m256i, 1)
+test_4 (_mm_mask_mpsadbw_epu8, __m128i, __m128i, __mmask8, __m128i, __m128i, 1)
+test_4 (_mm256_mask_mpsadbw_epu8, __m256i, __m256i, __mmask16, __m256i, __m256i, 1)
diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
index e63c100f452..962b9507283 100644
--- a/gcc/testsuite/gcc.target/i386/sse-23.c
+++ b/gcc/testsuite/gcc.target/i386/sse-23.c
@@ -984,6 +984,14 @@
 #define __builtin_ia32_subph256_mask_round(A, B, C, D, E) __builtin_ia32_subph256_mask_round(A, B, C, D, 8)
 #define __builtin_ia32_subps256_mask_round(A, B, C, D, E) __builtin_ia32_subps256_mask_round(A, B, C, D, 8)
+/* avx10_2-512mediaintrin.h */
+#define __builtin_ia32_mpsadbw512(A, B, C) __builtin_ia32_mpsadbw512 (A, B, 1)
+#define __builtin_ia32_mpsadbw512_mask(A, B, C, D, E) __builtin_ia32_mpsadbw512_mask (A, B, 1, D, E)
+
+/* avx10_2-mediaintrin.h */
+#define __builtin_ia32_mpsadbw128_mask(A, B, C, D, E) __builtin_ia32_mpsadbw128_mask (A, B, 1, D, E)
+#define __builtin_ia32_mpsadbw256_mask(A, B, C, D, E) __builtin_ia32_mpsadbw256_mask (A, B, 1, D, E)
+
 #pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,sha,xsavec,xsaves,clflushopt,clwb,mwaitx,clzero,pku,sgx,rdpid,gfni,vpclmulqdq,pconfig,wbnoinvd,enqcmd,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avxifma,avxvnniint8,avxneconvert,cmpccxadd,amx-fp16,prefetchi,raoint,amx-complex,avxvnniint16,sm3,sha512,sm4,avx10.2-512")
 #include <x86intrin.h>
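[Editorial note, not part of the patch: the -1.c tests above only verify instruction selection; the -2.c run tests compare against scalar reference results. As a compact illustration of the semantics being exercised, here is a plain-C model of one 32-bit lane of vpdpwsud and its saturating form vpdpwsuds, assuming the architecturally documented AVX-VNNI-INT16 behavior (two signed-by-unsigned 16-bit products summed into the dword accumulator, with saturation only in the "s" form). The function names are illustrative.]

    #include <stdint.h>

    /* One dword lane of vpdpwsud: two signed (a) x unsigned (b) 16-bit
       products are added to the 32-bit accumulator; overflow wraps.  */
    static int32_t
    dpwsud_ref (int32_t acc, const int16_t a[2], const uint16_t b[2])
    {
      int64_t sum = (int64_t) acc
                    + (int32_t) a[0] * (int32_t) b[0]
                    + (int32_t) a[1] * (int32_t) b[1];
      return (int32_t) sum;   /* the truncating cast models the wraparound */
    }

    /* vpdpwsuds is identical except that the final sum saturates to the
       signed 32-bit range instead of wrapping.  */
    static int32_t
    dpwsuds_ref (int32_t acc, const int16_t a[2], const uint16_t b[2])
    {
      int64_t sum = (int64_t) acc
                    + (int32_t) a[0] * (int32_t) b[0]
                    + (int32_t) a[1] * (int32_t) b[1];
      return sum > INT32_MAX ? INT32_MAX
             : sum < INT32_MIN ? INT32_MIN : (int32_t) sum;
    }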
From patchwork Mon Aug 19 08:56:48 2024
X-Patchwork-Submitter: Haochen Jiang
X-Patchwork-Id: 1973731
From: Haochen Jiang
To: gcc-patches@gcc.gnu.org
Cc: hongtao.liu@intel.com, zewei.mo@pitt.edu, ubizjak@gmail.com, Levy Hsu , Kong Lingling
Subject: [PATCH 04/12] AVX10.2: Support convert instructions
Date: Mon, 19 Aug 2024 01:56:48 -0700
Message-ID: <20240819085717.193256-5-haochen.jiang@intel.com>
X-Mailer: git-send-email 2.43.5
In-Reply-To: <20240819085717.193256-1-haochen.jiang@intel.com>
References: <20240819085717.193256-1-haochen.jiang@intel.com>
From: Levy Hsu

gcc/ChangeLog:

	* config.gcc: Add avx10_2-512convertintrin.h and
	avx10_2convertintrin.h.
	* config/i386/i386-builtin-types.def: Add new DEF_POINTER_TYPE
	and DEF_FUNCTION_TYPE.
	* config/i386/i386-builtin.def (BDESC): Add new builtins.
	* config/i386/i386-expand.cc (ix86_expand_args_builtin): Handle
	AVX10.2.
	(ix86_expand_round_builtin): Ditto.
	* config/i386/immintrin.h: Include avx10_2-512convertintrin.h,
	avx10_2convertintrin.h.
	* config/i386/sse.md (VHF_AVX10_2): New iterator.
	(avx10_2_cvtne2ps2ph_): New define_insn.
	(vcvt): Ditto.
	(vcvtv8hf): Ditto.
	(*vcvtv8hf): Ditto.
	(vcvtv8hf_mask): Ditto.
	(*vcvtv8hf_mask): Ditto.
	(vcvt): Ditto.
	(vcvthf82ph): Ditto.
	* config/i386/avx10_2-512convertintrin.h: New file.
	* config/i386/avx10_2convertintrin.h: Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx-1.c: Add macros for const.
	* gcc.target/i386/sse-13.c: Ditto.
	* gcc.target/i386/sse-14.c: Ditto.
	* gcc.target/i386/sse-22.c: Ditto.
	* gcc.target/i386/sse-23.c: Ditto.
	* gcc.target/i386/avx10_2-512-convert-1.c: New test.
	* gcc.target/i386/avx10_2-512-vcvt2ps2ph-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vcvtbiasph2bf8-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vcvtbiasph2bf8s-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vcvtbiasph2hf8-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vcvtbiasph2hf8s-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vcvthf82ph-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vcvtne2ph2bf8-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vcvtne2ph2bf8s-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vcvtne2ph2hf8-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vcvtne2ph2hf8s-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vcvtneph2bf8-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vcvtneph2bf8s-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vcvtneph2hf8-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vcvtneph2hf8s-2.c: Ditto.
	* gcc.target/i386/avx10_2-convert-1.c: Ditto.
	* gcc.target/i386/avx10_2-vcvt2ps2ph-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcvtbiasph2bf8-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcvtbiasph2bf8s-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcvtbiasph2hf8-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcvtbiasph2hf8s-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcvthf82ph-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcvtne2ph2bf8-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcvtne2ph2bf8s-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcvtne2ph2hf8-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcvtne2ph2hf8s-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcvtneph2bf8-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcvtneph2bf8s-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcvtneph2hf8-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcvtneph2hf8s-2.c: Ditto.
	* gcc.target/i386/fp8-helper.h: New helper file.
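[Editorial note, not part of the patch: as a quick orientation for review, a minimal usage sketch of two of the new entry points. The intrinsic names and signatures are the ones defined in the headers added below; the build flag and the wrapper function names are assumptions for illustration only.]

    /* Assumed build line: gcc -O2 -mavx10.2-512 -c convert-sketch.c  */
    #include <immintrin.h>

    __m512h
    concat_convert (__m512 lo, __m512 hi)
    {
      /* Two 16-element float vectors -> one 32-element _Float16 vector.  */
      return _mm512_cvtx2ps_ph (lo, hi);
    }

    __m256i
    narrow_to_bf8 (__mmask32 keep, __m512h v)
    {
      /* 32 x FP16 -> 32 x 8-bit bf8, zeroing the lanes cleared in KEEP.  */
      return _mm512_maskz_cvtneph_pbf8 (keep, v);
    }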
Co-authored-by: Levy Hsu Co-authored-by: Kong Lingling --- gcc/config.gcc | 3 +- gcc/config/i386/avx10_2-512convertintrin.h | 548 ++++++++++ gcc/config/i386/avx10_2convertintrin.h | 978 ++++++++++++++++++ gcc/config/i386/i386-builtin-types.def | 21 + gcc/config/i386/i386-builtin.def | 46 + gcc/config/i386/i386-expand.cc | 21 + gcc/config/i386/immintrin.h | 4 + gcc/config/i386/sse.md | 235 ++++- gcc/testsuite/gcc.target/i386/avx-1.c | 6 + gcc/testsuite/gcc.target/i386/avx-2.c | 3 +- .../gcc.target/i386/avx10_2-512-convert-1.c | 176 ++++ .../i386/avx10_2-512-vcvt2ps2phx-2.c | 51 + .../i386/avx10_2-512-vcvtbiasph2bf8-2.c | 59 ++ .../i386/avx10_2-512-vcvtbiasph2bf8s-2.c | 59 ++ .../i386/avx10_2-512-vcvtbiasph2hf8-2.c | 59 ++ .../i386/avx10_2-512-vcvtbiasph2hf8s-2.c | 59 ++ .../i386/avx10_2-512-vcvthf82ph-2.c | 45 + .../i386/avx10_2-512-vcvtne2ph2bf8-2.c | 65 ++ .../i386/avx10_2-512-vcvtne2ph2bf8s-2.c | 65 ++ .../i386/avx10_2-512-vcvtne2ph2hf8-2.c | 65 ++ .../i386/avx10_2-512-vcvtne2ph2hf8s-2.c | 65 ++ .../i386/avx10_2-512-vcvtneph2bf8-2.c | 58 ++ .../i386/avx10_2-512-vcvtneph2bf8s-2.c | 56 + .../i386/avx10_2-512-vcvtneph2hf8-2.c | 56 + .../i386/avx10_2-512-vcvtneph2hf8s-2.c | 56 + .../gcc.target/i386/avx10_2-convert-1.c | 274 +++++ .../gcc.target/i386/avx10_2-vcvt2ps2phx-2.c | 16 + .../i386/avx10_2-vcvtbiasph2bf8-2.c | 16 + .../i386/avx10_2-vcvtbiasph2bf8s-2.c | 16 + .../i386/avx10_2-vcvtbiasph2hf8-2.c | 16 + .../i386/avx10_2-vcvtbiasph2hf8s-2.c | 16 + .../gcc.target/i386/avx10_2-vcvthf82ph-2.c | 16 + .../gcc.target/i386/avx10_2-vcvtne2ph2bf8-2.c | 16 + .../i386/avx10_2-vcvtne2ph2bf8s-2.c | 16 + .../gcc.target/i386/avx10_2-vcvtne2ph2hf8-2.c | 16 + .../i386/avx10_2-vcvtne2ph2hf8s-2.c | 16 + .../gcc.target/i386/avx10_2-vcvtneph2bf8-2.c | 16 + .../gcc.target/i386/avx10_2-vcvtneph2bf8s-2.c | 16 + .../gcc.target/i386/avx10_2-vcvtneph2hf8-2.c | 16 + .../gcc.target/i386/avx10_2-vcvtneph2hf8s-2.c | 16 + gcc/testsuite/gcc.target/i386/fp8-helper.h | 135 +++ gcc/testsuite/gcc.target/i386/sse-13.c | 6 + gcc/testsuite/gcc.target/i386/sse-14.c | 6 + gcc/testsuite/gcc.target/i386/sse-22.c | 6 + gcc/testsuite/gcc.target/i386/sse-23.c | 6 + 45 files changed, 3511 insertions(+), 5 deletions(-) create mode 100644 gcc/config/i386/avx10_2-512convertintrin.h create mode 100644 gcc/config/i386/avx10_2convertintrin.h create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-convert-1.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vcvt2ps2phx-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtbiasph2bf8-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtbiasph2bf8s-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtbiasph2hf8-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtbiasph2hf8s-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vcvthf82ph-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2bf8-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2bf8s-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2hf8-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2hf8s-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtneph2bf8-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtneph2bf8s-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtneph2hf8-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtneph2hf8s-2.c create mode 100644 
gcc/testsuite/gcc.target/i386/avx10_2-convert-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcvt2ps2phx-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcvtbiasph2bf8-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcvtbiasph2bf8s-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcvtbiasph2hf8-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcvtbiasph2hf8s-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcvthf82ph-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcvtne2ph2bf8-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcvtne2ph2bf8s-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcvtne2ph2hf8-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcvtne2ph2hf8s-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcvtneph2bf8-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcvtneph2bf8s-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcvtneph2hf8-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcvtneph2hf8s-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/fp8-helper.h
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 22353f2d69e..5e9c36a2aad 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -453,7 +453,8 @@ i[34567]86-*-* | x86_64-*-*)
 		       raointintrin.h amxcomplexintrin.h avxvnniint16intrin.h
 		       sm3intrin.h sha512intrin.h sm4intrin.h usermsrintrin.h
 		       avx10_2roundingintrin.h
-		       avx10_2mediaintrin.h avx10_2-512mediaintrin.h"
+		       avx10_2mediaintrin.h avx10_2-512mediaintrin.h
+		       avx10_2convertintrin.h avx10_2-512convertintrin.h"
 	;;
 ia64-*-*)
 	extra_headers=ia64intrin.h
diff --git a/gcc/config/i386/avx10_2-512convertintrin.h b/gcc/config/i386/avx10_2-512convertintrin.h
new file mode 100644
index 00000000000..4ad339bbbf9
--- /dev/null
+++ b/gcc/config/i386/avx10_2-512convertintrin.h
@@ -0,0 +1,548 @@
+/* Copyright (C) 2024 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _IMMINTRIN_H_INCLUDED
+#error "Never use <avx10_2-512convertintrin.h> directly; include <immintrin.h> instead."
+#endif // _IMMINTRIN_H_INCLUDED + +#ifndef __AVX10_2_512CONVERTINTRIN_H_INCLUDED +#define __AVX10_2_512CONVERTINTRIN_H_INCLUDED + +#ifndef __AVX10_2_512__ +#pragma GCC push_options +#pragma GCC target("avx10.2-512") +#define __DISABLE_AVX10_2_512__ +#endif /* __AVX10_2_512__ */ + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtx2ps_ph (__m512 __A, __m512 __B) +{ + return (__m512h) __builtin_ia32_vcvt2ps2phx512_mask_round ((__v16sf) __A, + (__v16sf) __B, + (__v32hf) + _mm512_setzero_ph (), + (__mmask32) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtx2ps_ph (__m512h __W, __mmask32 __U, __m512 __A, + __m512 __B) +{ + return (__m512h) __builtin_ia32_vcvt2ps2phx512_mask_round ((__v16sf) __A, + (__v16sf) __B, + (__v32hf) __W, + (__mmask32) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtx2ps_ph (__mmask32 __U, __m512 __A, __m512 __B) +{ + return (__m512h) __builtin_ia32_vcvt2ps2phx512_mask_round ((__v16sf) __A, + (__v16sf) __B, + (__v32hf) + _mm512_setzero_ph (), + (__mmask32) __U, + _MM_FROUND_CUR_DIRECTION); +} + +#ifdef __OPTIMIZE__ +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtx_round2ps_ph (__m512 __A, __m512 __B, const int __R) +{ + return (__m512h) __builtin_ia32_vcvt2ps2phx512_mask_round ((__v16sf) __A, + (__v16sf) __B, + (__v32hf) + _mm512_setzero_ph (), + (__mmask32) -1, + __R); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtx_round2ps_ph (__m512h __W, __mmask32 __U, __m512 __A, + __m512 __B, const int __R) +{ + return (__m512h) __builtin_ia32_vcvt2ps2phx512_mask_round ((__v16sf) __A, + (__v16sf) __B, + (__v32hf) __W, + (__mmask32) __U, + __R); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtx_round2ps_ph (__mmask32 __U, __m512 __A, + __m512 __B, const int __R) +{ + return (__m512h) __builtin_ia32_vcvt2ps2phx512_mask_round ((__v16sf) __A, + (__v16sf) __B, + (__v32hf) + _mm512_setzero_ph (), + (__mmask32) __U, + __R); +} + +#else +#define _mm512_cvtx_round2ps_ph(A, B, R) \ + ((__m512h) __builtin_ia32_vcvt2ps2phx512_mask_round ((__v16sf) (A), \ + (__v16sf) (B), \ + (__v32hf) \ + (_mm512_setzero_ph ()), \ + (__mmask32) (-1), \ + (R))) +#define _mm512_mask_cvtx_round2ps_ph(W, U, A, B, R) \ + ((__m512h) __builtin_ia32_vcvt2ps2phx512_mask_round ((__v16sf) (A), \ + (__v16sf) (B), \ + (__v32hf) (W), \ + (__mmask32) (U), \ + (R))) +#define _mm512_maskz_cvtx_round2ps_ph(U, A, B, R) \ + ((__m512h) __builtin_ia32_vcvt2ps2phx512_mask_round ((__v16sf) (A), \ + (__v16sf) (B), \ + (__v32hf) \ + (_mm512_setzero_ph ()), \ + (__mmask32) (U), \ + (R))) +#endif /* __OPTIMIZE__ */ + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtbiasph_pbf8 (__m512i __A, __m512h __B) +{ + return (__m256i) __builtin_ia32_vcvtbiasph2bf8512_mask ((__v64qi) __A, + (__v32hf) __B, + (__v32qi)(__m256i) + _mm256_undefined_si256 (), + (__mmask32) -1); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtbiasph_pbf8 (__m256i __W, __mmask32 __U, + __m512i __A, __m512h __B) +{ + return (__m256i) __builtin_ia32_vcvtbiasph2bf8512_mask ((__v64qi) __A, + (__v32hf) __B, 
+ (__v32qi)(__m256i) __W, + (__mmask32) __U); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtbiasph_pbf8 (__mmask32 __U, __m512i __A, __m512h __B) +{ + return (__m256i) __builtin_ia32_vcvtbiasph2bf8512_mask ((__v64qi) __A, + (__v32hf) __B, + (__v32qi)(__m256i) + _mm256_setzero_si256 (), + (__mmask32) __U); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtbiassph_pbf8 (__m512i __A, __m512h __B) +{ + return (__m256i) __builtin_ia32_vcvtbiasph2bf8s512_mask ((__v64qi) __A, + (__v32hf) __B, + (__v32qi)(__m256i) + _mm256_undefined_si256 (), + (__mmask32) -1); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtbiassph_pbf8 (__m256i __W, __mmask32 __U, + __m512i __A, __m512h __B) +{ + return (__m256i) __builtin_ia32_vcvtbiasph2bf8s512_mask ((__v64qi) __A, + (__v32hf) __B, + (__v32qi)(__m256i) __W, + (__mmask32) __U); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtbiassph_pbf8 (__mmask32 __U, __m512i __A, __m512h __B) +{ + return (__m256i) __builtin_ia32_vcvtbiasph2bf8s512_mask ((__v64qi) __A, + (__v32hf) __B, + (__v32qi)(__m256i) + _mm256_setzero_si256 (), + (__mmask32) __U); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtbiasph_phf8 (__m512i __A, __m512h __B) +{ + return (__m256i) __builtin_ia32_vcvtbiasph2hf8512_mask ((__v64qi) __A, + (__v32hf) __B, + (__v32qi)(__m256i) + _mm256_undefined_si256 (), + (__mmask32) -1); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtbiasph_phf8 (__m256i __W, __mmask32 __U, __m512i __A, + __m512h __B) +{ + return (__m256i) __builtin_ia32_vcvtbiasph2hf8512_mask ((__v64qi) __A, + (__v32hf) __B, + (__v32qi)(__m256i) __W, + (__mmask32) __U); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtbiasph_phf8 (__mmask32 __U, __m512i __A, __m512h __B) +{ + return (__m256i) __builtin_ia32_vcvtbiasph2hf8512_mask ((__v64qi) __A, + (__v32hf) __B, + (__v32qi)(__m256i) + _mm256_setzero_si256 (), + (__mmask32) __U); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtbiassph_phf8 (__m512i __A, __m512h __B) +{ + return (__m256i) __builtin_ia32_vcvtbiasph2hf8s512_mask ((__v64qi) __A, + (__v32hf) __B, + (__v32qi)(__m256i) + _mm256_undefined_si256 (), + (__mmask32) -1); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtbiassph_phf8 (__m256i __W, __mmask32 __U, + __m512i __A, __m512h __B) +{ + return (__m256i) __builtin_ia32_vcvtbiasph2hf8s512_mask ((__v64qi) __A, + (__v32hf) __B, + (__v32qi)(__m256i) __W, + (__mmask32) __U); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtbiassph_phf8 (__mmask32 __U, __m512i __A, __m512h __B) +{ + return (__m256i) __builtin_ia32_vcvtbiasph2hf8s512_mask ((__v64qi) __A, + (__v32hf) __B, + (__v32qi)(__m256i) + _mm256_setzero_si256 (), + (__mmask32) __U); +} + +extern __inline__ __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtne2ph_pbf8 (__m512h __A, __m512h __B) +{ + return (__m512i) __builtin_ia32_vcvtne2ph2bf8512_mask ((__v32hf) __A, + (__v32hf) __B, + 
(__v64qi) + _mm512_setzero_si512 (), + (__mmask64) -1); +} + +extern __inline__ __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtne2ph_pbf8 (__m512i __W, __mmask64 __U, + __m512h __A, __m512h __B) +{ + return (__m512i) __builtin_ia32_vcvtne2ph2bf8512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v64qi) __W, + (__mmask64) __U); +} + +extern __inline__ __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtne2ph_pbf8 (__mmask64 __U, __m512h __A, __m512h __B) +{ + return (__m512i) __builtin_ia32_vcvtne2ph2bf8512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v64qi) + _mm512_setzero_si512 (), + (__mmask64) __U); +} + +extern __inline__ __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtnes2ph_pbf8 (__m512h __A, __m512h __B) +{ + return (__m512i) __builtin_ia32_vcvtne2ph2bf8s512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v64qi) + _mm512_setzero_si512 (), + (__mmask64) -1); +} + +extern __inline__ __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtnes2ph_pbf8 (__m512i __W, __mmask64 __U, + __m512h __A, __m512h __B) +{ + return (__m512i) __builtin_ia32_vcvtne2ph2bf8s512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v64qi) __W, + (__mmask64) __U); +} + +extern __inline__ __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtnes2ph_pbf8 (__mmask64 __U, __m512h __A, __m512h __B) +{ + return (__m512i) __builtin_ia32_vcvtne2ph2bf8s512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v64qi) + _mm512_setzero_si512 (), + (__mmask64) __U); +} + +extern __inline__ __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtne2ph_phf8 (__m512h __A, __m512h __B) +{ + return (__m512i) __builtin_ia32_vcvtne2ph2hf8512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v64qi) + _mm512_setzero_si512 (), + (__mmask64) -1); +} + +extern __inline__ __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtne2ph_phf8 (__m512i __W, __mmask64 __U, + __m512h __A, __m512h __B) +{ + return (__m512i) __builtin_ia32_vcvtne2ph2hf8512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v64qi) __W, + (__mmask64) __U); +} + +extern __inline__ __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtne2ph_phf8 (__mmask64 __U, __m512h __A, __m512h __B) +{ + return (__m512i) __builtin_ia32_vcvtne2ph2hf8512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v64qi) + _mm512_setzero_si512 (), + (__mmask64) __U); +} + +extern __inline__ __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtnes2ph_phf8 (__m512h __A, __m512h __B) +{ + return (__m512i) __builtin_ia32_vcvtne2ph2hf8s512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v64qi) + _mm512_setzero_si512 (), + (__mmask64) -1); +} + +extern __inline__ __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtnes2ph_phf8 (__m512i __W, __mmask64 __U, + __m512h __A, __m512h __B) +{ + return (__m512i) __builtin_ia32_vcvtne2ph2hf8s512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v64qi) __W, + (__mmask64) __U); +} + +extern __inline__ __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtnes2ph_phf8 (__mmask64 __U, __m512h __A, __m512h __B) +{ + return (__m512i) __builtin_ia32_vcvtne2ph2hf8s512_mask ((__v32hf) __A, + (__v32hf) __B, + (__v64qi) + _mm512_setzero_si512 (), + (__mmask64) __U); +} + +extern __inline__ __m512h 
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvthf8_ph (__m256i __A) +{ + return (__m512h) __builtin_ia32_vcvthf82ph512_mask ((__v32qi) __A, + (__v32hf) (__m512h) + _mm512_undefined_ph (), + (__mmask32) -1); +} + +extern __inline__ __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvthf8_ph (__m512h __W, __mmask32 __U, __m256i __A) +{ + return (__m512h) __builtin_ia32_vcvthf82ph512_mask ((__v32qi) __A, + (__v32hf) (__m512h) __W, + (__mmask32) __U); +} + +extern __inline__ __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvthf8_ph (__mmask32 __U, __m256i __A) +{ + return (__m512h) __builtin_ia32_vcvthf82ph512_mask ((__v32qi) __A, + (__v32hf) (__m512h) + _mm512_setzero_ph (), + (__mmask32) __U); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtneph_pbf8 (__m512h __A) +{ + return (__m256i) __builtin_ia32_vcvtneph2bf8512_mask ((__v32hf) __A, + (__v32qi) (__m256i) + _mm256_undefined_si256 (), + (__mmask32) -1); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtneph_pbf8 (__m256i __W, __mmask32 __U, __m512h __A) +{ + return (__m256i) __builtin_ia32_vcvtneph2bf8512_mask ((__v32hf) __A, + (__v32qi) (__m256i) __W, + (__mmask32) __U); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtneph_pbf8 (__mmask32 __U, __m512h __A) +{ + return (__m256i) __builtin_ia32_vcvtneph2bf8512_mask ((__v32hf) __A, + (__v32qi) (__m256i) + _mm256_setzero_si256 (), + (__mmask32) __U); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtnesph_pbf8 (__m512h __A) +{ + return (__m256i) __builtin_ia32_vcvtneph2bf8s512_mask ((__v32hf) __A, + (__v32qi) (__m256i) + _mm256_undefined_si256 (), + (__mmask32) -1); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtnesph_pbf8 (__m256i __W, __mmask32 __U, __m512h __A) +{ + return (__m256i) __builtin_ia32_vcvtneph2bf8s512_mask ((__v32hf) __A, + (__v32qi) (__m256i) __W, + (__mmask32) __U); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtnesph_pbf8 (__mmask32 __U, __m512h __A) +{ + return (__m256i) __builtin_ia32_vcvtneph2bf8s512_mask ((__v32hf) __A, + (__v32qi) (__m256i) + _mm256_setzero_si256 (), + (__mmask32) __U); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtneph_phf8 (__m512h __A) +{ + return (__m256i) __builtin_ia32_vcvtneph2hf8512_mask ((__v32hf) __A, + (__v32qi) (__m256i) + _mm256_undefined_si256 (), + (__mmask32) -1); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtneph_phf8 (__m256i __W, __mmask32 __U, __m512h __A) +{ + return (__m256i) __builtin_ia32_vcvtneph2hf8512_mask ((__v32hf) __A, + (__v32qi)(__m256i) __W, + (__mmask32) __U); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtneph_phf8 (__mmask32 __U, __m512h __A) +{ + return (__m256i) __builtin_ia32_vcvtneph2hf8512_mask ((__v32hf) __A, + (__v32qi) (__m256i) + _mm256_setzero_si256 (), + (__mmask32) __U); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtnesph_phf8 
(__m512h __A)
+{
+  return (__m256i) __builtin_ia32_vcvtneph2hf8s512_mask ((__v32hf) __A,
+                                                         (__v32qi) (__m256i)
+                                                         _mm256_undefined_si256 (),
+                                                         (__mmask32) -1);
+}
+
+extern __inline__ __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvtnesph_phf8 (__m256i __W, __mmask32 __U, __m512h __A)
+{
+  return (__m256i) __builtin_ia32_vcvtneph2hf8s512_mask ((__v32hf) __A,
+                                                         (__v32qi) (__m256i) __W,
+                                                         (__mmask32) __U);
+}
+
+extern __inline__ __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvtnesph_phf8 (__mmask32 __U, __m512h __A)
+{
+  return (__m256i) __builtin_ia32_vcvtneph2hf8s512_mask ((__v32hf) __A,
+                                                         (__v32qi) (__m256i)
+                                                         _mm256_setzero_si256 (),
+                                                         (__mmask32) __U);
+}
+
+#ifdef __DISABLE_AVX10_2_512__
+#undef __DISABLE_AVX10_2_512__
+#pragma GCC pop_options
+#endif /* __DISABLE_AVX10_2_512__ */
+
+#endif /* __AVX10_2_512CONVERTINTRIN_H_INCLUDED */
diff --git a/gcc/config/i386/avx10_2convertintrin.h b/gcc/config/i386/avx10_2convertintrin.h
new file mode 100644
index 00000000000..ac62d1290a5
--- /dev/null
+++ b/gcc/config/i386/avx10_2convertintrin.h
@@ -0,0 +1,978 @@
+/* Copyright (C) 2024 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#if !defined _IMMINTRIN_H_INCLUDED
+#error "Never use <avx10_2convertintrin.h> directly; include <immintrin.h> instead."
+#endif + +#ifndef _AVX10_2CONVERTINTRIN_H_INCLUDED +#define _AVX10_2CONVERTINTRIN_H_INCLUDED + +#if !defined(__AVX10_2_256__) +#pragma GCC push_options +#pragma GCC target("avx10.2") +#define __DISABLE_AVX10_2_256__ +#endif /* __AVX10_2__ */ + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtx2ps_ph (__m128 __A, __m128 __B) +{ + return (__m128h) __builtin_ia32_vcvt2ps2phx128_mask ((__v4sf) __A, + (__v4sf) __B, + (__v8hf) + _mm_setzero_ph (), + (__mmask8) -1); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvtx2ps_ph (__m128h __W, __mmask8 __U, __m128 __A, __m128 __B) +{ + return (__m128h) __builtin_ia32_vcvt2ps2phx128_mask ((__v4sf) __A, + (__v4sf) __B, + (__v8hf) __W, + (__mmask8) __U); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvtx2ps_ph (__mmask8 __U, __m128 __A, __m128 __B) +{ + return (__m128h) __builtin_ia32_vcvt2ps2phx128_mask ((__v4sf) __A, + (__v4sf) __B, + (__v8hf) + _mm_setzero_ph (), + (__mmask8) __U); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtx2ps_ph (__m256 __A, __m256 __B) +{ + return (__m256h) __builtin_ia32_vcvt2ps2phx256_mask_round ((__v8sf) __A, + (__v8sf) __B, + (__v16hf) + _mm256_setzero_ph (), + (__mmask16) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvtx2ps_ph (__m256h __W, __mmask16 __U, __m256 __A, __m256 __B) +{ + return (__m256h) __builtin_ia32_vcvt2ps2phx256_mask_round ((__v8sf) __A, + (__v8sf) __B, + (__v16hf) __W, + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvtx2ps_ph ( __mmask16 __U, __m256 __A, __m256 __B) +{ + return (__m256h) __builtin_ia32_vcvt2ps2phx256_mask_round ((__v8sf) __A, + (__v8sf) __B, + (__v16hf) + _mm256_setzero_ph (), + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +#ifdef __OPTIMIZE__ +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtx_round2ps_ph (__m256 __A, __m256 __B, const int __R) +{ + return (__m256h) __builtin_ia32_vcvt2ps2phx256_mask_round ((__v8sf) __A, + (__v8sf) __B, + (__v16hf) + _mm256_setzero_ph (), + (__mmask16) -1, + __R); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvtx_round2ps_ph (__m256h __W, __mmask16 __U, __m256 __A, + __m256 __B, const int __R) +{ + return (__m256h) __builtin_ia32_vcvt2ps2phx256_mask_round ((__v8sf) __A, + (__v8sf) __B, + (__v16hf) __W, + (__mmask16) __U, + __R); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvtx_round2ps_ph (__mmask16 __U, __m256 __A, + __m256 __B, const int __R) +{ + return (__m256h) __builtin_ia32_vcvt2ps2phx256_mask_round ((__v8sf) __A, + (__v8sf) __B, + (__v16hf) + _mm256_setzero_ph (), + (__mmask16) __U, + __R); +} + +#else +#define _mm256_cvtx_round2ps_ph(A, B, R) \ + ((__m256h) __builtin_ia32_vcvt2ps2phx256_mask_round ((__v8sf) (A), \ + (__v8sf) (B), \ + (__v16hf) \ + (_mm256_setzero_ph ()), \ + (__mmask16) (-1), \ + (R))) + +#define _mm256_mask_cvtx_round2ps_ph(W, U, A, B, R) \ + ((__m256h) __builtin_ia32_vcvt2ps2phx256_mask_round ((__v8sf) (A), \ + (__v8sf) (B), \ + (__v16hf) (W), \ + (__mmask16) (U), \ + (R))) + +#define 
_mm256_maskz_cvtx_round2ps_ph(U, A, B, R) \ + ((__m256h) __builtin_ia32_vcvt2ps2phx256_mask_round ((__v8sf) (A), \ + (__v8sf) (B), \ + (__v16hf) \ + (_mm256_setzero_ph ()), \ + (__mmask16) (U), \ + (R))) +#endif /* __OPTIMIZE__ */ + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtbiasph_pbf8 (__m128i __A, __m128h __B) +{ + return (__m128i) __builtin_ia32_vcvtbiasph2bf8128 ((__v16qi) __A, + (__v8hf) __B); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvtbiasph_pbf8 (__m128i __W, __mmask8 __U, __m128i __A, + __m128h __B) +{ + return (__m128i) __builtin_ia32_vcvtbiasph2bf8128_mask ((__v16qi) __A, + (__v8hf) __B, + (__v16qi)(__m128i) __W, + (__mmask8) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvtbiasph_pbf8 (__mmask8 __U, __m128i __A, __m128h __B) +{ + return (__m128i) __builtin_ia32_vcvtbiasph2bf8128_mask ((__v16qi) __A, + (__v8hf) __B, + (__v16qi)(__m128i) + _mm_setzero_si128 (), + (__mmask8) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtbiasph_pbf8 (__m256i __A, __m256h __B) +{ + return (__m128i) __builtin_ia32_vcvtbiasph2bf8256_mask ((__v32qi) __A, + (__v16hf) __B, + (__v16qi)(__m128i) + _mm_undefined_si128 (), + (__mmask16) -1); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvtbiasph_pbf8 (__m128i __W, __mmask16 __U, __m256i __A, + __m256h __B) +{ + return (__m128i) __builtin_ia32_vcvtbiasph2bf8256_mask ((__v32qi) __A, + (__v16hf) __B, + (__v16qi)(__m128i) __W, + (__mmask16) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvtbiasph_pbf8 (__mmask16 __U, __m256i __A, __m256h __B) +{ + return (__m128i) __builtin_ia32_vcvtbiasph2bf8256_mask ((__v32qi) __A, + (__v16hf) __B, + (__v16qi)(__m128i) + _mm_setzero_si128 (), + (__mmask16) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtbiassph_pbf8 (__m128i __A, __m128h __B) +{ + return (__m128i) __builtin_ia32_vcvtbiasph2bf8s128 ((__v16qi) __A, + (__v8hf) __B); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvtbiassph_pbf8 (__m128i __W, __mmask8 __U, + __m128i __A, __m128h __B) +{ + return (__m128i) __builtin_ia32_vcvtbiasph2bf8s128_mask ((__v16qi) __A, + (__v8hf) __B, + (__v16qi)(__m128i) __W, + (__mmask8) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvtbiassph_pbf8 (__mmask8 __U, __m128i __A, __m128h __B) +{ + return (__m128i) __builtin_ia32_vcvtbiasph2bf8s128_mask ((__v16qi) __A, + (__v8hf) __B, + (__v16qi)(__m128i) + _mm_setzero_si128 (), + (__mmask8) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtbiassph_pbf8 (__m256i __A, __m256h __B) +{ + return (__m128i) __builtin_ia32_vcvtbiasph2bf8s256_mask ((__v32qi) __A, + (__v16hf) __B, + (__v16qi)(__m128i) + _mm_undefined_si128 (), + (__mmask16) -1); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvtbiassph_pbf8 (__m128i __W, __mmask16 __U, + __m256i __A, __m256h __B) +{ + return (__m128i) __builtin_ia32_vcvtbiasph2bf8s256_mask ((__v32qi) __A, + (__v16hf) __B, + 
(__v16qi)(__m128i) __W, + (__mmask16) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvtbiassph_pbf8 (__mmask16 __U, __m256i __A, __m256h __B) +{ + return (__m128i) __builtin_ia32_vcvtbiasph2bf8s256_mask ((__v32qi) __A, + (__v16hf) __B, + (__v16qi)(__m128i) + _mm_setzero_si128 (), + (__mmask16) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtbiasph_phf8 (__m128i __A, __m128h __B) +{ + return (__m128i) __builtin_ia32_vcvtbiasph2hf8128 ((__v16qi) __A, + (__v8hf) __B); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvtbiasph_phf8 (__m128i __W, __mmask8 __U, __m128i __A, + __m128h __B) +{ + return (__m128i) __builtin_ia32_vcvtbiasph2hf8128_mask ((__v16qi) __A, + (__v8hf) __B, + (__v16qi)(__m128i) __W, + (__mmask8) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvtbiasph_phf8 (__mmask8 __U, __m128i __A, __m128h __B) +{ + return (__m128i) __builtin_ia32_vcvtbiasph2hf8128_mask ((__v16qi) __A, + (__v8hf) __B, + (__v16qi)(__m128i) + _mm_setzero_si128 (), + (__mmask8) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtbiasph_phf8 (__m256i __A, __m256h __B) +{ + return (__m128i) __builtin_ia32_vcvtbiasph2hf8256_mask ((__v32qi) __A, + (__v16hf) __B, + (__v16qi)(__m128i) + _mm_undefined_si128 (), + (__mmask16) -1); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvtbiasph_phf8 (__m128i __W, __mmask16 __U, + __m256i __A, __m256h __B) +{ + return (__m128i) __builtin_ia32_vcvtbiasph2hf8256_mask ((__v32qi) __A, + (__v16hf) __B, + (__v16qi)(__m128i) __W, + (__mmask16) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvtbiasph_phf8 (__mmask16 __U, __m256i __A, __m256h __B) +{ + return (__m128i) __builtin_ia32_vcvtbiasph2hf8256_mask ((__v32qi) __A, + (__v16hf) __B, + (__v16qi)(__m128i) + _mm_setzero_si128 (), + (__mmask16) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtbiassph_phf8 (__m128i __A, __m128h __B) +{ + return (__m128i) __builtin_ia32_vcvtbiasph2hf8s128 ((__v16qi) __A, + (__v8hf) __B); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvtbiassph_phf8 (__m128i __W, __mmask8 __U, + __m128i __A, __m128h __B) +{ + return (__m128i) __builtin_ia32_vcvtbiasph2hf8s128_mask ((__v16qi) __A, + (__v8hf) __B, + (__v16qi)(__m128i) __W, + (__mmask8) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvtbiassph_phf8 (__mmask8 __U, __m128i __A, __m128h __B) +{ + return (__m128i) __builtin_ia32_vcvtbiasph2hf8s128_mask ((__v16qi) __A, + (__v8hf) __B, + (__v16qi)(__m128i) + _mm_setzero_si128 (), + (__mmask8) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtbiassph_phf8 (__m256i __A, __m256h __B) +{ + return (__m128i) __builtin_ia32_vcvtbiasph2hf8s256_mask ((__v32qi) __A, + (__v16hf) __B, + (__v16qi)(__m128i) + _mm_undefined_si128 (), + (__mmask16) -1); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvtbiassph_phf8 
(__m128i __W, __mmask16 __U, + __m256i __A, __m256h __B) +{ + return (__m128i) __builtin_ia32_vcvtbiasph2hf8s256_mask ((__v32qi) __A, + (__v16hf) __B, + (__v16qi)(__m128i) __W, + (__mmask16) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvtbiassph_phf8 (__mmask16 __U, __m256i __A, __m256h __B) +{ + return (__m128i) __builtin_ia32_vcvtbiasph2hf8s256_mask ((__v32qi) __A, + (__v16hf) __B, + (__v16qi)(__m128i) + _mm_setzero_si128 (), + (__mmask16) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtne2ph_pbf8 (__m128h __A, __m128h __B) +{ + return (__m128i) __builtin_ia32_vcvtne2ph2bf8128_mask ((__v8hf) __A, + (__v8hf) __B, + (__v16qi) + _mm_setzero_si128 (), + (__mmask16) -1); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvtne2ph_pbf8 (__m128i __W, __mmask16 __U, + __m128h __A, __m128h __B) +{ + return (__m128i) __builtin_ia32_vcvtne2ph2bf8128_mask ((__v8hf) __A, + (__v8hf) __B, + (__v16qi) __W, + (__mmask16) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvtne2ph_pbf8 (__mmask16 __U, __m128h __A, __m128h __B) +{ + return (__m128i) __builtin_ia32_vcvtne2ph2bf8128_mask ((__v8hf) __A, + (__v8hf) __B, + (__v16qi) + _mm_setzero_si128 (), + (__mmask16) __U); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtne2ph_pbf8 (__m256h __A, __m256h __B) +{ + return (__m256i) __builtin_ia32_vcvtne2ph2bf8256_mask ((__v16hf) __A, + (__v16hf) __B, + (__v32qi) + _mm256_setzero_si256 (), + (__mmask32) -1); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvtne2ph_pbf8 (__m256i __W, __mmask32 __U, + __m256h __A, __m256h __B) +{ + return (__m256i) __builtin_ia32_vcvtne2ph2bf8256_mask ((__v16hf) __A, + (__v16hf) __B, + (__v32qi) __W, + (__mmask32) __U); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvtne2ph_pbf8 (__mmask32 __U, __m256h __A, __m256h __B) +{ + return (__m256i) __builtin_ia32_vcvtne2ph2bf8256_mask ((__v16hf) __A, + (__v16hf) __B, + (__v32qi) + _mm256_setzero_si256 (), + (__mmask32) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtnes2ph_pbf8 (__m128h __A, __m128h __B) +{ + return (__m128i) __builtin_ia32_vcvtne2ph2bf8s128_mask ((__v8hf) __A, + (__v8hf) __B, + (__v16qi) + _mm_setzero_si128 (), + (__mmask16) -1); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvtnes2ph_pbf8 (__m128i __W, __mmask16 __U, + __m128h __A, __m128h __B) +{ + return (__m128i) __builtin_ia32_vcvtne2ph2bf8s128_mask ((__v8hf) __A, + (__v8hf) __B, + (__v16qi) __W, + (__mmask16) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvtnes2ph_pbf8 (__mmask16 __U, __m128h __A, __m128h __B) +{ + return (__m128i) __builtin_ia32_vcvtne2ph2bf8s128_mask ((__v8hf) __A, + (__v8hf) __B, + (__v16qi) + _mm_setzero_si128 (), + (__mmask16) __U); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtnes2ph_pbf8 (__m256h __A, __m256h __B) +{ + return (__m256i) __builtin_ia32_vcvtne2ph2bf8s256_mask ((__v16hf) __A, + (__v16hf) __B, + 
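+  /* Note on the builtin operands just below: the trailing arguments
+     select the merge source and the write mask; this unmasked form
+     passes a zeroed vector and an all-ones (__mmask32) -1 mask so
+     every byte of the packed result is kept.  */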
(__v32qi) + _mm256_setzero_si256 (), + (__mmask32) -1); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvtnes2ph_pbf8 (__m256i __W, __mmask32 __U, + __m256h __A, __m256h __B) +{ + return (__m256i) __builtin_ia32_vcvtne2ph2bf8s256_mask ((__v16hf) __A, + (__v16hf) __B, + (__v32qi) __W, + (__mmask32) __U); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvtnes2ph_pbf8 (__mmask32 __U, __m256h __A, __m256h __B) +{ + return (__m256i) __builtin_ia32_vcvtne2ph2bf8s256_mask ((__v16hf) __A, + (__v16hf) __B, + (__v32qi) + _mm256_setzero_si256 (), + (__mmask32) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtne2ph_phf8 (__m128h __A, __m128h __B) +{ + return (__m128i) __builtin_ia32_vcvtne2ph2hf8128_mask ((__v8hf) __A, + (__v8hf) __B, + (__v16qi) + _mm_setzero_si128 (), + (__mmask16) -1); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvtne2ph_phf8 (__m128i __W, __mmask16 __U, + __m128h __A, __m128h __B) +{ + return (__m128i) __builtin_ia32_vcvtne2ph2hf8128_mask ((__v8hf) __A, + (__v8hf) __B, + (__v16qi) __W, + (__mmask16) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvtne2ph_phf8 (__mmask16 __U, __m128h __A, __m128h __B) +{ + return (__m128i) __builtin_ia32_vcvtne2ph2hf8128_mask ((__v8hf) __A, + (__v8hf) __B, + (__v16qi) + _mm_setzero_si128 (), + (__mmask16) __U); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtne2ph_phf8 (__m256h __A, __m256h __B) +{ + return (__m256i) __builtin_ia32_vcvtne2ph2hf8256_mask ((__v16hf) __A, + (__v16hf) __B, + (__v32qi) + _mm256_setzero_si256 (), + (__mmask32) -1); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvtne2ph_phf8 (__m256i __W, __mmask32 __U, + __m256h __A, __m256h __B) +{ + return (__m256i) __builtin_ia32_vcvtne2ph2hf8256_mask ((__v16hf) __A, + (__v16hf) __B, + (__v32qi) __W, + (__mmask32) __U); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvtne2ph_phf8 (__mmask32 __U, __m256h __A, __m256h __B) +{ + return (__m256i) __builtin_ia32_vcvtne2ph2hf8256_mask ((__v16hf) __A, + (__v16hf) __B, + (__v32qi) + _mm256_setzero_si256 (), + (__mmask32) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtnes2ph_phf8 (__m128h __A, __m128h __B) +{ + return (__m128i) __builtin_ia32_vcvtne2ph2hf8s128_mask ((__v8hf) __A, + (__v8hf) __B, + (__v16qi) + _mm_setzero_si128 (), + (__mmask16) -1); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvtnes2ph_phf8 (__m128i __W, __mmask16 __U, + __m128h __A, __m128h __B) +{ + return (__m128i) __builtin_ia32_vcvtne2ph2hf8s128_mask ((__v8hf) __A, + (__v8hf) __B, + (__v16qi) __W, + (__mmask16) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvtnes2ph_phf8 (__mmask16 __U, __m128h __A, __m128h __B) +{ + return (__m128i) __builtin_ia32_vcvtne2ph2hf8s128_mask ((__v8hf) __A, + (__v8hf) __B, + (__v16qi) + _mm_setzero_si128 (), + (__mmask16) __U); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, 
__artificial__)) +_mm256_cvtnes2ph_phf8 (__m256h __A, __m256h __B) +{ + return (__m256i) __builtin_ia32_vcvtne2ph2hf8s256_mask ((__v16hf) __A, + (__v16hf) __B, + (__v32qi) + _mm256_setzero_si256 (), + (__mmask32) -1); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvtnes2ph_phf8 (__m256i __W, __mmask32 __U, + __m256h __A, __m256h __B) +{ + return (__m256i) __builtin_ia32_vcvtne2ph2hf8s256_mask ((__v16hf) __A, + (__v16hf) __B, + (__v32qi) __W, + (__mmask32) __U); +} + +extern __inline__ __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvtnes2ph_phf8 (__mmask32 __U, __m256h __A, __m256h __B) +{ + return (__m256i) __builtin_ia32_vcvtne2ph2hf8s256_mask ((__v16hf) __A, + (__v16hf) __B, + (__v32qi) + _mm256_setzero_si256 (), + (__mmask32) __U); +} + +extern __inline__ __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvthf8_ph (__m128i __A) +{ + return (__m128h) __builtin_ia32_vcvthf82ph128_mask ((__v16qi) __A, + (__v8hf)(__m128h) + _mm_undefined_ph (), + (__mmask8) -1); +} + +extern __inline__ __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvthf8_ph (__m128h __W, __mmask8 __U, __m128i __A) +{ + return (__m128h) __builtin_ia32_vcvthf82ph128_mask ((__v16qi) __A, + (__v8hf)(__m128h) __W, + (__mmask8) __U); +} + +extern __inline__ __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvthf8_ph (__mmask8 __U, __m128i __A) +{ + return (__m128h) __builtin_ia32_vcvthf82ph128_mask ((__v16qi) __A, + (__v8hf)(__m128h) + _mm_setzero_ph (), + (__mmask8) __U); +} + +extern __inline__ __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvthf8_ph (__m128i __A) +{ + return (__m256h) __builtin_ia32_vcvthf82ph256_mask ((__v16qi) __A, + (__v16hf)(__m256h) + _mm256_undefined_ph (), + (__mmask16) -1); +} + +extern __inline__ __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvthf8_ph (__m256h __W, __mmask16 __U, __m128i __A) +{ + return (__m256h) __builtin_ia32_vcvthf82ph256_mask ((__v16qi) __A, + (__v16hf)(__m256h) __W, + (__mmask16) __U); +} + +extern __inline__ __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvthf8_ph (__mmask16 __U, __m128i __A) +{ + return (__m256h) __builtin_ia32_vcvthf82ph256_mask ((__v16qi) __A, + (__v16hf)(__m256h) + _mm256_setzero_ph (), + (__mmask16) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtneph_pbf8 (__m128h __A) +{ + return (__m128i) __builtin_ia32_vcvtneph2bf8128_mask ((__v8hf) __A, + (__v16qi)(__m128i) + _mm_undefined_si128 (), + (__mmask8) -1); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvtneph_pbf8 (__m128i __W, __mmask8 __U, __m128h __A) +{ + return (__m128i) __builtin_ia32_vcvtneph2bf8128_mask ((__v8hf) __A, + (__v16qi)(__m128i) __W, + (__mmask8) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvtneph_pbf8 (__mmask8 __U, __m128h __A) +{ + return (__m128i) __builtin_ia32_vcvtneph2bf8128_mask ((__v8hf) __A, + (__v16qi)(__m128i) + _mm_setzero_si128 (), + (__mmask8) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtneph_pbf8 (__m256h __A) +{ + return (__m128i) 
__builtin_ia32_vcvtneph2bf8256_mask ((__v16hf) __A, + (__v16qi)(__m128i) + _mm_undefined_si128 (), + (__mmask16) -1); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvtneph_pbf8 (__m128i __W, __mmask16 __U, __m256h __A) +{ + return (__m128i) __builtin_ia32_vcvtneph2bf8256_mask ((__v16hf) __A, + (__v16qi)(__m128i) __W, + (__mmask16) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvtneph_pbf8 (__mmask16 __U, __m256h __A) +{ + return (__m128i) __builtin_ia32_vcvtneph2bf8256_mask ((__v16hf) __A, + (__v16qi)(__m128i) + _mm_setzero_si128 (), + (__mmask16) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtnesph_pbf8 (__m128h __A) +{ + return (__m128i) __builtin_ia32_vcvtneph2bf8s128_mask ((__v8hf) __A, + (__v16qi)(__m128i) + _mm_undefined_si128 (), + (__mmask8) -1); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvtnesph_pbf8 (__m128i __W, __mmask8 __U, __m128h __A) +{ + return (__m128i) __builtin_ia32_vcvtneph2bf8s128_mask ((__v8hf) __A, + (__v16qi)(__m128i) __W, + (__mmask8) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvtnesph_pbf8 (__mmask8 __U, __m128h __A) +{ + return (__m128i) __builtin_ia32_vcvtneph2bf8s128_mask ((__v8hf) __A, + (__v16qi)(__m128i) + _mm_setzero_si128 (), + (__mmask8) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtnesph_pbf8 (__m256h __A) +{ + return (__m128i) __builtin_ia32_vcvtneph2bf8s256_mask ((__v16hf) __A, + (__v16qi)(__m128i) + _mm_undefined_si128 (), + (__mmask16) -1); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvtnesph_pbf8 (__m128i __W, __mmask16 __U, __m256h __A) +{ + return (__m128i) __builtin_ia32_vcvtneph2bf8s256_mask ((__v16hf) __A, + (__v16qi)(__m128i) __W, + (__mmask16) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvtnesph_pbf8 (__mmask16 __U, __m256h __A) +{ + return (__m128i) __builtin_ia32_vcvtneph2bf8s256_mask ((__v16hf) __A, + (__v16qi)(__m128i) + _mm_setzero_si128 (), + (__mmask16) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtneph_phf8 (__m128h __A) +{ + return (__m128i) __builtin_ia32_vcvtneph2hf8128_mask ((__v8hf) __A, + (__v16qi)(__m128i) + _mm_undefined_si128 (), + (__mmask8) -1); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvtneph_phf8 (__m128i __W, __mmask8 __U, __m128h __A) +{ + return (__m128i) __builtin_ia32_vcvtneph2hf8128_mask ((__v8hf) __A, + (__v16qi)(__m128i) __W, + (__mmask8) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvtneph_phf8 (__mmask8 __U, __m128h __A) +{ + return (__m128i) __builtin_ia32_vcvtneph2hf8128_mask ((__v8hf) __A, + (__v16qi)(__m128i) + _mm_setzero_si128 (), + (__mmask8) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtneph_phf8 (__m256h __A) +{ + return (__m128i) __builtin_ia32_vcvtneph2hf8256_mask ((__v16hf) __A, + (__v16qi)(__m128i) + _mm_undefined_si128 (), + (__mmask16) -1); +} + +extern 
__inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvtneph_phf8 (__m128i __W, __mmask16 __U, __m256h __A) +{ + return (__m128i) __builtin_ia32_vcvtneph2hf8256_mask ((__v16hf) __A, + (__v16qi)(__m128i) __W, + (__mmask16) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvtneph_phf8 (__mmask16 __U, __m256h __A) +{ + return (__m128i) __builtin_ia32_vcvtneph2hf8256_mask ((__v16hf) __A, + (__v16qi)(__m128i) + _mm_setzero_si128 (), + (__mmask16) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtnesph_phf8 (__m128h __A) +{ + return (__m128i) __builtin_ia32_vcvtneph2hf8s128_mask ((__v8hf) __A, + (__v16qi)(__m128i) + _mm_undefined_si128 (), + (__mmask8) -1); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvtnesph_phf8 (__m128i __W, __mmask8 __U, __m128h __A) +{ + return (__m128i) __builtin_ia32_vcvtneph2hf8s128_mask ((__v8hf) __A, + (__v16qi)(__m128i) __W, + (__mmask8) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvtnesph_phf8 (__mmask8 __U, __m128h __A) +{ + return (__m128i) __builtin_ia32_vcvtneph2hf8s128_mask ((__v8hf) __A, + (__v16qi)(__m128i) + _mm_setzero_si128 (), + (__mmask8) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtnesph_phf8 (__m256h __A) +{ + return (__m128i) __builtin_ia32_vcvtneph2hf8s256_mask ((__v16hf) __A, + (__v16qi)(__m128i) + _mm_undefined_si128 (), + (__mmask16) -1); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvtnesph_phf8 (__m128i __W, __mmask16 __U, __m256h __A) +{ + return (__m128i) __builtin_ia32_vcvtneph2hf8s256_mask ((__v16hf) __A, + (__v16qi)(__m128i) __W, + (__mmask16) __U); +} + +extern __inline__ __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvtnesph_phf8 (__mmask16 __U, __m256h __A) +{ + return (__m128i) __builtin_ia32_vcvtneph2hf8s256_mask ((__v16hf) __A, + (__v16qi)(__m128i) + _mm_setzero_si128 (), + (__mmask16) __U); +} + +#ifdef __DISABLE_AVX10_2_256__ +#undef __DISABLE_AVX10_2_256__ +#pragma GCC pop_options +#endif /* __DISABLE_AVX10_2_256__ */ + +#endif /* __AVX10_2CONVERTINTRIN_H_INCLUDED */ diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def index f5fa2544cc5..63b65846c8f 100644 --- a/gcc/config/i386/i386-builtin-types.def +++ b/gcc/config/i386/i386-builtin-types.def @@ -1453,3 +1453,24 @@ DEF_FUNCTION_TYPE (V16HF, V16HF, INT, V16HF, UHI, INT) DEF_FUNCTION_TYPE (V8SF, V8SF, INT, V8SF, UQI, INT) DEF_FUNCTION_TYPE (V4DF, V4DF, V4DF, INT, V4DF, UQI, INT) DEF_FUNCTION_TYPE (V8SF, V8SF, V8SF, INT, V8SF, UQI, INT) +DEF_FUNCTION_TYPE (V32HF, V16SF, V16SF, V32HF, USI, INT) +DEF_FUNCTION_TYPE (V16HF, V8SF, V8SF, V16HF, UHI, INT) +DEF_FUNCTION_TYPE (V32HF, V16SF, V16SF, V32HF, USI) +DEF_FUNCTION_TYPE (V16HF, V8SF, V8SF, V16HF, UHI) +DEF_FUNCTION_TYPE (V8HF, V4SF, V4SF, V8HF, UQI) +DEF_FUNCTION_TYPE (V16QI, V16QI, V8HF) +DEF_FUNCTION_TYPE (V16QI, V16QI, V8HF, V16QI, UHI) +DEF_FUNCTION_TYPE (V16QI, V32QI, V16HF, V16QI, UHI) +DEF_FUNCTION_TYPE (V32QI, V64QI, V32HF, V32QI, USI) +DEF_FUNCTION_TYPE (V64QI, V64QI, V32HF, V32HF) +DEF_FUNCTION_TYPE (V32HF, V32QI, V32HF, USI) +DEF_FUNCTION_TYPE (V32QI, V32QI, V16HF, V16HF) 
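+/* For reference: each DEF_FUNCTION_TYPE (RET, ARG...) entry here expands
+   to a RET_FTYPE_ARG... enumerator naming a builtin prototype.  For
+   example, V16QI_FTYPE_V16QI_V8HF above is the signature of
+   __builtin_ia32_vcvtbiasph2bf8128, which takes a __v16qi bias vector
+   plus a __v8hf source and returns __v16qi.  */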
+DEF_FUNCTION_TYPE (V16QI, V16QI, V8HF, V8HF) +DEF_FUNCTION_TYPE (V8HF, V16QI, V8HF, UQI) +DEF_FUNCTION_TYPE (V16HF, V16QI, V16HF, UHI) +DEF_FUNCTION_TYPE (V16QI, V8HF, V8HF, V16QI, UHI) +DEF_FUNCTION_TYPE (V32QI, V16HF, V16HF, V32QI, USI) +DEF_FUNCTION_TYPE (V64QI, V32HF, V32HF, V64QI, UDI) +DEF_FUNCTION_TYPE (V16QI, V8HF, V16QI, UQI) +DEF_FUNCTION_TYPE (V16QI, V16HF, V16QI, UHI) +DEF_FUNCTION_TYPE (V32QI, V32HF, V32QI, USI) diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def index cdf28cd261c..6f5ab32dd0d 100644 --- a/gcc/config/i386/i386-builtin.def +++ b/gcc/config/i386/i386-builtin.def @@ -3115,6 +3115,50 @@ BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_mpsadbw, "__builtin_ia3 BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_mpsadbw_mask, "__builtin_ia32_mpsadbw512_mask", IX86_BUILTIN_VMPSADBW_V32HI_MASK, UNKNOWN, (int) V32HI_FTYPE_V64QI_V64QI_INT_V32HI_USI) BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx2_mpsadbw_mask, "__builtin_ia32_mpsadbw256_mask", IX86_BUILTIN_VMPSADBW_V16HI_MASK, UNKNOWN, (int) V16HI_FTYPE_V32QI_V32QI_INT_V16HI_UHI) BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_sse4_1_mpsadbw_mask, "__builtin_ia32_mpsadbw128_mask", IX86_BUILTIN_VMPSADBW_V8HI_MASK, UNKNOWN, (int) V8HI_FTYPE_V16QI_V16QI_INT_V8HI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_cvt2ps2phx_v8hf_mask, "__builtin_ia32_vcvt2ps2phx128_mask", IX86_BUILTIN_VCVT2PS2PHX_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V4SF_V4SF_V8HF_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtbiasph2bf8v8hf, "__builtin_ia32_vcvtbiasph2bf8128", IX86_BUILTIN_VCVTBIASPH2BF8128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V8HF) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtbiasph2bf8v8hf_mask, "__builtin_ia32_vcvtbiasph2bf8128_mask", IX86_BUILTIN_VCVTBIASPH2BF8128_MASK, UNKNOWN, (int) V16QI_FTYPE_V16QI_V8HF_V16QI_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtbiasph2bf8v16hf_mask, "__builtin_ia32_vcvtbiasph2bf8256_mask", IX86_BUILTIN_VCVTBIASPH2BF8256_MASK, UNKNOWN, (int) V16QI_FTYPE_V32QI_V16HF_V16QI_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vcvtbiasph2bf8v32hf_mask, "__builtin_ia32_vcvtbiasph2bf8512_mask", IX86_BUILTIN_VCVTBIASPH2BF8512_MASK, UNKNOWN, (int) V32QI_FTYPE_V64QI_V32HF_V32QI_USI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtbiasph2bf8sv8hf, "__builtin_ia32_vcvtbiasph2bf8s128", IX86_BUILTIN_VCVTBIASPH2BF8S128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V8HF) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtbiasph2bf8sv8hf_mask, "__builtin_ia32_vcvtbiasph2bf8s128_mask", IX86_BUILTIN_VCVTBIASPH2BF8S128_MASK, UNKNOWN, (int) V16QI_FTYPE_V16QI_V8HF_V16QI_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtbiasph2bf8sv16hf_mask, "__builtin_ia32_vcvtbiasph2bf8s256_mask", IX86_BUILTIN_VCVTBIASPH2BF8S256_MASK, UNKNOWN, (int) V16QI_FTYPE_V32QI_V16HF_V16QI_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vcvtbiasph2bf8sv32hf_mask, "__builtin_ia32_vcvtbiasph2bf8s512_mask", IX86_BUILTIN_VCVTBIASPH2BF8S512_MASK, UNKNOWN, (int) V32QI_FTYPE_V64QI_V32HF_V32QI_USI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtbiasph2hf8v8hf, "__builtin_ia32_vcvtbiasph2hf8128", IX86_BUILTIN_VCVTBIASPH2HF8128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V8HF) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtbiasph2hf8v8hf_mask, "__builtin_ia32_vcvtbiasph2hf8128_mask", IX86_BUILTIN_VCVTBIASPH2HF8128_MASK, UNKNOWN, (int) V16QI_FTYPE_V16QI_V8HF_V16QI_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtbiasph2hf8v16hf_mask, 
"__builtin_ia32_vcvtbiasph2hf8256_mask", IX86_BUILTIN_VCVTBIASPH2HF8256_MASK, UNKNOWN, (int) V16QI_FTYPE_V32QI_V16HF_V16QI_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vcvtbiasph2hf8v32hf_mask, "__builtin_ia32_vcvtbiasph2hf8512_mask", IX86_BUILTIN_VCVTBIASPH2HF8512_MASK, UNKNOWN, (int) V32QI_FTYPE_V64QI_V32HF_V32QI_USI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtbiasph2hf8sv8hf, "__builtin_ia32_vcvtbiasph2hf8s128", IX86_BUILTIN_VCVTBIASPH2HF8S128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V8HF) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtbiasph2hf8sv8hf_mask, "__builtin_ia32_vcvtbiasph2hf8s128_mask", IX86_BUILTIN_VCVTBIASPH2HF8S128_MASK, UNKNOWN, (int) V16QI_FTYPE_V16QI_V8HF_V16QI_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtbiasph2hf8sv16hf_mask, "__builtin_ia32_vcvtbiasph2hf8s256_mask", IX86_BUILTIN_VCVTBIASPH2HF8S256_MASK, UNKNOWN, (int) V16QI_FTYPE_V32QI_V16HF_V16QI_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vcvtbiasph2hf8sv32hf_mask, "__builtin_ia32_vcvtbiasph2hf8s512_mask", IX86_BUILTIN_VCVTBIASPH2HF8S512_MASK, UNKNOWN, (int) V32QI_FTYPE_V64QI_V32HF_V32QI_USI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtne2ph2bf8v8hf_mask, "__builtin_ia32_vcvtne2ph2bf8128_mask", IX86_BUILTIN_VCVTNE2PH2BF8128_MASK, UNKNOWN, (int) V16QI_FTYPE_V8HF_V8HF_V16QI_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtne2ph2bf8v16hf_mask, "__builtin_ia32_vcvtne2ph2bf8256_mask", IX86_BUILTIN_VCVTNE2PH2BF8256_MASK, UNKNOWN, (int) V32QI_FTYPE_V16HF_V16HF_V32QI_USI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vcvtne2ph2bf8v32hf_mask, "__builtin_ia32_vcvtne2ph2bf8512_mask", IX86_BUILTIN_VCVTNE2PH2BF8512_MASK, UNKNOWN, (int) V64QI_FTYPE_V32HF_V32HF_V64QI_UDI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtne2ph2bf8sv8hf_mask, "__builtin_ia32_vcvtne2ph2bf8s128_mask", IX86_BUILTIN_VCVTNE2PH2BF8S128_MASK, UNKNOWN, (int) V16QI_FTYPE_V8HF_V8HF_V16QI_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtne2ph2bf8sv16hf_mask, "__builtin_ia32_vcvtne2ph2bf8s256_mask", IX86_BUILTIN_VCVTNE2PH2BF8S256_MASK, UNKNOWN, (int) V32QI_FTYPE_V16HF_V16HF_V32QI_USI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vcvtne2ph2bf8sv32hf_mask, "__builtin_ia32_vcvtne2ph2bf8s512_mask", IX86_BUILTIN_VCVTNE2PH2BF8S512_MASK, UNKNOWN, (int) V64QI_FTYPE_V32HF_V32HF_V64QI_UDI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtne2ph2hf8v8hf_mask, "__builtin_ia32_vcvtne2ph2hf8128_mask", IX86_BUILTIN_VCVTNE2PH2HF8128_MASK, UNKNOWN, (int) V16QI_FTYPE_V8HF_V8HF_V16QI_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtne2ph2hf8v16hf_mask, "__builtin_ia32_vcvtne2ph2hf8256_mask", IX86_BUILTIN_VCVTNE2PH2HF8256_MASK, UNKNOWN, (int) V32QI_FTYPE_V16HF_V16HF_V32QI_USI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vcvtne2ph2hf8v32hf_mask, "__builtin_ia32_vcvtne2ph2hf8512_mask", IX86_BUILTIN_VCVTNE2PH2HF8512_MASK, UNKNOWN, (int) V64QI_FTYPE_V32HF_V32HF_V64QI_UDI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtne2ph2hf8sv8hf_mask, "__builtin_ia32_vcvtne2ph2hf8s128_mask", IX86_BUILTIN_VCVTNE2PH2HF8S128_MASK, UNKNOWN, (int) V16QI_FTYPE_V8HF_V8HF_V16QI_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtne2ph2hf8sv16hf_mask, "__builtin_ia32_vcvtne2ph2hf8s256_mask", IX86_BUILTIN_VCVTNE2PH2HF8S256_MASK, UNKNOWN, (int) V32QI_FTYPE_V16HF_V16HF_V32QI_USI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vcvtne2ph2hf8sv32hf_mask, "__builtin_ia32_vcvtne2ph2hf8s512_mask", IX86_BUILTIN_VCVTNE2PH2HF8S512_MASK, UNKNOWN, (int) V64QI_FTYPE_V32HF_V32HF_V64QI_UDI) 
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtneph2bf8v8hf_mask, "__builtin_ia32_vcvtneph2bf8128_mask", IX86_BUILTIN_VCVTNEPH2BF8128_MASK, UNKNOWN, (int) V16QI_FTYPE_V8HF_V16QI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtneph2bf8v16hf_mask, "__builtin_ia32_vcvtneph2bf8256_mask", IX86_BUILTIN_VCVTNEPH2BF8256_MASK, UNKNOWN, (int) V16QI_FTYPE_V16HF_V16QI_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vcvtneph2bf8v32hf_mask, "__builtin_ia32_vcvtneph2bf8512_mask", IX86_BUILTIN_VCVTNEPH2BF8512_MASK, UNKNOWN, (int) V32QI_FTYPE_V32HF_V32QI_USI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtneph2bf8sv8hf_mask, "__builtin_ia32_vcvtneph2bf8s128_mask", IX86_BUILTIN_VCVTNEPH2BF8S128_MASK, UNKNOWN, (int) V16QI_FTYPE_V8HF_V16QI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtneph2bf8sv16hf_mask, "__builtin_ia32_vcvtneph2bf8s256_mask", IX86_BUILTIN_VCVTNEPH2BF8S256_MASK, UNKNOWN, (int) V16QI_FTYPE_V16HF_V16QI_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vcvtneph2bf8sv32hf_mask, "__builtin_ia32_vcvtneph2bf8s512_mask", IX86_BUILTIN_VCVTNEPH2BF8S512_MASK, UNKNOWN, (int) V32QI_FTYPE_V32HF_V32QI_USI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtneph2hf8v8hf_mask, "__builtin_ia32_vcvtneph2hf8128_mask", IX86_BUILTIN_VCVTNEPH2HF8128_MASK, UNKNOWN, (int) V16QI_FTYPE_V8HF_V16QI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtneph2hf8v16hf_mask, "__builtin_ia32_vcvtneph2hf8256_mask", IX86_BUILTIN_VCVTNEPH2HF8256_MASK, UNKNOWN, (int) V16QI_FTYPE_V16HF_V16QI_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vcvtneph2hf8v32hf_mask, "__builtin_ia32_vcvtneph2hf8512_mask", IX86_BUILTIN_VCVTNEPH2HF8512_MASK, UNKNOWN, (int) V32QI_FTYPE_V32HF_V32QI_USI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtneph2hf8sv8hf_mask, "__builtin_ia32_vcvtneph2hf8s128_mask", IX86_BUILTIN_VCVTNEPH2HF8S128_MASK, UNKNOWN, (int) V16QI_FTYPE_V8HF_V16QI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvtneph2hf8sv16hf_mask, "__builtin_ia32_vcvtneph2hf8s256_mask", IX86_BUILTIN_VCVTNEPH2HF8S256_MASK, UNKNOWN, (int) V16QI_FTYPE_V16HF_V16QI_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vcvtneph2hf8sv32hf_mask, "__builtin_ia32_vcvtneph2hf8s512_mask", IX86_BUILTIN_VCVTNEPH2HF8S512_MASK, UNKNOWN, (int) V32QI_FTYPE_V32HF_V32QI_USI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvthf82phv8hf_mask, "__builtin_ia32_vcvthf82ph128_mask", IX86_BUILTIN_VCVTHF82PH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V16QI_V8HF_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvthf82phv16hf_mask, "__builtin_ia32_vcvthf82ph256_mask", IX86_BUILTIN_VCVTHF82PH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16QI_V16HF_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vcvthf82phv32hf_mask, "__builtin_ia32_vcvthf82ph512_mask", IX86_BUILTIN_VCVTHF82PH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32QI_V32HF_USI) /* Builtins with rounding support. 
 */
 BDESC_END (ARGS, ROUND_ARGS)

@@ -3573,6 +3617,8 @@ BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx_sqrtv8sf2_mask_round, "__b
 BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_subv4df3_mask_round, "__builtin_ia32_subpd256_mask_round", IX86_BUILTIN_VSUBPD256_MASK_ROUND, UNKNOWN, (int) V4DF_FTYPE_V4DF_V4DF_V4DF_UQI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_subv16hf3_mask_round, "__builtin_ia32_subph256_mask_round", IX86_BUILTIN_VSUBPH256_MASK_ROUND, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_subv8sf3_mask_round, "__builtin_ia32_subps256_mask_round", IX86_BUILTIN_VSUBPS256_MASK_ROUND, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF_V8SF_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_cvt2ps2phx_v32hf_mask_round, "__builtin_ia32_vcvt2ps2phx512_mask_round", IX86_BUILTIN_VCVT2PS2PHX_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V16SF_V16SF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_cvt2ps2phx_v16hf_mask_round, "__builtin_ia32_vcvt2ps2phx256_mask_round", IX86_BUILTIN_VCVT2PS2PHX_V16HF_MASK_ROUND, UNKNOWN, (int) V16HF_FTYPE_V8SF_V8SF_V16HF_UHI_INT)

 BDESC_END (ROUND_ARGS, MULTI_ARG)

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index f1e6bc11f86..c5305395a64 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -11408,6 +11408,7 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case V16BF_FTYPE_V16SF_UHI:
     case V8BF_FTYPE_V8SF_UQI:
     case V8BF_FTYPE_V4SF_UQI:
+    case V16QI_FTYPE_V16QI_V8HF:
       nargs = 2;
       break;
     case V2DI_FTYPE_V2DI_INT_CONVERT:
@@ -11623,6 +11624,15 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case V16SF_FTYPE_V16SF_V32BF_V32BF:
     case V8SF_FTYPE_V8SF_V16BF_V16BF:
     case V4SF_FTYPE_V4SF_V8BF_V8BF:
+    case V16QI_FTYPE_V16QI_V8HF_V8HF:
+    case V32QI_FTYPE_V32QI_V16HF_V16HF:
+    case V64QI_FTYPE_V64QI_V32HF_V32HF:
+    case V16QI_FTYPE_V8HF_V16QI_UQI:
+    case V16QI_FTYPE_V16HF_V16QI_UHI:
+    case V32QI_FTYPE_V32HF_V32QI_USI:
+    case V8HF_FTYPE_V16QI_V8HF_UQI:
+    case V16HF_FTYPE_V16QI_V16HF_UHI:
+    case V32HF_FTYPE_V32QI_V32HF_USI:
       nargs = 3;
       break;
     case V32QI_FTYPE_V32QI_V32QI_INT:
@@ -11772,6 +11782,15 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case V32BF_FTYPE_V16SF_V16SF_V32BF_USI:
     case V16BF_FTYPE_V8SF_V8SF_V16BF_UHI:
     case V8BF_FTYPE_V4SF_V4SF_V8BF_UQI:
+    case V32HF_FTYPE_V16SF_V16SF_V32HF_USI:
+    case V16HF_FTYPE_V8SF_V8SF_V16HF_UHI:
+    case V8HF_FTYPE_V4SF_V4SF_V8HF_UQI:
+    case V16QI_FTYPE_V8HF_V8HF_V16QI_UHI:
+    case V32QI_FTYPE_V16HF_V16HF_V32QI_USI:
+    case V64QI_FTYPE_V32HF_V32HF_V64QI_UDI:
+    case V16QI_FTYPE_V16QI_V8HF_V16QI_UHI:
+    case V16QI_FTYPE_V32QI_V16HF_V16QI_UHI:
+    case V32QI_FTYPE_V64QI_V32HF_V32QI_USI:
       nargs = 4;
       break;
     case V2DF_FTYPE_V2DF_V2DF_V2DI_INT:
@@ -12525,6 +12544,8 @@ ix86_expand_round_builtin (const struct builtin_description *d,
     case V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT:
     case V8HF_FTYPE_V2DF_V8HF_V8HF_UQI_INT:
     case V8HF_FTYPE_V4SF_V8HF_V8HF_UQI_INT:
+    case V16HF_FTYPE_V8SF_V8SF_V16HF_UHI_INT:
+    case V32HF_FTYPE_V16SF_V16SF_V32HF_USI_INT:
       nargs = 5;
       break;
     case V32HF_FTYPE_V32HF_INT_V32HF_USI_INT:
diff --git a/gcc/config/i386/immintrin.h b/gcc/config/i386/immintrin.h
index ce8437d00c2..fea55a298fc 100644
--- a/gcc/config/i386/immintrin.h
+++ b/gcc/config/i386/immintrin.h
@@ -144,4 +144,8 @@
 #include

+#include <avx10_2convertintrin.h>
+
+#include <avx10_2-512convertintrin.h>
+
 #endif /* _IMMINTRIN_H_INCLUDED */
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 6f76e8f50ad..1d62f96dcc5
100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -216,6 +216,19 @@ ;; For AVX10.2 suppport UNSPEC_VDPPHPS + UNSPEC_VCVTBIASPH2BF8 + UNSPEC_VCVTBIASPH2BF8S + UNSPEC_VCVTBIASPH2HF8 + UNSPEC_VCVTBIASPH2HF8S + UNSPEC_VCVTNE2PH2BF8 + UNSPEC_VCVTNE2PH2BF8S + UNSPEC_VCVTNE2PH2HF8 + UNSPEC_VCVTNE2PH2HF8S + UNSPEC_VCVTNEPH2BF8 + UNSPEC_VCVTNEPH2BF8S + UNSPEC_VCVTNEPH2HF8 + UNSPEC_VCVTNEPH2HF8S + UNSPEC_VCVTHF82PH ]) (define_c_enum "unspecv" [ @@ -483,6 +496,9 @@ [(V32HF "TARGET_EVEX512") (V16HF "TARGET_AVX512VL") (V8HF "TARGET_AVX512VL") (V32BF "TARGET_EVEX512") (V16BF "TARGET_AVX512VL") (V8BF "TARGET_AVX512VL")]) +(define_mode_iterator VHF_AVX10_2 + [(V32HF "TARGET_AVX10_2_512") V16HF V8HF]) + ;; All vector integer modes (define_mode_iterator VI [(V16SI "TARGET_AVX512F && TARGET_EVEX512") @@ -31359,8 +31375,8 @@ (set_attr "mode" "")]) (define_mode_attr bf16_ph - [(V8HF "ph") (V16HF "ph") - (V8BF "bf16") (V16BF "bf16")]) + [(V8HF "ph") (V16HF "ph") (V32HF "ph") + (V8BF "bf16") (V16BF "bf16") (V32BF "bf16")]) (define_insn "vcvtnee2ps_" [(set (match_operand:V4SF 0 "register_operand" "=x") @@ -31418,6 +31434,221 @@ (set_attr "addr" "gpr16") (set_attr "mode" "")]) +(define_insn "avx10_2_cvt2ps2phx_" + [(set (match_operand:VHF_AVX10_2 0 "register_operand" "=v") + (vec_concat:VHF_AVX10_2 + (float_truncate: + (match_operand: 2 "" "")) + (float_truncate: + (match_operand: 1 "register_operand" "v"))))] + "TARGET_AVX10_2_256 && " + "vcvt2ps2phx\t{%2, %1, %0|%0, %1, %2}") + +(define_mode_attr ssebvecmode + [(V8HF "V16QI") (V16HF "V32QI") (V32HF "V64QI")]) + +(define_int_iterator UNSPEC_NECONVERTFP8_PACK + [UNSPEC_VCVTNE2PH2BF8 UNSPEC_VCVTNE2PH2BF8S + UNSPEC_VCVTNE2PH2HF8 UNSPEC_VCVTNE2PH2HF8S]) + +(define_int_attr neconvertfp8_pack + [(UNSPEC_VCVTNE2PH2BF8 "ne2ph2bf8") + (UNSPEC_VCVTNE2PH2BF8S "ne2ph2bf8s") + (UNSPEC_VCVTNE2PH2HF8 "ne2ph2hf8") + (UNSPEC_VCVTNE2PH2HF8S "ne2ph2hf8s")]) + +(define_insn "vcvt" + [(set (match_operand: 0 "register_operand" "=v") + (unspec: + [(match_operand:VHF_AVX10_2 1 "register_operand" "v") + (match_operand:VHF_AVX10_2 2 "nonimmediate_operand" "vm")] + UNSPEC_NECONVERTFP8_PACK))] + "TARGET_AVX10_2_256" + "vcvt\t{%2, %1, %0|%0, %1, %2}" + [(set_attr "prefix" "evex")]) + +(define_mode_attr ssebvecmode_2 + [(V8HF "V16QI") (V16HF "V16QI") (V32HF "V32QI")]) + +(define_int_iterator UNSPEC_VCVTBIASPH2FP8_PACK + [UNSPEC_VCVTBIASPH2BF8 UNSPEC_VCVTBIASPH2BF8S + UNSPEC_VCVTBIASPH2HF8 UNSPEC_VCVTBIASPH2HF8S]) + +(define_int_attr biasph2fp8_pack + [(UNSPEC_VCVTBIASPH2BF8 "biasph2bf8") + (UNSPEC_VCVTBIASPH2BF8S "biasph2bf8s") + (UNSPEC_VCVTBIASPH2HF8 "biasph2hf8") + (UNSPEC_VCVTBIASPH2HF8S "biasph2hf8s")]) + +(define_expand "vcvtv8hf" + [(set (match_operand:V16QI 0 "register_operand") + (vec_concat:V16QI + (unspec:V8QI + [(match_operand:V16QI 1 "register_operand") + (match_operand:V8HF 2 "nonimmediate_operand")] + UNSPEC_VCVTBIASPH2FP8_PACK) + (match_dup 3)))] + "TARGET_AVX10_2_256" + "operands[3] = CONST0_RTX (V8QImode);") + +(define_insn "*vcvtv8hf" + [(set (match_operand:V16QI 0 "register_operand" "=v") + (vec_concat:V16QI + (unspec:V8QI + [(match_operand:V16QI 1 "register_operand" "v") + (match_operand:V8HF 2 "nonimmediate_operand" "vm")] + UNSPEC_VCVTBIASPH2FP8_PACK) + (match_operand:V8QI 3 "const0_operand")))] + "TARGET_AVX10_2_256" + "vcvt\t{%2, %1, %0|%0, %1, %2}" + [(set_attr "prefix" "evex") + (set_attr "mode" "HF")]) + +(define_expand "vcvtv8hf_mask" + [(set (match_operand:V16QI 0 "register_operand") + (vec_concat:V16QI + (vec_merge:V8QI + (unspec:V8QI + 
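+;; The 128-bit converts only produce eight FP8 bytes, so these patterns
+;; build the V16QI destination as a vec_concat of the V8QI payload with
+;; a zero vector; the masked expanders first vec_merge the payload with
+;; a vec_select of the low half of the V16QI merge operand.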
[(match_operand:V16QI 1 "register_operand") + (match_operand:V8HF 2 "nonimmediate_operand")] + UNSPEC_VCVTBIASPH2FP8_PACK) + (vec_select:V8QI + (match_operand:V16QI 3 "nonimm_or_0_operand") + (parallel [(const_int 0) (const_int 1) + (const_int 2) (const_int 3) + (const_int 4) (const_int 5) + (const_int 6) (const_int 7)])) + (match_operand:QI 4 "register_operand" "C")) + (match_dup 5)))] + "TARGET_AVX10_2_256" + "operands[5] = CONST0_RTX (V8QImode);") + +(define_insn "*vcvtv8hf_mask" + [(set (match_operand:V16QI 0 "register_operand" "=v") + (vec_concat:V16QI + (vec_merge:V8QI + (unspec:V8QI + [(match_operand:V16QI 1 "register_operand" "v") + (match_operand:V8HF 2 "nonimmediate_operand" "vm")] + UNSPEC_VCVTBIASPH2FP8_PACK) + (vec_select:V8QI + (match_operand:V16QI 3 "nonimm_or_0_operand" "0C") + (parallel [(const_int 0) (const_int 1) + (const_int 2) (const_int 3) + (const_int 4) (const_int 5) + (const_int 6) (const_int 7)])) + (match_operand:QI 4 "register_operand" "Yk")) + (match_operand:V8QI 5 "const0_operand")))] + "TARGET_AVX10_2_256" + "vcvt\t{%2, %1, %0%{%4%}%N3|%0%{%4%}%N3, %1, %2}" + [(set_attr "prefix" "evex")]) + +(define_mode_iterator VHF_AVX10_2_2 + [(V32HF "TARGET_AVX10_2_512") V16HF]) + +(define_insn "vcvt" + [(set (match_operand: 0 "register_operand" "=v") + (unspec: + [(match_operand: 1 "register_operand" "v") + (match_operand:VHF_AVX10_2_2 2 "nonimmediate_operand" "vm")] + UNSPEC_VCVTBIASPH2FP8_PACK))] + "TARGET_AVX10_2_256" + "vcvt\t{%2, %1, %0|%0, %1, %2}" + [(set_attr "prefix" "evex")]) + +(define_mode_iterator VHF_256_512 + [V16HF (V32HF "TARGET_AVX10_2_512")]) + +(define_mode_attr ph2fp8suff + [(V32HF "") (V16HF "{y}") (V8HF "{x}")]) + +(define_int_iterator UNSPEC_NECONVERTPH2FP8 + [UNSPEC_VCVTNEPH2BF8 UNSPEC_VCVTNEPH2BF8S + UNSPEC_VCVTNEPH2HF8 UNSPEC_VCVTNEPH2HF8S]) + +(define_int_attr neconvertph2fp8 + [(UNSPEC_VCVTNEPH2BF8 "neph2bf8") + (UNSPEC_VCVTNEPH2BF8S "neph2bf8s") + (UNSPEC_VCVTNEPH2HF8 "neph2hf8") + (UNSPEC_VCVTNEPH2HF8S "neph2hf8s")]) + +(define_expand "vcvtv8hf" + [(set (match_operand:V16QI 0 "register_operand") + (vec_concat:V16QI + (unspec:V8QI + [(match_operand:V8HF 1 "nonimmediate_operand")] + UNSPEC_NECONVERTPH2FP8) + (match_dup 2)))] + "TARGET_AVX10_2_256" + "operands[2] = CONST0_RTX (V8QImode);") + +(define_insn "*vcvtv8hf" + [(set (match_operand:V16QI 0 "register_operand" "=v") + (vec_concat:V16QI + (unspec:V8QI + [(match_operand:V8HF 1 "nonimmediate_operand" "vm")] + UNSPEC_NECONVERTPH2FP8) + (match_operand:V8QI 2 "const0_operand")))] + "TARGET_AVX10_2_256" + "vcvt{x}\t{%1, %0|%0, %1}" + [(set_attr "prefix" "evex") + (set_attr "mode" "HF")]) + +(define_expand "vcvtv8hf_mask" + [(set (match_operand:V16QI 0 "register_operand") + (vec_concat:V16QI + (vec_merge:V8QI + (unspec:V8QI + [(match_operand:V8HF 1 "nonimmediate_operand")] + UNSPEC_NECONVERTPH2FP8) + (vec_select:V8QI + (match_operand:V16QI 2 "nonimm_or_0_operand") + (parallel [(const_int 0) (const_int 1) + (const_int 2) (const_int 3) + (const_int 4) (const_int 5) + (const_int 6) (const_int 7)])) + (match_operand:QI 3 "register_operand")) + (match_dup 4)))] + "TARGET_AVX10_2_256" + "operands[4] = CONST0_RTX (V8QImode);") + +(define_insn "*vcvtv8hf_mask" + [(set (match_operand:V16QI 0 "register_operand" "=v") + (vec_concat:V16QI + (vec_merge:V8QI + (unspec:V8QI + [(match_operand:V8HF 1 "nonimmediate_operand" "vm")] + UNSPEC_NECONVERTPH2FP8) + (vec_select:V8QI + (match_operand:V16QI 2 "nonimm_or_0_operand" "0C") + (parallel [(const_int 0) (const_int 1) + (const_int 2) (const_int 3) + (const_int 4) 
(const_int 5) + (const_int 6) (const_int 7)])) + (match_operand:QI 3 "register_operand" "Yk")) + (match_operand:V8QI 4 "const0_operand")))] + "TARGET_AVX10_2_256" + "vcvt{x}\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}" + [(set_attr "prefix" "evex")]) + +(define_insn "vcvt" + [(set (match_operand: 0 "register_operand" "=v") + (unspec: + [(match_operand:VHF_256_512 1 "nonimmediate_operand" "vm")] + UNSPEC_NECONVERTPH2FP8))] + "TARGET_AVX10_2_256" + "vcvt\t{%1, %0|%0, %1}" + [(set_attr "prefix" "evex")]) + +(define_insn "vcvthf82ph" + [(set (match_operand:VHF_AVX10_2 0 "register_operand" "=v") + (unspec:VHF_AVX10_2 + [(match_operand: 1 "nonimmediate_operand" "vm")] + UNSPEC_VCVTHF82PH))] + "TARGET_AVX10_2_256" + "vcvthf82ph\t{%1, %0|%0, %1}" + [(set_attr "prefix" "evex")]) + (define_int_iterator VPDPWPROD [UNSPEC_VPDPWUSD UNSPEC_VPDPWUSDS diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c index 5fc84234b57..4a47e313096 100644 --- a/gcc/testsuite/gcc.target/i386/avx-1.c +++ b/gcc/testsuite/gcc.target/i386/avx-1.c @@ -1010,6 +1010,12 @@ #define __builtin_ia32_mpsadbw128_mask(A, B, C, D, E) __builtin_ia32_mpsadbw128_mask (A, B, 1, D, E) #define __builtin_ia32_mpsadbw256_mask(A, B, C, D, E) __builtin_ia32_mpsadbw256_mask (A, B, 1, D, E) +/* avx10_2convertintrin.h */ +#define __builtin_ia32_vcvt2ps2phx256_mask_round(A, B, C, D, E) __builtin_ia32_vcvt2ps2phx256_mask_round(A, B, C, D, 8) + +/* avx10_2-512convertintrin.h */ +#define __builtin_ia32_vcvt2ps2phx512_mask_round(A, B, C, D, E) __builtin_ia32_vcvt2ps2phx512_mask_round(A, B, C, D, 8) + #include #include #include diff --git a/gcc/testsuite/gcc.target/i386/avx-2.c b/gcc/testsuite/gcc.target/i386/avx-2.c index fb0ef9e2aa5..3f4d7353c62 100644 --- a/gcc/testsuite/gcc.target/i386/avx-2.c +++ b/gcc/testsuite/gcc.target/i386/avx-2.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -msse4a -maes -mpclmul -mavx10.2-512" } */ +/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -msse4a -maes -mpclmul" } */ /* { dg-add-options bind_pic_locally } */ #include @@ -160,4 +160,3 @@ test_2 (_m_pinsrw, __m64, __m64, int, 1) test_1 (_mm_shuffle_pi16, __m64, __m64, 1) test_1 (_m_pshufw, __m64, __m64, 1) test_1 (_mm_prefetch, void, void *, _MM_HINT_NTA) - diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-convert-1.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-convert-1.c new file mode 100644 index 00000000000..bbbff186d0a --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-convert-1.c @@ -0,0 +1,176 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx10.2-512 -O2" } */ +/* { dg-final { scan-assembler-times "vcvt2ps2phx\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvt2ps2phx\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvt2ps2phx\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvt2ps2phx\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvt2ps2phx\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { 
scan-assembler-times "vcvt2ps2phx\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtbiasph2bf8\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtbiasph2bf8\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtbiasph2bf8\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtbiasph2bf8s\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtbiasph2bf8s\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtbiasph2bf8s\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtbiasph2hf8\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtbiasph2hf8\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtbiasph2hf8\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtbiasph2hf8s\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtbiasph2hf8s\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtbiasph2hf8s\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtne2ph2bf8\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtne2ph2bf8\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtne2ph2bf8\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtne2ph2bf8s\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtne2ph2bf8s\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtne2ph2bf8s\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtne2ph2hf8\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtne2ph2hf8\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { 
scan-assembler-times "vcvtne2ph2hf8\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtne2ph2hf8s\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtne2ph2hf8s\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtne2ph2hf8s\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvthf82ph\[ \\t\]*%ymm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvthf82ph\[ \\t\]*%ymm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvthf82ph\[ \\t\]*%ymm\[0-9\]+,\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtneph2bf8\[ \\t\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtneph2bf8\[ \\t\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtneph2bf8\[ \\t\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtneph2bf8s\[ \\t\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtneph2bf8s\[ \\t\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtneph2bf8s\[ \\t\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtneph2hf8\[ \\t\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtneph2hf8\[ \\t\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtneph2hf8\[ \\t\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtneph2hf8s\[ \\t\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtneph2hf8s\[ \\t\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtneph2hf8s\[ \\t\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m256i x256i; +volatile __m512i x512i; +volatile __m512 x, a1, b1; +volatile __m512h y, x512h; +volatile __mmask16 m16; +volatile __mmask32 m32; +volatile __mmask64 m64; +const void *a; +__m512bh *c; +__m512h *d; + +void extern +avx10_2_512_test (void) +{ + y = _mm512_cvtx2ps_ph (a1, b1); + y = _mm512_mask_cvtx2ps_ph (y, m32, a1, b1); + y = _mm512_maskz_cvtx2ps_ph (m32, a1, b1); + + y = _mm512_cvtx_round2ps_ph (a1, b1, 8); + y = _mm512_mask_cvtx_round2ps_ph (y, m32, a1, b1, 8); + y = _mm512_maskz_cvtx_round2ps_ph (m32, a1, b1, 8); +} + +void extern +avx10_2_512_vcvtbiasph2bf8_test (void) +{ + x256i = _mm512_cvtbiasph_pbf8 (x512i, x512h); + x256i = _mm512_mask_cvtbiasph_pbf8 (x256i, m32, x512i, x512h); + x256i = _mm512_maskz_cvtbiasph_pbf8 (m32, x512i, x512h); +} + +void extern +avx10_2_512_vcvtbiasph2bf8s_test (void) +{ + x256i = _mm512_cvtbiassph_pbf8 
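+  /* Saturating bias form: the low byte of each 16-bit lane of x512i is
+     taken as the conversion bias for the matching FP16 element of x512h
+     (cf. the src1[2 * i] indexing in the run-time tests further down).  */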
(x512i, x512h); + x256i = _mm512_mask_cvtbiassph_pbf8 (x256i, m32, x512i, x512h); + x256i = _mm512_maskz_cvtbiassph_pbf8 (m32, x512i, x512h); +} + +void extern +avx10_2_512_vcvtbiasph2hf8_test (void) +{ + x256i = _mm512_cvtbiasph_phf8 (x512i, x512h); + x256i = _mm512_mask_cvtbiasph_phf8 (x256i, m32, x512i, x512h); + x256i = _mm512_maskz_cvtbiasph_phf8 (m32, x512i, x512h); +} + +void extern +avx10_2_512_vcvtbiasph2hf8s_test (void) +{ + x256i = _mm512_cvtbiassph_phf8 (x512i, x512h); + x256i = _mm512_mask_cvtbiassph_phf8 (x256i, m32, x512i, x512h); + x256i = _mm512_maskz_cvtbiassph_phf8 (m32, x512i, x512h); +} + +void extern +avx10_2_512_vcvtne2ph2bf8_test (void) +{ + x512i = _mm512_cvtne2ph_pbf8 (x512h, x512h); + x512i = _mm512_mask_cvtne2ph_pbf8 (x512i, m64, x512h, x512h); + x512i = _mm512_maskz_cvtne2ph_pbf8 (m64, x512h, x512h); +} + +void extern +avx10_2_512_vcvtne2ph2bf8s_test (void) +{ + x512i = _mm512_cvtnes2ph_pbf8 (x512h, x512h); + x512i = _mm512_mask_cvtnes2ph_pbf8 (x512i, m64, x512h, x512h); + x512i = _mm512_maskz_cvtnes2ph_pbf8 (m64, x512h, x512h); +} + +void extern +avx10_2_512_vcvtne2ph2hf8_test (void) +{ + x512i = _mm512_cvtne2ph_phf8 (x512h, x512h); + x512i = _mm512_mask_cvtne2ph_phf8 (x512i, m64, x512h, x512h); + x512i = _mm512_maskz_cvtne2ph_phf8 (m64, x512h, x512h); +} + +void extern +avx10_2_512_vcvtne2ph2hf8s_test (void) +{ + x512i = _mm512_cvtnes2ph_phf8 (x512h, x512h); + x512i = _mm512_mask_cvtnes2ph_phf8 (x512i, m64, x512h, x512h); + x512i = _mm512_maskz_cvtnes2ph_phf8 (m64, x512h, x512h); +} + +void extern +avx10_2_512_vcvthf82ph_test (void) +{ + x512h = _mm512_cvthf8_ph (x256i); + x512h = _mm512_mask_cvthf8_ph (x512h, m32, x256i); + x512h = _mm512_maskz_cvthf8_ph (m32, x256i); +} + +void extern +avx10_2_512_vcvtneph2bf8_test (void) +{ + x256i = _mm512_cvtneph_pbf8 (x512h); + x256i = _mm512_mask_cvtneph_pbf8 (x256i, m32, x512h); + x256i = _mm512_maskz_cvtneph_pbf8 (m32, x512h); +} + +void extern +avx10_2_512_vcvtneph2bf8s_test (void) +{ + x256i = _mm512_cvtnesph_pbf8 (x512h); + x256i = _mm512_mask_cvtnesph_pbf8 (x256i, m32, x512h); + x256i = _mm512_maskz_cvtnesph_pbf8 (m32, x512h); +} + +void extern +avx10_2_512_vcvtneph2hf8_test (void) +{ + x256i = _mm512_cvtneph_phf8 (x512h); + x256i = _mm512_mask_cvtneph_phf8 (x256i, m32, x512h); + x256i = _mm512_maskz_cvtneph_phf8 (m32, x512h); +} + +void extern +avx10_2_512_vcvtneph2hf8s_test (void) +{ + x256i = _mm512_cvtnesph_phf8 (x512h); + x256i = _mm512_mask_cvtnesph_phf8 (x256i, m32, x512h); + x256i = _mm512_maskz_cvtnesph_phf8 (m32, x512h); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvt2ps2phx-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvt2ps2phx-2.c new file mode 100644 index 00000000000..40dbe18abbe --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvt2ps2phx-2.c @@ -0,0 +1,51 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif +#include "avx10-helper.h" +#include + +#define SIZE_RES (AVX512F_LEN / 16) + +static void +CALC (_Float16 *res_ref, float *src1, float *src2) +{ + float fp32; + int i; + for (i = 0; i < SIZE_RES / 2; i++) + { + fp32 = (float) 2 * i + 7 + i * 0.5; + res_ref[i] = fp32; + src2[i] = fp32; + } + for (i = SIZE_RES / 2; i < SIZE_RES; i++) + { + fp32 = (float)2 * i + 7 + i * 0.5; + res_ref[i] = fp32; + src1[i - (SIZE_RES / 2)] = fp32; + } +} + +void +TEST (void) +{ + int i; + UNION_TYPE (AVX512F_LEN, h) res1; + 
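+  /* The avx10-helper.h unions pair a vector (.x) with a scalar array
+     view (.a): the "h" variant holds _Float16 elements and the plain
+     one holds float, letting CALC fill the sources and the reference
+     result element by element.  */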
UNION_TYPE (AVX512F_LEN, ) src1, src2; + _Float16 res_ref[SIZE_RES]; + float fp32; + + for (i = 0; i < SIZE_RES; i++) + res1.a[i] = 5; + + CALC (res_ref, src1.a, src2.a); + + res1.x = INTRINSIC (_cvtx2ps_ph) (src1.x, src2.x); + if (UNION_CHECK (AVX512F_LEN, h) (res1, res_ref)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtbiasph2bf8-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtbiasph2bf8-2.c new file mode 100644 index 00000000000..9ce3c9059f1 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtbiasph2bf8-2.c @@ -0,0 +1,59 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif + +#include "avx10-helper.h" +#include "fp8-helper.h" + +#define SRC_F8_I8 (AVX512F_LEN / 8) +#define SRC_F16 (AVX512F_LEN / 16) +#define DST_F8_I8 (AVX512F_LEN_HALF / 8) +#define DST_F16 (AVX512F_LEN_HALF / 16) + +void +CALC (unsigned char *r, char *src1, _Float16 *src2) +{ + int i, hf8_bf8, saturate; + + hf8_bf8 = 1; + saturate = 0; + + for (i = 0; i < DST_F8_I8; i++) + { + Float16Union usrc = {.f16 = src2[i]}; + r[i] = convert_fp16_to_fp8(usrc.f16, src1[2 * i], hf8_bf8, saturate); + } + + if (AVX512F_LEN == 128) + for (i = DST_F16; i < DST_F8_I8; i++) + r[i] = 0; +} + +void +TEST (void) +{ + int i,sign; + UNION_TYPE (AVX512F_LEN_HALF, i_b) res; + UNION_TYPE (AVX512F_LEN, i_b) src1; + UNION_TYPE (AVX512F_LEN, h) src2; + unsigned char res_ref[DST_F8_I8]; + + sign = 1; + for (i = 0; i < SRC_F16; i++) + { + src2.a[i] = (_Float16)(sign * (2.5 * (1 << (i % 3)))); + sign = -sign; + } + + res.x = INTRINSIC (_cvtbiasph_pbf8) (src1.x, src2.x); + CALC(res_ref, src1.a, src2.a); + + if (UNION_CHECK (AVX512F_LEN_HALF, i_b) (res, res_ref)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtbiasph2bf8s-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtbiasph2bf8s-2.c new file mode 100644 index 00000000000..5e33b8dc498 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtbiasph2bf8s-2.c @@ -0,0 +1,59 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif + +#include "avx10-helper.h" +#include "fp8-helper.h" + +#define SRC_F8_I8 (AVX512F_LEN / 8) +#define SRC_F16 (AVX512F_LEN / 16) +#define DST_F8_I8 (AVX512F_LEN_HALF / 8) +#define DST_F16 (AVX512F_LEN_HALF / 16) + +void +CALC (unsigned char *r, char *src1, _Float16 *src2) +{ + int i, hf8_bf8, saturate; + + hf8_bf8 = 1; + saturate = 1; + + for (i = 0; i < DST_F8_I8; i++) + { + Float16Union usrc = {.f16 = src2[i]}; + r[i] = convert_fp16_to_fp8(usrc.f16, src1[2 * i], hf8_bf8, saturate); + } + + if (AVX512F_LEN == 128) + for (i = DST_F16; i < DST_F8_I8; i++) + r[i] = 0; +} + +void +TEST (void) +{ + int i,sign; + UNION_TYPE (AVX512F_LEN_HALF, i_b) res; + UNION_TYPE (AVX512F_LEN, i_b) src1; + UNION_TYPE (AVX512F_LEN, h) src2; + unsigned char res_ref[DST_F8_I8]; + + sign = 1; + for (i = 0; i < SRC_F16; i++) + { + src2.a[i] = (_Float16)(sign * (2.5 * (1 << (i % 3)))); + sign = -sign; + } + + res.x = INTRINSIC (_cvtbiassph_pbf8) (src1.x, src2.x); + CALC(res_ref, src1.a, src2.a); + + if (UNION_CHECK (AVX512F_LEN_HALF, i_b) (res, res_ref)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtbiasph2hf8-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtbiasph2hf8-2.c new file mode 100644 index 
00000000000..96d1a33adcd --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtbiasph2hf8-2.c @@ -0,0 +1,59 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif + +#include "avx10-helper.h" +#include "fp8-helper.h" + +#define SRC_F8_I8 (AVX512F_LEN / 8) +#define SRC_F16 (AVX512F_LEN / 16) +#define DST_F8_I8 (AVX512F_LEN_HALF / 8) +#define DST_F16 (AVX512F_LEN_HALF / 16) + +void +CALC (unsigned char *r, char *src1, _Float16 *src2) +{ + int i, hf8_bf8, saturate; + + hf8_bf8 = 0; + saturate = 0; + + for (i = 0; i < DST_F8_I8; i++) + { + Float16Union usrc = {.f16 = src2[i]}; + r[i] = convert_fp16_to_fp8(usrc.f16, src1[2 * i], hf8_bf8, saturate); + } + + if (AVX512F_LEN == 128) + for (i = DST_F16; i < DST_F8_I8; i++) + r[i] = 0; +} + +void +TEST (void) +{ + int i,sign; + UNION_TYPE (AVX512F_LEN_HALF, i_b) res; + UNION_TYPE (AVX512F_LEN, i_b) src1; + UNION_TYPE (AVX512F_LEN, h) src2; + unsigned char res_ref[DST_F8_I8]; + + sign = 1; + for (i = 0; i < SRC_F16; i++) + { + src2.a[i] = (_Float16)(sign * (2.5 * (1 << (i % 3)))); + sign = -sign; + } + + res.x = INTRINSIC (_cvtbiasph_phf8) (src1.x, src2.x); + CALC(res_ref, src1.a, src2.a); + + if (UNION_CHECK (AVX512F_LEN_HALF, i_b) (res, res_ref)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtbiasph2hf8s-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtbiasph2hf8s-2.c new file mode 100644 index 00000000000..e66b952a45e --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtbiasph2hf8s-2.c @@ -0,0 +1,59 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif + +#include "avx10-helper.h" +#include "fp8-helper.h" + +#define SRC_F8_I8 (AVX512F_LEN / 8) +#define SRC_F16 (AVX512F_LEN / 16) +#define DST_F8_I8 (AVX512F_LEN_HALF / 8) +#define DST_F16 (AVX512F_LEN_HALF / 16) + +void +CALC (unsigned char *r, char *src1, _Float16 *src2) +{ + int i, hf8_bf8, saturate; + + hf8_bf8 = 0; + saturate = 1; + + for (i = 0; i < DST_F8_I8; i++) + { + Float16Union usrc = {.f16 = src2[i]}; + r[i] = convert_fp16_to_fp8(usrc.f16, src1[2 * i], hf8_bf8, saturate); + } + + if (AVX512F_LEN == 128) + for (i = DST_F16; i < DST_F8_I8; i++) + r[i] = 0; +} + +void +TEST (void) +{ + int i,sign; + UNION_TYPE (AVX512F_LEN_HALF, i_b) res; + UNION_TYPE (AVX512F_LEN, i_b) src1; + UNION_TYPE (AVX512F_LEN, h) src2; + unsigned char res_ref[DST_F8_I8]; + + sign = 1; + for (i = 0; i < SRC_F16; i++) + { + src2.a[i] = (_Float16)(sign * (2.5 * (1 << (i % 3)))); + sign = -sign; + } + + res.x = INTRINSIC (_cvtbiassph_phf8) (src1.x, src2.x); + CALC(res_ref, src1.a, src2.a); + + if (UNION_CHECK (AVX512F_LEN_HALF, i_b) (res, res_ref)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvthf82ph-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvthf82ph-2.c new file mode 100644 index 00000000000..6b9f07ff86a --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvthf82ph-2.c @@ -0,0 +1,45 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif + +#include "avx10-helper.h" +#include "fp8-helper.h" + +#define SIZE_SRC (AVX512F_LEN_HALF / 8) +#define SIZE_RES (AVX512F_LEN / 16) + +void 
+CALC (_Float16 *r, unsigned char *s) +{ + int i; + for (i = 0; i < SIZE_RES; i++) + r[i] = convert_hf8_to_fp16(s[i]); +} + +void +TEST (void) +{ + int i,sign; + UNION_TYPE (AVX512F_LEN, h) res; + UNION_TYPE (AVX512F_LEN_HALF, i_b) src; + _Float16 res_ref[SIZE_RES]; + + sign = 1; + for (i = 0; i < SIZE_SRC; i++) + { + src.a[i] = sign * (2.5 * (1 << (i % 3))); + sign = -sign; + } + + res.x = INTRINSIC (_cvthf8_ph) (src.x); + CALC(res_ref, src.a); + + if (UNION_ROUGH_CHECK (AVX512F_LEN, h) (res, res_ref, 0.0009765625)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2bf8-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2bf8-2.c new file mode 100644 index 00000000000..96fa7c1634d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2bf8-2.c @@ -0,0 +1,65 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif + +#include "avx10-helper.h" +#include "fp8-helper.h" + +#define SIZE_SRC (AVX512F_LEN / 16) +#define SIZE_RES (AVX512F_LEN / 8) + +void +CALC (unsigned char *r, _Float16 *s1, _Float16 *s2) +{ + _Float16 temp; + Float16Union ut = {.f16 = temp}; + int i, hf8_bf8, saturate; + + hf8_bf8 = 1; + saturate = 0; + + for (i = 0; i < SIZE_RES; i++) + { + r[i] = 0; + if (i < SIZE_SRC) + { + Float16Union usrc2 = {.f16 = s2[i]}; + ut.u16 = usrc2.u16; + } + else + { + Float16Union usrc1 = {.f16 = s1[i-SIZE_SRC]}; + ut.u16 = usrc1.u16; + } + r[i] = convert_fp16_to_fp8(ut.f16, 0, hf8_bf8, saturate); + } +} + +void +TEST (void) +{ + int i,sign; + UNION_TYPE (AVX512F_LEN, i_b) res; + UNION_TYPE (AVX512F_LEN, h) src1, src2; + unsigned char res_ref[SIZE_RES]; + + sign = 1; + for (i = 0; i < SIZE_SRC; i++) + { + src1.a[i] = (_Float16)(sign * (1.5 * (1 << (i % 3)))); + src2.a[i] = (_Float16)(-sign * (2.5 * (1 << (i % 3)))); + sign = -sign; + } + + res.x = INTRINSIC (_cvtne2ph_pbf8) (src1.x, src2.x); + CALC(res_ref, src1.a, src2.a); + + if (UNION_CHECK (AVX512F_LEN, i_b) (res, res_ref)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2bf8s-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2bf8s-2.c new file mode 100644 index 00000000000..cead411e178 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2bf8s-2.c @@ -0,0 +1,65 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif + +#include "avx10-helper.h" +#include "fp8-helper.h" + +#define SIZE_SRC (AVX512F_LEN / 16) +#define SIZE_RES (AVX512F_LEN / 8) + +void +CALC (unsigned char *r, _Float16 *s1, _Float16 *s2) +{ + _Float16 temp; + Float16Union ut = {.f16 = temp}; + int i, hf8_bf8, saturate; + + hf8_bf8 = 1; + saturate = 1; + + for (i = 0; i < SIZE_RES; i++) + { + r[i] = 0; + if (i < SIZE_SRC) + { + Float16Union usrc2 = {.f16 = s2[i]}; + ut.u16 = usrc2.u16; + } + else + { + Float16Union usrc1 = {.f16 = s1[i-SIZE_SRC]}; + ut.u16 = usrc1.u16; + } + r[i] = convert_fp16_to_fp8(ut.f16, 0, hf8_bf8, saturate); + } +} + +void +TEST (void) +{ + int i,sign; + UNION_TYPE (AVX512F_LEN, i_b) res; + UNION_TYPE (AVX512F_LEN, h) src1, src2; + unsigned char res_ref[SIZE_RES]; + + sign = 1; + for (i = 0; i < SIZE_SRC; i++) + { + src1.a[i] = (_Float16)(sign * (1.5 * (1 << (i % 3)))); + src2.a[i] = (_Float16)(-sign * (2.5 * (1 << (i % 3)))); + sign = -sign; + } + + res.x = 
INTRINSIC (_cvtnes2ph_pbf8) (src1.x, src2.x); + CALC(res_ref, src1.a, src2.a); + + if (UNION_CHECK (AVX512F_LEN, i_b) (res, res_ref)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2hf8-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2hf8-2.c new file mode 100644 index 00000000000..6887b4085f5 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2hf8-2.c @@ -0,0 +1,65 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif + +#include "avx10-helper.h" +#include "fp8-helper.h" + +#define SIZE_SRC (AVX512F_LEN / 16) +#define SIZE_RES (AVX512F_LEN / 8) + +void +CALC (unsigned char *r, _Float16 *s1, _Float16 *s2) +{ + _Float16 temp; + Float16Union ut = {.f16 = temp}; + int i, hf8_bf8, saturate; + + hf8_bf8 = 0; + saturate = 0; + + for (i = 0; i < SIZE_RES; i++) + { + r[i] = 0; + if (i < SIZE_SRC) + { + Float16Union usrc2 = {.f16 = s2[i]}; + ut.u16 = usrc2.u16; + } + else + { + Float16Union usrc1 = {.f16 = s1[i-SIZE_SRC]}; + ut.u16 = usrc1.u16; + } + r[i] = convert_fp16_to_fp8(ut.f16, 0, hf8_bf8, saturate); + } +} + +void +TEST (void) +{ + int i,sign; + UNION_TYPE (AVX512F_LEN, i_b) res; + UNION_TYPE (AVX512F_LEN, h) src1, src2; + unsigned char res_ref[SIZE_RES]; + + sign = 1; + for (i = 0; i < SIZE_SRC; i++) + { + src1.a[i] = (_Float16)(sign * (1.5 * (1 << (i % 3)))); + src2.a[i] = (_Float16)(-sign * (2.5 * (1 << (i % 3)))); + sign = -sign; + } + + res.x = INTRINSIC (_cvtne2ph_phf8) (src1.x, src2.x); + CALC(res_ref, src1.a, src2.a); + + if (UNION_CHECK (AVX512F_LEN, i_b) (res, res_ref)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2hf8s-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2hf8s-2.c new file mode 100644 index 00000000000..6637d5e726f --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2hf8s-2.c @@ -0,0 +1,65 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif + +#include "avx10-helper.h" +#include "fp8-helper.h" + +#define SIZE_SRC (AVX512F_LEN / 16) +#define SIZE_RES (AVX512F_LEN / 8) + +void +CALC (unsigned char *r, _Float16 *s1, _Float16 *s2) +{ + _Float16 temp; + Float16Union ut = {.f16 = temp}; + int i, hf8_bf8, saturate; + + hf8_bf8 = 0; + saturate = 1; + + for (i = 0; i < SIZE_RES; i++) + { + r[i] = 0; + if (i < SIZE_SRC) + { + Float16Union usrc2 = {.f16 = s2[i]}; + ut.u16 = usrc2.u16; + } + else + { + Float16Union usrc1 = {.f16 = s1[i-SIZE_SRC]}; + ut.u16 = usrc1.u16; + } + r[i] = convert_fp16_to_fp8(ut.f16, 0, hf8_bf8, saturate); + } +} + +void +TEST (void) +{ + int i,sign; + UNION_TYPE (AVX512F_LEN, i_b) res; + UNION_TYPE (AVX512F_LEN, h) src1, src2; + unsigned char res_ref[SIZE_RES]; + + sign = 1; + for (i = 0; i < SIZE_SRC; i++) + { + src1.a[i] = (_Float16)(sign * (1.5 * (1 << (i % 3)))); + src2.a[i] = (_Float16)(-sign * (2.5 * (1 << (i % 3)))); + sign *= -1; + } + + res.x = INTRINSIC (_cvtnes2ph_phf8) (src1.x, src2.x); + CALC(res_ref, src1.a, src2.a); + + if (UNION_CHECK (AVX512F_LEN, i_b) (res, res_ref)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtneph2bf8-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtneph2bf8-2.c new file mode 100644 index 00000000000..253b8424ee2 --- /dev/null +++ 
b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtneph2bf8-2.c @@ -0,0 +1,58 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#define AVX512F_LEN 512 +#define AVX512F_LEN_HALF 256 +#endif + +#include "avx10-helper.h" +#include "fp8-helper.h" + +#define SIZE_SRC (AVX512F_LEN / 16) +#define SIZE_RES (AVX512F_LEN_HALF / 8) + +void +CALC (unsigned char *r, _Float16 *s) +{ + int i, hf8_bf8, saturate; + + hf8_bf8 = 1; + saturate = 0; + + for (i = 0; i < SIZE_RES; i++) + { + r[i] = 0; + if (i < SIZE_SRC) + { + Float16Union usrc = {.f16 = s[i]}; + r[i] = convert_fp16_to_fp8(usrc.f16, 0, hf8_bf8, saturate); + } + } +} + +void +TEST (void) +{ + int i,sign; + UNION_TYPE (AVX512F_LEN_HALF, i_b) res; + UNION_TYPE (AVX512F_LEN, h) src; + unsigned char res_ref[SIZE_RES]; + + sign = 1; + for (i = 0; i < SIZE_SRC; i++) + { + src.a[i] = (_Float16)(sign * (2.5 * (1 << (i % 3)))); + sign = -sign; + } + + res.x = INTRINSIC (_cvtneph_pbf8) (src.x); + CALC(res_ref, src.a); + + if (UNION_CHECK (AVX512F_LEN_HALF, i_b) (res, res_ref)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtneph2bf8s-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtneph2bf8s-2.c new file mode 100644 index 00000000000..b7f9944f1c9 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtneph2bf8s-2.c @@ -0,0 +1,56 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif + +#include "avx10-helper.h" +#include "fp8-helper.h" + +#define SIZE_SRC (AVX512F_LEN / 16) +#define SIZE_RES (AVX512F_LEN_HALF / 8) + +void +CALC (unsigned char *r, _Float16 *s) +{ + int i, hf8_bf8, saturate; + + hf8_bf8 = 1; + saturate = 1; + + for (i = 0; i < SIZE_RES; i++) + { + r[i] = 0; + if (i < SIZE_SRC) + { + Float16Union usrc = {.f16 = s[i]}; + r[i] = convert_fp16_to_fp8(usrc.f16, 0, hf8_bf8, saturate); + } + } +} + +void +TEST (void) +{ + int i,sign; + UNION_TYPE (AVX512F_LEN_HALF, i_b) res; + UNION_TYPE (AVX512F_LEN, h) src; + unsigned char res_ref[SIZE_RES]; + + sign = 1; + for (i = 0; i < SIZE_SRC; i++) + { + src.a[i] = (_Float16)(sign * (2.5 * (1 << (i % 3)))); + sign = -sign; + } + + res.x = INTRINSIC (_cvtnesph_pbf8) (src.x); + CALC(res_ref, src.a); + + if (UNION_CHECK (AVX512F_LEN_HALF, i_b) (res, res_ref)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtneph2hf8-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtneph2hf8-2.c new file mode 100644 index 00000000000..75f1292a33c --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtneph2hf8-2.c @@ -0,0 +1,56 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif + +#include "avx10-helper.h" +#include "fp8-helper.h" + +#define SIZE_SRC (AVX512F_LEN / 16) +#define SIZE_RES (AVX512F_LEN_HALF / 8) + +void +CALC (unsigned char *r, _Float16 *s) +{ + int i, hf8_bf8, saturate; + + hf8_bf8 = 0; + saturate = 0; + + for (i = 0; i < SIZE_RES; i++) + { + r[i] = 0; + if (i < SIZE_SRC) + { + Float16Union usrc = {.f16 = s[i]}; + r[i] = convert_fp16_to_fp8(usrc.f16, 0, hf8_bf8, saturate); + } + } +} + +void +TEST (void) +{ + int i,sign; + UNION_TYPE (AVX512F_LEN_HALF, i_b) res; + UNION_TYPE (AVX512F_LEN, h) src; + 
unsigned char res_ref[SIZE_RES]; + + sign = 1; + for (i = 0; i < SIZE_SRC; i++) + { + src.a[i] = (_Float16)(sign * (2.5 * (1 << (i % 3)))); + sign = -sign; + } + + res.x = INTRINSIC (_cvtneph_phf8) (src.x); + CALC(res_ref, src.a); + + if (UNION_CHECK (AVX512F_LEN_HALF, i_b) (res, res_ref)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtneph2hf8s-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtneph2hf8s-2.c new file mode 100644 index 00000000000..b0f3cb07019 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtneph2hf8s-2.c @@ -0,0 +1,56 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif + +#include "avx10-helper.h" +#include "fp8-helper.h" + +#define SIZE_SRC (AVX512F_LEN / 16) +#define SIZE_RES (AVX512F_LEN_HALF / 8) + +void +CALC (unsigned char *r, _Float16 *s) +{ + int i, hf8_bf8, saturate; + + hf8_bf8 = 0; + saturate = 1; + + for (i = 0; i < SIZE_RES; i++) + { + r[i] = 0; + if (i < SIZE_SRC) + { + Float16Union usrc = {.f16 = s[i]}; + r[i] = convert_fp16_to_fp8(usrc.f16, 0, hf8_bf8, saturate); + } + } +} + +void +TEST (void) +{ + int i,sign; + UNION_TYPE (AVX512F_LEN_HALF, i_b) res; + UNION_TYPE (AVX512F_LEN, h) src; + unsigned char res_ref[SIZE_RES]; + + sign = 1; + for (i = 0; i < SIZE_SRC; i++) + { + src.a[i] = (_Float16)(sign * (2.5 * (1 << (i % 3)))); + sign = -sign; + } + + res.x = INTRINSIC (_cvtnesph_phf8) (src.x); + CALC(res_ref, src.a); + + if (UNION_CHECK (AVX512F_LEN_HALF, i_b) (res, res_ref)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-convert-1.c b/gcc/testsuite/gcc.target/i386/avx10_2-convert-1.c new file mode 100644 index 00000000000..015474f8cf3 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-convert-1.c @@ -0,0 +1,274 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx10.2 -O2" } */ +/* { dg-final { scan-assembler-times "vcvt2ps2phx\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvt2ps2phx\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvt2ps2phx\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvt2ps2phx\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvt2ps2phx\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvt2ps2phx\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvt2ps2phx\[ \\t\]+\{rn-sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvt2ps2phx\[ \\t\]+\{rn-sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvt2ps2phx\[ \\t\]+\{rn-sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtbiasph2bf8\[ 
\\t\]+\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtbiasph2bf8\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtbiasph2bf8\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtbiasph2bf8\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtbiasph2bf8\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtbiasph2bf8\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtbiasph2bf8s\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtbiasph2bf8s\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtbiasph2bf8s\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtbiasph2bf8s\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtbiasph2bf8s\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtbiasph2bf8s\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtbiasph2hf8\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtbiasph2hf8\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtbiasph2hf8\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtbiasph2hf8\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtbiasph2hf8\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtbiasph2hf8\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtbiasph2hf8s\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtbiasph2hf8s\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtbiasph2hf8s\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtbiasph2hf8s\[ 
\\t\]+\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtbiasph2hf8s\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtbiasph2hf8s\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtne2ph2bf8\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtne2ph2bf8\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtne2ph2bf8\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtne2ph2bf8\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtne2ph2bf8\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtne2ph2bf8\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtne2ph2bf8s\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtne2ph2bf8s\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtne2ph2bf8s\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtne2ph2bf8s\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtne2ph2bf8s\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtne2ph2bf8s\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtne2ph2hf8\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtne2ph2hf8\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtne2ph2hf8\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtne2ph2hf8\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtne2ph2hf8\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtne2ph2hf8\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtne2ph2hf8s\[ 
\\t\]+\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtne2ph2hf8s\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtne2ph2hf8s\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtne2ph2hf8s\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtne2ph2hf8s\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtne2ph2hf8s\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvthf82ph\[ \\t\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvthf82ph\[ \\t\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvthf82ph\[ \\t\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvthf82ph\[ \\t\]*%xmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvthf82ph\[ \\t\]*%xmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvthf82ph\[ \\t\]*%xmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtneph2bf8x\[ \\t\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtneph2bf8x\[ \\t\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtneph2bf8x\[ \\t\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtneph2bf8y\[ \\t\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtneph2bf8y\[ \\t\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtneph2bf8y\[ \\t\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtneph2bf8sx\[ \\t\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtneph2bf8sx\[ \\t\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtneph2bf8sx\[ \\t\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtneph2bf8sy\[ \\t\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtneph2bf8sy\[ \\t\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtneph2bf8sy\[ \\t\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtneph2hf8x\[ \\t\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtneph2hf8x\[ 
\\t\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneph2hf8x\[ \\t\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneph2hf8y\[ \\t\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneph2hf8y\[ \\t\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneph2hf8y\[ \\t\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneph2hf8sx\[ \\t\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneph2hf8sx\[ \\t\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneph2hf8sx\[ \\t\]*%xmm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneph2hf8sy\[ \\t\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneph2hf8sy\[ \\t\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneph2hf8sy\[ \\t\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128 x1,a1,b1;
+volatile __m256 x2,a2,b2;
+volatile __m128h y,x128h;
+volatile __m256h y2,x256h;
+volatile __m128i x128i;
+volatile __m256i x256i;
+volatile __mmask8 m8;
+volatile __mmask16 m16;
+volatile __mmask32 m32;
+const void *a;
+__m128bh *b;
+__m256bh *c;
+__m128h *d;
+__m256h *e;
+
+void extern
+avx10_2_test (void)
+{
+  y = _mm_cvtx2ps_ph (a1, b1);
+  y = _mm_mask_cvtx2ps_ph (y, m8, a1, b1);
+  y = _mm_maskz_cvtx2ps_ph (m8, a1, b1);
+
+  y2 = _mm256_cvtx2ps_ph (a2, b2);
+  y2 = _mm256_mask_cvtx2ps_ph (y2, m16, a2, b2);
+  y2 = _mm256_maskz_cvtx2ps_ph (m16, a2, b2);
+
+  y2 = _mm256_cvtx_round2ps_ph (a2, b2, 8);
+  y2 = _mm256_mask_cvtx_round2ps_ph (y2, m16, a2, b2, 8);
+  y2 = _mm256_maskz_cvtx_round2ps_ph (m16, a2, b2, 8);
+}
+
+void extern
+avx10_2_vcvtbiasph2bf8_test (void)
+{
+  x128i = _mm_cvtbiasph_pbf8 (x128i, x128h);
+  x128i = _mm_mask_cvtbiasph_pbf8 (x128i, m8, x128i, x128h);
+  x128i = _mm_maskz_cvtbiasph_pbf8 (m8, x128i, x128h);
+
+  x128i = _mm256_cvtbiasph_pbf8 (x256i, x256h);
+  x128i = _mm256_mask_cvtbiasph_pbf8 (x128i, m16, x256i, x256h);
+  x128i = _mm256_maskz_cvtbiasph_pbf8 (m16, x256i, x256h);
+}
+
+void extern
+avx10_2_vcvtbiasph2bf8s_test (void)
+{
+  x128i = _mm_cvtbiassph_pbf8 (x128i, x128h);
+  x128i = _mm_mask_cvtbiassph_pbf8 (x128i, m8, x128i, x128h);
+  x128i = _mm_maskz_cvtbiassph_pbf8 (m8, x128i, x128h);
+
+  x128i = _mm256_cvtbiassph_pbf8 (x256i, x256h);
+  x128i = _mm256_mask_cvtbiassph_pbf8 (x128i, m16, x256i, x256h);
+  x128i = _mm256_maskz_cvtbiassph_pbf8 (m16, x256i, x256h);
+}
+
+void extern
+avx10_2_vcvtbiasph2hf8_test (void)
+{
+  x128i = _mm_cvtbiasph_phf8 (x128i, x128h);
+  x128i = _mm_mask_cvtbiasph_phf8 (x128i, m8, x128i, x128h);
+  x128i = _mm_maskz_cvtbiasph_phf8 (m8, x128i, x128h);
+
+  x128i = _mm256_cvtbiasph_phf8 (x256i, x256h);
+  x128i = _mm256_mask_cvtbiasph_phf8 (x128i, m16, x256i, x256h);
+  x128i = _mm256_maskz_cvtbiasph_phf8 (m16, x256i, x256h);
+}
+
+void extern
+avx10_2_vcvtbiasph2hf8s_test (void)
+{
+  x128i = _mm_cvtbiassph_phf8 (x128i, x128h);
+  x128i = _mm_mask_cvtbiassph_phf8 (x128i, m8, x128i, x128h);
+
x128i = _mm_maskz_cvtbiassph_phf8 (m8, x128i, x128h); + + x128i = _mm256_cvtbiassph_phf8 (x256i, x256h); + x128i = _mm256_mask_cvtbiassph_phf8 (x128i, m16, x256i, x256h); + x128i = _mm256_maskz_cvtbiassph_phf8 (m16, x256i, x256h); +} + +void extern +avx10_2_vcvtne2ph2bf8_test (void) +{ + x128i = _mm_cvtne2ph_pbf8 (x128h, x128h); + x128i = _mm_mask_cvtne2ph_pbf8 (x128i, m16, x128h, x128h); + x128i = _mm_maskz_cvtne2ph_pbf8 (m16, x128h, x128h); + x256i = _mm256_cvtne2ph_pbf8 (x256h, x256h); + x256i = _mm256_mask_cvtne2ph_pbf8 (x256i, m32, x256h, x256h); + x256i = _mm256_maskz_cvtne2ph_pbf8 (m32, x256h, x256h); +} + +void extern +avx10_2_vcvtne2ph2bf8s_test (void) +{ + x128i = _mm_cvtnes2ph_pbf8 (x128h, x128h); + x128i = _mm_mask_cvtnes2ph_pbf8 (x128i, m16, x128h, x128h); + x128i = _mm_maskz_cvtnes2ph_pbf8 (m16, x128h, x128h); + x256i = _mm256_cvtnes2ph_pbf8 (x256h, x256h); + x256i = _mm256_mask_cvtnes2ph_pbf8 (x256i, m32, x256h, x256h); + x256i = _mm256_maskz_cvtnes2ph_pbf8 (m32, x256h, x256h); +} + +void extern +avx10_2_vcvtne2ph2hf8_test (void) +{ + x128i = _mm_cvtne2ph_phf8 (x128h, x128h); + x128i = _mm_mask_cvtne2ph_phf8 (x128i, m16, x128h, x128h); + x128i = _mm_maskz_cvtne2ph_phf8 (m16, x128h, x128h); + x256i = _mm256_cvtne2ph_phf8 (x256h, x256h); + x256i = _mm256_mask_cvtne2ph_phf8 (x256i, m32, x256h, x256h); + x256i = _mm256_maskz_cvtne2ph_phf8 (m32, x256h, x256h); +} + +void extern +avx10_2_vcvtne2ph2hf8s_test (void) +{ + x128i = _mm_cvtnes2ph_phf8 (x128h, x128h); + x128i = _mm_mask_cvtnes2ph_phf8 (x128i, m16, x128h, x128h); + x128i = _mm_maskz_cvtnes2ph_phf8 (m16, x128h, x128h); + x256i = _mm256_cvtnes2ph_phf8 (x256h, x256h); + x256i = _mm256_mask_cvtnes2ph_phf8 (x256i, m32, x256h, x256h); + x256i = _mm256_maskz_cvtnes2ph_phf8 (m32, x256h, x256h); +} + +void extern +avx10_2_vcvthf82ph_test (void) +{ + x128h = _mm_cvthf8_ph (x128i); + x128h = _mm_mask_cvthf8_ph (x128h, m8, x128i); + x128h = _mm_maskz_cvthf8_ph (m8, x128i); + + x256h = _mm256_cvthf8_ph (x128i); + x256h = _mm256_mask_cvthf8_ph (x256h, m16, x128i); + x256h = _mm256_maskz_cvthf8_ph (m16, x128i); +} + +void extern +avx10_2_vcvtneph2bf8_test (void) +{ + x128i = _mm_cvtneph_pbf8 (x128h); + x128i = _mm_mask_cvtneph_pbf8 (x128i, m8, x128h); + x128i = _mm_maskz_cvtneph_pbf8 (m8, x128h); + + x128i = _mm256_cvtneph_pbf8 (x256h); + x128i = _mm256_mask_cvtneph_pbf8 (x128i, m16, x256h); + x128i = _mm256_maskz_cvtneph_pbf8 (m16, x256h); +} + +void extern +avx10_2_vcvtneph2bf8s_test (void) +{ + x128i = _mm_cvtnesph_pbf8 (x128h); + x128i = _mm_mask_cvtnesph_pbf8 (x128i, m8, x128h); + x128i = _mm_maskz_cvtnesph_pbf8 (m8, x128h); + + x128i = _mm256_cvtnesph_pbf8 (x256h); + x128i = _mm256_mask_cvtnesph_pbf8 (x128i, m16, x256h); + x128i = _mm256_maskz_cvtnesph_pbf8 (m16, x256h); +} + +void extern +avx10_2_vcvtneph2hf8_test (void) +{ + x128i = _mm_cvtneph_phf8 (x128h); + x128i = _mm_mask_cvtneph_phf8 (x128i, m8, x128h); + x128i = _mm_maskz_cvtneph_phf8 (m8, x128h); + + x128i = _mm256_cvtneph_phf8 (x256h); + x128i = _mm256_mask_cvtneph_phf8 (x128i, m16, x256h); + x128i = _mm256_maskz_cvtneph_phf8 (m16, x256h); +} + +void extern +avx10_2_vcvtneph2hf8s_test (void) +{ + x128i = _mm_cvtnesph_phf8 (x128h); + x128i = _mm_mask_cvtnesph_phf8 (x128i, m8, x128h); + x128i = _mm_maskz_cvtnesph_phf8 (m8, x128h); + + x128i = _mm256_cvtnesph_phf8 (x256h); + x128i = _mm256_mask_cvtnesph_phf8 (x128i, m16, x256h); + x128i = _mm256_maskz_cvtnesph_phf8 (m16, x256h); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcvt2ps2phx-2.c 
b/gcc/testsuite/gcc.target/i386/avx10_2-vcvt2ps2phx-2.c new file mode 100644 index 00000000000..ba3a30c9317 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcvt2ps2phx-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvt2ps2phx-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvt2ps2phx-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcvtbiasph2bf8-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtbiasph2bf8-2.c new file mode 100644 index 00000000000..b33d465f465 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtbiasph2bf8-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvtbiasph2bf8-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvtbiasph2bf8-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcvtbiasph2bf8s-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtbiasph2bf8s-2.c new file mode 100644 index 00000000000..dcf0d39a54c --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtbiasph2bf8s-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvtbiasph2bf8s-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvtbiasph2bf8s-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcvtbiasph2hf8-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtbiasph2hf8-2.c new file mode 100644 index 00000000000..93b80c7cecb --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtbiasph2hf8-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvtbiasph2hf8-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvtbiasph2hf8-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcvtbiasph2hf8s-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtbiasph2hf8s-2.c new file mode 100644 index 00000000000..ed35bf08e12 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtbiasph2hf8s-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvtbiasph2hf8s-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvtbiasph2hf8s-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcvthf82ph-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vcvthf82ph-2.c new file mode 100644 index 00000000000..d0d9a8d6cff --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcvthf82ph-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 
-mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvthf82ph-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvthf82ph-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcvtne2ph2bf8-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtne2ph2bf8-2.c new file mode 100644 index 00000000000..50948cfd00a --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtne2ph2bf8-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvtne2ph2bf8-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvtne2ph2bf8-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcvtne2ph2bf8s-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtne2ph2bf8s-2.c new file mode 100644 index 00000000000..dda859c5def --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtne2ph2bf8s-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvtne2ph2bf8s-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvtne2ph2bf8s-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcvtne2ph2hf8-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtne2ph2hf8-2.c new file mode 100644 index 00000000000..5db139f005a --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtne2ph2hf8-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvtne2ph2hf8-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvtne2ph2hf8-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcvtne2ph2hf8s-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtne2ph2hf8s-2.c new file mode 100644 index 00000000000..84bd9b2de2e --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtne2ph2hf8s-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvtne2ph2hf8s-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvtne2ph2hf8s-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcvtneph2bf8-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtneph2bf8-2.c new file mode 100644 index 00000000000..96deb4c4b55 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtneph2bf8-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvtneph2bf8-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define 
AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vcvtneph2bf8-2.c"
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcvtneph2bf8s-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtneph2bf8s-2.c
new file mode 100644
index 00000000000..ea34459afbe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtneph2bf8s-2.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2" } */
+/* { dg-require-effective-target avx10_2 } */
+
+#define AVX10_2
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vcvtneph2bf8s-2.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vcvtneph2bf8s-2.c"
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcvtneph2hf8-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtneph2hf8-2.c
new file mode 100644
index 00000000000..e43c6080309
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtneph2hf8-2.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2" } */
+/* { dg-require-effective-target avx10_2 } */
+
+#define AVX10_2
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vcvtneph2hf8-2.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vcvtneph2hf8-2.c"
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcvtneph2hf8s-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtneph2hf8s-2.c
new file mode 100644
index 00000000000..109df51b4d1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtneph2hf8s-2.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2" } */
+/* { dg-require-effective-target avx10_2 } */
+
+#define AVX10_2
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vcvtneph2hf8s-2.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vcvtneph2hf8s-2.c"
diff --git a/gcc/testsuite/gcc.target/i386/fp8-helper.h b/gcc/testsuite/gcc.target/i386/fp8-helper.h
new file mode 100644
index 00000000000..b486db5bae8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/fp8-helper.h
@@ -0,0 +1,135 @@
+#ifndef FP8_HELPER_INCLUDED
+#define FP8_HELPER_INCLUDED
+
+typedef union
+{
+  _Float16 f16;
+  unsigned short u16;
+} Float16Union;
+
+static unsigned char
+convert_fp16_to_hf8 (_Float16 x, unsigned char b, int s)
+{
+  Float16Union ux = { .f16 = x };
+  const unsigned short fp16_bias = 15, hf8_bias = 7;
+  unsigned short sign = (ux.u16 & 0x8000) >> 8;
+  unsigned short e_fp16 = (ux.u16 & 0x7c00) >> 10;
+  unsigned short m_fp16 = ux.u16 & 0x03ff;
+
+  /* If a bias is supplied, add its high bits to the input before
+     the exponent and mantissa are extracted.  */
+  unsigned short x_bias = b ? ux.u16 + (b >> 1) : ux.u16;
+  unsigned short e = (x_bias & 0x7c00) >> 10;
+  unsigned short m = (x_bias & 0x03ff) >> 7;
+
+  if (e_fp16 == 0x1f)
+    {
+      /* Special value: NaN or Infinity.  */
+      return (0xf << 3) | 0x7 | sign;
+    }
+  else if ((e_fp16 > (fp16_bias - hf8_bias + 15))
+	   || ((e_fp16 == (fp16_bias - hf8_bias + 15))
+	       && (m_fp16 > 0x0300)))
+    {
+      /* Overflow: Return Max or NaN.  */
+      return (0xf << 3) | (s ? 0x6 : 0x7) | sign;
+    }
+  else if (e_fp16 < fp16_bias - hf8_bias - 3)
+    {
+      /* Value too small: Return zero.  */
+      return sign;
+    }
+  else if (e_fp16 <= fp16_bias - hf8_bias)
+    {
+      /* Denormalized value: Adjust mantissa.  */
+      m = ((m_fp16 | 0x0400) >> ((fp16_bias - hf8_bias) + 1 - e_fp16))
+	  | (((m_fp16 & 0x007f) + 0x007f) >> 7);
+      return m | sign;
+    }
+  else
+    {
+      /* Normal value: Adjust exponent and mantissa.  */
+      e -= (fp16_bias - hf8_bias);
+      return (e << 3) | m | sign;
+    }
+}
+
+static unsigned char
+convert_fp16_to_bf8 (_Float16 x, unsigned char b, int s)
+{
+  Float16Union ux = { .f16 = x };
+  unsigned short temp;
+  unsigned short fp8_res = 0;
+
+  if (__builtin_isinf (x) || __builtin_isnan (x))
+    {
+      /* Special value: NaN or Infinity.  */
+      fp8_res = (ux.u16 >> 8) & 0xFF;
+      if (__builtin_isnan (x))
+	fp8_res |= 0x02;
+    }
+  else
+    {
+      unsigned short rounding_bias = b ? b & 0xFF
+					: ((ux.u16 >> 8) & 0x1) + 0x7F;
+      temp = ux.u16 + rounding_bias;
+      fp8_res = (temp >> 8) & 0xFF;
+      if (((temp >> 8) & 0x7F) == 0x7C && s)
+	fp8_res = (fp8_res & 0x80) | 0x7B;
+    }
+  return fp8_res;
+}
+
+static unsigned char
+convert_fp16_to_fp8 (_Float16 x, unsigned char b, int y, int s)
+{
+  return y ? convert_fp16_to_bf8 (x, b, s)
+	   : convert_fp16_to_hf8 (x, b, s);
+}
+
+static _Float16
+convert_bf8_to_fp16 (unsigned char x)
+{
+  Float16Union u = { .u16 = (x << 8) & 0xff00 };
+  return u.f16;
+}
+
+static _Float16
+convert_hf8_to_fp16 (unsigned char x)
+{
+  unsigned char hf8_bias;
+  Float16Union res;
+  unsigned short fp_16bias, s, e, m, e_norm, lz_cnt;
+
+  fp_16bias = 15;
+  hf8_bias = 7;
+  s = (x & 0x80) << 8;
+  e = (x & 0x78) >> 3;
+  m = x & 0x07;
+  e_norm = e + fp_16bias - hf8_bias;
+
+  /* Convert a denormal hf8 number into a normal fp16 number.  */
+  if ((e == 0) && (m != 0))
+    {
+      lz_cnt = 2;
+      lz_cnt = (m > 0x1) ? 1 : lz_cnt;
+      lz_cnt = (m > 0x3) ? 0 : lz_cnt;
+      e_norm -= lz_cnt;
+      m = (m << (lz_cnt + 1)) & 0x07;
+    }
+  else if ((e == 0) && (m == 0))
+    e_norm = 0;
+  else if ((e == 0xf) && (m == 0x7))
+    {
+      e_norm = 0x1f;
+      m = 0x4;
+    }
+
+  res.u16 = 0;
+  res.u16 |= e_norm << 10;
+  res.u16 |= m << 7;
+  res.u16 |= s;
+
+  return res.f16;
+}
+
+#endif
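To make the helper's rounding behaviour concrete, a minimal standalone driver (illustrative only, not part of the patch; it assumes a compiler with _Float16 support, such as GCC targeting x86-64) can call the converters directly:

#include <stdio.h>
#include "fp8-helper.h"

int
main (void)
{
  /* 1.0 is 0x3c00 in fp16; keeping the top eight bits yields the
     E5M2 (bf8) encoding 0x3c.  Prints 0x3c.  */
  printf ("%#x\n", convert_fp16_to_bf8 ((_Float16) 1.0, 0, 0));
  /* 1.125 (0x3c80) lies exactly halfway between the representable
     bf8 values 1.0 and 1.25; the rounding_bias arithmetic rounds the
     tie to the even encoding, so this also prints 0x3c.  */
  printf ("%#x\n", convert_fp16_to_bf8 ((_Float16) 1.125, 0, 0));
  /* Round trip: expanding bf8 0x3c restores fp16 1.0.  Prints 1.  */
  printf ("%g\n", (double) convert_bf8_to_fp16 (0x3c));
  return 0;
}

The 1.125 case shows that adding ((u16 >> 8) & 1) + 0x7F before truncation implements round-to-nearest-even, which is the behaviour the reference results above rely on.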
diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
index 6b1c9e545f0..a5ba3decc97 100644
--- a/gcc/testsuite/gcc.target/i386/sse-13.c
+++ b/gcc/testsuite/gcc.target/i386/sse-13.c
@@ -1018,4 +1018,10 @@
 #define __builtin_ia32_mpsadbw128_mask(A, B, C, D, E) __builtin_ia32_mpsadbw128_mask (A, B, 1, D, E)
 #define __builtin_ia32_mpsadbw256_mask(A, B, C, D, E) __builtin_ia32_mpsadbw256_mask (A, B, 1, D, E)
+/* avx10_2convertintrin.h */
+#define __builtin_ia32_vcvt2ps2phx256_mask_round(A, B, C, D, E) __builtin_ia32_vcvt2ps2phx256_mask_round(A, B, C, D, 8)
+
+/* avx10_2-512convertintrin.h */
+#define __builtin_ia32_vcvt2ps2phx512_mask_round(A, B, C, D, E) __builtin_ia32_vcvt2ps2phx512_mask_round(A, B, C, D, 8)
+
 #include <x86intrin.h>
diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
index 6dfdaa96c76..9253e5eb905 100644
--- a/gcc/testsuite/gcc.target/i386/sse-14.c
+++ b/gcc/testsuite/gcc.target/i386/sse-14.c
@@ -1382,3 +1382,9 @@ test_3 (_mm_maskz_mpsadbw_epu8, __m128i, __mmask8, __m128i, __m128i, 1)
 test_3 (_mm256_maskz_mpsadbw_epu8, __m256i, __mmask16, __m256i, __m256i, 1)
 test_4 (_mm_mask_mpsadbw_epu8, __m128i, __m128i, __mmask8, __m128i, __m128i, 1)
 test_4 (_mm256_mask_mpsadbw_epu8, __m256i, __m256i, __mmask16, __m256i, __m256i, 1)
+
+/* avx10_2convertintrin */
+test_2 (_mm256_cvtx_round2ps_ph, __m256h, __m256, __m256, 4)
+
+/* avx10_2-512convertintrin.h */
+test_2 (_mm512_cvtx_round2ps_ph, __m512h, __m512, __m512, 4)
diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c
index 102b6b878c8..d57bbc41a49 100644
--- a/gcc/testsuite/gcc.target/i386/sse-22.c
+++ b/gcc/testsuite/gcc.target/i386/sse-22.c
@@ -1421,3 +1421,9 @@ test_3 (_mm_maskz_mpsadbw_epu8, __m128i, __mmask8, __m128i, __m128i, 1)
 test_3 (_mm256_maskz_mpsadbw_epu8, __m256i, __mmask16, __m256i, __m256i, 1)
 test_4 (_mm_mask_mpsadbw_epu8, __m128i, __m128i, __mmask8, __m128i, __m128i, 1)
 test_4 (_mm256_mask_mpsadbw_epu8, __m256i, __m256i, __mmask16, __m256i, __m256i, 1)
+
+/* avx10_2convertintrin */
+test_2 (_mm256_cvtx_round2ps_ph, __m256h, __m256, __m256, 4)
+
+/* avx10_2-512convertintrin.h */
+test_2 (_mm512_cvtx_round2ps_ph, __m512h, __m512, __m512, 4)
diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
index 962b9507283..438974cb0c6 100644
--- a/gcc/testsuite/gcc.target/i386/sse-23.c
+++ b/gcc/testsuite/gcc.target/i386/sse-23.c
@@ -992,6 +992,12 @@
 #define __builtin_ia32_mpsadbw128_mask(A, B, C, D, E) __builtin_ia32_mpsadbw128_mask (A, B, 1, D, E)
 #define __builtin_ia32_mpsadbw256_mask(A, B, C, D, E) __builtin_ia32_mpsadbw256_mask (A, B, 1, D, E)
+/* avx10_2convertintrin.h */
+#define __builtin_ia32_vcvt2ps2phx256_mask_round(A, B, C, D, E) __builtin_ia32_vcvt2ps2phx256_mask_round(A, B, C, D, 8)
+
+/* avx10_2-512convertintrin.h */
+#define __builtin_ia32_vcvt2ps2phx512_mask_round(A, B, C, D, E) __builtin_ia32_vcvt2ps2phx512_mask_round(A, B, C, D, 8)
+
 #pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,sha,xsavec,xsaves,clflushopt,clwb,mwaitx,clzero,pku,sgx,rdpid,gfni,vpclmulqdq,pconfig,wbnoinvd,enqcmd,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avxifma,avxvnniint8,avxneconvert,cmpccxadd,amx-fp16,prefetchi,raoint,amx-complex,avxvnniint16,sm3,sha512,sm4,avx10.2-512")
 #include <x86intrin.h>
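A note on the four scaffolding tests touched above: sse-13.c and sse-23.c compile every intrinsic in a single translation unit, so intrinsics that demand a compile-time rounding immediate are redefined at the builtin level, pinning the macro argument E to the literal 8 (_MM_FROUND_NO_EXC); sse-14.c and sse-22.c instead instantiate each new prototype through their test_N macros. The test_2 entries added above amount to checking that something like the following compiles (an illustrative sketch only; the wrapper function name is hypothetical, and 4 is the _MM_FROUND_CUR_DIRECTION value passed in those entries):

#include <immintrin.h>

/* Hypothetical wrapper: the rounding operand of the _round intrinsic
   must be an integer constant, here _MM_FROUND_CUR_DIRECTION (4).  */
__m512h
check_cvtx_round2ps_ph (__m512 a, __m512 b)
{
  return _mm512_cvtx_round2ps_ph (a, b, 4);
}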
From patchwork Mon Aug 19 08:56:49 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Haochen Jiang
X-Patchwork-Id: 1973732
From: Haochen Jiang
To: gcc-patches@gcc.gnu.org
Cc: hongtao.liu@intel.com, zewei.mo@pitt.edu, ubizjak@gmail.com, konglin1 , Levy Hsu
Subject: [PATCH 05/12] [PATCH 1/2] AVX10.2: Support BF16 instructions
Date: Mon, 19 Aug 2024 01:56:49 -0700
Message-ID: <20240819085717.193256-6-haochen.jiang@intel.com>
X-Mailer: git-send-email 2.43.5
In-Reply-To: <20240819085717.193256-1-haochen.jiang@intel.com>
References: <20240819085717.193256-1-haochen.jiang@intel.com>

From: konglin1

gcc/ChangeLog:

	* config.gcc: Add
	avx10_2-512bf16intrin.h and avx10_2bf16intrin.h.
	* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE
	for V32BF_FTYPE_V32BF_V32BF, V16BF_FTYPE_V16BF_V16BF,
	V8BF_FTYPE_V8BF_V8BF, V8BF_FTYPE_V8BF_V8BF_UQI,
	V16BF_FTYPE_V16BF_V16BF_UHI, V32BF_FTYPE_V32BF_V32BF_USI,
	V32BF_FTYPE_V32BF_V32BF_V32BF_USI, V8BF_FTYPE_V8BF_V8BF_V8BF_UQI
	and V16BF_FTYPE_V16BF_V16BF_V16BF_UHI.
	* config/i386/i386-builtin.def (BDESC): Add new builtins.
	* config/i386/i386-expand.cc (ix86_expand_args_builtin): Handle
	new DEF_FUNCTION_TYPE.
	* config/i386/immintrin.h: Include avx10_2-512bf16intrin.h and
	avx10_2bf16intrin.h.
	* config/i386/sse.md (avx10_2_scalefpbf16_): New define_insn.
	(avx10_2_nepbf16_): Ditto.
	(avx10_2_nepbf16_): Ditto.
	(avx10_2_pbf16__maskz): Ditto.
	(avx10_2_pbf16_): Ditto.
	(avx10_2_pbf16__mask3): Ditto.
	* config/i386/avx10_2-512bf16intrin.h: New file.
	* config/i386/avx10_2bf16intrin.h: Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512f-helper.h: Add MAKE_MASK_MERGE and
	MAKE_MASK_ZERO for bf16_uw.
	* gcc.target/i386/m512-check.h: Add union512bf16_uw,
	union256bf16_uw, union128bf16_uw and CHECK_EXP for them
	(see the sketch after this list).
	* gcc.target/i386/avx10-helper.h: New file.
	* gcc.target/i386/avx10_2-512-bf16-1.c: New test.
	* gcc.target/i386/avx10_2-512-vaddnepbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vdivnepbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vfmaddXXXnepbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vfmsubXXXnepbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vfnmaddXXXnepbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vfnmsubXXXnepbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vmaxpbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vminpbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vmulnepbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vscalefpbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vsubnepbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-bf16-1.c: Ditto.
	* gcc.target/i386/avx10_2-vaddnepbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-vdivnepbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-vfmaddXXXnepbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-vfmsubXXXnepbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-vfnmaddXXXnepbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-vfnmsubXXXnepbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-vmaxpbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-vminpbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-vmulnepbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-vscalefpbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-vsubnepbf16-2.c: Ditto.
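The bf16_uw checking types named in the testsuite ChangeLog only appear in the m512-check.h hunk later in the patch; by analogy with the existing integer union types there, their expected shape is roughly the following (a sketch under that assumption, not the authoritative definition):

typedef union
{
  __m512bh x;
  unsigned short a[32];
} union512bf16_uw;

The "_uw" suffix reflects that the __bf16 payload is exposed as raw unsigned shorts, so the corresponding CHECK_EXP instantiation can compare results bit-exactly instead of through floating-point comparison.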
Co-authored-by: Levy Hsu
---
 gcc/config.gcc                                |   2 +-
 gcc/config/i386/avx10_2-512bf16intrin.h       | 364 ++++
 gcc/config/i386/avx10_2bf16intrin.h           | 685 ++++++++++++++++++
 gcc/config/i386/i386-builtin-types.def        |   9 +
 gcc/config/i386/i386-builtin.def              |  78 ++
 gcc/config/i386/i386-expand.cc                |   9 +
 gcc/config/i386/immintrin.h                   |   4 +
 gcc/config/i386/sse.md                        | 293 ++++
 gcc/testsuite/gcc.target/i386/avx10-helper.h  |  48 +-
 .../gcc.target/i386/avx10_2-512-bf16-1.c      |  87 +++
 .../i386/avx10_2-512-vaddnepbf16-2.c          |  49 ++
 .../i386/avx10_2-512-vdivnepbf16-2.c          |  49 ++
 .../i386/avx10_2-512-vfmaddXXXnepbf16-2.c     |  52 ++
 .../i386/avx10_2-512-vfmsubXXXnepbf16-2.c     |  53 ++
 .../i386/avx10_2-512-vfnmaddXXXnepbf16-2.c    |  53 ++
 .../i386/avx10_2-512-vfnmsubXXXnepbf16-2.c    |  53 ++
 .../gcc.target/i386/avx10_2-512-vmaxpbf16-2.c |  51 ++
 .../gcc.target/i386/avx10_2-512-vminpbf16-2.c |  51 ++
 .../i386/avx10_2-512-vmulnepbf16-2.c          |  49 ++
 .../i386/avx10_2-512-vscalefpbf16-2.c         |  51 ++
 .../i386/avx10_2-512-vsubnepbf16-2.c          |  49 ++
 .../gcc.target/i386/avx10_2-bf16-1.c          | 172 +++++
 .../gcc.target/i386/avx10_2-vaddnepbf16-2.c   |  16 +
 .../gcc.target/i386/avx10_2-vdivnepbf16-2.c   |  16 +
 .../i386/avx10_2-vfmaddXXXnepbf16-2.c         |  16 +
 .../i386/avx10_2-vfmsubXXXnepbf16-2.c         |  16 +
 .../i386/avx10_2-vfnmaddXXXnepbf16-2.c        |  16 +
 .../i386/avx10_2-vfnmsubXXXnepbf16-2.c        |  16 +
 .../gcc.target/i386/avx10_2-vmaxpbf16-2.c     |  16 +
 .../gcc.target/i386/avx10_2-vminpbf16-2.c     |  16 +
 .../gcc.target/i386/avx10_2-vmulnepbf16-2.c   |  16 +
 .../gcc.target/i386/avx10_2-vscalefpbf16-2.c  |  16 +
 .../gcc.target/i386/avx10_2-vsubnepbf16-2.c   |  16 +
 .../gcc.target/i386/avx512f-helper.h          |   2 +
 gcc/testsuite/gcc.target/i386/m512-check.h    |  27 +
 35 files changed, 2514 insertions(+), 2 deletions(-)
 create mode 100644 gcc/config/i386/avx10_2-512bf16intrin.h
 create mode 100644 gcc/config/i386/avx10_2bf16intrin.h
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-bf16-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vaddnepbf16-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vdivnepbf16-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vfmaddXXXnepbf16-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vfmsubXXXnepbf16-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vfnmaddXXXnepbf16-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vfnmsubXXXnepbf16-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vmaxpbf16-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vminpbf16-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vmulnepbf16-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vscalefpbf16-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vsubnepbf16-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-bf16-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vaddnepbf16-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vdivnepbf16-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vfmaddXXXnepbf16-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vfmsubXXXnepbf16-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vfnmaddXXXnepbf16-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vfnmsubXXXnepbf16-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vmaxpbf16-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vminpbf16-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vmulnepbf16-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vscalefpbf16-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vsubnepbf16-2.c

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 5e9c36a2aad..7d761b257cd 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -454,7 +454,7 @@ i[34567]86-*-* | x86_64-*-*)
 		       sm3intrin.h sha512intrin.h sm4intrin.h usermsrintrin.h
 		       avx10_2roundingintrin.h avx10_2mediaintrin.h
 		       avx10_2-512mediaintrin.h
-		       avx10_2convertintrin.h avx10_2-512convertintrin.h"
+		       avx10_2bf16intrin.h avx10_2-512bf16intrin.h"
 		       ;;
 	ia64-*-*)
 		extra_headers=ia64intrin.h
diff --git a/gcc/config/i386/avx10_2-512bf16intrin.h b/gcc/config/i386/avx10_2-512bf16intrin.h
new file mode 100644
index 00000000000..b409ea17adb
--- /dev/null
+++ b/gcc/config/i386/avx10_2-512bf16intrin.h
@@ -0,0 +1,364 @@
+/* Copyright (C) 2024 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _IMMINTRIN_H_INCLUDED
+#error "Never use <avx10_2-512bf16intrin.h> directly; include <immintrin.h> instead."
+#endif + +#ifndef _AVX10_2_512BF16INTRIN_H_INCLUDED +#define _AVX10_2_512BF16INTRIN_H_INCLUDED + +#if !defined (__AVX10_2_512__) +#pragma GCC push_options +#pragma GCC target("avx10.2-512") +#define __DISABLE_AVX10_2_512__ +#endif /* __AVX10_2_512__ */ + +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_addne_pbh (__m512bh __A, __m512bh __B) +{ + return (__m512bh) __builtin_ia32_addnepbf16512 (__A, __B); +} + +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_addne_pbh (__m512bh __W, __mmask32 __U, + __m512bh __A, __m512bh __B) +{ + return (__m512bh) + __builtin_ia32_addnepbf16512_mask (__A, __B, __W, __U); +} + +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_addne_pbh (__mmask32 __U, __m512bh __A, __m512bh __B) +{ + return (__m512bh) + __builtin_ia32_addnepbf16512_mask (__A, __B, + (__v32bf) _mm512_setzero_si512 (), + __U); +} + +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_subne_pbh (__m512bh __A, __m512bh __B) +{ + return (__m512bh) __builtin_ia32_subnepbf16512 (__A, __B); +} + +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_subne_pbh (__m512bh __W, __mmask32 __U, + __m512bh __A, __m512bh __B) +{ + return (__m512bh) + __builtin_ia32_subnepbf16512_mask (__A, __B, __W, __U); +} + +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_subne_pbh (__mmask32 __U, __m512bh __A, __m512bh __B) +{ + return (__m512bh) + __builtin_ia32_subnepbf16512_mask (__A, __B, + (__v32bf) _mm512_setzero_si512 (), + __U); +} + +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mulne_pbh (__m512bh __A, __m512bh __B) +{ + return (__m512bh) __builtin_ia32_mulnepbf16512 (__A, __B); +} + +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_mulne_pbh (__m512bh __W, __mmask32 __U, + __m512bh __A, __m512bh __B) +{ + return (__m512bh) + __builtin_ia32_mulnepbf16512_mask (__A, __B, __W, __U); +} + +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_mulne_pbh (__mmask32 __U, __m512bh __A, __m512bh __B) +{ + return (__m512bh) + __builtin_ia32_mulnepbf16512_mask (__A, __B, + (__v32bf) _mm512_setzero_si512 (), + __U); +} + +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_divne_pbh (__m512bh __A, __m512bh __B) +{ + return (__m512bh) __builtin_ia32_divnepbf16512 (__A, __B); +} + +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_divne_pbh (__m512bh __W, __mmask32 __U, + __m512bh __A, __m512bh __B) +{ + return (__m512bh) + __builtin_ia32_divnepbf16512_mask (__A, __B, __W, __U); +} + +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_divne_pbh (__mmask32 __U, __m512bh __A, __m512bh __B) +{ + return (__m512bh) + __builtin_ia32_divnepbf16512_mask (__A, __B, + (__v32bf) _mm512_setzero_si512 (), + __U); +} + +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_max_pbh (__m512bh __A, __m512bh __B) +{ + return (__m512bh) __builtin_ia32_maxpbf16512 (__A, __B); +} + +extern __inline__ __m512bh 
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_max_pbh (__m512bh __W, __mmask32 __U, + __m512bh __A, __m512bh __B) +{ + return (__m512bh) + __builtin_ia32_maxpbf16512_mask (__A, __B, __W, __U); +} + +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_max_pbh (__mmask32 __U, __m512bh __A, __m512bh __B) +{ + return (__m512bh) + __builtin_ia32_maxpbf16512_mask (__A, __B, + (__v32bf) _mm512_setzero_si512 (), + __U); +} + +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_min_pbh (__m512bh __A, __m512bh __B) +{ + return (__m512bh) __builtin_ia32_minpbf16512 (__A, __B); +} + +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_min_pbh (__m512bh __W, __mmask32 __U, + __m512bh __A, __m512bh __B) +{ + return (__m512bh) + __builtin_ia32_minpbf16512_mask (__A, __B, __W, __U); +} + +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_min_pbh (__mmask32 __U, __m512bh __A, __m512bh __B) +{ + return (__m512bh) + __builtin_ia32_minpbf16512_mask (__A, __B, + (__v32bf) _mm512_setzero_si512 (), + __U); +} + +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_scalef_pbh (__m512bh __A, __m512bh __B) +{ + return (__m512bh) __builtin_ia32_scalefpbf16512 (__A, __B); +} + +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_scalef_pbh (__m512bh __W, __mmask32 __U, + __m512bh __A, __m512bh __B) +{ + return (__m512bh) + __builtin_ia32_scalefpbf16512_mask (__A, __B, __W, __U); +} + +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_scalef_pbh (__mmask32 __U, __m512bh __A, __m512bh __B) +{ + return (__m512bh) + __builtin_ia32_scalefpbf16512_mask (__A, __B, + (__v32bf) _mm512_setzero_si512 (), + __U); +} + +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fmaddne_pbh (__m512bh __A, __m512bh __B, __m512bh __C) +{ + return (__m512bh) + __builtin_ia32_fmaddnepbf16512_mask (__A, __B, __C, (__mmask32) -1); +} + +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fmaddne_pbh (__m512bh __A, __mmask32 __U, + __m512bh __B, __m512bh __C) +{ + return (__m512bh) + __builtin_ia32_fmaddnepbf16512_mask (__A, __B, __C, __U); +} + +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask3_fmaddne_pbh (__m512bh __A, __m512bh __B, + __m512bh __C, __mmask32 __U) +{ + return (__m512bh) + __builtin_ia32_fmaddnepbf16512_mask3 (__A, __B, __C, __U); +} + +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_fmaddne_pbh (__mmask32 __U, __m512bh __A, + __m512bh __B, __m512bh __C) +{ + return (__m512bh) + __builtin_ia32_fmaddnepbf16512_maskz (__A, __B, __C, __U); +} + +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fmsubne_pbh (__m512bh __A, __m512bh __B, __m512bh __C) +{ + return (__m512bh) + __builtin_ia32_fmsubnepbf16512_mask (__A, __B, __C, (__mmask32) -1); +} + +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fmsubne_pbh (__m512bh __A, __mmask32 __U, + __m512bh __B, __m512bh __C) +{ 
+ return (__m512bh) + __builtin_ia32_fmsubnepbf16512_mask (__A, __B, __C, __U); +} + +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask3_fmsubne_pbh (__m512bh __A, __m512bh __B, + __m512bh __C, __mmask32 __U) +{ + return (__m512bh) + __builtin_ia32_fmsubnepbf16512_mask3 (__A, __B, __C, __U); +} + +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_fmsubne_pbh (__mmask32 __U, __m512bh __A, + __m512bh __B, __m512bh __C) +{ + return (__m512bh) + __builtin_ia32_fmsubnepbf16512_maskz (__A, __B, __C, __U); +} + +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fnmaddne_pbh (__m512bh __A, __m512bh __B, __m512bh __C) +{ + return (__m512bh) + __builtin_ia32_fnmaddnepbf16512_mask (__A, __B, __C, (__mmask32) -1); +} + +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fnmaddne_pbh (__m512bh __A, __mmask32 __U, + __m512bh __B, __m512bh __C) +{ + return (__m512bh) + __builtin_ia32_fnmaddnepbf16512_mask (__A, __B, __C, __U); +} + +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask3_fnmaddne_pbh (__m512bh __A, __m512bh __B, + __m512bh __C, __mmask32 __U) +{ + return (__m512bh) + __builtin_ia32_fnmaddnepbf16512_mask3 (__A, __B, __C, __U); +} + +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_fnmaddne_pbh (__mmask32 __U, __m512bh __A, + __m512bh __B, __m512bh __C) +{ + return (__m512bh) + __builtin_ia32_fnmaddnepbf16512_maskz (__A, __B, __C, __U); +} + +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fnmsubne_pbh (__m512bh __A, __m512bh __B, __m512bh __C) +{ + return (__m512bh) + __builtin_ia32_fnmsubnepbf16512_mask (__A, __B, __C, (__mmask32) -1); +} + +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fnmsubne_pbh (__m512bh __A, __mmask32 __U, + __m512bh __B, __m512bh __C) +{ + return (__m512bh) + __builtin_ia32_fnmsubnepbf16512_mask (__A, __B, __C, __U); +} + +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask3_fnmsubne_pbh (__m512bh __A, __m512bh __B, + __m512bh __C, __mmask32 __U) +{ + return (__m512bh) + __builtin_ia32_fnmsubnepbf16512_mask3 (__A, __B, __C, __U); +} + +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_fnmsubne_pbh (__mmask32 __U, __m512bh __A, + __m512bh __B, __m512bh __C) +{ + return (__m512bh) + __builtin_ia32_fnmsubnepbf16512_maskz (__A, __B, __C, __U); +} + +#ifdef __DISABLE_AVX10_2_512__ +#undef __DISABLE_AVX10_2_512__ +#pragma GCC pop_options +#endif /* __DISABLE_AVX10_2_512__ */ + +#endif /* _AVX10_2_512BF16INTRIN_H_INCLUDED */ diff --git a/gcc/config/i386/avx10_2bf16intrin.h b/gcc/config/i386/avx10_2bf16intrin.h new file mode 100644 index 00000000000..e16f1b66481 --- /dev/null +++ b/gcc/config/i386/avx10_2bf16intrin.h @@ -0,0 +1,685 @@ +/* Copyright (C) 2024 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. 
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#if !defined _IMMINTRIN_H_INCLUDED
+#error "Never use <avx10_2bf16intrin.h> directly; include <immintrin.h> instead."
+#endif
+
+#ifndef _AVX10_2BF16INTRIN_H_INCLUDED
+#define _AVX10_2BF16INTRIN_H_INCLUDED
+
+#if !defined(__AVX10_2_256__)
+#pragma GCC push_options
+#pragma GCC target("avx10.2")
+#define __DISABLE_AVX10_2_256__
+#endif /* __AVX10_2_256__ */
+
+extern __inline__ __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_addne_pbh (__m256bh __A, __m256bh __B)
+{
+  return (__m256bh) __builtin_ia32_addnepbf16256 (__A, __B);
+}
+
+extern __inline__ __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_addne_pbh (__m256bh __W, __mmask16 __U,
+		       __m256bh __A, __m256bh __B)
+{
+  return (__m256bh)
+    __builtin_ia32_addnepbf16256_mask (__A, __B, __W, __U);
+}
+
+extern __inline__ __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_addne_pbh (__mmask16 __U, __m256bh __A, __m256bh __B)
+{
+  return (__m256bh)
+    __builtin_ia32_addnepbf16256_mask (__A, __B,
+				       (__v16bf) _mm256_setzero_si256 (),
+				       __U);
+}
+
+extern __inline__ __m128bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_addne_pbh (__m128bh __A, __m128bh __B)
+{
+  return (__m128bh) __builtin_ia32_addnepbf16128 (__A, __B);
+}
+
+extern __inline__ __m128bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_addne_pbh (__m128bh __W, __mmask8 __U,
+		    __m128bh __A, __m128bh __B)
+{
+  return (__m128bh)
+    __builtin_ia32_addnepbf16128_mask (__A, __B, __W, __U);
+}
+
+extern __inline__ __m128bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_addne_pbh (__mmask8 __U, __m128bh __A, __m128bh __B)
+{
+  return (__m128bh)
+    __builtin_ia32_addnepbf16128_mask (__A, __B,
+				       (__v8bf) _mm_setzero_si128 (),
+				       __U);
+}
+
+extern __inline__ __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_subne_pbh (__m256bh __A, __m256bh __B)
+{
+  return (__m256bh) __builtin_ia32_subnepbf16256 (__A, __B);
+}
+
+extern __inline__ __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_subne_pbh (__m256bh __W, __mmask16 __U,
+		       __m256bh __A, __m256bh __B)
+{
+  return (__m256bh)
+    __builtin_ia32_subnepbf16256_mask (__A, __B, __W, __U);
+}
+
+extern __inline__ __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_subne_pbh (__mmask16 __U, __m256bh __A, __m256bh __B)
+{
+  return (__m256bh)
+    __builtin_ia32_subnepbf16256_mask (__A, __B,
+				       (__v16bf) _mm256_setzero_si256 (),
+				       __U);
+}
+
+extern __inline__ __m128bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_subne_pbh (__m128bh __A, __m128bh __B)
+{
+  return (__m128bh) __builtin_ia32_subnepbf16128 (__A, __B);
+}
+
+extern __inline__ __m128bh
+__attribute__ ((__gnu_inline__, __always_inline__,
__artificial__)) +_mm_mask_subne_pbh (__m128bh __W, __mmask8 __U, + __m128bh __A, __m128bh __B) +{ + return (__m128bh) + __builtin_ia32_subnepbf16128_mask (__A, __B, __W, __U); +} + +extern __inline__ __m128bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_subne_pbh (__mmask8 __U, __m128bh __A, __m128bh __B) +{ + return (__m128bh) + __builtin_ia32_subnepbf16128_mask (__A, __B, + (__v8bf) _mm_setzero_si128 (), + __U); +} + +extern __inline__ __m256bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mulne_pbh (__m256bh __A, __m256bh __B) +{ + return (__m256bh) __builtin_ia32_mulnepbf16256 (__A, __B); +} + +extern __inline__ __m256bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_mulne_pbh (__m256bh __W, __mmask16 __U, + __m256bh __A, __m256bh __B) +{ + return (__m256bh) + __builtin_ia32_mulnepbf16256_mask (__A, __B, __W, __U); +} + +extern __inline__ __m256bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_mulne_pbh (__mmask16 __U, __m256bh __A, __m256bh __B) +{ + return (__m256bh) + __builtin_ia32_mulnepbf16256_mask (__A, __B, + (__v16bf) _mm256_setzero_si256 (), + __U); +} + +extern __inline__ __m128bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mulne_pbh (__m128bh __A, __m128bh __B) +{ + return (__m128bh) __builtin_ia32_mulnepbf16128 (__A, __B); +} + +extern __inline__ __m128bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_mulne_pbh (__m128bh __W, __mmask8 __U, + __m128bh __A, __m128bh __B) +{ + return (__m128bh) + __builtin_ia32_mulnepbf16128_mask (__A, __B, __W, __U); +} + +extern __inline__ __m128bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_mulne_pbh (__mmask8 __U, __m128bh __A, __m128bh __B) +{ + return (__m128bh) + __builtin_ia32_mulnepbf16128_mask (__A, __B, + (__v8bf) _mm_setzero_si128 (), + __U); +} + +extern __inline__ __m256bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_divne_pbh (__m256bh __A, __m256bh __B) +{ + return (__m256bh) __builtin_ia32_divnepbf16256 (__A, __B); +} + +extern __inline__ __m256bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_divne_pbh (__m256bh __W, __mmask16 __U, + __m256bh __A, __m256bh __B) +{ + return (__m256bh) + __builtin_ia32_divnepbf16256_mask (__A, __B, __W, __U); +} + +extern __inline__ __m256bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_divne_pbh (__mmask16 __U, __m256bh __A, __m256bh __B) +{ + return (__m256bh) + __builtin_ia32_divnepbf16256_mask (__A, __B, + (__v16bf) _mm256_setzero_si256 (), + __U); +} + +extern __inline__ __m128bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_divne_pbh (__m128bh __A, __m128bh __B) +{ + return (__m128bh) __builtin_ia32_divnepbf16128 (__A, __B); +} + +extern __inline__ __m128bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_divne_pbh (__m128bh __W, __mmask8 __U, + __m128bh __A, __m128bh __B) +{ + return (__m128bh) + __builtin_ia32_divnepbf16128_mask (__A, __B, __W, __U); +} + +extern __inline__ __m128bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_divne_pbh (__mmask8 __U, __m128bh __A, __m128bh __B) +{ + return (__m128bh) + __builtin_ia32_divnepbf16128_mask (__A, __B, + (__v8bf) _mm_setzero_si128 (), + __U); +} + +extern __inline__ __m256bh +__attribute__ ((__gnu_inline__, 
__always_inline__, __artificial__)) +_mm256_max_pbh (__m256bh __A, __m256bh __B) +{ + return (__m256bh) __builtin_ia32_maxpbf16256 (__A, __B); +} + +extern __inline__ __m256bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_max_pbh (__m256bh __W, __mmask16 __U, + __m256bh __A, __m256bh __B) +{ + return (__m256bh) + __builtin_ia32_maxpbf16256_mask (__A, __B, __W, __U); +} + +extern __inline__ __m256bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_max_pbh (__mmask16 __U, __m256bh __A, __m256bh __B) +{ + return (__m256bh) + __builtin_ia32_maxpbf16256_mask (__A, __B, + (__v16bf) _mm256_setzero_si256 (), + __U); +} + +extern __inline__ __m128bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_max_pbh (__m128bh __A, __m128bh __B) +{ + return (__m128bh) __builtin_ia32_maxpbf16128 (__A, __B); +} + +extern __inline__ __m128bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_max_pbh (__m128bh __W, __mmask8 __U, + __m128bh __A, __m128bh __B) +{ + return (__m128bh) + __builtin_ia32_maxpbf16128_mask (__A, __B, __W, __U); +} + +extern __inline__ __m128bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_max_pbh (__mmask8 __U, __m128bh __A, __m128bh __B) +{ + return (__m128bh) + __builtin_ia32_maxpbf16128_mask (__A, __B, + (__v8bf) _mm_setzero_si128 (), + __U); +} + +extern __inline__ __m256bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_min_pbh (__m256bh __A, __m256bh __B) +{ + return (__m256bh) __builtin_ia32_minpbf16256 (__A, __B); +} + +extern __inline__ __m256bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_min_pbh (__m256bh __W, __mmask16 __U, + __m256bh __A, __m256bh __B) +{ + return (__m256bh) + __builtin_ia32_minpbf16256_mask (__A, __B, __W, __U); +} + +extern __inline__ __m256bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_min_pbh (__mmask16 __U, __m256bh __A, __m256bh __B) +{ + return (__m256bh) + __builtin_ia32_minpbf16256_mask (__A, __B, + (__v16bf) _mm256_setzero_si256 (), + __U); +} + +extern __inline__ __m128bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_min_pbh (__m128bh __A, __m128bh __B) +{ + return (__m128bh) __builtin_ia32_minpbf16128 (__A, __B); +} + +extern __inline__ __m128bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_min_pbh (__m128bh __W, __mmask8 __U, + __m128bh __A, __m128bh __B) +{ + return (__m128bh) + __builtin_ia32_minpbf16128_mask (__A, __B, __W, __U); +} + +extern __inline__ __m128bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_min_pbh (__mmask8 __U, __m128bh __A, __m128bh __B) +{ + return (__m128bh) + __builtin_ia32_minpbf16128_mask (__A, __B, + (__v8bf) _mm_setzero_si128 (), + __U); +} + +extern __inline__ __m256bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_scalef_pbh (__m256bh __A, __m256bh __B) +{ + return (__m256bh) __builtin_ia32_scalefpbf16256 (__A, __B); +} + +extern __inline__ __m256bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_scalef_pbh (__m256bh __W, __mmask16 __U, + __m256bh __A, __m256bh __B) +{ + return (__m256bh) + __builtin_ia32_scalefpbf16256_mask (__A, __B, __W, __U); +} + +extern __inline__ __m256bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_scalef_pbh (__mmask16 __U, __m256bh __A, 
__m256bh __B) +{ + return (__m256bh) + __builtin_ia32_scalefpbf16256_mask (__A, __B, + (__v16bf) _mm256_setzero_si256 (), + __U); +} + +extern __inline__ __m128bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_scalef_pbh (__m128bh __A, __m128bh __B) +{ + return (__m128bh) __builtin_ia32_scalefpbf16128 (__A, __B); +} + +extern __inline__ __m128bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_scalef_pbh (__m128bh __W, __mmask8 __U, + __m128bh __A, __m128bh __B) +{ + return (__m128bh) + __builtin_ia32_scalefpbf16128_mask (__A, __B, __W, __U); +} + +extern __inline__ __m128bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_scalef_pbh (__mmask8 __U, __m128bh __A, __m128bh __B) +{ + return (__m128bh) + __builtin_ia32_scalefpbf16128_mask (__A, __B, + (__v8bf) _mm_setzero_si128 (), + __U); +} + +extern __inline__ __m256bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_fmaddne_pbh (__m256bh __A, __m256bh __B, __m256bh __C) +{ + return (__m256bh) + __builtin_ia32_fmaddnepbf16256_mask (__A, __B, __C, (__mmask16) -1); +} + +extern __inline__ __m256bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_fmaddne_pbh (__m256bh __A, __mmask16 __U, + __m256bh __B, __m256bh __C) +{ + return (__m256bh) + __builtin_ia32_fmaddnepbf16256_mask (__A, __B, __C, __U); +} + +extern __inline__ __m256bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask3_fmaddne_pbh (__m256bh __A, __m256bh __B, + __m256bh __C, __mmask16 __U) +{ + return (__m256bh) + __builtin_ia32_fmaddnepbf16256_mask3 (__A, __B, __C, __U); +} + +extern __inline__ __m256bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_fmaddne_pbh (__mmask16 __U, __m256bh __A, + __m256bh __B, __m256bh __C) +{ + return (__m256bh) + __builtin_ia32_fmaddnepbf16256_maskz (__A, __B, __C, __U); +} + +extern __inline__ __m128bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fmaddne_pbh (__m128bh __A, __m128bh __B, __m128bh __C) +{ + return (__m128bh) + __builtin_ia32_fmaddnepbf16128_mask (__A, __B, __C, (__mmask8) -1); +} + +extern __inline__ __m128bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_fmaddne_pbh (__m128bh __A, __mmask8 __U, + __m128bh __B, __m128bh __C) +{ + return (__m128bh) + __builtin_ia32_fmaddnepbf16128_mask (__A, __B, __C, __U); +} + +extern __inline__ __m128bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask3_fmaddne_pbh (__m128bh __A, __m128bh __B, + __m128bh __C, __mmask8 __U) +{ + return (__m128bh) + __builtin_ia32_fmaddnepbf16128_mask3 (__A, __B, __C, __U); +} + +extern __inline__ __m128bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_fmaddne_pbh (__mmask8 __U, __m128bh __A, + __m128bh __B, __m128bh __C) +{ + return (__m128bh) + __builtin_ia32_fmaddnepbf16128_maskz (__A, __B, __C, __U); +} + +extern __inline__ __m256bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_fmsubne_pbh (__m256bh __A, __m256bh __B, __m256bh __C) +{ + return (__m256bh) + __builtin_ia32_fmsubnepbf16256_mask (__A, __B, __C, (__mmask16) -1); +} + +extern __inline__ __m256bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_fmsubne_pbh (__m256bh __A, __mmask16 __U, + __m256bh __B, __m256bh __C) +{ + return (__m256bh) __builtin_ia32_fmsubnepbf16256_mask (__A, __B, __C, __U); +} + +extern 
__inline__ __m256bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask3_fmsubne_pbh (__m256bh __A, __m256bh __B, + __m256bh __C, __mmask16 __U) +{ + return (__m256bh) + __builtin_ia32_fmsubnepbf16256_mask3 (__A, __B, __C, __U); +} + +extern __inline__ __m256bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_fmsubne_pbh (__mmask16 __U, __m256bh __A, + __m256bh __B, __m256bh __C) +{ + return (__m256bh) + __builtin_ia32_fmsubnepbf16256_maskz (__A, __B, __C, __U); +} + +extern __inline__ __m128bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fmsubne_pbh (__m128bh __A, __m128bh __B, __m128bh __C) +{ + return (__m128bh) + __builtin_ia32_fmsubnepbf16128_mask (__A, __B, __C, (__mmask8) -1); +} + +extern __inline__ __m128bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_fmsubne_pbh (__m128bh __A, __mmask8 __U, + __m128bh __B, __m128bh __C) +{ + return (__m128bh) + __builtin_ia32_fmsubnepbf16128_mask (__A, __B, __C, __U); +} + +extern __inline__ __m128bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask3_fmsubne_pbh (__m128bh __A, __m128bh __B, + __m128bh __C, __mmask8 __U) +{ + return (__m128bh) + __builtin_ia32_fmsubnepbf16128_mask3 (__A, __B, __C, __U); +} + +extern __inline__ __m128bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_fmsubne_pbh (__mmask8 __U, __m128bh __A, + __m128bh __B, __m128bh __C) +{ + return (__m128bh) + __builtin_ia32_fmsubnepbf16128_maskz (__A, __B, __C, __U); +} + +extern __inline__ __m256bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_fnmaddne_pbh (__m256bh __A, __m256bh __B, __m256bh __C) +{ + return (__m256bh) + __builtin_ia32_fnmaddnepbf16256_mask (__A, __B, __C, (__mmask16) -1); +} + +extern __inline__ __m256bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_fnmaddne_pbh (__m256bh __A, __mmask16 __U, + __m256bh __B, __m256bh __C) +{ + return (__m256bh) + __builtin_ia32_fnmaddnepbf16256_mask (__A, __B, __C, __U); +} + +extern __inline__ __m256bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask3_fnmaddne_pbh (__m256bh __A, __m256bh __B, + __m256bh __C, __mmask16 __U) +{ + return (__m256bh) + __builtin_ia32_fnmaddnepbf16256_mask3 (__A, __B, __C, __U); +} + +extern __inline__ __m256bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_fnmaddne_pbh (__mmask16 __U, __m256bh __A, + __m256bh __B, __m256bh __C) +{ + return (__m256bh) + __builtin_ia32_fnmaddnepbf16256_maskz (__A, __B, __C, __U); +} + +extern __inline__ __m128bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fnmaddne_pbh (__m128bh __A, __m128bh __B, __m128bh __C) +{ + return (__m128bh) + __builtin_ia32_fnmaddnepbf16128_mask (__A, __B, __C, (__mmask8) -1); +} + +extern __inline__ __m128bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_fnmaddne_pbh (__m128bh __A, __mmask8 __U, + __m128bh __B, __m128bh __C) +{ + return (__m128bh) + __builtin_ia32_fnmaddnepbf16128_mask (__A, __B, __C, __U); +} + +extern __inline__ __m128bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask3_fnmaddne_pbh (__m128bh __A, __m128bh __B, + __m128bh __C, __mmask8 __U) +{ + return (__m128bh) + __builtin_ia32_fnmaddnepbf16128_mask3 (__A, __B, __C, __U); +} + +extern __inline__ __m128bh +__attribute__ ((__gnu_inline__, __always_inline__, 
__artificial__)) +_mm_maskz_fnmaddne_pbh (__mmask8 __U, __m128bh __A, + __m128bh __B, __m128bh __C) +{ + return (__m128bh) + __builtin_ia32_fnmaddnepbf16128_maskz (__A, __B, __C, __U); +} + +extern __inline__ __m256bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_fnmsubne_pbh (__m256bh __A, __m256bh __B, __m256bh __C) +{ + return (__m256bh) + __builtin_ia32_fnmsubnepbf16256_mask (__A, __B, __C, (__mmask16) -1); +} + +extern __inline__ __m256bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_fnmsubne_pbh (__m256bh __A, __mmask16 __U, + __m256bh __B, __m256bh __C) +{ + return (__m256bh) + __builtin_ia32_fnmsubnepbf16256_mask (__A, __B, __C, __U); +} + +extern __inline__ __m256bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask3_fnmsubne_pbh (__m256bh __A, __m256bh __B, + __m256bh __C, __mmask16 __U) +{ + return (__m256bh) + __builtin_ia32_fnmsubnepbf16256_mask3 (__A, __B, __C, __U); +} + +extern __inline__ __m256bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_fnmsubne_pbh (__mmask16 __U, __m256bh __A, + __m256bh __B, __m256bh __C) +{ + return (__m256bh) + __builtin_ia32_fnmsubnepbf16256_maskz (__A, __B, __C, __U); +} + +extern __inline__ __m128bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fnmsubne_pbh (__m128bh __A, __m128bh __B, __m128bh __C) +{ + return (__m128bh) + __builtin_ia32_fnmsubnepbf16128_mask (__A, __B, __C, (__mmask8) -1); +} + +extern __inline__ __m128bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_fnmsubne_pbh (__m128bh __A, __mmask8 __U, + __m128bh __B, __m128bh __C) +{ + return (__m128bh) + __builtin_ia32_fnmsubnepbf16128_mask (__A, __B, __C, __U); +} + +extern __inline__ __m128bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask3_fnmsubne_pbh (__m128bh __A, __m128bh __B, + __m128bh __C, __mmask8 __U) +{ + return (__m128bh) + __builtin_ia32_fnmsubnepbf16128_mask3 (__A, __B, __C, __U); +} + +extern __inline__ __m128bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_fnmsubne_pbh (__mmask8 __U, __m128bh __A, + __m128bh __B, __m128bh __C) +{ + return (__m128bh) + __builtin_ia32_fnmsubnepbf16128_maskz (__A, __B, __C, __U); +} + +#ifdef __DISABLE_AVX10_2_256__ +#undef __DISABLE_AVX10_2_256__ +#pragma GCC pop_options +#endif /* __DISABLE_AVX10_2_256__ */ + +#endif /* __AVX10_2BF16INTRIN_H_INCLUDED */ diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def index 63b65846c8f..f3838424fd4 100644 --- a/gcc/config/i386/i386-builtin-types.def +++ b/gcc/config/i386/i386-builtin-types.def @@ -1474,3 +1474,12 @@ DEF_FUNCTION_TYPE (V64QI, V32HF, V32HF, V64QI, UDI) DEF_FUNCTION_TYPE (V16QI, V8HF, V16QI, UQI) DEF_FUNCTION_TYPE (V16QI, V16HF, V16QI, UHI) DEF_FUNCTION_TYPE (V32QI, V32HF, V32QI, USI) +DEF_FUNCTION_TYPE (V32BF, V32BF, V32BF) +DEF_FUNCTION_TYPE (V16BF, V16BF, V16BF) +DEF_FUNCTION_TYPE (V8BF, V8BF, V8BF) +DEF_FUNCTION_TYPE (V32BF, V32BF, V32BF, USI) +DEF_FUNCTION_TYPE (V16BF, V16BF, V16BF, UHI) +DEF_FUNCTION_TYPE (V8BF, V8BF, V8BF, UQI) +DEF_FUNCTION_TYPE (V32BF, V32BF, V32BF, V32BF, USI) +DEF_FUNCTION_TYPE (V16BF, V16BF, V16BF, V16BF, UHI) +DEF_FUNCTION_TYPE (V8BF, V8BF, V8BF, V8BF, UQI) diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def index 6f5ab32dd0d..3f3bc768348 100644 --- a/gcc/config/i386/i386-builtin.def +++ b/gcc/config/i386/i386-builtin.def @@ 
-3159,6 +3159,84 @@ BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vcvtneph2hf8sv32hf_mask, "__bui BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvthf82phv8hf_mask, "__builtin_ia32_vcvthf82ph128_mask", IX86_BUILTIN_VCVTHF82PH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V16QI_V8HF_UQI) BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvthf82phv16hf_mask, "__builtin_ia32_vcvthf82ph256_mask", IX86_BUILTIN_VCVTHF82PH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16QI_V16HF_UHI) BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vcvthf82phv32hf_mask, "__builtin_ia32_vcvthf82ph512_mask", IX86_BUILTIN_VCVTHF82PH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32QI_V32HF_USI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_addnepbf16_v32bf, "__builtin_ia32_addnepbf16512", IX86_BUILTIN_ADDNEPBF16512, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_addnepbf16_v32bf_mask, "__builtin_ia32_addnepbf16512_mask", IX86_BUILTIN_ADDNEPBF16512_MASK, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF_V32BF_USI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_addnepbf16_v16bf, "__builtin_ia32_addnepbf16256", IX86_BUILTIN_ADDNEPBF16256, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_addnepbf16_v16bf_mask, "__builtin_ia32_addnepbf16256_mask", IX86_BUILTIN_ADDNEPBF16256_MASK, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF_V16BF_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_addnepbf16_v8bf, "__builtin_ia32_addnepbf16128", IX86_BUILTIN_ADDNEPBF16128, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_addnepbf16_v8bf_mask, "__builtin_ia32_addnepbf16128_mask", IX86_BUILTIN_ADDNEPBF16128_MASK, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF_V8BF_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_subnepbf16_v32bf, "__builtin_ia32_subnepbf16512", IX86_BUILTIN_SUBNEPBF16512, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_subnepbf16_v32bf_mask, "__builtin_ia32_subnepbf16512_mask", IX86_BUILTIN_SUBNEPBF16512_MASK, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF_V32BF_USI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_subnepbf16_v16bf, "__builtin_ia32_subnepbf16256", IX86_BUILTIN_SUBNEPBF16256, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_subnepbf16_v16bf_mask, "__builtin_ia32_subnepbf16256_mask", IX86_BUILTIN_SUBNEPBF16256_MASK, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF_V16BF_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_subnepbf16_v8bf, "__builtin_ia32_subnepbf16128", IX86_BUILTIN_SUBNEPBF16128, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_subnepbf16_v8bf_mask, "__builtin_ia32_subnepbf16128_mask", IX86_BUILTIN_SUBNEPBF16128_MASK, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF_V8BF_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_mulnepbf16_v32bf, "__builtin_ia32_mulnepbf16512", IX86_BUILTIN_MULNEPBF16512, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_mulnepbf16_v32bf_mask, "__builtin_ia32_mulnepbf16512_mask", IX86_BUILTIN_MULNEPBF16512_MASK, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF_V32BF_USI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_mulnepbf16_v16bf, "__builtin_ia32_mulnepbf16256", IX86_BUILTIN_MULNEPBF16256, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_mulnepbf16_v16bf_mask, 
"__builtin_ia32_mulnepbf16256_mask", IX86_BUILTIN_MULNEPBF16256_MASK, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF_V16BF_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_mulnepbf16_v8bf, "__builtin_ia32_mulnepbf16128", IX86_BUILTIN_MULNEPBF16128, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_mulnepbf16_v8bf_mask, "__builtin_ia32_mulnepbf16128_mask", IX86_BUILTIN_MULNEPBF16128_MASK, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF_V8BF_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_divnepbf16_v32bf, "__builtin_ia32_divnepbf16512", IX86_BUILTIN_DIVNEPBF16512, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_divnepbf16_v32bf_mask, "__builtin_ia32_divnepbf16512_mask", IX86_BUILTIN_DIVNEPBF16512_MASK, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF_V32BF_USI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_divnepbf16_v16bf, "__builtin_ia32_divnepbf16256", IX86_BUILTIN_DIVNEPBF16256, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_divnepbf16_v16bf_mask, "__builtin_ia32_divnepbf16256_mask", IX86_BUILTIN_DIVNEPBF16256_MASK, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF_V16BF_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_divnepbf16_v8bf, "__builtin_ia32_divnepbf16128", IX86_BUILTIN_DIVNEPBF16128, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_divnepbf16_v8bf_mask, "__builtin_ia32_divnepbf16128_mask", IX86_BUILTIN_DIVNEPBF16128_MASK, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF_V8BF_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_smaxpbf16_v32bf, "__builtin_ia32_maxpbf16512", IX86_BUILTIN_MAXPBF16512, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_smaxpbf16_v32bf_mask, "__builtin_ia32_maxpbf16512_mask", IX86_BUILTIN_MAXPBF16512_MASK, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF_V32BF_USI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_smaxpbf16_v16bf, "__builtin_ia32_maxpbf16256", IX86_BUILTIN_MAXPBF16256, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_smaxpbf16_v16bf_mask, "__builtin_ia32_maxpbf16256_mask", IX86_BUILTIN_MAXPBF16256_MASK, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF_V16BF_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_smaxpbf16_v8bf, "__builtin_ia32_maxpbf16128", IX86_BUILTIN_MAXPBF16128, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_smaxpbf16_v8bf_mask, "__builtin_ia32_maxpbf16128_mask", IX86_BUILTIN_MAXPBF16128_MASK, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF_V8BF_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_sminpbf16_v32bf, "__builtin_ia32_minpbf16512", IX86_BUILTIN_MINPBF16512, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_sminpbf16_v32bf_mask, "__builtin_ia32_minpbf16512_mask", IX86_BUILTIN_MINPBF16512_MASK, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF_V32BF_USI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_sminpbf16_v16bf, "__builtin_ia32_minpbf16256", IX86_BUILTIN_MINPBF16256, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_sminpbf16_v16bf_mask, "__builtin_ia32_minpbf16256_mask", IX86_BUILTIN_MINPBF16256_MASK, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF_V16BF_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_sminpbf16_v8bf, "__builtin_ia32_minpbf16128", 
IX86_BUILTIN_MINPBF16128, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_sminpbf16_v8bf_mask, "__builtin_ia32_minpbf16128_mask", IX86_BUILTIN_MINPBF16128_MASK, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF_V8BF_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_scalefpbf16_v32bf, "__builtin_ia32_scalefpbf16512", IX86_BUILTIN_SCALEFPBF16512, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_scalefpbf16_v32bf_mask, "__builtin_ia32_scalefpbf16512_mask", IX86_BUILTIN_SCALEFPBF16512_MASK, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF_V32BF_USI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_scalefpbf16_v16bf, "__builtin_ia32_scalefpbf16256", IX86_BUILTIN_SCALEFPBF16256, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_scalefpbf16_v16bf_mask, "__builtin_ia32_scalefpbf16256_mask", IX86_BUILTIN_SCALEFPBF16256_MASK, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF_V16BF_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_scalefpbf16_v8bf, "__builtin_ia32_scalefpbf16128", IX86_BUILTIN_SCALEFPBF16128, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_scalefpbf16_v8bf_mask, "__builtin_ia32_scalefpbf16128_mask", IX86_BUILTIN_SCALEFPBF16128_MASK, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF_V8BF_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_fmaddnepbf16_v32bf_mask, "__builtin_ia32_fmaddnepbf16512_mask", IX86_BUILTIN_FMADDNEPBF16512_MASK, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF_V32BF_USI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_fmaddnepbf16_v32bf_mask3, "__builtin_ia32_fmaddnepbf16512_mask3", IX86_BUILTIN_FMADDNEPBF16512_MASK3, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF_V32BF_USI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_fmaddnepbf16_v32bf_maskz, "__builtin_ia32_fmaddnepbf16512_maskz", IX86_BUILTIN_FMADDNEPBF16512_MASKZ, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF_V32BF_USI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fmaddnepbf16_v16bf_mask, "__builtin_ia32_fmaddnepbf16256_mask", IX86_BUILTIN_FMADDNEPBF16256_MASK, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF_V16BF_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fmaddnepbf16_v16bf_mask3, "__builtin_ia32_fmaddnepbf16256_mask3", IX86_BUILTIN_FMADDNEPBF16256_MASK3, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF_V16BF_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fmaddnepbf16_v16bf_maskz, "__builtin_ia32_fmaddnepbf16256_maskz", IX86_BUILTIN_FMADDNEPBF16256_MASKZ, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF_V16BF_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fmaddnepbf16_v8bf_mask, "__builtin_ia32_fmaddnepbf16128_mask", IX86_BUILTIN_FMADDNEPBF16128_MASK, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF_V8BF_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fmaddnepbf16_v8bf_mask3, "__builtin_ia32_fmaddnepbf16128_mask3", IX86_BUILTIN_FMADDNEPBF16128_MASK3, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF_V8BF_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fmaddnepbf16_v8bf_maskz, "__builtin_ia32_fmaddnepbf16128_maskz", IX86_BUILTIN_FMADDNEPBF16128_MASKZ, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF_V8BF_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_fmsubnepbf16_v32bf_mask, "__builtin_ia32_fmsubnepbf16512_mask", IX86_BUILTIN_FMSUBNEPBF16512_MASK, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF_V32BF_USI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_fmsubnepbf16_v32bf_mask3, 
"__builtin_ia32_fmsubnepbf16512_mask3", IX86_BUILTIN_FMSUBNEPBF16512_MASK3, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF_V32BF_USI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_fmsubnepbf16_v32bf_maskz, "__builtin_ia32_fmsubnepbf16512_maskz", IX86_BUILTIN_FMSUBNEPBF16512_MASKZ, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF_V32BF_USI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fmsubnepbf16_v16bf_mask, "__builtin_ia32_fmsubnepbf16256_mask", IX86_BUILTIN_FMSUBNEPBF16256_MASK, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF_V16BF_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fmsubnepbf16_v16bf_mask3, "__builtin_ia32_fmsubnepbf16256_mask3", IX86_BUILTIN_FMSUBNEPBF16256_MASK3, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF_V16BF_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fmsubnepbf16_v16bf_maskz, "__builtin_ia32_fmsubnepbf16256_maskz", IX86_BUILTIN_FMSUBNEPBF16256_MASKZ, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF_V16BF_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fmsubnepbf16_v8bf_mask, "__builtin_ia32_fmsubnepbf16128_mask", IX86_BUILTIN_FMSUBNEPBF16128_MASK, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF_V8BF_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fmsubnepbf16_v8bf_mask3, "__builtin_ia32_fmsubnepbf16128_mask3", IX86_BUILTIN_FMSUBNEPBF16128_MASK3, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF_V8BF_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fmsubnepbf16_v8bf_maskz, "__builtin_ia32_fmsubnepbf16128_maskz", IX86_BUILTIN_FMSUBNEPBF16128_MASKZ, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF_V8BF_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_fnmaddnepbf16_v32bf_mask, "__builtin_ia32_fnmaddnepbf16512_mask", IX86_BUILTIN_FNMADDNEPBF16512_MASK, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF_V32BF_USI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_fnmaddnepbf16_v32bf_mask3, "__builtin_ia32_fnmaddnepbf16512_mask3", IX86_BUILTIN_FNMADDNEPBF16512_MASK3, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF_V32BF_USI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_fnmaddnepbf16_v32bf_maskz, "__builtin_ia32_fnmaddnepbf16512_maskz", IX86_BUILTIN_FNMADDNEPBF16512_MASKZ, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF_V32BF_USI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fnmaddnepbf16_v16bf_mask, "__builtin_ia32_fnmaddnepbf16256_mask", IX86_BUILTIN_FNMADDNEPBF16256_MASK, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF_V16BF_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fnmaddnepbf16_v16bf_mask3, "__builtin_ia32_fnmaddnepbf16256_mask3", IX86_BUILTIN_FNMADDNEPBF16256_MASK3, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF_V16BF_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fnmaddnepbf16_v16bf_maskz, "__builtin_ia32_fnmaddnepbf16256_maskz", IX86_BUILTIN_FNMADDNEPBF16256_MASKZ, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF_V16BF_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fnmaddnepbf16_v8bf_mask, "__builtin_ia32_fnmaddnepbf16128_mask", IX86_BUILTIN_FNMADDNEPBF16128_MASK, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF_V8BF_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fnmaddnepbf16_v8bf_mask3, "__builtin_ia32_fnmaddnepbf16128_mask3", IX86_BUILTIN_FNMADDNEPBF16128_MASK3, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF_V8BF_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fnmaddnepbf16_v8bf_maskz, "__builtin_ia32_fnmaddnepbf16128_maskz", IX86_BUILTIN_FNMADDNEPBF16128_MASKZ, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF_V8BF_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_fnmsubnepbf16_v32bf_mask, 
"__builtin_ia32_fnmsubnepbf16512_mask", IX86_BUILTIN_FNMSUBNEPBF16512_MASK, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF_V32BF_USI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_fnmsubnepbf16_v32bf_mask3, "__builtin_ia32_fnmsubnepbf16512_mask3", IX86_BUILTIN_FNMSUBNEPBF16512_MASK3, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF_V32BF_USI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_fnmsubnepbf16_v32bf_maskz, "__builtin_ia32_fnmsubnepbf16512_maskz", IX86_BUILTIN_FNMSUBNEPBF16512_MASKZ, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF_V32BF_USI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fnmsubnepbf16_v16bf_mask, "__builtin_ia32_fnmsubnepbf16256_mask", IX86_BUILTIN_FNMSUBNEPBF16256_MASK, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF_V16BF_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fnmsubnepbf16_v16bf_mask3, "__builtin_ia32_fnmsubnepbf16256_mask3", IX86_BUILTIN_FNMSUBNEPBF16256_MASK3, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF_V16BF_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fnmsubnepbf16_v16bf_maskz, "__builtin_ia32_fnmsubnepbf16256_maskz", IX86_BUILTIN_FNMSUBNEPBF16256_MASKZ, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF_V16BF_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fnmsubnepbf16_v8bf_mask, "__builtin_ia32_fnmsubnepbf16128_mask", IX86_BUILTIN_FNMSUBNEPBF16128_MASK, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF_V8BF_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fnmsubnepbf16_v8bf_mask3, "__builtin_ia32_fnmsubnepbf16128_mask3", IX86_BUILTIN_FNMSUBNEPBF16128_MASK3, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF_V8BF_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fnmsubnepbf16_v8bf_maskz, "__builtin_ia32_fnmsubnepbf16128_maskz", IX86_BUILTIN_FNMSUBNEPBF16128_MASKZ, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF_V8BF_UQI) /* Builtins with rounding support. 
*/ BDESC_END (ARGS, ROUND_ARGS) diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc index c5305395a64..dff9e09809e 100644 --- a/gcc/config/i386/i386-expand.cc +++ b/gcc/config/i386/i386-expand.cc @@ -11330,6 +11330,9 @@ ix86_expand_args_builtin (const struct builtin_description *d, case V16HI_FTYPE_V8SI_V8SI: case V64QI_FTYPE_V64QI_V64QI: case V32QI_FTYPE_V32QI_V32QI: + case V32BF_FTYPE_V32BF_V32BF: + case V16BF_FTYPE_V16BF_V16BF: + case V8BF_FTYPE_V8BF_V8BF: case V16HI_FTYPE_V32QI_V32QI: case V16HI_FTYPE_V16HI_V16HI: case V8SI_FTYPE_V4DF_V4DF: @@ -11497,6 +11500,7 @@ ix86_expand_args_builtin (const struct builtin_description *d, case V16HI_FTYPE_V8HI_V16HI_UHI: case V16HI_FTYPE_HI_V16HI_UHI: case V8HI_FTYPE_V8HI_V8HI_UQI: + case V8BF_FTYPE_V8BF_V8BF_UQI: case V8HI_FTYPE_HI_V8HI_UQI: case V16HF_FTYPE_V16HF_V16HF_UHI: case V8SF_FTYPE_V8HI_V8SF_UQI: @@ -11594,9 +11598,11 @@ ix86_expand_args_builtin (const struct builtin_description *d, case V16HF_FTYPE_V16HF_V16HF_V16HF: case V16HI_FTYPE_V16HF_V16HI_UHI: case V16HI_FTYPE_V16HI_V16HI_UHI: + case V16BF_FTYPE_V16BF_V16BF_UHI: case V8HI_FTYPE_V16QI_V8HI_UQI: case V16HI_FTYPE_V16QI_V16HI_UHI: case V32HI_FTYPE_V32HI_V32HI_USI: + case V32BF_FTYPE_V32BF_V32BF_USI: case V32HI_FTYPE_V32QI_V32HI_USI: case V8DI_FTYPE_V16QI_V8DI_UQI: case V8DI_FTYPE_V2DI_V8DI_UQI: @@ -11726,6 +11732,7 @@ ix86_expand_args_builtin (const struct builtin_description *d, break; case V32QI_FTYPE_V32QI_V32QI_V32QI_USI: case V32HI_FTYPE_V32HI_V32HI_V32HI_USI: + case V32BF_FTYPE_V32BF_V32BF_V32BF_USI: case V32HI_FTYPE_V64QI_V64QI_V32HI_USI: case V16SI_FTYPE_V32HI_V32HI_V16SI_UHI: case V64QI_FTYPE_V64QI_V64QI_V64QI_UDI: @@ -11756,6 +11763,7 @@ ix86_expand_args_builtin (const struct builtin_description *d, case V16SI_FTYPE_V16SI_V16SI_V16SI_UHI: case V16SI_FTYPE_V16SI_V4SI_V16SI_UHI: case V8HI_FTYPE_V8HI_V8HI_V8HI_UQI: + case V8BF_FTYPE_V8BF_V8BF_V8BF_UQI: case V8SI_FTYPE_V8SI_V8SI_V8SI_UQI: case V4SI_FTYPE_V4SI_V4SI_V4SI_UQI: case V16HF_FTYPE_V16HF_V16HF_V16HF_UQI: @@ -11763,6 +11771,7 @@ ix86_expand_args_builtin (const struct builtin_description *d, case V8SF_FTYPE_V8SF_V8SF_V8SF_UQI: case V16QI_FTYPE_V16QI_V16QI_V16QI_UHI: case V16HI_FTYPE_V16HI_V16HI_V16HI_UHI: + case V16BF_FTYPE_V16BF_V16BF_V16BF_UHI: case V2DI_FTYPE_V2DI_V2DI_V2DI_UQI: case V2DF_FTYPE_V2DF_V2DF_V2DF_UQI: case V4DI_FTYPE_V4DI_V4DI_V4DI_UQI: diff --git a/gcc/config/i386/immintrin.h b/gcc/config/i386/immintrin.h index fea55a298fc..025334027eb 100644 --- a/gcc/config/i386/immintrin.h +++ b/gcc/config/i386/immintrin.h @@ -148,4 +148,8 @@ #include +#include + +#include + #endif /* _IMMINTRIN_H_INCLUDED */ diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 1d62f96dcc5..50274f01a01 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -229,6 +229,7 @@ UNSPEC_VCVTNEPH2HF8 UNSPEC_VCVTNEPH2HF8S UNSPEC_VCVTHF82PH + UNSPEC_VSCALEFPBF16 ]) (define_c_enum "unspecv" [ @@ -499,6 +500,9 @@ (define_mode_iterator VHF_AVX10_2 [(V32HF "TARGET_AVX10_2_512") V16HF V8HF]) +(define_mode_iterator VBF_AVX10_2 + [(V32BF "TARGET_AVX10_2_512") V16BF V8BF]) + ;; All vector integer modes (define_mode_iterator VI [(V16SI "TARGET_AVX512F && TARGET_EVEX512") @@ -31812,3 +31816,292 @@ "TARGET_AVX10_2_256" "vdpphps\t{%3, %2, %0%{%5%}%N4|%0%{%5%}%N4, %2, %3}" [(set_attr "prefix" "evex")]) + +(define_insn "avx10_2_scalefpbf16_" + [(set (match_operand:VBF_AVX10_2 0 "register_operand" "=v") + (unspec:VBF_AVX10_2 + [(match_operand:VBF_AVX10_2 1 "register_operand" "v") + (match_operand:VBF_AVX10_2 
2 "nonimmediate_operand" "vm")] + UNSPEC_VSCALEFPBF16))] + "TARGET_AVX10_2_256" + "vscalefpbf16\t{%2, %1, %0|%0, %1, %2}" + [(set_attr "prefix" "evex")]) + +(define_insn "avx10_2_pbf16_" + [(set (match_operand:VBF_AVX10_2 0 "register_operand" "=v") + (smaxmin:VBF_AVX10_2 + (match_operand:VBF_AVX10_2 1 "register_operand" "v") + (match_operand:VBF_AVX10_2 2 "nonimmediate_operand" "vm")))] + "TARGET_AVX10_2_256" + "vpbf16\t{%2, %1, %0|%0, %1, %2}" + [(set_attr "prefix" "evex") + (set_attr "mode" "")]) + +(define_insn "avx10_2_nepbf16_" + [(set (match_operand:VBF_AVX10_2 0 "register_operand" "=v") + (plusminusmultdiv:VBF_AVX10_2 + (match_operand:VBF_AVX10_2 1 "register_operand" "v") + (match_operand:VBF_AVX10_2 2 "nonimmediate_operand" "vm")))] + "TARGET_AVX10_2_256" + "vnepbf16\t{%2, %1, %0|%0, %1, %2}" + [(set_attr "prefix" "evex")]) + +(define_expand "avx10_2_fmaddnepbf16__maskz" + [(match_operand:VBF_AVX10_2 0 "register_operand") + (match_operand:VBF_AVX10_2 1 "nonimmediate_operand") + (match_operand:VBF_AVX10_2 2 "nonimmediate_operand") + (match_operand:VBF_AVX10_2 3 "nonimmediate_operand") + (match_operand: 4 "register_operand")] + "TARGET_AVX10_2_256" + { + emit_insn (gen_avx10_2_fmaddnepbf16__maskz_1 (operands[0], operands[1], + operands[2], operands[3], + CONST0_RTX(mode), + operands[4])); + DONE; + }) + +(define_insn "avx10_2_fmaddnepbf16_" + [(set (match_operand:VBF_AVX10_2 0 "register_operand" "=v,v,v") + (fma:VBF_AVX10_2 + (match_operand:VBF_AVX10_2 1 "nonimmediate_operand" "%0,0,v") + (match_operand:VBF_AVX10_2 2 "nonimmediate_operand" "vm,v,vm") + (match_operand:VBF_AVX10_2 3 "nonimmediate_operand" "v,vm,0")))] + "TARGET_AVX10_2_256" + "@ + vfmadd132nepbf16\t{%2, %3, %0|%0, %3, %2} + vfmadd213nepbf16\t{%3, %2, %0|%0, %2, %3} + vfmadd231nepbf16\t{%2, %1, %0|%0, %1, %2}" + [(set_attr "prefix" "evex") + (set_attr "type" "ssemuladd") + (set_attr "mode" "")]) + +(define_insn "avx10_2_fmaddnepbf16__mask" + [(set (match_operand:VBF_AVX10_2 0 "register_operand" "=v,v") + (vec_merge:VBF_AVX10_2 + (fma:VBF_AVX10_2 + (match_operand:VBF_AVX10_2 1 "nonimmediate_operand" "0,0") + (match_operand:VBF_AVX10_2 2 "nonimmediate_operand" "vm,v") + (match_operand:VBF_AVX10_2 3 "nonimmediate_operand" "v,vm")) + (match_dup 1) + (match_operand: 4 "register_operand" "Yk,Yk")))] + "TARGET_AVX10_2_256" + "@ + vfmadd132nepbf16\t{%2, %3, %0%{%4%}|%0%{%4%}, %3, %2} + vfmadd213nepbf16\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, %3}" + [(set_attr "prefix" "evex") + (set_attr "type" "ssemuladd") + (set_attr "mode" "")]) + +(define_insn "avx10_2_fmaddnepbf16__mask3" + [(set (match_operand:VBF_AVX10_2 0 "register_operand" "=v") + (vec_merge:VBF_AVX10_2 + (fma:VBF_AVX10_2 + (match_operand:VBF_AVX10_2 1 "nonimmediate_operand" "%v") + (match_operand:VBF_AVX10_2 2 "nonimmediate_operand" "vm") + (match_operand:VBF_AVX10_2 3 "register_operand" "0")) + (match_dup 3) + (match_operand: 4 "register_operand" "Yk")))] + "TARGET_AVX10_2_256" + "vfmadd231nepbf16\t{%2, %1, %0%{%4%}|%0%{%4%}, %1, %2}" + [(set_attr "prefix" "evex") + (set_attr "type" "ssemuladd") + (set_attr "mode" "")]) + +(define_expand "avx10_2_fnmaddnepbf16__maskz" + [(match_operand:VBF_AVX10_2 0 "register_operand") + (match_operand:VBF_AVX10_2 1 "nonimmediate_operand") + (match_operand:VBF_AVX10_2 2 "nonimmediate_operand") + (match_operand:VBF_AVX10_2 3 "nonimmediate_operand") + (match_operand: 4 "register_operand")] + "TARGET_AVX10_2_256" + { + emit_insn (gen_avx10_2_fnmaddnepbf16__maskz_1 (operands[0], operands[1], + operands[2], operands[3], + CONST0_RTX(mode), + 
+ operands[4])); + DONE; + }) +
+(define_insn "avx10_2_fnmaddnepbf16_<mode><sd_maskz_name>" + [(set (match_operand:VBF_AVX10_2 0 "register_operand" "=v,v,v") + (fma:VBF_AVX10_2 + (neg:VBF_AVX10_2 + (match_operand:VBF_AVX10_2 1 "nonimmediate_operand" "%0,0,v")) + (match_operand:VBF_AVX10_2 2 "nonimmediate_operand" "vm,v,vm") + (match_operand:VBF_AVX10_2 3 "nonimmediate_operand" "v,vm,0")))] + "TARGET_AVX10_2_256" + "@ + vfnmadd132nepbf16\t{%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2} + vfnmadd213nepbf16\t{%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3} + vfnmadd231nepbf16\t{%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2}" + [(set_attr "prefix" "evex") + (set_attr "type" "ssemuladd") + (set_attr "mode" "<sseinsnmode>")]) +
+(define_insn "avx10_2_fnmaddnepbf16_<mode>_mask" + [(set (match_operand:VBF_AVX10_2 0 "register_operand" "=v,v") + (vec_merge:VBF_AVX10_2 + (fma:VBF_AVX10_2 + (neg:VBF_AVX10_2 + (match_operand:VBF_AVX10_2 1 "nonimmediate_operand" "0,0")) + (match_operand:VBF_AVX10_2 2 "nonimmediate_operand" "vm,v") + (match_operand:VBF_AVX10_2 3 "nonimmediate_operand" "v,vm")) + (match_dup 1) + (match_operand:<avx512fmaskmode> 4 "register_operand" "Yk,Yk")))] + "TARGET_AVX10_2_256" + "@ + vfnmadd132nepbf16\t{%2, %3, %0%{%4%}|%0%{%4%}, %3, %2} + vfnmadd213nepbf16\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, %3}" + [(set_attr "prefix" "evex") + (set_attr "type" "ssemuladd") + (set_attr "mode" "<sseinsnmode>")]) +
+(define_insn "avx10_2_fnmaddnepbf16_<mode>_mask3" + [(set (match_operand:VBF_AVX10_2 0 "register_operand" "=v") + (vec_merge:VBF_AVX10_2 + (fma:VBF_AVX10_2 + (neg:VBF_AVX10_2 + (match_operand:VBF_AVX10_2 1 "nonimmediate_operand" "%v")) + (match_operand:VBF_AVX10_2 2 "nonimmediate_operand" "vm") + (match_operand:VBF_AVX10_2 3 "register_operand" "0")) + (match_dup 3) + (match_operand:<avx512fmaskmode> 4 "register_operand" "Yk")))] + "TARGET_AVX10_2_256" + "vfnmadd231nepbf16\t{%2, %1, %0%{%4%}|%0%{%4%}, %1, %2}" + [(set_attr "prefix" "evex") + (set_attr "type" "ssemuladd") + (set_attr "mode" "<sseinsnmode>")]) +
+(define_expand "avx10_2_fmsubnepbf16_<mode>_maskz" + [(match_operand:VBF_AVX10_2 0 "register_operand") + (match_operand:VBF_AVX10_2 1 "nonimmediate_operand") + (match_operand:VBF_AVX10_2 2 "nonimmediate_operand") + (match_operand:VBF_AVX10_2 3 "nonimmediate_operand") + (match_operand:<avx512fmaskmode> 4 "register_operand")] + "TARGET_AVX10_2_256" + { + emit_insn (gen_avx10_2_fmsubnepbf16_<mode>_maskz_1 (operands[0], operands[1], + operands[2], operands[3], + CONST0_RTX (<MODE>mode), + operands[4])); + DONE; + }) +
+(define_insn "avx10_2_fmsubnepbf16_<mode><sd_maskz_name>" + [(set (match_operand:VBF_AVX10_2 0 "register_operand" "=v,v,v") + (fma:VBF_AVX10_2 + (match_operand:VBF_AVX10_2 1 "nonimmediate_operand" "%0,0,v") + (match_operand:VBF_AVX10_2 2 "nonimmediate_operand" "vm,v,vm") + (neg:VBF_AVX10_2 + (match_operand:VBF_AVX10_2 3 "nonimmediate_operand" "v,vm,0"))))] + "TARGET_AVX10_2_256" + "@ + vfmsub132nepbf16\t{%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2} + vfmsub213nepbf16\t{%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3} + vfmsub231nepbf16\t{%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2}" + [(set_attr "prefix" "evex") + (set_attr "type" "ssemuladd") + (set_attr "mode" "<sseinsnmode>")]) +
+(define_insn "avx10_2_fmsubnepbf16_<mode>_mask" + [(set (match_operand:VBF_AVX10_2 0 "register_operand" "=v,v") + (vec_merge:VBF_AVX10_2 + (fma:VBF_AVX10_2 + (match_operand:VBF_AVX10_2 1 "nonimmediate_operand" "0,0") + (match_operand:VBF_AVX10_2 2 "nonimmediate_operand" "vm,v") + (neg:VBF_AVX10_2 + (match_operand:VBF_AVX10_2 3 "nonimmediate_operand" "v,vm"))) + (match_dup 1) + (match_operand:<avx512fmaskmode> 4 "register_operand" "Yk,Yk")))] + "TARGET_AVX10_2_256" + "@ + vfmsub132nepbf16\t{%2, %3, %0%{%4%}|%0%{%4%}, %3, %2} + vfmsub213nepbf16\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, %3}" + [(set_attr "prefix" "evex")
"ssemuladd") + (set_attr "mode" "")]) + +(define_insn "avx10_2_fmsubnepbf16__mask3" + [(set (match_operand:VBF_AVX10_2 0 "register_operand" "=v") + (vec_merge:VBF_AVX10_2 + (fma:VBF_AVX10_2 + (match_operand:VBF_AVX10_2 1 "nonimmediate_operand" "%v") + (match_operand:VBF_AVX10_2 2 "nonimmediate_operand" "vm") + (neg:VBF_AVX10_2 + (match_operand:VBF_AVX10_2 3 "register_operand" "0"))) + (match_dup 3) + (match_operand: 4 "register_operand" "Yk")))] + "TARGET_AVX10_2_256" + "vfmsub231nepbf16\t{%2, %1, %0%{%4%}|%0%{%4%}, %1, %2}" + [(set_attr "prefix" "evex") + (set_attr "type" "ssemuladd") + (set_attr "mode" "")]) + +(define_expand "avx10_2_fnmsubnepbf16__maskz" + [(match_operand:VBF_AVX10_2 0 "register_operand") + (match_operand:VBF_AVX10_2 1 "nonimmediate_operand") + (match_operand:VBF_AVX10_2 2 "nonimmediate_operand") + (match_operand:VBF_AVX10_2 3 "nonimmediate_operand") + (match_operand: 4 "register_operand")] + "TARGET_AVX10_2_256" + { + emit_insn (gen_avx10_2_fnmsubnepbf16__maskz_1 (operands[0], operands[1], + operands[2], operands[3], + CONST0_RTX(mode), + operands[4])); + DONE; + }) + +(define_insn "avx10_2_fnmsubnepbf16_" + [(set (match_operand:VBF_AVX10_2 0 "register_operand" "=v,v,v") + (fma:VBF_AVX10_2 + (neg:VBF_AVX10_2 + (match_operand:VBF_AVX10_2 1 "nonimmediate_operand" "%0,0,v")) + (match_operand:VBF_AVX10_2 2 "nonimmediate_operand" "vm,v,vm") + (neg:VBF_AVX10_2 + (match_operand:VBF_AVX10_2 3 "nonimmediate_operand" "v,vm,0"))))] + "TARGET_AVX10_2_256" + "@ + vfnmsub132nepbf16\t{%2, %3, %0|%0, %3, %2} + vfnmsub213nepbf16\t{%3, %2, %0|%0, %2, %3} + vfnmsub231nepbf16\t{%2, %1, %0|%0, %1, %2}" + [(set_attr "prefix" "evex") + (set_attr "type" "ssemuladd") + (set_attr "mode" "")]) + +(define_insn "avx10_2_fnmsubnepbf16__mask" + [(set (match_operand:VBF_AVX10_2 0 "register_operand" "=v,v") + (vec_merge:VBF_AVX10_2 + (fma:VBF_AVX10_2 + (neg:VBF_AVX10_2 + (match_operand:VBF_AVX10_2 1 "nonimmediate_operand" "0,0")) + (match_operand:VBF_AVX10_2 2 "nonimmediate_operand" "vm,v") + (neg:VBF_AVX10_2 + (match_operand:VBF_AVX10_2 3 "nonimmediate_operand" "v,vm"))) + (match_dup 1) + (match_operand: 4 "register_operand" "Yk,Yk")))] + "TARGET_AVX10_2_256" + "@ + vfnmsub132nepbf16\t{%2, %3, %0%{%4%}|%0%{%4%}, %3, %2} + vfnmsub213nepbf16\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, %3}" + [(set_attr "prefix" "evex") + (set_attr "type" "ssemuladd") + (set_attr "mode" "")]) + +(define_insn "avx10_2_fnmsubnepbf16__mask3" + [(set (match_operand:VBF_AVX10_2 0 "register_operand" "=v") + (vec_merge:VBF_AVX10_2 + (fma:VBF_AVX10_2 + (neg:VBF_AVX10_2 + (match_operand:VBF_AVX10_2 1 "nonimmediate_operand" "%v")) + (match_operand:VBF_AVX10_2 2 "nonimmediate_operand" "vm") + (neg:VBF_AVX10_2 + (match_operand:VBF_AVX10_2 3 "register_operand" "0"))) + (match_dup 3) + (match_operand: 4 "register_operand" "Yk")))] + "TARGET_AVX10_2_256" + "vfnmsub231nepbf16\t{%2, %1, %0%{%4%}|%0%{%4%}, %1, %2}" + [(set_attr "prefix" "evex") + (set_attr "type" "ssemuladd") + (set_attr "mode" "")]) diff --git a/gcc/testsuite/gcc.target/i386/avx10-helper.h b/gcc/testsuite/gcc.target/i386/avx10-helper.h index 385c7446979..9ff1dd72e92 100644 --- a/gcc/testsuite/gcc.target/i386/avx10-helper.h +++ b/gcc/testsuite/gcc.target/i386/avx10-helper.h @@ -3,9 +3,55 @@ #define AVX10 #define AVX512FP16 - +#define AVX512BF16 #include "avx512f-helper.h" #include "avx512f-mask-type.h" +#include + +#define NOINLINE __attribute__((noinline,noclone)) +typedef union +{ + uint32_t int32; + float flt; +}float_int_t; + +float NOINLINE +convert_bf16_to_fp32 (unsigned 
+{ + unsigned int ii = bf16 << 16; + return *(float*)&ii; +} +
+unsigned short NOINLINE +convert_fp32_to_bf16 (float fp) +{ + float_int_t fi; + fi.flt = fp; + return ((fi.int32 >> 16) & 0xffff); +} +
+unsigned short NOINLINE +convert_fp32_to_bf16_ne (float fp) +{ + float_int_t fi; + uint32_t rounding_bias, lsb; + + fi.flt = fp; + lsb = (fi.int32 >> 16) & 0x1; + rounding_bias = 0x7fff + lsb; + fi.int32 += rounding_bias; + + return ((fi.int32 >> 16) & 0xffff); +} +
+float NOINLINE +scalef (float x, float y) +{ + __m128 px = _mm_load_ss (&x); + __m128 py = _mm_load_ss (&y); + __m128 out = _mm_scalef_ss (px, py); + return _mm_cvtss_f32 (out); +} #endif /* AVX10_HELPER_INCLUDED */
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-bf16-1.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-bf16-1.c new file mode 100644 index 00000000000..78839fb1297 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-bf16-1.c
@@ -0,0 +1,87 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx10.2-512 -O2" } */
+/* { dg-final { scan-assembler-times "vaddnepbf16\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vaddnepbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vaddnepbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vsubnepbf16\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vsubnepbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vsubnepbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmulnepbf16\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmulnepbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmulnepbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vdivnepbf16\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vdivnepbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vdivnepbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmaxpbf16\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmaxpbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmaxpbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vminpbf16\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vminpbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vminpbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vscalefpbf16\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vscalefpbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vscalefpbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmadd132nepbf16\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmadd132nepbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmadd231nepbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmadd132nepbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmsub132nepbf16\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmsub132nepbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmsub231nepbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmsub132nepbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfnmadd132nepbf16\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfnmadd132nepbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfnmadd231nepbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfnmadd132nepbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfnmsub132nepbf16\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfnmsub132nepbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfnmsub231nepbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfnmsub132nepbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m512bh res, x1, x2; +volatile __mmask32 m32; + +void extern +avx10_2_512_test (void) +{ + res = _mm512_addne_pbh (x1, x2); + res = _mm512_mask_addne_pbh (res, m32, x1, x2); + res = _mm512_maskz_addne_pbh (m32, x1, x2); + res = _mm512_subne_pbh (x1, x2); + res = _mm512_mask_subne_pbh (res, m32, x1, x2); + res = _mm512_maskz_subne_pbh (m32, x1, x2); + res = _mm512_mulne_pbh (x1, x2); + res = 
_mm512_mask_mulne_pbh (res, m32, x1, x2); + res = _mm512_maskz_mulne_pbh (m32, x1, x2); + res = _mm512_divne_pbh (x1, x2); + res = _mm512_mask_divne_pbh (res, m32, x1, x2); + res = _mm512_maskz_divne_pbh (m32, x1, x2); + res = _mm512_max_pbh (x1, x2); + res = _mm512_mask_max_pbh (res, m32, x1, x2); + res = _mm512_maskz_max_pbh (m32, x1, x2); + res = _mm512_min_pbh (x1, x2); + res = _mm512_mask_min_pbh (res, m32, x1, x2); + res = _mm512_maskz_min_pbh (m32, x1, x2); + res = _mm512_scalef_pbh (x1, x2); + res = _mm512_mask_scalef_pbh (res, m32, x1, x2); + res = _mm512_maskz_scalef_pbh (m32, x1, x2); + + res = _mm512_fmaddne_pbh (res, x1, x2); + res = _mm512_mask_fmaddne_pbh (res, m32, x1, x2); + res = _mm512_mask3_fmaddne_pbh (res, x1, x2, m32); + res = _mm512_maskz_fmaddne_pbh (m32,res, x1, x2); + res = _mm512_fmsubne_pbh (res, x1, x2); + res = _mm512_mask_fmsubne_pbh (res, m32, x1, x2); + res = _mm512_mask3_fmsubne_pbh (res, x1, x2, m32); + res = _mm512_maskz_fmsubne_pbh (m32,res, x1, x2); + res = _mm512_fnmaddne_pbh (res, x1, x2); + res = _mm512_mask_fnmaddne_pbh (res, m32, x1, x2); + res = _mm512_mask3_fnmaddne_pbh (res, x1, x2, m32); + res = _mm512_maskz_fnmaddne_pbh (m32,res, x1, x2); + res = _mm512_fnmsubne_pbh (res, x1, x2); + res = _mm512_mask_fnmsubne_pbh (res, m32, x1, x2); + res = _mm512_mask3_fnmsubne_pbh (res, x1, x2, m32); + res = _mm512_maskz_fnmsubne_pbh (m32,res, x1, x2); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vaddnepbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vaddnepbf16-2.c new file mode 100644 index 00000000000..3b7d1635335 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vaddnepbf16-2.c @@ -0,0 +1,49 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif +#include "avx10-helper.h" +#define SIZE_RES (AVX512F_LEN / 16) + +void +TEST (void) +{ + int i; + UNION_TYPE (AVX512F_LEN, bf16_uw) res1, res2, res3, src1, src2; + MASK_TYPE mask = MASK_VALUE; + unsigned short res_ref[SIZE_RES], res_ref2[SIZE_RES]; + + for (i = 0; i < SIZE_RES; i++) + { + res1.a[i] = 0; + res2.a[i] = DEFAULT_VALUE; + res3.a[i] = DEFAULT_VALUE; + float x = (float) (2 * (i % 7) + 7); + float y = (float) (3 * (i % 7) - 5); + float res; + src2.a[i] = convert_fp32_to_bf16 (y); + src1.a[i] = convert_fp32_to_bf16 (x); + res = x + y; + res_ref[i] = res_ref2[i] = convert_fp32_to_bf16_ne (res); + } + + res1.x = INTRINSIC (_addne_pbh) (src1.x, src2.x); + res2.x = INTRINSIC (_mask_addne_pbh) (res2.x, mask, src1.x, src2.x); + res3.x = INTRINSIC (_maskz_addne_pbh) (mask, src1.x, src2.x); + + if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res1, res_ref)) + abort (); + + MASK_MERGE (bf16_uw) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res2, res_ref2)) + abort (); + + MASK_ZERO (bf16_uw) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res3, res_ref2)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vdivnepbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vdivnepbf16-2.c new file mode 100644 index 00000000000..ca9082885e7 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vdivnepbf16-2.c @@ -0,0 +1,49 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif +#include "avx10-helper.h" +#define SIZE_RES (AVX512F_LEN / 
16) + +void +TEST (void) +{ + int i; + UNION_TYPE (AVX512F_LEN, bf16_uw) res1, res2, res3, src1, src2; + MASK_TYPE mask = MASK_VALUE; + unsigned short res_ref[SIZE_RES], res_ref2[SIZE_RES]; + + for (i = 0; i < SIZE_RES; i++) + { + res1.a[i] = 0; + res2.a[i] = DEFAULT_VALUE; + res3.a[i] = DEFAULT_VALUE; + float x = (float) (2 * (i % 7) + 7); + float y = (float) (3 * (i % 7) - 5); + float res; + src2.a[i] = convert_fp32_to_bf16 (y); + src1.a[i] = convert_fp32_to_bf16 (x); + res = x / y; + res_ref[i] = res_ref2[i] = convert_fp32_to_bf16_ne (res); + } + + res1.x = INTRINSIC (_divne_pbh) (src1.x, src2.x); + res2.x = INTRINSIC (_mask_divne_pbh) (res2.x, mask, src1.x, src2.x); + res3.x = INTRINSIC (_maskz_divne_pbh) (mask, src1.x, src2.x); + + if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res1, res_ref)) + abort (); + + MASK_MERGE (bf16_uw) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res2, res_ref2)) + abort (); + + MASK_ZERO (bf16_uw) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res3, res_ref2)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vfmaddXXXnepbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vfmaddXXXnepbf16-2.c new file mode 100644 index 00000000000..b19c9d437fb --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vfmaddXXXnepbf16-2.c @@ -0,0 +1,52 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif +#include "avx10-helper.h" +#define SIZE_RES (AVX512F_LEN / 16) + +void +TEST (void) +{ + int i; + UNION_TYPE (AVX512F_LEN, bf16_uw) res1, res2, src1, src2; + MASK_TYPE mask = MASK_VALUE; + unsigned short res_ref[SIZE_RES], res_ref2[SIZE_RES]; + + for (i = 0; i < SIZE_RES; i++) + { + float x = 0.5; + float y = 2; + float z = 0.25; + src1.a[i] = convert_fp32_to_bf16 (x); + src2.a[i] = convert_fp32_to_bf16 (y); + res1.a[i] = convert_fp32_to_bf16 (z); + res2.a[i] = res1.a[i]; + float x16, y16, z16, m1, m2; + x16 = convert_bf16_to_fp32 (src1.a[i]); + y16 = convert_bf16_to_fp32 (src2.a[i]); + z16 = convert_bf16_to_fp32 (res1.a[i]); + m1 = y16 + x16 * z16; + m2 = z16 + x16 * y16; + res_ref[i] = convert_fp32_to_bf16 (m1); + res_ref2[i] = convert_fp32_to_bf16 (m2); + } + + MASK_MERGE (bf16_uw) (res1.a, mask, SIZE_RES); + MASK_MERGE (bf16_uw) (res2.a, mask, SIZE_RES); + res1.x = INTRINSIC (_mask_fmaddne_pbh) (res1.x, mask, src1.x, src2.x); + res2.x = INTRINSIC (_mask3_fmaddne_pbh) (src1.x, src2.x, res2.x, mask); + + MASK_MERGE (bf16_uw) (res_ref, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res1, res_ref)) + abort (); + + MASK_MERGE (bf16_uw) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res2, res_ref2)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vfmsubXXXnepbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vfmsubXXXnepbf16-2.c new file mode 100644 index 00000000000..86adbc5fba4 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vfmsubXXXnepbf16-2.c @@ -0,0 +1,53 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif +#include "avx10-helper.h" + +#define SIZE_RES (AVX512F_LEN / 16) + +void +TEST (void) +{ + int i; + UNION_TYPE (AVX512F_LEN, bf16_uw) res1, res2, src1, src2; + MASK_TYPE mask = MASK_VALUE; + unsigned short res_ref[SIZE_RES], 
res_ref2[SIZE_RES]; + + for (i = 0; i < SIZE_RES; i++) + { + float x = 0.5; + float y = 2; + float z = 0.25; + src1.a[i] = convert_fp32_to_bf16 (x); + src2.a[i] = convert_fp32_to_bf16 (y); + res1.a[i] = convert_fp32_to_bf16 (z); + res2.a[i] = res1.a[i]; + float x16, y16, z16, m1, m2; + x16 = convert_bf16_to_fp32 (src1.a[i]); + y16 = convert_bf16_to_fp32 (src2.a[i]); + z16 = convert_bf16_to_fp32 (res1.a[i]); + m1 = -y16 + x16 * z16; + m2 = -z16 + x16 * y16; + res_ref[i] = convert_fp32_to_bf16 (m1); + res_ref2[i] = convert_fp32_to_bf16 (m2); + } + + MASK_MERGE (bf16_uw) (res1.a, mask, SIZE_RES); + MASK_MERGE (bf16_uw) (res2.a, mask, SIZE_RES); + res1.x = INTRINSIC (_mask_fmsubne_pbh) (res1.x, mask, src1.x, src2.x); + res2.x = INTRINSIC (_mask3_fmsubne_pbh) (src1.x, src2.x, res2.x, mask); + + MASK_MERGE (bf16_uw) (res_ref, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res1, res_ref)) + abort (); + + MASK_MERGE (bf16_uw) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res2, res_ref2)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vfnmaddXXXnepbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vfnmaddXXXnepbf16-2.c new file mode 100644 index 00000000000..3a7d4cfca48 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vfnmaddXXXnepbf16-2.c @@ -0,0 +1,53 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif +#include "avx10-helper.h" + +#define SIZE_RES (AVX512F_LEN / 16) + +void +TEST (void) +{ + int i; + UNION_TYPE (AVX512F_LEN, bf16_uw) res1, res2, src1, src2; + MASK_TYPE mask = MASK_VALUE; + unsigned short res_ref[SIZE_RES], res_ref2[SIZE_RES]; + + for (i = 0; i < SIZE_RES; i++) + { + float x = 0.5; + float y = 2; + float z = 0.25; + src1.a[i] = convert_fp32_to_bf16 (x); + src2.a[i] = convert_fp32_to_bf16 (y); + res1.a[i] = convert_fp32_to_bf16 (z); + res2.a[i] = res1.a[i]; + float x16, y16, z16, m1, m2; + x16 = convert_bf16_to_fp32 (src1.a[i]); + y16 = convert_bf16_to_fp32 (src2.a[i]); + z16 = convert_bf16_to_fp32 (res1.a[i]); + m1 = y16 - x16 * z16; + m2 = z16 - x16 * y16; + res_ref[i] = convert_fp32_to_bf16 (m1); + res_ref2[i] = convert_fp32_to_bf16 (m2); + } + + MASK_MERGE (bf16_uw) (res1.a, mask, SIZE_RES); + MASK_MERGE (bf16_uw) (res2.a, mask, SIZE_RES); + res1.x = INTRINSIC (_mask_fnmaddne_pbh) (res1.x, mask, src1.x, src2.x); + res2.x = INTRINSIC (_mask3_fnmaddne_pbh) (src1.x, src2.x, res2.x, mask); + + MASK_MERGE (bf16_uw) (res_ref, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res1, res_ref)) + abort (); + + MASK_MERGE (bf16_uw) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res2, res_ref2)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vfnmsubXXXnepbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vfnmsubXXXnepbf16-2.c new file mode 100644 index 00000000000..943146e14f2 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vfnmsubXXXnepbf16-2.c @@ -0,0 +1,53 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif +#include "avx10-helper.h" + +#define SIZE_RES (AVX512F_LEN / 16) + +void +TEST (void) +{ + int i; + UNION_TYPE (AVX512F_LEN, bf16_uw) res1, res2, src1, src2; + MASK_TYPE mask = MASK_VALUE; + unsigned short res_ref[SIZE_RES], 
res_ref2[SIZE_RES]; + + for (i = 0; i < SIZE_RES; i++) + { + float x = 0.5; + float y = 2; + float z = 0.25; + src1.a[i] = convert_fp32_to_bf16 (x); + src2.a[i] = convert_fp32_to_bf16 (y); + res1.a[i] = convert_fp32_to_bf16 (z); + res2.a[i] = res1.a[i]; + float x16, y16, z16, m1, m2; + x16 = convert_bf16_to_fp32 (src1.a[i]); + y16 = convert_bf16_to_fp32 (src2.a[i]); + z16 = convert_bf16_to_fp32 (res1.a[i]); + m1 = -y16 - x16 * z16; + m2 = -z16 - x16 * y16; + res_ref[i] = convert_fp32_to_bf16 (m1); + res_ref2[i] = convert_fp32_to_bf16 (m2); + } + + MASK_MERGE (bf16_uw) (res1.a, mask, SIZE_RES); + MASK_MERGE (bf16_uw) (res2.a, mask, SIZE_RES); + res1.x = INTRINSIC (_mask_fnmsubne_pbh) (res1.x, mask, src1.x, src2.x); + res2.x = INTRINSIC (_mask3_fnmsubne_pbh) (src1.x, src2.x, res2.x, mask); + + MASK_MERGE (bf16_uw) (res_ref, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res1, res_ref)) + abort (); + + MASK_MERGE (bf16_uw) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res2, res_ref2)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vmaxpbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vmaxpbf16-2.c new file mode 100644 index 00000000000..a563b1e933e --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vmaxpbf16-2.c @@ -0,0 +1,51 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif +#include "avx10-helper.h" +#define SIZE_RES (AVX512F_LEN / 16) + +void +TEST (void) +{ + int i; + UNION_TYPE (AVX512F_LEN, bf16_uw) res1, res2, res3, src1, src2; + MASK_TYPE mask = MASK_VALUE; + unsigned short res_ref[SIZE_RES], res_ref2[SIZE_RES]; + + for (i = 0; i < SIZE_RES; i++) + { + res1.a[i] = 0; + res2.a[i] = DEFAULT_VALUE; + res3.a[i] = DEFAULT_VALUE; + float x = 0.5; + float y = 0.25; + float res; + src2.a[i] = convert_fp32_to_bf16 (y); + src1.a[i] = convert_fp32_to_bf16 (x); + if (x > y) + res_ref[i] = res_ref2[i] = src1.a[i]; + else + res_ref[i] = res_ref2[i] = src2.a[i]; + } + + res1.x = INTRINSIC (_max_pbh) (src1.x, src2.x); + res2.x = INTRINSIC (_mask_max_pbh) (res2.x, mask, src1.x, src2.x); + res3.x = INTRINSIC (_maskz_max_pbh) (mask, src1.x, src2.x); + + if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res1, res_ref)) + abort (); + + MASK_MERGE (bf16_uw) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res2, res_ref2)) + abort (); + + MASK_ZERO (bf16_uw) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res3, res_ref2)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vminpbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vminpbf16-2.c new file mode 100644 index 00000000000..10f13d45403 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vminpbf16-2.c @@ -0,0 +1,51 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif +#include "avx10-helper.h" +#define SIZE_RES (AVX512F_LEN / 16) + +void +TEST (void) +{ + int i; + UNION_TYPE (AVX512F_LEN, bf16_uw) res1, res2, res3, src1, src2; + MASK_TYPE mask = MASK_VALUE; + unsigned short res_ref[SIZE_RES], res_ref2[SIZE_RES]; + + for (i = 0; i < SIZE_RES; i++) + { + res1.a[i] = 0; + res2.a[i] = DEFAULT_VALUE; + res3.a[i] = DEFAULT_VALUE; + float x = 0.5; + float y = 0.25; + float res; + src2.a[i] = convert_fp32_to_bf16 (y); + 
src1.a[i] = convert_fp32_to_bf16 (x); + if (x < y) + res_ref[i] = res_ref2[i] = src1.a[i]; + else + res_ref[i] = res_ref2[i] = src2.a[i]; + } + + res1.x = INTRINSIC (_min_pbh) (src1.x, src2.x); + res2.x = INTRINSIC (_mask_min_pbh) (res2.x, mask, src1.x, src2.x); + res3.x = INTRINSIC (_maskz_min_pbh) (mask, src1.x, src2.x); + + if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res1, res_ref)) + abort (); + + MASK_MERGE (bf16_uw) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res2, res_ref2)) + abort (); + + MASK_ZERO (bf16_uw) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res3, res_ref2)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vmulnepbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vmulnepbf16-2.c new file mode 100644 index 00000000000..ce168070a93 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vmulnepbf16-2.c @@ -0,0 +1,49 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif +#include "avx10-helper.h" +#define SIZE_RES (AVX512F_LEN / 16) + +void +TEST (void) +{ + int i; + UNION_TYPE (AVX512F_LEN, bf16_uw) res1, res2, res3, src1, src2; + MASK_TYPE mask = MASK_VALUE; + unsigned short res_ref[SIZE_RES], res_ref2[SIZE_RES]; + + for (i = 0; i < SIZE_RES; i++) + { + res1.a[i] = 0; + res2.a[i] = DEFAULT_VALUE; + res3.a[i] = DEFAULT_VALUE; + float x = (float) (2 * (i % 7) + 7); + float y = (float) (3 * (i % 7) - 5); + float res; + src2.a[i] = convert_fp32_to_bf16 (y); + src1.a[i] = convert_fp32_to_bf16 (x); + res = x * y; + res_ref[i] = res_ref2[i] = convert_fp32_to_bf16_ne (res); + } + + res1.x = INTRINSIC (_mulne_pbh) (src1.x, src2.x); + res2.x = INTRINSIC (_mask_mulne_pbh) (res2.x, mask, src1.x, src2.x); + res3.x = INTRINSIC (_maskz_mulne_pbh) (mask, src1.x, src2.x); + + if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res1, res_ref)) + abort (); + + MASK_MERGE (bf16_uw) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res2, res_ref2)) + abort (); + + MASK_ZERO (bf16_uw) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res3, res_ref2)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vscalefpbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vscalefpbf16-2.c new file mode 100644 index 00000000000..867f77ad3a7 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vscalefpbf16-2.c @@ -0,0 +1,51 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif +#include "avx10-helper.h" +#define SIZE_RES (AVX512F_LEN / 16) + +void +TEST (void) +{ + int i; + UNION_TYPE (AVX512F_LEN, bf16_uw) res1, res2, res3, src1, src2; + MASK_TYPE mask = MASK_VALUE; + unsigned short res_ref[SIZE_RES], res_ref2[SIZE_RES]; + + for (i = 0; i < SIZE_RES; i++) + { + res1.a[i] = 0; + res2.a[i] = DEFAULT_VALUE; + res3.a[i] = DEFAULT_VALUE; + float x = (float) (2 * (i % 7) + 7); + float y = 1.0 + (float) (4 * i) / (float) SIZE_RES; + float xx, yy, res; + src2.a[i] = convert_fp32_to_bf16 (y); + src1.a[i] = convert_fp32_to_bf16 (x); + xx = convert_bf16_to_fp32 (src1.a[i]); + yy = convert_bf16_to_fp32 (src2.a[i]); + res = scalef (xx, yy); + res_ref[i] = res_ref2[i] = convert_fp32_to_bf16_ne(res); + } + + res1.x = INTRINSIC (_scalef_pbh) (src1.x, src2.x); + res2.x = INTRINSIC 
(_mask_scalef_pbh) (res2.x, mask, src1.x, src2.x); + res3.x = INTRINSIC (_maskz_scalef_pbh) (mask, src1.x, src2.x); + + if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res1, res_ref)) + abort (); + + MASK_MERGE (bf16_uw) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res2, res_ref2)) + abort (); + + MASK_ZERO (bf16_uw) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res3, res_ref2)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vsubnepbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vsubnepbf16-2.c new file mode 100644 index 00000000000..f8a9a51cd37 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vsubnepbf16-2.c @@ -0,0 +1,49 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif +#include "avx10-helper.h" +#define SIZE_RES (AVX512F_LEN / 16) + +void +TEST (void) +{ + int i; + UNION_TYPE (AVX512F_LEN, bf16_uw) res1, res2, res3, src1, src2; + MASK_TYPE mask = MASK_VALUE; + unsigned short res_ref[SIZE_RES], res_ref2[SIZE_RES]; + + for (i = 0; i < SIZE_RES; i++) + { + res1.a[i] = 0; + res2.a[i] = DEFAULT_VALUE; + res3.a[i] = DEFAULT_VALUE; + float x = (float) (2 * (i % 7) + 7); + float y = (float) (3 * (i % 7) - 5); + float res; + src2.a[i] = convert_fp32_to_bf16 (y); + src1.a[i] = convert_fp32_to_bf16 (x); + res = x - y; + res_ref[i] = res_ref2[i] = convert_fp32_to_bf16_ne (res); + } + + res1.x = INTRINSIC (_subne_pbh) (src1.x, src2.x); + res2.x = INTRINSIC (_mask_subne_pbh) (res2.x, mask, src1.x, src2.x); + res3.x = INTRINSIC (_maskz_subne_pbh) (mask, src1.x, src2.x); + + if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res1, res_ref)) + abort (); + + MASK_MERGE (bf16_uw) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res2, res_ref2)) + abort (); + + MASK_ZERO (bf16_uw) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res3, res_ref2)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-bf16-1.c b/gcc/testsuite/gcc.target/i386/avx10_2-bf16-1.c new file mode 100644 index 00000000000..831c8f849ef --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-bf16-1.c @@ -0,0 +1,172 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx10.2 -O2" } */ +/* { dg-final { scan-assembler-times "vaddnepbf16\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vaddnepbf16\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vaddnepbf16\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vaddnepbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vaddnepbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vaddnepbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vsubnepbf16\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vsubnepbf16\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final 
{ scan-assembler-times "vsubnepbf16\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vsubnepbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vsubnepbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vsubnepbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vmulnepbf16\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vmulnepbf16\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vmulnepbf16\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vmulnepbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vmulnepbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vmulnepbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vdivnepbf16\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vdivnepbf16\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vdivnepbf16\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vdivnepbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vdivnepbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vdivnepbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vmaxpbf16\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vmaxpbf16\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vmaxpbf16\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vmaxpbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vmaxpbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vmaxpbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminpbf16\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminpbf16\[ 
\\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminpbf16\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminpbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminpbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminpbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vscalefpbf16\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vscalefpbf16\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vscalefpbf16\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vscalefpbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vscalefpbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vscalefpbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmadd132nepbf16\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmadd132nepbf16\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmadd231nepbf16\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmadd132nepbf16\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmadd132nepbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmadd132nepbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmadd231nepbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmadd132nepbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmsub132nepbf16\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmsub132nepbf16\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmsub231nepbf16\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmsub132nepbf16\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times 
"vfmsub132nepbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmsub132nepbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmsub231nepbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfmsub132nepbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfnmadd132nepbf16\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfnmadd132nepbf16\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfnmadd231nepbf16\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfnmadd132nepbf16\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfnmadd132nepbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfnmadd132nepbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfnmadd231nepbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfnmadd132nepbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfnmsub132nepbf16\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfnmsub132nepbf16\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfnmsub231nepbf16\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfnmsub132nepbf16\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfnmsub132nepbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfnmsub132nepbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfnmsub231nepbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfnmsub132nepbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m256bh res, x1, x2; +volatile __m128bh res1, x3, x4; +volatile __mmask16 m16; +volatile __mmask8 m8; + +void extern +avx10_2_test (void) +{ + res = _mm256_addne_pbh (x1, x2); + res = _mm256_mask_addne_pbh (res, m16, x1, x2); + res = _mm256_maskz_addne_pbh (m16, x1, x2); + res1 = _mm_addne_pbh (x3, x4); + res1 = _mm_mask_addne_pbh (res1, m8, x3, x4); + res1 = _mm_maskz_addne_pbh (m8, x3, x4); + + res = 
_mm256_subne_pbh (x1, x2); + res = _mm256_mask_subne_pbh (res, m16, x1, x2); + res = _mm256_maskz_subne_pbh (m16, x1, x2); + res1 = _mm_subne_pbh (x3, x4); + res1 = _mm_mask_subne_pbh (res1, m8, x3, x4); + res1 = _mm_maskz_subne_pbh (m8, x3, x4); + + res = _mm256_mulne_pbh (x1, x2); + res = _mm256_mask_mulne_pbh (res, m16, x1, x2); + res = _mm256_maskz_mulne_pbh (m16, x1, x2); + res1 = _mm_mulne_pbh (x3, x4); + res1 = _mm_mask_mulne_pbh (res1, m8, x3, x4); + res1 = _mm_maskz_mulne_pbh (m8, x3, x4); + + res = _mm256_divne_pbh (x1, x2); + res = _mm256_mask_divne_pbh (res, m16, x1, x2); + res = _mm256_maskz_divne_pbh (m16, x1, x2); + res1 = _mm_divne_pbh (x3, x4); + res1 = _mm_mask_divne_pbh (res1, m8, x3, x4); + res1 = _mm_maskz_divne_pbh (m8, x3, x4); + + res = _mm256_max_pbh (x1, x2); + res = _mm256_mask_max_pbh (res, m16, x1, x2); + res = _mm256_maskz_max_pbh (m16, x1, x2); + res1 = _mm_max_pbh (x3, x4); + res1 = _mm_mask_max_pbh (res1, m8, x3, x4); + res1 = _mm_maskz_max_pbh (m8, x3, x4); + + res = _mm256_min_pbh (x1, x2); + res = _mm256_mask_min_pbh (res, m16, x1, x2); + res = _mm256_maskz_min_pbh (m16, x1, x2); + res1 = _mm_min_pbh (x3, x4); + res1 = _mm_mask_min_pbh (res1, m8, x3, x4); + res1 = _mm_maskz_min_pbh (m8, x3, x4); + + res = _mm256_scalef_pbh (x1, x2); + res = _mm256_mask_scalef_pbh (res, m16, x1, x2); + res = _mm256_maskz_scalef_pbh (m16, x1, x2); + res1 = _mm_scalef_pbh (x3, x4); + res1 = _mm_mask_scalef_pbh (res1, m8, x3, x4); + res1 = _mm_maskz_scalef_pbh (m8, x3, x4); + + res = _mm256_fmaddne_pbh (res, x1, x2); + res = _mm256_mask_fmaddne_pbh (res, m16, x1, x2); + res = _mm256_mask3_fmaddne_pbh (res, x1, x2, m16); + res = _mm256_maskz_fmaddne_pbh (m16,res, x1, x2); + res1 = _mm_fmaddne_pbh (res1, x3, x4); + res1 = _mm_mask_fmaddne_pbh (res1, m8, x3, x4); + res1 = _mm_mask3_fmaddne_pbh (res1, x3, x4, m8); + res1 = _mm_maskz_fmaddne_pbh (m8,res1, x3, x4); + + res = _mm256_fmsubne_pbh (res, x1, x2); + res = _mm256_mask_fmsubne_pbh (res, m16, x1, x2); + res = _mm256_mask3_fmsubne_pbh (res, x1, x2, m16); + res = _mm256_maskz_fmsubne_pbh (m16,res, x1, x2); + res1 = _mm_fmsubne_pbh (res1, x3, x4); + res1 = _mm_mask_fmsubne_pbh (res1, m8, x3, x4); + res1 = _mm_mask3_fmsubne_pbh (res1, x3, x4, m8); + res1 = _mm_maskz_fmsubne_pbh (m8,res1, x3, x4); + + res = _mm256_fnmaddne_pbh (res, x1, x2); + res = _mm256_mask_fnmaddne_pbh (res, m16, x1, x2); + res = _mm256_mask3_fnmaddne_pbh (res, x1, x2, m16); + res = _mm256_maskz_fnmaddne_pbh (m16,res, x1, x2); + res1 = _mm_fnmaddne_pbh (res1, x3, x4); + res1 = _mm_mask_fnmaddne_pbh (res1, m8, x3, x4); + res1 = _mm_mask3_fnmaddne_pbh (res1, x3, x4, m8); + res1 = _mm_maskz_fnmaddne_pbh (m8,res1, x3, x4); + + res = _mm256_fnmsubne_pbh (res, x1, x2); + res = _mm256_mask_fnmsubne_pbh (res, m16, x1, x2); + res = _mm256_mask3_fnmsubne_pbh (res, x1, x2, m16); + res = _mm256_maskz_fnmsubne_pbh (m16,res, x1, x2); + res1 = _mm_fnmsubne_pbh (res1, x3, x4); + res1 = _mm_mask_fnmsubne_pbh (res1, m8, x3, x4); + res1 = _mm_mask3_fnmsubne_pbh (res1, x3, x4, m8); + res1 = _mm_maskz_fnmsubne_pbh (m8,res1, x3, x4); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vaddnepbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vaddnepbf16-2.c new file mode 100644 index 00000000000..7783dcee820 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vaddnepbf16-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define 
AVX512F_LEN_HALF 128 +#include "avx10_2-512-vaddnepbf16-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vaddnepbf16-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vdivnepbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vdivnepbf16-2.c new file mode 100644 index 00000000000..dd2c5442c47 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vdivnepbf16-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vdivnepbf16-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vdivnepbf16-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vfmaddXXXnepbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vfmaddXXXnepbf16-2.c new file mode 100644 index 00000000000..a4f2e5f791c --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vfmaddXXXnepbf16-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vfmaddXXXnepbf16-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vfmaddXXXnepbf16-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vfmsubXXXnepbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vfmsubXXXnepbf16-2.c new file mode 100644 index 00000000000..406c1739e00 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vfmsubXXXnepbf16-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vfmsubXXXnepbf16-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vfmsubXXXnepbf16-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vfnmaddXXXnepbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vfnmaddXXXnepbf16-2.c new file mode 100644 index 00000000000..3f53099bc4b --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vfnmaddXXXnepbf16-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vfnmaddXXXnepbf16-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vfnmaddXXXnepbf16-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vfnmsubXXXnepbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vfnmsubXXXnepbf16-2.c new file mode 100644 index 00000000000..fc906ccad3c --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vfnmsubXXXnepbf16-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vfnmsubXXXnepbf16-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vfnmsubXXXnepbf16-2.c" 
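As a side note for reviewers: the expected values in the run tests above are computed with the round-to-nearest-even helper added to avx10-helper.h earlier in this patch. The following standalone program is not part of the patch; the driver, its function name, and the two sample constants are illustrative only. It walks through the rounding-bias trick that convert_fp32_to_bf16_ne () relies on:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Truncate an fp32 bit pattern to bf16 after adding a bias that
   implements round-to-nearest, ties-to-even.  */
static uint16_t
fp32_to_bf16_ne (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof (bits));	/* Reinterpret without aliasing UB.  */
  uint32_t lsb = (bits >> 16) & 1;	/* LSB of the kept bf16 mantissa.  */
  bits += 0x7fff + lsb;			/* Half-way cases round to even.  */
  return bits >> 16;
}

int
main (void)
{
  /* 1.00390625f is 0x3f808000: the dropped bits are exactly half, and the
     kept LSB is 0, so the tie resolves to the even value 0x3f80 (1.0).
     1.01171875f is 0x3f818000: also an exact tie, but the kept LSB is 1,
     so RNE rounds up to 0x3f82 where plain truncation would give 0x3f81.  */
  printf ("%04x\n", fp32_to_bf16_ne (1.00390625f));	/* prints 3f80 */
  printf ("%04x\n", fp32_to_bf16_ne (1.01171875f));	/* prints 3f82 */
  return 0;
}

That 0x7fff-plus-keep-bit bias is what makes exact halves land on the even bf16 mantissa, matching the "ne" suffix in the new instruction names.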
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vmaxpbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vmaxpbf16-2.c new file mode 100644 index 00000000000..2b8f820822b --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vmaxpbf16-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vmaxpbf16-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vmaxpbf16-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vminpbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vminpbf16-2.c new file mode 100644 index 00000000000..dcb7c0e4a7e --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vminpbf16-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vminpbf16-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vminpbf16-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vmulnepbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vmulnepbf16-2.c new file mode 100644 index 00000000000..753e2d100d7 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vmulnepbf16-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vmulnepbf16-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vmulnepbf16-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vscalefpbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vscalefpbf16-2.c new file mode 100644 index 00000000000..8f26dfbc9bd --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vscalefpbf16-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vscalefpbf16-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vscalefpbf16-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vsubnepbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vsubnepbf16-2.c new file mode 100644 index 00000000000..ad02ee19de2 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vsubnepbf16-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vsubnepbf16-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vsubnepbf16-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx512f-helper.h b/gcc/testsuite/gcc.target/i386/avx512f-helper.h index 3cd6751af26..b61c03b4781 100644 --- a/gcc/testsuite/gcc.target/i386/avx512f-helper.h +++ b/gcc/testsuite/gcc.target/i386/avx512f-helper.h @@ -45,6 +45,7 @@ MAKE_MASK_MERGE(, float) MAKE_MASK_MERGE(d, double) 
MAKE_MASK_MERGE(i_ub, unsigned char) MAKE_MASK_MERGE(i_uw, unsigned short) +MAKE_MASK_MERGE(bf16_uw, unsigned short) MAKE_MASK_MERGE(i_ud, unsigned int) MAKE_MASK_MERGE(i_uq, unsigned long long)
@@ -70,6 +71,7 @@ MAKE_MASK_ZERO(, float) MAKE_MASK_ZERO(d, double) MAKE_MASK_ZERO(i_ub, unsigned char) MAKE_MASK_ZERO(i_uw, unsigned short) +MAKE_MASK_ZERO(bf16_uw, unsigned short) MAKE_MASK_ZERO(i_ud, unsigned int) MAKE_MASK_ZERO(i_uq, unsigned long long)
diff --git a/gcc/testsuite/gcc.target/i386/m512-check.h b/gcc/testsuite/gcc.target/i386/m512-check.h index d5d18372947..bdc682d63bb 100644 --- a/gcc/testsuite/gcc.target/i386/m512-check.h +++ b/gcc/testsuite/gcc.target/i386/m512-check.h
@@ -67,6 +67,12 @@ typedef union _Float16 a[32]; } union512h; +typedef union +{ + __m512bh x; + unsigned short a[32]; +} union512bf16_uw; + typedef union { __m128h x;
@@ -79,6 +85,18 @@ typedef union _Float16 a[16]; } union256h; +typedef union +{ + __m128bh x; + unsigned short a[8]; +} union128bf16_uw; + +typedef union +{ + __m256bh x; + unsigned short a[16]; +} union256bf16_uw; + #define CHECK_ROUGH_EXP(UNION_TYPE, VALUE_TYPE, FMT) \ static int \ __attribute__((noinline, unused)) \
@@ -155,3 +173,12 @@ CHECK_FP_EXP (union256h, _Float16, ESP_FLOAT16, "%f") CHECK_ROUGH_EXP (union128h, _Float16, "%f") CHECK_ROUGH_EXP (union256h, _Float16, "%f") #endif +
+#if defined(AVX512BF16) +CHECK_EXP (union512bf16_uw, unsigned short, "%d") +#endif +
+#if defined(AVX512BF16) +CHECK_EXP (union128bf16_uw, unsigned short, "%d") +CHECK_EXP (union256bf16_uw, unsigned short, "%d") +#endif
From patchwork Mon Aug 19 08:56:50 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Haochen Jiang X-Patchwork-Id: 1973733
From: Haochen Jiang <haochen.jiang@intel.com>
To: gcc-patches@gcc.gnu.org
Cc: hongtao.liu@intel.com, zewei.mo@pitt.edu, ubizjak@gmail.com,
 konglin1, Levy Hsu
Subject: [PATCH 06/12] [PATCH 2/2] AVX10.2: Support BF16 instructions
Date: Mon, 19 Aug 2024 01:56:50 -0700
Message-ID: <20240819085717.193256-7-haochen.jiang@intel.com>
In-Reply-To: <20240819085717.193256-1-haochen.jiang@intel.com>
References: <20240819085717.193256-1-haochen.jiang@intel.com>

From: konglin1

gcc/ChangeLog:

	* config/i386/avx10_2-512bf16intrin.h: Add new intrinsics.
	* config/i386/avx10_2bf16intrin.h: Ditto.
	* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE
	for the new types.
	* config/i386/i386-builtin.def (BDESC): Add new builtins.
	* config/i386/i386-expand.cc (ix86_expand_args_builtin): Handle
	new types.
	* config/i386/sse.md (avx10_2_rsqrtpbf16_<mode>): New define_insn.
	(avx10_2_sqrtnepbf16_<mode>): Ditto.
	(avx10_2_rcppbf16_<mode>): Ditto.
	(avx10_2_getexppbf16_<mode>): Ditto.
	(BF16IMMOP): New iterator.
	(bf16immop): Ditto.
	(avx10_2_<bf16immop>pbf16_<mode>): New define_insn.
	(avx10_2_fpclasspbf16_<mode>): Ditto.
	(avx10_2_cmppbf16_<mode>): Ditto.
	(avx10_2_comsbf16_v8bf): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx10-helper.h: Add helper functions.
	* gcc.target/i386/avx10_2-512-bf16-1.c: Add new tests.
	* gcc.target/i386/avx10_2-bf16-1.c: Ditto.
	* gcc.target/i386/avx-1.c: Add macros.
	* gcc.target/i386/sse-13.c: Ditto.
	* gcc.target/i386/sse-14.c: Ditto.
	* gcc.target/i386/sse-22.c: Ditto.
	* gcc.target/i386/sse-23.c: Ditto.
	* gcc.target/i386/avx10_2-512-vcmppbf16-2.c: New test.
	* gcc.target/i386/avx10_2-512-vfpclasspbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vgetexppbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vgetmantpbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vrcppbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vreducenepbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vrndscalenepbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vrsqrtpbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vsqrtnepbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcmppbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcomsbf16-1.c: Ditto.
	* gcc.target/i386/avx10_2-vcomsbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-vfpclasspbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-vgetexppbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-vgetmantpbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-vrcppbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-vreducenepbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-vrndscalenepbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-vrsqrtpbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-vsqrtnepbf16-2.c: Ditto.
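Before the diff itself, a minimal usage sketch of a few of the intrinsics this
patch introduces (illustration only, not part of the patch; it assumes a
compiler that accepts -mavx10.2-512 and the headers added below):

#include <immintrin.h>

__m512bh
bf16_demo (__m512bh a, __m512bh b, __mmask32 *pmask)
{
  /* Unary operations: approximate reciprocal square root, then a
     square root with NE (round-to-nearest-even) behavior.  */
  __m512bh r = _mm512_rsqrt_pbh (a);
  r = _mm512_sqrtne_pbh (r);

  /* Immediate-taking operation; 0x40 presumably follows the usual
     vrndscale imm8 convention (imm8[7:4] = 4 fraction bits,
     imm8[1:0] = 0 = round-to-nearest-even).  */
  r = _mm512_roundscalene_pbh (r, 0x40);

  /* Vector compare into a 32-bit mask; predicate 1 is _CMP_LT_OS in
     the standard encoding.  */
  *pmask = _mm512_cmp_pbh_mask (a, b, 1);

  /* The mask/maskz forms follow the usual AVX-512 conventions.  */
  return _mm512_mask_rcp_pbh (r, *pmask, b);
}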
Co-authored-by: Levy Hsu --- gcc/config/i386/avx10_2-512bf16intrin.h | 317 +++++++++ gcc/config/i386/avx10_2bf16intrin.h | 650 ++++++++++++++++++ gcc/config/i386/i386-builtin-types.def | 10 + gcc/config/i386/i386-builtin.def | 33 + gcc/config/i386/i386-expand.cc | 16 + gcc/config/i386/sse.md | 92 +++ gcc/testsuite/gcc.target/i386/avx-1.c | 19 + gcc/testsuite/gcc.target/i386/avx10-check.h | 4 +- gcc/testsuite/gcc.target/i386/avx10-helper.h | 28 + .../gcc.target/i386/avx10_2-512-bf16-1.c | 58 ++ .../gcc.target/i386/avx10_2-512-vcmppbf16-2.c | 36 + .../i386/avx10_2-512-vfpclasspbf16-2.c | 44 ++ .../i386/avx10_2-512-vgetexppbf16-2.c | 47 ++ .../i386/avx10_2-512-vgetmantpbf16-2.c | 50 ++ .../gcc.target/i386/avx10_2-512-vrcppbf16-2.c | 45 ++ .../i386/avx10_2-512-vreducenepbf16-2.c | 50 ++ .../i386/avx10_2-512-vrndscalenepbf16-2.c | 46 ++ .../i386/avx10_2-512-vrsqrtpbf16-2.c | 47 ++ .../i386/avx10_2-512-vscalefpbf16-2.c | 2 +- .../i386/avx10_2-512-vsqrtnepbf16-2.c | 47 ++ .../gcc.target/i386/avx10_2-bf16-1.c | 114 +++ .../gcc.target/i386/avx10_2-vcmppbf16-2.c | 16 + .../gcc.target/i386/avx10_2-vcomsbf16-1.c | 19 + .../gcc.target/i386/avx10_2-vcomsbf16-2.c | 58 ++ .../gcc.target/i386/avx10_2-vfpclasspbf16-2.c | 16 + .../gcc.target/i386/avx10_2-vgetexppbf16-2.c | 16 + .../gcc.target/i386/avx10_2-vgetmantpbf16-2.c | 16 + .../gcc.target/i386/avx10_2-vrcppbf16-2.c | 16 + .../i386/avx10_2-vreducenepbf16-2.c | 16 + .../i386/avx10_2-vrndscalenepbf16-2.c | 16 + .../gcc.target/i386/avx10_2-vrsqrtpbf16-2.c | 16 + .../gcc.target/i386/avx10_2-vsqrtnepbf16-2.c | 16 + gcc/testsuite/gcc.target/i386/sse-13.c | 19 + gcc/testsuite/gcc.target/i386/sse-14.c | 43 ++ gcc/testsuite/gcc.target/i386/sse-22.c | 43 ++ gcc/testsuite/gcc.target/i386/sse-23.c | 19 + 36 files changed, 2097 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vcmppbf16-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vfpclasspbf16-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vgetexppbf16-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vgetmantpbf16-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vrcppbf16-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vreducenepbf16-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vrndscalenepbf16-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vrsqrtpbf16-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vsqrtnepbf16-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcmppbf16-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcomsbf16-1.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcomsbf16-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vfpclasspbf16-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vgetexppbf16-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vgetmantpbf16-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vrcppbf16-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vreducenepbf16-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vrndscalenepbf16-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vrsqrtpbf16-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vsqrtnepbf16-2.c diff --git a/gcc/config/i386/avx10_2-512bf16intrin.h b/gcc/config/i386/avx10_2-512bf16intrin.h index b409ea17adb..4e7f8eba146 100644 --- a/gcc/config/i386/avx10_2-512bf16intrin.h +++ b/gcc/config/i386/avx10_2-512bf16intrin.h @@ 
-356,6 +356,323 @@ _mm512_maskz_fnmsubne_pbh (__mmask32 __U, __m512bh __A, __builtin_ia32_fnmsubnepbf16512_maskz (__A, __B, __C, __U); } +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_rsqrt_pbh (__m512bh __A) +{ + return (__m512bh) + __builtin_ia32_rsqrtpbf16512_mask (__A, + (__v32bf) _mm512_setzero_si512 (), + (__mmask32) -1); + +} + +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_rsqrt_pbh (__m512bh __W, __mmask32 __U, __m512bh __A) +{ + return (__m512bh) + __builtin_ia32_rsqrtpbf16512_mask (__A, __W, __U); +} + +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_rsqrt_pbh (__mmask32 __U, __m512bh __A) +{ + return (__m512bh) + __builtin_ia32_rsqrtpbf16512_mask (__A, + (__v32bf) _mm512_setzero_si512 (), + __U); +} + +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_sqrtne_pbh (__m512bh __A) +{ + return (__m512bh) + __builtin_ia32_sqrtnepbf16512_mask (__A, + (__v32bf) _mm512_setzero_si512 (), + (__mmask32) -1); +} + +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_sqrtne_pbh (__m512bh __W, __mmask32 __U, __m512bh __A) +{ + return (__m512bh) + __builtin_ia32_sqrtnepbf16512_mask (__A, __W, __U); +} + +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_sqrtne_pbh (__mmask32 __U, __m512bh __A) +{ + return (__m512bh) + __builtin_ia32_sqrtnepbf16512_mask (__A, + (__v32bf) _mm512_setzero_si512 (), + __U); +} + +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_rcp_pbh (__m512bh __A) +{ + return (__m512bh) + __builtin_ia32_rcppbf16512_mask (__A, + (__v32bf) _mm512_setzero_si512 (), + (__mmask32) -1); +} + +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_rcp_pbh (__m512bh __W, __mmask32 __U, __m512bh __A) +{ + return (__m512bh) + __builtin_ia32_rcppbf16512_mask (__A, __W, __U); +} + +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_rcp_pbh (__mmask32 __U, __m512bh __A) +{ + return (__m512bh) + __builtin_ia32_rcppbf16512_mask (__A, + (__v32bf) _mm512_setzero_si512 (), + __U); +} + +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_getexp_pbh (__m512bh __A) +{ + return (__m512bh) + __builtin_ia32_getexppbf16512_mask (__A, + (__v32bf) _mm512_setzero_si512 (), + (__mmask32) -1); +} + +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_getexp_pbh (__m512bh __W, __mmask32 __U, __m512bh __A) +{ + return (__m512bh) __builtin_ia32_getexppbf16512_mask (__A, __W, __U); +} + +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_getexp_pbh (__mmask32 __U, __m512bh __A) +{ + return (__m512bh) + __builtin_ia32_getexppbf16512_mask (__A, + (__v32bf) _mm512_setzero_si512 (), + __U); +} + +/* Intrinsics vrndscalepbf16. 
*/ +#ifdef __OPTIMIZE__ +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_roundscalene_pbh (__m512bh __A, int B) +{ + return (__m512bh) + __builtin_ia32_rndscalenepbf16512_mask (__A, B, + (__v32bf) _mm512_setzero_si512 (), + (__mmask32) -1); +} + +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_roundscalene_pbh (__m512bh __W, __mmask32 __U, __m512bh __A, int B) +{ + return (__m512bh) + __builtin_ia32_rndscalenepbf16512_mask (__A, B, __W, __U); +} + +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_roundscalene_pbh (__mmask32 __U, __m512bh __A, int B) +{ + return (__m512bh) + __builtin_ia32_rndscalenepbf16512_mask (__A, B, + (__v32bf) _mm512_setzero_si512 (), + __U); +} + +#else +#define _mm512_roundscalene_pbh(A, B) \ + (__builtin_ia32_rndscalenepbf16512_mask ((A), (B), \ + (__v32bf) _mm512_setzero_si512 (), \ + (__mmask32) -1)) + +#define _mm512_mask_roundscalene_pbh(A, B, C, D) \ + (__builtin_ia32_rndscalenepbf16512_mask ((C), (D), (A), (B))) + +#define _mm512_maskz_roundscalene_pbh(A, B, C) \ + (__builtin_ia32_rndscalenepbf16512_mask ((B), (C), \ + (__v32bf) _mm512_setzero_si512 (), \ + (A))) + +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vreducepbf16. */ +#ifdef __OPTIMIZE__ +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_reducene_pbh (__m512bh __A, int B) +{ + return (__m512bh) + __builtin_ia32_reducenepbf16512_mask (__A, B, + (__v32bf) _mm512_setzero_si512 (), + (__mmask32) -1); +} + +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_reducene_pbh (__m512bh __W, __mmask32 __U, + __m512bh __A, int B) +{ + return (__m512bh) + __builtin_ia32_reducenepbf16512_mask (__A, B, __W, __U); +} + +extern __inline__ __m512bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_reducene_pbh (__mmask32 __U, __m512bh __A, int B) +{ + return (__m512bh) + __builtin_ia32_reducenepbf16512_mask (__A, B, + (__v32bf) _mm512_setzero_si512 (), + __U); +} + +#else +#define _mm512_reducene_pbh(A, B) \ + (__builtin_ia32_reducenepbf16512_mask ((A), (B), \ + (__v32bf) _mm512_setzero_si512 (), \ + (__mmask32) -1)) + +#define _mm512_mask_reducene_pbh(A, B, C, D) \ + (__builtin_ia32_reducenepbf16512_mask ((C), (D), (A), (B))) + +#define _mm512_maskz_reducene_pbh(A, B, C) \ + (__builtin_ia32_reducenepbf16512_mask ((B), (C), \ + (__v32bf) _mm512_setzero_si512 (), \ + (A))) + +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vgetmantpbf16. 
*/
+#ifdef __OPTIMIZE__
+extern __inline__ __m512bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_getmant_pbh (__m512bh __A, _MM_MANTISSA_NORM_ENUM __B,
+		    _MM_MANTISSA_SIGN_ENUM __C)
+{
+  return (__m512bh)
+    __builtin_ia32_getmantpbf16512_mask (__A, (int) (__C << 2) | __B,
+					 (__v32bf) _mm512_setzero_si512 (),
+					 (__mmask32) -1);
+}
+
+extern __inline__ __m512bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_getmant_pbh (__m512bh __W, __mmask32 __U, __m512bh __A,
+			 _MM_MANTISSA_NORM_ENUM __B,
+			 _MM_MANTISSA_SIGN_ENUM __C)
+{
+  return (__m512bh)
+    __builtin_ia32_getmantpbf16512_mask (__A, (int) (__C << 2) | __B,
+					 __W, __U);
+}
+
+extern __inline__ __m512bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_getmant_pbh (__mmask32 __U, __m512bh __A,
+			  _MM_MANTISSA_NORM_ENUM __B,
+			  _MM_MANTISSA_SIGN_ENUM __C)
+{
+  return (__m512bh)
+    __builtin_ia32_getmantpbf16512_mask (__A, (int) (__C << 2) | __B,
+					 (__v32bf) _mm512_setzero_si512 (),
+					 __U);
+}
+
+#else
+#define _mm512_getmant_pbh(A, B, C) \
+  (__builtin_ia32_getmantpbf16512_mask ((A), (int)(((C)<<2) | (B)), \
+					(__v32bf) _mm512_setzero_si512 (), \
+					(__mmask32) -1))
+
+#define _mm512_mask_getmant_pbh(A, B, C, D, E) \
+  (__builtin_ia32_getmantpbf16512_mask ((C), (int)(((E)<<2) | (D)), (A), (B)))
+
+#define _mm512_maskz_getmant_pbh(A, B, C, D) \
+  (__builtin_ia32_getmantpbf16512_mask ((B), (int)(((D)<<2) | (C)), \
+					(__v32bf) _mm512_setzero_si512 (), \
+					(A)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vfpclasspbf16.  */
+#ifdef __OPTIMIZE__
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_fpclass_pbh_mask (__mmask32 __U, __m512bh __A,
+			      const int __imm)
+{
+  return (__mmask32)
+    __builtin_ia32_fpclasspbf16512_mask (__A, __imm, __U);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_fpclass_pbh_mask (__m512bh __A, const int __imm)
+{
+  return (__mmask32)
+    __builtin_ia32_fpclasspbf16512_mask (__A, __imm,
+					 (__mmask32) -1);
+}
+
+#else
+#define _mm512_mask_fpclass_pbh_mask(U, X, C) \
+  ((__mmask32) __builtin_ia32_fpclasspbf16512_mask ( \
+    (__v32bf) (__m512bh) (X), (int) (C), (__mmask32) (U)))
+
+#define _mm512_fpclass_pbh_mask(X, C) \
+  ((__mmask32) __builtin_ia32_fpclasspbf16512_mask ( \
+    (__v32bf) (__m512bh) (X), (int) (C), (__mmask32) (-1)))
+#endif /* __OPTIMIZE__ */
+
+
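The getmant intrinsics above fold both enum arguments into the single imm8 the
instruction takes: the _MM_MANTISSA_SIGN_ENUM value lands in imm8[3:2] and the
_MM_MANTISSA_NORM_ENUM value in imm8[1:0], hence (__C << 2) | __B. A short
sketch (illustration only, not part of the patch; it assumes the standard
avx512fintrin.h enum values):

#include <immintrin.h>

__m512bh
getmant_demo (__m512bh a)
{
  /* _MM_MANT_NORM_p75_1p5 == 3 and _MM_MANT_SIGN_src == 0, so the
     builtin receives (0 << 2) | 3 == 3 as its immediate.  */
  return _mm512_getmant_pbh (a, _MM_MANT_NORM_p75_1p5, _MM_MANT_SIGN_src);
}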
+/* Intrinsics vcmppbf16.  */
+#ifdef __OPTIMIZE__
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cmp_pbh_mask (__mmask32 __U, __m512bh __A, __m512bh __B,
+			  const int __imm)
+{
+  return (__mmask32)
+    __builtin_ia32_cmppbf16512_mask (__A, __B, __imm, __U);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cmp_pbh_mask (__m512bh __A, __m512bh __B, const int __imm)
+{
+  return (__mmask32)
+    __builtin_ia32_cmppbf16512_mask (__A, __B, __imm,
+				     (__mmask32) -1);
+}
+
+#else
+#define _mm512_mask_cmp_pbh_mask(A, B, C, D) \
+  ((__mmask32) __builtin_ia32_cmppbf16512_mask ((B), (C), (D), (A)))
+
+#define _mm512_cmp_pbh_mask(A, B, C) \
+  ((__mmask32) __builtin_ia32_cmppbf16512_mask ((A), (B), (C), (-1)))
+
+#endif /* __OPTIMIZE__ */
+
 #ifdef __DISABLE_AVX10_2_512__
 #undef __DISABLE_AVX10_2_512__
 #pragma GCC pop_options
diff --git a/gcc/config/i386/avx10_2bf16intrin.h b/gcc/config/i386/avx10_2bf16intrin.h
index e16f1b66481..f36fb8ee8b3 100644
--- a/gcc/config/i386/avx10_2bf16intrin.h
+++ b/gcc/config/i386/avx10_2bf16intrin.h
@@ -677,6 +677,656 @@ _mm_maskz_fnmsubne_pbh (__mmask8 __U, __m128bh __A,
   __builtin_ia32_fnmsubnepbf16128_maskz (__A, __B, __C, __U);
 }
 
+extern __inline__ __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_rsqrt_pbh (__m256bh __A)
+{
+  return (__m256bh)
+    __builtin_ia32_rsqrtpbf16256_mask (__A,
+				       (__v16bf) _mm256_setzero_si256 (),
+				       (__mmask16) -1);
+}
+
+extern __inline__ __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_rsqrt_pbh (__m256bh __W, __mmask16 __U, __m256bh __A)
+{
+  return (__m256bh)
+    __builtin_ia32_rsqrtpbf16256_mask (__A, __W, __U);
+}
+
+extern __inline__ __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_rsqrt_pbh (__mmask16 __U, __m256bh __A)
+{
+  return (__m256bh)
+    __builtin_ia32_rsqrtpbf16256_mask (__A,
+				       (__v16bf) _mm256_setzero_si256 (),
+				       __U);
+}
+
+extern __inline__ __m128bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_rsqrt_pbh (__m128bh __A)
+{
+  return (__m128bh)
+    __builtin_ia32_rsqrtpbf16128_mask (__A,
+				       (__v8bf) _mm_setzero_si128 (),
+				       (__mmask8) -1);
+}
+
+extern __inline__ __m128bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_rsqrt_pbh (__m128bh __W, __mmask8 __U, __m128bh __A)
+{
+  return (__m128bh)
+    __builtin_ia32_rsqrtpbf16128_mask (__A, __W, __U);
+}
+
+extern __inline__ __m128bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_rsqrt_pbh (__mmask8 __U, __m128bh __A)
+{
+  return (__m128bh)
+    __builtin_ia32_rsqrtpbf16128_mask (__A,
+				       (__v8bf) _mm_setzero_si128 (),
+				       __U);
+}
+
+extern __inline__ __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_sqrtne_pbh (__m256bh __A)
+{
+  return (__m256bh)
+    __builtin_ia32_sqrtnepbf16256_mask (__A,
+					(__v16bf) _mm256_setzero_si256 (),
+					(__mmask16) -1);
+}
+
+extern __inline__ __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_sqrtne_pbh (__m256bh __W, __mmask16 __U, __m256bh __A)
+{
+  return (__m256bh)
+    __builtin_ia32_sqrtnepbf16256_mask (__A, __W, __U);
+}
+
+extern __inline__ __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_sqrtne_pbh (__mmask16 __U, __m256bh __A)
+{
+  return (__m256bh)
+    __builtin_ia32_sqrtnepbf16256_mask (__A,
+					(__v16bf) _mm256_setzero_si256 (),
+					__U);
+}
+
+extern
__inline__ __m128bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_sqrtne_pbh (__m128bh __A) +{ + return (__m128bh) + __builtin_ia32_sqrtnepbf16128_mask (__A, + (__v8bf) _mm_setzero_si128 (), + (__mmask8) -1); +} + +extern __inline__ __m128bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_sqrtne_pbh (__m128bh __W, __mmask8 __U, __m128bh __A) +{ + return (__m128bh) + __builtin_ia32_sqrtnepbf16128_mask (__A, __W, __U); +} + +extern __inline__ __m128bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_sqrtne_pbh (__mmask8 __U, __m128bh __A) +{ + return (__m128bh) + __builtin_ia32_sqrtnepbf16128_mask (__A, + (__v8bf) _mm_setzero_si128 (), + __U); +} + +extern __inline__ __m256bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_rcp_pbh (__m256bh __A) +{ + return (__m256bh) + __builtin_ia32_rcppbf16256_mask (__A, + (__v16bf) _mm256_setzero_si256 (), + (__mmask16) -1); +} + +extern __inline__ __m256bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_rcp_pbh (__m256bh __W, __mmask16 __U, __m256bh __A) +{ + return (__m256bh) + __builtin_ia32_rcppbf16256_mask (__A, __W, __U); +} + +extern __inline__ __m256bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_rcp_pbh (__mmask16 __U, __m256bh __A) +{ + return (__m256bh) + __builtin_ia32_rcppbf16256_mask (__A, + (__v16bf) _mm256_setzero_si256 (), + __U); +} + +extern __inline__ __m128bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_rcp_pbh (__m128bh __A) +{ + return (__m128bh) + __builtin_ia32_rcppbf16128_mask (__A, + (__v8bf) _mm_setzero_si128 (), + (__mmask8) -1); +} + +extern __inline__ __m128bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_rcp_pbh (__m128bh __W, __mmask8 __U, __m128bh __A) +{ + return (__m128bh) + __builtin_ia32_rcppbf16128_mask (__A, __W, __U); +} + +extern __inline__ __m128bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_rcp_pbh (__mmask8 __U, __m128bh __A) +{ + return (__m128bh) + __builtin_ia32_rcppbf16128_mask (__A, + (__v8bf) _mm_setzero_si128 (), + __U); +} + +extern __inline__ __m256bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_getexp_pbh (__m256bh __A) +{ + return (__m256bh) + __builtin_ia32_getexppbf16256_mask (__A, + (__v16bf) _mm256_setzero_si256 (), + (__mmask16) -1); +} + +extern __inline__ __m256bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_getexp_pbh (__m256bh __W, __mmask16 __U, __m256bh __A) +{ + return (__m256bh) + __builtin_ia32_getexppbf16256_mask (__A, __W, __U); +} + +extern __inline__ __m256bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_getexp_pbh (__mmask16 __U, __m256bh __A) +{ + return (__m256bh) + __builtin_ia32_getexppbf16256_mask (__A, + (__v16bf) _mm256_setzero_si256 (), + __U); +} + +extern __inline__ __m128bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_getexp_pbh (__m128bh __A) +{ + return (__m128bh) + __builtin_ia32_getexppbf16128_mask (__A, + (__v8bf) _mm_setzero_si128 (), + (__mmask8) -1); +} + +extern __inline__ __m128bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_getexp_pbh (__m128bh __W, __mmask8 __U, __m128bh __A) +{ + return (__m128bh) + __builtin_ia32_getexppbf16128_mask (__A, __W, __U); +} + +extern __inline__ __m128bh +__attribute__ 
((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_getexp_pbh (__mmask8 __U, __m128bh __A) +{ + return (__m128bh) + __builtin_ia32_getexppbf16128_mask (__A, + (__v8bf) _mm_setzero_si128 (), + __U); +} + +/* Intrinsics vrndscalepbf16. */ +#ifdef __OPTIMIZE__ +extern __inline__ __m256bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_roundscalene_pbh (__m256bh __A, int B) +{ + return (__m256bh) + __builtin_ia32_rndscalenepbf16256_mask (__A, B, + (__v16bf) _mm256_setzero_si256 (), + (__mmask16) -1); +} + +extern __inline__ __m256bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_roundscalene_pbh (__m256bh __W, __mmask16 __U, + __m256bh __A, int B) +{ + return (__m256bh) + __builtin_ia32_rndscalenepbf16256_mask (__A, B, __W, __U); +} + +extern __inline__ __m256bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_roundscalene_pbh (__mmask16 __U, __m256bh __A, int B) +{ + return (__m256bh) + __builtin_ia32_rndscalenepbf16256_mask (__A, B, + (__v16bf) _mm256_setzero_si256 (), + __U); +} + +extern __inline__ __m128bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_roundscalene_pbh (__m128bh __A, int B) +{ + return (__m128bh) + __builtin_ia32_rndscalenepbf16128_mask (__A, B, + (__v8bf) _mm_setzero_si128 (), + (__mmask8) -1); +} + +extern __inline__ __m128bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_roundscalene_pbh (__m128bh __W, __mmask8 __U, + __m128bh __A, int B) +{ + return (__m128bh) + __builtin_ia32_rndscalenepbf16128_mask (__A, B, __W, __U); +} + +extern __inline__ __m128bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_roundscalene_pbh (__mmask8 __U, __m128bh __A, int B) +{ + return (__m128bh) + __builtin_ia32_rndscalenepbf16128_mask (__A, B, + (__v8bf) _mm_setzero_si128 (), + __U); +} + +#else +#define _mm256_roundscalene_pbh(A, B) \ + (__builtin_ia32_rndscalenepbf16256_mask ((A), (B), \ + (__v16bf) _mm256_setzero_si256 (), \ + (__mmask16) -1)) + +#define _mm256_mask_roundscalene_pbh(A, B, C, D) \ + (__builtin_ia32_rndscalenepbf16256_mask ((C), (D), (A), (B))) + +#define _mm256_maskz_roundscalene_pbh(A, B, C) \ + (__builtin_ia32_rndscalenepbf16256_mask ((B), (C), \ + (__v16bf) _mm256_setzero_si256 (), \ + (A))) + +#define _mm_roundscalene_pbh(A, B) \ + (__builtin_ia32_rndscalenepbf16128_mask ((A), (B), \ + (__v8bf) _mm_setzero_si128 (), \ + (__mmask8) -1)) + +#define _mm_mask_roundscalene_pbh(A, B, C, D) \ + (__builtin_ia32_rndscalenepbf16128_mask ((C), (D), (A), (B))) + +#define _mm_maskz_roundscalene_pbh(A, B, C) \ + (__builtin_ia32_rndscalenepbf16128_mask ((B), (C), \ + (__v8bf) _mm_setzero_si128 (), \ + (A))) + +#endif /* __OPTIMIZE__ */ + +/* Intrinsics vreducepbf16. 
*/ +#ifdef __OPTIMIZE__ +extern __inline__ __m256bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_reducene_pbh (__m256bh __A, int B) +{ + return (__m256bh) + __builtin_ia32_reducenepbf16256_mask (__A, B, + (__v16bf) _mm256_setzero_si256 (), + (__mmask16) -1); +} + +extern __inline__ __m256bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_reducene_pbh (__m256bh __W, __mmask16 __U, + __m256bh __A, int B) +{ + return (__m256bh) + __builtin_ia32_reducenepbf16256_mask (__A, B, __W, __U); +} + +extern __inline__ __m256bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_reducene_pbh (__mmask16 __U, __m256bh __A, int B) +{ + return (__m256bh) + __builtin_ia32_reducenepbf16256_mask (__A, B, + (__v16bf) _mm256_setzero_si256 (), + __U); +} + +extern __inline__ __m128bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_reducene_pbh (__m128bh __A, int B) +{ + return (__m128bh) + __builtin_ia32_reducenepbf16128_mask (__A, B, + (__v8bf) _mm_setzero_si128 (), + (__mmask8) -1); +} + +extern __inline__ __m128bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_reducene_pbh (__m128bh __W, __mmask8 __U, + __m128bh __A, int B) +{ + return (__m128bh) + __builtin_ia32_reducenepbf16128_mask (__A, B, __W, __U); +} + +extern __inline__ __m128bh +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_reducene_pbh (__mmask8 __U, __m128bh __A, int B) +{ + return (__m128bh) + __builtin_ia32_reducenepbf16128_mask (__A, B, + (__v8bf) _mm_setzero_si128 (), + __U); +} + +#else +#define _mm256_reducene_pbh(A, B) \ + (__builtin_ia32_reducenepbf16256_mask ((A), (B), \ + (__v16bf) _mm256_setzero_si256 (), \ + (__mmask16) -1)) + +#define _mm256_mask_reducene_pbh(A, B, C, D) \ + (__builtin_ia32_reducenepbf16256_mask ((C), (D), (A), (B))) + +#define _mm256_maskz_reducene_pbh(A, B, C) \ + (__builtin_ia32_reducenepbf16256_mask ((B), (C), \ + (__v16bf) _mm256_setzero_si256 (), \ + (A))) + +#define _mm_reducene_pbh(A, B) \ + (__builtin_ia32_reducenepbf16128_mask ((A), (B), \ + (__v8bf) _mm_setzero_si128 (), \ + (__mmask8) -1)) + +#define _mm_mask_reducene_pbh(A, B, C, D) \ + (__builtin_ia32_reducenepbf16128_mask ((C), (D), (A), (B))) + +#define _mm_maskz_reducene_pbh(A, B, C) \ + (__builtin_ia32_reducenepbf16128_mask ((B), (C), \ + (__v8bf) _mm_setzero_si128 (), \ + (A))) + +#endif /* __OPTIMIZE__ */ + + +/* Intrinsics vgetmantpbf16. 
*/
+#ifdef __OPTIMIZE__
+extern __inline__ __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_getmant_pbh (__m256bh __A, _MM_MANTISSA_NORM_ENUM __B,
+		    _MM_MANTISSA_SIGN_ENUM __C)
+{
+  return (__m256bh)
+    __builtin_ia32_getmantpbf16256_mask (__A, (int) (__C << 2) | __B,
+					 (__v16bf) _mm256_setzero_si256 (),
+					 (__mmask16) -1);
+}
+
+extern __inline__ __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_getmant_pbh (__m256bh __W, __mmask16 __U, __m256bh __A,
+			 _MM_MANTISSA_NORM_ENUM __B,
+			 _MM_MANTISSA_SIGN_ENUM __C)
+{
+  return (__m256bh)
+    __builtin_ia32_getmantpbf16256_mask (__A, (int) (__C << 2) | __B,
+					 __W, __U);
+}
+
+extern __inline__ __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_getmant_pbh (__mmask16 __U, __m256bh __A,
+			  _MM_MANTISSA_NORM_ENUM __B,
+			  _MM_MANTISSA_SIGN_ENUM __C)
+{
+  return (__m256bh)
+    __builtin_ia32_getmantpbf16256_mask (__A, (int) (__C << 2) | __B,
+					 (__v16bf) _mm256_setzero_si256 (),
+					 __U);
+}
+
+extern __inline__ __m128bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_getmant_pbh (__m128bh __A, _MM_MANTISSA_NORM_ENUM __B,
+		 _MM_MANTISSA_SIGN_ENUM __C)
+{
+  return (__m128bh)
+    __builtin_ia32_getmantpbf16128_mask (__A, (int) (__C << 2) | __B,
+					 (__v8bf) _mm_setzero_si128 (),
+					 (__mmask8) -1);
+}
+
+extern __inline__ __m128bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_getmant_pbh (__m128bh __W, __mmask8 __U, __m128bh __A,
+		      _MM_MANTISSA_NORM_ENUM __B,
+		      _MM_MANTISSA_SIGN_ENUM __C)
+{
+  return (__m128bh)
+    __builtin_ia32_getmantpbf16128_mask (__A, (int) (__C << 2) | __B,
+					 __W, __U);
+}
+
+extern __inline__ __m128bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_getmant_pbh (__mmask8 __U, __m128bh __A,
+		       _MM_MANTISSA_NORM_ENUM __B,
+		       _MM_MANTISSA_SIGN_ENUM __C)
+{
+  return (__m128bh)
+    __builtin_ia32_getmantpbf16128_mask (__A, (int) (__C << 2) | __B,
+					 (__v8bf) _mm_setzero_si128 (),
+					 __U);
+}
+
+#else
+#define _mm256_getmant_pbh(A, B, C) \
+  (__builtin_ia32_getmantpbf16256_mask ((A), (int)(((C)<<2) | (B)), \
+					(__v16bf) _mm256_setzero_si256 (), \
+					(__mmask16) (-1)))
+
+#define _mm256_mask_getmant_pbh(A, B, C, D, E) \
+  (__builtin_ia32_getmantpbf16256_mask ((C), (int)(((E)<<2) | (D)), (A), (B)))
+
+#define _mm256_maskz_getmant_pbh(A, B, C, D) \
+  (__builtin_ia32_getmantpbf16256_mask ((B), (int)(((D)<<2) | (C)), \
+					(__v16bf) _mm256_setzero_si256 (), \
+					(A)))
+
+#define _mm_getmant_pbh(A, B, C) \
+  (__builtin_ia32_getmantpbf16128_mask ((A), (int)(((C)<<2) | (B)), \
+					(__v8bf) _mm_setzero_si128 (), \
+					(__mmask8) (-1)))
+
+#define _mm_mask_getmant_pbh(A, B, C, D, E) \
+  (__builtin_ia32_getmantpbf16128_mask ((C), (int)(((E)<<2) | (D)), (A), (B)))
+
+#define _mm_maskz_getmant_pbh(A, B, C, D) \
+  (__builtin_ia32_getmantpbf16128_mask ((B), (int)(((D)<<2) | (C)), \
+					(__v8bf) _mm_setzero_si128 (), (A)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vfpclasspbf16.  */
+#ifdef __OPTIMIZE__
+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_fpclass_pbh_mask (__mmask16 __U, __m256bh __A,
+			      const int __imm)
+{
+  return (__mmask16)
+    __builtin_ia32_fpclasspbf16256_mask (__A, __imm, __U);
+}
+
+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_fpclass_pbh_mask (__m256bh __A, const int __imm)
+{
+  return (__mmask16)
+    __builtin_ia32_fpclasspbf16256_mask (__A, __imm, (__mmask16) -1);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_fpclass_pbh_mask (__mmask8 __U, __m128bh __A, const int __imm)
+{
+  return (__mmask8)
+    __builtin_ia32_fpclasspbf16128_mask (__A, __imm, __U);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fpclass_pbh_mask (__m128bh __A, const int __imm)
+{
+  return (__mmask8)
+    __builtin_ia32_fpclasspbf16128_mask (__A, __imm, (__mmask8) -1);
+}
+
+#else
+#define _mm256_mask_fpclass_pbh_mask(U, A, B) \
+  ((__mmask16) __builtin_ia32_fpclasspbf16256_mask ((A), (B), (U)))
+
+#define _mm256_fpclass_pbh_mask(A, B) \
+  ((__mmask16) __builtin_ia32_fpclasspbf16256_mask ((A), (B), \
+						    (__mmask16) (-1)))
+
+#define _mm_mask_fpclass_pbh_mask(U, A, B) \
+  ((__mmask8) __builtin_ia32_fpclasspbf16128_mask ((A), (B), (U)))
+
+#define _mm_fpclass_pbh_mask(A, B) \
+  ((__mmask8) __builtin_ia32_fpclasspbf16128_mask ((A), (B), \
+						   (__mmask8) (-1)))
+
+#endif /* __OPTIMIZE__ */
+
+
+/* Intrinsics vcmppbf16.  */
+#ifdef __OPTIMIZE__
+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_cmp_pbh_mask (__mmask16 __U, __m256bh __A,
+			  __m256bh __B, const int __imm)
+{
+  return (__mmask16)
+    __builtin_ia32_cmppbf16256_mask (__A, __B, __imm, __U);
+}
+
+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_cmp_pbh_mask (__m256bh __A, __m256bh __B, const int __imm)
+{
+  return (__mmask16)
+    __builtin_ia32_cmppbf16256_mask (__A, __B, __imm, (__mmask16) -1);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_cmp_pbh_mask (__mmask8 __U, __m128bh __A,
+		       __m128bh __B, const int __imm)
+{
+  return (__mmask8)
+    __builtin_ia32_cmppbf16128_mask (__A, __B, __imm, __U);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cmp_pbh_mask (__m128bh __A, __m128bh __B, const int __imm)
+{
+  return (__mmask8)
+    __builtin_ia32_cmppbf16128_mask (__A, __B, __imm, (__mmask8) -1);
+}
+
+#else
+#define _mm256_mask_cmp_pbh_mask(A, B, C, D) \
+  ((__mmask16) __builtin_ia32_cmppbf16256_mask ((B), (C), (D), (A)))
+
+#define _mm256_cmp_pbh_mask(A, B, C) \
+  ((__mmask16) __builtin_ia32_cmppbf16256_mask ((A), (B), (C), \
+						(__mmask16) (-1)))
+
+#define _mm_mask_cmp_pbh_mask(A, B, C, D) \
+  ((__mmask8) __builtin_ia32_cmppbf16128_mask ((B), (C), (D), (A)))
+
+#define _mm_cmp_pbh_mask(A, B, C) \
+  ((__mmask8) __builtin_ia32_cmppbf16128_mask ((A), (B), (C), \
+					       (__mmask8) (-1)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vcomsbf16.
*/ +extern __inline int +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comeq_sbh (__m128bh __A, __m128bh __B) +{ + return __builtin_ia32_vcomsbf16eq (__A, __B); +} + +extern __inline int +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comlt_sbh (__m128bh __A, __m128bh __B) +{ + return __builtin_ia32_vcomsbf16lt (__A, __B); +} + +extern __inline int +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comle_sbh (__m128bh __A, __m128bh __B) +{ + return __builtin_ia32_vcomsbf16le (__A, __B); +} + +extern __inline int +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comgt_sbh (__m128bh __A, __m128bh __B) +{ + return __builtin_ia32_vcomsbf16gt (__A, __B); +} + +extern __inline int +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comge_sbh (__m128bh __A, __m128bh __B) +{ + return __builtin_ia32_vcomsbf16ge (__A, __B); +} + +extern __inline int +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comneq_sbh (__m128bh __A, __m128bh __B) +{ + return __builtin_ia32_vcomsbf16neq (__A, __B); +} + #ifdef __DISABLE_AVX10_2_256__ #undef __DISABLE_AVX10_2_256__ #pragma GCC pop_options diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def index f3838424fd4..e6f53589e70 100644 --- a/gcc/config/i386/i386-builtin-types.def +++ b/gcc/config/i386/i386-builtin-types.def @@ -1483,3 +1483,13 @@ DEF_FUNCTION_TYPE (V8BF, V8BF, V8BF, UQI) DEF_FUNCTION_TYPE (V32BF, V32BF, V32BF, V32BF, USI) DEF_FUNCTION_TYPE (V16BF, V16BF, V16BF, V16BF, UHI) DEF_FUNCTION_TYPE (V8BF, V8BF, V8BF, V8BF, UQI) +DEF_FUNCTION_TYPE (V32BF, V32BF, INT, V32BF, USI) +DEF_FUNCTION_TYPE (V16BF, V16BF, INT, V16BF, UHI) +DEF_FUNCTION_TYPE (V8BF, V8BF, INT, V8BF, UQI) +DEF_FUNCTION_TYPE (QI, V8BF, INT, UQI) +DEF_FUNCTION_TYPE (HI, V16BF, INT, UHI) +DEF_FUNCTION_TYPE (SI, V32BF, INT, USI) +DEF_FUNCTION_TYPE (USI, V32BF, V32BF, INT, USI) +DEF_FUNCTION_TYPE (UHI, V16BF, V16BF, INT, UHI) +DEF_FUNCTION_TYPE (UQI, V8BF, V8BF, INT, UQI) +DEF_FUNCTION_TYPE (INT, V8BF, V8BF) diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def index 3f3bc768348..25b8169c1ef 100644 --- a/gcc/config/i386/i386-builtin.def +++ b/gcc/config/i386/i386-builtin.def @@ -3237,6 +3237,39 @@ BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fnmsubnepbf16_v16bf_mas BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fnmsubnepbf16_v8bf_mask, "__builtin_ia32_fnmsubnepbf16128_mask", IX86_BUILTIN_FNMSUBNEPBF16128_MASK, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF_V8BF_UQI) BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fnmsubnepbf16_v8bf_mask3, "__builtin_ia32_fnmsubnepbf16128_mask3", IX86_BUILTIN_FNMSUBNEPBF16128_MASK3, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF_V8BF_UQI) BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fnmsubnepbf16_v8bf_maskz, "__builtin_ia32_fnmsubnepbf16128_maskz", IX86_BUILTIN_FNMSUBNEPBF16128_MASKZ, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF_V8BF_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_rsqrtpbf16_v32bf_mask, "__builtin_ia32_rsqrtpbf16512_mask", IX86_BUILTIN_RSQRTPBF16512_MASK, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF_USI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_rsqrtpbf16_v16bf_mask, "__builtin_ia32_rsqrtpbf16256_mask", IX86_BUILTIN_RSQRTPBF16256_MASK, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_rsqrtpbf16_v8bf_mask, "__builtin_ia32_rsqrtpbf16128_mask", 
IX86_BUILTIN_RSQRTPBF16128_MASK, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_sqrtnepbf16_v32bf_mask, "__builtin_ia32_sqrtnepbf16512_mask", IX86_BUILTIN_SQRTNEPBF16512_MASK, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF_USI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_sqrtnepbf16_v16bf_mask, "__builtin_ia32_sqrtnepbf16256_mask", IX86_BUILTIN_SQRTNEPBF16256_MASK, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_sqrtnepbf16_v8bf_mask, "__builtin_ia32_sqrtnepbf16128_mask", IX86_BUILTIN_SQRTNEPBF16128_MASK, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_rcppbf16_v32bf_mask, "__builtin_ia32_rcppbf16512_mask", IX86_BUILTIN_RCPPBF16512_MASK, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF_USI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_rcppbf16_v16bf_mask, "__builtin_ia32_rcppbf16256_mask", IX86_BUILTIN_RCPPBF16256_MASK, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_rcppbf16_v8bf_mask, "__builtin_ia32_rcppbf16128_mask", IX86_BUILTIN_RCPPBF16128_MASK, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_getexppbf16_v32bf_mask, "__builtin_ia32_getexppbf16512_mask", IX86_BUILTIN_GETEXPPBF16512_MASK, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF_USI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_getexppbf16_v16bf_mask, "__builtin_ia32_getexppbf16256_mask", IX86_BUILTIN_GETEXPPBF16256_MASK, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_getexppbf16_v8bf_mask, "__builtin_ia32_getexppbf16128_mask", IX86_BUILTIN_GETEXPPBF16128_MASK, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_rndscalenepbf16_v32bf_mask, "__builtin_ia32_rndscalenepbf16512_mask", IX86_BUILTIN_RNDSCALENEPBF16512_MASK, UNKNOWN, (int) V32BF_FTYPE_V32BF_INT_V32BF_USI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_rndscalenepbf16_v16bf_mask, "__builtin_ia32_rndscalenepbf16256_mask", IX86_BUILTIN_RNDSCALENEPBF16256_MASK, UNKNOWN, (int) V16BF_FTYPE_V16BF_INT_V16BF_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_rndscalenepbf16_v8bf_mask, "__builtin_ia32_rndscalenepbf16128_mask", IX86_BUILTIN_RNDSCALENEPBF16128_MASK, UNKNOWN, (int) V8BF_FTYPE_V8BF_INT_V8BF_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_reducenepbf16_v32bf_mask, "__builtin_ia32_reducenepbf16512_mask", IX86_BUILTIN_REDUCENEPBF16512_MASK, UNKNOWN, (int) V32BF_FTYPE_V32BF_INT_V32BF_USI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_reducenepbf16_v16bf_mask, "__builtin_ia32_reducenepbf16256_mask", IX86_BUILTIN_REDUCENEPBF16256_MASK, UNKNOWN, (int) V16BF_FTYPE_V16BF_INT_V16BF_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_reducenepbf16_v8bf_mask, "__builtin_ia32_reducenepbf16128_mask", IX86_BUILTIN_REDUCENEPBF16128_MASK, UNKNOWN, (int) V8BF_FTYPE_V8BF_INT_V8BF_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_getmantpbf16_v32bf_mask, "__builtin_ia32_getmantpbf16512_mask", IX86_BUILTIN_GETMANTPBF16512_MASK, UNKNOWN, (int) V32BF_FTYPE_V32BF_INT_V32BF_USI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_getmantpbf16_v16bf_mask, "__builtin_ia32_getmantpbf16256_mask", IX86_BUILTIN_GETMANTPBF16256_MASK, UNKNOWN, (int) V16BF_FTYPE_V16BF_INT_V16BF_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, 
CODE_FOR_avx10_2_getmantpbf16_v8bf_mask, "__builtin_ia32_getmantpbf16128_mask", IX86_BUILTIN_GETMANTPBF16128_MASK, UNKNOWN, (int) V8BF_FTYPE_V8BF_INT_V8BF_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_fpclasspbf16_v32bf_mask, "__builtin_ia32_fpclasspbf16512_mask", IX86_BUILTIN_FPCLASSPBF16512_MASK, UNKNOWN, (int) SI_FTYPE_V32BF_INT_USI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fpclasspbf16_v16bf_mask, "__builtin_ia32_fpclasspbf16256_mask", IX86_BUILTIN_FPCLASSPBF16256_MASK, UNKNOWN, (int) HI_FTYPE_V16BF_INT_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fpclasspbf16_v8bf_mask, "__builtin_ia32_fpclasspbf16128_mask", IX86_BUILTIN_FPCLASSPBF16128_MASK, UNKNOWN, (int) QI_FTYPE_V8BF_INT_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_cmppbf16_v32bf_mask, "__builtin_ia32_cmppbf16512_mask", IX86_BUILTIN_CMPPBF16512_MASK, UNKNOWN, (int) USI_FTYPE_V32BF_V32BF_INT_USI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_cmppbf16_v16bf_mask, "__builtin_ia32_cmppbf16256_mask", IX86_BUILTIN_CMPPBF16256_MASK, UNKNOWN, (int) UHI_FTYPE_V16BF_V16BF_INT_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_cmppbf16_v8bf_mask, "__builtin_ia32_cmppbf16128_mask", IX86_BUILTIN_CMPPBF16128_MASK, UNKNOWN, (int) UQI_FTYPE_V8BF_V8BF_INT_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_comsbf16_v8bf, "__builtin_ia32_vcomsbf16eq", IX86_BUILTIN_VCOMSBF16EQ, EQ, (int) INT_FTYPE_V8BF_V8BF) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_comsbf16_v8bf, "__builtin_ia32_vcomsbf16gt", IX86_BUILTIN_VCOMSBF16GT, GT, (int) INT_FTYPE_V8BF_V8BF) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_comsbf16_v8bf, "__builtin_ia32_vcomsbf16ge", IX86_BUILTIN_VCOMSBF16GE, GE, (int) INT_FTYPE_V8BF_V8BF) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_comsbf16_v8bf, "__builtin_ia32_vcomsbf16le", IX86_BUILTIN_VCOMSBF16LE, LE, (int) INT_FTYPE_V8BF_V8BF) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_comsbf16_v8bf, "__builtin_ia32_vcomsbf16lt", IX86_BUILTIN_VCOMSBF16LT, LT, (int) INT_FTYPE_V8BF_V8BF) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_comsbf16_v8bf, "__builtin_ia32_vcomsbf16neq", IX86_BUILTIN_VCOMSBF16NE, NE, (int) INT_FTYPE_V8BF_V8BF) /* Builtins with rounding support. 
*/
 BDESC_END (ARGS, ROUND_ARGS)
diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index dff9e09809e..7ea41924b98 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -11712,6 +11712,9 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case QI_FTYPE_V8HF_INT_UQI:
     case HI_FTYPE_V16HF_INT_UHI:
     case SI_FTYPE_V32HF_INT_USI:
+    case QI_FTYPE_V8BF_INT_UQI:
+    case HI_FTYPE_V16BF_INT_UHI:
+    case SI_FTYPE_V32BF_INT_USI:
     case V4SI_FTYPE_V4SI_V4SI_UHI:
     case V8SI_FTYPE_V8SI_V8SI_UHI:
       nargs = 3;
@@ -11825,9 +11828,12 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case USI_FTYPE_V32QI_V32QI_INT_USI:
     case UHI_FTYPE_V16QI_V16QI_INT_UHI:
     case USI_FTYPE_V32HI_V32HI_INT_USI:
+    case USI_FTYPE_V32BF_V32BF_INT_USI:
     case USI_FTYPE_V32HF_V32HF_INT_USI:
     case UHI_FTYPE_V16HI_V16HI_INT_UHI:
+    case UHI_FTYPE_V16BF_V16BF_INT_UHI:
     case UQI_FTYPE_V8HI_V8HI_INT_UQI:
+    case UQI_FTYPE_V8BF_V8BF_INT_UQI:
       nargs = 4;
       mask_pos = 1;
       nargs_constant = 1;
@@ -11864,6 +11870,9 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case V32HI_FTYPE_V32HI_INT_V32HI_USI:
     case V16HI_FTYPE_V16HI_INT_V16HI_UHI:
     case V8HI_FTYPE_V8HI_INT_V8HI_UQI:
+    case V32BF_FTYPE_V32BF_INT_V32BF_USI:
+    case V16BF_FTYPE_V16BF_INT_V16BF_UHI:
+    case V8BF_FTYPE_V8BF_INT_V8BF_UQI:
     case V4DI_FTYPE_V4DI_INT_V4DI_UQI:
     case V2DI_FTYPE_V2DI_INT_V2DI_UQI:
     case V8SI_FTYPE_V8SI_INT_V8SI_UQI:
@@ -15662,6 +15671,13 @@ rdseed_step:
     case IX86_BUILTIN_RDPID:
       return ix86_expand_special_args_builtin (bdesc_args + i, exp, target);
+    case IX86_BUILTIN_VCOMSBF16EQ:
+    case IX86_BUILTIN_VCOMSBF16NE:
+    case IX86_BUILTIN_VCOMSBF16GT:
+    case IX86_BUILTIN_VCOMSBF16GE:
+    case IX86_BUILTIN_VCOMSBF16LT:
+    case IX86_BUILTIN_VCOMSBF16LE:
+      return ix86_expand_sse_comi (bdesc_args + i, exp, target);
     case IX86_BUILTIN_FABSQ:
     case IX86_BUILTIN_COPYSIGNQ:
       if (!TARGET_SSE)
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 50274f01a01..d7d99c6359f 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -230,6 +230,11 @@
   UNSPEC_VCVTNEPH2HF8S
   UNSPEC_VCVTHF82PH
   UNSPEC_VSCALEFPBF16
+  UNSPEC_VRNDSCALENEPBF16
+  UNSPEC_VREDUCENEPBF16
+  UNSPEC_VGETMANTPBF16
+  UNSPEC_VFPCLASSPBF16
+  UNSPEC_VCOMSBF16
 ])
 
 (define_c_enum "unspecv" [
@@ -835,6 +840,7 @@
 (define_mode_attr vecmemsuffix
   [(V32HF "{z}") (V16HF "{y}") (V8HF "{x}")
+   (V32BF "{z}") (V16BF "{y}") (V8BF "{x}")
    (V16SF "{z}") (V8SF "{y}") (V4SF "{x}")
    (V8DF "{z}") (V4DF "{y}") (V2DF "{x}")])
@@ -32105,3 +32111,89 @@
   [(set_attr "prefix" "evex")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<sseinsnmode>")])
+
+(define_insn "avx10_2_rsqrtpbf16_<mode><mask_name>"
+  [(set (match_operand:VBF_AVX10_2 0 "register_operand" "=v")
+	(unspec:VBF_AVX10_2
+	  [(match_operand:VBF_AVX10_2 1 "nonimmediate_operand" "vm")]
+	  UNSPEC_RSQRT))]
+  "TARGET_AVX10_2_256"
+  "vrsqrtpbf16\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+  [(set_attr "prefix" "evex")])
+
+(define_insn "avx10_2_sqrtnepbf16_<mode><mask_name>"
+  [(set (match_operand:VBF_AVX10_2 0 "register_operand" "=v")
+	(sqrt:VBF_AVX10_2
+	  (match_operand:VBF_AVX10_2 1 "nonimmediate_operand" "vm")))]
+  "TARGET_AVX10_2_256"
+  "vsqrtnepbf16\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+  [(set_attr "prefix" "evex")])
+
+(define_insn "avx10_2_rcppbf16_<mode><mask_name>"
+  [(set (match_operand:VBF_AVX10_2 0 "register_operand" "=v")
+	(unspec:VBF_AVX10_2
+	  [(match_operand:VBF_AVX10_2 1 "nonimmediate_operand" "vm")]
+	  UNSPEC_RCP))]
+  "TARGET_AVX10_2_256"
+  "vrcppbf16\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+  [(set_attr "prefix" "evex")])
+
+(define_insn "avx10_2_getexppbf16_<mode><mask_name>"
+  [(set (match_operand:VBF_AVX10_2 0 "register_operand" "=v")
+	(unspec:VBF_AVX10_2
+	  [(match_operand:VBF_AVX10_2 1 "nonimmediate_operand" "vm")]
+	  UNSPEC_GETEXP))]
+  "TARGET_AVX10_2_256"
+  "vgetexppbf16\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+  [(set_attr "prefix" "evex")])
+
+(define_int_iterator BF16IMMOP
+  [UNSPEC_VRNDSCALENEPBF16
+   UNSPEC_VREDUCENEPBF16
+   UNSPEC_VGETMANTPBF16])
+
+(define_int_attr bf16immop
+  [(UNSPEC_VRNDSCALENEPBF16 "rndscalene")
+   (UNSPEC_VREDUCENEPBF16 "reducene")
+   (UNSPEC_VGETMANTPBF16 "getmant")])
+
+(define_insn "avx10_2_<bf16immop>pbf16_<mode><mask_name>"
+  [(set (match_operand:VBF_AVX10_2 0 "register_operand" "=v")
+	(unspec:VBF_AVX10_2
+	  [(match_operand:VBF_AVX10_2 1 "nonimmediate_operand" "vm")
+	   (match_operand:SI 2 "const_0_to_255_operand")]
+	  BF16IMMOP))]
+  "TARGET_AVX10_2_256"
+  "v<bf16immop>pbf16\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
+  [(set_attr "prefix" "evex")])
+
+(define_insn "avx10_2_fpclasspbf16_<mode><mask_scalar_merge_name>"
+  [(set (match_operand:<avx512fmaskmode> 0 "register_operand" "=k")
+	(unspec:<avx512fmaskmode>
+	  [(match_operand:VBF_AVX10_2 1 "nonimmediate_operand" "vm")
+	   (match_operand 2 "const_0_to_255_operand")]
+	  UNSPEC_VFPCLASSPBF16))]
+  "TARGET_AVX10_2_256"
+  "vfpclasspbf16<vecmemsuffix>\t{%2, %1, %0<mask_scalar_merge_operand3>|%0<mask_scalar_merge_operand3>, %1, %2}"
+  [(set_attr "prefix" "evex")])
+
+(define_insn "avx10_2_cmppbf16_<mode><mask_scalar_merge_name>"
+  [(set (match_operand:<avx512fmaskmode> 0 "register_operand" "=k")
+	(unspec:<avx512fmaskmode>
+	  [(match_operand:VBF_AVX10_2 1 "register_operand" "v")
+	   (match_operand:VBF_AVX10_2 2 "nonimmediate_operand" "vm")
+	   (match_operand 3 "const_0_to_31_operand" "n")]
+	  UNSPEC_PCMP))]
+  "TARGET_AVX10_2_256"
+  "vcmppbf16\t{%3, %2, %1, %0<mask_scalar_merge_operand4>|%0<mask_scalar_merge_operand4>, %1, %2, %3}"
+  [(set_attr "prefix" "evex")])
+
+(define_insn "avx10_2_comsbf16_v8bf"
+  [(set (reg:CCFP FLAGS_REG)
+	(unspec:CCFP
+	  [(match_operand:V8BF 0 "register_operand" "v")
+	   (match_operand:V8BF 1 "nonimmediate_operand" "vm")]
+	  UNSPEC_VCOMSBF16))]
+  "TARGET_AVX10_2_256"
+  "vcomsbf16\t{%1, %0|%0, %1}"
+  [(set_attr "prefix" "evex")])
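Since avx10_2_<bf16immop>pbf16_<mode> combines the BF16IMMOP int iterator with
the VBF_AVX10_2 mode iterator, that single pattern instantiates one insn per
operation and vector width (the CODE_FOR_* names referenced from
i386-builtin.def above). At the source level the three widths look like this
(sketch only, not part of the patch; the immediate 123 matches the avx-1.c
macros below):

#include <immintrin.h>

void
reduce_demo (__m512bh a512, __m256bh a256, __m128bh a128,
	     __m512bh *r512, __m256bh *r256, __m128bh *r128)
{
  *r512 = _mm512_reducene_pbh (a512, 123);  /* avx10_2_reducenepbf16_v32bf */
  *r256 = _mm256_reducene_pbh (a256, 123);  /* avx10_2_reducenepbf16_v16bf */
  *r128 = _mm_reducene_pbh (a128, 123);     /* avx10_2_reducenepbf16_v8bf  */
}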
diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c
index 4a47e313096..df4cfdfff8d 100644
--- a/gcc/testsuite/gcc.target/i386/avx-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx-1.c
@@ -1016,6 +1016,25 @@
 /* avx10_2-512convertintrin.h */
 #define __builtin_ia32_vcvt2ps2phx512_mask_round(A, B, C, D, E) __builtin_ia32_vcvt2ps2phx512_mask_round(A, B, C, D, 8)
 
+/* avx10_2-512bf16intrin.h */
+#define __builtin_ia32_rndscalenepbf16512_mask(A, B, C, D) __builtin_ia32_rndscalenepbf16512_mask(A, 123, C, D)
+#define __builtin_ia32_reducenepbf16512_mask(A, B, C, D) __builtin_ia32_reducenepbf16512_mask(A, 123, C, D)
+#define __builtin_ia32_getmantpbf16512_mask(A, B, C, D) __builtin_ia32_getmantpbf16512_mask(A, 1, C, D)
+#define __builtin_ia32_fpclasspbf16512_mask(A, B, C) __builtin_ia32_fpclasspbf16512_mask(A, 1, C)
+#define __builtin_ia32_cmppbf16512_mask(A, B, C, D) __builtin_ia32_cmppbf16512_mask(A, B, 1, D)
+
+/* avx10_2bf16intrin.h */
+#define __builtin_ia32_rndscalenepbf16256_mask(A, B, C, D) __builtin_ia32_rndscalenepbf16256_mask(A, 123, C, D)
+#define __builtin_ia32_rndscalenepbf16128_mask(A, B, C, D) __builtin_ia32_rndscalenepbf16128_mask(A, 123, C, D)
+#define __builtin_ia32_reducenepbf16256_mask(A, B, C, D) __builtin_ia32_reducenepbf16256_mask(A, 123, C, D)
+#define __builtin_ia32_reducenepbf16128_mask(A, B, C, D) __builtin_ia32_reducenepbf16128_mask(A, 123, C, D)
+#define __builtin_ia32_getmantpbf16256_mask(A, B, C, D) __builtin_ia32_getmantpbf16256_mask(A, 1, C, D)
+#define __builtin_ia32_getmantpbf16128_mask(A, B, C, D) __builtin_ia32_getmantpbf16128_mask(A, 1, C, D)
+#define __builtin_ia32_fpclasspbf16256_mask(A, B, C) __builtin_ia32_fpclasspbf16256_mask(A, 1, C)
+#define __builtin_ia32_fpclasspbf16128_mask(A, B, C) __builtin_ia32_fpclasspbf16128_mask(A, 1, C)
+#define __builtin_ia32_cmppbf16256_mask(A, B, C, D) __builtin_ia32_cmppbf16256_mask(A, B, 1, D)
+#define __builtin_ia32_cmppbf16128_mask(A, B, C, D) __builtin_ia32_cmppbf16128_mask(A, B, 1, D)
+
 #include
 #include
 #include
diff --git a/gcc/testsuite/gcc.target/i386/avx10-check.h b/gcc/testsuite/gcc.target/i386/avx10-check.h
index 76c32d7acaa..87fa818f048 100644
--- a/gcc/testsuite/gcc.target/i386/avx10-check.h
+++ b/gcc/testsuite/gcc.target/i386/avx10-check.h
@@ -5,7 +5,7 @@
 #ifndef DO_TEST
 #define DO_TEST do_test
-#if defined(AVX10_512BIT)
+#if defined(AVX10_512BIT) || defined(AVX10_SCALAR)
 static void test_512 (void);
 #else
 static void test_256 (void);
@@ -16,7 +16,7 @@ __attribute__ ((noinline))
 static void
 do_test (void)
 {
-#if defined(AVX10_512BIT)
+#if defined(AVX10_512BIT) || defined(AVX10_SCALAR)
   test_512 ();
 #else
   test_256 ();
diff --git a/gcc/testsuite/gcc.target/i386/avx10-helper.h b/gcc/testsuite/gcc.target/i386/avx10-helper.h
index 9ff1dd72e92..4d092e27447 100644
--- a/gcc/testsuite/gcc.target/i386/avx10-helper.h
+++ b/gcc/testsuite/gcc.target/i386/avx10-helper.h
@@ -53,6 +53,33 @@ scalef (float x, float y)
   return _mm_cvtss_f32 (out);
 }
 
+float NOINLINE
+getexp (float val)
+{
+  __m128 px = _mm_load_ss (&val);
+  __m128 mx = _mm_broadcastss_ps (px);
+  __m128 out = _mm_getexp_ps (mx);
+  return _mm_cvtss_f32 (out);
+}
+
+float NOINLINE
+rndscale (float val)
+{
+  __m128 px = _mm_load_ss (&val);
+  __m128 mx = _mm_broadcastss_ps (px);
+  __m128 out = _mm_roundscale_ps (mx, 0x10);
+  return _mm_cvtss_f32 (out);
+}
+
+float NOINLINE
+getmant (float val)
+{
+  __m128 px = _mm_load_ss (&val);
+  __m128 mx = _mm_broadcastss_ps (px);
+  __m128 out = _mm_getmant_ps (mx, 0, 0);
+  return _mm_cvtss_f32 (out);
+}
+
 #endif /* AVX10_HELPER_INCLUDED */
 
 /* Intrinsic being tested.
It has different deffinitions, diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-bf16-1.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-bf16-1.c index 78839fb1297..6d111a10b41 100644 --- a/gcc/testsuite/gcc.target/i386/avx10_2-512-bf16-1.c +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-bf16-1.c @@ -37,9 +37,36 @@ /* { dg-final { scan-assembler-times "vfnmsub132nepbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ /* { dg-final { scan-assembler-times "vfnmsub231nepbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ /* { dg-final { scan-assembler-times "vfnmsub132nepbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrsqrtpbf16\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrsqrtpbf16\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrsqrtpbf16\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vsqrtnepbf16\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vsqrtnepbf16\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vsqrtnepbf16\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrcppbf16\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrcppbf16\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrcppbf16\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vgetexppbf16\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vgetexppbf16\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vgetexppbf16\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrndscalenepbf16\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrndscalenepbf16\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrndscalenepbf16\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vreducenepbf16\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vreducenepbf16\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vreducenepbf16\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vgetmantpbf16\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vgetmantpbf16\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { 
dg-final { scan-assembler-times "vgetmantpbf16\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfpclasspbf16z\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n^k\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfpclasspbf16z\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n^k\]*%k\[0-7\]\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcmppbf16\[ \\t\]+\\\$1\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%k\[0-9\](?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcmppbf16\[ \\t\]+\\\$2\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%k\[0-9\]\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ #include +#define IMM 123 + volatile __m512bh res, x1, x2; volatile __mmask32 m32; @@ -84,4 +111,35 @@ avx10_2_512_test (void) res = _mm512_mask_fnmsubne_pbh (res, m32, x1, x2); res = _mm512_mask3_fnmsubne_pbh (res, x1, x2, m32); res = _mm512_maskz_fnmsubne_pbh (m32,res, x1, x2); + + res = _mm512_rsqrt_pbh (x1); + res = _mm512_mask_rsqrt_pbh (res, m32, x1); + res = _mm512_maskz_rsqrt_pbh (m32, x1); + res = _mm512_sqrtne_pbh (x1); + res = _mm512_mask_sqrtne_pbh (res, m32, x1); + res = _mm512_maskz_sqrtne_pbh (m32, x1); + res = _mm512_rcp_pbh (x1); + res = _mm512_mask_rcp_pbh (res, m32, x1); + res = _mm512_maskz_rcp_pbh (m32, x1); + res = _mm512_getexp_pbh (x1); + res = _mm512_mask_getexp_pbh (res, m32, x1); + res = _mm512_maskz_getexp_pbh (m32, x1); + + res = _mm512_roundscalene_pbh (x1, IMM); + res = _mm512_mask_roundscalene_pbh (res, m32, x1, IMM); + res = _mm512_maskz_roundscalene_pbh (m32, x1, IMM); + res = _mm512_reducene_pbh (x1, IMM); + res = _mm512_mask_reducene_pbh (res, m32, x1, IMM); + res = _mm512_maskz_reducene_pbh (m32, x1, IMM); + res = _mm512_getmant_pbh (x1, _MM_MANT_NORM_p75_1p5, _MM_MANT_SIGN_src); + res = _mm512_mask_getmant_pbh (res, m32, x1, _MM_MANT_NORM_p75_1p5, + _MM_MANT_SIGN_src); + res = _mm512_maskz_getmant_pbh (m32, x1, _MM_MANT_NORM_p75_1p5, + _MM_MANT_SIGN_src); + + m32 = _mm512_fpclass_pbh_mask (x1, 13); + m32 = _mm512_mask_fpclass_pbh_mask (2, x1, 13); + + m32 = _mm512_cmp_pbh_mask (x1, x2, 1); + m32 = _mm512_mask_cmp_pbh_mask (m32, x1, x2, 2); } diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcmppbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcmppbf16-2.c new file mode 100644 index 00000000000..a352890e9bc --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcmppbf16-2.c @@ -0,0 +1,36 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif +#include "avx10-helper.h" +#define SIZE_RES (AVX512F_LEN / 16) + +void +TEST (void) +{ + int i; + __mmask32 res1, res2, exp = 0; + UNION_TYPE (AVX512F_LEN, bf16_uw) src1, src2; + MASK_TYPE mask = MASK_VALUE; + + for (i = 0; i < SIZE_RES; i++) + { + float x = 0.5; + float y = 0.25; + src2.a[i] = convert_fp32_to_bf16 (y); + src1.a[i] = convert_fp32_to_bf16 (x); + if (src1.a[i] == src2.a[i]) + exp |= 1 << i; + } + + res1 = INTRINSIC (_cmp_pbh_mask) (src1.x, src2.x, 0); + res2 = INTRINSIC (_mask_cmp_pbh_mask) (mask, src1.x, src2.x, 0); + + if (exp != res1 || exp != res2) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vfpclasspbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vfpclasspbf16-2.c new file mode 100644 index 00000000000..1b25a070eff --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vfpclasspbf16-2.c 
@@ -0,0 +1,44 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif +#include "avx10-helper.h" +#define SIZE_RES (AVX512F_LEN / 16) + +void +TEST (void) +{ + int i; + MASK_TYPE res1 = 0, res2 = 0; + __mmask16 exp = 0; + UNION_TYPE (AVX512F_LEN, bf16_uw) src1; + UNION_TYPE (AVX512F_LEN, ) src2; + MASK_TYPE mask = MASK_VALUE; + + for (i = 0; i < SIZE_RES / 2; i++) + { + src1.a[i] = 0; + src2.a[i] = (uint32_t) (src1.a[i]) << 16; + } + + for (i = SIZE_RES / 2; i < SIZE_RES; i++) + src1.a[i] = 0; + + src1.a[0] = 0x7FC0; + src2.a[0] = convert_bf16_to_fp32 (src1.a[0]); + + _mm_setcsr (0x9FC0); + exp = INTRINSIC (_fpclass_ps_mask) (src2.x, 0x01); + + _mm_setcsr (0x1f80); + res1 = INTRINSIC (_fpclass_pbh_mask) (src1.x, 0x01); + res2 = INTRINSIC (_mask_fpclass_pbh_mask) (mask, src1.x, 1); + + if (exp != res1 || exp != res2) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vgetexppbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vgetexppbf16-2.c new file mode 100644 index 00000000000..def6d93ccad --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vgetexppbf16-2.c @@ -0,0 +1,47 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif +#include "avx10-helper.h" +#define SIZE_RES (AVX512F_LEN / 16) + +void +TEST (void) +{ + int i; + UNION_TYPE (AVX512F_LEN, bf16_uw) res1, res2, res3, src1; + MASK_TYPE mask = MASK_VALUE; + unsigned short res_ref[SIZE_RES], res_ref2[SIZE_RES]; + + for (i = 0; i < SIZE_RES; i++) + { + res1.a[i] = 0; + res2.a[i] = DEFAULT_VALUE; + res3.a[i] = DEFAULT_VALUE; + float f, s; + f = 28 * i + 1; + src1.a[i] = convert_fp32_to_bf16 (f); + s = convert_bf16_to_fp32 (src1.a[i]); + res_ref[i] = res_ref2[i] = convert_fp32_to_bf16 (getexp (s)); + } + + res1.x = INTRINSIC (_getexp_pbh) (src1.x); + res2.x = INTRINSIC (_mask_getexp_pbh) (res2.x, mask, src1.x); + res3.x = INTRINSIC (_maskz_getexp_pbh) (mask, src1.x); + + if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res1, res_ref)) + abort (); + + MASK_MERGE (bf16_uw) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res2, res_ref2)) + abort (); + + MASK_ZERO (bf16_uw) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res3, res_ref2)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vgetmantpbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vgetmantpbf16-2.c new file mode 100644 index 00000000000..898cf5ccf38 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vgetmantpbf16-2.c @@ -0,0 +1,50 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif +#include "avx10-helper.h" +#define SIZE_RES (AVX512F_LEN / 16) + +void +TEST (void) +{ + int i; + UNION_TYPE (AVX512F_LEN, bf16_uw) res1, res2, res3, src1; + MASK_TYPE mask = MASK_VALUE; + unsigned short res_ref[SIZE_RES], res_ref2[SIZE_RES]; + + for (i = 0; i < SIZE_RES; i++) + { + res1.a[i] = 5.0; + res2.a[i] = DEFAULT_VALUE; + res3.a[i] = DEFAULT_VALUE; + src1.a[i] = 0.5; + float x = convert_bf16_to_fp32 (src1.a[i]); + res_ref[i] = res_ref2[i] = convert_fp32_to_bf16 (getmant (x)); + } + + res1.x = INTRINSIC (_getmant_pbh) (src1.x, _MM_MANT_NORM_1_2, + 
_MM_MANT_SIGN_src); + res2.x = INTRINSIC (_mask_getmant_pbh) (res2.x, mask, src1.x, + _MM_MANT_NORM_1_2, + _MM_MANT_SIGN_src); + res3.x = INTRINSIC (_maskz_getmant_pbh) (mask, src1.x, + _MM_MANT_NORM_1_2, + _MM_MANT_SIGN_src); + + if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res1, res_ref)) + abort (); + + MASK_MERGE (bf16_uw) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res2, res_ref2)) + abort (); + + MASK_ZERO (bf16_uw) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res3, res_ref2)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vrcppbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vrcppbf16-2.c new file mode 100644 index 00000000000..0bca27d504f --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vrcppbf16-2.c @@ -0,0 +1,45 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif +#include "avx10-helper.h" +#define SIZE_RES (AVX512F_LEN / 16) + +void +TEST (void) +{ + int i; + UNION_TYPE (AVX512F_LEN, bf16_uw) res1, res2, res3, src1; + MASK_TYPE mask = MASK_VALUE; + unsigned short res_ref[SIZE_RES], res_ref2[SIZE_RES]; + + for (i = 0; i < SIZE_RES; i++) + { + res1.a[i] = 0; + res2.a[i] = DEFAULT_VALUE; + res3.a[i] = DEFAULT_VALUE; + float s1 = 2.0; + src1.a[i] = convert_fp32_to_bf16 (s1); + res_ref[i] = res_ref2[i] = convert_fp32_to_bf16 (1.0 / s1); + } + + res1.x = INTRINSIC (_rcp_pbh) (src1.x); + res2.x = INTRINSIC (_mask_rcp_pbh) (res2.x, mask, src1.x); + res3.x = INTRINSIC (_maskz_rcp_pbh) (mask, src1.x); + + if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res1, res_ref)) + abort (); + + MASK_MERGE (bf16_uw) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res2, res_ref2)) + abort (); + + MASK_ZERO (bf16_uw) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res3, res_ref2)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vreducenepbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vreducenepbf16-2.c new file mode 100644 index 00000000000..c3e2b36864e --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vreducenepbf16-2.c @@ -0,0 +1,50 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif +#include "avx10-helper.h" +#define SIZE_RES (AVX512F_LEN / 16) + +void +TEST (void) +{ + int i; + UNION_TYPE (AVX512F_LEN, bf16_uw) res1, res2, res3, src1; + MASK_TYPE mask = MASK_VALUE; + unsigned short res_ref[SIZE_RES], res_ref2[SIZE_RES]; + + for (i = 0; i < SIZE_RES; i++) + { + res1.a[i] = 5.0; + res2.a[i] = DEFAULT_VALUE; + res3.a[i] = DEFAULT_VALUE; + float s = (float) (SIZE_RES - 1) / (float) i; + src1.a[i] = convert_fp32_to_bf16 (s); + float x = convert_bf16_to_fp32 (src1.a[i]); + __m128 px = _mm_load_ss (&x); + __m128 mx = _mm_broadcastss_ps (px); + __m128 out = _mm_reduce_ps (mx, 0x10); + float res = _mm_cvtss_f32 (out); + res_ref[i] = res_ref2[i] = convert_fp32_to_bf16_ne (res); + } + + res1.x = INTRINSIC (_reducene_pbh) (src1.x, 0x10); + res2.x = INTRINSIC (_mask_reducene_pbh) (res2.x, mask, src1.x, 0x10); + res3.x = INTRINSIC (_maskz_reducene_pbh) (mask, src1.x, 0x10); + + if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res1, res_ref)) + abort (); + + MASK_MERGE (bf16_uw) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res2, res_ref2)) 
+ abort (); + + MASK_ZERO (bf16_uw) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res3, res_ref2)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vrndscalenepbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vrndscalenepbf16-2.c new file mode 100644 index 00000000000..5b0e6a89120 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vrndscalenepbf16-2.c @@ -0,0 +1,46 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif +#include "avx10-helper.h" +#define SIZE_RES (AVX512F_LEN / 16) + +void +TEST (void) +{ + int i; + UNION_TYPE (AVX512F_LEN, bf16_uw) res1, res2, res3, src1; + MASK_TYPE mask = MASK_VALUE; + unsigned short res_ref[SIZE_RES], res_ref2[SIZE_RES]; + + for (i = 0; i < SIZE_RES; i++) + { + res1.a[i] = 5.0; + res2.a[i] = DEFAULT_VALUE; + res3.a[i] = DEFAULT_VALUE; + float s = (float) (SIZE_RES - 1) / (float) i; + src1.a[i] = convert_fp32_to_bf16 (s); + float x = convert_bf16_to_fp32 (src1.a[i]); + res_ref[i] = res_ref2[i] = convert_fp32_to_bf16_ne (rndscale (x)); + } + + res1.x = INTRINSIC (_roundscalene_pbh) (src1.x, 0x10); + res2.x = INTRINSIC (_mask_roundscalene_pbh) (res2.x, mask, src1.x, 0x10); + res3.x = INTRINSIC (_maskz_roundscalene_pbh) (mask, src1.x, 0x10); + + if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res1, res_ref)) + abort (); + + MASK_MERGE (bf16_uw) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res2, res_ref2)) + abort (); + + MASK_ZERO (bf16_uw) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res3, res_ref2)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vrsqrtpbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vrsqrtpbf16-2.c new file mode 100644 index 00000000000..a879efce3f8 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vrsqrtpbf16-2.c @@ -0,0 +1,47 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif +#include "avx10-helper.h" +#include <math.h> +#define SIZE_RES (AVX512F_LEN / 16) + +void +TEST (void) +{ + int i; + UNION_TYPE (AVX512F_LEN, bf16_uw) res1, res2, res3, src1; + MASK_TYPE mask = MASK_VALUE; + unsigned short res_ref[SIZE_RES], res_ref2[SIZE_RES]; + + for (i = 0; i < SIZE_RES; i++) + { + res1.a[i] = 0; + res2.a[i] = DEFAULT_VALUE; + res3.a[i] = DEFAULT_VALUE; + float s1 = 2.0; + float rs = 1.0 / sqrtf (s1); + src1.a[i] = convert_fp32_to_bf16 (s1); + res_ref[i] = res_ref2[i] = convert_fp32_to_bf16 (rs); + } + + res1.x = INTRINSIC (_rsqrt_pbh) (src1.x); + res2.x = INTRINSIC (_mask_rsqrt_pbh) (res2.x, mask, src1.x); + res3.x = INTRINSIC (_maskz_rsqrt_pbh) (mask, src1.x); + + if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res1, res_ref)) + abort (); + + MASK_MERGE (bf16_uw) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res2, res_ref2)) + abort (); + + MASK_ZERO (bf16_uw) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res3, res_ref2)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vscalefpbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vscalefpbf16-2.c index 867f77ad3a7..78df474240d 100644 --- a/gcc/testsuite/gcc.target/i386/avx10_2-512-vscalefpbf16-2.c +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vscalefpbf16-2.c @@ -31,7 +31,7 @@ TEST (void) xx = 
convert_bf16_to_fp32 (src1.a[i]); yy = convert_bf16_to_fp32 (src2.a[i]); res = scalef (xx, yy); - res_ref[i] = res_ref2[i] = convert_fp32_to_bf16_ne(res); + res_ref[i] = res_ref2[i] = convert_fp32_to_bf16 (res); } res1.x = INTRINSIC (_scalef_pbh) (src1.x, src2.x); diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vsqrtnepbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vsqrtnepbf16-2.c new file mode 100644 index 00000000000..987c9b1abe9 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vsqrtnepbf16-2.c @@ -0,0 +1,47 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif +#include "avx10-helper.h" +#include <math.h> +#define SIZE_RES (AVX512F_LEN / 16) + +void +TEST (void) +{ + int i; + UNION_TYPE (AVX512F_LEN, bf16_uw) res1, res2, res3, src1; + MASK_TYPE mask = MASK_VALUE; + unsigned short res_ref[SIZE_RES], res_ref2[SIZE_RES]; + + for (i = 0; i < SIZE_RES; i++) + { + res1.a[i] = 0; + res2.a[i] = DEFAULT_VALUE; + res3.a[i] = DEFAULT_VALUE; + float s1 = i + 1.0; + float rs = sqrtf (s1); + src1.a[i] = convert_fp32_to_bf16_ne (s1); + res_ref[i] = res_ref2[i] = convert_fp32_to_bf16_ne (rs); + } + + res1.x = INTRINSIC (_sqrtne_pbh) (src1.x); + res2.x = INTRINSIC (_mask_sqrtne_pbh) (res2.x, mask, src1.x); + res3.x = INTRINSIC (_maskz_sqrtne_pbh) (mask, src1.x); + + if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res1, res_ref)) + abort (); + + MASK_MERGE (bf16_uw) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res2, res_ref2)) + abort (); + + MASK_ZERO (bf16_uw) (res_ref2, mask, SIZE_RES); + if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res3, res_ref2)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-bf16-1.c b/gcc/testsuite/gcc.target/i386/avx10_2-bf16-1.c index 831c8f849ef..56cec6df1d6 100644 --- a/gcc/testsuite/gcc.target/i386/avx10_2-bf16-1.c +++ b/gcc/testsuite/gcc.target/i386/avx10_2-bf16-1.c @@ -74,9 +74,60 @@ /* { dg-final { scan-assembler-times "vfnmsub132nepbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ /* { dg-final { scan-assembler-times "vfnmsub231nepbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ /* { dg-final { scan-assembler-times "vfnmsub132nepbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrsqrtpbf16\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrsqrtpbf16\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrsqrtpbf16\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrsqrtpbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrsqrtpbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrsqrtpbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vsqrtnepbf16\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { 
scan-assembler-times "vsqrtnepbf16\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vsqrtnepbf16\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vsqrtnepbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vsqrtnepbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vsqrtnepbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrcppbf16\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrcppbf16\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrcppbf16\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrcppbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrcppbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrcppbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vgetexppbf16\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vgetexppbf16\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vgetexppbf16\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vgetexppbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vgetexppbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vgetexppbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrndscalenepbf16\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrndscalenepbf16\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrndscalenepbf16\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrndscalenepbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrndscalenepbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrndscalenepbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vreducenepbf16\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vreducenepbf16\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vreducenepbf16\[ 
\\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vreducenepbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vreducenepbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vreducenepbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vgetmantpbf16\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vgetmantpbf16\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vgetmantpbf16\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vgetmantpbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vgetmantpbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vgetmantpbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfpclasspbf16y\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n^k\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfpclasspbf16y\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n^k\]*%k\[0-7\]\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfpclasspbf16x\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n^k\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vfpclasspbf16x\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n^k\]*%k\[0-7\]\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcmppbf16\[ \\t\]+\\\$1\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%k\[0-9\](?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcmppbf16\[ \\t\]+\\\$2\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%k\[0-9\]\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcmppbf16\[ \\t\]+\\\$1\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%k\[0-9\](?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcmppbf16\[ \\t\]+\\\$2\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%k\[0-9\]\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ #include <immintrin.h> +#define IMM 123 volatile __m256bh res, x1, x2; volatile __m128bh res1, x3, x4; volatile __mmask16 m16; @@ -169,4 +220,67 @@ avx10_2_test (void) res1 = _mm_mask_fnmsubne_pbh (res1, m8, x3, x4); res1 = _mm_mask3_fnmsubne_pbh (res1, x3, x4, m8); res1 = _mm_maskz_fnmsubne_pbh (m8,res1, x3, x4); + + res = _mm256_rsqrt_pbh (x1); + res = _mm256_mask_rsqrt_pbh (res, m16, x1); + res = _mm256_maskz_rsqrt_pbh (m16, x1); + res1 = _mm_rsqrt_pbh (x3); + res1 = _mm_mask_rsqrt_pbh (res1, m8, x3); + res1 = _mm_maskz_rsqrt_pbh (m8, x3); + + res = _mm256_sqrtne_pbh (x1); + res = _mm256_mask_sqrtne_pbh (res, m16, x1); + res = _mm256_maskz_sqrtne_pbh (m16, x1); + res1 = _mm_sqrtne_pbh (x3); + res1 = _mm_mask_sqrtne_pbh (res1, m8, x3); + res1 = _mm_maskz_sqrtne_pbh (m8, x3); + + res = _mm256_rcp_pbh (x1); + res = _mm256_mask_rcp_pbh (res, m16, x1); + res = _mm256_maskz_rcp_pbh (m16, x1); + res1 = _mm_rcp_pbh (x3); + res1 = _mm_mask_rcp_pbh (res1, m8, x3); + res1 = _mm_maskz_rcp_pbh (m8, x3); + + res = _mm256_getexp_pbh (x1); + res = _mm256_mask_getexp_pbh (res, m16, x1); + res = _mm256_maskz_getexp_pbh (m16, x1); + res1 = _mm_getexp_pbh (x3); + res1 = _mm_mask_getexp_pbh (res1, m8, 
x3); + res1 = _mm_maskz_getexp_pbh (m8, x3); + + res = _mm256_roundscalene_pbh (x1, IMM); + res = _mm256_mask_roundscalene_pbh (res, m16, x1, IMM); + res = _mm256_maskz_roundscalene_pbh (m16, x1, IMM); + res1 = _mm_roundscalene_pbh (x3, IMM); + res1 = _mm_mask_roundscalene_pbh (res1, m8, x3, IMM); + res1 = _mm_maskz_roundscalene_pbh (m8, x3, IMM); + + res = _mm256_reducene_pbh (x1, IMM); + res = _mm256_mask_reducene_pbh (res, m16, x1, IMM); + res = _mm256_maskz_reducene_pbh (m16, x1, IMM); + res1 = _mm_reducene_pbh (x3, IMM); + res1 = _mm_mask_reducene_pbh (res1, m8, x3, IMM); + res1 = _mm_maskz_reducene_pbh (m8, x3, IMM); + + res = _mm256_getmant_pbh (x1, _MM_MANT_NORM_p75_1p5, _MM_MANT_SIGN_src); + res = _mm256_mask_getmant_pbh (res, m16, x1, _MM_MANT_NORM_p75_1p5, + _MM_MANT_SIGN_src); + res = _mm256_maskz_getmant_pbh (m16, x1, _MM_MANT_NORM_p75_1p5, + _MM_MANT_SIGN_src); + res1 = _mm_getmant_pbh (x3, _MM_MANT_NORM_p75_1p5, _MM_MANT_SIGN_src); + res1 = _mm_mask_getmant_pbh (res1, m8, x3, _MM_MANT_NORM_p75_1p5, + _MM_MANT_SIGN_src); + res1 = _mm_maskz_getmant_pbh (m8, x3, _MM_MANT_NORM_p75_1p5, + _MM_MANT_SIGN_src); + + m16 = _mm256_fpclass_pbh_mask (x1, 13); + m16 = _mm256_mask_fpclass_pbh_mask (2, x1, 13); + m8 = _mm_fpclass_pbh_mask (x3, 13); + m8 = _mm_mask_fpclass_pbh_mask (2, x3, 13); + + m16 = _mm256_cmp_pbh_mask (x1, x2, 1); + m16 = _mm256_mask_cmp_pbh_mask (m16, x1, x2, 2); + m8 = _mm_cmp_pbh_mask (x3, x4, 1); + m8 = _mm_mask_cmp_pbh_mask (m8, x3, x4, 2); } diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcmppbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vcmppbf16-2.c new file mode 100644 index 00000000000..fa8be3e8e8b --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcmppbf16-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcmppbf16-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcmppbf16-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcomsbf16-1.c b/gcc/testsuite/gcc.target/i386/avx10_2-vcomsbf16-1.c new file mode 100644 index 00000000000..e603aad27bd --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcomsbf16-1.c @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx10.2 -O2" } */ +/* { dg-final { scan-assembler-times "vcomsbf16\[ \\t\]+\[^{}\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 6 } } */ +/* { dg-final { scan-assembler-times "jp" 2 } } */ #include <immintrin.h> + +volatile __m128bh x1, x2; +volatile int res; + +void extern +avx10_2_vcom_test (void) +{ + res = _mm_comeq_sbh (x1, x2); + res = _mm_comlt_sbh (x1, x2); + res = _mm_comle_sbh (x1, x2); + res = _mm_comgt_sbh (x1, x2); + res = _mm_comge_sbh (x1, x2); + res = _mm_comneq_sbh (x1, x2); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcomsbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vcomsbf16-2.c new file mode 100644 index 00000000000..c4f0c822678 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcomsbf16-2.c @@ -0,0 +1,58 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX10_SCALAR +#include "avx10-helper.h" +#define SIZE_RES (128 / 16) + +#define CMP(PRED, IMM) \ + exp = _mm_comi_round_ss (__A, __B, IMM, _MM_FROUND_NO_EXC); \ + res1 = _mm_com##PRED##_sbh (src1.x, src2.x); \ + if (exp != res1) \ + abort (); + 
+void +TEST (void) +{ + int i; + int res1, exp; + UNION_TYPE (128, bf16_uw) src1, src2; + + struct + { + float x1; + float x2; + } + inputs[] = + { + { 4.3, 2.18 }, + { -4.3, 3.18 }, + { __builtin_nanf (""), -5.8 }, + { -4.8, __builtin_nansf ("") }, + { 3.8, __builtin_nansf ("") }, + { 4.2, 4.2 }, + { __builtin_nanf (""), __builtin_nansf ("") }, + }; + + for (i = 0; i < sizeof (inputs) / sizeof (inputs[0]); i++) + { + float x = inputs[i].x1; + float y = inputs[i].x2; + + __m128 __A = _mm_load_ss (&x); + __m128 __B = _mm_load_ss (&y); + for (int n = 0; n < SIZE_RES; n++) + { + src2.a[n] = convert_fp32_to_bf16(y); + src1.a[n] = convert_fp32_to_bf16(x); + } + CMP (eq, _CMP_EQ_OQ); + CMP (ge, _CMP_GE_OS); + CMP (gt, _CMP_GT_OS); + CMP (lt, _CMP_LT_OS); + CMP (le, _CMP_LE_OS); + CMP (neq, _CMP_NEQ_UQ); + } +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vfpclasspbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vfpclasspbf16-2.c new file mode 100644 index 00000000000..2aa57496c1f --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vfpclasspbf16-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vfpclasspbf16-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vfpclasspbf16-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vgetexppbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vgetexppbf16-2.c new file mode 100644 index 00000000000..804a32a4525 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vgetexppbf16-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vgetexppbf16-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vgetexppbf16-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vgetmantpbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vgetmantpbf16-2.c new file mode 100644 index 00000000000..53e0a5e0588 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vgetmantpbf16-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vgetmantpbf16-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vgetmantpbf16-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vrcppbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vrcppbf16-2.c new file mode 100644 index 00000000000..332010aba57 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vrcppbf16-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vrcppbf16-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vrcppbf16-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vreducenepbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vreducenepbf16-2.c 
new file mode 100644 index 00000000000..809baf7c284 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vreducenepbf16-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vreducenepbf16-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vreducenepbf16-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vrndscalenepbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vrndscalenepbf16-2.c new file mode 100644 index 00000000000..ee6e71da3ba --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vrndscalenepbf16-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vrndscalenepbf16-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vrndscalenepbf16-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vrsqrtpbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vrsqrtpbf16-2.c new file mode 100644 index 00000000000..80c8ba38815 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vrsqrtpbf16-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vrsqrtpbf16-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vrsqrtpbf16-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vsqrtnepbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vsqrtnepbf16-2.c new file mode 100644 index 00000000000..c6d6ca4c7bd --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vsqrtnepbf16-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vsqrtnepbf16-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vsqrtnepbf16-2.c" diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c index a5ba3decc97..e92d04af3f5 100644 --- a/gcc/testsuite/gcc.target/i386/sse-13.c +++ b/gcc/testsuite/gcc.target/i386/sse-13.c @@ -1024,4 +1024,23 @@ /* avx10_2-512convertintrin.h */ #define __builtin_ia32_vcvt2ps2phx512_mask_round(A, B, C, D, E) __builtin_ia32_vcvt2ps2phx512_mask_round(A, B, C, D, 8) +/* avx10_2-512bf16intrin.h */ +#define __builtin_ia32_rndscalenepbf16512_mask(A, B, C, D) __builtin_ia32_rndscalenepbf16512_mask(A, 123, C, D) +#define __builtin_ia32_reducenepbf16512_mask(A, B, C, D) __builtin_ia32_reducenepbf16512_mask(A, 123, C, D) +#define __builtin_ia32_getmantpbf16512_mask(A, B, C, D) __builtin_ia32_getmantpbf16512_mask(A, 1, C, D) +#define __builtin_ia32_fpclasspbf16512_mask(A, B, C) __builtin_ia32_fpclasspbf16512_mask(A, 1, C) +#define __builtin_ia32_cmppbf16512_mask(A, B, C, D) __builtin_ia32_cmppbf16512_mask(A, B, 1, D) + +/* avx10_2bf16intrin.h */ +#define __builtin_ia32_rndscalenepbf16256_mask(A, B, C, D) 
__builtin_ia32_rndscalenepbf16256_mask(A, 123, C, D) +#define __builtin_ia32_rndscalenepbf16128_mask(A, B, C, D) __builtin_ia32_rndscalenepbf16128_mask(A, 123, C, D) +#define __builtin_ia32_reducenepbf16256_mask(A, B, C, D) __builtin_ia32_reducenepbf16256_mask(A, 123, C, D) +#define __builtin_ia32_reducenepbf16128_mask(A, B, C, D) __builtin_ia32_reducenepbf16128_mask(A, 123, C, D) +#define __builtin_ia32_getmantpbf16256_mask(A, B, C, D) __builtin_ia32_getmantpbf16256_mask(A, 1, C, D) +#define __builtin_ia32_getmantpbf16128_mask(A, B, C, D) __builtin_ia32_getmantpbf16128_mask(A, 1, C, D) +#define __builtin_ia32_fpclasspbf16256_mask(A, B, C) __builtin_ia32_fpclasspbf16256_mask(A, 1, C) +#define __builtin_ia32_fpclasspbf16128_mask(A, B, C) __builtin_ia32_fpclasspbf16128_mask(A, 1, C) +#define __builtin_ia32_cmppbf16256_mask(A, B, C, D) __builtin_ia32_cmppbf16256_mask(A, B, 1, D) +#define __builtin_ia32_cmppbf16128_mask(A, B, C, D) __builtin_ia32_cmppbf16128_mask(A, B, 1, D) + #include <x86intrin.h> diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c index 9253e5eb905..49a82d8a2d5 100644 --- a/gcc/testsuite/gcc.target/i386/sse-14.c +++ b/gcc/testsuite/gcc.target/i386/sse-14.c @@ -1388,3 +1388,46 @@ test_2 (_mm256_cvtx_round2ps_ph, __m256h, __m256, __m256, 4) /* avx10_2-512convertintrin.h */ test_2 (_mm512_cvtx_round2ps_ph, __m512h, __m512, __m512, 4) + +/* avx10_2-512bf16intrin.h */ +test_1 (_mm512_roundscalene_pbh, __m512bh, __m512bh, 123) +test_2 (_mm512_maskz_roundscalene_pbh, __m512bh, __mmask32, __m512bh, 123) +test_3 (_mm512_mask_roundscalene_pbh, __m512bh, __m512bh, __mmask32, __m512bh, 123) +test_1 (_mm512_reducene_pbh, __m512bh, __m512bh, 123) +test_2 (_mm512_maskz_reducene_pbh, __m512bh, __mmask32, __m512bh, 123) +test_3 (_mm512_mask_reducene_pbh, __m512bh, __m512bh, __mmask32, __m512bh, 123) +test_1x (_mm512_getmant_pbh, __m512bh, __m512bh, 1, 1) +test_2x (_mm512_maskz_getmant_pbh, __m512bh, __mmask32,__m512bh, 1, 1) +test_3x (_mm512_mask_getmant_pbh, __m512bh, __m512bh, __mmask32,__m512bh, 1, 1) +test_1 (_mm512_fpclass_pbh_mask, __mmask32, __m512bh, 13) +test_2 (_mm512_mask_fpclass_pbh_mask, __mmask32, __mmask32, __m512bh, 13) +test_2 (_mm512_cmp_pbh_mask, __mmask32, __m512bh, __m512bh, 1) +test_3 (_mm512_mask_cmp_pbh_mask, __mmask32, __mmask32,__m512bh, __m512bh, 1) + +/* avx10_2bf16intrin.h */ +test_1 (_mm256_roundscalene_pbh, __m256bh, __m256bh, 123) +test_1 (_mm_roundscalene_pbh, __m128bh, __m128bh, 123) +test_2 (_mm256_maskz_roundscalene_pbh, __m256bh, __mmask16, __m256bh, 123) +test_2 (_mm_maskz_roundscalene_pbh, __m128bh, __mmask8, __m128bh, 123) +test_3 (_mm256_mask_roundscalene_pbh, __m256bh, __m256bh, __mmask16, __m256bh, 123) +test_3 (_mm_mask_roundscalene_pbh, __m128bh, __m128bh, __mmask8, __m128bh, 123) +test_1 (_mm256_reducene_pbh, __m256bh, __m256bh, 123) +test_1 (_mm_reducene_pbh, __m128bh, __m128bh, 123) +test_2 (_mm256_maskz_reducene_pbh, __m256bh, __mmask16, __m256bh, 123) +test_2 (_mm_maskz_reducene_pbh, __m128bh, __mmask8, __m128bh, 123) +test_3 (_mm256_mask_reducene_pbh, __m256bh, __m256bh, __mmask16, __m256bh, 123) +test_3 (_mm_mask_reducene_pbh, __m128bh, __m128bh, __mmask8, __m128bh, 123) +test_1x (_mm256_getmant_pbh, __m256bh, __m256bh, 1, 1) +test_1x (_mm_getmant_pbh, __m128bh, __m128bh, 1, 1) +test_2x (_mm256_maskz_getmant_pbh, __m256bh, __mmask16,__m256bh, 1, 1) +test_2x (_mm_maskz_getmant_pbh, __m128bh, __mmask8, __m128bh, 1, 1) +test_3x (_mm256_mask_getmant_pbh, __m256bh, __m256bh, __mmask16,__m256bh, 1, 1) +test_3x 
(_mm_mask_getmant_pbh, __m128bh, __m128bh, __mmask8, __m128bh, 1, 1) +test_1 (_mm256_fpclass_pbh_mask, __mmask16, __m256bh, 13) +test_1 (_mm_fpclass_pbh_mask, __mmask8, __m128bh, 13) +test_2 (_mm256_mask_fpclass_pbh_mask, __mmask16, __mmask16, __m256bh, 13) +test_2 (_mm_mask_fpclass_pbh_mask, __mmask8, __mmask8, __m128bh, 13) +test_2 (_mm256_cmp_pbh_mask, __mmask16, __m256bh, __m256bh, 1) +test_2 (_mm_cmp_pbh_mask, __mmask8, __m128bh, __m128bh, 1) +test_3 (_mm256_mask_cmp_pbh_mask, __mmask16, __mmask16, __m256bh, __m256bh, 1) +test_3 (_mm_mask_cmp_pbh_mask, __mmask8, __mmask8, __m128bh, __m128bh, 1) diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c index d57bbc41a49..193057a4719 100644 --- a/gcc/testsuite/gcc.target/i386/sse-22.c +++ b/gcc/testsuite/gcc.target/i386/sse-22.c @@ -1427,3 +1427,46 @@ test_2 (_mm256_cvtx_round2ps_ph, __m256h, __m256, __m256, 4) /* avx10_2-512convertintrin.h */ test_2 (_mm512_cvtx_round2ps_ph, __m512h, __m512, __m512, 4) + +/* avx10_2-512bf16intrin.h */ +test_1 (_mm512_roundscalene_pbh, __m512bh, __m512bh, 123) +test_2 (_mm512_maskz_roundscalene_pbh, __m512bh, __mmask32, __m512bh, 123) +test_3 (_mm512_mask_roundscalene_pbh, __m512bh, __m512bh, __mmask32, __m512bh, 123) +test_1 (_mm512_reducene_pbh, __m512bh, __m512bh, 123) +test_2 (_mm512_maskz_reducene_pbh, __m512bh, __mmask32, __m512bh, 123) +test_3 (_mm512_mask_reducene_pbh, __m512bh, __m512bh, __mmask32, __m512bh, 123) +test_1x (_mm512_getmant_pbh, __m512bh, __m512bh, 1, 1) +test_2x (_mm512_maskz_getmant_pbh, __m512bh, __mmask32,__m512bh, 1, 1) +test_3x (_mm512_mask_getmant_pbh, __m512bh, __m512bh, __mmask32,__m512bh, 1, 1) +test_1 (_mm512_fpclass_pbh_mask, __mmask32, __m512bh, 13) +test_2 (_mm512_mask_fpclass_pbh_mask, __mmask32, __mmask32, __m512bh, 13) +test_2 (_mm512_cmp_pbh_mask, __mmask32, __m512bh, __m512bh, 1) +test_3 (_mm512_mask_cmp_pbh_mask, __mmask32, __mmask32,__m512bh, __m512bh, 1) + +/* avx10_2bf16intrin.h */ +test_1 (_mm256_roundscalene_pbh, __m256bh, __m256bh, 123) +test_1 (_mm_roundscalene_pbh, __m128bh, __m128bh, 123) +test_2 (_mm256_maskz_roundscalene_pbh, __m256bh, __mmask16, __m256bh, 123) +test_2 (_mm_maskz_roundscalene_pbh, __m128bh, __mmask8, __m128bh, 123) +test_3 (_mm256_mask_roundscalene_pbh, __m256bh, __m256bh, __mmask16, __m256bh, 123) +test_3 (_mm_mask_roundscalene_pbh, __m128bh, __m128bh, __mmask8, __m128bh, 123) +test_1 (_mm256_reducene_pbh, __m256bh, __m256bh, 123) +test_1 (_mm_reducene_pbh, __m128bh, __m128bh, 123) +test_2 (_mm256_maskz_reducene_pbh, __m256bh, __mmask16, __m256bh, 123) +test_2 (_mm_maskz_reducene_pbh, __m128bh, __mmask8, __m128bh, 123) +test_3 (_mm256_mask_reducene_pbh, __m256bh, __m256bh, __mmask16, __m256bh, 123) +test_3 (_mm_mask_reducene_pbh, __m128bh, __m128bh, __mmask8, __m128bh, 123) +test_1x (_mm256_getmant_pbh, __m256bh, __m256bh, 1, 1) +test_1x (_mm_getmant_pbh, __m128bh, __m128bh, 1, 1) +test_2x (_mm256_maskz_getmant_pbh, __m256bh, __mmask16,__m256bh, 1, 1) +test_2x (_mm_maskz_getmant_pbh, __m128bh, __mmask8, __m128bh, 1, 1) +test_3x (_mm256_mask_getmant_pbh, __m256bh, __m256bh, __mmask16,__m256bh, 1, 1) +test_3x (_mm_mask_getmant_pbh, __m128bh, __m128bh, __mmask8, __m128bh, 1, 1) +test_1 (_mm256_fpclass_pbh_mask, __mmask16, __m256bh, 13) +test_1 (_mm_fpclass_pbh_mask, __mmask8, __m128bh, 13) +test_2 (_mm256_mask_fpclass_pbh_mask, __mmask16, __mmask16, __m256bh, 13) +test_2 (_mm_mask_fpclass_pbh_mask, __mmask8, __mmask8, __m128bh, 13) +test_2 (_mm256_cmp_pbh_mask, __mmask16, __m256bh, __m256bh, 1) 
+test_2 (_mm_cmp_pbh_mask, __mmask8, __m128bh, __m128bh, 1) +test_3 (_mm256_mask_cmp_pbh_mask, __mmask16, __mmask16, __m256bh, __m256bh, 1) +test_3 (_mm_mask_cmp_pbh_mask, __mmask8, __mmask8, __m128bh, __m128bh, 1) diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c index 438974cb0c6..a33eb9945dd 100644 --- a/gcc/testsuite/gcc.target/i386/sse-23.c +++ b/gcc/testsuite/gcc.target/i386/sse-23.c @@ -998,6 +998,25 @@ /* avx10_2-512convertintrin.h */ #define __builtin_ia32_vcvt2ps2phx512_mask_round(A, B, C, D, E) __builtin_ia32_vcvt2ps2phx512_mask_round(A, B, C, D, 8) +/* avx10_2-512bf16intrin.h */ +#define __builtin_ia32_rndscalenepbf16512_mask(A, B, C, D) __builtin_ia32_rndscalenepbf16512_mask(A, 123, C, D) +#define __builtin_ia32_reducenepbf16512_mask(A, B, C, D) __builtin_ia32_reducenepbf16512_mask(A, 123, C, D) +#define __builtin_ia32_getmantpbf16512_mask(A, B, C, D) __builtin_ia32_getmantpbf16512_mask(A, 1, C, D) +#define __builtin_ia32_fpclasspbf16512_mask(A, B, C) __builtin_ia32_fpclasspbf16512_mask(A, 1, C) +#define __builtin_ia32_cmppbf16512_mask(A, B, C, D) __builtin_ia32_cmppbf16512_mask(A, B, 1, D) + +/* avx10_2bf16intrin.h */ +#define __builtin_ia32_rndscalenepbf16256_mask(A, B, C, D) __builtin_ia32_rndscalenepbf16256_mask(A, 123, C, D) +#define __builtin_ia32_rndscalenepbf16128_mask(A, B, C, D) __builtin_ia32_rndscalenepbf16128_mask(A, 123, C, D) +#define __builtin_ia32_reducenepbf16256_mask(A, B, C, D) __builtin_ia32_reducenepbf16256_mask(A, 123, C, D) +#define __builtin_ia32_reducenepbf16128_mask(A, B, C, D) __builtin_ia32_reducenepbf16128_mask(A, 123, C, D) +#define __builtin_ia32_getmantpbf16256_mask(A, B, C, D) __builtin_ia32_getmantpbf16256_mask(A, 1, C, D) +#define __builtin_ia32_getmantpbf16128_mask(A, B, C, D) __builtin_ia32_getmantpbf16128_mask(A, 1, C, D) +#define __builtin_ia32_fpclasspbf16256_mask(A, B, C) __builtin_ia32_fpclasspbf16256_mask(A, 1, C) +#define __builtin_ia32_fpclasspbf16128_mask(A, B, C) __builtin_ia32_fpclasspbf16128_mask(A, 1, C) +#define __builtin_ia32_cmppbf16256_mask(A, B, C, D) __builtin_ia32_cmppbf16256_mask(A, B, 1, D) +#define __builtin_ia32_cmppbf16128_mask(A, B, C, D) __builtin_ia32_cmppbf16128_mask(A, B, 1, D) + #pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,sha,xsavec,xsaves,clflushopt,clwb,mwaitx,clzero,pku,sgx,rdpid,gfni,vpclmulqdq,pconfig,wbnoinvd,enqcmd,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avxifma,avxvnniint8,avxneconvert,cmpccxadd,amx-fp16,prefetchi,raoint,amx-complex,avxvnniint16,sm3,sha512,sm4,avx10.2-512") #include <x86intrin.h> From patchwork Mon Aug 19 08:56:51 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Haochen Jiang X-Patchwork-Id: 1973730 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=intel.com header.i=@intel.com header.a=rsa-sha256 header.s=Intel header.b=aarBpn1c; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org 
[IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4WnRP05y8Rz1yfj for ; Mon, 19 Aug 2024 18:59:32 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 9C60C3865C23 for ; Mon, 19 Aug 2024 08:59:30 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.18]) by sourceware.org (Postfix) with ESMTPS id A73B53864C64 for ; Mon, 19 Aug 2024 08:57:32 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org A73B53864C64 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=intel.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org A73B53864C64 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=192.198.163.18 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1724057860; cv=none; b=XDR8MShjy5kwCiNRJ6u8ObYOrIfeG7svw08NnD1GCAGAzgVZq8sRgWtYTRzC7f/M9sveeiGK8Cvxg2AIl5asHpeCC7lCn0ep0EXGjWJWUpzSeNMOWf9dW6wA0ONM5iXQI1Vs2NhSo79Y0EAYOtOQkE5OjG+PnaCgi3jBl3NqK8w= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1724057860; c=relaxed/simple; bh=Xun+bNr4NJLhw/wfRRoO8La6HiCa5GrbpfUksZBskao=; h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version; b=moK7CQPGwUByJuluqdcQYAM7jQaKQPqZvkVncDCQshx/aBDRPC8aEGGcjR9M/TKZEFWdVAa0/vEoR149DaP5DM5cLILCuAqjBLijCOe33nslV33ZEeNEyJZcZfpjVIDXTJcuC1ih5IIIcl0D+uT48aUmYjplCjxXtXIRrRGPNuQ= ARC-Authentication-Results: i=1; server2.sourceware.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1724057853; x=1755593853; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Xun+bNr4NJLhw/wfRRoO8La6HiCa5GrbpfUksZBskao=; b=aarBpn1cr95oQZ8pgwsn0C74Aj2M/aBkP/IyLMrfn+UJPky6FFu62Uur mBQEJ1LjVk3ErIpHINUVuufnmylyDJqPGrfIVIIqNeIBbYftb+Idz4isZ sKGICSOpKDcNckqamzePqe0q0xvZ/oDPINXudL5GWlDZpsuaa1XB5PKQ3 SIctrhzCzWd3FedKGkQ9U+IlaI2TXD88sL45//tirRll01i1nB3B42K0x /dSsRVl4Ykm+T8C3GqA1CSgV72DNifCY1KTgsdYK3V8b3aauRTGwOSHCR 0EQZIG3RsLiUpIqCJR6psKXakxa6AuY0kiZBPzJxWyzsF/JeCwUYc8Hcp w==; X-CSE-ConnectionGUID: ICU/tRhZQrW0bvRpE2ojwQ== X-CSE-MsgGUID: Erzzw1qZSJWlrxMvi+VOWw== X-IronPort-AV: E=McAfee;i="6700,10204,11168"; a="21837765" X-IronPort-AV: E=Sophos;i="6.10,158,1719903600"; d="scan'208";a="21837765" Received: from fmviesa001.fm.intel.com ([10.60.135.141]) by fmvoesa112.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Aug 2024 01:57:23 -0700 X-CSE-ConnectionGUID: vVBwsbCuTi2I+pUwElmCZw== X-CSE-MsgGUID: AmAsqyySSTyyGMeTqrHsVQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.10,158,1719903600"; d="scan'208";a="91084208" Received: from scymds04.sc.intel.com ([10.82.73.238]) by fmviesa001.fm.intel.com with ESMTP; 19 Aug 2024 01:57:22 -0700 Received: from icl-spr-01.jf.intel.com (icl-spr-01.jf.intel.com [10.165.54.241]) by scymds04.sc.intel.com (Postfix) with ESMTP id 0844E2003EAC; Mon, 19 Aug 2024 01:57:22 -0700 (PDT) From: Haochen Jiang To: gcc-patches@gcc.gnu.org Cc: hongtao.liu@intel.com, zewei.mo@pitt.edu, ubizjak@gmail.com, "Hu, Lin1" Subject: [PATCH 07/12] [PATCH 1/2] AVX10.2: Support saturating convert instructions Date: Mon, 19 Aug 2024 01:56:51 -0700 Message-ID: 
<20240819085717.193256-8-haochen.jiang@intel.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20240819085717.193256-1-haochen.jiang@intel.com> References: <20240819085717.193256-1-haochen.jiang@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-10.9 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org From: "Hu, Lin1" gcc/ChangeLog: * config.gcc: Add avx10_2satcvtintrin.h and avx10_2-512satcvtintrin.h. * config/i386/i386-builtin-types.def: Add DEF_FUNCTION_TYPE (V8HI, V8BF, V8HI, UQI), (V16HI, V16BF, V16HI, UHI), (V32HI, V32BF, V32HI, USI), (V16SI, V16SF, V16SI, UHI, INT), (V16HI, V16BF, V16HI, UHI, INT), (V32HI, V32BF, V32HI, USI, INT). * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/i386-expand.cc (ix86_expand_args_builtin): Handle V32HI_FTYPE_V32BF_V32HI_USI, V16HI_FTYPE_V16BF_V16HI_UHI, V8HI_FTYPE_V8BF_V8HI_UQI. (ix86_expand_round_builtin): Handle V32HI_FTYPE_V32BF_V32HI_USI_INT, V16SI_FTYPE_V16SF_V16SI_UHI_INT, V16HI_FTYPE_V16BF_V16HI_UHI_INT. * config/i386/immintrin.h: Include avx10_2satcvtintrin.h and avx10_2-512satcvtintrin.h. * config/i386/sse.md: (avx10_2_cvtnebf162ibs): New. (avx10_2_cvtph2ibs): Ditto. (avx10_2_cvttph2ibs): Ditto. (avx10_2_cvtps2ibs): Ditto. (avx10_2_cvttps2ibs): Ditto. * config/i386/avx10_2-512satcvtintrin.h: New file. * config/i386/avx10_2satcvtintrin.h: Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add macros. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-14.c: Ditto. * gcc.target/i386/sse-22.c: Ditto. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/avx10_2-512-satcvt-1.c: New test. * gcc.target/i386/avx10_2-512-vcvtnebf162ibs-2.c: Ditto. * gcc.target/i386/avx10_2-512-vcvtnebf162iubs-2.c: Ditto. * gcc.target/i386/avx10_2-512-vcvtph2ibs-2.c: Ditto. * gcc.target/i386/avx10_2-512-vcvtph2iubs-2.c: Ditto. * gcc.target/i386/avx10_2-512-vcvtps2ibs-2.c: Ditto. * gcc.target/i386/avx10_2-512-vcvtps2iubs-2.c: Ditto. * gcc.target/i386/avx10_2-512-vcvttnebf162ibs-2.c: Ditto. * gcc.target/i386/avx10_2-512-vcvttnebf162iubs-2.c: Ditto. * gcc.target/i386/avx10_2-512-vcvttph2ibs-2.c: Ditto. * gcc.target/i386/avx10_2-512-vcvttph2iubs-2.c: Ditto. * gcc.target/i386/avx10_2-512-vcvttps2ibs-2.c: Ditto. * gcc.target/i386/avx10_2-512-vcvttps2iubs-2.c: Ditto. * gcc.target/i386/avx10_2-satcvt-1.c: Ditto. * gcc.target/i386/avx10_2-vcvtnebf162ibs-2.c: Ditto. * gcc.target/i386/avx10_2-vcvtnebf162iubs-2.c: Ditto. * gcc.target/i386/avx10_2-vcvtph2ibs-2.c: Ditto. * gcc.target/i386/avx10_2-vcvtph2iubs-2.c: Ditto. * gcc.target/i386/avx10_2-vcvtps2ibs-2.c: Ditto. * gcc.target/i386/avx10_2-vcvttnebf162ibs-2.c: Ditto. * gcc.target/i386/avx10_2-vcvttnebf162iubs-2.c: Ditto. * gcc.target/i386/avx10_2-vcvttph2ibs-2.c: Ditto. * gcc.target/i386/avx10_2-vcvttph2iubs-2.c: Ditto. * gcc.target/i386/avx10_2-vcvttps2ibs-2.c: Ditto. * gcc.target/i386/avx10_2-vcvttps2iubs-2.c: Ditto. 
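A minimal usage sketch of the new saturating-convert intrinsics follows; it is not part of the patch, the function and variable names are illustrative only, and -mavx10.2-512 is assumed (the intrinsic names and signatures are taken from the header added below):

#include <immintrin.h>

/* Convert packed BF16/FP16/FP32 values to packed signed or unsigned
   integers with saturation: out-of-range inputs clamp to the integer
   range instead of wrapping.  */
__m512i
satcvt_sketch (__m512bh bf, __m512h ph, __m512 ps, __m512i merge,
	       __mmask32 m)
{
  /* BF16 -> saturated signed 16-bit integers.  */
  __m512i a = _mm512_ipcvtnebf16_epi16 (bf);
  /* Masked unsigned form: lanes whose mask bit is zero keep MERGE.  */
  __m512i b = _mm512_mask_ipcvtnebf16_epu16 (merge, m, bf);
  /* The FP16/FP32 forms take an explicit rounding mode.  */
  __m512i c = _mm512_ipcvt_roundph_epi16 (ph, _MM_FROUND_TO_NEAREST_INT
					      | _MM_FROUND_NO_EXC);
  __m512i d = _mm512_ipcvt_roundps_epi32 (ps, _MM_FROUND_TO_NEAREST_INT
					      | _MM_FROUND_NO_EXC);
  /* Combine the results so nothing is optimized away in this sketch.  */
  return _mm512_xor_si512 (_mm512_xor_si512 (a, b),
			   _mm512_xor_si512 (c, d));
}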
--- gcc/config.gcc | 4 +- gcc/config/i386/avx10_2-512satcvtintrin.h | 624 ++++++++++ gcc/config/i386/avx10_2satcvtintrin.h | 1022 +++++++++++++++++ gcc/config/i386/i386-builtin-types.def | 6 + gcc/config/i386/i386-builtin.def | 36 + gcc/config/i386/i386-expand.cc | 6 + gcc/config/i386/immintrin.h | 3 + gcc/config/i386/sse.md | 110 ++ gcc/testsuite/gcc.target/i386/avx-1.c | 20 + .../gcc.target/i386/avx10_2-512-satcvt-1.c | 100 ++ .../i386/avx10_2-512-vcvtnebf162ibs-2.c | 69 ++ .../i386/avx10_2-512-vcvtnebf162iubs-2.c | 69 ++ .../i386/avx10_2-512-vcvtph2ibs-2.c | 74 ++ .../i386/avx10_2-512-vcvtph2iubs-2.c | 74 ++ .../i386/avx10_2-512-vcvtps2ibs-2.c | 75 ++ .../i386/avx10_2-512-vcvtps2iubs-2.c | 73 ++ .../i386/avx10_2-512-vcvttnebf162ibs-2.c | 69 ++ .../i386/avx10_2-512-vcvttnebf162iubs-2.c | 69 ++ .../i386/avx10_2-512-vcvttph2ibs-2.c | 74 ++ .../i386/avx10_2-512-vcvttph2iubs-2.c | 74 ++ .../i386/avx10_2-512-vcvttps2ibs-2.c | 75 ++ .../i386/avx10_2-512-vcvttps2iubs-2.c | 73 ++ .../gcc.target/i386/avx10_2-satcvt-1.c | 187 +++ .../i386/avx10_2-vcvtnebf162ibs-2.c | 16 + .../i386/avx10_2-vcvtnebf162iubs-2.c | 16 + .../gcc.target/i386/avx10_2-vcvtph2ibs-2.c | 16 + .../gcc.target/i386/avx10_2-vcvtph2iubs-2.c | 16 + .../gcc.target/i386/avx10_2-vcvtps2ibs-2.c | 16 + .../i386/avx10_2-vcvttnebf162ibs-2.c | 16 + .../i386/avx10_2-vcvttnebf162iubs-2.c | 16 + .../gcc.target/i386/avx10_2-vcvttph2ibs-2.c | 16 + .../gcc.target/i386/avx10_2-vcvttph2iubs-2.c | 16 + .../gcc.target/i386/avx10_2-vcvttps2ibs-2.c | 16 + .../gcc.target/i386/avx10_2-vcvttps2iubs-2.c | 16 + .../gcc.target/i386/avx512f-helper.h | 2 + gcc/testsuite/gcc.target/i386/m512-check.h | 21 + gcc/testsuite/gcc.target/i386/sse-13.c | 20 + gcc/testsuite/gcc.target/i386/sse-14.c | 52 + gcc/testsuite/gcc.target/i386/sse-22.c | 52 + gcc/testsuite/gcc.target/i386/sse-23.c | 20 + 40 files changed, 3328 insertions(+), 1 deletion(-) create mode 100644 gcc/config/i386/avx10_2-512satcvtintrin.h create mode 100644 gcc/config/i386/avx10_2satcvtintrin.h create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-satcvt-1.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtnebf162ibs-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtnebf162iubs-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtph2ibs-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtph2iubs-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtps2ibs-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtps2iubs-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttnebf162ibs-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttnebf162iubs-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttph2ibs-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttph2iubs-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttps2ibs-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttps2iubs-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-satcvt-1.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcvtnebf162ibs-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcvtnebf162iubs-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcvtph2ibs-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcvtph2iubs-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcvtps2ibs-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcvttnebf162ibs-2.c create 
mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcvttnebf162iubs-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcvttph2ibs-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcvttph2iubs-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcvttps2ibs-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcvttps2iubs-2.c diff --git a/gcc/config.gcc b/gcc/config.gcc index 7d761b257cd..4bcb461b68c 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -454,7 +454,9 @@ i[34567]86-*-* | x86_64-*-*) sm3intrin.h sha512intrin.h sm4intrin.h usermsrintrin.h avx10_2roundingintrin.h avx10_2mediaintrin.h avx10_2-512mediaintrin.h - avx10_2bf16intrin.h avx10_2-512bf16intrin.h" + avx10_2convertintrin.h avx10_2-512convertintrin.h + avx10_2bf16intrin.h avx10_2-512bf16intrin.h + avx10_2satcvtintrin.h avx10_2-512satcvtintrin.h" ;; ia64-*-*) extra_headers=ia64intrin.h diff --git a/gcc/config/i386/avx10_2-512satcvtintrin.h b/gcc/config/i386/avx10_2-512satcvtintrin.h new file mode 100644 index 00000000000..4286458c413 --- /dev/null +++ b/gcc/config/i386/avx10_2-512satcvtintrin.h @@ -0,0 +1,624 @@ +/* Copyright (C) 2024 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + <http://www.gnu.org/licenses/>. */ + +#if !defined _IMMINTRIN_H_INCLUDED +#error "Never use <avx10_2-512satcvtintrin.h> directly; include <immintrin.h> instead." 
+#endif + +#ifndef _AVX10_2_512SATCVTINTRIN_H_INCLUDED +#define _AVX10_2_512SATCVTINTRIN_H_INCLUDED + +#if !defined (__AVX10_2_512__) +#pragma GCC push_options +#pragma GCC target("avx10.2-512") +#define __DISABLE_AVX10_2_512__ +#endif /* __AVX10_2_512__ */ + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_ipcvtnebf16_epi16 (__m512bh __A) +{ + return + (__m512i) __builtin_ia32_cvtnebf162ibs512_mask ((__v32bf) __A, + (__v32hi) + _mm512_undefined_si512 (), + (__mmask32) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_ipcvtnebf16_epi16 (__m512i __W, __mmask32 __U, __m512bh __A) +{ + return (__m512i) __builtin_ia32_cvtnebf162ibs512_mask ((__v32bf) __A, + (__v32hi) __W, + (__mmask32) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_ipcvtnebf16_epi16 (__mmask32 __U, __m512bh __A) +{ + return + (__m512i) __builtin_ia32_cvtnebf162ibs512_mask ((__v32bf) __A, + (__v32hi) + _mm512_setzero_si512 (), + (__mmask32) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_ipcvtnebf16_epu16 (__m512bh __A) +{ + return + (__m512i) __builtin_ia32_cvtnebf162iubs512_mask ((__v32bf) __A, + (__v32hi) + _mm512_undefined_si512 (), + (__mmask32) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_ipcvtnebf16_epu16 (__m512i __W, __mmask32 __U, __m512bh __A) +{ + return (__m512i) __builtin_ia32_cvtnebf162iubs512_mask ((__v32bf) __A, + (__v32hi) __W, + (__mmask32) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_ipcvtnebf16_epu16 (__mmask32 __U, __m512bh __A) +{ + return + (__m512i) __builtin_ia32_cvtnebf162iubs512_mask ((__v32bf) __A, + (__v32hi) + _mm512_setzero_si512 (), + (__mmask32) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_ipcvttnebf16_epi16 (__m512bh __A) +{ + return + (__m512i) __builtin_ia32_cvttnebf162ibs512_mask ((__v32bf) __A, + (__v32hi) + _mm512_undefined_si512 (), + (__mmask32) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_ipcvttnebf16_epi16 (__m512i __W, __mmask32 __U, __m512bh __A) +{ + return (__m512i) __builtin_ia32_cvttnebf162ibs512_mask ((__v32bf) __A, + (__v32hi) __W, + (__mmask32) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_ipcvttnebf16_epi16 (__mmask32 __U, __m512bh __A) +{ + return + (__m512i) __builtin_ia32_cvttnebf162ibs512_mask ((__v32bf) __A, + (__v32hi) + _mm512_setzero_si512 (), + (__mmask32) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_ipcvttnebf16_epu16 (__m512bh __A) +{ + return (__m512i) + __builtin_ia32_cvttnebf162iubs512_mask ((__v32bf) __A, + (__v32hi) _mm512_undefined_si512 (), + (__mmask32) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_ipcvttnebf16_epu16 (__m512i __W, __mmask32 __U, __m512bh __A) +{ + return (__m512i) __builtin_ia32_cvttnebf162iubs512_mask ((__v32bf) __A, + (__v32hi) __W, + (__mmask32) + __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_ipcvttnebf16_epu16 (__mmask32 
__U, __m512bh __A) +{ + return (__m512i) + __builtin_ia32_cvttnebf162iubs512_mask ((__v32bf) __A, + (__v32hi) + _mm512_setzero_si512 (), + (__mmask32) __U); +} + +#ifdef __OPTIMIZE__ +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_ipcvt_roundph_epi16 (__m512h __A, const int __R) +{ + return + (__m512i) __builtin_ia32_cvtph2ibs512_mask_round ((__v32hf) __A, + (__v32hi) + _mm512_undefined_si512 (), + (__mmask32) -1, + __R); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_ipcvt_roundph_epi16 (__m512i __W, __mmask32 __U, __m512h __A, + const int __R) +{ + return (__m512i) __builtin_ia32_cvtph2ibs512_mask_round ((__v32hf) __A, + (__v32hi) __W, + (__mmask32) __U, + __R); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_ipcvt_roundph_epi16 (__mmask32 __U, __m512h __A, const int __R) +{ + return + (__m512i) __builtin_ia32_cvtph2ibs512_mask_round ((__v32hf) __A, + (__v32hi) + _mm512_setzero_si512 (), + (__mmask32) __U, + __R); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_ipcvt_roundph_epu16 (__m512h __A, const int __R) +{ + return + (__m512i) __builtin_ia32_cvtph2iubs512_mask_round ((__v32hf) __A, + (__v32hi) + _mm512_undefined_si512 (), + (__mmask32) -1, + __R); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_ipcvt_roundph_epu16 (__m512i __W, __mmask32 __U, __m512h __A, + const int __R) +{ + return (__m512i) __builtin_ia32_cvtph2iubs512_mask_round ((__v32hf) __A, + (__v32hi) __W, + (__mmask32) __U, + __R); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_ipcvt_roundph_epu16 (__mmask32 __U, __m512h __A, const int __R) +{ + return + (__m512i) __builtin_ia32_cvtph2iubs512_mask_round ((__v32hf) __A, + (__v32hi) + _mm512_setzero_si512 (), + (__mmask32) __U, + __R); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_ipcvt_roundps_epi32 (__m512 __A, const int __R) +{ + return + (__m512i) __builtin_ia32_cvtps2ibs512_mask_round ((__v16sf) __A, + (__v16si) + _mm512_undefined_si512 (), + (__mmask16) -1, + __R); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_ipcvt_roundps_epi32 (__m512i __W, __mmask16 __U, __m512 __A, + const int __R) +{ + return (__m512i) __builtin_ia32_cvtps2ibs512_mask_round ((__v16sf) __A, + (__v16si) __W, + (__mmask16) __U, + __R); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_ipcvt_roundps_epi32 (__mmask16 __U, __m512 __A, const int __R) +{ + return + (__m512i) __builtin_ia32_cvtps2ibs512_mask_round ((__v16sf) __A, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) __U, + __R); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_ipcvt_roundps_epu32 (__m512 __A, const int __R) +{ + return + (__m512i) __builtin_ia32_cvtps2iubs512_mask_round ((__v16sf) __A, + (__v16si) + _mm512_undefined_si512 (), + (__mmask16) -1, + __R); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_ipcvt_roundps_epu32 (__m512i __W, __mmask16 __U, __m512 __A, + const int __R) +{ + return (__m512i) __builtin_ia32_cvtps2iubs512_mask_round ((__v16sf) 
__A, + (__v16si) __W, + (__mmask16) __U, + __R); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_ipcvt_roundps_epu32 (__mmask16 __U, __m512 __A, const int __R) +{ + return + (__m512i) __builtin_ia32_cvtps2iubs512_mask_round ((__v16sf) __A, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) __U, + __R); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_ipcvtt_roundph_epi16 (__m512h __A, const int __R) +{ + return (__m512i) + __builtin_ia32_cvttph2ibs512_mask_round ((__v32hf) __A, + (__v32hi) + _mm512_undefined_si512 (), + (__mmask32) -1, + __R); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_ipcvtt_roundph_epi16 (__m512i __W, __mmask32 __U, __m512h __A, + const int __R) +{ + return (__m512i) __builtin_ia32_cvttph2ibs512_mask_round ((__v32hf) __A, + (__v32hi) __W, + (__mmask32) __U, + __R); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_ipcvtt_roundph_epi16 (__mmask32 __U, __m512h __A, const int __R) +{ + return + (__m512i) __builtin_ia32_cvttph2ibs512_mask_round ((__v32hf) __A, + (__v32hi) + _mm512_setzero_si512 (), + (__mmask32) __U, + __R); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_ipcvtt_roundph_epu16 (__m512h __A, const int __R) +{ + return (__m512i) + __builtin_ia32_cvttph2iubs512_mask_round ((__v32hf) __A, + (__v32hi) + _mm512_undefined_si512 (), + (__mmask32) -1, + __R); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_ipcvtt_roundph_epu16 (__m512i __W, __mmask32 __U, __m512h __A, + const int __R) +{ + return (__m512i) __builtin_ia32_cvttph2iubs512_mask_round ((__v32hf) __A, + (__v32hi) __W, + (__mmask32) __U, + __R); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_ipcvtt_roundph_epu16 (__mmask32 __U, __m512h __A, const int __R) +{ + return (__m512i) + __builtin_ia32_cvttph2iubs512_mask_round ((__v32hf) __A, + (__v32hi) + _mm512_setzero_si512 (), + (__mmask32) __U, + __R); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_ipcvtt_roundps_epi32 (__m512 __A, const int __R) +{ + return (__m512i) + __builtin_ia32_cvttps2ibs512_mask_round ((__v16sf) __A, + (__v16si) + _mm512_undefined_si512 (), + (__mmask16) -1, + __R); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_ipcvtt_roundps_epi32 (__m512i __W, __mmask16 __U, __m512 __A, + const int __R) +{ + return (__m512i) __builtin_ia32_cvttps2ibs512_mask_round ((__v16sf) __A, + (__v16si) __W, + (__mmask16) __U, + __R); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_ipcvtt_roundps_epi32 (__mmask16 __U, __m512 __A, const int __R) +{ + return (__m512i) + __builtin_ia32_cvttps2ibs512_mask_round ((__v16sf) __A, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) __U, + __R); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_ipcvtt_roundps_epu32 (__m512 __A, const int __R) +{ + return (__m512i) + __builtin_ia32_cvttps2iubs512_mask_round ((__v16sf) __A, + (__v16si) + _mm512_undefined_si512 (), + (__mmask16) -1, + __R); +} + +extern __inline __m512i +__attribute__ 
((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_ipcvtt_roundps_epu32 (__m512i __W, __mmask16 __U, __m512 __A, + const int __R) +{ + return (__m512i) __builtin_ia32_cvttps2iubs512_mask_round ((__v16sf) __A, + (__v16si) __W, + (__mmask16) __U, + __R); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_ipcvtt_roundps_epu32 (__mmask16 __U, __m512 __A, const int __R) +{ + return (__m512i) + __builtin_ia32_cvttps2iubs512_mask_round ((__v16sf) __A, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) __U, + __R); +} +#else +#define _mm512_ipcvt_roundph_epi16(A, R) \ + ((__m512i) \ + __builtin_ia32_cvtph2ibs512_mask_round ((__v32hf) (A), \ + (__v32hi) \ + (_mm512_undefined_si512 ()), \ + (__mmask32) (-1), \ + (R))) + +#define _mm512_mask_ipcvt_roundph_epi16(W, U, A, R) \ + ((__m512i) __builtin_ia32_cvtph2ibs512_mask_round ((__v32hf) (A), \ + (__v32hi) (W), \ + (__mmask32) (U), \ + (R))) + +#define _mm512_maskz_ipcvt_roundph_epi16(U, A, R) \ + ((__m512i) \ + __builtin_ia32_cvtph2ibs512_mask_round ((__v32hf) (A), \ + (__v32hi) \ + (_mm512_setzero_si512 ()), \ + (__mmask32) (U), \ + (R))) + +#define _mm512_ipcvt_roundph_epu16(A, R) \ + ((__m512i) \ + __builtin_ia32_cvtph2iubs512_mask_round ((__v32hf) (A), \ + (__v32hi) \ + (_mm512_undefined_si512 ()), \ + (__mmask32) (-1), \ + (R))) + +#define _mm512_mask_ipcvt_roundph_epu16(W, U, A, R) \ + ((__m512i) __builtin_ia32_cvtph2iubs512_mask_round ((__v32hf) (A), \ + (__v32hi) (W), \ + (__mmask32) (U), \ + (R))) + +#define _mm512_maskz_ipcvt_roundph_epu16(U, A, R) \ + ((__m512i) \ + __builtin_ia32_cvtph2iubs512_mask_round ((__v32hf) (A), \ + (__v32hi) \ + (_mm512_setzero_si512 ()), \ + (__mmask32) (U), \ + (R))) + +#define _mm512_ipcvt_roundps_epi32(A, R) \ + ((__m512i) \ + __builtin_ia32_cvtps2ibs512_mask_round ((__v16sf) (A), \ + (__v16si) \ + (_mm512_undefined_si512 ()), \ + (__mmask16) (-1), \ + (R))) + +#define _mm512_mask_ipcvt_roundps_epi32(W, U, A, R) \ + ((__m512i) __builtin_ia32_cvtps2ibs512_mask_round ((__v16sf) (A), \ + (__v16si) (W), \ + (__mmask16) (U), \ + (R))) + +#define _mm512_maskz_ipcvt_roundps_epi32(U, A, R) \ + ((__m512i) \ + __builtin_ia32_cvtps2ibs512_mask_round ((__v16sf) (A), \ + (__v16si) \ + (_mm512_setzero_si512 ()), \ + (__mmask16) (U), \ + (R))) + +#define _mm512_ipcvt_roundps_epu32(A, R) \ + ((__m512i) \ + __builtin_ia32_cvtps2iubs512_mask_round ((__v16sf) (A), \ + (__v16si) \ + (_mm512_undefined_si512 ()), \ + (__mmask16) (-1), \ + (R))) + +#define _mm512_mask_ipcvt_roundps_epu32(W, U, A, R) \ + ((__m512i) __builtin_ia32_cvtps2iubs512_mask_round ((__v16sf) (A), \ + (__v16si) (W), \ + (__mmask16) (U), \ + (R))) + +#define _mm512_maskz_ipcvt_roundps_epu32(U, A, R) \ + ((__m512i) \ + __builtin_ia32_cvtps2iubs512_mask_round ((__v16sf) (A), \ + (__v16si) \ + (_mm512_setzero_si512 ()), \ + (__mmask16) (U), \ + (R))) + +#define _mm512_ipcvtt_roundph_epi16(A, R) \ + ((__m512i) \ + __builtin_ia32_cvttph2ibs512_mask_round ((__v32hf) (A), \ + (__v32hi) \ + (_mm512_undefined_si512 ()), \ + (__mmask32) (-1), \ + (R))) + +#define _mm512_mask_ipcvtt_roundph_epi16(W, U, A, R) \ + ((__m512i) __builtin_ia32_cvttph2ibs512_mask_round ((__v32hf) (A), \ + (__v32hi) (W), \ + (__mmask32) (U), \ + (R))) + +#define _mm512_maskz_ipcvtt_roundph_epi16(U, A, R) \ + ((__m512i) \ + __builtin_ia32_cvttph2ibs512_mask_round ((__v32hf) (A), \ + (__v32hi) \ + (_mm512_setzero_si512 ()), \ + (__mmask32) (U), \ + (R))) + +#define _mm512_ipcvtt_roundph_epu16(A, R) \ + ((__m512i) \ + 
__builtin_ia32_cvttph2iubs512_mask_round ((__v32hf) (A), \
+					     (__v32hi) \
+					     (_mm512_undefined_si512 ()), \
+					     (__mmask32) (-1), \
+					     (R)))
+
+#define _mm512_mask_ipcvtt_roundph_epu16(W, U, A, R) \
+  ((__m512i) __builtin_ia32_cvttph2iubs512_mask_round ((__v32hf) (A), \
+							(__v32hi) (W), \
+							(__mmask32) (U), \
+							(R)))
+
+#define _mm512_maskz_ipcvtt_roundph_epu16(U, A, R) \
+  ((__m512i) \
+   __builtin_ia32_cvttph2iubs512_mask_round ((__v32hf) (A), \
+					     (__v32hi) \
+					     (_mm512_setzero_si512 ()), \
+					     (__mmask32) (U), \
+					     (R)))
+
+#define _mm512_ipcvtt_roundps_epi32(A, R) \
+  ((__m512i) \
+   __builtin_ia32_cvttps2ibs512_mask_round ((__v16sf) (A), \
+					    (__v16si) \
+					    (_mm512_undefined_si512 ()), \
+					    (__mmask16) (-1), \
+					    (R)))
+
+#define _mm512_mask_ipcvtt_roundps_epi32(W, U, A, R) \
+  ((__m512i) __builtin_ia32_cvttps2ibs512_mask_round ((__v16sf) (A), \
+						      (__v16si) (W), \
+						      (__mmask16) (U), \
+						      (R)))
+
+#define _mm512_maskz_ipcvtt_roundps_epi32(U, A, R) \
+  ((__m512i) \
+   __builtin_ia32_cvttps2ibs512_mask_round ((__v16sf) (A), \
+					    (__v16si) \
+					    (_mm512_setzero_si512 ()), \
+					    (__mmask16) (U), \
+					    (R)))
+
+#define _mm512_ipcvtt_roundps_epu32(A, R) \
+  ((__m512i) \
+   __builtin_ia32_cvttps2iubs512_mask_round ((__v16sf) (A), \
+					     (__v16si) \
+					     (_mm512_undefined_si512 ()), \
+					     (__mmask16) (-1), \
+					     (R)))
+
+#define _mm512_mask_ipcvtt_roundps_epu32(W, U, A, R) \
+  ((__m512i) __builtin_ia32_cvttps2iubs512_mask_round ((__v16sf) (A), \
+						       (__v16si) (W), \
+						       (__mmask16) (U), \
+						       (R)))
+
+#define _mm512_maskz_ipcvtt_roundps_epu32(U, A, R) \
+  ((__m512i) \
+   __builtin_ia32_cvttps2iubs512_mask_round ((__v16sf) (A), \
+					     (__v16si) \
+					     (_mm512_setzero_si512 ()), \
+					     (__mmask16) (U), \
+					     (R)))
+#endif
+
+#ifdef __DISABLE_AVX10_2_512__
+#undef __DISABLE_AVX10_2_512__
+#pragma GCC pop_options
+#endif /* __DISABLE_AVX10_2_512__ */
+
+#endif /* _AVX10_2_512SATCVTINTRIN_H_INCLUDED */
diff --git a/gcc/config/i386/avx10_2satcvtintrin.h b/gcc/config/i386/avx10_2satcvtintrin.h
new file mode 100644
index 00000000000..4fcf78955df
--- /dev/null
+++ b/gcc/config/i386/avx10_2satcvtintrin.h
@@ -0,0 +1,1022 @@
+/* Copyright (C) 2024 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#if !defined _IMMINTRIN_H_INCLUDED
+#error "Never use <avx10_2satcvtintrin.h> directly; include <immintrin.h> instead."
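
Illustrative usage sketch (not part of the patch; assumes -mavx10.2
support, with hypothetical function names).  This header mirrors the
512-bit one at 128/256-bit vector width; the 256-bit variants
additionally have _round forms whose rounding argument must be a
compile-time constant, which is why the definitions fall back to
macros when __OPTIMIZE__ is not defined:

  #include <immintrin.h>

  /* 128-bit: 8 packed fp16 values to saturated signed 16-bit
     integers.  */
  __m128i
  cvt128 (__m128h a)
  {
    return _mm_ipcvtph_epi16 (a);
  }

  /* 256-bit with explicit rounding control: round-to-nearest with
     exceptions suppressed, i.e. the {rn-sae} form that the new
     scan-assembler tests look for.  */
  __m256i
  cvt256_rn (__m256h a)
  {
    return _mm256_ipcvt_roundph_epi16 (a, _MM_FROUND_TO_NEAREST_INT
					  | _MM_FROUND_NO_EXC);
  }

  /* Truncating (ipcvtt) zero-masked form: only SAE may be given.  */
  __m256i
  cvtt256_zero (__mmask16 m, __m256h a)
  {
    return _mm256_maskz_ipcvtt_roundph_epi16 (m, a, _MM_FROUND_NO_EXC);
  }
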
+#endif + +#ifndef _AVX10_2SATCVTINTRIN_H_INCLUDED +#define _AVX10_2SATCVTINTRIN_H_INCLUDED + +#if !defined (__AVX10_2_256__) +#pragma GCC push_options +#pragma GCC target("avx10.2") +#define __DISABLE_AVX10_2_256__ +#endif /* __AVX10_2_256__ */ + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_ipcvtnebf16_epi16 (__m128bh __A) +{ + return (__m128i) __builtin_ia32_cvtnebf162ibs128_mask ((__v8bf) __A, + (__v8hi) + _mm_undefined_si128 (), + (__mmask8) -1); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_ipcvtnebf16_epi16 (__m128i __W, __mmask8 __U, __m128bh __A) +{ + return (__m128i) __builtin_ia32_cvtnebf162ibs128_mask ((__v8bf) __A, + (__v8hi) __W, + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_ipcvtnebf16_epi16 (__mmask8 __U, __m128bh __A) +{ + return (__m128i) __builtin_ia32_cvtnebf162ibs128_mask ((__v8bf) __A, + (__v8hi) + _mm_setzero_si128 (), + (__mmask8) __U); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_ipcvtnebf16_epi16 (__m256bh __A) +{ + return + (__m256i) __builtin_ia32_cvtnebf162ibs256_mask ((__v16bf) __A, + (__v16hi) + _mm256_undefined_si256 (), + (__mmask16) -1); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_ipcvtnebf16_epi16 (__m256i __W, __mmask16 __U, __m256bh __A) +{ + return (__m256i) __builtin_ia32_cvtnebf162ibs256_mask ((__v16bf) __A, + (__v16hi) __W, + (__mmask16) __U); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_ipcvtnebf16_epi16 (__mmask16 __U, __m256bh __A) +{ + return + (__m256i) __builtin_ia32_cvtnebf162ibs256_mask ((__v16bf) __A, + (__v16hi) + _mm256_setzero_si256 (), + (__mmask16) __U); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_ipcvtnebf16_epu16 (__m128bh __A) +{ + return + (__m128i) __builtin_ia32_cvtnebf162iubs128_mask ((__v8bf) __A, + (__v8hi) + _mm_undefined_si128 (), + (__mmask8) -1); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_ipcvtnebf16_epu16 (__m128i __W, __mmask8 __U, __m128bh __A) +{ + return (__m128i) __builtin_ia32_cvtnebf162iubs128_mask ((__v8bf) __A, + (__v8hi) __W, + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_ipcvtnebf16_epu16 (__mmask8 __U, __m128bh __A) +{ + return + (__m128i) __builtin_ia32_cvtnebf162iubs128_mask ((__v8bf) __A, + (__v8hi) + _mm_setzero_si128 (), + (__mmask8) __U); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_ipcvtnebf16_epu16 (__m256bh __A) +{ + return + (__m256i) __builtin_ia32_cvtnebf162iubs256_mask ((__v16bf) __A, + (__v16hi) + _mm256_undefined_si256 (), + (__mmask16) -1); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_ipcvtnebf16_epu16 (__m256i __W, __mmask16 __U, __m256bh __A) +{ + return (__m256i) __builtin_ia32_cvtnebf162iubs256_mask ((__v16bf) __A, + (__v16hi) __W, + (__mmask16) __U); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_ipcvtnebf16_epu16 (__mmask16 __U, __m256bh __A) +{ + return + (__m256i) __builtin_ia32_cvtnebf162iubs256_mask 
((__v16bf) __A, + (__v16hi) + _mm256_setzero_si256 (), + (__mmask16) __U); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_ipcvtph_epi16 (__m128h __A) +{ + return (__m128i) __builtin_ia32_cvtph2ibs128_mask ((__v8hf) __A, + (__v8hi) + _mm_undefined_si128 (), + (__mmask8) -1); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_ipcvtph_epi16 (__m128i __W, __mmask8 __U, __m128h __A) +{ + return (__m128i) __builtin_ia32_cvtph2ibs128_mask ((__v8hf) __A, + (__v8hi) __W, + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_ipcvtph_epi16 (__mmask8 __U, __m128h __A) +{ + return (__m128i) __builtin_ia32_cvtph2ibs128_mask ((__v8hf) __A, + (__v8hi) + _mm_setzero_si128 (), + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_ipcvtph_epu16 (__m128h __A) +{ + return (__m128i) __builtin_ia32_cvtph2iubs128_mask ((__v8hf) __A, + (__v8hi) + _mm_undefined_si128 (), + (__mmask8) -1); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_ipcvtph_epu16 (__m128i __W, __mmask8 __U, __m128h __A) +{ + return (__m128i) __builtin_ia32_cvtph2iubs128_mask ((__v8hf) __A, + (__v8hi) __W, + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_ipcvtph_epu16 (__mmask8 __U, __m128h __A) +{ + return (__m128i) __builtin_ia32_cvtph2iubs128_mask ((__v8hf) __A, + (__v8hi) + _mm_setzero_si128 (), + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_ipcvtps_epi32 (__m128 __A) +{ + return (__m128i) __builtin_ia32_cvtps2ibs128_mask ((__v4sf) __A, + (__v4si) + _mm_undefined_si128 (), + (__mmask8) -1); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_ipcvtps_epi32 (__m128i __W, __mmask8 __U, __m128 __A) +{ + return (__m128i) __builtin_ia32_cvtps2ibs128_mask ((__v4sf) __A, + (__v4si) __W, + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_ipcvtps_epi32 (__mmask8 __U, __m128 __A) +{ + return (__m128i) __builtin_ia32_cvtps2ibs128_mask ((__v4sf) __A, + (__v4si) + _mm_setzero_si128 (), + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_ipcvtps_epu32 (__m128 __A) +{ + return (__m128i) __builtin_ia32_cvtps2iubs128_mask ((__v4sf) __A, + (__v4si) + _mm_undefined_si128 (), + (__mmask8) -1); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_ipcvtps_epu32 (__m128i __W, __mmask8 __U, __m128 __A) +{ + return (__m128i) __builtin_ia32_cvtps2iubs128_mask ((__v4sf) __A, + (__v4si) __W, + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_ipcvtps_epu32 (__mmask8 __U, __m128 __A) +{ + return (__m128i) __builtin_ia32_cvtps2iubs128_mask ((__v4sf) __A, + (__v4si) + _mm_setzero_si128 (), + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_ipcvttnebf16_epi16 (__m128bh __A) +{ + return + (__m128i) __builtin_ia32_cvttnebf162ibs128_mask ((__v8bf) __A, + (__v8hi) + _mm_undefined_si128 (), + (__mmask8) -1); 
+} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_ipcvttnebf16_epi16 (__m128i __W, __mmask8 __U, __m128bh __A) +{ + return (__m128i) __builtin_ia32_cvttnebf162ibs128_mask ((__v8bf) __A, + (__v8hi) __W, + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_ipcvttnebf16_epi16 (__mmask8 __U, __m128bh __A) +{ + return (__m128i) __builtin_ia32_cvttnebf162ibs128_mask ((__v8bf) __A, + (__v8hi) + _mm_setzero_si128 (), + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_ipcvttnebf16_epu16 (__m128bh __A) +{ + return + (__m128i) __builtin_ia32_cvttnebf162iubs128_mask ((__v8bf) __A, + (__v8hi) + _mm_undefined_si128 (), + (__mmask8) -1); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_ipcvttnebf16_epu16 (__m128i __W, __mmask8 __U, __m128bh __A) +{ + return (__m128i) __builtin_ia32_cvttnebf162iubs128_mask ((__v8bf) __A, + (__v8hi) __W, + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_ipcvttnebf16_epu16 (__mmask8 __U, __m128bh __A) +{ + return (__m128i) __builtin_ia32_cvttnebf162iubs128_mask ((__v8bf) __A, + (__v8hi) + _mm_setzero_si128 (), + (__mmask8) __U); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_ipcvttnebf16_epi16 (__m256bh __A) +{ + return (__m256i) + __builtin_ia32_cvttnebf162ibs256_mask ((__v16bf) __A, + (__v16hi) + _mm256_undefined_si256 (), + (__mmask16) -1); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_ipcvttnebf16_epi16 (__m256i __W, __mmask16 __U, __m256bh __A) +{ + return (__m256i) __builtin_ia32_cvttnebf162ibs256_mask ((__v16bf) __A, + (__v16hi) __W, + (__mmask16) __U); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_ipcvttnebf16_epi16 (__mmask16 __U, __m256bh __A) +{ + return (__m256i) + __builtin_ia32_cvttnebf162ibs256_mask ((__v16bf) __A, + (__v16hi) + _mm256_setzero_si256 (), + (__mmask16) __U); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_ipcvttnebf16_epu16 (__m256bh __A) +{ + return (__m256i) + __builtin_ia32_cvttnebf162iubs256_mask ((__v16bf) __A, + (__v16hi) + _mm256_undefined_si256 (), + (__mmask16) -1); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_ipcvttnebf16_epu16 (__m256i __W, __mmask16 __U, __m256bh __A) +{ + return (__m256i) __builtin_ia32_cvttnebf162iubs256_mask ((__v16bf) __A, + (__v16hi) __W, + (__mmask16) __U); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_ipcvttnebf16_epu16 (__mmask16 __U, __m256bh __A) +{ + return (__m256i) + __builtin_ia32_cvttnebf162iubs256_mask ((__v16bf) __A, + (__v16hi) + _mm256_setzero_si256 (), + (__mmask16) __U); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_ipcvttph_epi16 (__m128h __A) +{ + return (__m128i) __builtin_ia32_cvttph2ibs128_mask ((__v8hf) __A, + (__v8hi) + _mm_undefined_si128 (), + (__mmask8) -1); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_ipcvttph_epi16 (__m128i __W, __mmask8 __U, 
__m128h __A) +{ + return (__m128i) __builtin_ia32_cvttph2ibs128_mask ((__v8hf) __A, + (__v8hi) __W, + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_ipcvttph_epi16 (__mmask8 __U, __m128h __A) +{ + return (__m128i) __builtin_ia32_cvttph2ibs128_mask ((__v8hf) __A, + (__v8hi) + _mm_setzero_si128 (), + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_ipcvttph_epu16 (__m128h __A) +{ + return (__m128i) __builtin_ia32_cvttph2iubs128_mask ((__v8hf) __A, + (__v8hi) + _mm_undefined_si128 (), + (__mmask8) -1); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_ipcvttph_epu16 (__m128i __W, __mmask8 __U, __m128h __A) +{ + return (__m128i) __builtin_ia32_cvttph2iubs128_mask ((__v8hf) __A, + (__v8hi) __W, + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_ipcvttph_epu16 (__mmask8 __U, __m128h __A) +{ + return (__m128i) __builtin_ia32_cvttph2iubs128_mask ((__v8hf) __A, + (__v8hi) + _mm_setzero_si128 (), + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_ipcvttps_epi32 (__m128 __A) +{ + return (__m128i) __builtin_ia32_cvttps2ibs128_mask ((__v4sf) __A, + (__v4si) + _mm_undefined_si128 (), + (__mmask8) -1); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_ipcvttps_epi32 (__m128i __W, __mmask8 __U, __m128 __A) +{ + return (__m128i) __builtin_ia32_cvttps2ibs128_mask ((__v4sf) __A, + (__v4si) __W, + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_ipcvttps_epi32 (__mmask8 __U, __m128 __A) +{ + return (__m128i) __builtin_ia32_cvttps2ibs128_mask ((__v4sf) __A, + (__v4si) + _mm_setzero_si128 (), + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_ipcvttps_epu32 (__m128 __A) +{ + return (__m128i) __builtin_ia32_cvttps2iubs128_mask ((__v4sf) __A, + (__v4si) + _mm_undefined_si128 (), + (__mmask8) -1); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_ipcvttps_epu32 (__m128i __W, __mmask8 __U, __m128 __A) +{ + return (__m128i) __builtin_ia32_cvttps2iubs128_mask ((__v4sf) __A, + (__v4si) __W, + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_ipcvttps_epu32 (__mmask8 __U, __m128 __A) +{ + return (__m128i) __builtin_ia32_cvttps2iubs128_mask ((__v4sf) __A, + (__v4si) + _mm_setzero_si128 (), + (__mmask8) __U); +} + +#ifdef __OPTIMIZE__ +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_ipcvt_roundph_epi16 (__m256h __A, const int __R) +{ + return + (__m256i) __builtin_ia32_cvtph2ibs256_mask_round ((__v16hf) __A, + (__v16hi) + _mm256_undefined_si256 (), + (__mmask16) -1, + __R); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_ipcvt_roundph_epi16 (__m256i __W, __mmask16 __U, __m256h __A, + const int __R) +{ + return (__m256i) __builtin_ia32_cvtph2ibs256_mask_round ((__v16hf) __A, + (__v16hi) __W, + (__mmask16) __U, + __R); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) 
+_mm256_maskz_ipcvt_roundph_epi16 (__mmask16 __U, __m256h __A, const int __R) +{ + return + (__m256i) __builtin_ia32_cvtph2ibs256_mask_round ((__v16hf) __A, + (__v16hi) + _mm256_setzero_si256 (), + (__mmask16) __U, + __R); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_ipcvt_roundph_epu16 (__m256h __A, const int __R) +{ + return (__m256i) + __builtin_ia32_cvtph2iubs256_mask_round ((__v16hf) __A, + (__v16hi) + _mm256_undefined_si256 (), + (__mmask16) -1, + __R); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_ipcvt_roundph_epu16 (__m256i __W, __mmask16 __U, __m256h __A, + const int __R) +{ + return (__m256i) __builtin_ia32_cvtph2iubs256_mask_round ((__v16hf) __A, + (__v16hi) __W, + (__mmask16) __U, + __R); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_ipcvt_roundph_epu16 (__mmask16 __U, __m256h __A, const int __R) +{ + return + (__m256i) __builtin_ia32_cvtph2iubs256_mask_round ((__v16hf) __A, + (__v16hi) + _mm256_setzero_si256 (), + (__mmask16) __U, + __R); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_ipcvt_roundps_epi32 (__m256 __A, const int __R) +{ + return + (__m256i) __builtin_ia32_cvtps2ibs256_mask_round ((__v8sf) __A, + (__v8si) + _mm256_undefined_si256 (), + (__mmask8) -1, + __R); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_ipcvt_roundps_epi32 (__m256i __W, __mmask8 __U, __m256 __A, + const int __R) +{ + return (__m256i) __builtin_ia32_cvtps2ibs256_mask_round ((__v8sf) __A, + (__v8si) __W, + (__mmask8) __U, + __R); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_ipcvt_roundps_epi32 (__mmask8 __U, __m256 __A, const int __R) +{ + return + (__m256i) __builtin_ia32_cvtps2ibs256_mask_round ((__v8sf) __A, + (__v8si) + _mm256_setzero_si256 (), + (__mmask8) __U, + __R); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_ipcvt_roundps_epu32 (__m256 __A, const int __R) +{ + return (__m256i) + __builtin_ia32_cvtps2iubs256_mask_round ((__v8sf) __A, + (__v8si) + _mm256_undefined_si256 (), + (__mmask8) -1, + __R); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_ipcvt_roundps_epu32 (__m256i __W, __mmask8 __U, __m256 __A, + const int __R) +{ + return (__m256i) __builtin_ia32_cvtps2iubs256_mask_round ((__v8sf) __A, + (__v8si) __W, + (__mmask8) __U, + __R); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_ipcvt_roundps_epu32 (__mmask8 __U, __m256 __A, const int __R) +{ + return + (__m256i) __builtin_ia32_cvtps2iubs256_mask_round ((__v8sf) __A, + (__v8si) + _mm256_setzero_si256 (), + (__mmask8) __U, + __R); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_ipcvtt_roundph_epi16 (__m256h __A, const int __R) +{ + return (__m256i) + __builtin_ia32_cvttph2ibs256_mask_round ((__v16hf) __A, + (__v16hi) + _mm256_undefined_si256 (), + (__mmask16) -1, + __R); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_ipcvtt_roundph_epi16 (__m256i __W, __mmask16 __U, __m256h __A, + const int __R) +{ + return (__m256i) 
__builtin_ia32_cvttph2ibs256_mask_round ((__v16hf) __A, + (__v16hi) __W, + (__mmask16) __U, + __R); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_ipcvtt_roundph_epi16 (__mmask16 __U, __m256h __A, const int __R) +{ + return + (__m256i) __builtin_ia32_cvttph2ibs256_mask_round ((__v16hf) __A, + (__v16hi) + _mm256_setzero_si256 (), + (__mmask16) __U, + __R); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_ipcvtt_roundph_epu16 (__m256h __A, const int __R) +{ + return (__m256i) + __builtin_ia32_cvttph2iubs256_mask_round ((__v16hf) __A, + (__v16hi) + _mm256_undefined_si256 (), + (__mmask16) -1, + __R); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_ipcvtt_roundph_epu16 (__m256i __W, __mmask16 __U, __m256h __A, + const int __R) +{ + return (__m256i) __builtin_ia32_cvttph2iubs256_mask_round ((__v16hf) __A, + (__v16hi) __W, + (__mmask16) __U, + __R); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_ipcvtt_roundph_epu16 (__mmask16 __U, __m256h __A, const int __R) +{ + return + (__m256i) __builtin_ia32_cvttph2iubs256_mask_round ((__v16hf) __A, + (__v16hi) + _mm256_setzero_si256 (), + (__mmask16) __U, + __R); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_ipcvtt_roundps_epi32 (__m256 __A, const int __R) +{ + return (__m256i) + __builtin_ia32_cvttps2ibs256_mask_round ((__v8sf) __A, + (__v8si) + _mm256_undefined_si256 (), + (__mmask8) -1, + __R); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_ipcvtt_roundps_epi32 (__m256i __W, __mmask8 __U, __m256 __A, + const int __R) +{ + return (__m256i) __builtin_ia32_cvttps2ibs256_mask_round ((__v8sf) __A, + (__v8si) __W, + (__mmask8) __U, + __R); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_ipcvtt_roundps_epi32 (__mmask8 __U, __m256 __A, const int __R) +{ + return + (__m256i) __builtin_ia32_cvttps2ibs256_mask_round ((__v8sf) __A, + (__v8si) + _mm256_setzero_si256 (), + (__mmask8) __U, + __R); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_ipcvtt_roundps_epu32 (__m256 __A, const int __R) +{ + return (__m256i) + __builtin_ia32_cvttps2iubs256_mask_round ((__v8sf) __A, + (__v8si) + _mm256_undefined_si256 (), + (__mmask8) -1, + __R); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_ipcvtt_roundps_epu32 (__m256i __W, __mmask8 __U, __m256 __A, + const int __R) +{ + return (__m256i) __builtin_ia32_cvttps2iubs256_mask_round ((__v8sf) __A, + (__v8si) __W, + (__mmask8) __U, + __R); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_ipcvtt_roundps_epu32 (__mmask8 __U, __m256 __A, const int __R) +{ + return + (__m256i) __builtin_ia32_cvttps2iubs256_mask_round ((__v8sf) __A, + (__v8si) + _mm256_setzero_si256 (), + (__mmask8) __U, + __R); +} +#else + +#define _mm256_ipcvt_roundph_epi16(A, R) \ + ((__m256i) \ + __builtin_ia32_cvtph2ibs256_mask_round ((__v16hf) (A), \ + (__v16hi) \ + (_mm256_undefined_si256 ()), \ + (__mmask16) (-1), \ + (R))) + +#define _mm256_mask_ipcvt_roundph_epi16(W, U, A, R) \ + ((__m256i) __builtin_ia32_cvtph2ibs256_mask_round ((__v16hf) 
(A), \ + (__v16hi) (W), \ + (__mmask16) (U), \ + (R))) + +#define _mm256_maskz_ipcvt_roundph_epi16(U, A, R) \ + ((__m256i) \ + __builtin_ia32_cvtph2ibs256_mask_round ((__v16hf) (A), \ + (__v16hi) \ + (_mm256_setzero_si256 ()), \ + (__mmask16) (U), \ + (R))) + +#define _mm256_ipcvt_roundph_epu16(A, R) \ + ((__m256i) \ + __builtin_ia32_cvtph2iubs256_mask_round ((__v16hf) (A), \ + (__v16hi) \ + (_mm256_undefined_si256 ()), \ + (__mmask16) (-1), \ + (R))) + +#define _mm256_mask_ipcvt_roundph_epu16(W, U, A, R) \ + ((__m256i) __builtin_ia32_cvtph2iubs256_mask_round ((__v16hf) (A), \ + (__v16hi) (W), \ + (__mmask16) (U), \ + (R))) + +#define _mm256_maskz_ipcvt_roundph_epu16(U, A, R) \ + ((__m256i) \ + __builtin_ia32_cvtph2iubs256_mask_round ((__v16hf) (A), \ + (__v16hi) \ + (_mm256_setzero_si256 ()), \ + (__mmask16) (U), \ + (R))) + +#define _mm256_ipcvt_roundps_epi32(A, R) \ + ((__m256i) \ + __builtin_ia32_cvtps2ibs256_mask_round ((__v8sf) (A), \ + (__v8si) \ + (_mm256_undefined_si256 ()), \ + (__mmask8) (-1), \ + (R))) + +#define _mm256_mask_ipcvt_roundps_epi32(W, U, A, R) \ + ((__m256i) __builtin_ia32_cvtps2ibs256_mask_round ((__v8sf) (A), \ + (__v8si) (W), \ + (__mmask8) (U), \ + (R))) + +#define _mm256_maskz_ipcvt_roundps_epi32(U, A, R) \ + ((__m256i) \ + __builtin_ia32_cvtps2ibs256_mask_round ((__v8sf) (A), \ + (__v8si) \ + (_mm256_setzero_si256 ()), \ + (__mmask8) (U), \ + (R))) + +#define _mm256_ipcvt_roundps_epu32(A, R) \ + ((__m256i) \ + __builtin_ia32_cvtps2iubs256_mask_round ((__v8sf) (A), \ + (__v8si) \ + (_mm256_undefined_si256 ()), \ + (__mmask8) (-1), \ + (R))) + +#define _mm256_mask_ipcvt_roundps_epu32(W, U, A, R) \ + ((__m256i) __builtin_ia32_cvtps2iubs256_mask_round ((__v8sf) (A), \ + (__v8si) (W), \ + (__mmask8) (U), \ + (R))) + +#define _mm256_maskz_ipcvt_roundps_epu32(U, A, R) \ + ((__m256i) \ + __builtin_ia32_cvtps2iubs256_mask_round ((__v8sf) (A), \ + (__v8si) \ + (_mm256_setzero_si256 ()), \ + (__mmask8) (U), \ + (R))) + + +#define _mm256_ipcvttne_roundbf16_epi16(A, R) \ + ((__m256i) \ + __builtin_ia32_cvttnebf162ibs256_mask_round ((__v16bf) (A), \ + (__v16hi) \ + (_mm256_undefined_si256 ()), \ + (__mmask16) (-1), \ + (R))) + +#define _mm256_mask_ipcvttne_roundbf16_epi16(W, U, A, R) \ + ((__m256i) __builtin_ia32_cvttnebf162ibs256_mask_round ((__v16bf) (A), \ + (__v16hi) (W), \ + (__mmask16) (U), \ + (R))) + +#define _mm256_maskz_ipcvttne_roundbf16_epi16(U, A, R) \ + ((__m256i) \ + __builtin_ia32_cvttnebf162ibs256_mask_round ((__v16bf) (A), \ + (__v16hi) \ + (_mm256_setzero_si256 ()), \ + (__mmask16) (U), \ + (R))) + +#define _mm256_ipcvttne_roundbf16_epu16(A, R) \ + ((__m256i) \ + __builtin_ia32_cvttnebf162iubs256_mask_round ((__v16bf) (A), \ + (__v16hi) \ + (_mm256_undefined_si256 ()), \ + (__mmask16) (-1), \ + (R))) + +#define _mm256_mask_ipcvttne_roundbf16_epu16(W, U, A, R) \ + ((__m256i) __builtin_ia32_cvttnebf162iubs256_mask_round ((__v16bf) (A), \ + (__v16hi) (W), \ + (__mmask16) (U), \ + (R))) + +#define _mm256_maskz_ipcvttne_roundbf16_epu16(U, A, R) \ + ((__m256i) \ + __builtin_ia32_cvttnebf162iubs256_mask_round ((__v16bf) (A), \ + (__v16hi) \ + (_mm256_setzero_si256 ()), \ + (__mmask16) (U), \ + (R))) + +#define _mm256_ipcvtt_roundph_epi16(A, R) \ + ((__m256i) \ + __builtin_ia32_cvttph2ibs256_mask_round ((__v16hf) (A), \ + (__v16hi) \ + (_mm256_undefined_si256 ()), \ + (__mmask16) (-1), \ + (R))) + +#define _mm256_mask_ipcvtt_roundph_epi16(W, U, A, R) \ + ((__m256i) __builtin_ia32_cvttph2ibs256_mask_round ((__v16hf) (A), \ + (__v16hi) (W), \ + (__mmask16) (U), \ 
+ (R))) + +#define _mm256_maskz_ipcvtt_roundph_epi16(U, A, R) \ + ((__m256i) \ + __builtin_ia32_cvttph2ibs256_mask_round ((__v16hf) (A), \ + (__v16hi) \ + (_mm256_setzero_si256 ()), \ + (__mmask16) (U), \ + (R))) + +#define _mm256_ipcvtt_roundph_epu16(A, R) \ + ((__m256i) \ + __builtin_ia32_cvttph2iubs256_mask_round ((__v16hf) (A), \ + (__v16hi) \ + (_mm256_undefined_si256 ()), \ + (__mmask16) (-1), \ + (R))) + +#define _mm256_mask_ipcvtt_roundph_epu16(W, U, A, R) \ + ((__m256i) __builtin_ia32_cvttph2iubs256_mask_round ((__v16hf) (A), \ + (__v16hi) (W), \ + (__mmask16) (U), \ + (R))) + +#define _mm256_maskz_ipcvtt_roundph_epu16(U, A, R) \ + ((__m256i) \ + __builtin_ia32_cvttph2iubs256_mask_round ((__v16hf) (A), \ + (__v16hi) \ + (_mm256_setzero_si256 ()), \ + (__mmask16) (U), \ + (R))) + +#define _mm256_ipcvtt_roundps_epi32(A, R) \ + ((__m256i) \ + __builtin_ia32_cvttps2ibs256_mask_round ((__v8sf) (A), \ + (__v8si) \ + (_mm256_undefined_si256 ()), \ + (__mmask8) (-1), \ + (R))) + +#define _mm256_mask_ipcvtt_roundps_epi32(W, U, A, R) \ + ((__m256i) __builtin_ia32_cvttps2ibs256_mask_round ((__v8sf) (A), \ + (__v8si) (W), \ + (__mmask8) (U), \ + (R))) + +#define _mm256_maskz_ipcvtt_roundps_epi32(U, A, R) \ + ((__m256i) \ + __builtin_ia32_cvttps2ibs256_mask_round ((__v8sf) (A), \ + (__v8si) \ + (_mm256_setzero_si256 ()), \ + (__mmask8) (U), \ + (R))) + +#define _mm256_ipcvtt_roundps_epu32(A, R) \ + ((__m256i) \ + __builtin_ia32_cvttps2iubs256_mask_round ((__v8sf) (A), \ + (__v8si) \ + (_mm256_undefined_si256 ()), \ + (__mmask8) (-1), \ + (R))) + +#define _mm256_mask_ipcvtt_roundps_epu32(W, U, A, R) \ + ((__m256i) __builtin_ia32_cvttps2iubs256_mask_round ((__v8sf) (A), \ + (__v8si) (W), \ + (__mmask8) (U), \ + (R))) + +#define _mm256_maskz_ipcvtt_roundps_epu32(U, A, R) \ +((__m256i) \ + __builtin_ia32_cvttps2iubs256_mask_round ((__v8sf) (A), \ + (__v8si) \ + (_mm256_setzero_si256 ()), \ + (__mmask8) (U), \ + (R))) +#endif + +#ifdef __DISABLE_AVX10_2_256__ +#undef __DISABLE_AVX10_2_256__ +#pragma GCC pop_options +#endif /* __DISABLE_AVX10_2_256__ */ + +#endif /* _AVX10_2SATCVTINTRIN_H_INCLUDED */ diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def index e6f53589e70..b2978591287 100644 --- a/gcc/config/i386/i386-builtin-types.def +++ b/gcc/config/i386/i386-builtin-types.def @@ -1493,3 +1493,9 @@ DEF_FUNCTION_TYPE (USI, V32BF, V32BF, INT, USI) DEF_FUNCTION_TYPE (UHI, V16BF, V16BF, INT, UHI) DEF_FUNCTION_TYPE (UQI, V8BF, V8BF, INT, UQI) DEF_FUNCTION_TYPE (INT, V8BF, V8BF) +DEF_FUNCTION_TYPE (V8HI, V8BF, V8HI, UQI) +DEF_FUNCTION_TYPE (V16HI, V16BF, V16HI, UHI) +DEF_FUNCTION_TYPE (V32HI, V32BF, V32HI, USI) +DEF_FUNCTION_TYPE (V16SI, V16SF, V16SI, UHI, INT) +DEF_FUNCTION_TYPE (V16HI, V16BF, V16HI, UHI, INT) +DEF_FUNCTION_TYPE (V32HI, V32BF, V32HI, USI, INT) diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def index 25b8169c1ef..b85eba5b330 100644 --- a/gcc/config/i386/i386-builtin.def +++ b/gcc/config/i386/i386-builtin.def @@ -3270,6 +3270,26 @@ BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_comsbf16_v8bf, "__built BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_comsbf16_v8bf, "__builtin_ia32_vcomsbf16le", IX86_BUILTIN_VCOMSBF16LE, LE, (int) INT_FTYPE_V8BF_V8BF) BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_comsbf16_v8bf, "__builtin_ia32_vcomsbf16lt", IX86_BUILTIN_VCOMSBF16LT, LT, (int) INT_FTYPE_V8BF_V8BF) BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_comsbf16_v8bf, 
"__builtin_ia32_vcomsbf16neq", IX86_BUILTIN_VCOMSBF16NE, NE, (int) INT_FTYPE_V8BF_V8BF) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_cvtnebf162ibsv8bf_mask, "__builtin_ia32_cvtnebf162ibs128_mask", IX86_BUILTIN_CVTNEBF162IBS128_MASK, UNKNOWN, (int) V8HI_FTYPE_V8BF_V8HI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_cvtnebf162ibsv16bf_mask, "__builtin_ia32_cvtnebf162ibs256_mask", IX86_BUILTIN_CVTNEBF162IBS256_MASK, UNKNOWN, (int) V16HI_FTYPE_V16BF_V16HI_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_cvtnebf162ibsv32bf_mask, "__builtin_ia32_cvtnebf162ibs512_mask", IX86_BUILTIN_CVTNEBF162IBS512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32BF_V32HI_USI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_cvtnebf162iubsv8bf_mask, "__builtin_ia32_cvtnebf162iubs128_mask", IX86_BUILTIN_CVTNEBF162IUBS128_MASK, UNKNOWN, (int) V8HI_FTYPE_V8BF_V8HI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_cvtnebf162iubsv16bf_mask, "__builtin_ia32_cvtnebf162iubs256_mask", IX86_BUILTIN_CVTNEBF162IUBS256_MASK, UNKNOWN, (int) V16HI_FTYPE_V16BF_V16HI_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_cvtnebf162iubsv32bf_mask, "__builtin_ia32_cvtnebf162iubs512_mask", IX86_BUILTIN_CVTNEBF162IUBS512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32BF_V32HI_USI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_cvtph2ibsv8hf_mask, "__builtin_ia32_cvtph2ibs128_mask", IX86_BUILTIN_CVTPH2IBS128_MASK, UNKNOWN, (int) V8HI_FTYPE_V8HF_V8HI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_cvtph2iubsv8hf_mask, "__builtin_ia32_cvtph2iubs128_mask", IX86_BUILTIN_CVTPH2IUBS128_MASK, UNKNOWN, (int) V8HI_FTYPE_V8HF_V8HI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_cvtps2ibsv4sf_mask, "__builtin_ia32_cvtps2ibs128_mask", IX86_BUILTIN_CVTPS2IBS128_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SF_V4SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_cvtps2iubsv4sf_mask, "__builtin_ia32_cvtps2iubs128_mask", IX86_BUILTIN_CVTPS2IUBS128_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SF_V4SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_cvttnebf162ibsv8bf_mask, "__builtin_ia32_cvttnebf162ibs128_mask", IX86_BUILTIN_CVTTNEBF162IBS128_MASK, UNKNOWN, (int) V8HI_FTYPE_V8BF_V8HI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_cvttnebf162ibsv16bf_mask, "__builtin_ia32_cvttnebf162ibs256_mask", IX86_BUILTIN_CVTTNEBF162IBS256_MASK, UNKNOWN, (int) V16HI_FTYPE_V16BF_V16HI_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_cvttnebf162ibsv32bf_mask, "__builtin_ia32_cvttnebf162ibs512_mask", IX86_BUILTIN_CVTTNEBF162IBS512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32BF_V32HI_USI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_cvttnebf162iubsv8bf_mask, "__builtin_ia32_cvttnebf162iubs128_mask", IX86_BUILTIN_CVTTNEBF162IUBS128_MASK, UNKNOWN, (int) V8HI_FTYPE_V8BF_V8HI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_cvttnebf162iubsv16bf_mask, "__builtin_ia32_cvttnebf162iubs256_mask", IX86_BUILTIN_CVTTNEBF162IUBS256_MASK, UNKNOWN, (int) V16HI_FTYPE_V16BF_V16HI_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_cvttnebf162iubsv32bf_mask, "__builtin_ia32_cvttnebf162iubs512_mask", IX86_BUILTIN_CVTTNEBF162IUBS512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32BF_V32HI_USI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_cvttph2ibsv8hf_mask, "__builtin_ia32_cvttph2ibs128_mask", IX86_BUILTIN_CVTTPH2IBS128_MASK, UNKNOWN, (int) V8HI_FTYPE_V8HF_V8HI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, 
CODE_FOR_avx10_2_cvttph2iubsv8hf_mask, "__builtin_ia32_cvttph2iubs128_mask", IX86_BUILTIN_CVTTPH2IUBS128_MASK, UNKNOWN, (int) V8HI_FTYPE_V8HF_V8HI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_cvttps2ibsv4sf_mask, "__builtin_ia32_cvttps2ibs128_mask", IX86_BUILTIN_CVTTPS2IBS128_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SF_V4SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_cvttps2iubsv4sf_mask, "__builtin_ia32_cvttps2iubs128_mask", IX86_BUILTIN_CVTTPS2IUBS128_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SF_V4SI_UQI) /* Builtins with rounding support. */ BDESC_END (ARGS, ROUND_ARGS) @@ -3730,6 +3750,22 @@ BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_subv16hf3_mask_round, "__builti BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_subv8sf3_mask_round, "__builtin_ia32_subps256_mask_round", IX86_BUILTIN_VSUBPS256_MASK_ROUND, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF_V8SF_UQI_INT) BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_cvt2ps2phx_v32hf_mask_round, "__builtin_ia32_vcvt2ps2phx512_mask_round", IX86_BUILTIN_VCVT2PS2PHX_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V16SF_V16SF_V32HF_USI_INT) BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_cvt2ps2phx_v16hf_mask_round, "__builtin_ia32_vcvt2ps2phx256_mask_round", IX86_BUILTIN_VCVT2PS2PHX_V16HF_MASK_ROUND, UNKNOWN, (int) V16HF_FTYPE_V8SF_V8SF_V16HF_UHI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_cvtph2ibsv16hf_mask_round, "__builtin_ia32_cvtph2ibs256_mask_round", IX86_BUILTIN_CVTPH2IBS256_MASK_ROUND, UNKNOWN, (int) V16HI_FTYPE_V16HF_V16HI_UHI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_cvtph2ibsv32hf_mask_round, "__builtin_ia32_cvtph2ibs512_mask_round", IX86_BUILTIN_CVTPH2IBS512_MASK_ROUND, UNKNOWN, (int) V32HI_FTYPE_V32HF_V32HI_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_cvtph2iubsv16hf_mask_round, "__builtin_ia32_cvtph2iubs256_mask_round", IX86_BUILTIN_CVTPH2IUBS256_MASK_ROUND, UNKNOWN, (int) V16HI_FTYPE_V16HF_V16HI_UHI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_cvtph2iubsv32hf_mask_round, "__builtin_ia32_cvtph2iubs512_mask_round", IX86_BUILTIN_CVTPH2IUBS512_MASK_ROUND, UNKNOWN, (int) V32HI_FTYPE_V32HF_V32HI_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_cvtps2ibsv8sf_mask_round, "__builtin_ia32_cvtps2ibs256_mask_round", IX86_BUILTIN_CVTPS2IBS256_MASK_ROUND, UNKNOWN, (int) V8SI_FTYPE_V8SF_V8SI_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_cvtps2ibsv16sf_mask_round, "__builtin_ia32_cvtps2ibs512_mask_round", IX86_BUILTIN_CVTPS2IBS512_MASK_ROUND, UNKNOWN, (int) V16SI_FTYPE_V16SF_V16SI_UHI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_cvtps2iubsv8sf_mask_round, "__builtin_ia32_cvtps2iubs256_mask_round", IX86_BUILTIN_CVTPS2IUBS256_MASK_ROUND, UNKNOWN, (int) V8SI_FTYPE_V8SF_V8SI_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_cvtps2iubsv16sf_mask_round, "__builtin_ia32_cvtps2iubs512_mask_round", IX86_BUILTIN_CVTPS2IUBS512_MASK_ROUND, UNKNOWN, (int) V16SI_FTYPE_V16SF_V16SI_UHI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_cvttph2ibsv16hf_mask_round, "__builtin_ia32_cvttph2ibs256_mask_round", IX86_BUILTIN_CVTTPH2IBS256_MASK_ROUND, UNKNOWN, (int) V16HI_FTYPE_V16HF_V16HI_UHI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_cvttph2ibsv32hf_mask_round, "__builtin_ia32_cvttph2ibs512_mask_round", IX86_BUILTIN_CVTTPH2IBS512_MASK_ROUND, UNKNOWN, (int) V32HI_FTYPE_V32HF_V32HI_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, 
CODE_FOR_avx10_2_cvttph2iubsv16hf_mask_round, "__builtin_ia32_cvttph2iubs256_mask_round", IX86_BUILTIN_CVTTPH2IUBS256_MASK_ROUND, UNKNOWN, (int) V16HI_FTYPE_V16HF_V16HI_UHI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_cvttph2iubsv32hf_mask_round, "__builtin_ia32_cvttph2iubs512_mask_round", IX86_BUILTIN_CVTTPH2IUBS512_MASK_ROUND, UNKNOWN, (int) V32HI_FTYPE_V32HF_V32HI_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_cvttps2ibsv8sf_mask_round, "__builtin_ia32_cvttps2ibs256_mask_round", IX86_BUILTIN_CVTTPS2IBS256_MASK_ROUND, UNKNOWN, (int) V8SI_FTYPE_V8SF_V8SI_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_cvttps2ibsv16sf_mask_round, "__builtin_ia32_cvttps2ibs512_mask_round", IX86_BUILTIN_CVTTPS2IBS512_MASK_ROUND, UNKNOWN, (int) V16SI_FTYPE_V16SF_V16SI_UHI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_cvttps2iubsv8sf_mask_round, "__builtin_ia32_cvttps2iubs256_mask_round", IX86_BUILTIN_CVTTPS2IUBS256_MASK_ROUND, UNKNOWN, (int) V8SI_FTYPE_V8SF_V8SI_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_cvttps2iubsv16sf_mask_round, "__builtin_ia32_cvttps2iubs512_mask_round", IX86_BUILTIN_CVTTPS2IUBS512_MASK_ROUND, UNKNOWN, (int) V16SI_FTYPE_V16SF_V16SI_UHI_INT)
 BDESC_END (ROUND_ARGS, MULTI_ARG)
diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 7ea41924b98..9d522818ef5 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -11496,10 +11496,13 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case V16QI_FTYPE_V16QI_V16QI_UHI:
     case V16QI_FTYPE_QI_V16QI_UHI:
     case V32HI_FTYPE_V8HI_V32HI_USI:
+    case V32HI_FTYPE_V32BF_V32HI_USI:
     case V32HI_FTYPE_HI_V32HI_USI:
     case V16HI_FTYPE_V8HI_V16HI_UHI:
+    case V16HI_FTYPE_V16BF_V16HI_UHI:
     case V16HI_FTYPE_HI_V16HI_UHI:
     case V8HI_FTYPE_V8HI_V8HI_UQI:
+    case V8HI_FTYPE_V8BF_V8HI_UQI:
     case V8BF_FTYPE_V8BF_V8BF_UQI:
     case V8HI_FTYPE_HI_V8HI_UQI:
     case V16HF_FTYPE_V16HF_V16HF_UHI:
@@ -12484,6 +12487,7 @@ ix86_expand_round_builtin (const struct builtin_description *d,
     case V8SF_FTYPE_V8DF_V8SF_QI_INT:
     case V8DF_FTYPE_V8DF_V8DF_QI_INT:
     case V32HI_FTYPE_V32HF_V32HI_USI_INT:
+    case V32HI_FTYPE_V32BF_V32HI_USI_INT:
     case V8SI_FTYPE_V8DF_V8SI_QI_INT:
     case V8DI_FTYPE_V8HF_V8DI_UQI_INT:
     case V8DI_FTYPE_V8DF_V8DI_QI_INT:
@@ -12498,6 +12502,7 @@ ix86_expand_round_builtin (const struct builtin_description *d,
     case V8DI_FTYPE_V8SF_V8DI_QI_INT:
     case V16SF_FTYPE_V16SI_V16SF_HI_INT:
     case V16SI_FTYPE_V16SF_V16SI_HI_INT:
+    case V16SI_FTYPE_V16SF_V16SI_UHI_INT:
     case V16SI_FTYPE_V16HF_V16SI_UHI_INT:
     case V16HF_FTYPE_V16HF_V16HF_V16HF_INT:
     case V16HF_FTYPE_V16SI_V16HF_UHI_INT:
@@ -12530,6 +12535,7 @@ ix86_expand_round_builtin (const struct builtin_description *d,
     case V16HF_FTYPE_V16SF_V16HF_UHI_INT:
     case V16HF_FTYPE_V16HF_V16HF_UHI_INT:
     case V16HF_FTYPE_V16HI_V16HF_UHI_INT:
+    case V16HI_FTYPE_V16BF_V16HI_UHI_INT:
     case V8HF_FTYPE_V8HF_V8HF_V8HF_INT:
       nargs = 4;
       break;
diff --git a/gcc/config/i386/immintrin.h b/gcc/config/i386/immintrin.h
index 025334027eb..c8e37507088 100644
--- a/gcc/config/i386/immintrin.h
+++ b/gcc/config/i386/immintrin.h
@@ -152,4 +152,7 @@
 #include <avx10_2-512bf16intrin.h>
+#include <avx10_2satcvtintrin.h>
+
+#include <avx10_2-512satcvtintrin.h>
 #endif /* _IMMINTRIN_H_INCLUDED */
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index d7d99c6359f..0de94187e69 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -235,6 +235,18 @@
   UNSPEC_VGETMANTPBF16
   UNSPEC_VFPCLASSPBF16
   UNSPEC_VCOMSBF16
+  UNSPEC_VCVTNEBF162IBS
+  UNSPEC_VCVTNEBF162IUBS
+  UNSPEC_VCVTPH2IBS
+  UNSPEC_VCVTPH2IUBS
+  UNSPEC_VCVTPS2IBS
+  UNSPEC_VCVTPS2IUBS
+  UNSPEC_VCVTTNEBF162IBS
+  UNSPEC_VCVTTNEBF162IUBS
+  UNSPEC_VCVTTPH2IBS
+  UNSPEC_VCVTTPH2IUBS
+  UNSPEC_VCVTTPS2IBS
+  UNSPEC_VCVTTPS2IUBS
 ])

 (define_c_enum "unspecv" [
@@ -32197,3 +32209,101 @@
   "TARGET_AVX10_2_256"
   "vcomsbf16\t{%1, %0|%0, %1}"
   [(set_attr "prefix" "evex")])
+
+(define_int_iterator UNSPEC_CVTNE_BF16_IBS_ITER
+  [UNSPEC_VCVTNEBF162IBS
+   UNSPEC_VCVTNEBF162IUBS
+   UNSPEC_VCVTTNEBF162IBS
+   UNSPEC_VCVTTNEBF162IUBS])
+
+(define_int_attr sat_cvt_sign_prefix
+  [(UNSPEC_VCVTNEBF162IBS "")
+   (UNSPEC_VCVTNEBF162IUBS "u")
+   (UNSPEC_VCVTTNEBF162IBS "")
+   (UNSPEC_VCVTTNEBF162IUBS "u")
+   (UNSPEC_VCVTPH2IBS "")
+   (UNSPEC_VCVTPH2IUBS "u")
+   (UNSPEC_VCVTTPH2IBS "")
+   (UNSPEC_VCVTTPH2IUBS "u")
+   (UNSPEC_VCVTPS2IBS "")
+   (UNSPEC_VCVTPS2IUBS "u")
+   (UNSPEC_VCVTTPS2IBS "")
+   (UNSPEC_VCVTTPS2IUBS "u")])
+
+
+(define_int_attr sat_cvt_trunc_prefix
+  [(UNSPEC_VCVTNEBF162IBS "")
+   (UNSPEC_VCVTNEBF162IUBS "")
+   (UNSPEC_VCVTTNEBF162IBS "t")
+   (UNSPEC_VCVTTNEBF162IUBS "t")])
+
+(define_insn "avx10_2_cvt<sat_cvt_trunc_prefix>nebf162i<sat_cvt_sign_prefix>bs<mode><mask_name>"
+  [(set (match_operand:<sseintvecmode> 0 "register_operand" "=v")
+	(unspec:<sseintvecmode>
+	  [(match_operand:VBF_AVX10_2 1 "vector_operand" "vm")]
+	  UNSPEC_CVTNE_BF16_IBS_ITER))]
+  "TARGET_AVX10_2_256"
+  "vcvt<sat_cvt_trunc_prefix>nebf162i<sat_cvt_sign_prefix>bs\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+  [(set_attr "type" "ssecvt")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_int_iterator UNSPEC_CVT_PH_IBS_ITER
+  [UNSPEC_VCVTPH2IBS
+   UNSPEC_VCVTPH2IUBS])
+
+(define_insn "avx10_2_cvtph2i<sat_cvt_sign_prefix>bs<mode><mask_name><round_name>"
+  [(set (match_operand:<sseintvecmode> 0 "register_operand" "=v")
+	(unspec:<sseintvecmode>
+	  [(match_operand:VHF_AVX10_2 1 "<round_nimm_predicate>" "<round_constraint>")]
+	  UNSPEC_CVT_PH_IBS_ITER))]
+  "TARGET_AVX10_2_256 && <round_mode_condition>"
+  "vcvtph2i<sat_cvt_sign_prefix>bs\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
+  [(set_attr "type" "ssecvt")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_int_iterator UNSPEC_CVTT_PH_IBS_ITER
+  [UNSPEC_VCVTTPH2IBS
+   UNSPEC_VCVTTPH2IUBS])
+
+(define_insn "avx10_2_cvttph2i<sat_cvt_sign_prefix>bs<mode><mask_name><round_saeonly_name>"
+  [(set (match_operand:<sseintvecmode> 0 "register_operand" "=v")
+	(unspec:<sseintvecmode>
+	  [(match_operand:VHF_AVX10_2 1 "<round_saeonly_nimm_predicate>" "<round_saeonly_constraint>")]
+	  UNSPEC_CVTT_PH_IBS_ITER))]
+  "TARGET_AVX10_2_256 && <round_saeonly_mode_condition>"
+  "vcvttph2i<sat_cvt_sign_prefix>bs\t{<round_saeonly_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_saeonly_mask_op2>}"
+  [(set_attr "type" "ssecvt")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_int_iterator UNSPEC_CVT_PS_IBS_ITER
+  [UNSPEC_VCVTPS2IBS
+   UNSPEC_VCVTPS2IUBS])
+
+(define_insn "avx10_2_cvtps2i<sat_cvt_sign_prefix>bs<mode><mask_name><round_name>"
+  [(set (match_operand:<sseintvecmode> 0 "register_operand" "=v")
+	(unspec:<sseintvecmode>
+	  [(match_operand:VF1_AVX10_2 1 "<round_nimm_predicate>" "<round_constraint>")]
+	  UNSPEC_CVT_PS_IBS_ITER))]
+  "TARGET_AVX10_2_256 && <round_mode_condition>"
+  "vcvtps2i<sat_cvt_sign_prefix>bs\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
+  [(set_attr "type" "ssecvt")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_int_iterator UNSPEC_CVTT_PS_IBS_ITER
+  [UNSPEC_VCVTTPS2IBS
+   UNSPEC_VCVTTPS2IUBS])
+
+(define_insn "avx10_2_cvttps2i<sat_cvt_sign_prefix>bs<mode><mask_name><round_saeonly_name>"
+  [(set (match_operand:<sseintvecmode> 0 "register_operand" "=v")
+	(unspec:<sseintvecmode>
+	  [(match_operand:VF1_AVX10_2 1 "<round_saeonly_nimm_predicate>" "<round_saeonly_constraint>")]
+	  UNSPEC_CVTT_PS_IBS_ITER))]
+  "TARGET_AVX10_2_256 && <round_saeonly_mode_condition>"
+  "vcvttps2i<sat_cvt_sign_prefix>bs\t{<round_saeonly_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_saeonly_mask_op2>}"
+  [(set_attr "type" "ssecvt")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c
index df4cfdfff8d..be2fb5ae15a 100644
--- a/gcc/testsuite/gcc.target/i386/avx-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx-1.c
@@ -1035,6 +1035,26 @@
 #define __builtin_ia32_cmppbf16256_mask(A, B, C, D) __builtin_ia32_cmppbf16256_mask(A, B, 1, D)
 #define __builtin_ia32_cmppbf16128_mask(A, B, C, D) __builtin_ia32_cmppbf16128_mask(A, B, 1, D)
 
+/* avx10_2-512satcvtintrin.h */
+#define __builtin_ia32_cvtph2ibs512_mask_round(A, B, C, D) __builtin_ia32_cvtph2ibs512_mask_round(A, B, C, 8)
+#define __builtin_ia32_cvtph2iubs512_mask_round(A, B, C, D) __builtin_ia32_cvtph2iubs512_mask_round(A, B, C, 8) +#define __builtin_ia32_cvtps2ibs512_mask_round(A, B, C, D) __builtin_ia32_cvtps2ibs512_mask_round(A, B, C, 8) +#define __builtin_ia32_cvtps2iubs512_mask_round(A, B, C, D) __builtin_ia32_cvtps2iubs512_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttph2ibs512_mask_round(A, B, C, D) __builtin_ia32_cvttph2ibs512_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttph2iubs512_mask_round(A, B, C, D) __builtin_ia32_cvttph2iubs512_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttps2ibs512_mask_round(A, B, C, D) __builtin_ia32_cvttps2ibs512_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttps2iubs512_mask_round(A, B, C, D) __builtin_ia32_cvttps2iubs512_mask_round(A, B, C, 8) + +/* avx10_2satcvtintrin.h */ +#define __builtin_ia32_cvtph2ibs256_mask_round(A, B, C, D) __builtin_ia32_cvtph2ibs256_mask_round(A, B, C, 8) +#define __builtin_ia32_cvtph2iubs256_mask_round(A, B, C, D) __builtin_ia32_cvtph2iubs256_mask_round(A, B, C, 8) +#define __builtin_ia32_cvtps2ibs256_mask_round(A, B, C, D) __builtin_ia32_cvtps2ibs256_mask_round(A, B, C, 8) +#define __builtin_ia32_cvtps2iubs256_mask_round(A, B, C, D) __builtin_ia32_cvtps2iubs256_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttph2ibs256_mask_round(A, B, C, D) __builtin_ia32_cvttph2ibs256_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttph2iubs256_mask_round(A, B, C, D) __builtin_ia32_cvttph2iubs256_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttps2ibs256_mask_round(A, B, C, D) __builtin_ia32_cvttps2ibs256_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttps2iubs256_mask_round(A, B, C, D) __builtin_ia32_cvttps2iubs256_mask_round(A, B, C, 8) + #include #include #include diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-satcvt-1.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-satcvt-1.c new file mode 100644 index 00000000000..84826c0fe5a --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-satcvt-1.c @@ -0,0 +1,100 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx10.2 -mavx10.2-512" } */ +/* { dg-final { scan-assembler-times "vcvtph2ibs\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2ibs\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2ibs\[ \\t\]+\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2iubs\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2iubs\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2iubs\[ \\t\]+\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2ibs\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2ibs\[ \\t\]+\{sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2ibs\[ \\t\]+\{sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2iubs\[ 
\\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2iubs\[ \\t\]+\{sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2iubs\[ \\t\]+\{sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtps2ibs\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtps2ibs\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtps2ibs\[ \\t\]+\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtps2iubs\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtps2iubs\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtps2iubs\[ \\t\]+\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttps2ibs\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttps2ibs\[ \\t\]+\{sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttps2ibs\[ \\t\]+\{sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttps2iubs\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttps2iubs\[ \\t\]+\{sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttps2iubs\[ \\t\]+\{sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtnebf162ibs\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtnebf162ibs\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtnebf162ibs\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtnebf162iubs\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtnebf162iubs\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtnebf162iubs\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttnebf162ibs\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttnebf162ibs\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttnebf162ibs\[ 
\\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttnebf162iubs\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttnebf162iubs\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttnebf162iubs\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m512 x; +volatile __m512h xh; +volatile __m512i xi; +volatile __m512bh xbh; +volatile __mmask8 m8; +volatile __mmask16 m16; +volatile __mmask32 m32; + +void extern +avx10_2_test (void) +{ + xi = _mm512_ipcvt_roundph_epi16 (xh, 4); + xi = _mm512_mask_ipcvt_roundph_epi16 (xi, m32, xh, 8); + xi = _mm512_maskz_ipcvt_roundph_epi16 (m32, xh, 11); + + xi = _mm512_ipcvt_roundph_epu16 (xh, 4); + xi = _mm512_mask_ipcvt_roundph_epu16 (xi, m32, xh, 8); + xi = _mm512_maskz_ipcvt_roundph_epu16 (m32, xh, 11); + + xi = _mm512_ipcvtt_roundph_epi16 (xh, 4); + xi = _mm512_mask_ipcvtt_roundph_epi16 (xi, m32, xh, 8); + xi = _mm512_maskz_ipcvtt_roundph_epi16 (m32, xh, 8); + + xi = _mm512_ipcvtt_roundph_epu16 (xh, 4); + xi = _mm512_mask_ipcvtt_roundph_epu16 (xi, m32, xh, 8); + xi = _mm512_maskz_ipcvtt_roundph_epu16 (m32, xh, 8); + + xi = _mm512_ipcvt_roundps_epi32 (x, 4); + xi = _mm512_mask_ipcvt_roundps_epi32 (xi, m16, x, 8); + xi = _mm512_maskz_ipcvt_roundps_epi32 (m16, x, 11); + + xi = _mm512_ipcvt_roundps_epu32 (x, 4); + xi = _mm512_mask_ipcvt_roundps_epu32 (xi, m16, x, 8); + xi = _mm512_maskz_ipcvt_roundps_epu32 (m16, x, 11); + + xi = _mm512_ipcvtt_roundps_epi32 (x, 4); + xi = _mm512_mask_ipcvtt_roundps_epi32 (xi, m16, x, 8); + xi = _mm512_maskz_ipcvtt_roundps_epi32 (m16, x, 8); + + xi = _mm512_ipcvtt_roundps_epu32 (x, 4); + xi = _mm512_mask_ipcvtt_roundps_epu32 (xi, m16, x, 8); + xi = _mm512_maskz_ipcvtt_roundps_epu32 (m16, x, 8); + + xi = _mm512_ipcvtnebf16_epi16 (xbh); + xi = _mm512_mask_ipcvtnebf16_epi16 (xi, m32, xbh); + xi = _mm512_maskz_ipcvtnebf16_epi16 (m32, xbh); + + xi = _mm512_ipcvtnebf16_epu16 (xbh); + xi = _mm512_mask_ipcvtnebf16_epu16 (xi, m32, xbh); + xi = _mm512_maskz_ipcvtnebf16_epu16 (m32, xbh); + + xi = _mm512_ipcvttnebf16_epi16 (xbh); + xi = _mm512_mask_ipcvttnebf16_epi16 (xi, m32, xbh); + xi = _mm512_maskz_ipcvttnebf16_epi16 (m32, xbh); + + xi = _mm512_ipcvttnebf16_epu16 (xbh); + xi = _mm512_mask_ipcvttnebf16_epu16 (xi, m32, xbh); + xi = _mm512_maskz_ipcvttnebf16_epu16 (m32, xbh); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtnebf162ibs-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtnebf162ibs-2.c new file mode 100644 index 00000000000..489927ee065 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtnebf162ibs-2.c @@ -0,0 +1,69 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif +#include "avx10-helper.h" +#include +#include + +#define SRC_SIZE ((AVX512F_LEN) / 16) +#define DST_SIZE ((AVX512F_LEN) / 16) + +static void +CALC (__bf16 *s, short *r) +{ + int i; + unsigned char tmp; + + for (i = 0; i < SRC_SIZE; i++) + { + if (s[i] > SCHAR_MAX) + tmp = SCHAR_MAX; + else if (s[i] < SCHAR_MIN) + tmp = SCHAR_MIN; + else + tmp = nearbyint(_mm_cvtsbh_ss(s[i])); + r[i] = (unsigned short)tmp; + } +} + +void +TEST (void) +{ + UNION_TYPE 
(AVX512F_LEN, bf16_bf) s; + UNION_TYPE (AVX512F_LEN, i_w) res1, res2, res3; + MASK_TYPE mask = MASK_VALUE; + short res_ref[DST_SIZE] = { 0 }; + int i, sign = 1; + + for (i = 0; i < SRC_SIZE; i++) + { + s.a[i] = 1.23 * (i + 2) * sign; + sign = -sign; + } + + for (i = 0; i < DST_SIZE; i++) + res2.a[i] = DEFAULT_VALUE; + + res1.x = INTRINSIC (_ipcvtnebf16_epi16) (s.x); + res2.x = INTRINSIC (_mask_ipcvtnebf16_epi16) (res2.x, mask, s.x); + res3.x = INTRINSIC (_maskz_ipcvtnebf16_epi16) (mask, s.x); + + CALC (s.a, res_ref); + + if (UNION_CHECK (AVX512F_LEN, i_w) (res1, res_ref)) + abort (); + + MASK_MERGE (i_w) (res_ref, mask, SRC_SIZE); + if (UNION_CHECK (AVX512F_LEN, i_w) (res2, res_ref)) + abort (); + + MASK_ZERO (i_w) (res_ref, mask, SRC_SIZE); + if (UNION_CHECK (AVX512F_LEN, i_w) (res3, res_ref)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtnebf162iubs-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtnebf162iubs-2.c new file mode 100644 index 00000000000..f901f41ea8b --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtnebf162iubs-2.c @@ -0,0 +1,69 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif +#include "avx10-helper.h" +#include +#include + +#define SRC_SIZE ((AVX512F_LEN) / 16) +#define DST_SIZE ((AVX512F_LEN) / 16) + +static void +CALC (__bf16 *s, unsigned short *r) +{ + int i; + unsigned char tmp; + + for (i = 0; i < SRC_SIZE; i++) + { + if (s[i] > UCHAR_MAX) + tmp = UCHAR_MAX; + else if (s[i] < 0) + tmp = 0; + else + tmp = nearbyint(_mm_cvtsbh_ss(s[i])); + r[i] = (unsigned short)tmp; + } +} + +void +TEST (void) +{ + UNION_TYPE (AVX512F_LEN, bf16_bf) s; + UNION_TYPE (AVX512F_LEN, i_uw) res1, res2, res3; + MASK_TYPE mask = MASK_VALUE; + unsigned short res_ref[DST_SIZE] = { 0 }; + int i, sign = 1; + + for (i = 0; i < SRC_SIZE; i++) + { + s.a[i] = 1.23 * (i + 2) * sign; + sign = -sign; + } + + for (i = 0; i < DST_SIZE; i++) + res2.a[i] = DEFAULT_VALUE; + + res1.x = INTRINSIC (_ipcvtnebf16_epu16) (s.x); + res2.x = INTRINSIC (_mask_ipcvtnebf16_epu16) (res2.x, mask, s.x); + res3.x = INTRINSIC (_maskz_ipcvtnebf16_epu16) (mask, s.x); + + CALC (s.a, res_ref); + + if (UNION_CHECK (AVX512F_LEN, i_uw) (res1, res_ref)) + abort (); + + MASK_MERGE (i_uw) (res_ref, mask, SRC_SIZE); + if (UNION_CHECK (AVX512F_LEN, i_uw) (res2, res_ref)) + abort (); + + MASK_ZERO (i_uw) (res_ref, mask, SRC_SIZE); + if (UNION_CHECK (AVX512F_LEN, i_uw) (res3, res_ref)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtph2ibs-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtph2ibs-2.c new file mode 100644 index 00000000000..4ce8dd06bdc --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtph2ibs-2.c @@ -0,0 +1,74 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif +#include "avx10-helper.h" +#include + +#define SRC_SIZE ((AVX512F_LEN) / 16) +#define DST_SIZE ((AVX512F_LEN) / 16) + +static void +CALC (_Float16 *s, short *r) +{ + int i; + unsigned char tmp; + + for (i = 0; i < SRC_SIZE; i++) + { + if (s[i] > SCHAR_MAX) + tmp = SCHAR_MAX; + else if (s[i] < SCHAR_MIN) + tmp = SCHAR_MIN; + else + tmp = __builtin_nearbyintf16(s[i]); + r[i] = (unsigned short) tmp; + } +} + +void +TEST (void) +{ + UNION_TYPE (AVX512F_LEN, h) s; + UNION_TYPE 
(AVX512F_LEN, i_w) res1, res2, res3; + MASK_TYPE mask = MASK_VALUE; + short res_ref[DST_SIZE] = { 0 }; + int i, sign = 1; + + for (i = 0; i < SRC_SIZE; i++) + { + s.a[i] = 1.23 * (i + 2) * sign; + sign = -sign; + } + + for (i = 0; i < DST_SIZE; i++) + res2.a[i] = DEFAULT_VALUE; + +#if AVX512F_LEN == 128 + res1.x = INTRINSIC (_ipcvtph_epi16) (s.x); + res2.x = INTRINSIC (_mask_ipcvtph_epi16) (res2.x, mask, s.x); + res3.x = INTRINSIC (_maskz_ipcvtph_epi16) (mask, s.x); +#else + res1.x = INTRINSIC (_ipcvt_roundph_epi16) (s.x, 8); + res2.x = INTRINSIC (_mask_ipcvt_roundph_epi16) (res2.x, mask, s.x, 8); + res3.x = INTRINSIC (_maskz_ipcvt_roundph_epi16) (mask, s.x, 8); +#endif + + CALC (s.a, res_ref); + + if (UNION_CHECK (AVX512F_LEN, i_w) (res1, res_ref)) + abort (); + + MASK_MERGE (i_w) (res_ref, mask, SRC_SIZE); + if (UNION_CHECK (AVX512F_LEN, i_w) (res2, res_ref)) + abort (); + + MASK_ZERO (i_w) (res_ref, mask, SRC_SIZE); + if (UNION_CHECK (AVX512F_LEN, i_w) (res3, res_ref)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtph2iubs-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtph2iubs-2.c new file mode 100644 index 00000000000..f78d6c7ee9e --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtph2iubs-2.c @@ -0,0 +1,74 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif +#include "avx10-helper.h" +#include + +#define SRC_SIZE ((AVX512F_LEN) / 16) +#define DST_SIZE ((AVX512F_LEN) / 16) + +static void +CALC (_Float16 *s, short *r) +{ + int i; + unsigned char tmp; + + for (i = 0; i < SRC_SIZE; i++) + { + if (s[i] > UCHAR_MAX) + tmp = UCHAR_MAX; + else if (s[i] < 0) + tmp = 0; + else + tmp = __builtin_nearbyintf16(s[i]); + r[i] = (unsigned short) tmp; + } +} + +void +TEST (void) +{ + UNION_TYPE (AVX512F_LEN, h) s; + UNION_TYPE (AVX512F_LEN, i_w) res1, res2, res3; + MASK_TYPE mask = MASK_VALUE; + short res_ref[DST_SIZE] = { 0 }; + int i, sign = 1; + + for (i = 0; i < SRC_SIZE; i++) + { + s.a[i] = 1.23 * (i + 2) * sign; + sign = -sign; + } + + for (i = 0; i < DST_SIZE; i++) + res2.a[i] = DEFAULT_VALUE; + +#if AVX512F_LEN == 128 + res1.x = INTRINSIC (_ipcvtph_epu16) (s.x); + res2.x = INTRINSIC (_mask_ipcvtph_epu16) (res2.x, mask, s.x); + res3.x = INTRINSIC (_maskz_ipcvtph_epu16) (mask, s.x); +#else + res1.x = INTRINSIC (_ipcvt_roundph_epu16) (s.x, 8); + res2.x = INTRINSIC (_mask_ipcvt_roundph_epu16) (res2.x, mask, s.x, 8); + res3.x = INTRINSIC (_maskz_ipcvt_roundph_epu16) (mask, s.x, 8); +#endif + + CALC (s.a, res_ref); + + if (UNION_CHECK (AVX512F_LEN, i_w) (res1, res_ref)) + abort (); + + MASK_MERGE (i_w) (res_ref, mask, SRC_SIZE); + if (UNION_CHECK (AVX512F_LEN, i_w) (res2, res_ref)) + abort (); + + MASK_ZERO (i_w) (res_ref, mask, SRC_SIZE); + if (UNION_CHECK (AVX512F_LEN, i_w) (res3, res_ref)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtps2ibs-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtps2ibs-2.c new file mode 100644 index 00000000000..4852a8bd6dd --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtps2ibs-2.c @@ -0,0 +1,75 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif +#include "avx10-helper.h" +#include +#include + +#define SRC_SIZE ((AVX512F_LEN) / 32) +#define DST_SIZE ((AVX512F_LEN) / 32) + 
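
The CALC routine that follows encodes the reference semantics these run-time
tests check against the hardware result: round each float to the nearest
integer in the current rounding mode, saturate it to the signed 8-bit range,
and store the resulting byte zero-extended in the wider destination lane.
A minimal standalone sketch of that per-lane rule in plain ISO C (the helper
name ref_cvtps2ibs_lane is hypothetical, not part of the patch):

    #include <limits.h>
    #include <math.h>

    /* One lane of vcvtps2ibs: float -> signed byte with saturation,
       returned zero-extended as the 32-bit lane value.  */
    static unsigned int
    ref_cvtps2ibs_lane (float s)
    {
      int v;

      if (s > SCHAR_MAX)
        v = SCHAR_MAX;
      else if (s < SCHAR_MIN)
        v = SCHAR_MIN;
      else
        v = (int) nearbyint (s);  /* honours the current rounding mode */
      return (unsigned char) v;   /* byte result, zero-extended */
    }

For instance, 200.0f saturates to 127, while -3.7f rounds to -4 and comes
back as 252 (0xFC); that is also why the tests below funnel the result
through an unsigned char intermediate before widening it into res_ref.
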
+static void +CALC (float *s, int *r) +{ + int i; + unsigned char tmp; + + for (i = 0; i < SRC_SIZE; i++) + { + if (s[i] > SCHAR_MAX) + tmp = SCHAR_MAX; + else if (s[i] < SCHAR_MIN) + tmp = SCHAR_MIN; + else + tmp = nearbyint(s[i]); + r[i] = (unsigned int) tmp; + } +} + +void +TEST (void) +{ + UNION_TYPE (AVX512F_LEN, ) s; + UNION_TYPE (AVX512F_LEN, i_d) res1, res2, res3; + MASK_TYPE mask = MASK_VALUE; + int res_ref[DST_SIZE] = { 0 }; + int i, sign = 1; + + for (i = 0; i < SRC_SIZE; i++) + { + s.a[i] = 1.23 * (i + 2) * sign; + sign = -sign; + } + + for (i = 0; i < DST_SIZE; i++) + res2.a[i] = DEFAULT_VALUE; + +#if AVX512F_LEN == 128 + res1.x = INTRINSIC (_ipcvtps_epi32) (s.x); + res2.x = INTRINSIC (_mask_ipcvtps_epi32) (res2.x, mask, s.x); + res3.x = INTRINSIC (_maskz_ipcvtps_epi32) (mask, s.x); +#else + res1.x = INTRINSIC (_ipcvt_roundps_epi32) (s.x, 8); + res2.x = INTRINSIC (_mask_ipcvt_roundps_epi32) (res2.x, mask, s.x, 8); + res3.x = INTRINSIC (_maskz_ipcvt_roundps_epi32) (mask, s.x, 8); +#endif + + CALC (s.a, res_ref); + + if (UNION_CHECK (AVX512F_LEN, i_d) (res1, res_ref)) + abort (); + + MASK_MERGE (i_d) (res_ref, mask, SRC_SIZE); + if (UNION_CHECK (AVX512F_LEN, i_d) (res2, res_ref)) + abort (); + + MASK_ZERO (i_d) (res_ref, mask, SRC_SIZE); + if (UNION_CHECK (AVX512F_LEN, i_d) (res3, res_ref)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtps2iubs-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtps2iubs-2.c new file mode 100644 index 00000000000..6e0ad7d150c --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtps2iubs-2.c @@ -0,0 +1,73 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif +#include "avx10-helper.h" +#include +#include + +#define SRC_SIZE ((AVX512F_LEN) / 32) +#define DST_SIZE ((AVX512F_LEN) / 32) + +static void +CALC (float *s, int *r) +{ + int i; + + for (i = 0; i < SRC_SIZE; i++) + { + if (s[i] > UCHAR_MAX) + r[i] = UCHAR_MAX; + else if (s[i] < 0) + r[i] = 0; + else + r[i] = nearbyint(s[i]); + } +} + +void +TEST (void) +{ + UNION_TYPE (AVX512F_LEN, ) s; + UNION_TYPE (AVX512F_LEN, i_d) res1, res2, res3; + MASK_TYPE mask = MASK_VALUE; + int res_ref[DST_SIZE] = { 0 }; + int i, sign = 1; + + for (i = 0; i < SRC_SIZE; i++) + { + s.a[i] = 1.23 * (i + 2) * sign; + sign = -sign; + } + + for (i = 0; i < DST_SIZE; i++) + res2.a[i] = DEFAULT_VALUE; + +#if AVX512F_LEN == 128 + res1.x = INTRINSIC (_ipcvtps_epu32) (s.x); + res2.x = INTRINSIC (_mask_ipcvtps_epu32) (res2.x, mask, s.x); + res3.x = INTRINSIC (_maskz_ipcvtps_epu32) (mask, s.x); +#else + res1.x = INTRINSIC (_ipcvt_roundps_epu32) (s.x, 8); + res2.x = INTRINSIC (_mask_ipcvt_roundps_epu32) (res2.x, mask, s.x, 8); + res3.x = INTRINSIC (_maskz_ipcvt_roundps_epu32) (mask, s.x, 8); +#endif + + CALC (s.a, res_ref); + + if (UNION_CHECK (AVX512F_LEN, i_d) (res1, res_ref)) + abort (); + + MASK_MERGE (i_d) (res_ref, mask, SRC_SIZE); + if (UNION_CHECK (AVX512F_LEN, i_d) (res2, res_ref)) + abort (); + + MASK_ZERO (i_d) (res_ref, mask, SRC_SIZE); + if (UNION_CHECK (AVX512F_LEN, i_d) (res3, res_ref)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttnebf162ibs-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttnebf162ibs-2.c new file mode 100644 index 00000000000..23de8234aa6 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttnebf162ibs-2.c @@ -0,0 +1,69 @@ +/* { dg-do run } */ +/* { dg-options "-O2 
-mavx10.2-512" } */
+/* { dg-require-effective-target avx10_2_512 } */
+
+#ifndef AVX10_2
+#define AVX10_2
+#define AVX10_2_512
+#define AVX10_512BIT
+#endif
+#include "avx10-helper.h"
+#include <limits.h>
+#include <math.h>
+
+#define SRC_SIZE ((AVX512F_LEN) / 16)
+#define DST_SIZE ((AVX512F_LEN) / 16)
+
+static void
+CALC (__bf16 *s, short *r)
+{
+  int i;
+  unsigned char tmp;
+
+  for (i = 0; i < SRC_SIZE; i++)
+    {
+      if (s[i] > SCHAR_MAX)
+	tmp = SCHAR_MAX;
+      else if (s[i] < SCHAR_MIN)
+	tmp = SCHAR_MIN;
+      else
+	tmp = s[i];
+      r[i] = (unsigned short) tmp;
+    }
+}
+
+void
+TEST (void)
+{
+  UNION_TYPE (AVX512F_LEN, bf16_bf) s;
+  UNION_TYPE (AVX512F_LEN, i_w) res1, res2, res3;
+  MASK_TYPE mask = MASK_VALUE;
+  short res_ref[DST_SIZE] = { 0 };
+  int i, sign = 1;
+
+  for (i = 0; i < SRC_SIZE; i++)
+    {
+      s.a[i] = 1.23 * (i + 2) * sign;
+      sign = -sign;
+    }
+
+  for (i = 0; i < DST_SIZE; i++)
+    res2.a[i] = DEFAULT_VALUE;
+
+  res1.x = INTRINSIC (_ipcvttnebf16_epi16) (s.x);
+  res2.x = INTRINSIC (_mask_ipcvttnebf16_epi16) (res2.x, mask, s.x);
+  res3.x = INTRINSIC (_maskz_ipcvttnebf16_epi16) (mask, s.x);
+
+  CALC (s.a, res_ref);
+
+  if (UNION_CHECK (AVX512F_LEN, i_w) (res1, res_ref))
+    abort ();
+
+  MASK_MERGE (i_w) (res_ref, mask, SRC_SIZE);
+  if (UNION_CHECK (AVX512F_LEN, i_w) (res2, res_ref))
+    abort ();
+
+  MASK_ZERO (i_w) (res_ref, mask, SRC_SIZE);
+  if (UNION_CHECK (AVX512F_LEN, i_w) (res3, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttnebf162iubs-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttnebf162iubs-2.c
new file mode 100644
index 00000000000..858d8e73a00
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttnebf162iubs-2.c
@@ -0,0 +1,69 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2-512" } */
+/* { dg-require-effective-target avx10_2_512 } */
+
+#ifndef AVX10_2
+#define AVX10_2
+#define AVX10_2_512
+#define AVX10_512BIT
+#endif
+#include "avx10-helper.h"
+#include <limits.h>
+#include <math.h>
+
+#define SRC_SIZE ((AVX512F_LEN) / 16)
+#define DST_SIZE ((AVX512F_LEN) / 16)
+
+static void
+CALC (__bf16 *s, unsigned short *r)
+{
+  int i;
+  unsigned char tmp;
+
+  for (i = 0; i < SRC_SIZE; i++)
+    {
+      if (s[i] > UCHAR_MAX)
+	tmp = UCHAR_MAX;
+      else if (s[i] < 0)
+	tmp = 0;
+      else
+	tmp = s[i];
+      r[i] = (unsigned short) tmp;
+    }
+}
+
+void
+TEST (void)
+{
+  UNION_TYPE (AVX512F_LEN, bf16_bf) s;
+  UNION_TYPE (AVX512F_LEN, i_uw) res1, res2, res3;
+  MASK_TYPE mask = MASK_VALUE;
+  unsigned short res_ref[DST_SIZE] = { 0 };
+  int i, sign = 1;
+
+  for (i = 0; i < SRC_SIZE; i++)
+    {
+      s.a[i] = 1.23 * (i + 2) * sign;
+      sign = -sign;
+    }
+
+  for (i = 0; i < DST_SIZE; i++)
+    res2.a[i] = DEFAULT_VALUE;
+
+  res1.x = INTRINSIC (_ipcvttnebf16_epu16) (s.x);
+  res2.x = INTRINSIC (_mask_ipcvttnebf16_epu16) (res2.x, mask, s.x);
+  res3.x = INTRINSIC (_maskz_ipcvttnebf16_epu16) (mask, s.x);
+
+  CALC (s.a, res_ref);
+
+  if (UNION_CHECK (AVX512F_LEN, i_uw) (res1, res_ref))
+    abort ();
+
+  MASK_MERGE (i_uw) (res_ref, mask, SRC_SIZE);
+  if (UNION_CHECK (AVX512F_LEN, i_uw) (res2, res_ref))
+    abort ();
+
+  MASK_ZERO (i_uw) (res_ref, mask, SRC_SIZE);
+  if (UNION_CHECK (AVX512F_LEN, i_uw) (res3, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttph2ibs-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttph2ibs-2.c
new file mode 100644
index 00000000000..e2624fb64b2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttph2ibs-2.c
@@ -0,0 +1,74 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2-512" } */
+/* { dg-require-effective-target avx10_2_512 } */
+
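
This test, like the other -2.c run-time tests in the patch, is written once
against the INTRINSIC and UNION_TYPE macros from the avx10/avx512 helper
headers and is then re-materialised at 256-bit and 128-bit vector length by
the avx10_2-vcvt*-2.c wrappers further down, which redefine AVX512F_LEN and
re-include the file. A rough sketch of the token-pasting idea behind
INTRINSIC, with hypothetical macro names (the real helpers in
avx512f-helper.h differ in detail):

    /* Paste a vector-length infix into an intrinsic name so one test body
       expands to _mm512_*, _mm256_* or _mm_* calls.  */
    #define PASTE3_(a, b, c) a##b##c
    #define PASTE3(a, b, c) PASTE3_ (a, b, c)

    #if AVX512F_LEN == 512
    # define LEN_INFIX 512
    #elif AVX512F_LEN == 256
    # define LEN_INFIX 256
    #else                       /* AVX512F_LEN == 128 */
    # define LEN_INFIX          /* empty: plain _mm_ prefix */
    #endif

    #define INTRINSIC(name) PASTE3 (_mm, LEN_INFIX, name)

    /* INTRINSIC (_ipcvttph_epi16) (s.x) becomes
       _mm512_ipcvttph_epi16 (s.x) at AVX512F_LEN == 512 and
       _mm_ipcvttph_epi16 (s.x) at AVX512F_LEN == 128.  */

That is also why the body below switches on AVX512F_LEN == 128: the 512- and
256-bit forms take an explicit rounding/SAE immediate (the _round_ variants,
with 8 being _MM_FROUND_NO_EXC), while the 128-bit form does not.
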
+#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif +#include "avx10-helper.h" +#include + +#define SRC_SIZE ((AVX512F_LEN) / 16) +#define DST_SIZE ((AVX512F_LEN) / 16) + +static void +CALC (_Float16 *s, short *r) +{ + int i; + char tmp; + + for (i = 0; i < SRC_SIZE; i++) + { + if (s[i] > SCHAR_MAX) + tmp = SCHAR_MAX; + else if (s[i] < SCHAR_MIN) + tmp = SCHAR_MIN; + else + tmp = s[i]; + r[i] = (unsigned char) tmp; + } +} + +void +TEST (void) +{ + UNION_TYPE (AVX512F_LEN, h) s; + UNION_TYPE (AVX512F_LEN, i_w) res1, res2, res3; + MASK_TYPE mask = MASK_VALUE; + short res_ref[DST_SIZE] = { 0 }; + int i, sign = 1; + + for (i = 0; i < SRC_SIZE; i++) + { + s.a[i] = 1.23 * (i + 2) * sign; + sign = -sign; + } + + for (i = 0; i < DST_SIZE; i++) + res2.a[i] = DEFAULT_VALUE; + +#if AVX512F_LEN == 128 + res1.x = INTRINSIC (_ipcvttph_epi16) (s.x); + res2.x = INTRINSIC (_mask_ipcvttph_epi16) (res2.x, mask, s.x); + res3.x = INTRINSIC (_maskz_ipcvttph_epi16) (mask, s.x); +#else + res1.x = INTRINSIC (_ipcvtt_roundph_epi16) (s.x, 8); + res2.x = INTRINSIC (_mask_ipcvtt_roundph_epi16) (res2.x, mask, s.x, 8); + res3.x = INTRINSIC (_maskz_ipcvtt_roundph_epi16) (mask, s.x, 8); +#endif + + CALC (s.a, res_ref); + + if (UNION_CHECK (AVX512F_LEN, i_w) (res1, res_ref)) + abort (); + + MASK_MERGE (i_w) (res_ref, mask, SRC_SIZE); + if (UNION_CHECK (AVX512F_LEN, i_w) (res2, res_ref)) + abort (); + + MASK_ZERO (i_w) (res_ref, mask, SRC_SIZE); + if (UNION_CHECK (AVX512F_LEN, i_w) (res3, res_ref)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttph2iubs-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttph2iubs-2.c new file mode 100644 index 00000000000..d98a462c4b3 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttph2iubs-2.c @@ -0,0 +1,74 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif +#include "avx10-helper.h" +#include + +#define SRC_SIZE ((AVX512F_LEN) / 16) +#define DST_SIZE ((AVX512F_LEN) / 16) + +static void +CALC (_Float16 *s, short *r) +{ + int i; + unsigned char tmp; + + for (i = 0; i < SRC_SIZE; i++) + { + if (s[i] > UCHAR_MAX) + tmp = UCHAR_MAX; + else if (s[i] < 0) + tmp = 0; + else + tmp = s[i]; + r[i] = (unsigned short) tmp; + } +} + +void +TEST (void) +{ + UNION_TYPE (AVX512F_LEN, h) s; + UNION_TYPE (AVX512F_LEN, i_w) res1, res2, res3; + MASK_TYPE mask = MASK_VALUE; + short res_ref[DST_SIZE] = { 0 }; + int i, sign = 1; + + for (i = 0; i < SRC_SIZE; i++) + { + s.a[i] = 1.23 * (i + 2) * sign; + sign = -sign; + } + + for (i = 0; i < DST_SIZE; i++) + res2.a[i] = DEFAULT_VALUE; + +#if AVX512F_LEN == 128 + res1.x = INTRINSIC (_ipcvttph_epu16) (s.x); + res2.x = INTRINSIC (_mask_ipcvttph_epu16) (res2.x, mask, s.x); + res3.x = INTRINSIC (_maskz_ipcvttph_epu16) (mask, s.x); +#else + res1.x = INTRINSIC (_ipcvtt_roundph_epu16) (s.x, 8); + res2.x = INTRINSIC (_mask_ipcvtt_roundph_epu16) (res2.x, mask, s.x, 8); + res3.x = INTRINSIC (_maskz_ipcvtt_roundph_epu16) (mask, s.x, 8); +#endif + + CALC (s.a, res_ref); + + if (UNION_CHECK (AVX512F_LEN, i_w) (res1, res_ref)) + abort (); + + MASK_MERGE (i_w) (res_ref, mask, SRC_SIZE); + if (UNION_CHECK (AVX512F_LEN, i_w) (res2, res_ref)) + abort (); + + MASK_ZERO (i_w) (res_ref, mask, SRC_SIZE); + if (UNION_CHECK (AVX512F_LEN, i_w) (res3, res_ref)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttps2ibs-2.c 
b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttps2ibs-2.c new file mode 100644 index 00000000000..47136108a6b --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttps2ibs-2.c @@ -0,0 +1,75 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif +#include "avx10-helper.h" +#include +#include + +#define SRC_SIZE ((AVX512F_LEN) / 32) +#define DST_SIZE ((AVX512F_LEN) / 32) + +static void +CALC (float *s, int *r) +{ + int i; + unsigned char tmp; + + for (i = 0; i < SRC_SIZE; i++) + { + if (s[i] > SCHAR_MAX) + tmp = SCHAR_MAX; + else if (s[i] < SCHAR_MIN) + tmp = SCHAR_MIN; + else + tmp = s[i]; + r[i] = (unsigned int)tmp; + } +} + +void +TEST (void) +{ + UNION_TYPE (AVX512F_LEN, ) s; + UNION_TYPE (AVX512F_LEN, i_d) res1, res2, res3; + MASK_TYPE mask = MASK_VALUE; + int res_ref[DST_SIZE] = { 0 }; + int i, sign = 1; + + for (i = 0; i < SRC_SIZE; i++) + { + s.a[i] = 1.23 * (i + 2) * sign; + sign = -sign; + } + + for (i = 0; i < DST_SIZE; i++) + res2.a[i] = DEFAULT_VALUE; + +#if AVX512F_LEN == 128 + res1.x = INTRINSIC (_ipcvttps_epi32) (s.x); + res2.x = INTRINSIC (_mask_ipcvttps_epi32) (res2.x, mask, s.x); + res3.x = INTRINSIC (_maskz_ipcvttps_epi32) (mask, s.x); +#else + res1.x = INTRINSIC (_ipcvtt_roundps_epi32) (s.x, 8); + res2.x = INTRINSIC (_mask_ipcvtt_roundps_epi32) (res2.x, mask, s.x, 8); + res3.x = INTRINSIC (_maskz_ipcvtt_roundps_epi32) (mask, s.x, 8); +#endif + + CALC (s.a, res_ref); + + if (UNION_CHECK (AVX512F_LEN, i_d) (res1, res_ref)) + abort (); + + MASK_MERGE (i_d) (res_ref, mask, SRC_SIZE); + if (UNION_CHECK (AVX512F_LEN, i_d) (res2, res_ref)) + abort (); + + MASK_ZERO (i_d) (res_ref, mask, SRC_SIZE); + if (UNION_CHECK (AVX512F_LEN, i_d) (res3, res_ref)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttps2iubs-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttps2iubs-2.c new file mode 100644 index 00000000000..f753dd5a707 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttps2iubs-2.c @@ -0,0 +1,73 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif +#include "avx10-helper.h" +#include +#include + +#define SRC_SIZE ((AVX512F_LEN) / 32) +#define DST_SIZE ((AVX512F_LEN) / 32) + +static void +CALC (float *s, int *r) +{ + int i; + + for (i = 0; i < SRC_SIZE; i++) + { + if (s[i] > UCHAR_MAX) + r[i] = UCHAR_MAX; + else if (s[i] < 0) + r[i] = 0; + else + r[i] = s[i]; + } +} + +void +TEST (void) +{ + UNION_TYPE (AVX512F_LEN, ) s; + UNION_TYPE (AVX512F_LEN, i_d) res1, res2, res3; + MASK_TYPE mask = MASK_VALUE; + int res_ref[DST_SIZE] = { 0 }; + int i, sign = 1; + + for (i = 0; i < SRC_SIZE; i++) + { + s.a[i] = 1.23 * (i + 2) * sign; + sign = -sign; + } + + for (i = 0; i < DST_SIZE; i++) + res2.a[i] = DEFAULT_VALUE; + +#if AVX512F_LEN == 128 + res1.x = INTRINSIC (_ipcvttps_epu32) (s.x); + res2.x = INTRINSIC (_mask_ipcvttps_epu32) (res2.x, mask, s.x); + res3.x = INTRINSIC (_maskz_ipcvttps_epu32) (mask, s.x); +#else + res1.x = INTRINSIC (_ipcvtt_roundps_epu32) (s.x, 8); + res2.x = INTRINSIC (_mask_ipcvtt_roundps_epu32) (res2.x, mask, s.x, 8); + res3.x = INTRINSIC (_maskz_ipcvtt_roundps_epu32) (mask, s.x, 8); +#endif + + CALC (s.a, res_ref); + + if (UNION_CHECK (AVX512F_LEN, i_d) (res1, res_ref)) + abort (); + + MASK_MERGE (i_d) (res_ref, 
mask, SRC_SIZE); + if (UNION_CHECK (AVX512F_LEN, i_d) (res2, res_ref)) + abort (); + + MASK_ZERO (i_d) (res_ref, mask, SRC_SIZE); + if (UNION_CHECK (AVX512F_LEN, i_d) (res3, res_ref)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-satcvt-1.c b/gcc/testsuite/gcc.target/i386/avx10_2-satcvt-1.c new file mode 100644 index 00000000000..f04e3ecb642 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-satcvt-1.c @@ -0,0 +1,187 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-final { scan-assembler-times "vcvtph2ibs\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2ibs\[ \\t\]+\{rn-sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2ibs\[ \\t\]+\{rz-sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2iubs\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2iubs\[ \\t\]+\{rn-sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2iubs\[ \\t\]+\{rz-sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2ibs\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2ibs\[ \\t\]+\{sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2ibs\[ \\t\]+\{sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2iubs\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2iubs\[ \\t\]+\{sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2iubs\[ \\t\]+\{sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtps2ibs\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtps2ibs\[ \\t\]+\{rn-sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtps2ibs\[ \\t\]+\{rz-sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtps2iubs\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtps2iubs\[ \\t\]+\{rn-sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtps2iubs\[ \\t\]+\{rz-sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttps2ibs\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttps2ibs\[ \\t\]+\{sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { 
dg-final { scan-assembler-times "vcvttps2ibs\[ \\t\]+\{sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttps2iubs\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttps2iubs\[ \\t\]+\{sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttps2iubs\[ \\t\]+\{sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtnebf162ibs\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtnebf162ibs\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtnebf162ibs\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtnebf162iubs\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtnebf162iubs\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtnebf162iubs\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttnebf162ibs\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttnebf162ibs\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttnebf162ibs\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttnebf162iubs\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttnebf162iubs\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttnebf162iubs\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2ibs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2ibs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2ibs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2iubs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2iubs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtph2iubs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2ibs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2ibs\[ 
\\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2ibs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2iubs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2iubs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2iubs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtps2ibs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtps2ibs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtps2ibs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtps2iubs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtps2iubs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtps2iubs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttps2ibs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttps2ibs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttps2ibs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttps2iubs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttps2iubs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttps2iubs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtnebf162ibs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtnebf162ibs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtnebf162ibs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtnebf162iubs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtnebf162iubs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtnebf162iubs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttnebf162ibs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times 
"vcvttnebf162ibs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttnebf162ibs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttnebf162iubs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttnebf162iubs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttnebf162iubs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m128 hx; +volatile __m128i hxi; +volatile __m128h hxh; +volatile __m128bh hxbh; +volatile __m256 x; +volatile __m256h xh; +volatile __m256i xi; +volatile __m256bh xbh; +volatile __mmask8 m8; +volatile __mmask16 m16; + +void extern +avx10_2_test (void) +{ + xi = _mm256_ipcvt_roundph_epi16 (xh, 4); + xi = _mm256_mask_ipcvt_roundph_epi16 (xi, m16, xh, 8); + xi = _mm256_maskz_ipcvt_roundph_epi16 (m16, xh, 11); + + xi = _mm256_ipcvt_roundph_epu16 (xh, 4); + xi = _mm256_mask_ipcvt_roundph_epu16 (xi, m16, xh, 8); + xi = _mm256_maskz_ipcvt_roundph_epu16 (m16, xh, 11); + + xi = _mm256_ipcvtt_roundph_epi16 (xh, 4); + xi = _mm256_mask_ipcvtt_roundph_epi16 (xi, m16, xh, 8); + xi = _mm256_maskz_ipcvtt_roundph_epi16 (m16, xh, 8); + + xi = _mm256_ipcvtt_roundph_epu16 (xh, 4); + xi = _mm256_mask_ipcvtt_roundph_epu16 (xi, m16, xh, 8); + xi = _mm256_maskz_ipcvtt_roundph_epu16 (m16, xh, 8); + + xi = _mm256_ipcvt_roundps_epi32 (x, 4); + xi = _mm256_mask_ipcvt_roundps_epi32 (xi, m8, x, 8); + xi = _mm256_maskz_ipcvt_roundps_epi32 (m8, x, 11); + + xi = _mm256_ipcvt_roundps_epu32 (x, 4); + xi = _mm256_mask_ipcvt_roundps_epu32 (xi, m8, x, 8); + xi = _mm256_maskz_ipcvt_roundps_epu32 (m8, x, 11); + + xi = _mm256_ipcvtt_roundps_epi32 (x, 4); + xi = _mm256_mask_ipcvtt_roundps_epi32 (xi, m8, x, 8); + xi = _mm256_maskz_ipcvtt_roundps_epi32 (m8, x, 8); + + xi = _mm256_ipcvtt_roundps_epu32 (x, 4); + xi = _mm256_mask_ipcvtt_roundps_epu32 (xi, m8, x, 8); + xi = _mm256_maskz_ipcvtt_roundps_epu32 (m8, x, 8); + + xi = _mm256_ipcvtnebf16_epi16 (xbh); + xi = _mm256_mask_ipcvtnebf16_epi16 (xi, m16, xbh); + xi = _mm256_maskz_ipcvtnebf16_epi16 (m16, xbh); + + xi = _mm256_ipcvtnebf16_epu16 (xbh); + xi = _mm256_mask_ipcvtnebf16_epu16 (xi, m16, xbh); + xi = _mm256_maskz_ipcvtnebf16_epu16 (m16, xbh); + + xi = _mm256_ipcvttnebf16_epi16 (xbh); + xi = _mm256_mask_ipcvttnebf16_epi16 (xi, m16, xbh); + xi = _mm256_maskz_ipcvttnebf16_epi16 (m16, xbh); + + xi = _mm256_ipcvttnebf16_epu16 (xbh); + xi = _mm256_mask_ipcvttnebf16_epu16 (xi, m16, xbh); + xi = _mm256_maskz_ipcvttnebf16_epu16 (m16, xbh); + + hxi = _mm_ipcvtph_epi16 (hxh); + hxi = _mm_mask_ipcvtph_epi16 (hxi, m8, hxh); + hxi = _mm_maskz_ipcvtph_epi16 (m8, hxh); + + hxi = _mm_ipcvtph_epu16 (hxh); + hxi = _mm_mask_ipcvtph_epu16 (hxi, m8, hxh); + hxi = _mm_maskz_ipcvtph_epu16 (m8, hxh); + + hxi = _mm_ipcvttph_epi16 (hxh); + hxi = _mm_mask_ipcvttph_epi16 (hxi, m8, hxh); + hxi = _mm_maskz_ipcvttph_epi16 (m8, hxh); + + hxi = _mm_ipcvttph_epu16 (hxh); + hxi = _mm_mask_ipcvttph_epu16 (hxi, m8, hxh); + hxi = _mm_maskz_ipcvttph_epu16 (m8, hxh); + + hxi = _mm_ipcvtps_epi32 (hx); + hxi = _mm_mask_ipcvtps_epi32 (hxi, m8, hx); + hxi = _mm_maskz_ipcvtps_epi32 (m8, hx); + + hxi = _mm_ipcvtps_epu32 (hx); + hxi = _mm_mask_ipcvtps_epu32 (hxi, m8, hx); + 
hxi = _mm_maskz_ipcvtps_epu32 (m8, hx); + + hxi = _mm_ipcvttps_epi32 (hx); + hxi = _mm_mask_ipcvttps_epi32 (hxi, m8, hx); + hxi = _mm_maskz_ipcvttps_epi32 (m8, hx); + + hxi = _mm_ipcvttps_epu32 (hx); + hxi = _mm_mask_ipcvttps_epu32 (hxi, m8, hx); + hxi = _mm_maskz_ipcvttps_epu32 (m8, hx); + + hxi = _mm_ipcvtnebf16_epi16 (hxbh); + hxi = _mm_mask_ipcvtnebf16_epi16 (hxi, m8, hxbh); + hxi = _mm_maskz_ipcvtnebf16_epi16 (m8, hxbh); + + hxi = _mm_ipcvtnebf16_epu16 (hxbh); + hxi = _mm_mask_ipcvtnebf16_epu16 (hxi, m8, hxbh); + hxi = _mm_maskz_ipcvtnebf16_epu16 (m8, hxbh); + + hxi = _mm_ipcvttnebf16_epi16 (hxbh); + hxi = _mm_mask_ipcvttnebf16_epi16 (hxi, m8, hxbh); + hxi = _mm_maskz_ipcvttnebf16_epi16 (m8, hxbh); + + hxi = _mm_ipcvttnebf16_epu16 (hxbh); + hxi = _mm_mask_ipcvttnebf16_epu16 (hxi, m8, hxbh); + hxi = _mm_maskz_ipcvttnebf16_epu16 (m8, hxbh); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcvtnebf162ibs-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtnebf162ibs-2.c new file mode 100644 index 00000000000..130f19b253a --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtnebf162ibs-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvtnebf162ibs-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvtnebf162ibs-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcvtnebf162iubs-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtnebf162iubs-2.c new file mode 100644 index 00000000000..af6ec54236f --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtnebf162iubs-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvtnebf162iubs-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvtnebf162iubs-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcvtph2ibs-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtph2ibs-2.c new file mode 100644 index 00000000000..9954fc14c35 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtph2ibs-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvtph2ibs-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvtph2ibs-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcvtph2iubs-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtph2iubs-2.c new file mode 100644 index 00000000000..9bb25190af0 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtph2iubs-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvtph2iubs-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvtph2iubs-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcvtps2ibs-2.c 
b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtps2ibs-2.c new file mode 100644 index 00000000000..ce76ed780eb --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcvtps2ibs-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvtps2ibs-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvtps2ibs-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcvttnebf162ibs-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vcvttnebf162ibs-2.c new file mode 100644 index 00000000000..8eaf7bcff26 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcvttnebf162ibs-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvttnebf162ibs-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvttnebf162ibs-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcvttnebf162iubs-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vcvttnebf162iubs-2.c new file mode 100644 index 00000000000..c12964a4357 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcvttnebf162iubs-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvttnebf162iubs-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvttnebf162iubs-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcvttph2ibs-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vcvttph2ibs-2.c new file mode 100644 index 00000000000..e8a4abb83a4 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcvttph2ibs-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvttph2ibs-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvttph2ibs-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcvttph2iubs-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vcvttph2iubs-2.c new file mode 100644 index 00000000000..3683ed0dc10 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcvttph2iubs-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvttph2iubs-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvttph2iubs-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcvttps2ibs-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vcvttps2ibs-2.c new file mode 100644 index 00000000000..4f8d4580172 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcvttps2ibs-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { 
dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvttps2ibs-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvttps2ibs-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcvttps2iubs-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vcvttps2iubs-2.c new file mode 100644 index 00000000000..defd38540bf --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcvttps2iubs-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvttps2iubs-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvttps2iubs-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx512f-helper.h b/gcc/testsuite/gcc.target/i386/avx512f-helper.h index b61c03b4781..b49ff061f78 100644 --- a/gcc/testsuite/gcc.target/i386/avx512f-helper.h +++ b/gcc/testsuite/gcc.target/i386/avx512f-helper.h @@ -48,6 +48,7 @@ MAKE_MASK_MERGE(i_uw, unsigned short) MAKE_MASK_MERGE(bf16_uw, unsigned short) MAKE_MASK_MERGE(i_ud, unsigned int) MAKE_MASK_MERGE(i_uq, unsigned long long) +MAKE_MASK_MERGE(bf16_bf, __bf16) #define MASK_MERGE(TYPE) merge_masking_##TYPE @@ -74,6 +75,7 @@ MAKE_MASK_ZERO(i_uw, unsigned short) MAKE_MASK_ZERO(bf16_uw, unsigned short) MAKE_MASK_ZERO(i_ud, unsigned int) MAKE_MASK_ZERO(i_uq, unsigned long long) +MAKE_MASK_ZERO(bf16_bf, __bf16) #define MASK_ZERO(TYPE) zero_masking_##TYPE diff --git a/gcc/testsuite/gcc.target/i386/m512-check.h b/gcc/testsuite/gcc.target/i386/m512-check.h index bdc682d63bb..f22dda2113f 100644 --- a/gcc/testsuite/gcc.target/i386/m512-check.h +++ b/gcc/testsuite/gcc.target/i386/m512-check.h @@ -73,6 +73,12 @@ typedef union unsigned short a[32]; } union512bf16_uw; +typedef union +{ + __m512bh x; + __bf16 a[32]; +} union512bf16_bf; + typedef union { __m128h x; @@ -97,6 +103,18 @@ typedef union unsigned short a[16]; } union256bf16_uw; +typedef union +{ + __m128bh x; + __bf16 a[8]; +} union128bf16_bf; + +typedef union +{ + __m256bh x; + __bf16 a[16]; +} union256bf16_bf; + #define CHECK_ROUGH_EXP(UNION_TYPE, VALUE_TYPE, FMT) \ static int \ __attribute__((noinline, unused)) \ @@ -176,9 +194,12 @@ CHECK_ROUGH_EXP (union256h, _Float16, "%f") #if defined(AVX512BF16) CHECK_EXP (union512bf16_uw, unsigned short, "%d") +CHECK_EXP (union512bf16_bf, __bf16, "%f") #endif #if defined(AVX512BF16) CHECK_EXP (union128bf16_uw, unsigned short, "%d") CHECK_EXP (union256bf16_uw, unsigned short, "%d") +CHECK_EXP (union128bf16_bf, __bf16, "%f") +CHECK_EXP (union256bf16_bf, __bf16, "%f") #endif diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c index e92d04af3f5..5669fa1aa00 100644 --- a/gcc/testsuite/gcc.target/i386/sse-13.c +++ b/gcc/testsuite/gcc.target/i386/sse-13.c @@ -1043,4 +1043,24 @@ #define __builtin_ia32_cmppbf16256_mask(A, B, C, D) __builtin_ia32_cmppbf16256_mask(A, B, 1, D) #define __builtin_ia32_cmppbf16128_mask(A, B, C, D) __builtin_ia32_cmppbf16128_mask(A, B, 1, D) +/* avx10_2-512satcvtintrin.h */ +#define __builtin_ia32_cvtph2ibs512_mask_round(A, B, C, D) __builtin_ia32_cvtph2ibs512_mask_round(A, B, C, 8) +#define __builtin_ia32_cvtph2iubs512_mask_round(A, B, C, D) __builtin_ia32_cvtph2iubs512_mask_round(A, B, C, 8) 
+#define __builtin_ia32_cvtps2ibs512_mask_round(A, B, C, D) __builtin_ia32_cvtps2ibs512_mask_round(A, B, C, 8) +#define __builtin_ia32_cvtps2iubs512_mask_round(A, B, C, D) __builtin_ia32_cvtps2iubs512_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttph2ibs512_mask_round(A, B, C, D) __builtin_ia32_cvttph2ibs512_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttph2iubs512_mask_round(A, B, C, D) __builtin_ia32_cvttph2iubs512_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttps2ibs512_mask_round(A, B, C, D) __builtin_ia32_cvttps2ibs512_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttps2iubs512_mask_round(A, B, C, D) __builtin_ia32_cvttps2iubs512_mask_round(A, B, C, 8) + +/* avx10_2satcvtintrin.h */ +#define __builtin_ia32_cvtph2ibs256_mask_round(A, B, C, D) __builtin_ia32_cvtph2ibs256_mask_round(A, B, C, 8) +#define __builtin_ia32_cvtph2iubs256_mask_round(A, B, C, D) __builtin_ia32_cvtph2iubs256_mask_round(A, B, C, 8) +#define __builtin_ia32_cvtps2ibs256_mask_round(A, B, C, D) __builtin_ia32_cvtps2ibs256_mask_round(A, B, C, 8) +#define __builtin_ia32_cvtps2iubs256_mask_round(A, B, C, D) __builtin_ia32_cvtps2iubs256_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttph2ibs256_mask_round(A, B, C, D) __builtin_ia32_cvttph2ibs256_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttph2iubs256_mask_round(A, B, C, D) __builtin_ia32_cvttph2iubs256_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttps2ibs256_mask_round(A, B, C, D) __builtin_ia32_cvttps2ibs256_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttps2iubs256_mask_round(A, B, C, D) __builtin_ia32_cvttps2iubs256_mask_round(A, B, C, 8) + #include diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c index 49a82d8a2d5..550d2633b78 100644 --- a/gcc/testsuite/gcc.target/i386/sse-14.c +++ b/gcc/testsuite/gcc.target/i386/sse-14.c @@ -1431,3 +1431,55 @@ test_2 (_mm256_cmp_pbh_mask, __mmask16, __m256bh, __m256bh, 1) test_2 (_mm_cmp_pbh_mask, __mmask8, __m128bh, __m128bh, 1) test_3 (_mm256_mask_cmp_pbh_mask, __mmask16, __mmask16, __m256bh, __m256bh, 1) test_3 (_mm_mask_cmp_pbh_mask, __mmask8, __mmask8, __m128bh, __m128bh, 1) + +/* avx10_2-512satcvtintrin.h */ +test_1 (_mm512_ipcvt_roundph_epi16, __m512i, __m512h, 8) +test_1 (_mm512_ipcvt_roundph_epu16, __m512i, __m512h, 8) +test_1 (_mm512_ipcvt_roundps_epi32, __m512i, __m512, 8) +test_1 (_mm512_ipcvt_roundps_epu32, __m512i, __m512, 8) +test_1 (_mm512_ipcvtt_roundph_epi16, __m512i, __m512h, 8) +test_1 (_mm512_ipcvtt_roundph_epu16, __m512i, __m512h, 8) +test_1 (_mm512_ipcvtt_roundps_epi32, __m512i, __m512, 8) +test_1 (_mm512_ipcvtt_roundps_epu32, __m512i, __m512, 8) +test_2 (_mm512_maskz_ipcvt_roundph_epi16, __m512i, __mmask32, __m512h, 8) +test_2 (_mm512_maskz_ipcvt_roundph_epu16, __m512i, __mmask32, __m512h, 8) +test_2 (_mm512_maskz_ipcvt_roundps_epi32, __m512i, __mmask16, __m512, 8) +test_2 (_mm512_maskz_ipcvt_roundps_epu32, __m512i, __mmask16, __m512, 8) +test_2 (_mm512_maskz_ipcvtt_roundph_epi16, __m512i, __mmask32, __m512h, 8) +test_2 (_mm512_maskz_ipcvtt_roundph_epu16, __m512i, __mmask32, __m512h, 8) +test_2 (_mm512_maskz_ipcvtt_roundps_epi32, __m512i, __mmask16, __m512, 8) +test_2 (_mm512_maskz_ipcvtt_roundps_epu32, __m512i, __mmask16, __m512, 8) +test_3 (_mm512_mask_ipcvt_roundph_epi16, __m512i, __m512i, __mmask32, __m512h, 8) +test_3 (_mm512_mask_ipcvt_roundph_epu16, __m512i, __m512i, __mmask32, __m512h, 8) +test_3 (_mm512_mask_ipcvt_roundps_epi32, __m512i, __m512i, __mmask16, __m512, 8) +test_3 (_mm512_mask_ipcvt_roundps_epu32, __m512i, __m512i, 
__mmask16, __m512, 8) +test_3 (_mm512_mask_ipcvtt_roundph_epi16, __m512i, __m512i, __mmask32, __m512h, 8) +test_3 (_mm512_mask_ipcvtt_roundph_epu16, __m512i, __m512i, __mmask32, __m512h, 8) +test_3 (_mm512_mask_ipcvtt_roundps_epi32, __m512i, __m512i, __mmask16, __m512, 8) +test_3 (_mm512_mask_ipcvtt_roundps_epu32, __m512i, __m512i, __mmask16, __m512, 8) + +/* avx10_2satcvtintrin.h */ +test_1 (_mm256_ipcvt_roundph_epi16, __m256i, __m256h, 8) +test_1 (_mm256_ipcvt_roundph_epu16, __m256i, __m256h, 8) +test_1 (_mm256_ipcvt_roundps_epi32, __m256i, __m256, 8) +test_1 (_mm256_ipcvt_roundps_epu32, __m256i, __m256, 8) +test_1 (_mm256_ipcvtt_roundph_epi16, __m256i, __m256h, 8) +test_1 (_mm256_ipcvtt_roundph_epu16, __m256i, __m256h, 8) +test_1 (_mm256_ipcvtt_roundps_epi32, __m256i, __m256, 8) +test_1 (_mm256_ipcvtt_roundps_epu32, __m256i, __m256, 8) +test_2 (_mm256_maskz_ipcvt_roundph_epi16, __m256i, __mmask16, __m256h, 8) +test_2 (_mm256_maskz_ipcvt_roundph_epu16, __m256i, __mmask16, __m256h, 8) +test_2 (_mm256_maskz_ipcvt_roundps_epi32, __m256i, __mmask8, __m256, 8) +test_2 (_mm256_maskz_ipcvt_roundps_epu32, __m256i, __mmask8, __m256, 8) +test_2 (_mm256_maskz_ipcvtt_roundph_epi16, __m256i, __mmask16, __m256h, 8) +test_2 (_mm256_maskz_ipcvtt_roundph_epu16, __m256i, __mmask16, __m256h, 8) +test_2 (_mm256_maskz_ipcvtt_roundps_epi32, __m256i, __mmask8, __m256, 8) +test_2 (_mm256_maskz_ipcvtt_roundps_epu32, __m256i, __mmask8, __m256, 8) +test_3 (_mm256_mask_ipcvt_roundph_epi16, __m256i, __m256i, __mmask16, __m256h, 8) +test_3 (_mm256_mask_ipcvt_roundph_epu16, __m256i, __m256i, __mmask16, __m256h, 8) +test_3 (_mm256_mask_ipcvt_roundps_epi32, __m256i, __m256i, __mmask8, __m256, 8) +test_3 (_mm256_mask_ipcvt_roundps_epu32, __m256i, __m256i, __mmask8, __m256, 8) +test_3 (_mm256_mask_ipcvtt_roundph_epi16, __m256i, __m256i, __mmask16, __m256h, 8) +test_3 (_mm256_mask_ipcvtt_roundph_epu16, __m256i, __m256i, __mmask16, __m256h, 8) +test_3 (_mm256_mask_ipcvtt_roundps_epi32, __m256i, __m256i, __mmask8, __m256, 8) +test_3 (_mm256_mask_ipcvtt_roundps_epu32, __m256i, __m256i, __mmask8, __m256, 8) diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c index 193057a4719..ba67ee26914 100644 --- a/gcc/testsuite/gcc.target/i386/sse-22.c +++ b/gcc/testsuite/gcc.target/i386/sse-22.c @@ -1470,3 +1470,55 @@ test_2 (_mm256_cmp_pbh_mask, __mmask16, __m256bh, __m256bh, 1) test_2 (_mm_cmp_pbh_mask, __mmask8, __m128bh, __m128bh, 1) test_3 (_mm256_mask_cmp_pbh_mask, __mmask16, __mmask16, __m256bh, __m256bh, 1) test_3 (_mm_mask_cmp_pbh_mask, __mmask8, __mmask8, __m128bh, __m128bh, 1) + +/* avx10_2-512satcvtintrin.h */ +test_1 (_mm512_ipcvt_roundph_epi16, __m512i, __m512h, 8) +test_1 (_mm512_ipcvt_roundph_epu16, __m512i, __m512h, 8) +test_1 (_mm512_ipcvt_roundps_epi32, __m512i, __m512, 8) +test_1 (_mm512_ipcvt_roundps_epu32, __m512i, __m512, 8) +test_1 (_mm512_ipcvtt_roundph_epi16, __m512i, __m512h, 8) +test_1 (_mm512_ipcvtt_roundph_epu16, __m512i, __m512h, 8) +test_1 (_mm512_ipcvtt_roundps_epi32, __m512i, __m512, 8) +test_1 (_mm512_ipcvtt_roundps_epu32, __m512i, __m512, 8) +test_2 (_mm512_maskz_ipcvt_roundph_epi16, __m512i, __mmask32, __m512h, 8) +test_2 (_mm512_maskz_ipcvt_roundph_epu16, __m512i, __mmask32, __m512h, 8) +test_2 (_mm512_maskz_ipcvt_roundps_epi32, __m512i, __mmask16, __m512, 8) +test_2 (_mm512_maskz_ipcvt_roundps_epu32, __m512i, __mmask16, __m512, 8) +test_2 (_mm512_maskz_ipcvtt_roundph_epi16, __m512i, __mmask32, __m512h, 8) +test_2 (_mm512_maskz_ipcvtt_roundph_epu16, __m512i, 
__mmask32, __m512h, 8) +test_2 (_mm512_maskz_ipcvtt_roundps_epi32, __m512i, __mmask16, __m512, 8) +test_2 (_mm512_maskz_ipcvtt_roundps_epu32, __m512i, __mmask16, __m512, 8) +test_3 (_mm512_mask_ipcvt_roundph_epi16, __m512i, __m512i, __mmask32, __m512h, 8) +test_3 (_mm512_mask_ipcvt_roundph_epu16, __m512i, __m512i, __mmask32, __m512h, 8) +test_3 (_mm512_mask_ipcvt_roundps_epi32, __m512i, __m512i, __mmask16, __m512, 8) +test_3 (_mm512_mask_ipcvt_roundps_epu32, __m512i, __m512i, __mmask16, __m512, 8) +test_3 (_mm512_mask_ipcvtt_roundph_epi16, __m512i, __m512i, __mmask32, __m512h, 8) +test_3 (_mm512_mask_ipcvtt_roundph_epu16, __m512i, __m512i, __mmask32, __m512h, 8) +test_3 (_mm512_mask_ipcvtt_roundps_epi32, __m512i, __m512i, __mmask16, __m512, 8) +test_3 (_mm512_mask_ipcvtt_roundps_epu32, __m512i, __m512i, __mmask16, __m512, 8) + +/* avx10_2satcvtintrin.h */ +test_1 (_mm256_ipcvt_roundph_epi16, __m256i, __m256h, 8) +test_1 (_mm256_ipcvt_roundph_epu16, __m256i, __m256h, 8) +test_1 (_mm256_ipcvt_roundps_epi32, __m256i, __m256, 8) +test_1 (_mm256_ipcvt_roundps_epu32, __m256i, __m256, 8) +test_1 (_mm256_ipcvtt_roundph_epi16, __m256i, __m256h, 8) +test_1 (_mm256_ipcvtt_roundph_epu16, __m256i, __m256h, 8) +test_1 (_mm256_ipcvtt_roundps_epi32, __m256i, __m256, 8) +test_1 (_mm256_ipcvtt_roundps_epu32, __m256i, __m256, 8) +test_2 (_mm256_maskz_ipcvt_roundph_epi16, __m256i, __mmask16, __m256h, 8) +test_2 (_mm256_maskz_ipcvt_roundph_epu16, __m256i, __mmask16, __m256h, 8) +test_2 (_mm256_maskz_ipcvt_roundps_epi32, __m256i, __mmask8, __m256, 8) +test_2 (_mm256_maskz_ipcvt_roundps_epu32, __m256i, __mmask8, __m256, 8) +test_2 (_mm256_maskz_ipcvtt_roundph_epi16, __m256i, __mmask16, __m256h, 8) +test_2 (_mm256_maskz_ipcvtt_roundph_epu16, __m256i, __mmask16, __m256h, 8) +test_2 (_mm256_maskz_ipcvtt_roundps_epi32, __m256i, __mmask8, __m256, 8) +test_2 (_mm256_maskz_ipcvtt_roundps_epu32, __m256i, __mmask8, __m256, 8) +test_3 (_mm256_mask_ipcvt_roundph_epi16, __m256i, __m256i, __mmask16, __m256h, 8) +test_3 (_mm256_mask_ipcvt_roundph_epu16, __m256i, __m256i, __mmask16, __m256h, 8) +test_3 (_mm256_mask_ipcvt_roundps_epi32, __m256i, __m256i, __mmask8, __m256, 8) +test_3 (_mm256_mask_ipcvt_roundps_epu32, __m256i, __m256i, __mmask8, __m256, 8) +test_3 (_mm256_mask_ipcvtt_roundph_epi16, __m256i, __m256i, __mmask16, __m256h, 8) +test_3 (_mm256_mask_ipcvtt_roundph_epu16, __m256i, __m256i, __mmask16, __m256h, 8) +test_3 (_mm256_mask_ipcvtt_roundps_epi32, __m256i, __m256i, __mmask8, __m256, 8) +test_3 (_mm256_mask_ipcvtt_roundps_epu32, __m256i, __m256i, __mmask8, __m256, 8) diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c index a33eb9945dd..7e8b5d01871 100644 --- a/gcc/testsuite/gcc.target/i386/sse-23.c +++ b/gcc/testsuite/gcc.target/i386/sse-23.c @@ -1017,6 +1017,26 @@ #define __builtin_ia32_cmppbf16256_mask(A, B, C, D) __builtin_ia32_cmppbf16256_mask(A, B, 1, D) #define __builtin_ia32_cmppbf16128_mask(A, B, C, D) __builtin_ia32_cmppbf16128_mask(A, B, 1, D) +/* avx10_2-512satcvtintrin.h */ +#define __builtin_ia32_cvtph2ibs512_mask_round(A, B, C, D) __builtin_ia32_cvtph2ibs512_mask_round(A, B, C, 8) +#define __builtin_ia32_cvtph2iubs512_mask_round(A, B, C, D) __builtin_ia32_cvtph2iubs512_mask_round(A, B, C, 8) +#define __builtin_ia32_cvtps2ibs512_mask_round(A, B, C, D) __builtin_ia32_cvtps2ibs512_mask_round(A, B, C, 8) +#define __builtin_ia32_cvtps2iubs512_mask_round(A, B, C, D) __builtin_ia32_cvtps2iubs512_mask_round(A, B, C, 8) +#define 
__builtin_ia32_cvttph2ibs512_mask_round(A, B, C, D) __builtin_ia32_cvttph2ibs512_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttph2iubs512_mask_round(A, B, C, D) __builtin_ia32_cvttph2iubs512_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttps2ibs512_mask_round(A, B, C, D) __builtin_ia32_cvttps2ibs512_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttps2iubs512_mask_round(A, B, C, D) __builtin_ia32_cvttps2iubs512_mask_round(A, B, C, 8) + +/* avx10_2satcvtintrin.h */ +#define __builtin_ia32_cvtph2ibs256_mask_round(A, B, C, D) __builtin_ia32_cvtph2ibs256_mask_round(A, B, C, 8) +#define __builtin_ia32_cvtph2iubs256_mask_round(A, B, C, D) __builtin_ia32_cvtph2iubs256_mask_round(A, B, C, 8) +#define __builtin_ia32_cvtps2ibs256_mask_round(A, B, C, D) __builtin_ia32_cvtps2ibs256_mask_round(A, B, C, 8) +#define __builtin_ia32_cvtps2iubs256_mask_round(A, B, C, D) __builtin_ia32_cvtps2iubs256_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttph2ibs256_mask_round(A, B, C, D) __builtin_ia32_cvttph2ibs256_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttph2iubs256_mask_round(A, B, C, D) __builtin_ia32_cvttph2iubs256_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttps2ibs256_mask_round(A, B, C, D) __builtin_ia32_cvttps2ibs256_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttps2iubs256_mask_round(A, B, C, D) __builtin_ia32_cvttps2iubs256_mask_round(A, B, C, 8) + #pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,sha,xsavec,xsaves,clflushopt,clwb,mwaitx,clzero,pku,sgx,rdpid,gfni,vpclmulqdq,pconfig,wbnoinvd,enqcmd,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avxifma,avxvnniint8,avxneconvert,cmpccxadd,amx-fp16,prefetchi,raoint,amx-complex,avxvnniint16,sm3,sha512,sm4,avx10.2-512") #include
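A note on the wrapper macros above: the rounding operand of the *_mask_round builtins must be an integer constant at expansion time, so these compile-only tests redefine each builtin to discard the caller's rounding argument and substitute the literal 8 (_MM_FROUND_NO_EXC). A minimal sketch of the constraint the wrappers work around, with a function name of our own invention and assuming -mavx10.2-512 is in effect:

#include <immintrin.h>

/* Accepted: the rounding control reaches the builtin as a literal.  */
__m512i
cvt_ok (__m512 x)
{
  return _mm512_ipcvtt_roundps_epi32 (x, _MM_FROUND_NO_EXC);
}

/* Without the wrappers, a call like the following would be rejected,
   since a variable 'r' cannot satisfy the builtin's immediate-operand
   check:
     __m512i cvt_bad (__m512 x, int r)
     { return _mm512_ipcvtt_roundps_epi32 (x, r); }  */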
From patchwork Mon Aug 19 08:56:52 2024
X-Patchwork-Submitter: Haochen Jiang
X-Patchwork-Id: 1973729
From: Haochen Jiang
To: gcc-patches@gcc.gnu.org
Cc: hongtao.liu@intel.com, zewei.mo@pitt.edu, ubizjak@gmail.com, "Hu, Lin1"
Subject: [PATCH 08/12] [PATCH 2/2] AVX10.2: Support saturating convert instructions
Date: Mon, 19 Aug 2024 01:56:52 -0700
Message-ID: <20240819085717.193256-9-haochen.jiang@intel.com>
X-Mailer: git-send-email 2.43.5
In-Reply-To: <20240819085717.193256-1-haochen.jiang@intel.com>
References: <20240819085717.193256-1-haochen.jiang@intel.com>
MIME-Version: 1.0

From: "Hu, Lin1"
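As a quick, hedged illustration of the user-visible API this patch adds (the example is ours, not part of the submission, and assumes -mavx10.2-512): the saturating forms clamp out-of-range results to the destination type's limits instead of returning the 0x80000000 integer-indefinite value that the legacy truncating conversions produce.

#include <immintrin.h>

/* Truncating, saturating conversion of eight doubles to 32-bit
   integers; 1e300 clamps to INT32_MAX and -1e300 to INT32_MIN.  */
__m256i
cvtts_all (__m512d v)
{
  return _mm512_cvtts_roundpd_epi32 (v, _MM_FROUND_NO_EXC);
}

/* Merge-masked variant: lanes with a clear bit in 'm' keep the
   corresponding lane of 'w'.  */
__m256i
cvtts_masked (__m256i w, __mmask8 m, __m512d v)
{
  return _mm512_mask_cvtts_roundpd_epi32 (w, m, v, _MM_FROUND_NO_EXC);
}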
Add new builtins. * config/i386/sse.md (avx10_2_vcvttpd2dqs): New. (avx10_2_vcvttpd2qqs): Ditto. (avx10_2_vcvttps2dqs): Ditto. (avx10_2_vcvttps2qqs): Ditto. (avx10_2_vcvttsd2sis): Ditto. (avx10_2_vcvttss2sis): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add macros. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-14.c: Ditto. * gcc.target/i386/sse-22.c: Ditto. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/avx10_2-satcvt-1.c: New test. * gcc.target/i386/avx10_2-satcvt-512-1.c: Ditto. * gcc.target/i386/avx10_2-satcvt-512-vcvttpd2dqs-2.c: Ditto. * gcc.target/i386/avx10_2-satcvt-512-vcvttpd2qqs-2.c: Ditto. * gcc.target/i386/avx10_2-satcvt-512-vcvttpd2udqs-2.c: Ditto. * gcc.target/i386/avx10_2-satcvt-512-vcvttpd2uqqs-2.c: Ditto. * gcc.target/i386/avx10_2-satcvt-512-vcvttps2dqs-2.c: Ditto. * gcc.target/i386/avx10_2-satcvt-512-vcvttps2qqs-2.c: Ditto. * gcc.target/i386/avx10_2-satcvt-512-vcvttps2udqs-2.c: Ditto. * gcc.target/i386/avx10_2-satcvt-512-vcvttps2uqqs-2.c: Ditto. * gcc.target/i386/avx10_2-satcvt-vcvttpd2dqs-2.c: Ditto. * gcc.target/i386/avx10_2-satcvt-vcvttpd2qqs-2.c: Ditto. * gcc.target/i386/avx10_2-satcvt-vcvttpd2udqs-2.c: Ditto. * gcc.target/i386/avx10_2-satcvt-vcvttpd2uqqs-2.c: Ditto. * gcc.target/i386/avx10_2-satcvt-vcvttps2dqs-2.c: Ditto. * gcc.target/i386/avx10_2-satcvt-vcvttps2qqs-2.c: Ditto. * gcc.target/i386/avx10_2-satcvt-vcvttps2udqs-2.c: Ditto. * gcc.target/i386/avx10_2-satcvt-vcvttps2uqqs-2.c: Ditto. * gcc.target/i386/avx10_2-satcvt-vcvttsd2sis-2.c: Ditto. * gcc.target/i386/avx10_2-satcvt-vcvttsd2usis-2.c: Ditto. * gcc.target/i386/avx10_2-satcvt-vcvttss2sis-2.c: Ditto. * gcc.target/i386/avx10_2-satcvt-vcvttss2usis-2.c: Ditto. --- gcc/config/i386/avx10_2-512satcvtintrin.h | 456 +++++++ gcc/config/i386/avx10_2satcvtintrin.h | 1055 ++++++++++++++++- gcc/config/i386/i386-builtin.def | 33 + gcc/config/i386/sse.md | 83 +- gcc/testsuite/gcc.target/i386/avx-1.c | 26 + .../gcc.target/i386/avx10_2-512-satcvt-1.c | 59 + .../i386/avx10_2-512-vcvttpd2dqs-2.c | 72 ++ .../i386/avx10_2-512-vcvttpd2qqs-2.c | 72 ++ .../i386/avx10_2-512-vcvttpd2udqs-2.c | 72 ++ .../i386/avx10_2-512-vcvttpd2uqqs-2.c | 72 ++ .../i386/avx10_2-512-vcvttps2dqs-2.c | 72 ++ .../i386/avx10_2-512-vcvttps2qqs-2.c | 73 ++ .../i386/avx10_2-512-vcvttps2udqs-2.c | 72 ++ .../i386/avx10_2-512-vcvttps2uqqs-2.c | 72 ++ .../gcc.target/i386/avx10_2-satcvt-1.c | 138 +++ .../gcc.target/i386/avx10_2-vcvttpd2dqs-2.c | 16 + .../gcc.target/i386/avx10_2-vcvttpd2qqs-2.c | 16 + .../gcc.target/i386/avx10_2-vcvttpd2udqs-2.c | 16 + .../gcc.target/i386/avx10_2-vcvttpd2uqqs-2.c | 16 + .../gcc.target/i386/avx10_2-vcvttps2dqs-2.c | 16 + .../gcc.target/i386/avx10_2-vcvttps2qqs-2.c | 16 + .../gcc.target/i386/avx10_2-vcvttps2udqs-2.c | 16 + .../gcc.target/i386/avx10_2-vcvttps2uqqs-2.c | 16 + .../gcc.target/i386/avx10_2-vcvttsd2sis-2.c | 47 + .../gcc.target/i386/avx10_2-vcvttsd2usis-2.c | 47 + .../gcc.target/i386/avx10_2-vcvttss2sis-2.c | 47 + .../gcc.target/i386/avx10_2-vcvttss2usis-2.c | 46 + gcc/testsuite/gcc.target/i386/sse-13.c | 26 + gcc/testsuite/gcc.target/i386/sse-14.c | 58 + gcc/testsuite/gcc.target/i386/sse-22.c | 58 + gcc/testsuite/gcc.target/i386/sse-23.c | 26 + 31 files changed, 2870 insertions(+), 40 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttpd2dqs-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttpd2qqs-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttpd2udqs-2.c create mode 100644 
gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttpd2uqqs-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttps2dqs-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttps2qqs-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttps2udqs-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttps2uqqs-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcvttpd2dqs-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcvttpd2qqs-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcvttpd2udqs-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcvttpd2uqqs-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcvttps2dqs-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcvttps2qqs-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcvttps2udqs-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcvttps2uqqs-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcvttsd2sis-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcvttsd2usis-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcvttss2sis-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcvttss2usis-2.c diff --git a/gcc/config/i386/avx10_2-512satcvtintrin.h b/gcc/config/i386/avx10_2-512satcvtintrin.h index 4286458c413..d625a644948 100644 --- a/gcc/config/i386/avx10_2-512satcvtintrin.h +++ b/gcc/config/i386/avx10_2-512satcvtintrin.h @@ -438,6 +438,286 @@ _mm512_maskz_ipcvtt_roundps_epu32 (__mmask16 __U, __m512 __A, const int __R) (__mmask16) __U, __R); } + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtts_roundpd_epi32 (__m512d __A, const int __R) +{ + return (__m256i) + __builtin_ia32_cvttpd2dqs512_mask_round ((__v8df) __A, + (__v8si) + _mm256_undefined_si256 (), + (__mmask8) -1, + __R); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtts_roundpd_epi32 (__m256i __W, __mmask8 __U, __m512d __A, + const int __R) +{ + return (__m256i) __builtin_ia32_cvttpd2dqs512_mask_round ((__v8df) __A, + (__v8si) __W, + (__mmask8) __U, + __R); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtts_roundpd_epi32 (__mmask8 __U, __m512d __A, const int __R) +{ + return + (__m256i) __builtin_ia32_cvttpd2dqs512_mask_round ((__v8df) __A, + (__v8si) + _mm256_setzero_si256 (), + (__mmask8) __U, + __R); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtts_roundpd_epi64 (__m512d __A, const int __R) +{ + return (__m512i) + __builtin_ia32_cvttpd2qqs512_mask_round ((__v8df) __A, + (__v8di) + _mm512_undefined_si512 (), + (__mmask8) -1, + __R); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtts_roundpd_epi64 (__m512i __W, __mmask8 __U, __m512d __A, + const int __R) +{ + return (__m512i) __builtin_ia32_cvttpd2qqs512_mask_round ((__v8df) __A, + (__v8di) __W, + (__mmask8) __U, + __R); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtts_roundpd_epi64 (__mmask8 __U, __m512d __A, const int __R) +{ + return + (__m512i) __builtin_ia32_cvttpd2qqs512_mask_round ((__v8df) __A, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) __U, + __R); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, 
__artificial__)) +_mm512_cvtts_roundpd_epu32 (__m512d __A, const int __R) +{ + return (__m256i) + __builtin_ia32_cvttpd2udqs512_mask_round ((__v8df) __A, + (__v8si) + _mm256_undefined_si256 (), + (__mmask8) -1, + __R); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtts_roundpd_epu32 (__m256i __W, __mmask8 __U, __m512d __A, + const int __R) +{ + return (__m256i) __builtin_ia32_cvttpd2udqs512_mask_round ((__v8df) __A, + (__v8si) __W, + (__mmask8) __U, + __R); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtts_roundpd_epu32 (__mmask8 __U, __m512d __A, const int __R) +{ + return + (__m256i) __builtin_ia32_cvttpd2udqs512_mask_round ((__v8df) __A, + (__v8si) + _mm256_setzero_si256 (), + (__mmask8) __U, + __R); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtts_roundpd_epu64 (__m512d __A, const int __R) +{ + return (__m512i) + __builtin_ia32_cvttpd2uqqs512_mask_round ((__v8df) __A, + (__v8di) + _mm512_undefined_si512 (), + (__mmask8) -1, + __R); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtts_roundpd_epu64 (__m512i __W, __mmask8 __U, __m512d __A, + const int __R) +{ + return (__m512i) __builtin_ia32_cvttpd2uqqs512_mask_round ((__v8df) __A, + (__v8di) __W, + (__mmask8) __U, + __R); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtts_roundpd_epu64 (__mmask8 __U, __m512d __A, const int __R) +{ + return (__m512i) + __builtin_ia32_cvttpd2uqqs512_mask_round ((__v8df) __A, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) __U, + __R); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtts_roundps_epi32 (__m512 __A, const int __R) +{ + return (__m512i) + __builtin_ia32_cvttps2dqs512_mask_round ((__v16sf) __A, + (__v16si) + _mm512_undefined_si512 (), + (__mmask16) -1, + __R); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtts_roundps_epi32 (__m512i __W, __mmask16 __U, __m512 __A, + const int __R) +{ + return (__m512i) __builtin_ia32_cvttps2dqs512_mask_round ((__v16sf) __A, + (__v16si) __W, + (__mmask16) __U, + __R); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtts_roundps_epi32 (__mmask16 __U, __m512 __A, const int __R) +{ + return + (__m512i) __builtin_ia32_cvttps2dqs512_mask_round ((__v16sf) __A, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) __U, + __R); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtts_roundps_epi64 (__m256 __A, const int __R) +{ + return (__m512i) + __builtin_ia32_cvttps2qqs512_mask_round ((__v8sf) __A, + (__v8di) + _mm512_undefined_si512 (), + (__mmask8) -1, + __R); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtts_roundps_epi64 (__m512i __W, __mmask8 __U, __m256 __A, + const int __R) +{ + return (__m512i) __builtin_ia32_cvttps2qqs512_mask_round ((__v8sf) __A, + (__v8di) __W, + (__mmask8) __U, + __R); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtts_roundps_epi64 (__mmask8 __U, __m256 __A, const int __R) +{ + return + (__m512i) 
__builtin_ia32_cvttps2qqs512_mask_round ((__v8sf) __A, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) __U, + __R); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtts_roundps_epu32 (__m512 __A, const int __R) +{ + return (__m512i) + __builtin_ia32_cvttps2udqs512_mask_round ((__v16sf) __A, + (__v16si) + _mm512_undefined_si512 (), + (__mmask16) -1, + __R); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtts_roundps_epu32 (__m512i __W, __mmask16 __U, __m512 __A, + const int __R) +{ + return (__m512i) __builtin_ia32_cvttps2udqs512_mask_round ((__v16sf) __A, + (__v16si) __W, + (__mmask16) __U, + __R); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtts_roundps_epu32 (__mmask16 __U, __m512 __A, const int __R) +{ + return (__m512i) + __builtin_ia32_cvttps2udqs512_mask_round ((__v16sf) __A, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) __U, + __R); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtts_roundps_epu64 (__m256 __A, const int __R) +{ + return (__m512i) + __builtin_ia32_cvttps2uqqs512_mask_round ((__v8sf) __A, + (__v8di) + _mm512_undefined_si512 (), + (__mmask8) -1, + __R); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtts_roundps_epu64 (__m512i __W, __mmask8 __U, __m256 __A, + const int __R) +{ + return (__m512i) __builtin_ia32_cvttps2uqqs512_mask_round ((__v8sf) __A, + (__v8di) __W, + (__mmask8) __U, + __R); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtts_roundps_epu64 (__mmask8 __U, __m256 __A, const int __R) +{ + return + (__m512i) __builtin_ia32_cvttps2uqqs512_mask_round ((__v8sf) __A, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) __U, + __R); +} #else #define _mm512_ipcvt_roundph_epi16(A, R) \ ((__m512i) \ @@ -614,6 +894,182 @@ _mm512_maskz_ipcvtt_roundps_epu32 (__mmask16 __U, __m512 __A, const int __R) (_mm512_setzero_si512 ()), \ (__mmask16) (U), \ (R))) + +#define _mm512_cvtts_roundpd_epi32(A, R) \ + ((__m256i) \ + __builtin_ia32_cvttpd2dqs512_mask_round ((__v8df) (A), \ + (__v8si) \ + (_mm256_undefined_si256 ()), \ + (__mmask8) (-1), \ + (R))) + +#define _mm512_mask_cvtts_roundpd_epi32(W, U, A, R) \ + ((__m256i) __builtin_ia32_cvttpd2dqs512_mask_round ((__v8df) (A), \ + (__v8si) (W), \ + (__mmask8) (U), \ + (R))) + +#define _mm512_maskz_cvtts_roundpd_epi32(U, A, R) \ + ((__m256i) \ + __builtin_ia32_cvttpd2dqs512_mask_round ((__v8df) (A), \ + (__v8si) \ + (_mm256_setzero_si256 ()), \ + (__mmask8) (U), \ + (R))) + +#define _mm512_cvtts_roundpd_epi64(A, R) \ + ((__m512i) \ + __builtin_ia32_cvttpd2qqs512_mask_round ((__v8df) (A), \ + (__v8di) \ + (_mm512_undefined_si512 ()), \ + (__mmask8) (-1), \ + (R))) + +#define _mm512_mask_cvtts_roundpd_epi64(W, U, A, R) \ + ((__m512i) __builtin_ia32_cvttpd2qqs512_mask_round ((__v8df) (A), \ + (__v8di) (W), \ + (__mmask8) (U), \ + (R))) + +#define _mm512_maskz_cvtts_roundpd_epi64(U, A, R) \ + ((__m512i) \ + __builtin_ia32_cvttpd2qqs512_mask_round ((__v8df) (A), \ + (__v8di) \ + (_mm512_setzero_si512 ()), \ + (__mmask8) (U), \ + (R))) + +#define _mm512_cvtts_roundpd_epu32(A, R) \ + ((__m256i) \ + __builtin_ia32_cvttpd2udqs512_mask_round ((__v8df) (A), \ + (__v8si) \ + (_mm256_undefined_si256 ()), \ + (__mmask8) (-1), \ + (R))) + +#define 
_mm512_mask_cvtts_roundpd_epu32(W, U, A, R) \ + ((__m256i) __builtin_ia32_cvttpd2udqs512_mask_round ((__v8df) (A), \ + (__v8si) (W), \ + (__mmask8) (U), \ + (R))) + +#define _mm512_maskz_cvtts_roundpd_epu32(U, A, R) \ + ((__m256i) \ + __builtin_ia32_cvttpd2udqs512_mask_round ((__v8df) (A), \ + (__v8si) \ + (_mm256_setzero_si256 ()), \ + (__mmask8) (U), \ + (R))) + +#define _mm512_cvtts_roundpd_epu64(A, R) \ + ((__m512i) \ + __builtin_ia32_cvttpd2uqqs512_mask_round ((__v8df) (A), \ + (__v8di) \ + (_mm512_undefined_si512 ()), \ + (__mmask8) (-1), \ + (R))) + +#define _mm512_mask_cvtts_roundpd_epu64(W, U, A, R) \ + ((__m512i) __builtin_ia32_cvttpd2uqqs512_mask_round ((__v8df) (A), \ + (__v8di) (W), \ + (__mmask8) (U), \ + (R))) + +#define _mm512_maskz_cvtts_roundpd_epu64(U, A, R) \ + ((__m512i) \ + __builtin_ia32_cvttpd2uqqs512_mask_round ((__v8df) (A), \ + (__v8di) \ + (_mm512_setzero_si512 ()), \ + (__mmask8) (U), \ + (R))) + +#define _mm512_cvtts_roundps_epi32(A, R) \ + ((__m512i) \ + __builtin_ia32_cvttps2dqs512_mask_round ((__v16sf) (A), \ + (__v16si) \ + (_mm512_undefined_si512 ()), \ + (__mmask16) (-1), \ + (R))) + +#define _mm512_mask_cvtts_roundps_epi32(W, U, A, R) \ + ((__m512i) __builtin_ia32_cvttps2dqs512_mask_round ((__v16sf) (A), \ + (__v16si) (W), \ + (__mmask16) (U), \ + (R))) + +#define _mm512_maskz_cvtts_roundps_epi32(U, A, R) \ + ((__m512i) \ + __builtin_ia32_cvttps2dqs512_mask_round ((__v16sf) (A), \ + (__v16si) \ + (_mm512_setzero_si512 ()), \ + (__mmask16) (U), \ + (R))) + +#define _mm512_cvtts_roundps_epi64(A, R) \ + ((__m512i) \ + __builtin_ia32_cvttps2qqs512_mask_round ((__v8sf) (A), \ + (__v8di) \ + (_mm512_undefined_si512 ()), \ + (__mmask8) (-1), \ + (R))) + +#define _mm512_mask_cvtts_roundps_epi64(W, U, A, R) \ + ((__m512i) __builtin_ia32_cvttps2qqs512_mask_round ((__v8sf) (A), \ + (__v8di) (W), \ + (__mmask8) (U), \ + (R))) + +#define _mm512_maskz_cvtts_roundps_epi64(U, A, R) \ + ((__m512i) \ + __builtin_ia32_cvttps2qqs512_mask_round ((__v8sf) (A), \ + (__v8di) \ + (_mm512_setzero_si512 ()), \ + (__mmask8) (U), \ + (R))) + +#define _mm512_cvtts_roundps_epu32(A, R) \ + ((__m512i) \ + __builtin_ia32_cvttps2udqs512_mask_round ((__v16sf) (A), \ + (__v16si) \ + (_mm512_undefined_si512 ()), \ + (__mmask16) (-1), \ + (R))) + +#define _mm512_mask_cvtts_roundps_epu32(W, U, A, R) \ + ((__m512i) __builtin_ia32_cvttps2udqs512_mask_round ((__v16sf) (A), \ + (__v16si) (W), \ + (__mmask16) (U), \ + (R))) + +#define _mm512_maskz_cvtts_roundps_epu32(U, A, R) \ + ((__m512i) \ + __builtin_ia32_cvttps2udqs512_mask_round ((__v16sf) (A), \ + (__v16si) \ + (_mm512_setzero_si512 ()), \ + (__mmask16) (U), \ + (R))) + +#define _mm512_cvtts_roundps_epu64(A, R) \ + ((__m512i) \ + __builtin_ia32_cvttps2uqqs512_mask_round ((__v8sf) (A), \ + (__v8di) \ + (_mm512_undefined_si512 ()), \ + (__mmask8) (-1), \ + (R))) + +#define _mm512_mask_cvtts_roundps_epu64(W, U, A, R) \ + ((__m512i) __builtin_ia32_cvttps2uqqs512_mask_round ((__v8sf) (A), \ + (__v8di) (W), \ + (__mmask8) (U), \ + (R))) + +#define _mm512_maskz_cvtts_roundps_epu64(U, A, R) \ + ((__m512i) \ + __builtin_ia32_cvttps2uqqs512_mask_round ((__v8sf) (A), \ + (__v8di) \ + (_mm512_setzero_si512 ()), \ + (__mmask8) (U), \ + (R))) #endif #ifdef __DISABLE_AVX10_2_512__ diff --git a/gcc/config/i386/avx10_2satcvtintrin.h b/gcc/config/i386/avx10_2satcvtintrin.h index 4fcf78955df..d0e3e3790c4 100644 --- a/gcc/config/i386/avx10_2satcvtintrin.h +++ b/gcc/config/i386/avx10_2satcvtintrin.h @@ -510,6 +510,238 @@ _mm_maskz_ipcvttps_epu32 (__mmask8 __U, 
__m128 __A) (__mmask8) __U); } +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvttspd_epi32 (__m128d __A) +{ + return (__m128i) __builtin_ia32_cvttpd2dqs128_mask ((__v2df) __A, + (__v4si) + _mm_undefined_si128 (), + (__mmask8) -1); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvttspd_epi32 (__m128i __W, __mmask8 __U, __m128d __A) +{ + return (__m128i) __builtin_ia32_cvttpd2dqs128_mask ((__v2df) __A, + (__v4si) __W, + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvttspd_epi32 (__mmask8 __U, __m128d __A) +{ + return (__m128i) __builtin_ia32_cvttpd2dqs128_mask ((__v2df) __A, + (__v4si) + _mm_setzero_si128 (), + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvttspd_epi64 (__m128d __A) +{ + return (__m128i) __builtin_ia32_cvttpd2qqs128_mask ((__v2df) __A, + (__v2di) + _mm_undefined_si128 (), + (__mmask8) -1); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvttspd_epi64 (__m128i __W, __mmask8 __U, __m128d __A) +{ + return (__m128i) __builtin_ia32_cvttpd2qqs128_mask ((__v2df) __A, + (__v2di) __W, + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvttspd_epi64 (__mmask8 __U, __m128d __A) +{ + return (__m128i) __builtin_ia32_cvttpd2qqs128_mask ((__v2df) __A, + (__v2di) + _mm_setzero_si128 (), + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvttspd_epu32 (__m128d __A) +{ + return (__m128i) __builtin_ia32_cvttpd2udqs128_mask ((__v2df) __A, + (__v4si) + _mm_undefined_si128 (), + (__mmask8) -1); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvttspd_epu32 (__m128i __W, __mmask8 __U, __m128d __A) +{ + return (__m128i) __builtin_ia32_cvttpd2udqs128_mask ((__v2df) __A, + (__v4si) __W, + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvttspd_epu32 (__mmask8 __U, __m128d __A) +{ + return (__m128i) __builtin_ia32_cvttpd2udqs128_mask ((__v2df) __A, + (__v4si) + _mm_setzero_si128 (), + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvttspd_epu64 (__m128d __A) +{ + return (__m128i) __builtin_ia32_cvttpd2uqqs128_mask ((__v2df) __A, + (__v2di) + _mm_undefined_si128 (), + (__mmask8) -1); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvttspd_epu64 (__m128i __W, __mmask8 __U, __m128d __A) +{ + return (__m128i) __builtin_ia32_cvttpd2uqqs128_mask ((__v2df) __A, + (__v2di) __W, + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvttspd_epu64 (__mmask8 __U, __m128d __A) +{ + return (__m128i) __builtin_ia32_cvttpd2uqqs128_mask ((__v2df) __A, + (__v2di) + _mm_setzero_si128 (), + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvttsps_epi32 (__m128 __A) +{ + return (__m128i) __builtin_ia32_cvttps2dqs128_mask ((__v4sf) __A, + (__v4si) + _mm_undefined_si128 (), + (__mmask8) -1); +} + +extern __inline __m128i +__attribute__ 
((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvttsps_epi32 (__m128i __W, __mmask8 __U, __m128 __A) +{ + return (__m128i) __builtin_ia32_cvttps2dqs128_mask ((__v4sf) __A, + (__v4si) __W, + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvttsps_epi32 (__mmask8 __U, __m128 __A) +{ + return (__m128i) __builtin_ia32_cvttps2dqs128_mask ((__v4sf) __A, + (__v4si) + _mm_setzero_si128 (), + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvttsps_epi64 (__m128 __A) +{ + return (__m128i) __builtin_ia32_cvttps2qqs128_mask ((__v4sf) __A, + (__v2di) + _mm_undefined_si128 (), + (__mmask8) -1); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvttsps_epi64 (__m128i __W, __mmask8 __U, __m128 __A) +{ + return (__m128i) __builtin_ia32_cvttps2qqs128_mask ((__v4sf) __A, + (__v2di) __W, + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvttsps_epi64 (__mmask8 __U, __m128 __A) +{ + return (__m128i) __builtin_ia32_cvttps2qqs128_mask ((__v4sf) __A, + (__v2di) + _mm_setzero_si128 (), + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvttsps_epu32 (__m128 __A) +{ + return (__m128i) __builtin_ia32_cvttps2udqs128_mask ((__v4sf) __A, + (__v4si) + _mm_undefined_si128 (), + (__mmask8) -1); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvttsps_epu32 (__m128i __W, __mmask8 __U, __m128 __A) +{ + return (__m128i) __builtin_ia32_cvttps2udqs128_mask ((__v4sf) __A, + (__v4si) __W, + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvttsps_epu32 (__mmask8 __U, __m128 __A) +{ + return (__m128i) __builtin_ia32_cvttps2udqs128_mask ((__v4sf) __A, + (__v4si) + _mm_setzero_si128 (), + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvttsps_epu64 (__m128 __A) +{ + return (__m128i) __builtin_ia32_cvttps2uqqs128_mask ((__v4sf) __A, + (__v2di) + _mm_undefined_si128 (), + (__mmask8) -1); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvttsps_epu64 (__m128i __W, __mmask8 __U, __m128 __A) +{ + return (__m128i) __builtin_ia32_cvttps2uqqs128_mask ((__v4sf) __A, + (__v2di) __W, + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvttsps_epu64 (__mmask8 __U, __m128 __A) +{ + return (__m128i) __builtin_ia32_cvttps2uqqs128_mask ((__v4sf) __A, + (__v2di) + _mm_setzero_si128 (), + (__mmask8) __U); +} + #ifdef __OPTIMIZE__ extern __inline __m256i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) @@ -790,51 +1022,363 @@ _mm256_maskz_ipcvtt_roundps_epu32 (__mmask8 __U, __m256 __A, const int __R) (__mmask8) __U, __R); } -#else -#define _mm256_ipcvt_roundph_epi16(A, R) \ - ((__m256i) \ - __builtin_ia32_cvtph2ibs256_mask_round ((__v16hf) (A), \ - (__v16hi) \ - (_mm256_undefined_si256 ()), \ - (__mmask16) (-1), \ - (R))) +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtts_roundpd_epi32 (__m256d __A, const int __R) +{ + return + (__m128i) 
__builtin_ia32_cvttpd2dqs256_mask_round ((__v4df) __A, + (__v4si) + _mm_undefined_si128 (), + (__mmask8) -1, + __R); +} -#define _mm256_mask_ipcvt_roundph_epi16(W, U, A, R) \ - ((__m256i) __builtin_ia32_cvtph2ibs256_mask_round ((__v16hf) (A), \ - (__v16hi) (W), \ - (__mmask16) (U), \ - (R))) +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvtts_roundpd_epi32 (__m128i __W, __mmask8 __U, __m256d __A, + const int __R) +{ + return (__m128i) __builtin_ia32_cvttpd2dqs256_mask_round ((__v4df) __A, + (__v4si) __W, + (__mmask8) __U, + __R); +} -#define _mm256_maskz_ipcvt_roundph_epi16(U, A, R) \ - ((__m256i) \ - __builtin_ia32_cvtph2ibs256_mask_round ((__v16hf) (A), \ - (__v16hi) \ - (_mm256_setzero_si256 ()), \ - (__mmask16) (U), \ - (R))) +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvtts_roundpd_epi32 (__mmask8 __U, __m256d __A, const int __R) +{ + return + (__m128i) __builtin_ia32_cvttpd2dqs256_mask_round ((__v4df) __A, + (__v4si) + _mm_setzero_si128 (), + (__mmask8) __U, + __R); +} -#define _mm256_ipcvt_roundph_epu16(A, R) \ - ((__m256i) \ - __builtin_ia32_cvtph2iubs256_mask_round ((__v16hf) (A), \ - (__v16hi) \ - (_mm256_undefined_si256 ()), \ - (__mmask16) (-1), \ - (R))) +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtts_roundpd_epi64 (__m256d __A, const int __R) +{ + return (__m256i) + __builtin_ia32_cvttpd2qqs256_mask_round ((__v4df) __A, + (__v4di) + _mm256_undefined_si256 (), + (__mmask8) -1, + __R); +} -#define _mm256_mask_ipcvt_roundph_epu16(W, U, A, R) \ - ((__m256i) __builtin_ia32_cvtph2iubs256_mask_round ((__v16hf) (A), \ - (__v16hi) (W), \ - (__mmask16) (U), \ - (R))) +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvtts_roundpd_epi64 (__m256i __W, __mmask8 __U, __m256d __A, + const int __R) +{ + return (__m256i) __builtin_ia32_cvttpd2qqs256_mask_round ((__v4df) __A, + (__v4di) __W, + (__mmask8) __U, + __R); +} -#define _mm256_maskz_ipcvt_roundph_epu16(U, A, R) \ - ((__m256i) \ - __builtin_ia32_cvtph2iubs256_mask_round ((__v16hf) (A), \ - (__v16hi) \ - (_mm256_setzero_si256 ()), \ - (__mmask16) (U), \ - (R))) +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvtts_roundpd_epi64 (__mmask8 __U, __m256d __A, const int __R) +{ + return + (__m256i) __builtin_ia32_cvttpd2qqs256_mask_round ((__v4df) __A, + (__v4di) + _mm256_setzero_si256 (), + (__mmask8) __U, + __R); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtts_roundpd_epu32 (__m256d __A, const int __R) +{ + return + (__m128i) __builtin_ia32_cvttpd2udqs256_mask_round ((__v4df) __A, + (__v4si) + _mm_undefined_si128 (), + (__mmask8) -1, + __R); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvtts_roundpd_epu32 (__m128i __W, __mmask8 __U, __m256d __A, + const int __R) +{ + return (__m128i) __builtin_ia32_cvttpd2udqs256_mask_round ((__v4df) __A, + (__v4si) __W, + (__mmask8) __U, + __R); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvtts_roundpd_epu32 (__mmask8 __U, __m256d __A, const int __R) +{ + return + (__m128i) __builtin_ia32_cvttpd2udqs256_mask_round ((__v4df) __A, + (__v4si) + _mm_setzero_si128 (), + (__mmask8) __U, + __R); +} + +extern __inline 
__m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtts_roundpd_epu64 (__m256d __A, const int __R) +{ + return (__m256i) + __builtin_ia32_cvttpd2uqqs256_mask_round ((__v4df) __A, + (__v4di) + _mm256_undefined_si256 (), + (__mmask8) -1, + __R); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvtts_roundpd_epu64 (__m256i __W, __mmask8 __U, __m256d __A, + const int __R) +{ + return (__m256i) __builtin_ia32_cvttpd2uqqs256_mask_round ((__v4df) __A, + (__v4di) __W, + (__mmask8) __U, + __R); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvtts_roundpd_epu64 (__mmask8 __U, __m256d __A, const int __R) +{ + return + (__m256i) __builtin_ia32_cvttpd2uqqs256_mask_round ((__v4df) __A, + (__v4di) + _mm256_setzero_si256 (), + (__mmask8) __U, + __R); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtts_roundps_epi32 (__m256 __A, const int __R) +{ + return (__m256i) + __builtin_ia32_cvttps2dqs256_mask_round ((__v8sf) __A, + (__v8si) + _mm256_undefined_si256 (), + (__mmask8) -1, + __R); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvtts_roundps_epi32 (__m256i __W, __mmask8 __U, __m256 __A, + const int __R) +{ + return (__m256i) __builtin_ia32_cvttps2dqs256_mask_round ((__v8sf) __A, + (__v8si) __W, + (__mmask8) __U, + __R); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvtts_roundps_epi32 (__mmask8 __U, __m256 __A, const int __R) +{ + return + (__m256i) __builtin_ia32_cvttps2dqs256_mask_round ((__v8sf) __A, + (__v8si) + _mm256_setzero_si256 (), + (__mmask8) __U, + __R); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtts_roundps_epi64 (__m128 __A, const int __R) +{ + return (__m256i) + __builtin_ia32_cvttps2qqs256_mask_round ((__v4sf) __A, + (__v4di) + _mm256_undefined_si256 (), + (__mmask8) -1, + __R); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvtts_roundps_epi64 (__m256i __W, __mmask8 __U, __m128 __A, + const int __R) +{ + return (__m256i) __builtin_ia32_cvttps2qqs256_mask_round ((__v4sf) __A, + (__v4di) __W, + (__mmask8) __U, + __R); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvtts_roundps_epi64 (__mmask8 __U, __m128 __A, const int __R) +{ + return + (__m256i) __builtin_ia32_cvttps2qqs256_mask_round ((__v4sf) __A, + (__v4di) + _mm256_setzero_si256 (), + (__mmask8) __U, + __R); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtts_roundps_epu32 (__m256 __A, const int __R) +{ + return (__m256i) + __builtin_ia32_cvttps2udqs256_mask_round ((__v8sf) __A, + (__v8si) + _mm256_undefined_si256 (), + (__mmask8) -1, + __R); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvtts_roundps_epu32 (__m256i __W, __mmask8 __U, __m256 __A, + const int __R) +{ + return (__m256i) __builtin_ia32_cvttps2udqs256_mask_round ((__v8sf) __A, + (__v8si) __W, + (__mmask8) __U, + __R); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvtts_roundps_epu32 (__mmask8 __U, __m256 __A, const int __R) +{ + return 
+ (__m256i) __builtin_ia32_cvttps2udqs256_mask_round ((__v8sf) __A, + (__v8si) + _mm256_setzero_si256 (), + (__mmask8) __U, + __R); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtts_roundps_epu64 (__m128 __A, const int __R) +{ + return (__m256i) + __builtin_ia32_cvttps2uqqs256_mask_round ((__v4sf) __A, + (__v4di) + _mm256_undefined_si256 (), + (__mmask8) -1, + __R); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvtts_roundps_epu64 (__m256i __W, __mmask8 __U, __m128 __A, + const int __R) +{ + return (__m256i) __builtin_ia32_cvttps2uqqs256_mask_round ((__v4sf) __A, + (__v4di) __W, + (__mmask8) __U, + __R); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvtts_roundps_epu64 (__mmask8 __U, __m128 __A, const int __R) +{ + return + (__m256i) __builtin_ia32_cvttps2uqqs256_mask_round ((__v4sf) __A, + (__v4di) + _mm256_setzero_si256 (), + (__mmask8) __U, + __R); +} + +extern __inline int +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtts_roundsd_epi32 (__m128d __A, const int __R) +{ + return (int) __builtin_ia32_cvttsd2sis32_round ((__v2df) __A, + __R); +} + +extern __inline unsigned int +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtts_roundsd_epu32 (__m128d __A, const int __R) +{ + return (unsigned int) __builtin_ia32_cvttsd2usis32_round ((__v2df) __A, + __R); +} + +extern __inline int +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtts_roundss_epi32 (__m128 __A, const int __R) +{ + return (int) __builtin_ia32_cvttss2sis32_round ((__v4sf) __A, + __R); +} + +extern __inline unsigned int +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtts_roundss_epu32 (__m128 __A, const int __R) +{ + return (unsigned int) __builtin_ia32_cvttss2usis32_round ((__v4sf) __A, + __R); +} +#else + +#define _mm256_ipcvt_roundph_epi16(A, R) \ + ((__m256i) \ + __builtin_ia32_cvtph2ibs256_mask_round ((__v16hf) (A), \ + (__v16hi) \ + (_mm256_undefined_si256 ()), \ + (__mmask16) (-1), \ + (R))) + +#define _mm256_mask_ipcvt_roundph_epi16(W, U, A, R) \ + ((__m256i) __builtin_ia32_cvtph2ibs256_mask_round ((__v16hf) (A), \ + (__v16hi) (W), \ + (__mmask16) (U), \ + (R))) + +#define _mm256_maskz_ipcvt_roundph_epi16(U, A, R) \ + ((__m256i) \ + __builtin_ia32_cvtph2ibs256_mask_round ((__v16hf) (A), \ + (__v16hi) \ + (_mm256_setzero_si256 ()), \ + (__mmask16) (U), \ + (R))) + +#define _mm256_ipcvt_roundph_epu16(A, R) \ + ((__m256i) \ + __builtin_ia32_cvtph2iubs256_mask_round ((__v16hf) (A), \ + (__v16hi) \ + (_mm256_undefined_si256 ()), \ + (__mmask16) (-1), \ + (R))) + +#define _mm256_mask_ipcvt_roundph_epu16(W, U, A, R) \ + ((__m256i) __builtin_ia32_cvtph2iubs256_mask_round ((__v16hf) (A), \ + (__v16hi) (W), \ + (__mmask16) (U), \ + (R))) + +#define _mm256_maskz_ipcvt_roundph_epu16(U, A, R) \ + ((__m256i) \ + __builtin_ia32_cvtph2iubs256_mask_round ((__v16hf) (A), \ + (__v16hi) \ + (_mm256_setzero_si256 ()), \ + (__mmask16) (U), \ + (R))) #define _mm256_ipcvt_roundps_epi32(A, R) \ ((__m256i) \ @@ -1012,7 +1556,440 @@ _mm256_maskz_ipcvtt_roundps_epu32 (__mmask8 __U, __m256 __A, const int __R) (_mm256_setzero_si256 ()), \ (__mmask8) (U), \ (R))) + +#define _mm256_cvtts_roundpd_epi32(A, R) \ + ((__m128i) \ + __builtin_ia32_cvttpd2dqs256_mask_round ((__v4df) (A), \ + (__v4si) \ + (_mm_undefined_si128 ()), \ + (__mmask8) (-1), \ + 
(R))) + +#define _mm256_mask_cvtts_roundpd_epi32(W, U, A, R) \ + ((__m128i) __builtin_ia32_cvttpd2dqs256_mask_round ((__v4df) (A), \ + (__v4si) (W), \ + (__mmask8) (U), \ + (R))) + +#define _mm256_maskz_cvtts_roundpd_epi32(U, A, R) \ + ((__m128i) __builtin_ia32_cvttpd2dqs256_mask_round ((__v4df) (A), \ + (__v4si) \ + (_mm_setzero_si128 ()), \ + (__mmask8) (U), \ + (R))) + +#define _mm256_cvtts_roundpd_epi64(A, R) \ + ((__m256i) \ + __builtin_ia32_cvttpd2qqs256_mask_round ((__v4df) (A), \ + (__v4di) \ + (_mm256_undefined_si256 ()), \ + (__mmask8) (-1), \ + (R))) + +#define _mm256_mask_cvtts_roundpd_epi64(W, U, A, R) \ + ((__m256i) __builtin_ia32_cvttpd2qqs256_mask_round ((__v4df) (A), \ + (__v4di) (W), \ + (__mmask8) (U), \ + (R))) + +#define _mm256_maskz_cvtts_roundpd_epi64(U, A, R) \ + ((__m256i) \ + __builtin_ia32_cvttpd2qqs256_mask_round ((__v4df) (A), \ + (__v4di) \ + (_mm256_setzero_si256 ()), \ + (__mmask8) (U), \ + (R))) + +#define _mm256_cvtts_roundpd_epu32(A, R) \ + ((__m128i) \ + __builtin_ia32_cvttpd2udqs256_mask_round ((__v4df) (A), \ + (__v4si) \ + (_mm_undefined_si128 ()), \ + (__mmask8) (-1), \ + (R))) + +#define _mm256_mask_cvtts_roundpd_epu32(W, U, A, R) \ + ((__m128i) __builtin_ia32_cvttpd2udqs256_mask_round ((__v4df) (A), \ + (__v4si) (W), \ + (__mmask8) (U), \ + (R))) + +#define _mm256_maskz_cvtts_roundpd_epu32(U, A, R) \ + ((__m128i) \ + __builtin_ia32_cvttpd2udqs256_mask_round ((__v4df) (A), \ + (__v4si) (_mm_setzero_si128 ()), \ + (__mmask8) (U), \ + (R))) + +#define _mm256_cvtts_roundpd_epu64(A, R) \ + ((__m256i) \ + __builtin_ia32_cvttpd2uqqs256_mask_round ((__v4df) (A), \ + (__v4di) \ + (_mm256_undefined_si256 ()), \ + (__mmask8) (-1), \ + (R))) + +#define _mm256_mask_cvtts_roundpd_epu64(W, U, A, R) \ + ((__m256i) __builtin_ia32_cvttpd2uqqs256_mask_round ((__v4df) (A), \ + (__v4di) (W), \ + (__mmask8) (U), \ + (R))) + +#define _mm256_maskz_cvtts_roundpd_epu64(U, A, R) \ + ((__m256i) \ + __builtin_ia32_cvttpd2uqqs256_mask_round ((__v4df) (A), \ + (__v4di) \ + (_mm256_setzero_si256 ()), \ + (__mmask8) (U), \ + (R))) + +#define _mm256_cvtts_roundps_epi32(A, R) \ + ((__m256i) \ + __builtin_ia32_cvttps2dqs256_mask_round ((__v8sf) (A), \ + (__v8si) \ + (_mm256_undefined_si256 ()), \ + (__mmask8) (-1), \ + (R))) + +#define _mm256_mask_cvtts_roundps_epi32(W, U, A, R) \ + ((__m256i) __builtin_ia32_cvttps2dqs256_mask_round ((__v8sf) (A), \ + (__v8si) (W), \ + (__mmask8) (U), \ + (R))) + +#define _mm256_maskz_cvtts_roundps_epi32(U, A, R) \ + ((__m256i) \ + __builtin_ia32_cvttps2dqs256_mask_round ((__v8sf) (A), \ + (__v8si) \ + (_mm256_setzero_si256 ()), \ + (__mmask8) (U), \ + (R))) + +#define _mm256_cvtts_roundps_epi64(A, R) \ + ((__m256i) \ + __builtin_ia32_cvttps2qqs256_mask_round ((__v4sf) (A), \ + (__v4di) \ + (_mm256_undefined_si256 ()), \ + (__mmask8) (-1), \ + (R))) + +#define _mm256_mask_cvtts_roundps_epi64(W, U, A, R) \ + ((__m256i) __builtin_ia32_cvttps2qqs256_mask_round ((__v4sf) (A), \ + (__v4di) (W), \ + (__mmask8) (U), \ + (R))) + +#define _mm256_maskz_cvtts_roundps_epi64(U, A, R) \ + ((__m256i) \ + __builtin_ia32_cvttps2qqs256_mask_round ((__v4sf) (A), \ + (__v4di) \ + (_mm256_setzero_si256 ()), \ + (__mmask8) (U), \ + (R))) + +#define _mm256_cvtts_roundps_epu32(A, R) \ + ((__m256i) \ + __builtin_ia32_cvttps2udqs256_mask_round ((__v8sf) (A), \ + (__v8si) \ + (_mm256_undefined_si256 ()), \ + (__mmask8) (-1), \ + (R))) + +#define _mm256_mask_cvtts_roundps_epu32(W, U, A, R) \ + ((__m256i) __builtin_ia32_cvttps2udqs256_mask_round ((__v8sf) (A), \ + (__v8si) (W), 
\ + (__mmask8) (U), \ + (R))) + +#define _mm256_maskz_cvtts_roundps_epu32(U, A, R) \ + ((__m256i) \ + __builtin_ia32_cvttps2udqs256_mask_round ((__v8sf) (A), \ + (__v8si) \ + (_mm256_setzero_si256 ()), \ + (__mmask8) (U), \ + (R))) + +#define _mm256_cvtts_roundps_epu64(A, R) \ + ((__m256i) \ + __builtin_ia32_cvttps2uqqs256_mask_round ((__v4sf) (A), \ + (__v4di) \ + (_mm256_undefined_si256 ()), \ + (__mmask8) (-1), \ + (R))) + +#define _mm256_mask_cvtts_roundps_epu64(W, U, A, R) \ + ((__m256i) __builtin_ia32_cvttps2uqqs256_mask_round ((__v4sf) (A), \ + (__v4di) (W), \ + (__mmask8) (U), \ + (R))) + +#define _mm256_maskz_cvtts_roundps_epu64(U, A, R) \ + ((__m256i) \ + __builtin_ia32_cvttps2uqqs256_mask_round ((__v4sf) (A), \ + (__v4di) \ + (_mm256_setzero_si256 ()), \ + (__mmask8) (U), \ + (R))) + +#define _mm_cvtts_roundsd_epi32(A, R) \ + ((int) __builtin_ia32_cvttsd2sis32_round ((__v2df) (A), \ + (R))) + +#define _mm_cvtts_roundsd_epu32(A, R) \ + ((unsigned int) __builtin_ia32_cvttsd2usis32_round ((__v2df) (A), \ + (R))) + +#define _mm_cvtts_roundss_epi32(A, R) \ + ((int) __builtin_ia32_cvttss2sis32_round ((__v4sf) (A), \ + (R))) + +#define _mm_cvtts_roundss_epu32(A, R) \ + ((unsigned int) __builtin_ia32_cvttss2usis32_round ((__v4sf) (A), \ + (R))) +#endif + +#ifdef __x86_64__ +#ifdef __OPTIMIZE__ +extern __inline long long +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtts_roundsd_epi64 (__m128d __A, const int __R) +{ + return (long long) __builtin_ia32_cvttsd2sis64_round ((__v2df) __A, + __R); +} + +extern __inline unsigned long long +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtts_roundsd_epu64 (__m128d __A, const int __R) +{ + return (unsigned long long) __builtin_ia32_cvttsd2usis64_round ((__v2df) __A, + __R); +} + +extern __inline long long +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtts_roundss_epi64 (__m128 __A, const int __R) +{ + return (long long) __builtin_ia32_cvttss2sis64_round ((__v4sf) __A, + __R); +} + + +extern __inline unsigned long long
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtts_roundss_epu64 (__m128 __A, const int __R) +{ + return (unsigned long long) __builtin_ia32_cvttss2usis64_round ((__v4sf) __A, + __R); +} +#else + +#define _mm_cvtts_roundsd_epi64(A, R) \ + ((long long) __builtin_ia32_cvttsd2sis64_round ((__v2df) (A), \ + (R))) + +#define _mm_cvtts_roundsd_epu64(A, R) \ + ((unsigned long long) __builtin_ia32_cvttsd2usis64_round ((__v2df) (A), \ + (R))) + +#define _mm_cvtts_roundss_epi64(A, R) \ + ((long long) __builtin_ia32_cvttss2sis64_round ((__v4sf) (A), \ + (R))) + +#define _mm_cvtts_roundss_epu64(A, R) \ + ((unsigned long long) __builtin_ia32_cvttss2usis64_round ((__v4sf) (A), \ + (R))) #endif +#endif /* __x86_64__ */ #ifdef __DISABLE_AVX10_2_256__ #undef __DISABLE_AVX10_2_256__ diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def index b85eba5b330..d39274bc323 100644 --- a/gcc/config/i386/i386-builtin.def +++ b/gcc/config/i386/i386-builtin.def @@ -3290,6 +3290,14 @@ BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_cvttph2ibsv8hf_mask, "_ BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_cvttph2iubsv8hf_mask, "__builtin_ia32_cvttph2iubs128_mask", IX86_BUILTIN_CVTTPH2IUBS128_MASK, UNKNOWN, (int) V8HI_FTYPE_V8HF_V8HI_UQI) BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_cvttps2ibsv4sf_mask, "__builtin_ia32_cvttps2ibs128_mask", IX86_BUILTIN_CVTTPS2IBS128_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SF_V4SI_UQI) BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_cvttps2iubsv4sf_mask, "__builtin_ia32_cvttps2iubs128_mask", IX86_BUILTIN_CVTTPS2IUBS128_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SF_V4SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_vcvttpd2dqsv2df_mask, "__builtin_ia32_cvttpd2dqs128_mask", IX86_BUILTIN_VCVTTPD2DQS128_MASK, UNKNOWN, (int) V4SI_FTYPE_V2DF_V4SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_vcvttpd2qqsv2df_mask, "__builtin_ia32_cvttpd2qqs128_mask", IX86_BUILTIN_VCVTTPD2QQS128_MASK, UNKNOWN, (int) V2DI_FTYPE_V2DF_V2DI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_vcvttpd2udqsv2df_mask, "__builtin_ia32_cvttpd2udqs128_mask", IX86_BUILTIN_VCVTTPD2UDQS128_MASK, UNKNOWN, (int) V4SI_FTYPE_V2DF_V4SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_vcvttpd2uqqsv2df_mask, "__builtin_ia32_cvttpd2uqqs128_mask", IX86_BUILTIN_VCVTTPD2UQQS128_MASK, UNKNOWN, (int) V2DI_FTYPE_V2DF_V2DI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_vcvttps2dqsv4sf_mask, "__builtin_ia32_cvttps2dqs128_mask", IX86_BUILTIN_VCVTTPS2DQS128_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SF_V4SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_vcvttps2qqsv2di_mask, "__builtin_ia32_cvttps2qqs128_mask", IX86_BUILTIN_VCVTTPS2QQS128_MASK, UNKNOWN, (int) V2DI_FTYPE_V4SF_V2DI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_vcvttps2udqsv4sf_mask, "__builtin_ia32_cvttps2udqs128_mask", IX86_BUILTIN_VCVTTPS2UDQS128_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SF_V4SI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_vcvttps2uqqsv2di_mask, "__builtin_ia32_cvttps2uqqs128_mask", IX86_BUILTIN_VCVTTPS2UQQS128_MASK, UNKNOWN, (int) V2DI_FTYPE_V4SF_V2DI_UQI) /* Builtins with rounding support. 
*/ BDESC_END (ARGS, ROUND_ARGS) @@ -3767,6 +3775,31 @@ BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_cvttps2ibsv16sf_mask_ro BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_cvttps2iubsv8sf_mask_round, "__builtin_ia32_cvttps2iubs256_mask_round", IX86_BUILTIN_CVTTPS2IUBS256_MASK_ROUND, UNKNOWN, (int) V8SI_FTYPE_V8SF_V8SI_UQI_INT) BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_cvttps2iubsv16sf_mask_round, "__builtin_ia32_cvttps2iubs512_mask_round", IX86_BUILTIN_CVTTPS2IUBS512_MASK_ROUND, UNKNOWN, (int) V16SI_FTYPE_V16SF_V16SI_UHI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_vcvttpd2dqsv4df_mask_round, "__builtin_ia32_cvttpd2dqs256_mask_round", IX86_BUILTIN_VCVTTPD2DQS256_MASK_ROUND, UNKNOWN, (int) V4SI_FTYPE_V4DF_V4SI_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_vcvttpd2dqsv8df_mask_round, "__builtin_ia32_cvttpd2dqs512_mask_round", IX86_BUILTIN_VCVTTPD2DQS512_MASK_ROUND, UNKNOWN, (int) V8SI_FTYPE_V8DF_V8SI_QI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_vcvttpd2qqsv4df_mask_round, "__builtin_ia32_cvttpd2qqs256_mask_round", IX86_BUILTIN_VCVTTPD2QQS256_MASK_ROUND, UNKNOWN, (int) V4DI_FTYPE_V4DF_V4DI_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_vcvttpd2qqsv8df_mask_round, "__builtin_ia32_cvttpd2qqs512_mask_round", IX86_BUILTIN_VCVTTPD2QQS512_MASK_ROUND, UNKNOWN, (int) V8DI_FTYPE_V8DF_V8DI_QI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_vcvttpd2udqsv4df_mask_round, "__builtin_ia32_cvttpd2udqs256_mask_round", IX86_BUILTIN_VCVTTPD2UDQS256_MASK_ROUND, UNKNOWN, (int) V4SI_FTYPE_V4DF_V4SI_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_vcvttpd2udqsv8df_mask_round, "__builtin_ia32_cvttpd2udqs512_mask_round", IX86_BUILTIN_VCVTTPD2UDQS512_MASK_ROUND, UNKNOWN, (int) V8SI_FTYPE_V8DF_V8SI_QI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_vcvttpd2uqqsv4df_mask_round, "__builtin_ia32_cvttpd2uqqs256_mask_round", IX86_BUILTIN_VCVTTPD2UQQS256_MASK_ROUND, UNKNOWN, (int) V4DI_FTYPE_V4DF_V4DI_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_vcvttpd2uqqsv8df_mask_round, "__builtin_ia32_cvttpd2uqqs512_mask_round", IX86_BUILTIN_VCVTTPD2UQQS512_MASK_ROUND, UNKNOWN, (int) V8DI_FTYPE_V8DF_V8DI_QI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_vcvttps2dqsv8sf_mask_round, "__builtin_ia32_cvttps2dqs256_mask_round", IX86_BUILTIN_VCVTTPS2DQS256_MASK_ROUND, UNKNOWN, (int) V8SI_FTYPE_V8SF_V8SI_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_vcvttps2dqsv16sf_mask_round, "__builtin_ia32_cvttps2dqs512_mask_round", IX86_BUILTIN_VCVTTPS2DQS512_MASK_ROUND, UNKNOWN, (int) V16SI_FTYPE_V16SF_V16SI_HI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_vcvttps2qqsv4di_mask_round, "__builtin_ia32_cvttps2qqs256_mask_round", IX86_BUILTIN_VCVTTPS2QQS256_MASK_ROUND, UNKNOWN, (int) V4DI_FTYPE_V4SF_V4DI_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_vcvttps2qqsv8di_mask_round, "__builtin_ia32_cvttps2qqs512_mask_round", IX86_BUILTIN_VCVTTPS2QQS512_MASK_ROUND, UNKNOWN, (int) V8DI_FTYPE_V8SF_V8DI_QI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_vcvttps2udqsv8sf_mask_round, "__builtin_ia32_cvttps2udqs256_mask_round", IX86_BUILTIN_VCVTTPS2UDQS256_MASK_ROUND, UNKNOWN, (int) V8SI_FTYPE_V8SF_V8SI_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_vcvttps2udqsv16sf_mask_round, "__builtin_ia32_cvttps2udqs512_mask_round", IX86_BUILTIN_VCVTTPS2UDQS512_MASK_ROUND, UNKNOWN, (int) 
V16SI_FTYPE_V16SF_V16SI_HI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_vcvttps2uqqsv4di_mask_round, "__builtin_ia32_cvttps2uqqs256_mask_round", IX86_BUILTIN_VCVTTPS2UQQS256_MASK_ROUND, UNKNOWN, (int) V4DI_FTYPE_V4SF_V4DI_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_vcvttps2uqqsv8di_mask_round, "__builtin_ia32_cvttps2uqqs512_mask_round", IX86_BUILTIN_VCVTTPS2UQQS512_MASK_ROUND, UNKNOWN, (int) V8DI_FTYPE_V8SF_V8DI_QI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_vcvttsd2sissi_round, "__builtin_ia32_cvttsd2sis32_round", IX86_BUILTIN_VCVTTSD2SIS32_ROUND, UNKNOWN, (int) INT_FTYPE_V2DF_INT) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_vcvttsd2sisdi_round, "__builtin_ia32_cvttsd2sis64_round", IX86_BUILTIN_VCVTTSD2SIS64_ROUND, UNKNOWN, (int) INT64_FTYPE_V2DF_INT) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_vcvttsd2usissi_round, "__builtin_ia32_cvttsd2usis32_round", IX86_BUILTIN_VCVTTSD2USIS32_ROUND, UNKNOWN, (int) INT_FTYPE_V2DF_INT) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_vcvttsd2usisdi_round, "__builtin_ia32_cvttsd2usis64_round", IX86_BUILTIN_VCVTTSD2USIS64_ROUND, UNKNOWN, (int) INT64_FTYPE_V2DF_INT) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_vcvttss2sissi_round, "__builtin_ia32_cvttss2sis32_round", IX86_BUILTIN_VCVTTSS2SIS32_ROUND, UNKNOWN, (int) INT_FTYPE_V4SF_INT) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_vcvttss2sisdi_round, "__builtin_ia32_cvttss2sis64_round", IX86_BUILTIN_VCVTTSS2SIS64_ROUND, UNKNOWN, (int) INT64_FTYPE_V4SF_INT) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_vcvttss2usissi_round, "__builtin_ia32_cvttss2usis32_round", IX86_BUILTIN_VCVTTSS2USIS32_ROUND, UNKNOWN, (int) INT_FTYPE_V4SF_INT) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_vcvttss2usisdi_round, "__builtin_ia32_cvttss2usis64_round", IX86_BUILTIN_VCVTTSS2USIS64_ROUND, UNKNOWN, (int) INT64_FTYPE_V4SF_INT) + BDESC_END (ROUND_ARGS, MULTI_ARG) /* FMA4 and XOP. 
*/ diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 0de94187e69..7c40079047a 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -247,6 +247,8 @@ UNSPEC_VCVTTPH2IUBS UNSPEC_VCVTTPS2IBS UNSPEC_VCVTTPS2IUBS + UNSPEC_SFIX_SATURATION + UNSPEC_UFIX_SATURATION ]) (define_c_enum "unspecv" [ @@ -375,6 +377,10 @@ (V4DF "TARGET_AVX512DQ && TARGET_AVX512VL") (V2DF "TARGET_AVX512DQ && TARGET_AVX512VL")]) +(define_mode_iterator VF1_VF2_AVX10_2 + [(V16SF "TARGET_AVX10_2_512") V8SF V4SF + (V8DF "TARGET_AVX10_2_512") V4DF V2DF]) + (define_mode_iterator VFH [(V32HF "TARGET_AVX512FP16 && TARGET_EVEX512") (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL") @@ -423,6 +429,9 @@ (define_mode_iterator VF2 [(V8DF "TARGET_AVX512F && TARGET_EVEX512") (V4DF "TARGET_AVX") V2DF]) +(define_mode_iterator VF2_AVX10_2 + [(V8DF "TARGET_AVX10_2_512") V4DF V2DF]) + ;; All DFmode & HFmode vector float modes (define_mode_iterator VF2H [(V32HF "TARGET_AVX512FP16 && TARGET_EVEX512") @@ -570,6 +579,9 @@ (define_mode_iterator VI8 [(V8DI "TARGET_AVX512F && TARGET_EVEX512") (V4DI "TARGET_AVX") V2DI]) +(define_mode_iterator VI8_AVX10_2 + [(V8DI "TARGET_AVX10_2_512") V4DI V2DI]) + (define_mode_iterator VI8_FVL [(V8DI "TARGET_AVX512F && TARGET_EVEX512") V4DI (V2DI "TARGET_AVX512VL")]) @@ -32228,7 +32240,9 @@ (UNSPEC_VCVTPS2IBS "") (UNSPEC_VCVTPS2IUBS "u") (UNSPEC_VCVTTPS2IBS "") - (UNSPEC_VCVTTPS2IUBS "u")]) + (UNSPEC_VCVTTPS2IUBS "u") + (UNSPEC_SFIX_SATURATION "") + (UNSPEC_UFIX_SATURATION "u")]) (define_int_attr sat_cvt_trunc_prefix @@ -32307,3 +32321,70 @@ [(set_attr "type" "ssecvt") (set_attr "prefix" "evex") (set_attr "mode" "")]) + +(define_int_iterator UNSPEC_SAT_CVT_DS_SIGN_ITER + [UNSPEC_SFIX_SATURATION + UNSPEC_UFIX_SATURATION]) + +(define_mode_attr pd2dqssuff + [(V16SF "") (V8SF "") (V4SF "") + (V8DF "") (V4DF "{y}") (V2DF "{x}")]) + +(define_insn "avx10_2_vcvtt2dqs" + [(set (match_operand: 0 "register_operand" "=v") + (unspec: + [(match_operand:VF1_VF2_AVX10_2 1 "" "")] + UNSPEC_SAT_CVT_DS_SIGN_ITER))] + "TARGET_AVX10_2_256 && " + "vcvtt2dqs\t{%1, %0|%0, %1}" + [(set_attr "type" "ssecvt") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + +(define_insn "avx10_2_vcvttpd2qqs" + [(set (match_operand: 0 "register_operand" "=v") + (unspec: + [(match_operand:VF2_AVX10_2 1 "" "")] + UNSPEC_SAT_CVT_DS_SIGN_ITER))] + "TARGET_AVX10_2_256 && " + "vcvttpd2qqs\t{%1, %0|%0, %1}" + [(set_attr "type" "ssecvt") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + +(define_insn "avx10_2_vcvttps2qqs" + [(set (match_operand:VI8_AVX10_2 0 "register_operand" "=v") + (unspec:VI8_AVX10_2 + [(match_operand: 1 "" "")] + UNSPEC_SAT_CVT_DS_SIGN_ITER))] + "TARGET_AVX10_2_256 && " + "vcvttps2qqs\t{%1, %0|%0, %1}" + [(set_attr "type" "ssecvt") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + +(define_insn "avx10_2_vcvttsd2sis" + [(set (match_operand:SWI48 0 "register_operand" "=r") + (unspec:SWI48 + [(vec_select:DF + (match_operand:V2DF 1 "" "") + (parallel [(const_int 0)]))] + UNSPEC_SAT_CVT_DS_SIGN_ITER))] + "TARGET_AVX10_2_256" + "vcvttsd2sis\t{%1, %0|%0, %1}" + [(set_attr "type" "ssecvt") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + +(define_insn "avx10_2_vcvttss2sis" + [(set (match_operand:SWI48 0 "register_operand" "=r") + (unspec:SWI48 + [(vec_select:SF + (match_operand:V4SF 1 "" "") + (parallel [(const_int 0)]))] + UNSPEC_SAT_CVT_DS_SIGN_ITER))] + "TARGET_AVX10_2_256" + "vcvttss2sis\t{%1, %0|%0, %1}" + [(set_attr "type" "ssecvt") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) diff 
--git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c index be2fb5ae15a..30c071adf13 100644 --- a/gcc/testsuite/gcc.target/i386/avx-1.c +++ b/gcc/testsuite/gcc.target/i386/avx-1.c @@ -1044,6 +1044,14 @@ #define __builtin_ia32_cvttph2iubs512_mask_round(A, B, C, D) __builtin_ia32_cvttph2iubs512_mask_round(A, B, C, 8) #define __builtin_ia32_cvttps2ibs512_mask_round(A, B, C, D) __builtin_ia32_cvttps2ibs512_mask_round(A, B, C, 8) #define __builtin_ia32_cvttps2iubs512_mask_round(A, B, C, D) __builtin_ia32_cvttps2iubs512_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttpd2dqs512_mask_round(A, B, C, D) __builtin_ia32_cvttpd2dqs512_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttpd2qqs512_mask_round(A, B, C, D) __builtin_ia32_cvttpd2qqs512_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttpd2udqs512_mask_round(A, B, C, D) __builtin_ia32_cvttpd2udqs512_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttpd2uqqs512_mask_round(A, B, C, D) __builtin_ia32_cvttpd2uqqs512_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttps2dqs512_mask_round(A, B, C, D) __builtin_ia32_cvttps2dqs512_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttps2qqs512_mask_round(A, B, C, D) __builtin_ia32_cvttps2qqs512_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttps2udqs512_mask_round(A, B, C, D) __builtin_ia32_cvttps2udqs512_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttps2uqqs512_mask_round(A, B, C, D) __builtin_ia32_cvttps2uqqs512_mask_round(A, B, C, 8) /* avx10_2satcvtintrin.h */ #define __builtin_ia32_cvtph2ibs256_mask_round(A, B, C, D) __builtin_ia32_cvtph2ibs256_mask_round(A, B, C, 8) @@ -1054,6 +1062,24 @@ #define __builtin_ia32_cvttph2iubs256_mask_round(A, B, C, D) __builtin_ia32_cvttph2iubs256_mask_round(A, B, C, 8) #define __builtin_ia32_cvttps2ibs256_mask_round(A, B, C, D) __builtin_ia32_cvttps2ibs256_mask_round(A, B, C, 8) #define __builtin_ia32_cvttps2iubs256_mask_round(A, B, C, D) __builtin_ia32_cvttps2iubs256_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttpd2dqs256_mask_round(A, B, C, D) __builtin_ia32_cvttpd2dqs256_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttpd2qqs256_mask_round(A, B, C, D) __builtin_ia32_cvttpd2qqs256_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttpd2udqs256_mask_round(A, B, C, D) __builtin_ia32_cvttpd2udqs256_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttpd2uqqs256_mask_round(A, B, C, D) __builtin_ia32_cvttpd2uqqs256_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttps2dqs256_mask_round(A, B, C, D) __builtin_ia32_cvttps2dqs256_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttps2qqs256_mask_round(A, B, C, D) __builtin_ia32_cvttps2qqs256_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttps2udqs256_mask_round(A, B, C, D) __builtin_ia32_cvttps2udqs256_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttps2uqqs256_mask_round(A, B, C, D) __builtin_ia32_cvttps2uqqs256_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttsd2sis32_round(A, B) __builtin_ia32_cvttsd2sis32_round(A, 8) +#define __builtin_ia32_cvttsd2usis32_round(A, B) __builtin_ia32_cvttsd2usis32_round(A, 8) +#define __builtin_ia32_cvttss2sis32_round(A, B) __builtin_ia32_cvttss2sis32_round(A, 8) +#define __builtin_ia32_cvttss2usis32_round(A, B) __builtin_ia32_cvttss2usis32_round(A, 8) +#ifdef __x86_64__ +#define __builtin_ia32_cvttsd2sis64_round(A, B) __builtin_ia32_cvttsd2sis64_round(A, 8) +#define __builtin_ia32_cvttsd2usis64_round(A, B) __builtin_ia32_cvttsd2usis64_round(A, 8) +#define __builtin_ia32_cvttss2sis64_round(A, B) __builtin_ia32_cvttss2sis64_round(A, 8) 
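As a rough aside on why this hunk is needed: the *_round builtins accept only a compile-time immediate as their rounding operand, and avx-1.c instantiates every intrinsic, so the wrappers above rewrite each builtin call to pass the literal 8, which is _MM_FROUND_NO_EXC. A minimal sketch of the effect, with an invented variable for illustration (not part of the diff):

  /* Illustrative only; `v' is hypothetical.  With the wrapper above in
     effect, the rounding operand that finally reaches the builtin is
     always the immediate 8 (_MM_FROUND_NO_EXC), so every instantiation
     in this test sees a valid immediate.  */
  __m128d v = _mm_set_sd (2.5);
  long long r = _mm_cvtts_roundsd_epi64 (v, _MM_FROUND_NO_EXC);
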
+#define __builtin_ia32_cvttss2usis64_round(A, B) __builtin_ia32_cvttss2usis64_round(A, 8) +#endif #include #include diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-satcvt-1.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-satcvt-1.c index 84826c0fe5a..ecc356aab94 100644 --- a/gcc/testsuite/gcc.target/i386/avx10_2-512-satcvt-1.c +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-satcvt-1.c @@ -36,12 +36,39 @@ /* { dg-final { scan-assembler-times "vcvttnebf162iubs\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ /* { dg-final { scan-assembler-times "vcvttnebf162iubs\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ /* { dg-final { scan-assembler-times "vcvttnebf162iubs\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttpd2dqs\[ \\t\]+\{sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttpd2dqs\[ \\t\]+\{sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttpd2dqs\[ \\t\]+\{sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttpd2qqs\[ \\t\]+\{sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttpd2qqs\[ \\t\]+\{sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttpd2qqs\[ \\t\]+\{sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttpd2udqs\[ \\t\]+\{sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttpd2udqs\[ \\t\]+\{sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttpd2udqs\[ \\t\]+\{sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttpd2uqqs\[ \\t\]+\{sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttpd2uqqs\[ \\t\]+\{sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttpd2uqqs\[ \\t\]+\{sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttps2dqs\[ \\t\]+\{sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttps2dqs\[ \\t\]+\{sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttps2dqs\[ \\t\]+\{sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttps2qqs\[ \\t\]+\{sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttps2qqs\[ \\t\]+\{sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttps2qqs\[ \\t\]+\{sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { 
scan-assembler-times "vcvttps2udqs\[ \\t\]+\{sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttps2udqs\[ \\t\]+\{sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttps2udqs\[ \\t\]+\{sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttps2uqqs\[ \\t\]+\{sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttps2uqqs\[ \\t\]+\{sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttps2uqqs\[ \\t\]+\{sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ #include +volatile __m256 hx; +volatile __m256i hxi; volatile __m512 x; volatile __m512h xh; volatile __m512i xi; +volatile __m512d xd; volatile __m512bh xbh; volatile __mmask8 m8; volatile __mmask16 m16; @@ -97,4 +124,36 @@ avx10_2_test (void) xi = _mm512_ipcvttnebf16_epu16 (xbh); xi = _mm512_mask_ipcvttnebf16_epu16 (xi, m32, xbh); xi = _mm512_maskz_ipcvttnebf16_epu16 (m32, xbh); + + hxi = _mm512_cvtts_roundpd_epi32 (xd, 8); + hxi = _mm512_mask_cvtts_roundpd_epi32 (hxi, m8, xd, 8); + hxi = _mm512_maskz_cvtts_roundpd_epi32 (m8, xd, 8); + + xi = _mm512_cvtts_roundpd_epi64 (xd, 8); + xi = _mm512_mask_cvtts_roundpd_epi64 (xi, m8, xd, 8); + xi = _mm512_maskz_cvtts_roundpd_epi64 (m8, xd, 8); + + hxi = _mm512_cvtts_roundpd_epu32 (xd, 8); + hxi = _mm512_mask_cvtts_roundpd_epu32 (hxi, m8, xd, 8); + hxi = _mm512_maskz_cvtts_roundpd_epu32 (m8, xd, 8); + + xi = _mm512_cvtts_roundpd_epu64 (xd, 8); + xi = _mm512_mask_cvtts_roundpd_epu64 (xi, m8, xd, 8); + xi = _mm512_maskz_cvtts_roundpd_epu64 (m8, xd, 8); + + xi = _mm512_cvtts_roundps_epi32 (x, 8); + xi = _mm512_mask_cvtts_roundps_epi32 (xi, m16, x, 8); + xi = _mm512_maskz_cvtts_roundps_epi32 (m16, x, 8); + + xi = _mm512_cvtts_roundps_epi64 (hx, 8); + xi = _mm512_mask_cvtts_roundps_epi64 (xi, m8, hx, 8); + xi = _mm512_maskz_cvtts_roundps_epi64 (m8, hx, 8); + + xi = _mm512_cvtts_roundps_epu32 (x, 8); + xi = _mm512_mask_cvtts_roundps_epu32 (xi, m16, x, 8); + xi = _mm512_maskz_cvtts_roundps_epu32 (m16, x, 8); + + xi = _mm512_cvtts_roundps_epu64 (hx, 8); + xi = _mm512_mask_cvtts_roundps_epu64 (xi, m8, hx, 8); + xi = _mm512_maskz_cvtts_roundps_epu64 (m8, hx, 8); } diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttpd2dqs-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttpd2dqs-2.c new file mode 100644 index 00000000000..dd7ea88cb82 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttpd2dqs-2.c @@ -0,0 +1,72 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif +#include "avx10-helper.h" +#include + +#define SRC_SIZE ((AVX512F_LEN) / 64) +#define DST_SIZE ((AVX512F_LEN_HALF) / 32) + +static void +CALC (double *s, int *r) +{ + int i; + + for (i = 0; i < SRC_SIZE; i++) + { + if (s[i] > INT_MAX) + r[i] = INT_MAX; + else if (s[i] < INT_MIN) + r[i] = INT_MIN; + else + r[i] = s[i]; + } +} + +void +TEST (void) +{ + UNION_TYPE (AVX512F_LEN, d) s; + UNION_TYPE (AVX512F_LEN_HALF, i_d) res1, res2, res3; + MASK_TYPE mask = MASK_VALUE; + int res_ref[DST_SIZE] = { 0 }; + int i, sign = 1; + + for (i = 0; i < SRC_SIZE; i++) + { + s.a[i] = 1.23 
* (i + 2) * sign; + sign = -sign; + } + + for (i = 0; i < DST_SIZE; i++) + res2.a[i] = DEFAULT_VALUE; + +#if AVX512F_LEN == 128 + res1.x = INTRINSIC (_cvttspd_epi32) (s.x); + res2.x = INTRINSIC (_mask_cvttspd_epi32) (res2.x, mask, s.x); + res3.x = INTRINSIC (_maskz_cvttspd_epi32) (mask, s.x); +#else + res1.x = INTRINSIC (_cvtts_roundpd_epi32) (s.x, 8); + res2.x = INTRINSIC (_mask_cvtts_roundpd_epi32) (res2.x, mask, s.x, 8); + res3.x = INTRINSIC (_maskz_cvtts_roundpd_epi32) (mask, s.x, 8); +#endif + + CALC (s.a, res_ref); + + if (UNION_CHECK (AVX512F_LEN_HALF, i_d) (res1, res_ref)) + abort (); + + MASK_MERGE (i_d) (res_ref, mask, SRC_SIZE); + if (UNION_CHECK (AVX512F_LEN_HALF, i_d) (res2, res_ref)) + abort (); + + MASK_ZERO (i_d) (res_ref, mask, SRC_SIZE); + if (UNION_CHECK (AVX512F_LEN_HALF, i_d) (res3, res_ref)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttpd2qqs-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttpd2qqs-2.c new file mode 100644 index 00000000000..a28643152ae --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttpd2qqs-2.c @@ -0,0 +1,72 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif +#include "avx10-helper.h" +#include + +#define SRC_SIZE ((AVX512F_LEN) / 64) +#define DST_SIZE ((AVX512F_LEN) / 64) + +static void +CALC (double *s, long long *r) +{ + int i; + + for (i = 0; i < SRC_SIZE; i++) + { + if (s[i] > LLONG_MAX) + r[i] = LLONG_MAX; + else if (s[i] < LLONG_MIN) + r[i] = LLONG_MIN; + else + r[i] = s[i]; + } +} + +void +TEST (void) +{ + UNION_TYPE (AVX512F_LEN, d) s; + UNION_TYPE (AVX512F_LEN, i_q) res1, res2, res3; + MASK_TYPE mask = MASK_VALUE; + long long res_ref[DST_SIZE] = { 0 }; + int i, sign = 1; + + for (i = 0; i < SRC_SIZE; i++) + { + s.a[i] = 1.23 * (i + 2) * sign; + sign = -sign; + } + + for (i = 0; i < DST_SIZE; i++) + res2.a[i] = DEFAULT_VALUE; + +#if AVX512F_LEN == 128 + res1.x = INTRINSIC (_cvttspd_epi64) (s.x); + res2.x = INTRINSIC (_mask_cvttspd_epi64) (res2.x, mask, s.x); + res3.x = INTRINSIC (_maskz_cvttspd_epi64) (mask, s.x); +#else + res1.x = INTRINSIC (_cvtts_roundpd_epi64) (s.x, 8); + res2.x = INTRINSIC (_mask_cvtts_roundpd_epi64) (res2.x, mask, s.x, 8); + res3.x = INTRINSIC (_maskz_cvtts_roundpd_epi64) (mask, s.x, 8); +#endif + + CALC (s.a, res_ref); + + if (UNION_CHECK (AVX512F_LEN, i_q) (res1, res_ref)) + abort (); + + MASK_MERGE (i_q) (res_ref, mask, SRC_SIZE); + if (UNION_CHECK (AVX512F_LEN, i_q) (res2, res_ref)) + abort (); + + MASK_ZERO (i_q) (res_ref, mask, SRC_SIZE); + if (UNION_CHECK (AVX512F_LEN, i_q) (res3, res_ref)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttpd2udqs-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttpd2udqs-2.c new file mode 100644 index 00000000000..768567747a4 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttpd2udqs-2.c @@ -0,0 +1,72 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif +#include "avx10-helper.h" +#include + +#define SRC_SIZE ((AVX512F_LEN) / 64) +#define DST_SIZE ((AVX512F_LEN_HALF) / 32) + +static void +CALC (double *s, unsigned int *r) +{ + int i; + + for (i = 0; i < SRC_SIZE; i++) + { + if (s[i] > UINT_MAX) + r[i] = UINT_MAX; + else if (s[i] < 0) + r[i] = 0; + else + r[i] = s[i]; + } +} + +void +TEST 
(void) +{ + UNION_TYPE (AVX512F_LEN, d) s; + UNION_TYPE (AVX512F_LEN_HALF, i_ud) res1, res2, res3; + MASK_TYPE mask = MASK_VALUE; + unsigned int res_ref[DST_SIZE] = { 0 }; + int i, sign = 1; + + for (i = 0; i < SRC_SIZE; i++) + { + s.a[i] = 1.23 * (i + 2) * sign; + sign = -sign; + } + + for (i = 0; i < DST_SIZE; i++) + res2.a[i] = DEFAULT_VALUE; + +#if AVX512F_LEN == 128 + res1.x = INTRINSIC (_cvttspd_epu32) (s.x); + res2.x = INTRINSIC (_mask_cvttspd_epu32) (res2.x, mask, s.x); + res3.x = INTRINSIC (_maskz_cvttspd_epu32) (mask, s.x); +#else + res1.x = INTRINSIC (_cvtts_roundpd_epu32) (s.x, 8); + res2.x = INTRINSIC (_mask_cvtts_roundpd_epu32) (res2.x, mask, s.x, 8); + res3.x = INTRINSIC (_maskz_cvtts_roundpd_epu32) (mask, s.x, 8); +#endif + + CALC (s.a, res_ref); + + if (UNION_CHECK (AVX512F_LEN_HALF, i_ud) (res1, res_ref)) + abort (); + + MASK_MERGE (i_ud) (res_ref, mask, SRC_SIZE); + if (UNION_CHECK (AVX512F_LEN_HALF, i_ud) (res2, res_ref)) + abort (); + + MASK_ZERO (i_ud) (res_ref, mask, SRC_SIZE); + if (UNION_CHECK (AVX512F_LEN_HALF, i_ud) (res3, res_ref)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttpd2uqqs-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttpd2uqqs-2.c new file mode 100644 index 00000000000..dbdd8114241 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttpd2uqqs-2.c @@ -0,0 +1,72 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif +#include "avx10-helper.h" +#include + +#define SRC_SIZE ((AVX512F_LEN) / 64) +#define DST_SIZE ((AVX512F_LEN) / 64) + +static void +CALC (double *s, unsigned long long *r) +{ + int i; + + for (i = 0; i < SRC_SIZE; i++) + { + if (s[i] > ULONG_MAX) + r[i] = ULONG_MAX; + else if (s[i] < 0) + r[i] = 0; + else + r[i] = s[i]; + } +} + +void +TEST (void) +{ + UNION_TYPE (AVX512F_LEN, d) s; + UNION_TYPE (AVX512F_LEN, i_uq) res1, res2, res3; + MASK_TYPE mask = MASK_VALUE; + unsigned long long res_ref[DST_SIZE] = { 0 }; + int i, sign = 1; + + for (i = 0; i < SRC_SIZE; i++) + { + s.a[i] = 1.23 * (i + 2) * sign; + sign = -sign; + } + + for (i = 0; i < DST_SIZE; i++) + res2.a[i] = DEFAULT_VALUE; + +#if AVX512F_LEN == 128 + res1.x = INTRINSIC (_cvttspd_epu64) (s.x); + res2.x = INTRINSIC (_mask_cvttspd_epu64) (res2.x, mask, s.x); + res3.x = INTRINSIC (_maskz_cvttspd_epu64) (mask, s.x); +#else + res1.x = INTRINSIC (_cvtts_roundpd_epu64) (s.x, 8); + res2.x = INTRINSIC (_mask_cvtts_roundpd_epu64) (res2.x, mask, s.x, 8); + res3.x = INTRINSIC (_maskz_cvtts_roundpd_epu64) (mask, s.x, 8); +#endif + + CALC (s.a, res_ref); + + if (UNION_CHECK (AVX512F_LEN, i_uq) (res1, res_ref)) + abort (); + + MASK_MERGE (i_uq) (res_ref, mask, SRC_SIZE); + if (UNION_CHECK (AVX512F_LEN, i_uq) (res2, res_ref)) + abort (); + + MASK_ZERO (i_uq) (res_ref, mask, SRC_SIZE); + if (UNION_CHECK (AVX512F_LEN, i_uq) (res3, res_ref)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttps2dqs-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttps2dqs-2.c new file mode 100644 index 00000000000..7a9b6e31e40 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttps2dqs-2.c @@ -0,0 +1,72 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif +#include "avx10-helper.h" +#include + +#define SRC_SIZE ((AVX512F_LEN) / 32) 
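With AVX512F_LEN equal to 512, both macros above evaluate to 16, so the test converts sixteen float lanes to sixteen int lanes. The scalar model below is an illustration of the saturating semantics that the CALC reference function which follows encodes (the helper name is invented, not part of the patch): out-of-range inputs clamp to INT_MAX or INT_MIN instead of producing the 0x80000000 integer-indefinite result of a plain vcvttps2dq.

  /* Hypothetical scalar model of one lane of vcvttps2dqs.  */
  static inline int
  sat_cvtts_f32_i32 (float f)
  {
    if (f > INT_MAX)   /* Positive overflow saturates to INT_MAX.  */
      return INT_MAX;
    if (f < INT_MIN)   /* Negative overflow saturates to INT_MIN.  */
      return INT_MIN;
    return (int) f;    /* In range: truncate toward zero.  */
  }
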
+#define DST_SIZE ((AVX512F_LEN) / 32) + +static void +CALC (float *s, int *r) +{ + int i; + + for (i = 0; i < SRC_SIZE; i++) + { + if (s[i] > INT_MAX) + r[i] = INT_MAX; + else if (s[i] < INT_MIN) + r[i] = INT_MIN; + else + r[i] = s[i]; + } +} + +void +TEST (void) +{ + UNION_TYPE (AVX512F_LEN, ) s; + UNION_TYPE (AVX512F_LEN, i_d) res1, res2, res3; + MASK_TYPE mask = MASK_VALUE; + int res_ref[DST_SIZE] = { 0 }; + int i, sign = 1; + + for (i = 0; i < SRC_SIZE; i++) + { + s.a[i] = 1.23 * (i + 2) * sign; + sign = -sign; + } + + for (i = 0; i < DST_SIZE; i++) + res2.a[i] = DEFAULT_VALUE; + +#if AVX512F_LEN == 128 + res1.x = INTRINSIC (_cvttsps_epi32) (s.x); + res2.x = INTRINSIC (_mask_cvttsps_epi32) (res2.x, mask, s.x); + res3.x = INTRINSIC (_maskz_cvttsps_epi32) (mask, s.x); +#else + res1.x = INTRINSIC (_cvtts_roundps_epi32) (s.x, 8); + res2.x = INTRINSIC (_mask_cvtts_roundps_epi32) (res2.x, mask, s.x, 8); + res3.x = INTRINSIC (_maskz_cvtts_roundps_epi32) (mask, s.x, 8); +#endif + + CALC (s.a, res_ref); + + if (UNION_CHECK (AVX512F_LEN, i_d) (res1, res_ref)) + abort (); + + MASK_MERGE (i_d) (res_ref, mask, SRC_SIZE); + if (UNION_CHECK (AVX512F_LEN, i_d) (res2, res_ref)) + abort (); + + MASK_ZERO (i_d) (res_ref, mask, SRC_SIZE); + if (UNION_CHECK (AVX512F_LEN, i_d) (res3, res_ref)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttps2qqs-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttps2qqs-2.c new file mode 100644 index 00000000000..ed19c5e329d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttps2qqs-2.c @@ -0,0 +1,73 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif +#include "avx10-helper.h" +#include + +#define SRC_SIZE ((AVX512F_LEN_HALF) / 32) +#define DST_SIZE ((AVX512F_LEN) / 64) + +static void +CALC (float *s, long long *r) +{ + int i; + + for (i = 0; i < DST_SIZE; i++) + { + if (s[i] > LLONG_MAX) + r[i] = LLONG_MAX; + else if (s[i] < LLONG_MIN) + r[i] = LLONG_MIN; + else + r[i] = s[i]; + } +} + +void +TEST (void) +{ + UNION_TYPE (AVX512F_LEN_HALF, ) s; + UNION_TYPE (AVX512F_LEN, i_q) res1, res2, res3; + MASK_TYPE mask = MASK_VALUE; + long long res_ref[DST_SIZE] = { 0 }; + int i, sign = 1; + + for (i = 0; i < SRC_SIZE; i++) + { + s.a[i] = 1.23 * (i + 2) * sign; + sign = -sign; + } + + for (i = 0; i < DST_SIZE; i++) + res2.a[i] = DEFAULT_VALUE; + +#if AVX512F_LEN == 128 + res1.x = INTRINSIC (_cvttsps_epi64) (s.x); + res2.x = INTRINSIC (_mask_cvttsps_epi64) (res2.x, mask, s.x); + res3.x = INTRINSIC (_maskz_cvttsps_epi64) (mask, s.x); +#else + res1.x = INTRINSIC (_cvtts_roundps_epi64) (s.x, 8); + res2.x = INTRINSIC (_mask_cvtts_roundps_epi64) (res2.x, mask, s.x, 8); + res3.x = INTRINSIC (_maskz_cvtts_roundps_epi64) (mask, s.x, 8); +#endif + + CALC (s.a, res_ref); + + + if (UNION_CHECK (AVX512F_LEN, i_q) (res1, res_ref)) + abort (); + + MASK_MERGE (i_q) (res_ref, mask, DST_SIZE); + if (UNION_CHECK (AVX512F_LEN, i_q) (res2, res_ref)) + abort (); + + MASK_ZERO (i_q) (res_ref, mask, DST_SIZE); + if (UNION_CHECK (AVX512F_LEN, i_q) (res3, res_ref)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttps2udqs-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttps2udqs-2.c new file mode 100644 index 00000000000..b279af29326 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttps2udqs-2.c @@ -0,0 +1,72 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" 
} */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif +#include "avx10-helper.h" +#include + +#define SRC_SIZE ((AVX512F_LEN) / 32) +#define DST_SIZE ((AVX512F_LEN) / 32) + +static void +CALC (float *s, unsigned int *r) +{ + int i; + + for (i = 0; i < SRC_SIZE; i++) + { + if (s[i] > UINT_MAX) + r[i] = UINT_MAX; + else if (s[i] < 0) + r[i] = 0; + else + r[i] = s[i]; + } +} + +void +TEST (void) +{ + UNION_TYPE (AVX512F_LEN, ) s; + UNION_TYPE (AVX512F_LEN, i_ud) res1, res2, res3; + MASK_TYPE mask = MASK_VALUE; + unsigned int res_ref[DST_SIZE] = { 0 }; + int i, sign = 1; + + for (i = 0; i < SRC_SIZE; i++) + { + s.a[i] = 1.23 * (i + 2) * sign; + sign = -sign; + } + + for (i = 0; i < DST_SIZE; i++) + res2.a[i] = DEFAULT_VALUE; + +#if AVX512F_LEN == 128 + res1.x = INTRINSIC (_cvttsps_epu32) (s.x); + res2.x = INTRINSIC (_mask_cvttsps_epu32) (res2.x, mask, s.x); + res3.x = INTRINSIC (_maskz_cvttsps_epu32) (mask, s.x); +#else + res1.x = INTRINSIC (_cvtts_roundps_epu32) (s.x, 8); + res2.x = INTRINSIC (_mask_cvtts_roundps_epu32) (res2.x, mask, s.x, 8); + res3.x = INTRINSIC (_maskz_cvtts_roundps_epu32) (mask, s.x, 8); +#endif + + CALC (s.a, res_ref); + + if (UNION_CHECK (AVX512F_LEN, i_ud) (res1, res_ref)) + abort (); + + MASK_MERGE (i_ud) (res_ref, mask, SRC_SIZE); + if (UNION_CHECK (AVX512F_LEN, i_ud) (res2, res_ref)) + abort (); + + MASK_ZERO (i_ud) (res_ref, mask, SRC_SIZE); + if (UNION_CHECK (AVX512F_LEN, i_ud) (res3, res_ref)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttps2uqqs-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttps2uqqs-2.c new file mode 100644 index 00000000000..7151d079b79 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvttps2uqqs-2.c @@ -0,0 +1,72 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2_512 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_2_512 +#define AVX10_512BIT +#endif +#include "avx10-helper.h" +#include + +#define SRC_SIZE ((AVX512F_LEN_HALF) / 32) +#define DST_SIZE ((AVX512F_LEN) / 64) + +static void +CALC (float *s, unsigned long long *r) +{ + int i; + + for (i = 0; i < SRC_SIZE; i++) + { + if (s[i] > ULONG_MAX) + r[i] = ULONG_MAX; + else if (s[i] < 0) + r[i] = 0; + else + r[i] = s[i]; + } +} + +void +TEST (void) +{ + UNION_TYPE (AVX512F_LEN_HALF, ) s; + UNION_TYPE (AVX512F_LEN, i_uq) res1, res2, res3; + MASK_TYPE mask = MASK_VALUE; + unsigned long long res_ref[DST_SIZE] = { 0 }; + int i, sign = 1; + + for (i = 0; i < SRC_SIZE; i++) + { + s.a[i] = 1.23 * (i + 2) * sign; + sign = -sign; + } + + for (i = 0; i < DST_SIZE; i++) + res2.a[i] = DEFAULT_VALUE; + +#if AVX512F_LEN == 128 + res1.x = INTRINSIC (_cvttsps_epu64) (s.x); + res2.x = INTRINSIC (_mask_cvttsps_epu64) (res2.x, mask, s.x); + res3.x = INTRINSIC (_maskz_cvttsps_epu64) (mask, s.x); +#else + res1.x = INTRINSIC (_cvtts_roundps_epu64) (s.x, 8); + res2.x = INTRINSIC (_mask_cvtts_roundps_epu64) (res2.x, mask, s.x, 8); + res3.x = INTRINSIC (_maskz_cvtts_roundps_epu64) (mask, s.x, 8); +#endif + + CALC (s.a, res_ref); + + if (UNION_CHECK (AVX512F_LEN, i_uq) (res1, res_ref)) + abort (); + + MASK_MERGE (i_uq) (res_ref, mask, SRC_SIZE); + if (UNION_CHECK (AVX512F_LEN, i_uq) (res2, res_ref)) + abort (); + + MASK_ZERO (i_uq) (res_ref, mask, SRC_SIZE); + if (UNION_CHECK (AVX512F_LEN, i_uq) (res3, res_ref)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-satcvt-1.c 
b/gcc/testsuite/gcc.target/i386/avx10_2-satcvt-1.c index f04e3ecb642..83ef63cf067 100644 --- a/gcc/testsuite/gcc.target/i386/avx10_2-satcvt-1.c +++ b/gcc/testsuite/gcc.target/i386/avx10_2-satcvt-1.c @@ -72,19 +72,81 @@ /* { dg-final { scan-assembler-times "vcvttnebf162iubs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ /* { dg-final { scan-assembler-times "vcvttnebf162iubs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ /* { dg-final { scan-assembler-times "vcvttnebf162iubs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttpd2dqsy\[ \\t\]+\{sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttpd2dqsy\[ \\t\]+\{sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttpd2dqsy\[ \\t\]+\{sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttpd2qqs\[ \\t\]+\{sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttpd2qqs\[ \\t\]+\{sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttpd2qqs\[ \\t\]+\{sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttpd2udqsy\[ \\t\]+\{sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttpd2udqsy\[ \\t\]+\{sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttpd2udqsy\[ \\t\]+\{sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttpd2uqqs\[ \\t\]+\{sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttpd2uqqs\[ \\t\]+\{sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttpd2uqqs\[ \\t\]+\{sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttps2dqs\[ \\t\]+\{sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttps2dqs\[ \\t\]+\{sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttps2dqs\[ \\t\]+\{sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttps2qqs\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttps2qqs\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttps2qqs\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttps2udqs\[ \\t\]+\{sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttps2udqs\[ 
\\t\]+\{sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttps2udqs\[ \\t\]+\{sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttps2uqqs\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttps2uqqs\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttps2uqqs\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttpd2dqsx\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttpd2dqsx\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttpd2dqsx\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttpd2qqs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttpd2qqs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttpd2qqs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttpd2udqsx\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttpd2udqsx\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttpd2udqsx\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttpd2uqqs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttpd2uqqs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttpd2uqqs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttps2dqs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttps2dqs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttps2dqs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttps2qqs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttps2qqs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttps2qqs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttps2udqs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttps2udqs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times 
"vcvttps2udqs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttps2uqqs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttps2uqqs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttps2uqqs\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttsd2sis\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%e.x+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttsd2usis\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%e.x+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttss2sis\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%e.x+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttss2usis\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%e.x+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttsd2sis\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%r.x+(?:\n|\[ \\t\]+#)" 1 { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-times "vcvttsd2usis\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%r.x+(?:\n|\[ \\t\]+#)" 1 { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-times "vcvttss2sis\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%r.x+(?:\n|\[ \\t\]+#)" 1 { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-times "vcvttss2usis\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%r.x+(?:\n|\[ \\t\]+#)" 1 { target { ! ia32 } } } } */ #include volatile __m128 hx; volatile __m128i hxi; volatile __m128h hxh; +volatile __m128d hxd; volatile __m128bh hxbh; volatile __m256 x; volatile __m256h xh; volatile __m256i xi; +volatile __m256d xd; volatile __m256bh xbh; volatile __mmask8 m8; volatile __mmask16 m16; +volatile int i; +volatile unsigned int ui; +volatile long long ll; +volatile unsigned long long ull; void extern avx10_2_test (void) @@ -184,4 +246,80 @@ avx10_2_test (void) hxi = _mm_ipcvttnebf16_epu16 (hxbh); hxi = _mm_mask_ipcvttnebf16_epu16 (hxi, m8, hxbh); hxi = _mm_maskz_ipcvttnebf16_epu16 (m8, hxbh); + + hxi = _mm256_cvtts_roundpd_epi32 (xd, 8); + hxi = _mm256_mask_cvtts_roundpd_epi32 (hxi, m8, xd, 8); + hxi = _mm256_maskz_cvtts_roundpd_epi32 (m8, xd, 8); + + xi = _mm256_cvtts_roundpd_epi64 (xd, 8); + xi = _mm256_mask_cvtts_roundpd_epi64 (xi, m8, xd, 8); + xi = _mm256_maskz_cvtts_roundpd_epi64 (m8, xd, 8); + + hxi = _mm256_cvtts_roundpd_epu32 (xd, 8); + hxi = _mm256_mask_cvtts_roundpd_epu32 (hxi, m8, xd, 8); + hxi = _mm256_maskz_cvtts_roundpd_epu32 (m8, xd, 8); + + xi = _mm256_cvtts_roundpd_epu64 (xd, 8); + xi = _mm256_mask_cvtts_roundpd_epu64 (xi, m8, xd, 8); + xi = _mm256_maskz_cvtts_roundpd_epu64 (m8, xd, 8); + + xi = _mm256_cvtts_roundps_epi32 (x, 8); + xi = _mm256_mask_cvtts_roundps_epi32 (xi, m16, x, 8); + xi = _mm256_maskz_cvtts_roundps_epi32 (m16, x, 8); + + xi = _mm256_cvtts_roundps_epi64 (hx, 8); + xi = _mm256_mask_cvtts_roundps_epi64 (xi, m8, hx, 8); + xi = _mm256_maskz_cvtts_roundps_epi64 (m8, hx, 8); + + xi = _mm256_cvtts_roundps_epu32 (x, 8); + xi = _mm256_mask_cvtts_roundps_epu32 (xi, m16, x, 8); + xi = _mm256_maskz_cvtts_roundps_epu32 (m16, x, 8); + + xi = _mm256_cvtts_roundps_epu64 (hx, 8); + xi = _mm256_mask_cvtts_roundps_epu64 (xi, m8, hx, 8); + xi = _mm256_maskz_cvtts_roundps_epu64 (m8, hx, 8); + + hxi = _mm_cvttspd_epi32 (hxd); + hxi = _mm_mask_cvttspd_epi32 
(hxi, m8, hxd); + hxi = _mm_maskz_cvttspd_epi32 (m8, hxd); + + hxi = _mm_cvttspd_epi64 (hxd); + hxi = _mm_mask_cvttspd_epi64 (hxi, m8, hxd); + hxi = _mm_maskz_cvttspd_epi64 (m8, hxd); + + hxi = _mm_cvttspd_epu32 (hxd); + hxi = _mm_mask_cvttspd_epu32 (hxi, m8, hxd); + hxi = _mm_maskz_cvttspd_epu32 (m8, hxd); + + hxi = _mm_cvttspd_epu64 (hxd); + hxi = _mm_mask_cvttspd_epu64 (hxi, m8, hxd); + hxi = _mm_maskz_cvttspd_epu64 (m8, hxd); + + hxi = _mm_cvttsps_epi32 (hx); + hxi = _mm_mask_cvttsps_epi32 (hxi, m8, hx); + hxi = _mm_maskz_cvttsps_epi32 (m8, hx); + + hxi = _mm_cvttsps_epi64 (hx); + hxi = _mm_mask_cvttsps_epi64 (hxi, m8, hx); + hxi = _mm_maskz_cvttsps_epi64 (m8, hx); + + hxi = _mm_cvttsps_epu32 (hx); + hxi = _mm_mask_cvttsps_epu32 (hxi, m8, hx); + hxi = _mm_maskz_cvttsps_epu32 (m8, hx); + + hxi = _mm_cvttsps_epu64 (hx); + hxi = _mm_mask_cvttsps_epu64 (hxi, m8, hx); + hxi = _mm_maskz_cvttsps_epu64 (m8, hx); + + i = _mm_cvtts_roundsd_epi32 (hxd, 8); + ui = _mm_cvtts_roundsd_epu32 (hxd, 8); + i = _mm_cvtts_roundss_epi32 (hx, 8); + ui = _mm_cvtts_roundss_epu32 (hx, 8); + +#ifdef __x86_64__ + ll = _mm_cvtts_roundsd_epi64 (hxd, 8); + ull = _mm_cvtts_roundsd_epu64 (hxd, 8); + ll = _mm_cvtts_roundss_epi64 (hx, 8); + ull = _mm_cvtts_roundss_epu64 (hx, 8); +#endif } diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcvttpd2dqs-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vcvttpd2dqs-2.c new file mode 100644 index 00000000000..06cbb5b24e3 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcvttpd2dqs-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvttpd2dqs-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvttpd2dqs-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcvttpd2qqs-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vcvttpd2qqs-2.c new file mode 100644 index 00000000000..df29d0f14da --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcvttpd2qqs-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvttpd2qqs-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvttpd2qqs-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcvttpd2udqs-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vcvttpd2udqs-2.c new file mode 100644 index 00000000000..9e9cea121a3 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcvttpd2udqs-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvttpd2udqs-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvttpd2udqs-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcvttpd2uqqs-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vcvttpd2uqqs-2.c new file mode 100644 index 00000000000..282b43f56a6 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcvttpd2uqqs-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } 
*/ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvttpd2uqqs-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvttpd2uqqs-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcvttps2dqs-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vcvttps2dqs-2.c new file mode 100644 index 00000000000..57acd36b28f --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcvttps2dqs-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvttps2dqs-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvttps2dqs-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcvttps2qqs-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vcvttps2qqs-2.c new file mode 100644 index 00000000000..1e6bbfd24ea --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcvttps2qqs-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvttps2qqs-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvttps2qqs-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcvttps2udqs-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vcvttps2udqs-2.c new file mode 100644 index 00000000000..4b175e694f2 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcvttps2udqs-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvttps2udqs-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvttps2udqs-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcvttps2uqqs-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vcvttps2uqqs-2.c new file mode 100644 index 00000000000..3abebfb4559 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcvttps2uqqs-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvttps2uqqs-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx10_2-512-vcvttps2uqqs-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcvttsd2sis-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vcvttsd2sis-2.c new file mode 100644 index 00000000000..9e4bd71a411 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcvttsd2sis-2.c @@ -0,0 +1,47 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX10_SCALAR +#include "avx10-helper.h" +#include + +void +TEST (void) +{ + UNION_TYPE (128, d) s; + int res1; + long long res2; + int res1_ref = 0; + long long res2_ref = 0; + int i, sign = 1; + + s.a[0] = 2.46; + + res1 = 
_mm_cvtts_roundsd_epi32 (s.x, 8); + + if (s.a[0] > INT_MAX) + res1_ref = INT_MAX; + else if (s.a[0] < INT_MIN) + res1_ref = INT_MIN; + else + res1_ref = s.a[0]; + + if (res1 != res1_ref) + abort(); + +#ifdef __x86_64__ + res2 = _mm_cvtts_roundsd_epi64 (s.x, 8); + + if (s.a[0] > LLONG_MAX) + res2_ref = LLONG_MAX; + else if (s.a[0] < LLONG_MIN) + res2_ref = LLONG_MIN; + else + res2_ref = s.a[0]; + + if (res2 != res2_ref) + abort(); +#endif +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcvttsd2usis-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vcvttsd2usis-2.c new file mode 100644 index 00000000000..b4ab914862b --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcvttsd2usis-2.c @@ -0,0 +1,47 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX10_SCALAR +#include "avx10-helper.h" +#include + +void +TEST (void) +{ + UNION_TYPE (128, d) s; + unsigned int res1; + unsigned long long res2; + unsigned int res1_ref = 0; + unsigned long long res2_ref = 0; + int i, sign = 1; + + s.a[0] = 2.46; + + res1 = _mm_cvtts_roundsd_epu32 (s.x, 8); + + if (s.a[0] > UINT_MAX) + res1_ref = UINT_MAX; + else if (s.a[0] < 0) + res1_ref = 0; + else + res1_ref = s.a[0]; + + if (res1 != res1_ref) + abort(); + +#ifdef __x86_64__ + res2 = _mm_cvtts_roundsd_epu64 (s.x, 8); + + if (s.a[0] > ULONG_MAX) + res2_ref = ULONG_MAX; + else if (s.a[0] < 0) + res2_ref = 0; + else + res2_ref = s.a[0]; + + if (res2 != res2_ref) + abort(); +#endif +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcvttss2sis-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vcvttss2sis-2.c new file mode 100644 index 00000000000..67b6b8d384b --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcvttss2sis-2.c @@ -0,0 +1,47 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX10_SCALAR +#include "avx10-helper.h" +#include + +void +TEST (void) +{ + UNION_TYPE (128, ) s; + int res1; + long long res2; + int res1_ref = 0; + long long res2_ref = 0; + int i, sign = 1; + + s.a[0] = 2.46; + + res1 = _mm_cvtts_roundss_epi32 (s.x, 8); + + if (s.a[0] > INT_MAX) + res1_ref = INT_MAX; + else if (s.a[0] < INT_MIN) + res1_ref = INT_MIN; + else + res1_ref = s.a[0]; + + if (res1 != res1_ref) + abort(); + +#ifdef __x86_64__ + res2 = _mm_cvtts_roundss_epi64 (s.x, 8); + + if (s.a[0] > LLONG_MAX) + res2_ref = LLONG_MAX; + else if (s.a[0] < LLONG_MIN) + res2_ref = LLONG_MIN; + else + res2_ref = s.a[0]; + + if (res2 != res2_ref) + abort(); +#endif +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vcvttss2usis-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vcvttss2usis-2.c new file mode 100644 index 00000000000..1e58a9c6979 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vcvttss2usis-2.c @@ -0,0 +1,46 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX10_SCALAR +#include "avx10-helper.h" +#include + +void +TEST (void) +{ + UNION_TYPE (AVX512F_LEN, ) s; + unsigned int res1; + unsigned long long res2; + unsigned int res1_ref = 0; + unsigned long long res2_ref = 0; + + s.a[0] = 2.46; + + res1 = _mm_cvtts_roundss_epu32 (s.x, 8); + + if (s.a[0] > UINT_MAX) + res1_ref = UINT_MAX; + else if (s.a[0] < 0) + res1_ref = 0; + else + res1_ref = s.a[0]; + + if (res1 != res1_ref) + abort(); + +#ifdef __x86_64__ + res2 = _mm_cvtts_roundss_epu64 (s.x, 8); + + if (s.a[0] > ULONG_MAX) + res2_ref = ULONG_MAX; + 
else if (s.a[0] < 0) + res2_ref = 0; + else + res2_ref = s.a[0]; + + if (res2 != res2_ref) + abort(); +#endif +} diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c index 5669fa1aa00..1d6ca552fcc 100644 --- a/gcc/testsuite/gcc.target/i386/sse-13.c +++ b/gcc/testsuite/gcc.target/i386/sse-13.c @@ -1052,6 +1052,14 @@ #define __builtin_ia32_cvttph2iubs512_mask_round(A, B, C, D) __builtin_ia32_cvttph2iubs512_mask_round(A, B, C, 8) #define __builtin_ia32_cvttps2ibs512_mask_round(A, B, C, D) __builtin_ia32_cvttps2ibs512_mask_round(A, B, C, 8) #define __builtin_ia32_cvttps2iubs512_mask_round(A, B, C, D) __builtin_ia32_cvttps2iubs512_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttpd2dqs512_mask_round(A, B, C, D) __builtin_ia32_cvttpd2dqs512_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttpd2qqs512_mask_round(A, B, C, D) __builtin_ia32_cvttpd2qqs512_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttpd2udqs512_mask_round(A, B, C, D) __builtin_ia32_cvttpd2udqs512_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttpd2uqqs512_mask_round(A, B, C, D) __builtin_ia32_cvttpd2uqqs512_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttps2dqs512_mask_round(A, B, C, D) __builtin_ia32_cvttps2dqs512_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttps2qqs512_mask_round(A, B, C, D) __builtin_ia32_cvttps2qqs512_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttps2udqs512_mask_round(A, B, C, D) __builtin_ia32_cvttps2udqs512_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttps2uqqs512_mask_round(A, B, C, D) __builtin_ia32_cvttps2uqqs512_mask_round(A, B, C, 8) /* avx10_2satcvtintrin.h */ #define __builtin_ia32_cvtph2ibs256_mask_round(A, B, C, D) __builtin_ia32_cvtph2ibs256_mask_round(A, B, C, 8) @@ -1062,5 +1070,23 @@ #define __builtin_ia32_cvttph2iubs256_mask_round(A, B, C, D) __builtin_ia32_cvttph2iubs256_mask_round(A, B, C, 8) #define __builtin_ia32_cvttps2ibs256_mask_round(A, B, C, D) __builtin_ia32_cvttps2ibs256_mask_round(A, B, C, 8) #define __builtin_ia32_cvttps2iubs256_mask_round(A, B, C, D) __builtin_ia32_cvttps2iubs256_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttpd2dqs256_mask_round(A, B, C, D) __builtin_ia32_cvttpd2dqs256_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttpd2qqs256_mask_round(A, B, C, D) __builtin_ia32_cvttpd2qqs256_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttpd2udqs256_mask_round(A, B, C, D) __builtin_ia32_cvttpd2udqs256_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttpd2uqqs256_mask_round(A, B, C, D) __builtin_ia32_cvttpd2uqqs256_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttps2dqs256_mask_round(A, B, C, D) __builtin_ia32_cvttps2dqs256_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttps2qqs256_mask_round(A, B, C, D) __builtin_ia32_cvttps2qqs256_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttps2udqs256_mask_round(A, B, C, D) __builtin_ia32_cvttps2udqs256_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttps2uqqs256_mask_round(A, B, C, D) __builtin_ia32_cvttps2uqqs256_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttsd2sis32_round(A, B) __builtin_ia32_cvttsd2sis32_round(A, 8) +#define __builtin_ia32_cvttsd2usis32_round(A, B) __builtin_ia32_cvttsd2usis32_round(A, 8) +#define __builtin_ia32_cvttss2sis32_round(A, B) __builtin_ia32_cvttss2sis32_round(A, 8) +#define __builtin_ia32_cvttss2usis32_round(A, B) __builtin_ia32_cvttss2usis32_round(A, 8) +#ifdef __x86_64__ +#define __builtin_ia32_cvttsd2sis64_round(A, B) __builtin_ia32_cvttsd2sis64_round(A, 8) +#define __builtin_ia32_cvttsd2usis64_round(A, B) 
__builtin_ia32_cvttsd2usis64_round(A, 8)
+#define __builtin_ia32_cvttss2sis64_round(A, B) __builtin_ia32_cvttss2sis64_round(A, 8)
+#define __builtin_ia32_cvttss2usis64_round(A, B) __builtin_ia32_cvttss2usis64_round(A, 8)
+#endif

 #include <x86intrin.h>
diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
index 550d2633b78..799982b6f7e 100644
--- a/gcc/testsuite/gcc.target/i386/sse-14.c
+++ b/gcc/testsuite/gcc.target/i386/sse-14.c
@@ -1457,6 +1457,30 @@ test_3 (_mm512_mask_ipcvtt_roundph_epi16, __m512i, __m512i, __mmask32, __m512h,
 test_3 (_mm512_mask_ipcvtt_roundph_epu16, __m512i, __m512i, __mmask32, __m512h, 8)
 test_3 (_mm512_mask_ipcvtt_roundps_epi32, __m512i, __m512i, __mmask16, __m512, 8)
 test_3 (_mm512_mask_ipcvtt_roundps_epu32, __m512i, __m512i, __mmask16, __m512, 8)
+test_1 (_mm512_cvtts_roundpd_epi32, __m256i, __m512d, 8)
+test_2 (_mm512_maskz_cvtts_roundpd_epi32, __m256i, __mmask8, __m512d, 8)
+test_3 (_mm512_mask_cvtts_roundpd_epi32, __m256i, __m256i, __mmask8, __m512d, 8)
+test_1 (_mm512_cvtts_roundpd_epi64, __m512i, __m512d, 8)
+test_2 (_mm512_maskz_cvtts_roundpd_epi64, __m512i, __mmask8, __m512d, 8)
+test_3 (_mm512_mask_cvtts_roundpd_epi64, __m512i, __m512i, __mmask8, __m512d, 8)
+test_1 (_mm512_cvtts_roundpd_epu32, __m256i, __m512d, 8)
+test_2 (_mm512_maskz_cvtts_roundpd_epu32, __m256i, __mmask8, __m512d, 8)
+test_3 (_mm512_mask_cvtts_roundpd_epu32, __m256i, __m256i, __mmask8, __m512d, 8)
+test_1 (_mm512_cvtts_roundpd_epu64, __m512i, __m512d, 8)
+test_2 (_mm512_maskz_cvtts_roundpd_epu64, __m512i, __mmask8, __m512d, 8)
+test_3 (_mm512_mask_cvtts_roundpd_epu64, __m512i, __m512i, __mmask8, __m512d, 8)
+test_1 (_mm512_cvtts_roundps_epi32, __m512i, __m512, 8)
+test_2 (_mm512_maskz_cvtts_roundps_epi32, __m512i, __mmask16, __m512, 8)
+test_3 (_mm512_mask_cvtts_roundps_epi32, __m512i, __m512i, __mmask16, __m512, 8)
+test_1 (_mm512_cvtts_roundps_epi64, __m512i, __m256, 8)
+test_2 (_mm512_maskz_cvtts_roundps_epi64, __m512i, __mmask8, __m256, 8)
+test_3 (_mm512_mask_cvtts_roundps_epi64, __m512i, __m512i, __mmask8, __m256, 8)
+test_1 (_mm512_cvtts_roundps_epu32, __m512i, __m512, 8)
+test_2 (_mm512_maskz_cvtts_roundps_epu32, __m512i, __mmask16, __m512, 8)
+test_3 (_mm512_mask_cvtts_roundps_epu32, __m512i, __m512i, __mmask16, __m512, 8)
+test_1 (_mm512_cvtts_roundps_epu64, __m512i, __m256, 8)
+test_2 (_mm512_maskz_cvtts_roundps_epu64, __m512i, __mmask8, __m256, 8)
+test_3 (_mm512_mask_cvtts_roundps_epu64, __m512i, __m512i, __mmask8, __m256, 8)

 /* avx10_2satcvtintrin.h */
 test_1 (_mm256_ipcvt_roundph_epi16, __m256i, __m256h, 8)
@@ -1483,3 +1507,37 @@ test_3 (_mm256_mask_ipcvtt_roundph_epi16, __m256i, __m256i, __mmask16, __m256h,
 test_3 (_mm256_mask_ipcvtt_roundph_epu16, __m256i, __m256i, __mmask16, __m256h, 8)
 test_3 (_mm256_mask_ipcvtt_roundps_epi32, __m256i, __m256i, __mmask8, __m256, 8)
 test_3 (_mm256_mask_ipcvtt_roundps_epu32, __m256i, __m256i, __mmask8, __m256, 8)
+test_1 (_mm256_cvtts_roundpd_epi32, __m128i, __m256d, 8)
+test_2 (_mm256_maskz_cvtts_roundpd_epi32, __m128i, __mmask8, __m256d, 8)
+test_3 (_mm256_mask_cvtts_roundpd_epi32, __m128i, __m128i, __mmask8, __m256d, 8)
+test_1 (_mm256_cvtts_roundpd_epi64, __m256i, __m256d, 8)
+test_2 (_mm256_maskz_cvtts_roundpd_epi64, __m256i, __mmask8, __m256d, 8)
+test_3 (_mm256_mask_cvtts_roundpd_epi64, __m256i, __m256i, __mmask8, __m256d, 8)
+test_1 (_mm256_cvtts_roundpd_epu32, __m128i, __m256d, 8)
+test_2 (_mm256_maskz_cvtts_roundpd_epu32, __m128i, __mmask8, __m256d, 8)
+test_3 (_mm256_mask_cvtts_roundpd_epu32,
__m128i, __m128i, __mmask8, __m256d, 8) +test_1 (_mm256_cvtts_roundpd_epu64, __m256i, __m256d, 8) +test_2 (_mm256_maskz_cvtts_roundpd_epu64, __m256i, __mmask8, __m256d, 8) +test_3 (_mm256_mask_cvtts_roundpd_epu64, __m256i, __m256i, __mmask8, __m256d, 8) +test_1 (_mm256_cvtts_roundps_epi32, __m256i, __m256, 8) +test_2 (_mm256_maskz_cvtts_roundps_epi32, __m256i, __mmask8, __m256, 8) +test_3 (_mm256_mask_cvtts_roundps_epi32, __m256i, __m256i, __mmask8, __m256, 8) +test_1 (_mm256_cvtts_roundps_epi64, __m256i, __m128, 8) +test_2 (_mm256_maskz_cvtts_roundps_epi64, __m256i, __mmask8, __m128, 8) +test_3 (_mm256_mask_cvtts_roundps_epi64, __m256i, __m256i, __mmask8, __m128, 8) +test_1 (_mm256_cvtts_roundps_epu32, __m256i, __m256, 8) +test_2 (_mm256_maskz_cvtts_roundps_epu32, __m256i, __mmask8, __m256, 8) +test_3 (_mm256_mask_cvtts_roundps_epu32, __m256i, __m256i, __mmask8, __m256, 8) +test_1 (_mm256_cvtts_roundps_epu64, __m256i, __m128, 8) +test_2 (_mm256_maskz_cvtts_roundps_epu64, __m256i, __mmask8, __m128, 8) +test_3 (_mm256_mask_cvtts_roundps_epu64, __m256i, __m256i, __mmask8, __m128, 8) +test_1 (_mm_cvtts_roundsd_epi32, int, __m128d, 8) +test_1 (_mm_cvtts_roundsd_epu32, unsigned int, __m128d, 8) +test_1 (_mm_cvtts_roundss_epi32, int, __m128, 8) +test_1 (_mm_cvtts_roundss_epu32, unsigned int, __m128, 8) +#ifdef __x86_64__ +test_1 (_mm_cvtts_roundsd_epi64, long long, __m128d, 8) +test_1 (_mm_cvtts_roundsd_epu64, unsigned long long, __m128d, 8) +test_1 (_mm_cvtts_roundss_epi64, long long, __m128, 8) +test_1 (_mm_cvtts_roundss_epu64, unsigned long long, __m128, 8) +#endif diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c index ba67ee26914..b8eb6ae7828 100644 --- a/gcc/testsuite/gcc.target/i386/sse-22.c +++ b/gcc/testsuite/gcc.target/i386/sse-22.c @@ -1496,6 +1496,30 @@ test_3 (_mm512_mask_ipcvtt_roundph_epi16, __m512i, __m512i, __mmask32, __m512h, test_3 (_mm512_mask_ipcvtt_roundph_epu16, __m512i, __m512i, __mmask32, __m512h, 8) test_3 (_mm512_mask_ipcvtt_roundps_epi32, __m512i, __m512i, __mmask16, __m512, 8) test_3 (_mm512_mask_ipcvtt_roundps_epu32, __m512i, __m512i, __mmask16, __m512, 8) +test_1 (_mm512_cvtts_roundpd_epi32, __m256i, __m512d, 8) +test_2 (_mm512_maskz_cvtts_roundpd_epi32, __m256i, __mmask8, __m512d, 8) +test_3 (_mm512_mask_cvtts_roundpd_epi32, __m256i, __m256i, __mmask8, __m512d, 8) +test_1 (_mm512_cvtts_roundpd_epi64, __m512i, __m512d, 8) +test_2 (_mm512_maskz_cvtts_roundpd_epi64, __m512i, __mmask8, __m512d, 8) +test_3 (_mm512_mask_cvtts_roundpd_epi64, __m512i, __m512i, __mmask8, __m512d, 8) +test_1 (_mm512_cvtts_roundpd_epu32, __m256i, __m512d, 8) +test_2 (_mm512_maskz_cvtts_roundpd_epu32, __m256i, __mmask8, __m512d, 8) +test_3 (_mm512_mask_cvtts_roundpd_epu32, __m256i, __m256i, __mmask8, __m512d, 8) +test_1 (_mm512_cvtts_roundpd_epu64, __m512i, __m512d, 8) +test_2 (_mm512_maskz_cvtts_roundpd_epu64, __m512i, __mmask8, __m512d, 8) +test_3 (_mm512_mask_cvtts_roundpd_epu64, __m512i, __m512i, __mmask8, __m512d, 8) +test_1 (_mm512_cvtts_roundps_epi32, __m512i, __m512, 8) +test_2 (_mm512_maskz_cvtts_roundps_epi32, __m512i, __mmask16, __m512, 8) +test_3 (_mm512_mask_cvtts_roundps_epi32, __m512i, __m512i, __mmask16, __m512, 8) +test_1 (_mm512_cvtts_roundps_epi64, __m512i, __m256, 8) +test_2 (_mm512_maskz_cvtts_roundps_epi64, __m512i, __mmask8, __m256, 8) +test_3 (_mm512_mask_cvtts_roundps_epi64, __m512i, __m512i, __mmask8, __m256, 8) +test_1 (_mm512_cvtts_roundps_epu32, __m512i, __m512, 8) +test_2 (_mm512_maskz_cvtts_roundps_epu32, __m512i, 
__mmask16, __m512, 8) +test_3 (_mm512_mask_cvtts_roundps_epu32, __m512i, __m512i, __mmask16, __m512, 8) +test_1 (_mm512_cvtts_roundps_epu64, __m512i, __m256, 8) +test_2 (_mm512_maskz_cvtts_roundps_epu64, __m512i, __mmask8, __m256, 8) +test_3 (_mm512_mask_cvtts_roundps_epu64, __m512i, __m512i, __mmask8, __m256, 8) /* avx10_2satcvtintrin.h */ test_1 (_mm256_ipcvt_roundph_epi16, __m256i, __m256h, 8) @@ -1522,3 +1546,37 @@ test_3 (_mm256_mask_ipcvtt_roundph_epi16, __m256i, __m256i, __mmask16, __m256h, test_3 (_mm256_mask_ipcvtt_roundph_epu16, __m256i, __m256i, __mmask16, __m256h, 8) test_3 (_mm256_mask_ipcvtt_roundps_epi32, __m256i, __m256i, __mmask8, __m256, 8) test_3 (_mm256_mask_ipcvtt_roundps_epu32, __m256i, __m256i, __mmask8, __m256, 8) +test_1 (_mm256_cvtts_roundpd_epi32, __m128i, __m256d, 8) +test_2 (_mm256_maskz_cvtts_roundpd_epi32, __m128i, __mmask8, __m256d, 8) +test_3 (_mm256_mask_cvtts_roundpd_epi32, __m128i, __m128i, __mmask8, __m256d, 8) +test_1 (_mm256_cvtts_roundpd_epi64, __m256i, __m256d, 8) +test_2 (_mm256_maskz_cvtts_roundpd_epi64, __m256i, __mmask8, __m256d, 8) +test_3 (_mm256_mask_cvtts_roundpd_epi64, __m256i, __m256i, __mmask8, __m256d, 8) +test_1 (_mm256_cvtts_roundpd_epu32, __m128i, __m256d, 8) +test_2 (_mm256_maskz_cvtts_roundpd_epu32, __m128i, __mmask8, __m256d, 8) +test_3 (_mm256_mask_cvtts_roundpd_epu32, __m128i, __m128i, __mmask8, __m256d, 8) +test_1 (_mm256_cvtts_roundpd_epu64, __m256i, __m256d, 8) +test_2 (_mm256_maskz_cvtts_roundpd_epu64, __m256i, __mmask8, __m256d, 8) +test_3 (_mm256_mask_cvtts_roundpd_epu64, __m256i, __m256i, __mmask8, __m256d, 8) +test_1 (_mm256_cvtts_roundps_epi32, __m256i, __m256, 8) +test_2 (_mm256_maskz_cvtts_roundps_epi32, __m256i, __mmask8, __m256, 8) +test_3 (_mm256_mask_cvtts_roundps_epi32, __m256i, __m256i, __mmask8, __m256, 8) +test_1 (_mm256_cvtts_roundps_epi64, __m256i, __m128, 8) +test_2 (_mm256_maskz_cvtts_roundps_epi64, __m256i, __mmask8, __m128, 8) +test_3 (_mm256_mask_cvtts_roundps_epi64, __m256i, __m256i, __mmask8, __m128, 8) +test_1 (_mm256_cvtts_roundps_epu32, __m256i, __m256, 8) +test_2 (_mm256_maskz_cvtts_roundps_epu32, __m256i, __mmask8, __m256, 8) +test_3 (_mm256_mask_cvtts_roundps_epu32, __m256i, __m256i, __mmask8, __m256, 8) +test_1 (_mm256_cvtts_roundps_epu64, __m256i, __m128, 8) +test_2 (_mm256_maskz_cvtts_roundps_epu64, __m256i, __mmask8, __m128, 8) +test_3 (_mm256_mask_cvtts_roundps_epu64, __m256i, __m256i, __mmask8, __m128, 8) +test_1 (_mm_cvtts_roundsd_epi32, int, __m128d, 8) +test_1 (_mm_cvtts_roundsd_epu32, unsigned int, __m128d, 8) +test_1 (_mm_cvtts_roundss_epi32, int, __m128, 8) +test_1 (_mm_cvtts_roundss_epu32, unsigned int, __m128, 8) +#ifdef __x86_64__ +test_1 (_mm_cvtts_roundsd_epi64, long long, __m128d, 8) +test_1 (_mm_cvtts_roundsd_epu64, unsigned long long, __m128d, 8) +test_1 (_mm_cvtts_roundss_epi64, long long, __m128, 8) +test_1 (_mm_cvtts_roundss_epu64, unsigned long long, __m128, 8) +#endif diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c index 7e8b5d01871..f3ab4a4f34a 100644 --- a/gcc/testsuite/gcc.target/i386/sse-23.c +++ b/gcc/testsuite/gcc.target/i386/sse-23.c @@ -1026,6 +1026,14 @@ #define __builtin_ia32_cvttph2iubs512_mask_round(A, B, C, D) __builtin_ia32_cvttph2iubs512_mask_round(A, B, C, 8) #define __builtin_ia32_cvttps2ibs512_mask_round(A, B, C, D) __builtin_ia32_cvttps2ibs512_mask_round(A, B, C, 8) #define __builtin_ia32_cvttps2iubs512_mask_round(A, B, C, D) __builtin_ia32_cvttps2iubs512_mask_round(A, B, C, 8) +#define 
__builtin_ia32_cvttpd2dqs512_mask_round(A, B, C, D) __builtin_ia32_cvttpd2dqs512_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttpd2qqs512_mask_round(A, B, C, D) __builtin_ia32_cvttpd2qqs512_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttpd2udqs512_mask_round(A, B, C, D) __builtin_ia32_cvttpd2udqs512_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttpd2uqqs512_mask_round(A, B, C, D) __builtin_ia32_cvttpd2uqqs512_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttps2dqs512_mask_round(A, B, C, D) __builtin_ia32_cvttps2dqs512_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttps2qqs512_mask_round(A, B, C, D) __builtin_ia32_cvttps2qqs512_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttps2udqs512_mask_round(A, B, C, D) __builtin_ia32_cvttps2udqs512_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttps2uqqs512_mask_round(A, B, C, D) __builtin_ia32_cvttps2uqqs512_mask_round(A, B, C, 8) /* avx10_2satcvtintrin.h */ #define __builtin_ia32_cvtph2ibs256_mask_round(A, B, C, D) __builtin_ia32_cvtph2ibs256_mask_round(A, B, C, 8) @@ -1036,6 +1044,24 @@ #define __builtin_ia32_cvttph2iubs256_mask_round(A, B, C, D) __builtin_ia32_cvttph2iubs256_mask_round(A, B, C, 8) #define __builtin_ia32_cvttps2ibs256_mask_round(A, B, C, D) __builtin_ia32_cvttps2ibs256_mask_round(A, B, C, 8) #define __builtin_ia32_cvttps2iubs256_mask_round(A, B, C, D) __builtin_ia32_cvttps2iubs256_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttpd2dqs256_mask_round(A, B, C, D) __builtin_ia32_cvttpd2dqs256_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttpd2qqs256_mask_round(A, B, C, D) __builtin_ia32_cvttpd2qqs256_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttpd2udqs256_mask_round(A, B, C, D) __builtin_ia32_cvttpd2udqs256_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttpd2uqqs256_mask_round(A, B, C, D) __builtin_ia32_cvttpd2uqqs256_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttps2dqs256_mask_round(A, B, C, D) __builtin_ia32_cvttps2dqs256_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttps2qqs256_mask_round(A, B, C, D) __builtin_ia32_cvttps2qqs256_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttps2udqs256_mask_round(A, B, C, D) __builtin_ia32_cvttps2udqs256_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttps2uqqs256_mask_round(A, B, C, D) __builtin_ia32_cvttps2uqqs256_mask_round(A, B, C, 8) +#define __builtin_ia32_cvttsd2sis32_round(A, B) __builtin_ia32_cvttsd2sis32_round(A, 8) +#define __builtin_ia32_cvttsd2usis32_round(A, B) __builtin_ia32_cvttsd2usis32_round(A, 8) +#define __builtin_ia32_cvttss2sis32_round(A, B) __builtin_ia32_cvttss2sis32_round(A, 8) +#define __builtin_ia32_cvttss2usis32_round(A, B) __builtin_ia32_cvttss2usis32_round(A, 8) +#ifdef __x86_64__ +#define __builtin_ia32_cvttsd2sis64_round(A, B) __builtin_ia32_cvttsd2sis64_round(A, 8) +#define __builtin_ia32_cvttsd2usis64_round(A, B) __builtin_ia32_cvttsd2usis64_round(A, 8) +#define __builtin_ia32_cvttss2sis64_round(A, B) __builtin_ia32_cvttss2sis64_round(A, 8) +#define __builtin_ia32_cvttss2usis64_round(A, B) __builtin_ia32_cvttss2usis64_round(A, 8) +#endif #pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,sha,xsavec,xsaves,clflushopt,clwb,mwaitx,clzero,pku,sgx,rdpid,gfni,vpclmulqdq,pconfig,wbnoinvd,enqcmd,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avxifma,avxvnniint8,avxneconvert,cmpccxadd,amx-fp16,prefetchi,raoint,amx-complex,avxvnniint16,sm3,sha512,sm4,avx10.2-512") From patchwork Mon Aug 19 
09:02:48 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Haochen Jiang
X-Patchwork-Id: 1973735
From: Haochen Jiang
To: gcc-patches@gcc.gnu.org
Cc: hongtao.liu@intel.com, zewei.mo@pitt.edu, ubizjak@gmail.com, "Mo, Zewei" , Lin Hu
Subject: [PATCH 09/12] AVX10.2: Support minmax instructions
Date: Mon, 19 Aug 2024 02:02:48 -0700
Message-ID: <20240819090255.193430-1-haochen.jiang@intel.com>
X-Mailer: git-send-email 2.43.5
In-Reply-To: <20240819085717.193256-1-haochen.jiang@intel.com>
References: <20240819085717.193256-1-haochen.jiang@intel.com>
MIME-Version: 1.0

From: "Mo, Zewei"

gcc/ChangeLog:

	* config.gcc: Add avx10_2-512minmaxintrin.h and
	avx10_2minmaxintrin.h.
	* config/i386/i386-builtin-types.def: Add DEF_FUNCTION_TYPE
	(V8BF, V8BF, V8BF, INT, V8BF, UQI), (V16BF, V16BF, V16BF, INT,
	V16BF, UHI), (V32BF, V32BF, V32BF, INT, V32BF, USI), (V8HF,
	V8HF, V8HF, INT, V8HF, UQI), (V8DF, V8DF, V8DF, INT, V8DF, UQI,
	INT), (V32HF, V32HF, V32HF, INT, V32HF, USI, INT), (V16HF,
	V16HF, V16HF, INT, V16HF, UHI, INT), (V16SF, V16SF, V16SF, INT,
	V16SF, UHI, INT).
	* config/i386/i386-builtin.def (BDESC): Add new builtins.
	* config/i386/i386-expand.cc (ix86_expand_args_builtin): Handle
	V8BF_FTYPE_V8BF_V8BF_INT_V8BF_UQI,
	V16BF_FTYPE_V16BF_V16BF_INT_V16BF_UHI,
	V32BF_FTYPE_V32BF_V32BF_INT_V32BF_USI and
	V8HF_FTYPE_V8HF_V8HF_INT_V8HF_UQI.
	(ix86_expand_round_builtin): Handle
	V8DF_FTYPE_V8DF_V8DF_INT_V8DF_UQI_INT,
	V32HF_FTYPE_V32HF_V32HF_INT_V32HF_USI_INT,
	V16HF_FTYPE_V16HF_V16HF_INT_V16HF_UHI_INT and
	V16SF_FTYPE_V16SF_V16SF_INT_V16SF_UHI_INT.
	* config/i386/immintrin.h: Include avx10_2-512minmaxintrin.h
	and avx10_2minmaxintrin.h.
	* config/i386/sse.md (avx10_2_vminmaxnepbf16_<mode>): New.
	(avx10_2_minmaxp<mode>): Ditto.
	(avx10_2_minmaxs<mode>): Ditto.
	* config/i386/avx10_2-512minmaxintrin.h: New file.
	* config/i386/avx10_2minmaxintrin.h: Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx-1.c: Add macros.
	* gcc.target/i386/sse-13.c: Ditto.
	* gcc.target/i386/sse-14.c: Ditto.
	* gcc.target/i386/sse-22.c: Ditto.
	* gcc.target/i386/sse-23.c: Ditto.
	* gcc.target/i386/avx10-minmax-helper.h: New helper file.
	* gcc.target/i386/avx10_2-512-minmax-1.c: New test.
	* gcc.target/i386/avx10_2-512-vminmaxnepbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vminmaxpd-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vminmaxph-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vminmaxps-2.c: Ditto.
	* gcc.target/i386/avx10_2-minmax-1.c: Ditto.
	* gcc.target/i386/avx10_2-vminmaxnepbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-vminmaxsd-2.c: Ditto.
	* gcc.target/i386/avx10_2-vminmaxsh-2.c: Ditto.
* gcc.target/i386/avx10_2-vminmaxss-2.c: Ditto. * gcc.target/i386/avx10_2-vminmaxpd-2.c: Ditto. * gcc.target/i386/avx10_2-vminmaxph-2.c: Ditto. * gcc.target/i386/avx10_2-vminmaxps-2.c: Ditto. Co-authored-by: Lin Hu Co-authored-by: Haochen Jiang --- gcc/config.gcc | 3 +- gcc/config/i386/avx10_2-512minmaxintrin.h | 489 ++++++++ gcc/config/i386/avx10_2minmaxintrin.h | 1063 +++++++++++++++++ gcc/config/i386/i386-builtin-types.def | 8 + gcc/config/i386/i386-builtin.def | 16 +- gcc/config/i386/i386-expand.cc | 8 + gcc/config/i386/immintrin.h | 5 + gcc/config/i386/sse.md | 46 + gcc/testsuite/gcc.target/i386/avx-1.c | 19 + .../gcc.target/i386/avx10-minmax-helper.h | 257 ++++ .../gcc.target/i386/avx10_2-512-minmax-1.c | 51 + .../i386/avx10_2-512-vminmaxnepbf16-2.c | 35 + .../gcc.target/i386/avx10_2-512-vminmaxpd-2.c | 35 + .../gcc.target/i386/avx10_2-512-vminmaxph-2.c | 35 + .../gcc.target/i386/avx10_2-512-vminmaxps-2.c | 35 + .../gcc.target/i386/avx10_2-minmax-1.c | 122 ++ .../i386/avx10_2-vminmaxnepbf16-2.c | 13 + .../gcc.target/i386/avx10_2-vminmaxpd-2.c | 13 + .../gcc.target/i386/avx10_2-vminmaxph-2.c | 15 + .../gcc.target/i386/avx10_2-vminmaxps-2.c | 13 + .../gcc.target/i386/avx10_2-vminmaxsd-2.c | 34 + .../gcc.target/i386/avx10_2-vminmaxsh-2.c | 34 + .../gcc.target/i386/avx10_2-vminmaxss-2.c | 34 + .../gcc.target/i386/avx512f-helper.h | 2 + gcc/testsuite/gcc.target/i386/sse-13.c | 19 + gcc/testsuite/gcc.target/i386/sse-14.c | 67 ++ gcc/testsuite/gcc.target/i386/sse-22.c | 67 ++ gcc/testsuite/gcc.target/i386/sse-23.c | 19 + 28 files changed, 2555 insertions(+), 2 deletions(-) create mode 100644 gcc/config/i386/avx10_2-512minmaxintrin.h create mode 100644 gcc/config/i386/avx10_2minmaxintrin.h create mode 100644 gcc/testsuite/gcc.target/i386/avx10-minmax-helper.h create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-minmax-1.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vminmaxnepbf16-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vminmaxpd-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vminmaxph-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vminmaxps-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-minmax-1.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vminmaxnepbf16-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vminmaxpd-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vminmaxph-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vminmaxps-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vminmaxsd-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vminmaxsh-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vminmaxss-2.c diff --git a/gcc/config.gcc b/gcc/config.gcc index 4bcb461b68c..cd8a34b292f 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -456,7 +456,8 @@ i[34567]86-*-* | x86_64-*-*) avx10_2mediaintrin.h avx10_2-512mediaintrin.h avx10_2convertintrin.h avx10_2-512convertintrin.h avx10_2bf16intrin.h avx10_2-512bf16intrin.h - avx10_2satcvtintrin.h avx10_2-512satcvtintrin.h" + avx10_2satcvtintrin.h avx10_2-512satcvtintrin.h + avx10_2minmaxintrin.h avx10_2-512minmaxintrin.h" ;; ia64-*-*) extra_headers=ia64intrin.h diff --git a/gcc/config/i386/avx10_2-512minmaxintrin.h b/gcc/config/i386/avx10_2-512minmaxintrin.h new file mode 100644 index 00000000000..95e9bee0079 --- /dev/null +++ b/gcc/config/i386/avx10_2-512minmaxintrin.h @@ -0,0 +1,489 @@ +/* Copyright (C) 2024 Free Software Foundation, Inc. + This file is part of GCC. 
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#if !defined _IMMINTRIN_H_INCLUDED
+#error "Never use <avx10_2-512minmaxintrin.h> directly; include <immintrin.h> instead."
+#endif
+
+#ifndef _AVX10_2_512MINMAXINTRIN_H_INCLUDED
+#define _AVX10_2_512MINMAXINTRIN_H_INCLUDED
+
+#if !defined (__AVX10_2_512__)
+#pragma GCC push_options
+#pragma GCC target("avx10.2-512")
+#define __DISABLE_AVX10_2_512__
+#endif /* __AVX10_2_512__ */
+
+#ifdef __OPTIMIZE__
+extern __inline __m512bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_minmax_nepbh (__m512bh __A, __m512bh __B, const int __C)
+{
+  return (__m512bh) __builtin_ia32_minmaxnepbf16512_mask ((__v32bf) __A,
+							  (__v32bf) __B,
+							  __C,
+							  (__v32bf)(__m512bh)
+							  _mm512_setzero_si512 (),
+							  (__mmask32) -1);
+}
+
+extern __inline __m512bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_minmax_nepbh (__m512bh __W, __mmask32 __U,
+			  __m512bh __A, __m512bh __B, const int __C)
+{
+  return (__m512bh) __builtin_ia32_minmaxnepbf16512_mask ((__v32bf) __A,
+							  (__v32bf) __B,
+							  __C,
+							  (__v32bf) __W,
+							  (__mmask32) __U);
+}
+
+extern __inline __m512bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_minmax_nepbh (__mmask32 __U, __m512bh __A,
+			   __m512bh __B, const int __C)
+{
+  return (__m512bh) __builtin_ia32_minmaxnepbf16512_mask ((__v32bf) __A,
+							  (__v32bf) __B,
+							  __C,
+							  (__v32bf)(__m512bh)
+							  _mm512_setzero_si512 (),
+							  (__mmask32) __U);
+}
+
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_minmax_pd (__m512d __A, __m512d __B, const int __C)
+{
+  return (__m512d) __builtin_ia32_minmaxpd512_mask_round ((__v8df) __A,
+							  (__v8df) __B,
+							  __C,
+							  (__v8df)
+							  _mm512_undefined_pd (),
+							  (__mmask8) -1,
+							  _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_minmax_pd (__m512d __W, __mmask8 __U, __m512d __A,
+		       __m512d __B, const int __C)
+{
+  return (__m512d) __builtin_ia32_minmaxpd512_mask_round ((__v8df) __A,
+							  (__v8df) __B,
+							  __C,
+							  (__v8df) __W,
+							  (__mmask8) __U,
+							  _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_minmax_pd (__mmask8 __U, __m512d __A, __m512d __B,
+			const int __C)
+{
+  return (__m512d) __builtin_ia32_minmaxpd512_mask_round ((__v8df) __A,
+							  (__v8df) __B,
+							  __C,
+							  (__v8df)
+							  _mm512_setzero_pd (),
+							  (__mmask8) __U,
+							  _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_minmax_round_pd (__m512d __A, __m512d __B, const int __C,
+			const int __R)
+{
+  return (__m512d) __builtin_ia32_minmaxpd512_mask_round
((__v8df) __A, + (__v8df) __B, + __C, + (__v8df) + _mm512_undefined_pd (), + (__mmask8) -1, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_minmax_round_pd (__m512d __W, __mmask8 __U, __m512d __A, + __m512d __B, const int __C, const int __R) +{ + return (__m512d) __builtin_ia32_minmaxpd512_mask_round ((__v8df) __A, + (__v8df) __B, + __C, + (__v8df) __W, + (__mmask8) __U, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_minmax_round_pd (__mmask8 __U, __m512d __A, __m512d __B, + const int __C, const int __R) +{ + return (__m512d) __builtin_ia32_minmaxpd512_mask_round ((__v8df) __A, + (__v8df) __B, + __C, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) __U, __R); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_minmax_ph (__m512h __A, __m512h __B, const int __C) +{ + return (__m512h) __builtin_ia32_minmaxph512_mask_round ((__v32hf) __A, + (__v32hf) __B, + __C, + (__v32hf) + _mm512_undefined_ph (), + (__mmask32) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_minmax_ph (__m512h __W, __mmask32 __U, __m512h __A, + __m512h __B, const int __C) +{ + return (__m512h) __builtin_ia32_minmaxph512_mask_round ((__v32hf) __A, + (__v32hf) __B, + __C, + (__v32hf) __W, + (__mmask32) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_minmax_ph (__mmask32 __U, __m512h __A, __m512h __B, + const int __C) +{ + return (__m512h) __builtin_ia32_minmaxph512_mask_round ((__v32hf) __A, + (__v32hf) __B, + __C, + (__v32hf) + _mm512_setzero_ph (), + (__mmask32) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_minmax_round_ph (__m512h __A, __m512h __B, const int __C, const int __R) +{ + return (__m512h) __builtin_ia32_minmaxph512_mask_round ((__v32hf) __A, + (__v32hf) __B, + __C, + (__v32hf) + _mm512_undefined_ph (), + (__mmask32) -1, __R); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_minmax_round_ph (__m512h __W, __mmask32 __U, __m512h __A, + __m512h __B, const int __C, const int __R) +{ + return (__m512h) __builtin_ia32_minmaxph512_mask_round ((__v32hf) __A, + (__v32hf) __B, + __C, + (__v32hf) __W, + (__mmask32) __U, __R); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_minmax_round_ph (__mmask32 __U, __m512h __A, __m512h __B, + const int __C, const int __R) +{ + return (__m512h) __builtin_ia32_minmaxph512_mask_round ((__v32hf) __A, + (__v32hf) __B, + __C, + (__v32hf) + _mm512_setzero_ph (), + (__mmask32) __U, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_minmax_ps (__m512 __A, __m512 __B, const int __C) +{ + return (__m512) __builtin_ia32_minmaxps512_mask_round ((__v16sf) __A, + (__v16sf) __B, + __C, + (__v16sf) + _mm512_undefined_ps (), + (__mmask16) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_minmax_ps (__m512 __W, __mmask16 __U, __m512 __A, + __m512 __B, const int __C) +{ + return (__m512) __builtin_ia32_minmaxps512_mask_round ((__v16sf) __A, + 
(__v16sf) __B, + __C, + (__v16sf) __W, + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_minmax_ps (__mmask16 __U, __m512 __A, __m512 __B, + const int __C) +{ + return (__m512) __builtin_ia32_minmaxps512_mask_round ((__v16sf) __A, + (__v16sf) __B, + __C, + (__v16sf) + _mm512_setzero_ps (), + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_minmax_round_ps (__m512 __A, __m512 __B, const int __C, const int __R) +{ + return (__m512) __builtin_ia32_minmaxps512_mask_round ((__v16sf) __A, + (__v16sf) __B, + __C, + (__v16sf) + _mm512_undefined_ps (), + (__mmask16) -1, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_minmax_round_ps (__m512 __W, __mmask16 __U, __m512 __A, + __m512 __B, const int __C, const int __R) +{ + return (__m512) __builtin_ia32_minmaxps512_mask_round ((__v16sf) __A, + (__v16sf) __B, + __C, + (__v16sf) __W, + (__mmask16) __U, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_minmax_round_ps (__mmask16 __U, __m512 __A, __m512 __B, + const int __C, const int __R) +{ + return (__m512) __builtin_ia32_minmaxps512_mask_round ((__v16sf) __A, + (__v16sf) __B, + __C, + (__v16sf) + _mm512_setzero_ps (), + (__mmask16) __U, __R); +} + +#else +#define _mm512_minmax_nepbh(A, B, C) \ + ((__m512bh) __builtin_ia32_minmaxnepbf16512_mask ((__v32bf) (A), \ + (__v32bf) (B), \ + (int) (C), \ + (__v32bf) (__m512bh) \ + _mm512_setzero_si512 (), \ + (__mmask32) (-1))) + +#define _mm512_mask_minmax_nepbh(W, U, A, B, C) \ + ((__m512bh) __builtin_ia32_minmaxnepbf16512_mask ((__v32bf) (A), \ + (__v32bf) (B), \ + (int) (C), \ + (__v32bf) (__m512bh) (W), \ + (__mmask32) (U))) + +#define _mm512_maskz_minmax_nepbh(U, A, B, C) \ + ((__m512bh) __builtin_ia32_minmaxnepbf16512_mask ((__v32bf) (A), \ + (__v32bf) (B), \ + (int) (C), \ + (__v32bf) (__m512bh) \ + _mm512_setzero_si512 (), \ + (__mmask32) (U))) + +#define _mm512_minmax_round_pd(A, B, C, R) \ + ((__m512d) __builtin_ia32_minmaxpd512_mask_round ((__v8df) (A), \ + (__v8df) (B), \ + (int) (C), \ + (__v8df) (__m512d) \ + _mm512_undefined_pd (), \ + (__mmask8) (-1), \ + (int) (R))) + +#define _mm512_mask_minmax_round_pd(W, U, A, B, C, R) \ + ((__m512d) __builtin_ia32_minmaxpd512_mask_round ((__v8df) (A), \ + (__v8df) (B), \ + (int) (C), \ + (__v8df) (__m512d) (W), \ + (__mmask8) (U), \ + (int) (R))) + +#define _mm512_maskz_minmax_round_pd(U, A, B, C, R) \ + ((__m512d) __builtin_ia32_minmaxpd512_mask_round ((__v8df) (A), \ + (__v8df) (B), \ + (int) (C), \ + (__v8df) (__m512d) \ + _mm512_setzero_pd (), \ + (__mmask8) (U), \ + (int) (R))) + +#define _mm512_minmax_round_ph(A, B, C, R) \ + ((__m512h) __builtin_ia32_minmaxph512_mask_round ((__v32hf) (A), \ + (__v32hf) (B), \ + (int) (C), \ + (__v32hf) (__m512h) \ + _mm512_undefined_ph (), \ + (__mmask32) (-1), \ + (int) (R))) + +#define _mm512_mask_minmax_round_ph(W, U, A, B, C, R) \ + ((__m512h) __builtin_ia32_minmaxph512_mask_round ((__v32hf) (A), \ + (__v32hf) (B), \ + (int) (C), \ + (__v32hf) (__m512h) (W), \ + (__mmask32) (U), \ + (int) (R))) + +#define _mm512_maskz_minmax_round_ph(U, A, B, C, R) \ + ((__m512h) __builtin_ia32_minmaxph512_mask_round ((__v32hf) (A), \ + (__v32hf) (B), \ + (int) (C), \ + (__v32hf) (__m512h) \ + _mm512_setzero_ph (), \ + (__mmask32) (U), 
\ + (int) (R))) + +#define _mm512_minmax_round_ps(A, B, C, R) \ + ((__m512) __builtin_ia32_minmaxps512_mask_round ((__v16sf) (A), \ + (__v16sf) (B), \ + (int) (C), \ + (__v16sf) (__m512) \ + _mm512_undefined_ps (), \ + (__mmask16) (-1), \ + (int) (R))) + +#define _mm512_mask_minmax_round_ps(W, U, A, B, C, R) \ + ((__m512) __builtin_ia32_minmaxps512_mask_round ((__v16sf) (A), \ + (__v16sf) (B), \ + (int) (C), \ + (__v16sf) (__m512) (W), \ + (__mmask16) (U), \ + (int) (R))) + +#define _mm512_maskz_minmax_round_ps(U, A, B, C, R) \ + ((__m512) __builtin_ia32_minmaxps512_mask_round ((__v16sf) (A), \ + (__v16sf) (B), \ + (int) (C), \ + (__v16sf) (__m512) \ + _mm512_setzero_ps (), \ + (__mmask16) (U), \ + (int) (R))) + +#define _mm512_minmax_pd(A, B, C) \ + ((__m512d) __builtin_ia32_minmaxpd512_mask_round ((__v8df) (A), \ + (__v8df) (B), \ + (int) (C), \ + (__v8df) (__m512d) \ + _mm512_undefined_pd (), \ + (__mmask8) (-1), \ + _MM_FROUND_CUR_DIRECTION)) + +#define _mm512_mask_minmax_pd(W, U, A, B, C) \ + ((__m512d) __builtin_ia32_minmaxpd512_mask_round ((__v8df) (A), \ + (__v8df) (B), \ + (int) (C), \ + (__v8df) (__m512d) (W), \ + (__mmask8) (U), \ + _MM_FROUND_CUR_DIRECTION)) + +#define _mm512_maskz_minmax_pd(U, A, B, C) \ + ((__m512d) __builtin_ia32_minmaxpd512_mask_round ((__v8df) (A), \ + (__v8df) (B), \ + (int) (C), \ + (__v8df) (__m512d) \ + _mm512_setzero_pd (), \ + (__mmask8) (U), \ + _MM_FROUND_CUR_DIRECTION)) + +#define _mm512_minmax_ph(A, B, C) \ + ((__m512h) __builtin_ia32_minmaxph512_mask_round ((__v32hf) (A), \ + (__v32hf) (B), \ + (int) (C), \ + (__v32hf) (__m512h) \ + _mm512_undefined_ph (), \ + (__mmask32) (-1), \ + _MM_FROUND_CUR_DIRECTION)) + +#define _mm512_mask_minmax_ph(W, U, A, B, C) \ + ((__m512h) __builtin_ia32_minmaxph512_mask_round ((__v32hf) (A), \ + (__v32hf) (B), \ + (int) (C), \ + (__v32hf) (__m512h) (W), \ + (__mmask32) (U), \ + _MM_FROUND_CUR_DIRECTION)) + +#define _mm512_maskz_minmax_ph(U, A, B, C) \ + ((__m512h) __builtin_ia32_minmaxph512_mask_round ((__v32hf) (A), \ + (__v32hf) (B), \ + (int) (C), \ + (__v32hf) (__m512h) \ + _mm512_setzero_ph (), \ + (__mmask32) (U), \ + _MM_FROUND_CUR_DIRECTION)) + +#define _mm512_minmax_ps(A, B, C) \ + ((__m512) __builtin_ia32_minmaxps512_mask_round ((__v16sf) (A), \ + (__v16sf) (B), \ + (int) (C), \ + (__v16sf) (__m512) \ + _mm512_undefined_ps (), \ + (__mmask16) (-1), \ + _MM_FROUND_CUR_DIRECTION)) + +#define _mm512_mask_minmax_ps(W, U, A, B, C) \ + ((__m512) __builtin_ia32_minmaxps512_mask_round ((__v16sf) (A), \ + (__v16sf) (B), \ + (int) (C), \ + (__v16sf) (__m512) (W), \ + (__mmask16) (U), \ + _MM_FROUND_CUR_DIRECTION)) + +#define _mm512_maskz_minmax_ps(U, A, B, C) \ + ((__m512) __builtin_ia32_minmaxps512_mask_round ((__v16sf) (A), \ + (__v16sf) (B), \ + (int) (C), \ + (__v16sf) (__m512) \ + _mm512_setzero_ps (), \ + (__mmask16) (U), \ + _MM_FROUND_CUR_DIRECTION)) + +#endif + +#ifdef __DISABLE_AVX10_2_512__ +#undef __DISABLE_AVX10_2_512__ +#pragma GCC pop_options +#endif /* __DISABLE_AVX10_2_512__ */ + +#endif /* _AVX10_2_512MINMAXINTRIN_H_INCLUDED */ diff --git a/gcc/config/i386/avx10_2minmaxintrin.h b/gcc/config/i386/avx10_2minmaxintrin.h new file mode 100644 index 00000000000..a4dad80a89c --- /dev/null +++ b/gcc/config/i386/avx10_2minmaxintrin.h @@ -0,0 +1,1063 @@ +/* Copyright (C) 2024 Free Software Foundation, Inc. + This file is part of GCC. 
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#if !defined _IMMINTRIN_H_INCLUDED
+#error "Never use <avx10_2minmaxintrin.h> directly; include <immintrin.h> instead."
+#endif
+
+#ifndef _AVX10_2MINMAXINTRIN_H_INCLUDED
+#define _AVX10_2MINMAXINTRIN_H_INCLUDED
+
+#if !defined(__AVX10_2_256__)
+#pragma GCC push_options
+#pragma GCC target("avx10.2")
+#define __DISABLE_AVX10_2_256__
+#endif /* __AVX10_2_256__ */
+
+#ifdef __OPTIMIZE__
+extern __inline __m128bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_minmax_nepbh (__m128bh __A, __m128bh __B, const int __C)
+{
+  return (__m128bh) __builtin_ia32_minmaxnepbf16128_mask ((__v8bf) __A,
+							  (__v8bf) __B,
+							  __C,
+							  (__v8bf)(__m128bh)
+							  _mm_setzero_si128 (),
+							  (__mmask8) -1);
+}
+
+extern __inline __m128bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_minmax_nepbh (__m128bh __W, __mmask8 __U, __m128bh __A,
+		       __m128bh __B, const int __C)
+{
+  return (__m128bh) __builtin_ia32_minmaxnepbf16128_mask ((__v8bf) __A,
+							  (__v8bf) __B,
+							  __C,
+							  (__v8bf) __W,
+							  (__mmask8) __U);
+}
+
+extern __inline __m128bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_minmax_nepbh (__mmask8 __U, __m128bh __A, __m128bh __B, const int __C)
+{
+  return (__m128bh) __builtin_ia32_minmaxnepbf16128_mask ((__v8bf) __A,
+							  (__v8bf) __B,
+							  __C,
+							  (__v8bf)(__m128bh)
+							  _mm_setzero_si128 (),
+							  (__mmask8) __U);
+}
+
+extern __inline __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_minmax_nepbh (__m256bh __A, __m256bh __B, const int __C)
+{
+  return (__m256bh) __builtin_ia32_minmaxnepbf16256_mask ((__v16bf) __A,
+							  (__v16bf) __B,
+							  __C,
+							  (__v16bf)(__m256bh)
+							  _mm256_setzero_si256 (),
+							  (__mmask16) -1);
+}
+
+extern __inline __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_minmax_nepbh (__m256bh __W, __mmask16 __U, __m256bh __A, __m256bh __B,
+			  const int __C)
+{
+  return (__m256bh) __builtin_ia32_minmaxnepbf16256_mask ((__v16bf) __A,
+							  (__v16bf) __B,
+							  __C,
+							  (__v16bf) __W,
+							  (__mmask16) __U);
+}
+
+extern __inline __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_minmax_nepbh (__mmask16 __U, __m256bh __A, __m256bh __B, const int __C)
+{
+  return (__m256bh) __builtin_ia32_minmaxnepbf16256_mask ((__v16bf) __A,
+							  (__v16bf) __B,
+							  __C,
+							  (__v16bf)(__m256bh)
+							  _mm256_setzero_si256 (),
+							  (__mmask16) __U);
+}
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_minmax_pd (__m128d __A, __m128d __B, const int __C)
+{
+  return (__m128d) __builtin_ia32_minmaxpd128_mask ((__v2df) __A,
+						    (__v2df) __B,
+						    __C,
+						    (__v2df)(__m128d)
+						    _mm_undefined_pd (),
+						    (__mmask8) -1);
+}
+
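/* Editorial sketch, not part of the patch: a minimal use of the
   intrinsic above, assuming a compiler invoked with -mavx10.2.  The
   imm8 operand selects which variant of the AVX10.2 MINMAX operation
   (IEEE 754-2019 minimum/maximum family) is computed; the value 0
   below is purely illustrative.

     __m128d a = _mm_set_pd (3.0, -1.0);
     __m128d b = _mm_set_pd (2.0, -4.0);
     __m128d r = _mm_minmax_pd (a, b, 0);

   Because imm8 must be a compile-time constant, the #else branch later
   in this header supplies macro forms for non-optimized builds.  */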
+extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_minmax_pd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B, + const int __C) +{ + return (__m128d) __builtin_ia32_minmaxpd128_mask ((__v2df) __A, + (__v2df) __B, + __C, + (__v2df) __W, + (__mmask8) __U); +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_minmax_pd (__mmask8 __U, __m128d __A, __m128d __B, const int __C) +{ + return (__m128d) __builtin_ia32_minmaxpd128_mask ((__v2df) __A, + (__v2df) __B, + __C, + (__v2df)(__m128d) + _mm_setzero_pd (), + (__mmask8) __U); +} + +extern __inline __m256d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_minmax_pd (__m256d __A, __m256d __B, const int __C) +{ + return (__m256d) __builtin_ia32_minmaxpd256_mask_round ( + (__v4df) __A, (__v4df) __B, __C, + (__v4df) (__m256d) _mm256_undefined_pd (), + (__mmask8) -1, _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m256d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_minmax_pd (__m256d __W, __mmask8 __U, __m256d __A, __m256d __B, + const int __C) +{ + return (__m256d) __builtin_ia32_minmaxpd256_mask_round ( + (__v4df) __A, (__v4df) __B, __C, (__v4df) __W, + (__mmask8) __U, _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m256d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_minmax_pd (__mmask8 __U, __m256d __A, __m256d __B, const int __C) +{ + return (__m256d) __builtin_ia32_minmaxpd256_mask_round ( + (__v4df) __A, (__v4df) __B, __C, + (__v4df) (__m256d) _mm256_setzero_pd (), + (__mmask8) __U, _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m256d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_minmax_round_pd (__m256d __A, __m256d __B, const int __C, const int __R) +{ + return (__m256d) __builtin_ia32_minmaxpd256_mask_round ( + (__v4df) __A, (__v4df) __B, __C, + (__v4df) (__m256d) _mm256_undefined_pd (), + (__mmask8) -1, __R); +} + +extern __inline __m256d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_minmax_round_pd (__m256d __W, __mmask8 __U, __m256d __A, + __m256d __B, const int __C, const int __R) +{ + return (__m256d) __builtin_ia32_minmaxpd256_mask_round ( + (__v4df) __A, (__v4df) __B, __C, (__v4df) __W, + (__mmask8) __U, __R); +} + +extern __inline __m256d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_minmax_round_pd (__mmask8 __U, __m256d __A, __m256d __B, + const int __C, const int __R) +{ + return (__m256d) __builtin_ia32_minmaxpd256_mask_round ( + (__v4df) __A, (__v4df) __B, __C, + (__v4df) (__m256d) _mm256_setzero_pd (), + (__mmask8) __U, __R); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_minmax_ph (__m128h __A, __m128h __B, const int __C) +{ + return (__m128h) __builtin_ia32_minmaxph128_mask ((__v8hf) __A, + (__v8hf) __B, + __C, + (__v8hf)(__m128h) + _mm_undefined_ph (), + (__mmask8) -1); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_minmax_ph (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B, + const int __C) +{ + return (__m128h) __builtin_ia32_minmaxph128_mask ((__v8hf) __A, + (__v8hf) __B, + __C, + (__v8hf) __W, + (__mmask8) __U); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_minmax_ph (__mmask8 __U, __m128h __A, __m128h __B, const int 
__C) +{ + return (__m128h) __builtin_ia32_minmaxph128_mask ((__v8hf) __A, + (__v8hf) __B, + __C, + (__v8hf)(__m128h) + _mm_setzero_ph (), + (__mmask8) __U); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_minmax_ph (__m256h __A, __m256h __B, const int __C) +{ + return (__m256h) __builtin_ia32_minmaxph256_mask_round ( + (__v16hf) __A, (__v16hf) __B, __C, + (__v16hf) (__m256h) _mm256_undefined_ph (), + (__mmask16) -1, _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_minmax_ph (__m256h __W, __mmask16 __U, __m256h __A, __m256h __B, + const int __C) +{ + return (__m256h) __builtin_ia32_minmaxph256_mask_round ( + (__v16hf) __A, (__v16hf) __B, __C, (__v16hf) __W, + (__mmask16) __U, _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_minmax_ph (__mmask16 __U, __m256h __A, __m256h __B, const int __C) +{ + return (__m256h) __builtin_ia32_minmaxph256_mask_round ( + (__v16hf) __A, (__v16hf) __B, __C, + (__v16hf) (__m256h) _mm256_setzero_ph (), + (__mmask16) __U, _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_minmax_round_ph (__m256h __A, __m256h __B, const int __C, const int __R) +{ + return (__m256h) __builtin_ia32_minmaxph256_mask_round ( + (__v16hf) __A, (__v16hf) __B, __C, + (__v16hf) (__m256h) _mm256_undefined_ph (), + (__mmask16) -1, __R); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_minmax_round_ph (__m256h __W, __mmask16 __U, __m256h __A, + __m256h __B, const int __C, const int __R) +{ + return (__m256h) __builtin_ia32_minmaxph256_mask_round ( + (__v16hf) __A, (__v16hf) __B, __C, (__v16hf) __W, + (__mmask16) __U, __R); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_minmax_round_ph (__mmask16 __U, __m256h __A, __m256h __B, + const int __C, const int __R) +{ + return (__m256h) __builtin_ia32_minmaxph256_mask_round ( + (__v16hf) __A, (__v16hf) __B, __C, + (__v16hf) (__m256h) _mm256_setzero_ph (), + (__mmask16) __U, __R); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_minmax_ps (__m128 __A, __m128 __B, const int __C) +{ + return (__m128) __builtin_ia32_minmaxps128_mask ((__v4sf) __A, + (__v4sf) __B, + __C, + (__v4sf)(__m128) + _mm_undefined_ps (), + (__mmask8) -1); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_minmax_ps (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B, + const int __C) +{ + return (__m128) __builtin_ia32_minmaxps128_mask ((__v4sf) __A, + (__v4sf) __B, + __C, + (__v4sf) __W, + (__mmask8) __U); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_minmax_ps (__mmask8 __U, __m128 __A, __m128 __B, const int __C) +{ + return (__m128) __builtin_ia32_minmaxps128_mask ((__v4sf) __A, + (__v4sf) __B, + __C, + (__v4sf)(__m128) + _mm_setzero_ps (), + (__mmask8) __U); +} + +extern __inline __m256 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_minmax_ps (__m256 __A, __m256 __B, const int __C) +{ + return (__m256) __builtin_ia32_minmaxps256_mask_round ( + (__v8sf) __A, (__v8sf) __B, __C, + (__v8sf) (__m256) _mm256_undefined_ps (), + (__mmask8) -1, 
_MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m256 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_minmax_ps (__m256 __W, __mmask8 __U, __m256 __A, __m256 __B, + const int __C) +{ + return (__m256) __builtin_ia32_minmaxps256_mask_round ( + (__v8sf) __A, (__v8sf) __B, __C, (__v8sf) __W, + (__mmask8) __U, _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m256 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_minmax_ps (__mmask8 __U, __m256 __A, __m256 __B, const int __C) +{ + return (__m256) __builtin_ia32_minmaxps256_mask_round ( + (__v8sf) __A, (__v8sf) __B, __C, + (__v8sf) (__m256) _mm256_setzero_ps (), + (__mmask8) __U, _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m256 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_minmax_round_ps (__m256 __A, __m256 __B, const int __C, const int __R) +{ + return (__m256) __builtin_ia32_minmaxps256_mask_round ( + (__v8sf) __A, (__v8sf) __B, __C, + (__v8sf) (__m256) _mm256_undefined_ps (), + (__mmask8) -1, __R); +} + +extern __inline __m256 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_minmax_round_ps (__m256 __W, __mmask8 __U, __m256 __A, __m256 __B, + const int __C, const int __R) +{ + return (__m256) __builtin_ia32_minmaxps256_mask_round ( + (__v8sf) __A, (__v8sf) __B, __C, (__v8sf) __W, + (__mmask8) __U, __R); +} + +extern __inline __m256 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_minmax_round_ps (__mmask8 __U, __m256 __A, __m256 __B, + const int __C, const int __R) +{ + return (__m256) __builtin_ia32_minmaxps256_mask_round ( + (__v8sf) __A, (__v8sf) __B, __C, + (__v8sf) (__m256) _mm256_setzero_ps (), + (__mmask8) __U, __R); +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_minmax_sd (__m128d __A, __m128d __B, const int __C) +{ + return (__m128d) __builtin_ia32_minmaxsd_mask_round ((__v2df) __A, + (__v2df) __B, + __C, + (__v2df) + _mm_undefined_pd (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_minmax_sd (__m128d __W, __mmask8 __U, __m128d __A, + __m128d __B, const int __C) +{ + return (__m128d) __builtin_ia32_minmaxsd_mask_round ((__v2df) __A, + (__v2df) __B, + __C, + (__v2df) __W, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_minmax_sd (__mmask8 __U, __m128d __A, __m128d __B, + const int __C) +{ + return (__m128d) __builtin_ia32_minmaxsd_mask_round ((__v2df) __A, + (__v2df) __B, + __C, + (__v2df) + _mm_setzero_pd (), + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_minmax_round_sd (__m128d __A, __m128d __B, const int __C, const int __R) +{ + return (__m128d) __builtin_ia32_minmaxsd_mask_round ((__v2df) __A, + (__v2df) __B, + __C, + (__v2df) + _mm_undefined_pd (), + (__mmask8) -1, __R); +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_minmax_round_sd (__m128d __W, __mmask8 __U, __m128d __A, + __m128d __B, const int __C, const int __R) +{ + return (__m128d) __builtin_ia32_minmaxsd_mask_round ((__v2df) __A, + (__v2df) __B, + __C, + (__v2df) __W, + (__mmask8) __U, __R); +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, 
__always_inline__, __artificial__)) +_mm_maskz_minmax_round_sd (__mmask8 __U, __m128d __A, __m128d __B, + const int __C, const int __R) +{ + return (__m128d) __builtin_ia32_minmaxsd_mask_round ((__v2df) __A, + (__v2df) __B, + __C, + (__v2df) + _mm_setzero_pd (), + (__mmask8) __U, __R); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_minmax_sh (__m128h __A, __m128h __B, const int __C) +{ + return (__m128h) __builtin_ia32_minmaxsh_mask_round ((__v8hf) __A, + (__v8hf) __B, + __C, + (__v8hf) + _mm_undefined_ph (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_minmax_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B, + const int __C) +{ + return (__m128h) __builtin_ia32_minmaxsh_mask_round ((__v8hf) __A, + (__v8hf) __B, + __C, + (__v8hf) __W, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_minmax_sh (__mmask8 __U, __m128h __A, __m128h __B, + const int __C) +{ + return (__m128h) __builtin_ia32_minmaxsh_mask_round ((__v8hf) __A, + (__v8hf) __B, + __C, + (__v8hf) + _mm_setzero_ph (), + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_minmax_round_sh (__m128h __A, __m128h __B, const int __C, const int __R) +{ + return (__m128h) __builtin_ia32_minmaxsh_mask_round ((__v8hf) __A, + (__v8hf) __B, + __C, + (__v8hf) + _mm_undefined_ph (), + (__mmask8) -1, __R); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_minmax_round_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B, + const int __C, const int __R) +{ + return (__m128h) __builtin_ia32_minmaxsh_mask_round ((__v8hf) __A, + (__v8hf) __B, + __C, + (__v8hf) __W, + (__mmask8) __U, __R); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_minmax_round_sh (__mmask8 __U, __m128h __A, __m128h __B, + const int __C, const int __R) +{ + return (__m128h) __builtin_ia32_minmaxsh_mask_round ((__v8hf) __A, + (__v8hf) __B, + __C, + (__v8hf) + _mm_setzero_ph (), + (__mmask8) __U, __R); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_minmax_ss (__m128 __A, __m128 __B, const int __C) +{ + return (__m128) __builtin_ia32_minmaxss_mask_round ((__v4sf) __A, + (__v4sf) __B, + __C, + (__v4sf) + _mm_undefined_ps (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_minmax_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B, + const int __C) +{ + return (__m128) __builtin_ia32_minmaxss_mask_round ((__v4sf) __A, + (__v4sf) __B, + __C, + (__v4sf) __W, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_minmax_ss (__mmask8 __U, __m128 __A, __m128 __B, + const int __C) +{ + return (__m128) __builtin_ia32_minmaxss_mask_round ((__v4sf) __A, + (__v4sf) __B, + __C, + (__v4sf) + _mm_setzero_ps (), + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_minmax_round_ss (__m128 __A, __m128 __B, const int __C, const int __R) +{ + return 
(__m128) __builtin_ia32_minmaxss_mask_round ((__v4sf) __A, + (__v4sf) __B, + __C, + (__v4sf) + _mm_undefined_ps (), + (__mmask8) -1, __R); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_minmax_round_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B, + const int __C, const int __R) +{ + return (__m128) __builtin_ia32_minmaxss_mask_round ((__v4sf) __A, + (__v4sf) __B, + __C, + (__v4sf) __W, + (__mmask8) __U, __R); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_minmax_round_ss (__mmask8 __U, __m128 __A, __m128 __B, + const int __C, const int __R) +{ + return (__m128) __builtin_ia32_minmaxss_mask_round ((__v4sf) __A, + (__v4sf) __B, + __C, + (__v4sf) + _mm_setzero_ps (), + (__mmask8) __U, __R); +} + +#else +#define _mm_minmax_nepbh(A, B, C) \ + ((__m128bh) __builtin_ia32_minmaxnepbf16128_mask ((__v8bf) (A), \ + (__v8bf) (B), \ + (int) (C), \ + (__v8bf) (__m128bh) \ + _mm_setzero_si128 (), \ + (__mmask8) (-1))) + +#define _mm_mask_minmax_nepbh(W, U, A, B, C) \ + ((__m128bh) __builtin_ia32_minmaxnepbf16128_mask ((__v8bf) (A), \ + (__v8bf) (B), \ + (int) (C), \ + (__v8bf) (__m128bh) (W), \ + (__mmask8) (U))) + +#define _mm_maskz_minmax_nepbh(U, A, B, C) \ + ((__m128bh) __builtin_ia32_minmaxnepbf16128_mask ((__v8bf) (A), \ + (__v8bf) (B), \ + (int) (C), \ + (__v8bf) (__m128bh) \ + _mm_setzero_si128 (), \ + (__mmask8) (U))) + +#define _mm256_minmax_nepbh(A, B, C) \ + ((__m256bh) __builtin_ia32_minmaxnepbf16256_mask ((__v16bf) (A), \ + (__v16bf) (B), \ + (int) (C), \ + (__v16bf) (__m256bh) \ + _mm256_setzero_si256 (), \ + (__mmask16) (-1))) + +#define _mm256_mask_minmax_nepbh(W, U, A, B, C) \ + ((__m256bh) __builtin_ia32_minmaxnepbf16256_mask ((__v16bf) (A), \ + (__v16bf) (B), \ + (int) (C), \ + (__v16bf) (__m256bh) (W), \ + (__mmask16) (U))) + +#define _mm256_maskz_minmax_nepbh(U, A, B, C) \ + ((__m256bh) __builtin_ia32_minmaxnepbf16256_mask ((__v16bf) (A), \ + (__v16bf) (B), \ + (int) (C), \ + (__v16bf) (__m256bh) \ + _mm256_setzero_si256 (), \ + (__mmask16) (U))) + +#define _mm_minmax_pd(A, B, C) \ + ((__m128d) __builtin_ia32_minmaxpd128_mask ((__v2df) (A), \ + (__v2df) (B), \ + (int) (C), \ + (__v2df) (__m128d) \ + _mm_undefined_pd (), \ + (__mmask8) (-1))) + +#define _mm_mask_minmax_pd(W, U, A, B, C) \ + ((__m128d) __builtin_ia32_minmaxpd128_mask ((__v2df) (A), \ + (__v2df) (B), \ + (int) (C), \ + (__v2df) (__m128d) (W), \ + (__mmask8) (U))) + +#define _mm_maskz_minmax_pd(U, A, B, C) \ + ((__m128d) __builtin_ia32_minmaxpd128_mask ((__v2df) (A), \ + (__v2df) (B), \ + (int) (C), \ + (__v2df) (__m128d) \ + _mm_setzero_pd (), \ + (__mmask8) (U))) + +#define _mm256_minmax_pd(A, B, C) \ + ((__m256d) __builtin_ia32_minmaxpd256_mask_round ((__v4df) (A), \ + (__v4df) (B), \ + (int) (C), \ + (__v4df) (__m256d) \ + _mm256_undefined_pd (), \ + (__mmask8) (-1), \ + _MM_FROUND_CUR_DIRECTION)) + +#define _mm256_mask_minmax_pd(W, U, A, B, C) \ + ((__m256d) __builtin_ia32_minmaxpd256_mask_round ((__v4df) (A), \ + (__v4df) (B), \ + (int) (C), \ + (__v4df) (__m256d) (W), \ + (__mmask8) (U), \ + _MM_FROUND_CUR_DIRECTION)) + +#define _mm256_maskz_minmax_pd(U, A, B, C) \ + ((__m256d) __builtin_ia32_minmaxpd256_mask_round ((__v4df) (A), \ + (__v4df) (B), \ + (int) (C), \ + (__v4df) (__m256d) \ + _mm256_setzero_pd (), \ + (__mmask8) (U), \ + _MM_FROUND_CUR_DIRECTION)) + +#define _mm256_minmax_round_pd(A, B, C, R) \ + ((__m256d) __builtin_ia32_minmaxpd256_mask_round ((__v4df) (A), \ + (__v4df) 
(B), \ + (int) (C), \ + (__v4df) (__m256d) \ + _mm256_undefined_pd (), \ + (__mmask8) (-1), \ + (int) (R))) + +#define _mm256_mask_minmax_round_pd(W, U, A, B, C, R) \ + ((__m256d) __builtin_ia32_minmaxpd256_mask_round ((__v4df) (A), \ + (__v4df) (B), \ + (int) (C), \ + (__v4df) (__m256d) (W), \ + (__mmask8) (U), \ + (int) (R))) + +#define _mm256_maskz_minmax_round_pd(U, A, B, C, R) \ + ((__m256d) __builtin_ia32_minmaxpd256_mask_round ((__v4df) (A), \ + (__v4df) (B), \ + (int) (C), \ + (__v4df) (__m256d) \ + _mm256_setzero_pd (), \ + (__mmask8) (U), \ + (int) (R))) + +#define _mm_minmax_ph(A, B, C) \ + ((__m128h) __builtin_ia32_minmaxph128_mask ((__v8hf) (A), \ + (__v8hf) (B), \ + (int) (C), \ + (__v8hf) (__m128h) \ + _mm_undefined_ph (), \ + (__mmask8) (-1))) + +#define _mm_mask_minmax_ph(W, U, A, B, C) \ + ((__m128h) __builtin_ia32_minmaxph128_mask ((__v8hf) (A), \ + (__v8hf) (B), \ + (int) (C), \ + (__v8hf) (__m128h) (W), \ + (__mmask8) (U))) + +#define _mm_maskz_minmax_ph(U, A, B, C) \ + ((__m128h) __builtin_ia32_minmaxph128_mask ((__v8hf) (A), \ + (__v8hf) (B), \ + (int) (C), \ + (__v8hf) (__m128h) \ + _mm_setzero_ph (), \ + (__mmask8) (U))) + +#define _mm256_minmax_ph(A, B, C) \ + ((__m256h) __builtin_ia32_minmaxph256_mask_round ((__v16hf) (A), \ + (__v16hf) (B), \ + (int) (C), \ + (__v16hf) (__m256h) \ + _mm256_undefined_ph (), \ + (__mmask16) (-1), \ + _MM_FROUND_CUR_DIRECTION)) + +#define _mm256_mask_minmax_ph(W, U, A, B, C) \ + ((__m256h) __builtin_ia32_minmaxph256_mask_round ((__v16hf) (A), \ + (__v16hf) (B), \ + (int) (C), \ + (__v16hf) (__m256h) (W), \ + (__mmask16) (U), \ + _MM_FROUND_CUR_DIRECTION)) + +#define _mm256_maskz_minmax_ph(U, A, B, C) \ + ((__m256h) __builtin_ia32_minmaxph256_mask_round ((__v16hf) (A), \ + (__v16hf) (B), \ + (int) (C), \ + (__v16hf) (__m256h) \ + _mm256_setzero_ph (), \ + (__mmask16) (U), \ + _MM_FROUND_CUR_DIRECTION)) + +#define _mm256_minmax_round_ph(A, B, C, R) \ + ((__m256h) __builtin_ia32_minmaxph256_mask_round ((__v16hf) (A), \ + (__v16hf) (B), \ + (int) (C), \ + (__v16hf) (__m256h) \ + _mm256_undefined_ph (), \ + (__mmask16) (-1), \ + (int) (R))) + +#define _mm256_mask_minmax_round_ph(W, U, A, B, C, R) \ + ((__m256h) __builtin_ia32_minmaxph256_mask_round ((__v16hf) (A), \ + (__v16hf) (B), \ + (int) (C), \ + (__v16hf) (__m256h) (W), \ + (__mmask16) (U), \ + (int) (R))) + +#define _mm256_maskz_minmax_round_ph(U, A, B, C, R) \ + ((__m256h) __builtin_ia32_minmaxph256_mask_round ((__v16hf) (A), \ + (__v16hf) (B), \ + (int) (C), \ + (__v16hf) (__m256h) \ + _mm256_setzero_ph (), \ + (__mmask16) (U), \ + (int) (R))) + +#define _mm_minmax_ps(A, B, C) \ + ((__m128) __builtin_ia32_minmaxps128_mask ((__v4sf) (A), \ + (__v4sf) (B), \ + (int) (C), \ + (__v4sf) (__m128) \ + _mm_undefined_ps (), \ + (__mmask8) (-1))) + +#define _mm_mask_minmax_ps(W, U, A, B, C) \ + ((__m128) __builtin_ia32_minmaxps128_mask ((__v4sf) (A), \ + (__v4sf) (B), \ + (int) (C), \ + (__v4sf) (__m128) (W), \ + (__mmask8) (U))) + +#define _mm_maskz_minmax_ps(U, A, B, C) \ + ((__m128) __builtin_ia32_minmaxps128_mask ((__v4sf) (A), \ + (__v4sf) (B), \ + (int) (C), \ + (__v4sf) (__m128) \ + _mm_setzero_ps (), \ + (__mmask8) (U))) + +#define _mm256_minmax_ps(A, B, C) \ + ((__m256) __builtin_ia32_minmaxps256_mask_round ((__v8sf) (A), \ + (__v8sf) (B), \ + (int) (C), \ + (__v8sf) (__m256) \ + _mm256_undefined_ps (), \ + (__mmask8) (-1), \ + _MM_FROUND_CUR_DIRECTION)) + +#define _mm256_mask_minmax_ps(W, U, A, B, C) \ + ((__m256) __builtin_ia32_minmaxps256_mask_round ((__v8sf) (A), \ + 
(__v8sf) (B), \ + (int) (C), \ + (__v8sf) (__m256) (W), \ + (__mmask8) (U), \ + _MM_FROUND_CUR_DIRECTION)) + +#define _mm256_maskz_minmax_ps(U, A, B, C) \ + ((__m256) __builtin_ia32_minmaxps256_mask_round ((__v8sf) (A), \ + (__v8sf) (B), \ + (int) (C), \ + (__v8sf) (__m256) \ + _mm256_setzero_ps (), \ + (__mmask8) (U), \ + _MM_FROUND_CUR_DIRECTION)) + +#define _mm256_minmax_round_ps(A, B, C, R) \ + ((__m256) __builtin_ia32_minmaxps256_mask_round ((__v8sf) (A), \ + (__v8sf) (B), \ + (int) (C), \ + (__v8sf) (__m256) \ + _mm256_undefined_ps (), \ + (__mmask8) (-1), \ + (int) (R))) + +#define _mm256_mask_minmax_round_ps(W, U, A, B, C, R) \ + ((__m256) __builtin_ia32_minmaxps256_mask_round ((__v8sf) (A), \ + (__v8sf) (B), \ + (int) (C), \ + (__v8sf) (__m256) (W), \ + (__mmask8) (U), \ + (int) (R))) + +#define _mm256_maskz_minmax_round_ps(U, A, B, C, R) \ + ((__m256) __builtin_ia32_minmaxps256_mask_round ((__v8sf) (A), \ + (__v8sf) (B), \ + (int) (C), \ + (__v8sf) (__m256) \ + _mm256_setzero_ps (), \ + (__mmask8) (U), \ + (int) (R))) + +#define _mm_minmax_round_sd(A, B, C, R) \ + ((__m128d) __builtin_ia32_minmaxsd_mask_round ((__v2df) (A), \ + (__v2df) (B), \ + (int) (C), \ + (__v2df) (__m128d) \ + _mm_undefined_pd (), \ + (__mmask8) (-1), \ + (int) (R))) + +#define _mm_mask_minmax_round_sd(W, U, A, B, C, R) \ + ((__m128d) __builtin_ia32_minmaxsd_mask_round ((__v2df) (A), \ + (__v2df) (B), \ + (int) (C), \ + (__v2df) (__m128d) (W), \ + (__mmask8) (U), \ + (int) (R))) + +#define _mm_maskz_minmax_round_sd(U, A, B, C, R) \ + ((__m128d) __builtin_ia32_minmaxsd_mask_round ((__v2df) (A), \ + (__v2df)(B), \ + (int) (C), \ + (__v2df) (__m128d) \ + _mm_setzero_pd (), \ + (__mmask8) (U), \ + (int) (R))) + +#define _mm_minmax_round_sh(A, B, C, R) \ + ((__m128h) __builtin_ia32_minmaxsh_mask_round ((__v8hf) (A), \ + (__v8hf) (B), \ + (int) (C), \ + (__v8hf) (__m128h) \ + _mm_undefined_ph (), \ + (__mmask8) (-1), \ + (int) (R))) + +#define _mm_mask_minmax_round_sh(W, U, A, B, C, R) \ + ((__m128h) __builtin_ia32_minmaxsh_mask_round ((__v8hf) (A), \ + (__v8hf) (B), \ + (int) (C), \ + (__v8hf) (__m128h) (W), \ + (__mmask8) (U), \ + (int) (R))) + +#define _mm_maskz_minmax_round_sh(U, A, B, C, R) \ + ((__m128h) __builtin_ia32_minmaxsh_mask_round ((__v8hf) (A), \ + (__v8hf) (B), \ + (int) (C), \ + (__v8hf) (__m128h) \ + _mm_setzero_ph (), \ + (__mmask8) (U), \ + (int) (R))) + +#define _mm_minmax_round_ss(A, B, C, R) \ + ((__m128) __builtin_ia32_minmaxss_mask_round ((__v4sf) (A), \ + (__v4sf) (B), \ + (int) (C), \ + (__v4sf) (__m128) \ + _mm_undefined_ps (), \ + (__mmask8) (-1), \ + (int) (R))) + +#define _mm_mask_minmax_round_ss(W, U, A, B, C, R) \ + ((__m128) __builtin_ia32_minmaxss_mask_round ((__v4sf) (A), \ + (__v4sf) (B), \ + (int) (C), \ + (__v4sf) (__m128) (W), \ + (__mmask8) (U), \ + (int) (R))) + +#define _mm_maskz_minmax_round_ss(U, A, B, C, R) \ + ((__m128) __builtin_ia32_minmaxss_mask_round ((__v4sf) (A), \ + (__v4sf) (B), \ + (int) (C), \ + (__v4sf)(__m128) \ + _mm_setzero_ps (), \ + (__mmask8) (U), \ + (int) (R))) + +#define _mm_minmax_sd(A, B, C) \ + ((__m128d) __builtin_ia32_minmaxsd_mask_round ((__v2df) (A), \ + (__v2df) (B), \ + (int) (C), \ + (__v2df) (__m128d) \ + _mm_undefined_pd (), \ + (__mmask8) (-1), \ + _MM_FROUND_CUR_DIRECTION)) + +#define _mm_mask_minmax_sd(W, U, A, B, C) \ + ((__m128d) __builtin_ia32_minmaxsd_mask_round ((__v2df) (A), \ + (__v2df) (B), \ + (int) (C), \ + (__v2df) (__m128d) (W), \ + (__mmask8) (U), \ + _MM_FROUND_CUR_DIRECTION)) + +#define _mm_maskz_minmax_sd(U, A, B, C) 
\ + ((__m128d) __builtin_ia32_minmaxsd_mask_round ((__v2df) (A), \ + (__v2df) (B), \ + (int) (C), \ + (__v2df) (__m128d) \ + _mm_setzero_pd (), \ + (__mmask8) (U), \ + _MM_FROUND_CUR_DIRECTION)) + +#define _mm_minmax_sh(A, B, C) \ + ((__m128h) __builtin_ia32_minmaxsh_mask_round ((__v8hf) (A), \ + (__v8hf) (B), \ + (int) (C), \ + (__v8hf) (__m128h) \ + _mm_undefined_ph (), \ + (__mmask8) (-1), \ + _MM_FROUND_CUR_DIRECTION)) + +#define _mm_mask_minmax_sh(W, U, A, B, C) \ + ((__m128h) __builtin_ia32_minmaxsh_mask_round ((__v8hf) (A), \ + (__v8hf) (B), \ + (int) (C), \ + (__v8hf) (__m128h) (W), \ + (__mmask8) (U), \ + _MM_FROUND_CUR_DIRECTION)) + +#define _mm_maskz_minmax_sh(U, A, B, C) \ + ((__m128h) __builtin_ia32_minmaxsh_mask_round ((__v8hf) (A), \ + (__v8hf) (B), \ + (int) (C), \ + (__v8hf) (__m128h) \ + _mm_setzero_ph (), \ + (__mmask8) (U), \ + _MM_FROUND_CUR_DIRECTION)) + +#define _mm_minmax_ss(A, B, C) \ + ((__m128) __builtin_ia32_minmaxss_mask_round ((__v4sf) (A), \ + (__v4sf) (B), \ + (int) (C), \ + (__v4sf) (__m128) \ + _mm_undefined_ps (), \ + (__mmask8) (-1), \ + _MM_FROUND_CUR_DIRECTION)) + +#define _mm_mask_minmax_ss(W, U, A, B, C) \ + ((__m128) __builtin_ia32_minmaxss_mask_round ((__v4sf) (A), \ + (__v4sf) (B), \ + (int) (C), \ + (__v4sf) (__m128) (W), \ + (__mmask8) (U), \ + _MM_FROUND_CUR_DIRECTION)) + +#define _mm_maskz_minmax_ss(U, A, B, C) \ + ((__m128) __builtin_ia32_minmaxss_mask_round ((__v4sf) (A), \ + (__v4sf) (B), \ + (int) (C), \ + (__v4sf) (__m128) \ + _mm_setzero_ps (), \ + (__mmask8) (U), \ + _MM_FROUND_CUR_DIRECTION)) + +#endif + +#ifdef __DISABLE_AVX10_2_256__ +#undef __DISABLE_AVX10_2_256__ +#pragma GCC pop_options +#endif /* __DISABLE_AVX10_2_256__ */ + +#endif /* _AVX10_2MINMAXINTRIN_H_INCLUDED */ diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def index b2978591287..290f6e649a9 100644 --- a/gcc/config/i386/i386-builtin-types.def +++ b/gcc/config/i386/i386-builtin-types.def @@ -1499,3 +1499,11 @@ DEF_FUNCTION_TYPE (V32HI, V32BF, V32HI, USI) DEF_FUNCTION_TYPE (V16SI, V16SF, V16SI, UHI, INT) DEF_FUNCTION_TYPE (V16HI, V16BF, V16HI, UHI, INT) DEF_FUNCTION_TYPE (V32HI, V32BF, V32HI, USI, INT) +DEF_FUNCTION_TYPE (V8BF, V8BF, V8BF, INT, V8BF, UQI) +DEF_FUNCTION_TYPE (V16BF, V16BF, V16BF, INT, V16BF, UHI) +DEF_FUNCTION_TYPE (V32BF, V32BF, V32BF, INT, V32BF, USI) +DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, INT, V8HF, UQI) +DEF_FUNCTION_TYPE (V8DF, V8DF, V8DF, INT, V8DF, UQI, INT) +DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, INT, V32HF, USI, INT) +DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF, INT, V16HF, UHI, INT) +DEF_FUNCTION_TYPE (V16SF, V16SF, V16SF, INT, V16SF, UHI, INT) diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def index d39274bc323..151ccf4f252 100644 --- a/gcc/config/i386/i386-builtin.def +++ b/gcc/config/i386/i386-builtin.def @@ -3298,6 +3298,12 @@ BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_vcvttps2dqsv4sf_mask, " BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_vcvttps2qqsv2di_mask, "__builtin_ia32_cvttps2qqs128_mask", IX86_BUILTIN_VCVTTPS2QQS128_MASK, UNKNOWN, (int) V2DI_FTYPE_V4SF_V2DI_UQI) BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_vcvttps2udqsv4sf_mask, "__builtin_ia32_cvttps2udqs128_mask", IX86_BUILTIN_VCVTTPS2UDQS128_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SF_V4SI_UQI) BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_vcvttps2uqqsv2di_mask, "__builtin_ia32_cvttps2uqqs128_mask", IX86_BUILTIN_VCVTTPS2UQQS128_MASK, UNKNOWN, (int) V2DI_FTYPE_V4SF_V2DI_UQI) 
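+/* AVX10.2 MINMAX descriptors: each BDESC below wires a __builtin_ia32_minmax*
+   builtin to its insn pattern and to one of the V*_FTYPE_* signatures added
+   in i386-builtin-types.def above -- two vector sources, an imm8 selector,
+   a merge source and a write mask; the *_mask_round variants carry a
+   trailing rounding-control int as well.  */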
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_minmaxnepbf16_v8bf_mask, "__builtin_ia32_minmaxnepbf16128_mask", IX86_BUILTIN_MINMAXNEPBF16128_MASK, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF_INT_V8BF_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_minmaxnepbf16_v16bf_mask, "__builtin_ia32_minmaxnepbf16256_mask", IX86_BUILTIN_MINMAXNEPBF16256_MASK, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF_INT_V16BF_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_minmaxnepbf16_v32bf_mask, "__builtin_ia32_minmaxnepbf16512_mask", IX86_BUILTIN_MINMAXNEPBF16512_MASK, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF_INT_V32BF_USI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_minmaxpv2df_mask, "__builtin_ia32_minmaxpd128_mask", IX86_BUILTIN_MINMAXPD128_MASK, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT_V2DF_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_minmaxpv8hf_mask, "__builtin_ia32_minmaxph128_mask", IX86_BUILTIN_MINMAXPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_INT_V8HF_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_minmaxpv4sf_mask, "__builtin_ia32_minmaxps128_mask", IX86_BUILTIN_MINMAXPS128_MASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT_V4SF_UQI) /* Builtins with rounding support. */ BDESC_END (ARGS, ROUND_ARGS) @@ -3774,7 +3780,6 @@ BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_cvttps2ibsv8sf_mask_rou BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_cvttps2ibsv16sf_mask_round, "__builtin_ia32_cvttps2ibs512_mask_round", IX86_BUILTIN_CVTTPS2IBS512_MASK_ROUND, UNKNOWN, (int) V16SI_FTYPE_V16SF_V16SI_UHI_INT) BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_cvttps2iubsv8sf_mask_round, "__builtin_ia32_cvttps2iubs256_mask_round", IX86_BUILTIN_CVTTPS2IUBS256_MASK_ROUND, UNKNOWN, (int) V8SI_FTYPE_V8SF_V8SI_UQI_INT) BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_cvttps2iubsv16sf_mask_round, "__builtin_ia32_cvttps2iubs512_mask_round", IX86_BUILTIN_CVTTPS2IUBS512_MASK_ROUND, UNKNOWN, (int) V16SI_FTYPE_V16SF_V16SI_UHI_INT) - BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_vcvttpd2dqsv4df_mask_round, "__builtin_ia32_cvttpd2dqs256_mask_round", IX86_BUILTIN_VCVTTPD2DQS256_MASK_ROUND, UNKNOWN, (int) V4SI_FTYPE_V4DF_V4SI_UQI_INT) BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_vcvttpd2dqsv8df_mask_round, "__builtin_ia32_cvttpd2dqs512_mask_round", IX86_BUILTIN_VCVTTPD2DQS512_MASK_ROUND, UNKNOWN, (int) V8SI_FTYPE_V8DF_V8SI_QI_INT) BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_vcvttpd2qqsv4df_mask_round, "__builtin_ia32_cvttpd2qqs256_mask_round", IX86_BUILTIN_VCVTTPD2QQS256_MASK_ROUND, UNKNOWN, (int) V4DI_FTYPE_V4DF_V4DI_UQI_INT) @@ -3799,6 +3804,15 @@ BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_vcvttss2sissi_round, "_ BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_vcvttss2sisdi_round, "__builtin_ia32_cvttss2sis64_round", IX86_BUILTIN_VCVTTSS2SIS64_ROUND, UNKNOWN, (int) INT64_FTYPE_V4SF_INT) BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_vcvttss2usissi_round, "__builtin_ia32_cvttss2usis32_round", IX86_BUILTIN_VCVTTSS2USIS32_ROUND, UNKNOWN, (int) INT_FTYPE_V4SF_INT) BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_vcvttss2usisdi_round, "__builtin_ia32_cvttss2usis64_round", IX86_BUILTIN_VCVTTSS2USIS64_ROUND, UNKNOWN, (int) INT64_FTYPE_V4SF_INT) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_minmaxpv8df_mask_round, "__builtin_ia32_minmaxpd512_mask_round", IX86_BUILTIN_MINMAXPD512_MASK_ROUND, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_INT_V8DF_UQI_INT) 
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_minmaxpv32hf_mask_round, "__builtin_ia32_minmaxph512_mask_round", IX86_BUILTIN_MINMAXPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_INT_V32HF_USI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_minmaxpv16sf_mask_round, "__builtin_ia32_minmaxps512_mask_round", IX86_BUILTIN_MINMAXPS512_MASK_ROUND, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_INT_V16SF_UHI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_minmaxpv4df_mask_round, "__builtin_ia32_minmaxpd256_mask_round", IX86_BUILTIN_MINMAXPD256_MASK_ROUND, UNKNOWN, (int) V4DF_FTYPE_V4DF_V4DF_INT_V4DF_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_minmaxpv16hf_mask_round, "__builtin_ia32_minmaxph256_mask_round", IX86_BUILTIN_MINMAXPH256_MASK_ROUND, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_INT_V16HF_UHI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_minmaxpv8sf_mask_round, "__builtin_ia32_minmaxps256_mask_round", IX86_BUILTIN_MINMAXPS256_MASK_ROUND, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF_INT_V8SF_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_minmaxsv2df_mask_round, "__builtin_ia32_minmaxsd_mask_round", IX86_BUILTIN_MINMAXSD_MASK_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT_V2DF_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_minmaxsv8hf_mask_round, "__builtin_ia32_minmaxsh_mask_round", IX86_BUILTIN_MINMAXSH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_INT_V8HF_UQI_INT) +BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_minmaxsv4sf_mask_round, "__builtin_ia32_minmaxss_mask_round", IX86_BUILTIN_MINMAXSS_MASK_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT_V4SF_UQI_INT) BDESC_END (ROUND_ARGS, MULTI_ARG) diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc index 9d522818ef5..0322ef003d1 100644 --- a/gcc/config/i386/i386-expand.cc +++ b/gcc/config/i386/i386-expand.cc @@ -11954,6 +11954,10 @@ ix86_expand_args_builtin (const struct builtin_description *d, case V8HI_FTYPE_V8HI_V8HI_INT_V8HI_INT: case V4SI_FTYPE_V4SI_V4SI_INT_V4SI_INT: case V2DI_FTYPE_V2DI_V2DI_INT_V2DI_INT: + case V8BF_FTYPE_V8BF_V8BF_INT_V8BF_UQI: + case V16BF_FTYPE_V16BF_V16BF_INT_V16BF_UHI: + case V32BF_FTYPE_V32BF_V32BF_INT_V32BF_USI: + case V8HF_FTYPE_V8HF_V8HF_INT_V8HF_UQI: nargs = 5; mask_pos = 1; nargs_constant = 2; @@ -12604,6 +12608,10 @@ ix86_expand_round_builtin (const struct builtin_description *d, case V2DF_FTYPE_V2DF_V2DF_INT_V2DF_UQI_INT: case V4SF_FTYPE_V4SF_V4SF_INT_V4SF_UQI_INT: case V8HF_FTYPE_V8HF_V8HF_INT_V8HF_UQI_INT: + case V8DF_FTYPE_V8DF_V8DF_INT_V8DF_UQI_INT: + case V32HF_FTYPE_V32HF_V32HF_INT_V32HF_USI_INT: + case V16HF_FTYPE_V16HF_V16HF_INT_V16HF_UHI_INT: + case V16SF_FTYPE_V16SF_V16SF_INT_V16SF_UHI_INT: nargs = 6; nargs_constant = 4; break; diff --git a/gcc/config/i386/immintrin.h b/gcc/config/i386/immintrin.h index c8e37507088..0d5af155c36 100644 --- a/gcc/config/i386/immintrin.h +++ b/gcc/config/i386/immintrin.h @@ -155,4 +155,9 @@ #include #include + +#include <avx10_2minmaxintrin.h> + +#include <avx10_2-512minmaxintrin.h> + #endif /* _IMMINTRIN_H_INCLUDED */ diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 7c40079047a..956cdba55d3 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -249,6 +249,8 @@ UNSPEC_VCVTTPS2IUBS UNSPEC_SFIX_SATURATION UNSPEC_UFIX_SATURATION + UNSPEC_MINMAXNEPBF16 + UNSPEC_MINMAX ]) (define_c_enum "unspecv" [ @@ -501,6 +503,11 @@ (V16SF "TARGET_EVEX512") (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL") (V8DF "TARGET_EVEX512") (V4DF "TARGET_AVX512VL") (V2DF
"TARGET_AVX512VL")]) +(define_mode_iterator VFH_AVX10_2 + [(V32HF "TARGET_AVX10_2_512") V16HF V8HF + (V16SF "TARGET_AVX10_2_512") V8SF V4SF + (V8DF "TARGET_AVX10_2_512") V4DF V2DF]) + (define_mode_iterator VF2_AVX512VL [(V8DF "TARGET_EVEX512") (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")]) @@ -32388,3 +32395,42 @@ [(set_attr "type" "ssecvt") (set_attr "prefix" "evex") (set_attr "mode" "")]) + +(define_insn "avx10_2_minmaxnepbf16_" + [(set (match_operand:VBF_AVX10_2 0 "register_operand" "=v") + (unspec:VBF_AVX10_2 + [(match_operand:VBF_AVX10_2 1 "register_operand" "v") + (match_operand:VBF_AVX10_2 2 "bcst_vector_operand" "vmBr") + (match_operand:SI 3 "const_0_to_255_operand")] + UNSPEC_MINMAXNEPBF16))] + "TARGET_AVX10_2_256" + "vminmaxnepbf16\t{%3, %2, %1, %0|%0, %1, %2, %3}" + [(set_attr "prefix" "evex") + (set_attr "mode" "")]) + +(define_insn "avx10_2_minmaxp" + [(set (match_operand:VFH_AVX10_2 0 "register_operand" "=v") + (unspec:VFH_AVX10_2 + [(match_operand:VFH_AVX10_2 1 "register_operand" "v") + (match_operand:VFH_AVX10_2 2 "" "") + (match_operand:SI 3 "const_0_to_255_operand")] + UNSPEC_MINMAX))] + "TARGET_AVX10_2_256" + "vminmax\t{%3, %2, %1, %0|%0, %1, %2, %3}" + [(set_attr "prefix" "evex") + (set_attr "mode" "")]) + +(define_insn "avx10_2_minmaxs" + [(set (match_operand:VFH_128 0 "register_operand" "=v") + (vec_merge:VFH_128 + (unspec:VFH_128 + [(match_operand:VFH_128 1 "register_operand" "v") + (match_operand:VFH_128 2 "" "") + (match_operand:SI 3 "const_0_to_255_operand")] + UNSPEC_MINMAX) + (match_dup 1) + (const_int 1)))] + "TARGET_AVX10_2_256" + "vminmax\t{%3, %2, %1, %0|%0, %1, %2, %3}" + [(set_attr "prefix" "evex") + (set_attr "mode" "")]) diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c index 30c071adf13..b954374fe5f 100644 --- a/gcc/testsuite/gcc.target/i386/avx-1.c +++ b/gcc/testsuite/gcc.target/i386/avx-1.c @@ -1081,6 +1081,25 @@ #define __builtin_ia32_cvttss2usis64_round(A, B) __builtin_ia32_cvttss2usis64_round(A, 8) #endif +/* avx10_2-512minmaxintrin.h */ +#define __builtin_ia32_minmaxpd512_mask_round(A, B, C, D, E, F) __builtin_ia32_minmaxpd512_mask_round (A, B, 4, D, E, 4) +#define __builtin_ia32_minmaxph512_mask_round(A, B, C, D, E, F) __builtin_ia32_minmaxph512_mask_round (A, B, 4, D, E, 4) +#define __builtin_ia32_minmaxps512_mask_round(A, B, C, D, E, F) __builtin_ia32_minmaxps512_mask_round (A, B, 4, D, E, 4) +#define __builtin_ia32_minmaxnepbf16512_mask(A, B, C, W, U) __builtin_ia32_minmaxnepbf16512_mask (A, B, 4, W, U) + +/* avx10_2minmaxintrin.h */ +#define __builtin_ia32_minmaxsd_mask_round(A, B, C, D, E, F) __builtin_ia32_minmaxsd_mask_round (A, B, 4, D, E, 4) +#define __builtin_ia32_minmaxsh_mask_round(A, B, C, D, E, F) __builtin_ia32_minmaxsh_mask_round (A, B, 4, D, E, 4) +#define __builtin_ia32_minmaxss_mask_round(A, B, C, D, E, F) __builtin_ia32_minmaxss_mask_round (A, B, 4, D, E, 4) +#define __builtin_ia32_minmaxnepbf16128_mask(A, B, C, D, E) __builtin_ia32_minmaxnepbf16128_mask (A, B, 4, D, E) +#define __builtin_ia32_minmaxnepbf16256_mask(A, B, C, D, E) __builtin_ia32_minmaxnepbf16256_mask (A, B, 4, D, E) +#define __builtin_ia32_minmaxpd128_mask(A, B, C, D, E) __builtin_ia32_minmaxpd128_mask (A, B, 4, D, E) +#define __builtin_ia32_minmaxpd256_mask_round(A, B, C, D, E, F) __builtin_ia32_minmaxpd256_mask_round (A, B, 4, D, E, 4) +#define __builtin_ia32_minmaxph128_mask(A, B, C, D, E) __builtin_ia32_minmaxph128_mask (A, B, 4, D, E) +#define __builtin_ia32_minmaxph256_mask_round(A, B, C, D, E, F) 
__builtin_ia32_minmaxph256_mask_round (A, B, 4, D, E, 4) +#define __builtin_ia32_minmaxps128_mask(A, B, C, D, E) __builtin_ia32_minmaxps128_mask (A, B, 4, D, E) +#define __builtin_ia32_minmaxps256_mask_round(A, B, C, D, E, F) __builtin_ia32_minmaxps256_mask_round (A, B, 4, D, E, 4) + #include #include #include diff --git a/gcc/testsuite/gcc.target/i386/avx10-minmax-helper.h b/gcc/testsuite/gcc.target/i386/avx10-minmax-helper.h new file mode 100644 index 00000000000..e799975fe67 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10-minmax-helper.h @@ -0,0 +1,257 @@ +#ifndef AVX10MINMAX_HELPERFUNC_INCLUDED +#define AVX10MINMAX_HELPERFUNC_INCLUDED + +#include +#include +#include +#include +#include "avx512f-helper.h" +#define SNAN_float __builtin_nansf ("") +#define SNAN_flag_float 0x7fa00000 +#define QNAN_float __builtin_nanf ("") +#define QNAN_flag_float 0x7fc00000 +#define SNAN_double ((double)__builtin_nans ("")) +#define SNAN_flag_double 0x7ff4000000000000 +#define QNAN_double ((double)__builtin_nan ("")) +#define QNAN_flag_double 0x7ff8000000000000 +#define SNAN__Float16 ((_Float16)__builtin_nansf16 ("")) +#define SNAN_flag__Float16 0x7d00 +#define QNAN__Float16 ((_Float16)__builtin_nanf16 ("")) +#define QNAN_flag__Float16 0x7e00 +#define SNAN___bf16 ((__bf16)__builtin_nansf16b ("")) +#define SNAN_flag___bf16 0x7fa0 +#define QNAN___bf16 ((__bf16)__builtin_nanf ("")) +#define QNAN_flag___bf16 0x7fc0 +#define ISNAN(x) (x != x) +#define ABS_float(x) fabsf (x) +#define ABS_double(x) fabs (x) +#define ABS__Float16(x) __builtin_fabsf16 (x) +#define ABS___bf16(x) __builtin_fabsf (x) + +#define Union_Data(typef, typei) \ +typedef union \ +{ \ + typef f; \ + typei i; \ +} union_##typef; + +Union_Data(float, int) +Union_Data(double, long long) +Union_Data(__bf16, short) +Union_Data(_Float16, short) + +#define IS_SNAN(union_x, type) ((union_x.i & SNAN_flag_##type) == union_snan.i) + +#define IS_QNAN(union_x, type) ((union_x.i & QNAN_flag_##type) == union_qnan.i) + +#define CHECK_EXP_MINMAX(UNION_TYPE, VALUE_TYPE, INT_TYPE) \ +static int \ +__attribute__((noinline, unused)) \ +check_minmax_##UNION_TYPE (UNION_TYPE u, const VALUE_TYPE *v) \ +{ \ + int i; \ + int err = 0; \ + for (i = 0; i < ARRAY_SIZE (u.a); i++) \ + { \ + union_##VALUE_TYPE union_x, union_y; \ + union_x.f = u.a[i]; \ + union_y.f = v[i]; \ + if (union_x.i != union_y.i) \ + { \ + err++; \ + PRINTF ("%i: " "%f" " != " "%f" "\n", \ + i, v[i], u.a[i]); \ + } \ + } \ + return err; \ +} + +#if defined (AVX10_512BIT) +CHECK_EXP_MINMAX (union512, float, int) +CHECK_EXP_MINMAX (union512d, double, long int) +CHECK_EXP_MINMAX (union512bf16_bf, __bf16, short int) +CHECK_EXP_MINMAX (union512h, _Float16, short int) +#endif +CHECK_EXP_MINMAX (union256, float, int) +CHECK_EXP_MINMAX (union256d, double, long int) +CHECK_EXP_MINMAX (union128, float, int) +CHECK_EXP_MINMAX (union128d, double, long int) +CHECK_EXP_MINMAX (union256bf16_bf, __bf16, short int) +CHECK_EXP_MINMAX (union128bf16_bf, __bf16, short int) +CHECK_EXP_MINMAX (union256h, _Float16, short int) +CHECK_EXP_MINMAX (union128h, _Float16, short int) + +#define UNION_CHECK_MINMAX(SIZE, NAME) EVAL(check_minmax_union, SIZE, NAME) + +#define CMP(res, x, y, type, value, op1, np, op2, zero, num, mag) \ +{ \ + union_##type union_a, union_b; \ + union_a.f = x; \ + union_b.f = y; \ + union_##type union_snan, union_qnan; \ + union_snan.f = SNAN_##type; \ + union_qnan.f = QNAN_##type; \ + bool flag = false; \ + if(num) \ + { \ + if(ISNAN(x) && ISNAN(y)) \ + { \ + if(IS_SNAN(union_a,type) || 
(IS_QNAN(union_a,type) && IS_QNAN(union_b,type))) \ + { \ + union_a.i |= value; \ + res = union_a.f; \ + flag = true; \ + } \ + else \ + { \ + union_b.i |= value; \ + res = union_b.f; \ + flag = true; \ + } \ + } \ + else if(ISNAN(x)) \ + { \ + res = y; \ + flag = true; \ + } \ + else if(ISNAN(y)) \ + { \ + res = x; \ + flag = true; \ + } \ + } \ + else \ + { \ + if(IS_SNAN(union_a,type) || (IS_QNAN(union_a,type) && !IS_SNAN(union_b,type))) \ + { \ + union_a.i |= value; \ + res = union_a.f; \ + flag = true; \ + } \ + else if(ISNAN(y)) \ + { \ + union_b.i |= value; \ + res = union_b.f; \ + flag = true; \ + } \ + } \ + if(!flag) \ + { \ + if(!mag) \ + { \ + if((x == zero && y == - zero) || (x == - zero && y == zero)) \ + res = np zero; \ + else if(x op1 y) \ + res = x; \ + else \ + res = y; \ + } \ + else \ + { \ + if(ABS_##type(x) op2 ABS_##type(y)) \ + res = x; \ + else if(ABS_##type(y) op2 ABS_##type(x)) \ + res = y; \ + else \ + { \ + if((x == zero && y == - zero) || (x == - zero && y == zero)) \ + res = np zero; \ + else if(x op1 y) \ + res = x; \ + else \ + res = y; \ + } \ + } \ + } \ +} + +#define MINMAX(type, value, zero) \ +type \ +minmax_##type (type * a, type * b, int imm) \ +{ \ + int op_select = imm & 0x03; \ + int sign_control = (imm & 0x0C) >> 2; \ + int nan_prop_select = (imm & 0x10) >> 4; \ + type tmp; \ + if(nan_prop_select == 0) \ + if(op_select == 0) \ + CMP(tmp, *a, *b, type, value, <=, -, <, zero, false, false) \ + else if(op_select == 1) \ + CMP(tmp, *a, *b, type, value, >=, +, >, zero, false, false) \ + else if(op_select == 2) \ + CMP(tmp, *a, *b, type, value, <=, -, <, zero, false, true) \ + else \ + CMP(tmp, *a, *b, type, value, >=, +, >, zero, false, true) \ + else \ + if(op_select == 0) \ + CMP(tmp, *a, *b, type, value, <=, -, <, zero, true, false) \ + else if(op_select == 1) \ + CMP(tmp, *a, *b, type, value, >=, +, >, zero, true, false) \ + else if(op_select == 2) \ + CMP(tmp, *a, *b, type, value, <=, -, <, zero, true, true) \ + else \ + CMP(tmp, *a, *b, type, value, >=, +, >, zero, true, true) \ + if(!ISNAN(tmp)) \ + if(sign_control == 0 && !ISNAN(*a)) \ + if((tmp < 0 && *a > 0) || (tmp > 0 && *a < 0)) \ + tmp = -tmp; \ + else if(sign_control == 2) \ + if(tmp < 0) tmp = -tmp; \ + else if(sign_control == 3) \ + if(tmp > 0) tmp = -tmp; \ + return tmp; \ +} + + +MINMAX(double, 0x7ff8000000000000, 0.0) +MINMAX(float, 0x7fc00000, 0.0f) +MINMAX(_Float16, 0x7e00, 0.0f16) +MINMAX(__bf16, 0x7fc0, 0.0bf16) + +#define UNIT_TEST(R, InsnSuffix, MaskType, type) \ + sign = -1; \ + for (i = 0; i < SIZE; i++) \ + { \ + src1.a[i] = i % 2 ? SNAN_##type : 1.5 + 34.67 * i * sign; \ + src2.a[i] = i % 3 ? QNAN_##type : -22.17 * i * sign; \ + sign = sign * -1; \ + } \ + for (i = 0; i < SIZE; i++) \ + res2.a[i] = DEFAULT_VALUE; \ + res1.x = INTRINSIC(_minmax_##InsnSuffix) (src1.x, src2.x, R); \ + res2.x = INTRINSIC(_mask_minmax_##InsnSuffix) (res2.x, mask, src1.x, src2.x, R); \ + res3.x = INTRINSIC(_maskz_minmax_##InsnSuffix) (mask, src1.x, src2.x, R); \ + CALC (res_ref, src1.a, src2.a, R); \ + if (UNION_CHECK_MINMAX (AVX512F_LEN, MaskType) (res1, res_ref)) \ + abort(); \ + MASK_MERGE (MaskType) (res_ref, mask, SIZE); \ + if (UNION_CHECK_MINMAX (AVX512F_LEN, MaskType) (res2, res_ref)) \ + abort(); \ + MASK_ZERO (MaskType) (res_ref, mask, SIZE); \ + if (UNION_CHECK_MINMAX (AVX512F_LEN, MaskType) (res3, res_ref)) \ + abort(); + +#define SCALAR_UNIT_TEST(R, InsnSuffix, MaskType, type) \ + sign = -1; \ + for (i = 0; i < SIZE; i++) \ + { \ + src1.a[i] = i % 2 ? 
SNAN_##type : 1.5 + 34.67 * i * sign; \ + src2.a[i] = i % 3 ? QNAN_##type : -22.17 * i * sign; \ + sign = sign * -1; \ + } \ + for (i = 0; i < SIZE; i++) \ + res2.a[i] = DEFAULT_VALUE; \ + res1.x = _mm_minmax_##InsnSuffix (src1.x, src2.x, R); \ + res2.x = _mm_mask_minmax_##InsnSuffix (res2.x, mask, src1.x, src2.x, R); \ + res3.x = _mm_maskz_minmax_##InsnSuffix (mask, src1.x, src2.x, R); \ + CALC (res_ref, src1.a, src2.a, R); \ + if (UNION_CHECK_MINMAX (128, MaskType) (res1, res_ref)) \ + abort(); \ + MASK_MERGE (MaskType) (res_ref, mask, 1); \ + if (UNION_CHECK_MINMAX (128, MaskType) (res2, res_ref)) \ + abort(); \ + MASK_ZERO (MaskType) (res_ref, mask, 1); \ + if (UNION_CHECK_MINMAX (128, MaskType) (res3, res_ref)) \ + abort(); + +#endif diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-minmax-1.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-minmax-1.c new file mode 100644 index 00000000000..a75a5fef011 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-minmax-1.c @@ -0,0 +1,51 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx10.2-512" } */ +/* { dg-final { scan-assembler-times "vminmaxnepbf16\[ \\t\]+\[^\{\n\]*\[^\}\]%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminmaxnepbf16\[ \\t\]+\[^\{\n\]*\[^\}\]%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminmaxnepbf16\[ \\t\]+\[^\{\n\]*\[^\}\]%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminmaxph\[ \\t\]+\[^\{\n\]*\[^\}\]%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vminmaxph\[ \\t\]+\[^\{\n\]*\[^\}\]%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vminmaxph\[ \\t\]+\[^\{\n\]*\[^\}\]%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vminmaxps\[ \\t\]+\[^\{\n\]*\[^\}\]%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vminmaxps\[ \\t\]+\[^\{\n\]*\[^\}\]%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vminmaxps\[ \\t\]+\[^\{\n\]*\[^\}\]%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vminmaxpd\[ \\t\]+\[^\{\n\]*\[^\}\]%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vminmaxpd\[ \\t\]+\[^\{\n\]*\[^\}\]%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vminmaxpd\[ \\t\]+\[^\{\n\]*\[^\}\]%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 2 } } */ + + +#include + +volatile __m512bh x1; +volatile __m512h x2; +volatile __m512 x3; +volatile __m512d x4; +volatile __mmask32 m32; +volatile __mmask16 m16; +volatile __mmask8 m8; + +void extern +avx10_2_512_test (void) +{ + x1 = _mm512_minmax_nepbh (x1, x1, 100); + x1 = _mm512_mask_minmax_nepbh (x1, m32, x1, x1, 100); + x1 = _mm512_maskz_minmax_nepbh (m32, x1, x1, 100); + x2 = _mm512_minmax_ph (x2, x2, 1); + x2 = _mm512_mask_minmax_ph (x2, m32, x2, 
x2, 1); + x2 = _mm512_maskz_minmax_ph (m32, x2, x2, 1); + x2 = _mm512_minmax_round_ph (x2, x2, 1, 4); + x2 = _mm512_mask_minmax_round_ph (x2, m32, x2, x2, 1, 4); + x2 = _mm512_maskz_minmax_round_ph (m32, x2, x2, 1, 4); + x3 = _mm512_minmax_ps (x3, x3, 1); + x3 = _mm512_mask_minmax_ps (x3, m16, x3, x3, 1); + x3 = _mm512_maskz_minmax_ps (m16, x3, x3, 1); + x3 = _mm512_minmax_round_ps (x3, x3, 1, 4); + x3 = _mm512_mask_minmax_round_ps (x3, m16, x3, x3, 1, 4); + x3 = _mm512_maskz_minmax_round_ps (m16, x3, x3, 1, 4); + x4 = _mm512_minmax_pd (x4, x4, 100); + x4 = _mm512_mask_minmax_pd (x4, m8, x4, x4, 100); + x4 = _mm512_maskz_minmax_pd (m8, x4, x4, 100); + x4 = _mm512_minmax_round_pd (x4, x4, 100, 4); + x4 = _mm512_mask_minmax_round_pd (x4, m8, x4, x4, 100, 4); + x4 = _mm512_maskz_minmax_round_pd (m8, x4, x4, 100, 4); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vminmaxnepbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vminmaxnepbf16-2.c new file mode 100644 index 00000000000..491a63d1726 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vminmaxnepbf16-2.c @@ -0,0 +1,35 @@ +/* { dg-do run } */ +/* { dg-options "-fsignaling-nans -mfpmath=sse -O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_512BIT +#endif +#define SIZE (AVX512F_LEN / 16) +#include "avx10-helper.h" +#include +#include "avx10-minmax-helper.h" + +void static +CALC (__bf16 *r, __bf16 *s1, __bf16 *s2, int R) +{ + for(int i = 0; i < SIZE; i++) + r[i] = minmax___bf16(&s1[i], &s2[i], R); +} + +void +TEST (void) +{ + int i, sign; + UNION_TYPE (AVX512F_LEN, bf16_bf) res1, res2, res3, src1, src2; + MASK_TYPE mask = MASK_VALUE; + __bf16 res_ref[SIZE]; + + UNIT_TEST(0, nepbh, bf16_bf, __bf16); + UNIT_TEST(1, nepbh, bf16_bf, __bf16); + UNIT_TEST(4, nepbh, bf16_bf, __bf16); + UNIT_TEST(5, nepbh, bf16_bf, __bf16); + UNIT_TEST(16, nepbh, bf16_bf, __bf16); + UNIT_TEST(17, nepbh, bf16_bf, __bf16); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vminmaxpd-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vminmaxpd-2.c new file mode 100644 index 00000000000..fe9bb65e6b1 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vminmaxpd-2.c @@ -0,0 +1,35 @@ +/* { dg-do run } */ +/* { dg-options "-fsignaling-nans -mfpmath=sse -O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_512BIT +#endif +#define SIZE (AVX512F_LEN / 64) +#include "avx10-helper.h" +#include +#include "avx10-minmax-helper.h" + +void static +CALC (double *r, double *s1, double *s2, int R) +{ + for(int i = 0; i < SIZE; i++) + r[i] = minmax_double(&s1[i], &s2[i], R); +} + +void +TEST (void) +{ + int i, sign; + UNION_TYPE (AVX512F_LEN, d) res1, res2, res3, src1, src2; + MASK_TYPE mask = MASK_VALUE; + double res_ref[SIZE]; + + UNIT_TEST(0, pd, d, double); + UNIT_TEST(1, pd, d, double); + UNIT_TEST(4, pd, d, double); + UNIT_TEST(5, pd, d, double); + UNIT_TEST(16, pd, d, double); + UNIT_TEST(17, pd, d, double); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vminmaxph-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vminmaxph-2.c new file mode 100644 index 00000000000..503bb9f18b4 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vminmaxph-2.c @@ -0,0 +1,35 @@ +/* { dg-do run } */ +/* { dg-options "-fsignaling-nans -mfpmath=sse -O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_512BIT +#endif +#define SIZE 
(AVX512F_LEN / 16) +#include "avx10-helper.h" +#include +#include "avx10-minmax-helper.h" + +void static +CALC (_Float16 *r, _Float16 *s1, _Float16 *s2, int R) +{ + for(int i = 0; i < SIZE; i++) + r[i] = minmax__Float16(&s1[i], &s2[i], R); +} + +void +TEST (void) +{ + int i, sign; + UNION_TYPE (AVX512F_LEN, h) res1, res2, res3, src1, src2; + MASK_TYPE mask = MASK_VALUE; + _Float16 res_ref[SIZE]; + + UNIT_TEST(0, ph, h, _Float16); + UNIT_TEST(1, ph, h, _Float16); + UNIT_TEST(4, ph, h, _Float16); + UNIT_TEST(5, ph, h, _Float16); + UNIT_TEST(16, ph, h, _Float16); + UNIT_TEST(17, ph, h, _Float16); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vminmaxps-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vminmaxps-2.c new file mode 100644 index 00000000000..f3ef43ed629 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vminmaxps-2.c @@ -0,0 +1,35 @@ +/* { dg-do run } */ +/* { dg-options "-fsignaling-nans -mfpmath=sse -O2 -mavx10.2-512" } */ +/* { dg-require-effective-target avx10_2 } */ + +#ifndef AVX10_2 +#define AVX10_2 +#define AVX10_512BIT +#endif +#define SIZE (AVX512F_LEN / 32) +#include "avx10-helper.h" +#include +#include "avx10-minmax-helper.h" + +void static +CALC (float *r, float *s1, float *s2, int R) +{ + for(int i = 0; i < SIZE; i++) + r[i] = minmax_float(&s1[i], &s2[i], R); +} + +void +TEST (void) +{ + int i, sign; + UNION_TYPE (AVX512F_LEN, ) res1, res2, res3, src1, src2; + MASK_TYPE mask = MASK_VALUE; + float res_ref[SIZE]; + + UNIT_TEST(0, ps, , float); + UNIT_TEST(1, ps, , float); + UNIT_TEST(4, ps, , float); + UNIT_TEST(5, ps, , float); + UNIT_TEST(16, ps, , float); + UNIT_TEST(17, ps, , float); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-minmax-1.c b/gcc/testsuite/gcc.target/i386/avx10_2-minmax-1.c new file mode 100644 index 00000000000..44798e27800 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-minmax-1.c @@ -0,0 +1,122 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-final { scan-assembler-times "vminmaxnepbf16\[ \\t\]+\[^\{\n\]*\[^\}\]%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminmaxnepbf16\[ \\t\]+\[^\{\n\]*\[^\}\]%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminmaxnepbf16\[ \\t\]+\[^\{\n\]*\[^\}\]%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminmaxnepbf16\[ \\t\]+\[^\{\n\]*\[^\}\]%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminmaxnepbf16\[ \\t\]+\[^\{\n\]*\[^\}\]%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminmaxnepbf16\[ \\t\]+\[^\{\n\]*\[^\}\]%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminmaxph\[ \\t\]+\[^\{\n\]*\[^\}\]%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminmaxph\[ \\t\]+\[^\{\n\]*\[^\}\]%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminmaxph\[ \\t\]+\[^\{\n\]*\[^\}\]%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times 
"vminmaxph\[ \\t\]+\[^\{\n\]*\[^\}\]%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminmaxph\[ \\t\]+\[^\{\n\]*\[^\}\]%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminmaxph\[ \\t\]+\[^\{\n\]*\[^\}\]%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminmaxph\[ \\t\]+\[^\n\]*\{sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminmaxph\[ \\t\]+\[^\n\]*\{sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminmaxph\[ \\t\]+\[^\n\]*\{sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminmaxps\[ \\t\]+\[^\{\n\]*\[^\}\]%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminmaxps\[ \\t\]+\[^\{\n\]*\[^\}\]%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminmaxps\[ \\t\]+\[^\{\n\]*\[^\}\]%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminmaxps\[ \\t\]+\[^\{\n\]*\[^\}\]%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminmaxps\[ \\t\]+\[^\{\n\]*\[^\}\]%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminmaxps\[ \\t\]+\[^\{\n\]*\[^\}\]%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminmaxps\[ \\t\]+\[^\n\]*\{sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminmaxps\[ \\t\]+\[^\n\]*\{sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminmaxps\[ \\t\]+\[^\n\]*\{sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminmaxpd\[ \\t\]+\[^\{\n\]*\[^\}\]%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminmaxpd\[ \\t\]+\[^\{\n\]*\[^\}\]%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminmaxpd\[ \\t\]+\[^\{\n\]*\[^\}\]%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminmaxpd\[ \\t\]+\[^\{\n\]*\[^\}\]%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminmaxpd\[ \\t\]+\[^\{\n\]*\[^\}\]%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminmaxpd\[ \\t\]+\[^\{\n\]*\[^\}\]%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { 
scan-assembler-times "vminmaxpd\[ \\t\]+\[^\n\]*\{sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminmaxpd\[ \\t\]+\[^\n\]*\{sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminmaxpd\[ \\t\]+\[^\n\]*\{sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminmaxsh\[ \\t\]+\[^\{\n\]*\[^\}\]%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminmaxsh\[ \\t\]+\[^\{\n\]*\[^\}\]%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminmaxsh\[ \\t\]+\[^\{\n\]*\[^\}\]%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminmaxsh\[ \\t\]+\[^\n\]*\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminmaxsh\[ \\t\]+\[^\n\]*\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminmaxsh\[ \\t\]+\[^\n\]*\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminmaxss\[ \\t\]+\[^\{\n\]*\[^\}\]%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminmaxss\[ \\t\]+\[^\{\n\]*\[^\}\]%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminmaxss\[ \\t\]+\[^\{\n\]*\[^\}\]%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminmaxss\[ \\t\]+\[^\n\]*\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminmaxss\[ \\t\]+\[^\n\]*\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminmaxss\[ \\t\]+\[^\n\]*\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminmaxsd\[ \\t\]+\[^\{\n\]*\[^\}\]%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminmaxsd\[ \\t\]+\[^\{\n\]*\[^\}\]%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminmaxsd\[ \\t\]+\[^\{\n\]*\[^\}\]%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminmaxsd\[ \\t\]+\[^\n\]*\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminmaxsd\[ \\t\]+\[^\n\]*\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vminmaxsd\[ 
\\t\]+\[^\n\]*\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m256bh y1_; +volatile __m256h y2; +volatile __m256 y3; +volatile __m256d y4; +volatile __m128bh x1; +volatile __m128h x2; +volatile __m128 x3; +volatile __m128d x4; +volatile __mmask16 m16; +volatile __mmask8 m8; + +void extern +avx10_2_test (void) +{ + x1 = _mm_minmax_nepbh (x1, x1, 100); + x1 = _mm_mask_minmax_nepbh (x1, m8, x1, x1, 100); + x1 = _mm_maskz_minmax_nepbh (m8, x1, x1, 100); + y1_ = _mm256_minmax_nepbh (y1_, y1_, 100); + y1_ = _mm256_mask_minmax_nepbh (y1_, m16, y1_, y1_, 100); + y1_ = _mm256_maskz_minmax_nepbh (m16, y1_, y1_, 100); + x2 = _mm_minmax_ph (x2, x2, 100); + x2 = _mm_mask_minmax_ph (x2, m8, x2, x2, 100); + x2 = _mm_maskz_minmax_ph (m8, x2, x2, 100); + y2 = _mm256_minmax_ph (y2, y2, 100); + y2 = _mm256_mask_minmax_ph (y2, m16, y2, y2, 100); + y2 = _mm256_maskz_minmax_ph (m16, y2, y2, 100); + y2 = _mm256_minmax_round_ph (y2, y2, 100, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC); + y2 = _mm256_mask_minmax_round_ph (y2, m16, y2, y2, 100, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC); + y2 = _mm256_maskz_minmax_round_ph (m16, y2, y2, 100, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC); + x3 = _mm_minmax_ps (x3, x3, 100); + x3 = _mm_mask_minmax_ps (x3, m8, x3, x3, 100); + x3 = _mm_maskz_minmax_ps (m8, x3, x3, 100); + y3 = _mm256_minmax_ps (y3, y3, 100); + y3 = _mm256_mask_minmax_ps (y3, m8, y3, y3, 100); + y3 = _mm256_maskz_minmax_ps (m8, y3, y3, 100); + y3 = _mm256_minmax_round_ps (y3, y3, 100, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC); + y3 = _mm256_mask_minmax_round_ps (y3, m8, y3, y3, 100, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC); + y3 = _mm256_maskz_minmax_round_ps (m8, y3, y3, 100, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC); + x4 = _mm_minmax_pd (x4, x4, 100); + x4 = _mm_mask_minmax_pd (x4, m8, x4, x4, 100); + x4 = _mm_maskz_minmax_pd (m8, x4, x4, 100); + y4 = _mm256_minmax_pd (y4, y4, 100); + y4 = _mm256_mask_minmax_pd (y4, m8, y4, y4, 100); + y4 = _mm256_maskz_minmax_pd (m8, y4, y4, 100); + y4 = _mm256_minmax_round_pd (y4, y4, 100, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC); + y4 = _mm256_mask_minmax_round_pd (y4, m8, y4, y4, 100, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC); + y4 = _mm256_maskz_minmax_round_pd (m8, y4, y4, 100, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC); + x2 = _mm_minmax_sh (x2, x2, 1); + x2 = _mm_mask_minmax_sh (x2, m8, x2, x2, 1); + x2 = _mm_maskz_minmax_sh (m8, x2, x2, 1); + x2 = _mm_minmax_round_sh (x2, x2, 1, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC); + x2 = _mm_mask_minmax_round_sh (x2, m8, x2, x2, 1, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC); + x2 = _mm_maskz_minmax_round_sh (m8, x2, x2, 1, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC); + x3 = _mm_minmax_ss (x3, x3, 1); + x3 = _mm_mask_minmax_ss (x3, m8, x3, x3, 1); + x3 = _mm_maskz_minmax_ss (m8, x3, x3, 1); + x3 = _mm_minmax_round_ss (x3, x3, 1, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC); + x3 = _mm_mask_minmax_round_ss (x3, m8, x3, x3, 1, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC); + x3 = _mm_maskz_minmax_round_ss (m8, x3, x3, 1, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC); + x4 = _mm_minmax_sd (x4, x4, 1); + x4 = _mm_mask_minmax_sd (x4, m8, x4, x4, 1); + x4 = _mm_maskz_minmax_sd (m8, x4, x4, 1); + x4 = _mm_minmax_round_sd (x4, x4, 1, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC); + x4 = _mm_mask_minmax_round_sd (x4, m8, x4, x4, 1, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC); 
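+  /* The imm8 selector used throughout (1 or 100 above) follows the decode
+     in avx10-minmax-helper.h: bits [1:0] pick min/max by value or by
+     magnitude, bits [3:2] give the sign control, and bit 4 selects whether
+     a NaN input is ignored in favor of the numeric operand.  */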
+ x4 = _mm_maskz_minmax_round_sd (m8, x4, x4, 1, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vminmaxnepbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vminmaxnepbf16-2.c new file mode 100644 index 00000000000..e1ac0639ff2 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vminmaxnepbf16-2.c @@ -0,0 +1,13 @@ +/* { dg-do run } */ +/* { dg-options "-fsignaling-nans -mfpmath=sse -O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#include "avx10_2-512-vminmaxnepbf16-2.c" + +#undef AVX512F_LEN + +#define AVX512F_LEN 128 +#include "avx10_2-512-vminmaxnepbf16-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vminmaxpd-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vminmaxpd-2.c new file mode 100644 index 00000000000..29cd113d42a --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vminmaxpd-2.c @@ -0,0 +1,13 @@ +/* { dg-do run } */ +/* { dg-options "-fsignaling-nans -mfpmath=sse -O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#include "avx10_2-512-vminmaxpd-2.c" + +#undef AVX512F_LEN + +#define AVX512F_LEN 128 +#include "avx10_2-512-vminmaxpd-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vminmaxph-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vminmaxph-2.c new file mode 100644 index 00000000000..8a2229498b3 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vminmaxph-2.c @@ -0,0 +1,15 @@ +/* { dg-do run } */ +/* { dg-options "-fsignaling-nans -mfpmath=sse -O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +typedef _Float16 __m256h __attribute__ ((__vector_size__ (32), __may_alias__)); +#include "avx10_2-512-vminmaxph-2.c" + +#undef AVX512F_LEN + +#define AVX512F_LEN 128 +typedef _Float16 __m128h __attribute__ ((__vector_size__ (16), __may_alias__)); +#include "avx10_2-512-vminmaxph-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vminmaxps-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vminmaxps-2.c new file mode 100644 index 00000000000..f6f1e79aa9e --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vminmaxps-2.c @@ -0,0 +1,13 @@ +/* { dg-do run } */ +/* { dg-options "-fsignaling-nans -mfpmath=sse -O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX512VL +#define AVX512F_LEN 256 +#include "avx10_2-512-vminmaxps-2.c" + +#undef AVX512F_LEN + +#define AVX512F_LEN 128 +#include "avx10_2-512-vminmaxps-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vminmaxsd-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vminmaxsd-2.c new file mode 100644 index 00000000000..1e2d78c4068 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vminmaxsd-2.c @@ -0,0 +1,34 @@ +/* { dg-do run } */ +/* { dg-options "-fsignaling-nans -mfpmath=sse -O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX10_SCALAR +#define SIZE (128 / 64) +#include "avx10-helper.h" +#include +#include "avx10-minmax-helper.h" + +void static +CALC (double *r, double *s1, double *s2, int R) +{ + r[0] = minmax_double(&s1[0], &s2[0], R); + for(int i = 1; i < SIZE; i++) + r[i] = s1[i]; +} + +void +TEST (void) +{ + int i, sign; + UNION_TYPE (128, d) res1, res2, res3, src1, src2; + MASK_TYPE mask = MASK_VALUE; + double res_ref[SIZE]; + + SCALAR_UNIT_TEST(0, sd, d, double); + SCALAR_UNIT_TEST(1, sd, d, double); + 
SCALAR_UNIT_TEST(4, sd, d, double); + SCALAR_UNIT_TEST(5, sd, d, double); + SCALAR_UNIT_TEST(16, sd, d, double); + SCALAR_UNIT_TEST(17, sd, d, double); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vminmaxsh-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vminmaxsh-2.c new file mode 100644 index 00000000000..e6a93c403b5 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vminmaxsh-2.c @@ -0,0 +1,34 @@ +/* { dg-do run } */ +/* { dg-options "-fsignaling-nans -mfpmath=sse -O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX10_SCALAR +#define SIZE (128 / 16) +#include "avx10-helper.h" +#include +#include "avx10-minmax-helper.h" + +void static +CALC (_Float16 *r, _Float16 *s1, _Float16 *s2, int R) +{ + r[0] = minmax__Float16(&s1[0], &s2[0], R); + for(int i = 1; i < SIZE; i++) + r[i] = s1[i]; +} + +void +TEST (void) +{ + int i, sign; + UNION_TYPE (128, h) res1, res2, res3, src1, src2; + MASK_TYPE mask = MASK_VALUE; + _Float16 res_ref[SIZE]; + + SCALAR_UNIT_TEST(0, sh, h, _Float16); + SCALAR_UNIT_TEST(1, sh, h, _Float16); + SCALAR_UNIT_TEST(4, sh, h, _Float16); + SCALAR_UNIT_TEST(5, sh, h, _Float16); + SCALAR_UNIT_TEST(16, sh, h, _Float16); + SCALAR_UNIT_TEST(17, sh, h, _Float16); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vminmaxss-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vminmaxss-2.c new file mode 100644 index 00000000000..47177e69640 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vminmaxss-2.c @@ -0,0 +1,34 @@ +/* { dg-do run } */ +/* { dg-options "-fsignaling-nans -mfpmath=sse -O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX10_SCALAR +#define SIZE (128 / 32) +#include "avx10-helper.h" +#include +#include "avx10-minmax-helper.h" + +void static +CALC (float *r, float *s1, float *s2, int R) +{ + r[0] = minmax_float(&s1[0], &s2[0], R); + for(int i = 1; i < SIZE; i++) + r[i] = s1[i]; +} + +void +TEST (void) +{ + int i, sign; + UNION_TYPE (128, ) res1, res2, res3, src1, src2; + MASK_TYPE mask = MASK_VALUE; + float res_ref[SIZE]; + + SCALAR_UNIT_TEST(0, ss, , float); + SCALAR_UNIT_TEST(1, ss, , float); + SCALAR_UNIT_TEST(4, ss, , float); + SCALAR_UNIT_TEST(5, ss, , float); + SCALAR_UNIT_TEST(16, ss, , float); + SCALAR_UNIT_TEST(17, ss, , float); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512f-helper.h b/gcc/testsuite/gcc.target/i386/avx512f-helper.h index b49ff061f78..21f691b8441 100644 --- a/gcc/testsuite/gcc.target/i386/avx512f-helper.h +++ b/gcc/testsuite/gcc.target/i386/avx512f-helper.h @@ -41,6 +41,7 @@ MAKE_MASK_MERGE(i_b, char) MAKE_MASK_MERGE(i_w, short) MAKE_MASK_MERGE(i_d, int) MAKE_MASK_MERGE(i_q, long long) +MAKE_MASK_MERGE(h, _Float16) MAKE_MASK_MERGE(, float) MAKE_MASK_MERGE(d, double) MAKE_MASK_MERGE(i_ub, unsigned char) @@ -68,6 +69,7 @@ MAKE_MASK_ZERO(i_b, char) MAKE_MASK_ZERO(i_w, short) MAKE_MASK_ZERO(i_d, int) MAKE_MASK_ZERO(i_q, long long) +MAKE_MASK_ZERO(h, _Float16) MAKE_MASK_ZERO(, float) MAKE_MASK_ZERO(d, double) MAKE_MASK_ZERO(i_ub, unsigned char) diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c index 1d6ca552fcc..b32a5d75d81 100644 --- a/gcc/testsuite/gcc.target/i386/sse-13.c +++ b/gcc/testsuite/gcc.target/i386/sse-13.c @@ -1089,4 +1089,23 @@ #define __builtin_ia32_cvttss2usis64_round(A, B) __builtin_ia32_cvttss2usis64_round(A, 8) #endif +/* avx10_2-512minmaxintrin.h */ +#define __builtin_ia32_minmaxpd512_mask_round(A, B, C, D, E, F) __builtin_ia32_minmaxpd512_mask_round (A, B, 4, D, E, 4) 
+#define __builtin_ia32_minmaxph512_mask_round(A, B, C, D, E, F) __builtin_ia32_minmaxph512_mask_round (A, B, 4, D, E, 4) +#define __builtin_ia32_minmaxps512_mask_round(A, B, C, D, E, F) __builtin_ia32_minmaxps512_mask_round (A, B, 4, D, E, 4) +#define __builtin_ia32_minmaxnepbf16512_mask(A, B, C, W, U) __builtin_ia32_minmaxnepbf16512_mask (A, B, 4, W, U) + +/* avx10_2minmaxintrin.h */ +#define __builtin_ia32_minmaxsd_mask_round(A, B, C, D, E, F) __builtin_ia32_minmaxsd_mask_round (A, B, 4, D, E, 4) +#define __builtin_ia32_minmaxsh_mask_round(A, B, C, D, E, F) __builtin_ia32_minmaxsh_mask_round (A, B, 4, D, E, 4) +#define __builtin_ia32_minmaxss_mask_round(A, B, C, D, E, F) __builtin_ia32_minmaxss_mask_round (A, B, 4, D, E, 4) +#define __builtin_ia32_minmaxnepbf16128_mask(A, B, C, D, E) __builtin_ia32_minmaxnepbf16128_mask (A, B, 4, D, E) +#define __builtin_ia32_minmaxnepbf16256_mask(A, B, C, D, E) __builtin_ia32_minmaxnepbf16256_mask (A, B, 4, D, E) +#define __builtin_ia32_minmaxpd128_mask(A, B, C, D, E) __builtin_ia32_minmaxpd128_mask (A, B, 4, D, E) +#define __builtin_ia32_minmaxpd256_mask_round(A, B, C, D, E, F) __builtin_ia32_minmaxpd256_mask_round (A, B, 4, D, E, 4) +#define __builtin_ia32_minmaxph128_mask(A, B, C, D, E) __builtin_ia32_minmaxph128_mask (A, B, 4, D, E) +#define __builtin_ia32_minmaxph256_mask_round(A, B, C, D, E, F) __builtin_ia32_minmaxph256_mask_round (A, B, 4, D, E, 4) +#define __builtin_ia32_minmaxps128_mask(A, B, C, D, E) __builtin_ia32_minmaxps128_mask (A, B, 4, D, E) +#define __builtin_ia32_minmaxps256_mask_round(A, B, C, D, E, F) __builtin_ia32_minmaxps256_mask_round (A, B, 4, D, E, 4) + #include diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c index 799982b6f7e..4662c863af7 100644 --- a/gcc/testsuite/gcc.target/i386/sse-14.c +++ b/gcc/testsuite/gcc.target/i386/sse-14.c @@ -1541,3 +1541,70 @@ test_1 (_mm_cvtts_roundsd_epu64, unsigned long long, __m128d, 8) test_1 (_mm_cvtts_roundss_epi64, long long, __m128, 8) test_1 (_mm_cvtts_roundss_epu64, unsigned long long, __m128, 8) #endif + +/* avx10_2-512minmaxintrin.h */ +test_2 (_mm512_minmax_nepbh, __m512bh, __m512bh, __m512bh, 100) +test_3 (_mm512_maskz_minmax_nepbh, __m512bh, __mmask32, __m512bh, __m512bh, 100) +test_4 (_mm512_mask_minmax_nepbh, __m512bh, __m512bh, __mmask32, __m512bh, __m512bh, 100) +test_2x (_mm512_minmax_round_pd, __m512d, __m512d, __m512d, 100, 4) +test_3x (_mm512_maskz_minmax_round_pd, __m512d, __mmask8, __m512d, __m512d, 100, 4) +test_4x (_mm512_mask_minmax_round_pd, __m512d, __m512d, __mmask8, __m512d, __m512d, 100, 4) +test_2x (_mm512_minmax_round_ps, __m512, __m512, __m512, 100, 4) +test_3x (_mm512_maskz_minmax_round_ps, __m512, __mmask16, __m512, __m512, 100, 4) +test_4x (_mm512_mask_minmax_round_ps, __m512, __m512, __mmask16, __m512, __m512, 100, 4) +test_2x (_mm512_minmax_round_ph, __m512h, __m512h, __m512h, 100, 4) +test_3x (_mm512_maskz_minmax_round_ph, __m512h, __mmask32, __m512h, __m512h, 100, 4) +test_4x (_mm512_mask_minmax_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 100, 4) +test_2 (_mm512_minmax_pd, __m512d, __m512d, __m512d, 100) +test_3 (_mm512_maskz_minmax_pd, __m512d, __mmask8, __m512d, __m512d, 100) +test_4 (_mm512_mask_minmax_pd, __m512d, __m512d, __mmask8, __m512d, __m512d, 100) +test_2 (_mm512_minmax_ps, __m512, __m512, __m512, 100) +test_3 (_mm512_maskz_minmax_ps, __m512, __mmask16, __m512, __m512, 100) +test_4 (_mm512_mask_minmax_ps, __m512, __m512, __mmask16, __m512, __m512, 100) +test_2 (_mm512_minmax_ph, 
__m512h, __m512h, __m512h, 100) +test_3 (_mm512_maskz_minmax_ph, __m512h, __mmask32, __m512h, __m512h, 100) +test_4 (_mm512_mask_minmax_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 100) + +/* avx10_2minmaxintrin.h */ +test_2 (_mm256_minmax_nepbh, __m256bh, __m256bh, __m256bh, 100) +test_3 (_mm256_maskz_minmax_nepbh, __m256bh, __mmask16, __m256bh, __m256bh, 100) +test_4 (_mm256_mask_minmax_nepbh, __m256bh, __m256bh, __mmask16, __m256bh, __m256bh, 100) +test_2x (_mm256_minmax_round_pd, __m256d, __m256d, __m256d, 100, 4) +test_3x (_mm256_maskz_minmax_round_pd, __m256d, __mmask8, __m256d, __m256d, 100, 4) +test_4x (_mm256_mask_minmax_round_pd, __m256d, __m256d, __mmask8, __m256d, __m256d, 100, 4) +test_2x (_mm256_minmax_round_ps, __m256, __m256, __m256, 100, 4) +test_3x (_mm256_maskz_minmax_round_ps, __m256, __mmask8, __m256, __m256, 100, 4) +test_4x (_mm256_mask_minmax_round_ps, __m256, __m256, __mmask8, __m256, __m256, 100, 4) +test_2x (_mm256_minmax_round_ph, __m256h, __m256h, __m256h, 100, 4) +test_3x (_mm256_maskz_minmax_round_ph, __m256h, __mmask16, __m256h, __m256h, 100, 4) +test_4x (_mm256_mask_minmax_round_ph, __m256h, __m256h, __mmask16, __m256h, __m256h, 100, 4) +test_2 (_mm256_minmax_pd, __m256d, __m256d, __m256d, 100) +test_3 (_mm256_maskz_minmax_pd, __m256d, __mmask8, __m256d, __m256d, 100) +test_4 (_mm256_mask_minmax_pd, __m256d, __m256d, __mmask8, __m256d, __m256d, 100) +test_2 (_mm256_minmax_ps, __m256, __m256, __m256, 100) +test_3 (_mm256_maskz_minmax_ps, __m256, __mmask8, __m256, __m256, 100) +test_4 (_mm256_mask_minmax_ps, __m256, __m256, __mmask8, __m256, __m256, 100) +test_2 (_mm256_minmax_ph, __m256h, __m256h, __m256h, 100) +test_3 (_mm256_maskz_minmax_ph, __m256h, __mmask16, __m256h, __m256h, 100) +test_4 (_mm256_mask_minmax_ph, __m256h, __m256h, __mmask16, __m256h, __m256h, 100) +test_2 (_mm_minmax_nepbh, __m128bh, __m128bh, __m128bh, 100) +test_3 (_mm_maskz_minmax_nepbh, __m128bh, __mmask8, __m128bh, __m128bh, 100) +test_4 (_mm_mask_minmax_nepbh, __m128bh, __m128bh, __mmask8, __m128bh, __m128bh, 100) +test_2 (_mm_minmax_pd, __m128d, __m128d, __m128d, 100) +test_3 (_mm_maskz_minmax_pd, __m128d, __mmask8, __m128d, __m128d, 100) +test_4 (_mm_mask_minmax_pd, __m128d, __m128d, __mmask8, __m128d, __m128d, 100) +test_2 (_mm_minmax_ps, __m128, __m128, __m128, 100) +test_3 (_mm_maskz_minmax_ps, __m128, __mmask8, __m128, __m128, 100) +test_4 (_mm_mask_minmax_ps, __m128, __m128, __mmask8, __m128, __m128, 100) +test_2 (_mm_minmax_ph, __m128h, __m128h, __m128h, 100) +test_3 (_mm_maskz_minmax_ph, __m128h, __mmask8, __m128h, __m128h, 100) +test_4 (_mm_mask_minmax_ph, __m128h, __m128h, __mmask8, __m128h, __m128h, 100) +test_2x (_mm_minmax_round_sd, __m128d, __m128d, __m128d, 100, 4) +test_3x (_mm_maskz_minmax_round_sd, __m128d, __mmask8, __m128d, __m128d, 100, 4) +test_4x (_mm_mask_minmax_round_sd, __m128d, __m128d, __mmask8, __m128d, __m128d, 100, 4) +test_2x (_mm_minmax_round_ss, __m128, __m128, __m128, 100, 4) +test_3x (_mm_maskz_minmax_round_ss, __m128, __mmask8, __m128, __m128, 100, 4) +test_4x (_mm_mask_minmax_round_ss, __m128, __m128, __mmask8, __m128, __m128, 100, 4) +test_2x (_mm_minmax_round_sh, __m128h, __m128h, __m128h, 100, 4) +test_3x (_mm_maskz_minmax_round_sh, __m128h, __mmask8, __m128h, __m128h, 100, 4) +test_4x (_mm_mask_minmax_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 100, 4) diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c index b8eb6ae7828..229e2f7d646 100644 --- 
a/gcc/testsuite/gcc.target/i386/sse-22.c +++ b/gcc/testsuite/gcc.target/i386/sse-22.c @@ -1580,3 +1580,70 @@ test_1 (_mm_cvtts_roundsd_epu64, unsigned long long, __m128d, 8) test_1 (_mm_cvtts_roundss_epi64, long long, __m128, 8) test_1 (_mm_cvtts_roundss_epu64, unsigned long long, __m128, 8) #endif + +/* avx10_2-512minmaxintrin.h */ +test_2 (_mm512_minmax_nepbh, __m512bh, __m512bh, __m512bh, 100) +test_3 (_mm512_maskz_minmax_nepbh, __m512bh, __mmask32, __m512bh, __m512bh, 100) +test_4 (_mm512_mask_minmax_nepbh, __m512bh, __m512bh, __mmask32, __m512bh, __m512bh, 100) +test_2x (_mm512_minmax_round_pd, __m512d, __m512d, __m512d, 100, 4) +test_3x (_mm512_maskz_minmax_round_pd, __m512d, __mmask8, __m512d, __m512d, 100, 4) +test_4x (_mm512_mask_minmax_round_pd, __m512d, __m512d, __mmask8, __m512d, __m512d, 100, 4) +test_2x (_mm512_minmax_round_ps, __m512, __m512, __m512, 100, 4) +test_3x (_mm512_maskz_minmax_round_ps, __m512, __mmask16, __m512, __m512, 100, 4) +test_4x (_mm512_mask_minmax_round_ps, __m512, __m512, __mmask16, __m512, __m512, 100, 4) +test_2x (_mm512_minmax_round_ph, __m512h, __m512h, __m512h, 100, 4) +test_3x (_mm512_maskz_minmax_round_ph, __m512h, __mmask32, __m512h, __m512h, 100, 4) +test_4x (_mm512_mask_minmax_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 100, 4) +test_2 (_mm512_minmax_pd, __m512d, __m512d, __m512d, 100) +test_3 (_mm512_maskz_minmax_pd, __m512d, __mmask8, __m512d, __m512d, 100) +test_4 (_mm512_mask_minmax_pd, __m512d, __m512d, __mmask8, __m512d, __m512d, 100) +test_2 (_mm512_minmax_ps, __m512, __m512, __m512, 100) +test_3 (_mm512_maskz_minmax_ps, __m512, __mmask16, __m512, __m512, 100) +test_4 (_mm512_mask_minmax_ps, __m512, __m512, __mmask16, __m512, __m512, 100) +test_2 (_mm512_minmax_ph, __m512h, __m512h, __m512h, 100) +test_3 (_mm512_maskz_minmax_ph, __m512h, __mmask32, __m512h, __m512h, 100) +test_4 (_mm512_mask_minmax_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 100) + +/* avx10_2minmaxintrin.h */ +test_2 (_mm256_minmax_nepbh, __m256bh, __m256bh, __m256bh, 100) +test_3 (_mm256_maskz_minmax_nepbh, __m256bh, __mmask16, __m256bh, __m256bh, 100) +test_4 (_mm256_mask_minmax_nepbh, __m256bh, __m256bh, __mmask16, __m256bh, __m256bh, 100) +test_2x (_mm256_minmax_round_pd, __m256d, __m256d, __m256d, 100, 4) +test_3x (_mm256_maskz_minmax_round_pd, __m256d, __mmask8, __m256d, __m256d, 100, 4) +test_4x (_mm256_mask_minmax_round_pd, __m256d, __m256d, __mmask8, __m256d, __m256d, 100, 4) +test_2x (_mm256_minmax_round_ps, __m256, __m256, __m256, 100, 4) +test_3x (_mm256_maskz_minmax_round_ps, __m256, __mmask8, __m256, __m256, 100, 4) +test_4x (_mm256_mask_minmax_round_ps, __m256, __m256, __mmask8, __m256, __m256, 100, 4) +test_2x (_mm256_minmax_round_ph, __m256h, __m256h, __m256h, 100, 4) +test_3x (_mm256_maskz_minmax_round_ph, __m256h, __mmask16, __m256h, __m256h, 100, 4) +test_4x (_mm256_mask_minmax_round_ph, __m256h, __m256h, __mmask16, __m256h, __m256h, 100, 4) +test_2 (_mm256_minmax_pd, __m256d, __m256d, __m256d, 100) +test_3 (_mm256_maskz_minmax_pd, __m256d, __mmask8, __m256d, __m256d, 100) +test_4 (_mm256_mask_minmax_pd, __m256d, __m256d, __mmask8, __m256d, __m256d, 100) +test_2 (_mm256_minmax_ps, __m256, __m256, __m256, 100) +test_3 (_mm256_maskz_minmax_ps, __m256, __mmask8, __m256, __m256, 100) +test_4 (_mm256_mask_minmax_ps, __m256, __m256, __mmask8, __m256, __m256, 100) +test_2 (_mm256_minmax_ph, __m256h, __m256h, __m256h, 100) +test_3 (_mm256_maskz_minmax_ph, __m256h, __mmask16, __m256h, __m256h, 100) +test_4 (_mm256_mask_minmax_ph, __m256h, 
__m256h, __mmask16, __m256h, __m256h, 100) +test_2 (_mm_minmax_nepbh, __m128bh, __m128bh, __m128bh, 100) +test_3 (_mm_maskz_minmax_nepbh, __m128bh, __mmask8, __m128bh, __m128bh, 100) +test_4 (_mm_mask_minmax_nepbh, __m128bh, __m128bh, __mmask8, __m128bh, __m128bh, 100) +test_2 (_mm_minmax_pd, __m128d, __m128d, __m128d, 100) +test_3 (_mm_maskz_minmax_pd, __m128d, __mmask8, __m128d, __m128d, 100) +test_4 (_mm_mask_minmax_pd, __m128d, __m128d, __mmask8, __m128d, __m128d, 100) +test_2 (_mm_minmax_ps, __m128, __m128, __m128, 100) +test_3 (_mm_maskz_minmax_ps, __m128, __mmask8, __m128, __m128, 100) +test_4 (_mm_mask_minmax_ps, __m128, __m128, __mmask8, __m128, __m128, 100) +test_2 (_mm_minmax_ph, __m128h, __m128h, __m128h, 100) +test_3 (_mm_maskz_minmax_ph, __m128h, __mmask8, __m128h, __m128h, 100) +test_4 (_mm_mask_minmax_ph, __m128h, __m128h, __mmask8, __m128h, __m128h, 100) +test_2x (_mm_minmax_round_sd, __m128d, __m128d, __m128d, 100, 4) +test_3x (_mm_maskz_minmax_round_sd, __m128d, __mmask8, __m128d, __m128d, 100, 4) +test_4x (_mm_mask_minmax_round_sd, __m128d, __m128d, __mmask8, __m128d, __m128d, 100, 4) +test_2x (_mm_minmax_round_ss, __m128, __m128, __m128, 100, 4) +test_3x (_mm_maskz_minmax_round_ss, __m128, __mmask8, __m128, __m128, 100, 4) +test_4x (_mm_mask_minmax_round_ss, __m128, __m128, __mmask8, __m128, __m128, 100, 4) +test_2x (_mm_minmax_round_sh, __m128h, __m128h, __m128h, 100, 4) +test_3x (_mm_maskz_minmax_round_sh, __m128h, __mmask8, __m128h, __m128h, 100, 4) +test_4x (_mm_mask_minmax_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 100, 4) diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c index f3ab4a4f34a..f0e2054f3db 100644 --- a/gcc/testsuite/gcc.target/i386/sse-23.c +++ b/gcc/testsuite/gcc.target/i386/sse-23.c @@ -1063,6 +1063,25 @@ #define __builtin_ia32_cvttss2usis64_round(A, B) __builtin_ia32_cvttss2usis64_round(A, 8) #endif +/* avx10_2-512minmaxintrin.h */ +#define __builtin_ia32_minmaxpd512_mask_round(A, B, C, D, E, F) __builtin_ia32_minmaxpd512_mask_round (A, B, 100, D, E, 4) +#define __builtin_ia32_minmaxph512_mask_round(A, B, C, D, E, F) __builtin_ia32_minmaxph512_mask_round (A, B, 100, D, E, 4) +#define __builtin_ia32_minmaxps512_mask_round(A, B, C, D, E, F) __builtin_ia32_minmaxps512_mask_round (A, B, 100, D, E, 4) +#define __builtin_ia32_minmaxnepbf16512_mask(A, B, C, W, U) __builtin_ia32_minmaxnepbf16512_mask (A, B, 100, W, U) + +/* avx10_2-minmaxintrin.h */ +#define __builtin_ia32_minmaxsd_mask_round(A, B, C, D, E, F) __builtin_ia32_minmaxsd_mask_round (A, B, 100, D, E, 4) +#define __builtin_ia32_minmaxsh_mask_round(A, B, C, D, E, F) __builtin_ia32_minmaxsh_mask_round (A, B, 100, D, E, 4) +#define __builtin_ia32_minmaxss_mask_round(A, B, C, D, E, F) __builtin_ia32_minmaxss_mask_round (A, B, 100, D, E, 4) +#define __builtin_ia32_minmaxnepbf16128_mask(A, B, C, D, E) __builtin_ia32_minmaxnepbf16128_mask (A, B, 100, D, E) +#define __builtin_ia32_minmaxnepbf16256_mask(A, B, C, D, E) __builtin_ia32_minmaxnepbf16256_mask (A, B, 100, D, E) +#define __builtin_ia32_minmaxpd128_mask(A, B, C, D, E) __builtin_ia32_minmaxpd128_mask (A, B, 100, D, E) +#define __builtin_ia32_minmaxpd256_mask_round(A, B, C, D, E, F) __builtin_ia32_minmaxpd256_mask_round (A, B, 100, D, E, 4) +#define __builtin_ia32_minmaxph128_mask(A, B, C, D, E) __builtin_ia32_minmaxph128_mask (A, B, 100, D, E) +#define __builtin_ia32_minmaxph256_mask_round(A, B, C, D, E, F) __builtin_ia32_minmaxph256_mask_round (A, B, 100, D, E, 4) +#define 
__builtin_ia32_minmaxps128_mask(A, B, C, D, E) __builtin_ia32_minmaxps128_mask (A, B, 100, D, E)
+#define __builtin_ia32_minmaxps256_mask_round(A, B, C, D, E, F) __builtin_ia32_minmaxps256_mask_round (A, B, 100, D, E, 4)
+
 #pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,sha,xsavec,xsaves,clflushopt,clwb,mwaitx,clzero,pku,sgx,rdpid,gfni,vpclmulqdq,pconfig,wbnoinvd,enqcmd,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avxifma,avxvnniint8,avxneconvert,cmpccxadd,amx-fp16,prefetchi,raoint,amx-complex,avxvnniint16,sm3,sha512,sm4,avx10.2-512")
 #include
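A note on the builtin shims above: sse-13.c and sse-23.c (roughly speaking) compile every intrinsic as an ordinary out-of-line function, so an immediate parameter reaches the builtin as a plain variable rather than a literal constant. Each redefinition therefore pins the builtin's immediate operands back to valid literals before expansion. A minimal sketch of the pattern, using a hypothetical builtin name rather than one of the builtins added here:

/* The shim discards the non-constant immediate C and substitutes
   literals that satisfy the immediate-operand checks; the trailing 4
   is _MM_FROUND_CUR_DIRECTION for the embedded-rounding slot.  */
#define __builtin_ia32_example_mask_round(A, B, C, D, E, F) \
  __builtin_ia32_example_mask_round (A, B, 4, D, E, 4)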
From patchwork Mon Aug 19 09:03:12 2024
From: Haochen Jiang
To: gcc-patches@gcc.gnu.org
Cc: hongtao.liu@intel.com, ubizjak@gmail.com, "Zhang, Jun"
Subject: [PATCH 10/12] AVX10.2: Support vector copy instructions
Date: Mon, 19 Aug 2024 02:03:12 -0700
Message-ID: <20240819090314.193441-1-haochen.jiang@intel.com>
In-Reply-To: <20240819085717.193256-1-haochen.jiang@intel.com>

From: "Zhang, Jun"

gcc/ChangeLog:

	* config.gcc: Add avx10_2copyintrin.h.
	* config/i386/i386.md (avx10_2): New isa attribute.
	* config/i386/immintrin.h: Include avx10_2copyintrin.h.
	* config/i386/sse.md (sse_movss_): Add new constraints to
	handle AVX10.2.
	(vec_set_0): Ditto.
	(@vec_set_0): Ditto.
	(vec_set_0): Ditto.
	(avx512fp16_mov): Ditto.
	(*vec_set_0_1): New split.
	* config/i386/avx10_2copyintrin.h: New file.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx10_2-vmovd-1.c: New test.
	* gcc.target/i386/avx10_2-vmovd-2.c: Ditto.
	* gcc.target/i386/avx10_2-vmovw-1.c: Ditto.
	* gcc.target/i386/avx10_2-vmovw-2.c: Ditto.
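To make the new API concrete, a usage sketch (illustrative only, not part of the patch; the function names are invented) of the two intrinsics defined in avx10_2copyintrin.h below. Both keep element 0 and zero every higher element, which is exactly the shape the new insn alternatives turn into a single register-to-register vmovd/vmovw:

#include <immintrin.h>

/* {1, 2, 3, 4} -> {1, 0, 0, 0}; with -O2 -mavx10.2 this should become
   one vmovd, per the scan-assembler expectations in avx10_2-vmovd-1.c.  */
__m128i
keep_low_dword (__m128i v)
{
  return _mm_move_epi32 (v);
}

/* Same idea for the low 16-bit element: a single vmovw.  */
__m128i
keep_low_word (__m128i v)
{
  return _mm_move_epi16 (v);
}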
--- gcc/config.gcc | 3 +- gcc/config/i386/avx10_2copyintrin.h | 38 +++++ gcc/config/i386/i386.md | 3 +- gcc/config/i386/immintrin.h | 2 + gcc/config/i386/sse.md | 138 +++++++++++------- .../gcc.target/i386/avx10_2-vmovd-1.c | 48 ++++++ .../gcc.target/i386/avx10_2-vmovd-2.c | 44 ++++++ .../gcc.target/i386/avx10_2-vmovw-1.c | 69 +++++++++ .../gcc.target/i386/avx10_2-vmovw-2.c | 64 ++++++++ 9 files changed, 356 insertions(+), 53 deletions(-) create mode 100644 gcc/config/i386/avx10_2copyintrin.h create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vmovd-1.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vmovd-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vmovw-1.c create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vmovw-2.c diff --git a/gcc/config.gcc b/gcc/config.gcc index cd8a34b292f..e887c9c7432 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -457,7 +457,8 @@ i[34567]86-*-* | x86_64-*-*) avx10_2convertintrin.h avx10_2-512convertintrin.h avx10_2bf16intrin.h avx10_2-512bf16intrin.h avx10_2satcvtintrin.h avx10_2-512satcvtintrin.h - avx10_2minmaxintrin.h avx10_2-512minmaxintrin.h" + avx10_2minmaxintrin.h avx10_2-512minmaxintrin.h + avx10_2copyintrin.h" ;; ia64-*-*) extra_headers=ia64intrin.h diff --git a/gcc/config/i386/avx10_2copyintrin.h b/gcc/config/i386/avx10_2copyintrin.h new file mode 100644 index 00000000000..f1150c71dbf --- /dev/null +++ b/gcc/config/i386/avx10_2copyintrin.h @@ -0,0 +1,38 @@ +/* Copyright (C) 2024 Free Software Foundation, Inc. + This file is part of GCC. + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + +#if !defined _IMMINTRIN_H_INCLUDED +#error "Never use directly; include instead." +#endif + +#ifndef _AVX10_2COPYINTRIN_H_INCLUDED +#define _AVX10_2COPYINTRIN_H_INCLUDED + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_move_epi32 (__m128i __A) +{ + return _mm_set_epi32 (0, 0, 0, ((__v4si) __A)[0]); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_move_epi16 (__m128i __A) +{ + return _mm_set_epi16 (0, 0, 0, 0, 0, 0, 0, ((__v8hi) __A)[0]); +} + +#endif /* _AVX10_2COPYINTRIN_H_INCLUDED */ diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 470ae5444db..e28f9bb5eae 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -582,7 +582,7 @@ noavx512dq,fma_or_avx512vl,avx512vl,noavx512vl,avxvnni, avx512vnnivl,avx512fp16,avxifma,avx512ifmavl,avxneconvert, avx512bf16vl,vpclmulqdqvl,avx_noavx512f,avx_noavx512vl, - vaes_avx512vl,noapx_nf,apx_cfcmov" + vaes_avx512vl,noapx_nf,apx_cfcmov,avx10_2" (const_string "base")) ;; The (bounding maximum) length of an instruction immediate. 
@@ -979,6 +979,7 @@ (symbol_ref "TARGET_APX_NDD && Pmode == DImode") (eq_attr "isa" "vaes_avx512vl") (symbol_ref "TARGET_VAES && TARGET_AVX512VL") + (eq_attr "isa" "avx10_2") (symbol_ref "TARGET_AVX10_2_256") (eq_attr "mmx_isa" "native") (symbol_ref "!TARGET_MMX_WITH_SSE") diff --git a/gcc/config/i386/immintrin.h b/gcc/config/i386/immintrin.h index 0d5af155c36..6b8035e6467 100644 --- a/gcc/config/i386/immintrin.h +++ b/gcc/config/i386/immintrin.h @@ -160,4 +160,6 @@ #include +#include + #endif /* _IMMINTRIN_H_INCLUDED */ diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 956cdba55d3..93aa6d46ae4 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -11505,19 +11505,20 @@ (set_attr "mode" "V4SF,V4SF,V2SF,V2SF,V2SF")]) (define_insn "sse_movss_" - [(set (match_operand:VI4F_128 0 "register_operand" "=x,v") + [(set (match_operand:VI4F_128 0 "register_operand" "=x,v,v") (vec_merge:VI4F_128 - (match_operand:VI4F_128 2 "register_operand" " x,v") - (match_operand:VI4F_128 1 "register_operand" " 0,v") + (match_operand:VI4F_128 2 "register_operand" " x,v,v") + (match_operand:VI4F_128 1 "reg_or_0_operand" " 0,v,C") (const_int 1)))] "TARGET_SSE" "@ movss\t{%2, %0|%0, %2} - vmovss\t{%2, %1, %0|%0, %1, %2}" - [(set_attr "isa" "noavx,avx") + vmovss\t{%2, %1, %0|%0, %1, %2} + vmovd\t{%2, %0|%0, %2}" + [(set_attr "isa" "noavx,avx,avx10_2") (set_attr "type" "ssemov") - (set_attr "prefix" "orig,maybe_evex") - (set_attr "mode" "SF")]) + (set_attr "prefix" "orig,maybe_evex,evex") + (set_attr "mode" "SF,SF,SI")]) (define_insn "avx2_vec_dup" [(set (match_operand:VF1_128_256 0 "register_operand" "=v") @@ -11687,18 +11688,19 @@ ;; see comment above inline_secondary_memory_needed function in i386.cc (define_insn "vec_set_0" [(set (match_operand:VI4F_128 0 "nonimmediate_operand" - "=Yr,*x,v,v,v,x,x,v,Yr ,?x ,x ,m ,m ,m") + "=Yr,*x,v,v,v,v,x,x,v,Yr ,?x ,x ,m ,m ,m") (vec_merge:VI4F_128 (vec_duplicate:VI4F_128 (match_operand: 2 "general_operand" - " Yr,*x,v,m,r ,m,x,v,?jrjm,?jrjm,?rm,!x,?re,!*fF")) + " Yr,*x,v,v,m,r ,m,x,v,?jrjm,?jrjm,?rm,!x,?re,!*fF")) (match_operand:VI4F_128 1 "nonimm_or_0_operand" - " C , C,C,C,C ,C,0,v,0 ,0 ,x ,0 ,0 ,0") + " C , C,C,C,C,C ,C,0,v,0 ,0 ,x ,0 ,0 ,0") (const_int 1)))] "TARGET_SSE" "@ insertps\t{$0xe, %2, %0|%0, %2, 0xe} insertps\t{$0xe, %2, %0|%0, %2, 0xe} + vmovd\t{%2, %0|%0, %2} vinsertps\t{$0xe, %2, %2, %0|%0, %2, %2, 0xe} %vmov\t{%2, %0|%0, %2} %vmovd\t{%2, %0|%0, %2} @@ -11712,22 +11714,24 @@ # #" [(set (attr "isa") - (cond [(eq_attr "alternative" "0,1,8,9") + (cond [(eq_attr "alternative" "0,1,9,10") (const_string "sse4_noavx") - (eq_attr "alternative" "2,7,10") + (eq_attr "alternative" "2") + (const_string "avx10_2") + (eq_attr "alternative" "3,8,11") (const_string "avx") - (eq_attr "alternative" "3,4") + (eq_attr "alternative" "4,5") (const_string "sse2") - (eq_attr "alternative" "5,6") + (eq_attr "alternative" "6,7") (const_string "noavx") ] (const_string "*"))) (set (attr "type") - (cond [(eq_attr "alternative" "0,1,2,8,9,10") + (cond [(eq_attr "alternative" "0,1,3,9,10,11") (const_string "sselog") - (eq_attr "alternative" "12") - (const_string "imov") (eq_attr "alternative" "13") + (const_string "imov") + (eq_attr "alternative" "14") (const_string "fmov") ] (const_string "ssemov"))) @@ -11736,45 +11740,46 @@ (const_string "gpr16") (const_string "*"))) (set (attr "prefix_extra") - (if_then_else (eq_attr "alternative" "8,9,10") + (if_then_else (eq_attr "alternative" "9,10,11") (const_string "1") (const_string "*"))) (set (attr "length_immediate") 
- (if_then_else (eq_attr "alternative" "8,9,10") + (if_then_else (eq_attr "alternative" "9,10,11") (const_string "1") (const_string "*"))) (set (attr "prefix") - (cond [(eq_attr "alternative" "0,1,5,6,8,9") + (cond [(eq_attr "alternative" "0,1,6,7,9,10") (const_string "orig") - (eq_attr "alternative" "2") + (eq_attr "alternative" "2,3") (const_string "maybe_evex") - (eq_attr "alternative" "3,4") + (eq_attr "alternative" "4,5") (const_string "maybe_vex") - (eq_attr "alternative" "7,10") + (eq_attr "alternative" "8,11") (const_string "vex") ] (const_string "*"))) - (set_attr "mode" "SF,SF,SF,,SI,SF,SF,SF,TI,TI,TI,*,*,*") + (set_attr "mode" "SF,SF,SI,SF,,SI,SF,SF,SF,TI,TI,TI,*,*,*") (set (attr "preferred_for_speed") - (cond [(eq_attr "alternative" "4") + (cond [(eq_attr "alternative" "5") (symbol_ref "TARGET_INTER_UNIT_MOVES_TO_VEC") ] (symbol_ref "true")))]) (define_insn "@vec_set_0" [(set (match_operand:V8_128 0 "register_operand" - "=v,v,v,x,x,Yr,*x,x,x,x,v,v") + "=v,v,v,v,x,x,Yr,*x,x,x,x,v,v") (vec_merge:V8_128 (vec_duplicate:V8_128 (match_operand: 2 "nonimmediate_operand" - " r,m,v,r,m,Yr,*x,r,m,x,r,m")) + " r,m,v,v,r,m,Yr,*x,r,m,x,r,m")) (match_operand:V8_128 1 "reg_or_0_operand" - " C,C,v,0,0,0 ,0 ,x,x,x,v,v") + " C,C,C,v,0,0,0 ,0 ,x,x,x,v,v") (const_int 1)))] "TARGET_SSE2" "@ vmovw\t{%k2, %0|%0, %k2} vmovw\t{%2, %0|%0, %2} + vmovw\t{%2, %0|%0, %2} vmovsh\t{%2, %1, %0|%0, %1, %2} pinsrw\t{$0, %k2, %0|%0, %k2, 0} pinsrw\t{$0, %2, %0|%0, %2, 0} @@ -11786,65 +11791,92 @@ vpinsrw\t{$0, %k2, %1, %0|%0, %1, %k2, 0} vpinsrw\t{$0, %2, %1, %0|%0, %1, %2, 0}" [(set (attr "isa") - (cond [(eq_attr "alternative" "0,1,2") + (cond [(eq_attr "alternative" "0,1,3") (const_string "avx512fp16") - (eq_attr "alternative" "3,4") + (eq_attr "alternative" "2") + (const_string "avx10_2") + (eq_attr "alternative" "4,5") (const_string "noavx") - (eq_attr "alternative" "5,6") + (eq_attr "alternative" "6,7") (const_string "sse4_noavx") - (eq_attr "alternative" "7,8,9") + (eq_attr "alternative" "8,9,10") (const_string "avx") - (eq_attr "alternative" "10,11") + (eq_attr "alternative" "11,12") (const_string "avx512bw") ] (const_string "*"))) (set (attr "type") - (if_then_else (eq_attr "alternative" "0,1,2,5,6,9") + (if_then_else (eq_attr "alternative" "0,1,2,3,6,7,10") (const_string "ssemov") (const_string "sselog"))) (set (attr "prefix_data16") - (if_then_else (eq_attr "alternative" "3,4") + (if_then_else (eq_attr "alternative" "4,5") (const_string "1") (const_string "*"))) (set (attr "prefix_extra") - (if_then_else (eq_attr "alternative" "5,6,9") + (if_then_else (eq_attr "alternative" "6,7,10") (const_string "1") (const_string "*"))) (set (attr "length_immediate") - (if_then_else (eq_attr "alternative" "0,1,2") + (if_then_else (eq_attr "alternative" "0,1,2,3") (const_string "*") (const_string "1"))) (set (attr "prefix") - (cond [(eq_attr "alternative" "0,1,2,10,11") + (cond [(eq_attr "alternative" "0,1,2,3,11,12") (const_string "evex") - (eq_attr "alternative" "7,8,9") + (eq_attr "alternative" "8,9,10") (const_string "vex") ] (const_string "orig"))) (set (attr "mode") - (if_then_else (eq_attr "alternative" "0,1,2") + (if_then_else (eq_attr "alternative" "0,1,2,3") (const_string "HF") (const_string "TI"))) (set (attr "enabled") (cond [(and (not (match_test "mode == V8HFmode || mode == V8BFmode")) - (eq_attr "alternative" "2")) + (eq_attr "alternative" "3")) (symbol_ref "false") ] (const_string "*")))]) +(define_insn_and_split "*vec_set_0_1" + [(set (match_operand:V8_128 0 "register_operand") + (vec_merge:V8_128 + 
(vec_duplicate:V8_128 + (vec_select: + (match_operand:V8_128 2 "nonimmediate_operand") + (parallel [(const_int 0)]))) + (match_operand:V8_128 1 "reg_or_0_operand") + (const_int 1)))] + "TARGET_SSE2 && ix86_pre_reload_split ()" + "#" + "&& 1" + [(set (match_dup 0) + (vec_merge:V8_128 + (vec_duplicate:V8_128 (match_dup 2)) + (match_dup 1) + (const_int 1)))] +{ + if (register_operand (operands[2], mode)) + operands[2] = force_reg (mode, operands[2]); + operands[2] = gen_lowpart (mode, operands[2]); +}) + ;; vmovw clears also the higer bits (define_insn "vec_set_0" - [(set (match_operand:VI2F_256_512 0 "register_operand" "=v,v") + [(set (match_operand:VI2F_256_512 0 "register_operand" "=v,v,v") (vec_merge:VI2F_256_512 (vec_duplicate:VI2F_256_512 - (match_operand: 2 "nonimmediate_operand" "r,m")) + (match_operand: 2 "nonimmediate_operand" "r,m,v")) (match_operand:VI2F_256_512 1 "const0_operand") (const_int 1)))] "TARGET_AVX512FP16" "@ vmovw\t{%k2, %x0|%x0, %k2} + vmovw\t{%2, %x0|%x0, %2} vmovw\t{%2, %x0|%x0, %2}" - [(set_attr "type" "ssemov") + [(set_attr "isa" "*,*,avx10_2") + (set_attr "type" "ssemov") (set_attr "prefix" "evex") (set_attr "mode" "HF")]) @@ -11889,16 +11921,20 @@ }) (define_insn "avx512fp16_mov" - [(set (match_operand:V8_128 0 "register_operand" "=v") + [(set (match_operand:V8_128 0 "register_operand" "=v,v") (vec_merge:V8_128 - (match_operand:V8_128 2 "register_operand" "v") - (match_operand:V8_128 1 "register_operand" "v") + (match_operand:V8_128 2 "register_operand" "v,v") + (match_operand:V8_128 1 "reg_or_0_operand" "v,C") (const_int 1)))] - "TARGET_AVX512FP16" - "vmovsh\t{%2, %1, %0|%0, %1, %2}" - [(set_attr "type" "ssemov") + "TARGET_AVX512FP16 + || (TARGET_AVX10_2_256 && const0_operand (operands[1], mode))" + "@ + vmovsh\t{%2, %1, %0|%0, %1, %2} + vmovw\t{%2, %0|%2, %0}" + [(set_attr "isa" "*,avx10_2") + (set_attr "type" "ssemov") (set_attr "prefix" "evex") - (set_attr "mode" "HF")]) + (set_attr "mode" "HF,HI")]) ;; A subset is vec_setv4sf. (define_insn "*vec_setv4sf_sse4_1" diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vmovd-1.c b/gcc/testsuite/gcc.target/i386/avx10_2-vmovd-1.c new file mode 100644 index 00000000000..275bbade106 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vmovd-1.c @@ -0,0 +1,48 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-final { scan-assembler-times "vmovd\t4\\(%esp\\), %xmm0" 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "vmovss\t4\\(%esp\\), %xmm0" 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "vmovd\t%xmm0, %xmm0" 3 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "vmovd\t%edi, %xmm0" 1 { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-times "vmovd\t%xmm0, %xmm0" 4 { target { ! 
ia32 } } } } */ + + +#include + +typedef int v4si __attribute__((vector_size(16))); +typedef float v4sf __attribute__((vector_size(16))); + +v4si +__attribute__((noipa, unused)) +f1 (int a) +{ + return __extension__(v4si){a, 0, 0, 0}; +} + +v4sf +__attribute__((noipa, unused)) +f2 (float a) +{ + return __extension__(v4sf){a, 0, 0, 0}; +} + +v4si +__attribute__((noipa, unused)) +f3 (v4si a) +{ + return __extension__(v4si){a[0], 0, 0, 0}; +} + +v4sf +__attribute__((noipa, unused)) +f4 (v4sf a) +{ + return __extension__(v4sf){a[0], 0, 0, 0}; +} + +__m128i +__attribute__((noipa, unused)) +f5 (__m128i a) +{ + return _mm_set_epi32 (0, 0, 0,((__v4si)a)[0]); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vmovd-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vmovd-2.c new file mode 100644 index 00000000000..7d659300d81 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vmovd-2.c @@ -0,0 +1,44 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-require-effective-target avx10_2 } */ + +#define AVX10_2 +#define AVX10_SCALAR + +#include "avx10-helper.h" +#include "avx10_2-vmovd-1.c" + +static void +TEST (void) +{ + union128i_d u1, s1; + int e1[4] = {0}; + + s1.x = _mm_set_epi32(-12876, -12886, -12776, 3376590); + e1[0] = s1.a[0]; + + u1.x = _mm_set_epi32(-1, -1, -1, -1); + u1.x = (__m128i)f1((int)s1.a[0]); + if (check_union128i_d (u1, e1)) + abort (); + + u1.x = _mm_set_epi32(-1, -1, -1, -1); + u1.x = (__m128i)f2(((float*)s1.a)[0]); + if (check_union128i_d (u1, e1)) + abort (); + + u1.x = _mm_set_epi32(-1, -1, -1, -1); + u1.x = (__m128i)f3((v4si)s1.x); + if (check_union128i_d (u1, e1)) + abort (); + + u1.x = _mm_set_epi32(-1, -1, -1, -1); + u1.x = (__m128i)f4((v4sf)s1.x); + if (check_union128i_d (u1, e1)) + abort (); + + u1.x = _mm_set_epi32(-1, -1, -1, -1); + u1.x = (__m128i)f5((__m128i)s1.x); + if (check_union128i_d (u1, e1)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vmovw-1.c b/gcc/testsuite/gcc.target/i386/avx10_2-vmovw-1.c new file mode 100644 index 00000000000..ec19a9a263a --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-vmovw-1.c @@ -0,0 +1,69 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-final { scan-assembler-times "vmovw\t4\\(%esp\\), %xmm0" 3 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "vmovw\t8\\(%ebp\\), %xmm0" 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "vmovw\t%xmm0, %xmm0" 4 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "vmovw\t%edi, %xmm0" 1 { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-times "vmovw\t%xmm0, %xmm0" 7 { target { ! 
ia32 } } } } */
+
+#include
+
+typedef _Float16 v8hf __attribute__((vector_size(16)));
+typedef __bf16 v8bf __attribute__((vector_size(16)));
+typedef short v8hi __attribute__((vector_size(16)));
+
+v8hf
+__attribute__((noipa, unused))
+f1 (_Float16 a)
+{
+  return __extension__(v8hf){a, 0, 0, 0, 0, 0, 0, 0};
+}
+
+v8bf
+__attribute__((noipa, unused))
+f2 (__bf16 a)
+{
+  return __extension__(v8bf){a, 0, 0, 0, 0, 0, 0, 0};
+}
+
+v8hi
+__attribute__((noipa, unused))
+f3 (short a)
+{
+  return __extension__(v8hi){a, 0, 0, 0, 0, 0, 0, 0};
+}
+
+v8hf
+__attribute__((noipa, unused))
+f4 (v8hf a)
+{
+  return __extension__(v8hf){a[0], 0, 0, 0, 0, 0, 0, 0};
+}
+
+v8bf
+__attribute__((noipa, unused))
+f5 (v8bf a)
+{
+  return __extension__(v8bf){a[0], 0, 0, 0, 0, 0, 0, 0};
+}
+
+v8hi
+__attribute__((noipa, unused))
+f6 (v8hi a)
+{
+  return __extension__(v8hi){a[0], 0, 0, 0, 0, 0, 0, 0};
+}
+
+__m128i
+__attribute__((noipa, unused))
+f7 (__m128i a)
+{
+  return _mm_set_epi16 (0, 0, 0, 0, 0, 0, 0, ((__v8hi)a)[0]);
+}
+
+__m256h
+__attribute__((noipa, unused))
+f8 (_Float16 a)
+{
+  return _mm256_set_ph (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, a);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vmovw-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vmovw-2.c
new file mode 100644
index 00000000000..d63739e6887
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-vmovw-2.c
@@ -0,0 +1,64 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2" } */
+/* { dg-require-effective-target avx10_2 } */
+
+#define AVX10_2
+#define AVX10_SCALAR
+
+#include "avx10-helper.h"
+#include "avx10_2-vmovw-1.c"
+
+static void
+TEST (void)
+{
+  union128i_w u1, s1;
+  union256i_w u2, s2;
+  short e1[8] = {0};
+  short e2[16] = {0};
+
+  s1.x = _mm_set_epi16(-12876, -12886, -12776, -22876, -22886, -22776, -32766, 30158);
+  e1[0] = s1.a[0];
+
+  u1.x = _mm_set_epi16(-1, -1, -1, -1, -1, -1, -1, -1);
+  u1.x = (__m128i)f1(((_Float16*)s1.a)[0]);
+  if (check_union128i_w (u1, e1))
+    abort ();
+
+  u1.x = _mm_set_epi16(-1, -1, -1, -1, -1, -1, -1, -1);
+  u1.x = (__m128i)f2(((__bf16*)s1.a)[0]);
+  if (check_union128i_w (u1, e1))
+    abort ();
+
+  u1.x = _mm_set_epi16(-1, -1, -1, -1, -1, -1, -1, -1);
+  u1.x = (__m128i)f3((short)s1.a[0]);
+  if (check_union128i_w (u1, e1))
+    abort ();
+
+  u1.x = _mm_set_epi16(-1, -1, -1, -1, -1, -1, -1, -1);
+  u1.x = (__m128i)f4((v8hf)s1.x);
+  if (check_union128i_w (u1, e1))
+    abort ();
+
+  u1.x = _mm_set_epi16(-1, -1, -1, -1, -1, -1, -1, -1);
+  u1.x = (__m128i)f5((v8bf)s1.x);
+  if (check_union128i_w (u1, e1))
+    abort ();
+
+  u1.x = _mm_set_epi16(-1, -1, -1, -1, -1, -1, -1, -1);
+  u1.x = (__m128i)f6((v8hi)s1.x);
+  if (check_union128i_w (u1, e1))
+    abort ();
+
+  u1.x = _mm_set_epi16(-1, -1, -1, -1, -1, -1, -1, -1);
+  u1.x = (__m128i)f7((__m128i)s1.x);
+  if (check_union128i_w (u1, e1))
+    abort ();
+
+  s2.x = _mm256_set_epi16(-12876, -12886, -12776, -22876, -22886, -22776, -32766, 30158,
+			  -12876, -12886, -12776, -22876, -22886, -22776, -32766, 30158);
+  e2[0] = s2.a[0];
+  u2.x = _mm256_set_epi16(-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1);
+  u2.x = (__m256i)f8(((_Float16*)s2.a)[0]);
+  if (check_union256i_w (u2, e2))
+    abort ();
+}
From patchwork Mon Aug 19 09:03:29 2024
From: Haochen Jiang
To: gcc-patches@gcc.gnu.org
Cc: hongtao.liu@intel.com, ubizjak@gmail.com, "Zhang, Jun"
Subject: [PATCH 11/12] AVX10.2: Support compare instructions
Date: Mon, 19 Aug 2024 02:03:29 -0700
Message-ID: <20240819090331.193452-1-haochen.jiang@intel.com>
In-Reply-To: <20240819085717.193256-1-haochen.jiang@intel.com>

From: "Zhang, Jun"

gcc/ChangeLog:

	* config/i386/i386-expand.cc (ix86_ssecom_setcc): Mention
	behavior change on flags.
	(ix86_expand_sse_comi): Handle AVX10.2 behavior.
	(ix86_expand_sse_comi_round): Ditto.
	(ix86_expand_round_builtin): Ditto.
	(ix86_expand_builtin): Change function call.
	* config/i386/i386.md (UNSPEC_COMX): New unspec.
	* config/i386/sse.md (avx10_2_vcomx): New.
	(_comi): Add HFmode.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx10_2-compare-1.c: New test.

Co-authored-by: Haochen Jiang
Co-authored-by: Hongtao Liu
---
 gcc/config/i386/i386-expand.cc                | 170 +++++++++++++++---
 gcc/config/i386/i386.md                       |   1 +
 gcc/config/i386/sse.md                        |  18 +-
 .../gcc.target/i386/avx10_2-compare-1.c       |  21 +++
 4 files changed, 183 insertions(+), 27 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-compare-1.c

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 0322ef003d1..cdeb8b14eb7 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -10664,7 +10664,9 @@ ix86_ssecom_setcc (const enum rtx_code comparison,
   rtx_code_label *label = NULL;
   /* NB: For ordered EQ or unordered NE, check ZF alone isn't sufficient
-     with NAN operands.
+     with NAN operands.
+     Under TARGET_AVX10_2_256, VCOMX/VUCOMX are generated instead of
+     COMI/UCOMI.  VCOMX/VUCOMX will not set ZF for NAN operands.
*/ if (check_unordered) { gcc_assert (comparison == EQ || comparison == NE); @@ -10703,7 +10705,7 @@ ix86_ssecom_setcc (const enum rtx_code comparison, static rtx ix86_expand_sse_comi (const struct builtin_description *d, tree exp, - rtx target) + rtx target, bool comx_ok) { rtx pat, set_dst; tree arg0 = CALL_EXPR_ARG (exp, 0); @@ -10736,11 +10738,13 @@ ix86_expand_sse_comi (const struct builtin_description *d, tree exp, case GE: break; case EQ: - check_unordered = true; + if (!TARGET_AVX10_2_256 || !comx_ok) + check_unordered = true; mode = CCZmode; break; case NE: - check_unordered = true; + if (!TARGET_AVX10_2_256 || !comx_ok) + check_unordered = true; mode = CCZmode; const_val = const1_rtx; break; @@ -10759,6 +10763,28 @@ ix86_expand_sse_comi (const struct builtin_description *d, tree exp, || !insn_p->operand[1].predicate (op1, mode1)) op1 = copy_to_mode_reg (mode1, op1); + if ((comparison == EQ || comparison == NE) + && TARGET_AVX10_2_256 && comx_ok) + { + switch (icode) + { + case CODE_FOR_sse_comi: + icode = CODE_FOR_avx10_2_comxsf; + break; + case CODE_FOR_sse_ucomi: + icode = CODE_FOR_avx10_2_ucomxsf; + break; + case CODE_FOR_sse2_comi: + icode = CODE_FOR_avx10_2_comxdf; + break; + case CODE_FOR_sse2_ucomi: + icode = CODE_FOR_avx10_2_ucomxdf; + break; + + default: + gcc_unreachable (); + } + } pat = GEN_FCN (icode) (op0, op1); if (! pat) return 0; @@ -12253,7 +12279,7 @@ ix86_erase_embedded_rounding (rtx pat) with rounding. */ static rtx ix86_expand_sse_comi_round (const struct builtin_description *d, - tree exp, rtx target) + tree exp, rtx target, bool comx_ok) { rtx pat, set_dst; tree arg0 = CALL_EXPR_ARG (exp, 0); @@ -12315,6 +12341,7 @@ ix86_expand_sse_comi_round (const struct builtin_description *d, op1 = safe_vector_operand (op1, mode1); enum rtx_code comparison = comparisons[INTVAL (op2)]; + enum rtx_code orig_comp = comparison; bool ordered = ordereds[INTVAL (op2)]; bool non_signaling = non_signalings[INTVAL (op2)]; rtx const_val = const0_rtx; @@ -12326,10 +12353,21 @@ ix86_expand_sse_comi_round (const struct builtin_description *d, case ORDERED: if (!ordered) { - /* NB: Use CCSmode/NE for _CMP_TRUE_UQ/_CMP_TRUE_US. */ - if (!non_signaling) - ordered = true; - mode = CCSmode; + if (TARGET_AVX10_2_256 && comx_ok) + { + /* Unlike VCOMI{SH,SS,SD}, VCOMX{SH,SS,SD} will set SF + differently. So directly return true here. */ + target = gen_reg_rtx (SImode); + emit_move_insn (target, const1_rtx); + return target; + } + else + { + /* NB: Use CCSmode/NE for _CMP_TRUE_UQ/_CMP_TRUE_US. */ + if (!non_signaling) + ordered = true; + mode = CCSmode; + } } else { @@ -12343,10 +12381,21 @@ ix86_expand_sse_comi_round (const struct builtin_description *d, case UNORDERED: if (ordered) { - /* NB: Use CCSmode/EQ for _CMP_FALSE_OQ/_CMP_FALSE_OS. */ - if (non_signaling) - ordered = false; - mode = CCSmode; + if (TARGET_AVX10_2_256 && comx_ok) + { + /* Unlike VCOMI{SH,SS,SD}, VCOMX{SH,SS,SD} will set SF + differently. So directly return false here. */ + target = gen_reg_rtx (SImode); + emit_move_insn (target, const0_rtx); + return target; + } + else + { + /* NB: Use CCSmode/EQ for _CMP_FALSE_OQ/_CMP_FALSE_OS. */ + if (non_signaling) + ordered = false; + mode = CCSmode; + } } else { @@ -12377,17 +12426,23 @@ ix86_expand_sse_comi_round (const struct builtin_description *d, if (ordered == non_signaling) ordered = !ordered; break; - case EQ: /* NB: COMI/UCOMI will set ZF with NAN operands. Use CCZmode for - _CMP_EQ_OQ/_CMP_EQ_OS. */ - check_unordered = true; + _CMP_EQ_OQ/_CMP_EQ_OS. 
+ Under TARGET_AVX10_2_256, VCOMX/VUCOMX are always generated instead + of COMI/UCOMI, VCOMX/VUCOMX will not set ZF with NAN. */ + case EQ: + if (!TARGET_AVX10_2_256 || !comx_ok) + check_unordered = true; mode = CCZmode; break; case NE: /* NB: COMI/UCOMI will set ZF with NAN operands. Use CCZmode for - _CMP_NEQ_UQ/_CMP_NEQ_US. */ + _CMP_NEQ_UQ/_CMP_NEQ_US. + Under TARGET_AVX10_2_256, VCOMX/VUCOMX are always generated instead + of COMI/UCOMI, VCOMX/VUCOMX will not set ZF with NAN. */ gcc_assert (!ordered); - check_unordered = true; + if (!TARGET_AVX10_2_256 || !comx_ok) + check_unordered = true; mode = CCZmode; const_val = const1_rtx; break; @@ -12406,14 +12461,77 @@ ix86_expand_sse_comi_round (const struct builtin_description *d, || !insn_p->operand[1].predicate (op1, mode1)) op1 = copy_to_mode_reg (mode1, op1); + /* Generate comx instead of comi when EQ/NE to avoid NAN checks. + Use orig_comp to exclude ORDERED/UNORDERED cases. */ + if ((orig_comp == EQ || orig_comp == NE) + && TARGET_AVX10_2_256 && comx_ok) + { + switch (icode) + { + case CODE_FOR_avx512fp16_comi_round: + icode = CODE_FOR_avx10_2_comxhf_round; + break; + case CODE_FOR_sse_comi_round: + icode = CODE_FOR_avx10_2_comxsf_round; + break; + case CODE_FOR_sse2_comi_round: + icode = CODE_FOR_avx10_2_comxdf_round; + break; + + default: + break; + } + } + + /* Generate comi instead of comx when UNEQ/LTGT to avoid NAN checks. */ + if ((comparison == UNEQ || comparison == LTGT) + && TARGET_AVX10_2_256 && comx_ok) + { + switch (icode) + { + case CODE_FOR_avx10_2_comxhf_round: + icode = CODE_FOR_avx512fp16_comi_round; + break; + case CODE_FOR_avx10_2_comxsf_round: + icode = CODE_FOR_sse_comi_round; + break; + case CODE_FOR_avx10_2_comxdf_round: + icode = CODE_FOR_sse2_comi_round; + break; + + default: + break; + } + } + /* - 1. COMI: ordered and signaling. - 2. UCOMI: unordered and non-signaling. + 1. COMI/VCOMX: ordered and signaling. + 2. UCOMI/VUCOMX: unordered and non-signaling. */ if (non_signaling) - icode = (icode == CODE_FOR_sse_comi_round - ? CODE_FOR_sse_ucomi_round - : CODE_FOR_sse2_ucomi_round); + switch (icode) + { + case CODE_FOR_sse_comi_round: + icode = CODE_FOR_sse_ucomi_round; + break; + case CODE_FOR_sse2_comi_round: + icode = CODE_FOR_sse2_ucomi_round; + break; + case CODE_FOR_avx512fp16_comi_round: + icode = CODE_FOR_avx512fp16_ucomi_round; + break; + case CODE_FOR_avx10_2_comxsf_round: + icode = CODE_FOR_avx10_2_ucomxsf_round; + break; + case CODE_FOR_avx10_2_comxhf_round: + icode = CODE_FOR_avx10_2_ucomxhf_round; + break; + case CODE_FOR_avx10_2_comxdf_round: + icode = CODE_FOR_avx10_2_ucomxdf_round; + break; + default: + gcc_unreachable (); + } pat = GEN_FCN (icode) (op0, op1, op3); if (! 
pat) @@ -12550,7 +12668,7 @@ ix86_expand_round_builtin (const struct builtin_description *d, break; case INT_FTYPE_V4SF_V4SF_INT_INT: case INT_FTYPE_V2DF_V2DF_INT_INT: - return ix86_expand_sse_comi_round (d, exp, target); + return ix86_expand_sse_comi_round (d, exp, target, true); case V4DF_FTYPE_V4DF_V4DF_V4DF_UQI_INT: case V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT: case V2DF_FTYPE_V2DF_V2DF_V2DF_UQI_INT: @@ -15691,7 +15809,7 @@ rdseed_step: case IX86_BUILTIN_VCOMSBF16GE: case IX86_BUILTIN_VCOMSBF16LT: case IX86_BUILTIN_VCOMSBF16LE: - return ix86_expand_sse_comi (bdesc_args + i, exp, target); + return ix86_expand_sse_comi (bdesc_args + i, exp, target, false); case IX86_BUILTIN_FABSQ: case IX86_BUILTIN_COPYSIGNQ: if (!TARGET_SSE) @@ -15707,7 +15825,7 @@ rdseed_step: && fcode <= IX86_BUILTIN__BDESC_COMI_LAST) { i = fcode - IX86_BUILTIN__BDESC_COMI_FIRST; - return ix86_expand_sse_comi (bdesc_comi + i, exp, target); + return ix86_expand_sse_comi (bdesc_comi + i, exp, target, true); } if (fcode >= IX86_BUILTIN__BDESC_ROUND_ARGS_FIRST diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index e28f9bb5eae..ab6059759b4 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -139,6 +139,7 @@ UNSPEC_SCALEF UNSPEC_PCMP UNSPEC_CVTBFSF + UNSPEC_COMX ;; Generic math support UNSPEC_IEEE_MIN ; not commutative diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 93aa6d46ae4..db538ac4ad5 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -4692,6 +4692,22 @@ (set_attr "prefix" "evex") (set_attr "mode" "")]) +(define_insn "avx10_2_comx" + [(set (reg:CCFP FLAGS_REG) + (unspec:CCFP + [(vec_select:MODEFH + (match_operand: 0 "register_operand" "v") + (parallel [(const_int 0)])) + (vec_select:MODEFH + (match_operand: 1 "" "") + (parallel [(const_int 0)]))] + UNSPEC_COMX))] + "TARGET_AVX10_2_256" + "vcomx\t{%1, %0|%0, %1}" + [(set_attr "type" "ssecomi") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + (define_insn "_comi" [(set (reg:CCFP FLAGS_REG) (compare:CCFP @@ -4701,7 +4717,7 @@ (vec_select:MODEFH (match_operand: 1 "" "") (parallel [(const_int 0)]))))] - "SSE_FLOAT_MODE_P (mode)" + "SSE_FLOAT_MODE_P (mode) || mode == E_HFmode" "%vcomi\t{%1, %0|%0, %1}" [(set_attr "type" "ssecomi") (set_attr "prefix" "maybe_vex") diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-compare-1.c b/gcc/testsuite/gcc.target/i386/avx10_2-compare-1.c new file mode 100644 index 00000000000..99d32186e6b --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx10_2-compare-1.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx10.2" } */ +/* { dg-final { scan-assembler-times "vcomxsd\[ \\t\]+\{sae\}\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcomxss\[ \\t\]+\{sae\}\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vucomxsd\[ \\t\]+\{sae\}\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vucomxss\[ \\t\]+\{sae\}\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ + +#include + +volatile __m128 x3; +volatile __m128d x4; +volatile int a; + +void extern +avx10_2_test (void) +{ + a = _mm_comi_round_sd (x4, x4, _CMP_EQ_OS, _MM_FROUND_NO_EXC); + a = _mm_comi_round_ss (x3, x3, _CMP_NEQ_US, _MM_FROUND_NO_EXC); + a = _mm_comi_round_sd (x4, x4, _CMP_EQ_OQ, _MM_FROUND_NO_EXC); + a = _mm_comi_round_ss (x3, x3, _CMP_NEQ_UQ, _MM_FROUND_NO_EXC); +} From patchwork Mon Aug 19 09:03:38 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit 
From: Haochen Jiang
To: gcc-patches@gcc.gnu.org
Cc: hongtao.liu@intel.com, ubizjak@gmail.com
Subject: [PATCH 12/12] i386: Add bf8 -> fp16 intrin
Date: Mon, 19 Aug 2024 02:03:38 -0700
Message-ID: <20240819090340.193463-1-haochen.jiang@intel.com>
In-Reply-To: <20240819085717.193256-1-haochen.jiang@intel.com>
References: <20240819085717.193256-1-haochen.jiang@intel.com>

Since BF8 and FP16 have the same number of exponent bits, converting
BF8 to FP16 only requires widening the fraction part, i.e. shifting
the BF8 byte into the high byte of each FP16 element. We therefore use
a short sequence of existing instructions instead of new instructions
to do that. For convenience, intrinsics are also provided.

gcc/ChangeLog:

	* config/i386/avx10_2-512convertintrin.h (_mm512_cvtpbf8_ph): New.
	(_mm512_mask_cvtpbf8_ph): Ditto.
	(_mm512_maskz_cvtpbf8_ph): Ditto.
	* config/i386/avx10_2convertintrin.h (_mm_cvtpbf8_ph): Ditto.
	(_mm_mask_cvtpbf8_ph): Ditto.
	(_mm_maskz_cvtpbf8_ph): Ditto.
	(_mm256_cvtpbf8_ph): Ditto.
	(_mm256_mask_cvtpbf8_ph): Ditto.
	(_mm256_maskz_cvtpbf8_ph): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx10_2-512-convert-1.c: Add tests for new intrins.
	* gcc.target/i386/avx10_2-convert-1.c: Ditto.
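To spell out the bit-level reasoning: BF8 here is the E5M2 format (1 sign bit, 5 exponent bits, 2 fraction bits), while FP16 is IEEE binary16, i.e. E5M10. Sign and exponent line up exactly, so widening a value only appends eight zero bits to the fraction; that is what the vpmovsxbw-plus-vpsllw sequence used below does per element. A scalar sketch of the same transform (the helper name is illustrative, not part of the patch):

#include <stdint.h>

/* Widen one BF8 (E5M2) value to an IEEE binary16 (E5M10) bit pattern.
   Moving the byte into the high half keeps sign and exponent in
   place and zero-fills the eight extra fraction bits, so the
   represented value is unchanged.  */
static inline uint16_t
bf8_to_fp16_bits (uint8_t x)
{
  return (uint16_t) x << 8;
}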
---
 gcc/config/i386/avx10_2-512convertintrin.h    | 24 ++++++++++
 gcc/config/i386/avx10_2convertintrin.h        | 48 +++++++++++++++++++
 .../gcc.target/i386/avx10_2-512-convert-1.c   | 16 ++++++-
 .../gcc.target/i386/avx10_2-convert-1.c       | 26 ++++++++--
 4 files changed, 109 insertions(+), 5 deletions(-)

diff --git a/gcc/config/i386/avx10_2-512convertintrin.h b/gcc/config/i386/avx10_2-512convertintrin.h
index 4ad339bbbf9..dfbdfc3e51b 100644
--- a/gcc/config/i386/avx10_2-512convertintrin.h
+++ b/gcc/config/i386/avx10_2-512convertintrin.h
@@ -540,6 +540,30 @@ _mm512_maskz_cvtnesph_phf8 (__mmask32 __U, __m512h __A)
 					   (__mmask32) __U);
 }
 
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvtpbf8_ph (__m256i __A)
+{
+  return (__m512h) _mm512_castsi512_ph ((__m512i) _mm512_slli_epi16 (
+	 (__m512i) _mm512_cvtepi8_epi16 (__A), 8));
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvtpbf8_ph (__m512h __S, __mmask16 __U, __m256i __A)
+{
+  return (__m512h) _mm512_castsi512_ph ((__m512i) _mm512_mask_slli_epi16 (
+	 (__m512i) __S, __U, (__m512i) _mm512_cvtepi8_epi16 (__A), 8));
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvtpbf8_ph (__mmask16 __U, __m256i __A)
+{
+  return (__m512h) _mm512_castsi512_ph ((__m512i) _mm512_slli_epi16 (
+	 (__m512i) _mm512_maskz_cvtepi8_epi16 (__U, __A), 8));
+}
+
 #ifdef __DISABLE_AVX10_2_512__
 #undef __DISABLE_AVX10_2_512__
 #pragma GCC pop_options
diff --git a/gcc/config/i386/avx10_2convertintrin.h b/gcc/config/i386/avx10_2convertintrin.h
index ac62d1290a5..8d2c1a54147 100644
--- a/gcc/config/i386/avx10_2convertintrin.h
+++ b/gcc/config/i386/avx10_2convertintrin.h
@@ -970,6 +970,54 @@ _mm256_maskz_cvtnesph_phf8 (__mmask16 __U, __m256h __A)
 					  (__mmask16) __U);
 }
 
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtpbf8_ph (__m128i __A)
+{
+  return (__m128h) _mm_castsi128_ph ((__m128i) _mm_slli_epi16 (
+	 (__m128i) _mm_cvtepi8_epi16 (__A), 8));
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_cvtpbf8_ph (__m128h __S, __mmask8 __U, __m128i __A)
+{
+  return (__m128h) _mm_castsi128_ph ((__m128i) _mm_mask_slli_epi16 (
+	 (__m128i) __S, __U, (__m128i) _mm_cvtepi8_epi16 (__A), 8));
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_cvtpbf8_ph (__mmask8 __U, __m128i __A)
+{
+  return (__m128h) _mm_castsi128_ph ((__m128i) _mm_slli_epi16 (
+	 (__m128i) _mm_maskz_cvtepi8_epi16 (__U, __A), 8));
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_cvtpbf8_ph (__m128i __A)
+{
+  return (__m256h) _mm256_castsi256_ph ((__m256i) _mm256_slli_epi16 (
+	 (__m256i) _mm256_cvtepi8_epi16 (__A), 8));
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_cvtpbf8_ph (__m256h __S, __mmask8 __U, __m128i __A)
+{
+  return (__m256h) _mm256_castsi256_ph ((__m256i) _mm256_mask_slli_epi16 (
+	 (__m256i) __S, __U, (__m256i) _mm256_cvtepi8_epi16 (__A), 8));
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_cvtpbf8_ph (__mmask8 __U, __m128i __A)
+{
+  return (__m256h) _mm256_castsi256_ph ((__m256i) _mm256_slli_epi16 (
+	 (__m256i) _mm256_maskz_cvtepi8_epi16 (__U, __A), 8));
+}
+
 #ifdef __DISABLE_AVX10_2_256__
 #undef __DISABLE_AVX10_2_256__
 #pragma GCC pop_options
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-convert-1.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-convert-1.c
index bbbff186d0a..f67138c237c 100644
--- a/gcc/testsuite/gcc.target/i386/avx10_2-512-convert-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-convert-1.c
@@ -45,13 +45,17 @@
 /* { dg-final { scan-assembler-times "vcvtneph2hf8s\[ \\t\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
 /* { dg-final { scan-assembler-times "vcvtneph2hf8s\[ \\t\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
 /* { dg-final { scan-assembler-times "vcvtneph2hf8s\[ \\t\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpsllw\[ \t]\+\\\$8, %zmm\[0-9]\+, %zmm\[0-9]\+(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vpsllw\[ \t]\+\\\$8, %zmm\[0-9]\+, %zmm\[0-9]\+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpmovsxbw\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\]*%zmm\[0-9\](?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vpmovsxbw\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
 
 #include <immintrin.h>
 
-volatile __m256i x256i;
+volatile __m256i x256i, z1;
 volatile __m512i x512i;
 volatile __m512 x, a1, b1;
-volatile __m512h y, x512h;
+volatile __m512h y, x512h, z;
 volatile __mmask16 m16;
 volatile __mmask32 m32;
 volatile __mmask64 m64;
@@ -174,3 +178,11 @@ avx10_2_512_vcvtneph2hf8s_test (void)
   x256i = _mm512_mask_cvtnesph_phf8 (x256i, m32, x512h);
   x256i = _mm512_maskz_cvtnesph_phf8 (m32, x512h);
 }
+
+void extern
+avx10_2_512_cvtbf8_fp16_test (void)
+{
+  y = _mm512_cvtpbf8_ph (z1);
+  y = _mm512_mask_cvtpbf8_ph (z, m16, z1);
+  y = _mm512_maskz_cvtpbf8_ph (m16, z1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-convert-1.c b/gcc/testsuite/gcc.target/i386/avx10_2-convert-1.c
index 015474f8cf3..9c3e85718f2 100644
--- a/gcc/testsuite/gcc.target/i386/avx10_2-convert-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-convert-1.c
@@ -87,14 +87,22 @@
 /* { dg-final { scan-assembler-times "vcvtneph2hf8sy\[ \\t\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
 /* { dg-final { scan-assembler-times "vcvtneph2hf8sy\[ \\t\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
 /* { dg-final { scan-assembler-times "vcvtneph2hf8sy\[ \\t\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpmovsxbw\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%ymm\[0-9\](?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vpsllw\[ \t]\+\\\$8, %ymm\[0-9]\+, %ymm\[0-9]\+(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vpsllw\[ \t]\+\\\$8, %ymm\[0-9]\+, %ymm\[0-9]\+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpmovsxbw\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpmovsxbw\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vpsllw\[ \t]\+\\\$8, %xmm\[0-9]\+, %xmm\[0-9]\+(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vpsllw\[ \t]\+\\\$8, %xmm\[0-9]\+, %xmm\[0-9]\+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpmovsxbw\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
 
 #include <immintrin.h>
 
 volatile __m128 x1,a1,b1;
 volatile __m256 x2,a2,b2;
-volatile __m128h y,x128h;
-volatile __m256h y2,x256h;
-volatile __m128i x128i;
+volatile __m128h y,x128h,z;
+volatile __m256h y2,x256h,z2;
+volatile __m128i x128i,z3;
 volatile __m256i x256i;
 volatile __mmask8 m8;
 volatile __mmask16 m16;
@@ -272,3 +280,15 @@ avx10_2_vcvtneph2hf8s_test (void)
   x128i = _mm256_mask_cvtnesph_phf8 (x128i, m16, x256h);
   x128i = _mm256_maskz_cvtnesph_phf8 (m16, x256h);
 }
+
+void extern
+avx10_2_cvtbf8_fp16_test (void)
+{
+  y = _mm_cvtpbf8_ph (z3);
+  y = _mm_mask_cvtpbf8_ph (z, m8, z3);
+  y = _mm_maskz_cvtpbf8_ph (m8, z3);
+
+  y2 = _mm256_cvtpbf8_ph (z3);
+  y2 = _mm256_mask_cvtpbf8_ph (z2, m8, z3);
+  y2 = _mm256_maskz_cvtpbf8_ph (m8, z3);
+}
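For completeness, a minimal usage sketch of the new 128-bit intrinsics outside the test harness, compiled with -mavx10.2 as in the tests above (the function name is illustrative only):

#include <immintrin.h>

/* Widen the low 8 BF8 bytes of SRC to 8 _Float16 values, taking
   lanes from PASSTHRU where KEEP has a zero bit.  */
__m128h
widen_bf8 (__m128i src, __mmask8 keep, __m128h passthru)
{
  return _mm_mask_cvtpbf8_ph (passthru, keep, src);
}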